I think it is almost always assumed that superintelligent artificial intelligence (SAI) disempowering humans would be bad, but are we confident about that? Is this an under-discussed crucial consideration?
Most people (including me) would prefer the extinction of a random species to that of humans. I suppose this is mostly due to a desire for self-preservation, but can also be justified on altruistic grounds if humans have a greater ability to shape the future for the better. However, a priori, would it be reasonable to assume that more intelligent agents would do better than humans, at least under moral realism? If not, can one be confident that humans would do better than other species?
From the point of view of the universe, I believe one should strive to align SAI with impartial value, not human value. It is unclear to me how much these differ, but one should beware of surprising and suspicious convergence.
In any case, I do not think this shift in focus means humanity should accelerate AI progress (as proposed by effective accelerationism?). Intuitively, aligning SAI with impartial value is a harder problem, and therefore needs even more time to be solved.
Historically the field has focused on impartiality/cosmopolitanism/pluralism, and there's a rough consensus that "human value is a decent bet at proxying impartial value, with respect to the distribution of possible values", which falls out of the fragility/complexity of value. I.e., many of us suspect that embracing a random draw from the distribution over utility functions leads to worse performance at impartiality than human values do.
I do recommend "try to characterize a good successor criterion" (as per Paul's framing of the problem) as an exercise; I found thinking through it very rewarding. I've definitely taken seriously thought processes like "stop being tribalist, your values aren't better than paperclips", so I feel like I'm thinking clearly about the class of mistakes the latest crop of accelerationists may be making.
Even though we're being vague about any particular specification language that expresses values and so on, I suspect that a view of this based on descriptive complexity is robust to moral uncertainty, at the very least to perturbation in the choice of utilitarianism flavor.
Thanks, quinn!
Could you provide evidence for these?
I am not familiar with the posts you link to, but it looks like they focus on human values (emphasis mine):
Most people would consider a universe filled with paperclips pretty bad, but I think it can plausibly be good as long as the superintelligent AI is having a good time turning everything into paperclips.
Thanks for sharing! Paul concludes that:
Why would an unaligned AI have lower option value than an aligned one? I agree with Paul that:
cosmopolitanism: I have a roundup of links here. I think your concerns are best discussed in the Arbital article on generalized cosmopolitanism:
Re the fragility/complexity link:
My view after reading is that "human" is a shorthand that isn't doubled down on throughout. Fun theory especially characterizes a sense of what's at stake when we have values that can at least entertain the idea of pluralism (as opposed to values that don't hesitate to create extreme lock-in scenarios), and "human value" is sort of a first-approximation proxy for a detailed understanding of that.
This is a niche and extreme flavor of utilitarianism, and I wouldn't expect its conclusions to be robust to moral uncertainty.
But it's a nice question that identifies cruxes in metaethics.
I think this is just a combination of taking seriously lock-in actions that we can't undo, along with forecasts about how aggressively a random draw from the space of values can be expected to lock in.
Thanks!
I have just remembered that the post AGI and lock-in is relevant to this discussion.
Busy, will come back over the next few days.
Here's my just-so story for how humans evolved impartial altruism by going through several particular steps:
First, kin selection evolved for particular reasons related to how DNA is passed on. This selects for the precursors to altruism.
With the ability to recognise individual characteristics and a long-term memory allowing you to keep track of them, species can evolve stable pairwise reputations.
This allows reciprocity to evolve on top of kin selection, because reputations allow you to keep track of who's likely to reciprocate vs defect.
More advanced communication allows larger groups to rapidly synchronise reputations. Precursors of this include "eavesdropping" and "triadic awareness",[1] all the way up to what we know as "gossip".
This leads to indirect reciprocity: when you cheat one person, it affects everybody's willingness to trade with you.
There's some kind of inertia to the proxies human brains generalise on. This seems to be a combination of memetic evolution plus specific facts about how brains generalise very fast.
If altruistic reputation is a stable proxy for long enough, the meme stays in social equilibrium even past the point where it benefits individual genetic fitness.
In sum, I think impartial altruism (e.g. EA) is the result of "overgeneralising" the notion of indirect reciprocity, such that you end up wanting to help everybody everywhere.[2] And I'm skeptical that a randomly drawn AI will meet the requirements for the same thing to happen to it.
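To make the indirect-reciprocity step above concrete, here is a minimal toy simulation (my own hedged sketch, not something from this thread or any particular paper): agents are unconditional helpers, unconditional defectors, or "discriminators" who only help partners in good standing, and observers synchronise each donor's reputation after every interaction. The strategy names and all parameter values are assumptions chosen purely for illustration.

```python
import random

# Arbitrary illustrative parameters: cost of helping, benefit of being helped,
# number of one-shot donor/recipient interactions, and population size.
COST, BENEFIT, ROUNDS, N = 1.0, 3.0, 2000, 60
STRATEGIES = ["always_help", "never_help", "discriminator"]

def run(seed=0):
    rng = random.Random(seed)
    strategy = [rng.choice(STRATEGIES) for _ in range(N)]
    reputation = [True] * N          # everyone starts in good standing
    payoff = [0.0] * N

    for _ in range(ROUNDS):
        donor, recipient = rng.sample(range(N), 2)
        s = strategy[donor]
        helps = (s == "always_help") or (s == "discriminator" and reputation[recipient])
        if helps:
            payoff[donor] -= COST
            payoff[recipient] += BENEFIT
        # Observers ("gossip") update the donor's standing: refusing to help a
        # partner in good standing counts as defection, while refusing a partner
        # in bad standing is treated as justified and does not hurt the donor.
        reputation[donor] = helps or not reputation[recipient]

    # Average payoff per strategy.
    averages = {}
    for s in STRATEGIES:
        scores = [p for st, p in zip(strategy, payoff) if st == s]
        if scores:
            averages[s] = sum(scores) / len(scores)
    return averages

if __name__ == "__main__":
    print(run())  # discriminators typically end up with the highest average payoff
```

Under these (arbitrary) payoffs, discriminators tend to out-earn both unconditional strategies, which is the sense in which reputation-conditioned helping can remain a stable proxy once group-level reputation tracking exists.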
"White-faced capuchin monkeys show triadic awareness in their choice of allies":
You can get allies by being nice, but not unless you're also dominant.
For me, it's not primarily about human values. It's about altruistic values. Whatever anything cares about, I care about that in proportion to how much they care about it.
Thanks for that story!
I think 100% alignment with human values would be better than random values, but superintelligent AI would presumably be trained on human data, so it would be somewhat aligned with human values. I also wonder about the extent to which the values of the superintelligent AI could change, hopefully for the better (as human values have).
I also have specific just-so stories for why human values have moved towards "moral circle expansion" over time, and I'm not optimistic that process will continue indefinitely unless intervened on.
Anyway, these are important questions!
From Nate Soares' post Cosmopolitan values don't come free: