I think it is almost always assumed that superintelligent artificial intelligence (SAI) disempowering humans would be bad, but are we confident about that? Is this an under-discussed crucial consideration?
Most people (including me) would prefer the extinction of a random species to that of humans. I suppose this is mostly due to a desire for self-preservation, but can also be justified on altruistic grounds if humans have a greater ability to shape the future for the better. However, a priori, would it be reasonable to assume that more intelligent agents would do better than humans, at least under moral realism? If not, can one be confident that humans would do better than other species?
From the point of view of the universe, I believe one should strive to align SAI with impartial value, not human value. It is unclear to me how much these differ, but one should beware of surprising and suspicious convergence.
In any case, I do not think this shift in focus means humanity should accelerate AI progress (as proposed by effective accelerationism?). Intuitively, aligning SAI with impartial value is a harder problem, and therefore needs even more time to be solved.
Historically the field has been focused on impartiality/cosmopolitanism/pluralism, and there’s a rough consensus that “human value is a decent bet at proxying impartial value, with respect to the distribution of possible values”, which follows from the fragility/complexity of value. I.e., many of us suspect that embracing a random draw from the distribution over utility functions leads to worse performance at impartiality than embracing human values does.
I do recommend “try to characterize a good successor criterion” (as per Paul’s framing of the problem) as an exercise; I found thinking through it very rewarding. I’ve definitely taken seriously thought processes like “stop being tribalist, your values aren’t better than paperclips”, so I feel like I’m thinking clearly about the class of mistakes the latest crop of accelerationists may be making.
Even though we’re being vague about any particular specification language that expresses values and so on, I suspect that a view of this based on descriptive complexity is robust to moral uncertainty, at the very least to perturbation in the choice of utilitarianism flavor.
Thanks, quinn!
Could you provide evidence for these?
I am not familiar with the posts you link to, but it looks like they focus on human values (emphasis mine):
Most people would consider a universe filled with paperclips pretty bad, but I think it can plausibly be good as long as the superintelligent AI is having a good time turning everything into paperclips.
Thanks for sharing! Paul concludes that:
Why would an unaligned AI have lower option value than an aligned one? I agree with Paul that:
cosmopolitanism: I have a roundup of links here. I think your concerns are best discussed in the Arbital article on generalized cosmopolitanism:
Re the fragility/complexity link:
My view after reading is that “human” is a shorthand that isn’t doubled down on throughout. Fun theory especially characterizes a sense of what’s at stake when we have values that can at least entertain the idea of pluralism (as opposed to values that don’t hesitate to create extreme lock-in scenarios), and “human value” is a rough first-pass proxy for a detailed understanding of that.
This is a niche and extreme flavor of utilitarianism, and I wouldn’t expect its conclusions to be robust to moral uncertainty.
But it’s a nice question that identifies cruxes in metaethics.
I think this is just a combination of taking seriously lock-in actions that we can’t undo, along with forecasts about how aggressively a random draw from values can be expected to lock in.
Thanks!
I have just remembered the post AGI and lock-in is relevant to this discussion.
Busy, will come back over the next few days.
Here’s my just-so story for how humans evolved impartial altruism by going through several particular steps:
First there was kin selection evolving for particular reasons related to how DNA is passed on. This selects for the precursors to altruism.
With the ability to recognise individual characteristics and a long-term memory allowing you to keep track of them, species can evolve stable pairwise reputations.
This allows reciprocity to evolve on top of kin selection, because reputations allow you to keep track of who’s likely to reciprocate vs defect.
More advanced communication allows larger groups to rapidly synchronise reputations. Precursors of this include “eavesdropping”, “triadic awareness”,[1] all the way up to what we know as “gossip”.
This leads to indirect reciprocity. So when you cheat one person, it affects everybody’s willingness to trade with you.
There’s some kind of inertia to the proxies human brains generalise on. This seems to be a combination of memetic evolution plus specific facts about how brains generalise very fast.
If altruistic reputation is a stable proxy for long enough, the meme stays in social equilibrium even past the point where it benefits individual genetic fitness.
In sum, I think impartial altruism (e.g. EA) is the result of “overgeneralising” the notion of indirect reciprocity, such that you end up wanting to help everybody everywhere.[2] And I’m skeptical a randomly drawn AI will meet the same requirements for that to happen to them.
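The reputation dynamic in the steps above can be illustrated with a toy simulation (a sketch, not a claim about the actual evolutionary model; all parameters here are hypothetical). It uses a simplified “standing” variant of indirect reciprocity: agents who donate only to partners in good standing end up outearning unconditional defectors, because defectors quickly lose their reputation and stop receiving help.

```python
import random

random.seed(0)

# Hypothetical parameters: benefit/cost of a donation, population mix, rounds.
B, C = 3, 1
N_DISC, N_DEFECT = 30, 20
ROUNDS = 20_000

# True = discriminator (donates iff the recipient is in good standing);
# False = unconditional defector.
agents = [True] * N_DISC + [False] * N_DEFECT
good = [True] * len(agents)   # everyone starts in good standing
payoff = [0.0] * len(agents)

for _ in range(ROUNDS):
    donor, recipient = random.sample(range(len(agents)), 2)
    donates = agents[donor] and good[recipient]
    if donates:
        payoff[donor] -= C
        payoff[recipient] += B
    elif good[recipient]:
        # "Standing" rule: refusing a good-standing partner ruins your
        # reputation; refusing a bad-standing partner is justified.
        good[donor] = False

disc = [payoff[i] for i, d in enumerate(agents) if d]
defe = [payoff[i] for i, d in enumerate(agents) if not d]
print("discriminator mean payoff:", sum(disc) / len(disc))
print("defector mean payoff:", sum(defe) / len(defe))
```

The point of the sketch is that reciprocity only pays off once reputations can be tracked and shared (steps 2–5 above); a randomly drawn AI that never passes through those selection pressures has no analogous reason to end up valuing everyone.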
“White-faced capuchin monkeys show triadic awareness in their choice of allies”:
You can get allies by being nice, but not unless you’re also dominant.
For me, it’s not primarily about human values. It’s about altruistic values. Whatever anything cares about, I care about that in proportion to how much they care about it.
Thanks for that story!
I think 100% alignment with human values would be better than random values, but superintelligent AI would presumably be trained on human data, so it would be somewhat aligned with human values. I also wonder about the extent to which the values of the superintelligent AI could change, hopefully for the better (as human values have).
I also have specific just-so stories for why human values have changed for “moral circle expansion” over time, and I’m not optimistic that process will continue indefinitely unless intervened on.
Anyway, these are important questions!
From Nate Soares’ post Cosmopolitan values don’t come free: