Historically the field has been focused on impartiality/cosmopolitanism/pluralism, and there’s a rough consensus that “human value is a decent bet at proxying impartial value, with respect to the distribution of possible values”, which falls out of the fragility/complexity of value. I.e., many of us suspect that embracing a random draw from the distribution over utility functions leads to worse performance at impartiality than human values do.
I do recommend “try to characterize a good successor criterion” (as per Paul’s framing of the problem) as an exercise; I found thinking through it very rewarding. I’ve definitely taken seriously thought processes like “stop being tribalist, your values aren’t better than paperclips”, so I feel like I’m thinking clearly about the class of mistakes the latest crop of accelerationists may be making.
Even though we’re abstracting over any particular specification language that expresses values and so on, I suspect that a view of this based on descriptive complexity is robust to moral uncertainty, at the very least to perturbations in the choice of utilitarianism flavor.
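To give “descriptive complexity” a slightly more concrete handle, here’s a minimal sketch I’m making up for this comment: compressed length as a crude, computable upper bound on Kolmogorov complexity, with both “specifications” below being hypothetical stand-ins rather than anything from the linked posts.

```python
import zlib

def description_length(spec: str) -> int:
    """Compressed size in bytes: a crude, computable upper bound
    standing in for the descriptive (Kolmogorov) complexity of a
    value specification."""
    return len(zlib.compress(spec.encode("utf-8")))

# A value system generated by one simple rule stays compressible
# no matter how long you unroll it...
paperclip_spec = "maximize the number of paperclips; " * 20

# ...while a bundle of many irreducible rules doesn't compress much
# at all (a made-up fragment, purely for illustration).
human_spec = (
    "weigh fairness against welfare; respect autonomy; value novelty, "
    "friendship, play, humor, and boredom-avoidance; never trade any "
    "of these off to exactly zero; handle the edge cases around consent, "
    "personal identity, and population ethics; ..."
)

print(description_length(paperclip_spec))  # small, despite ~700 characters
print(description_length(human_spec))      # larger, despite fewer characters
```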
Thanks, quinn!

Historically the field has been focused on impartiality/cosmopolitanism/pluralism, and there’s a rough consensus that “human value is a decent bet at proxying impartial value
Could you provide evidence for these?
there’s a rough consensus that “human value is a decent bet at proxying impartial value, with respect to the distribution of possible values”, which falls out of the fragility/complexity of value
I am not familiar with the posts you link to, but it looks like they focus on human values (emphasis mine):
Complexity of value is the thesis that human values have high Kolmogorov complexity; that our preferences, the things we care about, cannot be summed by a few simple rules, or compressed. Fragility of value is the thesis that losing even a small part of the rules that make up our values could lead to results that most of us would now consider as unacceptable (just like dialing nine out of ten phone digits correctly does not connect you to a person 90% similar to your friend).
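To check that I understand the fragility thesis, here is a minimal toy model (mine, not from the posts — the ten “values”, the budget framing, and the conjunctive utility function are all assumptions made up for illustration): an optimizer handed nine of the ten rules spends nothing on the tenth, and the outcome is not “90% fine”.

```python
# Toy world: a fixed budget of effort is split across ten stand-in
# values (fairness, novelty, autonomy, ...). The true utility is
# conjunctive, so the most-neglected value dominates the outcome.
N_VALUES = 10

def true_utility(allocation):
    return min(allocation)

def optimize(spec, budget=1.0):
    # A perfect optimizer spends the budget evenly across whichever
    # values its specification actually mentions, and nothing on the rest.
    share = budget / len(spec)
    return [share if i in spec else 0.0 for i in range(N_VALUES)]

full_spec = set(range(N_VALUES))   # all ten rules written down
broken_spec = full_spec - {3}      # one rule silently lost in transcription

print(true_utility(optimize(full_spec)))    # 0.1 -> acceptable
print(true_utility(optimize(broken_spec)))  # 0.0 -> catastrophic, not "90% fine"
```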
Most people would consider a universe filled with paperclips pretty bad, but I think it can plausibly be good as long as the superintelligent AI is having a good time turning everything into paperclips.
I do recommend “try to characterize a good successor criterion” (as per Paul’s framing of the problem) as an exercise; I found thinking through it very rewarding. I’ve definitely taken seriously thought processes like “stop being tribalist, your values aren’t better than paperclips”, so I feel like I’m thinking clearly about the class of mistakes the latest crop of accelerationists may be making.
Thanks for sharing! Paul concludes that:

Even if we knew how to build an unaligned AI that is probably a good successor, I still think we should strongly prefer to build aligned AGI. The basic reason is option value: if we build an aligned AGI, we keep all of our options open, and can spend more time thinking before making any irreversible decision.
Why would an unaligned AI have lower option value than an aligned one? I agree with Paul that:
Overall, I think the question “which AIs are good successors?” is both neglected and time-sensitive, and is my best guess for the highest impact question in moral philosophy right now [25th May 2018].
cosmopolitanism: I have a roundup of links here. I think your concerns are best discussed in the Arbital article on generalized cosmopolitanism:
“We are cosmopolitans! We also grew up reading science fiction about aliens that turned out to have their own perspectives, and AIs willing to extend a hand in friendship but being mistreated by carbon chauvinists! We’d be fine with a weird and wonderful intergalactic civilization full of non-organic beings appreciating their own daily life in ways we wouldn’t understand. But paperclip maximizers don’t do that! We predict that if you got to see the use a paperclip maximizer would make of the cosmic endowment, if you really understood what was going on inside that universe, you’d be as horrified as we are. You and I have a difference of empirical predictions about the consequences of running a paperclip maximizer, not a values difference about how far to widen the circle of concern.”
Re the fragility/complexity link:
I am not familiar with the posts you link to, but it looks like they focus on human values (emphasis mine):
My view after reading is that “human” is a shorthand that the posts don’t actually double down on throughout. Fun theory especially characterizes what’s at stake when we have values that can at least entertain the idea of pluralism (as opposed to values that don’t hesitate to create extreme lock-in scenarios), and “human value” is a first-pass proxy for a more detailed understanding of that.
Most people would consider a universe filled with paperclips pretty bad, but I think it can plausibly be good as long as the superintelligent AI is having a good time turning everything into paperclips.
This is a niche and extreme flavor of utilitarianism, and I wouldn’t expect its conclusions to be robust to moral uncertainty.
But it’s a nice question that identifies cruxes in metaethics.
Why would an unaligned AI have lower option value than an aligned one?
I think this is just a combination of taking seriously lock-in actions that we can’t undo, along with forecasts about how aggressively a random draw from the distribution of values can be expected to lock in.
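To put made-up numbers on the option-value intuition (every probability and utility below is a placeholder, not a forecast), a minimal expected-value sketch:

```python
# All numbers are made up purely for illustration; utilities are on an
# arbitrary 0-to-1 scale where 1.0 is the best achievable outcome.

p_good_draw = 0.1    # hypothetical: chance a random value-draw lands near impartial value
u_good_lockin = 1.0  # future locked in under a good draw
u_bad_lockin = 0.05  # future locked in under a bad draw

# Unaligned successor: locks in whatever it drew, immediately and irreversibly.
ev_unaligned = p_good_draw * u_good_lockin + (1 - p_good_draw) * u_bad_lockin

# Aligned AGI: defers irreversible moves, so we keep the option to think
# longer and only lock in once deliberation has (probably) converged.
p_deliberation_works = 0.8  # hypothetical: chance extended reflection finds a good target
u_deliberation_fails = 0.5  # even on failure, the worst lock-ins were avoided

ev_aligned = (p_deliberation_works * u_good_lockin
              + (1 - p_deliberation_works) * u_deliberation_fails)

print(f"lock in a random draw now: {ev_unaligned:.3f}")  # 0.145
print(f"keep options open first:   {ev_aligned:.3f}")    # 0.900
```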
Thanks!

I think this is just a combination of taking seriously lock-in actions that we can’t undo, along with forecasts about how aggressively a random draw from the distribution of values can be expected to lock in.
I have just remembered that the post AGI and lock-in is relevant to this discussion.

Busy, will come back over the next few days.