Great post!

And if they don’t then I’m probably wrong about it being important.
I’m not sure what you mean by “wrong” here. :) Maybe you place a lot of value on the values that would be reached by a collective of smart, rational people thinking about the issues for a long time, and your current values are just best guesses of what this idealized group of people would arrive at? (assuming, unrealistically, that there would be a single unique output of that idealization process across a broad range of parameter settings)
For people who hold very general values of caring what other smart, rational people would care about, values-spreading seems far less promising. In contrast, if—in light of the utter arbitrariness of values—you care more about whatever random values you happen to feel now based on your genetic and environmental background, values-spreading seems more appealing.
People are negative utilitarians because the worst possible suffering outweighs the best possible happiness in humans (and probably in all sentient animals), but this is likely untrue over the space of all possible minds. If we could modify humans to experience happiness equal to their capacity for suffering, they should choose, for example, 2 seconds of extreme happiness plus 1 second of extreme suffering rather than none of either.
And if we could modify humans to recognize just how amazing paperclips are, they should choose, for example, 2 paperclips plus 1 second of extreme suffering rather than none of either.
However, it seems considerably more likely that we will go extinct than that we will get locked in to values that are bad but not bad in a way that kills all humans.
I’m curious to know your probabilities of these outcomes. If the chance of extinction (including by uncontrolled AI) in the next century is 20%, and if human-level AI arrives in the next century, then the chance of human-controlled AI would be 80%. Within that 80%, I personally would put most of the probability mass on AIs that favor particular values or give most of the decision-making power to particular groups of people. (Indeed, this has been the trend throughout human history and up to the present. Even in democracies, wealthy people have far more power than ordinary citizens.)
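To make the arithmetic explicit, this treats extinction and human-controlled AI as the only two outcomes conditional on human-level AI arriving in the next century, so the 80% is simply the complement of the 20%:

\[
P(\text{human-controlled AI} \mid \text{human-level AI}) = 1 - P(\text{extinction} \mid \text{human-level AI}) = 1 - 0.20 = 0.80
\]

The question is then how that 0.80 is distributed, and my guess is that most of it falls on AIs that favor particular values or empower particular groups.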
Addressing each of your comments in turn:

I’m fairly confident that hedonistic utilitarianism is true (for some sense of “true”). Much of my confidence comes from the observation that people’s objections to utilitarianism play into well-known cognitive biases, and if these biases were removed, I’d expect more people to agree with me. If people still didn’t agree with me even without these biases, that would be grounds for questioning my confidence in utilitarianism.
I think there’s a difference between modifying people to be able to experience more happiness and modifying them to believe paperclips are great. The former modifies an experience and lets people’s preferences arise naturally; the latter modifies preferences directly, so we can’t trust that their preferences reflect what’s actually good for them. Of course, preferences that arise naturally don’t always reflect what’s good for people either, but they do tend in that direction.
Within that 80%, I personally would put most of the probability mass on AIs that favor particular values or give most of the decision-making power to particular groups of people.
I hadn’t considered this as a particularly likely possibility. If you’ll allow me to go up one meta level, this sort of argument is why I prefer to be more epistemically modest about far-future concerns, and why I wish more people would be more modest. This argument you’ve just made had not occurred to me during the many hours of thinking and discussion I’ve already conducted, and it seems plausible that a nontrivial portion of the probability mass of the far future falls on “a small group of people get control of everything and optimize the world for their own benefit.” The existence of this argument, and the fact that I hadn’t considered it before, makes me uncertain about my own ability to reason about the expected value of the far future.
Thanks!

One man’s bias is another’s intrinsic value, at least for “normative” biases like scope insensitivity, status-quo bias, and failure to aggregate. But at least I understand your meaning better. :) Most of LessWrong is not hedonistic utilitarian (most people there are more preference utilitarian or complexity-of-value consequentialist), so one might wonder why other people who think a lot about overcoming those normative biases aren’t hedonistic utilitarians.
Of course, one could give people the experience of having grown up in a culture that valued paperclips, of meeting the Great Paperclip in the Sky and hearing him tell them that paperclips are the meaning of life, and so on. These might “naturally” incline people to intrinsically value paperclips. But I agree there seem to be some differences between this case and the pleasure case.
I’m glad that comment was useful. :) I think it’s unfortunate that it’s so often assumed that “human-controlled AI” means something like CEV, when in fact CEV seems to me a remote possibility.
I don’t know that you should downshift your ability to reason about the far future that much. :) Over time you’ll hear more and more perspectives, which can help challenge previous assumptions.
As for why most people on LessWrong aren’t hedonistic utilitarians: simple. Just because LessWrongers know that these biases exist doesn’t mean they’re immune to them.
My confidence in my ability to reason about the far future was already pretty low; this is just an example of why I think it should be low.