I haven’t explored the debate over AI risk in the EA movement in depth, so I’m not informed enough to take a strong position. But Kosta’s comment gets at one of the things that has puzzled me—as basically an interested outsider—about the concern for x-risk in EA. A very strong fear of human extinction seems to treat humanity as innately important. But in a hedonic utilitarian framework, humanity is only contingently important to the extent that the continuation of humanity improves overall utility. If an AI or AIs could improve overall utility by destroying humanity (perhaps after determining that humans feel more suffering than pleasure overall, or that humans cause more suffering than pleasure overall, or that AIs feel more pleasure and less suffering than humans and so should use all space and resources to sustain as many AIs as possible), then hedonic utilitarians (and EAs, to the extent that they are hedonic utilitarians) should presumably want AIs to do this.
I’m sure there are arguments that an AI that destroys humanity would end up lowering utility, but I don’t get the impression that x-risk-centered EAs only oppose the destruction of humanity if it turns out humanity adds more pleasure to the aggregate. I would have expected to see EAs arguing something more like, “Let’s make sure an AI only destroys us if destroying us turns out to raise aggregate good,” but instead the x-risk EAs seem to be saying something more like, “Let’s make sure an AI doesn’t destroy us.”
But maybe the x-risk-centered EAs aren’t hedonic utilitarians, or they mostly tend to think an AI destroying humanity would lower overall utility and that’s why they oppose it, or there’s something else that I’m missing – which is probably the case, since I haven’t investigated the debate in detail.
Cautious support of giving an AI control is not opposed to x-risk reduction. An existential risk is, by definition, one that would permanently curtail the potential of Earth-originating life. Turning civilisation over to AIs or ems might be inevitable, but it would still be a safety-critical transition.
A careless transition to AI is bad for utilitarians and many others because of its irreversibility. Once you codify values (a definition of happiness and whatever else) in an AI, they’re stuck, unless you’ve programmed into the AI a way for it to reflect on its values. When combined with Bostrom’s argument in Astronomical Waste, that the eventual awesomeness of a technologically mature civilisation matters more than when it is achieved, this gives a strong reason for caution.
I forgot to mention that your post did help to clarify points and alleviate some of my confusion. Particularly the idea that an ultra-powerful AI tool (which may or may not be sentient) “would still permit one human to wield power over all others.”
The hypothetical of an AI wiping out all of humanity because it figures out (or thinks it figures out) that doing so would increase overall utility is just one extreme possibility. There must be plenty of credible-seeming scenarios opposed to this one in which an AI could be used to increase overall suffering. (Unless the assumption is that a superintelligent being or device couldn’t help but come around to a utilitarian perspective, no matter how it was initially programmed!)
Also, like Scott Alexander wrote on his post about this, x-risk reduction is not all about AI.
Still, from a utilitarian perspective, it seems like talking about “AI friendliness” should mean friendliness to overall utility, which won’t automatically mean friendliness to humanity or human rights. But again, I imagine plenty of EAs do make that distinction, and I’m just not aware of it because I haven’t looked that far into it. And anyway, that’s not a critique of AI risk being a concern for EAs; at most, it’s a critique of some instances of rhetoric.
Brian Tomasik is a self-described “negative-leaning” hedonic utilitarian who is a prominent thinker in effective altruism. He’s written about how humanity’s values might lead us to generate much suffering in the future, but he also worries a machine superintelligence might end up doing the same. There are myriad reasons he thinks this that I can’t do justice to here. I believe he currently thinks the best course of action is to try to steer the values of present-day humanity, or at least a crucially influential subset of it, towards neglecting suffering less. He also believes in doing foundational research to better ascertain the chances of a singleton spreading suffering throughout space in the future. To this end he both does research with and funds colleagues at the Foundational Research Institute.
His whole body of work concerning future suffering is referred to as “astronomical suffering” considerations, a sort of complementary utilitarian consideration to Dr Bostrom’s astronomical waste argument. You can read more of Mr. Tomasik’s work on the far future and related topics here. Note that some of his essays are advanced and may require background reading to understand all their premises, but he usually provides citations.
Worth noting that the negative-leaning position is pretty fringe though, especially in mainstream philosophy. Personally, I avoid it.
If you’re a hedonic utilitarian, you might retain some uncertainty over this, and think it’s best to at least hold off on destroying humanity for a while out of deference to other moral theories, and because of the option value.
Even if someone took the view you describe, though, it’s not clear that it would be a helpful one to communicate, because talking about “AI destroying humanity” does a good job of conveying concern about the scenarios you’re worried about (where AI destroys humanity without this being a good outcome) to other people. As the exceptions are things people generally won’t even think of, caveating might well cause more confusion than clarity.
An ‘option value’ argument assumes that (a) the AI wouldn’t take that uncertainty into account and (b) the AI wouldn’t be able to recreate humanity at some later point if it decided that this was in fact the correct maximisation course. Even if it set us back by fully 10,000 years (very roughly the time from the dawn of civilisation up to now), that wouldn’t obviously be so bad in the long run. Indeed, for all we know this could have already happened...
In other words, in the context of an ultra-powerful ultra-well-resourced ultra-smart AI, there are few things in this world that are truly irreversible, and I see little need to give special ‘option value’ to humanity’s or even civilisation’s existence.
Agree with the rest of your post re. rhetoric; that’s generally what I’ve assumed is going on when this has puzzled me too.
Agree with this. I was being a bit vague about what the option value was, but I was thinking of something like the value of not locking in a value set that on reflection we would disagree with. I think this covers some but not all of the scenarios Rhys was discussing.