I’m not aware of such summaries, but I’ll take a stab at it here:
Even though the expected disvalue of a very improbable outcome can be high if the outcome is sufficiently awful, the relatively large investment in AI safety work by the EA community today would only make sense if the probability of AI-catalyzed GCR were decently high. This Open Phil post, for example, doesn’t frame AI as a “yes, it’s extremely unlikely, but the downsides could be massive, so in expectation it’s worth working on” cause; many EAs give estimates of a non-negligible probability of very bad AI outcomes. Accordingly, AI is considered not only a viable cause to work on but indeed one of the top priorities.
But arguably the scenarios in which AGI becomes a catastrophic threat rely on a conjunction of several improbable assumptions. One of these is that general “intelligence” in the sense of a capacity to achieve goals on a global scale—rather than a capacity merely to solve problems easily representable within e.g. a Markov decision process—is something that computers can develop without a long process of real-world trial and error, or cooperation in the human economy. (If such a process is necessary, then humans should be able to stop potentially dangerous AIs in their tracks before they become too powerful.) The key takeaway from the essay, as I understood it, was that we should be cautious about using one sense of intelligence, i.e. the sort that deep RL algorithms have demonstrated in game settings, as grounds for predicting dangerous outcomes from a much more difficult-to-automate sense of intelligence, namely the ability to achieve goals in physical reality.
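To make the narrower sense concrete, here is a minimal sketch of my own (not from the essay) of the kind of problem that is “easily representable within a Markov decision process”: a toy four-state environment whose optimal behaviour falls out of a few lines of value iteration. Everything in it is made up purely for illustration.

```python
# A minimal, purely illustrative sketch (not from the essay): a tiny Markov
# decision process solved exactly by value iteration. This is the narrow sense
# of "intelligence" at work: optimizing within a fully specified state/action
# model, as opposed to achieving goals in open-ended physical reality.
import numpy as np

n_states, n_actions, gamma = 4, 2, 0.9

# Transition probabilities P[s, a, s'] and rewards R[s, a] for a made-up
# 4-state chain; all numbers are arbitrary and chosen only for illustration.
P = np.zeros((n_states, n_actions, n_states))
for s in range(n_states):
    P[s, 0, max(s - 1, 0)] = 1.0              # action 0: step "left"
    P[s, 1, min(s + 1, n_states - 1)] = 1.0   # action 1: step "right"
R = np.zeros((n_states, n_actions))
R[n_states - 2, 1] = 1.0                      # reward for stepping into the last state

# Value iteration: repeatedly apply the Bellman optimality update until convergence.
V = np.zeros(n_states)
for _ in range(1000):
    Q = R + gamma * (P @ V)   # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

print("Optimal state values:", V)
print("Greedy policy (0 = left, 1 = right):", Q.argmax(axis=1))
```

The point of the sketch is just that once the states, actions, transitions, and rewards are fully handed to the algorithm, “achieving the goal” reduces to computation; achieving goals in physical reality comes with no such specification.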
The actual essay is more subtle than this summary, of course, and I’d definitely encourage people to at least skim it before dismissing the weaker form of the argument I’ve sketched here. But I agree that the AI safety research community has a responsibility to make the connection between current deep learning “intelligence” and intelligence-as-power more explicit; otherwise the argument rests on a significant equivocation.
Magnus, is this a fair representation?
Thanks for the stab, Anthony. It’s fairly fair. :-)
Some clarifying points:
First, I should note that my piece was written from the perspective of suffering-focused ethics.
Second, I would not say that “investment in AI safety work by the EA community today would only make sense if the probability of AI-catalyzed GCR were decently high”. Even setting aside the question of what “decently high” means, I would note that:
1) Whether such investments in AI safety make sense depends in part on one’s values. (Though another critique I would make is that “AI safety” is less well-defined than people often seem to think: https://magnusvinding.com/2018/12/14/is-ai-alignment-possible/, but more on this below.)
2) Even if “the probability of AI-catalyzed GCR” were decently high — say, >2 percent — this would not imply that one should focus on “AI safety” in a standard narrow sense (roughly: constructing the right software), nor that other risks are not greater in expectation (compared to the risks we commonly have in mind when we think of “AI-catalyzed catastrophic risks”).
You write of “scenarios in which AGI becomes a catastrophic threat”. But a question I would raise is: what does this mean? Do we all have a clear picture of this in our minds? This sounds to me like a rather broad class of scenarios, and a worry I have is that we all have “poorly written software” scenarios in mind, although such scenarios could well comprise a relatively narrow subset of the entire class that is “catastrophic scenarios involving AI”.
Zooming out, my critique can be crudely summarized as pointing to two significant equivocations that I see doing an exceptional amount of work in many standard arguments for “prioritizing AI”.
First, there is what we may call the AI safety equivocation (or motte and bailey): people commonly fail to distinguish between 1) a focus on future outcomes controlled by AI and 2) a focus on writing “safe” software. Accepting that we should adopt the former focus by no means implies we should adopt the latter. By (imperfect) analogy, to say that we should focus on future outcomes controlled by humans does not imply that we should focus primarily on writing safe human genomes.
The second is what we may call the intelligence equivocation, which is the one you described. We operate with two very different senses of the term “intelligence”, namely 1) the ability to achieve goals in general (derived from Legg & Hutter, 2007), and 2) “intelligence” in the much narrower sense of “advanced cognitive abilities”, roughly equivalent to IQ in humans.
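For concreteness, the definition behind sense 1, Legg & Hutter’s “universal intelligence”, roughly scores an agent (a policy $\pi$) by its expected performance across all computable environments, weighted by their simplicity:

$$\Upsilon(\pi) = \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu},$$

where $E$ is the class of computable environments, $K(\mu)$ is the Kolmogorov complexity of environment $\mu$, and $V^{\pi}_{\mu}$ is the expected total reward the agent $\pi$ achieves in $\mu$.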
These two are often treated as virtually identical, and we fail to appreciate the rather enormous difference between them, as argued in/evident from books such as The Knowledge Illusion: Why We Never Think Alone, The Ascent of Man, The Evolution of Everything, and The Secret of Our Success. This was also the main point in my Reflections on Intelligence.
Intelligence2 lies entirely in the brain, whereas intelligence1 includes the brain and much more, including all the rest of our well-adapted body parts (vocal cords, hands, upright walk — completely remove just one of these in all humans and human civilization is likely gone for good). Not to mention our culture and technology as a whole, which is the level at which our ability to achieve goals on a significant scale really emerges: it derives not from any single advanced machine but from our entire economy, a vastly greater toolbox than what intelligence2 covers.
Thus, it is a mistake to assume that boosting intelligence2 to vastly super-human levels necessarily gives us intelligence1 at a vastly super-human level, not least since “human-level intelligence1” already includes vastly super-human intelligence2 in many cognitive domains.