We should expect to worry more about speculative risks
(This is a repost of a short-form, which I realized might be worth making into its own post. It’s partly inspired by Greg Lewis’s recent post “Rational Predictions Often Update Predictably.”)[1]
The existential risk community’s level of concern about different possible risks is correlated with how hard these risks are to analyze. For example, here is The Precipice’s ranking of the top five most concerning existential risks:
Unaligned artificial intelligence[2]
Unforeseen anthropogenic risks (tied)
Engineered pandemics (tied)
Other anthropogenic risks
Nuclear war (tied)
Climate change (tied)
This isn’t surprising.
For a number of risks, when you first hear about them and think them through a bit, it’s reasonable to have the reaction “Oh, hm, maybe that could be a huge threat to human survival” and to initially assign something on the order of a 10% credence to the hypothesis that the risk will, by default, lead to existentially bad outcomes. In each case, if we can gain much greater clarity about the risk, then we should think there’s about a 90% chance this clarity will make us less worried about it (since that’s roughly the credence we’ve just assigned to the risk not panning out). We’re likely to remain decently worried about hard-to-analyze risks (because we can’t get greater clarity about them) while becoming less worried about easy-to-analyze risks.
In particular, our level of worry about different plausible existential risks is likely to roughly track how hard these risks are to analyze (e.g. through empirical evidence, predictively accurate formal models, and clearcut arguments).
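To make the arithmetic behind this concrete, here is a minimal sketch. The specific post-clarity credences (roughly 1% in the reassuring case, roughly 91% in the alarming case) are illustrative assumptions chosen so the numbers balance, not estimates from The Precipice or anywhere else:

```python
# Toy numbers (illustrative assumptions only): a risk we currently give a 10% credence,
# where gaining much greater clarity would leave us at roughly a 1% credence in the
# likely case and roughly a 91% credence in the unlikely case.
prior = 0.10
p_reassured, credence_if_reassured = 0.90, 0.01
p_alarmed, credence_if_alarmed = 0.10, 0.91

# Conservation of expected evidence: the probability-weighted average of the possible
# post-clarity credences equals the credence we started with.
expected_posterior = p_reassured * credence_if_reassured + p_alarmed * credence_if_alarmed
print(f"{expected_posterior:.2f}")  # 0.10

# So for an easy-to-analyze risk we expect (with ~90% probability) to end up less worried
# once the evidence comes in, while a hard-to-analyze risk stays parked near the original
# 10% credence -- which is why residual worry concentrates on the risks we can't analyze.
```

Any other pair of post-clarity credences tells the same story, so long as their probability-weighted average comes out at the 10% starting point.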
Some plausible existential risks are also far easier to analyze than others. If you compare 80,000 Hours’ articles on climate change and artificial intelligence, for example, I think it is pretty clear that people analyzing existential risks from climate change simply have a lot more to go on. When we study climate change, we can rely on climate models that we have reason to believe have a decent amount of validity. We can also draw on empirical evidence about the historical effects of previous large changes in global temperature and about the ability of humans and other species to survive under different local climate conditions. As a conceptual foundation, we are also lucky to have a set of precise and scientifically validated concepts (e.g. “temperature” and “sea level”) that we can use to avoid ambiguity in our analysis. And so on.
We’re in a much worse epistemic position when it comes to analyzing the existential risk posed by misaligned AI: we’re reliant on abstract arguments that use ambiguous concepts (e.g. “objectives” and “intelligence”), rough analogies, observations of the behaviour of present-day AI systems (e.g. reinforcement learners that play videogames) which will probably be very different from future AI systems, a single datapoint (the evolution of human intelligence and values) that has a lot of important differences from the case we’re considering, and attempts to predict the incentives and beliefs of future actors in development scenarios that are still very opaque to us. Even if misaligned AI actually poses very little risk to continued human survival, it’s hard to see how we could become really confident of that.
Some upshots:
The fact that the existential risk community is particularly worried about misaligned AI might largely reflect how hard risks from misaligned AI are to analyze.
Nonetheless, even if the above possibility is true, it doesn’t at all follow that the community is irrational to worry more about misaligned AI than other potential risks. It’s completely coherent to have something like this attitude: “If I could think more clearly about the risk from misaligned AI, then I would probably come to realize it’s not a far bigger deal than other risks. But, in practice, I can’t yet think very clearly about it. That means that, unlike in the case of climate change, I also can’t rule out the small possibility that clarity would make me much more worried about it than I currently am. So, on balance, I should feel more worried about misaligned AI than I do about other risks. I should focus my efforts on it, even if — to better-informed future observers — I’ll probably look over-worried after the fact.”
For hard-to-analyze risks, it matters a lot what your “prior” on the risk is (since detailed evidence, models, and arguments can only really move you so far from your baseline impression; the sketch below makes this concrete). I sometimes get the sense that some people are starting from a prior that’s not far from 50%. For example, people who are very worried about misaligned AI sometimes use the rhetorical move “How would the world look different if AI wasn’t going to kill everyone?”, and this move seems to assume that empirical evidence is needed to shift us down from a high credence. I think that other people (including myself) are often implicitly starting from a low prior and feel the need to be argued up. Insofar as it’s very unclear how we should determine our priors, and it’s even a bit unclear what exactly a “prior” means in this case, it’s also unsurprising that there’s a particularly huge range of variation in estimates of the risk from misaligned AI.[3]
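As a rough illustration of how much the starting point matters when the evidence is weak, here is a minimal Bayesian sketch in odds form. The 3:1 likelihood ratio below is an arbitrary stand-in for “suggestive but far from clearcut” evidence, not anyone’s actual estimate:

```python
def update(prior: float, likelihood_ratio: float) -> float:
    """Bayesian update in odds form: posterior odds = prior odds * likelihood ratio."""
    posterior_odds = (prior / (1 - prior)) * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# The same weak evidence (an assumed 3:1 likelihood ratio), applied to three different priors:
for prior in (0.50, 0.10, 0.01):
    print(f"prior {prior:.0%} -> posterior {update(prior, 3):.0%}")

# prior 50% -> posterior 75%
# prior 10% -> posterior 25%
# prior 1% -> posterior 3%
```

With evidence this weak, the posterior mostly reflects where you started, which is at least consistent with the unusually wide spread in published estimates of the risk.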
[1] Clarification: The title of this post is using the word “expect” in the everyday sense of the word, rather than the formal probability theory sense of the word. A less ambiguous title might have been “We will predictably worry more about speculative risks.”
[2] Toby Ord actually notes, in the section of The Precipice that gives risk estimates: “The case for existential risk from AI is clearly speculative. Indeed, it is the most speculative case for a major risk in this book.”
[3] Of course, not everyone agrees that it’s so difficult to assess the risk from misaligned AI. Some people believe that the available arguments, evidence from evolution, and so on actually do count very strongly — or even nearly decisively — toward AI progress leading to human extinction by default. The argument I’ve made in this post doesn’t apply very well to this group. Rather, the argument applies to people who think of existing analysis of AI risk as suggestive, perhaps strongly suggestive, but still far from clearcut.