I put little weight on this analysis because it seems like a central example of the multiple stage fallacy. But it does seem worth trying to identify clear examples of the authors not accounting properly for conditionals. So here are three concrete criticisms (though note that these are based on skimming rather than close-reading the PDF):
A lot of the authors’ analysis of the probability of war derailment focuses on Taiwan, which is currently a crucial pivot point. But conditional on chip production scaling up massively, Taiwan would likely be far less important.
If there is extensive regulation of AI, it will likely slow down both algorithmic and hardware progress. So conditional on the types of progress listed under events 1-5, the probability of extensive regulation is much lower than it would be otherwise.
The third criticism is more involved; I’ll summarize it as “the authors are sometimes treating the different events as sequential in time, and sometimes sequential in logical flow”. For example, the authors assign around 1% to events 1-5 happening before 2043. If they’re correct, then conditioning on events 1-5 happening before 2043, they’ll very likely only happen just before 2043. But this leaves very little time for any “derailing” to occur after that, and so the conditional probability of derailing should be far smaller than what they’ve given (62%).
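To illustrate the timing point concretely, here is a minimal Monte Carlo sketch. Everything in it is an assumption for illustration only: the lognormal shape, its parameters (picked solely so that roughly 1% of the probability mass lands before 2043), and the 2023 vantage point implying roughly 20 years of runway.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical assumption: years-from-now until events 1-5 are complete follow a
# lognormal distribution, with parameters chosen only so that ~1% of draws land
# within the ~20 years remaining before 2043.
years_until_done = rng.lognormal(mean=np.log(60), sigma=0.5, size=1_000_000)

on_track = years_until_done <= 20
slack = 20 - years_until_done[on_track]  # years between completion and 2043

print(f"P(events 1-5 before 2043): {on_track.mean():.1%}")                       # ~1.4%
print(f"Median slack before 2043, given completion: {np.median(slack):.1f} yrs")  # ~2-3 years
```

Under those made-up numbers, the worlds that do clear the bar by 2043 typically clear it only a couple of years beforehand, which is the sense in which very little time is left for any subsequent derailing.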
The authors might instead say that they’re not conditioning on events 1-5 literally happening when estimating conditional probability of derailing, but rather conditioning on something more like “events 1-5 would have happened without the 5 types of disruption listed”. That way, their 10% estimate for a derailing pandemic could include a pandemic in 2025 in a world which was otherwise on track for reaching AGI. But I don’t think this is consistent, because the authors often appeal to the assumption that AGI already exists when talking about the probability of derailing (e.g. the probability of pandemics being created). So it instead seems to me like they’re explicitly treating the events as sequential in time, but implicitly treating the events as sequential in logical flow, in a way which significantly decreases the likelihood they assign to TAI by 2043.
I suspect that I have major disagreements with the way the authors frame events 1-5 as well, but don’t want to try to dig into those now.
Great comment! Thanks especially for trying to pinpoint the actual stages going wrong, rather than hand-waving at the multiple stage fallacy, which we are all of course well aware of.
Replying to the points:
For example, the authors assign around 1% to events 1-5 happening before 2043. If they’re correct, then conditioning on events 1-5 happening before 2043, they’ll very likely only happen just before 2043. But this leaves very little time for any “derailing” to occur after that, and so the conditional probability of derailing should be far smaller than what they’ve given (62%).
From my POV, if events 1-5 have happened, then we have TAGI. It’s already done. The derailments are not things that could happen after TAGI to return us to a pre-TAGI state. They are events that happen before TAGI and modify the estimates above.
The authors might instead say that they’re not conditioning on events 1-5 literally happening when estimating conditional probability of derailing, but rather conditioning on something more like “events 1-5 would have happened without the 5 types of disruption listed”. That way, their 10% estimate for a derailing pandemic could include a pandemic in 2025 in a world which was otherwise on track for reaching AGI. But I don’t think this is consistent, because the authors often appeal to the assumption that AGI already exists when talking about the probability of derailing (e.g. the probability of pandemics being created). So it instead seems to me like they’re explicitly treating the events as sequential in time, but implicitly treating the events as sequential in logical flow, in a way which significantly decreases the likelihood they assign to TAI by 2043.
Yes, we think AGI will precede TAGI by quite some time, and therefore it’s reasonable to talk about derailments of TAGI conditional on AGI.
If events 1-5 constitute TAGI, and events 6-10 are conditional on AGI, and TAGI is very different from AGI, then you can’t straightforwardly get an overall estimate by multiplying them together. E.g. as I discuss above, 0.3 seems like a reasonable estimate of P(derailment from wars) if the chip supply remains concentrated in Taiwan, but doesn’t seem reasonable if the supply of chips is on track to be “massively scaled up”.
I think that’s a great criticism. Perhaps our conditional odds of Taiwan derailment are too high because we’re too anchored to today’s distribution of production.
One clarification/correction to what I said above: I see the derailment events 6-10 as being conditional on us being on the path to TAGI had the derailments not occurred. So steps 1-5 might not have happened yet, but we are in a world where they will happen if the derailment does not occur. (So not really conditional on TAGI already occurring, and not necessarily conditional on AGI, but probably AGI is occurring in most of those on-the-path-to-TAGI scenarios.)
Edit: More precisely, the cascade is:
- Probability of us developing TAGI, assuming no derailments
- Probability of us being derailed, conditional on otherwise being on track to develop TAGI without derailment
Got it. As mentioned I disagree with your 0.7 war derailment. Upon further thought I don’t necessarily disagree with your 0.7 “regulation derailment”, but I think that in most cases where I’m talking to people about AI risk, I’d want to factor this out (because I typically want to make claims like “here’s what happens if we don’t do something about it”).
Anyway, the “derailment” part isn’t really the key disagreement here. The key disagreement is methodological. Here’s one concrete alternative methodology which I think is better: a more symmetric model which involves three estimates:
Probability of us developing TAGI, assuming that nothing extreme happens
Probability of us being derailed, conditional on otherwise being on track to develop TAGI
Probability of us being rerailed, conditional on otherwise not being on track to develop TAGI
By “rerailed” here I mean roughly “something as extreme as a derailment happens, but in a way which pushes us over the threshold to be on track towards TAGI by 2043”. Some possibilities include:
An international race towards AGI, akin to the space race or race towards nukes
A superintelligent but expensive AGI turns out to be good enough at science to provide us with key breakthroughs
Massive economic growth superheats investment into TAGI
Suppose we put 5% credence on each of these “rerailing” us. Then our new calculation (using your numbers) would be:
The chance of being on track assuming that nothing extreme happens: 0.6*0.4*0.16*0.6*0.46 = 1%
P(no derailment conditional on being on track) = 0.7*0.9*0.7*0.9*0.95 = 38%
P(rerailment conditional on not being on track) = 1 − 0.95*0.95*0.95 = 14%
P(TAGI by 2043) = 0.99*0.14 + 0.01*0.38 = 14.2%
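Putting that arithmetic into a short script may make the structure clearer. This is only a sketch: the function name is mine, and treating the three rerailment routes as independent is my assumption rather than anything from the report.

```python
def p_tagi_2043(p_rerail_each: float = 0.05) -> float:
    """Three-estimate model sketched above: on-track, derailment, rerailment."""
    # Chance of being on track, assuming nothing extreme happens (the events 1-5 factors).
    p_on_track = 0.6 * 0.4 * 0.16 * 0.6 * 0.46   # ~1%
    # Chance of no derailment, conditional on being on track (the events 6-10 factors).
    p_no_derail = 0.7 * 0.9 * 0.7 * 0.9 * 0.95   # ~38%
    # Chance of rerailment, conditional on not being on track,
    # treating the three rerailment routes as independent (my assumption).
    p_rerail = 1 - (1 - p_rerail_each) ** 3      # ~14% when each is 5%
    return p_on_track * p_no_derail + (1 - p_on_track) * p_rerail

print(f"{p_tagi_2043():.1%}")  # ~14.5% with unrounded factors (~14.2% using the rounded figures above)
```

For reference, the two-factor version with no rerailment channel, p_on_track * p_no_derail, comes out around 0.4%, which is the original estimate being compared against below.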
That’s over 30x higher than your original estimate, and totally changes your conclusions! So presumably you must think either that there’s something wrong with the structure I’ve used here, or else that 5% is way too high for each of those three rerailments. But I’ve tried to make the rerailments as analogous to the derailments as possible. For example, if you think a depression could derail us, then it seems pretty plausible that the opposite of a depression could rerail us using approximately the same mechanisms.
You might say “look, the chance of being on track to hit all of events 1-5 by 2043 is really low. This means that in worlds where we’re on track, we’re probably barely on track; whereas in worlds where we’re not on track, we’re often missing it by decades. This makes derailment much easier than rerailment.” Which… yeah, conditional on your numbers for events 1-5, this seems true. But the low likelihood of being on track also means that even very low rerailment probabilities could change your final estimate dramatically—e.g. even 1% for each of the rerailments above would increase your headline estimate by almost an order of magnitude. And I do think that many people would interpret a headline claim of “<1%” as pretty different from “around 3%”.
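For instance, re-running the hypothetical function sketched above with 1% per rerailment route:

```python
print(f"{p_tagi_2043(p_rerail_each=0.01):.1%}")  # ~3.3%, vs ~0.4% with no rerailment channel
```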
Having said that, speaking for myself, I don’t care very much about <1% vs 3%; I care about 3% vs 30% vs 60%. The difference between those is going to primarily depend on events 1-5, not on derailments or rerailments. I have been trying to avoid getting into the weeds on that, since everyone else has been doing so already.

So I’ll just say the following: to me, events 1-5 all look pretty closely related. “Way better algorithms” and “far more rapid learning” and “cheaper inference” and “better robotic control” all seem in some sense to be different facets of a single underlying trend; and chip + power production will both contribute to that trend and also be boosted by that trend. And so, because of this, it seems likely to me that there are alternative factorizations which are less disjoint and therefore get very different results. I think this was what Paul was getting at, but that discussion didn’t seem super productive, so if I wanted to engage more with it a better approach might be to just come up with my own alternative factorization and then argue about whether it’s better or worse than yours. But this comment is already too long so will leave it be for now.
Great comment. We didn’t explicitly allocate probability to those scenarios, and if you do, you end up with much higher numbers. Very reasonable to do so.