More precisely, the cascade is:
- Probability of us developing TAGI, assuming no derailments
- Probability of us being derailed, conditional on otherwise being on track to develop TAGI without derailment
Got it. As mentioned I disagree with your 0.7 war derailment. Upon further thought I don’t necessarily disagree with your 0.7 “regulation derailment”, but I think that in most cases where I’m talking to people about AI risk, I’d want to factor this out (because I typically want to make claims like “here’s what happens if we don’t do something about it”).
Anyway, the “derailment” part isn’t really the key disagreement here. The key disagreement is methodological. Here’s one concrete alternative methodology which I think is better: a more symmetric model which involves three estimates:
Probability of us developing TAGI, assuming that nothing extreme happens
Probability of us being derailed, conditional on otherwise being on track to develop TAGI
Probability of us being rerailed, conditional on otherwise not being on track to develop TAGI
By “rerailed” here I mean roughly “something as extreme as a derailment happens, but in a way which pushes us over the threshold to be on track towards TAGI by 2043”. Some possibilities include:
An international race towards AGI, akin to the space race or race towards nukes
A superintelligent but expensive AGI turns out to be good enough at science to provide us with key breakthroughs
Massive economic growth superheats investment into TAGI
Suppose we put 5% credence on each of these “rerailing” us. Then our new calculation (using your numbers) would be:
The chance of being on track assuming that nothing extreme happens: 0.6*0.4*0.16*0.6*0.46 = 1%
P(no derailment conditional on being on track) = 0.7*0.9*0.7*0.9*0.95 = 38%
P(rerailment conditional on not being on track) = 1 − 0.95*0.95*0.95 = 14%
P(TAGI by 2043) = 0.99*0.14 + 0.01*0.38 = 14.2%
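For anyone who wants to rerun the arithmetic, here is a minimal sketch of the symmetric model in Python. The factor values are copied from the lines above. The function and variable names are my own, and treating the three rerailments as independent is an assumption (it is what the 1 − 0.95*0.95*0.95 step implies).

```python
from math import prod

# Factor values copied from the calculation above; names are illustrative only.
ON_TRACK_FACTORS = [0.6, 0.4, 0.16, 0.6, 0.46]   # events 1-5, nothing extreme happens
NO_DERAIL_FACTORS = [0.7, 0.9, 0.7, 0.9, 0.95]   # chance of avoiding each derailment
RERAIL_PROBS = [0.05, 0.05, 0.05]                # assumed 5% credence per rerailment


def p_tagi(on_track_factors, no_derail_factors, rerail_probs):
    """Symmetric model: on track and not derailed, or off track and rerailed."""
    p_on_track = prod(on_track_factors)
    p_no_derail = prod(no_derail_factors)
    # Assumption: rerailments are independent, so at least one occurs
    # with probability 1 - product of (1 - p_i).
    p_rerail = 1 - prod(1 - p for p in rerail_probs)
    return (1 - p_on_track) * p_rerail + p_on_track * p_no_derail


estimate = p_tagi(ON_TRACK_FACTORS, NO_DERAIL_FACTORS, RERAIL_PROBS)
print(f"P(TAGI by 2043) = {estimate:.1%}")
```

Without rounding the intermediate values this comes out at roughly 14.5% rather than 14.2%; the small gap is only because the hand calculation rounds each step to two figures.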
That’s over 30x higher than your original estimate, and totally changes your conclusions! So presumably you must think either that there’s something wrong with the structure I’ve used here, or else that 5% is way too high for each of those three rerailments. But I’ve tried to make the rerailments as analogous to the derailments as possible. For example, if you think a depression could derail us, then it seems pretty plausible that the opposite of a depression could rerail us using approximately the same mechanisms.
You might say “look, the chance of being on track to hit all of events 1-5 by 2043 is really low. This means that in worlds where we’re on track, we’re probably barely on track; whereas in worlds where we’re not on track, we’re often missing it by decades. This makes derailment much easier than rerailment.” Which… yeah, conditional on your numbers for events 1-5, this seems true. But the low likelihood of being on track also means that even very low rerailment probabilities could change your final estimate dramatically—e.g. even 1% for each of the rerailments above would increase your headline estimate by almost an order of magnitude. And I do think that many people would interpret a headline claim of “<1%” as pretty different from “around 3%”.
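The sensitivity claim can be checked with a short sketch under the same assumptions as before: three independent rerailments that each get the same credence, and the factor values quoted earlier. The helper name `headline` and the grid of credences are hypothetical choices of mine.

```python
from math import prod

# Values copied from the calculation earlier in this comment.
P_ON_TRACK = prod([0.6, 0.4, 0.16, 0.6, 0.46])
P_NO_DERAIL = prod([0.7, 0.9, 0.7, 0.9, 0.95])


def headline(per_rerail, n_rerailments=3):
    """Headline estimate when each rerailment gets the same credence.

    Assumption: rerailments are independent of one another.
    """
    p_rerail = 1 - (1 - per_rerail) ** n_rerailments
    return (1 - P_ON_TRACK) * p_rerail + P_ON_TRACK * P_NO_DERAIL


baseline = headline(0.0)  # no rerailment term, i.e. the original cascade
for q in (0.0, 0.01, 0.05):
    est = headline(q)
    print(f"{q:.0%} per rerailment -> {est:.1%} ({est / baseline:.1f}x baseline)")
```

With 1% per rerailment the estimate lands at roughly 3.3%, a bit over eight times the baseline of about 0.4%, which is where "almost an order of magnitude" comes from.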
Having said that, speaking for myself, I don’t care very much about <1% vs 3%; I care about 3% vs 30% vs 60%. The difference between those is going to primarily depend on events 1-5, not on derailments or rerailments. I have been trying to avoid getting into the weeds on that, since everyone else has been doing so already. So I’ll just say the following: to me, events 1-5 all look pretty closely related. “Way better algorithms” and “far more rapid learning” and “cheaper inference” and “better robotic control” all seem in some sense to be different facets of a single underlying trend; and chip + power production will both contribute to that trend and also be boosted by that trend. And so, because of this, it seems likely to me that there are alternative factorizations which are less disjoint and therefore get very different results. I think this was what Paul was getting at, but that discussion didn’t seem super productive, so if I wanted to engage more with it a better approach might be to just come up with my own alternative factorization and then argue about whether it’s better or worse than yours. But this comment is already too long so will leave it be for now.
Great comment. We didn’t explicitly allocate probability to those scenarios, and if you do, you end up with much higher numbers. Very reasonable to do so.