The best way to stop people racing (for whatever reason) is to convince them that it’s suicidal. It’s a race where if anyone gets to the finish line, we all die. No one knows how to reliably align an AGI enough to prevent a catastrophic outcome from its existence. It may even be impossible.
I’ve seen this a few times but I’m skeptical about taking this rhetorical approach.
I think a large fraction of AI risk comes from worlds where the ex ante probability of catastrophe is more like 50% than 100%. And in many of those worlds, the counterfactual impact of individual developers moving faster is several times smaller (since someone else is likely to kill us all in the bad 50% of worlds). On top of that, reasonable people might disagree about probabilities and think 10% in a case where I think 50%.
So putting that together they may conclude that racing faster increases the risk of doom by 0.03% for every 1% that it increases your share of the future (whether measured in profit, or reduced opportunity for misuse of frontier systems). And that’s just not going to be compelling.
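One way to reproduce a figure of roughly 0.03% is sketched below. The comment does not spell out the arithmetic, so this is only a plausible reading: the discount factor of 3 (standing in for "several times smaller"), the function name, and the assumption that added risk scales linearly with the share gained are all illustrative choices, not the author's stated method.

```python
def marginal_doom(p_doom, counterfactual_discount, share_gain=0.01):
    """Added probability of doom attributed to one developer who gains
    `share_gain` of the future by racing faster.

    Assumptions (not from the original comment): risk scales linearly
    with the share gained, and the developer's counterfactual impact is
    the headline risk divided by `counterfactual_discount`.
    """
    return share_gain * p_doom / counterfactual_discount


# A skeptic who puts the ex ante risk at 10% rather than 50%,
# with a hypothetical 3x counterfactual discount.
skeptic = marginal_doom(p_doom=0.10, counterfactual_discount=3)

# The same calculation at the 50% estimate, for comparison.
pessimist = marginal_doom(p_doom=0.50, counterfactual_discount=3)

print(f"skeptic:   {skeptic:.4%} added risk per 1% of the future")
print(f"pessimist: {pessimist:.4%} added risk per 1% of the future")
```

Under these assumptions the skeptic's number lands near 0.03%, which is the point of the paragraph above: at that exchange rate, an appeal to self-interest is unlikely to move anyone.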
I think you will have an extremely hard time convincing people that the race is obviously suicidal. I know some folks are confident about this, but I don’t really find that position credible today and I’ve spent a very long time thinking about the problem and engaging with pessimistic people. Maybe it will become obvious tomorrow, and maybe it’s OK for some people to be betting their chips on that, but I don’t want to get lumped in with them (because I think their political position is going to become increasingly untenable over time).
On the flip side, I don’t think it’s controversial to say: “If the probability of AI takeover is 10%, AI developers need to stop racing.”
It’s a tiny bit unclear what that means, so to be a bit more precise: “If people didn’t stop AI development until things look significantly more dangerous than they do today, then the probability of takeover would be more than 10%.” I don’t think that’s true today, but it will likely become true.
Have you had an in-depth discussion with Roman Yampolskiy? (If not, I think you should!)
I think the Overton Window is really shifting on the issue of AGI x-risk now, with it going mainstream. The burden of proof should be on the developers of AGI to prove that it is 100% safe (as opposed to the previous era where it was on the x-risk worriers to prove it was dangerous). Do you have a good post-GPT-4+plugins/AutoGPT (new 2023 era) answer to this question (a mechanistic explanation for why we get an ok outcome, given AGI)?
I’m pushing back against the framing: “this is a suicide race with no benefit from winning.”
If there is a 10% chance of AI takeover, then there is a real and potentially huge benefit from winning the race. But we still should not be OK with someone unilaterally taking that risk.
I agree that AI developers should have to prove that the systems they build are reasonably safe. I don’t think 100% is a reasonable ask, but 90% or 99% seem pretty safe (i.e. robustly reasonable asks).
(Edited to complete cutoff sentence and clarify “safe.”)
I agree that AI developers should have to prove that the systems they build are reasonably safe. I don’t think 100% is a reasonable ask, but 90% or 99% seem pretty safe.
Sorry, just to clarify, what do we mean by “safe” here? Clearly a 90% chance of the system not disempowering all of humanity is not sufficient (and neither would a 99% chance, though that’s maybe a bit more debatable), so presumably you mean something else here.
I mean that 90% or 99% seem like clearly reasonable asks, and 100% is a clearly unreasonable ask.
I’m just saying that the argument “this is a suicide race” is really not the way we should go. We should say the risk is >10% and that’s obviously unacceptable, because that’s an argument we can actually win.
I’m just saying that the argument “this is a suicide race” is really not the way we should go. We should say the risk is >10% and that’s obviously unacceptable, because that’s an argument we can actually win.
Hmm, just to be clear, I think saying that “this deployment has a 1% chance of causing an existential risk, so you can’t deploy it” seems like a pretty reasonable ask to me.
I agree that I would like to focus on the >10% case first, but I also don’t want to set wrong expectations that I think it’s reasonable at 1% or below.
I agree. When I give numbers I usually say “We should keep the risk of AI takeover beneath 1%” (though I haven’t thought about it very much and mostly the numbers seem less important than the qualitative standard of evidence).
I think that 10% is obviously too high. I think that a society making reasonable tradeoffs could end up with 1% risk, but that it’s not something a government should allow AI developers to do without broader public input (and I suspect that our society would not choose to take this level of risk).
90% or 99% safe is still gambling the lives of 80M-800M humans in expectation (in the limit of scaling to superintelligence). I don’t think it’s acceptable for AI companies, with no democratic mandate, to be unilaterally making that decision!
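The 80M-800M range follows from multiplying the residual risk by the number of people alive. A minimal sketch, assuming a world population of roughly 8 billion (a round figure I am supplying; the comment does not state it):

```python
WORLD_POPULATION = 8_000_000_000  # assumption: roughly 8 billion people


def expected_deaths(p_catastrophe, population=WORLD_POPULATION):
    """Expected lives lost if everyone dies with probability `p_catastrophe`."""
    return p_catastrophe * population


for safety in (0.90, 0.99):
    risk = 1 - safety
    millions = expected_deaths(risk) / 1e6
    print(f"{safety:.0%} safe: about {millions:.0f}M lives at stake in expectation")
```

This counts only people alive today; it says nothing about future generations, which the comment above also leaves out.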
But we still should not be OK with someone
Or did you mean to say something to that effect with this truncated sentence?
Yeah, the sentence cut off. I was saying: obviously a 10% risk is socially unacceptable. Trying to convince someone it’s not in their interest is not the right approach, because doing so requires you to argue that P(doom) is much greater than 10% (at least with some audiences who care a lot about winning a race). Whereas trying to convince policy makers and the public that they shouldn’t tolerate the risk requires meeting a radically lower bar, probably even 1% is good enough.
I think arguing P(doom|AGI) >>10% is a decent strategy. So far I haven’t had anyone give good enough reasons for me to update in the other direction. I think the CEOs in the vanguard of AGI development need to really think about this. If they have good reasons for thinking that P(doom|AGI) ≤ 10%, I want to hear them! To give a worrying example: LeCun is, frankly, sounding like he has no idea of what the problem even is. OpenAI might think they can solve alignment, but their progress on alignment to date isn’t encouraging (this is so far away from the 100% watertight, 0 failure modes that we need). And Google DeepMind are throwing caution to the wind (despite safetywashing their statement with 7 mentions of the word “responsible”/“responsibly”).
The above also has the effect of shifting the public framing toward the burden being on the AI companies to prove their products are safe (in terms of not causing global catastrophe). I’m unsure as to whether the public at large would tolerate a 1% risk. Maybe they would (given the potential upside). But we are not in that world. The risk is at least 50%, probably closer to 99% imo.
Paul, you are saying “50/50 chance of doom” here (on the Bankless podcast). Surely that is enough to be using the suicide race argument!? I mean “it’s not suicide, it’s a coin flip; heads utopia, tails you’re doomed” seems like quibbling at this point. Or at least: you should be explicit when talking to CEOs that you think it’s 50/50 that AGI dooms us!
Re the “you’re” in “you’re doomed”: I used that instead of “we’re doomed” because when CEOs hear “we’re”, they’re probably often thinking “not we’re, you’re. I’ll be alright in my secure compound in NZ”. But they really won’t! Do they think that if the shit hits the fan with this and there are survivors, the vast majority of those survivors won’t want justice?
Cool, makes sense. Seems like we are mostly on the same page on this subpoint.