Thanks—I definitely don’t completely agree, but it’s good to hear that people at the labs take this seriously.
Given that, I’ll respond to your response.
I think the idea here is that we should eventually aim for an international regulatory regime that is not a pause, but that a significant chunk of x-risk from misaligned AI is in the near future, so we should enact an unconditional pause right now.
If so, my main disagreement is that I think the pause we enact right now should be conditional: specifically, I think it’s important that you evaluate the safety of a model after you train it, not before[18]. I may also disagree (perhaps controversially) that a significant chunk of x-risk from misaligned AI is in the near future, depending on what “near future” means.
This seems basically on point, and I think we disagree less than it seems. To explain why, two things are critical.
The first is that x-risks from models come in many flavors, and uncontrolled AI takeoff is only one longer term concern. If there are nearer term risks from misuse, which I think are even more critical for immediate action, we should be worried about short timelines, in the lower single digit year range. (Perhaps you disagree?)
The second is that “near future” means really different things to engineers, domestic political figures, and treaty negotiators. Political deals take years to make—we’re optimistically a year or more from getting anything in place. Even if a bill is proposed today and miraculously passes tomorrow, it likely would not go into effect for months or years, and that’s excluding court challenges. I don’t think we want to wait to get this started—we’re already behind schedule!
(OpenAI says we have 3-4 years until ASI. I’m more skeptical, but also confused by their opposition to immediate attempts to regulate: by the time we get to AGI, regulations that are only then coming online won’t matter, and abuse risks will be stratospheric.)
Either way, not pausing right now is already a done deal, and a delayed pause starting in 2 years (which is what I proposed) and lasting until there is a treaty already seems behind schedule. If you’d prefer that be conditional, that seems fine—though I think we’re less than 2 years from hitting the remaining fire alarms, so your pause conditions might trigger a stop sooner than my timeline would. But I think we need something to light a fire to act faster.
Why? Because years are likely needed just to get people on board for negotiating a real international regulatory regime! Climate treaties take a decade or more to convene and then set milestones a decade or more in the future; even narrowly bilateral nuclear arms control talks take years to start, run another 3-4 years before there is a treaty, and the resulting treaty has a 30-year time span.
I’m optimistic that in worlds where risk shows up rapidly, we’ll see treaties written and signed faster than this—but I think that either way, pushing hard now makes sense. To explain why: I predicted that COVID-19 would create incentives to develop vaccines faster than previously seen, and by the same logic I think the growing and increasingly obvious risks from AGI will push us to act similarly quickly. But that is conditional on there being significant and rapid advances that scare people much more, which I think you say you doubt.
If you think that significant realized risks are further away, so that we have time to wait for a conditional pause, then I think it’s also unreasonable to expect agreements to come together on those shorter time frames. So conditional on what seem to be your beliefs about AI risk timelines, I claim I’m right to push now; and conditional on my tentative concerns about shorter timelines, I claim we’re badly behind schedule. Either way, we should be pushing for action now.
If there are nearer term risks from misuse, which I think are even more critical for immediate action, we should be worried about short timelines, in the lower single digit year range. (Perhaps you disagree?)
I agree, but in that case I’m even more baffled by the call for an unconditional pause. It’s so much easier to tell whether an AI system can be misused, relative to whether it is misaligned! Why would you not just ask for a ban on AI systems that pose great misuse risks?
In the rest of your comment you seem to assume that I think we shouldn’t push for action now. I’m not sure why you think that—I’m very interested in pushing for a conditional pause now, because as you say it takes time to actually implement.
(I agree that we mostly don’t disagree.)
I don’t think you can make general AI systems that are powerful enough to be used for misuse that aren’t powerful enough to pose risks of essentially full takeoff. I talked about this in terms of autonomy, but I think the examples I used illustrate why these are nearly impossible to separate. Misuse risk isn’t an unfortunate correlate of greater general capabilities; it is exactly the same thing.
That said, if you’re not opposed to immediate steps, I think that where we differ most is a strategic detail about how to get to the point where we have a robust international system that conditionally bans only potentially dangerous systems, namely, whether we need a blanket ban on very large models until that system is in place.
I think that where we differ most is a strategic detail about how to get to the point where we have a robust international system that conditionally bans only potentially dangerous systems, namely, whether we need a blanket ban on very large models until that system is in place.
Yes, but it seems like an important detail, and I still haven’t seen you give a single reason why we should do the blanket ban instead of the conditional one. (I named four in my post, but I’m not sure if you care about any of them.)
The conditional one should get way more support! That seems really important! Why is everyone dying on the hill of a blanket ban?
I don’t think you can make general AI systems that are powerful enough to be used for misuse that aren’t powerful enough to pose risks of essentially full takeoff.
There are huge differences between misuse threat models and misalignment threat models, even if you set aside whether models will be misaligned in the first place:
In misuse, an AI doesn’t have to hide its bad actions (e.g. it can do chains of thought that explicitly reason about how to cause harm).
In misuse, an AI gets active help from humans (e.g. maybe the AI is great at creating bioweapon designs, but only the human can get them synthesized and released into the world).
It’s possible that the capabilities needed to overcome these issues come at the same time as the capability for misuse, but I doubt that will happen.
Also, this seems to contradict your previous position:
The first is that x-risks from models come in many flavors, and uncontrolled AI takeoff is only one longer term concern. If there are nearer term risks from misuse, which I think are even more critical for immediate action, we should be worried about short timelines, in the lower single digit year range.
I didn’t say “blanket ban on ML,” I said “a blanket ban on very large models.”
Why? Because I have not seen, and don’t think anyone can make, a clear criterion for “too high risk of doom,” both because there are value judgements involved, and because we don’t know how capabilities arise—so there needs to be some cutoff past which we ban everything, until we have an actual regulatory structure in place to do that evaluation. Is that different from the “conditional ban” you’re advocating? If so, how?
I didn’t say “blanket ban on ML,” I said “a blanket ban on very large models.”
I know? I never said you asked for a blanket ban on ML?
Because I have not seen, and don’t think anyone can make, a clear criterion for “too high risk of doom,”
My post discusses “pause once an agent passes 10 or more of the ARC Evals tasks”. I think this is too weak a criterion and I’d argue for a harder test, but I think this is already better than a blanket ban on very large models.
Anthropic just committed to a conditional pause.
ARC Evals is pushing responsible scaling, which is a conditional pause proposal.
But also, the blanket ban on very large models is implicitly saying that “more powerful than GPT-4” is the criterion of “too high risk of doom”, so I really don’t understand at all where you’re coming from.