I don’t think you can make general AI systems that are powerful enough to be used for misuse that aren’t powerful enough to pose risks of essentially full takeoff. I talked about this in terms of autonomy, but I think the examples used illustrate why these are nearly impossible to separate. Misuse risk isn’t an unfortunate correlate of greater general capabilities; it’s exactly the same thing.
That said, if you’re not opposed to immediate steps, I think that where we differ most is a strategic detail about how to get to the point where we have a robust international system that conditionally bans only potentially dangerous systems, namely, whether we need a blanket ban on very large models until that system is in place.
I think that where we differ most is a strategic detail about how to get to the point where we have a robust international system that conditionally bans only potentially dangerous systems, namely, whether we need a blanket ban on very large models until that system is in place.
Yes, but it seems like an important detail, and I still haven’t seen you give a single reason why we should do the blanket ban instead of the conditional one. (I named four reasons in my post, but I’m not sure if you care about any of them.)
The conditional one should get way more support! That seems really important! Why is everyone dying on the hill of a blanket ban?
I don’t think you can make general AI systems that are powerful enough to be used for misuse that aren’t powerful enough to pose risks of essentially full takeoff.
There are huge differences between misuse threat models and misalignment threat models, even if you set aside whether models will be misaligned in the first place:
In misuse, an AI doesn’t have to hide its bad actions (e.g. it can do chains of thought that explicitly reason about how to cause harm).
In misuse, an AI gets active help from humans (e.g. maybe the AI is great at creating bioweapon designs, but only the human can get them synthesized and released into the world).
It’s possible that the capabilities needed to overcome these obstacles (hiding bad actions, acting without human help) arrive at the same time as the capabilities that enable misuse, but I doubt that will happen.
Also, this seems to contradict your previous position:
The first is that x-risks from models come in many flavors, and uncontrolled AI takeoff is only one longer term concern. If there are nearer term risks from misuse, which I think are even more critical for immediate action, we should be worried about short timelines, in the lower single digit year range.
I didn’t say “blanket ban on ML,” I said “a blanket ban on very large models.”
Why? Because I have not seen, and don’t think anyone can make, a clear criterion for “too high risk of doom,” both because it involves value judgements and because we don’t know how capabilities arise. So there needs to be some cutoff past which we ban everything, until we have an actual regulatory structure in place to do that evaluation. Is that different than the “conditional ban” you’re advocating? If so, how?
I didn’t say “blanket ban on ML,” I said “a blanket ban on very large models.”
I know? I never said you asked for a blanket ban on ML?
Because I have not seen, and don’t think anyone can make, a clear criterion for “too high risk of doom,”
My post discusses “pause once an agent passes 10 or more of the ARC Evals tasks”. I think this is too weak a criterion and I’d argue for a harder test, but even that is already better than a blanket ban on very large models.
Anthropic just committed to a conditional pause.
ARC Evals is pushing responsible scaling, which is a conditional pause proposal.
But also, the blanket ban on very large models is implicitly saying that “more powerful than GPT-4” is the criterion of “too high risk of doom”, so I really don’t understand at all where you’re coming from.
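To make the disagreement concrete, here is a minimal sketch of the two kinds of triggers under discussion: a capability-conditional pause versus a size-based blanket ban. This is purely illustrative; the task representation and the compute cutoff are hypothetical placeholders, not ARC Evals’ actual task list or any real proposal’s numbers.

```python
# Illustrative sketch only: hypothetical task names, data structures, and
# compute cutoff; not ARC Evals' real task list or any actual regulation.

from dataclasses import dataclass

@dataclass
class EvalResult:
    task_name: str   # a dangerous-capability task run by an evaluator
    passed: bool

def conditional_pause_triggered(results: list[EvalResult], threshold: int = 10) -> bool:
    """Conditional trigger: pause further scaling only if the model passes
    at least `threshold` dangerous-capability eval tasks."""
    return sum(r.passed for r in results) >= threshold

def blanket_ban_triggered(training_compute_flop: float, cap_flop: float = 1e26) -> bool:
    """Blanket trigger: stop any training run above a fixed size cutoff,
    regardless of what the resulting model can actually do."""
    return training_compute_flop >= cap_flop
```

The dispute above is essentially over which of these two triggers should gate large training runs until a real regulatory structure is in place.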