> even with the pause mechanism kicking in (e.g., “from now on, any training runs that use 0.2x the compute of the model that failed the eval will be prohibited”), algorithmic progress at another lab, or driven by people on Twitter(/X) tinkering with older, already-released models, will get someone to the same capability threshold again soon enough. [...] you can still incorporate this concern by adding enough safety margin to when your evals trigger the pause button.
A safety margin is one way, but I’d be much keener to keep monitoring the strongest models even after the pause has kicked in, so that you notice the effects of algorithmic progress and can tighten controls if needed. That includes rolling back model releases if people on Twitter tinker with them to exceed capability thresholds. (It also implies no open-sourcing, since you can’t roll back a model once you’ve open-sourced it.)
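To make the trigger-plus-margin idea concrete, here’s a minimal, purely illustrative sketch (the thresholds, function names, and the tightening step are all made up, not anyone’s actual policy): the pause fires some margin before the capability threshold, post-pause training runs are capped relative to the model that tripped the eval, and continued monitoring tightens the margin if scores keep creeping up.

```python
# Hypothetical sketch of a conditional pause with a safety margin.
# All names and numbers are illustrative assumptions, not a real lab policy.

from dataclasses import dataclass

@dataclass
class EvalResult:
    capability_score: float   # higher = more capable on the dangerous-capability eval
    training_compute: float   # FLOPs used to train the evaluated model

PAUSE_THRESHOLD = 0.8        # score at which training should definitely be paused
SAFETY_MARGIN = 0.2          # trigger early to absorb surprise jumps / algorithmic progress
COMPUTE_CAP_FRACTION = 0.2   # e.g. "runs above 0.2x the failing model's compute are prohibited"

def should_pause(result: EvalResult) -> bool:
    """Trigger the pause *before* the true threshold, not at it."""
    return result.capability_score >= PAUSE_THRESHOLD - SAFETY_MARGIN

def allowed_compute_after_pause(failing_model_compute: float) -> float:
    """Once paused, cap new training runs relative to the model that tripped the eval."""
    return COMPUTE_CAP_FRACTION * failing_model_compute

def monitor_after_pause(results: list[EvalResult], margin: float) -> float:
    """Keep evaluating released/fine-tuned models; widen the margin if scores creep up."""
    if any(r.capability_score >= PAUSE_THRESHOLD - margin for r in results):
        margin += 0.1  # tighten controls (illustrative adjustment)
    return margin
```

The point of the sketch is just that the margin and the post-pause compute cap are separate dials, and that monitoring released or fine-tuned models is what lets you adjust them after the fact.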
But I also wish you’d say what exactly your alternative course of action is, and why it’s better. E.g., the worry that “algorithmic progress gets you to the threshold” also applies to unconditional pauses. Right now, your comments feel to me like a search for anything negative about a conditional pause, without checking whether that negative also applies to other courses of action.
The way I see it, the main difference between a conditional and an unconditional pause is that the unconditional pause comes with a bigger safety margin (as big as we can muster). So, given that I’m more worried about surprising takeoffs, that position seems prima facie more appealing to me.
In addition, as I say in my other comment, I’m open to (edit: or, more strongly, I’d ideally prefer this!) some especially safety-conscious research continuing through the pause. I gather this is one of your primary concerns? I agree that an outcome where that’s possible requires nuanced discourse, which we may not get if public reaction to AI goes too far in one direction, so there is indeed a tradeoff around public advocacy.