What happens when a dangerous capability eval goes off—does the government have the ability to implement a national pause?
If the narrative is “hey, we agree that the government should force everyone to scale responsibly, and this means that the government would have the ability to tell people that they have to stop scaling if the government decides it’s too risky”, then I’d still probably prefer stopping right now, but I’d be much more sympathetic to the RSP position.
I think the pause would presumably just be for that company’s scaling—other organizations that were still in compliance would still be fine.
That’s definitely my position, yeah—and I think it’s also ARC’s and Anthropic’s position. I think the key thing with the current advocacy is that one of the best ways to get a governmentally enforced RSP regime is for companies to first voluntarily commit to the sort of RSPs that you want the government to later enforce.
Thanks! A few quick responses/questions:
I think this makes sense for certain types of dangerous capabilities (e.g., a company develops a system with strong cyberoffensive capabilities; that company has to stop, but other companies can keep going).
But what about dangerous capabilities that have more to do with AI takeover (e.g., a company develops a system that shows signs of autonomous replication, manipulation, power-seeking, deception) or scientific capabilities (e.g., the ability to develop better AI systems)?
Supposing that 3-10 other companies are within a few months of these systems, do you think at this point we need a coordinated pause, or would it be fine to just force company 1 to pause?
Do you know if ARC or Anthropic have publicly endorsed this position anywhere? (And if not, I’d be curious for your take on why, although that’s more speculative so feel free to pass).
I wrote up a bunch of my thoughts on this in more detail here.
What should happen there is that the leading lab is forced to stop and try to demonstrate that, e.g., they understand their model well enough that it’s safe to keep scaling. Then:
- If they can’t do that, then the other labs catch up and they’re all blocked at the same spot, which, if you’ve put your capability bars at the right spots, shouldn’t be dangerous.
- If they can do that, then they get to keep going, ahead of the other labs, until they hit another blocker and need to demonstrate safety/understanding/alignment to an even greater degree.
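The two branches above describe a simple blocking dynamic, which can be sketched as a toy simulation. Everything in this sketch (lab names, capability numbers, thresholds) is an illustrative assumption of mine, not anything from the discussion itself:

```python
# Toy simulation of the pause dynamic described above: labs scale until
# they hit a capability bar, pause there unless they can make the
# required safety case, and paused labs get caught up to by the field.
# All names and numbers below are illustrative assumptions.
from dataclasses import dataclass
from typing import Optional

CAPABILITY_BARS = [100, 200]  # hypothetical eval thresholds

@dataclass
class Lab:
    name: str
    capability: int
    speed: int                    # capability gained per step while scaling
    can_demonstrate_safety: bool  # can this lab make the required safety case?

def next_bar(capability: int) -> Optional[int]:
    """The first capability bar the lab has not yet cleared, if any."""
    return next((b for b in CAPABILITY_BARS if b >= capability), None)

def step(lab: Lab) -> None:
    """Advance one time step, blocking at any bar the lab can't clear."""
    bar = next_bar(lab.capability)
    target = lab.capability + lab.speed
    if bar is not None and target >= bar and not lab.can_demonstrate_safety:
        lab.capability = bar  # paused at the bar until it can make its case
    else:
        lab.capability = target

labs = [
    Lab("leader", capability=90, speed=20, can_demonstrate_safety=False),
    Lab("chaser", capability=60, speed=15, can_demonstrate_safety=False),
]

for _ in range(5):
    for lab in labs:
        step(lab)

# The leader pauses first; the chaser catches up and is blocked at the
# same bar, so both end up at capability 100.
print([(lab.name, lab.capability) for lab in labs])
```

In this toy model, a lab that can make its safety case passes through the bar and keeps its lead, while one that can’t sits at the bar as the rest of the field converges on the same spot—matching the two branches described above.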