@evhub can you say more about what you envision a governmentally-enforced RSP world would look like? Is it similar to licensing? What happens when a dangerous capability eval goes off — does the government have the ability to implement a national pause?
Aside: IMO it’s pretty clear that the voluntary-commitment RSP regime is insufficient, since some companies simply won’t develop RSPs, and even if lots of folks adopted RSPs, the competitive pressures in favor of racing seem like they’d make it hard for anyone to pause for more than a few months. I was surprised/disappointed that neither ARC nor Anthropic mentioned this. ARC suggests that RSPs might one day inform government standards, but (in my opinion) their discussion of government involvement was quite weak, perhaps even to the point of being misleading (by making it seem like the voluntary commitments will be sufficient).
I think some of the negative reaction to responsible scaling, at least among some people I know, is that it seems like an attempt by companies to say “trust us — we can scale responsibly, so we don’t need actual government regulation.” If the narrative were instead “we agree that the government should force everyone to scale responsibly, which means the government would have the ability to tell people to stop scaling if it decides things are too risky,” then I’d still probably prefer stopping right now, but I’d be much more sympathetic to the RSP position.
I think the pause would presumably just be for that company’s scaling — other organizations that were still in compliance would be fine.
That’s definitely my position, yeah — and I think it’s also ARC’s and Anthropic’s position. The key thing about the current advocacy for companies adopting RSPs is that one of the best ways to get a governmentally-enforced RSP regime is for companies to first voluntarily commit to the sort of RSPs that you want the government to later enforce.
Thanks! A few quick responses/questions:
I think this makes sense for certain types of dangerous capabilities (e.g., a company develops a system with strong cyberoffensive capabilities; that company has to stop, but other companies can keep going).
But what about dangerous capabilities that have more to do with AI takeover (e.g., a company develops a system that shows signs of autonomous replication, manipulation, power-seeking, or deception) or with scientific capabilities (e.g., the ability to develop better AI systems)?
Supposing that 3–10 other companies are within a few months of building such systems, do you think we would need a coordinated pause at that point, or would it be fine to force only company 1 to pause?
Do you know if ARC or Anthropic have publicly endorsed this position anywhere? (And if not, I’d be curious for your take on why, although that’s more speculative so feel free to pass).
I wrote up a bunch of my thoughts on this in more detail here.
What should happen in that scenario is that the leading lab is forced to stop and try to demonstrate, e.g., that they understand their model well enough to keep scaling safely. Then:
- If they can’t do that, the other labs catch up and everyone is blocked at the same spot — which, if you’ve put your capability bars in the right places, shouldn’t be dangerous.
- If they can do that, they get to keep going, ahead of the other labs, until they hit another blocker and need to demonstrate safety/understanding/alignment to an even greater degree.
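The two branches above can be sketched as a toy simulation. To be clear, this is purely illustrative and not from the discussion: the `Lab` class, the capability numbers, and the eval thresholds are all made up.

```python
# Toy model of eval-gated scaling: each lab scales until a dangerous-
# capability eval "goes off" at a capability bar, then is blocked there
# until it demonstrates sufficient safety/understanding. (Hypothetical
# numbers throughout.)
from dataclasses import dataclass
from typing import Callable


CAPABILITY_BARS = [100, 200]  # made-up eval thresholds


@dataclass
class Lab:
    name: str
    capability: int = 0
    next_bar: int = 0  # index of the next un-passed bar
    blocked: bool = False

    def step(self, speed: int,
             can_demonstrate_safety: Callable[["Lab"], bool]) -> None:
        """Advance one time step of scaling."""
        if self.blocked:
            if can_demonstrate_safety(self):
                # Demonstration succeeds: this lab may scale past the bar.
                self.blocked = False
                self.next_bar += 1
            else:
                return  # stays paused; other labs can catch up
        bar = (CAPABILITY_BARS[self.next_bar]
               if self.next_bar < len(CAPABILITY_BARS) else None)
        self.capability += speed
        if bar is not None and self.capability >= bar:
            self.capability = bar  # the eval goes off: halt at the bar
            self.blocked = True


# A faster "leading lab" and a slower compliant lab, neither of which
# can demonstrate safety: both end up blocked at the same bar.
leader, follower = Lab("A"), Lab("B")
for _ in range(20):
    leader.step(speed=10, can_demonstrate_safety=lambda lab: False)
    follower.step(speed=7, can_demonstrate_safety=lambda lab: False)
print(leader.capability, leader.blocked)      # -> 100 True
print(follower.capability, follower.blocked)  # -> 100 True
```

If `can_demonstrate_safety` instead returns `True` for the leader, it unblocks, advances to the next bar, and continues scaling ahead of the others until the next threshold, matching the second branch above.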