In the RSP, Anthropic committed to defining ASL-4 by the time they reached ASL-3.
With Claude 4 released today, they have reached ASL-3. They haven’t yet defined ASL-4.
It turns out they quietly walked back the commitment. The change happened less than two months ago and, to my knowledge, was not announced on LessWrong or in other visible places, unlike other important changes to the RSP. It's also not in the changelog on their website: in the description of the relevant update, they say they added a new commitment but don't mention removing this one.
Anthropic's behavior is not at all the behavior of a responsible AI company. Trained a new model that reaches ASL-3 before you've defined ASL-4? No problem: update the RSP so that you no longer have to, and basically don't tell anyone. (Did anyone not working for Anthropic know the change had happened?)
When their commitments go against their commercial interests, we can’t trust their commitments.
You should not work at Anthropic on AI capabilities.
This is false. Our ASL-4 thresholds are clearly specified in the current RSP: see "CBRN-4" and "AI R&D-4". We evaluated Claude Opus 4 for both of these thresholds prior to release and found that the model was not ASL-4. All of these evaluations are detailed in the Claude 4 system card.
The thresholds are pretty meaningless without at least a high-level standard, no?
The RSP specifies that CBRN-4 and AI R&D-5 both require ASL-4 security. Where is ASL-4 itself defined?
The original commitment was (IIRC!) about defining the thresholds, not the mitigations. I didn't notice ASL-4 when I briefly checked the RSP's table of contents earlier today, and I trusted the reporting on this from Obsolete. I apologized and retracted the take on LessWrong, but forgot I had posted it here as well; I want to apologize to everyone here too. I was wrong.