In the RSP, Anthropic committed to defining ASL-4 by the time they reached ASL-3.
With Claude 4 released today, they have reached ASL-3. They haven’t yet defined ASL-4.
It turns out they quietly walked back the commitment. The change happened less than two months ago and, to my knowledge, was not announced on LessWrong or in other visible places, unlike other important changes to the RSP. It's also not in the changelog on their website: in the description of the relevant update, they say they added a new commitment but don't mention removing this one.
Anthropic's behavior is not at all the behavior of a responsible AI company. Trained a new model that reaches ASL-3 before you've defined ASL-4? No problem: update the RSP so that you no longer have to, and basically don't tell anyone. (Did anyone not working for Anthropic know the change had happened?)
When their commitments go against their commercial interests, we can’t trust their commitments.
You should not work at Anthropic on AI capabilities.
This is false. Our ASL-4 thresholds are clearly specified in the current RSP: see "CBRN-4" and "AI R&D-4". We evaluated Claude Opus 4 for both of these thresholds prior to release and found that the model was not ASL-4. All of these evaluations are detailed in the Claude 4 system card.
The thresholds are pretty meaningless without at least a high-level standard, no?
The RSP specifies that CBRN-4 and AI R&D-5 both require ASL-4 security. Where is ASL-4 itself defined?
The original commitment was (IIRC!) about defining the thresholds, not the mitigations. I didn't notice ASL-4 when I briefly checked the RSP's table of contents earlier today, and I trusted the reporting on this from Obsolete. I apologized and retracted the take on LessWrong, but forgot I had posted it here as well; I want to apologize to everyone here too. I was wrong.