[Question] Has Anthropic already made the externally legible commitments that it planned to make?

Ofer 12 Mar 2024 13:45 UTC

21 points

A year ago (2023-03-08) Anthropic published an announcement that included the following:

“In the near future, we also plan to make externally legible commitments to only develop models beyond a certain capability threshold if safety standards can be met, and to allow an independent, external organization to evaluate both our model’s capabilities and safety.”

Has Anthropic made such externally legible commitments so far?

AI governance AI safety

Neel Nanda 12 Mar 2024 18:00 UTC
12 points
3 ∶ 0
Yes, I presume this is referring to their Responsible Scaling Policy.
- Ofer 12 Mar 2024 19:38 UTC
  15 points
  0 ∶ 0
  Thanks!
  
  Follow-up questions for anyone who may know:
  
  Is METR (formerly ARC Evals) meant to be the “independent, external organization” that is allowed to evaluate the capabilities and safety of Anthropic’s models? As of 2023-12-04 METR was spinning off from the Alignment Research Center (ARC) into their own standalone nonprofit 501(c)(3) organization, according to their website. Who is on METR’s board of directors?
  
  Note: OpenPhil seemingly recommended a total of $1,515,000 to ARC in 2022. According to Wikipedia, Holden Karnofsky (co-founder of OpenPhil, its co-CEO at the time, and currently a board member) is married to Daniela Amodei (co-founder of Anthropic and sibling of Anthropic’s CEO, Dario Amodei).
  - Habryka [Deactivated] 13 Mar 2024 2:58 UTC
    10 points
    2 ∶ 3
    (Just for the record, I don’t think METR would be accurately described as an independent organization, though I also don’t see any other candidate organization that is better placed. But inasmuch as Anthropic promised it would find an independent organization, METR, in my opinion, does not qualify.)
