A year ago (2023-03-08) Anthropic published an announcement that included the following:
In the near future, we also plan to make externally legible commitments to only develop models beyond a certain capability threshold if safety standards can be met, and to allow an independent, external organization to evaluate both our model’s capabilities and safety.
Has Anthropic made such externally legible commitments so far?
Yes, I presume this refers to their Responsible Scaling Policy.
Thanks!
Follow-up questions for anyone who may know:
Is METR (formerly ARC Evals) meant to be the “independent, external organization” that is allowed to evaluate the capabilities and safety of Anthropic’s models? As of 2023-12-04 METR was spinning off from the Alignment Research Center (ARC) into their own standalone nonprofit 501(c)(3) organization, according to their website. Who is on METR’s board of directors?
Note: OpenPhil seemingly recommended a total of $1,515,000 to ARC in 2022. According to Wikipedia, Holden Karnofsky (co-founder and co-CEO of OpenPhil at the time, and currently a board member) is married to Daniela Amodei (co-founder of Anthropic and sibling of Anthropic’s CEO, Dario Amodei).
(Just for the record, I don’t think METR would be accurately described as an independent organization, though I also don’t see any other candidate organization that is better placed. But inasmuch as Anthropic promised it would find an independent organization, METR, in my opinion, does not qualify.)