Crossposting my comment from LW for visibility and feedback on my evaluator access proposal.
Anthropic said that collaborating with METR “requir[ed] significant science and engineering support on our end”; it has not clarified why.
I can comment on this (I think without breaking NDA), and I will oversimplify. They were changing around their deployment system, infra, etc., while we wanted uptime and throughput. It was a big pain in the ass to keep the model up (with proper access control) while they were overhauling things. On top of that, Anthropic and METR kept changing points of contact (both teams were growing rapidly).
This was and is my proposal for evaluator model access: if at least 10 people at a lab can access a model, then at least 1 person at METR must have access.
This would be self-enforced by the labs via public agreements.
This seems like something they would actually agree to.
If it were a law, you would replace METR with "a government-approved auditor".
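As a toy illustration of the rule (the threshold and all names below are my own hypothetical stand-ins, not anything a lab has agreed to), the per-model check is just:

```python
# Toy check of the proposed access rule, per model (hypothetical names).
LAB_THRESHOLD = 10  # "at least 10 people at a lab can access a model"

def rule_satisfied(lab_accessors: set[str], evaluator_accessors: set[str]) -> bool:
    """True if this model's access lists conform to the proposal."""
    if len(lab_accessors) >= LAB_THRESHOLD:
        # ...then at least 1 approved evaluator (e.g. someone at METR) must have access.
        return len(evaluator_accessors) >= 1
    return True  # below the threshold, no evaluator access is required

# Example: 12 lab employees can access the model, no evaluator can -> violation.
print(rule_satisfied({f"lab_{i}" for i in range(12)}, set()))  # False
```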
I think conformance could be greatly improved by getting labs to use a small login widget (could be a CLI) which allows e.g. METR to see access-permission changes (possibly with codenames for models and/or people). Ideally this would take very little effort from labs, and sidestepping it would be more effort than using it once it was set up.
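To make the widget idea concrete, here is a minimal Python sketch of what the lab-side logging could look like, assuming an append-only JSONL log shared read-only with the auditor and salted hashes as codenames. Every detail (file path, salt, field names) is a hypothetical placeholder, not a description of anything labs or METR actually run.

```python
#!/usr/bin/env python3
"""Sketch of a lab-side CLI that records access-permission changes.

Hypothetical: the lab runs this whenever it grants or revokes model access,
and the auditor (e.g. METR) gets read-only access to the resulting log.
"""

import argparse
import hashlib
import json
import time
from pathlib import Path

LOG_PATH = Path("access_log.jsonl")  # append-only log shared with the auditor


def codename(identifier: str, salt: str = "lab-secret-salt") -> str:
    """Stable pseudonym so the auditor sees changes without real names."""
    return hashlib.sha256((salt + identifier).encode()).hexdigest()[:12]


def record(action: str, person: str, model: str) -> dict:
    """Append one access-permission change to the audit log."""
    entry = {
        "ts": time.time(),
        "action": action,           # "grant" or "revoke"
        "person": codename(person),
        "model": codename(model),
    }
    with LOG_PATH.open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry


def main() -> None:
    parser = argparse.ArgumentParser(description="Log a model-access change.")
    parser.add_argument("action", choices=["grant", "revoke"])
    parser.add_argument("person", help="internal identifier of the person")
    parser.add_argument("model", help="internal identifier of the model")
    args = parser.parse_args()
    print(json.dumps(record(args.action, args.person, args.model)))


if __name__ == "__main__":
    main()
```

Usage would be something like `python access_cli.py grant some-person some-model` at the point where access is granted; the auditor only ever sees the codenames and timestamps.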
Feedback welcome.
External red-teaming is not external model evaluation. External red-teaming … several people … ~10 hours each. External model evals … experts … eval suites … ~10,000 hours developing.
Yes, there is some awkwardness here… Red-teaming could be extremely effective if structured as an open competition, possibly more effective than orgs like METR. The problem is that this would train up tons of devs on Doing Evil With AI and probably also produce lots of really useful GitHub repos. So I agree with you.