Image format that might be effective for animal welfare:
Left: chicken in tiny box
Cost per chicken over lifetime: X$
Right: chicken in bigger box
Cost per chicken over lifetime: Y$
Crossposting my comment from LW for visibility and feedback on my evaluator access proposal.
Anthropic said that collaborating with METR “requir[ed] significant science and engineering support on our end”; it has not clarified why.
I can comment on this (I think without breaking NDA). I will oversimplify. They were changing around their deployment system, infra, etc. We wanted uptime and throughput. Big pain in the ass to keep the model up (with proper access control) while they were overhauling stuff. Furthermore, Anthropic and METR kept changing points of contact (rapidly growing teams).
This was and is my proposal for evaluator model access: If at least 10 people at a lab can access a model then at least 1 person at METR must have access.
This is meant for labs to self-enforce via public agreements.
This seems like something they would actually agree to.
If it were a law then you would replace METR with “a govt approved auditor”.
I think conformance could be greatly improved by getting labs to use a little login widget (could be a CLI) which allows e.g. METR to see access permission changes (possibly with codenames for models and/or people). Ideally this would be very little effort for labs, and sidestepping it would be more effort once it was set up.
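To make the rule concrete, here is a rough sketch of the check an auditor could run over the permission changes such a widget reports. Everything in it is an assumption for illustration: the `AccessEvent` schema, the `"lab"`/`"auditor"` labels, and the function names are hypothetical, and only the 10-person and 1-person numbers come from the proposal above.

```python
from dataclasses import dataclass

LAB_THRESHOLD = 10    # from the proposal: at least 10 people at a lab
AUDITOR_MINIMUM = 1   # from the proposal: at least 1 person at the auditor


@dataclass(frozen=True)
class AccessEvent:
    """One permission change as a login widget might report it (hypothetical schema)."""
    model: str     # may be a codename
    person: str    # may be a codename
    org: str       # "lab" or "auditor" (assumed labels)
    granted: bool  # True = access granted, False = access revoked


def current_access(events):
    """Replay events in order and return {model: {org: set_of_people}}."""
    access = {}
    for e in events:
        people = access.setdefault(e.model, {}).setdefault(e.org, set())
        if e.granted:
            people.add(e.person)
        else:
            people.discard(e.person)
    return access


def violations(events):
    """Return models where the lab meets the threshold but the auditor lacks access."""
    out = []
    for model, by_org in current_access(events).items():
        lab_count = len(by_org.get("lab", set()))
        auditor_count = len(by_org.get("auditor", set()))
        if lab_count >= LAB_THRESHOLD and auditor_count < AUDITOR_MINIMUM:
            out.append(model)
    return sorted(out)
```

Because the check only needs counts per model, codenames for models and people work fine as long as they are stable across events.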
Feedback welcome.
External red-teaming is not external model evaluation. External red-teaming … several people …. ~10 hours each. External model evals … experts … evals suites … ~10,000 hours developing.
Yes there is some awkwardness here… Red teaming could be extremely effective if structured as an open competition. Possibly more effective than orgs like METR. The problem is that this trains up tons of devs on Doing Evil With AI and probably also produces lots of really useful GitHub repos. So I agree with you.
The first O in OODA implies something new to observe, no? And within the OODA loop of a city there are many smaller loops where e.g. you see if your friend wears a mask if you ask them.
And with the ToC and such I thought the first post was kind of an introduction/abstract.
Anyway I’m looking forward to these posts and very curious what the OODA loop of a city looks like.
Frustrating that the lines between the rows in that table kind of shift and wiggle, because there are various kinds of “feedback” whose meaning and relative importance are unclear.
The canonical example is Einstein getting special relativity basically perfectly right with almost no reality-feedback because he had math-feedback. He started with a good amount of data, but so do we.
In AI research & bio we are blessed with several kinds of useful feedback at several timescales, although the ultimate review hasn’t come yet.
I’m tempted to be rude and say your first post should be titled “tips for interacting with large orgs”. I may be misunderstanding you, so that comment isn’t really warranted. If you did title it that though, I would be just as interested.
For planes, a hill had enough feedback. For masks, a kitchen spray faucet is maybe enough if you’re honest with yourself. The US military gets mountains of data about its operations and their failure causes but whoever is running things might do better having a date night with their diary than having a presentation from their intelligence officers. I don’t think data/experiments are the big missing piece across the board. In policy of course it is about practice and structures and connections 95%...
So all this to say: there are most likely big ways we can get more feedback on all our longterm efforts, and we certainly ought to, but I expect that this advice will need to be extremely specific to be useful, that people are already trying very hard all the time to get meaningful data, and that just saying “moar experiments” won’t get us far.
Is there a good post explaining what the biggest/easiest wins in animal welfare have been and what easy things can be done next?