Thanks for the post! For those reading, I'm Jay, head of technology and standards for the Inspect Evals repo, and the reviewer of Declan's PR. I happened to spot this post without realising it was from a recent contributor! A couple of quick clarifications around the structure of Inspect Evals (it is pretty confusing).
Inspect Evals != Inspect, and isn't run by the same team. Inspect is the evals framework; Inspect Evals is a repository of evals that use that framework. Inspect Evals is run by Arcadia Impact, and we're contracted by the UK AISI to maintain it.
Our developers work remotely as contractors, so moving to London isn't required. I'm in Australia at the moment. (Though we're doing a restructure atm and it's uncertain how that's going to pan out, so I'm not sure about our current hiring.)
I think the EA Hotel people have a point about evaluations, personally. If you're going to open-source evaluations, you should ask "Would I be okay if frontier AI companies trained on this / hill-climbed on this metric?" For frontier maths evals you might not want this: is it worth the increased knowledge we get about these capabilities? For moral reasoning under uncertainty, you may actively want them to do something like this.
Finally, I'm glad you liked the agent stuff; that's where the majority of my time has gone this quarter. Appreciate the feedback, and more is always welcome :)