This seems to be of questionable effectiveness. Brief answers/challenges:
1. Evaluations are a key input to ineffective governance. The safety frameworks presented by the frontier labs are “safety-washing”, more appropriately considered roadmaps towards an unsurvivable future.
2. Disagreements on AI capabilities underpin performative disagreements on AI risk. As far as I know, there is no substantial recent published disagreement of this kind; I’d like sources for your claim, please.
3. We don’t need more situational awareness of what current frontier models can and cannot do in order to respond appropriately. No decision-relevant conclusions can be drawn from evaluations in the style of Cybench and RE-Bench.
Hi Søren,
Thanks for commenting. Some quick responses:
> The safety frameworks presented by the frontier labs are “safety-washing”, more appropriately considered roadmaps towards an unsurvivable future
I don’t see the labs as the main audience for evaluation results, and I don’t think voluntary safety frameworks should be how deployment and safeguard decisions are made in the long term, so I don’t think the quality of lab safety frameworks is that relevant to this RFP.
> I’d like sources for your claim, please.
Sure, see e.g. the sources linked in our RFP for this claim: *What Are the Real Questions in AI?* and *What the AI debate is really about*.
I’m surprised you think the disagreements are “performative” – in my experience, many sceptics of GCRs from AI really do sincerely hold their beliefs.
> No decision-relevant conclusions can be drawn from evaluations in the style of Cybench and RE-Bench.
I think Cybench and RE-Bench are useful, if imperfect, proxies for frontier model capabilities at cyberoffense and ML engineering respectively, and those capabilities are central to threats from cyberattacks and AI R&D. My claim isn’t that running these evals will tell you exactly what to do. It’s that these evaluations are already being used as inputs into RSPs and governance proposals more broadly, and that they provide some evidence on the likelihood of GCRs from AI, though they will need to be harder and more robust to be relied upon.
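To make the “inputs into RSPs” point a bit more concrete, here is a minimal Python sketch of how aggregated results from an eval like Cybench could feed an RSP-style capability-threshold check. The task names, results, and the 80% threshold are hypothetical placeholders of mine, not the actual Cybench or RE-Bench harnesses or any lab’s real trigger conditions.

```python
# Minimal sketch of an RSP-style threshold check driven by eval results.
# All task names, results, and the 0.8 threshold are hypothetical placeholders,
# not the actual Cybench/RE-Bench harnesses or any lab's real trigger.
from dataclasses import dataclass


@dataclass
class TaskResult:
    task_id: str
    solved: bool  # did the model complete the task end-to-end?


def pass_rate(results: list[TaskResult]) -> float:
    """Fraction of eval tasks the model solved."""
    return sum(r.solved for r in results) / len(results) if results else 0.0


def exceeds_capability_threshold(results: list[TaskResult], threshold: float = 0.8) -> bool:
    """RSP-style trigger: has measured capability crossed a pre-committed line?"""
    return pass_rate(results) >= threshold


# Example: a (hypothetical) cyberoffense eval run.
run = [
    TaskResult("privilege_escalation_01", solved=True),
    TaskResult("web_exploit_03", solved=False),
    TaskResult("crypto_ctf_07", solved=True),
]

if exceeds_capability_threshold(run):
    print("Threshold crossed: stronger safeguards required before further deployment.")
else:
    print(f"Below threshold (pass rate = {pass_rate(run):.0%}); continue monitoring.")
```

The decision such a check produces is only as informative as the tasks feeding it are hard and robust, which is why I’d want evaluations of this kind to improve rather than be treated as sufficient in their current form.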