In what situation would people actually coordinate to sound alarm bells over an AI benchmark? I basically don’t think people pay attention to benchmarks in the way that matters: a benchmark comes out demonstrating some new potential danger, AI safety people raise concerns about it, and the people with actual power continue to ignore them.
Thinking of some historical examples:
Anthropic’s findings on alignment faking should have alerted people that AI is too dangerous to keep building, but they didn’t.
Anthropic and OpenAI recently finding that their latest models may cross ASL-4 / CBRN thresholds should have been an alarm bell that those models couldn’t be released, but the companies released them anyway and nobody could do anything to stop them.
A relevant question I’m not sure about: for people who talk to politicians about AI risk, how useful are benchmarks? I’m not involved in those conversations so I can’t really say. My guess is that politicians are more interested in obvious capabilities (e.g. Claude can write good code now) than they are in benchmark performance.