Curious why METR. This is less METR-specific and more about capabilities benchmarks in general: doesn’t frontier capabilities benchmarking help accelerate AI development? I know they also do pure safety work, but in practice I feel like they have done more to push forward the agentic automation race.