Race to the Top: Benchmarks for AI Safety

This is an executive summary of a blog post. Read the full texts here.

Summary

Benchmarks support the empirical, quantitative evaluation of progress in AI research. Although benchmarks are ubiquitous in most subfields of machine learning, they are still rare in the subfield of AI safety.

I argue that creating benchmarks should be a high priority for AI safety. While this idea is not new, I think it may still be underrated. Among other benefits, benchmarks would make it much easier to:

  • track the field’s progress and focus resources on the most productive lines of work;

  • create professional incentives for researchers—especially Chinese researchers—to work on problems that are relevant to AGI safety;

  • develop auditing regimes and regulations for advanced AI systems.

Unfortunately, we cannot assume that good benchmarks will be developed quickly enough “by default.” I discuss several reasons to expect them to be undersupplied. I also outline actions that different groups can take today to accelerate their development.

For example, AI safety researchers can help by:

  • directly trying their hand at creating safety-relevant benchmarks;

  • clarifying certain safety-relevant traits (such as “honesty” and “power-seekingness”) that it could be important to measure in the future;

  • building up relevant expertise and skills, for instance by working on other benchmarking projects;

  • drafting “benchmark roadmaps,” which identify categories of benchmarks that could be valuable in the future and outline prerequisites for developing them.

And AI governance professionals can help by:

  • co-organizing workshops, competitions, and prizes focused on benchmarking;

  • creating third-party institutional homes for benchmarking work;

  • clarifying, ahead of time, how auditing and regulatory frameworks can put benchmarks to use;

  • advising safety researchers on political, institutional, and strategic considerations that matter for benchmark design;

  • popularizing the narrative of a “race to the top” on AI safety.

Ultimately, we can and should begin to build benchmark-making capability now.

Acknowledgment

I would like to thank Ben Garfinkel and Owen Cotton-Barratt for their mentorship, Emma Bluemke and many others at the Centre for the Governance of AI for their warmhearted support. All views and errors are my own.

Future research

I am working on a paper on the topic, and if you are interested in benchmarks and model evaluation, especially if you are a technical AI safety researcher, I would love to hear from you!