Benchmarks are standardized tests that let us measure the progress of AI capabilities and check for characteristics that might pose safety risks.
Further reading
"BASALT: A Benchmark for Learning from Human Feedback" (AI Alignment Forum)
"Misaligned Powerseeking" (SERI ML Alignment Theory Scholars Program, Summer 2022)
"Truthful AI: Developing and governing AI that does not lie" (arXiv:2110.06674)