AI evaluations and standards (or “evals”) are processes that check or audit AI models. Evaluations can focus on how powerful models are (“capability evaluations”) or on whether models exhibit dangerous behaviors or are misaligned (“alignment evaluations” or “safety evaluations”). Working on AI evaluations might involve developing standards and enforcing compliance with them. Evaluations can help labs determine whether it is safe to deploy new models, and can support AI governance and regulation.
Further reading
LessWrong (2023) AI evaluation posts.
Karnofsky, Holden (2022) Racing through the minefield, Cold Takes, December 22.
Karnofsky, Holden (2022) AI Safety Seems Hard to Measure, Cold Takes, December 8.
Alignment Research Center (2023) Evals: A project of the non-profit Alignment Research Center focused on evaluating the capabilities and alignment of advanced ML models.
Barnes, Beth (2023) Safety evaluations and standards for AI, EAG Bay Area, March 20.
Related entries
AI Safety | AI Governance | AI forecasting | Compute Governance | Slowing down AI | AI race