AI evaluations and standards

TagLast edit: Apr 25, 2023, 5:39 PM by Dane Valerie

AI evaluations and standards (or “evals”) are processes that check or audit AI models. Evaluations can focus on how powerful models are (“capability evaluations”) and on whether models are exhibiting dangerous behaviors or are misaligned (“alignment evaluations” or “safety evaluations”).Working on AI evaluations might involve developing standards and enforcing compliance with the standards.Evaluations can help labs determine whether it’s safe to deploy new models, and can help with AI governance and regulation.

DeepMind: Model evaluation for extreme risks

Zach Stein-PerlmanMay 25, 2023, 3:00 AM

49 points

3 comments1 min readEA link

How technical safety standards could promote TAI safety

Cullen 🔸Aug 8, 2022, 4:57 PM

128 points

15 comments7 min readEA link

AI Safety Seems Hard to Measure

Holden KarnofskyDec 11, 2022, 1:31 AM

90 points

4 comments14 min readEA link

(www.cold-takes.com)

Trendlines in AIxBio evals

ljustenOct 31, 2024, 12:09 AM

39 points

2 comments11 min readEA link

(www.lennijusten.com)

Case studies on social-welfare-based standards in various industries

Holden KarnofskyJun 20, 2024, 1:33 PM

73 points

2 comments1 min readEA link

Racing through a minefield: the AI deployment problem

Holden KarnofskyDec 31, 2022, 9:44 PM

79 points

1 comment13 min readEA link

(www.cold-takes.com)

AI Risk Management Framework | NIST

𝕮𝖎𝖓𝖊𝖗𝖆Jan 26, 2023, 3:27 PM

50 points

0 comments1 min readEA link

AI Governance Needs Technical Work

MauSep 5, 2022, 10:25 PM

121 points

3 comments8 min readEA link

The case for more ambitious language model evals

JozdienJan 30, 2024, 9:24 AM

7 points

0 comments5 min readEA link

[Cause Exploration Prizes] Creating a “regulatory turbocharger” for EA relevant policies

Open PhilanthropyAug 11, 2022, 10:42 AM

5 points

1 comment11 min readEA link

Announcing ForecastBench, a new benchmark for AI and human forecasting abilities

Forecasting Research InstituteOct 1, 2024, 12:31 PM

20 points

1 comment3 min readEA link

(arxiv.org)

12 tentative ideas for US AI policy (Luke Muehlhauser)

LizkaApr 19, 2023, 9:05 PM

117 points

12 comments4 min readEA link

(www.openphilanthropy.org)

What AI companies can do today to help with the most important century

Holden KarnofskyFeb 20, 2023, 5:40 PM

104 points

8 comments11 min readEA link

(www.cold-takes.com)

Supplement to “The Brussels Effect and AI: How EU AI regulation will impact the global AI market”

MarkusAnderljungAug 16, 2022, 8:55 PM

109 points

7 comments8 min readEA link

Why Would AI “Aim” To Defeat Humanity?

Holden KarnofskyNov 29, 2022, 6:59 PM

24 points

0 comments32 min readEA link

(www.cold-takes.com)

AI policy ideas: Reading list

Zach Stein-PerlmanApr 17, 2023, 7:00 PM

60 points

3 comments1 min readEA link

[Question] Open-source AI safety projects?

defun 🔸Jan 29, 2024, 10:09 AM

8 points

2 comments1 min readEA link

FLI report: Policymaking in the Pause

Zach Stein-PerlmanApr 15, 2023, 5:01 PM

29 points

4 comments1 min readEA link

High-level hopes for AI alignment

Holden KarnofskyDec 20, 2022, 2:11 AM

123 points

14 comments19 min readEA link

(www.cold-takes.com)

An ‘AGI Emergency Eject Criteria’ consensus could be really useful.

tcelferactApr 7, 2023, 4:21 PM

27 points

3 comments1 min readEA link

Rolling Thresholds for AGI Scaling Regulation

LarksJan 12, 2025, 1:30 AM

60 points

4 comments6 min readEA link

Benchmark Performance is a Poor Measure of Generalisable AI Reasoning Capabilities

James FodorFeb 21, 2025, 4:25 AM

12 points

3 comments24 min readEA link

NIST AI Risk Management Framework request for information (RFI)

Aryeh EnglanderAug 31, 2021, 10:24 PM

7 points

0 comments2 min readEA link

Proposals for the AI Regulatory Sandbox in Spain

Guillem BasApr 27, 2023, 10:33 AM

55 points

2 comments11 min readEA link

(riesgoscatastroficosglobales.com)

Begging, Pleading AI Orgs to Comment on NIST AI Risk Management Framework

BridgesApr 15, 2022, 7:35 PM

87 points

3 comments2 min readEA link

GovAI: Towards best practices in AGI safety and governance: A survey of expert opinion

Zach Stein-PerlmanMay 15, 2023, 1:42 AM

68 points

3 comments1 min readEA link

Actionable-guidance and roadmap recommendations for the NIST AI Risk Management Framework

Tony BarrettMay 17, 2022, 3:27 PM

11 points

0 comments3 min readEA link

Seeking (Paid) Case Studies on Standards

Holden KarnofskyMay 26, 2023, 5:58 PM

99 points

14 comments1 min readEA link

AI Safety Newsletter #8: Rogue AIs, how to screen for AI risks, and grants for research on democratic governance of AI

Center for AI SafetyMay 30, 2023, 11:44 AM

16 points

3 comments6 min readEA link

(newsletter.safe.ai)

The EU AI Act needs a definition of high-risk foundation models to avoid regulatory overreach and backlash

matthias_samwaldMay 31, 2023, 3:34 PM

17 points

0 comments4 min readEA link

Announcing Apollo Research

mariushobbhahnMay 30, 2023, 4:17 PM

158 points

5 comments1 min readEA link

Biden-Harris Administration Announces First-Ever Consortium Dedicated to AI Safety

ben.smithFeb 9, 2024, 6:40 AM

15 points

1 comment1 min readEA link

(www.nist.gov)

Will the EU regulations on AI matter to the rest of the world?

hanadulsetJan 1, 2022, 9:56 PM

33 points

5 comments5 min readEA link

An overview of standards in biosafety and biosecurity

rosehadsharJul 26, 2023, 12:19 PM

77 points

7 comments11 min readEA link

The ‘Old AI’: Lessons for AI governance from early electricity regulation

Sam ClarkeDec 19, 2022, 2:46 AM

58 points

1 comment13 min readEA link

A Map to Navigate AI Governance

hanadulsetFeb 14, 2022, 10:41 PM

72 points

11 comments25 min readEA link

Nobody’s on the ball on AGI alignment

leopoldMar 29, 2023, 2:26 PM

327 points

65 comments9 min readEA link

(www.forourposterity.com)

Join the AI Evaluation Tasks Bounty Hackathon

Esben KranMar 18, 2024, 8:15 AM

20 points

0 comments4 min readEA link

Success without dignity: a nearcasting story of avoiding catastrophe by luck

Holden KarnofskyMar 15, 2023, 8:17 PM

113 points

3 comments1 min readEA link

How evals might (or might not) prevent catastrophic risks from AI

AkashFeb 7, 2023, 8:16 PM

28 points

0 comments1 min readEA link

How major governments can help with the most important century

Holden KarnofskyFeb 24, 2023, 7:37 PM

56 points

4 comments4 min readEA link

(www.cold-takes.com)

Project ideas: Sentience and rights of digital minds

Lukas FinnvedenJan 4, 2024, 7:26 AM

33 points

1 comment20 min readEA link

(lukasfinnveden.substack.com)

A Taxonomy Of AI System Evaluations

Maxime Riché 🔸Aug 19, 2024, 9:08 AM

8 points

0 comments14 min readEA link

Thinking About Propensity Evaluations

Maxime Riché 🔸Aug 19, 2024, 9:24 AM

12 points

1 comment27 min readEA link

o3

Zach Stein-PerlmanDec 20, 2024, 9:00 PM

84 points

8 comments1 min readEA link

Evals projects I’d like to see, and a call to apply to OP’s evals RFP

cbMar 25, 2025, 11:50 AM

19 points

2 comments3 min readEA link

BenchMoral: A benchmarking to assess the moral sensitivity of large language models (LLMs) in Spanish.

Flor Betzabeth Ampa FloresApr 30, 2025, 9:26 PM

1 point

0 comments18 min readEA link

OpenAI’s o3 model scores 3% on the ARC-AGI-2 benchmark, compared to 60% for the average human

YarrowMay 1, 2025, 1:57 PM

14 points

7 comments3 min readEA link

(arcprize.org)

ARC-AGI-2 Overview With François Chollet

YarrowApr 10, 2025, 6:54 PM

7 points

0 comments1 min readEA link

(youtu.be)

Performance comparison of Large Language Models (LLMs) in code generation and application of best practices in frontend web development

Diana V. Guaiña A.May 1, 2025, 2:57 PM

1 point

0 comments24 min readEA link

Why I am Still Skeptical about AGI by 2030

James FodorMay 2, 2025, 7:13 AM

53 points

2 comments6 min readEA link

AIs Are Expert-Level at Many Virology Skills

Center for AI SafetyMay 2, 2025, 4:07 PM

15 points

0 comments1 min readEA link

The current state of RSPs

Zach Stein-PerlmanNov 4, 2024, 4:00 PM

19 points

1 comment1 min readEA link

AI Audit in Costa Rica

Priscilla CamposJan 27, 2025, 2:57 AM

10 points

4 comments9 min readEA link

[Question] Whose track record of AI predictions would you like to see evaluated?

Jonny Spicer 🔸Jan 29, 2025, 11:57 AM

10 points

13 comments1 min readEA link

The Elicitation Game: Evaluating capability elicitation techniques

Teun van der WeijFeb 27, 2025, 8:33 PM

3 points

0 comments1 min readEA link

College technical AI safety hackathon retrospective—Georgia Tech

yixiongNov 14, 2024, 1:34 PM

18 points

0 comments5 min readEA link

(yixiong.substack.com)

Comparing AI Labs and Pharmaceutical Companies

mxschonsNov 13, 2024, 2:51 PM

13 points

0 comments1 min readEA link

(mxschons.com)

AGI Risk: How to internationally regulate industries in non-democracies

Timothy_LiptrotMay 16, 2022, 10:45 PM

9 points

2 comments9 min readEA link

OpenAI’s o1 tried to avoid being shut down, and lied about it, in evals

Greg_Colbourn ⏸️ Dec 6, 2024, 3:25 PM

23 points

9 comments1 min readEA link

(www.transformernews.ai)

Avoiding AI Races Through Self-Regulation

Gordon Seidoh WorleyMar 12, 2018, 8:52 PM

4 points

4 comments1 min readEA link

[Job]: AI Standards Development Research Assistant

Tony BarrettOct 14, 2022, 8:18 PM

13 points

0 comments2 min readEA link

What is the EU AI Act and why should you care about it?

MathiasKB🔸Sep 10, 2021, 7:47 AM

117 points

10 comments7 min readEA link

Antitrust-Compliant AI Industry Self-Regulation

Cullen 🔸Jul 7, 2020, 8:52 PM

26 points

1 comment1 min readEA link

(cullenokeefe.com)

Slightly against aligning with neo-luddites

Matthew_BarnettDec 26, 2022, 11:27 PM

77 points

17 comments4 min readEA link

Ways EU law might matter for farmed animals

Neil_Dullaghan🔹 Aug 17, 2020, 1:16 AM

54 points

0 comments15 min readEA link

The AIA and its Brussels Effect

Kathryn O'RourkeDec 27, 2022, 4:01 PM

16 points

0 comments5 min readEA link

Main paths to impact in EU AI Policy

JOMG_MonnetDec 8, 2022, 4:17 PM

69 points

2 comments8 min readEA link

A California Effect for Artificial Intelligence

henryjSep 9, 2022, 2:17 PM

73 points

1 comment4 min readEA link

(docs.google.com)

Who owns AI-generated content?

Johan S DanielDec 7, 2022, 3:03 AM

−2 points

0 comments2 min readEA link

“AGI timelines: ignore the social factor at their peril” (Future Fund AI Worldview Prize submission)

ketanramaNov 5, 2022, 5:45 PM

10 points

0 comments12 min readEA link

(trevorklee.substack.com)

LW4EA: Six economics misconceptions of mine which I’ve resolved over the last few years

JeremyAug 30, 2022, 3:20 PM

8 points

0 comments1 min readEA link

(www.lesswrong.com)

Meta: Frontier AI Framework

Zach Stein-PerlmanFeb 3, 2025, 10:00 PM

23 points

0 comments1 min readEA link

(ai.meta.com)

[Question] Should AI writers be prohibited in education?

Eleni_AJan 16, 2023, 10:29 PM

3 points

2 comments1 min readEA link

Scaling and Sustaining Standards: A Case Study on the Basel Accords

C.K.Jul 16, 2023, 6:18 PM

18 points

0 comments7 min readEA link

(docs.google.com)

US Congress introduces CREATE AI Act for establishing National AI Research Resource

Daniel_EthJul 28, 2023, 11:27 PM

9 points

1 comment1 min readEA link

(eshoo.house.gov)

Compliance Monitoring as an Impactful Mechanism of AI Safety Policy

CAISIDFeb 7, 2024, 4:10 PM

6 points

3 comments9 min readEA link

Asterisk Magazine Issue 03: AI

alejandroJul 24, 2023, 3:53 PM

34 points

3 comments1 min readEA link

(asteriskmag.com)

What would a compute monitoring plan look like? [Linkpost]

AkashMar 26, 2023, 7:33 PM

61 points

1 comment1 min readEA link

Scalable And Transferable Black-Box Jailbreaks For Language Models Via Persona Modulation

soroushjpNov 7, 2023, 6:00 PM

10 points

0 comments2 min readEA link

(arxiv.org)

Reframing the burden of proof: Companies should prove that models are safe (rather than expecting auditors to prove that models are dangerous)

AkashApr 25, 2023, 6:49 PM

35 points

1 comment1 min readEA link

Archetypal Transfer Learning: a Proposed Alignment Solution that solves the Inner x Outer Alignment Problem while adding Corrigible Traits to GPT-2-medium

MiguelApr 26, 2023, 12:40 AM

13 points

0 comments10 min readEA link

OpenAI’s new Preparedness team is hiring

leopoldOct 26, 2023, 8:41 PM

85 points

13 comments1 min readEA link

Open call: AI Act Standard for Dev. Phase Risk Assessment

miller-maxDec 8, 2023, 7:57 PM

5 points

1 comment1 min readEA link

Podcast (+transcript): Nathan Barnard on how US financial regulation can inform AI governance

Aaron BergmanAug 8, 2023, 9:46 PM

12 points

0 comments23 min readEA link

(www.aaronbergman.net)

Report: Artificial Intelligence Risk Management in Spain

JorgeTorresCJun 15, 2023, 4:08 PM

22 points

0 comments3 min readEA link

(riesgoscatastroficosglobales.com)

Safety evaluations and standards for AI | Beth Barnes | EAG Bay Area 23

Beth BarnesJun 16, 2023, 2:15 PM

28 points

0 comments17 min readEA link

[Question] How independent is the research coming out of OpenAI’s preparedness team?

EarthlingFeb 10, 2024, 4:59 PM

18 points

0 comments1 min readEA link

[Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations

Teun van der WeijJun 13, 2024, 10:04 AM

24 points

2 comments1 min readEA link

(arxiv.org)

Introducing METR’s Autonomy Evaluation Resources

Megan KinnimentMar 15, 2024, 11:19 PM

28 points

0 comments1 min readEA link

(metr.github.io)

Model evals for dangerous capabilities

Zach Stein-PerlmanSep 23, 2024, 11:00 AM

19 points

0 comments1 min readEA link

METR is hiring!

ElizabethBarnesDec 26, 2023, 9:03 PM

50 points

0 comments1 min readEA link

(www.lesswrong.com)

Bounty: Diverse hard tasks for LLM agents

ElizabethBarnesDec 20, 2023, 4:31 PM

17 points

0 comments1 min readEA link

OMMC Announces RIP

Adam_SchollApr 1, 2024, 11:38 PM

7 points

0 comments2 min readEA link

LLM Evaluators Recognize and Favor Their Own Generations

Arjun PanicksseryApr 17, 2024, 9:09 PM

21 points

4 comments1 min readEA link

(tiny.cc)

Demonstrate and evaluate risks from AI to society at the AI x Democracy research hackathon

Esben KranApr 19, 2024, 2:46 PM

24 points

0 comments6 min readEA link

(www.apartresearch.com)

Join the $10K AutoHack 2024 Tournament

Paul BricmanSep 25, 2024, 11:56 AM

17 points

0 comments1 min readEA link

(noemaresearch.com)

I read every major AI lab’s safety plan so you don’t have to

sarahhwDec 16, 2024, 2:12 PM

67 points

2 comments11 min readEA link

(longerramblings.substack.com)

Submit Your Toughest Questions for Humanity’s Last Exam

Matrice JacobineSep 18, 2024, 8:03 AM

6 points

0 comments2 min readEA link

(www.safe.ai)

The “low-hanging fruits” of AI safety

Julian NalenzDec 19, 2024, 1:38 PM

−1 points

0 comments6 min readEA link

(blog.hermesloom.org)

Improving capability evaluations for AI governance: Open Philanthropy’s new request for proposals

cbFeb 7, 2025, 9:30 AM

37 points

3 comments3 min readEA link

OpenAI’s CBRN tests seem unclear

Luca Righetti 🔸Nov 21, 2024, 5:26 PM

82 points

3 comments7 min readEA link

Martin (Huge) Vlach Nov 28, 2023, 1:59 PM
1 point
0 ∶ 0
This comes pretty high on some google queries.
I suggest some tidying: Lesswrong (2023) -- the year seems off, does not match the elements under the link.
A space is missing before the last sentence of the first paragraph.

AI eval­u­a­tions and standards

Further reading

Related entries

AI evaluations and standards