AI eval­u­a­tions and standards

AI evaluations and standards (or “evals”) are processes that check or audit AI models. Evaluations can focus on how powerful models are (“capability evaluations”) and on whether models are exhibiting dangerous behaviors or are misaligned (“alignment evaluations” or “safety evaluations”). Working on AI evaluations might involve developing standards and enforcing compliance with them. Evaluations can help labs determine whether it’s safe to deploy new models, and can inform AI governance and regulation.

Further reading

LessWrong (2023) AI Evaluation posts

Karnofsky, Holden (2022) Racing through a minefield: the AI deployment problem, Cold Takes, December 22.

Karnofsky, Holden (2022) AI Safety Seems Hard to Measure, Cold Takes, December 8.

Alignment Research Center (2023) Evals, a project of the nonprofit Alignment Research Center focused on evaluating the capabilities and alignment of advanced ML models.

Barnes, Beth (2023) Safety evaluations and standards for AI, EAG Bay Area, March 20.

Related entries

AI Safety | AI Governance | AI forecasting | Compute Governance | Slowing down AI | AI race

DeepMind: Model evaluation for extreme risks

Zach Stein-Perlman · May 25, 2023, 3:00 AM
49 points
3 comments · 1 min read · EA link

Trendlines in AIxBio evals

ljusten · Oct 31, 2024, 12:09 AM
39 points
2 comments · 11 min read · EA link
(www.lennijusten.com)

Racing through a minefield: the AI deployment problem

Holden Karnofsky · Dec 31, 2022, 9:44 PM
79 points
1 comment · 13 min read · EA link
(www.cold-takes.com)

How technical safety standards could promote TAI safety

Cullen 🔸 · Aug 8, 2022, 4:57 PM
128 points
15 comments · 7 min read · EA link

AI Safety Seems Hard to Measure

Holden Karnofsky · Dec 11, 2022, 1:31 AM
90 points
4 comments · 14 min read · EA link
(www.cold-takes.com)

Case studies on social-welfare-based standards in various industries

Holden Karnofsky · Jun 20, 2024, 1:33 PM
73 points
2 comments · 1 min read · EA link

[Cause Exploration Prizes] Creating a “regulatory turbocharger” for EA relevant policies

Open Philanthropy · Aug 11, 2022, 10:42 AM
5 points
1 comment · 11 min read · EA link

AI Governance Needs Technical Work

Mau · Sep 5, 2022, 10:25 PM
121 points
3 comments · 8 min read · EA link

The case for more ambitious language model evals

Jozdien · Jan 30, 2024, 9:24 AM
7 points
0 comments · 5 min read · EA link

AI Risk Management Framework | NIST

𝕮𝖎𝖓𝖊𝖗𝖆 · Jan 26, 2023, 3:27 PM
50 points
0 comments · 1 min read · EA link

Announcing ForecastBench, a new benchmark for AI and human forecasting abilities

Forecasting Research Institute · Oct 1, 2024, 12:31 PM
20 points
1 comment · 3 min read · EA link
(arxiv.org)

12 tentative ideas for US AI policy (Luke Muehlhauser)

Lizka · Apr 19, 2023, 9:05 PM
117 points
12 comments · 4 min read · EA link
(www.openphilanthropy.org)

Join the AI Evaluation Tasks Bounty Hackathon

Esben Kran · Mar 18, 2024, 8:15 AM
20 points
0 comments · 4 min read · EA link

Success without dignity: a nearcasting story of avoiding catastrophe by luck

Holden Karnofsky · Mar 15, 2023, 8:17 PM
113 points
3 comments · 1 min read · EA link

How evals might (or might not) prevent catastrophic risks from AI

Akash · Feb 7, 2023, 8:16 PM
28 points
0 comments · 1 min read · EA link

How major governments can help with the most important century

Holden Karnofsky · Feb 24, 2023, 7:37 PM
56 points
4 comments · 4 min read · EA link
(www.cold-takes.com)

Project ideas: Sentience and rights of digital minds

Lukas Finnveden · Jan 4, 2024, 7:26 AM
33 points
1 comment · 20 min read · EA link
(lukasfinnveden.substack.com)

A Taxonomy Of AI System Evaluations

Maxime Riché 🔸 · Aug 19, 2024, 9:08 AM
8 points
0 comments · 14 min read · EA link

Thinking About Propensity Evaluations

Maxime Riché 🔸 · Aug 19, 2024, 9:24 AM
12 points
1 comment · 27 min read · EA link

What AI companies can do today to help with the most important century

Holden Karnofsky · Feb 20, 2023, 5:40 PM
104 points
8 comments · 11 min read · EA link
(www.cold-takes.com)

Supplement to “The Brussels Effect and AI: How EU AI regulation will impact the global AI market”

MarkusAnderljung · Aug 16, 2022, 8:55 PM
109 points
7 comments · 8 min read · EA link

Why Would AI “Aim” To Defeat Humanity?

Holden Karnofsky · Nov 29, 2022, 6:59 PM
24 points
0 comments · 32 min read · EA link
(www.cold-takes.com)

AI policy ideas: Reading list

Zach Stein-Perlman · Apr 17, 2023, 7:00 PM
60 points
3 comments · 1 min read · EA link

[Question] Open-source AI safety projects?

defun 🔸 · Jan 29, 2024, 10:09 AM
8 points
2 comments · 1 min read · EA link

FLI report: Policymaking in the Pause

Zach Stein-Perlman · Apr 15, 2023, 5:01 PM
29 points
4 comments · 1 min read · EA link

High-level hopes for AI alignment

Holden Karnofsky · Dec 20, 2022, 2:11 AM
123 points
14 comments · 19 min read · EA link
(www.cold-takes.com)

Rolling Thresholds for AGI Scaling Regulation

Larks · Jan 12, 2025, 1:30 AM
60 points
4 comments · 6 min read · EA link

Benchmark Performance is a Poor Measure of Generalisable AI Reasoning Capabilities

James Fodor · Feb 21, 2025, 4:25 AM
12 points
3 comments · 24 min read · EA link

An ‘AGI Emergency Eject Criteria’ consensus could be really useful.

tcelferact · Apr 7, 2023, 4:21 PM
27 points
3 comments · 1 min read · EA link

NIST AI Risk Management Framework request for information (RFI)

Aryeh Englander · Aug 31, 2021, 10:24 PM
7 points
0 comments · 2 min read · EA link

Proposals for the AI Regulatory Sandbox in Spain

Guillem Bas · Apr 27, 2023, 10:33 AM
55 points
2 comments · 11 min read · EA link
(riesgoscatastroficosglobales.com)

Begging, Pleading AI Orgs to Comment on NIST AI Risk Management Framework

Bridges · Apr 15, 2022, 7:35 PM
87 points
3 comments · 2 min read · EA link

GovAI: Towards best practices in AGI safety and governance: A survey of expert opinion

Zach Stein-Perlman · May 15, 2023, 1:42 AM
68 points
3 comments · 1 min read · EA link

Actionable-guidance and roadmap recommendations for the NIST AI Risk Management Framework

Tony Barrett · May 17, 2022, 3:27 PM
11 points
0 comments · 3 min read · EA link

Seeking (Paid) Case Studies on Standards

Holden Karnofsky · May 26, 2023, 5:58 PM
99 points
14 comments · 1 min read · EA link

AI Safety Newsletter #8: Rogue AIs, how to screen for AI risks, and grants for research on democratic governance of AI

Center for AI Safety · May 30, 2023, 11:44 AM
16 points
3 comments · 6 min read · EA link
(newsletter.safe.ai)

The EU AI Act needs a definition of high-risk foundation models to avoid regulatory overreach and backlash

matthias_samwald · May 31, 2023, 3:34 PM
17 points
0 comments · 4 min read · EA link

Announcing Apollo Research

mariushobbhahn · May 30, 2023, 4:17 PM
158 points
5 comments · 1 min read · EA link

Biden-Harris Administration Announces First-Ever Consortium Dedicated to AI Safety

ben.smith · Feb 9, 2024, 6:40 AM
15 points
1 comment · 1 min read · EA link
(www.nist.gov)

An overview of standards in biosafety and biosecurity

rosehadshar · Jul 26, 2023, 12:19 PM
77 points
7 comments · 11 min read · EA link

Will the EU regulations on AI matter to the rest of the world?

hanadulset · Jan 1, 2022, 9:56 PM
33 points
5 comments · 5 min read · EA link

The ‘Old AI’: Lessons for AI governance from early electricity regulation

Sam Clarke · Dec 19, 2022, 2:46 AM
58 points
1 comment · 13 min read · EA link

A Map to Navigate AI Governance

hanadulset · Feb 14, 2022, 10:41 PM
72 points
11 comments · 25 min read · EA link

Nobody’s on the ball on AGI alignment

leopold · Mar 29, 2023, 2:26 PM
327 points
65 comments · 9 min read · EA link
(www.forourposterity.com)

o3

Zach Stein-Perlman · Dec 20, 2024, 9:00 PM
84 points
5 comments · 1 min read · EA link

The current state of RSPs

Zach Stein-Perlman · Nov 4, 2024, 4:00 PM
19 points
1 comment · 1 min read · EA link

AI Audit in Costa Rica

Priscilla Campos · Jan 27, 2025, 2:57 AM
10 points
4 comments · 9 min read · EA link

[Question] Whose track record of AI predictions would you like to see evaluated?

Jonny Spicer 🔸 · Jan 29, 2025, 11:57 AM
10 points
13 comments · 1 min read · EA link

The Elicitation Game: Evaluating capability elicitation techniques

Teun van der Weij · Feb 27, 2025, 8:33 PM
3 points
0 comments · 1 min read · EA link

College technical AI safety hackathon retrospective—Georgia Tech

yixiong · Nov 14, 2024, 1:34 PM
18 points
0 comments · 5 min read · EA link
(yixiong.substack.com)

Comparing AI Labs and Pharmaceutical Companies

mxschons · Nov 13, 2024, 2:51 PM
13 points
0 comments · 1 min read · EA link
(mxschons.com)

AGI Risk: How to internationally regulate industries in non-democracies

Timothy_Liptrot · May 16, 2022, 10:45 PM
9 points
2 comments · 9 min read · EA link

OpenAI’s o1 tried to avoid being shut down, and lied about it, in evals

Greg_Colbourn ⏸️ · Dec 6, 2024, 3:25 PM
23 points
9 comments · 1 min read · EA link
(www.transformernews.ai)

Avoiding AI Races Through Self-Regulation

Gordon Seidoh Worley · Mar 12, 2018, 8:52 PM
4 points
4 comments · 1 min read · EA link

[Job]: AI Standards Development Research Assistant

Tony Barrett · Oct 14, 2022, 8:18 PM
13 points
0 comments · 2 min read · EA link

What is the EU AI Act and why should you care about it?

MathiasKB🔸 · Sep 10, 2021, 7:47 AM
116 points
10 comments · 7 min read · EA link

Antitrust-Compliant AI Industry Self-Regulation

Cullen 🔸 · Jul 7, 2020, 8:52 PM
26 points
1 comment · 1 min read · EA link
(cullenokeefe.com)

Slightly against aligning with neo-luddites

Matthew_Barnett · Dec 26, 2022, 11:27 PM
77 points
17 comments · 4 min read · EA link

Ways EU law might matter for farmed animals

Neil_Dullaghan🔹 · Aug 17, 2020, 1:16 AM
54 points
0 comments · 15 min read · EA link

The AIA and its Brussels Effect

Kathryn O'Rourke · Dec 27, 2022, 4:01 PM
16 points
0 comments · 5 min read · EA link

Main paths to impact in EU AI Policy

JOMG_Monnet · Dec 8, 2022, 4:17 PM
69 points
2 comments · 8 min read · EA link

A California Effect for Artificial Intelligence

henryj · Sep 9, 2022, 2:17 PM
73 points
1 comment · 4 min read · EA link
(docs.google.com)

Who owns AI-generated content?

Johan S Daniel · Dec 7, 2022, 3:03 AM
−2 points
0 comments · 2 min read · EA link

“AGI timelines: ignore the social factor at their peril” (Future Fund AI Worldview Prize submission)

ketanrama · Nov 5, 2022, 5:45 PM
10 points
0 comments · 12 min read · EA link
(trevorklee.substack.com)

LW4EA: Six economics misconceptions of mine which I’ve resolved over the last few years

Jeremy · Aug 30, 2022, 3:20 PM
8 points
0 comments · 1 min read · EA link
(www.lesswrong.com)

Meta: Frontier AI Framework

Zach Stein-Perlman · Feb 3, 2025, 10:00 PM
23 points
0 comments · 1 min read · EA link
(ai.meta.com)

[Question] Should AI writers be prohibited in education?

Eleni_A · Jan 16, 2023, 10:29 PM
3 points
2 comments · 1 min read · EA link

Scaling and Sustaining Standards: A Case Study on the Basel Accords

C.K. · Jul 16, 2023, 6:18 PM
18 points
0 comments · 7 min read · EA link
(docs.google.com)

US Congress introduces CREATE AI Act for establishing National AI Research Resource

Daniel_Eth · Jul 28, 2023, 11:27 PM
9 points
1 comment · 1 min read · EA link
(eshoo.house.gov)

Compliance Monitoring as an Impactful Mechanism of AI Safety Policy

CAISID · Feb 7, 2024, 4:10 PM
6 points
3 comments · 9 min read · EA link

Asterisk Magazine Issue 03: AI

alejandro · Jul 24, 2023, 3:53 PM
34 points
3 comments · 1 min read · EA link
(asteriskmag.com)

What would a compute monitoring plan look like? [Linkpost]

Akash · Mar 26, 2023, 7:33 PM
61 points
1 comment · 1 min read · EA link

Scalable And Transferable Black-Box Jailbreaks For Language Models Via Persona Modulation

soroushjp · Nov 7, 2023, 6:00 PM
10 points
0 comments · 2 min read · EA link
(arxiv.org)

Reframing the burden of proof: Companies should prove that models are safe (rather than expecting auditors to prove that models are dangerous)

Akash · Apr 25, 2023, 6:49 PM
35 points
1 comment · 1 min read · EA link

Archetypal Transfer Learning: a Proposed Alignment Solution that solves the Inner x Outer Alignment Problem while adding Corrigible Traits to GPT-2-medium

Miguel · Apr 26, 2023, 12:40 AM
13 points
0 comments · 10 min read · EA link

OpenAI’s new Preparedness team is hiring

leopold · Oct 26, 2023, 8:41 PM
85 points
13 comments · 1 min read · EA link

Open call: AI Act Standard for Dev. Phase Risk Assessment

miller-max · Dec 8, 2023, 7:57 PM
5 points
1 comment · 1 min read · EA link

Podcast (+transcript): Nathan Barnard on how US financial regulation can inform AI governance

Aaron Bergman · Aug 8, 2023, 9:46 PM
12 points
0 comments · 23 min read · EA link
(www.aaronbergman.net)

Report: Artificial Intelligence Risk Management in Spain

JorgeTorresC · Jun 15, 2023, 4:08 PM
22 points
0 comments · 3 min read · EA link
(riesgoscatastroficosglobales.com)

Safety evaluations and standards for AI | Beth Barnes | EAG Bay Area 23

Beth Barnes · Jun 16, 2023, 2:15 PM
28 points
0 comments · 17 min read · EA link

[Question] How independent is the research coming out of OpenAI’s preparedness team?

Earthling · Feb 10, 2024, 4:59 PM
18 points
0 comments · 1 min read · EA link

[Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations

Teun van der Weij · Jun 13, 2024, 10:04 AM
22 points
2 comments · 1 min read · EA link
(arxiv.org)

Introducing METR’s Autonomy Evaluation Resources

Megan Kinniment · Mar 15, 2024, 11:19 PM
28 points
0 comments · 1 min read · EA link
(metr.github.io)

Model evals for dangerous capabilities

Zach Stein-Perlman · Sep 23, 2024, 11:00 AM
19 points
0 comments · 1 min read · EA link

METR is hiring!

ElizabethBarnes · Dec 26, 2023, 9:03 PM
50 points
0 comments · 1 min read · EA link
(www.lesswrong.com)

Bounty: Diverse hard tasks for LLM agents

ElizabethBarnes · Dec 20, 2023, 4:31 PM
17 points
0 comments · 1 min read · EA link

OMMC Announces RIP

Adam_Scholl · Apr 1, 2024, 11:38 PM
7 points
0 comments · 2 min read · EA link

LLM Evaluators Recognize and Favor Their Own Generations

Arjun Panickssery · Apr 17, 2024, 9:09 PM
21 points
4 comments · 1 min read · EA link
(tiny.cc)

Demonstrate and evaluate risks from AI to society at the AI x Democracy research hackathon

Esben Kran · Apr 19, 2024, 2:46 PM
24 points
0 comments · 6 min read · EA link
(www.apartresearch.com)

Join the $10K AutoHack 2024 Tournament

Paul Bricman · Sep 25, 2024, 11:56 AM
17 points
0 comments · 1 min read · EA link
(noemaresearch.com)

I read every major AI lab’s safety plan so you don’t have to

sarahhw · Dec 16, 2024, 2:12 PM
67 points
2 comments · 11 min read · EA link
(longerramblings.substack.com)

Submit Your Toughest Questions for Humanity’s Last Exam

Matrice Jacobine · Sep 18, 2024, 8:03 AM
6 points
0 comments · 2 min read · EA link
(www.safe.ai)

The “low-hanging fruits” of AI safety

Julian Nalenz · Dec 19, 2024, 1:38 PM
−1 points
0 comments · 6 min read · EA link
(blog.hermesloom.org)

Improving capability evaluations for AI governance: Open Philanthropy’s new request for proposals

cb · Feb 7, 2025, 9:30 AM
37 points
3 comments · 3 min read · EA link

OpenAI’s CBRN tests seem unclear

Luca Righetti 🔸 · Nov 21, 2024, 5:26 PM
82 points
3 comments · 7 min read · EA link