
AI evaluations and standards

Tag

Last edit: 25 Apr 2023 17:39 UTC by Dane Valerie

AI evaluations and standards (or “evals”) are processes that check or audit AI models. Evaluations can focus on how powerful models are (“capability evaluations”) and on whether models are exhibiting dangerous behaviors or are misaligned (“alignment evaluations” or “safety evaluations”). Working on AI evaluations might involve developing standards and enforcing compliance with the standards. Evaluations can help labs determine whether it’s safe to deploy new models, and can help with AI governance and regulation.

Further reading

LessWrong (2023) AI Evaluation posts

Karnofsky, Holden (2022) Racing through a minefield: the AI deployment problem, Cold Takes, December 22.

Karnofsky, Holden (2022) AI Safety Seems Hard to Measure, Cold Takes, December 8.

Alignment Research Center (2023) Evals: A project of the non-profit Alignment Research Center focused on evaluating the capabilities and alignment of advanced ML models

Barnes, Beth (2023) Safety evaluations and standards for AI, EAG Bay Area, March 20.

Related entries

AI Safety | AI Governance | AI forecasting | Compute Governance | Slowing down AI | AI race

DeepMind: Model evaluation for extreme risks

Zach Stein-Perlman 25 May 2023 3:00 UTC
48 points
2 comments
1 min read

How technical safety standards could promote TAI safety

Cullen 🔸 8 Aug 2022 16:57 UTC
128 points
15 comments
7 min read

Case studies on social-welfare-based standards in various industries

Holden Karnofsky 20 Jun 2024 13:33 UTC
73 points
2 comments
1 min read

Trendlines in AIxBio evals

ljusten 31 Oct 2024 0:09 UTC
39 points
2 comments
11 min read
(www.lennijusten.com)

Racing through a minefield: the AI deployment problem

Holden Karnofsky 31 Dec 2022 21:44 UTC
79 points
1 comment
13 min read
(www.cold-takes.com)

AI Safety Seems Hard to Measure

Holden Karnofsky 11 Dec 2022 1:31 UTC
90 points
4 comments
14 min read
(www.cold-takes.com)

Announcing ForecastBench, a new benchmark for AI and human forecasting abilities

Forecasting Research Institute 1 Oct 2024 12:31 UTC
20 points
1 comment
3 min read
(arxiv.org)

AI Risk Management Framework | NIST

𝕮𝖎𝖓𝖊𝖗𝖆 26 Jan 2023 15:27 UTC
50 points
0 comments
1 min read

AI Governance Needs Technical Work

Mau 5 Sep 2022 22:25 UTC
118 points
3 comments
8 min read

[Cause Exploration Prizes] Creating a “regulatory turbocharger” for EA relevant policies

Open Philanthropy 11 Aug 2022 10:42 UTC
5 points
1 comment
11 min read

The case for more ambitious language model evals

Jozdien 30 Jan 2024 9:24 UTC
7 points
0 comments
5 min read

12 tentative ideas for US AI policy (Luke Muehlhauser)

Lizka 19 Apr 2023 21:05 UTC
117 points
12 comments
4 min read
(www.openphilanthropy.org)

Actionable-guidance and roadmap recommendations for the NIST AI Risk Management Framework

Tony Barrett 17 May 2022 15:27 UTC
11 points
0 comments
3 min read

Will the EU regulations on AI matter to the rest of the world?

hanadulset 1 Jan 2022 21:56 UTC
33 points
5 comments
5 min read

The ‘Old AI’: Lessons for AI governance from early electricity regulation

Sam Clarke 19 Dec 2022 2:46 UTC
58 points
1 comment
13 min read

Nobody’s on the ball on AGI alignment

leopold 29 Mar 2023 14:26 UTC
327 points
65 comments
9 min read
(www.forourposterity.com)

Join the AI Evaluation Tasks Bounty Hackathon

Esben Kran 18 Mar 2024 8:15 UTC
20 points
0 comments
4 min read

Success without dignity: a nearcasting story of avoiding catastrophe by luck

Holden Karnofsky 15 Mar 2023 20:17 UTC
113 points
3 comments
1 min read

How evals might (or might not) prevent catastrophic risks from AI

Akash 7 Feb 2023 20:16 UTC
28 points
0 comments
1 min read

How major governments can help with the most important century

Holden Karnofsky 24 Feb 2023 19:37 UTC
56 points
4 comments
4 min read
(www.cold-takes.com)

Project ideas: Sentience and rights of digital minds

Lukas Finnveden 4 Jan 2024 7:26 UTC
32 points
1 comment
20 min read
(lukasfinnveden.substack.com)

A Taxonomy Of AI System Evaluations

Maxime_Riche 19 Aug 2024 9:08 UTC
8 points
0 comments
14 min read

Thinking About Propensity Evaluations

Maxime_Riche 19 Aug 2024 9:24 UTC
12 points
1 comment
27 min read

What AI companies can do today to help with the most important century

Holden Karnofsky 20 Feb 2023 17:40 UTC
104 points
8 comments
11 min read
(www.cold-takes.com)

Supplement to “The Brussels Effect and AI: How EU AI regulation will impact the global AI market”

MarkusAnderljung 16 Aug 2022 20:55 UTC
109 points
7 comments
8 min read

A Map to Navigate AI Governance

hanadulset 14 Feb 2022 22:41 UTC
72 points
11 comments
25 min read

Why Would AI “Aim” To Defeat Humanity?

Holden Karnofsky 29 Nov 2022 18:59 UTC
24 points
0 comments
32 min read
(www.cold-takes.com)

AI policy ideas: Reading list

Zach Stein-Perlman 17 Apr 2023 19:00 UTC
60 points
3 comments
1 min read

[Question] Open-source AI safety projects?

defun 🔸 29 Jan 2024 10:09 UTC
8 points
2 comments
1 min read

FLI report: Policymaking in the Pause

Zach Stein-Perlman 15 Apr 2023 17:01 UTC
29 points
4 comments
1 min read

High-level hopes for AI alignment

Holden Karnofsky 20 Dec 2022 2:11 UTC
123 points
14 comments
19 min read
(www.cold-takes.com)

An ‘AGI Emergency Eject Criteria’ consensus could be really useful.

tcelferact 7 Apr 2023 16:21 UTC
27 points
3 comments
1 min read

Proposals for the AI Regulatory Sandbox in Spain

Guillem Bas 27 Apr 2023 10:33 UTC
55 points
2 comments
11 min read
(riesgoscatastroficosglobales.com)

NIST AI Risk Management Framework request for information (RFI)

Aryeh Englander 31 Aug 2021 22:24 UTC
7 points
0 comments
2 min read

GovAI: Towards best practices in AGI safety and governance: A survey of expert opinion

Zach Stein-Perlman 15 May 2023 1:42 UTC
68 points
3 comments
1 min read

Begging, Pleading AI Orgs to Comment on NIST AI Risk Management Framework

Bridges 15 Apr 2022 19:35 UTC
87 points
3 comments
2 min read

Seeking (Paid) Case Studies on Standards

Holden Karnofsky 26 May 2023 17:58 UTC
99 points
14 comments
1 min read

AI Safety Newsletter #8: Rogue AIs, how to screen for AI risks, and grants for research on democratic governance of AI

Center for AI Safety 30 May 2023 11:44 UTC
16 points
3 comments
6 min read
(newsletter.safe.ai)

The EU AI Act needs a definition of high-risk foundation models to avoid regulatory overreach and backlash

matthias_samwald 31 May 2023 15:34 UTC
17 points
0 comments
4 min read

Announcing Apollo Research

mariushobbhahn 30 May 2023 16:17 UTC
156 points
5 comments
1 min read

Biden-Harris Administration Announces First-Ever Consortium Dedicated to AI Safety

ben.smith 9 Feb 2024 6:40 UTC
15 points
1 comment
1 min read
(www.nist.gov)

An overview of standards in biosafety and biosecurity

rosehadshar 26 Jul 2023 12:19 UTC
77 points
7 comments
11 min read

College technical AI safety hackathon retrospective - Georgia Tech

yixiong 14 Nov 2024 13:34 UTC
18 points
0 comments
5 min read
(yixiong.substack.com)

Comparing AI Labs and Pharmaceutical Companies

mxschons 13 Nov 2024 14:51 UTC
13 points
0 comments
1 min read
(mxschons.com)

AGI Risk: How to internationally regulate industries in non-democracies

Timothy_Liptrot 16 May 2022 22:45 UTC
9 points
2 comments
9 min read

OpenAI’s o1 tried to avoid being shut down, and lied about it, in evals

Greg_Colbourn 6 Dec 2024 15:25 UTC
23 points
9 comments
1 min read
(www.transformernews.ai)

Avoiding AI Races Through Self-Regulation

Gordon Seidoh Worley 12 Mar 2018 20:52 UTC
4 points
4 comments
1 min read

[Job]: AI Standards Development Research Assistant

Tony Barrett 14 Oct 2022 20:18 UTC
13 points
0 comments
2 min read

What is the EU AI Act and why should you care about it?

MathiasKB🔸 10 Sep 2021 7:47 UTC
116 points
10 comments
7 min read

Antitrust-Compliant AI Industry Self-Regulation

Cullen 🔸 7 Jul 2020 20:52 UTC
26 points
1 comment
1 min read
(cullenokeefe.com)

Slightly against aligning with neo-luddites

Matthew_Barnett 26 Dec 2022 23:27 UTC
77 points
17 comments
4 min read

Ways EU law might matter for farmed animals

Neil_Dullaghan🔹 17 Aug 2020 1:16 UTC
54 points
0 comments
15 min read

The AIA and its Brussels Effect

Kathryn O'Rourke 27 Dec 2022 16:01 UTC
16 points
0 comments
5 min read

Main paths to impact in EU AI Policy

JOMG_Monnet 8 Dec 2022 16:17 UTC
69 points
2 comments
8 min read

A California Effect for Artificial Intelligence

henryj 9 Sep 2022 14:17 UTC
73 points
1 comment
4 min read
(docs.google.com)

Who owns AI-generated content?

Johan S Daniel 7 Dec 2022 3:03 UTC
−2 points
0 comments
2 min read

“AGI timelines: ignore the social factor at their peril” (Future Fund AI Worldview Prize submission)

ketanrama 5 Nov 2022 17:45 UTC
10 points
0 comments
12 min read
(trevorklee.substack.com)

LW4EA: Six economics misconceptions of mine which I’ve resolved over the last few years

Jeremy 30 Aug 2022 15:20 UTC
8 points
0 comments
1 min read
(www.lesswrong.com)

[Question] Should AI writers be prohibited in education?

Eleni_A 16 Jan 2023 22:29 UTC
3 points
2 comments
1 min read

Scaling and Sustaining Standards: A Case Study on the Basel Accords

Conrad K. 16 Jul 2023 18:18 UTC
18 points
0 comments
7 min read
(docs.google.com)

US Congress introduces CREATE AI Act for establishing National AI Research Resource

Daniel_Eth 28 Jul 2023 23:27 UTC
9 points
1 comment
1 min read
(eshoo.house.gov)

Compliance Monitoring as an Impactful Mechanism of AI Safety Policy

CAISID 7 Feb 2024 16:10 UTC
4 points
3 comments
9 min read

Asterisk Magazine Issue 03: AI

Alejandro Ortega 24 Jul 2023 15:53 UTC
34 points
3 comments
1 min read
(asteriskmag.com)

What would a compute monitoring plan look like? [Linkpost]

Akash 26 Mar 2023 19:33 UTC
61 points
1 comment
1 min read

Scalable And Transferable Black-Box Jailbreaks For Language Models Via Persona Modulation

soroushjp 7 Nov 2023 18:00 UTC
10 points
0 comments
2 min read
(arxiv.org)

Reframing the burden of proof: Companies should prove that models are safe (rather than expecting auditors to prove that models are dangerous)

Akash 25 Apr 2023 18:49 UTC
35 points
1 comment
1 min read

Archetypal Transfer Learning: a Proposed Alignment Solution that solves the Inner x Outer Alignment Problem while adding Corrigible Traits to GPT-2-medium

Miguel 26 Apr 2023 0:40 UTC
13 points
0 comments
10 min read

OpenAI’s new Preparedness team is hiring

leopold 26 Oct 2023 20:41 UTC
85 points
13 comments
1 min read

Open call: AI Act Standard for Dev. Phase Risk Assessment

miller-max 8 Dec 2023 19:57 UTC
5 points
1 comment
1 min read

Podcast (+transcript): Nathan Barnard on how US financial regulation can inform AI governance

Aaron Bergman 8 Aug 2023 21:46 UTC
12 points
0 comments
23 min read
(www.aaronbergman.net)

Report: Artificial Intelligence Risk Management in Spain

JorgeTorresC 15 Jun 2023 16:08 UTC
22 points
0 comments
3 min read
(riesgoscatastroficosglobales.com)

Safety evaluations and standards for AI | Beth Barnes | EAG Bay Area 23

Beth Barnes 16 Jun 2023 14:15 UTC
28 points
0 comments
17 min read

[Question] How independent is the research coming out of OpenAI’s preparedness team?

Earthling 10 Feb 2024 16:59 UTC
18 points
0 comments
1 min read

[Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations

Teun van der Weij 13 Jun 2024 10:04 UTC
22 points
2 comments
1 min read
(arxiv.org)

Introducing METR’s Autonomy Evaluation Resources

Megan Kinniment 15 Mar 2024 23:19 UTC
28 points
0 comments
1 min read
(metr.github.io)

Model evals for dangerous capabilities

Zach Stein-Perlman 23 Sep 2024 11:00 UTC
19 points
0 comments
1 min read

METR is hiring!

ElizabethBarnes 26 Dec 2023 21:03 UTC
50 points
0 comments
1 min read
(www.lesswrong.com)

Bounty: Diverse hard tasks for LLM agents

ElizabethBarnes 20 Dec 2023 16:31 UTC
17 points
0 comments
1 min read

OMMC Announces RIP

Adam_Scholl 1 Apr 2024 23:38 UTC
7 points
0 comments
2 min read

LLM Evaluators Recognize and Favor Their Own Generations

Arjun Panickssery 17 Apr 2024 21:09 UTC
21 points
4 comments
1 min read
(tiny.cc)

Demonstrate and evaluate risks from AI to society at the AI x Democracy research hackathon

Esben Kran 19 Apr 2024 14:46 UTC
24 points
0 comments
6 min read
(www.apartresearch.com)

Join the $10K AutoHack 2024 Tournament

Paul Bricman 25 Sep 2024 11:56 UTC
17 points
0 comments
1 min read
(noemaresearch.com)

I read every major AI lab’s safety plan so you don’t have to

sarahhw 16 Dec 2024 14:12 UTC
33 points
1 comment
11 min read
(longerramblings.substack.com)

Submit Your Toughest Questions for Humanity’s Last Exam

Matrice Jacobine 18 Sep 2024 8:03 UTC
6 points
0 comments
2 min read
(www.safe.ai)

OpenAI’s CBRN tests seem unclear

Luca Righetti 🔸 21 Nov 2024 17:26 UTC
82 points
3 comments
7 min read

The current state of RSPs

Zach Stein-Perlman 4 Nov 2024 16:00 UTC
19 points
1 comment
1 min read