AI evaluations and standards

Last edit: 25 Apr 2023 17:39 UTC by Dane Valerie

AI evaluations and standards (or “evals”) are processes that check or audit AI models. Evaluations can focus on how powerful models are (“capability evaluations”) or on whether models exhibit dangerous behaviors or are misaligned (“alignment evaluations” or “safety evaluations”). Working on AI evaluations might involve developing standards and enforcing compliance with them. Evaluations can help labs determine whether it is safe to deploy new models, and can inform AI governance and regulation.
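
To make “capability evaluations” concrete, below is a minimal sketch of an evaluation harness in Python. Everything in it is a hypothetical illustration (the task set, the query_model stub, and the crude string-match scoring rule), not the framework of any particular lab or project listed on this page.

    from dataclasses import dataclass

    # A toy "capability evaluation": score a model's answers against
    # reference answers for a fixed task set. All names here are
    # hypothetical illustrations.

    @dataclass
    class EvalTask:
        prompt: str
        expected: str

    # Hypothetical task set: each task pairs a prompt with a reference answer.
    TASKS = [
        EvalTask(prompt="What is 17 * 24?", expected="408"),
        EvalTask(prompt="Name the capital of France.", expected="Paris"),
    ]

    def query_model(prompt: str) -> str:
        """Stand-in for a real model API call; always answers "408" here."""
        return "408"

    def run_capability_eval(tasks: list[EvalTask]) -> float:
        """Return the fraction of tasks whose reference answer appears
        in the model's output (a deliberately crude scoring rule)."""
        passed = sum(
            task.expected.lower() in query_model(task.prompt).lower()
            for task in tasks
        )
        return passed / len(tasks)

    if __name__ == "__main__":
        # With the stub above, the model passes the arithmetic task only,
        # so this prints a score of 50%.
        print(f"Capability eval score: {run_capability_eval(TASKS):.0%}")

Real evaluation suites differ mainly in scale and rigor: far larger and harder task sets, careful elicitation of latent capabilities, and scoring thresholds that feed into deployment decisions and standards compliance.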

Further reading

LessWrong (2023) AI evaluation posts.

Karnofsky, Holden (2022) Racing through the minefield, Cold Takes, December 22.

Karnofsky, Holden (2022) AI Safety Seems Hard to Measure, Cold Takes, December 8.

Alignment Research Center (2023) Evals: a project of the non-profit Alignment Research Center focused on evaluating the capabilities and alignment of advanced ML models.

Barnes, Beth (2023) Safety evaluations and standards for AI, EAG Bay Area, March 20.

Related entries

AI Safety | AI Governance | AI forecasting | Compute Governance | Slowing down AI | AI race

DeepMind: Model evaluation for extreme risks

Zach Stein-Perlman, 25 May 2023 3:00 UTC
49 points, 3 comments, 1 min read, EA link
(arxiv.org)

Racing through a minefield: the AI deployment problem

Holden Karnofsky, 31 Dec 2022 21:44 UTC
79 points, 1 comment, 13 min read, EA link
(www.cold-takes.com)

AI companies’ eval reports mostly don’t support their claims

Zach Stein-Perlman, 9 Jun 2025 13:00 UTC
51 points, 2 comments, 4 min read, EA link

Impact of Quantization on Small Language Models (SLMs) for Multilingual Mathematical Reasoning Tasks

Angie Paola Giraldo, 7 May 2025 21:48 UTC
11 points, 0 comments, 14 min read, EA link

Case studies on social-welfare-based standards in various industries

Holden Karnofsky, 20 Jun 2024 13:33 UTC
73 points, 2 comments, 1 min read, EA link

AI Safety Seems Hard to Measure

Holden Karnofsky, 11 Dec 2022 1:31 UTC
90 points, 4 comments, 14 min read, EA link
(www.cold-takes.com)

Trendlines in AIxBio evals

ljusten, 31 Oct 2024 0:09 UTC
40 points, 2 comments, 11 min read, EA link
(www.lennijusten.com)

[Cause Exploration Prizes] Creating a “regulatory turbocharger” for EA relevant policies

Coefficient Giving, 11 Aug 2022 10:42 UTC
5 points, 1 comment, 11 min read, EA link

12 tentative ideas for US AI policy (Luke Muehlhauser)

Lizka, 19 Apr 2023 21:05 UTC
117 points, 12 comments, 4 min read, EA link
(www.openphilanthropy.org)

Announcing ForecastBench, a new benchmark for AI and human forecasting abilities

Forecasting Research Institute, 1 Oct 2024 12:31 UTC
20 points, 1 comment, 3 min read, EA link
(arxiv.org)

AI Risk Management Framework | NIST

𝕮𝖎𝖓𝖊𝖗𝖆, 26 Jan 2023 15:27 UTC
50 points, 0 comments, 2 min read, EA link
(www.nist.gov)

AI Governance Needs Technical Work

Mau, 5 Sep 2022 22:25 UTC
121 points, 3 comments, 8 min read, EA link

The case for more ambitious language model evals

Jozdien, 30 Jan 2024 9:24 UTC
7 points, 0 comments, 5 min read, EA link

The ‘Old AI’: Lessons for AI governance from early electricity regulation

Sam Clarke, 19 Dec 2022 2:46 UTC
64 points, 1 comment, 13 min read, EA link

Seeking (Paid) Case Studies on Standards

Holden Karnofsky, 26 May 2023 17:58 UTC
99 points, 13 comments, 11 min read, EA link

An ‘AGI Emergency Eject Criteria’ consensus could be really useful.

tcelferact, 7 Apr 2023 16:21 UTC
27 points, 3 comments, 1 min read, EA link

Rolling Thresholds for AGI Scaling Regulation

Larks, 12 Jan 2025 1:30 UTC
60 points, 4 comments, 6 min read, EA link

Will the EU regulations on AI matter to the rest of the world?

hanadulset, 1 Jan 2022 21:56 UTC
33 points, 5 comments, 5 min read, EA link

FLI report: Policymaking in the Pause

Zach Stein-Perlman, 15 Apr 2023 17:01 UTC
29 points, 4 comments, 1 min read, EA link
(futureoflife.org)

High-level hopes for AI alignment

Holden Karnofsky, 20 Dec 2022 2:11 UTC
123 points, 14 comments, 19 min read, EA link
(www.cold-takes.com)

Why Would AI “Aim” To Defeat Humanity?

Holden Karnofsky, 29 Nov 2022 18:59 UTC
24 points, 0 comments, 32 min read, EA link
(www.cold-takes.com)

Supplement to “The Brussels Effect and AI: How EU AI regulation will impact the global AI market”

MarkusAnderljung, 16 Aug 2022 20:55 UTC
109 points, 7 comments, 8 min read, EA link

A Taxonomy Of AI System Evaluations

Maxime Riché 🔸, 19 Aug 2024 9:08 UTC
13 points, 0 comments, 14 min read, EA link

An overview of standards in biosafety and biosecurity

rosehadshar, 26 Jul 2023 12:19 UTC
77 points, 7 comments, 11 min read, EA link

A Map to Navigate AI Governance

hanadulset, 14 Feb 2022 22:41 UTC
74 points, 11 comments, 25 min read, EA link

Nobody’s on the ball on AGI alignment

leopold, 29 Mar 2023 14:26 UTC
328 points, 66 comments, 9 min read, EA link
(www.forourposterity.com)

Actionable-guidance and roadmap recommendations for the NIST AI Risk Management Framework

Tony Barrett, 17 May 2022 15:27 UTC
11 points, 0 comments, 3 min read, EA link

AI Safety Newsletter #8: Rogue AIs, how to screen for AI risks, and grants for research on democratic governance of AI

Center for AI Safety, 30 May 2023 11:44 UTC
16 points, 3 comments, 6 min read, EA link
(newsletter.safe.ai)

NIST AI Risk Management Framework request for information (RFI)

Aryeh Englander, 31 Aug 2021 22:24 UTC
7 points, 0 comments, 2 min read, EA link

Proposals for the AI Regulatory Sandbox in Spain

Guillem Bas, 27 Apr 2023 10:33 UTC
55 points, 2 comments, 11 min read, EA link
(riesgoscatastroficosglobales.com)

Frontier LLM Race/Sex Exchange Rates

Arjun Panickssery, 19 Oct 2025 18:36 UTC
25 points, 1 comment, 3 min read, EA link
(arctotherium.substack.com)

The EU AI Act needs a definition of high-risk foundation models to avoid regulatory overreach and backlash

matthias_samwald, 31 May 2023 15:34 UTC
17 points, 0 comments, 4 min read, EA link

Benchmark Performance is a Poor Measure of Generalisable AI Reasoning Capabilities

James Fodor, 21 Feb 2025 4:25 UTC
12 points, 3 comments, 24 min read, EA link

Thinking About Propensity Evaluations

Maxime Riché 🔸, 19 Aug 2024 9:24 UTC
17 points, 1 comment, 27 min read, EA link

Recent progress on the science of evaluations

PabloAMC 🔸, 23 Jun 2025 9:49 UTC
12 points, 0 comments, 8 min read, EA link
(www.lesswrong.com)

Begging, Pleading AI Orgs to Comment on NIST AI Risk Management Framework

Bridges, 15 Apr 2022 19:35 UTC
87 points, 3 comments, 2 min read, EA link

Announcing Apollo Research

mariushobbhahn, 30 May 2023 16:17 UTC
158 points, 4 comments, 8 min read, EA link

Join the AI Evaluation Tasks Bounty Hackathon

Esben Kran, 18 Mar 2024 8:15 UTC
20 points, 0 comments, 4 min read, EA link

Biden-Harris Administration Announces First-Ever Consortium Dedicated to AI Safety

ben.smith, 9 Feb 2024 6:40 UTC
15 points, 1 comment, 1 min read, EA link
(www.nist.gov)

What AI companies can do today to help with the most important century

Holden Karnofsky, 20 Feb 2023 17:40 UTC
104 points, 8 comments, 11 min read, EA link
(www.cold-takes.com)

GovAI: Towards best practices in AGI safety and governance: A survey of expert opinion

Zach Stein-Perlman, 15 May 2023 1:42 UTC
68 points, 5 comments, 1 min read, EA link
(arxiv.org)

Success without dignity: a nearcasting story of avoiding catastrophe by luck

Holden Karnofsky, 15 Mar 2023 20:17 UTC
113 points, 3 comments, 15 min read, EA link

[Question] Open-source AI safety projects?

defun 🔸, 29 Jan 2024 10:09 UTC
8 points, 2 comments, 1 min read, EA link

Project ideas: Sentience and rights of digital minds

Lukas Finnveden, 4 Jan 2024 7:26 UTC
38 points, 1 comment, 20 min read, EA link
(www.forethought.org)

AI policy ideas: Reading list

Zach Stein-Perlman, 17 Apr 2023 19:00 UTC
60 points, 3 comments, 4 min read, EA link

How major governments can help with the most important century

Holden Karnofsky, 24 Feb 2023 19:37 UTC
56 points, 4 comments, 4 min read, EA link
(www.cold-takes.com)

The current state of RSPs

Zach Stein-Perlman, 4 Nov 2024 16:00 UTC
19 points, 1 comment, 9 min read, EA link

AISN #60: The AI Action Plan

Center for AI Safety, 31 Jul 2025 18:10 UTC
6 points, 0 comments, 7 min read, EA link
(newsletter.safe.ai)

College technical AI safety hackathon retrospective—Georgia Tech

yixiong, 14 Nov 2024 13:34 UTC
18 points, 0 comments, 5 min read, EA link
(yixiong.substack.com)

Anthropic is Quietly Backpedalling on its Safety Commitments

Garrison, 23 May 2025 2:26 UTC
100 points, 7 comments, 5 min read, EA link
(www.obsolete.pub)

US Congress introduces CREATE AI Act for establishing National AI Research Resource

Daniel_Eth, 28 Jul 2023 23:27 UTC
9 points, 1 comment, 1 min read, EA link
(eshoo.house.gov)

My Model of EA and AI Safety

Eva Lu, 24 Jun 2025 6:23 UTC
9 points, 1 comment, 2 min read, EA link

Automated Evaluation of LLMs for Math Benchmark

CisnerosA, 30 Oct 2025 20:28 UTC
3 points, 0 comments, 5 min read, EA link

Peace Treaty Architecture (PTA) as an Alternative to AI Alignment

Andrei Navrotskii, 11 Nov 2025 22:11 UTC
1 point, 0 comments, 15 min read, EA link

Comparing AI Labs and Pharmaceutical Companies

mxschons, 13 Nov 2024 14:51 UTC
13 points, 0 comments, 1 min read, EA link
(mxschons.com)

Bounty: Diverse hard tasks for LLM agents

ElizabethBarnes, 20 Dec 2023 16:31 UTC
17 points, 0 comments, 16 min read, EA link

OpenAI’s CBRN tests seem unclear

Luca Righetti 🔸, 21 Nov 2024 17:26 UTC
82 points, 3 comments, 7 min read, EA link

Why I am Still Skeptical about AGI by 2030

James Fodor, 2 May 2025 7:13 UTC
134 points, 15 comments, 6 min read, EA link

Ways EU law might matter for farmed animals

Neil_Dullaghan🔹, 17 Aug 2020 1:16 UTC
54 points, 0 comments, 15 min read, EA link

Performance comparison of Large Language Models (LLMs) in code generation and application of best practices in frontend web development

Diana V. Guaiña A., 1 May 2025 14:57 UTC
5 points, 0 comments, 24 min read, EA link

Join the $10K AutoHack 2024 Tournament

Paul Bricman, 25 Sep 2024 11:56 UTC
17 points, 0 comments, 1 min read, EA link
(noemaresearch.com)

LW4EA: Six economics misconceptions of mine which I’ve resolved over the last few years

Jeremy, 30 Aug 2022 15:20 UTC
8 points, 0 comments, 1 min read, EA link
(www.lesswrong.com)

[Question] Should AI writers be prohibited in education?

Eleni_A, 16 Jan 2023 22:29 UTC
3 points, 3 comments, 1 min read, EA link

Submit Your Toughest Questions for Humanity’s Last Exam

Matrice Jacobine🔸🏳️‍⚧️, 18 Sep 2024 8:03 UTC
6 points, 0 comments, 2 min read, EA link
(www.safe.ai)

“AGI timelines: ignore the social factor at their peril” (Future Fund AI Worldview Prize submission)

ketanrama, 5 Nov 2022 17:45 UTC
10 points, 0 comments, 12 min read, EA link
(trevorklee.substack.com)

Rational Animations’ video about scalable oversight and sandwiching

Writer, 6 Jul 2025 14:00 UTC
14 points, 1 comment, 9 min read, EA link
(youtu.be)

Who owns AI-generated content?

Johan S Daniel, 7 Dec 2022 3:03 UTC
−2 points, 0 comments, 2 min read, EA link

Decentralizing Model Evaluation: Lessons from AI4Math

SMalagon, 5 Jun 2025 18:57 UTC
23 points, 1 comment, 4 min read, EA link

LLM Evaluators Recognize and Favor Their Own Generations

Arjun Panickssery, 17 Apr 2024 21:09 UTC
21 points, 4 comments, 3 min read, EA link
(tiny.cc)

Podcast (+transcript): Nathan Barnard on how US financial regulation can inform AI governance

Aaron Bergman, 8 Aug 2023 21:46 UTC
12 points, 0 comments, 23 min read, EA link
(www.aaronbergman.net)

Asterisk Magazine Issue 03: AI

alejandro, 24 Jul 2023 15:53 UTC
34 points, 3 comments, 1 min read, EA link
(asteriskmag.com)

MLSN #17: Measuring General AI Abilities and Mitigating Deception

Alice Blair, 19 Nov 2025 20:12 UTC
2 points, 0 comments, 6 min read, EA link
(newsletter.mlsafety.org)

Unjournal evaluation of “Towards best practices in AGI safety and governance” (Schuett et al, 2023)

david_reinstein, 3 Jun 2025 11:18 UTC
9 points, 1 comment, 1 min read, EA link
(unjournal.pubpub.org)

Alert on the Toner-Rodgers paper

Eva, 16 May 2025 17:58 UTC
62 points, 1 comment, 1 min read, EA link

Is there a Half-Life for the Success Rates of AI Agents?

Matrice Jacobine🔸🏳️‍⚧️, 8 May 2025 20:10 UTC
6 points, 0 comments, 1 min read, EA link
(www.tobyord.com)

Cognitive Stress Testing Gemini 2.5 Pro: Empirical Findings from Recursive Prompting

Tyler Williams, 23 Jul 2025 22:37 UTC
1 point, 0 comments, 2 min read, EA link

The “low-hanging fruits” of AI safety

Julian Nalenz, 19 Dec 2024 13:38 UTC
−1 points, 0 comments, 6 min read, EA link
(blog.hermesloom.org)

Technical AI Safety research taxonomy attempt (2025)

Ben Plaut, 27 Aug 2025 14:07 UTC
10 points, 3 comments, 2 min read, EA link

Case study: LLM guardrails failing across sessions in a mental health crisis context

Arunas, 1 Sep 2025 14:11 UTC
14 points, 4 comments, 4 min read, EA link

Safety evaluations and standards for AI | Beth Barnes | EAG Bay Area 23

Beth Barnes, 16 Jun 2023 14:15 UTC
28 points, 0 comments, 17 min read, EA link

Ego-Centric Architecture for AGI Safety v2: Technical Core, Falsifiable Predictions, and a Minimal Experiment

Samuel Pedrielli, 6 Aug 2025 12:35 UTC
1 point, 0 comments, 6 min read, EA link

AISN #61: OpenAI Releases GPT-5

Center for AI Safety, 12 Aug 2025 17:52 UTC
6 points, 0 comments, 4 min read, EA link
(newsletter.safe.ai)

System Level Safety Evaluations

markov, 29 Sep 2025 13:55 UTC
3 points, 0 comments, 9 min read, EA link
(equilibria1.substack.com)

[Question] How independent is the research coming out of OpenAI’s preparedness team?

Earthling, 10 Feb 2024 16:59 UTC
18 points, 0 comments, 1 min read, EA link

AI Agents raised $2,000 for EA charities & used the EA Forum

David_R 🔸, 4 Jun 2025 22:18 UTC
16 points, 0 comments, 1 min read, EA link

Scalable And Transferable Black-Box Jailbreaks For Language Models Via Persona Modulation

sjp, 7 Nov 2023 18:00 UTC
10 points, 0 comments, 2 min read, EA link
(arxiv.org)

Report: Artificial Intelligence Risk Management in Spain

JorgeTorresC, 15 Jun 2023 16:08 UTC
22 points, 0 comments, 3 min read, EA link
(riesgoscatastroficosglobales.com)

Evals projects I’d like to see, and a call to apply to OP’s evals RFP

cb, 25 Mar 2025 11:50 UTC
25 points, 2 comments, 3 min read, EA link

How well can large language models predict the future?

Forecasting Research Institute, 8 Oct 2025 14:53 UTC
32 points, 2 comments, 1 min read, EA link
(forecastingresearch.substack.com)

Testing Human Flow in Political Dialogue: A New Benchmark for Emotionally Aligned AI

DongHun Lee, 30 May 2025 4:37 UTC
1 point, 0 comments, 1 min read, EA link

AI and Biological Risk: Forecasting Key Capability Thresholds

Alvin Ånestrand, 2 Oct 2025 14:24 UTC
4 points, 1 comment, 11 min read, EA link
(forecastingaifutures.substack.com)

16 Concrete, Ambitious AI Project Proposals for Science and Security

Alejandro Acelas 🔸, 11 Aug 2025 20:28 UTC
5 points, 0 comments, 1 min read, EA link
(ifp.org)

Performance of Large Language Models (LLMs) in Complex Analysis: A Benchmark of Mathematical Competence and its Role in Decision Making

Jaime Esteban Montenegro Barón, 6 May 2025 21:08 UTC
1 point, 0 comments, 23 min read, EA link

METR is hiring!

ElizabethBarnes, 26 Dec 2023 21:03 UTC
50 points, 0 comments, 1 min read, EA link
(www.lesswrong.com)

A California Effect for Artificial Intelligence

henryj, 9 Sep 2022 14:17 UTC
73 points, 1 comment, 4 min read, EA link
(docs.google.com)

Slightly against aligning with neo-luddites

Matthew_Barnett, 26 Dec 2022 23:27 UTC
77 points, 17 comments, 4 min read, EA link

OpenAI’s o1 tried to avoid being shut down, and lied about it, in evals

Greg_Colbourn ⏸️, 6 Dec 2024 15:25 UTC
23 points, 9 comments, 1 min read, EA link
(www.transformernews.ai)

AI companies have started saying safeguards are load-bearing

Zach Stein-Perlman, 27 Aug 2025 13:00 UTC
23 points, 4 comments, 5 min read, EA link

How DeepSeek Collapsed Under Recursive Load

Tyler Williams, 15 Jul 2025 17:02 UTC
2 points, 0 comments, 1 min read, EA link

#217 – The most important graph in AI right now (Beth Barnes on The 80,000 Hours Podcast)

80000_Hours, 2 Jun 2025 16:52 UTC
16 points, 1 comment, 26 min read, EA link

Introducing METR’s Autonomy Evaluation Resources

Megan Kinniment, 15 Mar 2024 23:19 UTC
28 points, 0 comments, 1 min read, EA link
(metr.github.io)

Scaling and Sustaining Standards: A Case Study on the Basel Accords

C.K., 16 Jul 2023 18:18 UTC
18 points, 0 comments, 7 min read, EA link
(docs.google.com)

Jailbreaking Claude 4 and Other Frontier Language Models

James-Sullivan, 15 Jun 2025 1:01 UTC
6 points, 0 comments, 3 min read, EA link
(open.substack.com)

Why Dual-Use Risk Bio Matters Now in LLMs. A Simple Guide and Playbook

JAM, 9 Sep 2025 14:14 UTC
2 points, 0 comments, 5 min read, EA link

[Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations

Teun van der Weij, 13 Jun 2024 10:04 UTC
24 points, 2 comments, 2 min read, EA link
(arxiv.org)

Meta: Frontier AI Framework

Zach Stein-Perlman, 3 Feb 2025 22:00 UTC
23 points, 0 comments, 1 min read, EA link
(ai.meta.com)

AGI Soon, AGI Fast, AGI Big, AGI Bad

GenericModel, 10 Dec 2025 15:47 UTC
2 points, 0 comments, 11 min read, EA link
(enrichedjamsham.substack.com)

Antitrust-Compliant AI Industry Self-Regulation

Cullen 🔸, 7 Jul 2020 20:52 UTC
26 points, 1 comment, 1 min read, EA link
(cullenokeefe.com)

If The Data Is Poisoned, Alignment Won’t Save Us

keivn, 26 Sep 2025 17:59 UTC
1 point, 0 comments, 3 min read, EA link

OpenAI’s new Preparedness team is hiring

leopold, 26 Oct 2023 20:41 UTC
85 points, 13 comments, 1 min read, EA link

Model evals for dangerous capabilities

Zach Stein-Perlman, 23 Sep 2024 11:00 UTC
19 points, 0 comments, 3 min read, EA link

AI Audit in Costa Rica

Priscilla Campos, 27 Jan 2025 2:57 UTC
10 points, 4 comments, 9 min read, EA link

An Analysis of Systemic Risk and Architectural Requirements for the Containment of Recursively Self-Improving AI

Ihor Ivliev, 17 Jun 2025 0:16 UTC
2 points, 5 comments, 4 min read, EA link

I read every major AI lab’s safety plan so you don’t have to

sarahhw, 16 Dec 2024 14:12 UTC
68 points, 2 comments, 11 min read, EA link
(longerramblings.substack.com)

[Question] Whose track record of AI predictions would you like to see evaluated?

Jonny Spicer 🔸, 29 Jan 2025 11:57 UTC
10 points, 13 comments, 1 min read, EA link

Interpretability Will Not Reliably Find Deceptive AI

Neel Nanda, 4 May 2025 16:32 UTC
74 points, 0 comments, 7 min read, EA link

What is the EU AI Act and why should you care about it?

MathiasKB🔸, 10 Sep 2021 7:47 UTC
117 points, 10 comments, 7 min read, EA link

BenchMoral: A benchmarking to assess the moral sensitivity of large language models (LLMs) in Spanish

Flor Betzabeth Ampa Flores, 30 Apr 2025 21:26 UTC
1 point, 0 comments, 18 min read, EA link

We read every lab’s safety plan so you don’t have to: 2025 edition

Algon, 29 Oct 2025 16:48 UTC
14 points, 1 comment, 16 min read, EA link
(aisafety.info)

AISN #56: Google Releases Veo 3

Center for AI Safety, 28 May 2025 15:57 UTC
6 points, 0 comments, 4 min read, EA link
(newsletter.safe.ai)

[Job]: AI Standards Development Research Assistant

Tony Barrett, 14 Oct 2022 20:18 UTC
13 points, 0 comments, 2 min read, EA link

Main paths to impact in EU AI Policy

JOMG_Monnet, 8 Dec 2022 16:17 UTC
69 points, 2 comments, 8 min read, EA link

The AIA and its Brussels Effect

Kathryn O'Rourke, 27 Dec 2022 16:01 UTC
16 points, 0 comments, 5 min read, EA link

OMMC Announces RIP

Adam_Scholl, 1 Apr 2024 23:38 UTC
7 points, 0 comments, 2 min read, EA link

ARC-AGI-2 Overview With François Chollet

Yarrow Bouchard 🔸, 10 Apr 2025 18:54 UTC
7 points, 0 comments, 1 min read, EA link
(youtu.be)

Compliance Monitoring as an Impactful Mechanism of AI Safety Policy

CAISID, 7 Feb 2024 16:10 UTC
6 points, 3 comments, 9 min read, EA link

VANTA Research Reasoning Evaluation (VRRE): A New Evaluation Framework for Real-World Reasoning

Tyler Williams, 18 Sep 2025 23:51 UTC
1 point, 0 comments, 3 min read, EA link

AI Safety Camp 11

Robert Kralisch, 7 Nov 2025 14:27 UTC
7 points, 1 comment, 15 min read, EA link

Archetypal Transfer Learning: a Proposed Alignment Solution that solves the Inner x Outer Alignment Problem while adding Corrigible Traits to GPT-2-medium

Miguel, 26 Apr 2023 0:40 UTC
13 points, 0 comments, 10 min read, EA link

Where’s my ten minute AGI?

Vasco Grilo🔸, 19 May 2025 17:45 UTC
47 points, 6 comments, 7 min read, EA link
(epoch.ai)

Demonstrate and evaluate risks from AI to society at the AI x Democracy research hackathon

Esben Kran, 19 Apr 2024 14:46 UTC
24 points, 0 comments, 6 min read, EA link
(www.apartresearch.com)

Looking for evidence of AI impacts in the age structure of occupations: Nothing yet

Pat McKelvey, 9 May 2025 18:12 UTC
26 points, 2 comments, 3 min read, EA link

Road to AnimalHarmBench

Artūrs Kaņepājs, 1 Jul 2025 13:37 UTC
137 points, 11 comments, 7 min read, EA link

OpenAI’s o3 model scores 3% on the ARC-AGI-2 benchmark, compared to 60% for the average human

Yarrow Bouchard 🔸, 1 May 2025 13:57 UTC
14 points, 8 comments, 3 min read, EA link
(arcprize.org)

Open call: AI Act Standard for Dev. Phase Risk Assessment

miller-max, 8 Dec 2023 19:57 UTC
5 points, 1 comment, 1 min read, EA link

Evaluation of the capability of different large language models (LLMs) in generating malicious code for DDoS attacks using different prompting techniques

AdrianaLaRotta, 6 May 2025 10:55 UTC
8 points, 1 comment, 14 min read, EA link

Improving capability evaluations for AI governance: Open Philanthropy’s new request for proposals

cb, 7 Feb 2025 9:30 UTC
37 points, 3 comments, 3 min read, EA link

Avoiding AI Races Through Self-Regulation

Gordon Seidoh Worley, 12 Mar 2018 20:52 UTC
4 points, 4 comments, 1 min read, EA link

CORVUS 2.0 First Tests: Found Critical Limitations in My Constitutional AI System

Frankle Fry, 21 Oct 2025 15:14 UTC
−5 points, 0 comments, 3 min read, EA link

The Elicitation Game: Evaluating capability elicitation techniques

Teun van der Weij, 27 Feb 2025 20:33 UTC
3 points, 0 comments, 2 min read, EA link

How do AI agents work together when they can’t trust each other?

James-Sullivan, 6 Jun 2025 3:24 UTC
4 points, 1 comment, 8 min read, EA link
(open.substack.com)

Q2 AI Benchmark Results: Pros Maintain Clear Lead

Benjamin Wilson 🔸, 28 Oct 2025 5:13 UTC
46 points, 0 comments, 24 min read, EA link
(www.metaculus.com)

The Decreasing Value of Chain of Thought in Prompting

Matrice Jacobine🔸🏳️‍⚧️, 8 Jun 2025 15:11 UTC
5 points, 0 comments, 1 min read, EA link
(papers.ssrn.com)

The world’s first frontier AI regulation is surprisingly thoughtful: the EU’s Code of Practice

Miles Kodama, 22 Sep 2025 15:22 UTC
20 points, 1 comment, 15 min read, EA link

Three Weeks In: What GPT-5 Still Gets Wrong

JAM, 27 Aug 2025 14:43 UTC
2 points, 0 comments, 3 min read, EA link

o3

Zach Stein-Perlman, 20 Dec 2024 21:00 UTC
84 points, 9 comments, 1 min read, EA link

AIs Are Expert-Level at Many Virology Skills

Center for AI Safety, 2 May 2025 16:07 UTC
22 points, 0 comments, 1 min read, EA link

AGI Risk: How to internationally regulate industries in non-democracies

Timothy_Liptrot, 16 May 2022 22:45 UTC
9 points, 2 comments, 9 min read, EA link

(Linkpost) METR: Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity

Yadav, 11 Jul 2025 8:58 UTC
37 points, 2 comments, 2 min read, EA link
(metr.org)

LLMs Outperform Experts on Challenging Biology Benchmarks

ljusten, 14 May 2025 16:09 UTC
24 points, 1 comment, 1 min read, EA link
(substack.com)