AI benchmarks

Tag · Last edit: Feb 2, 2024, 10:57 AM by Toby Tremlett

Benchmarks are tests that let us measure progress in AI capabilities and probe for characteristics that might pose safety risks.

Further reading

The Benchmark Lottery

BASALT: A Benchmark for Learning from Human Feedback - AI Alignment Forum

Misaligned Powerseeking - SERI ML Alignment Theory Scholars Program | Summer 2022

[2110.06674] Truthful AI: Developing and governing AI that does not lie

Related entries

AI safety | standards and regulation

Trendlines in AIxBio evals

ljusten · Oct 31, 2024, 12:09 AM
39 points
2 comments · 11 min read
(www.lennijusten.com)

Open Phil releases RFPs on LLM Benchmarks and Forecasting

Lawrence Chan · Nov 11, 2023, 3:01 AM
12 points
0 comments · 1 min read
(www.openphilanthropy.org)

Prizes for ML Safety Benchmark Ideas

Joshc · Oct 28, 2022, 2:44 AM
56 points
8 comments · 1 min read

Survey on the acceleration risks of our new RFPs to study LLM capabilities

Ajeya · Nov 10, 2023, 11:59 PM
38 points
1 comment · 8 min read

Announcing Epoch's newly expanded Parameters, Compute and Data Trends in Machine Learning database

Robi Rahman · Oct 25, 2023, 3:03 AM
38 points
1 comment · 1 min read
(epochai.org)

$250K in Prizes: SafeBench Competition Announcement

Center for AI Safety · Apr 3, 2024, 10:07 PM
47 points
0 comments · 1 min read

XPT forecasts on (some) Direct Approach model inputs

Forecasting Research Institute · Aug 20, 2023, 12:39 PM
37 points
0 comments · 9 min read

Language models surprised us

Ajeya · Aug 29, 2023, 9:18 PM
59 points
10 comments · 5 min read

Long list of AI questions

NunoSempere · Dec 6, 2023, 11:12 AM
124 points
14 comments · 86 min read

Results from an Adversarial Collaboration on AI Risk (FRI)

Forecasting Research Institute · Mar 11, 2024, 3:54 PM
193 points
25 comments · 9 min read
(forecastingresearch.org)

A compute-based framework for thinking about the future of AI

Matthew_Barnett · May 31, 2023, 10:00 PM
96 points
36 comments · 19 min read

Announcing Epoch's dashboard of key trends and figures in Machine Learning

Jaime Sevilla · Apr 13, 2023, 7:33 AM
127 points
4 comments · 1 min read

AI Forecasting Research Ideas

Jaime Sevilla · Nov 17, 2022, 5:37 PM
78 points
1 comment · 1 min read
(docs.google.com)

Benchmark Performance is a Poor Measure of Generalisable AI Reasoning Capabilities

James Fodor · Feb 21, 2025, 4:25 AM
12 points
3 comments · 24 min read

Fact Check: 57% of the internet is NOT AI-generated

James-Hartree-Law · Jan 17, 2025, 9:26 PM
1 point
0 comments · 1 min read

The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems

Mantas Mazeika · Mar 4, 2025, 5:44 PM
22 points
0 comments · 2 min read
(www.mask-benchmark.ai)

Launching the AI Forecasting Benchmark Series Q3 | $30k in Prizes

christian · Jul 8, 2024, 5:20 PM
17 points
0 comments · 1 min read
(www.metaculus.com)

Announcing the AI Forecasting Benchmark Series | July 8, $120k in Prizes

christian · Jun 19, 2024, 9:37 PM
52 points
4 comments · 5 min read
(www.metaculus.com)

o3

Zach Stein-Perlman · Dec 20, 2024, 9:00 PM
84 points
5 comments · 1 min read

We are in a New Paradigm of AI Progress - OpenAI's o3 model makes huge gains on the toughest AI benchmarks in the world

Garrison · Dec 22, 2024, 9:45 PM
26 points
0 comments · 4 min read
(garrisonlovely.substack.com)

Enculturated AI, Part 1: Enabling New Benchmarks

Andrew Critch · Aug 8, 2022, 10:49 PM
17 points
0 comments · 6 min read

Metaculus Q4 AI Benchmarking: Bots Are Closing The Gap

Molly Hickman · Feb 19, 2025, 10:46 PM
41 points
7 comments · 13 min read

Is AI Hitting a Wall or Moving Faster Than Ever?

Garrison · Jan 9, 2025, 10:18 PM
35 points
3 comments · 5 min read
(garrisonlovely.substack.com)

Predict 2025 AI capabilities (by Sunday)

Jonas V · Jan 15, 2025, 12:16 AM
16 points
0 comments · 1 min read

AI Benchmarks Series - Metaculus Questions on Evaluations of AI Models Against Technical Benchmarks

christian · Mar 27, 2024, 11:05 PM
10 points
0 comments · 1 min read
(www.metaculus.com)

Race to the Top: Benchmarks for AI Safety

isaduan · Dec 4, 2022, 10:50 PM
52 points
8 comments · 1 min read