AI benchmarks

Tag. Last edit: 2 Feb 2024 10:57 UTC by Toby Tremlett🔹

Benchmarks are tests that measure the progress of AI capabilities and check for characteristics that might pose safety risks.

Further reading

The Benchmark Lottery

BASALT: A Benchmark for Learning from Human Feedback—AI Alignment Forum

Misaligned Powerseeking — SERI ML Alignment Theory Scholars Program | Summer 2022

[2110.06674] Truthful AI: Developing and governing AI that does not lie

Related entries

AI safety | standards and regulation

Open Phil releases RFPs on LLM Benchmarks and Forecasting

Lawrence Chan, 11 Nov 2023 3:01 UTC
12 points
0 comments, 1 min read, EA link
(www.openphilanthropy.org)

Prizes for ML Safety Benchmark Ideas

Joshc, 28 Oct 2022 2:44 UTC
56 points
8 comments, 1 min read, EA link

Language models surprised us

Ajeya, 29 Aug 2023 21:18 UTC
59 points
10 comments, 5 min read, EA link

Long list of AI questions

NunoSempere, 6 Dec 2023 11:12 UTC
124 points
14 comments, 86 min read, EA link

Results from an Adversarial Collaboration on AI Risk (FRI)

Forecasting Research Institute, 11 Mar 2024 15:54 UTC
193 points
25 comments, 9 min read, EA link
(forecastingresearch.org)

A compute-based framework for thinking about the future of AI

Matthew_Barnett, 31 May 2023 22:00 UTC
96 points
36 comments, 19 min read, EA link

Announcing Epoch’s dashboard of key trends and figures in Machine Learning

Jaime Sevilla, 13 Apr 2023 7:33 UTC
127 points
4 comments, 1 min read, EA link

AI Forecasting Research Ideas

Jaime Sevilla, 17 Nov 2022 17:37 UTC
78 points
1 comment, 1 min read, EA link
(docs.google.com)

Survey on the acceleration risks of our new RFPs to study LLM capabilities

Ajeya, 10 Nov 2023 23:59 UTC
38 points
1 comment, 8 min read, EA link

Announcing Epoch’s newly expanded Parameters, Compute and Data Trends in Machine Learning database

Robi Rahman, 25 Oct 2023 3:03 UTC
38 points
1 comment, 1 min read, EA link
(epochai.org)

$250K in Prizes: SafeBench Competition Announcement

Center for AI Safety, 3 Apr 2024 22:07 UTC
47 points
0 comments, 1 min read, EA link

XPT forecasts on (some) Direct Approach model inputs

Forecasting Research Institute, 20 Aug 2023 12:39 UTC
37 points
0 comments, 9 min read, EA link

Encultured AI, Part 1: Enabling New Benchmarks

Andrew Critch, 8 Aug 2022 22:49 UTC
17 points
0 comments, 6 min read, EA link

AI Benchmarks Series — Metaculus Questions on Evaluations of AI Models Against Technical Benchmarks

christian, 27 Mar 2024 23:05 UTC
10 points
0 comments, 1 min read, EA link
(www.metaculus.com)

Announcing the AI Forecasting Benchmark Series | July 8, $120k in Prizes

christian, 19 Jun 2024 21:37 UTC
50 points
4 comments, 5 min read, EA link
(www.metaculus.com)

Launching the AI Forecasting Benchmark Series Q3 | $30k in Prizes

christian, 8 Jul 2024 17:20 UTC
17 points
0 comments, 1 min read, EA link
(www.metaculus.com)

Trendlines in AIxBio evals

ljusten, 31 Oct 2024 0:09 UTC
11 points
1 comment, 11 min read, EA link
(www.lennijusten.com)

Race to the Top: Benchmarks for AI Safety

isaduan, 4 Dec 2022 22:50 UTC
51 points
8 comments, 1 min read, EA link