AI Benchmarks Series — Metaculus Questions on Evaluations of AI Models Against Technical Benchmarks

christian 27 Mar 2024 23:05 UTC

10 points

AI benchmarks Announcements and updates AI forecasting Metaculus AI safety Forecasting

How capable will top AI models be in 2025?

Forecast LLM agents’ autonomous replication and adaptation (ARA) abilities, along with model performance on benchmarks like GPQA and GAIA, in the AI Benchmarks series, a collaboration with the AI Safety Student Team at Harvard (AISST).

Start here.

AISST questions are inspired by work by @elifland.
