Link post
How capable will top AI models be in 2025?
Forecast LLM agents’ autonomous replication and adaptation (ARA) abilities, as well as model performance on benchmarks like GPQA and GAIA, in the AI Benchmarks series, a collaboration with the AI Safety Student Team at Harvard (AISST). Start here. AISST questions are inspired by work by @elifland.
AI Benchmarks Series — Metaculus Questions on Evaluations of AI Models Against Technical Benchmarks