Executive summary: A study comparing AI bots to expert human forecasters on real-world prediction questions found that humans still significantly outperform the best AI systems, though the gap may be narrowing.
Key points:
Pro human forecasters outperformed the top AI bots with statistical significance (p = 0.036) across 113 weighted questions.
AI bots showed worse calibration, discrimination, and scope sensitivity compared to human experts.
The best single AI bot (using GPT-4) performed better than versions using GPT-3.5 or Claude, but still worse than humans.
Areas for AI improvement include reducing positive bias, improving information retrieval, and enhancing scope sensitivity.
Study limitations include the possibility that, given enough attempts, a bot could outperform humans by chance alone.
Future quarterly benchmarks will track how AI forecasting ability evolves over time.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.