Working to reduce extreme suffering for all sentient beings.
Author of Suffering-Focused Ethics: Defense and Implications; Reasoned Politics; & Essays on Suffering-Focused Ethics.
Co-founder (with Tobias Baumann) of the Center for Reducing Suffering (CRS).
Naive projection about o4 and beyond
The Codeforces Elo progression from o1-mini to o3-mini was around 400 points (with compute costs held constant). Similarly, the Elo jumps from 4o (~800) to o1-preview (~1250) to o1-mini (~1650) were also each around 400 points (the compute costs of 4o appear similar to those of o1-mini, while they’re higher for o1-preview).
People from OpenAI report that o4 is now being trained and that training runs take around three months in the current “reasoning paradigm”. So if we were to engage in naive projection, we might project a continued ~400 point Codeforces progression every three months.
Below is such a naive projection for the o1-mini cost range, with the dates referring to when model scores are announced (not when the models are released).
March 2025 (March 14th?): o4 ~2400
June 2025: o5 ~2800
September 2025: o6 ~3200
December 2025: o7 ~3600
If high compute adds around 700 Elo points for full o7 (as it does for o3), full o7 would reach a superhuman score of ~4300.
March 2026: o8 ~4000 (a score only ever achieved by two people)
June 2026: o9 ~4400 (superhuman level for cheap)
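The arithmetic behind the list above can be reproduced with a short script. This is only a sketch of the naive projection: the constant and function names are illustrative, and the starting point (o4 at ~2400 in March 2025), the 400-point step, the three-month interval, and the ~700-point high-compute bonus are taken directly from the figures in this post.

```python
from datetime import date

# All values below come from the naive projection in this post.
BASE_DATE = date(2025, 3, 1)   # projected month of the o4 score announcement
BASE_ELO = 2400                # projected o4 score (o1-mini cost range)
STEP_ELO = 400                 # assumed Elo gain per model generation
STEP_MONTHS = 3                # assumed months between generations
HIGH_COMPUTE_BONUS = 700       # rough Elo boost from high compute (as for o3)


def add_months(d: date, months: int) -> date:
    """Return the first day of the month `months` after `d`."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


def project(n_models: int) -> list[tuple[str, date, int]]:
    """Linear projection: (model name, announcement month, projected Elo)."""
    return [
        (f"o{4 + i}", add_months(BASE_DATE, i * STEP_MONTHS), BASE_ELO + i * STEP_ELO)
        for i in range(n_models)
    ]


if __name__ == "__main__":
    for name, when, elo in project(6):
        print(f"{when.year}-{when.month:02d}: {name} ~{elo} "
              f"(~{elo + HIGH_COMPUTE_BONUS} with high compute)")
```

Changing `STEP_ELO` or `STEP_MONTHS` shows how sensitive the later entries are to the two assumptions that drive the whole projection.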
Part of the motivation for making such a naive projection is that it can provide a salient yardstick to hold future progress up against, to notice whether progress on this benchmark is slowing down, keeping pace, or accelerating.
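As a rough illustration of how such a yardstick could be used, the sketch below compares an observed score against the projected pace of ~400 points per three months. The function name and the 10% tolerance band are arbitrary assumptions made for this example, and the scores in the usage lines are hypothetical inputs, not reported results.

```python
# Projected pace from this post: ~400 Elo points every three months.
PROJECTED_ELO_PER_MONTH = 400 / 3


def classify_progress(baseline_elo: float, observed_elo: float,
                      months_elapsed: float, tolerance: float = 0.1) -> str:
    """Compare an observed Elo gain with the naive linear projection.

    `tolerance` is an arbitrary band (10% by default) within which the
    observed gain still counts as keeping pace with the projection.
    """
    if months_elapsed <= 0:
        raise ValueError("months_elapsed must be positive")
    expected_gain = PROJECTED_ELO_PER_MONTH * months_elapsed
    ratio = (observed_elo - baseline_elo) / expected_gain
    if ratio < 1 - tolerance:
        return "slowing down"
    if ratio > 1 + tolerance:
        return "accelerating"
    return "keeping pace"


if __name__ == "__main__":
    # Hypothetical inputs: a baseline of 2400 and a score seen 3 months later.
    print(classify_progress(2400, 2800, 3))  # gain matches the projection
    print(classify_progress(2400, 2600, 3))  # gain is half the projection
```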
Additionally, as further motivation, one can note that there is some precedent for Elo scores improving linearly over time in other domains, e.g. in chess:
Likewise, while they’re more subjective, Elo scores on the LLM leaderboard also appear to have increased fairly consistently by an average of ~20 points per month over the last year (the trend has continued beyond the graph below; the current top 10 average is at the ~1360 level one would have predicted based on a naive extrapolation of the post-2023-11 trendline below):