Lorenzo Buonanno🔸 comments on bruce’s Quick takes

Lorenzo Buonanno🔸 20 Jan 2025 1:49 UTC
19 points
Note that the hold-out set doesn’t exist yet. https://x.com/ElliotGlazer/status/1880812021966602665
What does this mean for OpenAI’s 25% score on the benchmark?
Also note that only some of FrontierMath’s problems are actually frontier-level, while others are relatively easier (i.e. IMO-level, and DeepMind was already one point short of gold on IMO-level problems): https://x.com/ElliotGlazer/status/1870235655714025817