What does this mean for OpenAI’s 25% score on the benchmark?
Note that only some of FrontierMath’s problems are actually frontier-level, while others are relatively easier (i.e. IMO level, and DeepMind was already one point away from gold on IMO-level problems): https://x.com/ElliotGlazer/status/1870235655714025817
Note that the hold-out set doesn’t exist yet. https://x.com/ElliotGlazer/status/1880812021966602665