Reposting from LessWrong, for people who might be less active there:[1]

TL;DR

FrontierMath was funded by OpenAI[2]
This was not publicly disclosed until December 20th, the date of OpenAI’s o3 announcement, including in earlier versions of the arXiv paper where this was eventually made public.
There was allegedly no active communication about this funding to the mathematicians contributing to the project before December 20th, due to the NDAs Epoch signed, but also no communication after the 20th, once the NDAs had expired.
OP claims that “I have heard second-hand that OpenAI does have access to exercises and answers and that they use them for validation. I am not aware of an agreement between Epoch AI and OpenAI that prohibits using this dataset for training if they wanted to, and have slight evidence against such an agreement existing.”

Tamay’s response:

Seems to have confirmed the OpenAI funding + NDA restrictions
Claims OpenAI has “access to a large fraction of FrontierMath problems and solutions, with the exception of a unseen-by-OpenAI hold-out set that enables us to independently verify model capabilities.”
They also have “a verbal agreement that these materials will not be used in model training.”

Edit (19/01): Elliot (the project lead) points out that the holdout set does not yet exist (emphasis added):

“As for where the o3 score on FM stands: yes I believe OAI has been accurate with their reporting on it, but Epoch can’t vouch for it until we independently evaluate the model using the holdout set we are developing.”[3]

Edit (24/01):

Tamay tweets an apology (possibly including the timeline drafted by Elliot). It’s pretty succinct so I won’t summarise it here! Blog post version for people without twitter. Perhaps the most relevant point:

“OpenAI commissioned Epoch AI to produce 300 advanced math problems for AI evaluation that form the core of the FrontierMath benchmark. As is typical of commissioned work, OpenAI retains ownership of these questions and has access to the problems and solutions.”

Nat from OpenAI with an update from their side:

“We did not use FrontierMath data to guide the development of o1 or o3, at all.”
“We didn’t train on any FM derived data, any inspired data, or any data targeting FrontierMath in particular”
“I’m extremely confident, because we only downloaded frontiermath for our evals *long* after the training data was frozen, and only looked at o3 FrontierMath results after the final announcement checkpoint was already picked.”
============
Some quick uncertainties I had:
What does this mean for OpenAI’s 25% score on the benchmark?
What steps did Epoch take or consider taking to improve transparency between the time they were offered the NDA and the time of signing the NDA?
What is Epoch’s level of confidence that OpenAI will keep to their verbal agreement not to use these materials in model training, both in some technically true sense, and in a broader interpretation of the agreement? (see e.g. the bottom paragraph of Ozzie’s comment).
In light of the confirmation that OpenAI not only has access to the problems and solutions but has ownership of them, what steps did Epoch consider before signing the relevant agreement to get something stronger than a verbal agreement that this won’t be used in training, now or in the future?
[1] Epistemic status: quickly summarised + liberally copy-pasted with ~0 additional fact checking, given Tamay’s replies in the comment section.
[2] arXiv v5 (Dec 20th version): “We gratefully acknowledge OpenAI for their support in creating the benchmark.”
[3] See clarification in case you interpreted Tamay’s comments (e.g. that OpenAI “do not have access to a separate holdout set that serves as an additional safeguard for independent verification”) to mean that the holdout set already exists. Note that the hold-out set doesn’t exist yet. https://x.com/ElliotGlazer/status/1880812021966602665
Note that only some of FrontierMath’s problems are actually frontier, while others are relatively easier (i.e. IMO level, and Deepmind was already one point from gold on IMO level problems) https://x.com/ElliotGlazer/status/1870235655714025817
first funding, then talent, then PR, and now this.
how much juice will OpenAI squeeze out of EA?
It’s OK man because Sam has promised to donate 500 million a year to EA causes!
What did we say about making jokes on the forum Nick?
It’s true we’ve discussed this already...
I’ve known Jaime for about ten years. Seems like he made an arguably wrong call when first dealing with real powaah, but overall I’m confident his heart is in the right place.