I worked at OpenAI for three years, from 2021 to 2024, on the Alignment team, which eventually became the Superalignment team. I worked on scalable oversight, as part of the team developing critiques as a technique for using language models to spot mistakes in other language models. I then worked to refine an idea from Nick Cammarata into a method for using language models to generate explanations for features in language models. I was then promoted to managing a team of four people working on trying to understand language model features in context, leading to the release of an open source “transformer debugger” tool.
I resigned from OpenAI on February 15, 2024.
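(For readers unfamiliar with the second technique mentioned above, here is a minimal sketch of the “language models explain features in language models” loop: show an explainer model a feature's highest-activating text snippets and ask it to summarize what the feature responds to. This is not the actual OpenAI pipeline; `query_model`, the prompt format, and the dummy data are all illustrative placeholders.)

```python
# Minimal sketch, assuming only a generic text-completion callable.
# Not the actual OpenAI implementation; everything here is illustrative.

from typing import Callable, List, Tuple

# (text, per-token activations of one feature on that text)
Snippet = Tuple[str, List[float]]

def explain_feature(top_snippets: List[Snippet],
                    query_model: Callable[[str], str]) -> str:
    """Ask an explainer model what pattern a feature detects, given
    its highest-activating text snippets and their activations."""
    lines = ["Below are text excerpts with per-token activations of one feature."]
    for text, acts in top_snippets:
        lines.append(f"Text: {text}")
        lines.append(f"Activations: {[round(a, 2) for a in acts]}")
    lines.append("In one short sentence, what pattern does this feature detect?")
    return query_model("\n".join(lines))

if __name__ == "__main__":
    # Dummy stand-in so the sketch runs end to end; substitute a real API call.
    def dummy_model(prompt: str) -> str:
        return "Tokens related to legal language (e.g. 'plaintiff', 'court')."

    snippets = [
        ("The plaintiff appeared in court.", [0.1, 0.9, 0.0, 0.2, 0.8, 0.0]),
        ("She filed a lawsuit yesterday.", [0.0, 0.3, 0.0, 0.9, 0.1, 0.0]),
    ]
    print(explain_feature(snippets, dummy_model))
```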
FWIW on timelines:
June 13, 2022: Critiques paper (link 1)
May 9, 2023: Language models explain language models paper (link 2)
November 17, 2023: Altman removed as CEO (reinstated days later)
February 15, 2024: William_S resigns
March 8, 2024: Altman is reinstated to the OpenAI board
March 12, 2024: Transformer debugger is open-sourced
April 2024: Cullen O’Keefe departs (via LinkedIn)
April 11, 2024: Leopold Aschenbrenner & Pavel Izmailov fired for leaking information
April 18, 2024: Users notice Daniel Kokotajlo has resigned
Thank you for your work there. I’m curious about what made you resign, and also about why you’ve chosen now to communicate that?
(I expect that you are under some form of NDA, and that if you were willing and able to talk about why you resigned then you would have done so in your initial post. Therefore, for readers interested in some possibly related news: last month, Daniel Kokotajlo quit OpenAI’s Futures/Governance team “due to losing confidence that it [OpenAI] would behave responsibly around the time of AGI,” and a Superalignment researcher was forced out of OpenAI in what may have been a political firing (source). OpenAI appears to be losing its most safety-conscious people.)
Hi William! Thanks for posting. Can you elaborate on your motivation for posting this Quick Take?
No comment.
Presumably NDA + forbidden to talk about the NDA (hence forbidden to talk about being forbidden to talk about … )