William_S

Karma: 334

I worked at OpenAI for three years, from 2021 to 2024, on the Alignment team, which eventually became the Superalignment team. I worked on scalable oversight as part of the team developing critiques as a technique for using language models to spot mistakes in other language models. I then worked to refine an idea from Nick Cammarata into a method for using language models to generate explanations for features in language models. I was then promoted to managing a team of 4 people who worked on trying to understand language model features in context, leading to the release of an open source “transformer debugger” tool.
I resigned from OpenAI on February 15, 2024.

Principles for the AGI Race

William_S · 30 Aug 2024 14:30 UTC
74 points
4 comments · 18 min read · EA link

William_S’s Quick takes

William_S · 3 May 2024 18:14 UTC
3 points
6 comments · 1 min read · EA link