Formerly a software engineer at Google, I'm now doing independent AI alignment research.
Because of my focus on AI alignment, I tend to post more on LessWrong and AI Alignment Forum than I do here.
I’m always happy to connect with other researchers or people interested in AI alignment and effective altruism. Feel free to send me a private message!
I place significant weight on the possibility that when labs are in the process of training AGI or near-AGI systems, they will be able to see alignment opportunities that we can't see from a more theoretical or distanced point of view. In this sense, I'm sympathetic to Anthropic's empirical approach to safety. I also think there are a lot of really smart and creative people working at these labs.
Leading labs also employ some people focused on the worst risks. For misalignment risks, I am most worried about deceptive alignment, and Anthropic recently hired one of the people who coined that term. (From this angle, I would feel safer about these risks if Anthropic were in the lead rather than OpenAI. I know less about OpenAI’s current alignment team.)
Let me be clear though: Even if I'm right above and the risk of massively catastrophic misalignment from one of these labs creating AGI is ~20%, I consider that very much an unacceptably high risk. I think even a 1% chance of extinction is unacceptably high. If some other kind of project had a 1% chance of causing human extinction, I don't think the public would stand for it. Imagine a particle accelerator or biotech project had a 1% chance of causing human extinction. If the public found out, I think they would want the project shut down immediately until it could be pursued safely. And I think they would be justified in that, if there's a way to coordinate on doing so.