Joe_Carlsmith
Karma: 3,426
Senior research analyst at Open Philanthropy. Doctorate in philosophy from the University of Oxford. Opinions my own.
AI for AI safety
Paths and waystations in AI safety
When should we worry about AI power-seeking?
What is it to solve the alignment problem?
How do we solve the alignment problem?
Fake thinking and real thinking
Takes on “Alignment Faking in Large Language Models”
Incentive design and capability elicitation
Option control
Motivation control
How might we solve the alignment problem? (Part 1: Intro, summary, ontology)
Video and transcript of presentation on Otherness and control in the age of AGI
What is it to solve the alignment problem? (Notes)
Value fragility and AI takeover
A framework for thinking about AI power-seeking
Loving a world you don’t trust
On “first critical tries” in AI alignment
Thanks for making this, Michel :)
Very glad to hear it, Lizka :) -- and thanks for letting me know.