Alignment Theory Series

Distillation pieces for those who want to start somewhere but don’t know where.

Deception as the optimal: mesa-optimizers and inner alignment

Three scenarios of pseudo-alignment

My summary of “Pragmatic AI Safety”