How might we solve the alignment problem?

This is a four-part series of posts about how we might solve the alignment problem. It builds on my previous post, here, about what it would even mean to solve the alignment problem, and, to some extent, on this post outlining a framework for thinking about the incentives at stake in AI power-seeking. The four parts are:

How might we solve the alignment problem? (Part 1: Intro, summary, ontology)

Motivation control

Option control

Incentive design and capability elicitation