I don’t think alignment is a problem that can be solved. I think we can do better and better. But to have it be existentially safe, the bar seems really, really high, and I don’t think we’re going to get there. So we’re going to need some ability to coordinate and say, “let’s not pursue this development path,” or “let’s not deploy these kinds of systems right now.”
I don’t like the framing of “solving” “the” alignment problem. I picture something like “Taking as many measures as we can (see previous post) to make catastrophic misalignment as unlikely as we can for the specific systems we’re deploying in the specific contexts we’re deploying them in, then using those systems as part of an ongoing effort to further improve alignment measures that can be applied to more-capable systems.” In other words, I don’t think there is a single point where the alignment problem is “solved”; instead I think we will face a number of “alignment problems” for systems with different capabilities. (And I think there could be some systems that are very easy to align, but just not very powerful.) So I tend to talk about whether we have “systems that are both aligned and transformative” rather than whether the “alignment problem is solved.”
Holden makes a similar point in “Nearcast-based ‘deployment problem’ analysis” (2022):