I think it’s worth separating out two questions about alignment in the re-run: (1) how likely are we to retain alignment knowledge, and (2) how likely are we to rediscover alignment knowledge? The second seems important because, in some possible futures, rediscovery of alignment techniques seems likely to correlate strongly with rediscovery of the technology needed to build ASI. Then, conditional on being able to build capable AI, we might be quite likely to be able to align it.
The easier alignment turns out to be, and the more the relevant techniques resemble those needed to build capable AI in the first place, the more likely alignment rediscovery becomes conditional on AI rediscovery. It’s currently uncertain whether today’s alignment techniques will scale to ASI, but many of them (e.g. RL-based techniques) do seem quite closely related to the techniques needed to make AI capable at all.
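One minimal way to make the correlation claim explicit is as a conditional-probability decomposition. The sketch below is my own formalization, not something from the original argument, and the event names ($R_{\mathrm{cap}}$, $R_{\mathrm{align}}$) are invented shorthand:

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Hypothetical notation: let $R_{\mathrm{cap}}$ be the event that
% capability technology is rediscovered in the re-run, and
% $R_{\mathrm{align}}$ the event that alignment techniques are rediscovered.
\[
  P(\text{aligned ASI in re-run})
    \approx P(R_{\mathrm{cap}}) \cdot P(R_{\mathrm{align}} \mid R_{\mathrm{cap}}).
\]
% The correlation claim: the more alignment techniques overlap with
% capability techniques (e.g.\ RL-based methods), the closer
% $P(R_{\mathrm{align}} \mid R_{\mathrm{cap}})$ is to 1, so the overall
% chance is dominated by the capability-rediscovery term alone.
\end{document}
```

On this framing, the two questions in the first paragraph correspond to different ways of keeping the conditional term high: retention keeps it high by default, while strong technique overlap keeps it high even after total knowledge loss.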