I think that as systems get more capable, we will see a large increase in our alignment efforts and monitoring of AI systems, even without any further intervention from longtermists.
Maybe so. But I can’t really see mechanistic interpretability being solved to a sufficient degree to detect a situationally aware AI playing the training game, in time to avert doom. Not without a long pause first at least!
Maybe so. But I can’t really see mechanistic interpretability being solved to a sufficient degree to detect a situationally aware AI playing the training game, in time to avert doom. Not without a long pause first at least!