I can’t get past the feeling that all the doom will flow through an asymptote of imperfect alignment. How can scalable alignment ever be watertight enough for x-risk to drop to insignificant levels, especially given the (ML-based) engineering approach suggested? It sounds like formal, verifiable proofs of existential safety won’t ever be an end product of all this. How long do we last in a world like that, where AI capabilities continue improving up to physical limits? Can the acute risk period really be brought to a close this way?