On section 4, where you ask about retaining alignment knowledge:
It feels kind of like you’re mislabelling the ends of the spectrum?
My guess is that rather than asking "how much alignment knowledge is lost?", you should be asking about the differential between how much AI capabilities knowledge is lost and how much alignment knowledge is lost.
I’m not sure that’s quite right either, but it feels a little bit closer?
Okay, looking at the spectrum again, it still seems to me like I've labelled them correctly? Maybe I'm missing something. The optimistic end is that we retain knowledge of how to align AGI, because then we can just use that knowledge later and we don't face the same magnitude of risk from misaligned AI.
Sorry, I didn’t mean mislabelled in terms of having the labels the wrong way around. I meant that the points you describe aren’t necessarily the ends of the spectrum—for instance, worse than just losing all alignment knowledge is losing all the alignment knowledge while keeping all of the knowledge about how to build highly effective AI.
At least that’s what I had in mind at the time of writing my comment. I’m now wondering if it would actually be better to keep the capabilities knowledge, because it makes it easier to do meaningful alignment work as you do the rerun. It’s plausible that this is actually more important than the more explicitly “alignment” knowledge. (Assuming that compute will be the bottleneck.)