Robust alignment requires alignment-relevant intervention during pretraining
I’d say this is the wrong question. Like, I do not expect that any current alignment approach is going to work. If we do ever figure out what works, it will not look like “pretraining” or “post-training”, it will be something completely different.
Although I guess you could call that “pretraining”?
Thanks Michael. We deliberately left post-training out of the statement so that “new paradigm needed” would also count on the “disagree” side of the spectrum. In other words, “disagree” on this question would mean either “post-training is sufficient” or “new paradigms are needed/sufficient”.