Robust alignment requires alignment-relevant intervention during pretraining
Frankly, I neither agree nor disagree with this statement. Robust alignment has nothing to do with the current pretraining regime; it should work with or without it.
If robust alignment is orthogonal to pretraining, shouldn’t that imply strong disagreement with the statement (that robust alignment requires intervention during pretraining)?
I think it’s neither necessary nor sufficient for robust alignment. I’m uncertain whether it’s possible to get some kind of “fragile” alignment from pretraining. I don’t think robust alignment requires it, but neither do I think that it doesn’t. It definitely doesn’t hurt.