Robust alignment requires alignment-relevant intervention during pretraining
I have weak intuitions that this isn’t true, but I can’t yet articulate them