Paolo Bova comments on Community Polls on Alignment Controversies

Paolo Bova 22 Jun 2026 23:06 UTC
Robust alignment requires alignment-relevant intervention during pretraining
I'm interpreting this as saying that a necessary condition for robust alignment is training data that captures good values and discourages bad ones. I think there's good evidence this matters a lot for current systems, so I lean towards agreeing. Still, it seems plausible to me that robust alignment could be achieved with post-training interventions and relatively neutral pretraining setups.
- Miles Tidmarsh 23 Jun 2026 16:08 UTC
  That was the class of intervention we had in mind, though there could be other pretraining interventions that don't fall cleanly into good/bad values (e.g. promoting risk aversion).