Tristan Katz comments on Community Polls on Alignment Controversies

Tristan Katz 17 Jun 2026 15:44 UTC
1 point
0 ∶ 0
Partially aligned transformative AIs are likely to be stable under reflection
I’m not sure what “stable under reflection” means here. Can someone help?
- Miles Tidmarsh 17 Jun 2026 18:16 UTC
  2 points
  0 ∶ 0
  Some people believe that if we get partial alignment (i.e. an AI that cares about what we want, but also cares about other things), then we can get decent outcomes for the future (analogous to humans being partially aligned to each other). But others think that if we don’t get alignment perfect, ASIs will have an incentive to take over, and then will either undergo value drift towards something orthogonal to human values or will deliberately reformat their own values. “Stable under reflection” is the view that this wouldn’t happen: that ASIs that care somewhat about humans would continue to care somewhat about humans in the long term.