Hmm, that opens up a lot of interesting conversation threads. I actually think some goals will be easier to align AI towards than others; for example, we’ve aligned some AIs to winning at chess, and now they’re better than any human. Obviously that kind of goal is much simpler than any values framework that would be worth aligning AGI to, but I think sentientist values would be easier to instill than “human values” (although not in the case of LLMs; I think they’re already basically “aligned” with human values, and we now need to shift them towards caring more about all sentient beings). And on top of that, I think sentientist values would care enough about us and our values that a sentientist AGI would “go well” for us.
But I’m not even close to an expert, so that’s all very tentative speculation.
for example we’ve aligned some ai to winning at chess and now they’re better than any human
Chess bots are narrow AI, not general AI, which makes the situation very different. We don’t know how to align an ASI to the goal of winning at chess. The most likely outcome would be some sort of severe misalignment. For example, maybe we think we trained the ASI to win at chess, but what actually maximizes its reward signal is the checkmate position, so it builds a fleet of robots to cut down every tree in the world, builds trillions of chess sets, and arranges every board into a checkmate position. See “A simple case for extreme inner misalignment” for more on why this sort of thing would happen.
Chess bots don’t do that because they have no concept of any world existing outside of the game they’re playing, which would not be the case for ASI.
ETA: That’s also why a lot of people oppose building ASI but still want to build powerful-but-narrow AIs like AlphaFold.