Given that these failures were predictable, it should be possible to systematically predict many analogous failures that might result from training AI systems on specific data sets or (simulated) environments.
Your framework seems to work for simple cases like "ice cream, sucralose, or sex with contraception", but I don't think it works for more complex cases like "peacocks would like giant colorful tails"?
There is also so much human behaviour that would have been essentially impossible to predict just from first principles and natural selection under constraints: poetry, chess playing, comedy, monasticism, sports, philosophy, effective altruism. These behaviours seem further removed from your detectors for instrumentally important subgoals, and/or to have a more complex relationship to those detectors, but they're still widespread and important parts of human life. This seems to support the argument that the relationship between how a mind was evolved (e.g., by natural selection) and what it ends up wanting is unpredictable, possibly in dangerous ways.
Your model might still tell us that generalisation failures are very likely to occur, even if, as I am suggesting, it can't predict many of the specific ways things will misgeneralise. But I'm not sure this offers much practical guidance when trying to develop safer AI systems. Maybe I'm wrong about that, though?
I think the post The Selfish Machine by Maarten Boudry is relevant to this discussion.
Consider dogs. Canine evolution under human domestication satisfies Lewontin's three criteria: variation, heritability, and differential reproduction. But most dogs are bred to be meek and friendly, the very opposite of selfishness. Breeders ruthlessly select against aggression, and any dog that attacks a human usually faces severe fitness consequences: it is put down, or at least not allowed to procreate. In the evolution of dogs, humans call the shots, not nature. Some breeds, like pit bulls or Rottweilers, are of course selected for aggression (toward other animals, not toward their guardians), but that just goes to show that domesticated evolution depends on breeders' desires.
How can we extend this difference between blind evolution and domestication to the domain of AI? In biology, the defining criterion of domestication is control over reproduction. If humans control an animal's reproduction, deciding who gets to mate with whom, then it's domesticated. If animals escape and regain their autonomy, they're feral. By that criterion, house cats are only partly domesticated, as most moggies roam about unsupervised and choose their own mates, outside of human control. If you apply this framework to AIs, it should be clear that AI systems are still very much in a state of domestication. Selection pressures come from human designers, programmers, consumers, and regulators, not from blind forces. It is true that some AI systems self-improve without direct human supervision, but humans still decide which AIs are developed and released. GPT-4 isn't autonomously spawning GPT-5 after competing in the wild with different LLMs; humans control its evolution.
By and large, current selective pressures for AI are the opposite of selfishness. We want friendly, cooperative AIs that don't harm users or produce offensive content. If chatbots engage in dangerous behavior, like encouraging suicide or enticing journalists to leave their spouse, companies will frantically try to update their models and stamp out the unwanted behavior. In fact, some language models have become so safe, avoiding any sensitive topics or giving anodyne answers, that consumers now complain they are boring. And Google became a laughing stock when its image generator proved to be so politically correct as to produce ethnically diverse Vikings or founding fathers.
Interesting!