I voted "disagree" on this, not because I'm highly confident you are wrong, but because I think things are a lot less straightforward than this. A couple of counterpoints that I think clash with this thesis:
Human morality may be a consequence of evolution, but modern "moral" behaviour often involves acting in ways which have no evolutionary advantage. For example, lots of EAs make significant sacrifices to help people on the other side of the world, who are outside their community and will never have a chance to reciprocate, or to help non-human animals that we evolved to eat. I think there are two ways you can take this: (1) the evolutionary explanation of morality is flawed or incomplete, or (2) evolution has given us some generic ability to feel compassion for others which originally helped us to co-operate more effectively, but is now "misfiring" and leading us to e.g. embrace utilitarianism. I think either explanation is good news for morality in AGIs. Moral behaviour may follow naturally from relatively simple ideas or values that we might expect an AGI to have or adopt (especially if we intentionally try to make this happen).
You draw a distinction between AGI, which is "programmed with a goal and will optimise towards that goal", and humans, who evolved to survive, but actually these processes seem very similar. Evolutionary pressures select for creatures that excel at a single goal (reproducing), in a very similar way to how ML training algorithms like gradient descent select for artificial intelligences that excel at a single goal (minimizing some cost function). But a lot of humans have still ended up adopting goals which don't seem to align with the primary goal (e.g. donating kidneys to strangers, or using contraception), and there's every reason to expect AGI to be the same (I think in AI safety they use the term "mesa-optimization" to describe this phenomenon...?) Now I think in AI safety this is usually talked about as a bad thing. Maybe AGI could end up being a mesa-optimizer for some bad goal that its designer never considered. But it seems like a lot of your argument rests on there being this big distinction between AI training and evolution. If the two things are in fact very similar, then that again seems to be a reason for some optimism. Humans were created through an optimization procedure that optimized for a primary goal, but we now often act in moral ways, even if this conflicts with that goal. Maybe the same could happen for AGIs!
To be clear, I don't think this is a watertight argument that AGIs will be moral; I think it's an argument for just being really uncertain. For example, maybe utilitarianism is a kind of natural idea that any intelligent being who feels some form of compassion might arrive at (this seems very plausible to me), but maybe a pure utilitarian superintelligence would actually be a bad outcome! Maybe we don't want the universe filled with organisms on heroin! Or for everyone else to be sacrificed to an AGI utility monster.
I can see lots of reasons for worry, but I think there are reasons for optimism too.
I appreciate your read and the engagement, thanks.
The issue with assuming AGI will develop morality the way humans did is that humans don't act with strict logical efficiency; we are shaped by a chaotic evolutionary process, not a clean optimisation function. We don't always prioritise survival, and we often behave irrationally (see: the Darwin Awards).
But AGI is not a product of evolution; it's designed to pursue a goal as efficiently as possible. Morality emerged in humans as a byproduct of messy, competing survival mechanisms, not because it was the most efficient way to achieve a single goal. An AGI, by contrast, will be ruthlessly efficient in whatever it's designed to optimise.
Hoping that AGI develops morality despite its inefficiency, and gambling all of human existence on it, seems like a terrible wager to make.
Evolution is chaotic and messy, but so is stochastic gradient descent (the word "stochastic" is in the name!). The optimisation function might be clean, but the process we use to search for optimum models is not.
If AGI emerges from the field of machine learning in the state it's in today, then it won't be "designed" to pursue a goal, any more than humans were designed. Instead, it will emerge from a random process, through billions of tiny updates, and this process will just have been rigged to favour things which do well on some chosen metric.
This seems extremely similar to how humans were created, through evolution by natural selection. In the case of humans, the metric being optimized for was the ability to spread our genes. In AIs, it might be accuracy at predicting the next word, or human helpfulness scores.
The closest things to AGI we have so far do not act with "strict logical efficiency" or always behave rationally. In fact, logic puzzles are one of the things they particularly struggle with!
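To make the stochasticity point concrete, here is a minimal toy sketch (my own illustration, not code from any system discussed here): the objective is a fixed, "clean" function of the parameters, but SGD only ever follows noisy gradient estimates computed on random minibatches, so the path it takes toward that objective is messy.

```python
# Toy sketch: a clean objective searched by a noisy (stochastic) process.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a hidden linear rule plus noise stands in for "the task".
X = rng.normal(size=(1000, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=1000)

def full_loss(w):
    """The clean objective: mean squared error over the whole dataset."""
    return np.mean((X @ w - y) ** 2)

def minibatch_gradient(w, batch_size=32):
    """A noisy gradient estimate from a randomly drawn minibatch."""
    idx = rng.choice(len(X), size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    return 2.0 * Xb.T @ (Xb @ w - yb) / batch_size

w = np.zeros(5)
for step in range(501):
    w -= 0.01 * minibatch_gradient(w)  # each update follows a different random batch
    if step % 100 == 0:
        print(f"step {step:3d}  full-dataset loss = {full_loss(w):.4f}")
```

The same structure holds whether the chosen metric is mean squared error on a toy dataset, as here, or accuracy at predicting the next word: the metric is fixed, but the search that produces the final model is random.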
The key difference is that SGD is not evolution; it's a guided optimisation process. Evolution has no goal beyond survival and reproduction, while SGD explicitly optimises toward a defined function chosen by human designers. Yes, the search process is stochastic, but the selection criteria are rigidly defined in a way that natural selection's are not.
The fact that current AI systems don't act with strict efficiency is not evidence that AGI will behave irrationally; it's just a reflection of their current limitations. If anything, their errors today are an argument for why they won't develop morality by accident: their behaviour is driven entirely by the training data and reward signals they are given. When they improve, they will become better at pursuing those goals, not more human-like.
Yes, if AGI emerges from an attempt to create it simply for its own sake, then it has no real objective. If it emerges from an AI tool being used to optimise something within a business, a government, or a military, then it will have one. I argue in my first essay that this is the real threat AGI poses: when developed in a competitive system, it will disregard safety and morality in order to get a competitive edge.
The crux of the issue is this: humans evolved morality as an unintended byproduct of thousands of competing pressures over millions of years. AGI, by contrast, will be shaped by a much narrower and more deliberate selection process. The randomness in training doesn't mean AGI will stumble into morality; it just means it will be highly optimised for whatever function we define, whether that aligns with human values or not.