I voted "disagree" on this, not because I'm highly confident you are wrong, but because I think things are a lot less straightforward than this. A couple of counterpoints that I think clash with this thesis:
Human morality may be a consequence of evolution, but modern "moral" behaviour often involves acting in ways which have no evolutionary advantage. For example, lots of EAs make significant sacrifices to help people on the other side of the world, who are outside their community and will never have a chance to reciprocate, or to help non-human animals that we evolved to eat. I think there are two ways you can take this: (1) the evolutionary explanation of morality is flawed or incomplete, or (2) evolution has given us some generic ability to feel compassion for others which originally helped us to co-operate more effectively, but is now "misfiring" and leading us to e.g. embrace utilitarianism. I think either explanation is good news for morality in AGIs. Moral behaviour may follow naturally from relatively simple ideas or values that we might expect an AGI to have or adopt (especially if we intentionally try to make this happen).
You draw a distinction between AGI, which is "programmed with a goal and will optimise towards that goal", and humans, who evolved to survive, but actually these processes seem very similar. Evolutionary pressures select for creatures that excel at a single goal (reproducing), in a very similar way to how ML training algorithms like gradient descent select for artificial intelligences that excel at a single goal (minimizing some cost function). But a lot of humans have still ended up adopting goals which don't seem to align with the primary goal (e.g. donating kidneys to strangers, or using contraception), and there's every reason to expect AGI to be the same (I think in AI safety they use the term "mesa-optimization" to describe this phenomenon...?) Now I think in AI safety this is usually talked about as a bad thing. Maybe AGI could end up being a mesa-optimizer for some bad goal that its designer never considered. But it seems like a lot of your argument rests on there being this big distinction between AI training and evolution. If the two things are in fact very similar, then that again seems to be a reason for some optimism. Humans were created through an optimization procedure that optimized for a primary goal, but we now often act in moral ways, even if this conflicts with that goal. Maybe the same could happen for AGIs!
To be clear, I don't think this is a watertight argument that AGIs will be moral; I think it's an argument for just being really uncertain. For example, maybe utilitarianism is a kind of natural idea that any intelligent being who feels some form of compassion might arrive at (this seems very plausible to me), but maybe a pure utilitarian superintelligence would actually be a bad outcome! Maybe we don't want the universe filled with organisms on heroin! Or for everyone else to be sacrificed to an AGI utility monster.
I can see lots of reasons for worry, but I think there are reasons for optimism too.
I appreciate your read and the engagement, thanks.
The issue with assuming AGI will develop morality the way humans did is that humans don't act with strict logical efficiency; we are shaped by a chaotic evolutionary process, not a clean optimisation function. We don't always prioritise survival, and we often behave irrationally (see: the Darwin Awards).
But AGI is not a product of evolution; it's designed to pursue a goal as efficiently as possible. Morality emerged in humans as a byproduct of messy, competing survival mechanisms, not because it was the most efficient way to achieve a single goal. An AGI, by contrast, will be ruthlessly efficient in whatever it's designed to optimise.
Hoping that AGI develops morality despite its inefficiency, and gambling all of human existence on it, seems like a terrible wager to make.
Evolution is chaotic and messy, but so is stochastic gradient descent (the word "stochastic" is in the name!). The optimisation function might be clean, but the process we use to search for optimum models is not.
If AGI emerges from the field of machine learning in the state it's in today, then it won't be "designed" to pursue a goal, any more than humans were designed. Instead, it will emerge from a random process, through billions of tiny updates, and this process will just have been rigged to favour things which do well on some chosen metric.
This seems extremely similar to how humans were created, through evolution by natural selection. In the case of humans, the metric being optimized for was the ability to spread our genes. In AIs, it might be accuracy at predicting the next word, or human helpfulness scores.
The closest things to AGI we have so far do not act with "strict logical efficiency" or always behave rationally. In fact, logic puzzles are one of the things they particularly struggle with!
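To make the stochasticity point concrete, here is a minimal toy sketch (my own illustration, not code from any system discussed here): the objective is a fixed, "clean" function of the parameters, but SGD only ever follows noisy gradient estimates computed on random minibatches, so the path it takes toward that objective is messy.

```python
# Toy sketch: a clean objective searched by a noisy (stochastic) process.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a hidden linear rule plus noise stands in for "the task".
X = rng.normal(size=(1000, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=1000)

def full_loss(w):
    """The clean objective: mean squared error over the whole dataset."""
    return np.mean((X @ w - y) ** 2)

def minibatch_gradient(w, batch_size=32):
    """A noisy gradient estimate from a randomly drawn minibatch."""
    idx = rng.choice(len(X), size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    return 2.0 * Xb.T @ (Xb @ w - yb) / batch_size

w = np.zeros(5)
for step in range(501):
    w -= 0.01 * minibatch_gradient(w)  # each update follows a different random batch
    if step % 100 == 0:
        print(f"step {step:3d}  full-dataset loss = {full_loss(w):.4f}")
```

The same structure holds whether the chosen metric is mean squared error on a toy dataset, as here, or accuracy at predicting the next word: the metric is fixed, but the search that produces the final model is random.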
The key difference is that SGD is not evolution; it's a guided optimisation process. Evolution has no goal beyond survival and reproduction, while SGD explicitly optimises toward a defined function chosen by human designers. Yes, the search process is stochastic, but the selection criteria are rigidly defined in a way that natural selection's are not.
The fact that current AI systems don't act with strict efficiency is not evidence that AGI will behave irrationally; it's just a reflection of their current limitations. If anything, their errors today are an argument for why they won't develop morality by accident: their behaviour is driven entirely by the training data and reward signals they are given. When they improve, they will become better at pursuing those goals, not more human-like.
Yes, if AGI emerges from an attempt to create it simply for its own sake, then it has no real objective. If it emerges from an AI tool being used to optimise something within a business, a government, or a military, then it will have one. I argue in my first essay that this is the real threat AGI poses: when developed in a competitive system, it will disregard safety and morality in order to get a competitive edge.
The crux of the issue is this: humans evolved morality as an unintended byproduct of thousands of competing pressures over millions of years. AGI, by contrast, will be shaped by a much narrower and more deliberate selection process. The randomness in training doesn't mean AGI will stumble into morality; it just means it will be highly optimised for whatever function we define, whether that aligns with human values or not.