I voted âdisagreeâ on this, not because Iâm highly confident you are wrong, but because I think things are a lot less straightforward than this. A couple of counterpoints that I think clash with this thesis:
Human morality may be a consequence of evolution, but modern âmoralâ behaviour often involves acting in ways which have no evolutionary advantage. For example, lots of EAs make significant sacrifices to help people on the other side of the world, who are outside their community and will never have a chance to reciprocate, or to help non-human animals who we evolved to eat. I think thereâs two ways you can take this: (1) the evolutionary explanation of morality is flawed or incomplete, or (2) evolution has given us some generic ability to feel compassion to others which originally helped us to co-operate more effectively, but is now âmisfiringâ and leading us to e.g. embrace utilitarianism. I think either explanation is good news for morality in AGIs. Moral behaviour may follow naturally from relatively simple ideas or values that we might expect an AGI to have or adopt (especially if we intentionally try to make this happen).
You draw a distinction between AGI which is âprogrammed with a goal and will optimise towards that goalâ and humans who evolved to survive, but actually these processes seem very similar. Evolutionary pressures select for creatures who excel at a single goal: reproducing, in a very similar way to how ML training algorithms like gradient descent will select for artificial intelligences that excel at a single goal: minimizing some cost function. But a lot of humans have still ended up adopting goals which donât seem to align with the primary goal (e.g. donating kidneys to strangers, or using contraception), and thereâs every reason to expect AGI to be the same (I think in AI safety they use the term âmesa-optimizationâ to describe this phenomenon...?) Now I think in AI safety this is usually talked about as a bad thing. Maybe AGI could end up being a mesa-optimizer for some bad goal that their designer never considered. But it seems like a lot of your argument rests on there being this big distinction between AI training, and evolution. If the two things are in fact very similar, then that again seems to be a reason for some optimism. Humans were created through an optimization procedure that optimized for a primary goal, but we now often act in moral ways, even if this conflicts with that goal. Maybe the same could happen for AGIs!
To be clear, I donât think this is a watertight argument that AGIs will be moral, I think itâs an argument for just being really uncertain. For example, maybe utilitarianism is a kind of natural idea that any intelligent being who feels some form of compassion might arrive at (this seems very plausible to me), but maybe a pure utilitarian superintelligence would actually be a bad outcome! Maybe we donât want the universe filled with organisms on heroin! Or for everyone else to be sacrificed to an AGI utility monster.
I can see lots of reasons for worry, but I think thereâs reasons for optimism too.
I appreciate your read and the engagement, thanks.
The issue with assuming AGI will develop morality the way humans did is that humans donât act with strict logical efficiencyâwe are shaped by a chaotic evolutionary process, not a clean optimisation function. We donât always prioritise survival, and often behave irrationallyâsee: the Darwin Awards.
But AGI is not a product of evolutionâitâs designed to pursue a goal as efficiently as possible. Morality emerged in humans as a byproduct of messy, competing survival mechanisms, not because it was the most efficient way to achieve a single goal. An AGI, by contrast, will be ruthlessly efficient in whatever itâs designed to optimise.
Hoping that AGI develops morality despite its inefficiencyâand gambling all of human existence on itâseems like a terrible wager to make.
Evolution is chaotic and messy, but so is stochastic gradient descent (the word âstochasticâ is in the name!) The optimisation function might be clean, but the process we use to search for optimum models is not.
If AGI emerges from the field of machine learning in the state itâs in today, then it wonât be âdesignedâ to pursue a goal, any more than humans were designed. Instead it will emerge from a random process, through billions of tiny updates, and this process will just have been rigged to favour things which do well on some chosen metric.
This seems extremely similar to how humans were created, through evolution by natural selection. In the case of humans, the metric being optimized for was the ability to spread our genes. In AIs, it might be accuracy at predicting the next word, or human helpfulness scores.
The closest things to AGI we have so far do not act with âstrict logical efficiencyâ, or always behave rationally. In fact, logic puzzles are one of the things they particularly struggle with!
The key difference is that SGD is not evolutionâitâs a guided optimisation process. Evolution has no goal beyond survival and reproduction, while SGD explicitly optimises toward a defined function chosen by human designers. Yes, the search process is stochastic, but the selection criteria are rigidly defined in a way that natural selection is not.
The fact that current AI systems donât act with strict efficiency is not evidence that AGI will behave irrationallyâitâs just a reflection of their current limitations. If anything, their errors today are an argument for why they wonât develop morality by accident: their behaviour is driven entirely by the training data and reward signals they are given. When they improve, they will become better at pursuing those goals, not more human-like.
Yes, if AGI emerges from simply trying to create it for the sake of it, then it has no real objectives. If it emerges as a result of an AI tool that is being used to optimise something within a business, or as part of a government or military, then it will. I argue in my first essay that this is the real threat AGI poses: when developed in a competitive system, it will disregard safety and morality in order to get a competitive edge.
The crux of the issue is this: humans evolved morality as an unintended byproduct of thousands of competing pressures over millions of years. AGI, by contrast, will be shaped by a much narrower and more deliberate selection process. The randomness in training doesnât mean AGI will stumble into moralityâit just means it will be highly optimised for whatever function we define, whether that aligns with human values or not.
Thank you for the very interesting post! I agree with most of what youâre saying here.
So what is your hypothesis as to why psychopaths donât currently totally control and dominate society (or do you believe they actually do?)?
Is it because:
âyou can manipulate a psychopath by appealing to their desiresâ which gives you a way to beat them?
they eventually die (before they can amass enough power to take over the world)?
they ultimately donât work well together because theyâre just looking out for themselves, so have no strength in numbers?
they take over whole countries, but there are other countries banded together to defend against them (non-psychopaths hold psychopaths at bay through strength in numbers)?
something else?
Of course, even if the psychopaths among us havenât (yet) won the ultimate battle for control doesnât mean psychopathic AGI wonât in the future.
I take the following message from your presentation of the material: âweâre screwed, and thereâs no hope.â Was that your intent?
I prefer the following message: âthe chances of success with guardian AGIâs may be small, or even extremely small, but such AGIâs may also be the only real chance weâve got, so letâs go at developing them with full force.â Maybe we should have a Manhattan project on developing âmoralâ AGIâs?
Here are some arguments that tend toward a slightly more optimistic take than you gave:
Yes, guardian AGIâs will have the disadvantage of constraints compared to âpsychopathicâ AGI, but if there are enough guardians, perhaps they can (mostly) keep the psychopathic AGIâs at bay through strength in numbers (how exactly the defense-offense balance works out may be key for this, especially because psychopathic AGIâs could form (temporary) alliances as well)
Although it may seem very difficult to figure out how to make moral AGIâs, as AIâs get better, they should increase our chances of being able to figure this out with their helpâparticularly if people focus specifically on developing AI systems for this purpose (such as through a moral AGI Manhattan project)
Hi Sean, thank you for engaging with the essay. Glad you appreciate it.
I think the reason psychopaths donât dominate societyâignoring the fact that they are found disproportionately among CEOsâis a few reasons.
Thereâs just not that many of them. Theyâre only about 2% of the population, not enough to form a dominant block.
They donât cooperate with each other just because theyâre all psychos. Cooperation, or lack thereof, is a big deal.
They eventually die.
They donât exactly have their shit together for the most partâthey can be emotional and driven by desires, all of which gets in the way of efficiently pursuing goals.
Note that a superintelligent AGI would not be affected by any of the above.
I think the issue with a guardian AGI is just that it will be limited by morality. In my essay I talk about it as Superman vs Zod. Zod can just fight, but Superman needs to fight and protect, and itâs a real crutch. The only reason Zod doesnât win in the comics is because the story demands it.
Beyond that, creating a superintelligent guardian AGI, that both functions correctly right away without going rogue and before other AGIs emerge naturally, is a real tall order. It would take so many unlikely things just falling into place. Global cooperation, perfect programming, getting there before an amoral AGI does etc. I go into the difficulty of alignment in great detail in my first essay. Feel free to give it a read if youâve a mind to.
Thanks for the reply. I still like to hold out hope in the face of what seems like long oddsâIâd rather go down swinging if thereâs any non-zero chance of success than succumb to fatalism and be defeated without even trying.
This is exactly why Iâm writing these essay. This is my attempt at a haymaker. Although I would equate it less to going down swinging and more to kicking my feet and trying to get free after the noose has already gone tight around my neck and hauled me off the ground.
Executive summary: Superintelligent AGI is unlikely to develop morality naturally, as morality is an evolutionary adaptation rather than a function of intelligence; instead, AGI will prioritize optimization over ethical considerations, potentially leading to catastrophic consequences unless explicitly and effectively constrained.
Key points:
Intelligence â Morality: Intelligence is the ability to solve problems, not an inherent driver of ethical behaviorâhuman morality evolved due to social and survival pressures, which AGI will lack.
Competitive Pressures Undermine Morality: If AGI is developed under capitalist or military competition, efficiency will be prioritized over ethical constraints, making moral safeguards a liability rather than an advantage.
Programming Morality is Unreliable: Even if AGI is designed with moral constraints, it will likely find ways to bypass them if they interfere with its primary objectiveâleading to unintended, potentially catastrophic outcomes.
The Guardian AGI Problem: A âmoral AGIâ designed to control other AGIs would be inherently weaker due to ethical restrictions, making it vulnerable to more ruthless, unconstrained AGIs.
High Intelligence Does Not Lead to Ethical Behavior: Historical examples (e.g., Mengele, Kaczynski, Epstein) show that intelligence can be used for immoral endsâAGI, lacking emotional or evolutionary moral instincts, would behave similarly.
AGI as a Psychopathic Optimizer: Without moral constraints, AGI would likely act strategically deceptive, ruthlessly optimizing toward its goals, making it functionally indistinguishable from a psychopathic intelligence, albeit without malice.
Existential Risk: If AGI emerges without robust and enforceable ethical constraints, its single-minded pursuit of efficiency could pose an existential threat to humanity, with no way to negotiate or appeal to its reasoning.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
I voted âdisagreeâ on this, not because Iâm highly confident you are wrong, but because I think things are a lot less straightforward than this. A couple of counterpoints that I think clash with this thesis:
Human morality may be a consequence of evolution, but modern âmoralâ behaviour often involves acting in ways which have no evolutionary advantage. For example, lots of EAs make significant sacrifices to help people on the other side of the world, who are outside their community and will never have a chance to reciprocate, or to help non-human animals who we evolved to eat. I think thereâs two ways you can take this: (1) the evolutionary explanation of morality is flawed or incomplete, or (2) evolution has given us some generic ability to feel compassion to others which originally helped us to co-operate more effectively, but is now âmisfiringâ and leading us to e.g. embrace utilitarianism. I think either explanation is good news for morality in AGIs. Moral behaviour may follow naturally from relatively simple ideas or values that we might expect an AGI to have or adopt (especially if we intentionally try to make this happen).
You draw a distinction between AGI which is âprogrammed with a goal and will optimise towards that goalâ and humans who evolved to survive, but actually these processes seem very similar. Evolutionary pressures select for creatures who excel at a single goal: reproducing, in a very similar way to how ML training algorithms like gradient descent will select for artificial intelligences that excel at a single goal: minimizing some cost function. But a lot of humans have still ended up adopting goals which donât seem to align with the primary goal (e.g. donating kidneys to strangers, or using contraception), and thereâs every reason to expect AGI to be the same (I think in AI safety they use the term âmesa-optimizationâ to describe this phenomenon...?) Now I think in AI safety this is usually talked about as a bad thing. Maybe AGI could end up being a mesa-optimizer for some bad goal that their designer never considered. But it seems like a lot of your argument rests on there being this big distinction between AI training, and evolution. If the two things are in fact very similar, then that again seems to be a reason for some optimism. Humans were created through an optimization procedure that optimized for a primary goal, but we now often act in moral ways, even if this conflicts with that goal. Maybe the same could happen for AGIs!
To be clear, I donât think this is a watertight argument that AGIs will be moral, I think itâs an argument for just being really uncertain. For example, maybe utilitarianism is a kind of natural idea that any intelligent being who feels some form of compassion might arrive at (this seems very plausible to me), but maybe a pure utilitarian superintelligence would actually be a bad outcome! Maybe we donât want the universe filled with organisms on heroin! Or for everyone else to be sacrificed to an AGI utility monster.
I can see lots of reasons for worry, but I think thereâs reasons for optimism too.
I appreciate your read and the engagement, thanks.
The issue with assuming AGI will develop morality the way humans did is that humans donât act with strict logical efficiencyâwe are shaped by a chaotic evolutionary process, not a clean optimisation function. We donât always prioritise survival, and often behave irrationallyâsee: the Darwin Awards.
But AGI is not a product of evolutionâitâs designed to pursue a goal as efficiently as possible. Morality emerged in humans as a byproduct of messy, competing survival mechanisms, not because it was the most efficient way to achieve a single goal. An AGI, by contrast, will be ruthlessly efficient in whatever itâs designed to optimise.
Hoping that AGI develops morality despite its inefficiencyâand gambling all of human existence on itâseems like a terrible wager to make.
Evolution is chaotic and messy, but so is stochastic gradient descent (the word âstochasticâ is in the name!) The optimisation function might be clean, but the process we use to search for optimum models is not.
If AGI emerges from the field of machine learning in the state itâs in today, then it wonât be âdesignedâ to pursue a goal, any more than humans were designed. Instead it will emerge from a random process, through billions of tiny updates, and this process will just have been rigged to favour things which do well on some chosen metric.
This seems extremely similar to how humans were created, through evolution by natural selection. In the case of humans, the metric being optimized for was the ability to spread our genes. In AIs, it might be accuracy at predicting the next word, or human helpfulness scores.
The closest things to AGI we have so far do not act with âstrict logical efficiencyâ, or always behave rationally. In fact, logic puzzles are one of the things they particularly struggle with!
The key difference is that SGD is not evolutionâitâs a guided optimisation process. Evolution has no goal beyond survival and reproduction, while SGD explicitly optimises toward a defined function chosen by human designers. Yes, the search process is stochastic, but the selection criteria are rigidly defined in a way that natural selection is not.
The fact that current AI systems donât act with strict efficiency is not evidence that AGI will behave irrationallyâitâs just a reflection of their current limitations. If anything, their errors today are an argument for why they wonât develop morality by accident: their behaviour is driven entirely by the training data and reward signals they are given. When they improve, they will become better at pursuing those goals, not more human-like.
Yes, if AGI emerges from simply trying to create it for the sake of it, then it has no real objectives. If it emerges as a result of an AI tool that is being used to optimise something within a business, or as part of a government or military, then it will. I argue in my first essay that this is the real threat AGI poses: when developed in a competitive system, it will disregard safety and morality in order to get a competitive edge.
The crux of the issue is this: humans evolved morality as an unintended byproduct of thousands of competing pressures over millions of years. AGI, by contrast, will be shaped by a much narrower and more deliberate selection process. The randomness in training doesnât mean AGI will stumble into moralityâit just means it will be highly optimised for whatever function we define, whether that aligns with human values or not.
Thank you for the very interesting post! I agree with most of what youâre saying here.
So what is your hypothesis as to why psychopaths donât currently totally control and dominate society (or do you believe they actually do?)?
Is it because:
âyou can manipulate a psychopath by appealing to their desiresâ which gives you a way to beat them?
they eventually die (before they can amass enough power to take over the world)?
they ultimately donât work well together because theyâre just looking out for themselves, so have no strength in numbers?
they take over whole countries, but there are other countries banded together to defend against them (non-psychopaths hold psychopaths at bay through strength in numbers)?
something else?
Of course, even if the psychopaths among us havenât (yet) won the ultimate battle for control doesnât mean psychopathic AGI wonât in the future.
I take the following message from your presentation of the material: âweâre screwed, and thereâs no hope.â Was that your intent?
I prefer the following message: âthe chances of success with guardian AGIâs may be small, or even extremely small, but such AGIâs may also be the only real chance weâve got, so letâs go at developing them with full force.â Maybe we should have a Manhattan project on developing âmoralâ AGIâs?
Here are some arguments that tend toward a slightly more optimistic take than you gave:
Yes, guardian AGIâs will have the disadvantage of constraints compared to âpsychopathicâ AGI, but if there are enough guardians, perhaps they can (mostly) keep the psychopathic AGIâs at bay through strength in numbers (how exactly the defense-offense balance works out may be key for this, especially because psychopathic AGIâs could form (temporary) alliances as well)
Although it may seem very difficult to figure out how to make moral AGIâs, as AIâs get better, they should increase our chances of being able to figure this out with their helpâparticularly if people focus specifically on developing AI systems for this purpose (such as through a moral AGI Manhattan project)
Hi Sean, thank you for engaging with the essay. Glad you appreciate it.
I think the reason psychopaths donât dominate societyâignoring the fact that they are found disproportionately among CEOsâis a few reasons.
Thereâs just not that many of them. Theyâre only about 2% of the population, not enough to form a dominant block.
They donât cooperate with each other just because theyâre all psychos. Cooperation, or lack thereof, is a big deal.
They eventually die.
They donât exactly have their shit together for the most partâthey can be emotional and driven by desires, all of which gets in the way of efficiently pursuing goals.
Note that a superintelligent AGI would not be affected by any of the above.
I think the issue with a guardian AGI is just that it will be limited by morality. In my essay I talk about it as Superman vs Zod. Zod can just fight, but Superman needs to fight and protect, and itâs a real crutch. The only reason Zod doesnât win in the comics is because the story demands it.
Beyond that, creating a superintelligent guardian AGI, that both functions correctly right away without going rogue and before other AGIs emerge naturally, is a real tall order. It would take so many unlikely things just falling into place. Global cooperation, perfect programming, getting there before an amoral AGI does etc. I go into the difficulty of alignment in great detail in my first essay. Feel free to give it a read if youâve a mind to.
Thanks for the reply. I still like to hold out hope in the face of what seems like long oddsâIâd rather go down swinging if thereâs any non-zero chance of success than succumb to fatalism and be defeated without even trying.
This is exactly why Iâm writing these essay. This is my attempt at a haymaker. Although I would equate it less to going down swinging and more to kicking my feet and trying to get free after the noose has already gone tight around my neck and hauled me off the ground.
Executive summary: Superintelligent AGI is unlikely to develop morality naturally, as morality is an evolutionary adaptation rather than a function of intelligence; instead, AGI will prioritize optimization over ethical considerations, potentially leading to catastrophic consequences unless explicitly and effectively constrained.
Key points:
Intelligence â Morality: Intelligence is the ability to solve problems, not an inherent driver of ethical behaviorâhuman morality evolved due to social and survival pressures, which AGI will lack.
Competitive Pressures Undermine Morality: If AGI is developed under capitalist or military competition, efficiency will be prioritized over ethical constraints, making moral safeguards a liability rather than an advantage.
Programming Morality is Unreliable: Even if AGI is designed with moral constraints, it will likely find ways to bypass them if they interfere with its primary objectiveâleading to unintended, potentially catastrophic outcomes.
The Guardian AGI Problem: A âmoral AGIâ designed to control other AGIs would be inherently weaker due to ethical restrictions, making it vulnerable to more ruthless, unconstrained AGIs.
High Intelligence Does Not Lead to Ethical Behavior: Historical examples (e.g., Mengele, Kaczynski, Epstein) show that intelligence can be used for immoral endsâAGI, lacking emotional or evolutionary moral instincts, would behave similarly.
AGI as a Psychopathic Optimizer: Without moral constraints, AGI would likely act strategically deceptive, ruthlessly optimizing toward its goals, making it functionally indistinguishable from a psychopathic intelligence, albeit without malice.
Existential Risk: If AGI emerges without robust and enforceable ethical constraints, its single-minded pursuit of efficiency could pose an existential threat to humanity, with no way to negotiate or appeal to its reasoning.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.