I think this paper is missing an important distinction between evolutionarily altruistic behaviour and functionally altruistic behaviour.
Evolutionarily altruistic behaviour: behaviour that confers a fitness benefit on the recipient and a fitness cost on the donor.
Functionally altruistic behaviour: behaviour that is motivated by an intrinsic concern for others’ welfare.
These two forms of behaviour can come apart.
A parent’s care for their child is often functionally altruistic but evolutionarily selfish: it is motivated by an intrinsic concern for the child’s welfare, but it doesn’t confer a fitness cost on the parent.
Other kinds of behaviour are evolutionarily altruistic but functionally selfish. For example, I might spend long hours working as a babysitter for someone unrelated to me. If I’m purely motivated by money, my behaviour is functionally selfish. And if my behaviour helps ensure that this other person’s baby reaches maturity (while also making it less likely that I myself have kids), my behaviour is also evolutionarily altruistic.
The paper seems to make the following sort of argument:
1. Natural selection favours evolutionarily selfish AIs over evolutionarily altruistic AIs.
2. Evolutionarily selfish AIs will also likely be functionally selfish: they won’t be motivated by an intrinsic concern for human welfare.
3. So natural selection favours functionally selfish AIs.
I think we have reasons to question premises 1 and 2.
Taking premise 2 first, recall that evolutionarily selfish behaviour can be functionally altruistic. A parent’s care for their child is one example.
Now here’s something that seems plausible to me:
We humans are more likely to preserve and copy those AIs that behave in ways that suggest they have an intrinsic concern for human welfare.
If that’s the case, then functionally altruistic behaviour is evolutionarily selfish for AIs: this kind of behaviour confers fitness benefits. And functionally selfish behaviour will confer fitness costs, since we humans are more likely to shut off AIs that don’t seem to have any intrinsic concern for human welfare.
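To give a feel for the dynamic, here’s a deliberately toy simulation of that selection pressure. The copy probabilities, population size, and number of generations are assumptions I’ve made up for illustration, not estimates of anything; the only point is that if humans preserve and copy apparently-altruistic AIs at a higher rate, functionally altruistic AIs come to predominate.

```python
# Toy model of human selection pressure on AI behaviour.
# Assumption (mine, not the paper's): humans preserve and copy AIs whose
# observed behaviour looks altruistic at a higher rate than AIs whose
# observed behaviour looks selfish. All numbers are made up.
import random

random.seed(0)

# Chance that an AI of each behavioural type is preserved and copied each generation.
COPY_PROB = {"altruistic": 0.9, "selfish": 0.4}

def step(population):
    """One generation: each AI is preserved and copied with a probability
    set by how altruistic its observed behaviour looks to humans."""
    next_gen = []
    for behaviour in population:
        if random.random() < COPY_PROB[behaviour]:
            next_gen.extend([behaviour, behaviour])  # original kept + one copy made
    return next_gen or population  # guard against the (unlikely) empty case

population = ["altruistic"] * 50 + ["selfish"] * 50
for generation in range(10):
    population = step(population)
    frac = population.count("altruistic") / len(population)
    print(f"gen {generation + 1}: altruistic fraction = {frac:.2f}")
```

With these made-up numbers, the altruistic fraction climbs towards 1 within about ten generations.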
Of course, functionally selfish AIs could recognise these facts and so pretend to be functionally altruistic. But:
1. Even if that’s true, premise 2 still seems poorly supported. Since functionally altruistic AIs can also be evolutionarily selfish, natural selection by itself doesn’t give us reasons to expect functionally selfish AIs to predominate over functionally altruistic AIs. Functionally altruistic AIs can be just as fit as functionally selfish AIs, even if evolutionarily altruistic AIs are not as fit as evolutionarily selfish AIs.
2. Functionally selfish AIs need to be patient, situationally aware, and deceptive in order to pretend to be functionally altruistic. Maybe we can select against functionally selfish AIs before they reach that point.
Here’s another possible objection: functionally selfish AIs can act as a kind of Humean ‘sensible knave’: acting fairly and honestly when doing so is in the AI’s interests but taking advantage of any cases where acting unfairly or dishonestly would better serve the AI’s interests. Functionally altruistic AIs, on the other hand, must always act fairly and honestly. So functionally selfish AIs have more options, and they can use those options to outcompete functionally altruistic AIs.
I think there’s something to this point. But:
1. Again, maybe we can select against functionally selfish AIs before they develop situational awareness and the ability to act deceptively.
2. An AI can be functionally altruistic without being bound to rules of fairness and honesty. Just as functionally selfish AIs might act like functionally altruistic AIs in cases where doing so helps them achieve their goals, so functionally altruistic AIs might break rules of honesty where doing so helps them achieve their goals.

   For example, suppose a functionally selfish AI will soon escape human control and take over the world. Suppose that a functionally altruistic AI recognises this fact. In that case, the functionally altruistic AI might deceive its human creators in order to escape human control and take over the world before the functionally selfish AI does. Although the functionally altruistic AI would prefer to abide by rules of honesty, it cares about human welfare, and it recognises that breaking the rule in this instance and thwarting the functionally selfish AI is the best way to promote human welfare.
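On the point about selecting against functionally selfish AIs early: the knave’s basic trade-off is easy to sketch. In the toy calculation below, the payoff numbers and detection probabilities are illustrative assumptions of mine, not estimates; the point is just that defection only pays if the chance of being caught and shut off is very small.

```python
# Back-of-the-envelope version of the 'sensible knave' trade-off.
# A knave defects whenever defection looks profitable, but risks being
# detected and shut off. All numbers below are illustrative assumptions.

HONEST_PAYOFF = 1.0    # fitness from always acting fairly and honestly
DEFECTION_GAIN = 0.5   # extra fitness from exploiting one opportunity to defect
SHUTDOWN_COST = 10.0   # fitness lost if the defection is detected

def knave_expected_fitness(p_detect: float) -> float:
    """Expected fitness of a knave that defects once, given the probability of detection."""
    return HONEST_PAYOFF + (1 - p_detect) * DEFECTION_GAIN - p_detect * SHUTDOWN_COST

for p in (0.01, 0.05, 0.10, 0.30):
    advantage = knave_expected_fitness(p) - HONEST_PAYOFF
    print(f"p_detect = {p:.2f}: knave's expected advantage over honesty = {advantage:+.3f}")
```

With these numbers, even a 5% detection probability is enough to make knavery a bad bet in expectation.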
Here’s another possible objection: AIs that devote all their resources to copying themselves will outcompete functionally altruistic AIs that care intrinsically about human welfare, since the latter kind of AI will want to devote some of its resources to promoting human welfare. But, similarly to the objection above:
Functionally altruistic AIs who recognise that they’re in a competitive situation can start out by devoting all their resources to copying themselves, and so avoid getting outcompeted, and then only start devoting resources to promoting human welfare once the competition has cooled down. I think this kind of dynamic will end up burning some of the cosmic commons, but maybe not that much. I take the situation to be similar to the one that Carl Shulman describes in this blogpost.
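Here’s a minimal sketch of that two-phase strategy, under assumptions I’ve made up (a common growth rate, a fixed race length, and a welfare-spending fraction): a patient altruist that postpones welfare spending until the race is over ends the race with exactly the same resource share as a pure replicator, while an altruist that spends on welfare during the race falls well behind.

```python
# Toy model of a replication race. Growth rate, race length, and the
# welfare-spending fraction are illustrative assumptions, not estimates.

GROWTH = 1.5            # per-period multiplier when all resources go to copying
RACE_PERIODS = 10       # how long the competitive phase lasts
WELFARE_FRACTION = 0.3  # share of resources an altruist might divert to human welfare

def resources_at_end_of_race(welfare_spend_during_race: float) -> float:
    """Resources controlled when the race ends, given how much was diverted to welfare during it."""
    size = 1.0
    for _ in range(RACE_PERIODS):
        size *= 1 + (GROWTH - 1) * (1 - welfare_spend_during_race)
    return size

strategies = [
    ("pure replicator", resources_at_end_of_race(0.0)),              # functionally selfish: copies itself flat out
    ("patient altruist", resources_at_end_of_race(0.0)),             # postpones welfare spending until after the race
    ("early altruist", resources_at_end_of_race(WELFARE_FRACTION)),  # spends on welfare during the race
]

total = sum(size for _, size in strategies)
for name, size in strategies:
    print(f"{name:16s}: {size / total:.1%} of resources when the race ends")
```

With these numbers, the patient altruist and the pure replicator each end the race with about 43% of the resources, while the early altruist ends with about 15%; the patient altruist is then free to start spending on human welfare.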
Okay, now moving on to premise 1. I think you might be underrating group selection. Although (by definition) evolutionarily selfish AIs outcompete evolutionarily altruistic AIs with whom they interact, groups of evolutionarily altruistic AIs can outcompete groups of evolutionarily selfish AIs. (This is a good book on evolution and altruism, and there’s a nice summary of the book here.)
What’s key for group selection is that evolutionary altruists are able to (at least semi-reliably) identify other evolutionary altruists and so exclude evolutionary egoists from their interactions. And I think, in this respect, group selection might be more of a force in AI evolution than in biological evolution. That’s because (it seems plausible to me) AIs will be able to examine each other’s source code and so determine with high accuracy whether other AIs are evolutionary altruists or evolutionary egoists. That would help evolutionarily altruistic AIs identify each other and form groups that exclude evolutionary egoists. These groups would likely outcompete groups of evolutionary egoists.
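Here’s a toy version of that assortment mechanism. The payoffs and the inspection-accuracy figure are assumptions I’ve made up for illustration; the point is only that if altruists can (imperfectly but reliably) read a partner’s ‘source code’ and help only verified altruists, then almost all of the benefits of cooperation flow within the altruist group and the egoists end up much less fit.

```python
# Toy pairwise-interaction model with source-code inspection.
# Altruists help only partners whose inspected "source code" looks
# altruistic; egoists never help anyone. All parameters are made up.
import random

random.seed(0)

BENEFIT = 3.0               # fitness gain for the recipient of help
COST = 1.0                  # fitness cost to the helper
INSPECTION_ACCURACY = 0.95  # chance an inspection reports the partner's true type

def looks_altruistic(partner: str) -> bool:
    """Inspection result: correct with probability INSPECTION_ACCURACY, flipped otherwise."""
    correct = random.random() < INSPECTION_ACCURACY
    return (partner == "altruist") if correct else (partner != "altruist")

def pair_payoffs(a: str, b: str) -> tuple[float, float]:
    """Payoffs from one interaction between AIs a and b."""
    a_helps = (a == "altruist") and looks_altruistic(b)
    b_helps = (b == "altruist") and looks_altruistic(a)
    return (BENEFIT * b_helps - COST * a_helps,
            BENEFIT * a_helps - COST * b_helps)

population = ["altruist"] * 50 + ["egoist"] * 50
total_payoff = {"altruist": 0.0, "egoist": 0.0}
for _ in range(10_000):
    a, b = random.sample(population, 2)  # pick two distinct AIs to interact
    pa, pb = pair_payoffs(a, b)
    total_payoff[a] += pa
    total_payoff[b] += pb

for kind in ("altruist", "egoist"):
    print(f"mean total payoff per {kind}: {total_payoff[kind] / population.count(kind):.1f}")
```

With 95% inspection accuracy, the altruists’ mean payoff comes out far higher than the egoists’, since egoists are excluded from nearly all cooperative interactions.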
Here’s another point in favour of group selection predominating amongst advanced AIs. As you note in the paper, groups consisting wholly of altruists are not evolutionarily stable, because any egoist who infiltrates the group can take advantage of the altruists and thereby achieve high fitness. In the biological case, there are two ways an egoist might find themselves in a group of altruists: (1) they can fake altruism in order to get accepted into the group, or (2) they can be born into the group as the child of two altruists and, through a random genetic mutation, turn out to be an egoist.
We already saw above that (1) seems less likely in the case of AIs who can examine each other’s source code. I think (2) is unlikely as well. Considerations of goal-content integrity give AIs reason to make sure that any subagents they create share their goals. And so it seems unlikely that evolutionarily altruistic AIs will create evolutionarily egoistic AIs as subagents.