Here’s my just-so story for how humans evolved impartial altruism, in several steps:

1. Kin selection evolves, for reasons related to how DNA is passed on: helping relatives propagates the genes you share with them (formalized below). This selects for the precursors to altruism.
2. With the ability to recognise individuals and a long-term memory for keeping track of them, species can evolve stable pairwise reputations.
3. This lets reciprocity evolve on top of kin selection, because reputations let you keep track of who’s likely to reciprocate versus defect.
4. More advanced communication allows larger groups to rapidly synchronise reputations. Precursors of this range from “eavesdropping” and “triadic awareness”[1] all the way up to what we know as “gossip”.
5. This leads to indirect reciprocity: when you cheat one person, it affects everybody’s willingness to trade with you (sketched in the toy model below).
6. The proxies human brains generalise on have a kind of inertia. This seems to be a combination of memetic evolution plus specific facts about how quickly brains generalise.
7. If altruistic reputation is a stable proxy for long enough, the meme stays in social equilibrium even past the point where it benefits individual genetic fitness.
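For step 1, the textbook formalization is Hamilton’s rule (a standard result I’m adding for concreteness, not something from the original comment): an allele for helping spreads when

$$r\,b > c,$$

where $r$ is the genetic relatedness between actor and recipient, $b$ is the fitness benefit to the recipient, and $c$ is the fitness cost to the actor. Helping a full sibling ($r = 1/2$) pays whenever the benefit is more than twice the cost.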
In sum, I think impartial altruism (e.g. EA) is the result of “overgeneralising” the notion of indirect reciprocity, such that you end up wanting to help everybody everywhere.[2] And I’m skeptical a randomly drawn AI will meet the same requirements for that to happen to them.
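To make steps 2–5 concrete, here’s a minimal toy simulation, my own sketch in the spirit of Nowak & Sigmund’s image-scoring models rather than anything from the original comment. The population split, payoff values, and the “help only partners in good standing” rule are all illustrative assumptions:

```python
import random

# Toy model of indirect reciprocity via a shared reputation score
# ("image scoring", in the spirit of Nowak & Sigmund 1998).
# All names and parameters here are illustrative assumptions.

N = 50            # population size
ROUNDS = 10_000   # random donor/recipient encounters
BENEFIT, COST = 3.0, 1.0

reputation = [0] * N        # one public score per agent, synchronised by "gossip"
payoff = [0.0] * N
defectors = set(range(45, 50))   # 5 agents who never help anyone

for _ in range(ROUNDS):
    donor, recipient = random.sample(range(N), 2)
    # Discriminators consult the recipient's public reputation before helping.
    helps = donor not in defectors and reputation[recipient] >= 0
    if helps:
        payoff[donor] -= COST
        payoff[recipient] += BENEFIT
        reputation[donor] += 1   # the good deed is observed by everyone
    else:
        reputation[donor] -= 1   # cheat one partner, lose standing with all

def mean_payoff(group):
    return sum(payoff[i] for i in group) / len(group)

print("discriminators:", round(mean_payoff(set(range(45))), 2))
print("defectors:     ", round(mean_payoff(defectors), 2))
```

Run it and the defectors’ reputations quickly go negative, after which nobody helps them: cheating anyone costs them everyone’s cooperation. One known wrinkle in this scoring rule is that refusing to help a cheater also dents your own reputation, which is why later models distinguish “justified” defection; the toy version holds up here only because discriminators are the large majority.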
[1] “White-faced capuchin monkeys show triadic awareness in their choice of allies”:
“...contestants preferentially solicited prospective coalition partners that (1) were dominant to their opponents, and (2) had better social relationships (higher ratios of affiliative/cooperative interactions to agonistic interactions) with themselves than with their opponents.”
You can get allies by being nice, but only if you’re also dominant.
[2] For me, it’s not primarily about human values. It’s about altruistic values: whatever anything cares about, I care about that in proportion to how much they care about it.
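One way to formalize that footnote (my own gloss; the weights $c_i$ are my notation, not the author’s): take

$$U_{\text{me}} = \sum_i c_i\,U_i,$$

summing over all agents $i$, where $U_i$ is whatever $i$ cares about (its preferences) and $c_i$ scales with how strongly $i$ cares. Every agent enters the sum symmetrically, which is what makes the resulting altruism impartial.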
“I’m skeptical a randomly drawn AI will meet the same requirements for that to happen to them.”

Thanks for that story! I think 100% alignment with human values would be better than random values, but a superintelligent AI would presumably be trained on human data, so it would be somewhat aligned with human values. I also wonder about the extent to which a superintelligent AI’s values could change, hopefully for the better (as human values have).
I also have specific just-so stories for why human values have undergone “moral circle expansion” over time, and I’m not optimistic that process will continue indefinitely unless intervened on.
Anyway, these are important questions!