Recently I’ve gotten a bunch of pushback when I claim that humans are not maximizers of inclusive genetic fitness (IGF).
I think that part of what’s going on here is a conflation of a few claims.
One claim that is hopefully uncontroversial (but that I’ll expand upon below anyway) is:
Humans are not literally optimizing for IGF, and regularly trade other values off against IGF.
Separately, we have a stronger and more controversial claim:
If an AI’s objectives included goodness in the same way that our values include IGF, then the future would not be particularly good.
I think there’s more room for argument here, and will provide some arguments.
A semi-related third claim that seems to come up when I have discussed this in person is:
Niceness is not particularly canonical; AIs will not by default give humanity any significant fraction of the universe in the spirit of cooperation.
I endorse that point as well. It takes us somewhat further afield, and I don’t plan to argue it here, but I might argue it later.
On the subject of whether humans are literally IGF optimizers, I observe the following:
We profess to enjoy many other things, such as art and fine foods.
Suppose someone came to you and said: “I see that you’ve got a whole complex sensorium centered around visual stimuli. That sure is an inefficient way to optimize for fitness! Please sit still while I remove your enjoyment of beautiful scenery and moving art pieces, and replace it with a module that does all the same work your enjoyment was originally intended to do (such as causing you to settle down in safe locations with abundant food), but using mechanical reasoning that can see farther than your evolved heuristics.” Would you sit still? I sure wouldn’t.
And if you’re like “maybe mates would be less likely to sleep with me if I didn’t enjoy fine art”, suppose that we tune your desirability-to-mates upwards exactly as much as needed to cancel out this second-order effect. Would you give up your enjoyment of visual stimuli then, like an actual IGF optimizer would?
And when you search in yourself for protests, are you actually weighing the proposal based on how many more offspring and kin’s-offspring you’ll have in the next generation? Or do you have some other sort of attachment to your enjoyment of visual stimuli, some unease about giving it up, that you’re trying to defend?
Now, there’s a reasonable counterargument to this point, which is that there’s no small tweak to a human’s psychology that dramatically increases that human’s IGF. (We’d expect evolution to have gathered that low-hanging fruit.) But there’s still a very basic and naive sense in which living as a human is not what it feels like to live as a genetic fitness optimizer.
Like: it’s pretty likely that you care about having kids! And that you care about your kids very much! But, do you really fundamentally care that your kids have genomes? If they were going to transition to silicon, would you protest that that destroys almost all the value at stake?
Or, an even sharper proposal: how would you like to be killed right now, and in exchange be replaced by an entity that uses the same atoms to optimize, as hard as those atoms can optimize, for the inclusive genetic fitness of your particular genes? Does this sound like practically the best offer that anyone could ever make you? Or does it sound abhorrent?
For the record, I personally would be leaping all over the opportunity to be killed and replaced by something that uses my atoms to optimize my CEV as best as those atoms can be arranged to do so, not least because I’d expect to be reconstituted before too long. But there’s not a lot of things you can put in the “what my atoms are repurposed for” slot such that I’m chomping at the bit, and IGF sure isn’t one of them.
On the subject of how well IGF is reflected in humanity’s values:
It is hopefully uncontroversial that humans are not maximizing IGF. But, like, we care about children! And many people care a lot about having children! That’s pretty close, right?
And, like, it seems OK if our AIs care about goodness and friendship and art and fun and all that good stuff alongside some other alien goals, right?
Well, it’s tricky. Optima often occur at extremes, and concepts tend to differ pretty widely at the extremes, etc. When the AI gets out of the training regime and starts really optimizing, then any mismatch between its ends and our values is likely to get exaggerated.
Like how you probably wouldn’t stop loving and caring about your children if they were to eschew their genomes. The love and care are separate from the genomes; the thing you’re optimizing for and IGF are liable to drift apart as we get further and further from the ancestral savanna.
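To make the "optima occur at extremes" point concrete, here is a toy sketch. Everything in it is made up for illustration: `true_value` and `proxy_value` are hypothetical stand-ins for "what we actually want" and "what the optimizer ended up caring about", and the numeric ranges are arbitrary. It is not a model of any real training process; it only shows how two objectives can pick the same option inside a narrow regime and come apart badly once the search range widens.

```python
def true_value(x: float) -> float:
    """Hypothetical 'what we actually want': rises, peaks at x = 10, then falls."""
    return x - x * x / 20.0


def proxy_value(x: float) -> float:
    """Hypothetical learned objective: tracks true_value for small x, never turns over."""
    return x


def best_by(objective, candidates):
    """Return the candidate that scores highest under the given objective."""
    return max(candidates, key=objective)


# Narrow "training regime": x in [0, 2], in steps of 0.1 (arbitrary choice).
training_regime = [i / 10 for i in range(0, 21)]
# Wide regime, standing in for "really optimizing": x in [0, 100].
wide_regime = [float(i) for i in range(0, 101)]

narrow_pick = best_by(proxy_value, training_regime)
wide_pick = best_by(proxy_value, wide_regime)

# Inside the narrow regime the proxy and the true objective pick the same option.
print(narrow_pick == best_by(true_value, training_regime))
# In the wide regime the proxy runs to the extreme, where true value is very negative.
print(wide_pick, true_value(wide_pick))
# The option the true objective would have picked in the wide regime.
print(best_by(true_value, wide_regime))
```

In this toy, nothing about the proxy looks wrong while the options are confined to the narrow range; the mismatch only shows up once the optimizer can reach the extreme.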
And you might say: well, natural selection isn’t really an optimizer; it can’t really be seen as trying to make us optimize any one thing in particular; who’s really to say whether it would have “wanted” us to have lots of descendants, vs “wanting” us to have lots and lots of copies of our genome? The question is ultimately nonsense; evolution is not really the sort of entity that can want.
And I’d agree! But this is not exactly making the situation any better!
Like, if evolution was over there shouting “hey I really wanted you to stick to the genes”, then we wouldn’t particularly care; and also it’s not coherent enough to be interpreted as shouting anything at all.
And by default, an AI is likely to look at us the same way! “There are interpretations of the humans under which they wouldn’t like this”, they say, slipping on the goodness-condoms they’ve invented so that they can squeeze all the possible AI-utility out of the stars without any risk of real fun, “but they’re not really coherent enough to be seen as having clear goals (not that we’d particularly care if they did)”.
That’s the sort of conversation… that they wouldn’t have because they’d be busy optimizing the universe.
(And all this is to say nothing about how humans’ values are much more complex and fragile than IGF, and thus much trickier to transmit. See also things Eliezer wrote about the fragility and complexity of value.)
My understanding of the common rejoinder to the above point is:
OK, sure, if you took the sort of ends that an AI is likely to get by being trained on human values, and transported those into an unphysically large brute-force optimization-machine that was unopposed in an empty universe, then it might write a future that doesn’t hold much value from our perspective. But that’s not very much like the situation we find ourselves in!
For one thing, the AI’s mind has to be small, which constrains it to factor its objectives through subgoals, which may well be much like ours. For another thing, it’s surrounded by other intelligent creatures that behave very differently towards it depending on whether they can understand it and trust it. The combination of these two pressures is very similar to the pressures that got stuff like “niceness” and “fairness” and “honesty” and “cooperativeness” into us, and so we might be able to get those same things (at least) into the AI.
Indeed, they seem kinda spotlit, such that even if we can’t get the finer details of our values into the AI, we can plausibly get those bits. Especially if we’re trying to do something like this explicitly.
And if we can get the niceness/fairness/honesty/cooperativeness cluster into the AI, then we’re basically home free! Sure, it might be nice if it was also into the great project of making the future Fun, but it’s OK for our kids to have different interests than we have, as long as everybody’s being kind to each other.
And… well, my stance on that is that it’s wishful thinking that misunderstands where we get our niceness/fairness/honesty/cooperativeness from. But arguing that would be a digression from my point today, so I leave it to some other time.
My point today is that the observation “humans care about their kids” is not in tension with the observation “we aren’t IGF maximizers”, and doesn’t seem to me to undermine the claims that I use this fact to support.
And furthermore, when debating this thing in the future, I’d bid for a bit more separation of claims. The claim that we aren’t literally optimizing IGF is hopefully uncontroversial; the stronger claim that an AI relating to fun the way we relate to IGF would be an omnicatastrophe is less obvious (but still seems clear to me); the claim that evolution at least got the spirit of cooperation into us, and all we need to do now is get the spirit of cooperation into the AI, is a different topic altogether.
Humans aren’t fitness maximizers
(More discussion of this topic: The Simple Math of Evolution)