I think the fact that people are partial to humanity explains a large fraction of the disagreement people have with me.
Maybe, it’s hard for me to know. But I predict most of the pushback you’re getting from relatively thoughtful longtermists isn’t due to this.
I’ve noticed that EAs are happy to concede that AIs could be moral patients, but are generally reluctant to admit AIs as moral agents, in the way they’d be happy to accept humans as independent moral agents (e.g. newborns) into our society.
I agree with this.
I’d call this “being partial to humanity”, or at least, “being partial to the values of the human species”.
I think “being partial to humanity” is a bad description of what’s going on because (e.g.) these same people would be considerably more on board with aliens. I think the main thing going on is that people have some (probably mistaken) levels of pessimism about how AIs would act as moral agents which they don’t have about (e.g.) aliens.
To test this hypothesis, I recently asked three questions on Twitter about whether people would be willing to accept immigration through a portal to another universe from three sources:
1. “a society of humans who are very similar to us”
2. “a society of people who look & act like humans, but each of them only cares about their family”
3. “a society of people who look & act like humans, but they only care about maximizing paperclips”
...
I claim there just aren’t really any defensible reasons to maintain this choice other than by implicitly appealing to a partiality towards humanity.
This comparison seems to me to be missing the point. Minimally I think what’s going on is not well described as “being partial to humanity”.
Here’s a comparison I prefer:
1. A society of humans who are very similar to us.
2. A society of humans who are very similar to us in basically every way, except that they have a genetically caused and strong terminal preference for maximizing the total expected number of paper clips (over the entire arc of history) and only care about other things instrumentally. They are sufficiently committed to paper clip maximization that this will persist on arbitrary reflection (e.g. they’d lock in this view immediately when given this option), and let’s also suppose that this view is transmitted genetically and in a gene-drive-y way such that all of their descendants will also only care about paper clips. (You can change paper clips to basically anything else which is broadly recognized to have no moral value on its own, e.g. gold twisted into circles.)
3. A society of beings (e.g. aliens) who are extremely different from humans in basically every way, except that they also have something pretty similar to the concepts of “morality”, “pain”, “pleasure”, “moral patienthood”, “happiness”, “preferences”, “altruism”, and “careful reasoning about morality (moral thoughtfulness)”. The society overall also has a roughly similar relationship with these concepts (e.g. the level of “altruism” is similar). (Note that having the same relationship as humans to these concepts is a pretty low bar! Humans aren’t that morally thoughtful!)
I think I’m almost equally happy with (1) and (3) on this list and quite unhappy with (2).
If you changed (3) to instead be “considerably more altruistic”, I would prefer (3) over (1).
I think it seems weird to describe my views on the comparison I just outlined as “being partial to humanity”: I actually prefer (3) over (2) even though (2) are literally humans!
(Also, I’m not that committed to having concepts of “pain” and “pleasure”, but I’m relatively committed to having concepts which are something like “moral patienthood”, “preferences”, and “altruism”.)
Below is a mild spoiler for a story by Eliezer Yudkowsky:
To make the above comparison about different beings more concrete, in the case of Three Worlds Collide, I would basically be fine giving the universe over to the super-happies relative to humans (I think it’s mildly better than humans?), and I think it seems only mildly worse than humans to hand it over to the baby-eaters. In both cases, I’m pricing in some amount of reflection and uplifting which doesn’t happen in the actual story of Three Worlds Collide, but would likely happen in practice. That is, I’m imagining seeing these societies prior to their singularity and then, based on just observations of their societies at this point, deciding how good they are (pricing in the fact that the society might change over time).
To be clear, it seems totally reasonable to call this “being partial to some notion of moral thoughtfulness about pain, pleasure, and preferences”, but these concepts don’t seem that “human” to me. (I predict these occur pretty frequently in evolved life that reaches a singularity for instance. And they might occur in AIs, but I expect misaligned AIs which seize control of the world are worse from my perspective than if humans retain control.)
When I say that people are partial to humanity, I’m including an irrational bias towards thinking that humans, or evolved beings, are unusually thoughtful or ethical compared to the alternatives (I believe this is in fact an irrational bias, since the arguments I’ve seen for thinking that unaligned AIs will be less thoughtful or ethical than aliens seem very weak to me).
In other cases, when people irrationally hold a certain group X to a higher standard than a group Y, it is routinely described as “being partial to group Y over group X”. I think this is just what “being partial” means, in an ordinary sense, across a wide range of cases.
For example, if I proposed aligning AI to my local friend group, with the explicit justification that I thought my friends are unusually thoughtful, I think this would be well-described as me being “partial” to my friend group.
To the extent you’re seeing me as saying something else about how longtermists view the argument, I suspect you’re reading me as saying something stronger than what I originally intended.
In that case, my main disagreement is with the idea that your Twitter poll is evidence for your claims.
More specifically:
I claim there just aren’t really any defensible reasons to maintain this choice other than by implicitly appealing to a partiality towards humanity.
Like you claim there aren’t any defensible reasons to think that what humans will do is better than literally maximizing paper clips? This seems totally wild to me.
Like you claim there aren’t any defensible reasons to think that what humans will do is better than literally maximizing paper clips?
I’m not exactly sure what you mean by this. There were three options, and human paperclippers were only one of these options. I was mainly discussing the choice between (1) and (2) in the comment, not between (1) and (3).
Here’s my best guess at what you’re saying: it sounds like you’re repeating that you expect humans to be unusually altruistic or thoughtful compared to an unaligned alternative. But the point of my previous comment was to state my view that this bias counted as “being partial towards humanity”, since I view the bias as irrational. In light of that, what part of my comment are you objecting to?
To be clear, you can think the bias I’m talking about is actually rational; that’s fine. But I just disagree with you for pretty mundane reasons.
[Incorporating what you said in the other comment]
Also, to be clear, I agree that the question of “how much worse/better is it for AIs to get vast amounts of resources without human society intending to grant those resources to the AIs from a longtermist perspective” is underinvestigated, but I think there are pretty good reasons to systematically expect human control to be a decent amount better.
Then I think it’s worth concretely explaining what these reasons are to believe that human control will be a decent amount better in expectation. You don’t need to write this up yourself, of course. I think the EA community should write these reasons up, because I currently view the proposition as non-obvious: despite being a critical belief in AI risk discussions, it’s usually asserted without argument. When I’ve pressed people in the past, they typically give very weak reasons.
I don’t know how to respond to an argument whose details are omitted.
Then I think it’s worth concretely explaining what these reasons are to believe that human control will be a decent amount better in expectation. You don’t need to write this up yourself, of course.
+1, but I don’t generally think it’s worth counting on “the EA community” to do something like this. I’ve been vaguely trying to pitch Joe on doing something like this (though there are probably better uses of his time), and his recent blog posts are touching on similar topics.
Also, it’s usually only a crux for longtermists, which is probably one of the reasons why no one has gotten around to this.
You didn’t make this clear, so I was just responding generically.
Separately, I think I feel a pretty similar intuition for case (2): people literally only caring about their families seems pretty clearly worse.
Here’s my best guess at what you’re saying: it sounds like you’re repeating that you expect humans to be unusually altruistic or thoughtful compared to an unaligned alternative.
There, I’m just saying that human control is better than literal paperclip maximization.
This response still seems underspecified to me. Is the default unaligned alternative paperclip maximization in your view? I understand that Eliezer Yudkowsky has given arguments for this position, but it seems like you diverge significantly from Eliezer’s general worldview, so I’d still prefer to hear this take spelled out in more detail from your own point of view.
Your poll says:
“a society of people who look & act like humans, but they only care about maximizing paperclips”
And then you say:
so far my followers, who are mostly EAs, are much more happy to let the humans immigrate to our world, compared to the last two options. I claim there just aren’t really any defensible reasons to maintain this choice other than by implicitly appealing to a partiality towards humanity.
So, I think more human control is better than more literal paperclip maximization, the option given in your poll.
My overall position isn’t that the AIs will certainly be paperclippers, I’m just arguing in isolation about why I think the choice given in the poll is defensible.
I have the feeling we’re talking past each other a bit. I suspect talking about this poll was kind of a distraction. I personally have the sense of trying to convey a central point, and instead of getting the point across, I feel the conversation keeps slipping into talking about how to interpret minor things I said, which I don’t see as very relevant.
I will probably take a break from replying for now, for these reasons, although I’d be happy to catch up some time and maybe have a call to discuss these questions in more depth. I definitely see you as trying a lot harder than most other EAs in trying to make progress on these questions collaboratively with me.
I’d be very happy to have some discussion on these topics with you Matthew. For what it’s worth, I really have found much of your work insightful, thought-provoking, and valuable. I think I just have some strong, core disagreements on multiple empirical/epistemological/moral levels with your latest series of posts.
That doesn’t mean I don’t want you to share your views, or that they’re not worth discussion, and I apologise if I came off as too hostile. An open invitation to have some kind of deeper discussion stands.[1]
[1] I’d like to try out the new dialogue feature on the Forum, but that’s a weak preference.
Agreed, sorry about that.
Also, to be clear, I agree that the question of “how much worse/better is it for AIs to get vast amounts of resources without human society intending to grant those resources to the AIs from a longtermist perspective” is underinvestigated, but I think there are pretty good reasons to systematically expect human control to be a decent amount better.