Does AI risk “other” the AIs?
(Cross-posted from my website. Podcast version here, or search for “Joe Carlsmith Audio” on your podcast app.
This essay is part of a series I’m calling “Otherness and control in
the age of AGI.” I’m hoping that the individual essays can be read
fairly well on their own, but see here for a brief summary of the essays
that have been released thus far.)
In my last essay, I discussed the way in which what I’ve called “deep
atheism” (that is, a fundamental mistrust towards both “Nature” and
“bare intelligence”) can prompt an aspiration to exert extreme levels of
control over the universe; I highlighted the sense in which both humans
and AIs, on Yudkowsky’s AI risk narrative, are animated by this sort
of aspiration; and I discussed some ways in which our civilization has
built up wariness around control-seeking of this kind. I think we should
be taking this sort of wariness quite seriously.
In this spirit, I want to look, in this essay, at Robin Hanson’s
critique of the AI risk discourse – a critique especially attuned to the
way in which this discourse risks control-gone-wrong. In particular,
I’m interested in Hanson’s accusation that AI risk “others” the AIs (see
e.g.
here,
here,
and
here).
    Hearing the claim that AIs may eventually differ greatly from us, and become very capable, and that this could possibly happen fast, tends to invoke our general fear-of-difference heuristic. Making us afraid of these “others” and wanting to control them somehow … “Hate” and “intolerance” aren’t overly strong terms for this attitude.[1]
Hanson sees this vice as core to the disagreement (“my best one-factor model to explain opinion variance here is this: some of us ‘other’ the AIs more”). And he invokes a deep lineage of liberal ideals in opposition.
I think he’s right to notice a tension in this vicinity. AI risk is, indeed, about fearing some sort of uncontrolled other. But is that always the bad sort of “othering”?
Some basic points up front
Well, let’s at least avoid basic mistakes/misunderstandings. For one:
hardcore AI risk folks like Yudkowsky are generally happy to care about
AI welfare – at least if welfare means something like “happy
sentience.” And pace some of Hanson’s accusations of bio-chauvinism,
these folks are extremely not fussed about the fact that AI minds
are made of silicon (indeed: come now). Of course, this isn’t to say
that AI welfare (and AI rights) issues don’t get complicated (see e.g.
here
and here for a
glimpse of some of the complications), or that humanity as a whole will get the
“digital minds matter” stuff right. Indeed, I worry that we will get it
horribly wrong – and I do think that the AI risk discourse
under-attends to some of the tensions. But species-ism 101 (201?) – e.g., “I don’t care about digital suffering” – isn’t AI risk’s vice.
For two: clearly some sorts of otherness warrant some sorts of fear.
For example: maybe you, personally, don’t like to murder. But Bob, well:
Bob is different. If Bob gets a bunch of power, then: yep, it’s OK to
hold your babies close. And often OK, too, to try to “control” Bob into
not-killing-your-babies. Cf, also, the discussion of
getting-eaten-by-bears in the first essay. And the Nazis, too, were
different in their own way. Of course, there’s a long and ongoing
history of mistaking “different” for “the type of different that wants
to kill your babies.” We should, indeed, be very wary. But liberal
tolerance has never been a blank check; and not all fear is hatred.
Indeed, many attempts to diagnose the ethical mistake behind various
canonical difference-related vices (racism, sexism, species-ism, etc)
reveal a certain shallowness of commitment to difference-per-se. In
particular: such vices are often understood as missing some underlying
sameness – for example, “common humanity,” “persons,” “sentient
beings,” “children of the universe,” and so forth. And calls for social
harmony often recapitulate this structure: we might be different in X
ways, but (watch for the but) we have blah in common. This isn’t to
say that ethical commitment to a less adulterated difference-per-se is
impossible. But one wants, generally, a story about why it’s OK to eat
apples but not babies; why Furbies programmed to say “Biden” shouldn’t
get the vote; and why you can own a laptop but not a slave. And such a
story requires differences. The apple, the Furby, the laptop must be
importantly “Other” relative to e.g. human adults. They must be
outside some circle. Ethics is always drawing lines.
ChatGPT wouldn’t let the furby be voting for Biden in particular…
What exactly is Hanson’s critique?
With these basics in mind, then, what exactly is Hanson’s “other-ing the
AIs” critique? It has many facets, but here’s one attempt at
reconstruction:
1. People worried about AI risk are much more scared of future AIs than future humans, because they think that:
   a. AIs are more likely to do stuff like murder all the humans, overthrow the government, and violate property rights, and
   b. AIs are more likely to have values pursuit of which will result in a ~zero-value future more generally.
2. But in fact, neither of these things is true.
3. So greater fear of future AIs relative to future humans is best understood as a kind of arbitrary, in-group partiality – i.e., “othering the AIs.”
Clearly, (2) is where the action is, here. Whence such a departure from
Yudkowsky’s nightmare? We can divide Hanson’s justification into two
components. The first argues that future AIs will be more similar to us
than the AI risk story suggests. The second argues that future humans,
by default, will be more different.
Will the AIs be more similar to us than AI risk expects?
Let’s start with “AIs will be more similar to us than AI risk expects.”
Above I mentioned propensity-to-murder as a classic form of otherness
that it’s OK to fear/control. And we often put “violating property
rights” and “overthrowing the government” in a similar bucket.
Presumably Hanson is not OK with AIs doing this stuff? But he doesn’t
think they will – or at least, not more than humans will. And why not?
It’s some combination of (i) “AIs would be designed and evolved to think
and act roughly like humans, in order to fit smoothly into our many
roughly-human-shaped social roles,” and (ii) like humans, they’ll be
constrained by legal and social incentives. And even setting aside
violence, Hanson generally appeals to (i) in response to objections like
“so … are you actually fine with future agents tiling the universe
with paperclips”? The AI values, says Hanson, won’t be that
alien.[2]
Big if true. But is it true? I won’t dive in much here, except to say
that this aspect of Hanson’s story generally strikes me as
under-argued. In particular, I think Hanson moves too quickly from “the
AIs will be trained to fit into the human economy” to “the AIs will have
values relevantly similar to human values,” and that he takes too much
for granted that legal and social incentives protecting humans from
being murdered/violently-disempowered will continue to bind adequately
if the AIs have most of the hard power. In this, I think, his argument
for (2) misses a lot of the core doom concern.
Will future humans be more different from us than AI risk expects?
But I think the other aspect of his argument for (2) – namely, “future
humans will be more different from us than AI risk expects” – is more
interesting. Here, Hanson’s basic move is to question the “alignment” of
the default human future, even absent AI. That is: human values have
changed dramatically over time – and not, argues Hanson, centrally in
response to a process of rational reflection, but rather in response to
other sorts of competition, contingency, and
economic/social/technological change. And even absent AIs, we should
expect this process to only continue and intensify, such that humans ten
generations from now (or: after ten doublings of GDP, or whatever) would
have values very different from our own – and not from having
done-more-philosophy.
Now, we can debate the empirics of past and future, here (though what
processes of values-change we endorse as “rational” may not be entirely
empirical).
Indeed, I think Hanson may be over-estimating how horrified the ancient
Greeks, or the hunter-gatherers, would be on reflection by the values of
the present-day world – and this even setting aside our material
abundance. And I might disagree, too, about exactly how different the
values of future humans would be, given various possible “futures
without AI” (though it’s not an especially clear-cut category).
How pissed would they be, on reflection, about present-day values? (Image source here.)
Still, I think Hanson is poking at something important and
uncomfortable. In particular: suppose we grant him the empirics.
Suppose, indeed, that even without AI, the default values of future
humans would “drift” until they were, relative to us, like paperclippers, such that
the world they create would be utterly valueless from our perspective.
What follows? Well, umm, if you care about the future having value …
then what follows is a need to exert more control. More yang. It is,
indeed, the “good future” part of the alignment problem all over again
(though not the “notkilleveryone” part).
Of course, trying to make sure that future humans aren’t paperclippers
doesn’t mean locking in your specific, object-level values right now
(you still want to leave room for moral progress you’d
endorse-on-reflection). Nor, pace some of Hanson’s language, does it
mean “brainwashing” or “lobotomizing” the future people. If a boulder is
rolling towards a button that will create Sally, a paperclipper, and you
divert it towards a button that will create Bill, a deontologist, you’re
not brainwashing or lobotomizing Sally.[3] (Confusions in this vein
are a classic
issue
for reasoning about your impact on future people – and Hanson’s
analysis is not immune.)
Still, though: are you playing too much God, or too-Stalin? Who are you
to divert Nature’s boulder – that oh-so-defined “default”? And Sally,
at least, is pissed. Indeed, Hanson reminds us: aren’t we glad that the
ancient Greeks didn’t try to divert the future to replace us with
people more like them? (Well, who knows how much they tried. But good
thing they didn’t succeed! Though, wait: how much did they succeed?).
But the question – or at least, the first-pass question – isn’t
whether we’re glad that the Greeks didn’t control our
values-on-reflection to be more greek. Indeed, basically everyone who
gets created with some set of values-on-reflection is glad that the
process that created them didn’t push towards agents with different
values instead.[4] If, in some horrible mistake, we set in motion a
future filled with suffering-maximizers, they, too, will be glad we
didn’t “control” the values of the future more (because this would’ve led to a future-with-less-suffering). But from our perspective, it’s not a good test.
Rather, the first-pass test, re:
lessons-from-the-ancient-greeks-about-controlling-future-values, is
whether the Greeks would be glad, on reflection, that they didn’t
make our values more greek. And one traditional answer, here, is yes. If
we could sit down with Aristotle, and explain to him why actually,
slavery is wrong, and that no one is by nature someone else’s
property,
then our hearts and his would sing in harmony. That is, on this story,
if Aristotle had somehow prevented future people from abolishing
slavery, then he would’ve been making a mistake by his own lights – preventing the flower-he-loves from blooming, via the march of Reason,
in history’s hand.
“A master (right) and his slave (left) in a phlyax play, Sicilian red-figured calyx-krater, c. 350 BC–340 BC.” (Image source here.)
But this isn’t the central story Hanson wants to tell. Rather, when
Hanson talks about values changing over time, he specifically wants to
deny that Reason has much to do with it. That is, it sounds a lot like
Hanson wants to say both that the ancient Greeks would be horrified even on reflection by our values, and that we should take our cues from
the ancient Greeks in deciding how much control to try to exert over the
values of future people. And at a high level, that sounds like a recipe
for, well, being horrified even on reflection by the values of future
people. Remind me why that’s good again? Indeed, on any meta-ethics
where the normative truth would be revealed to our reflection, we just
stipulated that it’s horrifying.
Now, we might try to construct Hanson’s story in other, more complicated
ways (see e.g.
here
for one attempt). But I want to stay, for now, with the dialectic that
this version of his view creates, which I think is plenty interesting.
In particular: on the one hand, we just stipulated that absent control,
the values of future humans would be horrifying/meaningless to us, even
on reflection and full understanding. On the other hand, some sort of
discomfort in trying to control the values of future humans persists (at
least for me). I think Hanson is right to notice it – and to notice,
too, its connection to trying to control the values of the AIs. I think
the AI alignment discourse should, in fact, prompt this discomfort – and that we should be serious about understanding, and avoiding, the
sort of yang-gone-wrong that it’s trying to track.
Indeed, I think when we bring certain other Yudkowskian vibes into view – and in particular, vibes related to the “fragility of value,”
“extremal Goodhart,” and “the tails come apart” – this discomfort
should deepen yet further. I’ll turn to this in the next essay.
[1] There’s also a bit in the original quote where Robin accuses the AI risk discourse of wanting to use “genocide, slavery, lobotomy, or mind-control” to control the AIs. But this is extra charged (and I don’t know where Robin got the genocide bit), so I want to set it aside for a moment.
[2] Though: how alien is too alien? Hanson doesn’t tend to say. And my sense is that he thinks, too, that even unadulterated Moloch will lead to a complex, diverse, and interesting ecosystem rather than a monoculture. (Though: is a diverse ecosystem of different office-supplies all that much of an improvement?) And also: that this ecosystem will retain various path-dependent “legacies” of the present. (Though: will they be legacies we care about?)
[3] Though, importantly, contemporary AI training does not look like creating a mind from scratch, and raises much more serious “brain-washing” type concerns.
[4] And often glad, too, that the process wasn’t altered in any tiny way at all, lest their existence be canceled by the non-identity problem. But setting that aside.