A thought on how we describe existential risks from misaligned AI:
Sometimes discussions focus on a fairly specific version of AI risk, which involves humanity being quickly wiped out. Increasingly, though, the emphasis seems to be on the more abstract idea of “humanity losing control of its future.” I think it might be worthwhile to unpack this latter idea a bit more.
There’s already a fairly strong sense in which humanity has never controlled its own future. For example, looking back ten thousand years, no one decided that sedentary agriculture would increasingly supplant hunting and gathering, that increasingly complex states would arise, that slavery would become common, that disease would take off, that social hierarchies and gender divisions would become stricter, etc. The transition to the modern world, and everything that came with this transition, also doesn’t seem to have been meaningfully chosen (or even really understood by anyone). The most serious effort to describe a possible future in detail — Hanson’s Age of Em — also describes a future with loads of features that most present-day people would not endorse.
As long as there are still strong competitive pressures or substantial random drift, it seems to me, no generation ever really gets to choose the future.[1] It’s actually sort of ambiguous, then, what it means to worry about “losing control of our future.”
Here are a few alternative versions of the concern that feel a bit crisper to me:
If we ‘mess up on AI,’ then even the most powerful individual humans will have unusually little influence over their own lives or the world around them.[2]
If we ‘mess up on AI,’ then future people may be unusually dissatisfied with the world they live in. In other words, people’s preferences will be unfulfilled to an unusually large degree.
Humanity may have a rare opportunity to take control of its own future, by achieving strong coordination and then locking various things in. But if we ‘mess up on AI,’ then we’ll miss out on this opportunity.[3]
Something that’s a bit interesting about these alternative versions of the concern, though, is that they’re not inherently linked to AI alignment issues. Even if AI systems behave roughly as their users intend, I believe each of these outcomes is still conceivable. For example, if there’s a missed opportunity to achieve strong coordination around AI, the story might look like the failure of the Baruch Plan for international control of nuclear weapons: that failure had much more to do with politics than it had to do with the way engineers designed the technology in question.
In general, if we move beyond discussing very sharp alignment-related catastrophes (e.g. humanity being quickly wiped out), then I think concerns about misaligned AI start to bleed into broader AI governance concerns. It starts to become more ambiguous whether technical alignment issues are actually central or necessary to the disaster stories people tell.
[1] Although, admittedly, notable individuals or groups (e.g. early Christians) do sometimes have a fairly lasting and important influence.
[2] As an analogy, in the world of The Matrix, people may not actually have much less control over the long-run future than hunter-gatherers did twenty thousand years ago. But they certainly have much less control over their own lives.
[3] Notably, this is only a bad thing if we expect the relevant generation of humans to choose a better future than would be arrived at by default.
Another interpretation of the concern, though related to your (3), is that misaligned AI may cause humanity to lose the potential to control its future. This is consistent with humanity not having (and never having had) actual control of its future; it only requires that this potential exists, and that misaligned AI poses a threat to it.
I agree with most of what you say here.
[ETA: I now realize that I think the following is basically just restating what Pablo already suggested in another comment.]
I think the following is a plausible & stronger concern, which could be read as a stronger version of your crisp concern #3.
“Humanity has not had meaningful control over its future, but AI will now take control one way or the other. Shaping the transition to a future controlled by AI is therefore our first and last opportunity to take control. If we mess up on AI, not only have we failed to seize this opportunity, there also won’t be any other.”
Of course, AI being our first and only opportunity to take control of the future is a strictly stronger claim than AI being one such opportunity. And so it must be less likely. But my impression is that the stronger claim is sufficiently more important that it could be justified to basically ‘wager’ most AI risk work on it being true.
I agree with this general point. I’m not sure if you think this is an interesting point to notice that’s useful for building a world-model, and/or a reason to be skeptical of technical alignment work. I’d agree with the former but disagree with the latter.
I’m not sure if you think this is an interesting point to notice that’s useful for building a world-model, and/or a reason to be skeptical of technical alignment work. I’d agree with the former but disagree with the latter.
Mostly the former!
I think the point may have implications for how much we should prioritize alignment research, relative to other kinds of work, but this depends on what the previous version of someone’s world model was.
For example, if someone has assumed that solving the ‘alignment problem’ is close to sufficient to ensure that humanity has “control” of its future, then absorbing this point (if it’s correct) might cause them to update downward on the expected impact of technical alignment research. Research focused on coordination-related issues (e.g. cooperative AI stuff) might increase in value, at least in relative terms.
Do you have the intuition that absent further technological development, human values would drift arbitrarily far? It’s not clear to me that they would. In that sense, I do feel like we’re “losing control”: even non-extinction AI is enabling a new set of possibilities that modern-day humans would endorse much less than the decisions future humans would otherwise make. (It does also feel like we’re missing the opportunity to “take control” and enable a new set of possibilities that we would endorse much more.)
Relatedly, it doesn’t feel to me like the values of humans 150,000 years ago, humans now, and even the ems in Age of Em are all that different on some more absolute scale.
Do you have the intuition that absent further technological development, human values would drift arbitrarily far?
Certainly not arbitrarily far. I also think that technological development (esp. the emergence of agriculture and modern industry) has played a much larger role in changing the world over time than random value drift has.
[E]ven non-extinction AI is enabling a new set of possibilities that modern-day humans would endorse much less than the decisions future humans would otherwise make.
I definitely think that’s true. But I also think that was true of agriculture, relative to the values of hunter-gatherer societies.
To be clear, I’m not downplaying the likelihood or potential importance of any of the three crisper concerns I listed. For example, I think that AI progress could conceivably lead to a future that is super alienating and bad.
I’m just (a) somewhat pedantically arguing that we shouldn’t frame the concerns as being about a “loss of control over the future” and (b) suggesting that you can rationally have all these same concerns even if you come to believe that technical alignment issues aren’t actually a big deal.
Wow, I just learned that Robin Hanson has written about this, because obviously, and he agrees with you.
And Paul Christiano agrees with me. Truly, time makes fools of us all.
FWIW, I wouldn’t say I agree with the main thesis of that post.
However, while I expect machines that outcompete humans for jobs, I don’t see how that greatly increases the problem of value drift. Human cultural plasticity already ensures that humans are capable of expressing a very wide range of values. I see no obvious limits there. Genetic engineering will allow more changes to humans. Ems inherit human plasticity, and may add even more via direct brain modifications.
In principle, non-em-based artificial intelligence is capable of expressing the entire space of possible values. But in practice, in the shorter run, such AIs will take on social roles near humans, and roles that humans once occupied....
I don’t see why people concerned with value drift should be especially focused on AI. Yes, AI may accompany faster change, and faster change can make value drift worse for people with intermediate discount rates. (Though it seems to me that altruistic discount rates should scale with actual rates of change, not with arbitrary external clocks.)
I definitely think that human biology creates at least very strong biases toward certain values (if not hard constraints) and that AI systems would not need to have these same biases. If you’re worried about future agents having super different and bad values, then AI is a natural focal point for your worry.
A couple other possible clarifications about my views here:
I think that the outcome of the AI Revolution could be much worse, relative to our current values, than the Neolithic Revolution was relative to the values of our hunter-gatherer ancestors. But I think the question “Will the outcome be worse?” is distinct from the question “Will we have less freedom to choose the outcome?”
I’m personally not so focused on value drift as a driver of long-run social change. For example, the changes associated with the Neolithic Revolution weren’t really driven by people becoming less egalitarian, more pro-slavery, more inclined to hold certain religious beliefs, more ideologically attached to sedentism/farming, more happy to accept risks from disease, etc. There were value changes, but, to some significant degree, they seem to have been downstream of technological/economic change.
Really appreciate the clarifications! I think I was interpreting “humanity loses control of the future” in a weirdly temporally narrow sense that makes it all about outcomes, i.e. where “humanity” refers to present-day humans, rather than humans at any given time period. I totally agree that future humans may have less freedom to choose the outcome in a way that’s not a consequence of alignment issues.
I also agree value drift hasn’t historically driven long-run social change, though I kind of do think it will going forward, as humanity has more power to shape its environment at will.
I also agree value drift hasn’t historically driven long-run social change
My impression is that the differences in historical vegetarianism rates between India and China, and especially between India and southern China (where there is greater similarity of climate and crops used), are a moderate counterpoint. At the timescale of centuries, vegetarianism rates in India are much higher than rates in China. Since factory farming is plausibly one of the larger sources of human-caused suffering today, the differences aren’t exactly a rounding error.
That’s a good example.
I do agree that quasi-random variation in culture can be really important. And I agree that this variation is sometimes pretty sticky (e.g. Europe being predominantly Christian and the Middle East being predominantly Muslim for more than a thousand years). I wouldn’t say that this kind of variation is a “rounding error.”
Over sufficiently long timespans, though, I think that technological/economic change has been more significant.
As an attempt to operationalize this claim: The average human society in 1000 AD was obviously very different from the average human society in 10,000 BC. I think that the difference would have been less than half as large (at least in intuitive terms) if there hadn’t been technological/economic change.
I think that the pool of available technology creates biases in the sorts of societies that emerge and stick around. For large enough amounts of technological change, and long enough timespans (long enough for selection pressures to really matter), I think that shifts in these technological biases will explain a large portion of the shifts we see in the traits of the average society.[1]
[1] If selection pressures become a lot weaker in the future, though, then random drift might become more important in relative terms.
Would you consider making this into a top-level post? The discussion here is really interesting and could use more attention, and a top-level post helps to deliver that (this also means the post can be tagged for greater searchability).
I think the top-level post could be exactly the text here, plus a link to the Shortform version so people can see those comments. Though I’d also be interested to see the updated version of the original post which takes comments into account (if you felt like doing that).