It seems to me that the ‘alien preferences’ argument is a red herring. Humans have all kinds of different preferences—only some of ours overlap, and I have no doubt that if one human became superintelligent that would also have a high risk of disaster, precisely because they would have preferences that I don’t share (probably selfish ones). So they don’t need to be alien in any strong sense to be dangerous.
I know it’s Y&S’s argument. But it would have been nice if the authors of this article had also tried to make it stronger before refuting it.
Help me understand what you’re saying here. Are you saying that Yudkowsky and Soares’s argument is just so obviously wrong that it’s almost uninteresting to discuss why it’s wrong? That you find the Mechanize co-founders’ refutation of the Yudkowsky and Soares argument disappointing because you found that argument so weak to begin with?
If so, I’m not saying that’s a wrong view, not at all. But it’s worth noting how controversial that view is in the EA community (and other communities that talk a lot about AGI). Essays like this need to be written because so many people in this community (and others) believe Yudkowsky and Soares’ argument is correct. If my impression of the EA community is off base and actually there’s a community consensus that Yudkowsky and Soares’ argument is wrong, then more people should talk about this, because it’s really easy to get the wrong impression.
I think it’s also worth discussing the question of what if AGI turns out to have generally human-like motivations and psychology. What dangers might it pose? How would it behave? But not every relevant and worthy question can be addressed in a single essay.
Thanks Yarrow, I can see that that was confusing.
I don’t think that Yudkowsky & Soares’s argument as a whole is obviously wrong and uninteresting. On the contrary, I’m rather convinced by it, and I also want more critics to engage with it.
But I think the argument presented in the book was not particularly strong, and others seem to agree: the reviews on this forum are pretty mixed (e.g.). So I’d prefer critics to argue against the best version of this argument, not just the one presented in the book. If these critics had only set out to write a book review, then I’d say fine. But that’s not what they were doing here. They write “there is no standard argument to respond to, no single text that unifies the AI safety community”—true, but you can engage with multiple texts in order to respond to the best form of the argument. In fact that’s pretty standard, in academia and outside of it.
So, if the best version of Yudkowsky and Soares’ argument is not the one made in their book, what is the best version? Can you explain how that version of the argument, which they made previously elsewhere, is different than the version in the book?
I can’t tell if you’re saying:
a) that the alien preferences thing is not a crux of Yudkowsky and Soares’ overall argument for AI doom (it seems like it is) or if
b) the version of the specific argument about alien preferences they gave in the book isn’t as good as previous versions they’ve given (which is why I asked what version is better) or if
c) you’re saying that Yudkowsky and Soares’ book overall isn’t as good as their previous writings on AI alignment.
I don’t know that academic reviewers of Yudkowsky and Soares’ argument would take a different approach. The book is supposed to be the most up-to-date version of the argument, and one the authors took a lot of care in formulating. It doesn’t feel intuitive to go back and look at their earlier writings and compare different versions of the argument, which aren’t obviously different at first glance. (Will MacAskill and Clara Collier both complained the book wasn’t sufficiently different from previous formulations of the argument, i.e. wasn’t updated enough in light of advancements in deep learning and deep reinforcement learning over the last decade.) I think an academic reviewer might just trust that Yudkowsky and Soares’ book is going to be the best thing to read and respond to if they want to engage with their argument.
You might, as an academic, engage in a really close reading of many versions of a similar argument made by Aristotle in different texts, if you’re a scholar of Aristotle, but this level of deep textual analysis doesn’t typically apply to contemporary works by lesser-known writers outside academia.
The academic philosopher David Thorstad is writing a blog series in response to the book. I haven’t read it yet, so I don’t know if he pulls in alternative Yudkowsky and Soares writings other than the book itself. However, I think it would be perfectly fine for him to just focus on the book, and not seek out other texts from the same authors that make the same argument in maybe a better form.
If what you’re saying is that there are multiple independent (and mutually incompatible) arguments for the AI safety community’s core claims, including ones that Yudkowsky and Soares don’t make, then I agree with that. I agree you can criticize that sentence in the Mechanize co-founders’ essay if you believe Yudkowsky’s views and arguments don’t actually unify (or adequately represent) the views and arguments of the AI safety community overall. Maybe you could point out what those other arguments are and who has formulated them best. Maybe the Mechanize co-founders would write a follow-up piece engaging with those non-Yudkowsky arguments as well, to give a more complete engagement with the AI safety community’s worldview.
The argument I’m referring to is the AI doom argument. Y&S are its most prominent proponents, but are widely known to be eccentric and not everyone agrees with their presentation of it. I’m not that deep in the AI safety space myself, but I think that’s pretty clear.
The authors of this post seemed to respond to the AI doom argument more generally, and took the book to be the best representative of the argument. So that already seems like a questionable move, and I wish they’d gone further.
I don’t think the point about alien preferences is a crux of the AI doom argument generally. I think it’s presented in Bostrom’s Superintelligence and Rob Miles videos (and surely countless other places) as: “an ASI optimising for anything that doesn’t fully capture collective human preferences would be disastrous. Since we can’t define collective human preferences, this spells disaster.” In that sense it doesn’t have to be ‘alien’, just different from the collective sum of human preferences. I guess Y&S took the opportunity to say “LLMs seem MUCH more different” in an attempt to strengthen their argument, but they didn’t have to.
So, as I said, I’m not really that deep into AI safety, so I’m not the person to go to for the best version of these arguments. But I read the book, sat down with some friends to discuss it… and we each identified flaws, as the authors of this post did, and then found ways to make the argument better, using other ideas we’d been exposed to and some critical reflection. It would have been really nice if the authors of the post had made that second step and steelmanned it a bit.
There’s a fine line between steelmanning people’s views and creating new views that are facially similar to those views but are crucially different from the views those people actually hold. I think what you’re describing is not steelmanning, but developing your own views different from Yudkowsky and Soares’ — views that they would almost certainly disagree with in strong terms.
I think it would be constructive for you to publish the views you developed after reading Yudkowsky and Soares’ book. People might find that useful to read. That could give people something interesting to engage with. But if you write that Yudkowsky and Soares’ claim about alien preferences is wrong, many people will disagree with you (including Yudkowsky and Soares, if they read it). So, it’s important to get very clear on what different people in a discussion are saying and what they’re not saying. Just to keep everything straight, at least.
I agree the alien preferences thing is not necessarily a crux of AI doom arguments more generally, but it is certainly a crux of Yudkowsky and Soares’ overall AI doom argument specifically. Yes, you can change their overall argument into some other argument that doesn’t depend on the alien preferences thing anymore, but then that’s no longer their argument, that’s a different argument.
I agree that Yudkowsky and Soares (and their book) are not fully representative of the AI safety community’s views, and probably no single text or person (or pair of people) is. I agree that it isn’t really reasonable to say that if you can refute Yudkowsky and Soares (or their book), you refute the AI safety community’s views overall. So, I agree with that critique.
Thanks for the comment, Tristan.
I would worry if a single human had much more power than all other humans combined. Likewise, I would worry if an AI agent had more power than all other AI agents and humans combined. However, I think the probability of any of these scenarios becoming true in the next 10 years is lower than 0.001 %. Elon Musk has a net worth of 765 billion $, 0.543 % (= 765*10^9/(141*10^12)) of the market cap of all publicly listed companies of 141 T$.
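As a quick check of the share quoted above, here is a minimal Python sketch. The variable names are illustrative assumptions, and the inputs are simply the two figures given in the comment (765 billion $ and 141 T$), not independently sourced values.

```python
# Figures as quoted in the comment above (assumed, not independently verified).
net_worth_usd = 765e9            # 765 billion $
total_market_cap_usd = 141e12    # 141 T$ (trillion)

# Share of total market cap, expressed as a percentage.
share_pct = 100 * net_worth_usd / total_market_cap_usd

print(f"{share_pct:.3f} %")
```

This only confirms the division; it says nothing about whether net worth relative to market cap is a good proxy for power, which is the point under discussion.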
Elon Musk has already used this power to take actions that could potentially kill millions (by funding the Trump campaign heavily enough to get the chance to close down USAID). I think that should worry us, and the chance of people amassing even more power should worry us even more.
Hi Guy. Elon Musk was not the only person responsible for the recent large cuts in foreign aid from the United States (US). In addition, I believe outcomes like human extinction are way less likely. I agree it makes sense to worry about concentration of power, but not about extreme outcomes like human extinction.
Extinction perhaps not, but I think eternal autocracy is definitely possible.
I think the evolution analogy becomes relevant again here: consider that the genus Homo was at first more intelligent than other species but not more powerful than their numbers combined… until suddenly one jump in intelligence let Homo sapiens wreak havoc across the globe. Similarly, there might be a tipping point in AI intelligence where fighting back becomes very suddenly infeasible. I think this is a much better analogy than Elon Musk, because like an evolving species a superintelligent AI can multiply and self-improve.
I think a good point that Y&S make is that we shouldn’t expect to know where the point of no return is, and should be prudent enough to stop well before it. I suppose you must have some source/reason for the 0.001% confidence claim, but it seems pretty wild to me to be so confident in a field like this that is evolving and, at least from my perspective, pretty hard to understand.
It is unclear to me whether all humans together are more powerful than all other organisms on Earth together. It depends on what is meant by powerful. The power consumption of humans is 19.6 TW (= 1.07 + 18.5), only 0.700 % (= 19.6/(2.8*10^3)) of all organisms. In any case, all humans together being more powerful than all other organisms on Earth together is still way more likely than the most powerful human being much more powerful than all other organisms on Earth together.
My upper bound of 0.001 % is just a guess, but I do endorse it. You can have a best guess that an event is very unlikely, but still be super uncertain about its probability. For example, one could believe an event has a probability of 10^-100 to 10^-10, which would imply it is super unlikely despite 90 (= -10 - (-100)) orders of magnitude (OOMs) of uncertainty in the probability.
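The point about being both confident and uncertain can be sketched in a few lines of Python. This is only an illustration under the numbers used in the example above; the variable names are hypothetical, and the 0.001 % threshold is the upper bound stated in this comment.

```python
# Range of probabilities from the example above, written as base-10 exponents.
low_exponent = -100    # lower end: 10^-100
high_exponent = -10    # upper end: 10^-10

# Width of the range in orders of magnitude (OOMs).
ooms_of_uncertainty = high_exponent - low_exponent

# Even the top of the range sits far below an upper bound of 0.001 %.
upper_end = 10.0 ** high_exponent
threshold = 0.001 / 100          # 0.001 % as a probability

print(ooms_of_uncertainty)       # width of the range in OOMs
print(upper_end < threshold)     # whether the whole range is below the bound
```

So a very wide range of uncertainty and a firm "very unlikely" judgment are compatible, provided the whole range lies below the threshold one cares about.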
By power I mean: ability to change the world, according to one’s preferences. Humans clearly dominate today in terms of this kind of power. Our power is limited, but it is not the case that other organisms have power over us, because while we might rely on them, they are not able to leverage that dependency. Rather, we use them as much as we can.
No human is currently so powerful as to have power over all other humans, and I think that’s definitely a good thing. But it doesn’t seem like it would take much more advantage to let one intelligent being dominate all others.
Are you thinking about humans as an aligned collective in the 1st paragraph of your comment? I agree all humans coordinating their actions together would have more power than other groups of organisms with their actual levels of coordination. However, such a level of coordination among humans is not realistic. All 10^30 bacteria (see Table S1 of Bar-On et al. (2018)) coordinating their actions together would arguably also have more power than all humans with their actual level of coordination.
I agree it is good that no human has power over all humans. However, I still think one being dominating all others has a probability lower than 0.001 % over the next 10 years. I am open to bets against short AI timelines, or what they supposedly imply, up to 10 k$. Do you see any that we could make that is good for both of us under our own views?