Help me understand what you're saying here. Are you saying that Yudkowsky and Soares's argument is just so obviously wrong that it's almost uninteresting to discuss why it's wrong? That you find the Mechanize co-founders' refutation of the Yudkowsky and Soares argument disappointing because you found that argument so weak to begin with?
If so, I'm not saying that's a wrong view, not at all. But it's worth noting how controversial that view is in the EA community (and other communities that talk a lot about AGI). Essays like this need to be written because so many people in this community (and others) believe Yudkowsky and Soares' argument is correct. If my impression of the EA community is off base and there's actually a community consensus that Yudkowsky and Soares' argument is wrong, then more people should talk about this, because it's really easy to get the wrong impression.
I think it's also worth discussing the question of what happens if AGI turns out to have generally human-like motivations and psychology. What dangers might it pose? How would it behave? But not every relevant and worthy question can be addressed in a single essay.
Thanks Yarrow, I can see that that was confusing.
I don't think that Yudkowsky & Soares's argument as a whole is obviously wrong and uninteresting. On the contrary, I'm rather convinced by it, and I also want more critics to engage with it.
But I think the argument presented in the book was not particularly strong, and others seem to agree: the reviews on this forum are pretty mixed (e.g.). So I'd prefer critics to argue against the best version of this argument, not just the one presented in the book. If these critics had only set out to write a book review, then I'd say fine. But that's not what they were doing here. They write "there is no standard argument to respond to, no single text that unifies the AI safety community". That's true, but you can engage with multiple texts in order to respond to the best form of the argument. In fact, that's pretty standard, in academia and outside of it.
So, if the best version of Yudkowsky and Soares' argument is not the one made in their book, what is the best version? Can you explain how that version of the argument, which they made previously elsewhere, differs from the version in the book?
I can't tell if you're saying:
a) that the alien preferences thing is not a crux of Yudkowsky and Soares' overall argument for AI doom (it seems like it is), or
b) that the version of the specific argument about alien preferences they gave in the book isn't as good as previous versions they've given (which is why I asked what version is better), or
c) that Yudkowsky and Soares' book overall isn't as good as their previous writings on AI alignment.
I don't know that academic reviewers of Yudkowsky and Soares' argument would take a different approach. The book is supposed to be the most up-to-date version of the argument, and one the authors took a lot of care in formulating. It doesn't feel intuitive to go back and look at their earlier writings and compare different versions of the argument, which aren't obviously different at first glance. (Will MacAskill and Clara Collier both complained the book wasn't sufficiently different from previous formulations of the argument, i.e., wasn't updated enough in light of advancements in deep learning and deep reinforcement learning over the last decade.) I think an academic reviewer might just trust that Yudkowsky and Soares' book is going to be the best thing to read and respond to if they want to engage with their argument.
You might, as an academic, engage in a really close reading of many versions of a similar argument made by Aristotle in different texts, if you're a scholar of Aristotle, but this level of deep textual analysis doesn't typically apply to contemporary works by lesser-known writers outside academia.
The academic philosopher David Thorstad is writing a blog series in response to the book. I haven't read it yet, so I don't know whether he draws on Yudkowsky and Soares writings other than the book itself. However, I think it would be perfectly fine for him to just focus on the book, and not seek out other texts from the same authors that make the same argument in maybe a better form.
If what you're saying is that there are multiple independent (and mutually incompatible) arguments for the AI safety community's core claims, including ones that Yudkowsky and Soares don't make, then I agree with that. I agree you can criticize that sentence in the Mechanize co-founders' essay if you believe Yudkowsky's views and arguments don't actually unify (or adequately represent) the views and arguments of the AI safety community overall. Maybe you could point out what those other arguments are and who has formulated them best. Maybe the Mechanize co-founders would write a follow-up piece engaging with those non-Yudkowsky arguments as well, to give a more complete engagement with the AI safety community's worldview.
The argument I'm referring to is the AI doom argument. Y&S are its most prominent proponents, but they are widely known to be eccentric, and not everyone agrees with their presentation of it. I'm not that deep in the AI safety space myself, but I think that's pretty clear.
The authors of this post seemed to respond to the AI doom argument more generally, and took the book to be the best representative of the argument. So that already seems like a questionable move, and I wish they'd gone further.
I don't think the point about alien preferences is a crux of the AI doom argument generally. I think it's presented in Bostrom's Superintelligence and Rob Miles's videos (and surely countless other places) as: "an ASI optimising for anything that doesn't fully capture collective human preferences would be disastrous. Since we can't define collective human preferences, this spells disaster." In that sense it doesn't have to be "alien", just different from the collective sum of human preferences. I guess Y&S took the opportunity to say "LLMs seem MUCH more different" in an attempt to strengthen their argument, but they didn't have to.
So, as I said, I'm not really that deep into AI safety, so I'm not the person to go to for the best version of these arguments. But I read the book, sat down with some friends to discuss it… and we each identified flaws, as the authors of this post did, and then found ways to make the argument better, using other ideas we'd been exposed to and some critical reflection. It would have been really nice if the authors of the post had taken that second step and steelmanned it a bit.
There's a fine line between steelmanning people's views and creating new views that are facially similar to those views but crucially different from the views those people actually hold. I think what you're describing is not steelmanning, but developing your own views, different from Yudkowsky and Soares', views that they would almost certainly disagree with in strong terms.
I think it would be constructive for you to publish the views you developed after reading Yudkowsky and Soares' book. People might find that useful to read. That could give people something interesting to engage with. But if you write that Yudkowsky and Soares' claim about alien preferences is wrong, many people will disagree with you (including Yudkowsky and Soares, if they read it). So, it's important to get very clear on what different people in a discussion are saying and what they're not saying. Just to keep everything straight, at least.
I agree the alien preferences thing is not necessarily a crux of AI doom arguments more generally, but it is certainly a crux of Yudkowsky and Soares' overall AI doom argument specifically. Yes, you can change their overall argument into some other argument that doesn't depend on the alien preferences thing anymore, but then that's no longer their argument; that's a different argument.
I agree that Yudkowsky and Soares (and their book) are not fully representative of the AI safety community's views, and probably no single text or person (or pair of people) is. I agree that it isn't really reasonable to say that if you can refute Yudkowsky and Soares (or their book), you refute the AI safety community's views overall. So, I agree with that critique.