I don’t think that Yudkowsky & Soares’s argument as a whole is obviously wrong and uninteresting. On the contrary, I’m rather convinced by it, and I also want more critics to engage with it.
But I think the argument presented in the book was not particularly strong, and others seem to agree: the reviews on this forum are pretty mixed (e.g.). So I’d prefer critics to argue against the best version of this argument, not just the one presented in the book. If these critics had only set out to write a book review, then I’d say fine. But that’s not what they were doing here. They write “there is no standard argument to respond to, no single text that unifies the AI safety community”—true, but you can engage with multiple texts in order to respond to the best form of the argument. In fact that’s pretty standard, in academia and outside of it.
So, if the best version of Yudkowsky and Soares’ argument is not the one made in their book, what is the best version? Can you explain how that version of the argument, which they made previously elsewhere, differs from the version in the book?
I can’t tell if you’re saying:
a) that the alien preferences thing is not a crux of Yudkowsky and Soares’ overall argument for AI doom (it seems like it is) or if
b) the version of the specific argument about alien preferences they gave in the book isn’t as good as previous versions they’ve given (which is why I asked what version is better) or if
c) that Yudkowsky and Soares’ book overall isn’t as good as their previous writings on AI alignment.
I don’t know that academic reviewers of Yudkowsky and Soares’ argument would take a different approach. The book is supposed to be the most up-to-date version of the argument, and one the authors took a lot of care in formulating. It doesn’t feel intuitive to go back and look at their earlier writings and compare different versions of the argument, which aren’t obviously different at first glance. (Will MacAskill and Clara Collier both complained the book wasn’t sufficiently different from previous formulations of the argument, i.e. wasn’t updated enough in light of advancements in deep learning and deep reinforcement learning over the last decade.) I think an academic reviewer might just trust that Yudkowsky and Soares’ book is going to be the best thing to read and respond to if they want to engage with their argument.
You might, as an academic, engage in a really close reading of many versions of a similar argument made by Aristotle in different texts, if you’re a scholar of Aristotle, but this level of deep textual analysis doesn’t typically apply to contemporary works by lesser-known writers outside academia.
The academic philosopher David Thorstad is writing a blog series in response to the book. I haven’t read it yet, so I don’t know whether he draws on Yudkowsky and Soares writings other than the book itself. However, I think it would be perfectly fine for him to just focus on the book, and not seek out other texts from the same authors that make the same argument in maybe a better form.
If what you’re saying is that there are multiple independent (and mutually incompatible) arguments for the AI safety community’s core claims, including ones that Yudkowsky and Soares don’t make, then I agree with that. I agree you can criticize that sentence in the Mechanize co-founders’ essay if you believe Yudkowsky’s views and arguments don’t actually unify (or adequately represent) the views and arguments of the AI safety community overall. Maybe you could point out what those other arguments are and who has formulated them best. Maybe the Mechanize co-founders would write a follow-up piece engaging with those non-Yudkowsky arguments as well, to give a more complete engagement with the AI safety community’s worldview.
The argument I’m referring to is the AI doom argument. Y&S are its most prominent proponents, but they are widely known to be eccentric, and not everyone agrees with their presentation of it. I’m not that deep in the AI safety space myself, but I think that’s pretty clear.
The authors of this post seemed to respond to the AI doom argument more generally, and took the book to be the best representative of the argument. So that already seems like a questionable move, and I wish they’d gone further.
I don’t think the point about alien preferences is a crux of the AI doom argument generally. I think it’s presented in Bostrom’s Superintelligence and Rob Miles’ videos (and surely countless other places) as: “an ASI optimising for anything that doesn’t fully capture collective human preferences would be disastrous. Since we can’t define collective human preferences, this spells disaster.” In that sense it doesn’t have to be ‘alien’, just different from the collective sum of human preferences. I guess Y&S took the opportunity to say “LLMs seem MUCH more different” in an attempt to strengthen their argument, but they didn’t have to.
So, as I said, I’m not really that deep into AI safety, so I’m not the person to go to for the best version of these arguments. But I read the book, sat down with some friends to discuss it… and we each identified flaws, as the authors of this post did, and then found ways to make the argument better, using other ideas we’d been exposed to and some critical reflection. It would have been really nice if the authors of the post had taken that second step and steelmanned it a bit.
There’s a fine line between steelmanning people’s views and creating new views that are facially similar to those views but are crucially different from the views those people actually hold. I think what you’re describing is not steelmanning, but developing your own views different from Yudkowsky and Soares’ — views that they would almost certainly disagree with in strong terms.
I think it would be constructive for you to publish the views you developed after reading Yudkowsky and Soares’ book. People might find that useful to read. That could give people something interesting to engage with. But if you write that Yudkowsky and Soares’ claim about alien preferences is wrong, many people will disagree with you (including Yudkowsky and Soares, if they read it). So, it’s important to get very clear on what different people in a discussion are saying and what they’re not saying. Just to keep everything straight, at least.
I agree the alien preferences thing is not necessarily a crux of AI doom arguments more generally, but it is certainly a crux of Yudkowsky and Soares’ overall AI doom argument specifically. Yes, you can change their overall argument into some other argument that doesn’t depend on the alien preferences thing anymore, but then that’s no longer their argument, that’s a different argument.
I agree that Yudkowsky and Soares (and their book) are not fully representative of the AI safety community’s views, and probably no single text or person (or pair of people) are. I agree that it isn’t really reasonable to say that if you can refute Yudkowsky and Soares (or their book), you refute the AI safety community’s views overall. So, I agree with that critique.
Thanks Yarrow, I can see that that was confusing.