The argument I’m referring to is the AI doom argument. Y&S are its most prominent proponents, but they are widely known to be eccentric, and not everyone agrees with their presentation of it. I’m not that deep in the AI safety space myself, but I think that’s pretty clear.
The authors of this post seemed to respond to the AI doom argument more generally, and took the book to be the best representative of the argument. So that already seems like a questionable move, and I wish they’d gone further.
I don’t think the point about alien preferences is a crux of the AI doom argument generally. I think it’s presented in Bostrom’s Superintelligence and Rob Miles’ videos (and surely countless other places) as: “an ASI optimising for anything that doesn’t fully capture collective human preferences would be disastrous. Since we can’t define collective human preferences, this spells disaster.” In that sense it doesn’t have to be ‘alien’, just different from the collective sum of human preferences. I guess Y&S took the opportunity to say “LLMs seem MUCH more different” in an attempt to strengthen their argument, but they didn’t have to.
So, as I said, I’m not really that deep into AI safety, so I’m not the person to go to for the best version of these arguments. But I read the book, sat down with some friends to discuss it… and we each identified flaws, as the authors of this post did, and then found ways to make the argument better, using other ideas we’d been exposed to and some critical reflection. It would have been really nice if the authors of the post had made that second step and steelmanned it a bit.
There’s a fine line between steelmanning people’s views and creating new views that are facially similar to those views but are crucially different from the views those people actually hold. I think what you’re describing is not steelmanning, but developing your own views different from Yudkowsky and Soares’ — views that they would almost certainly disagree with in strong terms.
I think it would be constructive for you to publish the views you developed after reading Yudkowsky and Soares’ book. People might find that useful to read. That could give people something interesting to engage with. But if you write that Yudkowsky and Soares’ claim about alien preferences is wrong, many people will disagree with you (including Yudkowsky and Soares, if they read it). So, it’s important to get very clear on what different people in a discussion are saying and what they’re not saying. Just to keep everything straight, at least.
I agree the alien preferences thing is not necessarily a crux of AI doom arguments more generally, but it is certainly a crux of Yudkowsky and Soares’ overall AI doom argument specifically. Yes, you can change their overall argument into some other argument that no longer depends on the alien preferences thing, but then that’s no longer their argument; it’s a different argument.
I agree that Yudkowsky and Soares (and their book) are not fully representative of the AI safety community’s views, and probably no single text or person (or pair of people) is. I agree that it isn’t really reasonable to say that if you can refute Yudkowsky and Soares (or their book), you refute the AI safety community’s views overall. So, I agree with that critique.