Some choice quotes from Clara Collier's incisive review of If Anyone Builds It, Everyone Dies in Asterisk Magazine:
It's true that the book is more up-to-date and accessible than the authors' vast corpus of prior writings, not to mention marginally less condescending. Unfortunately, it is also significantly less coherent. The book is full of examples that don't quite make sense and premises that aren't fully explained. But its biggest weakness was described many years ago by a young blogger named Eliezer Yudkowsky: both authors are persistently unable to update their priors.
About the unexplained shift of focus from symbolic AI, which Yudkowsky was still claiming as of around 2015 or 2016 (quite late in the game, all things considered) was more likely than deep learning to lead to AGI, to deep learning:
We've learned a lot since 2008. The models Yudkowsky describes in those old posts on LessWrong and Overcoming Bias were hand-coded, each one running on its own bespoke internal architecture. Like mainstream AI researchers at the time, he didn't think deep learning had much potential, and for years he was highly skeptical of neural networks. (To his credit, he's admitted that that was a mistake.) But If Anyone Builds It, Everyone Dies very much is about deep learning-based neural networks. The authors discuss these systems extensively – and come to the exact same conclusions they always have. The fundamental architecture, training methods and requirements for progress for modern AI systems are all completely different from the technology Yudkowsky imagined in 2008, yet nothing about the core MIRI story has changed.
Building on this:
In fact, there are plenty of reasons why the fact that AIs are grown and not crafted might cut against the MIRI argument. For one: The most advanced, generally capable AI systems around today are trained on human-generated text, encoding human values and modes of thought.
I still have no idea when, why, or how exactly Eliezer Yudkowsky changed his mind about symbolic AI vs. deep learning. This seems quite fundamental to his, Nate Soares', and MIRI's case, yet as far as I know, it's never been discussed. I've looked, and I've asked around. I'm not reassured Yudkowsky has a good understanding of deep learning, and, per Clara Collier's review, it really doesn't seem like the core MIRI case has been updated since the pre-deep learning era in the late 2000s. If deep learning doesn't change things, Yudkowsky/MIRI should explain why not. If it does change things, then Yudkowsky/MIRI's views and arguments should be updated to reflect that.
Also, it's worth reflecting on how unrealistic Yudkowsky's belief in symbolic AI now appears, given that we've had over a decade of deep learning-based and deep reinforcement learning-based AI systems that are astronomically more capable than any symbolic AI systems ever were, and yet these systems are still far below human level. Deep learning is vastly more capable than symbolic AI, and even deep learning is still vastly less capable than the average human (or, on some dimensions, the average cat). So, it really seems unrealistic to think symbolic AI could have led to AGI, especially on the short timescales Yudkowsky was imagining in the 2000s and early-to-mid 2010s.
It makes the whole thing look a little odd. If deep neural networks had never been invented, surely it would have become evident at some point that symbolic AI was never going to lead to anything interesting or powerful. Maybe in this counterfactual timeline, by the 2040s, with no meaningful progress in AI, people who had believed Yudkowsky's arguments would start to have doubts. It's odd that deep neural networks were invented and Yudkowsky then abandoned this forlorn theory about symbolic AI, yet changed very little, if anything, when creating a new version of the theory about deep learning. It's also quite odd that he switched from the old version of the theory to the new version with no public explanation, as far as I've been able to find.
This seems consistent with a general pattern of reluctance to admit mistakes.
If deep learning doesn't change things, Yudkowsky/MIRI should explain why not.
Speaking in my capacity as someone who currently works for MIRI, but who emphatically does not understand all things that Eliezer Yudkowsky understands, and can't authoritatively represent him (or Nate, or the other advanced researchers at MIRI who are above my intellectual pay grade):
My own understanding is that Eliezer has, all along, for as long as I've known him and been following his work, been fairly agnostic as to questions of how AGI and ASI will be achieved, and what the underlying architectures of the systems will be.
I've often seen Eliezer say "I think X will not work" or "I think Y is less doomed than X," but in my experience it's always been with a sort of casual shrug and an attitude of "but of course these are very hard calls" and also with "and it doesn't really matter to the ultimate outcome except insofar as some particular architecture might make reliable alignment possible at all."
Eliezer's take (again, as I understand it) is something like "if you have a system that is intelligent enough and powerful enough to do the actual interesting work that humans want to do, such as end all wars and invent longevity technology and get us to the stars (and achieve these goals in the real world, which involves also being competent at things like persuasion and communication), then that system is going to be very, very, very hard to make safe. It's going to be easier by many orders of magnitude to create systems that are capable of that level of sophisticated agency that don't care about human flourishing, than it will be to hit the narrow target of a sufficiently sophisticated system that also does in fact happen to care."
That's true regardless of whether you're working with deep learning or symbolic AI. In fact, deep learning makes it worse: Eliezer was pointing at "even if you build this thing out of nuts and bolts that you thoroughly understand, alignment is a hard problem," and instead we have ended up in a timeline where the systems are grown rather than crafted, giving us even less reason to be confident or hopeful.
(This is a trend: people often misunderstand MIRI's attempts to underscore how hard the problem is as being concrete predictions about what will happen. Cf. the era in which people were like, well, obviously any competent lab trying to build ASI will keep their systems airgapped and secure and have a very small number of trusted and monitored employees acting as intermediaries. MIRI's response was to demonstrate how even in such a paradigm, a sufficiently sophisticated system would have little trouble escaping the box. Now, all of the frontier labs routinely feed their systems the entire internet, let those systems interact with any human on Earth, and in many cases let those systems write and deploy their own code with no oversight, and some people say "haha, look, MIRI was wrong." Those people are confused.)
Symbolic AI vs. deep learning was never a crux, for Eliezer or the MIRI view. It was a non-crucial sidebar in which Eliezer had some intuitions and guesses, some of which he was more confident about and others less confident, and some of those guesses turned out wrong, and none of that ever mattered to the larger picture. The crucial considerations are the power/sophistication/intelligence of the system, and the degree to which its true goals can be specified/pinned down, and being wrong about whether deep learning or symbolic AI specifically were capable of reaching the required level of sophistication is mostly irrelevant.
One could argue "well, Eliezer proved himself incapable of predicting the future with those guesses!" but this would be, in my view, disingenuous. Eliezer has long said, and continues to say, "look, guesses about how the board will look in the middle of the chess game are fraught, I'm willing to share my intuitions but they are far more likely to be wrong than right; it's hard to know what moves Stockfish will make or how the game will play out; what matters is that it's still easy to know with high confidence that Stockfish will win."
That claim was compelling to me in 2015, and it remains compelling to me in 2025, and the things that have happened in the world in the ten years in between have, on the whole, made the case for concern stronger rather than weaker.
To draw up one comment from your response below:
The author of the review's review does not demonstrate to me that they understand Collier's point.
...Collier's review does not even convincingly demonstrate that they read the book, since they get some extremely basic facts about it loudly, loudly wrong, in a manner that's fairly crucial for their criticisms. I think that you should hold the reviewer and the review's reviewer to the same standard, rather than letting the person you agree with more off the hook.
Fair warning: I wrote this response less for Yarrow specifically and more for the benefit of the EA forum userbase writ large, so I'm not promising that I will engage much beyond this reply. I might! But I also might not. I think I said the most important thing I had to say, in the above.
EDIT: oh, for more on how this:
In fact, there are plenty of reasons why the fact that AIs are grown and not crafted might cut against the MIRI argument. For one: The most advanced, generally capable AI systems around today are trained on human-generated text, encoding human values and modes of thought.
...is badly, badly wrong, see the supplemental materials for the book, particularly chapters 2, 3, and 4, which exhaustively addressed this point long before Collier ever made it, because we knew people would make it. (It's also addressed in the book, but I guess Collier missed that in their haste to say a bunch of things it seems they already believed and were going to say regardless.)
I think you might be engaging in a bit of motte-and-baileying here. Throughout this comment, you're stating MIRI's position as things like "it will be hard to make ASI safe", that AI will "win", and that it will be hard for an AI to be perfectly aligned with "human flourishing". Those statements seem pretty reasonable.
But the actual stance of MIRI, which you just released a book about, is that there is an extremely high chance that building powerful AI will result in everybody on planet Earth being killed. That's a much narrower and more specific claim. You can imagine a lot of scenarios where AI is unsafe, but not in a way that kills everyone. You can imagine cases where AI "wins" but decides to cut a deal with us. You can imagine cases where an AI doesn't care about human flourishing because it doesn't care about anything, and it ends up acting like a tool that we can direct as we please.
I'm aware that you have counterarguments for all of these cases (that I will probably disagree with). But these counterarguments will have to be rooted in the actual nuts and bolts details of how actual, physical AI works. And if you are trying to reason about future machines, you want to be able to get a good prediction about their actual characteristics.
I think in this context, it's totally reasonable for people to look at your (in my opinion poor) track record of prediction and adjust their credence in your effectiveness as an institution.
I disagree re: motte and bailey; the above is not at all in conflict with the position of the book (which, to be clear, I endorse and agree with and is also my position).
Re: "you can imagine," I strongly encourage people to be careful about leaning too hard on their own ability to imagine things; it's often fraught, and a huge chunk of the work MIRI does is poking at those imaginings to see where they collapse.
I'll note that core MIRI predictions about e.g. how machines will be misaligned at current levels of sophistication are being borne out – things we have been saying for years about e.g. emergent drives and deception and hacking and brittle proxies. I'm pretty sure that's not "rooted in the actual nuts and bolts details" in the way you're wanting, but it still feels ... relevant.
Thanks @Duncan Sabien for this excellent explanation. Don't undersell yourself; I rate your communication here as at least as good as (if not better than) that of other senior MIRI people in recent years.
Thank you for attempting to explain Yudkowsky's views as you understand them.
I don't think anybody would be convinced by an a priori argument that any and all imaginable forms of AGI or ASI are highly existentially dangerous as a consequence of being prone to the sort of alignment failures Yudkowsky imagined in the 2000s, in the pre-deep learning era, regardless of the technological paradigm underlying them, simply by virtue of being AGI or ASI. I think you have to make some specific assumptions about how the underlying technology works. It's not clear what assumptions Yudkowsky is making about that.
A lot of Yudkowsky's (and Yudkowsky and Soares') arguments about why deep learning is dangerous seem to depend on very loose, hazy analogies (like this). Yudkowsky doesn't have any expertise, education, training, or research/engineering experience with deep learning. Some deep learning experts say he doesn't know what he's talking about, on even a basic level. He responded to this, but was unable to defend the technical point he was making – his response was more about vaguely casting aspersions and a preoccupation with social status over and above technical matters, as is often the case. So, I'm not sure he actually understands the underlying technology well enough to make a convincing, substantive, deep, detailed version of the argument he wants to make.
Whether deep learning is dissimilar from human cognition, and specifically dissimilar in a way that makes it 99.5% likely to cause human extinction, is not some side issue, but the topic on which the whole debate depends. If deep learning is highly similar to human cognition, or dissimilar but in a way that doesn't make it existentially dangerous, then AGI and ASI based on deep learning would not have a 99.5% chance of causing human extinction. That's not a minor detail. That's the whole ballgame.
Also, as an aside, it's bewildering to me how poorly Yudkowsky and others at MIRI take criticism of their ideas. In response to any sort of criticism or disagreement, Yudkowsky and other folks' default response seems to be to fly into a rage and to try to attack or humiliate the person making the criticism/expressing the disagreement. Yudkowsky has explicitly said he believes he's by far the smartest person on Earth, at least when it comes to the topic of AGI safety/alignment. He seems indignant at having to talk to people who are so much less intelligent than he is. Unfortunately, his attitude seems to have become the MIRI culture. (Soares has also apparently contributed to this dynamic.)
If you're doing technical research, maybe you can get away with that – but even then I don't think you can, because an inability to hear criticism/disagreement makes you much worse at research. But now that MIRI has pivoted to communications and advocacy, this will be an even more serious problem. If Yudkowsky and others at MIRI are incapable of engaging in civil intellectual debate, or are simply unwilling to, how on Earth are they going to be effective at advocacy and communications?
Again speaking more for the broad audience:
"Some experts downvote Yudkowsky's standing to opine" is not a reasonable standard; some experts think vaccines cause autism. You can usually find someone with credentials in a field who will say almost anything.
The responsible thing to do (EDIT: if you're deferring at all, as opposed to evaluating the situation for yourself) is to go look at the balance of what experts in a field are saying, and in this case, they're fairly split, with plenty of respected big names (including many who disagree with Eliezer on many questions) saying he knows enough of what he's talking about to be worth listening to. I get that Yarrow is not convinced, but I trust Hinton, who has reservations of his own but not of the form "Eliezer should be dismissed out of hand for lack of some particular technical expertise."
Also: when the experts in a field are split, and the question is one of existential danger, the split itself is not reassuring. Experts in nuclear physics do not drastically diverge in their predictions about what will happen inside a bomb or reactor, because we understand nuclear physics. When experts in the field of artificial intelligence have wildly different predictions and the disagreement cannot be conclusively resolved, this is a sign of looseness in everyone's understanding. And when you ask normal people on the street "hey, if one expert says an invention will kill everyone, and another says it won't, and you ask the one who says it won't where their confidence comes from, and they say 'because I'm pretty sure we'll muddle our way through; with unproven techniques that haven't been invented yet, the risk of killing everyone is probably under 5%,' how do you feel?"
they tend to feel alarmed.
And that characterization is not uncharitable – the optimists in this debate do not have an actual concrete plan. You can just go check. It all ultimately boils down to handwaving and platitudes and "I'm sure we'll stay ahead of capabilities [for no explicable reason]."
And we're intentionally aiming at something that exceeds us along the very axis that led us to dominate the planet, so ... ?
Another way of saying this: it's very, very weird that the burden of proof on this brand-new and extremely powerful technology is "make an airtight case that it's dangerous" instead of "make an airtight case that it's a good idea." Even a 50–50 shared burden would be better than the status quo.
I'll note that
In response to any sort of criticism or disagreement, Yudkowsky and other folks' default response seems to be to fly into a rage and to try to attack or humiliate the person making the criticism/expressing the disagreement.
...seems false.
The responsible thing to do is to go look at the balance of what experts in a field are saying, and in this case, they're fairly split
This is not a crux for me. I think if you were paying attention, it was not hard to be convinced that AI extinction risk was a big deal in 2005–2015, when the expert consensus was something like "who cares, ASI is a long way off." Most people in my college EA group were concerned about AI risk well before ML experts were concerned about it. If today's ML experts were still dismissive of AI risk, that wouldn't make me more optimistic.
Oh, I agree that if one feels equipped to go actually look at the arguments, one doesn't need any argument-from-consensus. This is just, like, "if you are going to defer, defer reasonably." Thanks for your comment; I feel similarly/endorse.
Made a small edit to reflect.
This seems like a motte-and-bailey. The question at hand is not about experts' opinions on the general topic of existential risk from AGI, but specifically their assessment of Yudkowsky's competence at understanding deep learning. You can believe that deep learning-based AGI is a serious existential risk within the next 20 years and also believe that Yudkowsky is not competent to understand the topic at a technical level.
As far as I know, Geoffrey Hinton has only commented on Yudkowsky's high-level claims about existential risk from AGI – a concern Hinton shares – and has not said anything about Yudkowsky's technical competence in deep learning.
If you know any examples of prominent experts in deep learning vouching for Yudkowsky's technical competence in deep learning, specifically, I invite you to give citations.
Yudkowsky has said he believes he's by far the smartest person in the world, at least when it comes to AI alignment/safety – as in, the second smartest doesn't come close – and maybe the smartest person in the world in general. AI alignment/safety has been his life's work since before he decided – seemingly sometime in the mid-to-late 2010s – that deep learning was likely to lead to AGI. MIRI pays him about $600,000 a year to do research. By now, he's had plenty of opportunity to learn about deep learning. Given this, shouldn't he show a good grasp of concepts in deep learning? Shouldn't he be competent at making technical arguments about deep learning? Shouldn't he be able to clearly, coherently explain his reasoning?
It seems like Yudkowsky must at least be wrong about his own intelligence, because if he really were as intelligent as he thinks, he wouldn't struggle with basic concepts in deep learning or have such a hard time defending the technical points he wants to make about deep learning. He would just be able to make a clear, coherent case, demonstrating an understanding of the definitions of widely used terms and concepts. Since he can't do that, he must be overestimating his own abilities by quite a lot.
In domains other than AI, such as Japanese monetary policy, he has expressed views with a similar level of confidence and self-assurance to what he says about deep learning, and those views turned out to be wrong, but, notably, he never acknowledged the mistake. This speaks to Clara Collier's point about his not updating his views based on new evidence. It's not clear that any amount of evidence would (at least publicly) change his mind about any topic where he would lose face if he admitted being wrong. (He's been wrong many times in the past. Has this ever happened before?) And if he doesn't understand deep learning in the first place, then the public shouldn't care whether he changes his mind or not.
You would most likely get fired for agreeing with me about this, so I can't reasonably expect you to agree, but I might as well say the things that people on the payroll of a Yudkowsky-founded organization can't say. For me, the cost isn't losing a job; it's just a bit of negative karma on a forum.
Sorry to be so blunt, but you're asking for $6M to $10M to be redirected from possibly the world's poorest people or animals in factory farms – or even other organizations working on AI safety – to your organization, led by Yudkowsky, so that you can try to influence policy on a national U.S. and international scale. Yudkowsky has indicated that if his preferred policy were enacted at an international scale, it might increase the risk of wars. This calls for a high level of scrutiny. No one should accept weak, flimsy, hand-wavy arguments about this. No one should tiptoe around Yudkowsky's track record of false or extremely dubious claims, or avoid questioning his technical competence in deep learning, which is in serious doubt, out of fear or politeness. If you, MIRI, or Yudkowsky don't want this level of scrutiny, don't ask for donations from the EA community and don't try to influence policy.
About the unexplained shift of focus from symbolic AI, which Yudkowsky was still claiming as of around 2015 or 2016
This is made up, as far as I can tell (at least re: symbolic AI as described in the Wikipedia article you link). See Logical or Connectionist AI? (2008):
As it so happens, I do believe that the type of systems usually termed GOFAI will not yield general intelligence, even if you run them on a computer the size of the moon.
Wikipedia, on GOFAI (reformatted, bolding mine):
Even earlier is Levels of Organization in General Intelligence. It is difficult to excerpt a quote, but it is not favorable to the traditional "symbolic AI" paradigm.
This seems quite fundamental to his, Soares', and MIRI's case, yet as far as I know, it's never been discussed. I've looked, and I've asked around.
I really struggle to see how you could possibly have come to this conclusion, given the above.
And see here re: Collier's review.
Nope, it was Yudkowsky in a Facebook group about AI x-risk around 2015 or 2016. He specifically said he didn't think deep learning was the royal road to AGI.
I'm not sure what sort of system Yudkowsky had in mind, specifically, when he said he thought symbolic AI approaches were more likely to get there. He didn't give more details. Maybe he saw a difference between symbolic AI and "the type of systems usually termed GOFAI". Some people seem to draw a distinction between the two, whereas other people say they're just the same thing. Maybe he changed his mind about GOFAI between 2008 and whenever that discussion took place. I don't know. I'm only going off of what he said, and he didn't explain this.
In any case, at some point after 2008 and before 2023 he changed his mind about deep learning, and he never explained why, as far as I can tell. This seems like an important topic to discuss, and not a minor detail or an afterthought. What gives?
Again, it's not clear that Yudkowsky even understands deep learning particularly well, so to apply his pre-deep learning theory to deep learning specifically, we need a substantive explanation from him. It's the kind of thing that, if you were writing a book about this topic (or many long-form posts over the span of years), you would probably want to address.
The review of Collier's review that you linked does not, in my view, adequately address the point that I raised from Collier's review. The author of the review's review does not demonstrate to me that they understand Collier's point. They might understand it or they might not, but it's not clear from the review whether they do or don't, so there isn't the basis for a convincing reply there.
By the way, the last time we interacted on the EA Forum, you refused to retract a false accusation against me after I disproved it. I gave you the opportunity to apologize and try to have a good faith discussion from that point, but you didn't apologize and you didn't retract the accusation. Given this, I don't particularly have much patience for engaging with you further. Take care.
Nope, it was Yudkowsky in a Facebook group about AI x-risk around 2015 or 2016. He specifically said he didn't think deep learning was the royal road to AGI.
Would you be able to locate the post in question? If Yudkowsky did indeed say that, I would agree that it would constitute a relevant negative update about his overall prediction track record.
This is a narrow point[1], but I want to point out that [not deep learning] is extremely broad, and the usage of the term "good old-fashioned AI" has been moving around between [not deep learning] and [deduction on Lisp symbols], and I think there's a huge space of techniques in between (probabilistic programming, program induction/synthesis, support vector machines, dimensionality reduction à la t-SNE/UMAP, evolutionary methods ...).
[1] A hobby-horse of mine.