If deep learning doesn’t change things, Yudkowsky/MIRI should explain why not.
Speaking in my capacity as someone who currently works for MIRI, but who emphatically does not understand all things that Eliezer Yudkowsky understands, and can’t authoritatively represent him (or Nate, or the other advanced researchers at MIRI who are above my intellectual pay grade):
My own understanding is that Eliezer has, all along, for as long as I’ve known him and been following his work, been fairly agnostic as to questions of how AGI and ASI will be achieved, and what the underlying architectures of the systems will be.
I’ve often seen Eliezer say “I think X will not work” or “I think Y is less doomed than X,” but in my experience it’s always been with a sort of casual shrug and an attitude of “but of course these are very hard calls” and also with “and it doesn’t really matter to the ultimate outcome except insofar as some particular architecture might make reliable alignment possible at all.”
Eliezer’s take (again, as I understand it) is something like “if you have a system that is intelligent enough and powerful enough to do the actual interesting work that humans want to do, such as end all wars and invent longevity technology and get us to the stars (and achieve these goals in the real world, which involves also being competent at things like persuasion and communication), then that system is going to be very, very, very hard to make safe. It’s going to be easier by many orders of magnitude to create systems that are capable of that level of sophisticated agency that don’t care about human flourishing, than it will be to hit the narrow target of a sufficiently sophisticated system that also does in fact happen to care.”
That’s true regardless of whether you’re working with deep learning or symbolic AI. In fact, deep learning makes it worse—Eliezer was pointing at “even if you build this thing out of nuts and bolts that you thoroughly understand, alignment is a hard problem,” and instead we have ended up in a timeline where the systems are grown rather than crafted, giving us even less reason to be confident or hopeful.
(This is a trend: people often misunderstand MIRI’s attempts to underscore how hard the problem is as being concrete predictions about what will happen, cf. the era in which people were like, well, obviously any competent lab trying to build ASI will keep their systems airgapped and secure and have a very small number of trusted and monitored employees acting as intermediaries. MIRI’s response was to demonstrate how even in such a paradigm, a sufficiently sophisticated system would have little trouble escaping the box. Now, all of the frontier labs routinely feed their systems the entire internet and let those systems interact with any human on Earth and in many cases let those systems write and deploy their own code with no oversight, and some people say “haha, look, MIRI was wrong.” Those people are confused.)
Symbolic AI vs. deep learning was never a crux, for Eliezer or the MIRI view. It was a non-crucial sidebar in which Eliezer had some intuitions and guesses, some of which he was more confident about and others less confident, and some of those guesses turned out wrong, and none of that ever mattered to the larger picture. The crucial considerations are the power/sophistication/intelligence of the system, and the degree to which its true goals can be specified/pinned-down, and being wrong about whether deep learning or symbolic AI specifically were capable of reaching the required level of sophistication is mostly irrelevant.
One could argue “well, Eliezer proved himself incapable of predicting the future with those guesses!” but this would be, in my view, disingenuous. Eliezer has long said, and continues to say, “look, guesses about how the board will look in the middle of the chess game are fraught, I’m willing to share my intuitions but they are far more likely to be wrong than right; it’s hard to know what moves Stockfish will make or how the game will play out; what matters is that it’s still easy to know with high confidence that Stockfish will win.”
That claim was compelling to me in 2015, and it remains compelling to me in 2025, and the things that have happened in the world in the ten years in between have, on the whole, made the case for concern stronger rather than weaker.
To draw up one comment from your response below:
The author of the review’s review does not demonstrate to me that they understand Collier’s point.
...Collier’s review does not even convincingly demonstrate that they read the book, since they get some extremely basic facts about it loudly, loudly wrong, in a manner that’s fairly crucial for their criticisms. I think that you should hold the reviewer and the review’s reviewer to the same standard, rather than letting the person you agree with more off the hook.
Fair warning: I wrote this response less for Yarrow specifically and more for the benefit of the EA forum userbase writ large, so I’m not promising that I will engage much beyond this reply. I might! But I also might not. I think I said the most important thing I had to say, in the above.
EDIT: oh, for more on how this:
In fact, there are plenty of reasons why the fact that AIs are grown and not crafted might cut against the MIRI argument. For one: The most advanced, generally capable AI systems around today are trained on human-generated text, encoding human values and modes of thought.
...is badly, badly wrong, see the supplemental materials for the book, particularly chapters 2, 3, and 4, which exhaustively addressed this point long before Collier ever made it, because we knew people would make it. (It’s also addressed in the book, but I guess Collier missed that in their haste to say a bunch of things it seems they already believed and were going to say regardless.)
I think you might be engaging in a bit of Motte-and-Baileying here. Throughout this comment, you’re stating MIRI’s position as things like “it will be hard to make ASI safe”, and that AI will “win”, and that it will be hard for an AI to be perfectly aligned with “human flourishing”. Those statements seem pretty reasonable.
But the actual stance of MIRI, which you just released a book about, is that there is an extremely high chance that building powerful AI will result in everybody on planet Earth being killed. That’s a much narrower and more specific claim. You can imagine a lot of scenarios where AI is unsafe, but not in a way that kills everyone. You can imagine cases where AI “wins”, but decides to cut a deal with us. You can imagine cases where an AI doesn’t care about human flourishing because it doesn’t care about anything; it ends up acting like a tool that we can direct as we please.
I’m aware that you have counterarguments for all of these cases (that I will probably disagree with). But these counterarguments will have to be rooted in the actual nuts and bolts details of how actual, physical AI works. And if you are trying to reason about future machines, you want to be able to get a good prediction about their actual characteristics.
I think in this context, it’s totally reasonable for people to look at your (in my opinion poor) track record of prediction and adjust their credence in your effectiveness as an institution.
I disagree re: motte and bailey; the above is not at all in conflict with the position of the book (which, to be clear, I endorse and agree with, and which is also my position).
re: “you can imagine,” I strongly encourage people to be careful about leaning too hard on their own ability to imagine things; it’s often fraught and a huge chunk of the work MIRI does is poking at those imaginings to see where they collapse.
I’ll note that core MIRI predictions about e.g. how machines will be misaligned at current levels of sophistication are being borne out—things we have been saying for years about e.g. emergent drives and deception and hacking and brittle proxies. I’m pretty sure that’s not “rooted in the actual nuts and bolts details” in the way you’re wanting, but it still feels … relevant.
Thanks @Duncan Sabien for this excellent explanation. Don’t undersell yourself, I rate your communication here at least as good (if not better) than that of other senior MIRI people in recent years.
Thank you for attempting to explain Yudkowsky’s views as you understand them.
I don’t think anybody would be convinced by an a priori argument that any and all imaginable forms of AGI or ASI are highly existentially dangerous as a consequence of being prone to the sort of alignment failures Yudkowsky imagined in the 2000s, in the pre-deep learning era, regardless of the technological paradigm underlying them, simply by virtue of being AGI or ASI. I think you have to make some specific assumptions about how the underlying technology works. It’s not clear what assumptions Yudkowsky is making about that.
A lot of Yudkowsky’s (and Yudkowsky and Soares’) arguments about why deep learning is dangerous seem to depend on very loose, hazy analogies (like this). Yudkowsky doesn’t have any expertise, education, training, or research/engineering experience with deep learning. Some deep learning experts say he doesn’t know what he’s talking about, on even a basic level. He responded to this, but was unable to defend the technical point he was making — his response was more about vaguely casting aspersions and a preoccupation with social status over and above technical matters, as is often the case. So, I’m not sure he actually understands the underlying technology well enough to make a convincing, substantive, deep, detailed version of the argument he wants to make.
Whether deep learning is dissimilar from human cognition, and specifically dissimilar in a way that makes it 99.5% likely to cause human extinction, is not some side issue, but the topic on which the whole debate depends. If deep learning is highly similar to human cognition, or dissimilar but in a way that doesn’t make it existentially dangerous, then AGI and ASI based on deep learning would not have a 99.5% chance of causing human extinction. That’s not a minor detail. That’s the whole ballgame.
Also, as an aside, it’s bewildering to me how poorly Yudkowsky and others at MIRI take criticism of their ideas. In response to any sort of criticism or disagreement, Yudkowsky and other folks’ default response seems to be to fly into a rage and to try to attack or humiliate the person making the criticism/expressing the disagreement. Yudkowsky has explicitly said he believes he’s by far the smartest person on Earth, at least when it comes to the topic of AGI safety/alignment. He seems indignant at having to talk to people who are so much less intelligent than he is. Unfortunately, his attitude seems to have become the MIRI culture. (Soares has also apparently contributed to this dynamic.)
If you’re doing technical research, maybe you can get away with that — but even then I don’t think you can, because an inability to hear criticism/disagreement makes you much worse at research. But now that MIRI has pivoted to communications and advocacy, this will be an even more serious problem. If Yudkowsky and others at MIRI are incapable of engaging in civil intellectual debate, or are simply unwilling to, how on Earth are they going to be effective at advocacy and communications?
Again speaking more for the broad audience:
“Some experts downvote Yudkowsky’s standing to opine” is not a reasonable standard; some experts think vaccines cause autism. You can usually find someone with credentials in a field who will say almost anything.
The responsible thing to do (EDIT: if you’re deferring at all, as opposed to evaluating the situation for yourself) is to go look at the balance of what experts in a field are saying, and in this case, they’re fairly split, with plenty of respected big names (including many who disagree with Eliezer on many questions) saying he knows enough of what he’s talking about to be worth listening to. I get that Yarrow is not convinced, but I trust Hinton, who has reservations of his own but not of the form “Eliezer should be dismissed out of hand for lack of some particular technical expertise.”
Also: when the experts in a field are split, and the question is one of existential danger, it seems that the splitness itself is not reassuring. Experts in nuclear physics do not drastically diverge in their predictions about what will happen inside a bomb or reactor, because we understand nuclear physics. When experts in the field of artificial intelligence have wildly different predictions and the disagreement cannot be conclusively resolved, this is a sign of looseness in everyone’s understanding, and when you ask normal people on the street “hey, if one expert says an invention will kill everyone, and another says it won’t, and you ask the one who says it won’t where their confidence comes from, and they say ‘because I’m pretty sure we’ll muddle our way through, with unproven techniques that haven’t been invented yet, the risk of killing everyone is probably under 5%,’ how do you feel?”
they tend to feel alarmed.
And that characterization is not uncharitable—the optimists in this debate do not have an actual concrete plan. You can just go check. It all ultimately boils down to handwaving and platitudes and “I’m sure we’ll stay ahead of capabilities [for no explicable reason].”
And we’re intentionally aiming at something that exceeds us along the very axis that led us to dominate the planet, so … ?
Another way of saying this: it’s very, very weird that the burden of proof on this brand-new and extremely powerful technology is “make an airtight case that it’s dangerous” instead of “make an airtight case that it’s a good idea.” Even a 50⁄50 shared burden would be better than the status quo.
I’ll note that
In response to any sort of criticism or disagreement, Yudkowsky and other folks’ default response seems to be to fly into a rage and to try to attack or humiliate the person making the criticism/expressing the disagreement.
...seems false.
The responsible thing to do is to go look at the balance of what experts in a field are saying, and in this case, they’re fairly split
This is not a crux for me. I think if you were paying attention, it was not hard to be convinced that AI extinction risk was a big deal in 2005–2015, when the expert consensus was something like “who cares, ASI is a long way off.” Most people in my college EA group were concerned about AI risk well before ML experts were concerned about it. If today’s ML experts were still dismissive of AI risk, that wouldn’t make me more optimistic.
Oh, I agree that if one feels equipped to go actually look at the arguments, one doesn’t need any argument-from-consensus. This is just, like, “if you are going to defer, defer reasonably.” Thanks for your comment; I feel similarly/endorse.
Made a small edit to reflect.
This seems like a motte-and-bailey. The question at hand is not about experts’ opinions on the general topic of existential risk from AGI, but specifically their assessment of Yudkowsky’s competence at understanding deep learning. You can believe that deep learning-based AGI is a serious existential risk within the next 20 years and also believe that Yudkowsky is not competent to understand the topic at a technical level.
As far as I know, Geoffrey Hinton has only commented on Yudkowsky’s high-level comments about existential risk from AGI — which is a concern Hinton shares — and not said anything about Yudkowsky’s technical competence on deep learning.
If you know any examples of prominent experts in deep learning vouching for Yudkowsky’s technical competence in deep learning, specifically, I invite you to give citations.
Yudkowsky has said he believes he’s by far the smartest person in the world at least when it comes to AI alignment/safety — as in, the second smartest doesn’t come close — and maybe the smartest person in the world in general. AI alignment/safety has been his life’s work since before he decided — seemingly sometime in the mid-to-late 2010s — that deep learning was likely to lead to AGI. MIRI pays him about $600,000 a year to do research. By now, he’s had plenty of opportunity to learn about deep learning. Given this, shouldn’t he show a good grasp of concepts in deep learning? Shouldn’t he be competent at making technical arguments about deep learning? Shouldn’t he be able to clearly, coherently explain his reasoning?
It seems like Yudkowsky must at least be wrong about his own intelligence because if he really were as intelligent as he thinks, he wouldn’t struggle with basic concepts in deep learning or have such a hard time defending the technical points he wants to make about deep learning. He would just be able to make a clear, coherent case, demonstrating an understanding of the definitions of widely-used terms and concepts. Since he can’t do that, he must be overestimating his own abilities by quite a lot.
In domains other than AI, such as Japanese monetary policy, he has expressed views, with a level of confidence and self-assurance similar to what he says about deep learning, that turned out to be wrong, but, notably, he has never acknowledged the mistake. This speaks to Clara Collier’s point about not updating his views based on new evidence. It’s not clear that any amount of evidence would (at least publicly) change his mind about any topic where he would lose face if he admitted being wrong. (He’s been wrong many times in the past. Has he ever publicly admitted it?) And if he doesn’t understand deep learning in the first place, then the public shouldn’t care whether he changes his mind or not.
You would most likely get fired for agreeing with me about this, so I can’t reasonably expect you to agree, but I might as well say the things that people on the payroll of a Yudkowsky-founded organization can’t say. For me, the cost isn’t losing a job, it’s just a bit of negative karma on a forum.
Sorry to be so blunt, but you’re asking for $6M to $10M to be redirected from possibly the world’s poorest people or animals in factory farms — or even other organizations working on AI safety — to your organization, led by Yudkowsky, so that you can try to influence policy at the U.S. national and international scale. Yudkowsky has indicated that if his preferred policy were enacted at an international scale, it might increase the risk of wars. This calls for a high level of scrutiny. No one should accept weak, flimsy, hand-wavy arguments about this. No one should tiptoe around Yudkowsky’s track record of false or extremely dubious claims, or avoid questioning his technical competence in deep learning, which is in serious doubt, out of fear or politeness. If you, MIRI, or Yudkowsky don’t want this level of scrutiny, don’t ask for donations from the EA community and don’t try to influence policy.