On 1 (the nanotech case):
I want to remind any reader that this is an opinion from 1999, when Eliezer was barely 20 years old.
I think your comment might give the misimpression that I don’t discuss this fact in the post or explain why I include the case. What I write is:
I should, once again, emphasize that Yudkowsky was around twenty when he did the final updates on this essay. In that sense, it might be unfair to bring this very old example up.
Nonetheless, I do think this case can be treated as informative, since: the belief was so analogous to his current belief about AI (a high outlier credence in near-term doom from an emerging technology), since he had thought a lot about the subject and was already highly engaged in the relevant intellectual community, since it’s not clear when he dropped the belief, and since twenty isn’t (in my view) actually all that young. I do know a lot of people in their early twenties; I think their current work and styles of thought are likely to be predictive of their work and styles of thought in the future, even though I do of course expect the quality to go up over time....
An additional reason why I think it’s worth distinguishing between his views on nanotech and (e.g.) your views on nuclear power: I think there’s a difference between an off-hand view picked up from other people vs. a fairly idiosyncratic view that you consciously adopted after a lot of reflection, decided to devote your professional life to, and founded an organization to address.
It’s definitely up to the reader to decide how relevant the nanotech case is. Since it isn’t widely known, since it seems at least pretty plausibly relevant, and since the post twice flags his age at the time, I do still endorse including it.
At face value, as well: we’re trying to assess how much weight to give to someone’s extreme, outlier-ish prediction that an emerging technology is almost certain to kill everyone very soon. It just does seem very relevant, to me, that they previously had a different extreme, outlier-ish prediction that another emerging technology was very likely to kill everyone within a decade.
I don’t find it plausible that we should assign basically no significance to this.
On 6 (the question of whether Yudkowsky has acknowledged negative aspects of his track record):
For the two “clear cut” examples, Eliezer has posted dozens of times on the internet that he has disendorsed his views from before 2002. This is present on his personal website, the relevant articles are no longer prominently linked anywhere, and Eliezer has openly and straightforwardly acknowledged that his predictions and beliefs from the relevant period were wrong.
Similarly, I think your comment may give the impression that I don’t discuss this point in the post. What I write is this:
He has written about mistakes from early on in his intellectual life (particularly pre-2003) and has, on this basis, even made a blanket statement disavowing his pre-2003 work. However, based on my memory and a quick re-read/re-skim, this writing is an exploration of why it took him a long time to become extremely concerned about existential risks from misaligned AI. For instance, the main issue it discusses with his plans to build AGI is that these plans didn’t take into account the difficulty and importance of ensuring alignment. This writing isn’t, I think, an exploration or acknowledgement of the kinds of mistakes I’ve listed in this post.
On the general point that this post uses old examples:
Given the sorts of predictions involved (forecasts about pathways to transformative technologies), old examples are generally going to be less ambiguous than new examples. Similarly for risk arguments: it’s hard to have a sense of how new arguments are going to hold up. It’s only for older arguments that we can start to approach the ability to say that technological progress, progress in arguments, and evolving community opinion say something clear-ish about how strong the arguments were.
On signposting:
I also dislike calling this post “On Deference and Yudkowsky’s AI Risk Estimates”, as if this post was trying to be an unbiased analysis of how much to defer to Eliezer, while you just list negative examples. I think this post is better named “against Yudkowsky on AI Risk estimates”. Or “against Yudkowsky’s track record in AI Risk Estimates”. Which would have made it clear that you are selectively giving evidence for one side, and more clearly signposted that if someone was trying to evaluate Eliezer’s track record, this post will only be a highly incomplete starting point.
I think it’s possible another title would have been better (I chose a purposely bland one partly to try to reduce heat, and that might have been a mistake). But I do think I signpost what the post is doing fairly clearly.
The introduction says it’s focusing on “negative aspects” of Yudkowsky’s track record, the section heading for the section introducing the examples describes them as “cherry-picked,” and the start of the section introducing the examples has an italicized paragraph re-emphasizing that the examples are selective and commenting on the significance of this selectiveness.
On the role of the fast take-off assumption in classic arguments:
I think the arguments are pretty tight and sufficient to establish the basic risk argument. I found your critique relatively uncompelling. In particular, I think you are misrepresenting that a premise of the original arguments was a fast takeoff.
I disagree with this. I do think it’s fair to say that fast take-off was typically a premise of the classic arguments.
Two examples I have off-hand (since they’re in the slides from my talk) are from Yudkowsky’s exchange with Caplan and from Superintelligence. Superintelligence isn’t by Yudkowsky, of course, but hopefully is still meaningful to include (insofar as Superintelligence heavily drew on Yudkowsky’s work and was often accepted as a kind of distillation of the best arguments as they existed at the time).
From Yudkowsky’s debate with Caplan (2016):
“I’d ask which of the following statements Bryan Caplan [a critic of AI risk arguments] denies:
Orthogonality thesis: Intelligence can be directed toward any compact goal….
Instrumental convergence: An AI doesn’t need to specifically hate you to hurt you; a paperclip maximizer doesn’t hate you but you’re made out of atoms that it can use to make paperclips, so leaving you alive represents an opportunity cost and a number of foregone paperclips….
Rapid capability gain and large capability differences: Under scenarios seeming more plausible than not, there’s the possibility of AIs gaining in capability very rapidly, achieving large absolute differences of capability, or some mixture of the two….
1-3 in combination imply that Unfriendly AI is a critical problem-to-be-solved, because AGI is not automatically nice, by default does things we regard as harmful, and will have avenues leading up to great intelligence and power.”
(Caveat that the fast-take-off premise is stated a bit ambiguously here, so it’s not clear what level of rapidness is being assumed.)
From Superintelligence:
Taken together, these three points [decisive strategic advantage, orthogonality, and instrumental convergence] thus indicate that the first superintelligence may shape the future of Earth-originating life, could easily have non-anthropomorphic final goals, and would likely have instrumental reasons to pursue open-ended resource acquisition. If we now reflect that human beings consist of useful resources (such as conveniently located atoms) and that we depend for our survival and flourishing on many more local resources, we can see that the outcome could easily be one in which humanity quickly becomes extinct.
The decisive strategic advantage point is justified through a discussion of the possibility of a fast take-off. The first chapter of the book also starts by introducing the possibility of an intelligence explosion, and the book then devotes two chapters to the possibility of a fast take-off and the idea that this might imply a decisive strategic advantage, before it gets to discussing things like the orthogonality thesis.
I think it’s also relevant that content from MIRI and people associated with MIRI, raising the possibility of extinction from AI, tended to very strongly emphasize (e.g. spend most of its time on) the possibility of a runaway intelligence explosion. The most developed classic pieces arguing for AI risk often have names like “Shaping the Intelligence Explosion,” “Intelligence Explosion: Evidence and Import,” “Intelligence Explosion Microeconomics,” and “Facing the Intelligence Explosion.”
Overall, then, I do think it’s fair to consider a fast-takeoff to be a core premise of the classic arguments. It wasn’t incidental or a secondary consideration.
[[Note: I’ve edited my comment here to respond to additional points, although there are still some I haven’t responded to yet.]]
One quick response, since it was easy (might respond more later):
Overall, then, I do think it’s fair to consider a fast-takeoff to be a core premise of the classic arguments. It wasn’t incidental or a secondary consideration.
I do think takeoff speeds between 1 week and 10 years are a core premise of the classic arguments. I do think the situation looks very different if we spend 5+ years in the human domain, but I don’t think there are many who believe that that is going to happen.
I don’t think the distinction between 1 week and 1 year is that relevant to the core argument for AI Risk, since it seems in either case more than enough cause for likely doom, and that premise seems very likely to be true to me. I do think Eliezer believes things more on the order of 1 week than 1 year, but I don’t think the basic argument structure is that different in either case (though I do agree that the 1-year case opens us up to some more potential mitigating strategies).
“Orthogonality thesis: Intelligence can be directed toward any compact goal….
Instrumental convergence: An AI doesn’t need to specifically hate you to hurt you; a paperclip maximizer doesn’t hate you but you’re made out of atoms that it can use to make paperclips, so leaving you alive represents an opportunity cost and a number of foregone paperclips….
Rapid capability gain and large capability differences: Under scenarios seeming more plausible than not, there’s the possibility of AIs gaining in capability very rapidly, achieving large absolute differences of capability, or some mixture of the two….
1-3 in combination imply that Unfriendly AI is a critical problem-to-be-solved, because AGI is not automatically nice, by default does things we regard as harmful, and will have avenues leading up to great intelligence and power.”
1-3 in combination don’t imply anything with high probability.