On Deference and Yudkowsky’s AI Risk Estimates
Note: I mostly wrote this post after Eliezer Yudkowsky’s “Death with Dignity” essay appeared on LessWrong. Since then, Jotto has written a post that overlaps a bit with this one, which sparked an extended discussion in the comments. You may want to look at that discussion as well. See also, here, for another relevant discussion thread.
EDIT: See here for some post-discussion reflections on what I think this post got right and wrong.
Introduction
Most people, when forming their own views on risks from misaligned AI, have some inclination to defer to others who they respect or think of as experts.
This is a reasonable thing to do, especially if you don’t yet know much about AI or haven’t yet spent much time scrutinizing the arguments. If someone you respect has spent years thinking about the subject, and believes the risk of catastrophe is very high, then you probably should take that information into account when forming your own views.
It’s understandable, then, if Eliezer Yudkowsky’s recent writing on AI risk helps to really freak some people out. Yudkowsky has probably spent more time thinking about AI risk than anyone else. Along with Nick Bostrom, he is the person most responsible for developing and popularizing these concerns. Yudkowsky has now begun to publicly express the view that misaligned AI has a virtually 100% chance of killing everyone on Earth—such that all we can hope to do is “die with dignity.”
The purpose of this post is, simply, to argue that people should be wary of deferring too much to Eliezer Yudkowsky, specifically, when it comes to estimating AI risk.[1] In particular, I think, they shouldn’t defer to him more than they would defer to anyone else who is smart and has spent a large amount of time thinking about AI risk.[2]
The post highlights what I regard as some negative aspects of Yudkowsky’s track record, when it comes to technological risk forecasting. I think these examples suggest that (a) his track record is at best fairly mixed and (b) he has some tendency toward expressing dramatic views with excessive confidence. As a result, I don’t personally see a strong justification for giving his current confident and dramatic views about AI risk a great deal of weight.[3]
I agree it’s highly worthwhile to read and reflect on Yudkowsky’s arguments. I also agree that potential risks from misaligned AI deserve extremely serious attention—and are even, plausibly, more deserving of attention than any other existential risk.[4] I also think it’s important to note that many experts beyond Yudkowsky are very concerned about risks from misaligned AI. I just don’t think people should infer too much from the fact that Yudkowsky, specifically, believes we’re doomed.
Why write this post?
Before diving in, it may be worth saying a little more about why I hope this post might be useful. (Feel free to skip ahead if you’re not interested in this section.)
In brief, it matters what the existential risk community believes about the risk from misaligned AI. I think that excessively high credences in doom can lead to:
poor prioritization decisions (underprioritizing other risks, including other potential existential risks from AI)
poor community health (anxiety and alienation)
poor reputation (seeming irrational, cultish, or potentially even threatening), which in turn can lead to poor recruitment or retention of people working on important problems[5]
My own impression is that, although it’s sensible to take potential risks from misaligned AI very seriously, a decent number of people are now more freaked out than they need to be. And I think that excessive deference to some highly visible intellectuals in this space, like Yudkowsky, may be playing an important role—either directly or through deference cascades.[6] I’m especially concerned about new community members, who may be particularly inclined to defer to well-known figures and who may have particularly limited awareness of the diversity of views in this space. I’ve recently encountered some anecdotes I found worrisome.
Nothing I write in this post implies that people shouldn’t freak out, of course, since I’m mostly not engaging with the substance of the relevant arguments (although I have done this elsewhere, for instance here, here, and here). If people are going to freak out about AI risk, then I at least want to help make sure that they’re freaking out for sufficiently good reasons.
Yudkowsky’s track record: some cherry-picked examples
Here, I’ve collected a number of examples of Yudkowsky making (in my view) dramatic and overconfident predictions concerning risks from technology.
Note that this isn’t an attempt to provide a balanced overview of Yudkowsky’s technological predictions over the years. I’m specifically highlighting a number of predictions that I think are underappreciated and suggest a particular kind of bias.
Doing a more comprehensive overview, which doesn’t involve specifically selecting poor predictions, would surely give a more positive impression. Hopefully this biased sample is meaningful enough, however, to support the claim that Yudkowsky’s track record is at least pretty mixed.[7]
Also, a quick caveat: Unfortunately, but understandably, Yudkowsky didn’t have time to review this post and correct any inaccuracies. In various places, I’m summarizing or giving impressions of lengthy pieces I haven’t fully read, or haven’t read in well more than a year, so there’s a decent chance that I’ve accidentally mischaracterized some of his views or arguments. Concretely: I think there’s something on the order of a 50% chance I’ll ultimately feel I should correct something below.
Fairly clearcut examples
1. Predicting near-term extinction from nanotech
At least up until 1999, admittedly when he was still only about 20 years old, Yudkowsky argued that transformative nanotechnology would probably emerge suddenly and soon (“no later than 2010”) and result in human extinction by default. My understanding is that this viewpoint was a substantial part of the justification for founding the institute that would become MIRI; the institute was initially focused on building AGI, since developing aligned superintelligence quickly enough was understood to be the only way to manage nanotech risk:
On the nanotechnology side, we possess machines capable of producing arbitrary DNA sequences, and we know how to turn arbitrary DNA sequences into arbitrary proteins (6). We have machines—Atomic Force Probes—that can put single atoms anywhere we like, and which have recently [1999] been demonstrated to be capable of forming atomic bonds. Hundredth-nanometer precision positioning, atomic-scale tweezers… the news just keeps on piling up…. If we had a time machine, 100K of information from the future could specify a protein that built a device that would give us nanotechnology overnight….
If you project on a graph the minimum size of the materials we can manipulate, it reaches the atomic level—nanotechnology—in I forget how many years (the page vanished), but I think around 2035. This, of course, was before the time of the Scanning Tunnelling Microscope and “IBM” spelled out in xenon atoms. For that matter, we now have the artificial atom (“You can make any kind of artificial atom—long, thin atoms and big, round atoms.”), which has in a sense obsoleted merely molecular nanotechnology—the surest sign that nanotech is just around the corner. I believe Drexler is now giving the ballpark figure of 2013. My own guess would be no later than 2010…
Above all, I would really, really like the Singularity to arrive before nanotechnology, given the virtual certainty of deliberate misuse—misuse of a purely material (and thus, amoral) ultratechnology, one powerful enough to destroy the planet. We cannot just sit back and wait….
Mitchell Porter calls it “The race between superweapons and superintelligence.” Human civilization will continue to change until we either create superintelligence, or wipe ourselves out. Those are the two stable states, the two “attractors”. It doesn’t matter how long it takes, or how many cycles of nanowar-and-regrowth occur before Transcendence or final extinction. If the system keeps changing, over a thousand years, or a million years, or a billion years, it will eventually wind up in one attractor or the other. But my best guess is that the issue will be settled now.
I should, once again, emphasize that Yudkowsky was around twenty when he did the final updates on this essay. In that sense, it might be unfair to bring this very old example up.
Nonetheless, I do think this case can be treated as informative, for several reasons: the belief was closely analogous to his current belief about AI (a high outlier credence in near-term doom from an emerging technology); he had already thought a lot about the subject and was highly engaged in the relevant intellectual community; it’s not clear when he dropped the belief; and twenty isn’t (in my view) actually all that young. I know a lot of people in their early twenties, and I think their current work and styles of thought are likely to be predictive of their future work and styles of thought, even though I do of course expect the quality to go up over time.
2. Predicting that his team had a substantial chance of building AGI before 2010
In 2001, and possibly later, Yudkowsky apparently believed that his small team would be able to develop a “final stage AI” that would “reach transhumanity sometime between 2005 and 2020, probably around 2008 or 2010.”
In the first half of the 2000s, he produced a fair amount of technical and conceptual work related to this goal. This work hasn’t ultimately had much clear usefulness for AI development; partly on that basis, my impression is that it has not held up well, although he was very confident in its value at the time.
The key points here are that:
- Yudkowsky has previously held short AI timeline views that turned out to be wrong

- Yudkowsky has previously held really confident inside views about the path to AGI that (at least seemingly) turned out to be wrong

- More generally, Yudkowsky may have a track record of overestimating or overstating the quality of his insights into AI
Flare
Although I haven’t evaluated the work, my impression is that Yudkowsky was a key part of a Singularity Institute effort to develop a new programming language for creating “seed AI.” He (or whoever was writing the description of the project) seems to have been substantially overconfident about its usefulness. From the section of the documentation titled “Foreword: Earth Needs Flare” (2001):
A new programming language has to be really good to survive. A new language needs to represent a quantum leap just to be in the game. Well, we’re going to be up-front about this: Flare is really good. There are concepts in Flare that have never been seen before. We expect to be able to solve problems in Flare that cannot realistically be solved in any other language. We expect that people who learn to read Flare will think about programming differently and solve problems in new ways, even if they never write a single line of Flare…. Flare was created under the auspices of the Singularity Institute for Artificial Intelligence, an organization created with the mission of building a computer program far before its time—a true Artificial Intelligence. Flare, the programming language they asked for to help achieve that goal, is not that far out of time, but it’s still a special language.
Coding a Transhuman AI
I haven’t read it, to my discredit, but “Coding a Transhuman AI 2.2” is another piece of technical writing by Yudkowsky that one could look at. The document is described as “the first serious attempt to design an AI which has the potential to become smarter than human,” and aims to “describe the principles, paradigms, cognitive architecture, and cognitive components needed to build a complete mind possessed of general intelligence.”
From a skim, I suspect there’s a good chance it hasn’t held up well—since I’m not aware of any promising later work that builds on it and since it doesn’t seem to have been written with the ML paradigm in mind—but can’t currently give an informed take.
Levels of Organization in General Intelligence
A later piece of work which I also haven’t properly read is “Levels of Organization in General Intelligence.” At least by 2005, going off of Yudkowsky’s post “So You Want to be a Seed AI Programmer,” it seems like he thought a variation of the framework in this paper would make it possible for a very small team at the Singularity Institute to create AGI:
There’s a tradeoff between the depth of AI theory, the amount of time it takes to implement the project, the number of people required, and how smart those people need to be. The AI theory we’re planning to use—not LOGI, LOGI’s successor—will save time and it means that the project may be able to get by with fewer people. But those few people will have to be brilliant…. The theory of AI is a lot easier than the practice, so if you can learn the practice at all, you should be able to pick up the theory on pretty much the first try. The current theory of AI I’m using is considerably deeper than what’s currently online in Levels of Organization in General Intelligence—so if you’ll be able to master the new theory at all, you shouldn’t have had trouble with LOGI. I know people who did comprehend LOGI on the first try; who can complete patterns and jump ahead in explanations and get everything right, who can rapidly fill in gaps from just a few hints, who still don’t have the level of ability needed to work on an AI project.
Somewhat disputable examples
I think of the previous two examples as predictions that resolved negatively. I’ll now cover a few predictions that we don’t yet know are wrong (e.g. predictions about the role of compute in developing AGI), but that I think we now have reason to regard as significantly overconfident.
3. Having high confidence that AI progress would be extremely discontinuous and localized and not require much compute
In his 2008 “FOOM debate” with Robin Hanson, Yudkowsky confidently staked out very extreme positions about what future AI progress would look like—without (in my view) offering strong justifications. The past decade of AI progress has also provided further evidence against the correctness of his core predictions.
A quote from the debate, describing the median development scenario he was imagining at the time:
When we try to visualize how all this is likely to go down, we tend to visualize a scenario that someone else once termed “a brain in a box in a basement.” I love that phrase, so I stole it. In other words, we tend to visualize that there’s this AI programming team, a lot like the sort of wannabe AI programming teams you see nowadays, trying to create artificial general intelligence, like the artificial general intelligence projects you see nowadays. They manage to acquire some new deep insights which, combined with published insights in the general scientific community, let them go down into their basement and work on it for a while and create an AI which is smart enough to reprogram itself, and then you get an intelligence explosion…. (p. 436)
The idea (as I understand it) was that AI progress would have very little impact on the world, then a small team of people with a very small amount of computing power would have some key insight, then they’d write some code for an AI system, then that system would rewrite its own code, and then it would shortly after take over the world.
When pressed by his debate partner, regarding the magnitude of the technological jump he was forecasting, Yudkowsky suggested that economic output could at least plausibly rise by twenty orders-of-magnitude within not much more than a week—once the AI system has developed relevant nanotechnologies (p. 400).[8] To give a sense of how extreme that is: If you extrapolate twenty-orders-of-magnitude-per-week over the course of a year—although, of course, no one expected this rate to be maintained for anywhere close to a year—it is equivalent to an annual economic growth rate of (10^1000)%.
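As a sanity check on that annualized figure, here is a hypothetical back-of-the-envelope calculation (the assumptions are mine; the text appears to round to fifty weeks per year):

```python
# Back-of-the-envelope check of the annualized growth figure.
# Assumption (mine, not the post's): a round 50 weeks per year.
orders_per_week = 20      # twenty orders of magnitude per week
weeks_per_year = 50       # rounded from ~52 for simplicity
total_orders = orders_per_week * weeks_per_year
print(total_orders)       # 1000, i.e. a growth factor of 10^1000 over a year
```

With the full ~52 weeks the exponent would be closer to 1040, so expressing the rate as (10^1000)% is, if anything, an understatement.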
I think it’s pretty clear that this viewpoint was heavily influenced by the reigning AI paradigm at the time, which was closer to traditional programming than machine learning. The emphasis on “coding” (as opposed to training) as the means of improvement, the assumption that large amounts of compute are unnecessary, etc. seem to follow from this. A large part of the debate was Yudkowsky arguing against Hanson, who thought that Yudkowsky was underrating the importance of compute and “content” (i.e. data) as drivers of AI progress. Although Hanson very clearly wasn’t envisioning something like deep learning either[9], his side of the argument seems to fit better with what AI progress has looked like over the past decade. In particular, huge amounts of compute and data have clearly been central to recent AI progress and are currently commonly thought to be central—or, at least, necessary—for future progress.
In my view, the pro-FOOM essays in the debate also just offered very weak justifications for thinking that a small number of insights could allow a small programming team, with a small amount of computing power, to abruptly jump the economic growth rate up by several orders of magnitude. The main reasons that stood out to me, from the debate, are these:[10]
- It requires less than a gigabyte to store someone’s genetic information on a computer (p. 444).[11]

- The brain “just doesn’t look all that complicated” in comparison to human-made pieces of technology such as computer operating systems (p. 444), on the basis of the principles that have been worked out by neuroscientists and cognitive scientists.

- There is a large gap between the accomplishments of humans and chimpanzees, which Yudkowsky attributes to a small architectural improvement: “If we look at the world today, we find that taking a little bit out of the architecture produces something that is just not in the running as an ally or a competitor when it comes to doing cognitive labor….[T]here are no branches of science where chimpanzees do better because they have mostly the same architecture and more relevant content” (p. 448).

- Although natural selection can be conceptualized as implementing a simple algorithm, it was nonetheless capable of creating the human mind.
I think that Yudkowsky’s prediction—that a small amount of code, run using only a small amount of computing power, was likely to abruptly jump economic output upward by more than a dozen orders-of-magnitude—was extreme enough to require very strong justifications. My view is that his justifications simply weren’t that strong. Given the way AI progress has looked over the past decade, his prediction also seems very likely to resolve negatively.[12]
4. Treating early AI risk arguments as close to decisive
In my view, the arguments for AI risk that Yudkowsky had developed by the early 2010s had a lot of very important gaps. They were suggestive of a real risk, but were still far from worked out enough to justify very high credences in extinction from misaligned AI. Nonetheless, Yudkowsky recalls that his credence in doom was “around the 50% range” at the time, and his public writing tended to suggest that he saw the arguments as very tight and decisive.
These slides summarize what I see as gaps in the AI risk argument that appear in Yudkowsky’s essays/papers and in Superintelligence, which presents somewhat fleshed out and tweaked versions of Yudkowsky’s arguments. This podcast episode covers most of the same points. (Note that almost none of these objections I walk through are entirely original to me.)
You can judge for yourself whether these criticisms of his arguments are fair. If they seem unfair to you, then, of course, you should disregard this as an illustration of an overconfident prediction. One additional piece of evidence, though, is that his arguments focused on a fairly specific catastrophe scenario that most researchers now assign less weight to than they did when they first entered the field.
For instance, the classic arguments used an extremely sudden “AI take-off” as a central premise. Arguably, fast take-off was the central premise, since presentations of the risk often began by establishing that there is likely to be a fast take-off (and thus an opportunity for a decisive strategic advantage) and then built the remainder of the argument on top of this foundation. However, many people in the field have now moved away from finding sudden take-off arguments compelling (e.g. for the kinds of reasons discussed here and here).
My point, here, is not necessarily that Yudkowsky was wrong, but rather that he held a much higher credence in existential risk from AI than his arguments justified at the time. The arguments had pretty crucial gaps that still needed to be resolved[13], but, I believe, his public writing tended to suggest that these arguments were tight and sufficient to justify very high credences in doom.
5. Treating “coherence arguments” as forceful
In the mid-2010s, some arguments for AI risk began to lean heavily on “coherence arguments” (i.e. arguments that draw implications from the von Neumann-Morgenstern utility theorem) to support the case for AI risk. See, for instance, this introduction to AI risk from 2016, by Yudkowsky, which places a coherence argument front and center as a foundation for the rest of the presentation. I think it’s probably fair to guess that the introduction-to-AI-risk talk that Yudkowsky was giving in 2016 contained what he regarded as the strongest concise arguments available.
However, later analysis has suggested that coherence arguments have either no or very limited implications for how we should expect future AI systems to behave. See Rohin Shah’s (I think correct) objection to the use of “coherence arguments” to support AI risk concerns. See also similar objections by Richard Ngo and Eric Drexler (Section 6.4).
Unfortunately, this is another case where the significance of this example depends on how much validity you assign to a given critique. In my view, the critique is strong. However, I’m unsure what portion of alignment researchers currently agree with me. I do know of at least one prominent researcher who was convinced by it; people also don’t seem to make coherence arguments very often anymore, which perhaps suggests that the critiques have gotten traction. However, if you have the time and energy, you should reflect on the critiques for yourself.[14]
If the critique is valid, then this would be another example of Yudkowsky significantly overestimating the strength of an argument for AI risk.
[EDIT: See here for a useful clarification by Rohin.]
A somewhat meta example
6. Not acknowledging his mixed track record
So far as I know, although I certainly haven’t read all of his writing, Yudkowsky has never (at least publicly) seemed to take into account the mixed track record outlined above—including the relatively unambiguous misses.
He has written about mistakes from early on in his intellectual life (particularly pre-2003) and has, on this basis, even made a blanket statement disavowing his pre-2003 work. However, based on my memory and a quick re-read/re-skim, this writing is an exploration of why it took him a long time to become extremely concerned about existential risks from misaligned AI. For instance, the main issue it discusses with his plans to build AGI is that these plans didn’t take into account the difficulty and importance of ensuring alignment. This writing isn’t, I think, an exploration or acknowledgement of the kinds of mistakes I’ve listed in this post.
The fact he seemingly hasn’t taken these mistakes into account—and, if anything, tends to write in a way that suggests he holds a very high opinion of his technological forecasting track record—leads me to trust his current judgments less than I otherwise would.
- ↩︎
To be clear, Yudkowsky isn’t asking other people to defer to him. He’s spent a huge amount of time outlining his views (allowing people to evaluate them on their merits) and has often expressed concerns about excessive epistemic deference.
- ↩︎
A better, but still far-from-optimal approach to deference might be to give a lot of weight to the “average” view within the pool of smart people who have spent a reasonable amount of time thinking about AI risk. This still isn’t great, though, since different people do deserve different amounts of weight, and since there’s at least some reason to think that selection effects might bias this pool toward overestimating the level of risk.
- ↩︎
It might be worth emphasizing that I’m not making any claim about the relative quality of my own track record.
- ↩︎
To say something concrete about my current views on misalignment risk: I’m currently inclined to assign a low-to-mid-single-digits probability to existential risk from misaligned AI this century, with a lot of volatility in my views. This is of course, in some sense, still extremely high!
- ↩︎
I think that expressing extremely high credences in existential risk (without sufficiently strong and clear justification) can also lead some people to simply dismiss the concerns. It is often easier to be taken seriously, when talking about strange and extreme things, if you express significant uncertainty. Importantly, I don’t think this means that people should ever misrepresent their levels of concern about existential risks; dishonesty seems like a really bad and corrosive policy. Still, this is one extra reason to think that it can be important to avoid overestimating risks.
- ↩︎
Yudkowsky is obviously a pretty polarizing figure. I’d also say that some people are probably too dismissive of him, for example because they assign too much significance to his lack of traditional credentials. But it also seems clear that many people are inclined to give Yudkowsky’s views a great deal of weight. I’ve even encountered the idea that Yudkowsky is virtually the only person capable of thinking about alignment risk clearly.
- ↩︎
I think that cherry-picking examples from someone’s forecasting track record is normally bad to do, even if you flag that you’re engaged in cherry-picking. However, I do think (or at least hope) that it’s fair in cases where someone already has a very high level of respect and frequently draws attention to their own successful predictions.
- ↩︎
I don’t mean to suggest that the specific twenty orders-of-magnitude of growth figure was the result of deep reflection or was Yudkowsky’s median estimate. Here is the specific quote, in response to Hanson raising the twenty orders-of-magnitude-in-a-week number: “Twenty orders of magnitude in a week doesn’t sound right, unless you’re talking about the tail end after the AI gets nanotechnology. Figure more like some number of years to push the AI up to a critical point, two to six orders of magnitude improvement from there to nanotech, then some more orders of magnitude after that.” I think that my general point, that this is a very extreme prediction, stays the same even if we lower the number to ten orders-of-magnitude and assume that there will be a bit of a lag between the ‘critical point’ and the development of the relevant nanotechnology.
- ↩︎
As an example of a failed prediction or piece of analysis on the other side of the FOOM debate, Hanson praised the CYC project—which lies far afield of the current deep learning paradigm and now looks like a clear dead end.
- ↩︎
Yudkowsky also provides a number of arguments in favor of the view that the human mind can be massively improved upon. I think these arguments are mostly right. However, I think, they don’t have any very strong implications for the question of whether AI progress will be compute-intensive, sudden, or localized.
- ↩︎
To probe just the relevance of this one piece of evidence, specifically, let’s suppose that it’s appropriate to use the length of a person’s genome in bits of information as an upper bound on the minimum amount of code required to produce a system that shares their cognitive abilities (excluding code associated with digital environments). This would imply that it is in principle possible to train an ML model that can do anything a given person can do, using something on the order of 10 million lines of code. But even if we accept this hypothesis—which seems quite plausible to me—it doesn’t seem to me like this implies much about the relative contributions of architecture and compute to AI progress or the extent to which progress in architecture design is driven by “deep insights.” For example, why couldn’t it be true that it is possible to develop a human-equivalent system using fewer than 10 million lines of code and also true that computing power (rather than insight) is the main bottleneck to developing such a system?
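For what it’s worth, here is hypothetical arithmetic (my own illustration, not from the debate) that gets from a sub-gigabyte genome to a figure on the order of 10 million lines of code:

```python
# Rough arithmetic linking genome size to lines of code.
# All specific numbers here are my own illustrative assumptions.
base_pairs = 3_000_000_000     # approximate length of the human genome
bits = base_pairs * 2          # 2 bits encode each of A/C/G/T
genome_bytes = bits // 8       # 750,000,000 bytes: under a gigabyte
bytes_per_line = 75            # generous average length of a line of code
lines_of_code = genome_bytes // bytes_per_line
print(lines_of_code)           # 10,000,000: on the order of 10 million lines
```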
- ↩︎
Two caveats regarding my discussion of the FOOM debate:
First, I should emphasize that, although I think Yudkowsky’s arguments were weak when it came to the central hypothesis being debated, his views were in some other regards more reasonable than his debate partner’s. See here for comments by Paul Christiano on how well various views Yudkowsky expressed in the FOOM debate have held up.
Second, it’s been a few years since I’ve read the FOOM debate—and there’s a lot in there (the book version of it is 741 pages long)—so I wouldn’t be surprised if my high-level characterization of Yudkowsky’s arguments is importantly misleading. My characterization here is based on some rough notes I took the last time I read it.
- ↩︎
For example, it may be possible to construct very strong arguments for AI risk that don’t rely on the fast take-off assumption. However, in practice, I think it’s fair to say that the classic arguments did rely on this assumption. If the assumption wasn’t actually very justified, then, I think, it seems to follow that having a very high credence in AI risk also wasn’t justified at the time.
- ↩︎
Here’s another example of an argument that’s risen to prominence in the past few years, and plays an important role in some presentations of AI risk, that I now suspect simply might not work. This argument shows up, for example, in Yudkowsky’s recent post “AGI Ruin: A List of Lethalities,” at the top of the section outlining “central difficulties.”
EDIT: I’ve now written up my own account of how we should do epistemic deference in general, which fleshes out more clearly a bunch of the intuitions I outline in this comment thread.
I think that a bunch of people are overindexing on Yudkowsky’s views; I’ve nevertheless downvoted this post because it seems like it’s making claims that are significantly too strong, based on a methodology that I strongly disendorse. I’d much prefer a version of this post which, rather than essentially saying “pay less attention to Yudkowsky”, is more nuanced about how to update based on his previous contributions; I’ve tried to do that in this comment, for example. (More generally, rather than reading this post, I recommend people read this one by Paul Christiano, which outlines specific agreements and disagreements. Note that the list of agreements there, which I expect that many other alignment researchers also buy into, serves as a significant testament to Yudkowsky’s track record.)
The part of this post which seems most wild to me is the leap from “mixed track record” to
For any reasonable interpretation of this sentence, it’s transparently false. Yudkowsky has proven to be one of the best few thinkers in the world on a very difficult topic. Insofar as there are others who you couldn’t write a similar “mixed track record” post about, it’s almost entirely because they don’t have a track record of making any big claims, in large part because they weren’t able to generate the relevant early insights themselves. Breaking ground in novel domains is very, very different from forecasting the weather or events next year; a mixed track record is the price of entry.
Based on his track record, I would endorse people deferring more towards the general direction of Yudkowsky’s views than towards the views of almost anyone else. I also think that there’s a good case to be made that Yudkowsky tends to be overconfident, and this should be taken into account when deferring; but when it comes to making big-picture forecasts, the main value of deference is in helping us decide which ideas and arguments to take seriously, rather than the specific credences we should place on them, since the space of ideas is so large. The EA community has ended up strongly moving in Yudkowsky’s direction over the last decade, and that seems like much more compelling evidence than anything listed in this post.
I disagree that the sentence is false for the interpretation I have in mind.
I think it’s really important to separate out the question “Is Yudkowsky an unusually innovative thinker?” and the question “Is Yudkowsky someone whose credences you should give an unusual amount of weight to?”
I read your comment as arguing for the former, which I don’t disagree with. But that doesn’t mean that people should currently weigh his risk estimates more highly than they weigh the estimates of other researchers currently in the space (like you).
But we do also need to try to have well-calibrated credences, of course. For the reason given in the post, it’s important to know whether the risk of everyone dying soon is 5% or 99%. It’s not enough just to determine whether we should take AI risk seriously.
We’re also now past the point, as a community, where “Should AI risk be taken seriously?” is that much of a live question. The main epistemic question that matters is what probability we assign to it—and I think this post is relevant to that.
I definitely recommend people read the post Paul just wrote! I think it’s overall more useful than this one.
But I don’t think there’s an either-or here. People—particularly non-experts in a domain—do and should form their views through a mixture of engaging with arguments and deferring to others. So both arguments and track records should be discussed.
I discuss this in response to another comment, here, but I’m not convinced of that point.
I phrased my reply strongly (e.g. telling people to read the other post instead of this one) because deference epistemology is intrinsically closely linked to status interactions, and you need to be pretty careful in order to make this kind of post not end up being, in effect, a one-dimensional “downweight this person”. I don’t think this post was anywhere near careful enough to avoid that effect. That seems particularly bad because I think most EAs should significantly upweight Yudkowsky’s views if they’re doing any kind of reasonable, careful deference, because most EAs significantly underweight how heavy-tailed the production of innovative ideas actually is (e.g. because of hindsight bias, it’s hard to realise how much worse than Eliezer we would have been at inventing the arguments for AI risk, and how many dumb things we would have said in his position).
By contrast, I think your post is implicitly using a model where we have a few existing, well-identified questions, and the most important thing is to just get to the best credences on those questions, and we should do so partly by just updating in the direction of experts. But I think this model of deference is rarely relevant; see my reply to Rohin for more details. Basically, as soon as we move beyond toy models of deference, the “innovative thinking” part becomes crucially important, and the “well-calibrated” part becomes much less so.
One last intuition: different people have different relationships between their personal credences and their all-things-considered credences. Inferring track records in the way you’ve done here will, in addition to favoring people who are quieter and say fewer useful things, also favor people who speak primarily based on their all-things-considered credences rather than their personal credences. But that leads to a vicious cycle where people are deferring to people who are deferring to people who… And then the people who actually do innovative thinking in public end up getting downweighted to oblivion via cherrypicked examples.
Modesty epistemology delenda est.
This seems like an overly research-centric position.
When your job is to come up with novel relevant stuff in a domain, then I agree that it’s mostly about “which ideas and arguments to take seriously” rather than specific credences.
When your job is to make decisions right now, the specific credences matter. Some examples:
Any cause prioritization decision, e.g. should funders reallocate nearly all biosecurity money to AI?
What should AI-focused community builders provide as starting resources?
Should there be an organization dedicated to solving Eliezer’s health problems? What should its budget be?
Should people try to solve technical AI alignment or try to, idk, create a culture of secrecy within AGI labs?
I think that there are very few decisions which are both a) that low-dimensional and b) actually sensitive to the relevant range of credences that we’re talking about.
Like, suppose you think that Eliezer’s credences on his biggest claims are literally 2x higher than they should be, even for claims where he’s 90% confident. This is a huge hit in terms of Bayes points; if that’s how you determine deference, and you believe he’s 2x off, then plausibly that implies you should defer to him less than you do to the median EA. But when it comes to grantmaking, for example, a cost-effectiveness factor of 2x is negligible given the other uncertainties involved—this should very rarely move you from a yes to no, or vice versa. (edit: I should restrict the scope here to grantmaking in complex, high-uncertainty domains like AI alignment).
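The “huge hit in terms of Bayes points” can be made concrete. If the well-calibrated credence on a claim is 45% but someone reports 90%, the expected log-score penalty per claim equals the KL divergence between the two distributions. A minimal sketch, with both numbers taken from the hypothetical above:

```python
import math

p_true = 0.45   # hypothetical calibrated credence
p_stated = 0.9  # the same claim reported with 2x-overconfident probability

# Expected log scores under the true distribution, scoring the stated vs. the true credence.
score_stated = p_true * math.log(p_stated) + (1 - p_true) * math.log(1 - p_stated)
score_true = p_true * math.log(p_true) + (1 - p_true) * math.log(1 - p_true)

# The expected penalty per claim is KL(p_true || p_stated).
penalty = score_true - score_stated
print(round(penalty, 3))  # 0.626
```

About 0.63 nats lost per claim, which compounds quickly across many claims — consistent with the point that pure calibration scoring would heavily punish this kind of overconfidence.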
Then you might say: well, okay, we’re not just making binary decisions, we’re making complex decisions where we’re choosing between lots of different options. But the more complex the decisions you’re making, the less you should care about whether somebody’s credences on a few key claims are accurate, and the more you should care about whether they’re identifying the right types of considerations, even if you want to apply a big discount factor to the specific credences involved.
As a simple example, as soon as you’re estimating more than one variable, you typically start caring a lot about whether the errors on your estimates are correlated or uncorrelated. But there are so many different possibilities for ways and reasons that they might be correlated that you can’t just update towards experts’ credences, you have to actually update towards experts’ reasons for those credences, which then puts you in the regime of caring more about whether you’ve identified the right types of considerations.
Such differences are crucial for many of the most important grant areas IME, because they are areas where you are trading off multiple high-stakes concerns. E.g. in nuclear policy all the strategies on offer have arguments that they might lead to nuclear war or worse war. On AI alignment there are multiple such tradeoffs and people embracing strategies to push the same variable in opposite directions with high stakes on both sides.
I haven’t thought much about nuclear policy, so I can’t respond there. But at least in alignment, I expect that pushing on variables where there’s less than a 2x difference between the expected positive and negative effects of changing that variable is not a good use of time for altruistically-motivated people.
(By contrast, upweighting or downweighting Eliezer’s opinions by a factor of 2 could lead to significant shifts in expected value, especially for people who are highly deferential. The specific thing I think doesn’t make much difference is deferring to a version of Eliezer who’s 90% confident about something, versus deferring to the same extent to a version of Eliezer who’s 45% confident in the same thing.)
My more general point, which doesn’t hinge on the specific 2x claim, is that naive conversions between metrics of calibration and deferential weightings are a bad idea, and that a good way to avoid naive conversions is to care a lot more about innovative thinking than calibration when deferring.
I think differences between Eliezer + my views often make way more than a 2x difference to the bottom line. I’m not sure why you’re only considering probabilities on specific claims; when I think of “deferring” I also imagine deferring on estimates of usefulness of various actions, which can much more easily have OOMs of difference.
(Fwiw I also think Eliezer is way more than 2x too high for probabilities on many claims, though I don’t think that matters much for my point.)
Taking my examples:
Since Eliezer thinks something like 99.99% chance of doom from AI, that reduces cost effectiveness of all x-risk-targeted biosecurity work by a factor of 10,000x (since only in 1 in 10,000 worlds does the reduced bio x-risk matter at all), whereas if you have < 50% of doom from AI (as I do) then that’s a discount factor of < 2x on x-risk-targeted biosecurity work. So that’s almost 4 OOMs of difference.
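The discount-factor arithmetic in this example can be sketched numerically (the doom probabilities are the illustrative figures from the comment, not anyone’s precise credences):

```python
import math

# Illustrative figures from the comment above.
p_doom_eliezer = 0.9999  # ~99.99% chance of AI doom
p_doom_rohin = 0.5       # < 50% chance of AI doom

# X-risk-targeted biosecurity work only matters in worlds where AI doesn't kill
# everyone, so its cost-effectiveness is discounted by the AI-survival probability.
discount_eliezer = 1 / (1 - p_doom_eliezer)  # ~10,000x discount
discount_rohin = 1 / (1 - p_doom_rohin)      # 2x discount

oom_gap = math.log10(discount_eliezer / discount_rohin)
print(round(discount_eliezer), discount_rohin, round(oom_gap, 1))  # 10000 2.0 3.7
```

The ~3.7 orders of magnitude matches the “almost 4 OOMs of difference” claimed in the comment.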
Eliezer seems very confident that a lot of existing alignment work is useless. So if you imagine taking a representative set of such papers as starting resources, I’d imagine that Eliezer would be at < 1% on “this will help the person become an effective alignment researcher” whereas I’d be at > 50% (for actual probabilities I’d want a better operationalization), leading to a >50x difference in cost effectiveness.
(And if you compare against the set of readings Eliezer would choose, I’d imagine the difference becomes even greater—I could imagine we’d each think the other’s choice would be net negative.)
I don’t have a citation but I’m guessing that Eliezer thinks that with more energy and project management skills he could make a significant dent in x-risk (perhaps 10 percentage points), while thinking the rest of the alignment field if fully funded can’t make a dent of more than 0.01 percentage points, suggesting that “improve Eliezer’s health + project management skills” is 3 OOM more important than “all other alignment work” (saying nothing about tractability, which I don’t know enough to evaluate). Whereas I’d have it at, idk, 1-2 OOM less important, for a difference of 4-5 OOMs.
This one is harder to make up numbers for but intuitively it seems like there should again be many OOMs of difference, primarily because we differ by many OOMs on “regular EAs trying to solve technical AI alignment” but roughly agree on the value of “culture of secrecy”.
I realize I haven’t engaged with the abstract points you made. I think I mostly just don’t understand them and currently they feel like they have to be wrong given the obvious OOMs of difference in all of the examples I gave. If you still disagree it would be great if you could explain how your abstract points play out in some of my concrete examples.
We both agree that you shouldn’t defer to Eliezer’s literal credences, because we both think he’s systematically overconfident. The debate is between two responses to that:
a) Give him less deference weight than the cautious, sober, AI safety people who make few novel claims but are better-calibrated (which is what Ben advocates).
b) Try to adjust for his overconfidence and then give significant deference weight to a version of his worldview that isn’t overconfident.
I say you should do the latter, because you should be deferring to coherent worldviews (which are rare) rather than deferring on a question-by-question basis. This becomes more and more true the more complex the decisions you have to make. Even for your (pretty simple) examples, the type of deference you seem to be advocating doesn’t make much sense.
For instance:
It doesn’t make sense to defer to Eliezer’s estimate of the relative importance of AI without also accounting for his estimate of the relative tractability of funding AI, which I infer he thinks is very low.
Again, the problem is that you’re deferring on a question-by-question basis, without considering the correlations between different questions—in this case, the likelihood that Eliezer is right, and the value of his work. (Also, the numbers seem pretty wild; maybe a bit uncharitable to ascribe to Eliezer the view that his research would be 3 OOM more valuable than the rest of the field combined? His tone is strong but I don’t think he’s ever made a claim that big.)
Here’s an alternative calculation which takes into account that correlation. I claim that the value of this organization is mainly determined by the likelihood that Eliezer is correct about a few key claims which underlie his research agenda. Suppose he thinks that’s 90% likely and I think that’s 10% likely. Then if our choices are “defer entirely to Eliezer” or “defer entirely to Richard”, there’s a 9x difference in funding efficacy. In practice, though, the actual disagreement here is between “defer to Eliezer no more than a median AI safety researcher” and something like “assume Eliezer is, say, 2x overconfident and then give calibrated-Eliezer, say, 30%ish of your deference weight”. If we assume for the sake of simplicity that every other AI safety researcher has my worldview, then the practical difference here is something like a 2x difference in this org’s efficacy (0.1 vs 0.3*0.9*0.5+0.7*0.1). Which is pretty low!
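The arithmetic in this alternative calculation, spelled out (all probabilities and weights are the hypothetical ones from the comment):

```python
# Hypothetical credences that the org's key claims are correct.
p_eliezer_raw = 0.9                          # Eliezer's stated confidence
p_eliezer_calibrated = p_eliezer_raw * 0.5   # assume ~2x overconfident -> 0.45
p_richard = 0.1                              # Richard's credence

# Option A: defer to Eliezer no more than to a median AI safety researcher
# (assumed here, for simplicity, to share Richard's worldview).
value_a = p_richard  # 0.1

# Option B: give calibrated-Eliezer 30% of your deference weight, the rest to
# the Richard-like worldview.
w = 0.3
value_b = w * p_eliezer_calibrated + (1 - w) * p_richard  # 0.135 + 0.07 = 0.205

print(value_a, round(value_b, 3), round(value_b / value_a, 2))
```

The two options differ by only about 2x in the estimated efficacy of funding the org, which is the “pretty low” gap the comment points to.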
Won’t go through the other examples but hopefully that conveys the idea. The basic problem here, I think, is that the implicit “deference model” that you and Ben are using doesn’t actually work (even for very simple examples like the ones you gave).
There’s lots of things you can do under Eliezer’s worldview that add dignity points, like paying relevant people millions of dollars to spend a week really engaging with the arguments, or trying to get whole-brain emulation before AGI. My understanding is that he doesn’t expect those sorts of things to happen.
This seems like a crazy way to do cost-effectiveness analyses.
Like, if I were comparing deworming to GiveDirectly, would I be saying “well, the value of deworming is mainly determined by the likelihood that the pro-deworming people are right, which I estimate is 70% but you estimate is 50%, so there’s only a 1.4x difference”? Something has clearly gone wrong here.
It also feels like this reasoning implies that no EA action can be > 10x more valuable than any other action that an EA critic thinks is good? Since you assign a 90% chance that the EA is right, and the critic thinks there’s a 10% chance of that, so there’s only a 9x gap? And then once you do all of your adjustments it’s only 2x? Why do we even bother with cause prioritization under this worldview?
I don’t have a fleshed out theory of how and when to defer, but I feel pretty confident that even our intuitive pretheoretic deference should not be this sort of thing, and should be the sort of thing that can have orders of magnitude of difference between actions.
(One major thing is that I think you should be comparing between two actions, rather than evaluating an action by itself, which is why I compared to “all other alignment work”.)
I don’t see why you are not including “c) give significant deference weight to his actual worldview”, which is what I’d be inclined to do if I didn’t have significant AI expertise myself and so was trying to defer.
(Aside: note that Ben said “they shouldn’t defer to him more than they would defer to anyone else who seems smart and has spent a reasonable amount of time thinking about AI risk”, which is slightly different from your rephrasing, but that’s a nitpick)
¯\_(ツ)_/¯ Both the 10% and 0.01% (= 100% − 99.99%) numbers are ones I’ve heard reported (though both second-hand, not directly from Eliezer), and it also seems consistent with other things he writes. It seems entirely plausible that people misspoke or misremembered or lied, or that Eliezer was reporting probabilities “excluding miracles” or something else that makes these not the right numbers to use.
I’m not trying to be “charitable” to Eliezer, I’m trying to predict his views accurately (while noting that often people predict views inaccurately by failing to be sufficiently charitable). Usually when I see people say things like “obviously Eliezer meant this more normal, less crazy thing” they seem to be wrong.
Rob thinking that it’s not actually 99.99% is in fact an update for me.
IMO the crux is that I disagree with both of these. Instead I think you should use each worldview to calculate a policy, and then generate some kind of compromise between those policies. My arguments above were aiming to establish that this strategy is not very sensitive to exactly how much you defer to Eliezer, because there just aren’t very many good worldviews going around—hence why I assign maybe 15 or 20% (inside view) credence to his worldview (updated from 10% above after reflection). (I think my all-things-considered view is similar, actually, because deference to him cancels out against deference to all the people who think he’s totally wrong.)
Again, the difference is in large part determined by whether you think you’re in a low-dimensional space (here are our two actions, which one should we take?) versus a high-dimensional space (millions of actions available to us, how do we narrow it down?) In a high-dimensional space the tradeoffs between the best ways to generate utility according to Eliezer’s worldview and the best ways to generate utility according to other worldviews become much smaller.
Within a worldview, you can assign EVs which are orders of magnitude different. But once you do worldview diversification, if a given worldview gets even 1% of my resources, then in some sense I’m acting like that worldview’s favored interventions are in a comparable EV ballpark to all the other worldviews’ favored interventions. That’s a feature not a bug.
An arbitrary critic typically gets well less than 0.1% of my deference weight on EA topics (otherwise it’d run out fast!) But also see above: because in high-dimensional spaces there are few tradeoffs between different worldviews’ favored interventions, changing the weights on different worldviews doesn’t typically lead to many OOM changes in how you’re acting like you’re assigning EVs.
Also, I tend to think of cause prio as trying to integrate multiple worldviews into a single coherent worldview. But with deference you intrinsically can’t do that, because the whole point of deference is you don’t fully understand their views.
What do you mean “he doesn’t expect this sort of thing to happen”? I think I would just straightforwardly endorse doing a bunch of costly things like these that Eliezer’s worldview thinks are our best shot, as long as they don’t cause much harm according to other worldviews.
Because neither Ben nor myself was advocating for this.
Okay, my new understanding of your view is that you’re suggesting that (if one is going to defer) one should:
Identify a panel of people to defer to
Assign them weights based on how good they seem (e.g. track record, quality and novelty of ideas, etc)
Allocate resources to [policies advocated by person X] in proportion to [weight assigned to person X].
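The three steps above can be sketched as a trivial proportional-allocation rule (the names, weights, and budget are all made up for illustration):

```python
# Step 2: weights assigned to each person on the panel (hypothetical values).
weights = {"eliezer": 0.3, "paul": 0.4, "median_researcher": 0.3}

total_budget = 10_000_000  # hypothetical resource pool, in dollars

# Step 3: allocate resources to each person's recommended policies in
# proportion to the weight assigned to that person.
allocation = {person: w * total_budget for person, w in weights.items()}
print(allocation)
```

Under this model, moving someone’s weight from 0.5 to 0.3 reallocates 20% of the total budget, which is why the weights themselves carry so much practical significance.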
I agree that (a) this is a reasonable deference model and (b) under this deference model most of my calculations and questions in this thread don’t particularly make sense to think about.
However, I still disagree with the original claim I was disagreeing with:
Even in this new deference model, it seems like the specific weights chosen in step 2 are a pretty big deal (which seem like the obvious analogues of “credences”, and the sort of thing that Ben’s post would influence). If you switch from a weight of 0.5 to a weight of 0.3, that’s a reallocation of 20% of your resources, which is pretty large!
Yepp, thanks for the clear rephrasing. My original arguments for this view were pretty messy because I didn’t have it fully fleshed out in my mind before writing this comment thread, I just had a few underlying intuitions about ways I thought Ben was going wrong.
Upon further reflection I think I’d make two changes to your rephrasing.
First change: in your rephrasing, we assign people weights based on the quality of their beliefs, but then follow their recommended policies. But any given way of measuring the quality of beliefs (in terms of novelty, track record, etc) is only an imperfect proxy for quality of policies. For example, Kurzweil might very presciently predict that compute is the key driver of AI progress, but suppose (for the sake of argument) that the way he does so is by having a worldview in which everything is deterministic, individuals are powerless to affect the future, etc. Then you actually don’t want to give many resources to Kurzweil’s policies, because Kurzweil might have no idea which policies make any difference.
So I think I want to adjust the rephrasing to say: in principle we should assign people weights based on how well their past recommended policies for someone like you would have worked out, which you can estimate using things like their track record of predictions, novelty of ideas, etc. But notably, the quality of past recommended policies is often not very sensitive to credences! For example, if you think that there’s a 50% chance of solving nanotech in a decade, or a 90% chance of solving nanotech in a decade, then you’ll probably still recommend working on nanotech (or nanotech safety) either way.
Having said all that, since we only get one rollout, evaluating policies is very high variance. And so looking at other information like reasoning, predictions, credences, etc, helps you distinguish between “good” and “lucky”. But fundamentally we should think of these as approximations to policy evaluation, at least if you’re assuming that we mostly can’t fully evaluate whether their reasons for holding their views are sound.
Second change: what about the case where we don’t get to allocate resources, but we have to actually make a set of individual decisions? I think the theoretically correct move here is something like: let policies spend their weight on the domains which they think are most important, and then follow the policy which has spent most weight on that domain.
Some complications:
I say “domains” not “decisions” because you don’t want to make a series of related decisions which are each decided by a different policy, that seems incoherent (especially if policies are reasoning adversarially about how to undermine each other’s actions).
More generally, this procedure could in theory be sensitive to bargaining and negotiating dynamics between different policies, and also the structure of the voting system (e.g. which decisions are voted on first, etc). I think we can just resolve to ignore those and do fine, but in principle I expect it gets pretty funky.
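The “let policies spend their weight on the domains they think are most important” move might be sketched like this (worldview names, weights, and domains are all invented for illustration, and this ignores the bargaining complications just noted):

```python
# Hypothetical deference weights for two worldviews.
weights = {"A": 0.6, "B": 0.4}

# How much each worldview cares about each domain (each row sums to 1).
care = {
    "A": {"alignment": 0.2, "biosecurity": 0.8},
    "B": {"alignment": 0.9, "biosecurity": 0.1},
}

# Each worldview's "spend" on a domain is its weight times how much it cares.
bids = {
    domain: {p: weights[p] * care[p][domain] for p in weights}
    for domain in ["alignment", "biosecurity"]
}

# Follow, in each domain, the worldview that spent the most weight there.
winners = {domain: max(b, key=b.get) for domain, b in bids.items()}
print(winners)  # {'alignment': 'B', 'biosecurity': 'A'}
```

Note how the lower-weight worldview B still ends up deciding alignment policy, because it concentrates its weight where it cares most — the high-dimensionality point from earlier in the thread.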
Lastly, two meta-level notes:
I feel like I’ve probably just reformulated some kind of reinforcement learning. Specifically the case where you have a fixed class of policies and no knowledge of how they relate to each other, so you can only learn how much to upweight each policy. And then the best policy is not actually in your hypothesis space, but you can learn a simple meta-policy of when to use each existing policy.
It’s very ironic that in order to figure out how much to defer to Yudkowsky we need to invent a theory of idealised cooperative decision-making. Since he’s probably the person whose thoughts on that I trust most, I guess we should meta-defer to him about what that will look like...
In your Kurzweil example I think the issue is not that you assigned weights based on hypothetical-Kurzweil’s beliefs, but that hypothetical-Kurzweil is completely indifferent over policies. I think the natural fix is “moral parliament” style decision making where the weights can still come from beliefs but they now apply more to preferences-over-policies. In your example hypothetical-Kurzweil has a lot of weight but never has any preferences-over-policies so doesn’t end up influencing your decisions at all.
That being said, I agree that if you can evaluate quality of past recommended policies well, without a ton of noise, that would be a better signal than accuracy of beliefs. This just seems extremely hard to do, especially given the selection bias in who comes to your attention in the first place, and idk how I’d do it for Eliezer in any sane way. (Whereas you get to see people state many more beliefs and so there are a lot more data points that you can evaluate if you look at beliefs.)
I think you’re thinking way too much about credences-in-particular. The relevant notion is not “credences”, it’s that-which-determines-how-much-influence-the-person-has-over-your-actions. In this model of deference the relevant notion is the weights assigned in step 2 (however you calculate them), and the message of Ben’s post would be “I think people assign too high a weight to Eliezer”, rather than anything about credences. I don’t think either Ben or I care particularly much about credences-based-on-deference except inasmuch as they affect your actions.
I do agree that Ben’s post looks at credences that Eliezer has given and considers those to be relevant evidence for computing what weight to assign Eliezer. You could take a strong stand against using people’s credences or beliefs to compute weights, but that is at least a pretty controversial take (that I personally don’t agree with), and it seems different from what you’ve been arguing so far (except possibly in the parent comment).
This change seems fine. Personally I’m pretty happy with a rough heuristic of “here’s how I should be splitting my resources across worldviews” and then going off of intuitive “how much does this worldview care about this decision” + intuitive trading between worldviews rather than something more fleshed out and formal but that seems mostly a matter of taste.
Your procedure is non-robust in the sense that, if Kurzweil transitions from total indifference to thinking that one policy is better by epsilon, he’ll throw his full weight behind that policy. Hmm, but then in a parliamentary approach I guess that if there are a few different things he cares epsilon about, then other policies could negotiate to give him influence only over the things they don’t care about themselves. Weighting by hypothetical-past-impact still seems a bit more elegant, but maybe it washes out.
(If we want to be really on-policy then I guess the thing which we should be evaluating is whether the person’s worldview would have had good consequences when added to our previous mix of worldviews. And one algorithm for this is assigning policies weights by starting off from a state where they don’t know anything about the world, then letting them bet on all your knowledge about the past (where the amount they win on bets is determined not just by how correct they are, but also how much they disagree with other policies). But this seems way too complicated to be helpful in practice.)
I think I’m happy with people spending a bunch of time evaluating accuracy of beliefs, as long as they keep in mind that this is a proxy for quality of recommended policies. Which I claim is an accurate description of what I was doing, and what Ben wasn’t: e.g. when I say that credences matter less than coherence of worldviews, that’s because the latter is crucial for designing good policies, whereas the former might not be; and when I say that all-things-considered estimates of things like “total risk level” aren’t very important, that’s because in principle we should be aggregating policies not risk estimates between worldviews.
I also agree that selection bias could be a big problem; again, I think that the best strategy here is something like “do the standard things while remembering what’s a proxy for what”.
Meta: This comment (and some previous ones) get a bunch into “what should deference look like”, which is interesting, but I’ll note that most of this seems unrelated to my original claim, which was just “deference* seems important for people making decisions now, even if it isn’t very important in practice for researchers”, in contradiction to a sentence on your top-level comment. Do you now agree with that claim?
*Here I mean deference in the sense of how-much-influence-various-experts-have-over-your-actions. I initially called this “credences” because I thought you were imagining a model of deference in which literal credences determined how much influence experts had over your actions.
Agreed, but I’m not too worried about that. It seems like you’ll necessarily have some edge cases like this; I’d want to see an argument that the edge cases would be common before I switch to something else.
The chain of approximations could look something like:
The correct thing to do is to consider all actions / policies and execute the one with the highest expected impact.
First approximation: Since there are so many actions / policies, it would take too long to do this well, and so we instead take a shortcut and consider only those actions / policies that more experienced people have thought of, and execute the ones with the highest expected impact. (I’m assuming for now that you’re not in the business of coming up with new ideas of things to do.)
Second approximation: Actually it’s still pretty hard to evaluate the expected impact of the restricted set of actions / policies, so we’ll instead do the ones that the experts say is highest impact. Since the experts disagree, we’ll divide our resources amongst them, in accordance with our predictions of which experts have highest expected impact across their portfolios of actions. (This is assuming a large enough pile of resources that it makes sense to diversify due to diminishing marginal returns for any one expert.)
Third approximation: Actually expected impact of an expert’s portfolio of actions is still pretty hard to assess, we can save ourselves decision time by choosing weights for the portfolios according to some proxy that’s easier to assess.
It seems like right now we’re disagreeing about proxies we could use in the third approximation. It seems to me like proxies should be evaluated based on how closely they approximate the desired metric (expected future impact) in realistic use cases, which would involve both (1) how closely they align with “expected future impact” in general and (2) how easy they are to evaluate. It seems to me like you’re thinking mostly of (1) and not (2) and this seems weird to me; if you were going to ignore (2) you should just choose “expected future impact”. Anyway, individual proxies and my thoughts on them:
Beliefs / credences: 5⁄10 on easy to evaluate (e.g. Ben could write this post). 3⁄10 on correlation with expected future impact. Doesn’t take into account how much impact experts think their policies could have (e.g. the Kurzweil example above).
Coherence: 3⁄10 on easy to evaluate (seems hard to do this without being an expert in the field). 2⁄10 on correlation with expected future impact (it’s not that hard to have wrong coherent worldviews, see e.g. many pop sci books).
Hypothetical impact of past policies: 1⁄10 on easy to evaluate (though it depends on the domain). 7⁄10 on correlation with expected future impact (it’s not 9⁄10 or 10⁄10 because selection bias seems very hard to account for).
As is almost always the case with proxies, I would usually use an intuitive combination of all the available proxies, because that seems way more robust than relying on any single one. I am not advocating for only relying on beliefs.
I get the sense that you think I’m trying to defend “this is a good post and has no problems whatsoever”? (If so, that’s not what I said.)
Summarizing my main claims about this deference model that you might disagree with:
In practice, an expert’s beliefs / credences will be relevant information for deciding what weight to assign them,
Ben’s post provides relevant information about Eliezer’s beliefs (note this is not taking a stand on other aspects of the post, e.g. the claim about how much people should defer to Eliezer)
The weights assigned to experts are important / valuable to people who need to make decisions now (but they are usually not very important / valuable to researchers).
Meta: I’m currently writing up a post with a fully-fleshed-out account of deference. If you’d like to drop this thread and engage with that when it comes out (or drop this thread without engaging with that), feel free; I expect it to be easier to debate when I’ve described the position I’m defending in more detail.
I always agreed with this claim; my point was that the type of deference which is important for people making decisions now should not be very sensitive to the “specific credences” of the people you’re deferring to. You were arguing above that the difference between your and Eliezer’s views makes much more than a 2x difference; do you now agree that, on my account of deference, a big change in the deference-weight you assign to Eliezer plausibly leads to a much smaller change in your policy from the perspective of other worldviews, because the Eliezer-worldview trades off influence over most parts of the policy for influence over the parts that the Eliezer-worldview thinks are crucial and other policies don’t?
This is helpful, thanks. I of course agree that we should consider both correlations with impact and ease of evaluation; I’m talking so much about the former because not noticing this seems like the default mistake that people make when thinking about epistemic modesty. Relatedly, I think my biggest points of disagreement with your list are:
1. I think calibrated credences are badly-correlated with expected future impact, because:
a) Overconfidence is just so common, and top experts are often really miscalibrated even when they have really good models of their field
b) The people who are best at having impact have goals other than sounding calibrated—e.g. convincing people to work with them, fighting social pressure towards conformity, etc. By contrast, the people who are best at being calibrated are likely the ones who are always stating their all-things-considered views, and who therefore may have very poor object-level models. This is particularly worrying when we’re trying to infer credences from tone—e.g. it’s hard to distinguish the hypotheses “Eliezer’s inside views are less calibrated than other people’s” and “Eliezer always speaks based on his inside-view credences, whereas other people usually speak based on their all-things-considered credences”.
c) I think that “directionally correct beliefs” are much better-correlated, and not that much harder to evaluate, and so credences are especially unhelpful by comparison to those (like, 2⁄10 before conditioning on directional correctness, and 1⁄10 after, whereas directional correctness is like 3⁄10).
2. I think coherence is very well-correlated with expected future impact (like, 5⁄10), because impact is heavy-tailed and the biggest sources of impact often require strong, coherent views. I don’t think it’s that hard to evaluate in hindsight, because the more coherent a view is, the more easily it’s falsified by history.
3. I think “hypothetical impact of past policies” is not that hard to evaluate. E.g. in Eliezer’s case the main impact is “people do a bunch of technical alignment work much earlier”, which I think we both agree is robustly good.
I was arguing that EV estimates have more than a 2x difference; I think this is pretty irrelevant to the deference model you’re suggesting (which I didn’t know you were suggesting at the time).
No, I don’t agree with that. It seems like all the worldviews are going to want resources (money / time) and access to that is ~zero-sum. (All the worldviews want “get more resources” so I’m assuming you’re already doing that as much as possible.) The bargaining helps you avoid wasting resources on counterproductive fighting between worldviews, it doesn’t change the amount of resources each worldview gets to spend.
Going from allocating 10% of your resources to 20% of your resources to a worldview seems like a big change. It’s a big difference if you start with twice as much money / time as you otherwise would have, unless there just happens to be a sharp drop in marginal utility of resources between those two points for some reason.
Maybe you think that there are lots of things one could do that have way more effect than “redirecting 10% of one’s resources” and so it’s not a big deal? If so can you give examples?
I agree overconfidence is common and you shouldn’t literally calculate a Brier score to figure out who to defer to.
I agree that directionally-correct beliefs are better correlated than calibrated credences.
When I say “evaluate beliefs” I mean “look at stated beliefs and see how reasonable they look overall, taking into account what other people thought when the beliefs were stated” and not “calculate a Brier score”; I think this post is obviously closer to the former than the latter.
I agree that people’s other goals make it harder to evaluate what their “true beliefs” are, and that’s one of the reasons I say it’s only 3⁄10 correlation.
Re: correlation, I was implicitly also asking the question “how much does this vary across experts”. Across the general population, maybe coherence is 7⁄10 correlated with expected future impact; across the experts that one would consider deferring to I think it is more like 2⁄10, because most experts seem pretty coherent (within the domains they’re thinking about and trying to influence) and so the differences in impact depend on other factors.
Re: evaluation, it seems way more common to me that there are multiple strong, coherent, conflicting views that all seem compelling (see epistemic learned helplessness), which do not seem to have been easily falsified by history (in sufficiently obvious manner that everyone agrees which one is false).
This too is in large part because we’re looking at experts in particular. I think we’re good at selecting for “enough coherence” before we consider someone an expert (if anything I think we do it too much in the “public intellectual” space), and so evaluating coherence well enough to find differences between experts ends up being pretty hard.
I feel like looking at any EA org’s report estimating its own impact makes it seem like “impact of past policies” is really difficult to evaluate?
Eliezer seems like a particularly easy case, where I agree his impact is probably net positive from getting people to do alignment work earlier, but even so I think there’s a bunch of questions that I’m uncertain about:
How bad is it that some people completely dismiss AI risk because they encountered Eliezer and found it off-putting? (I’ve explicitly heard something along the lines of “that crazy stuff from Yudkowsky” from multiple ML researchers.)
How many people would be working on alignment without Eliezer’s work? (Not obviously hugely fewer, Superintelligence plausibly still gets published, Stuart Russell plausibly still goes around giving talks about value alignment and its importance.)
To what extent did Eliezer’s forceful rhetoric (as opposed to analytic argument) lead people to focus on the wrong problems?
I’ve now written up a more complete theory of deference here. I don’t expect that it directly resolves these disagreements, but hopefully it’s clearer than this thread.
Note that this wouldn’t actually make a big change for AI alignment, since we don’t know how to use more funding. It’d make a big change if we were talking about allocating people, but my general heuristic is that I’m most excited about people acting on strong worldviews of their own, and so I think the role of deference there should be much more limited than when it comes to money. (This all falls out of the theory I linked above.)
Experts are coherent within the bounds of conventional study. When we try to apply that expertise to related topics that are less conventional (e.g. ML researchers on AGI; or even economists on what the most valuable interventions are) coherence drops very sharply. (I’m reminded of an interview where Tyler Cowen says that the most valuable cause area is banning alcohol, based on some personal intuitions.)
The question is how it compares to estimating past correctness, where we face pretty similar problems. But mostly I think we don’t disagree too much on this question—I think epistemic evaluations are gonna be bigger either way, and I’m mostly just advocating for the “think-of-them-as-a-proxy” thing, which you might be doing but very few others are.
Funding isn’t the only resource:
You’d change how you introduce people to alignment (since I’d guess that has a pretty strong causal impact on what worldviews they end up acting on). E.g. if you previously flipped a 10%-weighted coin to decide whether to send them down the Eliezer track or the other track, now you’d flip a 20%-weighted coin, and this straightforwardly leads to different numbers of people working on particular research agendas that the worldviews disagree about. Or if you imagine the community as a whole acting as an agent, you send 20% of the people to MIRI fellowships and the remainder to other fellowships (whereas previously it would be 10%).
(More broadly I think there’s a ton of stuff you do differently in community building, e.g. do you target people who know ML or people who are good at math?)
You’d change what you used political power for. I don’t particularly understand what policies Eliezer would advocate for but they seem different, e.g. I think I’m more keen on making sure particular alignment schemes for building AI systems get used and less keen on stopping everyone from doing stuff besides one secrecy-oriented lab that can become a leader.
Yeah, that’s what I mean.
Responding to other more minor points:
I mean that he predicts that these costly actions will not be taken despite seeming good to him.
I think it’s also important to consider Ben’s audience. If I were Ben I’d be imagining my main audience to be people who give significant deference weight to Eliezer’s actual worldview. If you’re going to write a top-level comment arguing against Ben’s post it seems pretty important to engage with the kind of deference he’s imagining (or argue that no one actually does that kind of deference, or that it’s not worth writing to that audience, etc).
(Of course, I could be wrong about who Ben imagines his audience to be.)
This survey suggests that he was at 96-98% a year ago.
Why do you think it suggests that? There are two MIRI responses in that range, but responses are anonymous, and most MIRI staff didn’t answer the survey.
I should have clarified that I think (or at least I thought so, prior to your question; kind of confused now) Yudkowsky’s answer is probably one of those two MIRI responses. Sorry about that.
I recall you or somebody else at MIRI once wrote something along the lines that most of MIRI researchers don’t actually believe that p(doom) is extremely high, like >90% doom. Then, in the linked post, there is a comment from someone who marked themselves both as a technical safety and strategy researcher and who gave 0.98, 0.96 on your questions. The style/content of the comment struck me as something Yudkowsky would have written.
Cool! I figured your reasoning was probably something along those lines, but I wanted to clarify that the survey is anonymous and hear your reasoning. I personally don’t know who wrote the response you’re talking about, and I’m very uncertain how many researchers at MIRI have 90+% p(doom), since only five MIRI researchers answered the survey (and marked that they’re from MIRI).
Musing out loud: I don’t know of any complete model of deference which doesn’t run into weird issues, like the conclusion that you should never trust yourself. But suppose you have some kind of epistemic parliament where you give your own views some number of votes, and assign the rest of the votes to other people in proportion to how defer-worthy they seem. Then you need to make a bunch of decisions, and your epistemic parliament keeps voting on what will best achieve your (fixed) goals.
If you do naive question-by-question majority voting on each question simultaneously then you can end up with an arbitrarily incoherent policy—i.e. a set of decisions that’s inconsistent with each other. And if you make the decisions in some order, with the constraint that they each have to be consistent with all prior decisions, then the ordering of the decisions can become arbitrarily important.
Instead, you want your parliament to negotiate some more coherent joint policy to follow. And I expect that in this joint policy, each worldview gets its way on the questions that are most important to it, and cedes responsibility on the questions that are least important. So Eliezer’s worldview doesn’t end up reallocating all the biosecurity money, but it does get a share of curriculum time (at least for the most promising potential researchers). But in general how to conduct those negotiations is an unsolved problem (and pretty plausibly unsolvable).
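The failure of naive question-by-question majority voting mentioned above can be illustrated with the classic "discursive dilemma". This is my own minimal sketch, not something from the thread: three hypothetical worldviews each hold internally consistent views on two propositions and their conjunction, yet the per-question majority verdicts are jointly inconsistent.

```python
# Discursive dilemma: each worldview is individually consistent about
# P, Q, and (P and Q), but taking the majority verdict question-by-
# question produces a policy no consistent worldview would endorse.
# The three worldviews below are invented for illustration.

worldviews = [
    {"P": True,  "Q": True,  "P_and_Q": True},   # consistent
    {"P": True,  "Q": False, "P_and_Q": False},  # consistent
    {"P": False, "Q": True,  "P_and_Q": False},  # consistent
]

def majority(question):
    votes = [w[question] for w in worldviews]
    return sum(votes) > len(votes) / 2

verdicts = {q: majority(q) for q in ["P", "Q", "P_and_Q"]}
print(verdicts)  # {'P': True, 'Q': True, 'P_and_Q': False}

# The joint verdict endorses P and endorses Q but rejects (P and Q),
# which is logically incoherent as a basis for action.
assert verdicts["P"] and verdicts["Q"] and not verdicts["P_and_Q"]
```

This is one reason the parliament needs to negotiate over whole policies rather than voting on each question independently.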
I could be wrong, but I’d guess Eliezer’s all-things-considered p(doom) is less extreme than that.
Yeah, I’m gonna ballpark guess he’s around 95%? I think the problem is that he cites numbers like 99.99% when talking about the chance of doom “without miracles”, which in his parlance means assuming that his claims are never overly pessimistic. Which seems like wildly bad epistemic practice. So then it goes down if you account for that, and then maybe it goes down even further if he adjusts for the possibility that other people are more correct than him overall (although I’m not sure that’s a mental move he does at all, or would ever report on if he did).
Even at 95% you get OOMs of difference by my calculations, though significantly fewer OOMs, so this doesn’t seem like the main crux.
Beat me to it & said it better than I could.
My now-obsolete draft comment was going to say:
It seems to me that between about 2004 and 2014, Yudkowsky was the best person in the world to listen to on the subject of AGI and AI risks. That is, deferring to Yudkowsky would have been a better choice than deferring to literally anyone else in the world. Moreover, after about 2014 Yudkowsky would probably have been in the top 10; if you are going to choose 10 people to split your deference between (which I do not recommend, I recommend thinking for oneself), Yudkowsky should be one of those people and had you dropped Yudkowsky from the list in 2014 you would have missed out on some important stuff. Would you agree with this?
On the positive side, I’d be interested to see a top ten list from you of people you think should be deferred to as much or more than Yudkowsky on matters of AGI and AI risks.*
*What do I mean by this? Idk, here’s a partial operationalization: Timelines, takeoff speeds, technical AI alignment, and p(doom).
[ETA: lest people write me off as a Yudkowsky fanboy, I wish to emphasize that I too think people are overindexing on Yudkowsky’s views, I too think there are a bunch of people who defer to him too much, I too think he is often overconfident, wrong about various things, etc.]
[ETA: OK, I guess I think Bostrom probably was actually slightly better than Yudkowsky even on 20-year timespan.]
[ETA: I wish to reemphasize, but more strongly, that Yudkowsky seems pretty overconfident not just now but historically. Anyone deferring to him should keep this in mind; maybe directly update towards his credences but don’t adopt his credences. E.g. think “we’re probably doomed” but not “99% chance of doom”. Also, Yudkowsky doesn’t seem to be listening to others and understanding their positions well. So his criticisms of other views should be listened to but not deferred to, IMO.]
Didn’t you post that comment right here?
Oops! Dunno what happened, I thought it was not yet posted. (I thought I had posted it at first, but then I looked for it and didn’t see it & instead saw the unposted draft, but while I was looking for it I saw Richard’s post… I guess it must have been some sort of issue with having multiple tabs open. I’ll delete the other version.)
I agree, and I’m a bit confused that the top-level post does not violate forum rules in its current form. There is a version of the post – rephrased and reframed – that I think would be perfectly fine even though I would still disagree with it.
And I say that as someone who loved Paul’s response to Eliezer’s list!
Separately, my takeaway from Ben’s 80k interview has been that I think that Eliezer’s take on AI risk is much more truth-tracking than Ben’s. To improve my understanding, I would turn to Paul and ARC’s writings rather than Eliezer and MIRI’s, but Eliezer’s takes are still up there among the most plausible ones in my mind.
I suspect that the motivation for this post comes from a place that I would find epistemically untenable and that bears little resemblance to the sophisticated disagreement between Eliezer and Paul. But I’m worried that a reader may come away with the impression that Ben and Paul fall into one camp and Eliezer into another on AI risk when really Paul agrees with Eliezer on many points when it comes to the importance and urgency of AI safety (see the list of agreements at the top of Paul’s post).
That seems like a considerable overstatement to me. I think it would be bad if the forum rules said an article like this couldn’t be posted.
Maybe, but I find it important to maintain the sort of culture where one can be confidently wrong about something without fear that it’ll cause people to interpret all future arguments only in light of that mistake instead of taking them at face value and evaluating them for their own merit.
The sort of entrepreneurialness that I still feel is somewhat lacking in EA requires committing a lot of time to a speculative idea on the off-chance that it is correct. If it is not, the entrepreneur has wasted a lot of time and usually money. If additionally it has the social cost that they can’t try again because people will dismiss them because of that past failure, it makes it just so much less likely still that anyone will try in the first place.
Of course that’s not the status quo. I just really don’t want EA to move in that direction.
If anything, I think that prohibiting posts like this from being published would have a more detrimental effect on community culture.
Of course, people are welcome to criticise Ben’s post—which some in fact do. That’s a very different category from prohibition.
Yeah, that sounds perfectly plausible to me.
“A bit confused” wasn’t meant to be any sort of rhetorical pretend understatement or something. I really just felt a slight surprise that caused me to check whether the forum rules contain something about ad hom, and found that they don’t. It may well be the right call on balance. I trust the forum team on that.
I really appreciate the time people have taken to engage with this post (and actually hope the attention cost hasn’t been too significant). I decided to write some post-discussion reflections on what I think this post got right and wrong.
The reflections became unreasonably long—and almost certainly should be edited down—but I’m posting them here in a hopefully skim-friendly format. They cover what I see as some mistakes with the post, first, and then cover some views I stand by.
Things I would do differently in a second version of the post:
1. I would either drop the overall claim about how much people should defer to Yudkowsky — or defend it more explicitly
At the start of the post, I highlight the two obvious reasons to give Yudkowsky’s risk estimates a lot of weight: (a) he’s probably thought more about the topic than anyone else and (b) he developed many of the initial AI risk arguments. I acknowledge that many people, justifiably, treat these as important factors when (explicitly or implicitly) deciding how much to defer to Yudkowsky.
Then the post gives some evidence that, at each stage of his career, Yudkowsky has made a dramatic, seemingly overconfident prediction about technological timelines and risks—and at least hasn’t obviously internalised lessons from these apparent mistakes.
The post expresses my view that these two considerations at least counterbalance each other—so that, overall, Yudkowsky’s risk estimates shouldn’t be given more weight than (e.g.) those of other established alignment researchers or the typical person on the OpenPhil worldview investigation team.
But I don’t do a lot in the post to actually explore how we should weigh these factors up. In that sense: I think it’d be fair to regard the post’s central thesis as importantly under-supported by the arguments contained in the post.
I should have either done more to explicitly defend my view or simply framed the post as “some evidence about the reliability of Yudkowsky’s risk estimates.”
2. I would be clearer about how and why I generated these examples
In hindsight, this is a significant oversight on my part. The process by which I generated these examples is definitely relevant for judging how representative they are—and, therefore, how much to update on them. But I don’t say anything about this in the post. My motives (or at least conscious motives) are also part of the story that I only discuss in pretty high-level terms, but seem like they might be relevant for forming judgments.
For context, then, here was the process:
A few years ago, I tried to get a clearer sense of the intellectual history of the AI risk and existential risk communities. For that reason, I read a bunch of old white papers, blog posts, and mailing list discussions.
These gave me the impression that Yudkowsky’s track record (and—to some extent—the track record of the surrounding community) was worse than I’d realised. From reading old material, I basically formed something like this impression: “At each stage of Yudkowsky’s professional life, his work seems to have been guided by some dramatic and confident belief about technological trajectories and risks. The older beliefs have turned out to be wrong. And the ones that haven’t yet resolved at least seem to have been pretty overconfident in hindsight.”
I kept encountering the idea that Yudkowsky has an exceptionally good track record or that he has an unparalleled ability to think well about AI (he’s also expressed this view himself), and I kept thinking, basically, that this seemed wrong. I wrote up some initial notes on this discrepancy at some point, but didn’t do anything with them.
I eventually decided to write something public after the “Death with Dignity” post, since the view it expresses (that we’re all virtually certain to die soon) both seems wrong to me and very damaging if it’s actually widely adopted in the community. I also felt like the “Death with Dignity” post was getting more play than it should, simply because people have a strong tendency to give Yudkowsky’s views weight. I can’t imagine a similar post written by someone else having nearly as large of an impact. Notably, since that post didn’t really have substantial arguments in it (although the later one did), the fact that it had an impact seems like a testament to the power of deference; I think it’d be hard to look at the reaction to that post and argue that it’s only Yudkowsky’s arguments (rather than his public beliefs in-and-of-themselves) that have a major impact on the community.
People are obviously pretty aware of Yudkowsky’s positive contributions, but my impression is that (especially) new community members tended not to be aware of negative aspects of his track record. So I wanted to write a post drawing attention to the negative aspects.
I was initially going to have the piece explicitly express the impression I’d formed, which was something like: “At each stage of Yudkowsky’s professional life, his work has been guided by some dramatic and seemingly overconfident belief about technological trajectories and risks.” The examples in the post were meant to map onto the main ‘animating predictions’ about technology he had at each stage of his career. I picked out the examples that immediately came to mind.
Then I realised I wasn’t at all sure I could defend the claim that these were his main ‘animating predictions’ - the category was obviously extremely vague, and the main examples that came to mind were extremely plausibly a biased sample. I thought there was a good chance that if I reflected more, then I’d also want to include various examples that were more positive.
I didn’t want to spend the time doing a thorough accounting exercise, though, so I decided to drop any claim that the examples were representative and just describe them as “cherry-picked” — and add in lots of caveats emphasising that they’re cherry-picked.
(At least, these were my conscious thought processes and motivations as I remember them. I’m sure other factors played a role!)
3. I’d tweak my discussion of take-off speeds
I’d make it clearer that my main claim is: it would have been unreasonable to assign a very high credence to fast take-offs back in (e.g.) the early- or mid-2000s, since the arguments for fast take-offs had significant gaps. For example, there were lots of possible countervailing arguments for slow take-offs that pro-fast-take-off authors simply hadn’t addressed yet — as evidenced, partly, by the later publication of slow-take-off arguments leading a number of people to become significantly more sympathetic to slow take-offs. (I’m not claiming that there’s currently a consensus against fast-take-off views.)
4. I’d add further caveats to the “coherence arguments” case—or simply leave it out
Rohin’s and Oli’s comments under the post have made me aware that there’s a more positive way to interpret Yudkowsky’s use of coherence arguments. I’m not sure if that interpretation is correct, or if it would actually totally undermine the example, but this is at minimum something I hadn’t reflected on. I think it’s totally possible that further reflection would lead me to simply remove the example.
Positions I stand by:
On the flipside, here’s a set of points I still stand by:
1. If a lot of people in the community believe AI is probably going to kill everyone soon, then (if they’re wrong) this can have really important negative effects
In terms of prioritisation: My prediction is that if you were to ask different funders, career advisors, and people making career decisions (e.g. deciding whether to go into AI policy or bio policy) how much they value having a good estimate of AI risk, they’ll very often answer that they value it a great deal. I do think that over-estimating the level of risk could lead to concretely worse decisions.
In terms of community health: I think that believing you’re probably going to die soon is probably bad for a large portion of people. Reputationally: Being perceived as believing that everyone is probably going to die soon (particularly if this is actually an excessive level of worry) also seems damaging.
I think we should also take seriously the tail-risk that at least one person with doomy views (even if they’re not directly connected to the existential risk community) will take dramatic and badly harmful actions on the basis of their views.
2. Directly and indirectly, deference to Yudkowsky has a significant influence on a lot of people’s views
As above: One piece of evidence for this is that Yudkowsky’s “Death with Dignity” post triggered a big reaction, even though it didn’t contain any significant new arguments. I think his beliefs (above and beyond his arguments) clearly do have an impact.
Another reason to believe deference is a factor: I think it’s both natural and rational for people, particularly people new to an area, to defer to people with more expertise in that area.[1] Yudkowsky is one of the most obvious people to defer to, as one of the two people most responsible for developing and popularising AI risk arguments and as someone who has (likely) spent more time thinking about the subject than anyone else.
Beyond that: A lot of people also clearly have a huge amount of respect for Yudkowsky, sometimes more than they have for any other public intellectual. I think it’s natural (and sensible) for people’s views to be influenced by the views of the people they respect. In general, I think, unless you have tremendous self-control, this will tend to happen sub-consciously even if you don’t consciously choose to defer to the people you respect.
Also, people sometimes just do talk about Yudkowsky’s track record or reputation as a contributing factor to their views.
3. The track records of influential intellectuals (including Yudkowsky) should be publicly discussed.
A person’s track-record provides evidence about how reliable their predictions are. If people are considering how much to defer to some intellectual, then they should want to know what their track record (at least within the relevant domain) looks like.
The main questions that matter are: What has the intellectual gotten wrong and right? Beyond whether they were wrong or right, about a given case, does it also seem like their predictions were justified? If they’ve made certain kinds of mistakes in the past, do we now have reason to think they won’t repeat those kinds of mistakes?
4. Yudkowsky’s track record suggests a substantial bias toward dramatic and overconfident predictions.
One counter—which I definitely think it’s worth reflecting on—is that it might be possible to generate a similarly bias-suggesting list of examples like this for any other public intellectual or member of the existential risk community.
I’ll focus on one specific comment, suggesting that Yudkowsky’s incorrect predictions about nanotechnology are in the same reference class as ‘writing a typically dumb high school essay.’ The counter goes something like this: Yes, it was possible to find this example from Yudkowsky’s past—but that’s not importantly different than being able to turn up anyone else’s dumb high school essay about (e.g.) nuclear power.
Ultimately, I don’t buy the comparison. I think it’s really out-of-distribution for someone in their late teens and early twenties to pro-actively form the view that an emerging technology is likely to kill everyone within a decade, found an organization and devote years of their professional life to address the risk, and talk about how they’re the only person alive who can stop it.
That just seems very different from writing a dumb high school essay. Much more than a standard dumb high school essay, I think this aspect of Yudkowsky’s track record really does suggest a bias toward dramatic and overconfident predictions. This prediction is also really strikingly analogous to the prediction Yudkowsky is making right now—its relevance is clearly higher than the relevance of (e.g.) a random poorly thought-out view in a high school essay.
(Yudkowsky’s early writing and work is also impressive, in certain ways, insofar as it suggests a much higher level of originality of thought and agency than the typical young person has. But the fact that this example is impressive doesn’t undercut, I think, the claim that it’s also highly suggestive of a bias toward highly confident and dramatic predictions.)
5. Being one of the first people to identify, develop, or take seriously some idea doesn’t necessarily mean that your predictions about the idea will be unusually reliable
By analogy:
I don’t think we can assume that the first person to take the covid lab leak theory seriously (when others were dismissive) is currently the most reliable predictor of whether the theory is true.
I don’t think we can assume that the first person to develop the many worlds theory of quantum mechanics (when others were dismissive) would currently be the best person to predict whether the theory is true, if they were still alive.
There are, certainly, reasons to give pioneers in a domain special weight when weighing expert opinion in that domain.[2] But these reasons aren’t absolute.
There are even reasons that point in the opposite direction: we might worry that the pioneer has an attachment to their theory, so will be biased toward believing it is true and as important as possible. We might also worry that the pioneering-ness of their beliefs is evidence that these beliefs front-ran the evidence and arguments (since one way to be early is to simply be excessively confident). We also have less evidence of their open-mindedness than we do for the people who later on moved toward the pioneer’s views — since moving toward the pioneer’s views, when you were initially dismissive, is at least a bit of evidence for open-mindedness and humility.[3]
Overall, I do think we should tend to defer more to pioneers (all else being equal). But this tendency can definitely be overruled by other evidence and considerations.
6. The causal effects that people have had on the world don’t (in themselves) have implications for how much we should defer to them
At least in expectation, so far, Eliezer Yudkowsky has probably had a very positive impact on the world. There is a plausible case to be made that misaligned AI poses a substantial existential risk—and Yudkowsky’s work has probably, on net, massively increased the number of people thinking about it and taking it seriously. He’s also written essays that have exposed huge numbers of people to other important ideas and helped them to think more clearly. It makes sense for people to applaud all of this.
Still, I don’t think his positive causal effect on the world gives people much additional reason to be deferential to him.
Here’s a dumb thought experiment: Suppose that Yudkowsky wrote all of the same things, but never published them. But suppose, also, that a freak magnetic storm ended up implanting all of the same ideas in his would-be-readers’ brains. Would this absence of a causal effect count against deferring to Yudkowsky? I don’t think so. The only thing that ultimately matters, I think, is his track record of beliefs—and the evidence we currently have about how accurate or justified those beliefs were.
I’m not sure anyone disagrees with the above point, but I did notice that there seemed to be a decent amount of discussion in the comments about Yudkowsky’s impact—and I’m not sure this issue will ultimately be relevant.[4]
For example: if I had ten hours to form a view about the viability of some application of nanotechnology, I definitely wouldn’t want to ignore the beliefs of people who have already thought about the question. Trying to learn the relevant chemistry and engineering background wouldn’t be a good use of my time.
One really basic reason is simply that they’ve had more time to think about certain subjects than anyone else.
Here’s a concrete case: Holden Karnofsky eventually moved toward taking AI risks seriously, after publicly being fairly dismissive of it, and then wrote up a document analysing why he was initially dismissive and drawing lessons from the experience. It seems like we could count that as positive evidence about his future judgment.
Even though I’ve just said I’m not sure this question is relevant, I do also want to say a little bit about Yudkowsky’s impact. I personally think he’s probably had a very significant impact. Nonetheless, I also think the impact can be overstated. For example, I think it’s been suggested that the effective altruism community might not be very familiar with concepts like Bayesian reasoning or the importance of overcoming bias if it weren’t for Yudkowsky’s writing. I don’t really find that particular suggestion plausible.
Here’s one data point I can offer from my own life: Through a mixture of college classes and other reading, I’m pretty confident I had already encountered the heuristics and biases literature, Bayes’ theorem, Bayesian epistemology, the ethos of working to overcome bias, arguments for the many worlds interpretation, the expected utility framework, population ethics, and a number of other ‘rationalist-associated’ ideas before I engaged with the effective altruism or rationalist communities. For example, my college had classes in probability theory, Bayesian epistemology, and the philosophy of quantum mechanics, and I’d read at least parts of books like Thinking Fast and Slow, The Signal and the Noise, The Logic of Science, and various books associated with the “skeptic community.” (Admittedly, I think it would have been harder to learn some of these things if I’d gone to college a bit earlier or had a different major. I also probably “got lucky” in various ways with the classes I took and books I picked up.) See also Carl Shulman making a similar point and John Halstead also briefly commenting on the way in which he personally encountered some of the relevant ideas.
I noted some places I agree with your comment here, Ben. (Along with my overall take on the OP.)
Some additional thoughts:
The “death with dignity” post came in the wake of Eliezer writing hundreds of thousands of words about why he thinks alignment is hard in the Late 2021 MIRI Conversations (in addition to the many specific views and arguments about alignment difficulty he’s written up in the preceding 15+ years). So it seems wrong to say that everyone was taking it seriously based on deference alone.
The post also has a lot of content beyond “p(doom) is high”. Indeed, I think the post’s focus (and value-add) is mostly in its discussion of rationalization, premature/excessive conditionalizing, and ethical injunctions, not in the bare assertion that p(doom) is high. Eliezer was already saying pretty similar stuff about p(doom) back in September.
I disagree; I think that, e.g., noting how powerful and widely applicable general intelligence has historically been, and noting a bunch of standard examples of how human cognition is a total shitshow, is sufficient to have a very high probability on hard takeoff.
I think the people who updated a bunch toward hard takeoff based on the recent debate were making a mistake, and should have already had a similarly high p(hard takeoff) going back to the Foom debate, if not earlier.
Insofar as others disagree, I obviously think it’s a good thing for people to publish arguments like “but ML might be very competitive”, and for people to publicly respond to them. But I don’t think “but ML might be very competitive” and related arguments ought to look compelling at a glance (given the original simple arguments for hard takeoff), so I don’t think someone should need to consider the newer discussion in order to arrive at a confident hard-takeoff view.
(Also, insofar as Paul recently argued for X and Eliezer responded with a valid counter-argument for Y, it doesn’t follow that Eliezer had never considered anything like X or Y in initially reaching his confidence. Eliezer’s stated view is that the new Paul arguments seem obviously invalid and didn’t update him at all when he read them. Your criticism would make more sense here if Eliezer had said “Ah, that’s an important objection I hadn’t considered; but now that I’m thinking about it, I can generate totally new arguments that deal with the objections, and these new counter-arguments seem correct to me.”)
At least as important, IMO, is the visible quality of their reasoning and arguments, and their retrodictions.
AGI, moral philosophy, etc. are not topics where we can observe extremely similar causal processes today and test all the key claims and all the key reasoning heuristics with simple experiments. Tossing out ‘argument evaluation’ and ‘how well does this fit what I already know?’ altogether would mean tossing out the majority of our evidence about how much weight to put on people’s views.
I take the opposite view on this comparison. I agree that this is really unusual, but I think the comparison is unfavorable to the high school students, rather than unfavorable to Eliezer. Having unusual views and then not acting on them in any way is way worse than actually acting on your predictions.
I agree that Eliezer acting on his beliefs to this degree suggests he was confident; but in a side-by-side comparison of a high schooler who’s expressed equal confidence in some other unusual view, but takes no unusual actions as a result, the high schooler is the one I update negatively about.
(This also connects up to my view that EAs generally are way too timid/passive in their EA activity, don’t start enough new things, and (when they do start new things) start too many things based on ‘what EA leadership tells them’ rather than based on their own models of the world. The problem crippling EA right now is not that we’re generating and running with too many wildly different, weird, controversial moonshot ideas. The problem is that we’re mostly just passively sitting around, over-investing in relatively low-impact meta-level interventions, and/or hoping that the most mainstream already-established ideas will somehow suffice.)
I just wanted to state agreement that it seems a large number of people largely misread Death with Dignity, at least according to what seems to me the most plausible intended message: mainly about the ethical injunctions (which are very important as a finitely-rational and prone-to-rationalisation being), as Yudkowsky has written of in the past.
The additional detail of ‘and by the way this is a bad situation and we are doing badly’ is basically modal Yudkowsky schtick and I’m somewhat surprised it updated anyone’s beliefs (about Yudkowsky’s beliefs, and therefore their all-things-considered-including-deference beliefs).
I think if he had been a little more audience-aware he might have written it differently. Then again maybe not, if the net effect is more attention and investment in AI safety—and more recent posts and comments suggest he’s more willing than before to use certain persuasive techniques to spur action (which seems potentially misguided to me, though understandable).
I think “deference alone” is a stronger claim than the one we should worry about. People might read the arguments on either side (or disproportionately Eliezer’s arguments), but then defer largely to Eliezer’s weighing of arguments because of his status/position, confidence, references to having complicated internal models (that he often doesn’t explain or link explanations to), or emotive writing style.
What share of people with views similar to Eliezer’s do you expect to have read these conversations? They’re very long, not well organized, and have no summaries/takeaways. The format seems pretty bad if you value your time.
I think the AGI Ruin: A List of Lethalities post was formatted pretty accessibly, but that came after death with dignity.
If the new Paul arguments seem obviously invalid, then Eliezer should be able to explain why in such a way that convinces Paul. Has this generally been the case?
I appreciate this update!
I am confused about you bringing in the claim of “at each stage of his career”, given that the only two examples you cited that seemed to provide much evidence here were from the same (and very early) stage of his career. Of course, you might have other points of evidence that point in this direction, but I did want to provide some additional pushback on the “at each stage of his career” point, which I think you didn’t really provide evidence for.
I do think finding evidence for each stage of his career would of course be time-consuming, and I understand that you didn’t really want to go through all of that, but it seemed good to point out explicitly.
FWIW, indeed in my teens I basically did dedicate a good chunk of my time and effort towards privacy efforts out of a concern for US and UK-based surveillance-state concerns. I was in high-school, so making it my full-time efforts was a bit hard, though I did help found a hackerspace in my hometown that had a lot of privacy concerns baked into the culture, and I did write a good number of essays on this. I think the key difference between me and Eliezer here is more the fact that Eliezer was home-schooled and had experience doing things on his own, and not some kind of other fact about his relationship to the ideas being very different.
It’s plausible you should update similarly on me, which I think isn’t totally insane (I do think I might have, as Luke put it, the “taking ideas seriously gene”, which I would also associate with taking other ideas to their extremes, like religious beliefs).
I really appreciated this update. Mostly it checks out to me, but I wanted to push back on this:
It seems to me that a good part of the beliefs I care about assessing are the beliefs about what is important. When someone has a track record of doing things with big positive impact, that’s some real evidence that they have truth-tracking beliefs about what’s important. In the hypothetical where Yudkowsky never published his work, I don’t get the update that he thought these were important things to publish, so he doesn’t get credit for being right about that.
There’s also (imperfect) information in “lots of smart people thought about EY’s opinions and agree with him” that you don’t get from the freak magnetic storm scenario.
Thanks for writing this update. I think my number one takeaway here is something like: when writing a piece with the aim of changing community dynamics, it’s important to be very clear about motivations and context. E.g. I think a version of the piece which said “I think people are overreacting to Death with Dignity, here are my specific models of where Yudkowsky tends to be overconfident, here are the reasons why I think people aren’t taking those into account as much as they should” would have been much more useful and much less controversial than the current piece, which (as I interpret it) essentially pushes a general “take Yudkowsky less seriously” meme (and is thereby intrinsically political/statusy).
I’m a bit confused about a specific small part:
I imagine that for many people, including me (including you?), once we work on [what we believe to be] preventing the world from ending, we would only move to another job if it was also preventing the world from ending, probably in an even more important way.
In other words, I think “working at a 2nd x-risk job and believing it is very important” is mainly predicted by “working at a 1st x-risk job and believing it is very important”, much more than by personality traits.
This is almost testable, given we have lots of people working on x-risk today and believing it is very important. But maybe you can easily put your finger on what I’m missing?
For what it’s worth, I found this post and the ensuing comments very illuminating. As a person relatively new to both EA and the arguments about AI risk, I was a little bit confused as to why there was not much pushback on the very high confidence beliefs about AI doom within the next 10 years. My assumption had been that there was a lot of deference to EY because of reverence and fealty stemming from his role in getting the AI alignment field started, not to mention the other ways he has shaped people’s thinking. I also assumed that his track record on predictions was just ambiguous enough for people not to question his accuracy. Given that I don’t give much credence to the idea that prophets/oracles exist, I thought it unlikely that the high confidence of his predictions was warranted, on the grounds that there doesn’t seem to be much evidence supporting the accuracy of long range forecasts. I did not think that there were such glaring mispredictions made by EY in the past, so thank you for highlighting them.
I feel like people are missing one fairly important consideration when discussing how much to defer to Yudkowsky, etc. Namely, I’ve heard multiple times that Nate Soares, the executive director of MIRI, has models of AI risk that are very similar to Yudkowsky’s, and their p(doom) are also roughly the same. My limited impression is that Soares is no less smart or otherwise capable than Yudkowsky. So, when having this kind of discussion, focusing on Yudkowsky’s track record or whatever, I think it’s good to remember that there’s another very smart person, who entered AI safety much later than Yudkowsky, and who holds very similar inside views on AI risk.
This isn’t much independent evidence I think: seems unlikely that you could become director of MIRI unless you agreed. (I know that there’s a lot of internal disagreement at other levels.)
My point has little to do with him being the director of MIRI per se.
I suppose I could be wrong about this, but my impression is that Nate Soares is among the top 10 most talented/insightful people with elaborate inside view and years of research experience in AI alignment. He also seems to agree with Yudkowsky on a whole lot of issues and predicts about the same p(doom) for about the same reasons. And I feel that many people don’t give enough thought to the fact that while e.g. Paul Christiano has interacted a lot with Yudkowsky and disagreed with him on many key issues (while agreeing on many others), there’s also Nate Soares, who broadly agrees with Yudkowsky’s models that predict very high p(doom).
Another, more minor point: if someone is bringing up Yudkowsky’s track record in the context of his extreme views on AI risk, it seems helpful to talk about Soares’ track record as well.
I think this maybe argues against a point not made in the OP. Garfinkel isn’t saying “disregard Yudkowsky’s views”—rather he’s saying “don’t give them extra weight just because Yudkowsky’s the one saying them”.
For example, from his reply to Richard Ngo:
So at least from Garfinkel’s perspective, Yudkowsky and Soares do count as data points, they’re just equal in weight to other relevant data points.
(I’m not expressing any of my own, mostly unformed, views here)
Ben has said this about Eliezer, but not about Nate, AFAIK.
‘Here’s one data point I can offer from my own life: Through a mixture of college classes and other reading, I’m pretty confident I had already encountered the heuristics and biases literature, Bayes’ theorem, Bayesian epistemology, the ethos of working to overcome bias, arguments for the many worlds interpretation, the expected utility framework, population ethics, and a number of other ‘rationalist-associated’ ideas before I engaged with the effective altruism or rationalist communities.’
I think some of this is just a result of being a community founded partly by analytic philosophers. (though as a philosopher I would say that!).
I think it’s normal to encounter some of these ideas in undergrad philosophy programs. At my undergrad back in 2005-09 there was a whole upper-level undergraduate course in decision theory. I don’t think that’s true everywhere all the time, but I’d be surprised if it was wildly unusual. I can’t remember if we covered population ethics in any class, but I do remember discovering Parfit on the Repugnant Conclusion in 2nd-year of undergrad because one of my ethics lecturers said Reasons and Persons was a super-important book.

In terms of the Oxford phil scene where the term “effective altruism” was born, the main titled professorship in ethics at that time was held by John Broome, a utilitarianism-sympathetic former economist, who had written famous stuff on expected utility theory. I can’t remember if he was the PhD supervisor of anyone important to the founding of EA, but I’d be astounded if some of the phil. people involved in that had not been reading his stuff and talking to him about it. Most of the phil. physics people at Oxford were gung-ho for many worlds; it’s not a fringe view in philosophy of physics as far as I know. (Though I think Oxford was kind of a centre for it and there was more dissent elsewhere.)

As far as I can tell, Bayesian epistemology in at least some senses of that term is a fairly well-known approach in philosophy of science. Philosophers specializing in epistemology might more often ignore it, but they know it’s there. And not all of them ignore it! I’m not an epistemologist, but my doctoral supervisor was, and it’s not unusual for his work to refer to Bayesian ideas in modelling stuff about how to evaluate evidence. (I.e. in, uhm, defending the fine-tuning argument for the existence of God, which might not be the best use, but still!: https://www.yoaavisaacs.com/uploads/6/9/2/0/69204575/ms_for_fine-tuning_fine-tuning.pdf). (John was my supervisor, not Yoav.)
A high interest in bias stuff might genuinely be more an Eliezer/LessWrong legacy though.
Indeed, Broome co-supervised the doctoral theses of both Toby Ord and Will MacAskill. And Broome was, in fact, the person who advised Will to get in touch with Toby, before the two had met.
Speaking for myself, I was interested in a lot of the same things in the LW cluster (Bayes, approaches to uncertainty, human biases, utilitarianism, philosophy, avoiding the news) before I came across LessWrong or EA. The feeling is much more like “I found people who can describe these ideas well” than “oh these are interesting and novel ideas to me.” (I had the same realization when I learned about utilitarianism...much more of a feeling that “this is the articulation of clearly correct ideas, believing otherwise seems dumb”).
That said, some of the ideas on LW that seemed more original to me (AI risk, logical decision theory stuff, heroic responsibility in an inadequate world), do seem both substantively true and extremely important, and it took me a lot of time to be convinced of this.
(There are also other ideas that I’m less sure about, like cryonics and MW).
Veering entirely off-topic here, but how does the many worlds hypothesis tie in with all the rest of the rationality/EA stuff?
[replying only to you with no context]
EY pointed out the many worlds hypothesis as a thing that even modern science, specifically physics (which is considered a very well functioning science, it’s not like social psychology), is missing.
And he used this as an example to get people to stop trusting authority, including modern science, which many people around him seem to trust.
I think this is a reasonable reference.
Can’t say any of that makes sense to me. I have the feeling there’s some context I’m totally missing (or he’s just wrong about it). I may ask you about this in person at some point :)
Edit: I think this came off more negatively than I intended it to, particularly about Yudkowsky’s understanding of physics. The main point I was trying to make is that Yudkowsky was overconfident, not that his underlying position was wrong. See the replies for more clarification.
I think there’s another relevant (and negative) data point when discussing Yudkowsky’s track record: his argument and belief that the Many-Worlds Interpretation of quantum mechanics is the only viable interpretation of quantum mechanics, and anyone who doesn’t agree is essentially a moron. Here’s one 2008 link from the Sequences where he expresses this position[1]; there are probably many other places where he’s said similar things. (To be clear, I don’t know if he still holds this belief, and if he doesn’t anymore, when and why he updated away from it.)
Many Worlds is definitely a viable and even leading interpretation, and may well be correct. But Yudkowsky’s confidence in Many Worlds, as well as his conviction that people who disagree with him are making elementary mistakes, is more than a little disproportionate, and may come partly from a lack of knowledge and expertise.
The above is a paraphrase of Scott Aaronson, a credible authority on quantum mechanics who is sympathetic to both Yudkowsky and Many Worlds (bold added):
While this isn’t directly related to AI risk, I think it’s relevant to Yudkowsky’s track record as a public intellectual.
He expresses this in the last six paragraphs of the post. I’m excerpting some of it (bold added, italics were present in the original):
OTOH, I am (or I guess was?) a professional physicist, and when I read Rationality A-Z, I found that Yudkowsky was always reaching exactly the same conclusions as me whenever he talked about physics, including areas where (IMO) the physics literature itself is a mess—not only interpretations of QM, but also how to think about entropy & the 2nd law of thermodynamics, and, umm, I thought there was a third thing too but I forget.
That increased my respect for him quite a bit.
And who the heck am I? Granted, I can’t out-credential Scott Aaronson in QM. But FWIW, hmm let’s see, I had the highest physics GPA in my Harvard undergrad class and got the highest preliminary-exam score in my UC Berkeley physics grad school class, and I’ve played a major role in designing I think 5 different atomic interferometers (including an atomic clock) for various different applications, and in particular I was always in charge of all the QM calculations related to estimating their performance, and also I once did a semester-long (unpublished) research project on quantum computing with superconducting qubits, and also I have made lots of neat wikipedia QM diagrams and explanations including a pedagogical introduction to density matrices and mixed states.
I don’t recall feeling strongly that literally every word Yudkowsky wrote about physics was correct, more like “he basically figured out the right idea, despite not being a physicist, even in areas where physicists who are devoting their career to that particular topic are all over the place”. In particular, I don’t remember exactly what Yudkowsky wrote about the no-communication theorem. But I for one absolutely understand mixed states, and that doesn’t prevent me from being a pro-MWI extremist like Yudkowsky.
I agree that: Yudkowsky has an impressive understanding of physics for a layman, in some situations his understanding is on par with or exceeds some experts, and he has written explanations of technical topics that even some experts like and find impressive. This includes not just you, but also e.g. Scott Aaronson, who praised his series on QM in the same answer I excerpted above, calling it entertaining, enjoyable, and getting the technical stuff mostly right. He also praised it for its conceptual goals. I don’t believe this is faint praise, especially given stereotypes of amateurs writing about physics. This is a positive part of Yudkowsky’s track record. I think my comment sounds more negative about Yudkowsky’s QM sequence than it deserves, so thanks for pushing back on that.
I’m not sure what you mean when you call yourself a pro-MWI extremist, but in any case AFAIK there are physicists, including one or more prominent ones, who think MWI is really the only explanation that makes sense, although there are obviously degrees in how fervently one can hold this position, and Yudkowsky seems at the extreme end of the scale in some of his writings. And he is far from the only one who thinks Copenhagen is ridiculous. These two parts of Yudkowsky’s position on MWI are not without parallel among professional physicists, and the point about Copenhagen being ridiculous is probably a point in his favor from most views (e.g. Nobel laureate Murray Gell-Mann said that Niels Bohr brainwashed people into Copenhagen), let alone this community. Perhaps I should have clarified this in my comment, although I did say that MWI is a leading interpretation and may well be correct.
The negative aspects I said in my comment were:
Yudkowsky’s confidence in MWI is disproportionate
Yudkowsky’s conviction that people who disagree with him are making elementary mistakes is disproportionate
These may come partly from a lack of knowledge or expertise
Maybe (3) is a little unfair, or sounds harsher than I meant it. It’s a bit unclear to me how seriously to take Aaronson’s quote. It seems like plenty of physicists have looked through the sequences to find glaring flaws, and basically found none (physics stackexchange). This is a nontrivial achievement in context. At the same time I expect most of the scrutiny has been to a relatively shallow level, partly because Yudkowsky is a polarizing writer. Aaronson is probably one of fairly few people who have deep technical expertise and have read the sequences with both enjoyment and a critical eye. Aaronson suggested a specific, technical flaw that may be partly responsible for Yudkowsky holding an extreme position with overconfidence and misunderstanding what people who disagree with him think. Probably this is a flaw Yudkowsky would not have made if he had worked with a professional physicist or something. But maybe Aaronson was just casually speculating and maybe this doesn’t matter too much. I don’t know. Possibly you are right to push back on the mixed states explanation.
I think (1) and (2) are well worth considering though. The argument here is not that his position is necessarily wrong or impossible, but that it is overconfident. I am not courageous enough to argue for this position to a physicist who holds some kind of extreme pro-MWI view, but I think this is a reasonable view and there’s a good chance (1) and (2) are correct. It also fits in Ben’s point 4 in the comment above: “Yudkowsky’s track record suggests a substantial bias toward dramatic and overconfident predictions.”
Hmm, I’m a bit confused where you’re coming from.
Suppose that the majority of eminent mathematicians believe 5+5=10, but a significant minority believes 5+5=11. Also, out of the people in the 5+5=10 camp, some say “5+5=10 and anyone who says otherwise is just totally wrong”, whereas other people said “I happen to believe that the balance of evidence is that 5+5=10, but my esteemed colleagues are reasonable people and have come to a different conclusion, so we 5+5=10 advocates should approach the issue with appropriate humility, not overconfidence.”
In this case, the fact of the matter is that 5+5=10. So in terms of who gets the most credit added to their track-record, the ranking is:
1st place: The ones who say “5+5=10 and anyone who says otherwise is just totally wrong”,
2nd place: The ones who say “I think 5+5=10, but one should be humble, not overconfident”,
3rd place: The ones who say “I think 5+5=11, but one should be humble, not overconfident”,
Last place: The ones who say “5+5=11 and anyone who says otherwise is just totally wrong.”
Agree so far?
(See also: Bayes’s theorem, Brier score, etc.)
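The ranking above is exactly what proper scoring rules formalize. Here is a minimal sketch (not from the original comment; the four predictors and their probabilities are hypothetical illustrations) showing how the Brier score and the log score order the four mathematicians when the claim turns out to be true:

```python
# A minimal sketch of proper scoring rules: confidence is rewarded when the
# claim turns out true and penalized when it turns out false. The predictors
# and probabilities below are hypothetical, chosen to mirror the 5+5 example.
import math

def brier_score(p, outcome):
    """Squared error between the stated probability and the 0/1 outcome (lower is better)."""
    return (p - outcome) ** 2

def log_score(p, outcome):
    """Log of the probability assigned to what actually happened (higher is better)."""
    return math.log(p if outcome == 1 else 1 - p)

# Four hypothetical predictors of the claim "5+5=10"; the claim is true, so outcome = 1.
predictors = {
    "confident and right": 0.99,
    "humble and right":    0.70,
    "humble and wrong":    0.30,
    "confident and wrong": 0.01,
}

outcome = 1
for name, p in predictors.items():
    print(f"{name:20s}  Brier={brier_score(p, outcome):.4f}  log={log_score(p, outcome):.3f}")
```

Under both rules, the confident-and-right predictor scores best and the confident-and-wrong predictor scores worst, with the humble predictors in between — matching the 1st-through-last ranking above.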
Back to the issue here. Yudkowsky is claiming “MWI, and anyone who says otherwise is just totally wrong”. (And I agree—that’s what I meant when I called myself a pro-MWI extremist.)
IF the fact of the matter is that careful thought shows MWI to be unambiguously correct, then Yudkowsky (and I) get more credit for being more confident. Basically, he’s going all in and betting his reputation on MWI being right, and (in this scenario) he won the bet.
Conversely, IF the fact of the matter is that careful thought shows MWI to be not unambiguously correct, then Eliezer loses the maximum number of points. He staked his reputation on MWI being right, and (in this scenario) he lost the bet.
So that’s my model, and in my model “overconfidence” per se is not really a thing in this context. Instead we first have to take a stand on the object-level controversy. I happen to agree with Eliezer that careful thought shows MWI to be unambiguously correct, and given that, the more extreme his confidence in this (IMO correct) claim, the more credit he deserves.
I’m trying to make sense of why you’re bringing up “overconfidence” here. The only thing I can think of is that you think that maybe there is simply not enough information to figure out whether MWI is right or wrong (not even for an ideal reasoner with a brain the size of Jupiter and a billion years to ponder the topic), and therefore saying “MWI is unambiguously correct” is “overconfident”? If that’s what you’re thinking, then my reply is: if “not enough information” were the actual fact of the matter about MWI, then we should criticize Yudkowsky first and foremost for being wrong, not for being overconfident.
As for your point (2), I forget what mistakes Yudkowsky claimed that anti-MWI-advocates are making, and in particular whether he thought those mistakes were “elementary”. I am open-minded to the possibility that Yudkowsky was straw-manning the MWI critics, and that they are wrong for more interesting and subtle reasons than he gives them credit for, and in particular that he wouldn’t pass an anti-MWI ITT. (For my part, I’ve tried harder, see e.g. here.) But that’s a different topic. FWIW I don’t think of Yudkowsky as having a strong ability to explain people’s wrong opinions in a sympathetic and ITT-passing way, or if he does have that ability, then I find that he chooses not to exercise it too much in his writings. :-P
‘The more probability someone assigns to a claim, the more credit they get when the claim turns out to be true’ is true as a matter of Bayesian math. And I agree with you that MWI is true, and that we have enough evidence to say it’s true with very high confidence, if by ‘MWI’ we just mean a conjunction like “Objective collapse is false.” and “Quantum non-realism is false / the entire complex amplitude is in some important sense real”.
(I think Eliezer had a conjunction like this in mind when he talked about ‘MWI’ in the Sequences; he wasn’t claiming that decoherence explains the Born rule, and he certainly wasn’t claiming that we need to reify ‘worlds’ as a fundamental thing. I think a better term for MWI might be the ‘Much World Interpretation’, since the basic point is about how much stuff there is, not about a division of that stuff into discrete ‘worlds’.)
That said, I have no objection in principle to someone saying ‘Eliezer was right about MWI (and gets more points insofar as he was correct), but I also dock him more points than he gained because I think he was massively overconfident’.
E.g., imagine someone who assigns probability 1 (or probability .999999999) to a coin flip coming up heads. If the coin then comes up heads, then I’m going to either assume they were trolling me, or I’m going to infer that they’re very bad at reasoning. Even if they somehow rigged the coin, .999999999 is just too extreme a probability to be justified here.
By the same logic, if Eliezer had said that MWI is true with probability 1, or if he’d put too many ‘9s’ at the end of his .99… probability assignment, then I’d probably dock him more points than he gained for being object-level-correct. (Or I’d at least assume he has a terrible understanding of how Bayesian probability works. Someone could indeed be very miscalibrated and bad at talking in probabilistic terms, and yet be very knowledgeable and correct on object-level questions like MWI.)
I’m not sure exactly how many 9s is too many in the case of MWI, but it’s obviously possible to have too many 9s here. E.g., a hundred 9s would be too many! So I think this objection can make sense; I just don’t think Eliezer is in fact overconfident about MWI.
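The asymmetry behind the “too many 9s” point can be made quantitative. Under a log scoring rule, each extra nine gains almost nothing if the claim is true, while the penalty if it is false grows without bound (a hundred 9s would risk a catastrophic score). A minimal sketch with illustrative numbers:

```python
# Sketch: why piling on 9s is a bad bet under a log scoring rule.
# The upside of extra nines (if right) is tiny; the downside (if wrong)
# grows without bound as p approaches 1. Numbers are illustrative.
import math

for p in (0.9, 0.99, 0.999999999):
    gain_if_right = math.log(p)      # approaches 0 from below
    loss_if_wrong = math.log(1 - p)  # diverges to -infinity as p -> 1
    print(f"p={p}: score if right = {gain_if_right:.6f}, "
          f"score if wrong = {loss_if_wrong:.1f}")
```

Moving from 0.99 to 0.999999999 improves the if-right score by about 0.01, while worsening the if-wrong score by roughly 16, which is why extreme probabilities demand correspondingly extreme evidence.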
Fair enough, thanks.
Here’s my point: There is a rational limit to the amount of confidence one can have in MWI (or any belief). I don’t know where exactly this limit is for MWI-extremism but Yudkowsky clearly exceeded it sometimes. To use made up numbers, suppose:
MWI is objectively correct
Eliezer says P(MWI is correct) = 0.9999999
But rationally one can only reach P(MWI) = 0.999
Because there are remaining uncertainties that cannot be eliminated through superior thinking and careful consideration, such as lack of experimental evidence, the possibility of QM getting overturned, the possibility of a new and better interpretation in the future, and unknown unknowns.
These factors add up to at least P(Not MWI) = 0.001.
Then even though Eliezer is correct about MWI being correct, he is still significantly overconfident in his belief about it.
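The way such residual uncertainties compound into a confidence ceiling can be sketched numerically. The per-factor probabilities below are my own illustrative assumptions (the comment only stipulates that they total at least 0.001), and the factors are treated as roughly independent:

```python
# Sketch: how small, ineliminable doubt sources cap attainable confidence.
# Per-factor probabilities are illustrative assumptions, not the
# commenter's; they are treated as roughly independent.
doubts = {
    "QM itself gets overturned":       3e-4,
    "a better interpretation appears": 3e-4,
    "unknown unknowns":                4e-4,
}

p_all_fine = 1.0
for p in doubts.values():
    p_all_fine *= (1 - p)  # probability that no doubt source bites

p_not_mwi = 1 - p_all_fine  # probability at least one bites
print(f"P(not MWI) is at least about {p_not_mwi:.4f}, "
      f"so P(MWI) is capped near {p_all_fine:.4f}")
```

With these numbers the residual doubt comes out just under 0.001, so a rational ceiling of P(MWI) = 0.999 follows, and any assignment like 0.9999999 would overshoot it.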
Consider Paul’s example of Eliezer saying MWI is comparable to heliocentrism:
I agree with Paul here. Heliocentrism is vastly more likely than any particular interpretation of quantum mechanics, and Eliezer was wrong to have made this comparison.
This may sound like I’m nitpicking, but I think it fits into a pattern of Eliezer making dramatic and overconfident pronouncements, and it’s relevant information for people to consider e.g. when evaluating Eliezer’s belief that p(doom) = ~1 and the AI safety situation is so hopeless that the only thing left is to die with slightly more dignity.
Of course, it’s far from the only relevant data point.
Regarding (2), I think we’re on the same page haha.
Could someone point to the actual quotes where Eliezer compares heliocentrism to MWI? I don’t generally assume that when people are ‘comparing’ two very-high-probability things, they’re saying they have the same probability. Among other things, I’d want confirmation that ‘Eliezer and Paul assign roughly the same probability to MWI, but they have different probability thresholds for comparing things to heliocentrism’ is false.
E.g., if I compare Flat Earther beliefs, beliefs in psychic powers, belief ‘AGI was secretly invented in the year 2000’, geocentrism, homeopathy, and theism to each other, it doesn’t follow that I’d assign the same probabilities to all of those six claims, or even probabilities that are within six orders of magnitude of each other.
In some contexts it might indeed Griceanly imply that all six of those things pass my threshold for ‘unlikely enough that I’m happy to call them all laughably silly views’, but different people have their threshold for that kind of thing in different places.
Gotcha, thanks. I guess we have an object-level disagreement: I think that careful thought reveals MWI to be unambiguously correct, with enough 9’s as to justify Eliezer’s tone. And you don’t. ¯\_(ツ)_/¯
(Of course, this is bound to be a judgment call; e.g. Eliezer didn’t state how many 9’s of confidence he has. It’s not like there’s a universal convention for how many 9’s are enough 9’s to state something as a fact without hedging, or how many 9’s are enough 9’s to mock the people who disagree with you.)
Yes, agreed.
Let me lay out my thinking in more detail. I mean this to explain my views in more detail, not as an attempt to persuade.
Paul’s account of Aaronson’s view says that Eliezer shouldn’t be as confident in MWI as he is, which in words sounds exactly like my point, and similar to Aaronson’s stack exchange answer. But it still leaves open the question of how overconfident he was, and what, if anything, should be taken away from this. It’s possible that there’s a version of my point which is true but is also uninteresting or trivial (who cares if Yudkowsky was 10% too confident about MWI 15 years ago?).
And it’s worth reiterating that a lot of people give Eliezer credit for his writing on QM, including for being forceful in his views. I have no desire to argue against this. I had hoped to sidestep discussing this entirely since I consider it to be a separate point, but perhaps this was unfair and led to miscommunication. If someone wants to write a detailed comment/post explaining why Yudkowsky deserves a lot of credit for his QM writing, including credit for how forceful he was at times, I would be happy to read it and would likely upvote/strong upvote it depending on quality.
However, here my intention was to focus on the overconfidence aspect.
I’ll explain what I see as the epistemic mistakes Eliezer likely made to end up in an overconfident state. Why do I think Eliezer was overconfident on MWI?
(Some of the following may be wrong.)
He didn’t understand non-MWI-extremist views, which should have rationally limited his confidence
I don’t have sources for this, but I think something like this is true.
This was an avoidable mistake
Worth noting that Eliezer has updated towards the competence of elites in science since some of his early writing, according to Rob’s comment elsewhere in this thread.
It’s possible that his technical understanding was uneven. This should also have limited his confidence.
Aaronson praised him for “actually get most of the technical stuff right”, which of course implies that not everything technical was correct.
He also suggested a specific, technical flaw in Yudkowsky’s understanding.
One big problem with having extreme conclusions based on uneven technical understanding is that you don’t know what you don’t know. And in fact Aaronson suggests a mistake Yudkowsky seems unaware of as a reason why Yudkowsky’s central argument is overstated/why Yudkowsky is overconfident about MWI.
However, it’s unclear how true/important a point this really is
At least 4 points limit confidence in P(MWI) to some degree:
Lack of experimental evidence
The possibility of QM getting overturned
The possibility of a new and better interpretation in the future
Unknown unknowns
I believe most or all of these are valid, commonly brought up points that together limit how confident anyone can be in P(MWI). Reasonable people may disagree with their weighting of course.
I am skeptical that Eliezer correctly accounted for these factors
Note that these are all points about the epistemic position Eliezer was in, not about the correctness of MWI. The first two are particular to him, and the last one applies to everyone.
Now, Rob points out that maybe the heliocentrism example is lacking context in some way (I find it a very compelling example of a super overconfident mistake if it’s not). Personally I think there are at least a couple[1] [2] of places in the sequences where Yudkowsky clearly says something that I think indicates ridiculous overconfidence tied to epistemic mistakes, but to be honest I’m not excited to argue about whether some of his language 15 years ago was or wasn’t overzealous.
The reason I brought this up despite it being a pretty minor point is because I think it’s part of a general pattern of Eliezer being overconfident in his views and overstating them. I am curious how much people actually disagree with this.
Of course, whether Eliezer has a tendency to be overconfident and overstate his views is only one small data point among very many others in evaluating p(doom), the value of listening to Eliezer’s views, etc.
“Many-worlds is an obvious fact, if you have all your marbles lined up correctly (understand very basic quantum physics, know the formal probability theory of Occam’s Razor, understand Special Relativity, etc.)”
“The only question now is how long it will take for the people of this world to update.” Both quotes from https://www.lesswrong.com/s/Kqs6GR7F5xziuSyGZ/p/S8ysHqeRGuySPttrS
For what it’s worth, consider the claim “The Judeo-Christian God, the one who listens to prayers and so on, doesn’t exist.” I have such high confidence in this claim that I would absolutely state it as a fact without hedging, and psychoanalyze people for how they came to disagree with me. Yet there’s a massive theology literature arguing to the contrary of that claim, including by some very smart and thoughtful people, and I’ve read essentially none of this theology literature, and if you asked me to do an anti-atheism ITT I would flunk it catastrophically.
I’m not sure what lesson you’ll take from that; for all I know you yourself are very religious, and this anecdote will convince you that I have terrible judgment. But if you happen to be on the same page as me, then maybe this would be an illustration of the fact that (I claim) one can rationally and correctly arrive at extremely-confident beliefs without it needing to pass through a deep understanding and engagement with the perspectives of the people who disagree with you.
I agree that this isn’t too important a conversation, it’s just kinda interesting. :)
I’m not sure either of the quotes you cited by Eliezer require or suggest ridiculous overconfidence.
If I’ve seen some photos of a tiger in town, and I know a bunch of people in town who got eaten by an animal, and we’ve all seen some apparent tiger-prints near where people got eaten, I may well say “it’s obvious there is a tiger in town eating people.” If people used to think it was a bear, but that belief was formed based on priors when we didn’t yet have any hard evidence about the tiger, I may be frustrated with people who haven’t yet updated. I may say “The only question is how quickly people’s views shift from bear to tiger. Those who haven’t already shifted seem like they are systematically slow on the draw and we should learn from their mistakes.” I don’t think any of those statements imply I think there’s a 99.9% chance that it’s a tiger. It’s more a statement rejecting the reasons why people think there is a bear, and disagreeing with those reasons, and expecting their views to predictably change over time. But I could say all that while still acknowledging some chance that the tiger is a hoax, that there is a new species of animal that’s kind of like a tiger, that the animal we saw in photos is different from the one that’s eating people, or whatever else. The exact smallness of the probability of “actually it wasn’t the tiger after all” is not central to my claim that it’s obvious or that people will come around.
I don’t think it’s central to this point, but I think 99% is a defensible estimate for many-worlds. I would probably go somewhat lower but certainly wouldn’t run victory laps about that or treat it as damning of someone’s character. The above is mostly a bad analogy explaining why I think it’s pretty reasonable to say things like Eliezer did even if your all-things-considered confidence was 99% or even lower.
To get a sense for what Eliezer finds frustrating and intends to critique, you can read If many-worlds had come first (which I find quite obnoxious). I think to the extent that he’s wrong it’s generally by mischaracterizing the alternative position and being obnoxious about it (e.g. misunderstanding the extent to which collapse is proposed as ontologically fundamental rather than an expression of agnosticism or a framework for talking about experiments, and by slightly misunderstanding what “ontologically fundamental collapse” would actually mean). I don’t think it has much to do with overconfidence directly, or speaks to the quality of Eliezer’s reasoning about the physical world, though I think it is a bad recurring theme in Eliezer’s reasoning about and relationships with other humans. And in fairness I do think there are a lot of people who probably deserve Eliezer’s frustration on this point (e.g. who talk about how collapse is an important and poorly-understood phenomenon rather than most likely just being the most boring thing) though I mostly haven’t talked with them and I think they are systematically more mediocre physicists.
“Maybe (3) is a little unfair, or sounds harsher than I meant it. It’s a bit unclear to me how seriously to take Aaronson’s quote. It seems like plenty of physicists have looked through the sequences to find glaring flaws, and basically found none (physics stackexchange).”
Here’s a couple: he conflates Copenhagen and Objective collapse throughout.
He fails to distinguish Everettian and Decoherence based MWI.
This doesn’t feel like a track record claim to me. Nothing has changed since Eliezer wrote that; it reads as reasonably now as it did then; and we have nothing objective against which to evaluate it.
I broadly agree with Eliezer that (i) collapse seems unlikely, (ii) if the world is governed by QM as we understand it, the whole state is probably as “real” as we are, (iii) there seems to be nothing to favor the alternative interpretations other than those that make fewer claims and are therefore more robust to unknown-unknowns. So if anything I’d be inclined to give him a bit of credit on this one, given that it seems to have held up fine for readers who know much more about quantum mechanics than he did when writing the sequence.
The main way the sequence felt misleading was by moderately overstating how contrarian this take was. For example, near the end of my PhD I was talking with Scott Aaronson and my advisor Umesh Vazirani, who I considered not-very-sympathetic to many worlds. When asked why, my recollection of his objection was “What are these ‘worlds’ that people are talking about? There’s just the state.” That is, the whole issue turned on a (reasonable) semantic objection.
However, I do think Eliezer is right that in some parts of physics collapse is still taken very seriously and there are more-than-semantic disagreements. For example, I was pretty surprised by David Griffiths’ discussion of collapse in the afterword of his textbook (pdf) during undergrad. I think that Eliezer is probably right that some of these are coming from a pretty confused place. I think the actual situation with respect to consensus is a bit muddled, and e.g. I would be fairly surprised if Eliezer was able to make a better prediction about the result of any possible experiment than the physics community based on his confidence in many-worlds. But I also think that a naive-Paul perspective of “no way anyone is as confused as Eliezer is saying” would have been equally-unreasonable.
I agree that Eliezer is overconfident about the existence of the part of the wavefunction we never see. If we are deeply wrong about physics, then I think this could go either way. And it still seems quite plausible that we are deeply wrong about physics in one way or another (even if not in any particular way). So I think it’s wrong to compare many-worlds to heliocentrism (as Eliezer has done). Heliocentrism is extraordinarily likely even if we are completely wrong about physics—direct observation of the solar system really is a much stronger form of evidence than a priori reasoning about the existence of other worlds. Similarly, I think it’s wrong to compare many-worlds to a particular arbitrary violation of conservation of energy when top quarks collide, rather than something more like “there is a subtle way in which our thinking about conservation of energy is mistaken and the concept either doesn’t apply or is only approximately true.” (It sounds reasonable to compare it to the claim that spinning black holes obey conservation of angular momentum, at least if you haven’t yet made any astronomical observations that back up that claim.)
My understanding is this is the basic substance of Eliezer’s disagreement with Scott Aaronson. My vague understanding of Scott’s view (from one conversation with Scott and Eliezer about this ~10 years ago) is roughly “Many worlds is a strong prediction of our existing theories which is intuitively wild and mostly-experimentally-unconfirmed. Probably true, and would be ~the most interesting physics result ever if false, but still seems good to test and you shouldn’t be as confident as you are about heliocentrism.”
When I said it was relevant to his track record as a public intellectual, I was referring to his tendency to make dramatic and overconfident pronouncements (which Ben mentioned in the parent comment). I wasn’t intending to imply that the debate around QM had been settled or that new information had come out. I do think that even at the time Eliezer’s positions on both MWI and why people disagreed with him on it were overconfident though.
I think you’re right that my comment gave too little credit to Eliezer, and possibly misleadingly implied that Eliezer is the only one who holds some kind of extreme MWI or anti-collapse view or that such views are not or cannot be reasonable (especially anti-collapse). I said that MWI is a leading candidate but that’s still probably underselling how many super pro-MWI positions there are. I expanded on this in another comment.
Your story of Eliezer comparing MWI to heliocentrism is a central example of what I’m talking about. It is not that his underlying position is wrong or even unlikely, but that he is significantly overconfident.
I think this is relevant information for people trying to understand Eliezer’s recent writings.
To be clear, I don’t think it’s a particularly important example, and there is a lot of other more important information than whether Eliezer overestimated the case for MWI to some degree while also displaying impressive understanding of physics and possibly/probably being right about MWI.
It seems that half of these examples are from 15+ years ago, from a period for which Eliezer has explicitly disavowed his opinions (and the ones that are not strike me as most likely correct, like treating coherence arguments as forceful and that AI progress is likely to be discontinuous and localized and to require relatively little compute).
Let’s go example-by-example:
1. Predicting near-term extinction from nanotech
This critique strikes me as about as sensible as digging up someone’s old high-school essays and critiquing their stance on communism or the criminal justice system. I want to remind any reader that this is an opinion from 1999, when Eliezer was barely 20 years old. I am confident I can find crazier and worse opinions for every single leadership figure in Effective Altruism, if I am willing to go back to what they thought while they were in high-school. To give some character, here are some things I believed in my early high-school years:
The economy was going to collapse because the U.S. was establishing a global surveillance state
Nuclear power plants are extremely dangerous and any one of them is quite likely to explode in a given year
We could have easily automated the creation of all art, except for the existence of a vaguely defined social movement that tries to preserve the humanity of art-creation
These are dumb opinions. I am not ashamed of having had them. I was young and trying to orient in the world. I am confident other commenters can add their own opinions they had when they were in high-school. The only thing that makes it possible for someone to critique Eliezer on these opinions is that he was virtuous and wrote them down, sometimes in surprisingly well-argued ways.
If someone were to dig up an old high-school essay of mine, in-particular one that has at the top written “THIS IS NOT ENDORSED BY ME, THIS IS A DUMB OPINION”, and used it to argue that I am wrong about important cause prioritization questions, I would feel deeply frustrated and confused.
For context, on Eliezer’s personal website it says:
2. Predicting that his team had a substantial chance of building AGI before 2010
Given that this is only 2 years later, all my same comments apply. But let’s also talk a bit about the object-level here.
This is the quote on which this critique is based:
This… is not a very confident prediction. This paragraph literally says “only a guess”. I agree, if Eliezer said this today, I would definitely dock him some points, but this is again a freshman-aged Eliezer, and it was more than 20 years ago.
But also, I don’t know, predicting AGI by 2020 from the year 2000 doesn’t sound that crazy. If we didn’t have a whole AI winter, if Moore’s law had accelerated a bit instead of slowed down, if more talent had flowed into AI and chip-development, 2020 doesn’t seem implausible to me. I think it’s still on the aggressive side, given what we know now, but technological forecasting is hard, and the above sounds more like a 70% confidence interval instead of a 90% confidence interval.
3. Having high confidence that AI progress would be extremely discontinuous and localized and not require much compute
This opinion strikes me as approximately correct. I still expect highly discontinuous progress, and many other people have argued for this as well. Your analysis that the world looks more like Hanson’s world described in the AI foom debate also strikes me as wrong (and e.g. Paul Christiano has also said that Hanson’s predictions looked particularly bad in the FOOM debate. EDIT: I think this was worded too strong, and while Paul had some disagreements with Robin, on the particular dimension of discontinuity and competitiveness, Paul thinks Robin came away looking better than Eliezer). Indeed, I would dock Hanson many more points in that discussion (though, overall, I give both of them a ton of points, since they both recognized the importance of AI-like technologies early, and performed vastly above baseline for technological forecasting, which again, is extremely hard).
This seems unlikely to be the right place for a full argument on discontinuous progress. However, continuous takeoff is very far from consensus in the AI Alignment field, and this post seems to try to paint it as such, which seems pretty bad to me (especially if it’s used in a list with two clearly wrong things, without disclaiming it as such).
4. Treating early AI risk arguments as close to decisive
You say:
I think the arguments are pretty tight and sufficient to establish the basic risk argument. I found your critique relatively uncompelling. In particular, I think you are misrepresenting that a premise of the original arguments was a fast takeoff. I can’t currently remember any writing that said fast takeoff was a necessary component of the AI risk arguments, or that the distinction between “AI vastly exceeds human intelligence in 1 week vs. 4 years” is crucial to the overall argument, which is, as far as I can tell, the range into which most current opinions in the AI Alignment field fall (and importantly, I know of almost no one who believes that it could take 20+ years for AI to go from mildly subhuman to vastly superhuman, which does feel like it could maybe change the playing field, but also seems to be a very rarely held opinion).
Indeed, I think Eliezer was probably underconfident in doom from AI, since I currently assign >50% probability to AI Doom, as do many other people in the AI Alignment field.
See also Nate’s recent comment on some similar critiques to this: https://www.lesswrong.com/posts/8NKu9WES7KeKRWEKK/why-all-the-fuss-about-recursive-self-improvement
5. Treating “coherence arguments” as forceful
Coherence arguments do indeed strike me as one of the central valid arguments in favor of AI Risk. I think there was a common misunderstanding that did confuse some people, but that misunderstanding was not argued for by Eliezer or other people at MIRI, as far as I can tell (and I’ve looked into this for 5+ hours as part of discussions with Rohin and Richard).
The central core of coherence arguments, which is based in arguments of competitiveness and economic efficiency, strikes me as very strong, robustly argued for, and one of the main reasons why AI Risk will be dangerous. The von Neumann–Morgenstern theorem does play a role here, though it’s definitely not sufficient to establish a strong case, and Rohin and Richard have successfully argued against that, though I don’t think Eliezer has historically argued that the von Neumann–Morgenstern theorem is sufficient to establish an AI-alignment-relevant argument on its own (though Dutch-book-style arguments are very suggestive of the real structure of the argument).
Edit: Rohin says something similar in a separate comment reply.
6. Not acknowledging his mixed track record
Given my disagreements with the above, I think doing so would be a mistake. But even without that, let’s look at the merits of this critique.
For the two “clear cut” examples, Eliezer has posted dozens of times on the internet that he has disendorsed his views from before 2002. This is present on his personal website, the relevant articles are no longer prominently linked anywhere, and Eliezer has openly and straightforwardly acknowledged that his predictions and beliefs from the relevant period were wrong.
For the disputed examples, Eliezer still believes all of these arguments (as do I), so it would be disingenuous for Eliezer to “acknowledge his mixed track record” in this domain. You can either argue that he is wrong, or you can argue that he hasn’t acknowledged that he has changed his mind and was previously wrong, but you can’t both argue that Eliezer is currently wrong in his beliefs, and accuse him of not telling others that he is wrong. I want people to say things they believe. And for the only two cases where you have established that Eliezer has changed his mind, he has extensively acknowledged his track record.
Some comments on the overall post:
I really dislike this post. I think it provides very little argument, and engages in extremely extensive cherry-picking in a way that does not produce a symmetric credit-allocation (i.e. most people who are likely to update downwards on Yudkowsky on the basis of this post, seem to me to be generically too trusting, and I am confident I can write a more compelling post about any other central figure in Effective Altruism that would likely cause you to update downwards even more).
I think a good and useful framing on this post could have been “here are 3 points where I disagree with Eliezer on AI Risk” (I don’t think it would have been useful under almost any circumstance to bring up the arguments from the year 2000). And then to primarily spend your time arguing about the concrete object-level. Not to start a post that is trying to say that Eliezer is “overconfident in his beliefs about AI” and “miscalibrated”, and then to justify that by cherry-picking two examples from when Eliezer was barely no longer a teenager, and three arguments on which there is broad disagreement within the AI Alignment field.
I also dislike calling this post “On Deference and Yudkowsky’s AI Risk Estimates”, as if this post was trying to be an unbiased analysis of how much to defer to Eliezer, while you just list negative examples. I think this post is better named “against Yudkowsky on AI Risk estimates”. Or “against Yudkowsky’s track record in AI Risk Estimates”. Which would have made it clear that you are selectively giving evidence for one side, and more clearly signposted that if someone was trying to evaluate Eliezer’s track record, this post will only be a highly incomplete starting point.
I have many more thoughts, but I think I’ve written enough for now. I think I am somewhat unlikely to engage with replies in much depth, because writing this comment has already taken up a lot of my time, and I expect given the framing of the post, discussion on the post to be unnecessarily conflicty and hard to navigate.
Just to note that the boldfaced part has no relevance in this context. The post is not attributing these views to present-day Yudkowsky. Rather, it is arguing that Yudkowsky’s track record is less flattering than some people appear to believe. You can disavow an opinion that you once held, but this disavowal doesn’t erase a bad prediction from your track record.
Hmm, I think that part definitely has relevance. Clearly we would trust Eliezer less if his response to that past writing was “I just got unlucky in my prediction, I still endorse the epistemological principles that gave rise to this prediction, and would make the same prediction, given the same evidence, today”.
If someone visibly learns from forecasting mistakes they make, that should clearly update us positively on them not repeating the same mistakes.
I suppose one of my main questions is whether he has visibly learned from the mistakes, in this case.
For example, I wasn’t able to find a post or comment to the effect of “When I was younger, I spent years of my life motivated by the belief that near-term extinction from nanotech was looming. I turned out to be wrong. Here’s what I learned from that experience and how I’ve applied it to my forecasts of near-term existential risk from AI.” Or a post or comment acknowledging his previous over-optimistic AI timelines and what he learned from them, when formulating his current seemingly short AI timelines.
(I genuinely could be missing these, since he has so much public writing.)
Eliezer writes a bit about his early AI timeline and nanotechnology opinions here, though it sure is a somewhat obscure reference that takes a bunch of context to parse:
While also including some other points, I do read it as a pretty straightforward “Yes, I was really wrong. I didn’t know about cognitive biases, and I did not know about the virtue of putting probability distributions on things, and I had not thought enough about the art of thinking well. I would not make the same mistakes today.”
Did Yudkowsky actually write these sentences?
If Yudkowsky thinks, as this suggests, that people in EA think or do things because he tells them to—this alone means it’s valuable to question whether people give him the right credibility.
I am not sure about the question. Yeah, this is a quote from the linked post, so he wrote those sections.
Also, yeah, seems like Eliezer has had a very large effect on whether this community uses things like probability distributions, models things in a bayesian way, makes lots of bets, and pays attention to things like forecasting track records. I don’t think he gets to take full credit for those norms, but my guess is he is the single individual who most gets to take credit for those norms.
I don’t see how he has encouraged people to pay attention to forecasting track records. People who have encouraged that norm make public bets or go on public forecasting platforms and make predictions about questions that can resolve in the short term. Bryan Caplan does this; I think Greg Lewis and David Manheim are superforecasters.
I thought the upshot of this piece and the Jotto post was that Yudkowsky is in fact very dismissive of people who make public forecasts. “I consider naming particular years to be a cognitively harmful sort of activity; I have refrained from trying to translate my brain’s native intuitions about this into probabilities, for fear that my verbalized probabilities will be stupider than my intuitions if I try to put weight on them.” This seems like the opposite of encouraging people to pay attention to forecasting track records; rather, it dismisses the whole enterprise of forecasting.
I wanted to make sure I’m not missing something, since this shines a negative light about him IMO.
There’s a difference between saying, for example, “You can’t expect me to have done X then—nobody was doing it, and I haven’t even written about it yet, nor was I aware of anyone else doing so”—and saying ”… nobody was doing it because I haven’t told them to.”
This isn’t about credit. It’s about self-perception and social dynamics.
I mean… it is true that Eliezer really did shape the culture in the direction of forecasting and predictions and that kind of stuff. My best guess is that without Eliezer, we wouldn’t have a culture of doing those things (and like, the AI Alignment community as is probably wouldn’t exist). You might disagree with me and him on this, in which case sure, update in that direction, but I don’t think it’s a crazy opinion to hold.
The timeline doesn’t make sense for this version of events at all. Eliezer was uninformed on this topic in 1999, at a time when Robin Hanson had already written about gambling on scientific theories (1990), prediction markets (1996), and other betting-related topics, as you can see from the bibliography of his Futarchy paper (2000). Before Eliezer wrote his sequences (2006-2009), the Long Now Foundation already had Long Bets (2003), and Tetlock had already written Expert Political Judgment (2005).
If Eliezer had not written his sequences, forecasting content would have filtered through to the EA community from contacts of Hanson. For instance, through blogging by other GMU economists like Caplan (2009). And of course, through Jason Matheny, who worked at FHI, where Hanson was an affiliate. He ran the ACE project (2010), which led to the science behind Superforecasting, a book that the EA community would certainly have discovered.
Hmm, I think these are good points. My best guess is that I don’t think we would have a strong connection to Hanson without Eliezer, though I agree that that kind of credit is harder to allocate (and it gets fuzzy what we even mean by “this community” as we extend into counterfactuals like this).
I do think the timeline here provides decent evidence in favor of less credit allocation (and I think against the stronger claim “we wouldn’t have a culture of [forecasting and predictions] without Eliezer”). My guess is in terms of causing that culture to take hold, Eliezer is probably still the single most-responsible individual, though I do now expect (after having looked into a bunch of comment threads from 1996 to 1999 and seeing many familiar faces show up) that a lot of the culture would show up without Eliezer.
Speaking for myself, Eliezer has played no role in encouraging me to give quantitative probability distributions. For me, that was almost entirely due to people like Tetlock and Bryan Caplan, both of whom I would have encountered regardless of Eliezer. I strongly suspect this is true of lots of people who are in EA but don’t identify with the rationalist community.
More generally, I do think that Eliezer and other rationalists overestimate how much influence they have had on wider views in the community. E.g. I have not read the Sequences, and I just don’t think they play a big role in the internal story of a lot of EAs.
For me, even people like Nate Silver or David MacKay, who aren’t part of the community, have played a bigger role in encouraging quantification and probabilistic judgment.
This is my impression and experience as well
“My best guess is that I don’t think we would have a strong connection to Hanson without Eliezer”
Fwiw, I found Eliezer through Robin Hanson.
Yeah, I think this isn’t super rare, but overall still much less common than the reverse.
I’ll currently take your word for that because I haven’t been here nearly as long. I’ll mention that some of these contributions I don’t necessarily consider positive.
But the point is, is Yudkowsky a (major) contributor to a shared project, or is he a ruler directing others, like his quote suggests? How does he view himself? How do the different communities involved view him?
P.S. I disagree with whoever (strong-)downvoted your comment.
Yudkowsky often ~~complains~~ ~~rants~~ hopes people will form their own opinions instead of just listening to him; I can find references if you want. I also think he lately finds it ~~depressing~~ worrying that he’s got to be the responsible adult. Easy references: search for “Eliezer” in List of Lethalities.
I think this strengthens my point, especially given how it is written in the post you linked. Telling people you’re the responsible adult, or the only one who notices things, still means telling them you’re smarter than them and they should just defer to you.
I’m trying to account for my biases in these comments, but I encourage others to go to that post, search for “Eliezer” as you suggested, and form their own views.
Those are four very different claims. In general, I think it’s bad to collapse all (real or claimed) differences in ability into a single status hierarchy, for the reasons stated in Inadequate Equilibria.
Eliezer is claiming that other people are not taking the problem sufficiently seriously, claiming ownership of it, trying to form their own detailed models of the full problem, and applying enough rigor and clarity to make real progress on the problem.
He is specifically not saying “just defer to me”, and in fact is saying that he and everyone else is going to die if people rely on deference here. A core claim in AGI Ruin is that we need more people with “not the ability to read this document and nod along with it, but the ability to spontaneously write it from scratch without anybody else prompting you”.
Deferring to Eliezer means that Eliezer is the bottleneck on humanity solving the alignment problem; which means we die. The thing Eliezer claims we need is a larger set of people who arrive at true, deep, novel insights about the problem on their own —without Eliezer even mentioning the insights, much less spending a ton of time trying to persuade anyone of them—and writing them up.
It’s true that Eliezer endorses his current stated beliefs; this goes without saying, or he obviously wouldn’t have written them down. It doesn’t mean that he thinks humanity has any path to survival via deferring to him, or that he thinks he has figured out enough of the core problems (or could ever conceivably do so, on his own) to give humanity a significant chance of surviving. Quoting AGI Ruin:
The end of the “death with dignity” post is also alluding to Eliezer’s view that it’s pretty useless to figure out what’s true merely via deferring to Eliezer.
Thanks, those are some good counterpoints.
Eliezer is cleanly just a major contributor. If he went off the rails tomorrow, some people would follow him (and the community would be better with those few gone), but the vast majority would say “wtf is that Eliezer fellow doing”. I also don’t think he sees himself as the leader of the community either.
Probably Eliezer likes Eliezer more than EA/Rationality likes Eliezer, because Eliezer really likes Eliezer. If I were as smart & good at starting social movements as Eliezer, I’d probably also have an inflated ego, so I don’t take it as too unreasonable of a character flaw.
More than Philip Tetlock (author of Superforecasting)?
Does that particular quote from Yudkowsky not strike you as slightly arrogant?
Yes, definitely much more than Philip Tetlock, given that our community had strong norms of forecasting and making bets before Tetlock had done most of his work on the topic (Expert Political Judgment was out, but as far as I can tell was not a major influence on people in the community, though I am not totally confident of that).
I am generally strongly against a culture of fake modesty. If I want people to make good decisions, they need to be able to believe things about themselves that might sound arrogant to others. Yes, it sounds arrogant to an external audience, but it also seems true, and whether it is true should be the dominant consideration in whether it is good to say.
FWIW I think “it was 20 years ago” is a good reason not to take these failed predictions too seriously, and “he has disavowed these predictions after seeing they were false” is a bad reason to dismiss them.
If EY gets to disavow his mistakes, so does everyone else.
On 1 (the nanotech case):
I think your comment might give the misimpression that I don’t discuss this fact in the post or explain why I include the case. What I write is:
An additional reason why I think it’s worth distinguishing between his views on nanotech and (e.g.) your views on nuclear power: I think there’s a difference between an off-hand view picked up from other people vs. a fairly idiosyncratic view that you consciously adopted after a lot of reflection, that you decided to devote your professional life to, and that you founded an organization to address.
It’s definitely up to the reader to decide how relevant the nanotech case is. Since the case is not widely known, seems at least pretty plausibly relevant, and is presented with two explicit flags of his age at the time, I do still endorse including it.
At face value, as well: we’re trying to assess how much weight to give to someone’s extreme, outlier-ish prediction that an emerging technology is almost certain to kill everyone very soon. It just does seem very relevant, to me, that they previously had a different extreme, outlier-ish prediction that another emerging technology was very likely to kill everyone within a decade.
I don’t find it plausible that we should assign basically no significance to this.
On 6 (the question of whether Yudkowsky has acknowledged negative aspects of his track record):
Similarly, I think your comment may give the impression that I don’t discuss this point in the post. What I write is this:
On the general point that this post uses old examples:
Given the sorts of predictions involved (forecasts about pathways to transformative technologies), old examples are generally going to be more unambiguous than new examples. Similarly for risk arguments: it’s hard to have a sense of how new arguments are going to hold up. It’s only for older arguments that we can start to approach the ability to say that technological progress, progress in arguments, and evolving community opinion say something clear-ish about how strong the arguments were.
On signposting:
I think it’s possible another title would have been better (I chose a purposely bland one partly for the purpose of trying to reduce heat—and that might have been a mistake). But I do think I signpost what the post is doing fairly clearly.
The introduction says it’s focusing on “negative aspects” of Yudkowsky’s track record, the section heading for the section introducing the examples describes them as “cherry-picked,” and the start of the section introducing the examples has an italicized paragraph re-emphasizing that the examples are selective and commenting on the significance of this selectiveness.
On the role of the fast take-off assumption in classic arguments:
I disagree with this. I do think it’s fair to say that fast take-off was typically a premise of the classic arguments.
Two examples I have off-hand (since they’re in the slides from my talk) are from Yudkowsky’s exchange with Caplan and from Superintelligence. Superintelligence isn’t by Yudkowsky, of course, but hopefully is still meaningful to include (insofar as Superintelligence heavily drew on Yudkowsky’s work and was often accepted as a kind of distillation of the best arguments as they existed at the time).
From Yudkowsky’s debate with Caplan (2016):
(Caveat that the fast-take-off premise is stated a bit ambiguously here, so it’s not clear what level of rapidness is being assumed.)
From Superintelligence:
The decisive strategic advantage point is justified through a discussion of the possibility of a fast take-off. The first chapter of the book also starts by introducing the possibility of an intelligence explosion. It then devotes two chapters to the possibility of a fast take-off and the idea this might imply a decisive strategic advantage, before it gets to discussing things like the orthogonality thesis.
I think it’s also relevant that content from MIRI and people associated with MIRI, raising the possibility of extinction from AI, tended to very strongly emphasize (e.g. spend most of its time on) the possibility of a run-away intelligence explosion. The most developed classic pieces arguing for AI risk often have names like “Shaping the Intelligence Explosion,” “Intelligence Explosion: Evidence and import,” “Intelligence Explosion Microeconomics,” and “Facing the Intelligence Explosion.”
Overall, then, I do think it’s fair to consider a fast-takeoff to be a core premise of the classic arguments. It wasn’t incidental or a secondary consideration.
[[Note: I’ve edited my comment, here, to respond to additional points. Although there are still some I haven’t responded to yet.]]
One quick response, since it was easy (might respond more later):
I do think takeoff speeds between 1 week and 10 years are a core premise of the classic arguments. I do think the situation looks very different if we spend 5+ years in the human domain, but I don’t think there are many who believe that that is going to happen.
I don’t think the distinction between 1 week and 1 year is that relevant to the core argument for AI Risk, since it seems in either case more than enough cause for likely doom, and that premise seems very likely to be true to me. I do think Eliezer believes things more on the order of 1 week than 1 year, but I don’t think the basic argument structure is that different in either case (though I do agree that the 1 year opens us up to some more potential mitigating strategies).
“Orthogonality thesis: Intelligence can be directed toward any compact goal….
Instrumental convergence: An AI doesn’t need to specifically hate you to hurt you; a paperclip maximizer doesn’t hate you but you’re made out of atoms that it can use to make paperclips, so leaving you alive represents an opportunity cost and a number of foregone paperclips….
Rapid capability gain and large capability differences: Under scenarios seeming more plausible than not, there’s the possibility of AIs gaining in capability very rapidly, achieving large absolute differences of capability, or some mixture of the two….
1-3 in combination imply that Unfriendly AI is a critical problem-to-be-solved, because AGI is not automatically nice, by default does things we regard as harmful, and will have avenues leading up to great intelligence and power.”
1-3 in combination don’t imply anything with high probability.
My impression is the post is a somewhat unfortunate attempt to “patch” the situation in which many generically too-trusting people updated a lot on AGI Ruin: A List of Lethalities and Death with Dignity, and on the subsequent deference/update cascades.
In my view, the deeper problem is that, instead of engaging with disagreements about model internals, many of these people do some sort of “averaging conclusions” move, based on signals like seniority, karma, vibes, etc.
Many of these signals are currently wildly off from truth-tracking, so you get attempts to push the conclusion-updates directly.
This is really minor and nitpicky, and I agree with much of your overall points, but I don’t think equivocating between “barely 20” and “early high-school” is fair. The former is a normal age to be a third-year university student in the US, and plenty of college-age EAs are taken quite seriously by the rest of us.
Oh, hmm, I think this is just me mixing up the differences between the U.S. and German education systems (I was 18 and 19 in high school, and enrolled in college when I was 20).
I think the first quote on nanotechnology was actually written in 1996 originally (though was maybe updated in 1999). Which would put Eliezer at ~17 years old when he wrote that.
The second quote was I think written in more like 2000, which would put him more in the early college years, and I agree that it seems good to clarify that.
Thank you, this clarification makes sense to me!
To clarify, what I said was:
Then I listed a bunch of ways in which the world looks more like Robin’s predictions, particularly regarding continuity and locality. I said Robin’s predictions about AI timelines in particular looked bad. This isn’t closely related to the topic of your section 3, where I mostly agree with the OP.
Hmm, I think this is fair, rereading that comment.
I feel a bit confused here, since at the scale that Robin is talking about, timelines and takeoff speeds seem very inherently intertwined (like, if Robin predicts really long timelines, this clearly implies a much slower takeoff speed, especially when combined with gradual continuous increases). I agree there is a separate competitiveness dimension that you and Robin are closer on, which is important for some of the takeoff dynamics, but on overall takeoff speed, I feel like you are closer to Eliezer than Robin (Eliezer predicting weeks to months to cross the general intelligence human->superhuman gap, you predicting single-digit years to cross that gap, and Hanson predicting decades to cross that gap). Though it’s plausible that I am missing something here.
In any case, I agree that my summary of your position here is misleading, and will edit accordingly.
I think my views about takeoff speeds are generally similar to Robin’s though neither Robin nor Eliezer got at all concrete in that discussion so I can’t really say. You can read this essay from 1998 with his “outside-view” guesses, which I suspect are roughly in line with what he’s imagining in the FOOM debate.
I think that doc implies significant probability on a “slow” takeoff of 8, 4, 2… year doublings (more like the industrial revolution), but a broad distribution over dynamics which also puts significant probability on e.g. a relatively fast jump to a 1 month doubling time (more like the agricultural revolution). In either case, over the next few doublings he would by default expect still further acceleration. Overall I think this is basically a sensible model.
(I agree that shorter timelines generally suggest faster takeoff, but I think either Robin or Eliezer’s views about timelines would be consistent with either Robin or Eliezer’s views about takeoff speed.)
If done in a polite and respectful manner, I think this would be a genuinely good idea.
Not sure why this is on EAF rather than LW or maybe AF, but anyway. I find this interesting to look at because I have been following Eliezer’s work since approximately 2003 on SL4, and so I remember this firsthand, as it were. I disagree with several of the evaluations here (but of course agree with several of the others—I found the premise of Flare to be ludicrous at the time, and thankfully, AFAICT, pretty much zero effort went into that vaporware*):
calling LOGI and related articles ‘wrong’ because that’s not how DL looks right now is itself wrong. Yudkowsky has never said that DL or evolutionary approaches couldn’t work, or that all future AI work would look like the Bayesian program and logical approach he favored; he’s said (consistently since at least SL4 that I’ve observed) that they would be extremely dangerous when they worked, and extremely hard to make safe to the high probability that we need them to when deployed to the real world indefinitely and unboundedly and self-modifyingly, and that rigorous program-proof approaches which can make formal logical guarantees of 100% safety are what are necessary and must deal with the issues and concepts discussed in LOGI. I think this is true: they do look extremely dangerous by default, and we still do not have adequate solutions to problems like “how do we talk about human values in a way which doesn’t hardwire them dangerously into a reward function which can’t be changed?” This is something actively researched now in RL & AI safety, and which continues to lack any solution you could call even ‘decent’. (If you have ever been surprised by any result from causal influence diagrams, then you have inadvertently demonstrated the value of this.) More broadly, we still do not have any good proof or approach that we can feasibly engineer any of that with prosaic alignment approaches, which tend towards the ‘patch bugs as you find them’ or ‘make systems so complex you can’t immediately think of how they fail’ approach to security that we already knew back then was a miserable failure. Eliezer hasn’t been shown to be wrong here.
I continue to be amazed anyone can look at the past decade of DL and think that Hanson is strongly vindicated by it, rather than Yudkowsky-esque views. (Take a look at his OB posts on AI the past few years. Hanson is not exactly running victory laps, either on DL, foom, or ems. It would be too harsh to compare him to Gary Marcus… but I’ve seen at least one person do so anyway.) I would also say that to the extent that Yudkowsky-style research has enjoyed any popularity of late, it’s because people have been looking at the old debate and realizing that extremely simple generic architectures written down in a few dozen lines of code, with large capability differences between very similar lines of code, solving many problems in many fields and subsuming entire subfields as simply another minor variant, with large generalizing models (as opposed to the very strong small-models-unique-to-each-individual-problem-solved-case-by-case-by-subject-experts which Hanson & Drexler strongly advocated and which was the ML mainstream at the time) powered by OOMs more compute, steadily increasing in agency, is a short description of Yudkowsky’s views on what the runup will look like and how DL now works.
“his arguments focused on a fairly specific catastrophe scenario that most researchers now assign less weight to than they did when they first entered the field.”
Yet, the number who take it seriously since Eliezer started advocating it in the 1990s is now far greater than it was when he started and was approximately the only person anywhere. You aren’t taking seriously that these surveyed researchers (“AI Impacts, CHAI, CLR, CSER, CSET, FHI, FLI, GCRI, MILA, MIRI, Open Philanthropy and PAI”) wouldn’t exist without Eliezer as he created the AI safety field as we know it, with everyone else downstream (like Bostrom’s influential Superintelligence—Eliezer with the serial numbers filed off and an Oxford logo added). This is missing the forest for a few trees; if you are going to argue that a bit of regression to the mean in extreme beliefs should be taken as some evidence against Eliezer, then you must also count the initial extremity of the beliefs leading to these NGOs doing AI safety & people at them doing AI safety at all as much evidence for Eliezer.† (What a perverse instance of Simpson’s paradox.)
There’s also the caveat mentioned there that the reduction may simply be because they have moved up other scenarios like the part 2 scenario where it’s not a singleton hard takeoff but a multipolar scenario (a distinction of great comfort, I’m sure), which is a scenario which over the past few years is certainly looking more probable due to how DL scaling and arms races work. (In particular, we’ve seen some fast followups—because the algorithms are so simple that once you hear the idea described at all, you know most of it.) I didn’t take the survey & don’t work at the listed NGOs, but I would point out that if I had gone pro sometime in the past decade & taken it, under your interpretation of this statistic, you would conclude “Gwern now thinks Eliezer was wrong”. Something to think about, especially if you want to consider observations like “this statistic claims most people are moving away from Eliezer’s views, even though when I look at discussions of scaling, research trends, and what startups/NGOs are being founded, it sure looks like the opposite...”
* Flare has been, like Roko’s Basilisk, one of those things where the afterlife of it has been vastly greater than the thing itself ever was, and where it gets employed in mutually contradictory ways by critics.
† I find it difficult to convey what incredibly hot garbage AI researcher opinions in the ’90s were about these topics. And I don’t mean the casual projections that AGI would take until 2500 AD or whatever, I mean basics like the orthogonality thesis and instrumental drives. Like ‘transhumanism’, these are terms used in inverse proportion to how much people need them. Even on SL4, which was the fringiest of the fringe in AI alarmism, you had plenty of people reading and saying, “no, there’s no problem here at all, any AI will just automatically be friendly and safe, human moral values aren’t fragile or need to be learned, they’re just, like, a law of physics and any evolving system will embody our values”. If you ever wonder how old people in AI like Kurzweil or Schmidhuber can be so gungho about the prospect of AGI happening and replacing (ie. killing) humanity and why they have zero interest in AI safety/alignment, it’s because they think that this is a good thing and our mind-children will just automatically be like us but better and this is evolution. (“Say, doth the dull soil / Quarrel with the proud forests it hath fed, / And feedeth still, more comely than itself?”...) If your response to reading this is, “gwern, do you have a cite for all of that? because no real person could possibly believe such a both deeply naive and also colossally evil strawman”, well, perhaps that will convey some sense of the intellectual distance traveled.
It’s not accurate that the key ideas of Superintelligence came to Bostrom from Eliezer, who originated them. Rather, at least some of the main ideas came to Eliezer from Nick. For instance, in one message from Nick to Eliezer on the Extropians mailing list, dated to Dec 6th 1998, inline quotations show Eliezer arguing that it would be good to allow a superintelligent AI system to choose its own morality. Nick responds that it’s possible for an AI system to be highly intelligent without being motivated to act morally. In other words, Nick explains to Eliezer an early version of the orthogonality thesis.
Nick was not lagging behind Eliezer on evaluating the ideal timing of a singularity, either—the same thread reveals that they both had some grasp of the issue. Nick said that the fact that 150,000 people die per day must be contextualised against “the total number of sentiences that have died or may come to live”, foreshadowing his piece on Astronomical Waste, that would be published five years later. Eliezer said that having waited billions of years, the probability of a success is more important than any delay of hundreds of years.
These are indeed two of the most-important macrostrategy insights relating to AI. A reasonable guess is that a lot of the big ideas in Superintelligence were discovered by Bostrom. Some surely came from Eliezer and his sequences, or from discussions between the two, and I suppose that some came from other utilitarians and extropians.
I think chapter 4, The Kinetics of an Intelligence Explosion, has a lot of terms and arguments from EY’s posts in the FOOM Debate. (I’ve been surprised by this in the past, thinking Bostrom invented the terms, then finding things like resource overhangs getting explicitly defined in the FOOM Debate.)
Thanks for the comment! A lot of this is useful.
I mainly have the impression that LOGI and related articles were probably “wrong” because, so far as I’ve seen, nothing significant has been built on top of them in the intervening decade-and-half (even though LOGI’s successor was seemingly predicted to make it possible for a small group to build AGI). It doesn’t seem like there’s any sign that these articles were the start of a promising path to AGI that was simply slower than the deep learning path.
I have had the impression, though, that Yudkowsky also thought that logical/Bayesian approaches were in general more powerful/likely-to-enable-near-term-AGI (not just less safe) than DL. It’s totally possible this is a misimpression—and I’d be inclined to trust your impression over mine, since you’ve read more of his old writing than I have. (I’d also be interested if you happen to have any links handy.) But I’m not sure this significantly undermines the relevance of the LOGI case.
I also think that, in various ways, Hanson also doesn’t come off great. For example, he expresses a favorable attitude toward the CYC project, which now looks like a clear dead end. He is also overly bullish about the importance of having lots of different modules. So I mostly don’t want to defend the view “Hanson had a great performance in the FOOM debate.”
I do think, though, that his abstract view that compute and content (i.e. data) are centrally important is closer to the mark than Yudkowsky’s expressed view. I think it does seem hard to defend Yudkowsky’s view that it’s possible for a programming team (with mid-2000s levels of compute) to acquire some “deep new insights,” go down into their basement, and then create an AI system that springboards itself into taking over the world. At least—I think it’s fair to say—the arguments weren’t strong enough to justify a lot of confidence in that view.
This is certainly a positive aspect of his track-record—that many people have now moved closer to his views. (It also suggests that his writing was, in expectation, a major positive contribution to the project of existential risk reduction—insofar as this writing has helped move people up and we assume this was the right direction to move.) But it doesn’t imply that we should give many more “Bayes points” to him than we give to the people who moved.
Suppose, for example, that someone said in 2020 that there was a 50% chance of full-scale nuclear war in the next five years. Then—due to Russia’s invasion of Ukraine—most people moved their credences upward (although they still remained closer to 0% than to 50%). Does that imply the person giving the early warning was better calibrated than the people who moved their estimates up? I don’t think so. And I think—in this nuclear case—some analysis can be used to justify the view that the person giving the early warning was probably overconfident; they probably didn’t have enough evidence or good enough arguments to actually justify a 50% credence.
It may still be the case that the person giving the early warning (in the hypothetical nuclear case) had some valuable and neglected insights, missed by others, that are well worth paying attention to and seriously reflecting on; but that’s a different matter from believing they were overall well-calibrated or should be deferred to much more than the people who moved.
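To make that calibration point concrete, here’s a toy scoring of the hypothetical forecasters with Brier scores, once the five years pass without a war. (This is my own illustration; the 10% figure for the person who updated after the invasion is made up.)

```python
# Toy illustration: Brier scores for the hypothetical nuclear-war forecasts.
# Brier score = (forecast_probability - outcome)^2; lower is better.

def brier(forecast: float, outcome: int) -> float:
    """Squared error between a probability forecast and a 0/1 outcome."""
    return (forecast - outcome) ** 2

# Suppose no full-scale nuclear war occurs in the five-year window (outcome = 0).
early_warner = brier(0.50, 0)  # the 50% early-warning forecaster
updater      = brier(0.10, 0)  # someone who moved up to (a made-up) 10%

print(round(early_warner, 4))  # 0.25
print(round(updater, 4))       # 0.01
```

On this scoring rule the person who stayed near 0% and only moved up modestly comes out far better calibrated than the 50% forecaster, even though the forecaster’s warning pointed in the “right direction.”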
[[EDIT: Something else it might be worth emphasizing, here, is that I’m not arguing for the view “ignore Eliezer.” It’s closer to “don’t give Eliezer’s views outsized weight, compared to (e.g.) the views of the next dozen people you might be inclined to defer to, and factor in evidence that his risk estimates might have a significant upward bias to them.”]]
I’m going to break a sentence from your comment here into bits for inspection. Also, emphasis and elisions mine.
We don’t have a formalism to describe what “agency” is. We do have several posts trying to define it on the Alignment Forum:
Gradations of Agency
Optimality is the tiger, and agents are its teeth
Agency and Coherence
While it might not be the best choice, I’m going to use Gradations of Agency as a definition, because it’s more systematic in its presentation.
“Level 3” is described as “Armed with this ability you can learn not just from your own experience, but from the experience of others—you can identify successful others and imitate them.”
This doesn’t seem like what any ML model does. So we can look at “Level 2,” which gives the example “You start off reacting randomly to inputs, but you learn to run from red things and towards green things because when you ran towards red things you got negative reward and when you ran towards green things you got positive reward.”
This seems like how all ML works.
So using the “Gradations of Agency” framework, we might view individual ML systems as improving in power and generality within a single level of agency. But they don’t appear to be changing levels of agency. They aren’t identifying other successful ML models and imitating them.
Gradations of Agency doesn’t argue whether or not there is an asymptote of power and generality within each level. Is there a limit to the power and generality possible within level 2, where all ML seems to reside?
This seems to be the crux of the issue. If DL is approaching an asymptote of power and generality below that of AGI as model and data sizes increase, then this cuts directly against Yudkowsky’s predictions. On the other hand, if we think that DL can scale to AGI through model and data size increases alone, then that would be right in line with his predictions.
A 10 trillion parameter model now exists, and it’s been suggested that a 100 trillion parameter model, which might even be created this year, might be roughly comparable to the power of the human brain.
It’s scary to see that we’re racing full-on toward a very near-term ML project that might plausibly be AGI. However, if a 100-trillion parameter ML model is not AGI, then we’d have two strikes against Yudkowsky. If neither a small coded model nor a 100-trillion parameter trained model using 2022-era ML results in AGI, then I think we have to take a hard look at his track record on predicting what technology is likely to result in AGI. We also have his “AGI well before 2050” statement from “Beware boasting” to work with, although that’s not much help.
On the other hand, I think his assertiveness about the importance of AI safety and risk is appropriate even if he proves wrong about the technology by which AGI will be created.
I would critique the OP, however, for not being sufficiently precise in its critiques of Yudkowsky. As its “fairly clearcut examples,” it uses 20+-year-old predictions that Yudkowsky has explicitly disavowed. Then, at the end, it complains that he hasn’t “acknowledged his mixed track record.” Yet in the post it links, Yudkowsky’s quoted as saying:
6 years is not 20 years. It’s perfectly consistent to say that a youthful, 20+-years-in-the-past version of you thought wrongly about a topic, but that you’ve since come to be so much better at making predictions within your field that you’re 6 years ahead of Metaculus. We might wish he’d stated these predictions in public and specified what they were. But his failure to do so doesn’t make him wrong; it just means he lacks evidence of his superior forecasting ability. These are distinct failure modes.
Overall, I think it’s wrong to conflate “Yudkowsky was wrong 20+ years ago in his youth” with “not everyone in AI safety agrees with Yudkowsky” with “Yudkowsky hasn’t made many recent, falsifiable near-term public predictions about AI timelines.” I think this is a fair critique of the OP, which claims to be interrogating Yudkowsky’s “track record.”
But I do agree that it’s wise for a non-expert to defer to a portfolio of well-chosen experts, rather than the views of the originator of the field alone. While I don’t love the argument the OP used to get there, I do agree with the conclusion, which strikes me as just plain common sense.
Re gradations of agency: Level 3 and level 4 seem within reach IMO. IIRC there are already some examples of neural nets being trained to watch other actors in some simulated environment and then imitate them. Also, model-based planning (i.e. level 4) is very much a thing, albeit something that human programmers seem to have to hard-code. I predict that within 5 years there will be systems which are unambiguously in level 3 and level 4, even if they aren’t perfect at it (hey, we humans aren’t perfect at it either).
This sounds like straightforward transfer learning (TL) or fine tuning, common in 2017.
So you could just write 15 lines of python which shops between some set of pretrained weights and sees how they perform. Often TL is many times (1000x) faster than random weights and only needs a few examples.
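A minimal sketch of that weight-shopping idea (with a toy linear “model” and made-up weights and examples, not any particular library’s API): evaluate each candidate pretrained weight set on a handful of labeled examples and keep the best performer.

```python
# Toy sketch of "shopping" between pretrained weights: score each candidate
# on a few labeled examples and keep the one that performs best.

def accuracy(weights, examples):
    """Fraction of examples where a trivial linear 'model' matches the label."""
    correct = 0
    for x, label in examples:
        prediction = 1 if weights[0] * x + weights[1] > 0 else 0
        correct += (prediction == label)
    return correct / len(examples)

# A few labeled examples and some hypothetical pretrained weight sets.
examples = [(-2, 0), (-1, 0), (1, 1), (2, 1)]
candidates = {
    "random":       (-1.0, 0.0),
    "pretrained_a": (1.0, 0.0),
    "pretrained_b": (0.5, -1.0),
}

best = max(candidates, key=lambda name: accuracy(candidates[name], examples))
print(best)  # pretrained_a (scores 1.0 on these examples)
```

Real transfer learning would of course fine-tune the chosen weights afterward, but the selection step really is about this simple.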
As speculation: it seems like in one of the agent simulations you can just have agents grab other agents weights or layers and try them out in a strategic way (when they detect an impasse or new environment or something). There is an analogy to biology where species alternate between asexual vs sexual reproduction, and trading of genetic material occurs during periods of adversity. (This is trivial, I’m sure a second year student has written a lot more.)
This doesn’t seem to fit any sort of agent framework or improve agency though. It just makes you train faster.
Eh, there seems like a connection to interpretability.
For example, if the ML architecture “were modular+categorized or legible to the agents”, they would more quickly and effectively swap weights or models.
So there might be some way where legibility can emerge by selection pressure in an environment where say, agents had limited capacity to store weights or data, and had to constantly and extensively share weights with each other. You could imagine teams of agents surviving and proliferating by a shared architecture that let them pass this data fluently in the form of weights.
To make sure the transmission mechanism itself isn’t crazy baroque you can, like, use some sort of regularization or something.
I’m 90% sure this is a shower thought but like it can’t be worse than “The Great Reflection”.
n00b q: What’s AF?
Alignment Forum (for technical discussions about AI alignment)
It’s short for the Alignment Forum: https://www.alignmentforum.org/
Eh.
The above seems voluminous and I believe this is the written output with the goal of defending a person.
I will reluctantly engage directly, instead of just launching into another class of arguments or something or go for a walk (I’m being blocked by moral maze sort of reasons and unseasonable weather).
Yeah, no, it’s the exact opposite.
So one dude, who only has a degree in social studies, but seems to write well, wrote this:
https://docs.google.com/document/d/1hKZNRSLm7zubKZmfA7vsXvkIofprQLGUoW43CYXPRrk/edit#
I’m copying a screenshot to show the highlighting isn’t mine:
This isn’t exactly what is written or said, but drawing on experience unrelated to EA or anyone in it, I’m really sure even a median thought leader would have done a better job convincing the person who wrote this.
So they lost 4 years of support (until Superintelligence was written).
Yes, much like the OP is voluminous and is the written output with the goal of criticizing a person. You’re familiar with such writings, as you’ve written enough criticizing me. Your point?
No, it’s just as I said, and your Karnofsky retrospective strongly supports what I said. (I strongly encourage people to go and read it, not just to see what’s before and after the part He screenshots, but because it is a good retrospective which is both informative about the history here and an interesting case study of how people change their minds and what Karnofsky has learned.)
Karnofsky started off disagreeing that there is any problem at all in 2007 when he was introduced to MIRI via EA, and merely thought there were some interesting points. Interesting, but certainly not worth sending any money to MIRI or looking for better alternative ways to invest in AI safety. These ideas kept developing, and Karnofsky kept having to engage, steadily moving from ‘there is no problem’ to intermediate points like ‘but we can make tool AIs and not agent AIs’ (a period in his evolution I remember well because I wrote criticisms of it), which he eventually abandons. You forgot to screenshot the part where Karnofsky writes that he assumed ‘the experts’ had lots of great arguments against AI risk and the Yudkowsky paradigm and that was why they just didn’t bother talking about it, and then moved to SF and discovered ‘oh no’, that not only did those not exist, the experts hadn’t even begun to think about it. Karnofsky also agrees with many of the points I make about Bostrom’s book & intellectual pedigree (“When I’d skimmed Superintelligence (prior to its release), I’d felt that its message was very similar to—though more clearly and carefully stated than—the arguments MIRI had been making without much success.” just below where you cut off). And so here we are today, where Karnofsky has not just overseen donations of millions of dollars to MIRI and AI safety NGOs or the recruitment of MIRI staffers like ex-MIRI CEO Muehlhauser, but it remains a major area for OpenPhil (and philanthropies imitating it like FTX). It all leads back to Eliezer. As Karnofsky concludes:
That is, Karnofsky explicitly attributes the widespread changes I am describing to the causal impact of the AI risk community around MIRI & Yudkowsky. He doesn’t say it happened regardless or despite them, or that it was already fairly common and unoriginal, or that it was reinvented elsewhere, or that Yudkowsky delayed it on net.
Hard to be convincing when you don’t exist.
I also agree that Karnofsky’s retrospective supports Gwern’s analysis, rather than doing the opposite.
(I just disagree about how strongly it counts in favor of deference to Yudkowsky. For example, I don’t think this case implies we should currently defer more to Yudkowsky’s risk estimates than we do to Karnofsky’s.)
Ugh. Y’all just made me get into “EA rhetoric” mode:
What?
No. Not only is this not true but this is indulging in a trivial rhetorical maneuver.
My comment said that the counterfactual would be better without the involvement of the person mentioned in the OP. I used the retrospective as evidence.
The retrospective includes at least two points for why the author changed their mind:
The book Superintelligence, which they explicitly said was the biggest event
The author moved to SF and learned about DL, and was informed by speaking to non-rationalist AI researchers, and then decided that LessWrong and MIRI were right.
In response to this, Gwern restates point #2 and asserts that it is causal evidence in favor of the person mentioned in the OP being useful.
Why? How?
Notice that #2 above doesn’t at all rule out that the founders or culture were repellent. In fact, it seems like a lavish and unlikely amount of involvement.
I interpreted Gwern as mostly highlighting that people have updated toward Yudkowsky’s views—and using this as evidence in favor of the view we should defer a decent amount to Yudkowsky. I think that was a reasonable move.
There is also a causal question here (‘Has Yudkowsky on-net increased levels of concern about AI risk relative to where they would otherwise be?’), but I didn’t take the causal question to be central to the point Gwern was making. Although now I’m less sure.
I don’t personally have strong views on the causal question—I haven’t thought through the counterfactual.
By the way, I didn’t screenshot the pieces that fit my narrative—Gwern’s assertion of bad faith is another device being used.
Gwern also digs up a previous argument. Not only is that issue entirely unrelated, it’s sort of exactly the opposite of the evidence he wants to show: Gwern appeared to borderline dox, or threaten to dox, someone who spoke out against him.
I commented. However, I did not know anyone involved, such as who Gwern was; I was only acting on the content and behaviour I saw, which was outright abusive.
There is no expected benefit to doing this. It’s literally the most principled thing to act in this way and I would do it again.
The consequences of that incident, the fact that this person with this behavior and content had this much status, was a large update for me.
More subtly and perniciously, Gwern’s adverse behavior in this comment chain and the incident mentioned above, is calibrated to the level of “EA rhetoric”. Digs like his above can sail through, with the tailwind of support of a subset of this community, a subset that values authority over content and Truth, to a degree much more than it understands.
On the other hand, in contrast, an outsider, who already has to dance through all the rhetorical devices and elliptical references, has to make a high effort, unemotional comment to try to make a point. Even or especially if they manage to do this, they can expect to be hit with a wall of text with various hostilities.
Like, this is awful. This isn’t just bad but it’s borderline abusive.
It’s wild that this is the level of discourse here.
Because of the amount of reputation, money and ingroupness, this is probably one of the most extreme forms of tribalism that exists.
Do you know how much has been lost?
Charles, consider going for that walk now if you’re able to. (Maybe I’m missing it, but the rhetorical moves in this thread seem equally bad, and not very bad at that.)
You are right, I don’t think my comments are helping.
Like, how can so many standard, stale patterns of internet forum authority, devices and rhetoric be rewarded and replicate in a community explicitly addressing topics like tribalism and “evaporative cooling”?
The moderators feel that some comments in this thread break Forum norms and are discussing what to do about it.
Here are some things we think break Forum norms:
Rude/hostile language and condescension, especially from Charles He
Gwern brings in an external dispute — a thread in which Charles accuses them of doxing an anonymous critic on LessWrong. We think that bringing in external disputes interferes with good discourse; it moves the thread away from discussion of the topic in question, and more towards discussions of individual users’ characters
The conversation about the external dispute gets increasingly unproductive
The mentioned thread about doxing also breaks Forum norms in multiple ways. We’ve listed them on that thread.
The moderators are still considering a further response. We’ll also be discussing with both Gwern and Charles privately.
The moderation team is issuing Charles a 3-month ban.
I honestly don’t see such a problem with Gwern calling out out Charles’ flimsy argument and hypocrisy using an example, be it a part of an external dispute.
On the other hand, I think Charles’ uniformly low comment quality should have had him (temporarily) banned long ago (sorry Charles). The material is generally poorly organised, poorly researched, often intentionally provocative, sometimes interspersed with irrelevant images, and high in volume. One gets the impression of an author who holds their reader in contempt.
I don’t necessarily disagree with the assessment of a temporary ban for “unnecessary rudeness or offensiveness”, or “other behaviour that interferes with good discourse”, but I disagree that Charles’ comment quality is “uniformly” low or that a ban might be merited primarily because of high comment volume and too low quality. There are some real insights and contributions sprinkled in, in my opinion.
For me the unnecessary rudeness or offensiveness and other behavior interfering with discourse comes from things like comments that are technically replies to a particular person but seem like they’re mostly intended to win the argument in front of unknown readers, and containing things like rudeness, paranoia, and condescension towards the person they’re replying to. I think the doxing accusation, which if I remember correctly actually doxxed the victim much more than Gwern’s comment, is part of a similar pattern of engaging poorly with a particular person, partly through an incorrect assessment that the benefits to bystanders will outweigh the costs. I think this sort of behavior stifles conversation and good will.
I’m not sure a ban is a great solution though. There might be other, less blunt ways of tackling this situation.
What I would really like to see is a (much) higher lower limit of comment quality from Charles i.e. moving the bar for tolerating rudeness and bad behavior in a comment much higher even though it could be potentially justified in terms of benefits to bystanders or readers.
This is useful and thoughtful. I will read and will try to update on this (in general life, if not the forum?) Please continue as you wish!
I want to notify you and others, that I don’t expect such discussion to materially affect any resulting moderator action, see this comment describing my views on my ban.
Below that comment, I wrote some general thoughts on EA. It would be great if people considered or debated the ideas there.
Comments on Global Health
Comments on Animal Welfare
Comments on AI Safety
Comments on two Meta EA ideas
I don’t disagree with your judgement on banning, but I’d point out there’s no banning for quality—you must be very frustrated with the content.
To get a sense of this, for the specific issue in the dispute, where I suggested the person or institution in question caused a 4-year delay in funding, are you saying it’s an objectively bad read, even limited to just the actual document cited? I don’t see how that is.
Or is it wrong in a way that requires additional context or knowledge to see?
Re the banning idea, I think you could fall afoul of “unnecessary rudeness or offensiveness”, or “other behaviour that interferes with good discourse” (too much volume, too low quality). But I’m not the moderator here.
My point is that when you say that Gwern produces verbose content about a person, it seems fine—indeed quite appropriate—for him to point out that you do too. So it seems a bit rich for that to be a point of concern for moderators.
I’m not taking any stance on the doxxing dispute itself, funding delays, and so on.
I agree with your first paragraph for sure.
A general reflection: I wonder if one at least minor contributing factor to disagreement, around whether this post is worthwhile, is different understandings about who the relevant audience is.
I mostly have in mind people who have read and engaged a little bit with AI risk debates, but not yet in a very deep way, and would overall be disinclined to form strong independent views on the basis of (e.g.) simply reading Yudkowsky’s and Christiano’s most recent posts. I think the info I’ve included in this post could be pretty relevant to these people, since in practice they’re often going to rely a lot—consciously or unconsciously; directly or indirectly—on cues about how much weight to give different prominent figures’ views. I also think that the majority of members of the existential risk community are in this reference class.
I think the info in this post isn’t nearly as relevant to people who’ve consumed and reflected on the relevant debates very deeply. The more you’ve engaged with and reflected on an issue, the less you should be inclined to defer—and therefore the less relevant track records become.
(The limited target audience might be something I don’t do a good enough job communicating in the post.)
I think that insofar as people are deferring on matters of AGI risk etc., Yudkowsky is in the top 10 people in the world to defer to based on his track record, and arguably top 1. Nobody who has been talking about these topics for 20+ years has a similarly good track record. If you restrict attention to the last 10 years, then Bostrom does and Carl Shulman and maybe some other people too (Gwern?), and if you restrict attention to the last 5 years then arguably about a dozen people have a somewhat better track record than him.
(To my knowledge. I think I’m probably missing a handful of people who I don’t know as much about because their writings aren’t as prominent in the stuff I’ve read, sorry!)
He’s like Szilard. Szilard wasn’t right about everything (e.g. he predicted there would be a war and the Nazis would win) but he was right about a bunch of things including that there would be a bomb, that this put all of humanity in danger, etc. and importantly he was the first to do so by several years.
I think if I were to write a post cautioning people against deferring to Yudkowsky, I wouldn’t talk about his excellent track record but rather about his arrogance, inability to clearly explain his views and argue for them (at least on some important topics, he’s clear on others), seeming bias towards pessimism, ridiculously high (and therefore seemingly overconfident) credences in things like p(doom), etc. These are the reasons I would reach for (and do reach for) when arguing against deferring to Yudkowsky.
[ETA: I wish to reemphasize, but more strongly, that Yudkowsky seems pretty overconfident not just now but historically. Anyone deferring to him should keep this in mind; maybe directly update towards his credences but don’t adopt his credences. E.g. think “we’re probably doomed” but not “99% chance of doom” Also, Yudkowsky doesn’t seem to be listening to others and understanding their positions well. So his criticisms of other views should be listened to but not deferred to, IMO.]
“Nobody who has been talking about these topics for 20+ years has a similarly good track record.”
Really? We know EY made a bunch of mispredictions: “A certain teenaged futurist, who, for example, said in 1999, ‘The most realistic estimate for a seed AI transcendence is 2020; nanowar, before 2015.’” What are his good predictions? I can’t see a single example in this thread.
Ironically, one of the two predictions you quote as example of bad prediction, is in fact an example of a good prediction: “The most realistic estimate for a seed AI transcendence is 2020.”
Currently it seems that AGI/superintelligence/singularity/etc. will happen sometime in the 2020s. Yudkowsky’s median estimate in 1999 was 2020 apparently, so he probably had something like 30% of his probability mass in the 2020s, and maybe 15% of it in the 2025-2030 period when IMO it’s most likely to happen.
Now let’s compare to what other people would have been saying at the time. They would almost all have been saying 0%, and then maybe the smarter and more rational ones would have been saying things like 1%, for the 2025-2030 period.
To put it in nonquantitative terms, almost everyone else in 1999 would have been saying “AGI? Singularity? That’s not a thing, don’t be ridiculous.” The smarter and more rational ones would have been saying “OK it might happen eventually but it’s nowhere in sight, it’s silly to start thinking about it now.” Yudkowsky said “It’s about 21 years away, give or take; we should start thinking about it now.” Now with the benefit of 24 years of hindsight, Yudkowsky was a lot closer to the truth than all those other people.
Also, you didn’t reply to my claim. Who else has been talking about AGI etc. for 20+ years and has a similarly good track record? Which of them managed to only make correct predictions when they were teenagers? Certainly not Kurzweil.
Fwiw I’d say this somewhat differently.
I object to a specific way in which one could use coherence arguments to support AI risk: namely, “AI is intelligent --> AI satisfies coherence arguments better than we do --> AI looks as though it is maximizing a utility function from our perspective --> Convergent instrumental subgoals --> Doom”.
As far as I know, anyone who has spent ~an hour reading my post and thinking about it basically agrees with that particular narrow point.
This doesn’t rule out other ways that one could use coherence arguments to support AI risk, such as “coherence arguments show that achieving stuff can typically be factored into beliefs about the world and goals that you want to achieve; since we’ll be building AIs to achieve stuff, it seems likely they’ll work by having separated beliefs and goals; if they have bad goals, then we die because of convergent instrumental subgoals”. I’m more sympathetic to this argument (though not nearly as much as Eliezer appears to be).
I agree that the intro talk that you link to would likely cause people to think of the first pathway (which I object to) rather than the second pathway. Similar rhetoric caused me to believe the first pathway for a while.
But it also looks like the sort of talk you might give if you were thinking about the second pathway, and then compressed it losing a bunch of nuance, and didn’t notice that people might then instead think of the first pathway.
(It’s not clear whether any of this changes the upshot of your post. I am mostly trying to preserve nuance so I get fewer people saying “I thought you thought utility functions are fake” which is definitely not what I said or believed.)
Thank you for writing this, Ben. I think the examples are helpful and I plan to read more about several of them.
With that in mind, I’m confused about how to interpret your post and how much to update on Eliezer. Specifically, I find it pretty hard to assess how much I should update (if at all) given the “cherry-picking” methodology:
If you were to apply this to any EA thought leader (or non-EA thought leader, for that matter), I strongly suspect you’d find a lot of clearcut and disputable examples of them being wrong on important things.
As a toy analogy, imagine that Alice is widely-considered to be extremely moral. I hire an investigator to find as many examples of Alice doing Bad Things as possible. I then publish my list of Bad Things that Alice has done. And I tell people “look—Alice has done some Bad Things. You all think of her as a really moral person, and you defer to her a lot, but actually, she has done Bad Things!”
And I guess I’m left with a feeling of… OK, but I didn’t expect Alice to have never done Bad Things! In fact, maybe I expected Alice to do worse things than the things that were on this list, so I should actually update toward Alice being moral and defer to Alice more.
To make an informed update, I’d want to understand your balanced take. Or I’d want to know some of the following:
How much effort did the investigator spend looking for examples of Bad Things?
Given my current impression of Alice, how many Bad Things (weighted by badness) would I have expected the investigator to find?
How many Good Things did Alice do (weighted by goodness)?
Final comment: I think this comment might come across as ungrateful—just want to point out that I appreciate this post, find it useful, and will be more likely to challenge/question my deference as a result of it.
I think the effect should depend on your existing view. If you’ve always engaged directly with Yudkowsky’s arguments and chose the ones that convinced you, there’s nothing to learn. If you thought he was a unique genius and always assumed you weren’t convinced of things because he understood things you didn’t know about, and believed him anyway, maybe it’s time to dial it back. If you’d always assumed he’s wrong about literally everything, it should be telling for you that OP had to go 15 years back for good examples.
Writing this comment actually helped me understand how to respond to the OP myself.
‘If you’d always assumed he’s wrong about literally everything, it should be telling for you that OP had to go 15 years back to get good examples.’ How strong evidence this is also depends on whether he has made many resolvable predictions since 15-years ago, right? If he hasn’t it’s not very telling. To be clear, I genuinely don’t know if he has or hasn’t.
Sounds reasonable. Though predictions aren’t the only thing one can be demonstrably wrong about.
The negative reactions to this post are disheartening. I have a degree of affectionate fondness for the parodic levels of overthinking that characterize the EA community, but here you really see the downsides of that overthinking concretely.
Of course it is meaningful that Eliezer Yudkowsky has made a bunch of terrible predictions in the past that closely echo predictions he continues to make in slightly different form today. Of course it is relevant that he has neither owned up to those earlier terrible predictions nor explained how he has learned from those mistakes. Of course we should be more skeptical of similar claims he makes in the future. Of course we should pay more attention to broader consensus or aggregate predictions in the field than to outlier predictions.
This is sensible advice in any complex domain, and saying that we should “evaluate every argument in isolation on its merits” is a type of special pleading or sophistry. Sometimes (often!) the obvious conclusions are the correct ones: even extraordinarily clever people are often wrong; extreme claims that other knowledgeable experts disagree with are often wrong; and people who make extreme claims that prove to be wrong should be strongly discounted when they make further extreme claims.
None of this is to suggest in any way that Yudkowsky should be ignored, or even is necessarily wrong. But if you yourself are not an expert in AI (as most of us aren’t), his past bad predictions are highly relevant indicators when assessing his current predictions.
I assume you’re mainly talking about young-Eliezer worrying about near-term risk from molecular nanotechnology, and current-Eliezer worrying about near-term risk from AGI?
I think age-17 Eliezer was correct to think widespread access to nanotech would be extremely dangerous. See my comment. If you or Ben disagree, why do you disagree?
Age-20 Eliezer was obviously wrong about the timing for nanotech, and this is obviously Bayesian evidence for ‘Eliezer may have overly-aggressive tech timelines in general’.
I don’t think this is generally true—e.g., if you took a survey of EAs worried about AI risk in 2010 or in 2014, I suspect Eliezer would have longer AI timelines than others at the time. (E.g., he expected it to take longer to solve Go than Carl Shulman did.) When I joined MIRI, the standard way we summarized MIRI’s view was roughly ‘We think AI risk is high, but not because we think AGI is imminent; rather, our worry is that alignment is likely to take a long time, and that civilization may need to lay groundwork decades in advance in order to have a realistic chance of building aligned AGI.’
But nanotech is a totally fair data point regardless.
Eliezer wrote a 20,000-word essay series on his update, and the mistakes he thought he was making. Essay titles include “My Childhood Death Spiral”, “The Sheer Folly of Callow Youth”, “Fighting a Rearguard Action Against the Truth”, and “The Magnitude of His Own Folly”.
He also talks a lot about how he’s updated and revised his heuristics and world-models in other parts of the Sequences. (E.g., he writes that he underestimated elite competence when he was younger.)
What specific cognitive error do you want him to write about, that he hasn’t already written on?
I don’t think the argument I’m making (or most others are making) is ‘don’t update on people’s past mistakes’ or ‘never do deference’. Rather, a lot of the people discussing this matter within EA (Wei Dai, Gwern Branwen, Richard Ngo, Rohin Shah, Carl Shulman, Nate Soares, Ajeya Cotra, etc.) are the world’s leading experts in this area, and a lot of the world’s frontier progress on this topic is happening on Internet fora like the EA Forum and LessWrong. It makes sense for domain specialists to put much more focus into evaluating arguments on the merits; object-level conversations like these are how the intellectual advances occur that can then be reflected in aggregators like Metaculus.
Metaculus and prediction markets will be less accurate if frontier researchers replace object-level discussion with debates about who to defer to, in the same way that stock markets would be less efficient if everyone overestimated the market’s efficiency and put minimal effort into beating the market.
Insofar as we’re trying to grow the field, it also makes sense to encourage more EAs to try to think about these topics and build their own inside-view models; and this has the added benefit of reducing the risk of deference cascades.
(I also think there are other reasons it would be healthy for EA to spend a lot more time on inside-view building on topics like AI, normative ethics, and global poverty, as I briefly said here. But it’s possible to practice model-building and then decide at the end of the day, nonetheless, that you don’t put much weight on the domain-specific inside views you’ve built.)
When people use words like “extreme” here, I often get the sense that they aren’t crisply separating “extreme” in the sense of “weird-sounding” from “extreme” in the sense of “low prior probability”. I think Eliezer’s views are weird-sounding, not unlikely on priors.
E.g., why should we expect generally intelligent machines to be low-impact if built, or to never be built?
The idea that a post-AGI world looks mostly the same as a pre-AGI world might sound more normal and unsurprising to an early-21st-century well-off Anglophone intellectual, but I think this is just an error. It’s a clear case of the availability heuristic misfiring, not a prior anyone should endorse upon reflection.
I view the Most Important Century series as an attempt to push back against many versions of this conflation.
Epistemically, I view Paul’s model as much more “extreme” than Eliezer’s because I think it’s much more conjunctive. I obviously share the view that soft takeoff sounds more normal in some respects, but I don’t think this should inform our prior much. I’d guess we should start with a prior that assigns lots of weight to soft takeoff as well as to hard takeoff, and then mostly arrive at a conclusion based on the specific arguments for each view.
Several thoughts:
I’m not sure I can argue for this, but it feels weird and off-putting to me that all this energy is being spent discussing how good a track record one guy has, especially one guy with a very charismatic and assertive writing style, and a history of attempting to provide very general guidance for how to think across all topics (though I guess any philosophical theory of rationality does the last thing). It just feels like a bad sign to me, though that could just be for dubious social reasons.
The question of how much to defer to E.Y. isn’t answered just by things like “he has possibly the best track record in the world on this issue.” If he’s out of step with other experts, and by a long way, we need a reason to think he outperforms the aggregate of experts before we weight him more than the aggregate. And it’s entirely normal, I’d have thought, for the aggregate to significantly outperform the single best individual. (I’m not making as strong a claim as that the best individual outperforming the aggregate is super-unusual and unlikely.) Of course, if you think he’s nearly as good as the aggregate, then you should still move a decent amount in his direction. But even that is quite a strong claim that goes beyond him being in the handful of individuals with the best track record.
It strikes me that some of the people criticizing this post, on the grounds that E.Y. actually has a great track record, keep citing “he’s been right that there is significant X-risk from A.I., when almost everyone else missed that.” I’m skeptical of this move, for a couple of reasons.
Firstly, this isn’t actually a prediction that has been resolved as correct in any kind of unambiguous way. Sure, a lot of very smart people in the EA community now agree. (And I agree the risk is worth assigning EA resources to as well, to be clear.) But we should be wary of substituting the community’s judgment that a prediction looks rational for a track record of predictions that have actually resolved successfully. (I think the latter is better evidence than the former in most cases.)
Secondly, I feel like E.Y. being right about the importance of A.I. risk is actually not very surprising, conditional on the key assumption about E.Y. that Ben is relying on when he tells people to be cautious about E.Y.’s probabilities and timelines for A.I. doom; and even granting that E.Y. was right, if Ben’s assumption is correct it’s still a good reason to doubt E.Y.’s p(doom). Suppose, as is being alleged here, someone has a general bias, for whatever reason, towards the view that doom from some technological source or other is likely and imminent. Does that make it especially surprising that that individual finds an important source of doom most people have missed? Not especially, as far as I can see: sure, they will perhaps be less rational on the topic, but (a) a bias towards p(doom) being high doesn’t necessarily imply being poor at ranking sources of doom-risk by relative importance, and (b) there is probably a counter-effect where a bias towards doom makes you more likely to find underrated doom-risks, because you spend more time looking. Of course, finding a doom-risk larger than most others that approximately everyone had missed would still be a very impressive achievement. But the question Ben is addressing isn’t “is E.Y. a smart person with insights about A.I. risk?” but rather “how much should we update on E.Y.’s views about p(near-term A.I. doom)?” Suppose significant bias towards doom is genuinely evidenced by E.Y.’s earlier nanotech prediction (which, to be fair, is only one data point), and a good record at identifying neglected, important doom sources is only weak evidence that E.Y. lacks the bias. Then we’d be right to update only a little towards doom, even if E.Y.’s record on A.I. risk was impressive in some ways.
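The shape of this argument can be put as a toy Bayes calculation. All the numbers below are invented purely for illustration (they are not estimates anyone in this thread has endorsed): the point is just that strong evidence for the bias hypothesis, combined with only weak counter-evidence, leaves the posterior on “biased” fairly high.

```python
# Toy illustration of the update described above. B = "E.Y. has a general
# bias towards imminent-doom predictions". All likelihoods are made up.

def posterior(prior, likelihood_if_b, likelihood_if_not_b):
    """Single-step Bayes update: returns P(B | E)."""
    num = prior * likelihood_if_b
    return num / (num + (1 - prior) * likelihood_if_not_b)

p_bias = 0.5  # neutral prior on the bias hypothesis

# E1: the early, badly-wrong nanotech timeline (more likely under bias).
p_bias = posterior(p_bias, 0.8, 0.3)

# E2: spotting a neglected doom source early (only weak evidence against
# bias, since doom-biased people also search harder for doom sources).
p_bias = posterior(p_bias, 0.45, 0.55)

print(round(p_bias, 2))  # → 0.69
```

With these (invented) numbers, the impressive-looking counter-evidence barely dents the posterior: a reader who accepted the bias hypothesis on the strength of E1 should still substantially discount the headline p(doom).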
Some things that aren’t said in this post or any comments in here yet:
The issue isn’t at all about 15-20 year old content, it’s about very recent content and events (mostly publicly visible)
In addition to this recent, publicly visible content, there are several latent issues or effects that directly affect progress in the relevant cause area
To calibrate, this could be slowing things down by 10 times or more, in what is supposed to be the most important cause area in EA and whose effects are supposed to happen very soon
Certain comments here do not at all contain all of the relevant content, because laying them out risks damaging an entire cause area.
Certain commenters may feel personally restricted from doing so for a variety of complex reasons (“moral mazes”), and the content they are presenting is a “second best” option
The above interacts poorly with the customs and practices around discourse and criticism
These in totality have become a sort of odious, out-of-place specter, invisible to people who spend a lot of time here
For all I know, you may be right or not (insofar as I follow what’s being insinuated), but whilst I freely admit that I, like anyone who wants to work in EA, have self-interested incentives to not be too critical of Eliezer, there is no specific secret “latent issue” that I personally am aware of and consciously avoiding talking about. Honest.
I am grateful for your considerate comment and your reply. I had no belief or thought about dishonesty.
Maybe I should have added[1]:
“this is for onlookers”
“this is trying to rationalize/explain why this post exists, that has 234 karma and 156 votes, yet only talks about high school stuff.”
I posted my comment because this situation is hurting onlookers and producing bycatch?
I don’t really know what to do here (as a communications thing) and I have incentives not to be involved?
But this is sort of getting into the elliptical rhetoric and self-referential stuff, that is sort of related to the problem in the first place.
Some off-topic comments, not specific to you or Yudkowsky:
It seems to me (but I could be mistaken) like I see the phrase “has thought a lot about X” fairly often in EA contexts, where it is taken to imply being very well-informed about X. I don’t think this is good reasoning. Thinking about something is probably required for understanding it well, but is certainly not enough.
When an idea or theory is very fringe, there’s a strong selection effect for people in the relevant intellectual community. This means even their average views are sometimes not good evidence for something. For example, to answer a question about the probability of doom from AI in this century, are alignment researchers a good reference class? They all naturally believe AI is an existential risk to begin with. I’m not sure I have the solution, since “AI researchers in general” isn’t a good reference class either—many might have not given any thought to whether AI is dangerous.
Strong +1 on this. In fact, it seems like the more someone thinks about something and takes a public position on it with strong confidence, the more incentive they have to stick to that position. That’s why making explicit forecasts and building a forecasting track record is so important in countering this tendency. If arguments cannot be resolved by events happening in the real world, then there is not much incentive to change one’s mind, especially about something speculative and abstract that one can generate arguments for ad infinitum by engaging in more speculation.
On your example: the question of AI existential risk this century seems downstream of the question of the probability of AGI this century, and one can find some potential reference classes for that: AI safety research, general AI research, computer science research, scientific research, technological innovation, etc. None of these are perfect reference classes, but they are at least something to work with. Conditional on AGI being possible this century, one can form an opinion on how high the probability of doom would have to be to warrant concern.
I like that you admit that your examples are cherry-picked. But I’m actually curious what a non-cherry-picked track record would show. Can people point to Yudkowsky’s successes? What did he predict better than other people? What project did MIRI generate that either solved clearly interesting technical problems or got significant publicity in academic/AI circles outside of rationalism/EA? Maybe instead of a comment here this should be a short-form question on the forum.
While he’s not single-handedly responsible, he led the movement to take AI risk seriously at a time when approximately no one was talking about it, which has now attracted the interest of top academics. This isn’t a complete track record, but it’s still a very important data point. It’s a bit like if he were the first person to say that we should take nuclear war seriously, and then five years later people are starting to build nuclear bombs and academics realize that nuclear war is very plausible.
I definitely do agree with that!
It’s possible I should have emphasized the significance of it more in the post, rather than moving on after just a quick mention at the top.
If it’s of interest: I say a little more about how I think about this, in response to Gwern’s comment below. (To avoid thread-duplicating, people might want to respond there rather than here if they have follow-on thoughts on this point.) My further comment is:
I work at MIRI, but as usual, this comment is me speaking for myself, and I haven’t heard from Eliezer or anyone else on whether they’d agree with the following.
My general thoughts:
The primary things I like about this post are that (1) it focuses on specific points of disagreement, encouraging us to then hash out a bunch of object-level questions; and (2) it might help wake some people from their dream if they hero-worship Eliezer, or if they generally think that leaders in this space can do no wrong.
By “hero-worshipping” I mean a cognitive algorithm, not a set of empirical conclusions. I’m generally opposed to faux egalitarianism and the Modest-Epistemology reasoning discussed in Inadequate Equilibria: if your generalized anti-hero-worship defenses force the conclusion that there just aren’t big gaps in skills or knowledge (or that skills and knowledge always correspond to mainstream prestige and authority), then your defenses are ruling out reality a priori. In saying “people need to hero-worship Eliezer less”, I’m opposing a certain kind of reasoning process and mindset, not a specific factual belief like “Eliezer is the clearest thinker about AI risk”.
In a sense, I want to promote the idea that the latter is a boring claim, to be evaluated like any other claim about the world; flinching away from it (e.g., because Eliezer is weird and says sci-fi-sounding stuff) and flinching toward it (e.g., because you have a bunch of your identity invested in the idea that the Sequences are awesome and rationalists are great) are both errors of process.
The main thing I dislike about this post is that it introduces a bunch of not-obviously-false Eliezer-claims — claims that EAs either widely disagree about, or haven’t discussed — and then dives straight into ‘therefore Eliezer has a bad track record’.
E.g., I disagree that molecular nanotech isn’t a big deal (if that’s a claim you’re making?), that Robin better predicted deep learning than Eliezer did, and that your counter-arguments against Eliezer and Bostrom are generally strong. Certainly I don’t think these points have been well-established enough that it makes sense to cite them in the mode ‘look at these self-evident ways Yudkowsky got stuff wrong; let us proceed straight to psychoanalysis, without dwelling on the case for why I think he’s wrong about this stuff’. At this stage of the debate on those topics, it would be more appropriate to talk in terms of cruxes like ‘I think the history of tech shows it’s ~always continuous in technological change and impact’, so it’s clear why you disagree with Eliezer in the first place.
I generally think that EA’s core bottlenecks right now are related to ‘willingness to be candid and weird enough to make intellectual progress (especially on AI alignment), and to quickly converge on our models of the world’.
My own models suggest to me that EA’s path to impact is almost entirely as a research community and a community that helps produce other research communities, rather than via ‘changing the culture of the world at large’ or going into politics or what-have-you. In that respect, rigor and skepticism is good, but singling out Eliezer because he’s unusually weird and candid is bad, because it discourages others from expressing weird/novel/minority views and from blurting out their true thought processes. (I recognize that this isn’t the only reason you’re singling Eliezer out, but it’s obviously a contributing factor.)
I am a big fan of Ben’s follow-up comment. Especially the part where he outlines the thought process that led to him generating the post’s contents. I think this is an absolutely wonderful thing to include in a variety of posts, or to add in the comment sections for a lot of posts.
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
Some specific thoughts on Ben’s follow-up comment:
1. I agree with Ben on this: “If a lot of people in the community believe AI is probably going to kill everyone soon, then (if they’re wrong) this can have really important negative effects”.
I think they’re not wrong, and I think the benefits of discussing this openly strongly outweigh the costs. But the negative effects are no less real for that.
(Separately, I think the “death with dignity” post was a suboptimal way to introduce various people to the view that p(doom) is very high. I’m much more confident that we should discuss this at all, than that Eliezer or I or others have been discussing this optimally.)
2. “Directly and indirectly, deference to Yudkowsky has a significant influence on a lot of people’s views”
Agreed.
Roughly speaking, my own view is:
EAs currently do a very high amount of deferring to others (both within EA and outside of EA) on topics like AI, global development, moral philosophy, economics, cause prioritization, organizational norms, personal career development, etc.
On the whole, EAs currently do a low amount of model-building and developing their own inside views.
EAs should switch to doing a medium amount of deference on topics like the ones I listed, and a very high amount of personal model-building.
Note that model-building can be useful even if you think all your conclusions will be strictly worse than the models of some other person you’ve identified. I’m pretty radical on this topic, and think that nearly all EAs should spend a nontrivial fraction of their time developing their own inside-view models of EA-relevant stuff, in spite of the obvious reasons (like gains from specialization) that this would normally not make sense.
Happy to say more about my views here, and I’ll probably write a post explaining why I think this.
I think the Alignment Research Field Guide, in spite of nominally being about “alignment”, is the best current intro resource for “how should I go about developing my own models on EA stuff?” A lot of the core advice is important and generalizes extremely well, IMO.
Insofar as EAs should do deference at all, Eliezer is in the top tier of people it makes sense to defer to.
But I’d guess the current amount of Eliezer-deference is way too high, because the current amount of deference overall is way too high. Eliezer should get a relatively high fraction of the deference pie IMO, but the overall pie should shrink a lot.
3. I also agree with Ben on “The track records of influential intellectuals (including Yudkowsky) should be publicly discussed.”
I don’t like the execution of the OP, but I strongly disagree with the people in the comments who have said “let us never publicly talk about individuals’ epistemic track records at all”—both because I think ‘how good is EY’s reasoning’ is a genuine crux for lots of people, and because I think this is a very common topic people think about, both in more pro-Eliezer and in more anti-Eliezer camps.
Discussing cruxes is obviously good, but even if this weren’t a crux for anyone, I’m strongly in favor of EAs doing a lot more “sharing their actual thoughts out loud”, including the more awkward and potentially inflammatory ones. (I’m happy to say more about why I think this.)
I do think it’s worth talking about what the best way is to discuss individuals’ epistemic track records, without making EA feel hostile/unpleasant/scary. I think EAs are currently way too timid (on average) about sharing their thoughts, so I worry about any big norm shifts that might make that problem even worse.
But Eliezer’s views are influential enough (and cover a topic, AGI, that is complicated and difficult enough to reason about) that this just seems like an important topic to me (similar to ‘how much should we defer to Paul?’, etc.). I’d rather see crappy discussion of this in the community than zero discussion whatsoever.
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
Some specific thoughts on claims in the OP:
This is in large part Eliezer’s fault for picking such a bad post title, but I should still note that this is a very misleading summary. “Dying with dignity” often refers to giving up on taking any actions to keep yourself alive.
Eliezer’s version of “dying with dignity” is exactly the opposite: he’s advocating for doing whatever it takes to maximize the probability that humanity survives.
It’s true that he thinks we’ll probably fail (and I agree), and he thinks we should emotionally reconcile ourselves with that fact (because he thinks this emotional reconciliation will itself increase our probability of surviving!!), but he doesn’t advocate giving up.
Quoting the post:
“Q1: Does ‘dying with dignity’ in this context mean accepting the certainty of your death, and not childishly regretting that or trying to fight a hopeless battle?
“Don’t be ridiculous. How would that increase the log odds of Earth’s survival?”
I think the “no later than 2010” prediction is from when Eliezer was 20, but the bulk of the linked essay was written when he was 17. The quotation here is: “As of ’95, Drexler was giving the ballpark figure of 2015. I suspect the timetable has been accelerated a bit since then. My own guess would be no later than 2010.”
The argument for worrying about extinction via molecular nanotech to some non-small degree seems pretty straightforward and correct: molecular nanotech lets you build arbitrary structures, including dangerous ones, and some humans would want to destroy the world given the power to do so.
Eliezer was overconfident about nanotech timelines (though roughly to the same degree as Drexler, the world’s main authority on nanotech).
Eliezer may have also been overconfident about nanotech’s riskiness, but the specific thing he said when he was 17 is that he considered it important for humanity to achieve AGI “before nanotechnology, given the virtual certainty of deliberate misuse—misuse of a purely material (and thus, amoral) ultratechnology, one powerful enough to destroy the planet”.
It’s not clear to me whether this is saying that human-extinction-scale misuse from nanotech is ‘virtually certain’, versus the more moderate claim that some misuse is ‘virtually certain’ if nanotech sees wide usage (and any misuse is pretty terrifying in EV terms). The latter seems reasonable to me, given how powerful molecular nanotechnology would be.
Eliezer denies that he has a general tendency toward alarmism:
It seems fair to note that nanotech is a second example of Eliezer raising alarm bells. But this remains a pretty small number of data points, and in neither of those cases does it actually look unreasonable to worry a fair bit—those are genuinely some of the main ways we could destroy ourselves.
I think ‘Eliezer predicted nanotech way too early’ is a better data point here, as evidence for ‘maybe Eliezer tends to have overly aggressive tech forecasts’.
If Eliezer was deferring to Drexler to some extent, that makes the data a bit less relevant, but ‘I was deferring to someone else who was also wrong’ is not in fact a general-purpose excuse for getting the wrong answer.
That view seems very dumb to me — specifically the belief that SingInst’s very first unvetted idea would pan out and result in them building AGI, more so than the timelines per se.
I don’t fault 21-year-old Eliezer for trying (except insofar as he was totally wrong about the probability of Unfriendly AI at the time!), because the best way to learn that a weird new path is unviable is often to just take a stab at it. But insofar as 2001-Eliezer thought his very first idea was very likely to work, this seems like a totally fair criticism of the quality of his reasoning at the time.
Looking at the source text, I notice that the actual text is much more hedged than Ben’s summary (though it still sounds foreseeably overconfident to me, to the extent I can glean likely implicit probabilities from tone):
Note that this paper was written much earlier than its publication date. Description from yudkowsky.net: “Book chapter I wrote in 2002 for an edited volume, Artificial General Intelligence, which is now supposed to come out in late 2006. I no longer consider LOGI’s theory useful for building de novo AI. However, it still stands as a decent hypothesis about the evolutionary psychology of human general intelligence.”
I agree that Eliezer loses Bayes points (e.g., relative to Shane Legg and Dario Amodei) for not predicting the enormous success of deep learning. See also Nate’s recent post about this.
I disagree that Robin Hanson scored Bayes points off of Eliezer, on net, from the deep learning revolution, or that Hanson’s side of the Foom debate looks good (compared to Eliezer’s) with the benefit of hindsight. I side with Gwern here; I think Robin’s predictions and arguments on this topic have been terrible, as a rule.
I think Eliezer assigned too high a probability to ‘it’s easy to find relatively clean, understandable approaches to AGI’, and too low a probability to ‘it’s easy to find relatively messy, brute-forced approaches to AGI’. A consequence of the latter is that he (IMO) underestimated how compute-intensive AGI was likely to be, and overestimated how important recursive self-improvement was likely to be.
I otherwise broadly agree with his picture. E.g.:
I expect AGI to represent a large, sharp capabilities jump. (I think this is unlikely to require a bunch of recursive self-improvement.)
I think AGI is mainly bottlenecked on software, rather than hardware. (E.g., I think GPT-3 is impressive, but isn’t a baby AGI; rather than AGI just being ‘current systems but bigger’, I expect at least one more key insight lies on the shortest likely path to AGI.)
And I expect AGI to be much more efficient than current systems at utilizing small amounts of data. Though (because it’s likely to come from a relatively brute-forced, unalignable approach) I still expect it to be more compute-intensive than 2009-Eliezer was imagining.
This seems completely wrong to me. See Katja Grace’s Coherence arguments imply a force for goal-directed behavior.
I think that part of why Eliezer’s early stuff sounds weird is:
He generally had a lower opinion of the competence of elites in business, science, etc. (Which he later updated about.)
He had a lower opinion of the field of AI in particular, as it existed in the 1990s and 2000s. Maybe more like nutrition science or continental philosophy than like chemistry, on the scale of ‘field rigor and intellectual output’.
If you think of A(G)I as a weird, neglected, pre-paradigmatic field that gets very little attention outside of science fiction writing, then it’s less surprising to think it’s possible to make big, fast strides in the field. Outperforming a competitive market is very different from outperforming a small, niche market where very little high-quality effort is going into trying new things.
Similarly, if you have a lower opinion of elites, you should be more willing to endorse weird, fringe ideas, because you should be less confident that the mainstream is efficient relative to you. (And I think Eliezer still has a low opinion of elites on some very important dimensions, compared to a lot of EAs. But not to the same degrees as teenaged Eliezer.)
From Competent Elites:
And from Above-Average AI Scientists:
I think this, plus Eliezer’s general ‘fuck it, I’m gonna call it like I see it rather than be reflexively respectful to authority’ attitude, explains most of Ben’s ‘holy shit, your views were so weird!!’ thing.
“I don’t fault 21-year-old Eliezer for trying (except insofar as he was totally wrong about the probability of Unfriendly AI at the time!), because the best way to learn that a weird new path is unviable is often to just take a stab at it”
It was only weird in that it involved technologies and methods that were unlikely to work, and EY could have figured that out theoretically by learning more about AI and software development.
I didn’t see the “my own guess” part in the linked document (or the archived version), but it’s visible here; it was probably edited between 2001 and 2004. Mentioning it in case others are confused after trying to find the quote in context.
Perhaps also relevant, though it isn’t forecasting, is Eliezer’s weak (in my opinion) attempted takedown of Ajeya Cotra’s bioanchors report on AI timelines. Here’s Eliezer’s bioanchors takedown attempt, here’s Holden Karnofsky’s response to Eliezer, and here’s Scott Alexander’s response.
Eliezer’s post was less a takedown of the report, and more a takedown of the idea that the report provides a strong basis for expecting AGI in ~2050, or for discriminating scenarios like ‘AGI in 2030’, ‘AGI in 2050’, and ‘AGI in 2070’.
The report itself was quite hedged, and Holden posted a follow-up clarification emphasizing that “biological anchors” is about bounding, not pinpointing, AI timelines. So it’s not clear to me that Eliezer and Ajeya/Holden/etc. even disagree about the core question “do biological anchors provide a strong case for putting a median AGI year in ~2050?”, though maybe they disagree on the secondary question of how useful the “bounds” are.
Copying over my high-level view, which I recently wrote on Twitter:
Commenting on a few minor points from Scott’s post, since I meant to write a full reply at some point but haven’t had the time:
I’d say ‘clearly not, for some possible AI designs’; but maybe it will be true for the first AIs we actually build, shrug.
Why aren’t there examples like ‘amount of cargo a bird can carry compared to an airplane’, or ‘number of digits a human can multiply together in ten seconds compared to a computer’?
Seems like you’ll get a skewed number if your brainstorming process steers away from examples like these altogether.
‘AI physicist’ is less like an artificial heart (trying to exactly replicate the structure of a biological organ functioning within a specific body), more like a calculator (trying to do a certain kind of cognitive work, without any constraint at all to do it in a human-like way).
I read this post kind of quickly, so apologies if I’m misunderstanding. It seems to me that this post’s claim is basically:
Eliezer wrote some arguments about what he believes about AI safety.
People updated toward Eliezer’s beliefs.
Therefore, people defer too much to Eliezer.
I think this is dismissing a different (and much more likely IMO) possibility, which is that Eliezer’s arguments were good, and people updated based on the strength of the arguments.
(Even if his recent posts didn’t contain novel arguments, the arguments still could have been novel to many readers.)
I’m a bit confused by both this post and the comments about questions like at what level, and at what time, the deference happens.
Speaking for myself, if an internet rando wrote a random blog post called “AGI Ruin: A List of Lethalities,” I probably would not read it. But I did read Yudkowsky’s post carefully and thought about it nontrivially, mostly due to his track record and writing ability (rather than e.g. because the title was engaging or because the first paragraph was really well-argued).
“which is that Eliezer’s arguments were good,”
There is plenty of evidence against that. His arguments on other subjects aren’t good (see OP), his arguments on AI aren’t informed by academic expertise or industry experience, his predictions are bad, etc.
I’m confused by the fact Eliezer’s post was posted on April Fool’s day. To what extent does that contribute to conscious exaggeration on his part?
Right? Up to reading this post, I was convinced it was an April Fool’s post.
The post is serious. Details: https://www.lesswrong.com/posts/j9Q8bRmwCgXRYAgcJ/miri-announces-new-death-with-dignity-strategy?commentId=FounAZsg4kFxBDiXs
It seems really bad, from a communications/PR point of view, to write something that was ambiguous in this way. Like, bad enough that it makes me slightly worried that MIRI will commit some kind of big communications error that gets into the newspapers and does big damage to the reputation of EA as a whole.
As someone not active in the field of AI risk, and having always used epistemic deference quite heavily, this feels very helpful. I hope it doesn’t end up reducing society’s efforts to stop AI from taking over the world some day.
On the contrary, my best guess is that the “dying with dignity” style dooming is harming the community’s ability to tackle AI risk as effectively as it otherwise could.
I agree with many of the comments here that this is overall a bit unfair, and there are good reasons to take Yudkowsky seriously even if you don’t automatically accept his self-expressed level of confidence.
My main criticism of Yudkowsky is that he has many innovative/somewhat compelling ideas, but even with many years and a research institution their evolution has been unsatisfying. Many of them are still imprecise, and some of those that are precise(ish) are not satisfactory (e.g. the orthogonality thesis, mesa-optimizers). Furthermore, he still doesn’t seem very interested in improving this situation.
Almost all of this seems reasonable. But:
I don’t think we should update based on this, or e.g. on the fact that we didn’t go extinct due to nanotechnology, because of anthropics / observer selection effects. (We should only update based on whether we think the reasons for those beliefs were bad.)
Suppose you’ve been captured by some terrorists and you’re tied up with your friend Eli. There is a device on the other side of the room that you can’t quite make out. Your friend Eli says that he can tell (he’s 99% sure) it is a bomb and that it is rigged to go off randomly. Every minute, he’s confident there’s a 50-50 chance it will explode, killing both of you. You wait a minute and it doesn’t explode. You wait 10. You wait 12 hours. Nothing. He starts eyeing the light fixture, and says he’s pretty sure there’s a bomb there too. You believe him?
No, my survival for 12 hours is evidence against Eli being correct about the bomb.
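For what it’s worth, the straightforward (non-anthropic) update this comment is gesturing at can be made concrete. This is just a sketch using the numbers stipulated in the thought experiment (99% prior on “bomb,” 50% per-minute detonation chance, 12 hours of survival):

```python
from fractions import Fraction

# Numbers from the thought experiment above (not real data):
# Eli is 99% sure the device is a bomb that has a 50% chance of
# exploding each minute. You then survive 12 hours.
prior_bomb = Fraction(99, 100)
minutes = 12 * 60

# Likelihood of surviving every minute under each hypothesis.
p_survive_given_bomb = Fraction(1, 2) ** minutes
p_survive_given_no_bomb = Fraction(1, 1)

# Bayes' rule, treating survival as ordinary evidence.
posterior_bomb = (prior_bomb * p_survive_given_bomb) / (
    prior_bomb * p_survive_given_bomb
    + (1 - prior_bomb) * p_survive_given_no_bomb
)

print(float(posterior_bomb))  # astronomically close to zero
```

On this non-anthropic accounting, surviving 12 hours drives the probability that Eli was right about the bomb to roughly 10^-213, which is why his next claim about the light fixture carries so little weight.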
So: oops, I think.
I’m still not totally comfortable. I think my confusion arose because I was considering the related question of whether I could use my better knowledge than Eli to win money from bets (in expectation) -- I couldn’t, because Eli has no reason to bet on the bomb going off. More generally, Eliezer never had reason to bet (in the sense that he gets epistemic credit if he’s right) on nanotech-doom-by-2010, because in the worlds where he’s right we’re dead. It feels weird to update against Eliezer on the basis of beliefs that he wouldn’t have bet on; updating against him doesn’t seem to be incentive-compatible… but maybe that’s just the sacrifice inherent to the epistemic virtue of publicly sharing your belief in doom.
I am willing to bite your bullet.
I had a comment here explaining my reasoning, but deleted it because I plan to make a post instead.
Interesting! I would think this sort of case just shows that the law of conservation of expected evidence is wrong, at least for this sort of application. I figure it might depend on how you think about evidence. If you think of the infinite void of non-existence as possibly constituting your evidence (albeit evidence you’re not in a position to appreciate, being dead and all), then that principle wouldn’t push you toward this sort of anthropic reasoning.
I am curious, what do you make of the following case?
Suppose you’re touring Acme Bomb & Replica Bomb Co with your friend Eli. ABRBC makes bombs and perfect replicas of bombs, but they’re sticklers for safety so they alternate days for real bombs and replicas. You’re not sure which sort of day it is. You get to the point of the tour where they show off the finished product. As they pass around the latest model from the assembly line, Eli drops it, knocking the safety back and letting the bomb (replica?) land squarely on its ignition button. If it were a real bomb, it would kill everyone unless it were one of the 1-in-a-million bombs that’s a dud. You hold your breath for a second but nothing happens. Whew. How much do you want to bet that it’s a replica day?
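Running the same non-anthropic calculation on this second case (a sketch; the 50/50 prior on replica day and the 1-in-a-million dud rate are the numbers stipulated in the comment):

```python
from fractions import Fraction

# Numbers from the tour scenario above (not real data):
# 50/50 prior on whether it's a replica day; a real bomb spares
# you only if it's a one-in-a-million dud.
prior_replica = Fraction(1, 2)
p_survive_given_replica = Fraction(1, 1)        # replicas never explode
p_survive_given_real = Fraction(1, 1_000_000)   # only duds spare you

# Bayes' rule, treating survival as ordinary evidence.
posterior_replica = (prior_replica * p_survive_given_replica) / (
    prior_replica * p_survive_given_replica
    + (1 - prior_replica) * p_survive_given_real
)

print(float(posterior_replica))  # approximately 0.999999
```

So if survival counts as ordinary evidence, you should bet very heavily that it’s a replica day; the anthropic view has to explain why this case differs from the 12-hour bomb case, if it does.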
I think posts like this better open with “but consider forming your own opinions rather than relying on experts”
I prefer to just analyse and refute his concrete arguments on the object level.
I’m not a fan of engaging the person of the arguer instead of their arguments.
Granted, I don’t practice epistemic deference in regards to AI risk (so I’m not the target audience here), but I’m really not a fan of this kind of post. It rubs me the wrong way.
Challenging someone’s overall credibility instead of their concrete arguments feels like bad form and [logical rudeness](https://www.lesswrong.com/posts/srge9MCLHSiwzaX6r/logical-rudeness).
I wish EAs did not engage in such behaviour and especially not with respect to other members of the community.
I agree that work analyzing specific arguments is, overall, more useful than work analyzing individual people’s track records. Personally, partly for that reason, I’ve actually done a decent amount of public argument analysis (e.g. here, here, and most recently here) but never written a post like this before.
Still, I think, people do in practice tend to engage in epistemic deference. (I think that even people who don’t consciously practice epistemic deference tend to be influenced by the views of people they respect.) I also think that people should practice some level of epistemic deference, particularly if they’re new to an area. So—in that sense—I think this kind of track record analysis is still worth doing, even if it’s overall less useful than argument analysis.
(I hadn’t seen this reply when I made my other reply).
What do you think of legitimising behaviour that calls out the credibility of other community members in the future?
I am worried about displacing the concrete object level arguments as the sole domain of engagement. A culture in which arguments cannot be allowed to stand by themselves. In which people have to be concerned about prior credibility, track record and legitimacy when formulating their arguments...
It feels like a worse epistemic culture.
Expert opinion has always been a substitute for object level arguments because of deference culture. Nobody has object level arguments for why x-risk in the 21st century is around 1/6: we just think it might be because Toby Ord says so and he is very credible. Is this ideal? No. But we do it because expert priors are the second best alternative when there is no data to base our judgments off of.
Given this, I think criticizing an expert’s priors is functionally an object level argument, since the expert’s prior is so often used as a substitute for object level analysis.
I agree that a slippery slope endpoint would be bad but I do not think criticizing expert priors takes us there.
To expand on my complaints in the above comment.
I do not want an epistemic culture that finds it acceptable to challenge an individual’s overall credibility in lieu of directly engaging with their arguments.
I think that’s unhealthy and contrary to collaborative knowledge growing.
Yudkowsky has laid out his arguments for doom at length. I don’t fully agree with those arguments (I believe he’s mistaken in 2–3 serious and important ways), but he has laid them out, and I can disagree on the object level with him because of that.
Given that the explicit arguments are present, I would prefer posts that engaged with and directly refuted the arguments if you found them flawed in some way.
I don’t like this direction of attacking his overall credibility.
Attacking someone’s credibility in lieu of their arguments feels like a severe epistemic transgression.
I am not convinced that the community is better for a norm that accepts such epistemic call out posts.
I think I roughly agree with you on this point, although I would guess I have at least a somewhat weaker version of your view. If discourse about people’s track records or reliability starts taking up (e.g.) more than a fifth of the space that object-level argument does, within the most engaged core of people, then I do think that will tend to suggest an unhealthy or at least not-very-intellectually-productive community.
One caveat: For less engaged people, I do actually think it can make sense to spend most of your time thinking about questions around deference. If I’m only going to spend ten hours thinking about nanotechnology risk, for example, then I might actually want to spend most of this time trying to get a sense of what different people believe and how much weight I should give their views; I’m probably not going to be able to make a ton of headway getting a good gears-level-understanding of the relevant issues, particularly as someone without a chemistry or engineering background.
> I do not want an epistemic culture that finds it acceptable to challenge an individual’s overall credibility in lieu of directly engaging with their arguments.
I think it’s fair to talk about a person’s lifetime performance when we are talking about forecasting. When we don’t have the expertise ourselves, all we have to go on is what little we understand and the track records of the experts we defer to. Many people defer to Eliezer so I think it’s a service to lay out his track record so that we can know how meaningful his levels of confidence and special insights into this kind of problem are.
I don’t think this is realistic. There is much more important knowledge than one can engage with in a lifetime. The only way of forming views about many things is to somehow decide who to listen to, or at least how to aggregate relevant more strongly based opinions (so, who to count as an expert and who not to and with what weight).
Tldr
Personally, and from my very uneducated vantage point, I question why a superintelligence with a truly universal set of ethics would pose a risk to other lifeforms. But I also do not know how the initial conditions can be architected, if indeed the initial conditions can be set/architected at all. That could go a different set of ways, depending on whose values.
What I worry about is what humans (enhanced or not) and cyborgs may choose to do with the bread-crumbs (the leftovers), or the steps taken to get to AGI.
Here is a schematic (link below) that I started meditating on yesterday. I am not sure if it’s polite to share, particularly given that I have not taken the time to absorb the post above. But here goes; I’m sharing it as it may (or may not) provide some value to someone, hopefully in a manner that is reasonable. https://qr.ae/pvoVJn
The karma on this post is impressive especially since OP could have started this AM UK time but didn’t.
I want to say stuff but it’s not going to help?