The part of this post which seems most wild to me is the leap from “mixed track record” to
In particular, I think, they shouldn’t defer to him more than they would defer to anyone else who seems smart and has spent a reasonable amount of time thinking about AI risk.
For any reasonable interpretation of this sentence, it’s transparently false. Yudkowsky has proven to be one of the best few thinkers in the world on a very difficult topic. Insofar as there are others who you couldn’t write a similar “mixed track record” post about, it’s almost entirely because they don’t have a track record of making any big claims, in large part because they weren’t able to generate the relevant early insights themselves. Breaking ground in novel domains is very, very different from forecasting the weather or events next year; a mixed track record is the price of entry.
I disagree that the sentence is false for the interpretation I have in mind.
I think it’s really important to separate out the question “Is Yudkowsky an unusually innovative thinker?” and the question “Is Yudkowsky someone whose credences you should give an unusual amount of weight to?”
I read your comment as arguing for the former, which I don’t disagree with. But that doesn’t mean that people should currently weigh his risk estimates more highly than they weigh the estimates of other researchers currently in the space (like you).
I also think there’s a good case to be made that Yudkowsky tends to be overconfident, and that this should be taken into account when deferring. When it comes to making big-picture forecasts, though, the main value of deference is in helping us decide which ideas and arguments to take seriously, rather than in fixing the specific credences we should place on them, since the space of ideas is so large.
But we do also need to try to have well-calibrated credences, of course. For the reason given in the post, it’s important to know whether the risk of everyone dying soon is 5% or 99%. It’s not enough just to determine whether we should take AI risk seriously.
We’re also now past the point, as a community, where “Should AI risk be taken seriously?” is that much of a live question. The main epistemic question that matters is what probability we assign to it—and I think this post is relevant to that.
(More generally, rather than reading this post, I recommend people read this one by Paul Christiano, which outlines specific agreements and disagreements.)
I definitely recommend people read the post Paul just wrote! I think it’s overall more useful than this one.
But I don’t think there’s an either-or here. People—particularly non-experts in a domain—do and should form their views through a mixture of engaging with arguments and deferring to others. So both arguments and track records should be discussed.
The EA community has ended up strongly moving in Yudkowsky’s direction over the last decade, and that seems like much more compelling evidence than anything listed in this post.
I discuss this in response to another comment, here, but I’m not convinced of that point.
I really appreciate the time people have taken to engage with this post (and actually hope the attention cost hasn’t been too significant). I decided to write some post-discussion reflections on what I think this post got right and wrong.
The reflections became unreasonably long—and almost certainly should be edited down—but I’m posting them here in a hopefully skim-friendly format. They cover what I see as some mistakes with the post, first, and then cover some views I stand by.
Things I would do differently in a second version of the post:
1. I would either drop the overall claim about how much people should defer to Yudkowsky — or defend it more explicitly
At the start of the post, I highlight the two obvious reasons to give Yudkowsky’s risk estimates a lot of weight: (a) he’s probably thought more about the topic than anyone else and (b) he developed many of the initial AI risk arguments. I acknowledge that many people, justifiably, treat these as important factors when (explicitly or implicitly) deciding how much to defer to Yudkowsky.
Then the post gives some evidence that, at each stage of his career, Yudkowsky has made a dramatic, seemingly overconfident prediction about technological timelines and risks—and at least hasn’t obviously internalised lessons from these apparent mistakes.
The post expresses my view that these two considerations at least counterbalance each other—so that, overall, Yudkowsky’s risk estimates shouldn’t be given more weight than (e.g.) those of other established alignment researchers or the typical person on the OpenPhil worldview investigation team.
But I don’t do a lot in the post to actually explore how we should weigh these factors up. In that sense: I think it’d be fair to regard the post’s central thesis as importantly under-supported by the arguments contained in the post.
I should have either done more to explicitly defend my view or simply framed the post as “some evidence about the reliability of Yudkowsky’s risk estimates.”
2. I would be clearer about how and why I generated these examples
In hindsight, this is a significant oversight on my part. The process by which I generated these examples is definitely relevant for judging how representative they are—and, therefore, how much to update on them. But I don’t say anything about this in the post. My motives (or at least my conscious motives) are also part of the story; I only discuss them in pretty high-level terms, but they seem like they might be relevant for forming judgments.
For context, then, here was the process:
A few years ago, I tried to get a clearer sense of the intellectual history of the AI risk and existential risk communities. For that reason, I read a bunch of old white papers, blog posts, and mailing list discussions.
These gave me the impression that Yudkowsky’s track record (and—to some extent—the track record of the surrounding community) was worse than I’d realised. From reading old material, I basically formed something like this impression: “At each stage of Yudkowsky’s professional life, his work seems to have been guided by some dramatic and confident belief about technological trajectories and risks. The older beliefs have turned out to be wrong. And the ones that haven’t yet resolved at least seem to have been pretty overconfident in hindsight.”
I kept encountering the idea that Yudkowsky has an exceptionally good track record, or that he has an unparalleled ability to think well about AI (he’s also expressed this view himself)—and I kept thinking, basically, that this seemed wrong. I wrote up some initial notes on this discrepancy at some point, but didn’t do anything with them.
I eventually decided to write something public after the “Death with Dignity” post, since the view it expresses (that we’re all virtually certain to die soon) both seems wrong to me and seems likely to be very damaging if it’s actually widely adopted in the community. I also felt that the “Death with Dignity” post was getting more play than it should, simply because people have a strong tendency to give Yudkowsky’s views weight. I can’t imagine a similar post written by someone else having nearly as large an impact. Notably, since that post didn’t really contain substantial arguments (although a later one did), I think the fact that it had such an impact is a testament to the power of deference; it would be hard to look at the reaction to that post and argue that it’s only Yudkowsky’s arguments (rather than his public beliefs in and of themselves) that have a major impact on the community.
People are obviously pretty aware of Yudkowsky’s positive contributions, but my impression is that (especially) new community members tend not to be aware of the negative aspects of his track record. So I wanted to write a post drawing attention to those negative aspects.
I was initially going to have the piece explicitly express the impression I’d formed, which was something like: “At each stage of Yudkowsky’s professional life, his work has been guided by some dramatic and seemingly overconfident belief about technological trajectories and risks.” The examples in the post were meant to map onto the main ‘animating predictions’ about technology he had at each stage of his career. I picked out the examples that immediately came to mind.
Then I realised I wasn’t at all sure I could defend the claim that these were his main ‘animating predictions’ - the category was obviously extremely vague, and the main examples that came to mind were extremely plausibly a biased sample. I thought there was a good chance that if I reflected more, then I’d also want to include various examples that were more positive.
I didn’t want to spend the time doing a thorough accounting exercise, though, so I decided to drop any claim that the examples were representative and just describe them as “cherry-picked” — and add in lots of caveats emphasising that they’re cherry-picked.
(At least, these were my conscious thought processes and motivations as I remember them. I’m sure other factors played a role!)
3. I’d tweak my discussion of take-off speeds
I’d make it clearer that my main claim is: it would have been unreasonable to assign a very high credence to fast take-offs back in (e.g.) the early or mid-2000s, since the arguments for fast take-offs had significant gaps. For example, there were a lot of possible countervailing arguments for slow take-offs that pro-fast-take-off authors simply hadn’t addressed yet—as evidenced, partly, by the fact that the later publication of slow-take-off arguments led a number of people to become significantly more sympathetic to slow take-offs. (I’m not claiming that there’s currently a consensus against fast-take-off views.)
4. I’d add further caveats to the “coherence arguments” case—or simply leave it out
Rohin’s and Oli’s comments under the post have made me aware that there’s a more positive way to interpret Yudkowsky’s use of coherence arguments. I’m not sure if that interpretation is correct, or if it would actually totally undermine the example, but this is at minimum something I hadn’t reflected on. I think it’s totally possible that further reflection would lead me to simply remove the example.
Positions I stand by:
On the flipside, here’s a set of points I still stand by:
1. If a lot of people in the community believe AI is probably going to kill everyone soon, then (if they’re wrong) this can have really important negative effects
In terms of prioritisation: My prediction is that if you were to ask different funders, career advisors, and people making career decisions (e.g. deciding whether to go into AI policy or bio policy) how much they value having a good estimate of AI risk, they’ll very often answer that they value it a great deal. I do think that over-estimating the level of risk could lead to concretely worse decisions.
In terms of community health: I think that believing you’re probably going to die soon is probably bad for a large portion of people. Reputationally: Being perceived as believing that everyone is probably going to die soon (particularly if this is actually an excessive level of worry) also seems damaging.
I think we should also take seriously the tail-risk that at least one person with doomy views (even if they’re not directly connected to the existential risk community) will take dramatic and badly harmful actions on the basis of their views.
2. Directly and indirectly, deference to Yudkowsky has a significant influence on a lot of people’s views
As above: One piece of evidence for this is that Yudkowsky’s “Death with Dignity” post triggered a big reaction, even though it didn’t contain any significant new arguments. I think his beliefs (above and beyond his arguments) clearly do have an impact.
Another reason to believe deference is a factor: I think it’s both natural and rational for people, particularly people new to an area, to defer to people with more expertise in that area.[1] Yudkowsky is one of the most obvious people to defer to, as one of the two people most responsible for developing and popularising AI risk arguments and as someone who has (likely) spent more time thinking about the subject than anyone else.
Beyond that: A lot of people also clearly have a huge amount of respect for Yudkowsky, sometimes more than they have for any other public intellectual. I think it’s natural (and sensible) for people’s views to be influenced by the views of the people they respect. Unless you have tremendous self-control, I think, this will tend to happen subconsciously even if you don’t consciously choose to defer to the people you respect.
Also, people sometimes just do talk about Yudkowsky’s track record or reputation as a contributing factor to their views.
3. The track records of influential intellectuals (including Yudkowsky) should be publicly discussed.
A person’s track-record provides evidence about how reliable their predictions are. If people are considering how much to defer to some intellectual, then they should want to know what their track record (at least within the relevant domain) looks like.
The main questions that matter are: What has the intellectual gotten wrong and right? Beyond whether they were right or wrong about a given case, does it also seem like their predictions were justified? If they’ve made certain kinds of mistakes in the past, do we now have reason to think they won’t repeat those kinds of mistakes?
4. Yudkowsky’s track record suggests a substantial bias toward dramatic and overconfident predictions.
One counter—which I definitely think it’s worth reflecting on—is that it might be possible to generate a similarly bias-suggesting list of examples like this for any other public intellectual or member of the existential risk community.
I’ll focus on one specific comment, suggesting that Yudkowsky’s incorrect predictions about nanotechnology are in the same reference class as ‘writing a typically dumb high school essay.’ The counter goes something like this: Yes, it was possible to find this example from Yudkowsky’s past—but that’s not importantly different from being able to turn up anyone else’s dumb high school essay about (e.g.) nuclear power.
Ultimately, I don’t buy the comparison. I think it’s really out-of-distribution for someone in their late teens and early twenties to proactively form the view that an emerging technology is likely to kill everyone within a decade, to found an organization and devote years of their professional life to addressing the risk, and to talk about how they’re the only person alive who can stop it.
That just seems very different from writing a dumb high school essay. Far more than a standard dumb high school essay would, I think this aspect of Yudkowsky’s track record really does suggest a bias toward dramatic and overconfident predictions. That prediction is also strikingly analogous to the prediction Yudkowsky is making right now—its relevance is clearly higher than the relevance of (e.g.) a random poorly-thought-out view in a high school essay.
(Yudkowsky’s early writing and work is also impressive, in certain ways, insofar as it suggests a much higher level of originality of thought and agency than the typical young person has. But the fact that this example is impressive doesn’t undercut, I think, the claim that it’s also highly suggestive of a bias toward highly confident and dramatic predictions.)
5. Being one of the first people to identify, develop, or take seriously some idea doesn’t necessarily mean that your predictions about the idea will be unusually reliable
By analogy:
I don’t think we can assume that the first person to take the covid lab leak theory seriously (when others were dismissive) is currently the most reliable predictor of whether the theory is true.
I don’t think we can assume that the first person to develop the many worlds theory of quantum mechanics (when others were dismissive) would currently be the best person to predict whether the theory is true, if they were still alive.
There are, certainly, reasons to give pioneers in a domain special weight when weighing expert opinion in that domain.[2] But these reasons aren’t absolute.
There are even reasons that point in the opposite direction: we might worry that the pioneer has an attachment to their theory, so will be biased toward believing it is true and as important as possible. We might also worry that the pioneering-ness of their beliefs is evidence that these beliefs front-ran the evidence and arguments (since one way to be early is to simply be excessively confident). We also have less evidence of their open-mindedness than we do for the people who later moved toward the pioneer’s views—since moving toward the pioneer’s views, when you were initially dismissive, is at least a bit of evidence for open-mindedness and humility.[3]
Overall, I do think we should tend to defer more to pioneers (all else being equal). But this tendency can definitely be overruled by other evidence and considerations.
6. The causal effects that people have had on the world don’t (in themselves) have implications for how much we should defer to them
At least in expectation, so far, Eliezer Yudkowsky has probably had a very positive impact on the world. There is a plausible case to be made that misaligned AI poses a substantial existential risk—and Yudkowsky’s work has probably, on net, massively increased the number of people thinking about it and taking it seriously. He’s also written essays that have exposed huge numbers of people to other important ideas and helped them to think more clearly. It makes sense for people to applaud all of this.
Still, I don’t think his positive causal effect on the world gives people much additional reason to be deferential to him.
Here’s a dumb thought experiment: Suppose that Yudkowsky wrote all of the same things, but never published them. But suppose, also, that a freak magnetic storm ended up implanting all of the same ideas in his would-be readers’ brains. Would this absence of a causal effect count against deferring to Yudkowsky? I don’t think so. The only thing that ultimately matters, I think, is his track record of beliefs—and the evidence we currently have about how accurate or justified those beliefs were.
I’m not sure anyone disagrees with the above point, but I did notice a decent amount of discussion in the comments about Yudkowsky’s impact—and I’m not sure this issue is ultimately relevant.[4]
For example: if I had ten hours to form a view about the viability of some application of nanotechnology, I definitely wouldn’t want to ignore the beliefs of people who have already thought about the question. Trying to learn the relevant chemistry and engineering background wouldn’t be a good use of my time.
One really basic reason is simply that they’ve had more time to think about certain subjects than anyone else.
Here’s a concrete case: Holden Karnofsky eventually moved toward taking AI risks seriously, after publicly being fairly dismissive of it, and then wrote up a document analysing why he was initially dismissive and drawing lessons from the experience. It seems like we could count that as positive evidence about his future judgment.
Even though I’ve just said I’m not sure this question is relevant, I do also want to say a little bit about Yudkowsky’s impact. I personally think he’s probably had a very significant impact. Nonetheless, I also think the impact can be overstated. For example, it’s been suggested that the effective altruism community might not be very familiar with concepts like Bayesian reasoning or the importance of overcoming bias if it weren’t for Yudkowsky’s writing. I don’t really find that particular suggestion plausible.
Here’s one data point I can offer from my own life: Through a mixture of college classes and other reading, I’m pretty confident I had already encountered the heuristics and biases literature, Bayes’ theorem, Bayesian epistemology, the ethos of working to overcome bias, arguments for the many worlds interpretation, the expected utility framework, population ethics, and a number of other ‘rationalist-associated’ ideas before I engaged with the effective altruism or rationalist communities. For example, my college had classes in probability theory, Bayesian epistemology, and the philosophy of quantum mechanics, and I’d read at least parts of books like Thinking, Fast and Slow, The Signal and the Noise, The Logic of Science, and various books associated with the “skeptic community.” (Admittedly, I think it would have been harder to learn some of these things if I’d gone to college a bit earlier or had a different major. I also probably “got lucky” in various ways with the classes I took and books I picked up.) See also Carl Shulman making a similar point, and John Halstead also briefly commenting on the way in which he personally encountered some of the relevant ideas.