The part of this post which seems most wild to me is the leap from “mixed track record” to
In particular, I think, they shouldn’t defer to him more than they would defer to anyone else who seems smart and has spent a reasonable amount of time thinking about AI risk.
For any reasonable interpretation of this sentence, it’s transparently false. Yudkowsky has proven to be one of the best few thinkers in the world on a very difficult topic. Insofar as there are others who you couldn’t write a similar “mixed track record” post about, it’s almost entirely because they don’t have a track record of making any big claims, in large part because they weren’t able to generate the relevant early insights themselves. Breaking ground in novel domains is very, very different from forecasting the weather or events next year; a mixed track record is the price of entry.
I disagree that the sentence is false for the interpretation I have in mind.
I think it’s really important to separate out the question “Is Yudkowsky an unusually innovative thinker?” and the question “Is Yudkowsky someone whose credences you should give an unusual amount of weight to?”
I read your comment as arguing for the former, which I don’t disagree with. But that doesn’t mean that people should currently weigh his risk estimates more highly than they weigh the estimates of other researchers currently in the space (like you).
I also think there’s a good case to be made that Yudkowsky tends to be overconfident, and that this should be taken into account when deferring. When it comes to making big-picture forecasts, though, the main value of deference is in helping us decide which ideas and arguments to take seriously, rather than in fixing the specific credences we should place on them, since the space of ideas is so large.
But we do also need to try to have well-calibrated credences, of course. For the reason given in the post, it’s important to know whether the risk of everyone dying soon is 5% or 99%. It’s not enough just to determine whether we should take AI risk seriously.
We’re also now past the point, as a community, where “Should AI risk be taken seriously?” is that much of a live question. The main epistemic question that matters is what probability we assign to it—and I think this post is relevant to that.
(More generally, rather than reading this post, I recommend people read this one by Paul Christiano, which outlines specific agreements and disagreements.)
I definitely recommend people read the post Paul just wrote! I think it’s overall more useful than this one.
But I don’t think there’s an either-or here. People—particularly non-experts in a domain—do and should form their views through a mixture of engaging with arguments and deferring to others. So both arguments and track records should be discussed.
The EA community has ended up strongly moving in Yudkowsky’s direction over the last decade, and that seems like much more compelling evidence than anything listed in this post.
I discuss this in response to another comment, here, but I’m not convinced of that point.
I really appreciate the time people have taken to engage with this post (and actually hope the attention cost hasn’t been too significant). I decided to write some post-discussion reflections on what I think this post got right and wrong.
The reflections became unreasonably long—and almost certainly should be edited down—but I’m posting them here in a hopefully skim-friendly format. They cover what I see as some mistakes with the post, first, and then cover some views I stand by.
Things I would do differently in a second version of the post:
1. I would either drop the overall claim about how much people should defer to Yudkowsky — or defend it more explicitly
At the start of the post, I highlight the two obvious reasons to give Yudkowsky’s risk estimates a lot of weight: (a) he’s probably thought more about the topic than anyone else and (b) he developed many of the initial AI risk arguments. I acknowledge that many people, justifiably, treat these as important factors when (explicitly or implicitly) deciding how much to defer to Yudkowsky.
Then the post gives some evidence that, at each stage of his career, Yudkowsky has made a dramatic, seemingly overconfident prediction about technological timelines and risks—and at least hasn’t obviously internalised lessons from these apparent mistakes.
The post expresses my view that these two considerations at least counterbalance each other—so that, overall, Yudkowsky’s risk estimates shouldn’t be given more weight than (e.g.) those of other established alignment researchers or the typical person on the OpenPhil worldview investigation team.
But I don’t do a lot in the post to actually explore how we should weigh these factors up. In that sense: I think it’d be fair to regard the post’s central thesis as importantly under-supported by the arguments contained in the post.
I should have either done more to explicitly defend my view or simply framed the post as “some evidence about the reliability of Yudkowsky’s risk estimates.”
2. I would be clearer about how and why I generated these examples
In hindsight, this is a significant oversight on my part. The process by which I generated these examples is definitely relevant for judging how representative they are—and, therefore, how much to update on them. But I don’t say anything about this in the post. My motives (or at least my conscious motives) are also part of the story; I only discuss them in pretty high-level terms, but they seem like they might be relevant for forming judgments.
For context, then, here was the process:
A few years ago, I tried to get a clearer sense of the intellectual history of the AI risk and existential risk communities. For that reason, I read a bunch of old white papers, blog posts, and mailing list discussions.
These gave me the impression that Yudkowsky’s track record (and—to some extent—the track record of the surrounding community) was worse than I’d realised. From reading old material, I basically formed something like this impression: “At each stage of Yudkowsky’s professional life, his work seems to have been guided by some dramatic and confident belief about technological trajectories and risks. The older beliefs have turned out to be wrong. And the ones that haven’t yet resolved at least seem to have been pretty overconfident in hindsight.”
I kept encountering the idea that Yudkowsky has an exceptionally good track record, or that he has an unparalleled ability to think well about AI (he’s also expressed this view himself)—and I kept thinking, basically, that this seemed wrong. I wrote up some initial notes on this discrepancy at some point, but didn’t do anything with them.
I eventually decided to write something public after the “Death with Dignity” post, since the view it expresses (that we’re all virtually certain to die soon) both seems wrong to me and seems likely to be very damaging if it’s actually widely adopted in the community. I also felt that the “Death with Dignity” post was getting more play than it should, simply because people have a strong tendency to give Yudkowsky’s views weight. I can’t imagine a similar post written by someone else having nearly as large an impact. Notably, since that post didn’t really contain substantial arguments (although a later one did), I think the fact that it had such an impact is a testament to the power of deference; it would be hard to look at the reaction to that post and argue that it’s only Yudkowsky’s arguments (rather than his public beliefs in and of themselves) that have a major impact on the community.
People are obviously pretty aware of Yudkowsky’s positive contributions, but my impression is that (especially) new community members tend not to be aware of the negative aspects of his track record. So I wanted to write a post drawing attention to those negative aspects.
I was initially going to have the piece explicitly express the impression I’d formed, which was something like: “At each stage of Yudkowsky’s professional life, his work has been guided by some dramatic and seemingly overconfident belief about technological trajectories and risks.” The examples in the post were meant to map onto the main ‘animating predictions’ about technology he had at each stage of his career. I picked out the examples that immediately came to mind.
Then I realised I wasn’t at all sure I could defend the claim that these were his main ‘animating predictions’ - the category was obviously extremely vague, and the main examples that came to mind were extremely plausibly a biased sample. I thought there was a good chance that if I reflected more, then I’d also want to include various examples that were more positive.
I didn’t want to spend the time doing a thorough accounting exercise, though, so I decided to drop any claim that the examples were representative and just describe them as “cherry-picked” — and add in lots of caveats emphasising that they’re cherry-picked.
(At least, these were my conscious thought processes and motivations as I remember them. I’m sure other factors played a role!)
3. I’d tweak my discussion of take-off speeds
I’d make it clearer that my main claim is: it would have been unreasonable to assign a very high credence to fast take-offs back in (e.g.) the early or mid-2000s, since the arguments for fast take-offs had significant gaps. For example, there were a lot of possible countervailing arguments for slow take-offs that pro-fast-take-off authors simply hadn’t addressed yet—as evidenced, partly, by the fact that the later publication of slow-take-off arguments led a number of people to become significantly more sympathetic to slow take-offs. (I’m not claiming that there’s currently a consensus against fast-take-off views.)
4. I’d add further caveats to the “coherence arguments” case—or simply leave it out
Rohin’s and Oli’s comments under the post have made me aware that there’s a more positive way to interpret Yudkowsky’s use of coherence arguments. I’m not sure if that interpretation is correct, or if it would actually totally undermine the example, but this is at minimum something I hadn’t reflected on. I think it’s totally possible that further reflection would lead me to simply remove the example.
Positions I stand by:
On the flipside, here’s a set of points I still stand by:
1. If a lot of people in the community believe AI is probably going to kill everyone soon, then (if they’re wrong) this can have really important negative effects
In terms of prioritisation: My prediction is that if you were to ask different funders, career advisors, and people making career decisions (e.g. deciding whether to go into AI policy or bio policy) how much they value having a good estimate of AI risk, they’ll very often answer that they value it a great deal. I do think that over-estimating the level of risk could lead to concretely worse decisions.
In terms of community health: I think that believing you’re probably going to die soon is probably bad for a large portion of people. Reputationally: Being perceived as believing that everyone is probably going to die soon (particularly if this is actually an excessive level of worry) also seems damaging.
I think we should also take seriously the tail-risk that at least one person with doomy views (even if they’re not directly connected to the existential risk community) will take dramatic and badly harmful actions on the basis of their views.
2. Directly and indirectly, deference to Yudkowsky has a significant influence on a lot of people’s views
As above: One piece of evidence for this is that Yudkowsky’s “Death with Dignity” post triggered a big reaction, even though it didn’t contain any significant new arguments. I think his beliefs (above and beyond his arguments) clearly do have an impact.
Another reason to believe deference is a factor: I think it’s both natural and rational for people, particularly people new to an area, to defer to people with more expertise in that area.[1] Yudkowsky is one of the most obvious people to defer to, as one of the two people most responsible for developing and popularising AI risk arguments and as someone who has (likely) spent more time thinking about the subject than anyone else.
Beyond that: A lot of people also clearly have a huge amount of respect for Yudkowsky, sometimes more than they have for any other public intellectual. I think it’s natural (and sensible) for people’s views to be influenced by the views of the people they respect. Unless you have tremendous self-control, I think, this will tend to happen subconsciously even if you don’t consciously choose to defer to the people you respect.
Also, people sometimes just do talk about Yudkowsky’s track record or reputation as a contributing factor to their views.
3. The track records of influential intellectuals (including Yudkowsky) should be publicly discussed.
A person’s track-record provides evidence about how reliable their predictions are. If people are considering how much to defer to some intellectual, then they should want to know what their track record (at least within the relevant domain) looks like.
The main questions that matter are: What has the intellectual gotten wrong and right? Beyond whether they were right or wrong about a given case, does it also seem like their predictions were justified? If they’ve made certain kinds of mistakes in the past, do we now have reason to think they won’t repeat those kinds of mistakes?
4. Yudkowsky’s track record suggests a substantial bias toward dramatic and overconfident predictions.
One counter—which I definitely think it’s worth reflecting on—is that it might be possible to generate a similarly bias-suggesting list of examples like this for any other public intellectual or member of the existential risk community.
I’ll focus on one specific comment, suggesting that Yudkowsky’s incorrect predictions about nanotechnology are in the same reference class as ‘writing a typically dumb high school essay.’ The counter goes something like this: Yes, it was possible to find this example from Yudkowsky’s past—but that’s not importantly different from being able to turn up anyone else’s dumb high school essay about (e.g.) nuclear power.
Ultimately, I don’t buy the comparison. I think it’s really out-of-distribution for someone in their late teens and early twenties to proactively form the view that an emerging technology is likely to kill everyone within a decade, to found an organization and devote years of their professional life to addressing the risk, and to talk about how they’re the only person alive who can stop it.
That just seems very different from writing a dumb high school essay. Far more than a standard dumb high school essay would, I think this aspect of Yudkowsky’s track record really does suggest a bias toward dramatic and overconfident predictions. That prediction is also strikingly analogous to the prediction Yudkowsky is making right now—its relevance is clearly higher than the relevance of (e.g.) a random poorly-thought-out view in a high school essay.
(Yudkowsky’s early writing and work is also impressive, in certain ways, insofar as it suggests a much higher level of originality of thought and agency than the typical young person has. But the fact that this example is impressive doesn’t undercut, I think, the claim that it’s also highly suggestive of a bias toward highly confident and dramatic predictions.)
5. Being one of the first people to identify, develop, or take seriously some idea doesn’t necessarily mean that your predictions about the idea will be unusually reliable
By analogy:
I don’t think we can assume that the first person to take the covid lab leak theory seriously (when others were dismissive) is currently the most reliable predictor of whether the theory is true.
I don’t think we can assume that the first person to develop the many worlds theory of quantum mechanics (when others were dismissive) would currently be the best person to predict whether the theory is true, if they were still alive.
There are, certainly, reasons to give pioneers in a domain special weight when weighing expert opinion in that domain.[2] But these reasons aren’t absolute.
There are even reasons that point in the opposite direction: we might worry that the pioneer has an attachment to their theory, so will be biased toward believing it is true and as important as possible. We might also worry that the pioneering-ness of their beliefs is evidence that these beliefs front-ran the evidence and arguments (since one way to be early is to simply be excessively confident). We also have less evidence of their open-mindedness than we do for the people who later moved toward the pioneer’s views—since moving toward the pioneer’s views, when you were initially dismissive, is at least a bit of evidence for open-mindedness and humility.[3]
Overall, I do think we should tend to defer more to pioneers (all else being equal). But this tendency can definitely be overruled by other evidence and considerations.
6. The causal effects that people have had on the world don’t (in themselves) have implications for how much we should defer to them
At least in expectation, so far, Eliezer Yudkowsky has probably had a very positive impact on the world. There is a plausible case to be made that misaligned AI poses a substantial existential risk—and Yudkowsky’s work has probably, on net, massively increased the number of people thinking about it and taking it seriously. He’s also written essays that have exposed huge numbers of people to other important ideas and helped them to think more clearly. It makes sense for people to applaud all of this.
Still, I don’t think his positive causal effect on the world gives people much additional reason to be deferential to him.
Here’s a dumb thought experiment: Suppose that Yudkowsky wrote all of the same things, but never published them. But suppose, also, that a freak magnetic storm ended up implanting all of the same ideas in his would-be readers’ brains. Would this absence of a causal effect count against deferring to Yudkowsky? I don’t think so. The only thing that ultimately matters, I think, is his track record of beliefs—and the evidence we currently have about how accurate or justified those beliefs were.
I’m not sure anyone disagrees with the above point, but I did notice a decent amount of discussion in the comments about Yudkowsky’s impact—and I’m not sure this issue is ultimately relevant.[4]
For example: if I had ten hours to form a view about the viability of some application of nanotechnology, I definitely wouldn’t want to ignore the beliefs of people who have already thought about the question. Trying to learn the relevant chemistry and engineering background wouldn’t be a good use of my time.
One really basic reason is simply that they’ve had more time to think about certain subjects than anyone else.
Here’s a concrete case: Holden Karnofsky eventually moved toward taking AI risks seriously, after publicly being fairly dismissive of it, and then wrote up a document analysing why he was initially dismissive and drawing lessons from the experience. It seems like we could count that as positive evidence about his future judgment.
Even though I’ve just said I’m not sure this question is relevant, I do also want to say a little bit about Yudkowsky’s impact. I personally think he’s probably had a very significant impact. Nonetheless, I also think the impact can be overstated. For example, it’s been suggested that the effective altruism community might not be very familiar with concepts like Bayesian reasoning or the importance of overcoming bias if it weren’t for Yudkowsky’s writing. I don’t really find that particular suggestion plausible.
Here’s one data point I can offer from my own life: Through a mixture of college classes and other reading, I’m pretty confident I had already encountered the heuristics and biases literature, Bayes’ theorem, Bayesian epistemology, the ethos of working to overcome bias, arguments for the many worlds interpretation, the expected utility framework, population ethics, and a number of other ‘rationalist-associated’ ideas before I engaged with the effective altruism or rationalist communities. For example, my college had classes in probability theory, Bayesian epistemology, and the philosophy of quantum mechanics, and I’d read at least parts of books like Thinking, Fast and Slow, The Signal and the Noise, The Logic of Science, and various books associated with the “skeptic community.” (Admittedly, I think it would have been harder to learn some of these things if I’d gone to college a bit earlier or had a different major. I also probably “got lucky” in various ways with the classes I took and books I picked up.) See also Carl Shulman making a similar point, and John Halstead also briefly commenting on the way in which he personally encountered some of the relevant ideas.