General vs specific arguments for the longtermist importance of shaping AI development

Suppose you want to argue for the longtermist importance of shaping AI development.

My key point is: you can choose to make an argument that is more specific, or more general. Specific arguments include more details about what AI development looks like and how exactly it threatens humanity’s longterm potential; general arguments include fewer. The most specific arguments specify a complete threat model.

I think this is an important distinction, and that more clarity about what kind of argument people are making (or are motivated by) would improve the quality of discussion about how much AI should be prioritised as a cause area, and how to prioritise between different AI-related interventions.

The distinction

I’ll clarify the distinction by giving examples of arguments that sit near the ends of the general vs specific spectrum.

Here’s a highly specific argument:

  1. It’s plausible that TAI will arrive this century

  2. If TAI arrives this century, then a scenario broadly similar to What failure looks like is plausible (see here for a summary if you’re unfamiliar)

  3. In this scenario, most of the value of the longterm future is lost

Here’s a highly general argument:

  1. It’s plausible that TAI will arrive this century

  2. If TAI arrives this century, it’s probably one of our best bets for positively shaping the longterm future

Of course, many arguments fall somewhere in between these extremes. For example, Richard Ngo’s AGI safety from first principles argues specifically for the plausibility of “AI takeover” (a scenario where the most consequential decisions about the future get made by AI systems with goals that aren’t desirable by human standards), which is just one among many possible risks from advanced AI. And many of its arguments won’t apply if the field of AI moves away from its focus on machine learning. But the argument isn’t fully specific, because the author doesn’t argue for the plausibility of any particular variety of AI takeover scenario (there are many), instead focusing on making a more general case for the plausibility of AI takeover.

Joe Carlsmith’s report on existential risk from power-seeking AI is similar, and also falls somewhere between the extremes. So do arguments that appeal to solving the inner alignment problem being both necessary for AI existential safety and very difficult.

Why this distinction matters

Mostly, for the reason I mentioned at the beginning: when arguing about the longtermist importance of shaping AI development, I think that if people were clearer about how general/​specific their arguments are/​need to be, then this would improve discussion about (1) how much AI should be prioritised as a cause area, and (2) how to prioritise between different AI-related interventions.

For an example of (1): I sometimes hear people argue against the longtermist case for shaping AI development by pointing out that there’s wide disagreement and uncertainty about which risk scenarios are actually plausible, as if this undermines the entire case, when actually it only undermines one kind of case you can make. Conversely, I’ve heard other discussions be vaguer than they should be about how exactly AI leads to existential catastrophe.

For an example of (2): when prioritising interventions within AI as a cause area, if the strongest arguments for this kind of work are very general, that suggests a broader portfolio of work than if the strongest arguments pick out a particular class of threat models.

Finally, I personally think that the strongest case that we can currently make for the longtermist importance of shaping AI development is fairly general—something along the lines of the most important century series—and yet this doesn’t seem to be the “default” argument (i.e. the one presented in key EA content/​fellowships/​etc. when discussing AI). Instead, it seems the “default” is something more specific, focusing on the alignment problem, and sometimes even particular threat models (e.g. recursively self-improving misaligned superintelligent AI). I would really like to see this redressed.

Acknowledgments: this distinction is alluded to in this post and this podcast episode. We discussed it while working on the Survey on AI existential risk scenarios. Thanks to Alex Holness-Tofts for helpful feedback.