Concepts of existential catastrophe (Hilary Greaves)

Link post

This paper was originally published as a working paper in September 2023 and is forthcoming in The Monist.

Abstract

The notion of existential catastrophe is increasingly appealed to in discussion of risk management around emerging technologies, but it is not completely clear what this notion amounts to. Here, I provide an opinionated survey of the space of plausibly useful definitions of existential catastrophe. Inter alia, I discuss: whether to define existential catastrophe in ex post or ex ante terms, whether an ex ante definition should be in terms of loss of expected value or loss of potential, and what kind of probabilities should be involved in any appeal to expected value.

Introduction and motivations

Humanity today arguably faces various very significant existential risks, especially from new and anticipated technologies such as nuclear weapons, synthetic biology and advanced artificial intelligence (Rees 2003, Posner 2004, Bostrom 2014, Häggström 2016, Ord 2020). Furthermore, the scale of the corresponding possible catastrophes is such that anything we could do to reduce their probability by even a tiny amount could plausibly score very highly in terms of expected value (Bostrom 2013, Beckstead 2013, Greaves and MacAskill 2024). If so, then addressing these risks should plausibly be one of our top priorities.

An existential risk is a risk of an existential catastrophe. An existential catastrophe is a particular type of possible event. This much is relatively clear. But there is not complete clarity, or uniformity of terminology, over what exactly it is for a given possible event to count as an existential catastrophe. Unclarity is no friend of fruitful discussion. Because of the importance of the topic, it is worth clarifying this as much as we can. The present paper is intended as a contribution to this task.

The aim of the paper is to survey the space of plausibly useful definitions, drawing out the key choice points. I will also offer arguments for the superiority of one definition over another where I see such arguments, but such arguments will often be far from conclusive; the main aim here is to clarify the menu of options.

I will discuss four broad approaches to defining “existential catastrophe”. The first approach (section 2) is to define existential catastrophe in terms of human extinction. A suitable notion of human extinction is indeed one concept that it is useful to work with. But it does not cover all the cases of interest. In thinking through the worst-case outcomes from technologies such as those listed above, analysts of existential risk are at least equally concerned about various other outcomes that do not involve extinction but would be similarly bad.

The other three approaches all seek to include these non-extinction types of existential catastrophe. The second and third approaches appeal to loss of value: ex post value (section 3) and expected value (section 4), respectively. There are several subtleties involved in making precise a definition based on expected value; I will suggest (though without watertight argument) that the best approach focuses on the consequences for expected value of “imaging” one’s evidential probabilities on the possible event in question. The fourth approach appeals to a notion of the loss of humanity’s potential (section 5). I will suggest (again, without watertight argument) that when the notion of “potential” is optimally understood, this fourth approach is theoretically equivalent to the third.
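
As a rough schematic illustration of the expected-value approach (the notation below is mine, not the paper's): write P for one's evidential probability function, P_E for the result of imaging P on a possible event E, and V for the value of humanity's future. The idea is roughly that E counts as an existential catastrophe just in case imaging on E drastically lowers the expectation of V.

```latex
% Hedged sketch, not the paper's own formulation: on the expected-value
% approach, a possible event E is an existential catastrophe roughly iff
% the expected value of the future, computed after imaging the evidential
% probability function P on E, falls far below its current expectation:
\[
  \mathbb{E}_{P_E}[V] \;\ll\; \mathbb{E}_{P}[V]
\]
% where P_E denotes P imaged on E, and V the value of humanity's future.
```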

The notion of existential catastrophe has a natural inverse: there could be events that are as good as existential catastrophes are bad. Cotton-Barratt and Ord (2015) coin the term “existential eucatastrophe” for this inverse notion. Section 6 sets out the idea, and briefly discusses how useful we should expect this inverse notion to be in actual practice. Section 7 discusses the possibility of defining existential catastrophe in more purely descriptive terms. Section 8 summarises.

Read the rest of the paper