Summary: Mistakes in the Moral Mathematics of Existential Risk (David Thorstad)
This post summarizes “Three Mistakes in the Moral Mathematics of Existential Risk,” a Global Priorities Institute Working Paper by David Thorstad. This post is part of my sequence of GPI Working Paper summaries. For more, Thorstad’s blog, Reflective Altruism, has a five-part series on this paper.
Introduction
Many prominent figures in the effective altruism community argue that existential risk mitigation offers astronomical value. Thorstad believes there are many philosophical ways to push back on this conclusion,[1] and mathematical ones as well.
Thorstad argues leading models of existential risk mitigation neglect morally relevant parameters, mislocating debates and inflating existential risk reduction’s value by many orders of magnitude.
He broadly assumes we aren’t in the time of perils (which he justifies in this paper) and treats extinction risks as only those that kill all humans.[2]
Mistake 1: Cumulative Risk
Existential risks recur many times throughout history, meaning they can be presented as a per-century risk repeated each century or as a cumulative risk of occurring during a total time interval (e.g., the cumulative risk of extinction before Earth becomes less habitable).
Mistake 1: Expected value calculations of existential risk mitigation reduce the cumulative risk, not the per-century risk.
Thorstad identifies two problems with this choice.
If humans live a long time, small reductions in cumulative risk require astronomical reductions in per-century risk. This is because the chance we survive for the total time interval in question depends on cumulative risk, and our cumulative survival chance must exceed our reduction in cumulative risk.
Reducing cumulative risks with our actions today requires changing the risk for many, many centuries to come. So, even if we can substantially shift the risk of extinction this century or even nearby ones, we’ll likely have a hard time doing so for existential risk a thousand or million centuries from now.
For instance, if we want to create a meager one-in-a-hundred-million absolute reduction[3] in existential risk before Earth becomes less habitable,[4] the per-century risk must be nearly one-in-a-million or lower.[5] Many longtermists estimate this century’s existential risk to be ~15–20% or higher,[6] in which case we’d need to cut the per-century risk by a factor of roughly a hundred thousand. Hence, many expected value calculations of existential risk mitigation demand vastly greater reductions in per-century risk than they initially seem to.
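As a rough check on the arithmetic above, here is a minimal sketch that assumes a constant per-century risk r over ten million centuries (one billion years) and solves (1 − r)^N = 10^-8 for r. The function and variable names are illustrative, not from the paper. Solving exactly gives roughly 1.8 × 10^-6, the same order of magnitude as the ~1.6 × 10^-6 Thorstad reports, and a cut of roughly a hundred-thousand-fold from a 16% starting risk.

```python
import math

def max_per_century_risk(survival_floor: float, centuries: float) -> float:
    """Largest constant per-century risk r with (1 - r)**centuries >= survival_floor."""
    # Solve (1 - r)^N = p for r. expm1/log keep precision when r is tiny.
    return -math.expm1(math.log(survival_floor) / centuries)

N_CENTURIES = 1e7        # one billion years, as in the post
SURVIVAL_FLOOR = 1e-8    # survival chance must exceed the one-in-a-hundred-million reduction
CURRENT_RISK = 0.16      # illustrative: within the ~15-20% range cited in the post

r_max = max_per_century_risk(SURVIVAL_FLOOR, N_CENTURIES)
reduction_factor = CURRENT_RISK / r_max

print(f"maximum per-century risk: {r_max:.2e}")
print(f"required reduction factor from {CURRENT_RISK:.0%}: {reduction_factor:,.0f}")
```

The constant-risk assumption is a simplification for illustration; the point is only that a tiny cumulative target forces the per-century risk down by several orders of magnitude.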
Mistake 2: Background Risk
Millett and Snyder-Beattie (MSB) offer one of the most cited papers discussing biorisk—biological extinction risk—featuring a favorable cost-effectiveness estimate. While Thorstad believes many complaints about MSB’s model exist, he raises two.
Mistake 2: Existential risk mitigation calculations (including MSB’s model) ignore background risk.
In MSB’s model, the background risk is the risk of extinction from all non-biological sources. But, modifying this model to include background risk changes the estimated cost-effectiveness considerably.
Without background risk, a 1% relative reduction in biorisk has a meaningful impact on per-century risk: it discounts per-century risk by 1%. But, when you include non-biological background risk, the same reduction in biorisk changes the per-century risk far less: per-century risk becomes the discounted biorisk plus the full background risk. Since many longtermists believe the per-century risk is very high (~15–20% or higher)[6] and thus much greater than biorisk, this substantially reduces biorisk mitigation’s estimated cost-effectiveness:
For reference, GiveWell estimates its most effective short-term interventions can save a life-year for about $100.[7]
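To illustrate the background-risk point, here is a minimal sketch using the additive picture described above (per-century risk = biorisk + background risk). The 20% total risk comes from the estimates cited in the post; the biorisk figure is a purely hypothetical placeholder, not MSB's or Thorstad's number.

```python
def relative_drop_in_total_risk(biorisk: float, background: float,
                                relative_cut: float = 0.01) -> float:
    """Fractional fall in total per-century risk when only biorisk is cut."""
    total_before = biorisk + background
    total_after = biorisk * (1 - relative_cut) + background
    return (total_before - total_after) / total_before

TOTAL_RISK = 0.20   # per-century risk, upper end of the ~15-20% cited in the post
BIORISK = 0.0001    # HYPOTHETICAL per-century biorisk, chosen only for illustration

no_background = relative_drop_in_total_risk(BIORISK, 0.0)
with_background = relative_drop_in_total_risk(BIORISK, TOTAL_RISK - BIORISK)

print(f"without background risk: {no_background:.4%} drop in per-century risk")
print(f"with background risk:    {with_background:.4%} drop in per-century risk")
```

Under these assumed numbers the same 1% relative cut in biorisk moves total per-century risk thousands of times less once background risk is included. The exact ratio depends entirely on the assumed biorisk share.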
Thorstad also raises a second complaint with MSB’s model: It assumes we reduce the cumulative biorisk rather than the more plausible prospect of reducing the biorisk of nearby centuries.
Suppose our intervention reduces this century’s biorisk, but other centuries must fend for themselves. Combined with the background risk, this assumption produces the following doubly revised cost-effectiveness table:
A precipitous drop in cost-effectiveness. GiveWell’s recommendations now appear more cost-effective, perhaps by orders of magnitude.
Mistake 3: Population Dynamics
Longtermist estimates of the future population display astronomically many lives at stake if we don’t act prudently as a species. However, these estimates are calculated by multiplying the number of lives a region (e.g., Earth) can support by the duration humanity could exist in the region. This ignores background risk (mistake 2) and introduces the third mistake:
Mistake 3: Existential risk mitigation calculations ignore population dynamics.[8]
The most pessimistic existential risk mitigation calculations assume the population hovers around the current 8 billion, but most demographers[9] expect the population to begin decreasing by ~2100, which may be permanent.[10] After all, fertility rates are influenced by myriad factors other than the number of lives we can support in principle. Will MacAskill memorably projected the future population reaching five million ‘stick figures’ of ten billion humans each (5 * 10^16). In contrast, even with their most optimistic fertility rate (1.8), Geruso and Spears estimate the future population at no greater than three stick figures (3 * 10^10), even ignoring background risk.
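The gap between the two projections is easy to understate, so here is a small sketch of the arithmetic, using only the figures quoted above (the variable names are illustrative):

```python
STICK_FIGURE = 10 * 10**9                   # one 'stick figure' = ten billion people

macaskill_total = 5_000_000 * STICK_FIGURE  # five million stick figures
geruso_spears_upper = 3 * STICK_FIGURE      # at most three stick figures

ratio = macaskill_total // geruso_spears_upper

print(f"MacAskill projection:     {macaskill_total:.0e} people")
print(f"Geruso and Spears (max):  {geruso_spears_upper:.0e} people")
print(f"ratio: roughly {ratio:,} to 1")
```

On these figures the optimistic projection exceeds the demographic one by more than six orders of magnitude.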
Objections
High-Fertility Subpopulations
One may object that high-fertility subpopulations will constitute an increasing share of the total population, effectively reversing declining fertility rates, but Thorstad argues that many demographers place little confidence in this scenario and provides three reasons:
Fertility rates are dropping in high-fertility subpopulations[11]
Fertility norms are, at best, incompletely transmissible[12]
We could adapt, as high-fertility subpopulations would take centuries to gain an outsized population share[13]
Techno-optimism
One also might object to these population forecasts with more optimistic assumptions. For instance, say…
The human population will reach the maximum number of lives our inhabited region can support.
We continuously settle the stars in all directions at a tenth of the speed of light.
Christian Tarsney models these assumptions to compare existential risk mitigation with a near-term intervention, finding the former more effective if the risk of extinction per century remains below ~1.34%. Already, many longtermists estimate the per-century risk to be higher than this.[6] Not to mention, this assumes humanity continues immediately past each planet it settles toward the next, which demographers find unlikely. They prefer an alternative story:
As settlers populate planets, they gradually use up prime economic opportunities and make the planets crowded.
This eventually motivates settlers to colonize new planets.
If you assume it takes 1,000 years to colonize a planet, Tarsney’s model now only endorses existential risk mitigation if the risk of extinction per century remains below ~0.145%, almost ten times less than before. Hence, even with optimistic technological assumptions, population dynamics matter.
Conclusion / Brief Summary
Thorstad identifies three mistakes in existential risk mitigation calculations.
They focus on cumulative extinction risk, not per-century risk. This is a mistake for two reasons:
It assumes we can change the cumulative risk for all future generations.
Slightly changing the cumulative risk requires dramatic changes in per-century risk (if humans live a long time).
For instance, reducing cumulative risk by a millionth of a percent requires cutting per-century risk down to roughly one in a million, which is five orders of magnitude below many longtermists’ estimated risk of extinction this century.[6]
They ignore background extinction risk. But, reducing any single existential risk is much less valuable when unaltered extinction risks exist, as it decreases the per-century risk less than it would otherwise.
For instance, one of the most cited cost-effectiveness estimates of biological extinction risk changes orders of magnitude when accounting for this mistake, making biorisk mitigation appear less cost-effective than GiveWell’s recommendations.
They ignore population dynamics by assuming population sizes are only determined by the maximum number of lives that inhabited regions can support.
On our best scientific population models,[9] the population begins declining by ~2100, which may be permanent.[10] MacAskill’s estimated five million future ‘stick figures’ of ten billion humans each may be replaced by only two or three.
Even optimistically assuming a spacefaring future for humanity where we reach the maximum number of lives that inhabited regions can support, the downtime required to establish colonies on each planet substantially reduces the value of existential risk mitigation.
For more, see the paper itself or Thorstad’s blog, Reflective Altruism, which has a five-part series on this paper.
- ^
Including population ethical neutrality (Narveson; Frick), discounting future people (Lloyd; Mogensen), prioritizing present duties (Cordelli), questioning personal prerogatives (Unruh forthcoming), or being averse to risk (Pettigrew), ambiguity (Buchak), fanaticism (Monton; Smith), or aggregation (Curran; Heikkinen).
- ^
There are three reasons for this assumption: Thorstad wants to (1) avoid burdensome modeling complexity and (2) avoid accusations of fiddling with modeling assumptions, and (3) he believes this assumption will likely not alter conclusions.
- ^
In this case, an absolute reduction means we’re taking the original extinction risk and subtracting one in a hundred million (or 10^-8).
- ^
In this estimate, Earth becomes less habitable in one billion years.
- ^
Thorstad says, “...an absolute reduction of 10^-8 in cumulative existential risk would bring about a probability of at least 10^-8 that humanity survives for a billion years. However, the probability of surviving for a billion years, or ten million centuries, depends on the cumulative risk r: we survive for ten million centuries with probability P(S) = (1 − r)^(10^7). For our cumulative survival chance P(S) to exceed the seemingly small probability 10^-8 requires an extremely low per-century risk of r ≈ 1.6 ∗ 10^-6, barely a one-in-a-million risk of existential catastrophe per century.”
- ^
“Ord (2020) puts risk at 16.6%; attendees of the 2008 Global Catastrophic Risks Conference at the Future of Humanity Institute gave a median estimate of 19% (Sandberg and Bostrom 2008); and the Astronomer Royal Martin Rees puts the chance of civilizational collapse at 50% by the end of the century (Rees 2003).”
- ^
Thorstad argues this estimate is conservative because it doesn’t consider global health interventions’ long-term welfare benefits, such as their contribution to economic growth.
- ^
“Even readers who place nontrivial confidence in optimistic scenarios for future population growth may not place substantial confidence in [this assumption], on which population size hovers near carrying capacity. There is a significant gap between the most optimistic and the most pessimistic population projections, and moving beyond pessimism need not carry us all the way to full optimism.”
- ^
See Basten et al., Lutz et al., and United Nations.
- ^
See Alexandrie and Eden, Geruso and Spears, and Spears et al.
- ^
- ^
- ^
This work hinges on the assumption that we’re not in a time of perils situation. In other work Thorstad argues that the common arguments for thinking we’re in a time of perils are uncompelling. I’m not sure I agree (i.e. on balance my inside view supports a time of perils, but I’m not sure that the case for this has ever been spelled out in a watertight way), but fair enough—it’s very healthy and good to poke at foundational assumptions. But he doesn’t provide any strong arguments that we aren’t in a time of perils. And the arguments presented here rely in important ways on certainty (rather than just likelihood) in the assumption that we’re not in a time of perils—the far-distant future should be discounted at its lowest possible rate, and that applies just as much to discounting for hazard rate (chance that we will go extinct) as to any other kind of discounting.
From my perspective, therefore, the value of this work is that it justifies that it would be importantly decision-relevant to find strong arguments that we’re not in a time of perils situation. That’s not hugely surprising, but it’s good to get the increased confidence and to have a handle on precisely how it would be decision-relevant.
Thorstad has previously written a paper specifically addressing the time of perils hypothesis, summarised in seven parts here.
One of the points is that just being in a time of perils is not enough to debunk his arguments, it has to be a short time of perils, and the time of perils ending has to drop the risk by many orders of magnitude. These assumptions seem highly uncertain to me.
The main point of my comment above is that “highly uncertain” is enough to support action premised on the possibility of a time of perils.
For what it’s worth I think that the ontology of “dropping risk by many orders of magnitude” is putting somewhat too much emphasis on “risk per century” as a natural unit. I think a lot of anthropogenic risk is best understood not as a state risk (think “risk I randomly fall off the side of the boat”), but as a transition risk (think “risk I fall in as I try to put the sail up”). Some of the high risk imagined this century is from the possibility that we rush putting the sail up. We may not rush it! So my ex ante risk doesn’t diminish super steeply over centuries as I don’t know in which one the sail attempt will be made. But (in this metaphor) we only need to put the sail up once, and it would seem confused to argue that risk will stay high ~forever just because we don’t know when we’ll make the attempt.
I think this sail metaphor is more obfuscatory than revealing. If you think that the risk will drop orders of magnitude and stay there, then it’s fine to say so, and you should make your object-level arguments for that. Calling it a transition doesn’t really add anything: society has been “transitioning” between different states for its entire lifetime, why is this one different?
Let me be clear about the type signature of the sail metaphor: it’s not giving an object-level argument that the risk will drop a long way. I think it’s a completely legit question why this one is different. (I’m not confident that it is, but the kind of reason I think it may well be are outlined in this post.)
Instead it’s saying that it may be more natural to have the object-level conversations about transitions rather than about risk-per-century. Here’s a stylized example:
Suppose you’re confident that putting up the sail will incur a 50% risk, and otherwise risk is essentially zero
Suppose further that you don’t know at all when the sail attempt will be made
(yeah, I’m mixing my metaphors here by keeping us on the boat for many centuries)
You decide to use Laplace’s law of succession on centuries, starting 1 century ago
So ex ante there was a 1⁄2 chance of it happening in the last century; but that now hasn’t happened, so there’s a 1⁄3 chance of it happening in the next century. If we wait N more centuries without it happening, then the probability of it happening over the following century (i.e. conditional on it not having happened yet) is 1/(3+N)
Then your risk of falling in is 16% over the next century, and decreasing smoothly with time, but still 0.01% absolute risk (i.e. that’s not even conditional on surviving that long) 100 centuries out
Here’s a quick spreadsheet with my sail math, if you want to play with it
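For readers without the spreadsheet, the stylized example can be sketched as follows. This is a reconstruction from the bullets above, not the linked spreadsheet: it assumes a 50% risk at the sail attempt, zero risk otherwise, and a 1/(3+N) chance of the attempt in the next century after N further quiet centuries. Function names are illustrative.

```python
from fractions import Fraction

TRANSITION_RISK = Fraction(1, 2)  # chance of falling in when the sail goes up

def p_attempt_next_century(n_waited: int) -> Fraction:
    """Laplace's rule: chance the attempt comes in the century after n_waited quiet ones."""
    return Fraction(1, 3 + n_waited)

def absolute_risk(n: int) -> Fraction:
    """Risk, as seen from today, of falling in during the century n centuries from now."""
    # Chance that no attempt has been made in the n centuries before that one.
    p_no_attempt_yet = Fraction(1)
    for j in range(n):
        p_no_attempt_yet *= 1 - p_attempt_next_century(j)
    return TRANSITION_RISK * p_no_attempt_yet * p_attempt_next_century(n)

for n in (0, 1, 10, 100):
    print(f"{n:>3} centuries out: {float(absolute_risk(n)):.4%}")
```

This gives 1/6 (about 16%) for the next century and 1/10506 (about 0.01%) for the century 100 centuries out, matching the figures in the example.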
In this example you’re certain there’s a time-of-perils dynamic going on, and that you have a 50% chance of an indefinitely long future without falling in. But it’s hard to argue for any particular century by which risk is very low … even the estimates in my spreadsheet don’t provide bounds on risk, because you weren’t at all confident in the per-century estimates of when the sail attempt would be made.
My claim is that in cases roughly like this it can be more illuminating to think and argue about the risk-per-transition than the risk-per-century. (Of course, if you think that most risk is state risk rather than transition risk that’s also worth discussion.)
Hmm, I definitely think there’s an object level disagreement about the structure of risk here.
Take the invention of nuclear weapons for example. This was certainly a “transition” in society relevant to existential risk. But it doesn’t make sense to me to analogise it to a risk in putting up a sail. Instead, nuclear weapons are just now a permanent risk to humanity, which goes up or down depending on geopolitical strategy.
I don’t see why future developments wouldn’t work the same way. It seems that since early humanity the state risk has only been increasing further and further as technology develops. I know there are arguments for why it could suddenly drop, but I agree with the linked Thorstad analysis that this seems unlikely.
Ok, I agree with you that state risk is also an important part of the picture. I basically agree that nuclear risk is better understood as a state risk. I think the majority of AI risk is better understood as a transition risk, which was why I was emphasising that.
I guess at a very high level, I think: either there are accessible arrangements for society at some level of technological advancement which drive risk very low, or there aren’t. If there aren’t, it’s very unlikely that the future will be very large. If there are, then there’s a question of whether the world can reach such a state before an existential catastrophe. If risk now is lower than risk we’re likely to incur on the path to such an arrangement, it can be thought of as a transition risk (whether we manage to bear the increased exposure on the way) … by analogy, maybe there’s a part of putting up the sail where you’re exposed to being washed overboard by a freak wave, which can be thought of as a state risk which forms part of a transition risk.
If there are accessible arrangements, even if we can’t identify them now, I expect some significant effort to go into searching and steering for them, so a nontrivial chance of reaching one. An argument that we won’t reach such a state seems like it’s either going to need to argue that there are no such states (seems unlikely to me; I think my intuition is informed in part by the existence of error correcting codes), or that it’s vanishingly unlikely that we could reach one that does exist (doesn’t seem impossible to me but I find it hard to see how we could hope to get confidence on this point).
(With apologies, I think this comment is kind of dense. Some better version of it would give the arguments more cleanly.)
This reasoning seems off. Why would it have to drive things to very low risk, rather than to a low but significant level of risk, like we have today with nuclear weapons? Why would it be impossible to find arrangements that keep the level of state risk at like 1%?
AI risk thinking seems to have a lot of “all or nothing” reasoning that seems completely unjustified to me.
I’m worried we’re talking past each other here. We totally might find arrangements that keep the state risk at like 1% -- and in that case then (as Thorstad points out) we expect not to have a very large future (though it could still be decently large compared to the world today).
But if your axiology is (in part) totalist, you’ll care a lot whether we actually get to very large futures. I’m saying (agreeing with Thorstad) that these are dependent on finding some arrangement which drives risk very low. Then I’m saying (disagreeing with Thorstad?) that the decision-relevant question is more like “have we got any chance of getting to such a state?” rather than “are we likely to reach such a state?”
Okay, that makes a lot more sense, thank you.
I think the talk of transition risks and sail metaphors isn’t actually that relevant to your argument here? Wouldn’t a gradual and continuous decrease in state risk, like the Kuznets curve shown in Thorstad’s paper here, have the same effect?
Yeah it totally has the same effect. It can just be less natural to analyse, if you think the risk will (or might) decrease a lot following some transition (which is also when the risk will mostly be incurred), but you’re less confident about when the transition will occur.
Nice discussion, Owen and titotal!
I think this depends on the timeframe. Over a longer one, looking into the estimated destroyable area by nuclear weapons, nuclear risk looks like a transition risk (see graph below). In addition, I think the nuclear extinction risk has decreased even more than the destroyable area, since I believe greater wealth has made society more resilient to the effects of nuclear war and nuclear winter. For reference, I estimated the current annual nuclear extinction risk is 5.93*10^-12.
You can conservatively multiply through by the probability that the time of perils is short enough and that the risk drops by enough orders of magnitude.
This summary was helpful — I’ve tried a couple times to engage with the original paper but found it hard, whereas this was very readable & I now think I understand the main points at a basic level :)
Glad it helped! All credit to Nicholas who wrote 99% of it. If you have a minute, I uploaded a talk version of the paper last week. Would love to hear what you think, especially re accessibility:
Thank you so much! I’m very happy to hear it was helpful for you.
This paper was previously shared and discussed (with detailed responses from Toby) here, and the arguments also shared and discussed before that here.
Kind of frustrating that there isn’t a single place for it to be discussed.
How is this as a snapshot of the discussion so far?
You can edit the image here and post as a comment: https://link.excalidraw.com/l/82wslD39E6w/5wUzJOIPnRl
I wish this was more well-known and read in the EA community. So far I have not seen any credible objections to these three compelling arguments. Perils or not perils, these arguments are still valid on their own.
I am intrigued by these sorts of arguments. I tried reading the paper a little while ago and I read this post. But I find it basically impossible to follow what is being said. Maybe someone someday could do some explainer journalism on this paper. And on Toby Ord’s response (which Larks linked to in another comment), which I found similarly hard to follow.
Executive summary: Thorstad argues that leading models of existential risk mitigation make three key mistakes that significantly inflate the estimated value of reducing existential risk.
Key points:
Focusing on cumulative extinction risk instead of per-century risk assumes we can change risk for all future generations and requires dramatic reductions in per-century risk to make small changes to cumulative risk.
Ignoring background extinction risk from other sources makes reducing any single existential risk appear more valuable than it actually is when accounting for the unaltered risks.
Ignoring realistic population dynamics and assuming maximum supportable population leads to overestimates of potential future population. More realistic models project population decline.
Even with optimistic assumptions of space settlement, the downtime to establish each new colony reduces the value of existential risk mitigation compared to estimates that ignore this factor.
When accounting for these mistakes, the cost-effectiveness of reducing existential risk is reduced by orders of magnitude, potentially below that of near-term interventions.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.