How bad would human extinction be?

This post is a part of Rethink Priorities’ Worldview Investigations Team’s CURVE Sequence: “Causes and Uncertainty: Rethinking Value in Expectation.” The aim of this sequence is twofold: first, to consider alternatives to expected value maximisation for cause prioritisation; second, to evaluate the claim that a commitment to expected value maximisation robustly supports the conclusion that we ought to prioritise existential risk mitigation over all else.
Executive Summary
Background
This report builds on the model originally introduced by Toby Ord on how to estimate the value of existential risk mitigation.
The previous framework has several limitations, including:
The inability to model anything requiring shorter time units than centuries, like AI timelines.
A very limited range of scenarios considered. In the previous model, risk and value growth can take different forms, and each combination represents one scenario.
No explicit treatment of persistence – how long the mitigation efforts’ effects last – as a variable of interest.
No easy way to visualise and compare the differences between different possible scenarios.
No mathematical discussion of the convergence of the cumulative value of existential risk mitigation, as time goes to infinity, for all of the main scenarios.
This report addresses the limitations above by enriching the base model and relaxing its key stylised assumptions.
What this report offers
There are many possible risk structure and value trajectory combinations. This report explicitly considers 20 scenarios.
The report examines several plausible scenarios that were absent from the existing literature on the model, like:
decreasing risk (in particular exponentially decreasing) and Great Filters risk.
cubic and logistic value growth; both of which are widely used in adjacent literatures, so the report makes progress in consolidating the model with those approaches.
It offers key visual comparisons and illustrations of how risk mitigation efforts differ in value, like Figure 1 below.
The report is accompanied by an interactive Jupyter Notebook and a generalised mathematical framework that can, with minor input by the user, cope with any arbitrary value trajectory and risk profile they wish to investigate.
This acts as a uniquely versatile tool that can calculate and graph the expected value of risk mitigation.
The user can also adjust all the parameters in the 20 default scenarios.
Takeaways
In all 20 scenarios, the cumulative value of mitigation efforts converges to a finite number, as the time horizon goes to infinity.
This implies that it is not devoid of meaning to talk about the amount of long-term value obtained from mitigating risk, even in an infinitely long universe.
In this context, even if we assign some minuscule credence to one of the scenarios, that scenario won’t overshadow the collective view.
It helps clarify what assumptions would be required for infinite value.
The report introduces the Great Filters Hypothesis:
It states that humanity will face a number of great filters, during which existential risk will be unusually high.
This hypothesis is a more general, and thus more plausible, version of what is commonly discussed under the name ‘Time of Perils’: the one filter case.
Persistence – the risk mitigation’s duration – plays a key role in our estimates, suggesting that work to investigate this role further, and to obtain better empirical estimates of different interventions’ persistence, would be highly impactful. Other tentative lessons:
Interventions to increase persistence exhibit diminishing returns, and are most valuable for mitigation efforts with small persistence.
Great value requires relatively high persistence, and the latter could be implausible.
It is often assumed that, when considering long-term impact, existential risk mitigation is, in expectation, enormously valuable relative to other altruistic opportunities. There are a number of ways that could prove to be false. One possibility, which this report emphasises, is that the vast value of risk mitigation is only found in certain scenarios, each of which makes a whole host of assumptions.
The expected value of risk mitigation therefore strongly depends on our beliefs about these assumptions. And, depending on how we decide to aggregate our credences and which scenarios we allow for, astronomical value might be off the table after all.
Figure 1 (see header’s image): This is a visual representation of the estimated expected value of reducing existential risk by 0.01%. The image is to scale and one cubic unit is the size of the world under constant risk and constant value, the top-left scenario.
This abridged technical report is accompanied by an interactive Jupyter Notebook. The full report is available here.
Recommended: The PDF version of the abridged report can be accessed here.
Abridged Report
Introduction
Consider a catastrophe that permanently ends human civilisation.[1] You might find it plausible that any efforts to reduce the risk of such a catastrophe are of enormous value. You might also be inclined to think that the value is particularly high if the risks are high also. After all, in most contexts, the bigger the risk of something bad happening, the less it can be safely ignored. In other words, you might believe that it is of astronomical importance to mitigate these extinction risks because the stakes are very large and because the probability of these catastrophic scenarios is uncomfortably high. Existing work by Ord, Adamczewski and Thorstad (hereon ‘OAT’) argues that this last sentence is questionable: in the context of an extinction catastrophe, the higher we think the risk is, the less we should value efforts that mitigate that risk.[2]
Our initial intuitions are not always a good guide for how we should think about estimating the value of extinction risk mitigation. Indeed, the unexpected tension between high pessimism about the risk we face and whether risk mitigation is of astronomical value is a good example of this.[3] Similarly, simplified attempts and heuristics used to estimate the cost-effectiveness of risk reduction – such as those in 1, 2, 3, 4, 5 – turn out to only be appropriate in a handful of very restricted scenarios (usually where value and risk are constant in all the periods), and they otherwise mischaracterise the value of extinction risk mitigation.
If we want to evaluate the general merits of interventions that seek to safeguard humanity’s future, we need a systematic way to estimate the value of mitigating extinction risk. The current frameworks help us understand which scenarios might lead to astronomical value. However, they have several limitations that make it difficult, or sometimes impossible, to comment meaningfully on the amount of good that mitigating risk in the next few decades could achieve. This report builds on the existing models and provides tools to estimate the value of mitigating risk in more realistic settings.
The Base Model
As a first attempt to provide a more rigorous analysis, existing work presents a stylised model to assess the value of extinction risk mitigation given the following assumptions:
A1 Each century of human existence has some constant value.
A2 Humans face a constant level of per-century extinction risk.
A3 No value will be realised after an extinction catastrophe.
A4 Risk is reduced by a fraction.
A5 Risk is only reduced this century.
A6 Centuries are the shortest time units.
The model is clearly oversimplified, and, indeed, previous work has partially relaxed a subset of these six assumptions.[4] However, there are still several limitations present in those frameworks.
OAT Limitations
Some of the main limitations of the previous work include:
The current models lack the necessary resolution to yield results that are relevant for, or incorporate observations from, key issues like near-term AI timelines. The models cannot presently handle anything requiring shorter time units than centuries.
The duration of a mitigation action’s effects affects its overall value. However, OAT has not explored how varying the duration of these effects may impact the model.[5]
There are many possible scenarios (i.e., combinations of risk and value trajectories), and OAT has explored very few of these. Given our large uncertainty in this area, it is a priority to have a clear picture of how the value compares in each case. This will provide the necessary tools for future work that assigns credences to each scenario to arrive at better-informed expected value judgements.
There are currently no versatile frameworks that can calculate the expected value of mitigating risk, for a given set of idiosyncratic beliefs about risk and value trajectories.
As time goes to infinity, the expected value of existential risk mitigation could, in principle, be infinite, making most scenario comparisons redundant in those cases. There has been no formal discussion of the convergence of the value of extinction risk mitigation for all of the main scenarios.
Key Research Questions
The present report aims to tackle all of the above limitations. With that in mind, the key guiding questions are:
When is the value—of the future and of risk mitigation—particularly large and when is it not?
What is the Great Filter Hypothesis, how does it relate to the Time of Perils and what is the impact of adding great filters on the value of risk mitigation?
What are the qualitative pictures of the expected value of the world—and thus of mitigation efforts—given different risk structures (e.g. linear, Time of Perils, Great Filters, decaying) and value growth cases (e.g. linear, quadratic, cubic, logistic)?
How does the value of mitigation efforts depend on their persistence?
The main ambition here is to develop a generalised version of the toy model that relaxes all assumptions above, except for A3, no value after extinction, and A4, fractional risk reduction.[6][7] By relaxing A1 and A2 – that the value and risk are constant – we are able to introduce a framework that can accommodate more complex risk structures and sophisticated value trajectories. We also depart from existing analyses by relaxing A6: here, years are the shortest time unit. Moreover, by also relaxing A5, the model now has tools to observe persistence of mitigation effects lasting less (or more) than one century and can meaningfully comment on the near-term value of extinction risk mitigation. Using this generalised framework, we can systematically assess the value of risk mitigation under various combinations of assumptions.
Generalised Model: Arbitrary Risk Profile
Let us consider the expected value of a world w that faces an existential risk rt at time t. This is best observed with a picture.
At each period t the world ends with probability rt and all possible future value is reduced to zero. On the other hand, with probability (1−rt), the world progresses to the next period and achieves value vt, which is added to the total pool of value it had accrued. Figure 2 summarises all of this. The expected value is the value of each branch weighted by the probability of reaching that value. That is

$$E(w) = r_1 \cdot 0 + (1-r_1)v_1 + (1-r_1)r_2 \cdot 0 + (1-r_1)(1-r_2)v_2 + (1-r_1)(1-r_2)r_3 \cdot 0 + (1-r_1)(1-r_2)(1-r_3)v_3 + \dots$$

In other words, the expected value of this world is

$$E(w) = (1-r_1)v_1 + (1-r_1)(1-r_2)v_2 + (1-r_1)(1-r_2)(1-r_3)v_3 + \dots = \sum_{t=1}^{T}\left[\left(\prod_{j=1}^{t}(1-r_j)\right)v_t\right], \tag{1}$$

where the maximum number of periods T is the age of the universe when it ends, and T→∞ when we assume an infinite universe. We do not impose that T→∞ or otherwise, to give the flexibility to consider cases where there is some known, exogenous end to the universe. Throughout this document, the length of a period will equal one year. However, the results are not tied to any particular interpretation of period length.[8]
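To make Equation 1 concrete, here is a minimal Python sketch (our own illustration, not code from the accompanying Notebook): it takes an arbitrary risk profile r and value sequence v and returns E(w).

```python
import numpy as np

def expected_value(r, v):
    """Equation 1: E(w) = sum over t of [prod_{j<=t} (1 - r_j)] * v_t."""
    survival = np.cumprod(1.0 - np.asarray(r, dtype=float))
    return float(np.sum(survival * np.asarray(v, dtype=float)))

# Toy check: constant 1% annual risk and constant value v_c = 1 over T = 1,000 years.
r = np.full(1000, 0.01)
v = np.ones(1000)
print(expected_value(r, v))  # ~99, close to the closed-form limit (1 - r)/r = 99
```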
Now consider a risk mitigation action M which reduces the original risk sequence from r to r′, where, for some t, r′t=(1−f)rt and f∈(0,1) is the fraction of the risk that is successfully mitigated.[9][10] What value have we added by performing action M? In the most basic sense, we have changed the expected value of the future by
$$E(M) = E(w') - E(w),$$
where our action modified the original risk from r in world w to r′ in w′.[11] More generally, we could allow f≤0, which would amount to increasing the risk and M would produce negative value (or none at all if f=0). For example, f<0 if M made a nuclear war more likely by contributing to political instability. For the rest of the report we focus on non-negative value.
Value
Denote by v = (v1, v2, v3, v4, ...) the sequence of values that the world will follow, where vt is realised conditional on the world existing at time t. Estimating this sequence is no trivial undertaking. There is large uncertainty in this area and considerable research is needed for us to insert reasonable values into the sequence v. Given this uncertainty, a promising approach is to develop a more flexible framework, i.e. the generalised model above and its accompanying code in the Jupyter Notebook, that is versatile enough to handle a wide range of cases. Next, we will investigate several possible paths for value growth, in particular: constant, linear, quadratic, cubic and logistic.
Value Cases Summary
Here is a table summary of the main value cases this report will investigate.[12] When the time unit is years instead of centuries, the value is adjusted to reflect this (see the full report here for the details). Cubic has previously been adopted for modelling interplanetary expansion. Logistic can be thought of as ‘exponential with a value cap’, a model that has special economic relevance.[13]
|  | Constant | Linear | Quadratic | Cubic | Logistic |
| --- | --- | --- | --- | --- | --- |
| $v_t$ | $v_c$ | $t\,v_c$ | $t^2\,v_c$ | $t^3\,v_c$ | $\dfrac{c}{1+\frac{c-s}{s}e^{-\gamma t}}$ |

Table 1: Summary of $v_t$ Cases
Here is a visual summary.
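For a rough sense of how these trajectories are generated, here is a sketch with placeholder parameters; the cap c, starting value s and growth rate γ below are arbitrary choices for illustration, not the Notebook’s defaults.

```python
import numpy as np

t = np.arange(1, 10_001, dtype=float)  # periods (years)
v_c = 1.0                              # one year of present value, normalised to 1

constant  = np.full_like(t, v_c)
linear    = v_c * t
quadratic = v_c * t**2
cubic     = v_c * t**3

# Logistic: starts near s at t = 0, grows at rate gamma, saturates at the cap c.
c, s, gamma = 1e6, 1.0, 0.01
logistic = c / (1.0 + ((c - s) / s) * np.exp(-gamma * t))
```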
Persistence
Extinction risk mitigation actions could have effects that last different amounts of time. We may have reasons to believe that an action will reduce risks only for a few years; for example, passing a bill that restricts AI compute which is expected to be overturned after the next election cycle in 5 years. Other actions could last longer; for example, a shield in space that physically protects Earth from asteroid impact could be effective for thousands of years. Or, in the extreme case, an action could reduce extinction risk forever. In this report, we refer to the length of the mitigating effect of an action as its persistence.
Persistence is key in evaluating the value of an action M. In the Ord model, the persistence of M has been assumed to be of exactly one period (which equals one century in that setting). Thorstad proceeds with the same assumption and briefly considers the permanent case as well. Because persistence plays such an important role, we developed a more flexible framework where we allow persistence P to be anything between one period and permanently reducing risk, i.e. P∈Z+.
An investigation of persistence likely deserves a report of its own, both for a theoretical and empirical treatment of the issue. For now we will assume that M mitigates risk for P periods, without delay. We illustrate how results differ by presenting five representative cases: P=1,5,50,500,2000.
So, for example, if we had a risk profile of r=(0.5,0.5,0.2,0.4,0.1,0.2,...) and M acts at the first period with persistence P=3 and an efficacy of f=0.5, halving the risk, the profile then becomes: r′=(0.25,0.25,0.1,0.4,0.1,0.2,...).
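In code, applying M is a one-line modification of the risk profile. A sketch (the helper below is ours) reproducing the example:

```python
import numpy as np

def mitigate(r, f, P):
    """Scale the first P entries of risk profile r by (1 - f)."""
    r_prime = np.asarray(r, dtype=float).copy()
    r_prime[:P] *= (1.0 - f)
    return r_prime

r = np.array([0.5, 0.5, 0.2, 0.4, 0.1, 0.2])
print(mitigate(r, f=0.5, P=3))  # -> [0.25, 0.25, 0.1, 0.4, 0.1, 0.2]
```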
A Concrete Example
There are too many cases for us to explicitly consider each one in the exposition of this report. Instead, they are systematically solved for and implemented in the code, so the user can see the results for any one desired scenario. However, it is pedagogically valuable to explicitly discuss one of these cases here.
Suppose that performing M halves the risk with a 5-year persistence. Let us also add some complexity to the risk structure, so it takes two constant values. Suppose that there is a 0.22% annual risk, which approximates a one-in-five chance of extinction by the end of the century, under the assumption that the risk remains constant for the next 100 years.[14] Suppose that, for no particular reason, the annual risk after those 100 years is 0.01%.[15] That is r=(0.22%,0.22%,...,0.22%,0.01%,0.01%,...). Suppose, for this exercise, that this universe lasts 10,000 years.[16] We also normalise the value unit to vc=1. What is the value of performing M? It is
$$E(M) = E(w') - E(w) \approx 28.6.$$
It is worth roughly 28.6vc to perform M under these assumptions, where vc is this year’s value of our world.
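As a sanity check, the following sketch reproduces this figure from Equation 1, using the exact annual risk from the footnote rather than the rounded 0.22%:

```python
import numpy as np

def expected_value(r, v):
    return float(np.sum(np.cumprod(1.0 - r) * v))

T = 10_000
p1 = 1 - 0.8 ** (1 / 100)   # annual risk giving a one-in-five chance of extinction in 100 years
risk = np.where(np.arange(1, T + 1) <= 100, p1, 0.0001)
value = np.ones(T)          # constant value, v_c = 1

mitigated = risk.copy()
mitigated[:5] *= 0.5        # f = 0.5 applied with 5-year persistence
print(expected_value(mitigated, value) - expected_value(risk, value))  # ~28.6
```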
The Rest of this Report
So far, we have thought about risk in the abstract. Indeed, what we have outlined is enough for us to evaluate any arbitrary risk and value structure that we may want to test. See the Jupyter Notebook to try this yourself.
However, there are specific risk structures that we might be especially interested in evaluating. We might be inclined to believe certain stories about risk; for example, that it will systematically decline (like in the Decaying Risk section). Alternatively, we might want to pay heed to the commonly held view that humanity is living in a particularly risky period now, but will reach a low-risk future if it overcomes the present challenges. The concrete example above is an instance of this, assuming constant value. Thorstad states this view, termed the ‘Time of Perils’ Hypothesis and discussed more thoroughly here, as:
(ToP) Existential risk is and will remain high for several years, but it will drop to a low level if humanity survives this Time of Perils.
We explore this type of risk structure next.
Great Filters and the Time of Perils Hypothesis
Humanity is potentially facing unprecedented threats from nuclear weapons, engineered pandemics and advanced artificial intelligence, among others. It may be that we are living in perilous times. If we do well, we might escape these dangers. But who’s to say that there will be no comparable challenges in the future? The perilous times might return.
The reasoning above introduces the notion of great filters: hurdles that our civilisation must pass to ensure its long-term survival (Hanson, 1998).[17] Specific details as to what these filters might be are beyond this work. But if AI is the first filter, we could easily imagine future ones such as escaping our dying sun or meeting powerful and unfriendly alien life. The Great Filters Hypothesis tells us:
(GFH) Humanity will face one or more great filters, during which extinction risk will be unusually high. Otherwise, the risk will be low.
It follows that, by construction, the Time of Perils hypothesis is the one filter version of GFH. For the purposes of this report, let us consider a stylised model of GFH where:
There are F filters (e.g. F=2).
There are 2F ‘eras’, sets of periods within which risk is constant. Filters are high-risk eras.
Filters and low-risk eras alternate, starting with a filter.
The length of each era is given by ℓ=(ℓ1,ℓ2,...,ℓ2F).
At each era i, humanity faces a per-period constant risk gi, and g denotes the vector (g1,g2,...g2F).
For example, suppose that we had F=2, such that there are two filters, with a low-risk era after each of them. Suppose that g=(r1,rlow,r2,rlow), ℓ=(100,500,100,10100) and that value is constant. From this we could write the expected value of such a world as
$$E(w) = \underbrace{\sum_{t=1}^{100} v_c (1-r_1)^t}_{\text{First filter}} + \underbrace{(1-r_1)^{100} \sum_{t=1}^{500} v_c (1-r_{\text{low}})^t}_{\text{Low-risk era}} + \underbrace{(1-r_1)^{100} (1-r_{\text{low}})^{500} \sum_{t=1}^{100} v_c (1-r_2)^t}_{\text{Second filter}} + \underbrace{(1-r_1)^{100} (1-r_{\text{low}})^{500} (1-r_2)^{100} \sum_{t=1}^{10100} v_c (1-r_{\text{low}})^t}_{\text{Low-risk era}}$$
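Equivalently, one can assemble the era-by-era risk profile and feed it to Equation 1. In the sketch below, the era lengths match the example, while the filter risk levels are arbitrary placeholders:

```python
import numpy as np

def gfh_risk(g, lengths):
    """Concatenate per-era constant risks into a single risk profile."""
    return np.concatenate([np.full(n, g_i) for g_i, n in zip(g, lengths)])

g = [0.002, 0.0001, 0.002, 0.0001]  # (r1, r_low, r2, r_low): illustrative values only
lengths = [100, 500, 100, 10_100]   # era lengths from the example
risk = gfh_risk(g, lengths)

value = np.ones(risk.size)          # constant value, v_c = 1
print(float(np.sum(np.cumprod(1.0 - risk) * value)))  # E(w) for this two-filter world
```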
Decaying Risk
Optimistically, we could live in a world where humanity is progressively getting better at surviving. One way of modelling this is with decreasing risk; in particular, we can specify an exponentially decreasing function, where r∞∈[0,1) is the risk as t→∞, λ∈(0,1) is the decay rate, t is the period, r(t) is the risk in period t, and the starting risk is r0+r∞≈r0∈(0,1) for small r∞. For the first few periods, the sequence is approximately: r0, r0e−λ, r0e−2λ, r0e−3λ, … More generally,

$$r(t) = r_0 \, e^{-\lambda t} + r_\infty.$$
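A sketch of this risk profile, with the decay rate and endpoints chosen purely for illustration:

```python
import numpy as np

def decaying_risk(T, r0, lam, r_inf):
    """r(t) = r0 * exp(-lam * t) + r_inf for t = 0, 1, ..., T - 1."""
    t = np.arange(T, dtype=float)
    return r0 * np.exp(-lam * t) + r_inf

risk = decaying_risk(T=1_000, r0=0.002, lam=0.01, r_inf=1e-5)
print(risk[:4])  # approximately r0, r0*e^-lam, r0*e^-2lam, r0*e^-3lam
```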
Risk Cases Summary
A graph summarising the main cases of interest can be found below.
Results
Convergence
As time goes to infinity, the expected value of existential risk mitigation could, in principle, be infinite. This would render comparing different estimates of E(M) redundant.[18] To investigate when this might happen, we turn our attention to convergence next.
We know that for any finite T, Equation 1 is bounded.[19] A key issue is whether the expected value of the world converges in an infinite universe. When T→∞, the series for the expected value of a world, E(w), as described in Equation 1, is given by the infinite sum
$$E(w) = \sum_{t=1}^{\infty}\left[\left(\prod_{j=1}^{t}(1-r_j)\right)v_t\right].$$
For this kind of series, we can use the Ratio Test to evaluate its convergence. The Ratio Test states that for a series $\sum_{n=1}^{\infty} a_n$, if there exists a limit $L=\lim_{n\to\infty}\left|\frac{a_{n+1}}{a_n}\right|$, then the series converges absolutely if L<1, diverges if L>1, and is inconclusive if L=1. To apply the Ratio Test to E(w), we look at consecutive terms of the series and their ratio:

$$L = \lim_{t\to\infty}\left|\frac{\left(\prod_{j=1}^{t+1}(1-r_j)\right)v_{t+1}}{\left(\prod_{j=1}^{t}(1-r_j)\right)v_t}\right| = \lim_{t\to\infty}\left|\frac{v_{t+1}}{v_t}\right|(1-r_{t+1}).$$

Recall that rt∈(0,1) for all t, so (1−rt) also lies within (0,1) for all t. Thus, if rt converges to a positive scalar, the exact risk level will not affect convergence. Instead, the convergence of the series E(w) critically depends on $\lim_{t\to\infty}\left|\frac{v_{t+1}}{v_t}\right|$. In particular, we find that this limit is less than or equal to 1 in our cases of interest, thus E(w) converges absolutely. The full details can be found in the report but, as an example, consider the n-polynomial case $v_t = v_c t^n$, which is a more general version of all the cases except logistic. Then:

$$\lim_{t\to\infty}\left|\frac{v_{t+1}}{v_t}\right| = \lim_{t\to\infty}\frac{(t+1)^n}{t^n} = \lim_{t\to\infty}\left(1+\frac{1}{t}\right)^n = 1.$$

Under logistic, $\lim_{t\to\infty}\left|\frac{v_{t+1}}{v_t}\right|=1$ also. Hence, in the context of the various scenarios we’ve explored, we are now ready to present the following result:
Proposition 1. The expected value of the world is finite if existential risk does not converge to zero.
Proof. See the full report. An intuition: asymptotically, the probability of survival shrinks every period by a constant proportion, while value is either constant or increasing polynomially at a shrinking proportion. Therefore the expected value contribution for a distant enough t approximates zero. ◻
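A quick numerical illustration of this intuition, with constant risk and cubic value (the parameters are ours): the ratio of consecutive terms of Equation 1 tends to 1−r < 1, so distant terms shrink geometrically.

```python
# Terms of Equation 1 with constant risk r and v_t = v_c * t^n satisfy
# a_{t+1} / a_t = (1 - r) * ((t + 1) / t)**n, which tends to 1 - r < 1.
r, n = 0.001, 3
for t in [10, 100, 10_000, 1_000_000]:
    print(t, (1 - r) * ((t + 1) / t) ** n)  # dips below 1 once t is large enough
```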
Maintain the assumption that the risk tends to some nonzero value. As an immediate consequence of the above proposition, we have:
Corollary 1. In an infinitely long universe, the value of existential risk mitigation is finite.
Proof.
$$E(M) = E(w') - E(w)$$

and, by Proposition 1, both E(w′) and E(w) converge. ◻
These results tell us that it is meaningful to talk about the long-term value of risk mitigation, even in the infinite universe case. Moreover, however great the value might be, it is simply not infinite. We estimate the exact size of this value next, in the following subsection. It should be emphasised that the scope of Corollary 1 and Proposition 1 is the scenarios that this report considers, and not all the possible ways of modelling risk and value. For example, the proofs fail when the risk exponentially decays to zero, or when value grows exponentially without a cap.
The Expected Value of Mitigating Risk Visualised
First, we present Figure 5, a grid which summarises what the expected value of the future is, without the presence of risk mitigation efforts.
The first column indicates what value case we are on, the first row what risk case, and the middle plots display the cumulative E(w) as time passes for each risk and value combination. Notice that in all cases, E(w) converges as T→∞. This is only indirectly related to the Convergence section, which is about the convergence of E(M) and not the expected value of the future. For the middle plots, the horizontal axis displays the range from year zero (today), until year 140,000. For visibility, we display until year 100,000 for exponential decay instead. The vertical axis is different every time so that all graphs are clearly visible. For example, constant risk under linear value is in the thousands of vc and Two Great Filters under logistic value is in billions of vc, where vc is always normalised to one. The default parameters for these simulations can be found and modified in the Notebook.
Next, we plot E(w), with and without performing M for all twenty scenarios in Figure 6. We do this for a range of persistence levels and, for entirely pedagogical reasons, we assume an extreme efficacy of f=50% reduction in the risk from performing M.
In the grid above, to calculate E(M) for some specific case, we first take the dotted curve that tells us the expected value of the world after performing the action, all under a particular scenario and at certain persistence. Then, we subtract the baseline E(w) without mitigation, i.e. we subtract the solid blue curve from any one dotted curve.
When discussing the value and eventually the cost effectiveness of risk mitigation, a useful and more realistic efficacy f is one basis point: f=0.0001. Table 2 below shows E(M) for all the scenarios of interest.
Though we show it above, we are suspicious of long persistence, both because effects are blunted by political or technological changes and because, given enough time, some actor is likely to perform an action that achieves similar effects.[20]
Given the difference in orders of magnitude, it can be difficult to directly compare the figures in this table. To facilitate this, we display Figure 1: a visual representation of the estimated expected value of reducing existential risk by 0.01%.[21] The image is to scale and one cubic unit is the size of the world under constant risk and constant value, the top-left scenario. A persistence of 5 years is assumed.
For an extended discussion of these results see the full report. Here are some key takeaways:
How large E(M) is under Time of Perils crucially depends on assumptions about value growth (it is 11 million times bigger under cubic value compared to constant).
For constant value, as we vary the assumed risk and persistence, E(M) stays within one order of magnitude above or below the median value in Table 2. For linear and quadratic it’s within two orders of magnitude.
Adding another filter keeps E(M) in the same order of magnitude, and only reduces it by about 25%, under the default parameters in the Notebook.
Given a fixed persistence, there’s still extreme variability: the minimum E(M) is roughly 8 orders of magnitude smaller than the maximum.
This extreme difference can be put succinctly: suppose that the units were metres travelled as you walk away from London Bridge. The smallest value implies you’d walk 17cm, about the length of a pencil, whereas the largest means that you’d walk from London to Sydney.
The Role of Persistence
Two remarks seem worth making. First, persistence plays a key role in the value of risk mitigation. For example, in Figure 7 below, depending on persistence, E(M) can increase by up to 30 times. Second, we suggest an empirical hypothesis that persistence is unlikely to be higher than 50 years. The reasoning here is that there might be interventions that reduce risk by a lot but not for very long, or by a little but for a long time; actions that drastically reduce risk and do so for a long time are rare. Jointly, these two remarks entail that the value of risk mitigation is between one ten-thousandth of a vc (under constant risk and value) and two billion vc (under cubic value and Time of Perils, assuming f is one basis point), a considerable range.[22]
To illustrate the role of persistence consider the following picture, which plots E(w) versus persistence in the constant risk and value case for f=0.0001.
Increasing persistence is important but it exhibits decreasing marginal returns in the concave fashion illustrated above.
This result matches our intuitions. Because of its cumulative nature, the probability of avoiding extinction in the near-term is much higher than avoiding it long-term. That means that the value contributions to E(w), which also impact E(M), are much higher in the short term than in the long term, when they are heavily discounted by the probability of them taking place. So the marginal gains from increasing persistence are much higher in the short term than in the long term. In other words, for example, adding 1 year of persistence to a mitigation action whose effects last 1 year is much more valuable than adding 1 year of persistence to a mitigation action whose effects last 100 years. A general lesson follows: performing actions that have larger persistence is key, but increasing persistence is particularly valuable for low persistence values.
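The concavity is easy to reproduce. A sketch under constant risk and constant value, with illustrative parameters:

```python
import numpy as np

def expected_value(r, v):
    return float(np.sum(np.cumprod(1.0 - r) * v))

T, p, f = 10_000, 0.002, 0.0001
risk, value = np.full(T, p), np.ones(T)

# E(M) rises with persistence P, but each additional period adds less.
for P in [1, 5, 50, 500, 2000]:
    mitigated = risk.copy()
    mitigated[:P] *= (1.0 - f)
    print(P, expected_value(mitigated, value) - expected_value(risk, value))
```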
Concluding Remarks
This report is restricted in its scope and has a number of limitations. If there is enough value and interest in this type of work, our follow-up research could include:
a friendlier online platform with sliders and buttons to select and tweak the scenarios users want to visualise
explicit closed-form expressions for comparative statics, formulae that describe the impact of shifting key parameters on E(M)
explicit uncertainty analyses with Monte Carlo simulations where we graphically observe the importance of key parameters and different upper and lower bounds of E(M) according to a range of scenarios
more sophisticated treatments of persistence
discussions about option value and its role in thinking about existential risk mitigation
modelling efforts that improve value trajectory and could be competitive with extinction risk reduction
including partial catastrophes
formally exploring other events conceptually included in existential risk but not extinction risk
including population growth as a parameter that directly affects values
new scenarios, including explicit treatment of population growth and other non-human sentience
investigating value trajectories that feature negative value
With these limitations in mind, some points of caution about practical upshots include:
Depending on the parameters of exponential decay and the time horizon, convergence under exponentially decaying risk can be misleading; check the Jupyter Notebook for full details.[23]
While the results here might help us arrive at better-informed expected value judgements, this report is not meant to settle questions about how to form an overarching view on the overall value of extinction risk mitigation. A lot more work is needed for that; for instance, our views on risk aversion could play an important role.
Readers should be careful about using the report’s results to perform back-of-the-envelope calculations with new parameters in mind, updating their views by roughly deducting or adding some orders of magnitude. When possible, rerun the code instead.[24]
More broadly, while a more complex model like this one can certainly model things that were previously left out, we have so little data to fit it to that we should be especially cautious about over-updating from specific quantitative conclusions.
This report extended the model developed by Ord, Thorstad and Adamczewski. By enriching the base model, we were able to perform sensitivity analyses, observe convergence and can now better evaluate when extinction risk mitigation could, in expectation, be overwhelmingly valuable, and when it is comparable to or of lesser value than the alternatives. Crucially, we show that the value of extinction risk work varies considerably with different assumptions about the relevant risk and value scenarios. Insofar as we don’t have much confidence in any one scenario, we should form views that reflect this uncertainty and we shouldn’t have much confidence in any particular estimate of the value of risk mitigation efforts.
Previous work has referred to such a risk as ‘existential risk’. But this is a misnomer. Existential risk is technically broader and it encompasses another case: the risk of an event that drastically and permanently curtails the potential of humanity. For the rest of this report we characterise the risk as that of extinction where previous work has used ‘existential’.
The reasoning goes that if there is always a high level of background risk to humanity, then we should expect to go extinct soon anyway, which means the importance of avoiding any one particular risk is not as valuable as it may seem. For more details see the full report here.
In particular, Thorstad explores how, in this model, extinction risk pessimism fails to support and sometimes hinders the thesis that extinction risk mitigation is of astronomical value.
The models thus far centred around mitigating risk for one century only. Thorstad comments on one additional case: when risk is permanently mitigated, calling it ‘global risk reduction’.
We leave A4 untouched because it introduces diminishing returns in risk reduction (see the details Adamczewski discusses), which we find realistic.
That said, the risk and value trajectories usually need adjusting when considering a different time unit. For more details see the section on adjustments in the full report here.
In its most general form, r′ could be any new risk vector that M has brought about. All there is left to evaluate the value of the action is to compute E(w′)−E(w).
Alternatively, an altruistic intervention could seek to improve the future by positively influencing the value trajectory; that is, by bringing about a better v′ rather than a new r′. Such actions deserve a separate analysis.
So far we have been writing E(w) to abbreviate E(w(r,v,T)), where r, v and T are, respectively, the risk vector (sometimes termed ‘risk profile’), the value vector and the maximum number of periods in our universe, which could be infinite. Note that a different class of interventions might focus on changing the value of the world from v=(v1,v2,...) to v′=(v′1,v′2,...), which would also be evaluated as E(w′)−E(w). Exploring these is not within the scope of this report.
Here: vt is the value at time t, c is the cap value that vt can reach and s is the starting value at t=0. vc is a constant, normalised to 1 in all the simulations. More generally, we interpret vc as one year of value in 2023, which in human terms is roughly 8 billion people enjoying life at an average of 0.85 QALYs each.
Other work has considered exponential growth without a cap. There seem to be good reasons to posit a cap, however high, like the physical limits on how much matter is accessible to humans in our expanding universe.
The probability of dying each year that gives a 0.2 probability of dying over 100 years is approximately 0.00222894771 or 0.22%. To see why, consider the following binary outcomes model. Let p be the probability of dying in a given year. The implied probability of surviving for one year is 1−p. The probability of surviving for 100 years consecutively would be (1−p)100. Given that there’s a 0.2 probability of dying over 100 years, the probability of surviving the entire 100 years is 1−0.2=0.8. Thus, (1−p)100=0.8.
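In code, the final step is a one-liner inverting that equation:

```python
p = 1 - 0.8 ** (1 / 100)  # solve (1 - p)**100 == 0.8
print(p)                  # 0.0022289477...
```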
Numerical approximations of the expected value of M converge in this setting for large T, so an infinite universe could be thought of as finite, without loss of generality. See the Convergence section for a discussion of convergence.
On the latter point, calculating the actual difference that our efforts make to the effects of persistence will require future work. For example, imagine you do an action, M, at t=1 that mitigates risk for the next 10 years. If you hadn’t done M, someone else would have taken that same action at t=5. How should we measure the persistence and value of M in this case? The treatment of ‘contingency’ here can help guide our thoughts.
Because of computational limits, the expected value calculation assumes a cap of 120 thousand years. This is more than long enough in most scenarios, where a T this large achieves the same behaviour as T→∞, but nuances arise in the exponential decay case; see the notebook for a thorough discussion of those.
The post was written by Arvo Muñoz Morán. Thank you to the members of the Worldview Investigations Team – David Bernard, Hayley Clatterbuck, Bob Fischer, Laura Duffy and Derek Shiller – Marcus Davis, Toby Ord, Elliott Thornley, Tom Houlden, Loren Fryxell, Lucy Hampton, Adam Binks, Jacob Peacock, Daniel Carey for helpful comments and discussions. The post is a project of Rethink Priorities, a global priority think-and-do tank, aiming to do good at scale. We research and implement pressing opportunities to make the world better. We act upon these opportunities by developing and implementing strategies, projects, and solutions to key issues. We do this work in close partnership with foundations and impact-focused non-profits or other entities. If you’re interested in Rethink Priorities’ work, please consider subscribing to our newsletter. You can explore our completed public work here.
How bad would human extinction be?
This post is a part of Rethink Priorities’ Worldview Investigations Team’s CURVE Sequence: “Causes and Uncertainty: Rethinking Value in Expectation.” The aim of this sequence is twofold: first, to consider alternatives to expected value maximisation for cause prioritisation; second, to evaluate the claim that a commitment to expected value maximisation robustly supports the conclusion that we ought to prioritise existential risk mitigation over all else.
Executive Summary
Background
This report builds on the model originally introduced by Toby Ord on how to estimate the value of existential risk mitigation.
The previous framework has several limitations, including:
The inability to model anything requiring shorter time units than centuries, like AI timelines.
A very limited range of scenarios considered. In the previous model, risk and value growth can take different forms, and each combination represents one scenario
No explicit treatment of persistence –– how long the mitigation efforts’ effects last for ––as a variable of interest.
No easy way to visualise and compare the differences between different possible scenarios.
No mathematical discussion of the convergence of the cumulative value of existential risk mitigation, as time goes to infinity, for all of the main scenarios.
This report addresses the limitations above by enriching the base model and relaxing its key stylised assumptions.
What this report offers
There are many possible risk structure and value trajectory combinations. This report explicitly considers 20 scenarios.
The report examines several plausible scenarios that were absent from the existing literature on the model, like:
decreasing risk (in particular exponentially decreasing) and Great Filters risk.
cubic and logistic value growth; both of which are widely used in adjacent literatures, so the report makes progress in consolidating the model with those approaches.
It offers key visual comparisons and illustrations of how risk mitigation efforts differ in value, like Figure 1 below.
The report is accompanied by an interactive Jupyter Notebook and a generalised mathematical framework that can, with minor input by the user, cope with any arbitrary value trajectory and risk profile they wish to investigate.
This acts as a uniquely versatile tool that can calculate and graph the expected value of risk mitigation.
The user can also adjust all the parameters in the 20 default scenarios.
Takeaways
In all 20 scenarios, the cumulative value of mitigation efforts converges to a finite number, as the time horizon goes to infinity.
This implies that it is not devoid of meaning to talk about the amount of long-term value obtained from mitigating risk, even in an infinitely long universe.
In this context, even if we assign any minuscule credence to one of the scenarios, it won’t overshadow the collective view.
It helps clarify what assumptions would be required for infinite value.
The report introduces the Great Filters Hypothesis:
It states that humanity will face a number of great filters, during which existential risk will be unusually high.
This hypothesis is a more general, and thus more plausible, version of what is commonly discussed under the name ‘Time of Perils’: the one filter case.
Persistence – the risk mitigation’s duration – plays a key role in our estimates, suggesting that work to investigate this role further, and to obtain better empirical estimates of different interventions’ persistence would be highly impactful. Other tentative lessons:
Interventions to increase persistence exhibit diminishing returns, and are most valuable for mitigation efforts exhibiting small persistence.
Great value requires relatively high persistence, and the latter could be implausible.
It is often assumed that, when considering long-term impact, existential risk mitigation is, in expectation, enormously valuable relative to other altruistic opportunities. There are a number of ways that could prove to be false. One possibility, which this report emphasises, is that the vast value of risk mitigation is only found in certain scenarios, each of which makes a whole host of assumptions.
The expected value of risk mitigation therefore strongly depends on our beliefs about these assumptions. And, depending on how we decide to aggregate our credences and which scenarios we allow for, astronomical value might be off the table after all.
Figure 1 (see header’s image): This is a visual representation of the estimated expected value of reducing existential risk by 0.01%. The image is to scale and one cubic unit is the size of the world under constant risk and constant value, the top-left scenario.
This abridged technical report is accompanied by an interactive Jupyter Notebook. The full report is available here.
Recommended: The PDF version of the abridged report can be accessed here.
Abridged Report
Introduction
Consider a catastrophe that permanently ends human civilisation.[1] You might find it plausible that any efforts to reduce the risk of such a catastrophe are of enormous value. You might also be inclined to think that the value is particularly high if the risks are high also. After all, in most contexts, the bigger the risk of something bad happening, the less it can be safely ignored. In other words, you might believe that it is of astronomical importance to mitigate these extinction risks because the stakes are very large and because the probability of these catastrophic scenarios is uncomfortably high. Existing work by Ord, Adamczewski and Thorstad (hereon ‘OAT’) argues that this last sentence is questionable: in the context of an extinction catastrophe, the higher we think the risk is, the less we should value efforts that mitigate that risk.[2]
Our initial intuitions are not always a good guide for how we should think about estimating the value of extinction risk mitigation. Indeed, the unexpected tensions between high pessimism about the risk we face and whether risk mitigation is of astronomical value, are a good example of this.[3] Similarly, simplified attempts and heuristics used to estimate the cost-effectiveness of risk reduction ---- such as those in 1, 2, 3, 4, 5 ---- turn out to only be appropriate in a handful of very restricted scenarios (usually where value and risk are constant in all the periods), and they otherwise mischaracterise the value of extinction risk mitigation.
If we want to evaluate the general merits of interventions that seek to safeguard humanity’s future, we need a systematic way to estimate the value of mitigating extinction risk. The current frameworks help us understand which scenarios might lead to astronomical value. However, they have several limitations that make it difficult, or sometimes impossible, to comment meaningfully on the amount of good that mitigating risk in the next few decades could achieve. This report builds on the existing models and provides tools to estimate the value of mitigating risk in more realistic settings.
The Base Model
As a first attempt to provide a more rigorous analysis, existing work presents a stylised model to assess the value of extinction risk mitigation given the following assumptions:
A1 Each century of human existence has some constant value.
A2 Humans face a constant level of per-century extinction risk.
A3 No value will be realised after an extinction catastrophe.
A4 Risk is reduced by a fraction.
A5 Risk is only reduced this century.
A6 Centuries are the shortest time units.
The model is clearly oversimplified, and, indeed, previous work has partially relaxed a subset of these six assumptions.[4] However, there are still several limitations present in those frameworks.
OAT Limitations
Some of the main limitations of the previous work include:
The current models lack the necessary resolution to yield results that are relevant for, or incorporate observations from, key issues like near-term AI timelines. The models cannot presently handle anything requiring shorter time units than centuries.
The duration of a mitigation action’s effects affects its overall value. However, OAT has not explored how varying the duration of these effects may impact the model.[5]
There are many possible scenarios (i.e., combinations of risk and value trajectories), and OAT has explored very few of these. Given our large uncertainty in this area, it is a priority to have a clear picture of how the value compares in each case. This will provide the necessary tools for future work that assigns credences to each scenario to arrive at better-informed expected value judgements.
There are currently no versatile frameworks that can calculate the expected value of mitigating risk, for a given set of idiosyncratic beliefs about risk and value trajectories.
As time goes to infinity, the expected value of existential risk mitigation could, in principle, be infinite; making most scenario comparisons redundant in those cases. There has been no formal discussion of the convergence of the value of extinction risk mitigation for all of the main scenarios.
Key Research Questions
The present report aims to tackle all of the above limitations. With that in mind, the key guiding questions are:
When is the value—of the future and of risk mitigation—particularly large and when is it not?
What is the Great Filter Hypothesis, how does it relate to the Time of Perils and what is the impact of adding great filters on the value of risk mitigation?
What are the qualitative pictures of the expected value of the world—and thus of mitigation efforts—given different risk structures (e.g. linear, Time of Perils, Great Filters, decaying) and value growth cases (e.g. linear, quadratic, cubic, logistic)?
How does the value of mitigation efforts depend on their persistence?
The main ambition here is to develop a generalised version of the toy model that relaxes all assumptions above, except for A3, no value after extinction, and A4, fractional risk reduction.[6][7] By relaxing A1 and A2 -- that the value and risk are constant—we are able to introduce a framework that can accommodate more complex risk structures and sophisticated value trajectories. We also depart from existing analyses by relaxing A6: here, years are the shortest time unit. Moreover, by also relaxing A5, the model now has tools to observe persistence of mitigation effects lasting less (or more) than one century and can meaningfully comment on the near-term value of extinction risk mitigation. Using this generalised framework, we can systematically assess the value of risk mitigation under various combinations of assumptions.
Generalised Model: Arbitrary Risk Profile
Let us consider the expected value of a world wthat faces an existential risk rt at time t. This is best observed with a picture.
At each period t the world ends with probability rt and all possible future value is reduced to zero. On the other hand, with probability (1−rt), the world progresses to the next period and achieves value vt, which is added to the total pool of value it had accrued. Figure 2 summarises all of this. The expected value is the value of each branch weighted by the probability of reaching that value. That is
E(w)=r1⋅0+(1−r1)v1+(1−r1)r2⋅0+(1−r1)(1−r2)v2+(1−r1)(1−r2)r3⋅0+(1−r1)(1−r2)(1−r3)v3+...In other words, the expected value of this world is
E(w)=(1−r1)v1+(1−r1)(1−r2)v2+(1−r1)(1−r2)(1−r3)v3+...=T∑t=1[(t∏j=1(1−rj))vt].[Equation 1]
where the maximum number of periods T is the age of the universe when it ends, and T→∞ when we assume an infinite universe. We do not impose that T→∞ or otherwise to give the flexibility to consider cases where there is some known, exogenous, end to the universe. Throughout this document, the length of a period will equal one year. However, the results are not tied to any particular interpretation of period length.[8]
Now consider a risk mitigation action M which reduces the original risk sequence from r to r′, where, for some t, r′t=(1−f)rt and f∈(0,1) is the fraction of the risk that is successfully mitigated.[9][10] What value have we added by performing action M? In the most basic sense, we have changed the expected value of the future by
E(M)=E(w′)−E(w),where our action modified the original risk from r in world w to r′ in w′.[11] More generally, we could allow f≤0, which would amount to increasing the risk and M would produce negative value (or none at all if f=0). For example, f<0 if M made a nuclear war more likely by contributing to political instability. For the rest of the report we focus on non-negative value.
Value
Denote v as v1,v2,v3,v4,... as the sequence of values that the world will follow, conditional on the world existing at time t. Estimating this sequence is no trivial undertaking. There is large uncertainty in this area and considerable research is needed for us to insert reasonable values into the sequence v. Given this uncertainty, a promising approach is to develop a more flexible framework, i.e. the generalised model above and its accompanying code in the Jupyter Notebook, that is versatile enough to handle a wide range of cases. Next, we will investigate several possible paths for value growth, in particular: constant, linear, quadratic, cubic and logistic.
Value Cases Summary
Here is a table summary of the main value cases this report will investigate.[12] When the time unit is years instead of centuries, the value is adjusted to reflect this (see the full report here for the details). Cubic has previously been adopted for modelling interplanetary expansion. Logistic can be thought of as ‘exponential with a value cap’, a model that has special economic relevance.[13]
Table 1: Summary of vt Cases
Here is a visual summary.
Persistence
Extinction risk mitigation actions could have effects that last different amounts of time. We may have reasons to believe that an action will reduce risks only for a few years; for example, passing a bill that restricts AI compute which is expected to be overturned after the next election cycle in 5 years. Other actions could last longer; for example, a shield in space that physically protects Earth from asteroid impact could be effective for thousands of years. Or, in the extreme case, an action could reduce extinction risk forever. In this report, we refer to the length of the mitigating effect of an action as its persistence.
Persistence is key in evaluating the value of an action M. In the Ord model, the persistence of M has been assumed to be of exactly one period (which equals one century in that setting). Thorstad proceeds with the same assumption and briefly considers the permanent case as well. Because persistence plays such an important role, we developed a more flexible framework where we allow persistence P to be anything between one period and permanently reducing risk, i.e. P∈Z+.
An investigation of persistence likely deserves a report of its own, both for a theoretical and empirical treatment of the issue. For now we will assume that M mitigates risk for P periods, without delay. We illustrate how results differ by presenting five representative cases: P=1,5,50,500,2000.
So, for example, if we had a risk profile of r=(0.5,0.5,0.2,0.4,0.1,0.2,...) and M acts at the first period with persistence P=3 and an efficacy of f=0.5, halving the risk, the profile then becomes: r′=(0.25,0.25,0.1,0.4,0.1,0.2,...).
A Concrete Example
There are too many cases for us to explicitly consider each one in the exposition of this report. Instead, they are systematically solved for and implemented in the code; so the user can see the results for any one desired scenario. However, it is pedagogically valuable to explicitly discuss one of these cases here.
Suppose that performing M halves the risk with a 5-year persistence. Let us also add some complexity to the risk structure, so it takes two constant values. Suppose that there is a 0.22% annual risk, which approximates a one in five chance of surviving the end of the century, under the assumption that it remains constant for the next 100 years.[14] Suppose that, for no particular reason, the annual risk after those 100 years is 0.01%.[15] That is r=(0.22%,0.22%,...,0.22%,0.01%,0.01%,...). Suppose, for this exercise, that this universe lasts 10,000 years.[16] We also normalise the value unit to vc=1. What is the value of performing M? It is
E(M)=E(w′)−E(w)≈28.6.It is worth roughly 28.6vc to perform M under these assumptions, where vc is this year’s value of our world.
The Rest of this Report
So far, we have thought about risk in the abstract. Indeed, what we have outlined is enough for us to evaluate any arbitrary risk and value structure that we may want to test. See the Jupyter Notebook to try this yourself.
However, there are specific risk structures that we might be especially interested in evaluating. We might be inclined to believe certain stories about risk; for example, that it will systematically decline (like in the Decaying Risk section). Alternatively, we might want to pay heed to the commonly held view that humanity is living in a particularly risky period now, but will reach a low-risk future if it overcomes the present challenges. The concrete example above is an instance of this, assuming constant value. Thorstad states this view, termed the ‘Time of Perils’ sis and discussed more thoroughly here, as:
We explore this type of risk structure next.
Great Filters and the Time of Perils Hypothesis
Humanity is potentially facing unprecedented threats from nuclear weapons, engineered pandemics and advanced artificial intelligence, among others. It may be that we are living in perilous times. If we do well, we might escape these dangers. But who’s to say that there will be no comparable challenges in the future? The perilous times might return.
The reasoning above introduces the notion of great filters: hurdles that our civilisation must pass to ensure its long-term longevity (Hanson, 1998).[17] Specific details as to what these filters might be are beyond this work. But if AI is the first filter, we could easily imagine future ones such as escaping our dying sun or meeting powerful and unfriendly alien life. The great filter hypothesis tells us:
It follows that, by construction, the Time of Perils hypothesis is the one filter version of GFH. For the purposes of this report, let us consider a stylised model of GFH where:
There are F filters (e.g. F=2).
There are 2F ‘eras’, sets of periods within which risk is constant. Filters are high-risk eras.
Filters and low-risk eras alternate, starting with a filter.
The length of each era is given by ℓ=(ℓ1,ℓ2,...,ℓ2F).
At each era i, humanity faces a per-period constant risk gi, and g denotes the vector (g1,g2,...g2F).
For example, suppose that we had F=2, such that there are two filters, with two lower-risk eras of lower risk after each of them. Suppose that g=(r1,rlow,r2,rlow), ℓ=(100,500,100,10100) and that value is constant. From this we could write the expected value of such a world as
E(w)=100∑t=1vc(1−r1)tFirst filter+(1−r1)100500∑t=1vc(1−rlow)tLow-risk era+(1−r1)100(1−rlow)500100∑t=1vc(1−r2)tSecond filter+(1−r1)100(1−rlow)500(1−r2)10010100∑t=1vc(1−rlow)tLow-risk era
Decaying Risk
Optimistically, we could live in a world where humanity is progressively getting better at surviving. One way of modelling this is with decreasing risk, and in particular, we can specify an exponentially decreasing function; where r∞∈[0,1) is the risk as t→∞ , λ∈(0,1) is the decay rate, t is the period, r(t) is the risk in period t and the starting risk is r0+r∞≈r0∈(0,1) for small r∞. For the first few periods, the sequence is approximately: r0, r0e−λ, r0e−2λ, r0e−3λ, … More generally,
r(t)=r0⋅e−λt+r∞.Risk Cases Summary
A graph summarising the main cases of interest can be found below.
Results
Convergence
As time goes to infinity, the expected value of existential risk mitigation could, in principle, be infinite. This would render comparing different estimates of E(M) redundant.[18] To investigate when this might happen, we turn our attention to convergence next.
We know that for any finite T, Equation 1 is bounded.[19] A key issue is whether the expected value of the world converges in an infinite universe. When T→∞, the series for the expected value of a world, E(w), as described in Equation 1, is given by the infinite sum
$$E(w) = \sum_{t=1}^{\infty} \left[\left(\prod_{j=1}^{t} (1 - r_j)\right) v_t\right].$$

For this kind of series, we can use the Ratio Test to evaluate its convergence. The Ratio Test states that for a series $\sum_{n=1}^{\infty} a_n$, if the limit $L = \lim_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right|$ exists, then the series converges absolutely if $L < 1$, diverges if $L > 1$, and the test is inconclusive if $L = 1$. To apply the Ratio Test to E(w), we look at the ratio of consecutive terms of the series.
$$L = \lim_{t\to\infty} \left|\frac{\left(\prod_{j=1}^{t+1}(1-r_j)\right) v_{t+1}}{\left(\prod_{j=1}^{t}(1-r_j)\right) v_t}\right| = \lim_{t\to\infty} \left|\frac{v_{t+1}}{v_t}\right| (1 - r_{t+1}).$$

Recall that $r_t \in (0,1)$ for all $t$, so $(1 - r_t)$ also lies within $(0,1)$ for all $t$. Thus, if $r_t$ converges to a positive scalar, the exact risk level will not affect convergence. Instead, the convergence of the series E(w) critically depends on $\lim_{t\to\infty} \left|\frac{v_{t+1}}{v_t}\right|$. In particular, we find that this limit is less than or equal to 1 in our cases of interest; since $(1 - r_{t+1})$ tends to a number strictly below 1, it follows that $L < 1$ and E(w) converges absolutely. The full details can be found in the report but, as an example, consider the n-polynomial case, which is a more general version of all the cases except logistic.
Consider the n-polynomial case $v_t = v_c t^n$. Then:
$$\lim_{t\to\infty} \left|\frac{v_{t+1}}{v_t}\right| = \lim_{t\to\infty} \frac{(t+1)^n}{t^n} = \lim_{t\to\infty} \left(1 + \frac{1}{t}\right)^n = 1.$$

Under logistic value growth, $\lim_{t\to\infty} \left|\frac{v_{t+1}}{v_t}\right| = 1$ also. Hence, in the context of the various scenarios we've explored, we are now ready to present the following result:

Proposition 1. Suppose that the risk tends to a nonzero value as $t \to \infty$. Then E(w), the expected value of the world, converges.
Proof. See the full report. An intuition: asymptotically, the probability of survival shrinks every period by a roughly constant proportion, while value is either constant or growing polynomially at a proportionally shrinking rate. Therefore the expected value contribution of sufficiently distant periods approaches zero. ◻
Maintain the assumption that the risk tends to a nonzero value. As an immediate consequence of the above proposition, we have:

Corollary 1. E(M), the expected value of mitigating risk, converges.
Proof.
$$E(M) = E(w') - E(w),$$

and, by Proposition 1, both E(w′) and E(w) converge. ◻
These results tell us that it is meaningful to talk about the long-term value of risk mitigation, even in the infinite universe case. Moreover, however great the value might be, it is simply not infinite. We estimate the size of this value in the next section. It should be emphasised that the scope of Proposition 1 and Corollary 1 is the set of scenarios that this report considers, not all possible ways of modelling risk and value. For example, the proofs fail when the risk exponentially decays to zero, or when value grows exponentially without a cap.
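As a quick numerical sanity check (our own sketch; the parameters are assumptions), the partial sums of E(w) under constant risk and cubic value level off rather than growing without bound, in line with Proposition 1:

```python
import numpy as np

r, n, v_c = 0.001, 3, 1.0   # assumed constant risk; cubic value v_t = v_c * t^n
t = np.arange(1, 120_001)   # horizon mirroring the report's 120,000-year cap
terms = (1 - r) ** t * v_c * t.astype(float) ** n  # survival probability times period value
partial_sums = np.cumsum(terms)

# The later partial sums barely change: the series has converged numerically.
print(partial_sums[999], partial_sums[59_999], partial_sums[-1])
```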
The Expected Value of Mitigating Risk Visualised
First, we present Figure 5, a grid which summarises the expected value of the future in the absence of risk mitigation efforts.
The first column indicates the value case, the first row the risk case, and the middle plots display the cumulative E(w) as time passes for each risk and value combination. Notice that in all cases E(w) converges as T→∞. This is only indirectly related to the Convergence section, which is about the convergence of E(M) rather than the expected value of the future. For the middle plots, the horizontal axis spans from year zero (today) to year 140,000; for visibility, the exponential decay case is displayed up to year 100,000 instead. The vertical axis scale differs across plots so that each graph remains legible. For example, constant risk under linear value is in the thousands of $v_c$, while Two Great Filters under logistic value is in the billions of $v_c$, where $v_c$ is always normalised to one. The default parameters for these simulations can be found and modified in the Notebook.
Next, in Figure 6 we plot E(w) with and without performing M for all twenty scenarios. We do this for a range of persistence levels and, for purely pedagogical reasons, we assume an extreme efficacy of f=50% reduction in the risk from performing M.
In the grid above, to calculate E(M) for a specific case, we first take the dotted curve that gives the expected value of the world after performing the action, under a particular scenario and at a certain persistence. Then we subtract the baseline E(w) without mitigation; that is, we subtract the solid blue curve from the chosen dotted curve.
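In code, this subtraction is straightforward. The sketch below is our own hedged implementation, assuming the mitigation multiplies risk by (1−f) for the first `persistence` periods; the names and defaults are illustrative, not the Notebook's.

```python
def expected_value(risk_fn, value_fn, T=120_000):
    """E(w): sum over t of the survival probability up to t times v_t."""
    total, survival = 0.0, 1.0
    for t in range(1, T + 1):
        survival *= 1 - risk_fn(t)      # probability of surviving through period t
        total += survival * value_fn(t)
    return total

def expected_value_of_mitigation(risk_fn, value_fn, f=0.0001, persistence=5, T=120_000):
    # The mitigated world faces risk scaled by (1 - f) while the effects
    # persist, and the unmitigated baseline risk afterwards.
    mitigated = lambda t: (1 - f) * risk_fn(t) if t <= persistence else risk_fn(t)
    return expected_value(mitigated, value_fn, T) - expected_value(risk_fn, value_fn, T)

# One basis point of efficacy under constant risk and constant value:
print(expected_value_of_mitigation(lambda t: 0.001, lambda t: 1.0))
```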
When discussing the value, and eventually the cost-effectiveness, of risk mitigation, a useful and more realistic efficacy f is one basis point: f=0.0001. Table 2 below shows E(M) for all the scenarios of interest.
Though we show long persistence levels above, we are suspicious of them, both because effects are blunted by political or technological changes and because, given enough time, some actor is likely to perform an action that achieves similar effects.[20]
Given the differences in orders of magnitude, it can be difficult to compare the figures in this table directly. To facilitate this, we display Figure 1: a visual representation of the estimated expected value of reducing existential risk by 0.01%.[21] The image is to scale, and one cubic unit is the expected value of the world under constant risk and constant value, the top-left scenario. A persistence of 5 years is assumed.
For an extended discussion of these results see the full report. Here are some key takeaways:
The order of magnitude of E(M) under Time of Perils depends crucially on assumptions about value growth (it is 11 million times larger under cubic value than under constant value).
For constant value, as we vary the assumed risk and persistence, E(M) stays within one order of magnitude above or below the median value in Table 2. For linear and quadratic value, it stays within two orders of magnitude.
Adding another filter keeps E(M) in the same order of magnitude, reducing it by only about 25% under the default parameters in the Notebook.
Given a fixed persistence, there’s still extreme variability: the minimum E(M) is roughly 8 orders of magnitude smaller than the maximum.
This extreme difference can be put succinctly: suppose the units were metres travelled as you walk away from London Bridge. The smallest value implies you'd walk 17 cm, about the length of a pencil, whereas the largest means you'd walk from London to Sydney.
The Role of Persistence
Two remarks seem worth making. First, persistence plays a key role in the value of risk mitigation: for example, in Figure 7 below, E(M) can increase by up to 30 times depending on persistence. Second, we suggest an empirical hypothesis that persistence is unlikely to exceed 50 years. The reasoning is that there may be interventions that reduce risk substantially for a short time, or reduce it slightly for a long time, but actions that drastically reduce risk and do so for a long time are rare. Jointly, these two remarks entail that the value of risk mitigation lies between one ten-thousandth of a $v_c$ (under constant risk and value) and two billion $v_c$ (under cubic value and Time of Perils, assuming f is one basis point), a considerable range.[22]
To illustrate the role of persistence, consider the following picture, which plots E(w) against persistence in the constant risk and value case for f=0.0001.
Increasing persistence is important, but it exhibits decreasing marginal returns, in the concave fashion illustrated above.
This result matches our intuitions. Because risk compounds over time, the probability of avoiding extinction in the near term is much higher than in the long term. That means the value contributions to E(w), which also drive E(M), are much larger in the short term than in the long term, when they are heavily discounted by the probability of their taking place. So the marginal gains from increasing persistence are much higher in the short term than in the long term: adding 1 year of persistence to a mitigation action whose effects last 1 year is much more valuable than adding 1 year to an action whose effects last 100 years. A general lesson follows: performing actions that have larger persistence is key, but increasing persistence is particularly valuable at low persistence levels.
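The concavity can be reproduced with a short, self-contained sweep over persistence values (again a sketch under assumed parameters, using vectorised survival probabilities):

```python
import numpy as np

def em_constant(r=0.001, f=0.0001, persistence=5, T=120_000):
    # E(M) under constant risk r and constant value v_c = 1: risk is scaled
    # by (1 - f) for the first `persistence` periods, then reverts to r.
    t = np.arange(1, T + 1)
    baseline = np.cumprod(np.full(T, 1.0 - r)).sum()
    mitigated_risk = np.where(t <= persistence, (1 - f) * r, r)
    mitigated = np.cumprod(1.0 - mitigated_risk).sum()
    return mitigated - baseline

# Marginal gains shrink as persistence grows:
for p in (1, 5, 10, 25, 50, 100):
    print(f"persistence = {p:>3} years -> E(M) = {em_constant(persistence=p):.6f}")
```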
Concluding Remarks
This report is restricted in its scope and has a number of limitations. If there is enough value and interest in this type of work, our follow-up research could include:
a friendlier online platform with sliders and buttons to select and tweak the scenarios users want to visualise
explicit closed-form expressions for comparative statics, formulae that describe the impact of shifting key parameters on E(M)
explicit uncertainty analyses with Monte Carlo simulations where we graphically observe the importance of key parameters and different upper and lower bounds of E(M) according to a range of scenarios
more sophisticated treatments of persistence
discussions about option value and its role in thinking about existential risk mitigation
modelling efforts that improve value trajectory and could be competitive with extinction risk reduction
including partial catastrophes
formally exploring other events conceptually included in existential risk but not extinction risk
new scenarios, including explicit treatment of population growth as a parameter that directly affects value, and of other non-human sentience
investigating value trajectories that feature negative value
With these limitations in mind, some points of caution about practical upshots include:
Depending on the parameters of exponential decay and the time horizon, convergence under exponentially decaying risk can be misleading; check the Jupyter Notebook for full details.[23]
While the results here might help us arrive at better-informed expected value judgements, this report is not meant to settle questions about how to form an overarching view on the overall value of extinction risk mitigation. A lot more work is needed for that, for instance, our views on risk aversion could play an important role.
Readers should be careful about using the report's results to perform back-of-the-envelope calculations with new parameters in mind, updating their views by roughly adding or subtracting some orders of magnitude. When possible, rerun the code instead.[24]
More broadly, while a more complex model like this one can certainly model things that were previously left out, we have so little data to fit it to that we should be especially cautious about over-updating from specific quantitative conclusions.
This report extended the model developed by Ord, Thorstad and Adamczewski. By enriching the base model, we were able to perform sensitivity analyses, establish convergence, and better evaluate when extinction risk mitigation could, in expectation, be overwhelmingly valuable, and when it is comparable to or of lesser value than the alternatives. Crucially, we show that the value of extinction risk work varies considerably under different assumptions about the relevant risk and value scenarios. Insofar as we don't have much confidence in any one scenario, we should form views that reflect this uncertainty, and we shouldn't have much confidence in any particular estimate of the value of risk mitigation efforts.
Previous work has referred to such a risk as ‘existential risk’. But this is a misnomer. Existential risk is technically broader and it encompasses another case: the risk of an event that drastically and permanently curtails the potential of humanity. For the rest of this report we characterise the risk as that of extinction where previous work has used ‘existential’.
The reasoning goes that if there is always a high level of background risk to humanity, then we should expect to go extinct soon anyway, which means avoiding any one particular risk is not as valuable as it may seem. For more details see the full report here.
In particular, Thorstad explores how, in this model, extinction risk pessimism fails to support and sometimes hinders the thesis that extinction risk mitigation is of astronomical value.
For example, Thorstad relaxes each of the A1, A4 and A5 assumptions.
The models thus far centred around mitigating risk for one century only. Thorstad comments on one additional case: when risk is permanently mitigated, calling it ‘global risk reduction’.
We leave A4 untouched because it introduces diminishing returns in risk reduction (see the details Adamczewski discusses), which we find realistic.
A3 is a core assumption in the extended and simplified versions of this model. Relaxing it would amount to changing the approach completely.
That said, the risk and value trajectories usually need adjusting when considering a different time unit. For more details see the section on adjustments in the full report here.
In its most general form, r′ could be any new risk vector that M has brought about. All that is left to evaluate the value of the action is to compute E(w′)−E(w).
Alternatively, an altruistic intervention could seek to improve the future by positively influencing the value trajectory; that is, by bringing about a better v′ rather than a new r′. Such actions deserve a separate analysis.
So far we have been writing E(w) to abbreviate E(w(r,v,T)), where r, v and T are, respectively, the risk vector (sometimes termed 'risk profile'), the value vector and the maximum number of periods in our universe, which could be infinite. Note that a different class of interventions might focus on increasing the value of the world from $v = (v_1, v_2, \ldots)$ to $v' = (v'_1, v'_2, \ldots)$, which would also result in negative value according to E(w)−E(w′). Exploring these is not within the scope of this report.
Here, $v_t$ is the value at time $t$, $c$ is the cap that $v_t$ can reach and $s$ is the starting value at $t=0$. $v_c$ is a constant, normalised to 1 in all the simulations. More generally, we interpret $v_c$ as one year of value in 2023, which in human terms is roughly 8 billion people enjoying life at an average of 0.85 QALYs each.
Other work has considered exponential growth without a cap. There seem to be good reasons to posit a cap, however high, such as the physical limits on how much matter is accessible to humans in our expanding universe.
The probability of dying each year that gives a 0.2 probability of dying over 100 years is approximately 0.00222894771, or 0.22%. To see why, consider the following binary outcomes model. Let $p$ be the probability of dying in a given year. The implied probability of surviving one year is $1-p$, so the probability of surviving 100 consecutive years is $(1-p)^{100}$. Given that there's a 0.2 probability of dying over 100 years, the probability of surviving the entire 100 years is $1-0.2=0.8$. Thus, $(1-p)^{100} = 0.8$, and solving for $p$ gives $p = 1 - 0.8^{1/100} \approx 0.00222894771$.
This is congruent with a $(1-0.0001)^{100} \approx 0.99004933869$ probability of surviving each century.
Numerical approximations of the expected value of M converge in this setting for large T so an infinite universe could be thought of as finite, without loss of generality. See the Convergence section for a discussion of convergence.
An excellent informal introduction to great filters can be found here.
Tentatively, ordering infinite cardinalities could be a good option in those cases.
For example, by $T \cdot \max\{v_1, v_2, \ldots, v_T\}$.
On the latter point, calculating the actual difference that our efforts make to the effects of persistence will require future work. For example, imagine you perform an action, M, at t=1 that mitigates risk for the next 10 years. If you hadn't done M, someone else would have taken that same action at t=5. How should we measure the persistence and value of M in this case? The treatment of 'contingency' here can help guide our thoughts.
Because of computational limits, the expected value calculation assumes a cap of 120 thousand years. This is more than long enough in most scenarios, where a T this large achieves the same behaviour as T→∞, but nuances arise in the exponential decay case; see the notebook for a thorough discussion of those.
Recall the previous footnote defining vc.
In particular, Figure 1's exponential decay values were approximated using the first 100,000 years.
I’m happy to help with this.
Acknowledgements
The post was written by Arvo Muñoz Morán. Thank you to the members of the Worldview Investigations Team – David Bernard, Hayley Clatterbuck, Bob Fischer, Laura Duffy and Derek Shiller – and to Marcus Davis, Toby Ord, Elliott Thornley, Tom Houlden, Loren Fryxell, Lucy Hampton, Adam Binks, Jacob Peacock and Daniel Carey for helpful comments and discussions. The post is a project of Rethink Priorities, a global priority think-and-do tank, aiming to do good at scale. We research and implement pressing opportunities to make the world better. We act upon these opportunities by developing and implementing strategies, projects, and solutions to key issues. We do this work in close partnership with foundations and impact-focused non-profits or other entities. If you're interested in Rethink Priorities' work, please consider subscribing to our newsletter. You can explore our completed public work here.