Conflicting Effects of Existential Risk Mitigation Interventions

Introduction

There are multiple types of existential catastrophes, and different risk mitigation interventions often address only certain types. Some interventions mitigate only human extinction risk, others decrease risk to all life, and others affect several types. Interventions that decrease human extinction risk lengthen the period during which human-caused astronomical suffering could plausibly occur, thereby increasing suffering risk (s-risk). That is, there may be an offsetting effect that decreases the expected positive impact of an intervention, or even turns it negative.

The question we raise is this: are interventions that decrease only or primarily human extinction risk bad because decreasing human extinction risk increases s-risk?

There are Different Types of Existential Catastrophes

Human Extinction

In this scenario, all humans die. One negative consequence of human extinction is that we would no longer be able to improve life for wild animals. Without us on the planet, they would presumably continue to exist in some form for a long time, which could constitute extreme suffering. Even in this scenario, however, a non-human animal could evolve human-level intelligence and reopen the same s-risks. There could also be intelligent life elsewhere in the universe that humans are unaware of, which leaves open a route to astronomical suffering.

Animal Extinction

This scenario would involve the elimination of all sentient life. If all of animalia were to go extinct, the probability of another life form on Earth developing human-level intelligence would decrease even further, since it would have to evolve from plants. This scenario would eliminate extreme suffering at least until sentient life evolved again.

Life Extinction

In this scenario, all life is eliminated, and it would be difficult for sentient life to return.

Extreme Suffering

Some scenarios (such as the one we’re currently in) contain suffering that is extreme but not astronomical. Examples of this type of suffering include the experiences of many farmed and wild animals.

Astronomical Suffering

LessWrong defines astronomical suffering as “the creation of intense suffering in the far future on an astronomical scale, vastly exceeding all suffering that has existed on Earth so far.” This scenario would be more likely to occur with digital sentience, artificial intelligence, intergovernmental conflict, and interplanetary life.

Different Interventions Have Different Effects on Different Types of Existential Risk

Pandemic Shelters

Pandemic shelters are one proposed intervention in the biosecurity space to mitigate human extinction risk. In the case of a worldwide pandemic, a group of people would be able to retreat to a well-stocked bunker with advanced ventilation and reemerge once the threat is gone. Depending on the implementation of the shelter and the type of virus that spreads, pandemic shelters could mitigate human, animal, or life extinction risk.

Understanding S-Risk

Research on suffering risks, such as work done by the Center on Long-Term Risk and the Center for Reducing Suffering, probably decreases s-risk much more than extinction risk. To the extent that it does decrease extinction risk, that effect probably does not cause a net increase in s-risk.

AI Alignment

AI alignment work seems to mitigate both extinction risk and s-risk from AI. Insofar as it mitigates extinction risk, however, it also increases s-risk. The magnitude of each effect is unclear, although solving the alignment problem would substantially decrease s-risk.

Strengthening Institutions

This type of work also seems to mitigate both extinction risk and s-risk. Depending on the specific project, it may increase collaboration and coordination between organizations, prevent undemocratic forms of government, and build trust. Such work could plausibly decrease extinction risk from nuclear war, for example. It could also strengthen respect for human rights and mitigate agential s-risks.

Conclusion

When determining the expected impact of a given intervention, it’s important to consider offsetting effects. Longtermists searching for robustly good interventions that perform well under many moral frameworks should take into consideration conflicting effects on human extinction risk and s-risk. We would like to note that the primary purpose of this piece is to bring the conflict to attention, not to adjudicate the ultimate value of any given intervention. Adding this consideration may also warrant updating one’s perception of the differences in impact between interventions downward.

We look forward to reading your feedback in the comments.

Complex Model

We have included a rough Discrete Time Markov Decision Process model that tries to lay out our thinking in more detail, but it has not been fully fleshed out.

Let M be a Discrete Time Markov Decision Process (S, A, P, R) modeling an individual decision maker where each element is defined in the following manner:

S := {n, e, s}, where n: “neither extinction event nor astronomical suffering event occurs”; e: “extinction event occurs”; s: “astronomical suffering event occurs”.

A := {n, e, s}, where n: “working on neither extinction event nor astronomical suffering risks”; e: “working on decreasing the probability of extinction events”; s: “working on decreasing the probability of astronomical suffering events”.

P := (s, a, s’) → [0,1] such that sum(P(s,a,s’) over s’) = 1 for each tuple (s,a), where P(s,a,s’) is the probability of transitioning to state s’ given that one is in state s and takes action a. For example, P(n,e,e) is the probability of transitioning to an extinction event given that one is in a world where neither an extinction event nor an astronomical suffering event has occurred and is working on extinction risk mitigation.

R := (s) → ℝ is the reward function mapping each state to a numeric (moral) value. For example, R(n) is the moral value of being in a world where neither an extinction event nor an astronomical suffering event has occurred.

We want to find a policy that maximizes expected reward.

Our intuitions are as follows:

  1. R(n) is positive and perhaps (but not necessarily) unbounded.

    1. That is, the moral value of being in a world with neither extinction events nor astronomical suffering events is net positive with respect to being in a world with an extinction event.

  2. R(e) is 0.

    1. That is, the moral value of being in an extinction state lies somewhere between that of a world with neither an extinction event nor an astronomical suffering event and that of a world with an astronomical suffering event. To facilitate analysis, we assign R(e) a value of 0.

  3. R(s) is negative and perhaps (but not necessarily) unbounded.

    1. That is, the moral value of being in a world with an astronomical suffering event is net negative with respect to being in a world with an extinction event.

  4. |R(s)| >> |R(n)|

    1. That is, measured relative to a world with an extinction event, a world with an astronomical suffering event is much worse than a world with neither event is good.

  5. P(n,e,s) = P(n,n,s)

    1. That is, the per-step probability of transitioning to an astronomical suffering scenario given that one works on decreasing the probability of extinction events is equal to the per-step probability of transitioning to an astronomical suffering scenario given that one works on neither decreasing extinction risks nor decreasing astronomical suffering risks.

  6. The probability over the time horizon of an astronomical suffering event occurring is larger given that you’re working on extinction risk mitigation compared to not working on extinction risk mitigation, because a world that avoids extinction remains exposed to astronomical suffering risk for longer.

  7. P(e,{n,e,s}, e) = 1

    1. That is, the extinction state e is an absorbing state. Informally, once a transition to an extinction event scenario occurs, it is impossible to transition to an astronomical suffering scenario or a scenario with neither an extinction event nor an astronomical suffering event.

  8. P(s,{n,e,s}, s) = 1

    1. That is, the astronomical suffering state s is an absorbing state. Informally, once a transition to an astronomical suffering event occurs, it is impossible to transition to an extinction event scenario or a scenario with neither an extinction event nor an astronomical suffering event.

  9. A static policy is chosen at time t = 0.

    1. That is, once a policy is chosen it is not changed over the time horizon. We make this assumption to facilitate analysis.
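Intuitions 5 and 6 can look like they conflict, so here is a minimal sketch of how both can hold at once. It assumes constant per-step probabilities; the function name `cumulative_s_probability` and all numeric values are hypothetical placeholders chosen for illustration, not estimates.

```python
def cumulative_s_probability(p_ext: float, p_suf: float, horizon: int) -> float:
    """Probability of having entered the astronomical suffering state within
    `horizon` steps, starting in state n, given constant per-step
    probabilities p_ext (n -> e) and p_suf (n -> s)."""
    p_stay = 1.0 - p_ext - p_suf
    if p_ext < 0.0 or p_suf < 0.0 or p_stay < 0.0:
        raise ValueError("probabilities must be non-negative and sum to at most 1")
    total = 0.0
    still_in_n = 1.0  # probability of still being in state n before this step
    for _ in range(horizon):
        total += still_in_n * p_suf  # enter s on this step
        still_in_n *= p_stay         # or remain in n for another step
    return total


# Hypothetical placeholder values: the per-step s probability is identical in
# both cases (intuition 5); only the per-step extinction probability differs.
no_work = cumulative_s_probability(p_ext=0.02, p_suf=0.001, horizon=500)
ext_work = cumulative_s_probability(p_ext=0.01, p_suf=0.001, horizon=500)
print(no_work, ext_work)
```

Under these assumptions the cumulative probability approaches p_suf / (p_ext + p_suf) as the horizon grows, so lowering p_ext alone raises it (intuition 6) even though the per-step value is unchanged.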

Suppose the MDP starts at t = 0. Note that all probabilities are fictitious; the analysis focuses on each policy’s expected value under different scenarios, where the hyperparameters are c and the coefficients in the transition probabilities. For the probabilities below to be valid, c must satisfy 0 ≤ c ≤ 1/21.

First, define the probabilities for the policy of taking action a = n in state s = n, where:

  1. P(n,n,e) := 20c.

    1. That is, the probability of transitioning from a world where neither an extinction event nor an astronomical suffering event occurs to a world where an extinction event occurs given one is working on neither extinction risk mitigation nor astronomical suffering risk mitigation is 20c.

  2. P(n,n,s) := c.

    1. That is, the probability of transitioning from a world where neither an extinction event nor an astronomical suffering event occurs to a world where an astronomical suffering event occurs given one is working on neither extinction risk mitigation nor astronomical suffering risk mitigation is c.

  3. P(n,n,n) := p_nn = 1 − 21c.

    1. That is, the probability of transitioning from a world where neither an extinction event nor an astronomical suffering event occurs to a world where neither an extinction event nor an astronomical suffering event occurs given one is working on neither extinction risk mitigation nor astronomical suffering risk mitigation is 1 − 21c.

Second, define the probabilities for the policy of taking action a = e in state s = n, where:

  1. P(n,e,e) := 10c.

    1. That is, the probability of transitioning from a world where neither an extinction event nor an astronomical suffering event occurs to a world where an extinction event occurs given one is working on extinction risk mitigation is 10c. Note that this value is less than P(n,n,e).

  2. P(n,e,s) := c.

    1. That is, the probability of transitioning from a world where neither an extinction event nor an astronomical suffering event occurs to a world where an astronomical suffering event occurs given one is working on extinction risk mitigation is c.

  3. P(n,e,n) := p_en = 1 − 11c.

    1. That is, the probability of transitioning from a world where neither an extinction event nor an astronomical suffering event occurs to a world where neither an extinction event nor an astronomical suffering event occurs given one is working on extinction risk mitigation is 1 − 11c.

Third, define the probabilities for the policy of taking action a = s in state s = n, where:

  1. P(n,s,e) := 20c.

    1. That is, the probability of transitioning from a world where neither an extinction event nor an astronomical suffering event occurs to a world where an extinction event occurs given one is working on astronomical suffering risk mitigation is 20c.

  2. P(n,s,s) := 0.5c.

    1. That is, the probability of transitioning from a world where neither an extinction event nor an astronomical suffering event occurs to a world where an astronomical suffering event occurs given one is working on astronomical suffering risk mitigation is 0.5c.

  3. P(n,s,n) := p_sn = 1 − 20.5c.

    1. That is, the probability of transitioning from a world where neither an extinction event nor an astronomical suffering event occurs to a world where neither an extinction event nor an astronomical suffering event occurs given one is working on astronomical suffering risk mitigation is 1 − 20.5c.
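The three sets of transition probabilities above can be collected into one table per policy. This is a minimal sketch rather than a definitive implementation: the function name `transition_table`, the dictionary layout, and the value c = 0.01 are assumptions made for illustration.

```python
def transition_table(action: str, c: float) -> dict:
    """Build {state: {next_state: probability}} for one static policy.

    `action` is "n", "e" or "s"; `c` is the free hyperparameter.
    """
    # Per-step probabilities of leaving state n under each action.
    leave_n = {
        "n": {"e": 20 * c, "s": c},
        "e": {"e": 10 * c, "s": c},
        "s": {"e": 20 * c, "s": 0.5 * c},
    }[action]
    stay = 1.0 - leave_n["e"] - leave_n["s"]
    if c < 0.0 or stay < 0.0:
        raise ValueError("c is outside the range that yields valid probabilities")
    return {
        "n": {"n": stay, "e": leave_n["e"], "s": leave_n["s"]},
        "e": {"n": 0.0, "e": 1.0, "s": 0.0},  # extinction is absorbing
        "s": {"n": 0.0, "e": 0.0, "s": 1.0},  # astronomical suffering is absorbing
    }


C = 0.01  # hypothetical value, chosen only for illustration
for action in ("n", "e", "s"):
    table = transition_table(action, C)
    for row in table.values():
        # Every row must be a probability distribution over next states.
        assert abs(sum(row.values()) - 1.0) < 1e-12
    print(action, table["n"])
```

The check on `stay` makes one implicit constraint visible: the first policy needs 21c ≤ 1, so c cannot exceed 1/21 if all three policies are to be valid.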

For the first policy, the expected value of the MDP M at time t is E[M_t] = (20c) • 0 + (c) • (t • R(s)) + (1 − 21c) • [ (20c) • R(n) + (c) • ((t−1) • R(s)) + (1 − 21c) • [ … ] ]. Writing p_ns := P(n,n,s) = c, p_ne := P(n,n,e) = 20c, and p_nn := P(n,n,n) = 1 − 21c, this equals t • p_ns • R(s) + (t−1) • p_nn • p_ns • R(s) + … + (p_nn)^(t−1) • p_ns • R(s) + p_nn • p_ne • R(n) + 2 • (p_nn)^2 • p_ne • R(n) + … + (t−1) • (p_nn)^(t−1) • p_ne • R(n) + t • (p_nn)^t • R(n).

Note that the expected value is similar to the sum of two sequences of geometric random variables with termination probabilities of the extinction risk and astronomical suffering transitions at different time steps and one geometric series for neither extinction risk nor astronomical suffering risk occurring across the entire time horizon.

Since the transition probabilities out of state n sum to 1, working solely on astronomical suffering risk mitigation or solely on extinction risk mitigation has a net positive effect on the expected value only if the decrease in one transition probability is not counteracted by an increase in the other, either at a single time step or across the entire time horizon. A similar analysis holds for the second and third policies, which are analyzed below.

For the second policy, the expected value of the MDP M at time t is E[M_t] = (10c) • 0 + (c) • (t • R(s)) + (1 − 11c) • [ (10c) • R(n) + (c) • ((t−1) • R(s)) + (1 − 11c) • [ … ] ]. Writing p_es := P(n,e,s) = c, p_ee := P(n,e,e) = 10c, and p_en := P(n,e,n) = 1 − 11c, this equals t • p_es • R(s) + (t−1) • p_en • p_es • R(s) + … + (p_en)^(t−1) • p_es • R(s) + p_en • p_ee • R(n) + 2 • (p_en)^2 • p_ee • R(n) + … + (t−1) • (p_en)^(t−1) • p_ee • R(n) + t • (p_en)^t • R(n).

For the third policy, the expected value of the MDP M at time t is E[M_t] = (20c) • 0 + (0.5c) • (t • R(s)) + (1 − 20.5c) • [ (20c) • R(n) + (0.5c) • ((t−1) • R(s)) + (1 − 20.5c) • [ … ] ]. Writing p_ss := P(n,s,s) = 0.5c, p_se := P(n,s,e) = 20c, and p_sn := P(n,s,n) = 1 − 20.5c, this equals t • p_ss • R(s) + (t−1) • p_sn • p_ss • R(s) + … + (p_sn)^(t−1) • p_ss • R(s) + p_sn • p_se • R(n) + 2 • (p_sn)^2 • p_se • R(n) + … + (t−1) • (p_sn)^(t−1) • p_se • R(n) + t • (p_sn)^t • R(n).

If our intuition |R(s)| >> |R(n)| is correct, then for each action a the suffering risk series t • P(n,a,s) • R(s) + (t−1) • P(n,a,n) • P(n,a,s) • R(s) + … + P(n,a,n)^(t−1) • P(n,a,s) • R(s) dominates the expected value of the corresponding policy.

In short, under this moral framework a policy of working on extinction risks does not alleviate s-risks, which dominate extinction risks under expected value calculations.
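The series written out above can be evaluated numerically for each policy. This is a rough sketch under loudly stated assumptions: the function name `series_expected_value` and the values of c, R(n), R(s), and t are hypothetical, and the code evaluates the series exactly as it is written in this post rather than an independently derived expected value.

```python
def series_expected_value(p_ext, p_suf, r_n, r_s, horizon):
    """Evaluate the post's series for one static policy.

    Returns (total, suffering_part, state_n_part). R(e) is 0, so extinction
    contributes nothing directly.
    """
    p_stay = 1.0 - p_ext - p_suf
    # Stay in n for k steps, then enter s and remain there for horizon - k steps.
    suffering_part = sum(
        (horizon - k) * p_stay**k * p_suf * r_s for k in range(horizon)
    )
    # Stay in n for k steps and then go extinct, or stay in n for the whole horizon.
    state_n_part = sum(k * p_stay**k * p_ext * r_n for k in range(1, horizon))
    state_n_part += horizon * p_stay**horizon * r_n
    return suffering_part + state_n_part, suffering_part, state_n_part


# Hypothetical illustration values, not estimates.
C, R_N, R_S, T = 0.001, 1.0, -1000.0, 100
POLICIES = {  # action: (P(n,a,e), P(n,a,s))
    "n": (20 * C, C),
    "e": (10 * C, C),
    "s": (20 * C, 0.5 * C),
}
results = {
    action: series_expected_value(p_ext, p_suf, R_N, R_S, T)
    for action, (p_ext, p_suf) in POLICIES.items()
}
for action, (total, suffering_part, state_n_part) in results.items():
    print(action, total, suffering_part, state_n_part)
```

Changing R_S relative to R_N shows how strongly the comparison between policies depends on the intuition |R(s)| >> |R(n)|.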

Modeling Limitations

Analytically calculating the exact expected value of each policy and rigorously proving the dominance of the suffering risk series are clear and helpful next steps that the model does not take. Furthermore, a sensitivity analysis would make the model more accurate and more precise. It should cover assumptions including but not limited to: (1) work on extinction risks does not mitigate suffering risks, (2) vice versa, and (3) transition probabilities are stationary with respect to t (i.e. the chain is time-homogeneous). Such a model could be tuned to provide insight with respect to each reader’s intuitions (or assumptions, cruxes, confidences, confidence intervals, etc.).
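As a starting point for such a sensitivity analysis, the sketch below propagates the state distribution forward and sweeps the ratio |R(s)| / R(n). Everything here is an assumption made for illustration: the function names, the values of c and t, and the reward convention (the reward of the occupied state is collected after every transition, with R(e) = 0). Under that convention, time spent in state n before an astronomical suffering event is also credited with R(n), so the numbers differ slightly from the series written out in the model section.

```python
def expected_cumulative_reward(p_ext, p_suf, r_n, r_s, horizon):
    """Expected sum of per-step rewards over `horizon` steps, starting in n.

    Assumed convention: after each transition, collect the reward of the
    state now occupied; R(e) = 0 so extinction adds nothing.
    """
    prob_n, prob_s = 1.0, 0.0
    total = 0.0
    for _ in range(horizon):
        prob_s += prob_n * p_suf           # newly absorbed into s
        prob_n *= 1.0 - p_ext - p_suf      # still in n
        total += prob_n * r_n + prob_s * r_s
    return total


def best_action(c, r_n, r_s, horizon):
    """Return (best action, {action: value}) for the three static policies."""
    policies = {  # action: (P(n,a,e), P(n,a,s))
        "n": (20 * c, c),
        "e": (10 * c, c),
        "s": (20 * c, 0.5 * c),
    }
    values = {
        action: expected_cumulative_reward(p_ext, p_suf, r_n, r_s, horizon)
        for action, (p_ext, p_suf) in policies.items()
    }
    return max(values, key=values.get), values


# Sweep the hypothetical ratio |R(s)| / R(n) with R(n) fixed at 1.
for ratio in (0.0, 1.0, 10.0, 100.0, 1000.0):
    action, values = best_action(c=0.001, r_n=1.0, r_s=-ratio, horizon=100)
    print(ratio, action, values)
```

The same loop could sweep c, the horizon, or the coefficients 20, 10, and 0.5 to see where the preferred policy changes.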
