For clarity’s sake, the definition I was using is as follows:
Existential risk – One where an adverse outcome would either annihilate Earth-originating intelligent life or permanently and drastically curtail its potential.
Perhaps the issue here is the definition of “drastic.” Ironically, I too have complained about the imprecision of this definition before. If I were the czar-in-charge-of-renaming-things-in-EA, I’d probably define an existential risk as:
Existential risk – One where an adverse outcome would either annihilate Earth-originating intelligent life or permanently curtail its potential to <90% of what an optimal future would look like.
Of course, I’m not the czar-in-charge-of-renaming-things-in-EA, so here we are.
EDIT: I am the czar-in-charge-of-renaming-things-in-my-own EA Forum question, so I’ll rephrase.
I think one problem with your re-definition (that makes it imperfect IMO also) is apparent when thinking about the following questions: How likely is it that Earth-originating intelligent life eventually reaches >90% of its potential? How likely is it that it eventually reaches >0.1% of its potential? >0.0001% of its potential? >10^20 times the total value of the conscious experiences of all humans with net-positive lives during the year 2020? My answers to these questions increase with each respective question, and my answer to the last question is several times higher than my answer to the first question.
Our cosmic potential is potentially extremely large, and there are many possible “very long-lasting, positive futures” (to use Holden’s language from here) that seem “extremely good” from our limited perspective today (e.g. the futures we imagine when we read Bostrom’s Letter from Utopia). But these futures potentially differ in value tremendously.
Okay, I just saw Zach’s comment that he thinks value is roughly binary. I currently don’t think I agree with him (see his first paragraph at that link, and my reply clarifying my view). Maybe my view is unusual?
I’d be interested in seeing operationalizations at some subset of {1%, 10%, 50%, 90%, 99%}.* I can imagine that most safety researchers will give nearly identical answers to all of them, but I can also imagine large divergences, so there’s decent value of information here.
Linch here:
I’d give similar answers for all 5 of those questions because I think most of the “existential catastrophes” (defined vaguely) involve wiping out >>99% of our potential (e.g. extinction this century before value/time increases substantially). But my independent impression is that there are a lot of “extremely good” outcomes in which we have a very long-lasting, positive future with value/year much, much greater than the value per year on Earth today, yet which still falls >99% short of our potential (and even >99.9999% short).
It’s also possible that different people have different views of what “humanity’s potential” really means!
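To see how these claims fit together numerically, here is a toy sketch. The assumed size of our potential and the fractions are made-up placeholders for illustration, not estimates anyone in this thread has given.

```python
# Toy illustration; all numbers are made-up assumptions, not estimates.
# Value is measured in "2020-units": the total value of the conscious
# experiences of all humans with net-positive lives during 2020.

POTENTIAL = 1e35  # assumed value of an optimal future, in 2020-units (illustrative)

for fraction_of_potential in [0.9, 1e-3, 1e-6, 1e-12]:
    value = fraction_of_potential * POTENTIAL
    clears_bar = value > 1e20  # the ">10^20 times 2020" question above
    print(f"capture {fraction_of_potential:.0e} of potential -> "
          f"{value:.0e} 2020-units; >1e20? {clears_bar}")

# Under this assumption, even a future at 1e-12 of our potential is still
# ~1e23 2020-units: "extremely good" by today's lights, yet it would count
# as an existential catastrophe under a >90%-of-optimal definition.
```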
Great point. Ideally “existential risk” should be an entirely empirical thing that we can talk about independent of our values / moral beliefs about what future is optimal.
This is impossible if you consider “unrecoverable dystopia”, “stable totalitarianism”, etc. as existential risks, as these are implicitly value judgments.
Though I’m open to the idea that we should maybe talk about extinction risks instead of existential risks, given that this is empirically most of what x-risk people work on.
(Though I think some AI risk people think, as an empirical matter, that some AI catastrophes would entail humanity surviving while completely losing control of the lightcone, and both they and I would consider this basically as bad as all of our descendants dying.)
Currently, the post says:
A risk of catastrophe where an adverse outcome would permanently cause Earth-originating intelligent life’s astronomical value to be <50% of what it would otherwise be capable of.
I’m not a fan of this definition, because I find it very plausible that the expected value of the future is less than 50% of what humanity is capable of, which e.g. raises the question: does even extinction fulfil the description? Maybe you could argue “yes”, but mixing an actual caused outcome with what intelligent life is “capable of” makes all of this unnecessarily dependent on both definitions and empirics about the future.
For purposes of the original question, I don’t think we need to deal with all the complexity around “curtailing potential”. You can just ask: How much should a funder be willing to pay to remove a 0.01% risk of extinction that’s independent from all other extinction risks we’re facing? (E.g., a giganormous asteroid is on its way to Earth and has a 0.01% probability of hitting us, causing guaranteed extinction. No one else will notice this in time. Do we pay $X to redirect it?)
This seems closely analogous to questions that funders are facing (are we keen to pay to slightly reduce one contemporary extinction risk?). For non-extinction x-risk reduction, this extinction estimate will be informative as a comparison point, and it seems completely appropriate that you should also check “how bad is this purported x-risk compared to extinction?” as a separate exercise.
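A minimal sketch of the expected-value arithmetic the asteroid question implies. The 0.01% comes from the example above; the value-of-the-future figure and pricing it in dollar-equivalents are placeholder assumptions, not claims from this thread.

```python
# Minimal willingness-to-pay sketch for the asteroid example above.
# Both inputs are illustrative placeholders.

P_HIT = 0.0001           # 0.01% chance the asteroid hits, causing extinction
EV_FUTURE_IF_MISS = 1e9  # assumed expected value of the future conditional on
                         # the asteroid missing (this already folds in every
                         # other risk we face), in dollar-equivalents

# Removing the risk raises expected value by P_HIT * EV_FUTURE_IF_MISS:
#   with the risk:     (1 - P_HIT) * EV_FUTURE_IF_MISS
#   without the risk:   1          * EV_FUTURE_IF_MISS
max_willingness_to_pay = P_HIT * EV_FUTURE_IF_MISS
print(f"Pay up to ~${max_willingness_to_pay:,.0f} to redirect the asteroid")
```

Conditioning on the asteroid missing is what keeps the independence assumption simple here: every other risk enters the calculation only through EV_FUTURE_IF_MISS.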
How do people feel about a proposed new definition:
Seems better than the previous one, though imo still worse than my suggestion, for 3 reasons:
1. It’s more complex than asking about immediate extinction. (Why exactly a 100-year cutoff? Why 50%?)
2. Since the definition explicitly allows for different x-risks to be differently bad, the amount you’d pay to reduce them would vary depending on the x-risk. So the question is underspecified.
3. The independence assumption is better if funders often face opportunities to reduce a Y% risk that’s roughly independent from most other x-risk this century. Your suggestion is better if funders often face opportunities to reduce Y percentage points of all x-risk this century (e.g. if all risks are completely disjunctive, such that if you remove a risk, you’re guaranteed not to be hit by any other risk); the toy sketch below illustrates the difference.
For your two examples, the risks from asteroids and climate change are mostly independent from the majority of x-risk this century, so there the independence assumption is better.
The disjunctive assumption can apply if we study mutually exclusive cases, e.g. reducing risk from worlds with fast AI take-off vs. reducing risk from worlds with slow AI take-off.
I weakly think that the former is more common.
(Note that the difference only matters if total x-risk this century is large.)
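A toy comparison with placeholder numbers shows why the gap between the two assumptions only opens up when the rest of this century’s x-risk is large:

```python
# Compare removing a 10% risk under the two assumptions discussed above.
# other_risk is the combined probability of being hit by any *other* x-risk.
# All numbers are placeholders.

REMOVED = 0.10

for other_risk in [0.01, 0.50, 0.90]:
    # Independence: survival = (1 - REMOVED) * (1 - other_risk), so removing
    # the risk raises survival probability by REMOVED * (1 - other_risk).
    gain_independent = REMOVED * (1 - other_risk)

    # Disjunctive (mutually exclusive) risks: total risk = REMOVED + other_risk,
    # so removing the risk raises survival probability by the full REMOVED.
    gain_disjunctive = REMOVED

    print(f"other x-risk {other_risk:.0%}: independent gain {gain_independent:.1%}, "
          f"disjunctive gain {gain_disjunctive:.1%}")

# With other x-risk at 1% the two assumptions nearly agree; at 90% the
# independent gain is only a tenth as large, which is why the difference
# only matters when total x-risk this century is large.
```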
Edit: This is all about what version of this question is the best version, independent of inertia. If you’re attached to percentage points because you don’t want to change to an independence assumption after there’s already been some discussion on the post, then your latest suggestion seems good enough. (Though I think most people have been assuming a low total amount of x-risk, so probably independence or not doesn’t matter that much for the existing discussion.)