Point of clarification: Is this like the world had a 3% chance of ending and now the world has a 2.99% chance of ending? For what length of time? Are we assuming perfect knowledge?
Is this like the world had a 3% chance of ending and now the world has a 2.99% chance of ending?
Yes, though I would count existential but non-extinction risks (e.g. permanent dictatorships) as well.
For what length of time?
Hmm, let’s say we’re looking at existential risks in the next 100 years, to make it comparable to Ord’s book.
Are we assuming perfect knowledge?
I would prefer “at the quality of models we currently have, or will soon have, access to”, but we can assume perfect knowledge if that makes things conceptually easier, or if you believe in intrinsic discounting for uncertainty.
I agree that it makes much more sense to estimate x-risk on a timescale of 100 years (as I said in the sidenote of my answer), but I think you should specify that in the question, because “How many EA 2021 $s would you trade off against a 0.01% chance of existential catastrophe?”, together with your definition of x-risk, implies taking the whole future of humanity into account. I think it may make sense to explicitly talk only about the risk of existential catastrophe in this or the next couple of centuries.
Lots of people have different disagreements about how to word this question. I feel like I should pass on editing the question even further, especially given that I don’t think it’s likely to change people’s answers too much.
let’s say we’re looking at existential risks in the next 100 years
This works out fine for me, since for empirical reasons, I place overwhelming probability on the disjunction of “existential catastrophe by 2121” and “long-term future looks very likely to be excellent in 2121.” But insofar as we’re using existential-risk-reduction as a proxy for good-accomplished, I think that 1 basis point should be worth something more like 1⁄10,000 of the transition from a catastrophic future to a near-optimal future. Those are units that we should fundamentally care about maximizing; we don’t fundamentally care about maximizing no-catastrophe-before-2121. (And then we can make comparisons to interventions that don’t affect x-risk by saying what fraction of a transition from a catastrophic to an excellent future they are worth. I guess I’m saying we should think in terms of something equivalent to utilons, where utilons are whatever we ought to increase, since optimizing for anything not equivalent to them is by definition optimizing imprecisely.)
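To make the arithmetic here concrete, a minimal sketch in Python, with made-up normalized values (the 0-to-1 normalization and the 30% figure are my assumptions, not anything stated above): it just shows that shifting one basis point of probability mass from a catastrophic outcome to a near-optimal one is worth 1/10,000 of the full transition in expectation, and less if the averted outcome is merely okay.

```python
# Minimal sketch (illustrative normalization, not anyone's actual estimates):
# value a basis point of x-risk reduction in "fractions of the transition
# from a catastrophic future to a near-optimal future", i.e. utilon-like units.
V_CATASTROPHE = 0.0   # normalized value of a catastrophic future (assumption)
V_NEAR_OPTIMAL = 1.0  # normalized value of a near-optimal future (assumption)

one_basis_point = 1 / 10_000  # 0.01 percentage points of probability mass

# Expected value of moving one basis point of probability from catastrophe
# to a near-optimal outcome:
value_of_one_bp = one_basis_point * (V_NEAR_OPTIMAL - V_CATASTROPHE)
print(value_of_one_bp)  # 0.0001, i.e. 1/10,000 of the full transition

# If the averted outcome is merely "okay" rather than near-optimal (say 30%
# of the optimal value, an assumption), the same basis point is worth less:
V_OKAY = 0.3
print(one_basis_point * (V_OKAY - V_CATASTROPHE))  # 0.00003
```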
But insofar as we’re using existential-risk-reduction as a proxy for good-accomplished, I think that 1 basis point should be worth something more like 1⁄10,000 of the transition from a catastrophic future to a near-optimal future.
I agree; I think this is what existential risk means by definition. But I appreciate the clarification regardless!
As I think you agree, the optimal future can be extremely good and exotic.
It seems that in your reply to Zach, you are saying that x-risk reduction is moving us to this future by definition. However, I can’t immediately find this definition in the link you provided.
(There might be some fuzzy thinking below, I am just typing quickly.)
Removing the risk of AI or nanobots, and just keeping humans “in the game” in 2121, just as we are in 2021, is valuable, but I don’t think this is the same as moving us to the awesome future.
I think moving us 1⁄10,000 to the awesome future could be a really strong statement.
Well put. I most often find it useful to think in terms of awesome future vs the alternative, but this isn’t the default definition of existential risk, and certainly not of existential-catastrophe-by-2121.
For empirical reasons, I agree because value is roughly binary. But by definition, existential risk reduction could involve moving from catastrophic futures to futures in which we realize a significant fraction of our potential but the future is not near-optimal.
For clarity’s sake, the definition I was using is as follows:
Existential risk – One where an adverse outcome would either annihilate Earth-originating intelligent life or permanently and drastically curtail its potential.
Perhaps the issue here is the definition of “drastic.” Ironically, I too have complained about the imprecision of this definition before. If I were the czar-in-charge-of-renaming-things-in-EA, I’d probably define an existential risk as:
Existential risk – One where an adverse outcome would either annihilate Earth-originating intelligent life or permanently curtail its potential to <90% of what an optimal future would look like.
Of course, I’m not the czar-in-charge-of-renaming-things-in-EA, so here we are.
EDIT: I am the czar-in-charge-of-renaming-things-in-my-own EA Forum question, so I’ll rephrase.
I think one problem with your re-definition (that makes it imperfect IMO also) is apparent when thinking about the following questions: How likely is it that Earth-originating intelligent life eventually reaches >90% of its potential? How likely is it that it eventually reaches >0.1% of its potential? >0.0001% of its potential? >10^20 times the total value of the conscious experiences of all humans with net-positive lives during the year 2020? My answers to these questions increase with each successive question, and my answer to the last question is several times higher than my answer to the first question.
Our cosmic potential is potentially extremely large, and there are many possible “very long-lasting, positive futures” (to use Holden’s language from here) that seem “extremely good” from our limited perspective today (e.g. the futures we imagine when we read Bostrom’s Letter from Utopia). But these futures potentially differ in value tremendously.
Okay, I just saw Zach’s comment that he thinks value is roughly binary. I currently don’t think I agree with him (see his first paragraph at that link, and my reply clarifying my view). Maybe my view is unusual?
I’d be interested in seeing operationalizations at some subset of {1%, 10%, 50%, 90%, 99%}.* I can imagine that most safety researchers will give nearly identical answers to all of them, but I can also imagine large divergences, so there’s decent value of information here.
I’d give similar answers for all 5 of those questions because I think most of the “existential catastrophes” (defined vaguely) involve wiping out >>99% of our potential (e.g. extinction this century before value/time increases substantially). But my independent impression is that there are a lot of “extremely good” outcomes in which we have a very long-lasting, positive future with value/year much, much greater than the value per year on Earth today, that nonetheless fall >99% short of our potential (and even >99.9999% of our potential).
It’s also possible that different people have different views of what “humanity’s potential” really means!
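As a hedged illustration of how the answers to those threshold questions can come apart, here is a toy outcome model. Every probability and potential-fraction below is a placeholder I made up for the example, not anyone’s actual estimate; the only point is the qualitative pattern that P(realizing more than X of our potential) grows as X shrinks, even though most of the probability sits on outcomes far short of the optimum.

```python
# Toy outcome model (all numbers are made-up placeholders, not real estimates).
outcomes = {
    # name: (probability, fraction of potential realized)
    "extinction / unrecoverable collapse":            (0.20, 1e-12),
    "long positive future, tiny sliver of potential": (0.40, 1e-5),
    "very good future, still <1% of potential":       (0.25, 5e-3),
    "near-optimal future":                            (0.15, 0.95),
}
assert abs(sum(p for p, _ in outcomes.values()) - 1.0) < 1e-9

for threshold in [0.9, 1e-3, 1e-6, 1e-9]:
    p = sum(prob for prob, frac in outcomes.values() if frac > threshold)
    print(f"P(fraction of potential realized > {threshold:g}) = {p:.2f}")
```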
Great point. Ideally “existential risk” should be an entirely empirical thing that we can talk about independent of our values / moral beliefs about what future is optimal.
This is impossible if you consider “unrecoverable dystopia”, “stable totalitarianism”, etc. as existential risks, as these are implicitly value judgments.
Though I’m open to the idea that we should maybe talk about extinction risks instead of existential risks, given that this is empirically most of what x-risk people work on.
(Though I think some AI risk people think, as an empirical matter, that some AI catastrophes would entail humanity surviving while completely losing control of the lightcone, and both they and I would consider this basically as bad as all of our descendants dying.)
Currently, the post says:
A risk of catastrophe where an adverse outcome would permanently cause Earth-originating intelligent life’s astronomical value to be <50% of what it would otherwise be capable of.
I’m not a fan of this definition, because I find it very plausible that the expected value of the future is less than 50% of what humanity is capable of. Which e.g. raises the question: does even extinction fulfil the description? Maybe you could argue “yes”, but the mix of comparing an actual caused outcome with what intelligent life is “capable of” makes all of this unnecessarily dependent on both definitions and empirics about the future.
For purposes of the original question, I don’t think we need to deal with all the complexity around “curtailing potential”. You can just ask: how much should a funder be willing to pay to remove a 0.01% risk of extinction that’s independent from all other extinction risks we’re facing? (E.g., a giganormous asteroid is on its way to Earth and has a 0.01% probability of hitting us, causing guaranteed extinction. No one else will notice this in time. Do we pay $X to redirect it?)
This seems closely analogous to questions that funders are facing (are we keen to pay to slightly reduce one contemporary extinction risk?). For non-extinction x-risk reduction, this extinction estimate will be informative as a comparison point, and it seems completely appropriate that you should also check “how bad is this purported x-risk compared to extinction?” as a separate exercise.
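A minimal sketch of the break-even arithmetic behind this framing, with placeholder numbers (the dollar-equivalent value placed on the long-term future and the background x-risk level are my assumptions, not anything claimed above): removing an independent 0.01% extinction risk is worth roughly 0.01% of the value of the future, discounted by the chance that some other catastrophe happens anyway.

```python
# Hedged sketch of the asteroid example (placeholder numbers only).
value_of_future_usd = 1e15   # assumed dollar-equivalent value of the long-term future
p_asteroid = 0.0001          # the independent 0.01% extinction risk in the example
p_other_xrisk = 0.2          # assumed background existential risk this century

# Deflecting the asteroid only helps in worlds that would not have been hit
# by some other existential catastrophe anyway, hence the (1 - p_other_xrisk).
break_even_wtp = p_asteroid * (1 - p_other_xrisk) * value_of_future_usd
print(f"Break-even willingness to pay: ${break_even_wtp:,.0f}")  # $80,000,000,000
```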
How do people feel about a proposed new definition:
Seems better than the previous one, though imo still worse than my suggestion, for 3 reasons:
it’s more complex than asking about immediate extinction. (Why exactly a 100-year cutoff? Why 50%?)
since the definition explicitly allows for different x-risks to be differently bad, the amount you’d pay to reduce them would vary depending on the x-risk. So the question is underspecified.
The independence assumption is better if funders often face opportunities to reduce a Y% risk that’s roughly independent from most other x-risk this century. Your suggestion is better if funders often face opportunities to reduce Y percentage points of all x-risk this century (e.g. if all risks are completely disjunctive, such that if you remove a risk, you’re guaranteed not to be hit by any other risk); see the sketch below.
For your two examples, the risks from asteroids and climate change are mostly independent from the majority of x-risk this century, so there the independence assumption is better.
The disjunctive assumption can hold if we study mutually exclusive cases, e.g. reducing risk from worlds with fast AI take-off vs. reducing risk from worlds with slow AI take-off.
I weakly think that the former is more common.
(Note that the difference only matters if total x-risk this century is large.)
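To make the contrast concrete, a short sketch with purely illustrative numbers of my own choosing: under the independence framing, eliminating a risk only pays off in worlds not hit by anything else, while under the disjunctive framing it reduces total risk by its full size; the two converge when total x-risk is small, as noted above.

```python
# Placeholder numbers, purely illustrative.
p_removed = 0.01   # the specific risk we pay to eliminate
p_other   = 0.30   # other existential risk this century (deliberately large here)

# Independence framing: survival probability goes from
# (1 - p_removed) * (1 - p_other) to (1 - p_other),
# so the gain is p_removed * (1 - p_other).
gain_independent = p_removed * (1 - p_other)

# Disjunctive framing: total risk is p_removed + p_other, and eliminating
# the risk cuts total risk by the full p_removed.
gain_disjunctive = p_removed

print(f"{gain_independent:.3f} vs {gain_disjunctive:.3f}")  # 0.007 vs 0.010
# With small total x-risk (e.g. p_other = 0.01), the two framings nearly agree.
```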
Edit: This is all about what version of this question is the best version, independent of inertia. If you’re attached to percentage points because you don’t want to change to an independence assumption after there’s already been some discussion on the post, then your latest suggestion seems good enough. (Though I think most people have been assuming a low total amount of x-risk, so whether or not we assume independence probably doesn’t matter much for the existing discussion.)