The classic definition comes from Bostrom:
Existential risk – One where an adverse outcome would either annihilate Earth-originating intelligent life or permanently and drastically curtail its potential.
But this definition, while poetic and gesturing at something real, is more than a bit vague, and many people are unhappy with it, judging from the long chain of clarifying questions in my linked question. So I’m interested in alternative definitions that the EA community and/or leading longtermist or xrisk researchers may wish to adopt instead.
Alternative definitions should ideally be precise, clear, unambiguous, and hopefully not too long.
I wrote a post last year basically trying to counter misconceptions about Ord’s definition and also somewhat operationalise it. Here’s the “Conclusion” section:
That leaves ambiguity as to precisely what fraction is sufficient to count as “the vast majority”, but I don’t think that’s a very important ambiguity—e.g., I doubt people’s estimates would change a lot if we set the bar at 75% of potential lost vs 99%.
I think the more important ambiguities are what our “potential” is and what it means to “lose” it. As Ord defines x-risk, that’s partly a question of moral philosophy; i.e., it’s as if his definition contains a “pointer” to whatever moral theories we have credence in, our credence in them, and our way of aggregating that, rather than baking a moral conclusion in. E.g., his definition deliberately avoids taking a stance on things like whether a future where we stay on Earth forever, or a future with only strange but in some sense “happy” digital minds, or failing to reach such futures, would be an existential catastrophe.
This footnote from my post is also relevant:
If we’re being precise, I would just avoid thinking in terms of X-risk, since “X-risk” vs “not-an-X-risk” imposes a binary where really we should just care about losing expected value.
If we want a definition to help gesture at the kinds of things we mean when we talk about X-risk, several possibilities would be fine. I like something like destruction of lots of expected value.
If we wanted to make this precise, which I don’t think we should, we would need to go beyond fraction of expected value or fraction of potential, for two reasons. First, something could reduce our expectations to zero or negative without being an X-catastrophe, in particular if our expectations had already been reduced to an insignificant positive value by a previous X-catastrophe (note that the definitions MichaelA and Mauricio suggest are undesirable for this reason). Second, some things that should definitely be called X-catastrophes can destroy expectation without decreasing potential. A precise definition would need to look more like “expectations decrease by at least a standard deviation.” Again, I don’t think this is useful, but any simpler alternative won’t precisely describe what we mean.
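To make the objection to fraction-based definitions concrete, here is a rough sketch with made-up numbers. Everything in it is an assumption for illustration: the units, the baseline expectation of 100, the baseline standard deviation of 40, and in particular the choice to measure the standard deviation against a fixed baseline rather than the current distribution (the comment above does not say which is meant).

```python
def fraction_lost(expectation_before: float, expectation_after: float) -> float:
    """Fraction of expected value destroyed by an event."""
    return (expectation_before - expectation_after) / expectation_before


def drop_in_std_devs(
    expectation_before: float, expectation_after: float, baseline_std: float
) -> float:
    """Size of the drop in expectation, in units of a baseline standard deviation."""
    return (expectation_before - expectation_after) / baseline_std


# Hypothetical numbers in arbitrary units (assumptions, not from the thread).
BASELINE_STD = 40.0

events = {
    # A first catastrophe takes expectations from 100 down to 1.
    "first catastrophe": (100.0, 1.0),
    # A later event wipes out the insignificant value that remained.
    "later event": (1.0, 0.0),
}

for name, (before, after) in events.items():
    print(
        f"{name}: fraction lost = {fraction_lost(before, after):.2f}, "
        f"drop = {drop_in_std_devs(before, after, BASELINE_STD):.3f} std devs"
    )
```

Under these assumptions the later event destroys a larger fraction of the remaining expected value than the first catastrophe did, yet its drop is a tiny fraction of a standard deviation, which is the sense in which a fraction-based definition would mislabel it.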
We might also need to appeal to some idealization of our expectations, such as expectations from the point of view of an imaginary smart/knowledgeable person observing human civilization, such that changes in our knowledge affecting our expectations don’t constitute X-catastrophes, but not so idealized that our future is predictable and nothing affects our expectations...
Best to just speak in terms of what we actually care about, not X-risks but expected value.
(Borrowing some language from a comment I just wrote here.)
If an event occurs that permanently locks us in to an “astronomically good” future that is <X% as valuable as the optimal future, has an existential catastrophe occurred? I’d like to use the term “existential risk” such that the answer is “no” for any value of X that still allows for the future to intuitively seem “astronomically good.” If a future intuitively seems just extremely, mind-bogglingly good, then saying that an existential catastrophe has occurred in that future before all the good stuff happened just feels wrong.
So in short, I think “existential catastrophe” should mean what we think of when we think of central examples of existential catastrophes. That includes extinction events and (at least some, but not all) events that lock us in to disappointing futures (futures in which, e.g. “we never leave the solar system” or “massive nonhuman animal suffering continues”). But it does not include things that only seem like catastrophes when a total utilitarian compares them to what’s optimal.
Per Linch’s point that defining existential risk entirely empirically is kind of impossible, I think that maybe we should embrace defining existential risk in terms of value, by setting an arbitrary threshold of value such that, if the world is still capable of reaching that level of value, an existential catastrophe has not occurred.
But rather than use 1% or 50% or 90% of optimal as that threshold, we should use a much lower bar that is approximately at the extremely-fuzzy boundary of what seems like an “astronomically good future” in order to avoid situations where “an existential catastrophe has occurred, but the future is still going to be extremely good.”
One such arbitrary threshold:
So if we create an AI that’s destined to put the universe to work creating stuff of value in a sub-optimal way (<90% or <1% or even <<1% of optimal), but that will still fill the universe with amazing conscious minds such that the future is still worth >10^30 utilons, then an existential catastrophe has not occurred. But if it’s a really mediocre non-misaligned AI that (e.g.) wins out in an Adversarial Technological Maturity scenario and only puts our light cone to use to create a future worth <10^30 utilons, then we can call it an existential catastrophe (and perhaps refer to it as a disappointing future, which seems to be the sort of future that results from a subset of existential catastrophes).
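As a rough sketch of how this absolute-threshold proposal differs from a fraction-of-optimal definition, the two tests can be written side by side. The 10^30 utilon threshold is the one used above; the value of the optimal future (10^40 here), the example future values, and the choice to count a future worth exactly 10^30 as not a catastrophe are all assumptions made up for illustration.

```python
# Threshold from the comment above, in utilons.
ASTRONOMICALLY_GOOD_THRESHOLD = 10**30

# Hypothetical value of the optimal future (an assumption, not from the thread).
HYPOTHETICAL_OPTIMAL = 10**40


def is_catastrophe_absolute(
    attainable_value: float, threshold: float = ASTRONOMICALLY_GOOD_THRESHOLD
) -> bool:
    """Catastrophe only if the best still-attainable future falls below a fixed bar."""
    return attainable_value < threshold


def is_catastrophe_fractional(
    attainable_value: float, optimal_value: float, min_fraction: float
) -> bool:
    """Catastrophe if the attainable future is below some fraction of optimal."""
    return attainable_value < min_fraction * optimal_value


# Hypothetical futures (values are illustrative assumptions).
futures = {
    "sub-optimal but amazing AI future": 10**35,  # <<1% of the assumed optimum
    "mediocre AI future": 10**25,                 # below the 10^30 bar
}

for name, value in futures.items():
    print(
        f"{name}: absolute test -> {is_catastrophe_absolute(value)}, "
        f"1%-of-optimal test -> "
        f"{is_catastrophe_fractional(value, HYPOTHETICAL_OPTIMAL, 0.01)}"
    )
```

With these numbers the two definitions agree on the mediocre future and disagree on the sub-optimal but amazing one, which is the case the absolute threshold is meant to handle.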