Per Linch’s point that defining existential risk entirely empirically is kind of impossible, I think we should embrace defining existential risk in terms of value: pick an arbitrary threshold of value, and say that if the world is still capable of reaching that level of value, then an existential catastrophe has not occurred.
But rather than use 1% or 50% or 90% of optimal as that threshold, we should use a much lower bar, one that sits approximately at the extremely fuzzy boundary of what seems like an “astronomically good future,” in order to avoid situations where “an existential catastrophe has occurred, but the future is still going to be extremely good.”
One such arbitrary threshold:
If the intrinsic value of the year-long conscious experiences of the billion happiest people on Earth in 2020 is equal to 10^9 utilons, we could say that an “astronomically good future” is a future worth >10^30 utilons.
So if we create an AI that’s destined to put the universe to work creating stuff of value in a sub-optimal way (<90% or <1% or even <<1% of optimal), but that will still fill the universe with amazing conscious minds such that the future is still worth >10^30 utilons, then an existential catastrophe has not occurred. But if it’s a really mediocre non-misaligned AI that (e.g.) wins out in an Adversarial Technological Maturity scenario and only puts our light cone to use to create a future worth <10^30 utilons, then we can call it an existential catastrophe (and perhaps refer to it as a “disappointing future,” which seems to be the sort of future that results from a subset of existential catastrophes).
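To make the arithmetic concrete, here is a minimal Python sketch of the rule above. The function and constant names are my own, purely for illustration, and treating a future worth exactly 10^30 utilons as falling below the bar is an assumption, since the definition only covers the >10^30 and <10^30 cases.

```python
# Illustrative sketch only; all names here are hypothetical.

# Value assigned to the year-long experiences of the billion happiest people in 2020.
BASELINE_UTILONS = 10**9

# Proposed cutoff: a future worth more than this counts as "astronomically good".
ASTRONOMICAL_THRESHOLD = 10**30


def is_existential_catastrophe(best_reachable_value: int) -> bool:
    """Return True if the best future still reachable does not exceed the threshold.

    Assumption: a value exactly equal to the threshold counts as a catastrophe,
    because the definition only specifies the strict > and < cases.
    """
    return best_reachable_value <= ASTRONOMICAL_THRESHOLD


# The threshold expressed as a multiple of the 2020 baseline: 10**21.
threshold_in_baselines = ASTRONOMICAL_THRESHOLD // BASELINE_UTILONS
```

So the bar is 10^21 times the value of that 2020 baseline, and a future that is far below optimal still clears it as long as its total value exceeds 10^30 utilons.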