Humanity
As the other commenter says, I’m worried that v(.) refers to the value of “humanity”. For similar reasons, I’m worried that existential risk is defined in terms of humanity’s potential.
One issue is that it’s vague what counts as “humanity”. Homo sapiens count, but what about:
A species that Homo sapiens evolves into
“Uploaded” humans
“Aligned” AI systems
Non-aligned AI systems that nonetheless produce morally valuable or disvaluable outcomes
I’m not sure where you draw the line, or if there is a principled place to draw the line.
A second issue is that “humanity” doesn’t include the value of:
Earth-originating but nonhuman civilisations, for example if Homo sapiens goes extinct but some other species with technological capability later evolves.
Non-Earth-originating alien civilisations.
And, depending on how “humanity” is defined, it may not include non-aligned AI systems that nonetheless produce morally valuable or disvaluable outcomes.
I tried to think about how to incorporate this into your model, but ultimately I think it’s hard without it becoming quite unintuitive.
And I think these adjustments are potentially non-trivial. One could reasonably hold, for example, that the probability of a technologically capable species evolving, if Homo sapiens goes extinct, is 90%; that the probability of non-Earth-originating alien civilisations settling the solar systems we would ultimately settle is also 90%; and that such civilisations would have similar value to human-originating civilisation.
(They also change how you should think about long-term impact. If alien civilisations will settle the Milky Way (etc.) anyway, then preventing human extinction is actually about changing how interstellar resources are used, not whether they are used at all.)
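To make the stakes concrete, here is a rough back-of-the-envelope sketch using the illustrative 90% figures above (the structure and numbers are my own toy assumptions, not anything from the paper) of how much value is counterfactually lost to human extinction once successor species and aliens are factored in:

```python
# Toy sketch (my assumptions, not the paper's model): how much value is
# counterfactually lost if Homo sapiens goes extinct, once successor
# species and alien civilisations are taken into account.

V_HUMAN_FUTURE = 1.0   # value of a human-originating future, normalised to 1

p_successor = 0.9      # P(a technologically capable species evolves | we go extinct)
p_aliens = 0.9         # P(aliens settle the systems we would otherwise have settled)
relative_value = 1.0   # value of those civilisations relative to ours ("similar value")

# Probability that *someone* still uses the relevant resources if we go extinct.
p_someone_else = 1 - (1 - p_successor) * (1 - p_aliens)   # = 0.99

value_if_extinct = p_someone_else * relative_value * V_HUMAN_FUTURE

loss_simple = V_HUMAN_FUTURE                       # naive model: all potential is lost
loss_adjusted = V_HUMAN_FUTURE - value_if_extinct  # only the counterfactual difference

print(f"loss under simple model:   {loss_simple:.2f}")    # 1.00
print(f"loss under adjusted model: {loss_adjusted:.2f}")  # 0.01
```

With those (deliberately extreme) assumptions, the counterfactual loss from extinction is roughly 1% of what a model equating the future’s value with humanity’s would imply: preventing extinction mostly changes who uses the interstellar resources, not whether they are used.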
And I think it means we miss out on some potentially important ways of improving the future. For example, consider scenarios where we fail on alignment. There is no “humanity”, but we can still make the future better or worse. A misaligned AI system that promotes suffering (or promotes something that involves a lot of suffering) is a lot worse than an AI system that promotes something valueless.
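One way to state this explicitly (my notation, not the paper’s), letting v range over whole futures rather than over “humanity”:

v(misaligned AI that promotes suffering) ≪ v(misaligned AI that promotes something valueless) ≈ 0

So even conditional on alignment failure, the gap between these outcomes is something we can still work to influence, even though it falls outside “humanity’s potential” as defined.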
The term ‘humanity’ is definitely intended to be interpreted broadly. I was more explicit about this in The Precipice and forgot to reiterate it in this paper. I certainly want to include any worthy successors to Homo sapiens. But it may be important to understand the boundary of what counts. A background assumption is that the entities are both moral agents and moral patients — capable of steering the future towards what matters and of being intrinsically part of what matters. I’m not sure if those assumptions are actually needed, but they were guiding my thought.
I definitely don’t intend to include alien civilisations or future independent Earth-originating intelligent life. The point is to capture the causal downstream consequences of things in our sphere of control. So the effects we have on alien civilisations should be counted, as should any effects we have on whether any Earth species evolves after us, but it isn’t meant to be a graph of all value in the universe. My methods wouldn’t work for that, as we can’t plausibly speed that up or protect it all, etc. (unless we were almost all the value anyway).