Humanity
Like the other commenter says, I feel worried that v(.) refers to the value of “humanity”. For similar reasons, I feel worried that existential risk is defined in terms of humanity’s potential.
One issue is that it’s vague what counts as “humanity”. Homo sapiens count, but what about:
A species that Homo sapiens evolves into
“Uploaded” humans
“Aligned” AI systems
Non-aligned AI systems that nonetheless produce morally valuable or disvaluable outcomes
I’m not sure where you draw the line, or if there is a principled place to draw the line.
A second issue is that “humanity” doesn’t include the value of:
Earth-originating but nonhuman civilisations, for example if Homo sapiens goes extinct but some other technologically capable species later evolves.
Non-Earth-originating alien civilisations.
And, depending on how “humanity” is defined, it may not include non-aligned AI systems that nonetheless produce morally valuable or disvaluable outcomes.
I tried to think about how to incorporate this into your model, but ultimately I think it’s hard without it becoming quite unintuitive.
And I think these adjustments are potentially non-trivial. One could reasonably hold, for example, that the probability of a technologically capable species evolving if Homo sapiens goes extinct is 90%, that the probability of non-Earth-originating alien civilisations settling the solar systems that we would ultimately settle is also 90%, and that such civilisations would have similar value to human-originating civilisation.
(They also change how you should think about long-term impact. If alien civilisations will settle the Milky Way (etc.) anyway, then preventing human extinction is actually about changing how interstellar resources are used, not whether they are used at all.)
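To make this concrete, here is a minimal sketch of the adjustment, using my own notation rather than anything from the post: let $v_H$ be the value of a human-originating future, $v_A$ the value of a future built by whichever civilisation would arise instead, and $p$ the probability that such a civilisation arises and settles the same resources if Homo sapiens goes extinct. Then the expected value lost to extinction is roughly

$$\Delta \approx v_H - p \cdot v_A$$

rather than $v_H$. With the figures above ($p = 0.9$, $v_A \approx v_H$), extinction forfeits only about $0.1\,v_H$ in expectation, an order of magnitude less than the unadjusted model implies.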
And I think the focus on humanity means we miss out on some potentially important ways of improving the future. For example, consider scenarios where we fail on alignment. There is no “humanity”, but we can still make the future better or worse. A misaligned AI system that promotes suffering (or promotes something that involves a lot of suffering) is a lot worse than an AI system that promotes something valueless.