This is really useful in that it critically examines what I think of as the ‘orthodox view’: alignment is good because it ‘allows humans to preserve control over the future’. This view feels fundamental but underexamined in much of the EA/alignment world (with notable exceptions: Rich Sutton, Robin Hanson, and Joscha Bach, who seem species-agnostic; Paul Christiano has also fleshed out his position, e.g. in this part of a Dwarkesh Patel podcast).
A couple of points I wasn’t sure I understood/agreed with, FWIW:
a) A relatively minor one is this:
To the extent you think that future AIs would not be capable of creating massive wealth for humans, or extending their lifespans, this largely implies that you think future AIs will not be very powerful, smart, or productive. Thus, by the same argument, we should also not think future AIs will be capable of making humanity go extinct.
I’m not sure about this symmetry—I can imagine an LLM (~GPT-5 class) integrated into a nuclear/military decision-making system that could cause catastrophic death/suffering (millions or billions of immediate and secondary deaths, plus a massive technological setback, albeit not literal extinction). I’m assuming the point doesn’t hinge on literal extinction.
b) Regarding calebp’s comment on option value: I agree most option-value discussion (there doesn’t seem to be much outside Bostrom and the s-risk discourse) assumes continuation of the human species, but I wonder if there is room for a more cosmopolitan framing: ‘Humans are our only example of an advanced technological civilisation, one that might be on the verge of a step change in its evolution. The impact of this evolutionary step change on the future can arguably be (on balance) good (definition of “good” tbd). The “option value” we are trying to preserve is less the existence of humans per se than the possibility of such an evolution happening at all. Put another way, we don’t want to prematurely introduce an unaligned or misaligned AI (perhaps a weak one) that causes extinction, a bad lock-in, or prevents the emergence of more capable AIs that could have achieved this evolutionary transition.’
In other words, the option value is not over the number of human lives (or economic value) but over the possible trajectories of the future. This does not seem particularly species-specific; it just says that we should be careful not to throw those futures away.
c) Point (b) hinges on why this evolutionary step change would be ‘good’ in any broad or inclusive sense (beyond letting current and near-current generations live wealthier, longer lives, if indeed those are good things).
To answer this, it feels like we need some way of defining value ‘from the point of view of the universe’. That particular phrase is a Sidgwick/Singer thing, and I’m not sure it is directly applicable in this context (nor are similar phrases, e.g. Nagel’s ‘view from nowhere’), but without something like it, it is very hard to talk about non-species-based notions of value (standard utilitarian, deontological, and virtue approaches all basically rely on human or animal beings).
My candidate for this ‘cosmic value’ is something like created complexity (which can be physical or not, and can include things that are not obviously economically, militarily, or reproductively valuable, like art). This includes having trillions of diverse computing entities (human or otherwise).
This is obviously pretty hand-wavey, but I’d be interested in talking to anyone with views on this (it’s basically my PhD :-))