Linkpost: Epistle to the Successors
Most arguments about AI x-risk are designed to convince other humans, e.g. policymakers, civil society or the public. Therefore, they are often framed in a human-centric way (“AI safety should be about helping humans survive and flourish indefinitely”). This situation is improving in 2024 as we think more about s-risks and moral circle expansion (to include non-human animals and AIs).
The linked post’s assumption is that we could persuade future powerful AIs to preserve/help humans with arguments that are (our projection of) “in the best interests of the AIs”. Perhaps these arguments can also be framed in ways that are species- or substrate-neutral. At the very least, we could get arguments like these into future AI systems’ training data, with a view to influencing their value alignment. The linked post reworks (and in some cases uses verbatim; a tracked-changes comparison is available on-site) the LessWrong post by Miller et al. (2023), which in turn draws upon Turchin (2021).
As the comments on Miller et al. (2023) show, there are basic objections to this exercise:
Something that qualifies as AGI or ASI would presumably be so cognitively beyond us that it seems implausible we could make an argument (or rather, predict what such an argument might look like) that would influence it, or even make sense to it.
Alternatively, a true superintelligence could likely simulate/predict, and have a considered answer/refutation of, almost any (logical or fact-based) argument we could come up with.
Partly in response to the above, Turchin notes the likelihood of “young AGIs” or proto-AGIs, which might share concepts or ontologies with us (or have translations we are familiar with). There is a lot of possibly relevant writing about ontology translation/identification and about how predictive systems might complicate AI alignment, so I won’t analyse those questions further here.
Miller et al. (2023) also framed the risk mainly in terms of AIs “killing humans”, probably for expository clarity. But that framing seemed narrow; hence the linked post adds points about totalitarian dystopias, as well as the need to preserve our ecosystem (not just humans).
Aside from expanding upon the prior work, the linked post broadens the philosophical/cultural background assumptions beyond the rationalist or AI safety/alignment space, and makes a “positive case” for the value humans add in a cosmic or universal sense (i.e. it relates to the question “if we weren’t here, why would that be bad, and for whom?”).
This is pretty imperfect and a work-in-progress, but seems worth doing, if only to clarify for ourselves what value humanity might collectively have in a post-AGI world while practicing strategic empathy (with our possible successors). Comments are welcome, and feel free to re-use content from the GitHub repo.