Scenario 1: Alignment goes well. In this scenario, I agree that our future AI-assisted selves can figure things out, and that pre-alignment AI sentience work will have been wasted effort.
Scenario 2: Alignment goes poorly. While I don’t technically disagree with your statement, “If AIs are unaligned with human values, that seems very bad already,” I do think it misleads by lumping all kinds of misaligned AI outcomes together as “very bad,” when in reality this category spans many orders of magnitude of badness.[1] If we do lose control of the future at some point, it seems worthwhile to try, before then, to steer away from some of the worst outcomes (e.g., astronomical “byproduct” suffering of digital minds, which is likely easier to avoid if we better understand AI sentience).
[1] Ranging from the roughly neutral outcome of paperclip maximization to the extremely bad outcome of optimized suffering.