Hi David,

AI alignment is not just about achieving initial compatibility with human values.
I am not sure what “human values” refers to here, but, from my perspective, the goal is aligning transformative AI with impartial values, not human values. In particular, I would be happy for transformative AI to maximise expected total hedonistic wellbeing, even if that implies human extinction (e.g. maybe humans will insist on maintaining significant amounts of wildlife, and the AI concludes wildlife is full of suffering). This relates to a point you make in the next section:
Finally, under the TOP case it may even be the case that actors not conventionally seen as bad could result in existential catastrophe. If the actors who control AI lack expansionist ambitions, they might fail to colonize the stars and create a grand future.