I’m curious to what extent the “happiness-to-be-overthrown-by” (H2BOB) value of the unaligned AI that overthrew us would predict the H2BOB values of future generations or evolutions of AI. Specifically, it seems at least plausible that unaligned AI could evolve so broadly and quickly that knowing the nature and H2BOB of the first AGI would tell us essentially nothing about prospects for AI welfare in the long run.