While promoting AI safety on the basis of wrong values may increase AI safety work, it may also increase the likelihood that AI will have wrong values (plausibly increasing the likelihood of quality risks), and shift the values in the EA community towards wrong values. It’s very plausibly worth the risks, but these risks are worth considering.
I’m personally pretty unconvinced of this. I conceive of AI Safety work as “solve the problem of making AGI that doesn’t kill everyone” more than as “figure out humanity’s coherent extrapolated volition and load it into a sovereign that creates a utopia”. To the degree that we do explicitly load a value system into an AGI (which I’m skeptical of), I think the process of creating this value system will be hard and messy and involve many stakeholders, and that EA may have outsized influence but is unlikely to be the deciding voice.
Having outsized influence could be enough when we’re considering the (dis)value at stake in the far future, which is still much larger than the (dis)value from the deaths of everyone killed by AI. What ratio of probabilities between influencing values in a better direction vs preventing extinction would you assign? Is the ratio small enough to give less overall weight to the expected far future impact than to the reduction in risk of everyone being killed by AI?
(FWIW, I don’t think it’s strictly necessary to explicitly “load” a value system to influence the kinds of values an AI system might have.)
Fair points!