There are person-affecting views according to which future beings still matter substantially, e.g. wide or asymmetric ones. I don’t personally find narrow symmetric person-affecting views very plausible, since they would mean that causing someone to come to exist and have a life of pure torture and misery wouldn’t be bad for that person (although later preventing their suffering after they exist would be good, if possible), and significantly discounting such harms just because the individual doesn’t yet exist seems really wrong to me. So, it’s hard for me to justify discounting all of the possible sources of (dis)value in the far future, except to the extent that predictably improving things in the far future is hard.
For general implications of some impartial and asymmetric (but non-antinatalist) person-affecting views, see: https://globalprioritiesinstitute.org/teruji-thomas-the-asymmetry-uncertainty-and-the-long-term/
On asymmetric person-affecting views, negative utilitarianism and other downside-focused ethical views, AI safety looks promising, although the relative focus between interventions in our community is plausibly wrong, as s-risks and other quality risks seem relatively neglected. For s-risks, see: https://forum.effectivealtruism.org/posts/225Aq4P4jFPoWBrb5/cause-prioritization-for-downside-focused-value-systems and https://longtermrisk.org/research-agenda
While promoting AI safety on the basis of wrong values may increase AI safety work, it may also increase the likelihood that AI will have wrong values (plausibly increasing the likelihood of quality risks), and shift the values in the EA community towards wrong values. It’s very plausibly worth it overall, but these risks are worth considering.
Fair points!
I’m personally pretty unconvinced of this. I conceive of AI Safety work as “solve the problem of making AGI that doesn’t kill everyone” more so than as “figure out humanity’s coherent extrapolated volition and load it into a sovereign that creates a utopia”. To the degree that we do explicitly load a value system into an AGI (which I’m skeptical of), I think that the process of creating this value system will be hard and messy and involve many stakeholders, and that EA may have outsized influence but is unlikely to be the deciding voice.
Having outsized influence could be enough, when we’re considering the (dis)value in the far future at stake, which is still much larger than the value lost through the deaths of everyone killed by AI. What ratio of probabilities between influencing values in a better direction and preventing extinction would you assign? Is the ratio small enough to give less overall weight to the expected far-future impact than to the reduction in the risk of everyone being killed by AI?
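To make the comparison I have in mind concrete (rough, purely illustrative notation of my own): let $p_v$ be the probability of shifting AI values in a better direction, $p_x$ the probability of preventing extinction, $V_f$ the far-future (dis)value at stake, and $V_d$ the value lost in the deaths of everyone killed by AI. The expected far-future impact gets more weight whenever

$$p_v \cdot V_f > p_x \cdot V_d, \qquad \text{i.e.} \qquad \frac{p_v}{p_x} > \frac{V_d}{V_f},$$

so if $V_f \gg V_d$, even a ratio $p_v / p_x$ far below 1 could be enough.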
(FWIW, I don’t think it’s strictly necessary to explicitly “load” a value system to influence the kinds of values an AI system might have.)