Two major reasons/considerations:
1- I'm unconvinced of the tractability of non-extinction-risk-reducing longtermist interventions.
2- Perhaps this is self-defeating, but I feel uncomfortable substantively shaping the future in ways that aren't merely making sure it exists. Visions of the future that I would have found unobjectionable a century ago would probably seem bad to me today. In short, this consideration is basically "moral uncertainty". I think extinction-risk reduction, though not recommended by every moral framework, is at least recommended by most. I haven't seen other ideas for shaping the future that are as widely recommended.
I am curious about (1).
Do you think that changing the moral values/goals of the ASIs humanity would create is not a tractable way to influence the value of the future?
If yes, is that because we are not able to change them, because we don't know which moral values to input, or something else?
In the second case, what about inputting the goal of figuring out which goals to pursue ("long reflection")?
I think yes, and for all of those reasons. I'm a bit sceptical that we can change the values ASIs will have: we don't understand present models that well, and there are good reasons not to treat how a model outputs text as representative of its goals (it could be hallucinating, it could be deceptive, or its outputs might simply not be isomorphic to its reward structure).
And even if we could, I don't know of any non-controversial value to instill in an ASI that isn't already included in basic attempts to control it (which I'd be doing mostly for extinction-related reasons).
I'm going to press on point 2: I think it is self-defeating, since it suggests the future will just be bad, in which case by this line of reasoning we shouldn't even try to reduce extinction risk.