I am curious about (1). Do you think that changing the moral values/goals of the ASIs humanity would create is not a tractable way to influence the value of the future? If so, is that because we are unable to change them, because we don't know which moral values to input, or something else? In the second case, what about inputting the goal of figuring out which goals to pursue (a "long reflection")?
I think yes, and for all of those reasons. I'm a bit sceptical that we can change the values ASIs will have: we don't understand present models that well, and there are good reasons not to treat a model's text outputs as representative of its goals (it could be hallucinating, it could be deceptive, or its outputs might simply not be isomorphic to its reward structure).
And even if we could, I don't know of any non-controversial value to instill in the ASI that isn't already included in basic attempts to control it (which I'd pursue mostly for extinction-related reasons).
I'm going to press on point 2: I think this is self-defeating, as it suggests the future will just be bad, in which case by this line of reasoning we shouldn't even try to reduce extinction risks.