If it were the only thing we wanted, we could actually work to explicitly specify that as the AI’s goal; that’s CEV, and hence problem solved.
This is just an aside, but it might be informative. I actually think that
single alignment: “This specific blob of meat here ‘is’ an ‘agent’. Figure out its utility function and do that”
is going to be simpler to program than
Hard-coded: “make a large number of ‘humanlike’ things ‘experience’ ‘happiness’”
I think it’s clear that there are more things in the hard-coded solution that we don’t know how to formalize, and that we’re much further from knowing how to formalize them (we’re already fairly deep into formalizing agency). And it’s fairly likely we’ll arrive at a robust account of agency in the process of developing AGI, or by stumbling onto it along the way; it seems to be one of the short paths there.
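To make that contrast concrete, here’s a toy sketch (purely illustrative, my own framing; every function name is a hypothetical placeholder, not a real API). Each undefined helper stands in for a concept we don’t yet know how to formalize: the single-alignment spec leans on roughly one such concept (a robust account of agency), while the hard-coded spec leans on several.

```python
# Toy sketch: which unformalized primitives each goal specification depends on.
# All names here are placeholders for concepts, not real functions.

def single_alignment_objective(world, this_blob_of_meat):
    # "This specific blob of meat is an agent. Figure out its utility function and do that."
    u = infer_utility_function(this_blob_of_meat)  # the one big missing piece
    return u(world)

def hard_coded_objective(world):
    # "Make a large number of 'humanlike' things 'experience' 'happiness'."
    total = 0.0
    for thing in world:
        if is_humanlike(thing):                          # unformalized: what counts as humanlike?
            total += happiness_of(experience_of(thing))  # unformalized: 'experience', 'happiness'
    return total

# Placeholders for the concepts each spec would need formalized.
def infer_utility_function(agent):
    raise NotImplementedError("needs a robust account of agency")

def is_humanlike(thing):
    raise NotImplementedError("needs a formal notion of 'humanlike'")

def experience_of(thing):
    raise NotImplementedError("needs a formal notion of 'experience'")

def happiness_of(experience):
    raise NotImplementedError("needs a formal notion of 'happiness'")
```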
I agree about the, hmm, mercurial quality of human agency. There seem to be a lot of triggers that “change” the utility function. Note, I don’t know if anyone knows what we mean when we say that a utility function was undermined, whether by a compelling speech, or by acclimating to the cold water, or by falling in love, or whatever. It’s strange. And humans generally tend to be indifferent to these changes; they identify with the process of change itself. In a sense they all must be part of one utility function (~ “when I am in bed I want to stay in bed, but if I need to pee I would prefer to get out of bed, but once I’m out of bed I will not want to go back to bed. That is my will. That is what it means to be human. If you remove this then it’s a dystopia.”), but somehow we have to exclude changes like… being convinced by a superintelligence’s rhetoric that maximizing paperclips is more important than preserving human life. Somehow we know that’s a bad change, even though we don’t have any intuitive aversion to reading superintelligences’ rhetoric. Even though we know we’d be convinced by it (well, I think I would be, at least), somehow we know that we want to exclude that possibility.
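To illustrate the “one utility function” reading of the bed example, here’s a tiny toy (again just illustrative, my own construction): what looks like a preference that keeps changing can be written as a single fixed function over states and actions.

```python
# Toy illustration: the 'bed' preferences as one fixed utility function over
# (state, action) pairs, rather than a utility function that keeps changing.

def bed_utility(in_bed: bool, needs_to_pee: bool, action: str) -> float:
    if in_bed and needs_to_pee:
        return 1.0 if action == "get_up" else 0.0
    if in_bed:
        return 1.0 if action == "stay_in_bed" else 0.0
    # Once out of bed, going back holds no appeal.
    return 0.0 if action == "go_back_to_bed" else 1.0

# The apparent 'change of will' is just conditioning on a different state:
assert bed_utility(in_bed=True, needs_to_pee=False, action="stay_in_bed") == 1.0
assert bed_utility(in_bed=True, needs_to_pee=True, action="get_up") == 1.0
assert bed_utility(in_bed=False, needs_to_pee=False, action="go_back_to_bed") == 0.0
```

Of course, nothing in this toy says which state-triggered changes count as “my will” and which count as being hijacked by a superintelligence’s rhetoric; that’s exactly the part we don’t know how to formalize.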