I think the ELI5 on AI alignment is the same as it has been: make nice AI. Being a little more specific, I like Russell’s formulation of this as “align AI with human values”, and being even more specific (without jumping to mathematical notation), I’d say we want to design AI that values what humans value, and we want to be able to believe that this AI shares our values.
Maybe the key thing I’m trying to get at, though, is that alignable AI will be phenomenally conscious, or in ELI5 terms, as much people as anything else (humans, animals, etc.). So my position is not just “make nice AI” but “make nice AI people we can believe are nice”.
Thanks, Gordon.
“Make nice AI people we can believe are nice” makes sense to me; I hadn’t been aware of the “...we can believe are nice” requirement.