Hi Gordon, I don’t have accounts on LW or Medium so I’ll comment on your original post here.
If possible, could you explain like I’m five what your working definition of the AI alignment problem is?
I find it hard to prioritize causes that I don’t understand in simple terms.
I think the ELI5 on AI alignment is the same as it has been: make nice AI. Being a little more specific, I like Russell’s more precise formulation of this as “align AI with human values”, and being even more specific (without jumping to mathematical notation), I’d say we want to design AI that value what humans value, and for us to be able to believe these AI share our values.
Maybe the key thing I’m trying to get at, though, is that alignable AI will be phenomenally conscious, or in ELI5 terms, just as much people as anything else (humans, animals, etc.). So my position is not just “make nice AI” but “make nice AI people we can believe are nice”.
Thanks, Gordon.
“Make nice AI people we can believe are nice” makes sense to me; I hadn’t been aware of the “...we can believe are nice” requirement.