Are people in the AI safety community thinking about this?
Yes. They think about this more on the policy side than on the technical side, but there is technical/policy cross-over work too.
Should we be concerned that an aligned AI’s values will be set by (for example) the small team that created it, who might have idiosyncratic and/or bad values?
Yes.
There is significant talk about ‘aligned with whom exactly’. But many of the more technical papers and blog posts on x-risk style alignment tend to ignore this part of the problem, or mention it only in one or two sentences and then move on. This does not necessarily mean that the authors are unconcerned about the question; more often it means that they feel they have little new to say about it.
If you want to see an example of a vigorous and occasionally politically sophisticated debate on solving the ‘aligned with whom’ question, instead of the moral philosophy 101/201 debate which is still the dominant form of discourse in the x-risk community, you can dip into the literature on AI fairness.