This is a great idea! I expect to use these threads to ask many many basic questions.
One on my mind recently: assuming we succeed in creating aligned AI, whose values, or which values, will the AI be aligned with? We talk of ‘human values’, but humans have wildly differing values. Are people in the AI safety community thinking about this? Should we be concerned that an aligned AI’s values will be set by (for example) the small team that created it, who might have idiosyncratic and/or bad values?
Are people in the AI safety community thinking about this?
Yes. They think about this more on the policy side than on the technical side, but there is technical/policy cross-over work too.
Should we be concerned that an aligned AI’s values will be set by (for example) the small team that created it, who might have idiosyncratic and/or bad values?
Yes.
There is a significant amount of talk about ‘aligned with whom exactly’. But many of the more technical papers and blog posts on x-risk style alignment tend to ignore this part of the problem, or mention it only in one or two sentences and then move on. This does not necessarily mean that the authors are unconcerned about the question; more often it means they feel they have little new to say about it.
If you want to see an example of a vigorous and occasionally politically sophisticated debate on solving the ‘aligned with whom’ question, instead of the moral philosophy 101/201 debate which is still the dominant form of discourse in the x-risk community, you can dip into the literature on AI fairness.
An AI could be aligned to something other than humanity’s shared values, and this could potentially prevent most of the value in the universe from being realized. Nate Soares talks about this in Don’t leave your fingerprints on the future.
Most of the focus goes on being able to align an AI at all, as this is necessary for any win-state. There seems to be consensus among the relevant actors that seizing the cosmic endowment for themselves would be a Bad Thing. Hopefully this will hold.