So OK, the AI knows that some human values are unknown to it.
What does the AI do about this?
It can take some action that maximizes the known human values, and risk harming the unknown ones.
Or it can do nothing and wait until it knows more (but wait how long? There could always be missing values).
Something I’m not sure I understood from the article:
Does the AI assume it can enumerate all the values humans might possibly care about? Is that how it is supposed to guard against any of those possible values dropping too much?
An AI that is aware that value is fragile will behave in a much more cautious way. This gives a different dynamic to the extrapolation process.
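One way to picture that cautious dynamic: instead of maximizing its single known utility function, the AI keeps a set of candidate value functions (known plus merely suspected) and picks the action with the best worst-case score across all of them. This is just a toy maximin sketch of my own; all the names and numbers are made up, and it's not how the article formalizes anything:

```python
def cautious_choice(actions, candidate_utilities):
    """Pick the action whose worst-case utility across all
    candidate value functions is highest (maximin)."""
    return max(actions, key=lambda a: min(u(a) for u in candidate_utilities))

# Two toy actions: "optimize_hard" scores high on the known value but
# may devastate a value the AI merely suspects exists;
# "act_conservatively" is mediocre on everything but ruins nothing.
actions = ["optimize_hard", "act_conservatively"]

known_value     = {"optimize_hard": 10.0,  "act_conservatively": 3.0}
suspected_value = {"optimize_hard": -50.0, "act_conservatively": 2.0}

candidates = [known_value.get, suspected_value.get]

print(cautious_choice(actions, candidates))  # → act_conservatively
```

Under maximin the cautious action wins even though it looks much worse on the known values alone, which is roughly the "behave much more cautiously" behavior. But note it only works if the suspected values are on the candidate list at all, which is exactly my question above.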
Thanks!