Researching valence for AI alignment
Artificial Intelligence, Values and Reflective Processes
In psychology, valence refers to the attractiveness, neutrality, or aversiveness of subjective experience. Improving our understanding of valence and its principal components could have large implications for how we approach AI alignment. For example, determining the extent to which valence is an intrinsic property of reality could provide computer-legible targets to align AI towards. This could be investigated experimentally: the relationship between experiences and both their neural correlates and subjective reports could be mapped out across a large sample of subjects and cultural contexts.
I’ve been wondering whether AGI independently discovering valence realism could be a “get-out clause” for alignment. Maybe this could even happen in a convergent manner, alongside natural abstraction?