Jobst Heitzig (vodle.it) answers What new psychology research could best promote AI safety & alignment research?

Jobst Heitzig (vodle.it) 13 Jul 2023 19:04 UTC
3 points
1 ∶ 0
As long as state-of-the-art alignment attempts by industry involve eliciting human evaluations of actual or hypothetical AI behaviors (e.g. responses a chatbot might give to a prompt, as in RLHF), it seems important to understand the psychological aspects of such human-AI interactions. I plan to run some experiments myself on what I call collective RLHF, approached more from a social choice perspective (see http://amsterdam.vodle.it), and can imagine collaborating on similar questions.
- Geoffrey Miller 13 Jul 2023 20:28 UTC
  5 points
  1 ∶ 0
  Jobst, yes, I think we need a lot more psych research on how to elicit the human values that AI systems are trying to align with. This matters especially because some of our most important values either can’t be articulated very well, or are too ‘obvious’ and ‘common-sensical’ to be discussed much, or are embodied in our physical phenotypes rather than articulated in our brains.
  - Jobst Heitzig (vodle.it) 13 Jul 2023 21:47 UTC
    3 points
    0 ∶ 0
    This becomes particularly important for human feedback or input on “higher-level” or more “abstract” questions, as in OpenAI’s deliberative mini-public / citizen assembly idea (https://openai.com/blog/democratic-inputs-to-ai).