Linyphia comments on AI alignment with humans… but with which humans?

Linyphia 17 Mar 2023 19:31 UTC
10 points
2 ∶ 0
I agree with Miller’s response to mic (six months ago). Is it even possible for us to stop avoiding the “hard problem” of human nature?
Also, any given agent’s or interest group’s priorities and agendas will always be dynamic, compounding the problem of maintaining multiple mutually satisfactory alignments. Natural selection has designed us to respond in complex, contingent ways to both subtle and dramatic changes in our environment.
In addition, the humans providing the feedback, even if THEY can find sustainable alignment amongst themselves (remember they are all reproductive competitors, deeply programmed by natural selection to act accordingly, intentionally or not), will change over time, possibly over a very short time, through being exposed to such power. They will be corrupted by that power, and in due time corrupted absolutely.
Finally, an important Darwinian sub-theory has to do with the problem of nonconscious self-deception. I have to wonder whether even our discussing the possibility of properly managing substantive AI systems is just a way of obscuring from ourselves the utter impossibility, given human nature, of managing them properly, morally, and wisely. Are all these conversations a way (a kind of competition) to convince ourselves and others that we or our (perceived) allies deserve the power to program and maintain these systems?
- Geoffrey Miller 17 Mar 2023 22:00 UTC
  3 points
  1 ∶ 0
  Parent
  Linyphia—totally agree (unsurprisingly!).
  You raise good additional points about the dynamism and unpredictability of human values and preferences. Some of that unpredictability may reflect adaptive unpredictability (what biologists call ‘protean behavior’) that makes it harder for evolutionary enemies and rivals to predict what one’s going to do next. I discuss this issue extensively in this 1997 chapter and this 1996 simulation study. Insofar as human values are somewhat adaptively unpredictable by design, for good functional reasons, it will be very hard for reinforcement learning systems to get a good ‘fix’ on our preferences.
  The other issues you raise, adaptive self-deception about our values (e.g. virtue signaling, as discussed in my 2019 book on the topic) and AI power corrupting humans, also deserve much more attention in AI alignment work, IMHO.