EA: “Their utility functions would not overlap with our utility functions.”
Me: “By that definition, humans are already unaligned with each other. Any given person has almost a completely non-overlapping utility function with a random stranger. People—through their actions rather than words—routinely value their own life and welfare thousands of times higher than that of strangers. Yet even with this misalignment, the world does not end in any strong sense.”
EA: “Sure, but that’s because humans are all roughly the same intelligence and/or capability. Future AIs will be way smarter and more capable than humans.”
Just for the record, this is when I got off the train for this dialogue. I don’t think humans are misaligned with each other in the relevant ways, and if I could press a button to have the universe be optimized by a random human’s coherent extrapolated volition, that seems great and thousands of times better than what I expect to happen with AI-descendants. I believe this for a mixture of game-theoretic reasons and genuinely thinking that other humans’ values really do capture most of what I care about.
In this part of the dialogue, when I talk about a human’s utility function, I mean roughly their revealed preferences rather than their coherent extrapolated volition (which I also think is underspecified). This matters because revealed preferences better predict actual behavior, and my point is simply that behavioral misalignment in this sense is common among humans. Nor does this fact automatically imply the world will end for any given group of humans within humanity.