Hi Nate,
Thanks for the AMA. I’m most curious as to what MIRI’s working definition is for what has intrinsic value. The core worry of MIRI has been that it’s easy to get the AI value problem wrong, to build AIs that don’t value the correct thing. But how do we humans get the value problem right? What should we value?
Max Tegmark alludes to this in Friendly Artificial Intelligence: the Physics Challenge:

“Quantum effects aside, a truly well-defined goal would specify how all particles in our Universe should be arranged at the end of time. [But] what particle arrangement is preferable, anyway? … What is the ultimate ethical imperative, i.e., how should we strive to shape the future of our Universe? If we fail to answer the last question rigorously, this future is unlikely to contain humans.”
So I have two questions:
(1) Do you see this (e.g., what Tegmark is speaking about above) as part of MIRI’s bailiwick?
(2) If so, do you have any thoughts or research directions you can share publicly?
We don’t have a working definition of “what has intrinsic value.” My basic view on these hairy problems (“but what should I value?”) is that we really don’t want to be coding in the answer by hand. I’m more optimistic about building something that has a few layers of indirection, e.g., something that figures out how to act as intended, rather than trying to transmit your object-level intentions by hand.
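As a rough illustration of what I mean by indirection, here is a toy Python sketch of one value-learning flavor of the idea. It is purely illustrative, not MIRI’s formalism or a proposal; every name in it (expected_utility, update_on_feedback, and so on) is hypothetical.

```python
# Toy sketch of "a few layers of indirection": rather than hard-coding what has
# value, the agent keeps a distribution over candidate utility functions and
# updates it from evidence about what its operators intended. Illustrative
# only; all names here are hypothetical.

from typing import Callable, Dict

State = str
UtilityFn = Callable[[State], float]
Beliefs = Dict[UtilityFn, float]   # probability weight on each candidate utility function


def expected_utility(state: State, beliefs: Beliefs) -> float:
    """Score a state under the agent's current uncertainty about what to value."""
    return sum(p * u(state) for u, p in beliefs.items())


def update_on_feedback(beliefs: Beliefs,
                       likelihood: Callable[[UtilityFn], float]) -> Beliefs:
    """Bayesian update: operator feedback shifts weight toward the candidate
    utility functions that best explain what the operators said they intended."""
    posterior = {u: p * likelihood(u) for u, p in beliefs.items()}
    z = sum(posterior.values())
    return {u: p / z for u, p in posterior.items()}
```

The point of the indirection is that the programmers never write down the object-level answer themselves; the system is built to keep refining its picture of what was intended.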
In the paper you linked, I think Max is raising a slightly different issue. He’s talking about what we would call the ontology identification problem. Roughly, imagine building an AI system that you want to produce lots of diamond. Maybe it starts out with an atomic model of the universe, and you (looking at its model) give it a utility function that scores one point per second for every carbon atom covalently bound to four other carbon atoms (and then time-discounts or something). Later, the system develops a nuclear model of the universe. You do want it to somehow deduce that carbon atoms in the old model map onto six-proton atoms in the new model, and maybe query the user about how to value carbon isotopes in its diamond lattice. You don’t want it to conclude that none of these six-proton nuclei pattern-match to “true carbon”, and then turn the universe upside down looking for some hidden cache of “true carbon.”
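To make that failure mode concrete, here is a toy Python sketch of the diamond example. It is mine and purely for illustration; the Atom, Nucleus, and as_old_element names are hypothetical, and this is not how we would actually formalize the problem. The utility function is written against the atomic ontology, so once the agent’s model is rewritten in terms of nuclei it needs some bridge back to the old concepts in order to find “carbon” at all, and it should defer to the user on cases the old concept never distinguished, such as isotopes.

```python
# Toy version of the diamond/ontology example above. Illustrative only; the
# names and the "bridge" below are hypothetical, not a real formalization.

from dataclasses import dataclass, field
from typing import List


@dataclass
class Atom:                       # old (atomic) ontology
    element: str                  # e.g. "carbon"
    neighbors: List["Atom"] = field(default_factory=list)


def diamond_utility(atoms: List[Atom]) -> int:
    """Utility written against the old model: one point for each carbon atom
    covalently bound to four other carbon atoms."""
    return sum(
        1 for a in atoms
        if a.element == "carbon"
        and len(a.neighbors) == 4
        and all(n.element == "carbon" for n in a.neighbors)
    )


@dataclass
class Nucleus:                    # new (nuclear) ontology: no `element` field at all
    protons: int
    neutrons: int
    neighbors: List["Nucleus"] = field(default_factory=list)


class AskTheUser(Exception):
    """Raised when the old concept never settled the case at hand."""


def as_old_element(n: Nucleus) -> str:
    """Bridge from the new ontology back to the old one.

    Without some mapping like this, diamond_utility finds no "carbon" anywhere
    in the nuclear model and scores every future at zero -- the failure mode in
    the text. The intended behavior is to identify six-proton nuclei with
    old-model carbon, and to ask the user about cases the old concept was
    silent on (e.g. unusual isotopes)."""
    if n.protons == 6:
        if n.neutrons not in (6, 7):
            raise AskTheUser(f"Does carbon-{6 + n.neutrons} count toward the diamond lattice?")
        return "carbon"
    return "other"
```

The interesting design question is what licenses identifying six-proton nuclei with old-model “carbon” in the first place, which is part of why this problem is bound up with how a system builds multi-level maps of its world.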
We have a few different papers that mention this problem, albeit shallowly: “Ontological Crises in Artificial Agents’ Value Systems,” “The Value Learning Problem,” and “Formalizing Two Problems of Realistic World-Models.” There’s a lot more work to be done here, and it’s definitely on our radar, though also note that work on this problem is at least a little blocked on attaining a better understanding of how to build multi-level maps of the world.
That diamond/carbon scenario is an excellent concrete example of the ontology problem.