A good example of the sort of risk that I’m referring to is a modified version of the ultimatum game from the Soares and Fallenstein paper “Toward Idealized Decision Theory”:
Consider a simple two-player game, described by Slepnev (2011), played by a human and an agent which is capable of fully simulating the human and which acts according to the prescriptions of [Updateless Decision Theory (UDT)]. The game works as follows: each player must write down an integer between 0 and 10. If both numbers sum to 10 or less, then each player is paid according to the number that they wrote down. Otherwise, they are paid nothing. For example, if one player writes down 4 and the other 3, then the former gets paid $4 while the latter gets paid $3. But if both players write down 6, then neither player gets paid. Say the human player reasons as follows:
I don’t quite know how UDT works, but I remember hearing that it’s a very powerful predictor. So if I decide to write down 9, then it will predict this, and it will decide to write 1. Therefore, I can write down 9 without fear.
The human writes down 9, and UDT, predicting this, prescribes writing down 1.
This result is uncomfortable, in that the agent with superior predictive power “loses” to the “dumber” agent. In this scenario, it is almost as if the human’s lack of ability to predict UDT (while using correct abstract reasoning about the UDT algorithm) gives the human an “epistemic high ground” or “first mover advantage.” It seems unsatisfactory that increased predictive power can harm an agent.
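To make the payoff rule in the quoted passage concrete, here is a minimal Python sketch. The names `payoff` and `best_response` are illustrative and do not come from the paper, and the predictor is modeled as simply best-responding to a move it already knows. That is a simplifying assumption standing in for the scenario's outcome, not an implementation of UDT.

```python
def payoff(mine: int, theirs: int) -> int:
    """Dollars paid to the player who wrote `mine` when the other wrote `theirs`."""
    if not (0 <= mine <= 10 and 0 <= theirs <= 10):
        raise ValueError("each number must be an integer between 0 and 10")
    # Each player is paid their own number only if the two numbers sum to 10 or less.
    return mine if mine + theirs <= 10 else 0


def best_response(predicted_other: int) -> int:
    """Highest-paying number against a move that is already known.

    Assumption: the predictor just maximizes its own payoff given the
    predicted move; this is a stand-in for the behavior in the story.
    """
    return max(range(11), key=lambda mine: payoff(mine, predicted_other))


human = 9                          # the human writes 9 "without fear"
predictor = best_response(human)   # the predictor foresees the 9
print(predictor, payoff(predictor, human), payoff(human, predictor))
# prints: 1 1 9
```

Under these assumptions the better predictor walks away with $1 and the human with $9, which is the uncomfortable result the passage describes.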
A solution to this problem would have to come from decision theory; it probably can’t be part of the sort of collaborative decision-making system that we envision here. Perhaps there is a way to make such a problem statement inconsistent: the smarter agent would have committed to writing down 5 and signaled that commitment sufficiently long before the game. Ozzie also suggests that introducing randomness, along the lines of the madman theory, may be a solution concept.
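The commitment idea can be illustrated with a small Python sketch. It assumes, purely for illustration, that the stronger agent's commitment to 5 is credible and known to the human before the game; the names `COMMITTED` and `human_payoffs` are hypothetical.

```python
COMMITTED = 5  # assumption: announced in advance and believed by the human


def payoff(mine: int, theirs: int) -> int:
    """Dollars paid to the player who wrote `mine` when the other wrote `theirs`."""
    return mine if mine + theirs <= 10 else 0


def human_payoffs(committed: int) -> dict:
    """Human's payoff for every possible number, given the committed move."""
    return {h: payoff(h, committed) for h in range(11)}


table = human_payoffs(COMMITTED)
best = max(table, key=table.get)   # the human's best reply to the commitment
print(best, table[best], table[9])
# prints: 5 5 0
```

Given a believed commitment to 5, writing 9 pays the human nothing and writing 5 is the human's best reply. The sketch says nothing about how such a commitment could be made credible, which is the open decision-theoretic question.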
Modified Ultimatum Game