Just to say slightly more on this, I think the Bomb case is again useful for illustrating my (I think not uncommon) intuitions here.
Bomb Case: Omega puts a million dollars in a transparent box if he predicts you’ll open it. He puts a bomb in the transparent box if he predicts you won’t open it. He’s only wrong about one in a trillion times.
Now suppose you enter the room and see that there’s a bomb in the box. You know that if you open the box, the bomb will explode and you will die a horrible and painful death. If you leave the room and don’t open the box, then nothing bad will happen to you. You’ll return to a grateful family and live a full and healthy life. You understand all this. You want so badly to live. You then decide to walk up to the bomb and blow yourself up.
Intuitively, this decision strikes me as deeply irrational. You’re intentionally taking an action that you know will cause a horrible outcome that you want badly to avoid. It feels very relevant that you’re flagrantly violating the “Don’t Make Things Worse” principle.
Now, let’s step back a time step. Suppose you know that you’re the sort of person who would refuse to kill yourself by detonating the bomb. You might decide that, since Omega is such an accurate predictor, it’s worth taking a pill to turn you into the sort of person who would detonate it, to increase your odds of getting a million dollars. You recognize that this may lead you, in the future, to take an action that makes things worse in a horrifying way. But you calculate that the decision you’re making now is nonetheless making things better in expectation.
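To make the "better in expectation" step concrete, here is a rough sketch of the calculation. The million-dollar payoff and the one-in-a-trillion error rate come from the case as stated; everything else is my own assumption for illustration: the dollar-equivalent disutility of dying (`DEATH`), the zero payoff for walking away, the reading that Omega predicts what you would do on seeing a bomb, and the assumption that either kind of agent opens a box with money in it.

```python
from fractions import Fraction

# From the case as stated.
ERROR_RATE = Fraction(1, 10**12)  # Omega is wrong about one in a trillion times
MILLION = Fraction(10**6)         # payoff for opening a box that holds the money

# Hypothetical values, chosen only for illustration.
DEATH = Fraction(-10**9)          # assumed dollar-equivalent disutility of dying
WALK_AWAY = Fraction(0)           # assumed payoff for leaving with nothing


def expected_value(opens_on_bomb: bool) -> Fraction:
    """Ex ante value of being an agent who would (or would not) open the
    box on seeing a bomb, assuming Omega predicts that disposition."""
    if opens_on_bomb:
        # Correct prediction: money. Rare error: bomb, which this agent detonates.
        return (1 - ERROR_RATE) * MILLION + ERROR_RATE * DEATH
    # Correct prediction: bomb, and the agent walks away. Rare error: money.
    return (1 - ERROR_RATE) * WALK_AWAY + ERROR_RATE * MILLION


def value_after_seeing_bomb(open_box: bool) -> Fraction:
    """Value of each action once the bomb is already in view."""
    return DEATH if open_box else WALK_AWAY


ev_pill = expected_value(opens_on_bomb=True)
ev_no_pill = expected_value(opens_on_bomb=False)

print("take the pill:", float(ev_pill))
print("stay a refuser:", float(ev_no_pill))
print("pill better ex ante:", ev_pill > ev_no_pill)
print("opening worse once bomb is seen:",
      value_after_seeing_bomb(True) < value_after_seeing_bomb(False))
```

Under these assumed numbers, taking the pill wins by a wide margin before you enter the room, while opening the box is still the worse action once the bomb is in front of you. The size of the gap depends on the assumed value for `DEATH`, but the ordering only flips if dying is treated as astronomically bad relative to a million dollars.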
This decision strikes me as pretty intuitively rational. You’re violating the second principle (the “Don’t Commit to a Policy...” principle), but this violation just doesn’t seem that intuitively relevant or remarkable to me. I personally feel like there is nothing too odd about the idea that it can be rational to commit to violating principles of rationality in the future.
(This is obviously just a description of my own intuitions, as they stand, though.)
It feels very relevant that you’re flagrantly violating the “Don’t Make Things Worse” principle.
By triggering the bomb, you’re making things worse from your current perspective, but making things better from the perspective of earlier you. Doesn’t that seem strange and deserving of an explanation? The explanation from a UDT perspective is that by updating upon observing the bomb, you actually changed your utility function. You used to care about both the possible worlds where you end up seeing a bomb in the box and the worlds where you don’t. After updating, you think you’re either a simulation within Omega’s prediction (so your action has no effect on yourself) or in the world with a real bomb. Either way, you no longer care about the version of you in the world with a million dollars in the box, and this accounts for the conflict/inconsistency.
Given the human tendency to change our (UDT-)utility functions by updating, it’s not clear what to do (or what is right), and I think this reduces UDT’s intuitive appeal and makes it less of a slam-dunk over CDT/EDT. But it seems to me that it takes switching to the UDT perspective to even understand the nature of the problem. (Quite possibly this isn’t adequately explained in MIRI’s decision theory papers.)