Newcomb’s problem isn’t a challenge to causal decision theory. I can solve Newcomb’s problem by committing to one-boxing in any of a number of ways, e.g. by signing a contract or by building a reputation as a one-boxer. After the boxes have already been placed in front of me, however, I can no longer influence their contents, so it would be good for me to two-box whenever the rewards outweigh the penalty, e.g. if the contract I signed turned out to be void, or if I don’t care about my one-boxing reputation because I don’t expect to play this game again.
The “wishful thinking” hypothesis might just apply to me then. I think it would be super cool if we could spontaneously cooperate with aliens in other universes.
Edit: Wow, OK, I remember what I actually meant about wishful thinking: evidential decision theory literally prescribes wishful thinking. Also, if you made a copy of a purely selfish person and then told both copies that this had happened, I still think it would be rational for them to defect. Of course, if they could commit to cooperating before being copied, then that would be the right strategy.
After the boxes have already been placed in front of me, however, I can no longer influence their contents, so it would be good if I two-boxed
You would get more utility if you were willing to one-box even when there’s no external penalty or opportunity to bind yourself to the decision. Indeed, functional decision theory can be understood as a formalization of the intuition: “I would be better off if only I could behave in the way I would have precommitted to behave in every circumstance, without actually needing to anticipate each such circumstance in advance.” Since the predictor in Newcomb’s problem fills the boxes based on your actual action, regardless of the reasoning or contract-writing or other activities that motivate the action, this suffices to always get the higher payout (compared to causal or evidential decision theory).
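To make the payoff comparison concrete, here is a minimal expected-value sketch in Python (my addition, not part of the original exchange; the $1,000,000/$1,000 stakes and the predictor-accuracy parameter are the standard illustrative assumptions). It computes the expected payout conditional on the prediction tracking your actual choice, which is exactly the dependence that functional decision theory respects and causal decision theory ignores:

```python
# Expected payoffs in Newcomb's problem, assuming the standard stakes:
# the opaque box holds $1,000,000 iff the predictor foresaw one-boxing,
# and the transparent box always holds $1,000.

def expected_payoff(action: str, accuracy: float) -> float:
    """Expected dollars, conditioning on the prediction matching the
    actual action with probability `accuracy`."""
    if action == "one-box":
        return accuracy * 1_000_000
    elif action == "two-box":
        return (1 - accuracy) * 1_000_000 + 1_000
    raise ValueError(action)

for accuracy in (0.99, 0.9, 0.51):
    print(
        f"accuracy={accuracy}: "
        f"one-box ${expected_payoff('one-box', accuracy):,.0f}, "
        f"two-box ${expected_payoff('two-box', accuracy):,.0f}"
    )
```

On these numbers, one-boxing comes out ahead whenever the predictor does even slightly better than chance, which is the sense in which the agent who can act the way it would have precommitted to act walks away richer.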
There are also dilemmas where causal decision theory gets less utility even if it has the opportunity to precommit before the dilemma arises; e.g., retro blackmail. For a fuller argument, see the paper “Functional Decision Theory” by Yudkowsky and Soares.
Ha, I think the problem is just that your formalization of Newcomb’s problem is defined so that one-boxing is always the correct strategy, and I’m working with a different formulation. There are four forms of Newcomb’s problem that jibe with my intuition, and they’re all different from the formalization you’re working with.
Your source code is readable. Then the best strategy is whatever the best strategy is when you get to publicly commit, e.g. you should tear off the steering wheel when playing chicken if you get the chance to do so before your opponent.
Your source code is readable and so is your opponent’s. Then you get mathy things like mutual simulation and Löb’s theorem (a toy sketch of the mutual-simulation case appears below, after the last formulation).
We’re in the real world, so the only information the other player has for guessing your strategy is things like your past behavior and reputation. (This is by far the most realistic situation, in my opinion.)
You’re playing against someone who’s an expert in reading body language, say. Then it might be impossible to fool them unless you can fool yourself into thinking you’ll one-box. But of course, after the boxes are actually in front of you, it would be great for you if you had a change of heart.
Your version is something like:
Your opponent can simulate you with 100% accuracy, including unforeseen events, such as something unexpected causing you to change your mind.
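As a toy illustration of the mutual-simulation case (again my addition, and only a sketch): two agents that each receive the other’s strategy can reach cooperation by simulating one another, provided something breaks the infinite regress. In the real Löbian construction that role is played by a provability shortcut; the sketch below just uses a depth limit as a stand-in.

```python
# A depth-limited "FairBot": cooperate iff a bounded simulation of the
# opponent (playing against this same strategy) cooperates.

def fairbot(opponent, depth=3):
    if depth == 0:
        return "C"  # optimistic base case standing in for the Löbian shortcut
    return "C" if opponent(fairbot, depth - 1) == "C" else "D"

def defectbot(opponent, depth=0):
    return "D"  # defects unconditionally

print(fairbot(fairbot))    # C -- mutual simulation reaches cooperation
print(fairbot(defectbot))  # D -- but a defector doesn't get exploited
```

The point is just that when both sources are readable, “do what the opponent would do against me” is a well-defined (if delicate) strategy, which is what makes this formulation mathier than the reputation-based one.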
If we’re creating AIs that others can simulate, then I guess we might as well make them immune to retro blackmail. I still don’t see the implications for humans, who cannot be simulated with 100% fidelity and already have ample intuition about their reputations and know lots of ways to solve coordination problems.