I don’t want to get into a long back-and-forth here, but for the record I still think you’re misunderstanding what I flippantly described as “other Everett branches” and missing the entire motivation behind Counterfactual Mugging. It is definitely not supposed to directly make sense in the exact situation you’re in. I think this is part of why a variant of it is called “updateless”: it makes a principled refusal to update on which world you find yourself in, in order to (more flippant not-quite-right description) program the type of AI that would win weird games played against omniscient entities.
If the demon would only create me conditional on me cutting off my legs after I existed, and it was the specific class of omniscient entity that FDT is motivated by winning games with, then I would endorse cutting off my legs in that situation.
(as a not-exactly-right-but-maybe-helpful intuition pump, consider that if the demon isn’t omniscient—but simply reads the EA Forum—or more strictly can predict the text that will appear on the EA Forum years in the future—it would now plan to create me but not you, and I with my decision theory would be better off than you with yours. And surely omniscience is a stronger case than just reads-the-EA-Forum!)
If this sounds completely stupid to you and you haven’t yet read the LW posts on Counterfactual Mugging, I would recommend starting there; otherwise, consider finding a competent and motivated FDT proponent (ie not me) and trying to do some kind of double-crux or debate with them; I’d be interested in seeing the results.
Oh sorry, yeah I misunderstood what point you were making. I agree that you want to be the type of agent who cuts off their legs—you become better off in expectation. But the mere fact that the type of agent who does A rather than B gets more utility on average does not mean that you should necessarily do A rather than B. If you know you are in a situation where doing A is guaranteed to get you less utility than B, you should do B. The question of which agent you should want to be is not the same as the question of which agent is acting rationally. I agree with MacAskill’s suggestion that FDT is the result of conflating what type of agent to be with what actions are rational. FDT is close to the right answer to the first question (what type of agent to be) and a crazy answer to the second (what actions are rational), imo.
Happy to debate someone about FDT. I’ll make a post on LessWrong about it.
One other point: I know that this will sound like a cop-out, but I think that the FDT stuff is the weakest example in the post. I am maybe 95% confident that FDT is wrong, while 99.9% confident that Eliezer’s response to zombies fails and 99.9% confident that he’s overconfident about animal consciousness.
Sorry if I misunderstood your point. I agree this is the strongest objection against FDT. I think there is some sense in which I can become the kind of agent who cuts off their legs (ie by choosing to cut off my legs), but I admit this is poorly specified.
I think there’s a stronger case for, right now, having heard about FDT for the first time, deciding I will follow FDT in the future. Various gods and demons can observe this and condition on my decision, so when the actual future comes around, they will treat me as an FDT-following agent rather than a non-FDT-following agent. Even though future-created-me isn’t exactly in a position to influence the (long-since gone) demon, current me is in a position to make this decision for future relevant situations, and should decide to follow FDT in general. Part of this decision I’ve made involves being the kind of person who would take the FDT option in hypothetical scenarios.
Then there’s the additional question of whether to defect against the demons/gods later, and say “Haha, back in August 2023 I resolved to become an FDT agent, and I fooled you into believing me, but now that I’ve been created I’m just going to not cut off my legs after all”. I think of this as—suppose every past being created by the demon has cut off its legs, ie the demon has a 100% predictive success rate over millions of cases. So the demon would surely predict if I would do this. That means I should (now) try really hard not to do this. Cf. Parfit’s Hitchhiker. Can I bind my future self like this? I think empirically yes—I think I have enough honor that if I tell hypothetical demon gods now that I’m going to do various things, I can actually do them when the time comes. This will be “irrational” in some sense, but I’ll still end up with more utility than everyone else.
Is there some sense in which, if I decide not to cut off my legs, I would wink out of existence? I admit feeling a superstitious temptation to believe this (a non-superstitious justification might be wondering if I’m the real me, or a version of me in the omniscient demon’s simulation to predict what I would do). I think the literal answer is no but that it’s practically useful to keep my superstitious belief in this to allow myself to do the irrational thing that gets me more utility. But this is a weird enough sidetrack that I’m really not sure I’m still in normal Eliezer-approved-decision-theory-land at all.
I think an easier question is whether you should program an AI to always keep its pre-emptive bargains with gods and demons; here the answer is just straightforwardly yes. You don’t have to assume that your actions alter your algorithm; you can just alter the algorithm directly. I think this is what Eliezer is most interested in, though I’m not sure.
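To make “alter the algorithm directly” a bit more concrete, here’s a minimal toy sketch of my own (the class names and payoff labels are all made up, not anything from Eliezer): the bargain-keeping behavior is simply fixed in the program, so a predictor that can read the program knows what it will do, and there is no later “decision” that could override it.

```python
# Toy sketch, not a real proposal: an AI whose policy toward
# predictor-bargains is fixed at build time rather than recomputed later.

class BargainKeepingAI:
    """Keeps any bargain it was created under, unconditionally."""
    def act(self, situation):
        if situation.get("created_under_bargain"):
            return "keep_bargain"      # e.g. cut off the legs
        return "maximize_normally"

class ReconsideringAI:
    """Re-derives the causally best act at decision time."""
    def act(self, situation):
        # By the time it chooses, the benefit of the bargain is already
        # banked, so reneging always looks causally better.
        return "renege"

def demon_creates(agent_class) -> bool:
    # A perfect predictor can just inspect the code: it creates the agent
    # only if that agent would keep the bargain once it exists.
    return agent_class().act({"created_under_bargain": True}) == "keep_bargain"

print(demon_creates(BargainKeepingAI))  # True  -> gets created, keeps the bargain
print(demon_creates(ReconsideringAI))   # False -> never gets created at all
```

The point of the sketch is just that, for an AI, the “commitment” isn’t a promise it has to psychologically hold itself to later; it is literally the source code the predictor conditions on.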
I know you said you didn’t want to repeatedly go back and forth, but . . .
Yes, I agree that if you have some psychological mechanism by which you can guarantee that you’ll follow through on future promises—like programming an AI—then that’s worth it. It’s better to be the kind of agent who follows FDT (in many cases). But the way I’d think about it is that this is an example of rational irrationality, where it’s rational to try to get yourself to do something irrational in the future because you get rewarded for it. But remember, decision theories are theories about what’s rational, not theories about what kind of agent you should be.
I think we agree on all three of the following claims:
1. If you have some way to commit in advance to follow FDT in cases like the demon case or the bomb case, you should do so.
2. Once you are in those cases, you have most reason to defect.
3. Given that you can predict that you’ll have most reason to defect, you can sort of psychologically make a deal with your future self where you say “NO REALLY, DON’T DEFECT, I’M SERIOUS.”
My claim, though, is that decision theory is about 2, rather than 1 or 3. No one disputes that the kinds of agents who two box do worse than the kinds of agents who one box—the question is about what you should do once you’re in that situation.
If an AI is going to encounter Newcomb’s problem a lot, everyone agrees you should program it to one box.
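As a rough illustration of why that much isn’t in dispute (a toy calculation of my own, using the standard $1,000 / $1,000,000 payoffs and a made-up predictor accuracy), here is the difference between evaluating the policy in advance and evaluating the act once the boxes are already filled:

```python
# Toy Newcomb calculation; the payoffs are the standard ones, the
# predictor accuracy p is made up for illustration.
p = 0.999999  # probability the predictor anticipates your choice correctly

# Expected payoff of *being the kind of agent* who one-boxes vs two-boxes,
# evaluated before the prediction is made (claim 1 territory):
ev_one_box_policy = p * 1_000_000 + (1 - p) * 0
ev_two_box_policy = p * 1_000 + (1 - p) * 1_001_000

# Payoffs once the boxes are already filled (claim 2 territory): whatever
# the opaque box contains, two-boxing adds exactly $1,000.
if_predicted_one_box = {"one-box": 1_000_000, "two-box": 1_001_000}
if_predicted_two_box = {"one-box": 0, "two-box": 1_000}

print(round(ev_one_box_policy), round(ev_two_box_policy))  # ~999999 vs ~1001
```

The dispute is just about which of those two comparisons the word “rational” is supposed to track.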
I guess any omniscient demon reading this to assess my ability to precommit will have learned I can’t even precommit effectively to not having long back-and-forth discussions, let alone cutting my legs off. But I’m still interested in where you’re coming from here since I don’t think I’ve heard your exact position before.
Have you read https://www.lesswrong.com/posts/6ddcsdA2c2XpNpE5x/newcomb-s-problem-and-regret-of-rationality ? Do you agree that this is our crux?
Would you endorse the statement “Eliezer, using his decision theory, will usually end up with more utility than me over a long life of encountering the sorts of weird demonic situations decision theorists analyze; I just think he is less formally-rational”?
Or do you expect that you will, over the long run, get more utility than him?
I would agree with the statement “if Eliezer followed his decision theory, and the world were such that one frequently encountered lots of Newcomb’s problems and similar, he’d end up with more utility.” I think my position is roughly MacAskill’s in the linked post, where he says that FDT is better as a theory of the kind of agent you should want to be than as a theory of what’s rational.
But I think that rationality won’t always benefit you. I think you’d agree with that. If there’s a demon who tortures everyone who believes FDT, then believing FDT, which you’d regard as rational, would make you worse off. If there’s another demon who will secretly torture you if you one box, then one boxing is bad for you! It’s possible to make up contrived scenarios that punish being rational—and Newcomb’s problem is a good example of that.
Notably, if we’re in the twin scenario or the scenario that tortures FDTists, CDT will dramatically beat FDT.
I think the example that’s most worth focusing on is the demon leg-cutting case. I think it’s not crazy at all to one box, and have maybe 35% credence that one boxing is right. I have maybe 95% credence that you shouldn’t cut off your legs in the demon case, and 80% confidence that the position that you should is crazy, in the sense that if you spent years thinking about it while being relatively unbiased you’d almost certainly give it up.
I think rather than say that Eliezer is wrong about decision theory, you should say that Eliezer’s goal is to come up with a decision theory that helps him get utility, and your goal is something else, and you have both come up with very nice decision theories for achieving your respective goals.
(what is your goal?)
My opinion on your response to the demon question is “The demon would never create you in the first place, so who cares what you think?” That is, I think your formulation of the problem includes a paradox—we assume the demon is always right, but also, that you’re in a perfect position to betray it and it can’t stop you. What would actually happen is the demon would create a bunch of people with amputation fetishes, plus me and Eliezer who it knows wouldn’t betray it, and it would never put you in the position of getting to make the choice in real life (as opposed to in an FDT algorithmic way) in the first place. The reason you find the demon example more compelling than the Newcomb example is that it starts by making an assumption that undermines the whole problem—that is, that the demon has failed its omniscience check and created you who are destined to betray it. If your problem setup contains an implicit contradiction, you can prove anything.
I don’t think this is as degenerate a case as “a demon will torture everyone who believes FDT”. If that were true, and I expected to encounter that demon, I would simply try not to believe FDT (insofar as I can voluntarily change my beliefs). While you can always be screwed over by weird demons, I think decision theory is about what to choose in cases where you have all of the available knowledge and also a choice in the matter, and I think the leg demon fits that situation.
The demon case shows that there are cases where FDT loses, as is true of all decision theories. If the question is which decision theory, if programmed into an AI, will generate the most utility, then that’s an empirical question that depends on facts about the world. If it’s which decision, once you’re in a given situation, will get you the most utility, well, that’s causal decision theory.
Decision theories are intended as theories of what is rational for you to do. So they describe which choices are wise and which choices are foolish. I think Eliezer is confused about what a decision theory is, but that is a reason to trust his judgment less.
In the demon case, we can assume it’s only almost infallible, so it makes a mistake one time in every million. The demon case is a better example than Newcomb’s problem, because I have some credence in EVT (evidential decision theory), and EVT entails you should one box. I am waaaaaaaaaaaay more confident FDT is crazy than I am that you should two box.
I thought we already agreed the demon case showed that FDT wins in real life, since FDT agents will consistently end up with more utility than other agents.
Eliezer’s argument is that you can become the kind of entity that is programmed to do X, by choosing to do X. This is in some ways a claim about demons (they are good enough to predict even the choices you make with “your free will”). But it sounds like we’re in fact positing that demons are that good—I don’t know how else to explain their 999,999-in-a-million success rate—so I think he is right.
I don’t think the demon being wrong one time in a million changes much. 999,999 out of every million people created by the demon will be some kind of FDT decision theorist with great precommitment skills. If you’re the one who isn’t, you can observe that you’re the demon’s rare mistake and avoid cutting off your legs, but this just means you won the lottery—it’s not a generally winning strategy.
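To put rough numbers on that (the utilities here are entirely made up, just to show the shape of the comparison): say existing with legs is worth 100, existing without legs 90, never being created 0, and the demon is right 999,999 times out of a million.

```python
# Made-up utilities, purely illustrative: existence with legs = 100,
# existence without legs = 90, never being created = 0.
p = 999_999 / 1_000_000   # the demon's predictive accuracy

# Disposed to cut off your legs if created:
#   predicted correctly  -> created, legless          (utility 90)
#   rare mispredict      -> never created             (utility 0)
ev_committed = p * 90 + (1 - p) * 0

# Disposed to keep your legs:
#   predicted correctly  -> never created             (utility 0)
#   rare mispredict      -> created, keeps both legs  (utility 100)
ev_defector = p * 0 + (1 - p) * 100

print(ev_committed, ev_defector)  # ~89.99991 vs 0.0001
```

On these (made-up) numbers the refuser only comes out ahead in the one-in-a-million world where the demon has already slipped up, which is what I mean by winning the lottery rather than having a winning strategy.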
I don’t understand why you think that the choices that get you more utility with no drawbacks are foolish, and the choices that cost you utility for no reason are wise.
On the Newcomb’s Problem post, Eliezer explicitly said that he doesn’t care why other people are doing decision theory; he would like to figure out a way to get more utility. Then he did that. I think if you disagree with his goal, you should be arguing “decision theory should be about looking good, not about getting utility” (so we can all laugh at you) rather than saying “Eliezer is confidently and egregiously wrong” and hiding the fact that one of your main arguments is that he said we should try to get utility instead of failing all the time, and then came up with a strategy that successfully does that.
We all agree that you should get utility. You are pointing out that FDT agents get more utility. But once they are already in the situation where they’ve been created by the demon, FDT agents get less utility. If you are the type of agent who follows FDT, you will get more utility, just as if you are the type of agent who follows CDT while being in a scenario that tortures FDTists, you’ll get more utility. The question of decision theory is, given the situation you are in, what gets you more utility—what is the rational thing to do. Eliezer’s theory turns you into the type of agent who often gets more utility, but that does not make it the right decision theory. The fact that you want to be the type of agent who does X doesn’t make doing X rational if doing X is bad for you and being that type of agent is only rewarded artificially.
Again, there is no dispute about whether one boxers or two boxers get more utility on average, or about which kind of AI you should build.