I downvoted this post for the lack of epistemic humility. I don’t mind people playing with thought experiments as a way to yield insight, but they’re almost never a way of generating ‘evidence’. Saying things like CDT ‘falls short’ because of them is far too strong. Under a certain set of assumptions, it arguably recommends an action that some people intuitively take issue with—that’s not exactly a self-evident deficiency.
Personally, I can’t see any reason to reject CDT based on the arguments here (or any I’ve seen elsewhere). They seem to rely on sleights such as amphiboly, vagueness around what we call causality, asking me to accept magic or self-effacing premises in some form, and an assumption of perfect selfishness and/or attachment to closed individualism, both of which I’d much sooner give up than accept a theory that basically defies known physics. For example:
Betting on the past. In my pocket (says Bob) I have a slip of paper on which is written a proposition P. You must choose between two bets. Bet 1 is a bet on P at 10:1 for a stake of one dollar. Bet 2 is a bet on P at 1:10 for a stake of ten dollars. So your pay-offs are as in [the table below]. Before you choose whether to take Bet 1 or Bet 2 I should tell you what P is. It is the proposition that the past state of the world was such as to cause you now to take Bet 2.
Ahmed argues that any causal decision theory worthy of the name would recommend taking Bet 1, simply because taking Bet 1 causally dominates taking Bet 2.[9]
In this scenario I don’t actually see the intuition that supposedly pushes CDT agents to take Bet 1. It’s not clearly phrased, so maybe I’m not parsing it as intended, but it seems to me that the essence is supposed to be that the piece of paper says ‘you will take Bet 2.’ If I take Bet 2, it’s true; if not, it’s false. Since I’m financially invested in the proposition being true, I ‘cause’ it to be so. I don’t see a case for taking Bet 1, or any commitment to evidential weirdness from eschewing it. (The sketch below lays out the payoffs as I’m reading them.)
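To make the payoff structure concrete, here’s a minimal sketch in Python. The dollar amounts are an assumption on my part about how the stated odds cash out (Bet 1 wins $10 or loses $1, Bet 2 wins $1 or loses $10), and the p_is_true function encodes the reading above, on which P is true exactly when I take Bet 2:

```python
# Betting on the past, with the payoffs I take the stated odds to imply
# (an assumption): Bet 1 is 10:1 on a $1 stake, Bet 2 is 1:10 on a $10 stake.
PAYOFF = {
    # (bet, P is true): dollars won or lost
    ("bet1", True): 10,
    ("bet1", False): -1,
    ("bet2", True): 1,
    ("bet2", False): -10,
}

def p_is_true(bet: str) -> bool:
    # P says the past was such as to cause me to take Bet 2, so on my
    # reading P comes out true exactly when I take Bet 2.
    return bet == "bet2"

for bet in ("bet1", "bet2"):
    print(bet, "->", PAYOFF[(bet, p_is_true(bet))])
# bet1 -> -1   (P is false, so I lose my $1 stake)
# bet2 -> 1    (P is true, so I win $1)
```

Holding the truth of P fixed, Bet 1 does pay more in either column, which is presumably the dominance Ahmed has in mind; but on the reading above my choice settles which column I’m in, and Bet 2 comes out ahead.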
The psychopath button. Paul is debating whether to press the “kill all psychopaths” button. It would, he thinks, be much better to live in a world with no psychopaths. Unfortunately, Paul is quite confident that only a psychopath would press such a button. Paul very strongly prefers living in a world with psychopaths to dying. Should Paul press the button?
This is doing some combination of amphiboly and asking us to accept self-effacing premises.
There are many lemmas to this scenario, depending on how we interpret it, and none of them gives me any concern for CDT:
We might just reject the premises as being underspecified or false. It’s very unclear what the result of pressing the button would be, so if Paul were me I would need to ask a great many questions first to have any confidence that this was really a positive-EV action. But if I were satisfied that it was, I don’t think I’m a psychopath, and I wouldn’t see any real reason to assume that I am one just because I’m doing something that harms a few to help many.
Otherwise, if we’re imagining that pressing this button is extremely net good, then rather than switching to a belief system with spooky acausality, we might relax the highly unrealistic assumption that Paul is necessarily totally selfish (even if he does turn out to be a psychopath, psychopaths are perfectly capable of being altruistic), and believe that he would push the button and risk death. In such a scenario, a CDT Paul would altruistically push the button, and this seems optimal for his goals.
If we insist on the claim that he is totally selfish, then we’re more or less defining him as a psychopath (see the dictionary definition, ‘no feeling for other people’), so pressing the button would provide no relevant evidence but would definitely kill him. In such a scenario CDT Paul would selfishly not press it and this seems optimal for his goals.
If we modify the definition of psychopathy to ‘both being willing to kill other people and also being totally selfish’, then if Paul were to press it expecting it to kill him but improve the world, he would, by choosing self-sacrifice for the sake of others, be evidencing himself to be not a psychopath, even as the act of pressing supposedly evidences that he is one. So the premises in this interpretation are self-effacing, and CDT Paul could reasonably go either way; or he might reasonably conclude that such premises are either inconsistent or insufficient to let him decide.
Maybe we define psychopathy as being willing to kill other people at all, even in extremely bizarre circumstances and even altruistically at great cost to oneself (which is a strange definition, but we’re running out of options), and, independently of that definition, we again insist that Paul is totally selfish. Then pressing the button will probably kill almost the entire world’s population, including Paul, and be a very bad thing to do. CDT Paul would have both altruistic and selfish reasons not to push the button, and this seems optimal for his goals.
Finally if we somehow still insist that pressing the button will necessarily make the world better and that CDT will require Paul to press it while other decision theories will not, this seems like a strike against all those other decision theories. Why would we want to promote algorithms that worsen the world?
If we’re imagining that we get to determine how an AGI thinks, I would rather give it an easily comprehensible and somewhat altruistic motivation than a perfectly selfish motivation with greater complexity that’s supposed to undo that selfishness.
Newcomb’s problem is similar. I won’t go through all the lemmas in detail, because there are many and some are extremely convoluted, but an approximation of my view is that it’s incredibly underspecified how we supposedly know that Omega knows the outcome, whether he’s being honest with us about his intentions, and what we should do as a consequence. For example:
If we imagine he simulated us countless times, he can’t know the outcome, because each simulation introduces a new variable (himself, with knowledge of N+1 simulations, where in the previous simulation he had knowledge of N) that we could use to generate a result that would seem random to him. So asserting him to be actually perfect requires magic, and hence gives us no meaningful result under any decision theory.
If we don’t try to sabotage him via randomness, and we believe that he’s doing endless simulations, then a CDT agent without either the assumption of perfect selfishness or of closed individualism (without which the ‘next guy’ is indistinguishable from me, even from my own perspective) will one-box, because doing so will cause the next guy to get more money. Both of these assumptions seem reasonable to drop (or rather, I think they’re unreasonable to hold in the first place), and in this case everyone ends up richer because we one-box, without any metaphysical spookiness.
If we believe we have full information about his simulation process and insist on being perfectly selfish, then in the first simulation we should obviously two-box, since there’s nothing to lose (assuming he has no other way of knowing our intent; otherwise see below). Then, knowing that we’re Sim 2, we should two-box, since we can infer that Omega will have predicted that based on Sim 1. And so on, ad infinitum: we always two-box.
If we assert some more mundane scenario, like Omega being just a human psychologist with a good track record, then we need to assume his hit rate is much lower for it to be non-magic. Then his best strategy (if he’s trying to optimise for correct predictions and not messing with us in some other way) is certainly going to be something trivial like ‘always assume they’ll one-box’, or perhaps something fractionally more sophisticated like ‘assume they’ll one-box iff they’ve ever posted on Less Wrong’. That’s going to dominate any personality-reading he does, and it implies we should two-box (see the rough arithmetic after this list).
If he’s a machine learning algorithm who looks at a huge amount of my personal historical data and generates a generally accurate prediction, then we’re bordering back on magic for him to have collected all that data without me knowing what was going to happen.
If I did know, or had a strong suspicion, and the expected payoff was sufficient, then as a CDT agent I should have always publicly acted in ways that imply I would one day one-box (and then ultimately two-box and expect to get the maximum reward).
If I didn’t know, then choosing non-CDT actions throughout my life just in case I ever run into such an ML algorithm opens me up to other easy exploitation. For example, I have actually played this game with non-CDT friends, and ‘exploited’ them by just lying about how much money I put in which box (often after hinting in advance that I planned to do so). If they distrust me enough (even forewarned by a comment such as this) to call my bluff in such scenarios, they’re giving evidence to the ML algorithm that it should assume they’re two-boxers, and hence losing the purported benefit of rejecting CDT. Regardless, when we unexpectedly reach the event, we have no reason not to then two-box.
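To spell out the arithmetic behind the mundane-predictor case, here’s a minimal sketch. The $1,000,000 and $1,000 box contents are the conventional Newcomb amounts rather than anything specified in the post, and the key assumption is the one argued for above: the prediction is already fixed and is statistically independent of what I actually do on the day.

```python
# Newcomb expected values when the prediction is already fixed and does not
# depend on my actual choice (e.g. a predictor who just 'always assumes
# they'll one-box'). The $1,000,000 / $1,000 amounts are assumed, conventional values.
BIG, SMALL = 1_000_000, 1_000

def expected_value(choice: str, p_predicts_one_box: float) -> float:
    # The opaque box contains BIG iff the (already-made) prediction was 'one-box'.
    opaque = p_predicts_one_box * BIG
    return opaque if choice == "one-box" else opaque + SMALL

for p in (0.5, 0.9, 0.99):
    print(p, expected_value("one-box", p), expected_value("two-box", p))
# For every p, two-boxing comes out exactly SMALL ahead, because the contents
# of the opaque box are the same whichever choice I make.
```

Two-boxing gains $1,000 in every state, which is just the dominance point: if the prediction carries no information about my actual choice, taking the transparent box as well costs nothing.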
I realise there are many more scenarios, but these arguments feel liturgical to me. If the rejection of CDT can’t be explained in a single well-defined, non-spooky case of how it evidently fails by its own standards, I don’t see any value in generating or ‘clearing up confusions about’ ever more scenarios, and I strongly suspect those who try of motivated reasoning.