ESRogs comments on I’m Buck Shlegeris, I do research and outreach at MIRI, AMA

ESRogs Nov 29, 2019, 6:58 AM
9 points
There may be a pretty different argument here, which you have in mind. I at least don’t see it yet though.
Perhaps the argument is something like:
- “Don’t make things worse” (DMTW) is one of the intuitions that leads us to favoring R_CDT
- But the actual policy that R_CDT recommends does not in fact follow DMTW
- So R_CDT only gets intuitive appeal from DMTW to the extent that DMTW was about R_′s, and not about P_′s
- But intuitions are probably(?) not that precisely targeted, so R_CDT shouldn’t get to claim the full intuitive endorsement of DMTW. (Yes, DMTW endorses it more than it endorses R_FDT, but R_CDT is still at least somewhat counter-intuitive when judged against the DMTW intuition.)
- bgarfinkel Nov 30, 2019, 1:14 AM
  4 points
  So R_CDT only gets intuitive appeal from DMTW to the extent that DMTW was about R_′s, and not about P_′s
  
  But intuitions are probably(?) not that precisely targeted, so R_CDT shouldn’t get to claim the full intuitive endorsement of DMTW. (Yes, DMTW endorses it more than it endorses R_FDT, but R_CDT is still at least somewhat counter-intuitive when judged against the DMTW intuition.)
  Here are two mutually inconsistent principles, either of which could be true:
  
  Don’t Make Things Worse: If a decision would definitely make things worse, then taking that decision is not rational.
  
  Don’t Commit to a Policy That In the Future Will Sometimes Make Things Worse: It is not rational to commit to a policy that, in the future, will sometimes output decisions that definitely make things worse.
  
  I have strong intuitions that the first one is true. I have much weaker (comparatively negligible) intuitions that the second one is true. Since they’re mutually inconsistent, I reject the second and accept the first. I imagine this is also true of most other people who are sympathetic to R_CDT.
  
  One could argue that R_CDT sympathists don’t actually have much stronger intuitions regarding the first principle than the second—i.e. that their intuitions aren’t actually very “targeted” on the first one—but I don’t think that would be right. At least, it’s not right in my case.
  
  A more viable strategy might be to argue for something like a meta-principle:
  
  The ‘Don’t Make Things Worse’ Meta-Principle: If you find “Don’t Make Things Worse” strongly intuitive, then you should also find “Don’t Commit to a Policy That In the Future Will Sometimes Make Things Worse” just about as intuitive.
  
  If the meta-principle were true, then I guess this would sort of imply that people’s intuitions in favor of “Don’t Make Things Worse” should be self-neutralizing. They should come packaged with equally strong intuitions for another position that directly contradicts it.
  
  But I don’t see why the meta-principle should be true. At least, my intuitions in favor of the meta-principle are way less strong than my intuitions in favor of “Don’t Make Things Worse” :)
  - bgarfinkel Nov 30, 2019, 3:24 PM
    4 points
    Just to say slightly more on this, I think the Bomb case is again useful for illustrating my (I think not uncommon) intuitions here.
    
    Bomb Case: Omega puts a million dollars in a transparent box if he predicts you’ll open it. He puts a bomb in the transparent box if he predicts you won’t open it. He’s only wrong about one in a trillion times.
    
    Now suppose you enter the room and see that there’s a bomb in the box. You know that if you open the box, the bomb will explode and you will die a horrible and painful death. If you leave the room and don’t open the box, then nothing bad will happen to you. You’ll return to a grateful family and live a full and healthy life. You understand all this. You want so badly to live. You then decide to walk up to the bomb and blow yourself up.
    
    Intuitively, this decision strikes me as deeply irrational. You’re intentionally taking an action that you know will cause a horrible outcome that you want badly to avoid. It feels very relevant that you’re flagrantly violating the “Don’t Make Things Worse” principle.
    
    Now, let’s step back a time step. Suppose you know that you’re the sort of person who would refuse to kill yourself by detonating the bomb. You might decide that, since Omega is such an accurate predictor, it’s worth taking a pill to turn you into the sort of person who would open the box anyway, to increase your odds of getting a million dollars. You recognize that this may lead you, in the future, to take an action that makes things worse in a horrifying way. But you calculate that the decision you’re making now is nonetheless making things better in expectation.
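
    The expected-value comparison behind this step can be sketched roughly as follows. This is only an illustration under stated assumptions: the dollar-equivalent cost of dying (`DEATH_COST`) is a hypothetical number that does not appear in the Bomb case, the function names are invented for the example, and the model assumes Omega's prediction tracks which of the two policies you follow.

```python
from fractions import Fraction

EPSILON = Fraction(1, 10**12)  # Omega's error rate, as stated in the Bomb case
PRIZE = 1_000_000              # dollars Omega may put in the box
DEATH_COST = 10**9             # HYPOTHETICAL dollar-equivalent disutility of dying


def expected_value_opener(eps, prize, death_cost):
    """Policy: open the box whatever it contains.

    Omega usually predicts this and puts in the money. In the rare
    error case the box holds a bomb and the agent opens it anyway.
    """
    return (1 - eps) * prize - eps * death_cost


def expected_value_refuser(eps, prize):
    """Policy: walk away if the box holds a bomb.

    Omega usually predicts this and puts in the bomb (payoff 0).
    In the rare error case the box holds the money and the agent takes it.
    """
    return (1 - eps) * 0 + eps * prize


def break_even_death_cost(eps, prize):
    """Death cost at which the two policies have equal expected value."""
    return (1 - 2 * eps) * prize / eps


if __name__ == "__main__":
    opener = expected_value_opener(EPSILON, PRIZE, DEATH_COST)
    refuser = expected_value_refuser(EPSILON, PRIZE)
    print("opener is better in expectation:", opener > refuser)
    print("break-even death cost:", break_even_death_cost(EPSILON, PRIZE))
```

    Under these assumptions, committing to the opening policy comes out ahead unless dying is valued as worse than roughly the prize divided by Omega's error rate. The sketch only covers the ex ante calculation; it says nothing about whether opening the box is rational once the bomb is in view.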
    
    This decision strikes me as pretty intuitively rational. You’re violating the second principle (the “Don’t Commit to a Policy...” principle), but this violation just doesn’t seem that intuitively relevant or remarkable to me. I personally feel like there is nothing too odd about the idea that it can be rational to commit to violating principles of rationality in the future.
    
    (This is obviously just a description of my own intuitions, as they stand, though.)
    - Wei Dai Jan 19, 2020, 12:37 AM
      11 points
      
      It feels very relevant that you’re flagrantly violating the “Don’t Make Things Worse” principle.
      
      By triggering the bomb, you’re making things worse from your current perspective, but making things better from the perspective of earlier you. Doesn’t that seem strange and deserving of an explanation? The explanation from a UDT perspective is that by updating upon observing the bomb, you actually changed your utility function. You used to care about both the possible worlds where you end up seeing a bomb in the box, and the worlds where you don’t. After updating, you think you’re either a simulation within Omega’s prediction so your action has no effect on yourself or you’re in the world with a real bomb, and you no longer care about the version of you in the world with a million dollars in the box, and this accounts for the conflict/inconsistency.
      
      Given the human tendency to change our (UDT-)utility functions by updating, it’s not clear what to do (or what is right), and I think this reduces UDT’s intuitive appeal and makes it less of a slam-dunk over CDT/EDT. But it seems to me that it takes switching to the UDT perspective to even understand the nature of the problem. (Quite possibly this isn’t adequately explained in MIRI’s decision theory papers.)
  - ESRogs Dec 1, 2019, 7:58 AM
    3 points
    Don’t Make Things Worse: If a decision would definitely make things worse, then taking that decision is not rational.
    Don’t Commit to a Policy That In the Future Will Sometimes Make Things Worse: It is not rational to commit to a policy that, in the future, will sometimes output decisions that definitely make things worse.
    ...
    One could argue that R_CDT sympathists don’t actually have much stronger intuitions regarding the first principle than the second—i.e. that their intuitions aren’t actually very “targeted” on the first one—but I don’t think that would be right. At least, it’s not right in my case.
    I would agree that, with these two principles as written, more people would agree with the first. (And certainly believe you that that’s right in your case.)
    But I feel like the second doesn’t quite capture what I had in mind regarding the DMTW intuition applied to P_′s.
    Consider an alternate version:
    If a decision would definitely make things worse, then taking that decision is not good policy.
    Or alternatively:
    If a decision would definitely make things worse, a rational person would not take that decision.
    It seems to me that these two claims are naively intuitive on their face, in roughly the same way that the “… then taking that decision is not rational.” version is. And it’s only after you’ve considered prisoners’ dilemmas or Newcomb’s paradox, etc. that you realize that good policy (or being a rational agent) actually diverges from what’s rational in the moment.
    (But maybe others would disagree on how intuitive these versions are.)
    EDIT: And to spell out my argument a bit more: if several alternate formulations of a principle are each intuitively appealing, and it turns out that whether some claim (e.g. R_CDT is true) is consistent with the principle comes down to the precise formulation used, then it’s not quite fair to say that the principle fully endorses the claim and that the claim is not counter-intuitive from the perspective of the original intuition.
    Of course, this argument is moot if it’s true that the original DMTW intuition was always about rational in-the-moment action, and never about policies or actors. And maybe that’s the case? But I think it’s a little more ambiguous with the “… is not good policy” or “a rational person would not...” versions than with the “Don’t commit to a policy...” version.
    EDIT2: Does what I’m trying to say make sense? (I felt like I was struggling a bit to express myself in this comment.)