RobBensinger comments on I’m Buck Shlegeris, I do research and outreach at MIRI, AMA

RobBensinger Nov 27, 2019, 6:48 AM
3 points
“Since (in a physically deterministic sense) the P_UDT agent could not have two-boxed, there’s no relevant sense in which the agent should have two-boxed.”
No, I don’t endorse this argument. To simplify the discussion, let’s assume that the Newcomb predictor is infallible. FDT agents, CDT agents, and EDT agents each get a decision: two-box (which gets you $1000 plus an empty box), or one-box (which gets you $1,000,000 and leaves the $1000 behind). Obviously, insofar as they are in fact following the instructions of their decision theory, there’s only one possible outcome; but it would be odd to say that a decision stops being a decision just because it’s determined by something. (What’s the alternative?)
I do endorse “given the predictor’s perfect accuracy, it’s impossible for the P_UDT agent to two-box and come away with $1,001,000”. I also endorse “given the predictor’s perfect accuracy, it’s impossible for the P_CDT agent to two-box and come away with $1,001,000”. Per the problem specification, no agent can two-box and get $1,001,000 or one-box and get $0. But this doesn’t mean that no decision is made; it just means that the predictor can predict the decision early enough to fill the boxes accordingly.
(Notably, the agent following P_CDT two-boxes because $1,001,000 > $1,000,000 and $1000 > $0, even though this “dominance” argument appeals to two outcomes that are known to be impossible just from the problem statement. I certainly don’t think agents “should” try to achieve outcomes that are impossible from the problem specification itself. The reason non-CDT agents get more utility than CDT agents in Newcomb’s problem is that they take into account that the predictor is a predictor when they construct their counterfactuals.)
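To make the payoff structure concrete, here is a minimal sketch of the four-cell table and of which cells survive the infallible-predictor assumption. The names `payoff`, `table`, `consistent`, and `dominance_holds` are illustrative choices, not anything from the decision-theory literature; the dollar amounts are the ones given above.

```python
from itertools import product

SMALL = 1_000      # the box that always contains $1000
BIG = 1_000_000    # the box that is filled only if one-boxing is predicted

ACTIONS = ("one-box", "two-box")

def payoff(action: str, prediction: str) -> int:
    """Dollars received for a given action under a given prediction."""
    big = BIG if prediction == "one-box" else 0
    small = SMALL if action == "two-box" else 0
    return big + small

# All four cells, including the two that the problem statement rules out.
table = {(a, p): payoff(a, p) for a, p in product(ACTIONS, ACTIONS)}

# Assumption from the text: the predictor is infallible, so prediction == action.
consistent = {a: table[(a, a)] for a in ACTIONS}

# The "dominance" argument compares actions while holding the prediction fixed,
# which means each comparison uses one cell that is inconsistent with the problem.
dominance_holds = all(
    table[("two-box", p)] > table[("one-box", p)] for p in ACTIONS
)

best_consistent = max(consistent, key=consistent.get)

print(table)
print(consistent)
print(dominance_holds, best_consistent)
```

Under these assumptions the dominance comparison goes through in both columns, yet the only reachable outcomes are $1,000,000 for one-boxing and $1000 for two-boxing.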
In the transparent version of this dilemma, the agent who sees the $1M and one-boxes also “could have two-boxed”, but if they had two-boxed, it would only have been after making a different observation. In that sense, if the agent has any lingering uncertainty about what they’ll choose, the uncertainty goes away as soon as they see whether the box is full.
In general, it seems to me like all statements that evoke counterfactuals have something like this problem. For example, it is physically determined what sort of decision procedure we will build into any given AI system; only one choice of decision procedure is physically consistent with the state of the world at the time the choice is made. So—insofar as we accept this kind of objection from determinism—there seems to be something problematically non-naturalistic about discussing what “would have happened” if we built in one decision procedure or another.
No, there’s nothing non-naturalistic about this. Consider the scenario you and I are in. Simplifying somewhat, we can think of ourselves as each doing meta-reasoning to try to choose between different decision algorithms to follow going forward; where the new things we learn in this conversation are themselves a part of that meta-reasoning.
The meta-reasoning process is deterministic, just like the object-level decision algorithms are. But this doesn’t mean that we can’t choose between object-level decision algorithms. Rather, the meta-reasoning (in spite of having deterministic causes) chooses either “I think I’ll follow P_FDT from now on” or “I think I’ll follow P_CDT from now on”. Then the chosen decision algorithm (in spite of also having deterministic causes) outputs choices about subsequent actions to take. Meta-processes that select between decision algorithms (to put into an AI, or to run in your own brain, or to recommend to other humans, etc.) can make “real decisions”, for exactly the same reason (and in exactly the same sense) that the decision algorithms in question can make real decisions.
It isn’t problematic that all these processes require us to consider counterfactuals that (if we were omniscient) we would perceive as inconsistent/impossible. Deliberation, both at the object level and at the meta level, just is the process of determining the unique and only possible decision. Yet because we are uncertain about the outcome of the deliberation while deliberating, and because the details of the deliberation process do determine our decision (even as these details themselves have preceding causes), it feels from the inside of this process as though both options are “live”, are possible, until the very moment we decide.
(See also Decisions are for making bad outcomes inconsistent.)
- vaniver Nov 28, 2019, 12:05 AM
  7 points
  I certainly don’t think agents “should” try to achieve outcomes that are impossible from the problem specification itself.
  I think you need to make a clearer distinction here between “outcomes that don’t exist in the universe’s dynamics” (like taking both boxes and receiving $1,001,000) and “outcomes that can’t exist in my branch” (like there not being a bomb in the unlucky case). Because if you’re operating just in the branch you find yourself in, many outcomes whose probability an FDT agent is trying to affect are impossible from the problem specification (once you include observations).
  And, to be clear, I do think agents “should” try to achieve outcomes that are impossible from the problem specification including observations, if certain criteria are met, in a way that basically lines up with FDT, just like agents “should” try to achieve outcomes that are already known to have happened from the problem specification including observations.
  As an example, if you’re in Parfit’s Hitchhiker, you should pay once you reach town, even though reaching town has probability 1 in cases where you’re deciding whether or not to pay, and the reason for this is because it was necessary for reaching town to have had probability 1.
  - RobBensinger Nov 28, 2019, 12:49 AM
    2 points
    +1, I agree with all this.
- bgarfinkel Nov 27, 2019, 10:42 PM
  2 points
  
  Notably, the agent following P_CDT two-boxes because $1,001,000 > $1,000,000 and $1000 > $0, even though this “dominance” argument appeals to two outcomes that are known to be impossible just from the problem statement. I certainly don’t think agents “should” try to achieve outcomes that are impossible from the problem specification itself.
  
  Suppose that we accept the principle that agents never “should” try to achieve outcomes that are impossible from the problem specification—with one implication being that it’s false that (as R_CDT suggests) agents that see a million dollars in the first box “should” two-box.
  
  This seems to imply that it’s also false that (as R_UDT suggests) an agent that sees that the first box is empty “should” one-box. By the problem specification, of course, one-boxing when there is no money in the first box is also an impossible outcome. Since decisions to two-box only occur when the first box is empty, this would then imply that decisions to two-box are never irrational in the context of this problem. But I imagine you don’t want to say that.
  
  I think I probably still don’t understand your objection here—so I’m not sure this point is actually responsive to it—but I initially have trouble seeing what potential violations of naturalism/determinism R_CDT could be committing that R_UDT would not also be committing.
  
  (Of course, just to be clear, both R_UDT and R_CDT imply that the decision to commit yourself to a one-boxing policy at the start of the game would be rational. They only diverge in their judgments of what actual in-room boxing decision would be rational. R_UDT says that the decision to two-box is irrational and R_CDT says that the decision to one-box is irrational.)
  - ESRogs Nov 29, 2019, 6:23 AM
    3 points
    both R_UDT and R_CDT imply that the decision to commit yourself to a two-boxing policy at the start of the game would be rational
    That should be “a one-boxing policy”, right?
    - bgarfinkel Nov 30, 2019, 1:23 AM
      1 point
      Yep, thanks for the catch! Edited to fix.