My impression is that most CDT advocates who know about FDT think FDT is making some kind of epistemic mistake, where the most popular candidate (I think) is some version of magical thinking.
Superstitious people often believe that it’s possible to directly causally influence things across great distances of time and space. At a glance, FDT’s prescription (“one-box, even though you can’t causally affect whether the box is full”) as well as its account of how and why this works (“you can somehow ‘control’ the properties of abstract objects like ‘decision functions’”) seem weird and spooky in the manner of a superstition.
FDT’s response: if a thing seems spooky, that’s a fine first-pass reason to be suspicious of it. But at some point, the accusation of magical thinking has to cash out in some sort of practical, real-world failure—in the case of decision theory, some systematic loss of utility that isn’t balanced by a symmetric loss on CDT’s side. After enough experience of seeing a tool outperform the competition in scenario after scenario, calling the use of that tool “magical thinking” starts to ring rather hollow. At that point, it’s necessary to consider the possibility that FDT is counter-intuitive but correct (like Einstein’s “spukhafte Fernwirkung”, “spooky action at a distance”), rather than magical.
In turn, FDT advocates tend to think the following reflects an epistemic mistake by CDT advocates:
I’m not the slave of my decision theory, or of the predictor, or of any environmental factor; I can freely choose to do anything in any dilemma, and by choosing to not leave money on the table (e.g., in a transparent Newcomb problem with a 1% chance of predictor failure where I’ve already observed that the box that could have contained the $1M is empty), I’m “getting away with something” and getting free utility that the FDT agent would miss out on.
The alleged mistake here is a violation of naturalism. Humans tend to think of themselves as free Cartesian agents acting upon the world, rather than as deterministic subprocesses of a larger deterministic process. If we consistently and whole-heartedly accepted the “deterministic subprocess” view of our decision-making, we would find nothing strange about the idea that it’s sometimes right for this subprocess to do locally incorrect things for the sake of better global results.
E.g., consider the transparent Newcomb problem with a 1% chance of predictor error. If we think of the brain’s decision-making as a rule-governed system whose rules we are currently determining (via a meta-reasoning process that is itself governed by deterministic rules), then there’s nothing strange about enacting a rule that gets us $1M in 99% of outcomes and $0 in 1% of outcomes; and following through when the unlucky 1% scenario hits us is nothing to agonize over, it’s just a consequence of the rule we already decided on. In that regard, steering the rule-governed system that is your brain is no different from designing a factory robot that performs well enough in 99% of cases to offset the 1% of cases where something goes wrong.
(Note how a lot of these points are more intuitive in CS language. I don’t think it’s a coincidence that people coming from CS were able to improve on academic decision theory’s ideas on these points; I think it’s related to what kinds of stumbling blocks get in the way of thinking in these terms.)
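To illustrate in that language: here is a minimal sketch of the rule-level expected-value comparison. The payoff amounts and the 99% accuracy figure come from the setup above; the assumption that the predictor fills the $1M box exactly when it predicts the one-boxing rule (erring 1% of the time) is my own simplification.

```python
# Minimal sketch of the rule-level expected-value comparison (assumed setup:
# 99%-accurate predictor, $1M box filled iff the one-boxing rule is predicted).

ACCURACY = 0.99
BIG, SMALL = 1_000_000, 1_000

def rule_expected_value(one_boxing_rule: bool) -> float:
    # Probability that the $1M box is full, given which rule the predictor
    # (fallibly) predicts you to be running.
    p_full = ACCURACY if one_boxing_rule else 1 - ACCURACY
    if one_boxing_rule:
        # One-boxers take only the big box: $1M if full, $0 in the 1% error case.
        return p_full * BIG
    # Two-boxers always get the $1000, plus $1M in the rare mispredicted case.
    return p_full * (BIG + SMALL) + (1 - p_full) * SMALL

print(rule_expected_value(True))   # 990000.0
print(rule_expected_value(False))  # 11000.0
```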
Suppose you initially tell yourself:
“I’m going to one-box in all strictly-future transparent Newcomb problems, since this produces more expected causal (and evidential, and functional) utility. One-boxing and receiving $1M in 99% of future states is worth the $1000 cost of one-boxing in the other 1% of future states.”
Suppose that you then find yourself facing the 1%-likely outcome where Omega leaves the box empty regardless of your choice. You then have a change of heart and decide to two-box after all, taking the $1000.
I claim that the above description feels from the inside like your brain is escaping the iron chains of determinism (even if your scientifically literate system-2 verbal reasoning fully recognizes that you’re a deterministic process). And I claim that this feeling (plus maybe some reluctance to fully accept the problem description as accurate?) is the only thing that makes CDT’s decision seem reasonable in this case.
In reality, however, if we end up not following through on our verbal commitment and we two-box in that 1% scenario, then this would just prove that we’d been mistaken about what rule we had successfully installed in our brains. As it turns out, we were really following the lower-global-utility rule from the outset. A lack of follow-through or a failure of will is itself a part of the decision-making process that Omega is predicting; however much it feels as though a last-minute swerve is you “getting away with something”, it’s really just you deterministically executing an algorithm that will get you less utility in 99% of scenarios (while happening to be bad at predicting your own behavior and bad at following through on verbalized plans).
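Continuing the sketch above, a toy way to see the “which rule did you actually install?” point. The assumption that the predictor predicts your whole policy, last-minute swerve included, is mine, though it matches the problem description as I read it.

```python
# Toy model: the predictor predicts your whole policy, including any "change of
# heart" on seeing the empty box, so the swerving policy is simply a different
# (lower-expected-value) algorithm. Same assumed payoffs and accuracy as above.

ACCURACY = 0.99
BIG, SMALL = 1_000_000, 1_000

def always_one_box(big_box_full: bool) -> str:
    return "one-box"

def swerve_if_empty(big_box_full: bool) -> str:
    # "I'll one-box... unless I actually see that the box is empty."
    return "one-box" if big_box_full else "two-box"

def policy_expected_value(policy) -> float:
    # Assumption: the predictor counts you as a one-boxer only if your policy
    # one-boxes even on seeing the empty box, and it errs 1% of the time.
    reliable_one_boxer = policy(False) == "one-box"
    p_full = ACCURACY if reliable_one_boxer else 1 - ACCURACY
    total = 0.0
    for box_full, prob in ((True, p_full), (False, 1 - p_full)):
        action = policy(box_full)
        big = BIG if box_full else 0
        payout = big + SMALL if action == "two-box" else big
        total += prob * payout
    return total

print(policy_expected_value(always_one_box))   # 990000.0
print(policy_expected_value(swerve_if_empty))  # 10990.0
```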
I should emphasize that the above is my own attempt to characterize the intuitions behind CDT and FDT, based on the arguments I’ve seen in the wild and based on what makes me feel more compelled by CDT, or by FDT. I could easily be wrong about the crux of disagreement between some CDT and FDT advocates.
In turn, FDT advocates tend to think the following reflects an epistemic mistake by CDT advocates:
I’m not the slave of my decision theory, or of the predictor, or of any environmental factor; I can freely choose to do anything in any dilemma, and by choosing to not leave money on the table (e.g., in a transparent Newcomb problem with a 1% chance of predictor failure where I’ve already observed that the box that could have contained the $1M is empty), I’m “getting away with something” and getting free utility that the FDT agent would miss out on.
The alleged mistake here is a violation of naturalism. Humans tend to think of themselves as free Cartesian agents acting upon the world, rather than as deterministic subprocesses of a larger deterministic process. If we consistently and whole-heartedly accepted the “deterministic subprocess” view of our decision-making, we would find nothing strange about the idea that it’s sometimes right for this subprocess to do locally incorrect things for the sake of better global results.
Is the following a roughly accurate re-characterization of the intuition here?
“Suppose that there’s an agent that implements P_UDT. Because it is following P_UDT, when it enters the box room it finds a ton of money in the first box and then refrains from taking the money in the second box. People who believe R_CDT claim that the agent should have also taken the money in the second box. But, given that the universe is deterministic, this doesn’t really make sense. From before the moment the agent entered the room, it was already determined that the agent would one-box. Since (in a physically deterministic sense) the P_UDT agent could not have two-boxed, there’s no relevant sense in which the agent should have two-boxed.”
If so, then I suppose my first reaction is that this seems like a general argument against normative realism rather than an argument against any specific proposed criterion of rightness. It also applies, for example, to the claim that a P_CDT agent “should have” one-boxed—since in a physically deterministic sense it could not have. Therefore, I think it’s probably better to think of this as an argument against the truth (and possibly conceptual coherence) of both R_CDT and R_UDT, rather than an argument that favors one over the other.
In general, it seems to me like all statements that evoke counterfactuals have something like this problem. For example, it is physically determined what sort of decision procedure we will build into any given AI system; only one choice of decision procedure is physically consistent with the state of the world at the time the choice is made. So—insofar as we accept this kind of objection from determinism—there seems to be something problematically non-naturalistic about discussing what “would have happened” if we built in one decision procedure or another.
Since (in a physically deterministic sense) the P_UDT agent could not have two-boxed, there’s no relevant sense in which the agent should have two-boxed.”
No, I don’t endorse this argument. To simplify the discussion, let’s assume that the Newcomb predictor is infallible. FDT agents, CDT agents, and EDT agents each get a decision: two-box (which gets you $1000 plus an empty box), or one-box (which gets you $1,000,000 and leaves the $1000 behind). Obviously, insofar as they are in fact following the instructions of their decision theory, there’s only one possible outcome; but it would be odd to say that a decision stops being a decision just because it’s determined by something. (What’s the alternative?)
I do endorse “given the predictor’s perfect accuracy, it’s impossible for the P_UDT agent to two-box and come away with $1,001,000”. I also endorse “given the predictor’s perfect accuracy, it’s impossible for the P_CDT agent to two-box and come away with $1,001,000”. Per the problem specification, no agent can two-box and get $1,001,000 or one-box and get $0. But this doesn’t mean that no decision is made; it just means that the predictor can predict the decision early enough to fill the boxes accordingly.
(Notably, the agent following P_CDT two-boxes because $1,001,000 > $1,000,000 and $1000 > $0, even though this “dominance” argument appeals to two outcomes that are known to be impossible just from the problem statement. I certainly don’t think agents “should” try to achieve outcomes that are impossible from the problem specification itself. The reason non-CDT agents get more utility than CDT agents in Newcomb’s problem is that they take into account that the predictor is a predictor when they construct their counterfactuals.)
In the transparent version of this dilemma, the agent who sees the $1M and one-boxes also “could have two-boxed”, but if they had two-boxed, it would only have been after making a different observation. In that sense, if the agent has any lingering uncertainty about what they’ll choose, the uncertainty goes away as soon as they see whether the box is full.
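To make the parenthetical above about the dominance argument concrete, here is a small illustration of my own (using the usual $1M/$1000 payoffs and an infallible predictor) of the two ways of constructing counterfactuals:

```python
# The two counterfactual constructions in Newcomb's problem with an infallible
# predictor (illustrative payoffs only).

BIG, SMALL = 1_000_000, 1_000

def payout(action: str, big_box_full: bool) -> int:
    return (BIG if big_box_full else 0) + (SMALL if action == "two-box" else 0)

# CDT-style counterfactuals: hold the box contents fixed and compare actions.
# Two-boxing "dominates", but the comparison leans on the two cells
# (two-box & full, one-box & empty) that the problem statement rules out.
for big_box_full in (True, False):
    print(big_box_full, payout("one-box", big_box_full), payout("two-box", big_box_full))
# True  1000000  1001000
# False 0        1000

# Predictor-aware counterfactuals: the prediction covaries with the decision,
# so the box is full exactly when the agent one-boxes.
for action in ("one-box", "two-box"):
    print(action, payout(action, big_box_full=(action == "one-box")))
# one-box  1000000
# two-box  1000
```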
In general, it seems to me like all statements that evoke counterfactuals have something like this problem. For example, it is physically determined what sort of decision procedure we will build into any given AI system; only one choice of decision procedure is physically consistent with the state of the world at the time the choice is made. So—insofar as we accept this kind of objection from determinism—there seems to be something problematically non-naturalistic about discussing what “would have happened” if we built in one decision procedure or another.
No, there’s nothing non-naturalistic about this. Consider the scenario you and I are in. Simplifying somewhat, we can think of ourselves as each doing meta-reasoning to try to choose between different decision algorithms to follow going forward, where the new things we learn in this conversation are themselves a part of that meta-reasoning.
The meta-reasoning process is deterministic, just like the object-level decision algorithms are. But this doesn’t mean that we can’t choose between object-level decision algorithms. Rather, the meta-reasoning (in spite of having deterministic causes) chooses either “I think I’ll follow P_FDT from now on” or “I think I’ll follow P_CDT from now on”. Then the chosen decision algorithm (in spite of also having deterministic causes) outputs choices about subsequent actions to take. Meta-processes that select between decision algorithms (to put into an AI, or to run in your own brain, or to recommend to other humans, etc.) can make “real decisions”, for exactly the same reason (and in exactly the same sense) that the decision algorithms in question can make real decisions.
It isn’t problematic that all these processes require us to consider counterfactuals that (if we were omniscient) we would perceive as inconsistent/impossible. Deliberation, both at the object level and at the meta level, just is the process of determining the unique and only possible decision. Yet because we are uncertain about the outcome of the deliberation while deliberating, and because the details of the deliberation process do determine our decision (even as these details themselves have preceding causes), it feels from the inside of this process as though both options are “live”, are possible, until the very moment we decide.
(See also Decisions are for making bad outcomes inconsistent.)
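As a toy illustration of that last point (the numbers below are placeholders, not an argument): a meta-procedure can be a perfectly deterministic function and still be the step at which the selection between object-level algorithms happens.

```python
# Deterministic meta-reasoning that nonetheless selects between decision
# algorithms. The expected-utility numbers are placeholders for illustration.

def assumed_expected_utility(policy_name: str) -> float:
    return {"P_FDT": 990_000.0, "P_CDT": 11_000.0}[policy_name]

def choose_policy(candidates: list[str]) -> str:
    # Same inputs always give the same output, yet this is still where the
    # choice between the object-level algorithms gets made.
    return max(candidates, key=assumed_expected_utility)

print(choose_policy(["P_CDT", "P_FDT"]))  # P_FDT
```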
I certainly don’t think agents “should” try to achieve outcomes that are impossible from the problem specification itself.
I think you need to make a clearer distinction here between “outcomes that don’t exist in the universe’s dynamics” (like taking both boxes and receiving $1,001,000) and “outcomes that can’t exist in my branch” (like there not being a bomb in the unlucky case). Because if you’re operating just in the branch you find yourself in, many outcomes whose probability an FDT agent is trying to affect are impossible from the problem specification (once you include observations).
And, to be clear, I do think agents “should” try to achieve outcomes that are impossible from the problem specification including observations, if certain criteria are met, in a way that basically lines up with FDT, just like agents “should” try to achieve outcomes that are already known to have happened from the problem specification including observations.
As an example, if you’re in Parfit’s Hitchhiker, you should pay once you reach town, even though reaching town has probability 1 in the cases where you’re deciding whether or not to pay, and the reason is that your willingness to pay was necessary for reaching town to have had probability 1 in the first place.
+1, I agree with all this.
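For concreteness, here is a rough sketch of the Parfit’s Hitchhiker structure being referred to; the dollar figures, the value placed on being rescued, and the perfectly accurate driver are all assumptions for illustration.

```python
# Parfit's Hitchhiker, schematically: the driver rescues you iff he predicts
# you would pay on reaching town. All numbers are illustrative assumptions.

VALUE_OF_RESCUE = 1_000_000
PRICE = 1_000

def outcome(would_pay_in_town: bool) -> int:
    rescued = would_pay_in_town  # perfect predictor of your in-town behavior
    if not rescued:
        return 0                 # left in the desert
    # If you're standing in town at all, it's because you're the kind of agent
    # who pays; the "reach town, then refuse" branch never arises here.
    return VALUE_OF_RESCUE - PRICE

print(outcome(True))   # 999000
print(outcome(False))  # 0
```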
Notably, the agent following P_CDT two-boxes because $1,001,000 > $1,000,000 and $1000 > $0, even though this “dominance” argument appeals to two outcomes that are known to be impossible just from the problem statement. I certainly don’t think agents “should” try to achieve outcomes that are impossible from the problem specification itself.
Suppose that we accept the principle that agents never “should” try to achieve outcomes that are impossible from the problem specification—with one implication being that it’s false that (as R_CDT suggests) agents that see a million dollars in the first box “should” two-box.
This seems to imply that it’s also false that (as R_UDT suggests) an agent that sees that the first box is empty “should” one-box. By the problem specification, of course, one-boxing when there is no money in the first box is also an impossible outcome. Since decisions to two-box only occur when the first box is empty, this would then imply that decisions to two-box are never irrational in the context of this problem. But I imagine you don’t want to say that.
I think I probably still don’t understand your objection here—so I’m not sure this point is actually responsive to it—but I initially have trouble seeing what potential violations of naturalism/determinism R_CDT could be committing that R_UDT would not also be committing.
(Of course, just to be clear, both R_UDT and R_CDT imply that the decision to commit yourself to a one-boxing policy at the start of the game would be rational. They only diverge in their judgments of what actual in-room boxing decision would be rational. R_UDT says that the decision to two-box is irrational and R_CDT says that the decision to one-box is irrational.)
That should be “a one-boxing policy”, right?
Yep, thanks for the catch! Edited to fix.