I guess any omniscient demon reading this to assess my ability to precommit will have learned I can’t even precommit effectively to not having long back-and-forth discussions, let alone cutting my legs off. But I’m still interested in where you’re coming from here since I don’t think I’ve heard your exact position before.
Have you read https://www.lesswrong.com/posts/6ddcsdA2c2XpNpE5x/newcomb-s-problem-and-regret-of-rationality ? Do you agree that this is our crux?
Would you endorse the statement “Eliezer, using his decision theory, will usually end up with more utility than me over a long life of encountering the sorts of weird demonic situations decision theorists analyze, I just think he is less formally-rational”?
Or do you expect that you will, over the long run, get more utility than him?
I would agree with the statement “if Eliezer followed his decision theory, and the world were such that one frequently encountered lots of Newcomb’s problems and similar situations, he’d end up with more utility.” I think my position is roughly MacAskill’s in the linked post, where he says that FDT is better as a theory of the kind of agent you should want to be than as a theory of what’s rational.
But I think that rationality won’t always benefit you. I think you’d agree with that. If there’s a demon who tortures everyone who believes FDT, then believing FDT, which you’d regard as rational, would make you worse off. If there’s another demon who will secretly torture you if you one-box, then one-boxing is bad for you! It’s possible to make up contrived scenarios that punish being rational, and Newcomb’s problem is a good example of that.
Notably, if we’re in the twin scenario or the scenario that tortures FDTists, CDT will dramatically beat FDT.
I think the example that’s most worth focusing on is the demon leg-cutting case. I think it’s not crazy at all to one-box, and I have maybe 35% credence that one-boxing is right. I have maybe 95% credence that you shouldn’t cut off your legs in the demon case, and 80% confidence that the position that you should cut them off is crazy, in the sense that if you spent years thinking about it while being relatively unbiased, you’d almost certainly give it up.
I think rather than say that Eliezer is wrong about decision theory, you should say that Eliezer’s goal is to come up with a decision theory that helps him get utility, and your goal is something else, and you have both come up with very nice decision theories for achieving your respective goals.
(what is your goal?)
My opinion on your response to the demon question is “The demon would never create you in the first place, so who cares what you think?” That is, I think your formulation of the problem includes a paradox: we assume the demon is always right, but also that you’re in a perfect position to betray it and it can’t stop you. What would actually happen is that the demon would create a bunch of people with amputation fetishes, plus me and Eliezer, who it knows wouldn’t betray it, and it would never put you in the position of getting to make the choice in real life (as opposed to in an FDT algorithmic way) in the first place. The reason you find the demon example more compelling than the Newcomb example is that it starts by making an assumption that undermines the whole problem: namely, that the demon has failed its omniscience check and created you, someone who is destined to betray it. If your problem setup contains an implicit contradiction, you can prove anything.
I don’t think this is as degenerate a case as “a demon will torture everyone who believes FDT”. If that were true, and I expected to encounter that demon, I would simply try not to believe FDT (insofar as I can voluntarily change my beliefs). While you can always be screwed over by weird demons, I think decision theory is about what to choose in cases where you have all of the available knowledge and also a choice in the matter, and I think the leg demon fits that situation.
The demon case shows that there are cases where FDT loses, as is true of all decision theories. If the question is which decision theory, if programmed into an AI, will generate the most utility, then that’s an empirical question that depends on facts about the world. If the question is which choices, once you’re already in a situation, will get you the most utility, well, that’s causal decision theory.
Decision theories are intended as theories of what is rational for you to do. So they describe which choices are wise and which choices are foolish. I think Eliezer is confused about what a decision theory is, and that is a reason to trust his judgment less.
In the demon case, we can assume the demon is only almost infallible, so it makes a mistake one time in a million. The demon case is a better example than Newcomb’s problem, because I have some credence in EVT, and EVT entails that you should one-box. I am waaaaaaaaaaaay more confident that FDT is crazy than I am that you should two-box.
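To put rough numbers on why EVT says to one-box, here’s a quick sketch using the standard Newcomb payoffs of $1,000,000 and $1,000 (those amounts aren’t stipulated anywhere in our discussion; they’re just the usual illustration) and a predictor that errs one time in a million:

```python
# Illustrative sketch only: the payoffs are the conventional Newcomb amounts,
# not anything specified in this discussion.
ACCURACY = 999_999 / 1_000_000  # assumed predictor success rate
BIG, SMALL = 1_000_000, 1_000   # assumed standard Newcomb payoffs

# If you one-box, the predictor almost certainly predicted that, so the big
# box is almost certainly full; with probability (1 - ACCURACY) it's empty.
ev_one_box = ACCURACY * BIG + (1 - ACCURACY) * 0

# If you two-box, the predictor almost certainly predicted that, so the big
# box is almost certainly empty; you keep the small box either way.
ev_two_box = ACCURACY * SMALL + (1 - ACCURACY) * (BIG + SMALL)

print(f"EV(one-box) = ${ev_one_box:,.0f}")  # ~ $999,999
print(f"EV(two-box) = ${ev_two_box:,.0f}")  # ~ $1,001
```

On that way of counting, one-boxing comes out roughly a thousand times better, which is why I give one-boxing real credence.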
I thought we already agreed the demon case showed that FDT wins in real life, since FDT agents will consistently end up with more utility than other agents.
Eliezer’s argument is that you can become the kind of entity that is programmed to do X by choosing to do X. This is in some ways a claim about demons (they are good enough to predict even the choices you made with “your free will”). But it sounds like we’re in fact positing that demons are that good (I don’t know how else to explain their 999,999-in-a-million success rate), so I think he is right.
I don’t think the demon being wrong one time in a million changes much. 999,999 out of every million people created by the demon will be some kind of FDT decision theorist with great precommitment skills. If you’re the one who isn’t, you can observe that you’re the demon’s rare mistake and avoid cutting off your legs, but this just means you won the lottery; it’s not a generally winning strategy.
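To make that concrete with toy numbers (nothing in the scenario fixes any utilities, so these are purely made up): say being created is worth 100, losing your legs costs 10, and the demon errs one time in a million.

```python
# Toy numbers only: none of these utilities are specified by the scenario.
P_ERROR = 1 / 1_000_000      # assumed rate at which the demon mispredicts
CREATED, LEG_COST = 100, 10  # hypothetical value of existing / cost of cutting

# Ex ante, per disposition: the demon creates you only if it predicts you'll cut.
ev_disposed_to_cut = (1 - P_ERROR) * (CREATED - LEG_COST)  # ~90: created, then cut
ev_disposed_to_refuse = P_ERROR * CREATED                  # ~0.0001: only created by mistake

# Ex post, given that you already exist as the demon's one-in-a-million mistake:
u_refuse_now = CREATED          # 100: keep your legs
u_cut_now = CREATED - LEG_COST  # 90: cut anyway

print(ev_disposed_to_cut, ev_disposed_to_refuse)  # 89.99991 0.0001
print(u_refuse_now, u_cut_now)                    # 100 90
```

Averaged over how the demon treats each disposition, the refusing policy is a disaster; it only looks good from the vantage point of the one person who got lucky, which is the sense in which I mean winning the lottery rather than having a winning strategy.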
“Decision theories are intended as theories of what is rational for you to do. So they describe which choices are wise and which choices are foolish.”
I don’t understand why you think that the choices that get you more utility with no drawbacks are foolish, and the choices that cost you utility for no reason are wise.
On the Newcomb’s Problem post, Eliezer explicitly said that he doesn’t care why other people are doing decision theory; he would like to figure out a way to get more utility. Then he did that. I think if you disagree with his goal, you should be arguing “decision theory should be about looking good, not about getting utility” (so we can all laugh at you) rather than saying “Eliezer is confidently and egregiously wrong” while hiding the fact that one of your main arguments is that he said we should try to get utility instead of failing all the time, and then came up with a strategy that successfully does that.
We all agree that you should get utility. You are pointing out that FDT agents get more utility. But once they are already in the situation where they’ve been created by the demon, FDT agents get less utility. If you are the type of agent to follow FDT, you will get more utility, just as, if you are the type of agent to follow CDT while being in a scenario that tortures FDTists, you’ll get more utility. The question of decision theory is: given the situation you are in, what gets you more utility? What is the rational thing to do? Eliezer’s theory turns you into the type of agent who often gets more utility, but that does not make it the right decision theory. The fact that you want to be the type of agent who does X doesn’t make doing X rational if doing X is bad for you and not doing X is rewarded artificially.
Again, there is no dispute about whether one-boxers or two-boxers get more utility on average, or about which kind of AI you should build.