RobBensinger comments on [Linkpost] Sam Harris on “The Fall of Sam Bankman-Fried”

RobBensinger 17 Nov 2022 22:08 UTC
5 points
1 ∶ 1
I don’t think “known causal decision theorist Sam Bankman-Fried committed multibillion-dollar fraud, therefore we should be less confident that causal decision theory is false” is a good argument. There are (IMO) things some EAs should soul-search about after this, but “downgrade our confidence in literally all EA-associated claims” is the wrong lesson.
they ask you to imagine things that aren’t logically consistent with the real world, such as humans who can make categorical changes to their own behaviour,
Do you mean that FDT requires that humans be capable of following through on things like “pay Parfit’s hitchhiker”? I’d say it’s obvious that humans can follow through on that kind of commitment. Humans may not be able to be 100% confident in their own future behavior, but 100% confidence isn’t required.
arbitrarily perceptive psychologists,
See https://www.lesswrong.com/posts/RhAxxPXrkcEaNArnd/notes-on-can-you-control-the-past, especially:
[...] There’s a cute theorem I’ve proven (or, well, I’ve jotted down what looks to me like a proof somewhere, but haven’t machine-checked it or anything), which says that if you want to disagree with logical decision theorists, then you have to disagree in cases where the predictor is literally perfect. The idea is that we can break any decision problem down by cases (like “insofar as the predictor is accurate, …” and “insofar as the predictor is inaccurate, …”) and that all the competing decision theories (CDT, EDT, LDT) agree about how to aggregate cases. So if you want to disagree, you have to disagree in one of the separated cases. (And, spoilers, it’s not going to be the case where the predictor is on the fritz.)
I see this theorem as the counter to the decidedly human response “but in real life, predictors are never perfect”. “OK!”, I respond, “But decomposing a decision problem by cases is always valid, so what do you suggest we do under the assumption that the predictor is accurate?”
Even if perfect predictors don’t exist in real life, your behavior in the more complicated probabilistic setting should be assembled out of a mixture of ways you’d behave in simpler cases. Or, at least, so all the standard leading decision theories prescribe. So, pray tell, what do you do insofar as the predictor reasoned accurately? [...]
LDT doesn’t require that any predictors be perfectly accurate in real life. It just requires that there be agents that can predict your future behavior better than chance.
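To make the case-splitting concrete, here is a minimal sketch of Newcomb's problem decomposed into "predictor accurate" and "predictor inaccurate" cases and then recombined by probability. The payoffs (1,000,000 and 1,000) are the conventional textbook values, not figures from this thread, and the function names are hypothetical. The calculation also treats the prediction as tracking the agent's actual choice, which is the very assumption a CDT proponent would dispute, so read it as an illustration of the argument rather than a proof.

```python
def newcomb_expected_values(accuracy, big=1_000_000, small=1_000):
    """Expected payoff of each policy, split by predictor accuracy.

    Assumed payoffs (conventional, not from the thread): the opaque box
    holds `big` iff one-boxing was predicted; the clear box holds `small`.
    """
    # Case A: the predictor guessed the agent's choice correctly.
    one_box_if_accurate = big            # predicted one-box, box is full
    two_box_if_accurate = small          # predicted two-box, box is empty

    # Case B: the predictor guessed wrong.
    one_box_if_inaccurate = 0            # predicted two-box, box is empty
    two_box_if_inaccurate = big + small  # predicted one-box, box is full

    # Recombine the two cases, weighted by how likely each one is.
    one_box = accuracy * one_box_if_accurate + (1 - accuracy) * one_box_if_inaccurate
    two_box = accuracy * two_box_if_accurate + (1 - accuracy) * two_box_if_inaccurate
    return one_box, two_box


def break_even_accuracy(big=1_000_000, small=1_000):
    """Accuracy above which one-boxing has the higher expected payoff."""
    return (big + small) / (2 * big)
```

Under these assumed payoffs the break-even accuracy is 0.5005, so the predictor needs to be only slightly better than a coin flip for one-boxing to come out ahead. The exact threshold depends on the ratio of the two payoffs.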
literally identical humans,
Not required, for the same reason. E.g., LDT comes into play whenever two humans make decisions based on a similar reasoning process (like “we both are using the long division algorithm to solve this math problem”), not just when the full brain-state is identical.
Like, to be clear, you can make literally identical humans, because there’s nothing physically impossible about emulating a human brain in computing hardware, and emulations are trivial to copy.
And it’s even more obvious that AI systems are copyable; and “figure out decision theory so we can better understand AI reasoning” is indeed the primary reason MIRI folks care about CDT vs. LDT.
But just because literal copies are a real-world example we need to take into account in AI (and, someday, in ems) doesn’t mean that any of the core arguments for LDT require there to be literal copies of agents. This is discussed in https://www.lesswrong.com/posts/RhAxxPXrkcEaNArnd/notes-on-can-you-control-the-past.
and even omniscient spirits and gods who defy causality.
How so? This is sort of an assumption in parts of algorithmic decision theory (AIXI has all possible worlds in its hypothesis space, and runs on compute that’s bigger than any of the possible worlds it’s reasoning about, though it isn’t indexically omniscient / doesn’t start off knowing which world it’s in). But I don’t know of any standard LDT arguments that require omniscience or causal loops or anything.
indeed, I’ve tricked actual rationalists out of small amounts of real-world utility by persuading them to one-box at inopportune moments
“I defected and the opponent cooperated” could mean one of two different things:
1. “The opponent cooperated even though they knew I would defect”.
2. “I tricked the opponent into thinking that I would cooperate, and then I defected anyway”.
Re case 1: FDT advises defection in decision problems where your opponent defects, so any rationalists who endorse 1 are not following FDT’s prescriptions. Obviously it shouldn’t count as a strike against FDT if rationalists lose money by diverging from FDT.
Re case 2: no decision theory can protect agents from ever being tricked by others, or protect agents from the general fact that having false beliefs will make you lose utility. “If FDT agents believe falsehood X, they’ll reliably lose money” is true for many values of X, but this doesn’t help distinguish FDT from other theories; no decision algorithm can magically protect you from losing utility if you lack good world-models.
(This is why decision problems in the literature generally stipulate that the agent knows what situation it’s in. It’s clearly a strike against a decision theory if it predictably fails when the agent knows what’s going on; whereas if the agent fails when it’s clueless, the blame may lie with the false/incomplete world-model, rather than with the decision algorithm.)
You could respond “but CDT can’t make this particular mistake”, but I don’t think this should be convincing unless you’re pointing to a case where CDT does better than FDT while the FDT agent has relevantly accurate beliefs. Otherwise I can just respond, “The CDT agent is guaranteed to lose in the cases where cooperate-cooperate equilibria are achievable; so both agents will lose utility in various situations, but CDT has the additional defect that it loses when it has correct world-models, not just when it has incorrect ones.”
It’s one thing to err when you don’t know enough to do better; it’s another thing to light money on fire and watch it burn for no reason, when you know exactly how to get more utility.
LDT agents achieving rational cooperate-cooperate equilibria can be compared to trading partners who realize gains from trade. You can respond “But being willing to ever trade opens up the possibility of being cheated; how about if I instead precommit to never trading in any circumstance, so no one can cheat me.” And that’s indeed an option available to you. (And in fact, it’s one the FDT agent will take too if they’re in a weird world where this disposition is somehow rewarded. FDT is flexible enough to cover this case, whereas CDT isn’t flexible enough to self-modify to FDT when needed.)
But in the real world, it’s not actually a good idea to throw all trade opportunities out the window a priori, because (a) the value of honest trade is too large to be worth throwing away, and (b) if you’re worried that you’re bad at identifying cheaters, you can just default to defecting in all cases except the ones where you’re extremely confident that you’re dealing with a non-cheater.
FDT’s prescription in this case, “defect unless you’re confident enough that the other person really will cooperate iff you’re the sort of person who cooperates in this situation”, is strictly better than CDT’s “defect no matter what”, because you can always set the required confidence level higher within FDT. FDT just says, “Don’t rule out the possibility of coordination totally.”
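The "set the required confidence level higher" point can be sketched with a toy one-shot Prisoner's Dilemma. Everything here is an assumption made for illustration: the payoff numbers, the function names, and the simplified opponent model, in which the other player mirrors your choice with probability `confidence` and otherwise defects regardless.

```python
def cooperation_threshold(reward=3.0, punishment=1.0, sucker=0.0):
    """Break-even confidence for cooperating in the toy model.

    Assumed payoffs: `reward` for mutual cooperation, `punishment` for
    mutual defection, `sucker` for cooperating against a defector.
    Cooperating yields confidence*reward + (1 - confidence)*sucker, while
    defecting yields `punishment` either way (a mirroring opponent defects
    back, and a cheater defects anyway).
    """
    return (punishment - sucker) / (reward - sucker)


def choose(confidence, required=0.0, reward=3.0, punishment=1.0, sucker=0.0):
    """Cooperate only if confidence beats both the break-even point and
    any extra caution level the caller asks for."""
    threshold = max(cooperation_threshold(reward, punishment, sucker), required)
    return "cooperate" if confidence > threshold else "defect"
```

With `required=1.0` the policy never cooperates, which reproduces "defect no matter what" as a special case. Lower settings only add cooperation in situations where the agent judges it to pay, which is the sense in which the conditional policy cannot do worse by its own lights.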
That said, if in-real-life people who endorse FDT consistently get their lunch stolen by people who endorse CDT, even though this goes against FDT’s prescriptions, then I would update on that and tell human beings to follow something closer to CDT in their daily life.
This is a crux for me in the context of “what advice should I give humans?”, even if it’s not a crux for the application to AI.
It would just be very weird if humans are unable to implement FDT well.
- Arepo 18 Nov 2022 1:28 UTC
  2 points
  0 ∶ 0
  You said “The rationalist community also wasn’t involved from the start”. I think this is false almost no matter how you slice it.
  I’ve given a timeline to the contrary which you don’t seem to contradict, so I have little more to say here. If you think that ‘some rationalists were at some EA events’ implies that ‘Eliezer Yudkowsky’s post ~2 years later on was somehow foundational to the EA movement’, then I don’t think we’re going to agree.
  I don’t think “known causal decision theorist Sam Bankman-Fried committed multibillion-dollar fraud, therefore we should be less confident that causal decision theory is false” is a good argument.
  I haven’t said anything to the effect that SBF’s behaviour should update us on decision theory, so please don’t put that in my mouth. I said that I would like to see you, as a prominent EA, show more epistemic humility.
  Do you mean that FDT… E.g., LDT comes into play
  I didn’t mention any decision theory except CDT, which I have not seen sufficient reason to reject based on the thought experiments you’ve cited. For example, I expect a real jeep driver in a real desert with no knowledge of my history to have no better than base rate chance of guessing my intentions based on the decision theory I’ve at some stage picked out. I expect a real omnipotent entity with seemingly perfect knowledge of my actions to raise serious questions about personal identity, to which a reasonable answer is ‘I will one-box because it will cause future simulations like me to get more utility’. I don’t have the bandwidth to trawl through every argument and make a counterargument to the effect that the parameters are ill-defined, but that seems to be the unifying factor among them. If you think your views are provable, then don’t link me to multiple thousand-flowery-word essays: just copy and paste the formal proof!
  I initially misunderstood you as making a claim that early EAs were philosophically committed to “naive consequentialism” in the sense of “willingness to lie, steal, cheat, murder, etc. whenever the first-order effects of this seem to outweigh the costs”.
  Your original comment was about how ‘consequentialism at the level of actions has worse consequences than consequentialism at the level of policies/dispositions’ which said nothing about lying, stealing etc. It was presented as a counterpoint to Harris who, to my knowledge, does neither of those things with any regularity.
  Toby Ord’s PhD thesis, which he completed while working on GWWC, was on ‘global consequentialism’, which explicitly endorses act-level reasoning if, on balance, it will lead to the best effect. His solicitation for people to do something actively beneficent rather than just be a satisficing citizen ran very much against the disinterested academic stylings of rule consequentialist reasoning in practice. You can claim it was advocating a ‘policy or disposition of giving’, but if you’re going to use such language so broadly, you no longer seem to be disagreeing with the original claim that ‘if you’re critiquing consequentialism ethics on the basis that it led to bad consequences you’re seriously confused’.