Thanks for writing this up!
I think the idea is intriguing, and I agree that this is possible in principle, but I’m not convinced of your take on its practical implications. Apart from heuristic reasons to be sceptical of a new idea on this level of abstractness and speculativeness, my main objection is that a high degree of similarity with respect to reasoning (which is required for the decisions to be entangled) probably goes along with at least some degree of similarity with respect to values. (And if the values of the agents that correlate with me are similar to mine, then the result of taking them into account is also closer to my own values than the compromise value system of all agents.)
You write:
Conditional on this extremely high degree of similarity to me, isn’t it also more likely that their values are similar to mine? For instance, if my reasoning is shaped by the experiences I’ve had, my genetic makeup, or the set of all ideas I’ve read about over the course of my life, then an agent with identical or highly similar reasoning would also share a lot of these characteristics. But of course, my experiences, genes, etc. also determine my values, so similarity with respect to these factors implies similarity with respect to values.
This is not the same as claiming that a given characteristic X that’s relevant to decision-making is generally linked to values, in the sense that people with X have systematically different values. It’s a subtle difference: I’m not saying that certain aspects of reasoning generally go along with certain values across the entire population; I’m saying that a high degree of similarity regarding reasoning goes along with similarity regarding values.
This was really interesting and probably as clear as such a topic can possibly be presented.
Disclaimer: I don’t know how to deal with infinities mathematically. What I am about to say is probably very wrong.
For every conceivable value system, there is an exactly opposing value system, so that there is no room for gains from trade between the systems (e.g. suffering maximizers vs suffering minimizers).
In an infinite multiverse, there are infinitely many agents with decision algorithms sufficiently similar to mine to allow for MSR. Among them, there are infinitely many agents holding any given value system. So whenever I cooperate with one value system, I defect against infinitely many agents who hold the exactly opposing values. So infinity seems to make cooperation impossible?
Sidenote: If you assume decision algorithms and values to be orthogonal, why do you suggest that we “adjust [the values to cooperate with] by the degree their proponents are receptive to MSR ideas”?
Best, Jan
There is an intuition that “disorderly” worlds with improbable histories must somehow “matter less,” but it’s very hard to cash out what this could mean. See this post or this proposal. I’m not sure these issues are solved yet (probably not). (I’m assuming that suffering maximizers or other really weird value systems would only evolve, or be generated when lightning hits someone’s brain or whatever, in very improbable instances.)
Good point; this shows that I’m skeptical about a strong version of independence where values and decision algorithms are completely uncorrelated. E.g., I find it less likely that deep ecologists would change their actions based on MSR than people with more EA(-typical) value systems. It is open to discussion whether (or how strongly) this has to be corrected for historical path dependencies and founder effects: If Eliezer had not been really into acausal decision theory, perhaps the EA movement would think somewhat differently about the topic. If we could replay history many times over, how often would EA be more or less sympathetic to superrationality than it is currently?
This is a very clear description of some cool ideas. Thanks to you and Caspar for doing this!
I’m worried that people’s altruistic sentiments are ruining their intuition about the prisoner’s dilemma. If Bob were an altruist, then there would be no dilemma. He would just cooperate. But within the framework of the one-shot prisoner’s dilemma, defecting is a dominant strategy – no matter what Alice does, Bob is better off defecting.
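To spell out the dominance claim, here is a minimal sketch with conventional textbook payoff numbers (the specific numbers are my own illustration, not from the post):

```python
# One-shot prisoner's dilemma. Keys are (Bob's move, Alice's move);
# values are Bob's payoffs (illustrative numbers only).
BOB_PAYOFF = {
    ("cooperate", "cooperate"): 3,
    ("cooperate", "defect"): 0,
    ("defect", "cooperate"): 5,
    ("defect", "defect"): 1,
}

# Dominance check: whatever Alice does, defecting pays Bob strictly more.
for alice in ("cooperate", "defect"):
    assert BOB_PAYOFF[("defect", alice)] > BOB_PAYOFF[("cooperate", alice)]
print("Defecting strictly dominates cooperating for Bob.")
```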
I’m all for caring about other value systems, but if there’s no causal connection between our actions and the aliens’, then it’s impossible to trade with them. I can pump someone’s intuition by saying, “Imagine a wizard produced a copy of yourself and had the two of you play the prisoner’s dilemma. Surely you would cooperate?” But that thought experiment is messed up because I care about copies of myself in a way that defies the setup of the prisoner’s dilemma.
One way to get cooperation in the one-shot prisoner’s dilemma is if Bob and Alice can inspect each other’s source code and prove that the other player will cooperate if and only if they do. But then Alice and Bob can communicate with each other! By having provably committed to this strategy, Alice and Bob can cause other players with the same strategy to cooperate.
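As a toy illustration of the “cooperate iff the other program matches my strategy” idea (a sketch only; the bot names are mine, and exact source matching is a crude stand-in for an actual proof of mutual cooperation, which serious treatments handle with proof search and Löb’s theorem):

```python
import inspect

def mirror_bot(opponent_source: str, my_source: str) -> str:
    """Cooperate only if the opponent's program is recognizably the same as mine."""
    return "cooperate" if opponent_source == my_source else "defect"

my_source = inspect.getsource(mirror_bot)

# Two players running the same program cooperate with each other...
print(mirror_bot(my_source, my_source))                                # cooperate
# ...and defect against anything else, e.g. an unconditional defector.
print(mirror_bot("def defect_bot(*_): return 'defect'", my_source))    # defect
```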
Evidential decision theory also preys on our sentiments. I’d like to live in a cool multiverse where there are aliens outside my light cone who do what I want them to, but it’s not like my actions can cause that world to be the one I was born into.
I’m all for chasing after infinities and being nice to aliens, but acausal trade makes no sense. I’m willing to take many other infinite gambles, like theism or simulationism, before I’m willing to throw out causality.
I agree that altruistic sentiments are a confounder in the prisoner’s dilemma. Yudkowsky (who would cooperate against a copy) makes a similar point in The True Prisoner’s Dilemma, and there are lots of psychology studies showing that humans cooperate with each other in the PD in cases where I think they (that is, each individually) shouldn’t. (Cf. section 6.4 of the MSR paper.)
But I don’t think that altruistic sentiments are the primary reason for why some philosophers and other sophisticated people tend to favor cooperation in the prisoner’s dilemma against a copy. As you may know, Newcomb’s problem is decision-theoretically similar to the PD against a copy. In contrast to the PD, however, it doesn’t seem to evoke any altruistic sentiments. And yet, many people prefer EDT’s recommendations in Newcomb’s problem. Thus, the “altruism error theory” of cooperation in the PD is not particularly convincing.
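To make the structural parallel explicit (a sketch with standard illustrative payoffs; the point is only that the copy’s move and the predictor’s box-filling both track your own choice):

```python
# Prisoner's dilemma against an exact copy: the copy plays whatever you play,
# so the only reachable outcomes are (C, C) and (D, D).
pd_payoff = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
for my_move in ("C", "D"):
    copy_move = my_move  # an exact copy decides the same way you do
    print(f"PD vs. copy: play {my_move}, get {pd_payoff[(my_move, copy_move)]}")

# Newcomb's problem with a reliable predictor: the opaque box's contents
# track your choice in the same way the copy's move does.
for my_choice in ("one-box", "two-box"):
    opaque = 1_000_000 if my_choice == "one-box" else 0
    total = opaque + (1_000 if my_choice == "two-box" else 0)
    print(f"Newcomb: {my_choice}, get ${total:,}")
```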
I don’t see much evidence in favor of the “wishful thinking” hypothesis. It, too, seems to fail in non-multiverse problems like Newcomb’s problem. Also, it’s easy to come up with lots of incorrect theories about how any particular view results from biased epistemics, so I have quite low credence in any such hypothesis that isn’t backed up by evidence.
Of course, causal eliminativism (or skepticism) is one motivation to one-box in Newcomb’s problem, but subscribing to eliminativism is not necessary to do so.
For example, in Evidence, Decision and Causality, Arif Ahmed argues that causality is irrelevant for decision making. (The book starts with: “Causality is a pointless superstition. These days it would take more than one book to persuade anyone of that. This book focuses on the ‘pointless’ bit, not the ‘superstition’ bit. I take for granted that there are causal relations and ask what doing so is good for. More narrowly still, I ask whether causal belief plays a special role in decision.”) Alternatively, one could even accept the use of causal relationships for informing one’s decisions but still endorse one-boxing. See, e.g., Yudkowsky, 2010; Fisher, n.d.; Spohn, 2012; or this talk by Ilya Shpitser.
Newcomb’s problem isn’t a challenge to causal decision theory. I can solve Newcomb’s problem by committing to one-boxing in any of a number of ways, e.g. signing a contract or building a reputation as a one-boxer. After the boxes have already been placed in front of me, however, I can no longer influence their contents, so it would be good for me to two-box if the rewards outweighed the penalty, e.g. if the contract I signed turned out to be void, or if I didn’t care about my one-boxing reputation because I didn’t expect to play this game again in the future.
The “wishful thinking” hypothesis might just apply to me then. I think it would be super cool if we could spontaneously cooperate with aliens in other universes.
Edit: Wow, ok I remember what I actually meant about wishful thinking. I meant that evidential decision theory literally prescribes wishful thinking. Also, if you made a copy of a purely selfish person and then told them of the fact, then I still think it would be rational to defect. Of course, if they could commit to cooperating before being copied, then that would be the right strategy.
You would get more utility if you were willing to one-box even when there’s no external penalty or opportunity to bind yourself to the decision. Indeed, functional decision theory can be understood as a formalization of the intuition: “I would be better off if only I could behave in the way I would have precommitted to behave in every circumstance, without actually needing to anticipate each such circumstance in advance.” Since the predictor in Newcomb’s problem fills the boxes based on your actual action, regardless of the reasoning or contract-writing or other activities that motivate the action, this suffices to always get the higher payout (compared to causal or evidential decision theory).
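A rough expected-value sketch (standard $1,000,000/$1,000 payoffs; the accuracy parameter is my own addition, included to show the comparison also holds for imperfect predictors):

```python
def expected_payout(choice: str, accuracy: float) -> float:
    """Expected Newcomb payout when the predictor matches your
    actual choice with probability `accuracy`."""
    if choice == "one-box":
        # The opaque box holds $1M exactly when one-boxing was predicted.
        return accuracy * 1_000_000
    # Two-boxers always take the transparent $1,000; the opaque box is
    # full only when the predictor (mistakenly) expected one-boxing.
    return 1_000 + (1 - accuracy) * 1_000_000

for acc in (0.99, 1.0):
    print(acc, expected_payout("one-box", acc), expected_payout("two-box", acc))
```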
There are also dilemmas where causal decision theory gets less utility even if it has the opportunity to precommit to the dilemma; e.g., retro blackmail.
For a fuller argument, see the paper “Functional Decision Theory” by Yudkowsky and Soares.
Ha, I think the problem is just that your formalization of Newcomb’s problem is defined so that one-boxing is always the correct strategy, and I’m working with a different formulation. There are four forms of Newcomb’s problem that jibe with my intuition, and they’re all different from the one you’re working with:

1. Your source code is readable. Then the best strategy is whatever the best strategy is when you get to publicly commit, e.g. you should tear off the steering wheel when playing chicken if you have the opportunity to do so before your opponent.
2. Your source code is readable and so is your opponent’s. Then you get mathy things like mutual simulation and Löb’s theorem.
3. We’re in the real world, so the only information the other player has to guess your strategy is information like your past behavior and reputation. (This is by far the most realistic situation in my opinion.)
4. You’re playing against someone who’s an expert in reading body language, say. Then it might be impossible to fool them unless you can fool yourself into thinking you’ll one-box. But of course, after the boxes are actually in front of you, it would be great for you if you had a change of heart.
Your version is something like: your opponent can simulate you with 100% accuracy, including unforeseen events such as an unexpected change of mind.
If we’re creating AIs that others can simulate, then I guess we might as well make them immune to retro blackmail. I still don’t see the implications for humans, who cannot be simulated with 100% fidelity and already have ample intuition about their reputations and know lots of ways to solve coordination problems.
Geographical distance is a kind of inferential distance.