A few doubts:
1) It seems like MSR requires a multiverse large enough to contain many well-correlated agents, but not so large that we run into the problems of infinite ethics. Most of my credence is on either no multiverse or an infinite multiverse, although I’m not particularly well-read on this issue.
2) My broad intuition is something like: “Insofar as we can know about the values of other civilisations, they’re probably similar to our own. Insofar as we can’t, MSR isn’t relevant.” There are probably exceptions, though (e.g. we could guess the direction in which an r-selected civilisation’s values would vary from our own).
3) I worry that MSR is susceptible to some sort of self-mugging. I don’t have a particular example, but the general idea is that you’re correlated with other agents even when you’re being very irrational, and so you might end up doing things which seem arbitrarily irrational. But this is just a half-formed thought, not a proper objection.
4) And lastly, I would have much more confidence in FDT and superrationality in general if there were a sensible metric of similarity between agents, apart from correlation. If you always cooperate in prisoner’s dilemmas, then your choices are perfectly correlated with CooperateBot’s, but intuitively it would still be more rational to defect against CooperateBot, because your decision algorithm isn’t similar to CooperateBot in the way that it’s similar to your psychological twin. I guess this requires a solution to logical uncertainty, though. (A toy sketch of the worry is below.)
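Here is a minimal sketch of that correlation-versus-similarity worry. CooperateBot and the psychological twin come from the point above; the policy, the agent signatures, and the “force my output” test are purely illustrative, not any standard formalism.

```python
# Toy illustration: in ordinary play my choices are perfectly correlated with
# CooperateBot's, but only the twin's output counterfactually depends on mine.

def cooperate_bot(my_action):
    # Ignores what I do entirely.
    return "C"

def psychological_twin(my_action):
    # Stands in for "runs a copy of my decision procedure": tracks my output.
    return my_action

def my_policy():
    return "C"  # suppose I always cooperate in ordinary play

# Observed play: identical outputs, i.e. perfect correlation with both.
print(cooperate_bot(my_policy()), psychological_twin(my_policy()))  # C C

# Counterfactual test: force my output and see whose choice moves with it.
for forced in ["C", "D"]:
    print(forced, cooperate_bot(forced), psychological_twin(forced))
# CooperateBot stays at C; the twin switches to D along with me. Same observed
# correlation, different counterfactual dependence; that is the distinction a
# similarity metric would need to capture.
```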
Happy to discuss this more with you in person. Also, I suggest you cross-post to Less Wrong.
Re 4): Correlation or similarity between agents is not really a necessary condition for cooperation in the open-source PD. LaVictoire et al. (2014) and related papers showed that ‘fair’ agents with completely different implementations can cooperate. A fair agent, roughly speaking, can have any internal structure, so long as it implements “I’ll cooperate with you if I can show that you’ll cooperate with me”. So maybe that’s the measure you’re looking for (a rough sketch is below).
A population of fair agents is also typically a Nash equilibrium in such games, so you might expect that they do sometimes evolve.
Source: LaVictoire, P., Fallenstein, B., Yudkowsky, E., Barasz, M., Christiano, P., & Herreshoff, M. (2014, July). Program equilibrium in the prisoner’s dilemma via Löb’s theorem. In AAAI Workshop on Multiagent Interaction without Prior Coordination.
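To make the “I’ll cooperate with you if I can show that you’ll cooperate with me” idea concrete: below is a rough, simulation-based stand-in, not the proof-based FairBot from the paper (which relies on bounded proof search and Löb’s theorem). It cooperates outright with small probability and otherwise runs the opponent against itself; that grounding trick, and all the names in the sketch, are my own simplification. The point it illustrates is just that differently implemented “fair” programs end up cooperating with each other while remaining hard to exploit.

```python
import random

EPSILON = 0.2  # probability of cooperating outright, which grounds the mutual recursion

def grounded_fairbot(opponent):
    """Cooperate if (a simulation of) the opponent cooperates with me."""
    if random.random() < EPSILON:
        return "C"
    return opponent(grounded_fairbot)  # otherwise, do whatever you would do to me

def grounded_fairbot_v2(opponent):
    """A differently written fair agent: different grounding rate, explicit check."""
    if random.random() < 0.05:
        return "C"
    return "C" if opponent(grounded_fairbot_v2) == "C" else "D"

def cooperate_bot(opponent):
    return "C"

def defect_bot(opponent):
    return "D"

def play(a, b):
    return a(b), b(a)

print(play(grounded_fairbot, grounded_fairbot))     # ('C', 'C') with probability 1
print(play(grounded_fairbot, grounded_fairbot_v2))  # ('C', 'C') with probability 1
print(play(grounded_fairbot, defect_bot))           # ('D', 'D'), exploited only with probability EPSILON
print(play(grounded_fairbot, cooperate_bot))        # ('C', 'C')
```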
The example you’ve given me shows that agents which implement exactly the same (high-level) algorithm can cooperate with each other. The metric I’m looking for would tell us how similar two agents are when their algorithms are non-identical. Presumably we want a smoothness property for that metric, such that if our algorithms are very similar (e.g. they only differ with respect to some radically unlikely edge case), the reduction in cooperation is negligible. But it doesn’t seem like anyone knows how to do this.
One way I imagine dealing with this is to posit an oracle that tells us with certainty, for two algorithms and their decision situations, what the counterfactually possible joint outputs are. The smoothness then comes from our uncertainty about (i) the other agents’ algorithms, (ii) their decision situations, and (iii) potentially the outputs of the oracle. The correlations vary smoothly as we vary our probability distributions over these things, but for a fully specified algorithm, situation, etc., the algorithms are always either logically identical or not.
Unfortunately, I don’t know what the oracle would be doing in general. I could also imagine that, when formulated this way, the conclusion is that humans never correlate with anything, for instance.
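For what it’s worth, here is a toy numerical version of that smoothness claim. The “oracle” is hard-coded (a twin’s move tracks yours; CooperateBot and DefectBot ignore it), and the opponent types and credences are made up purely for illustration; the only point is that the decision-relevant correlation varies smoothly with the probability distribution, even though each fully specified opponent is either logically identical to you or not.

```python
def expected_opponent_cooperation(my_move, credences):
    """P(opponent plays C), given my move and a credence distribution over opponents.

    credences: dict over three illustrative opponent types:
      'twin' runs my algorithm, 'cbot' always cooperates, 'dbot' always defects.
    """
    p = 0.0
    p += credences["twin"] * (1.0 if my_move == "C" else 0.0)  # tracks my move exactly
    p += credences["cbot"] * 1.0                               # cooperates regardless
    p += credences["dbot"] * 0.0                               # defects regardless
    return p

# The "correlation" that matters for the decision: how much my playing C rather
# than D raises the probability that the opponent plays C. It rises smoothly
# from 0 to 1 as my credence that the opponent is my twin rises from 0 to 1.
for p_twin in [0.0, 0.25, 0.5, 0.75, 1.0]:
    rest = (1.0 - p_twin) / 2.0
    cred = {"twin": p_twin, "cbot": rest, "dbot": rest}
    gap = (expected_opponent_cooperation("C", cred)
           - expected_opponent_cooperation("D", cred))
    print(p_twin, gap)
```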