SoerenMind comments on Request for input on multiverse-wide superrationality (MSR)

SoerenMind 2 Sep 2018 23:27 UTC
3 points
1 ∶ 0
Re 4): Correlation or similarity between agents is not really a necessary condition for cooperation in the open source PD. LaVictoire et al. (2014) and related papers showed that ‘fair’ agents with completely different implementations can cooperate. A fair agent, roughly speaking, can have any structure that implements “I’ll cooperate with you if I can show that you’ll cooperate with me”. So maybe that’s the measure you’re looking for.

A population of fair agents is also typically a Nash equilibrium in such games, so you might expect such agents to evolve at least sometimes.
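
To make the fair-agent idea concrete, here is a minimal Python sketch. It is not the construction from the paper, which uses proof search and Löb’s theorem; as an assumption for illustration, proof search is replaced by depth-limited simulation with an optimistic base case. All names (`fair_bot_a`, `fair_bot_b`, `play`, the `depth` budget) are hypothetical.

```python
from typing import Callable

C, D = "C", "D"

# An agent takes its opponent (a callable standing in for source code)
# and a remaining simulation budget, and returns "C" or "D".
Agent = Callable[..., str]


def cooperate_bot(opponent: Agent, depth: int) -> str:
    return C


def defect_bot(opponent: Agent, depth: int) -> str:
    return D


def fair_bot_a(opponent: Agent, depth: int) -> str:
    # ASSUMPTION: the optimistic base case stands in for the Loebian step.
    if depth == 0:
        return C
    # Cooperate iff the simulated opponent cooperates with me.
    return C if opponent(fair_bot_a, depth - 1) == C else D


def fair_bot_b(opponent: Agent, depth: int) -> str:
    # Same high-level rule, deliberately different implementation:
    # it spends two units of budget per step and mirrors via a lookup table.
    if depth <= 1:
        return C
    prediction = opponent(fair_bot_b, depth - 2)
    return {C: C, D: D}[prediction]


def play(a: Agent, b: Agent, depth: int = 6) -> tuple:
    """Return the pair of moves (a's move, b's move)."""
    return a(b, depth), b(a, depth)
```

With the default budget, `play(fair_bot_a, fair_bot_b)` gives mutual cooperation even though the two bots are written differently, and either fair bot defects against `defect_bot`. At `depth=0` the optimistic base case makes a fair bot exploitable, which is one reason this is only a stand-in for the proof-based version.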

Source: LaVictoire, P., Fallenstein, B., Yudkowsky, E., Barasz, M., Christiano, P., & Herreshoff, M. (2014, July). Program equilibrium in the prisoner’s dilemma via Löb’s theorem. In AAAI Multiagent Interaction without Prior Coordination workshop.
- richard_ngo 27 Jan 2019 2:02 UTC
  2 points
  0 ∶ 0
  The example you’ve given me shows that agents which implement exactly the same (high-level) algorithm can cooperate with each other. The metric I’m looking for is: how can we decide how similar two agents are when their algorithms are non-identical? Presumably we want a smoothness property for that metric such that if our algorithms are very similar (e.g. only differ with respect to some radically unlikely edge case) the reduction in cooperation is negligible. But it doesn’t seem like anyone knows how to do this.
  - Johannes_Treutlein 3 Jul 2023 21:20 UTC
    1 point
    0 ∶ 0
    One way I imagine dealing with this is that there is an oracle that tells us with certainty, for two algorithms and their decision situations, what the counterfactual possible joint outputs are. The smoothness then comes from our uncertainty about (i) the other agents’ algorithms, (ii) their decision situations, and (iii) potentially the outputs of the oracle. The correlations vary smoothly as we vary our probability distributions over these things, but for a fully specified algorithm, situation, etc., the algorithms are always either logically identical or not.
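    
    A toy Python sketch of this picture, covering only uncertainty of type (i). The names `toy_oracle` and `correlation` and the algorithm labels are hypothetical, and the oracle here is a trivial stand-in (label equality), not a proposal for what a real oracle would compute.

    ```python
    def toy_oracle(algo_a: str, algo_b: str) -> bool:
        # ASSUMPTION: two algorithms count as logically identical iff their
        # labels match. A real oracle would have to decide this from the
        # algorithms and their decision situations.
        return algo_a == algo_b


    def correlation(belief: dict, oracle, me: str) -> float:
        """Probability mass on opponents the oracle deems identical to `me`.

        The oracle's verdict for each fully specified algorithm is 0 or 1;
        the result varies smoothly only because `belief` does.
        """
        return sum(p for algo, p in belief.items() if oracle(me, algo))
    ```

    For example, `correlation({"fair": 0.7, "defect": 0.3}, toy_oracle, "fair")` is 0.7, and nudging the belief to 0.71 nudges the result by the same amount, while each individual oracle answer stays binary.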
    
    Unfortunately, I don’t know what the oracle would be doing in general. I could also imagine that, when formulated this way, the conclusion is that humans never correlate with anything, for instance.