A think tank to investigate the game theory of ethics
Values and Reflective Processes, Effective Altruism, Research That Can Help Us Improve, Space Governance, Artificial Intelligence
Caspar Oesterheld’s work on Evidential Cooperation in Large Worlds (ECL) shows that some fairly weak assumptions about the shape of the universe are enough to conclude that there is one optimal system of ethics: the compromise between the preferences of all agents who cooperate with each other acausally. That would solve ethics for all practical purposes, and because ethics is so foundational, it would have enormous effects on a wide variety of fields.
The main catch is that it will take a lot more thought and empirical study to narrow down what that optimal compromise ethical system looks like. The ethical systems and bargaining methods used on Earth can serve as a sample, and convergent drives can help us extrapolate to unobserved types of agents. We may never have certainty that we’ve found the optimal ethical system, but we can go from a state of overwhelming Knightian uncertainty to a state of quantifiable uncertainty. Along the way, we can probably rule out many ethical systems as likely non-optimal.
First and foremost, this is a reflective process that will inform altruistic priorities, which suggests the categories Values and Reflective Processes, Effective Altruism, and Research That Can Help Us Improve. But I also see applications wherever agents have trouble communicating: cooperation between multiple mass movements, cooperation between large groups of donors, cooperation between anonymous donors, cooperation between camps of voters, cooperation on urgent issues between civilizations that are too far separated to communicate quickly enough, and cooperation between agents on different levels of the simulation hierarchy. ECL may turn out to be a convergent goal of a wide range of artificial intelligences. Thus it also has indirect effects on the categories of Space Governance and Artificial Intelligence. (But I don’t think it would be good for someone to prioritize this over more direct AI safety work at this time.)
I see a few weaknesses in the argument for ECL, so a first step may be to get experts in game theory and physics together to probe these and work out exactly what assumptions go into ECL and how likely they are to hold.
Some people have thought about this more than I have – including (of course) Caspar Oesterheld, Johannes Treutlein, David Althaus, Daniel Kokotajlo, and Lukas Gloor – but I don’t think anyone is currently focused on it.
ECL recommends that agents maximize a compromise utility function that averages their own utility function and those of the agents that action-correlate with them (their “copies”). The compromise between me and my copies would look different from the compromise between you and your copies, right? So I could “solve ethics” for myself, but not for you, and vice versa. Ethics could only be “solved” for everyone if all agents in the multiverse were action-correlated with each other to the exact same degree, which appears exceedingly unlikely. Am I missing something?
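For concreteness, what I picture by “averaging” is something like a weighted sum over those utility functions, with weights standing in for whatever bargaining or aggregation rule ECL ends up recommending (the notation here is purely illustrative):

$$U_{\text{compromise}}(a) = \sum_i w_i \, U_i(a), \qquad w_i \ge 0, \quad \sum_i w_i = 1$$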
(Not a criticism of your proposal. I’m just trying to refine my understanding of ECL) :)
Thanks for the comment! I think that’s a misunderstanding: trading with copies of oneself wouldn’t do anything, since you already want the same things. The compromise between you and your copies would be the same as what you want individually.
But with ECL you instead employ the concept of “superrationality,” which Douglas Hofstadter, Gary Drescher, and others have already looked into in isolation. You have now learned of superrationality, and others out there have perhaps also figured it out (or will in the future). Superrationality is now the thing that you have in common and that allows you to coordinate your decisions without communicating.
That coordination relies a lot on Schelling points, on extrapolation from the things that we see around us, and on general considerations about what sorts of agents will consider superrationality to be worth their while (some brands of consequentialists, surely), etc.

I’ve mentioned some real-world examples of ECL for coordinating within and between communities like EA in this article.
Thanks for the reply! :)

By “copies”, I meant “agents which action-correlate with you” (i.e., those which will cooperate if you cooperate), not “agents sharing your values”. Sorry for the confusion.
Do you think all agents thinking superrationally action-correlate? This seems like a very strong claim to me. My impression is that the agents with a decision-algorithm similar enough to mine to (significantly) action-correlate with me are a very small subset of all superrationalists. As your post suggests, even your past self doesn’t fully action-correlate with you (although you don’t need “full correlation” for cooperation to be worthwhile, of course).
In a one-shot prisoner’s dilemma, would you cooperate with anyone who agrees that superrationality is the way to go?
In his paper on ECL, Caspar Oesterheld says (section 2, p. 9): “I will tend to make arguments from similarity of decision algorithms rather than from common rationality, because I hold these to be more rigorous and more applicable whenever there is not authority to tell my collaborators and me about our common rationality.” However, he also often uses “the agents with a decision-algorithm similar enough to mine to (significantly) action-correlate with me” and “all superrationalists” interchangeably, which confuses me a lot.
Do you think all agents thinking superrationally action-correlate?
Yes, but by implication, not assumption. (Also no, not perfectly at least, because all of us will always have some empirical uncertainty.)
Superrationalists want to compromise with each other (if they have the right aggregative-consequentialist mindset), so they try to infer what everyone else wants (in some immediate, pre-superrationality sense), calculate the compromise that follows from that, determine what actions that compromise implies for the context in which they find themselves (resources and whatnot), and then act accordingly. These final acts can be very different depending on their contexts, but the compromise goals from which they follow correlate to the extent to which they were able to correctly infer what everyone wants (including bargaining solutions etc.).
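To make that procedure a bit more concrete, here is a toy sketch in Python. Everything in it is illustrative: the “inferred” preferences are hard-coded toy utility functions, and the equal weights stand in for whatever bargaining solution would actually be used.

```python
# Toy sketch of the procedure described above. All preferences, weights, and
# outcomes here are made up for illustration; inferring real preferences and
# choosing a bargaining solution are the hard, open parts.

def compromise_utility(utilities, weights):
    """Weighted sum of the (inferred) utility functions of all cooperators."""
    return lambda outcome: sum(w * u(outcome) for u, w in zip(utilities, weights))

# Hypothetical inferred preferences: agent A only values x, agent B only values y.
inferred = [lambda o: o["x"], lambda o: o["y"]]
weights = [0.5, 0.5]  # stand-in for whatever bargaining solution would be used

u_comp = compromise_utility(inferred, weights)

# Agent A's local context: the outcomes it can realize with its own resources.
candidate_outcomes = [{"x": 10, "y": 0}, {"x": 6, "y": 6}, {"x": 0, "y": 10}]
print(max(candidate_outcomes, key=u_comp))  # {'x': 6, 'y': 6} beats either selfish extreme
```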
In a one-shot prisoner’s dilemma, would you cooperate with anyone who agrees that superrationality is the way to go?
Yes.
Hmm, it’s been a couple of years since I read the paper, so I’m not sure how that is meant… But I suppose either (1) the decision algorithm is similar because it goes through the superrationality step, or (2) the decision algorithm has to be a bit similar in the first place for people to consider superrationality at all. You need to subscribe to non-causal decision theories or maybe have indexical uncertainty of some sort. It might be something that religious people and EAs come up with but that seems weird to most other people. (I think Calvinists have these EDT leanings, so maybe they’d embrace superrationality too? No idea.) I think superrationality breaks down in many earth-bound cases because too many people here would consider it weird – probably the whole CDT crowd, unless they are aware of their indexical uncertainty, but that’s also still considered a bit weird.
Oh interesting! Ok so I guess there are two possibilities.
1) Either by “superrationalists”, you mean something stronger than “agents taking acausal dependences into account in PD-like situations”, which I thought was roughly Caspar’s definition in his paper. And then, I’d be even more confused.
2) Or you really think that taking acausal dependences into account is, by itself, sufficient to create a significant correlation between two decision-algorithms. In that case, how do you explain that I would defect against you and exploit you in a one-shot PD (very sorry, I just don’t believe we correlate ^^), despite being completely on board with superrationality? How is that not a proof that common superrationality is insufficient?
(Btw, happy to jump on a call to talk about this if you’d prefer that over writing.)
I think it’s closer to 2, and the clearer term to use is probably “superrational cooperator,” but I suppose that’s probably what’s meant by “superrationalist”? Unclear. But “superrational cooperator” is clearer about (1) knowing about superrationality and (2) wanting to reap the gains from trade from superrationality. Condition 2 can be false because people use CDT or because they have very local or easily satisfied values and don’t care about distant or additional stuff.
So just as in all the thought experiments where EDT gets richer than CDT, your own behavior is the only evidence you have about what others are likely to predict about you. The multiverse part probably smooths that out a bit, so your own behavior gives you evidence of increasing or decreasing gains from trade as the fraction of agents in the multiverse that you think cooperate with you increases or decreases.
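As a rough numerical illustration of that last point (all payoffs and probabilities below are made up; the payoff matrix is just the textbook one-shot prisoner’s dilemma):

```python
# Made-up numbers, not anything from Caspar's paper: a standard one-shot
# prisoner's dilemma where my own choice is treated as evidence about the
# fraction p of my reference class that cooperates.

T, R, P, S = 5, 3, 1, 0  # temptation, mutual reward, mutual punishment, sucker's payoff

def expected_payoff(my_action, p_cooperate):
    """Expected payoff against a population that cooperates with probability p_cooperate."""
    if my_action == "cooperate":
        return p_cooperate * R + (1 - p_cooperate) * S
    return p_cooperate * T + (1 - p_cooperate) * P

# If cooperating is evidence that p is high (say 0.9) and defecting is evidence
# that p is low (say 0.1), then cooperating comes out ahead:
print(expected_payoff("cooperate", 0.9))  # 2.7
print(expected_payoff("defect", 0.1))     # 1.4
```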
I think it would be “hard” to try to occupy that Goldilocks zone where you maximize the number of agents who wrongly believe that you’ll cooperate while you’re really defecting, because you’d have to simultaneously believe that you’re the sort of agent that cooperates despite actually defecting, which should give you evidence that you’re wrong about what reference class you’re likely to be put in. There may be agents like that out there, but even if that’s the case, they won’t have control over it. The way this will probably be factored in is that superrational cooperators will expect a slightly lower cooperation incidence from agents in reference classes that are empirically very likely to cooperate without being physically forced to: being in such a reference class makes defection more profitable, but only up to the point where it changes the assumptions others are likely to make about that reference class and thereby undermines the effect that enabled it in the first place. That could mean that for any given reference class of agents who are able to defect, cooperation “densities” over 99% or so get rapidly less likely.
But really, I think, the winning strategy for anyone at all interested in distant gains from trade is to be a very simple, clear kind of superrational cooperator, because that maximizes the chances that others will cooperate with that sort of agent. All that “trying to be clever” and “being the sort of agent that tries to be clever” probably just costs so much in gains from trade right away that you’d have to value the distant gains from trade very low compared to your local stuff for it to make any economic sense – and then you can probably forget about the gains from trade anyway, because others will also predict that. I think David Althaus and Johannes Treutlein have thought about this from the perspective of different value systems, but I don’t know of any published artifacts from that.
We can gladly have a chat some time! But it’s been a while since I’ve done all this, so I’m a bit slow. ^.^'