So certainly physics-based priors are a big component, and indeed in some sense they are all of it. That is, I think physics-based priors should give you an immediate answer of “with high probability, you can’t influence the past”, and moreover that once you think through the problems in detail the conclusion will be that you could influence the past if physics were different (including boundary conditions, even if the laws remain the same), but that boundary-condition priors should still tell us you can’t influence the past. I’m happy to elaborate.
First, I think saying CDT is wrong, full stop, is much less useful than saying that CDT has a limited domain of applicability (using Sean Carroll’s terminology from The Big Picture). Analogously, one shouldn’t say that Newtonian physics is wrong, but that it has a limited domain of applicability, and one should be careful to apply it only in that domain. Of course, you can choose to stick to the “wrong” terminology; the claim is only that this is less useful.
So what’s the domain of applicability of CDT? Roughly, I think the domain is cases where the agent can’t be predicted by other agents in the world. I personally like to call this the “free will” case, but that’s my personal definition, so if you don’t like that definition we can call it the non-prediction case. The deterministic twin case violates this, as there is a dimension of decision making where non-prediction fails: each twin can perfectly predict the other’s actions conditional on their own actions. So deterministic twins are outside the domain of applicability of CDT.
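To make the non-prediction condition concrete, here is a minimal sketch of the twin prisoner’s dilemma. Everything numeric in it is an assumption for illustration: the payoff table, and the hypothetical `match_prob` parameter (the probability that the twin’s action matches mine). With `match_prob = 0.5` the twin’s action carries no information about mine, which is the non-prediction case; with `match_prob = 1.0` we have the deterministic twins.

```python
# Illustrative prisoner's dilemma payoffs (assumed, not taken from the post).
# PAYOFF[(my_action, twin_action)] is my payoff; "C" = cooperate, "D" = defect.
PAYOFF = {
    ("C", "C"): 3.0,
    ("C", "D"): 0.0,
    ("D", "C"): 5.0,
    ("D", "D"): 1.0,
}


def other(action):
    """Return the opposite action."""
    return "D" if action == "C" else "C"


def edt_value(action, match_prob):
    """Expected payoff when conditioning on my own action:
    the twin plays the same action as me with probability match_prob."""
    same = PAYOFF[(action, action)]
    different = PAYOFF[(action, other(action))]
    return match_prob * same + (1 - match_prob) * different


def cdt_value(action, twin_coop_prob):
    """Expected payoff when the twin's chance of cooperating is held fixed,
    whatever I choose."""
    return (twin_coop_prob * PAYOFF[(action, "C")]
            + (1 - twin_coop_prob) * PAYOFF[(action, "D")])


def best(value_fn, param):
    """Action with the higher expected payoff under the given calculation."""
    return max(("C", "D"), key=lambda a: value_fn(a, param))


if __name__ == "__main__":
    for p in (0.5, 0.7, 0.75, 1.0):
        print(p, best(edt_value, p), best(cdt_value, p))
```

Under these assumed payoffs, conditioning on my own action only changes the recommendation once `match_prob` exceeds 5/7; below that, both calculations recommend defecting. That is one way of reading the claim that CDT is fine inside the non-prediction case and breaks for deterministic twins.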
A consequence of this view is that whether we are in or out of the domain of applicability of CDT is an empirical question: you can’t resolve it from pure theory. I further claim (without pinning down the definitions very well) that “generic, un-tuned” situations fall into the non-prediction case. This is again an empirical claim, and roughly says that “something needs to happen” to be outside the non-prediction case. In the deterministic twin case, this “something” is the intentional construction of the twins. Some detailed claims:
1. Humanity’s past fits the non-prediction case. For example, it is not the case that “perhaps you and some of the guards are implementing vaguely similar decision procedures at some level” in World War 1, not least because most of decision theory was invented after World War 1. Again, this is a purely empirical claim: it could have been otherwise, and I’m claiming it wasn’t.
2. The multiverse fits the non-prediction case. I also believe that once we have a sufficient understanding of cosmology, we will conclude that it is most likely that the multiverse fits the non-prediction case, roughly because the causal linkages behind the multiverse (through quantum branching, inflation, or logical possibilities) are high-temperature in some sense. This is again an empirical prediction about cosmology, though of course it’s much harder to check and I’m much less confident in it than for (1).
3. The world does not entirely fall into the non-prediction case. As an example, it is perilous when advertisers have too much information and computation asymmetry with users, since that asymmetry can break non-prediction (more here). A consequence of this is that it’s good that people are studying decision theories with larger domains of applicability.
4. AGI safety v1 can likely be made to fall into the non-prediction case. This is another highly contingent claim, and requires some action to ensure, namely somehow telling AGIs to avoid the non-prediction case in appropriate senses (and designing them so that this is possible to do). (I expect to get jumped on for this one, but before you believe I’m just ignorant it might be worth asking Paul whether I’m just ignorant.) And I do mean v1; it’s quite possible that v2 goes better if we have the option of not telling them this.
I do want to emphasize that as a consequence of (3), (4), uncertainty about (2), and a way tinier amount of uncertainty about (1), I’m happy people are exploring this space. But of course I’m also going to place a lower estimate on its importance as a consequence of the above.
Thanks for these comments.
Re: “physics-based priors,” I don’t think I have a full sense of what you have in mind, but at a high level, I don’t yet see how physics comes into the debate. That is, AFAICT everyone agrees about the relevant physics — and in particular, that you can’t causally influence the past, “change” the past, and so on. The question as I see it (and perhaps I should’ve emphasized this more in the post, and/or put things less provocatively) is more conceptual/normative: whether when making decisions we should think of the past the way CDT does — e.g., as a set of variables whose probabilities our decision-making can’t alter — or in the way that e.g. EDT does — e.g., as a set of variables whose probabilities our decision-making can alter (and thus, a set of variables that EDT-ish decision-making implicitly tries to “control” in a non-causal sense). Non-causal decision theories are weird; but they aren’t actually “I don’t believe in normal physics” weird. They’re more “I believe in managing the news about the already-fixed past” weird.
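As a rough illustration of these two ways of treating the past, here is a sketch using Newcomb’s problem. The dollar amounts are the conventional ones from the literature rather than anything stated above, and `accuracy` and `p_full` are hypothetical parameter names: `accuracy` is how often the predictor is right, and `p_full` is a fixed credence that the opaque box is full.

```python
# Conventional Newcomb payoffs (assumed for illustration).
BIG = 1_000_000  # in the opaque box if one-boxing was predicted
SMALL = 1_000    # always in the transparent box


def edt_values(accuracy):
    """EDT-style: my choice is treated as evidence about the
    already-made prediction, so it shifts the probability of a full box."""
    return {
        "one-box": accuracy * BIG,
        "two-box": (1 - accuracy) * BIG + SMALL,
    }


def cdt_values(p_full):
    """CDT-style: the probability that the box is full is a fixed
    number that my choice cannot alter."""
    return {
        "one-box": p_full * BIG,
        "two-box": p_full * BIG + SMALL,
    }


def recommend(values):
    """Option with the highest expected payoff."""
    return max(values, key=values.get)


if __name__ == "__main__":
    print(recommend(edt_values(0.99)))  # EDT-style, reliable predictor
    print(recommend(cdt_values(0.99)))  # CDT-style, same credence
```

In the CDT-style calculation two-boxing comes out ahead by `SMALL` for every value of `p_full`, which is the “it’s no use trying to control the past” verdict. The EDT-style calculation agrees when the predictor is no better than a coin flip (`accuracy = 0.5`) and only comes apart from it when the prediction is informative.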
Re: CDT’s domain of applicability, it sounds like your view is something like: “CDT generally works, but it fails in the type of cases that Joe treats as counter-examples to CDT.” I agree with this, and I think most people who reject CDT would agree, too (after all, most decision theories agree on what to do in most everyday cases; the traditional questions have been about what direction to go when their verdicts come apart). I’m inclined to think of this as CDT being wrong, because I’m inclined to think of decision theory as searching for the theory that will get the full range of cases right; but I’m not sure that much hinges on this. That said, I do think that even acknowledging that CDT fails sometimes involves rejecting some principles/arguments one might’ve thought would hold good in general (e.g. “c’mon, man, it’s no use trying to control the past,” the “what would your friend who can see what’s in the boxes say is better” argument, and so on) and thereby saying some striking and weird stuff (e.g. “Ok, it makes sense to try to control the past sometimes, just not that often”).
Re: 1-4, I agree that whether or not CDT leads you astray in a given case is an empirical question. I don’t have strong views about what range of actual cases are like this — though I’m sympathetic to your view re: 1, and as I mention in the post, I generally think we should just err on the side of not doing stuff that looks silly by normal lights. I also don’t have strong views about the relevance of non-causal decision-theory research for AGI safety (this project mostly emerged from personal interest).
By “physics-based” I’m lumping together physics and history a bit, but it’s hard to disentangle them, especially when people start talking about multiverses. I generally mean “the combined information of the laws of physics and our knowledge of the past”. The reason I do want to cite physics too, even for the past case of (1), is that if you somehow disagreed about decision theorists in WW1 I’d go to the next part of the argument, which is that the technology of WW1 didn’t allow the necessary predictive control (they couldn’t build deterministic twins back then).
However, it seems like we’re mostly in agreement, and you could consider editing the post to make that clearer. The opening line of your post is “I think that you can ‘control’ events you have no causal interaction with, including events in the past.” Now the claim is “everyone agrees about the relevant physics — and in particular, that you can’t causally influence the past”. These two sentences seem inconsistent, and especially since your piece is long and quite technical, opening with a wrong summary may confuse people.
I realize you can get out of the inconsistency by leaning on the quotes, but it still seems misleading.
Ah, I see: you’re going to lean on the difference between “cause” and “control”. So to be clear: I am claiming that, as an empirical matter, we also can’t control the past, or even “control” the past.
To expand, I’m not using physics priors to argue that physics is causal, so we can’t control the past. I’m using physics and history priors to argue that we exist in the non-prediction case relative to the past, so CDT applies.
Cool, this gives me a clearer picture of where you’re coming from. I had meant the central question of the post to be whether it ever makes sense to do the EDT-ish try-to-control-the-past thing, even in pretty unrealistic cases. That is partly because I think answering “yes” to this is weird and disorienting in itself, even if it doesn’t end up making much of a practical difference day-to-day; and partly because a central objection to EDT is that the past, being already fixed, is never controllable in any practically-relevant sense, even in e.g. Newcomb’s cases. It sounds like your main claim is that in our actual everyday circumstances, with respect to things like the WWI case, EDT-ish and CDT recommendations don’t come apart, which is a topic I don’t spend much time on or have especially strong views about.
“you’re going to lean on the difference between ‘cause’ and ‘control’”: indeed, and I had meant the “no causal interaction with” part of the opening sentence to indicate this. It does seem like various readers objected to or were confused by the use of the term “control” here, and I think there’s room for more emphasis early on as to what specifically I have in mind. But at a high level, I’m inclined to keep the term “control,” rather than trying to rephrase things solely in terms of e.g. correlations, for two reasons: I think it makes sense to think of yourself as, for practical purposes, “controlling” what your copy writes on his whiteboard, what Omega puts in the boxes, etc.; and, more broadly, EDT-ish decision-making is in fact weird in the way that trying to control the past is weird, which makes it all the more striking and worth highlighting that EDT-ish decision-making seems, sometimes, like the right way to go.