Thanks very much for writing this. I’d started to wonder about the same idea, but this is a much better and clearer analysis than I could have done! A few questions as I try to get my head around it.
Could you say more about why the predictability trends towards zero? It’s intuitive that it does, but I’m not sure I can explain that intuition. Something like: we should have a uniform prior over the actual value of the action at very distant periods of time, right? An alternative assumption would be that the action has a continuous stream of benefits in perpetuity; I’m not sure how reasonable that is. Or is it the inclusion of counterfactuals, i.e. that if you didn’t do that good thing, someone else would be right behind you anyway?
Regarding ‘attractor states’, is the thought then that we shouldn’t have a uniform prior over what happens to those in the long run?
I’m wondering if the same analysis that applies to actions can also be applied to the ‘business as usual’ trajectory of the future, i.e. where we don’t intervene. Many people seem to think it’s clear that the future, if it happens, will be good, and that we shouldn’t discount it towards zero.
Thanks for some thought-provoking questions!
The posterior estimate of value trends towards zero because we assumed that the prior distribution of u_t has a mean of 0. Intuitively, absent any other evidence, we believe a priori that our actions will have a net effect of 0 on the world (at any time in the future). (For example, I might think that my action of drinking some water will have 0 effect on the future unless I see evidence to the contrary. There’s a bit of discussion of why you might have this kind of “sceptical” prior in Holden Karnofsky’s blog posts.) Then, because our signal becomes very noisy as we predict further into the future, we weight the signal less and so discount our posterior towards 0.

It would be perfectly possible to incorporate non-zero-mean priors into the model, but it’s hard for me to think (off the top of my head) of scenarios in which this would make sense. In your example of a continuous stream of benefits in perpetuity, it seems more natural to model this as evidence you’ve received about the effects of an intervention, rather than as your prior belief about the effects of an intervention.
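To make the shrinkage concrete, here’s a minimal sketch of the normal-normal case. The notation is mine rather than the post’s: σ_0 is the prior standard deviation of u_t, and s_t is a noisy signal about u_t whose noise scale σ_t grows with the horizon:

$$
u_t \sim \mathcal{N}(0, \sigma_0^2), \qquad s_t \mid u_t \sim \mathcal{N}(u_t, \sigma_t^2)
\quad\Longrightarrow\quad
\mathbb{E}[u_t \mid s_t] = \frac{\sigma_0^2}{\sigma_0^2 + \sigma_t^2}\, s_t \;\to\; 0 \;\text{ as } \sigma_t \to \infty.
$$

So as σ_t blows up, the weight placed on the signal shrinks and the posterior mean collapses back onto the prior mean of 0, which is exactly the discounting described above.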
For attractor states, I basically don’t think that the assumption of a signal with increasing variance as the time horizon increases is a good way of modelling them. That’s because, in some sense, predictability increases over time with attractor states: at the beginning, you don’t know which state you’ll end up in, so the predictability of future value is low; once you’re in the attractor state, you persist in that state for a long time, so predictability is high. As MichaelStJules mentioned, Christian Tarsney’s paper is a better starting point for thinking about these attractor states.
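To illustrate why increasing-variance noise is the wrong picture for attractor states, here’s a toy simulation. It’s my own construction (not the model in Tarsney’s paper or in the post): a symmetric random walk that gets absorbed at +1 (a “good” attractor) or -1 (a “bad” attractor). The conditional variance of the long-run value, given what you observe at time t, falls over time as more paths get absorbed, rather than growing with the horizon:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model (my own construction): a symmetric random walk that is absorbed
# at +1 (a "good" attractor state) or -1 (a "bad" attractor state).
def simulate(T=200, step=0.1):
    x, path = 0.0, []
    for _ in range(T):
        if abs(x) < 1.0:                      # not yet absorbed: keep drifting
            x = float(np.clip(x + step * rng.choice([-1.0, 1.0]), -1.0, 1.0))
        path.append(x)                        # once |x| = 1, the state persists
    return path

paths = np.array([simulate() for _ in range(5000)])
final = paths[:, -1]

# "Unpredictability" of long-run value at time t: the average variance of the
# final state, conditional on the state observed at time t.
for t in [0, 10, 25, 50, 100, 199]:
    states = np.round(paths[:, t], 1)
    groups = np.unique(states)
    weights = np.array([(states == s).mean() for s in groups])
    variances = np.array([final[states == s].var() for s in groups])
    print(f"t={t:3d}  E[Var(final value | state at t)] = {weights @ variances:.3f}")
```

The printed conditional variance starts at its unconditional value and falls towards zero as paths lock into one attractor or the other, i.e. predictability increases with time, the opposite of the growing-noise assumption in the model above.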