tl;dr: Tarsney's model updates me towards thinking reducing non-extinction existential risks should be a little less of a priority than I previously thought.
Here's a quote from Tarsney (which makes more sense after reading the rest of the paper):
The potentially enormous impact that the long-term rate of ENEs [events which nullify the intended effect of a longtermist intervention, e.g. a later extinction event] has on the expected value of longtermist interventions has implications for "intra-longtermist" prioritization: We have strong pro tanto reason to focus on bringing about states such that both they and their complements are highly stable, since it is these interventions whose effects are likely to persist for a very long time (and thus to affect our civilization when it is more widespread and resource-rich). This suggests, in particular, that interventions focused on reducing existential risk may have higher expected value than, say, interventions aimed at reforming institutions or changing social values: Intuitively, the intended effects of these interventions are relatively easy to undo, or to achieve at some later date even if we fail to achieve them now. So the long-term rate of ENEs (i.e., value of r) may be significantly higher for these interventions than for existential risk mitigation.
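To make the role of r concrete, here is a minimal sketch of (what I take to be) the basic structure: ENEs arrive at a constant rate r, so persistence decays exponentially. The constant value flow v is my simplification for illustration, not Tarsney's full model:

```latex
% Toy version of the model (my simplification, for illustration only):
% if ENEs arrive as a Poisson process with constant rate r, the target
% state survives to time t with probability e^{-rt}. With a constant
% value flow v while the state holds, the expected value of bringing
% the state about is
\mathbb{E}[V] \;=\; \int_0^\infty v \, e^{-rt} \, dt \;=\; \frac{v}{r},
% so expected value scales as 1/r: effects facing a tenfold higher
% long-term rate of ENEs are worth roughly tenfold less, all else equal.
```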
See also Greaves and MacAskill's concept of "attractor states".

This does seem like an interesting implication of Tarsney's model, and it updates me towards placing a bit less emphasis on reducing non-extinction existential risks (e.g., reducing the chance of lock-in of a bad governmental system or set of values).
(I already considered this a lower priority for longtermists as a whole than reducing extinction risks. But I also thought that longtermists should prioritise investigating this potential priority more than they currently do. I still think that, but now with a bit lower confidence.)
---
That said, I also think Tarsney's phrasing is a bit misleading. He compares "interventions focused on reducing existential risk" to "interventions aimed at reforming institutions or changing social values". But interventions may be aimed at doing the latter as a means to doing the former; one could try to change institutions or social values with the primary goal of ultimately reducing existential risk (or extinction risk specifically). And Tarsney's model doesn't seem to push against those interventions relative to other means of reducing existential risk.
I think Tarsney really wants to compare interventions aimed at reducing extinction risk to interventions ultimately aimed at changing aspects of the long-term future other than whether humanity goes extinct (e.g., again, reducing the chance of lock-in of a bad governmental system or set of values).
This highlights another way in which Tarsney's phrasing seems a bit misleading: existential risk itself already includes non-extinction existential risk. So I think Tarsney should use the term "extinction risk" here.
This does seem like an interesting implication of Tarsney's model, and it updates me towards placing a bit less emphasis on reducing non-extinction existential risks (e.g., reducing the chance of lock-in of a bad governmental system or set of values).
Surely "lock-in" implies stability and persistence?
Greaves and MacAskill introduce the concept of the "non-extinction attractor state" to capture interventions that can achieve the persistence Tarsney says is so important, but that don't rely on extinction to do so.
This includes institutional reform:
But once such institutions were created, they might persist indefinitely. Political institutions often change as a result of conflict or competition with other states. For strong world governments, this consideration would not apply (Caplan 2008). In the past, governments have also often changed as a result of civil war or internal revolution. However, advancing technology might make that far less likely for a future world government: modern and future surveillance technologies could prevent insurrection, and AI-controlled police and armies could be controlled by the leaders of the government, thereby removing the possibility of a military coup (Caplan 2008; Smith 2014).
Surely "lock-in" implies stability and persistence?
Yeah, definitely. I see now that I didn't clearly explain what I meant. It's not that I changed my views on how important the difference is between a lock-in of a bad governmental system or set of values and a future without such a lock-in.
It's more like I somewhat updated my views regarding:
how likely such a lock-in is
and in particular how likely it is that a state that looks like it might be a lock-in would actually be a lock-in
and in particular how much the epistemic challenge to longtermism might undermine a focus on this specific type of potential lock-in
And as a result, I somewhat updated my views regarding how much we should focus on preventing these outcomes. This is analogous to how I'd update my prioritisation of biorisk if I learned the relevant catastrophes were less likely than I'd thought, even if no less bad.
(I'm still not sure that explanation is 100% clear.)
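Here's a toy version of the biorisk analogy, with entirely made-up numbers: on a simple expected-harm view, priority is probability times badness, so halving the probability halves the priority even if the badness is unchanged:

```latex
% Made-up numbers, purely for illustration:
\mathrm{EV} \;=\; p \cdot v ;
\qquad p:\ 10\% \to 5\%
\;\Longrightarrow\;
\mathrm{EV}:\ 0.10\,v \to 0.05\,v .
```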
And yeah, Greaves and MacAskill's "non-extinction attractor state" concept is relevant here, and I liked that section of their paper :)
OK, that's clearer, although I'm not immediately sure why the paper would have achieved the following:
I somewhat updated my views regarding:
how likely such a lock-in is
and in particular how likely it is that a state that looks like it might be a lock-in would actually be a lock-in
...
I think Tarsney implies that institutional reform is less likely to be a true lock-in, but he doesn't really back this up with much argument. He just implies that this point is somewhat obvious. Under this assumption, I can understand why his model would lead to the following update:
...
and in particular how much the epistemic challenge to longtermism might undermine a focus on this specific type of potential lock-in
In other words, if Tarsney had engaged in a discussion about why institutional change isn't actually likely to be stable/persistent, providing object-level reasons why (which might involve disagreeing with Greaves and MacAskill's points), I think I too would update away from thinking institutional change is that important. But I don't think he really engages in this discussion.
I should say that I haven't properly read the whole paper (I've mainly relied on watching the video and skimming the paper), so it's possible I'm missing some things.
[Writing this comment quickly]

I think it makes sense to be a bit confused about what claim I'm making and why. I read the paper and made the initial version of these notes a few weeks ago, so my memory of what the paper said and how it changed my views is slightly hazy.
But I think the key point is essentially the arguably obvious point that the rate of ENEs can be really important, and that that rate seems likely to be much higher when the target state is something like "a very good system of government or set of values" or "a very bad system of government or set of values" (compared to when the target state is whether an intelligent civilization exists). It does seem much more obvious that extinction and non-extinction are each stronger attractor states than particularly good or particularly bad non-extinction outcomes are.
This is basically something I already knew, but I think Tarsney's models and analysis made the point a bit more salient, and also made it clearer how important it is (since the rate of ENEs seems like probably one of the most important factors influencing the case for longtermism).
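As a rough illustration of how much that rate matters, here's a sketch under my own simplifying assumptions (a constant value flow rather than Tarsney's richer setup, and entirely made-up ENE rates):

```python
# Toy comparison (made-up numbers, not estimates from Tarsney's paper).
# If ENEs arrive as a Poisson process with rate r, and the target state
# yields a constant value flow v per year while it persists, then
#   EV = integral_0^inf v * exp(-r * t) dt = v / r.

def expected_value(v: float, r: float) -> float:
    """Expected value of securing a state with value flow v and ENE rate r."""
    return v / r

# Hypothetical rates: extinction vs. non-extinction as a strong attractor
# pair (low r); "good vs. bad values/institutions" as a weaker one (high r).
r_extinction_state = 1e-6  # ENEs per year (illustrative)
r_values_state = 1e-3      # ENEs per year (illustrative)

ratio = expected_value(1.0, r_extinction_state) / expected_value(1.0, r_values_state)
print(f"EV ratio for the same value flow: {ratio:.0f}x")  # -> 1000x
```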
But what I've said above kind of implicitly accepts Tarsney's focus (for the sake of his working example) on simply whether there is an intelligent civilization around, rather than what it's doing. In reality, I think that what the civilization is doing is likely also very important.[1] So the above point, that particularly good or particularly bad non-extinction outcomes may be only weak attractor states, might also undermine the significance of keeping an intelligent civilization around.
But here's one way that might not be true: Maybe we think it's easier to have a lock-in of (or natural trends that maintain) a good non-extinction outcome than a bad non-extinction outcome. (I think Ord essentially implies this in The Precipice. I might soon post something related to this. It's also been discussed in some other places, e.g. here.) If so, then the point about the rate of ENEs suggests the case for avoiding unrecoverable dystopias and unrecoverable collapses might be weak, but it wouldn't as strongly suggest the case for avoiding extinction is weak.
...but this all seems rather complicated, and I'm still not sure my thinking is clear, and even less sure my explanation is clear!
[1] Tarsney does acknowledge roughly this point later in the paper:
Additionally, there are other potential sources of epistemic resistance to longtermism besides Weak Attractors that this paper has not addressed. In particular, these include:
Neutral Attractors: To entertain small values of r [the rate of ENEs], we must assume that the state S targeted by a longtermist intervention, and its complement ¬S, are both at least to some extent "attractor" states: Once a system is in state S, or state ¬S, it is unlikely to leave that state any time soon. But to justify significant values of v_e and v_s, it must also be the case that the attractors we are able to target differ significantly in expected value. And it's not clear that we can assume this. For instance, perhaps "large interstellar civilization exists in spatial region X" is an attractor state, but "large interstellar civilization exists in region X with healthy norms and institutions that generate a high level of value" is not. If civilizations tend to "wander" unpredictably between high-value and low-value states, it could be that despite their astronomical potential for value, the expected value of large interstellar civilizations is close to zero. In that case, we can have persistent effects on the far future, but not effects that matter (in expectation).
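To see why that could drive expected value toward zero, here is a minimal simulation of the "wandering" scenario; the two-state setup and all the numbers are my toy assumptions, not Tarsney's:

```python
import random

def time_average_value(steps: int = 10_000, flip_prob: float = 0.01,
                       seed: int = 0) -> float:
    """Time-averaged value of a civilization that wanders unpredictably
    between a high-value (+1) and a low-value (-1) state."""
    rng = random.Random(seed)
    state, total = +1, 0
    for _ in range(steps):
        if rng.random() < flip_prob:  # each period, some chance of flipping
            state = -state
        total += state
    return total / steps

# Averaged over many runs, the time-averaged value sits near zero:
# persistence alone doesn't generate expected value if the attainable
# attractors don't differ in value.
mean = sum(time_average_value(seed=s) for s in range(200)) / 200
print(f"mean time-averaged value: {mean:+.3f}")  # close to 0
```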
OK, thanks. I think that is clearer now.