Why are these expected values finite even in the limit?
It looks like this model is assuming that there is some floor risk level that the risk never drops below, which creates an upper bound for survival probability through n time periods based on exponential decay at that floor risk level. With the time of perils model, there is a large jolt of extinction risk during the time of perils, and then exponential decay of survival probability from there at the rate given by this risk floor.
The Jupyter notebook has this value as r_low=0.0001 per time period. If a time period is a year, that means a 1⁄10,000 chance of extinction each year after the time of perils is over. This implies a 10^-43 chance of surviving an additional million years after the time of perils is over (and a 10^-434 chance of surviving 10 million years, and a 10^-4343 chance of surviving 100 million years, …). This basically amounts to assuming that long-lived technologically advanced civilization is impossible. It’s why you didn’t have to run this model past the 140,000 year mark.
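For what it’s worth, a few lines of Python reproduce these orders of magnitude from the notebook’s r_low (this is just a sanity check of the arithmetic, not the report’s code):

```python
# Survival probabilities under a constant post-perils risk of
# r_low = 0.0001 per year (the notebook's default).
import math

r_low = 0.0001
for years in (1_000_000, 10_000_000, 100_000_000):
    log10_survival = years * math.log10(1 - r_low)
    print(f"P(survive {years:>11,} years) is about 10^{log10_survival:.0f}")
# prints roughly 10^-43, 10^-434 and 10^-4343
```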
This constant r_low also gives implausible conditional probabilities. e.g. Intuitively, one might think that a technologically advanced civilization that has survived for 2 million years after making it through its time of perils has a pretty decent chance of making it to the 3 million year mark. But this model assumes that it still has a 1⁄10,000 chance of going extinct next year, and a 10^-43 chance of making it through another million years to the 3 million year mark.
This seems like a problem for any model which doesn’t involve decaying risk. If per-time-period risk is 1/n, then the model becomes wildly implausible if you extend it too far beyond n time periods, and it may have subtler problems before that. Perhaps you could (e.g.) build a time of perils model on top of a decaying r_low.
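To make the contrast concrete, here is a rough sketch (with made-up parameters, not anything from the report) of how much a geometrically decaying floor changes the long-run picture relative to a constant one:

```python
# Illustrative only: long-run survival under a constant risk floor
# vs. a floor that decays geometrically. Parameter values are made up.
import math

PERIODS = 1_000_000  # years after the time of perils

def log10_survival_constant(r_low: float) -> float:
    # log10 of (1 - r_low)^PERIODS
    return PERIODS * math.log10(1 - r_low)

def log10_survival_decaying(r0: float, decay: float) -> float:
    # per-period risk r_t = r0 * decay^t, so survival = prod_t (1 - r_t)
    total, r = 0.0, r0
    for _ in range(PERIODS):
        total += math.log10(1 - r)
        r *= decay
    return total

print(log10_survival_constant(0.0001))         # about -43: survival ~ 10^-43
print(log10_survival_decaying(0.0001, 0.999))  # about -0.04: survival ~ 90%
```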
(Commenting on mobile, so excuse the link formatting.)
See also this comment and thread by Carl Shulman: https://forum.effectivealtruism.org/posts/zLZMsthcqfmv5J6Ev/the-discount-rate-is-not-zero?commentId=Nr35E6sTfn9cPxrwQ
Including his estimate (guess?) of 1 in a million risk per century in the long run:
https://forum.effectivealtruism.org/posts/zLZMsthcqfmv5J6Ev/the-discount-rate-is-not-zero?commentId=GzhapzRs7no3GAGF3
In general, even assigning a low but non-tiny probability to low long run risks can allow huge expected values.
See also Tarsney’s The Epistemic Challenge to Longtermism https://philarchive.org/rec/TARTEC-2 which is basically the cubic model here, with a constant per-period risk rate over time, but allowing uncertainty over the rate.
Thorstad has recently responded to Tarsney’s model, by the way: https://ineffectivealtruismblog.com/2023/09/22/mistakes-in-the-moral-mathematics-of-existential-risk-part-4-optimistic-population-dynamics/
Good to hear from you Michael! Some thoughts:
You’re right that the Tarsney paper was an important driver in bringing the cubic model into this framework. That’s why it’s a key source in the value cases summary. Modelling uncertainty is an excellent next step for various scenarios.
Thanks very much for the link to David’s response. I hadn’t seen that!
Good to have the link to Carl’s thread, it’ll be valuable to run these models and get some visualisations with that 1 in a million estimate too!
It also seems worth mentioning grabby alien models, which, from my understanding, are consistent with a high probability of eventually encountering aliens. But again, we might not have near-certainty in such models or eventually encountering aliens. And I don’t know what kind of timeline this would happen on according to grabby alien models; I haven’t looked much into them.
One way to build risk decay into a model is to assume that the risk is unknown within some range, and to update on survival.
A very simple version of this is to assume an unknown constant per-century extinction risk, and to start with a uniform distribution on the size of that risk. Then the probability of going extinct in the first century is 1⁄2 (by symmetry), and the probability of going extinct in the second century conditional on surviving the first is smaller than that (since the higher-risk worlds have disproportionately already gone extinct) - with these assumptions it is exactly 1⁄3. In fact these very simple assumptions match Laplace’s law of succession, and so the probability of going extinct in the nth century conditional on surviving the first n-1 is 1/(n+1), and the unconditional probability of surviving at least n centuries is also 1/(n+1).
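A quick numerical check of this claim (nothing here beyond what the comment already states; it just integrates the survival probability against the uniform prior):

```python
# With a uniform prior over a constant per-century extinction risk r,
# P(survive at least n centuries) should equal 1/(n+1).
def survival_probability(n: int, steps: int = 100_000) -> float:
    # midpoint Riemann sum of (1 - r)^n over r ~ Uniform(0, 1)
    return sum((1 - (i + 0.5) / steps) ** n for i in range(steps)) / steps

for n in (1, 2, 10, 100):
    print(n, round(survival_probability(n), 4), 1 / (n + 1))
```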
More realistic versions could put more thought into the prior, instead of just picking something that’s mathematically convenient.
Thank you very much Dan for your comments and for looking into the ins and outs of the work and highlighting various threads that could improve it.
You bring up two quite separate issues here. The first is about infinite value, which can be recovered with new scenarios; the second is about the specific parameter defaults used. The parameters the report uses could be reasonable, but they might also seem over-optimistic or over-pessimistic, depending on your background views.
I totally agree that we should not anchor on any particular set of parameters, including the default ones. I think this is a good opportunity to emphasise one of the limitations in the concluding remarks saying that “we should be especially cautious about over-updating from specific quantitative conclusions”. As you hinted, one important reason for this is that the chosen parameters do not have enough data behind them and are not free of puzzles.
Some thoughts sparked by the comments in this thread:
You’re totally right to point out that the longer we survive in expectation, the longer the simulation needs to be run for us to observe convergence.
I agree that risk is unlikely to be time-invariant for long eras, and I’m really excited about bringing in more realistic structures, like the one you suggest: an enriched Time of Perils with decaying risk. I’m hoping WIT or other interested researchers do more to spell out what these structures imply about the value of risk mitigation.
On the flip side of the default r_low seeming too high: seen from the point of view of the start of a century, it’d imply a (1 − 0.0001)^100 ≈ 0.99005 probability of surviving each century.
A tiny r_low might be more realistic, though I confess lacking strong intuitions either way about how risk will behave in the coming centuries, let alone millennia. In my mind, risk could decay or increase, and I do hope the patterns so far, for example these last 500 years, are nothing to go by.
Your point about conditional probabilities is a good way to introduce and think about thought experiments on risk profiles. It made me think that a civilisation like the one you describe, surviving successive hurdles, could be modelled under Great Filters: with an r_low orders of magnitude smaller than the current default, you’d get something that fits the picture you suggest much better, even without introducing modifications like decaying risk. Let me know if you play around with the code to visualise this.
(speaking for myself)
The conditional risk point seems like a very interesting crux between people; I’ve talked both to people who think the point is so obviously true that it’s close to trivial and to people who think it’s insane (I’m more in the “close to trivial” position myself).
Another way to get infinite EV in the time of perils model would be to have a nonzero lower bound on the per period risk rate across a rate sequence, but allow that lower bound to vary randomly and get arbitrarily close to 0 across rate sequences. You can basically get a St Petersburg game, with the right kind of distribution over the long-run lower bound per period risk rate. The outcome would have finite value with probability 1, but still infinite EV.
EDIT: To illustrate, if f(r), the expected value of the future conditional on a per period risk rate r in the limit, goes to infinity as r goes to 0, then the expected value of f(r) will be infinite over at least some distributions for r in an interval (0, b], which excludes 0.
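To make that concrete with one assumed functional form: if the conditional expected value scales roughly like f(r) = C/r (about the expected number of periods survived under a constant hazard r), and the long-run floor is, say, uniform on (0, b], then

$$\mathbb{E}[f(r)] = \int_0^b \frac{C}{r}\cdot\frac{1}{b}\,\mathrm{d}r = \frac{C}{b}\big[\ln r\big]_0^b = \infty,$$

even though f(r) is finite for every r > 0 in the support.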
Furthermore, if you assign any positive credence to subdistributions over the rates together that give infinite conditional EV, then the unconditional expected value will be infinite (or undefined). So, I think you need to be extremely confident (imo, overconfident) to avoid infinite or undefined expected values under risk neutral expectational total utilitarianism.
This is absolutely fantastic work! One of the Forum posts of the year so far! A really good step towards getting robust estimates of xRisk work, would be great to see other work following up on this research agenda (both OAT and your own).[1]
Some thoughts:
If I understand correctly, the value gained from action M is always the same if the fractional reduction in xRisk is the same, ceteris paribus? That still means there seems to be a tradeoff between assuming a high rate of xRisk and believing in astronomical value, assuming that the cost of an intervention scales linearly with the size of the reduction (i.e. decreasing xRisk from 50% to 40% in a given t is 10 times as hard as reducing it from 50% to 49%; it would be interesting to see this worked out robustly). I think that’s a robust finding, and one that seems to be unintuitive both to EAs and to EA critics.
If you had to guess (and think it’s appropriate to do so), what would you say people working on xRisk mitigation in EA currently assume by default? I’d guess it’d be ‘time of perils’ and maybe quadratic or cubic growth? But as you point out, the difference between quadratic and cubic is immense, and could easily flip whether xRisk reduction would be the best marginal option for altruistic funding.
I’d be interested to see what BOTEC EVs look like under this model and some assumptions. Thorstad has done something like this, but it’d be good to get a more robust sense of what parameter configurations would be needed to make xRisk reduction competitive with top-rated GiveWell charities.
Your finding on convergence is, I think, very important, not least because it undercuts one of the most common criticisms of xRisk/longtermist work (“this assigns infinite value to future people, which justifies arbitrary moral harm to current people”), which turns out not to hold under your models here. Not going to hold my breath for these critics to update, though.
Great work sharing the notebook <3 really love the transparency, I think something like this should become more standard (not just in EA, but everywhere) so wanted to give you big props for exposing your model/code/parameters for anyone to check.
So yeah, great work, love it! Would love to see and support more work along these lines.
The new acronym could be ATOM perhaps? ;)
Thank you for all the comments JWS, I found your excitement contagious.
Some thoughts on your thoughts:
I couldn’t agree more that there’d be a lot of value from laying out parameter configurations. We have some more work coming out as part of this sequence that aims to help fill this gap!
I think it’d be great to see some survey data on what the commonly assumed risk patterns and valued trajectories are in the EA community. I’ve made a push from my little corner to hopefully get some data on common views. Whichever they are, you’re right to point out the immense differences in what they could imply.
I’m really happy that you found the notebook useful. I’ll make sure to update the GitHub with any new features and code discussions.
Nice comments!
My guess would be Time of Perils, but with a risk decaying exponentially to 0 after it (instead of a low constant risk).
Something similar to that critique (replacing infinite by astronomically large, and arbitrary by significant) could still hold if the risk decays to 0.
It’s true there are other scenarios that would recover infinite value. And the proof fails, as mentioned in the convergence section, with changes like r_∞ = 0, or when the logistic cap c → ∞ and we end up in the exponential case.
All that said, it is plausible that the universe has a finite length after all, which would provide that finite upper bound. Heat death, proton decay or even just the amount of accessible matter could provide physical limits. It’d be great to see more discussions on this informed by updated astrophysical theories.
Thanks for following up!
Personally, I do not think allowing the risk to decay to 0 is problematic. For a sufficiently long timeframe, there will be evidential symmetry between the risk profiles of any 2 actions (e.g. maybe everything that is bound together will dissolve), so the expected value of mitigation will eventually reach 0. As a result, the expected cumulative value of mitigation always converges.
This is excellent research! The quality of Rethink Priorities’ output consistently impresses me.
A couple questions:
What software did you use to create figure 1?
What made you decide to use discrete periods in your model as opposed to a continuous risk probability distribution?
Thank you very much Roman!
I used Blender to model and render the 3D spheres, and Photoshop for the text.
Discrete time was inherited from the previous framework (OAT). It can be simpler, but continuous time is sometimes more tractable and better suited for models emphasising other features: for example, models that treat economic growth directly, that think about utility, or that express a hazard rate micro-founded on some risk mechanism would generally be better expressed in continuous time. This recent paper is a good example of the typical setups economics papers use in continuous time.
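For readers less used to the distinction, the two setups relate roughly as follows (my gloss, not notation from the report): a discrete per-period risk r_t gives survival ∏_t (1 − r_t), whereas a continuous hazard rate h(t) gives

$$S(T) = \exp\!\left(-\int_0^T h(t)\,\mathrm{d}t\right), \qquad r_t \approx 1 - e^{-h(t)} \approx h(t) \ \text{for small hazards},$$

so the two agree closely whenever per-period risks are small.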
I don’t have the spare brain power to dig into this, but are you assuming that all possible trajectories have positive value?
Hi Siebe, yes, all the scenarios of this report assume positive value at all times. I don’t think it’s certain that this will happen, which is why the concluding remarks mention “investigating value trajectories that feature negative value” as a possible extension. So, yes, I completely agree this is something to look into in more depth.
Right yeah, that makes sense.
I actually asked the same question as this research in my 2019 MA philosophy thesis and came to the informal conclusion that moral disagreement about what is valuable, plus empirical uncertainty, makes it all very difficult: http://www.sieberozendal.com/wp-content/uploads/2020/01/Rozendal-S.T.-2019-Uncertainty-About-the-Expected-Moral-Value-of-the-Long-Term-Future.-MA-Thesis.pdf
You might find it interesting, though it’s much less formally sophisticated than your work :)
Great post—I’m embarrassed to have missed it til now! One key point I disagree with:
I think there are two big possible exceptions to the latter claim: benign AI and becoming sustainably multiplanetary. EAs have discussed the former a lot, and I don’t have much to add (though I’m highly sceptical of it as an arbitrary-value lock-in mechanism on cosmic timelines). I think the latter is more interestingly unexplored. Christopher Lankhof made a case for it here, but didn’t get much engagement, and what criticism he did get seems quite short-term to me: basically that shelters are a cheaper option, and therefore we should prioritise them.
Such criticism might or might not be true in the next few decades. But beyond that, if AI neither kills us nor locks us in to a dystopic or utopic path, and if there are no lightcone-threatening technologies available (e.g. the potential ability to trigger a false vacuum decay), then it seems like by far our best defence against extinction will be simple numbers. The more intelligent life there is in the more places, the bigger and therefore more improbable an event would have to be to kill everyone.
A naive (but, I think, reasonable given the above caveats) calculation would be to treat the destruction of life around each planet as at least somewhat independent. That would give us some kind of exponential decay function of extinction risk, such that your credence in extinction might be a(1-b)^(p-1), where a is some constant or function representing the risk of a single-planet civilisation going extinct, b is some decay rate (at most 1/2, for total independence of extinction on each planet), and p is the number of planets in your civilisation. Absent universe-destroying mechanisms or unstoppable AI, this credence would quickly approach 0.
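Plugging in purely illustrative numbers (the a and b below are made up, not values proposed above) shows how quickly that credence falls with p:

```python
# Illustrative values only: a = single-planet extinction risk,
# b = decay rate per additional planet, p = number of settled planets.
a, b = 0.1, 0.5
for p in (1, 2, 5, 10):
    print(f"p = {p:>2}: extinction credence of about {a * (1 - b) ** (p - 1):.6f}")
# p=1: 0.1, p=2: 0.05, p=5: ~0.006, p=10: ~0.0002
```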
Obviously ‘creating a self-sustaining settlement on a new planet’ isn’t exactly an everyday occurrence, but with a century or two of continuous technological progress (less, given rapid economic acceleration via e.g. moderately benign AI) it seems likely to go from ‘doable’ to ‘actually pretty straightforward’. The same technologies that establish the first such colony will go a very long way towards establishing the next few.
In the shorter term, ‘self-sustainingness’ needn’t be an all or nothing proposition. A colony that could e.g. effectively recycle its nutrients for a decade or two would still likely serve as a better defence against e.g. biopandemics than any refuge on Earth—and unlike those on Earth, would be constantly pressure tested even before the apocalypse, so might end up being easier to make reliably robust (vs on-Earth shelters) than simple cost-analyses would suggest.
Thank you for adding various threads to the conversation Arepo! I don’t disagree with what I take to be your main point: benign AI and interstellar travel are likely to have a big impact. I will say, though, that while their success might significantly reduce risk, and for a long time, any given intervention is unlikely to make major progress towards them. Hence, at the intervention level, I’m tempted to remain sceptical about the abundance of interventions that dramatically reduce risk for a long time.
Great post! Some nitpicks...
In the 2nd sum, t = 1 and 500 are out of format. Before the 4th sum, rlow should be r_{low}. In the 4th sum, 10100 should be 10^100, and should be on top of the summation symbol.
You say r_0 is the starting risk, but the above implies r(0) = r_0 + r_inf. So I think r_0 should be replaced by r_0 - r_inf above, such that r(0) = r_0. I do not think this is relevant because I guess r_0 >> r_inf, so r_0 - r_inf is roughly equal to r_0.
f refers to a relative reduction in risk (not absolute), so I think you mean 0.01 % above (not “one basis point”). 1 basis point refers to an absolute variation of 0.01 pp.
Thank you very much for your words Vasco! And thank you for catching those formatting typos, I’ve corrected them now.
In order:
Two underscores seem to have got lost in translation to markdown! They should be there now.
You’re right to point out that, in this context, r(0) = r_0 − r_∞ ≈ r_0, but it isn’t exactly r_0. I was using that approximation for the exposition but should have made that clearer, especially in the code. I’ve made minor corrections to reflect this.
I’ll also improve the phrasing to make the sentence you mentioned on f=0.0001 clearer.
Thanks again!
I was happy to see this endnote, but then I noticed several uses of “existential risk” in this abridged report when I think you should have said “extinction risk”. I’d recommend going through to check this.
It’s good to hear that you agree extinction is the better term in this framework. Though I think it makes sense to talk about the more general ‘existential’ term in the exposition sometimes. In particular, for entirely pedagogical reasons, I decided to leave the original terminology in the summary, since readers who are already familiar with the original models might skim this post or miss that endnote, and the definition of risk hasn’t changed. I see this report, and the footnote, as asking researchers that, from here on, we use extinction when the maths are set up like they are here. All that said, I’ve indeed noticed instances after the summary where the conceptual accuracy would be improved by making that swap. Thank you again; I’ll keep a closer eye on this, especially in future revised versions of the full report.
Hi Arvo,
I just wanted to note that the overall expected value of the world may be driven by cases in which existential risk converges to 0, because the future should be discounted at its minimum. I also have the impression supporters of existential risk mitigation find the convergence of existential risk to 0 quite plausible. In any case, I think there will still be convergence of the value of mitigation. After a sufficiently long time, the counterfactual value of mitigation will be 0 due to evidential symmetry, so the sum describing the value of mitigation will end in … 0 + 0 + 0 + 0 …, thus converging.