A toy model for technological existential risk
Preface: I’m quite uncertain about this model. It may be making too many simplifying assumptions to be useful, and I may not be applying Laplace’s Law of Succession correctly. I haven’t seen this approach anywhere, but I’m not confident it hasn’t been done before. I’m really interested in any comments or feedback anyone may have!
Acknowledgements: Thanks to Alex HT and Will Payne for interesting discussion, and thanks to Aimee Watts for proof-reading. All mistakes my own.
Summary
Humans seem likely to make much more technological progress in the next century than has been made previously.
Under a model based on Bostrom’s urn of creativity, having not discovered a devastating technology so far does not give much reason to think we won’t in the future.
This model serves as a way of deriving a prior for technological existential risk, and there might be good reasons to lower the expected risk.
This model may be too naïve an application of the urn of creativity, or of Laplace’s model, and may be flawed.
The Urn of Creativity Model
In the Vulnerable World Hypothesis, Bostrom describes technological progress as drawing balls blindly out of an urn. Some are white balls representing technologies that are relatively harmless, and some are black, representing technologies that upon discovery lead to human extinction.
So far humanity hasn’t drawn a black ball.
If we think existential risk in the next century mostly comes from dangerous new technologies, then we can think of estimating existential risk in the next century as estimating the chance we draw a black ball.
Laplace’s Law of Succession
Suppose we have no information about the ratio of black to white balls in the urn except our record so far. A mathematical rule called Laplace’s Law of Succession then says that the best estimate of the chance of drawing a black ball on the next go is $\frac{s+1}{n+2}$, where n is the number of balls we’ve drawn so far, and s is the number of black balls we’ve drawn so far. So for us, s=0.
How do we calculate the probability of drawing at least one black ball in the next m draws? Well, we can’t just multiply the single-draw probability by itself, because every time you don’t draw a black ball, you should revise down your probability of drawing black. Multiplying together the “next draw is white” probabilities at each step, the probability of drawing at least one black ball in the next m draws, having drawn s black balls in the last n, is $1 - \prod_{i=0}^{m-1} \frac{n+1-s+i}{n+2+i}$.
So assuming no black draws so far we get $1 - \frac{n+1}{n+m+1} = \frac{m}{n+m+1}$.
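To make the formula concrete, here is a minimal Python sketch (the function names are my own, and the Monte Carlo check is just a sanity test of the closed form, not part of the post):

```python
import random

def p_black_within(n, m):
    """Chance of at least one black ball in the next m draws, given zero black
    balls in the first n draws, under Laplace's rule (uniform prior on the
    fraction of black balls in the urn). Closed form: m / (n + m + 1)."""
    return m / (n + m + 1)

def p_black_within_mc(n, m, trials=200_000, seed=0):
    """Monte Carlo sanity check of the closed form: draw a black-ball fraction
    p uniformly at random, keep only the worlds whose first n draws were all
    white, and count how often at least one of the next m draws is black."""
    rng = random.Random(seed)
    survived = hit = 0
    for _ in range(trials):
        p = rng.random()
        if all(rng.random() >= p for _ in range(n)):      # first n draws all white
            survived += 1
            if any(rng.random() < p for _ in range(m)):   # a black ball among the next m
                hit += 1
    return hit / survived

print(p_black_within(20, 10))     # 10/31 ≈ 0.32
print(p_black_within_mc(20, 10))  # should come out close to 0.32
```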
Applying the model
How do we choose n? Well, we haven’t been literally drawing balls from an urn, but we can try to approximate a ‘draw’ as the equivalent time/resources needed to discover a new technology. This is of course hard to estimate and highly variable, but I don’t think the final answer is too sensitive to how we break the time/resources up into units.
Time
Firstly, we can just approximate from time. Suppose humans have been discovering new technology at a rate of roughly one discovery per year since the agricultural revolution. So we’ve drawn ~10,000 times, giving n=10000, m=100.
We get $\frac{100}{10000+100+1} \approx 0.0099$.
So a 1% chance of extinction next century.
But we haven’t exactly been discovering technology at the same rate for the last 10,000 years. Suppose instead we think the vast majority of technology was discovered in the last ~250 years, so n=250. Then we get $\frac{100}{250+100+1} \approx 0.284$.
So a 28.4% chance of extinction in the next 100 years.
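Plugging the two time-based scenarios into the p_black_within sketch above reproduces both figures:

```python
print(p_black_within(10_000, 100))  # ≈ 0.0099, i.e. ~1%
print(p_black_within(250, 100))     # ≈ 0.284, i.e. ~28.4%
```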
Person-years
However, even over the last 250 years the rate of progress has been increasing a lot, and so has the population! What if we used “person-years”, i.e. one draw from the urn is one year of life lived by a single person? Then we can use historical population estimates and future projections. The total number of person-years lived since 1750 is ~6.2 x 10^11, and the total number of person-years we can expect in the next century is ~10^12 [1]. So n=6.2 x 10^11, m=10^12.
Then we get $\frac{10^{12}}{6.2 \times 10^{11} + 10^{12} + 1} \approx 0.62$.
So a 62% chance of extinction.
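The same sketch with the person-year numbers:

```python
print(p_black_within(6.2e11, 1e12))  # ≈ 0.62
```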
GDP
We could also consider using GDP instead. Suppose one draw from the urn is equivalent to $1 billion of GDP (though this doesn’t matter too much for the final answer). Then total world GDP since 1750 has been ~$3.94 x 10^15, and if we just assume 2% annual growth we can expect ~$3.76 x 10^16 over the next century. So n=3.94 x 10^6, m=3.76 x 10^7.
Then we get $\frac{3.76 \times 10^{7}}{3.94 \times 10^{6} + 3.76 \times 10^{7} + 1} \approx 0.90$.
So a 90% chance of extinction in the next century.
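And with the GDP numbers:

```python
print(p_black_within(3.94e6, 3.76e7))  # ≈ 0.905, i.e. ~90%
```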
Uncertainties
There are quite a few things I’m uncertain about in this approach.
How does the model change if we update on inside view information?
We don’t just have the information that previous technology wasn’t disastrous; we also know what the technology was, and have a sense of how hard it is for a technology to cause devastation. This seems to be extra information not included in the model. Laplace’s law of succession works if we have a uniform prior over the possible fractions of black balls in the urn. I’m not sure how the final answers would change if this prior changed.
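As a rough illustration of how the answer moves with the prior (this is my own extension of the sketch above, not something from the post or from Bostrom), we can swap the uniform prior for a Beta(a, b) prior on the fraction of black balls; the hypothetical helper below does this for the 250-year example:

```python
from math import lgamma, exp

def p_black_within_beta(n, m, a=1.0, b=1.0):
    """P(at least one black ball in the next m draws | zero blacks in the first
    n draws), with a Beta(a, b) prior on the fraction of black balls.
    a = b = 1 is the uniform prior, i.e. Laplace's rule."""
    def log_beta(x, y):
        return lgamma(x) + lgamma(y) - lgamma(x + y)
    # The posterior after n white draws is Beta(a, b + n); the chance that the
    # next m draws are also all white is Beta(a, b + n + m) / Beta(a, b + n).
    return 1 - exp(log_beta(a, b + n + m) - log_beta(a, b + n))

print(p_black_within_beta(250, 100))              # uniform prior: ≈ 0.28
print(p_black_within_beta(250, 100, a=0.5, b=5))  # prior tilted towards mostly-white urns: ≈ 0.15
```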
Am I correct in thinking the final answer is not too sensitive to the choice of a unit of a draw?
From just experimenting with different “units”, e.g. $1 billion of GDP per draw, the final answer doesn’t seem too sensitive to the choice of unit. However, I haven’t shown mathematically why this is the case.
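A quick numerical check of this (my own, not a proof): for large n the answer m/(n+m+1) depends almost entirely on the ratio m/n, so shrinking the unit of a draw, which multiplies n and m by the same factor, barely moves the result.

```python
# GDP example: shrink the unit of a draw by a factor of k, so n and m both scale by k.
n, m = 3.94e6, 3.76e7                        # $1 billion of GDP per draw
for k in (1, 10, 1_000, 1_000_000):
    print(k, p_black_within(n * k, m * k))   # ≈ 0.905 for every k
```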
Is the urn of creativity an over-simplification to the extent that this model is irrelevant?
We might, for example, expect that the chance of a technology being a black ball will increase over time as technology becomes more powerful. This might be analogous to the black balls being further down the urn. I’m also unsure how to incorporate this into the model, and whether it would lead us to think the risk is higher or lower than calculated above. On the one hand, it seems to clearly increase the risk if we think black balls will only become more likely to be drawn in the future. But on the other hand, if we would naturally not expect many black balls earlier in human history, then we can’t infer much from not having found any so far.
Conclusion
Overall, I think this model shouldn’t be taken too literally. But I think the following takeaway is interesting. Given how much technological advancement there is likely to be in the future compared to the past, we cannot be that confident that future technologies will not pose significant risks based only on the lack of harm done so far.
I’m really interested in any feedback or critiques of this approach, and whether it’s already been discussed somewhere else!
[1] See this spreadsheet for my data and calculations; data obtained from Our World in Data.
So you’d in general be correct in applying Laplace’s law to this kind of scenario, except that you run into selection effects (a keyword to Google is anthropic effect, or anthropic principle). I.e., suppose that the chance of human extinction was actually much higher, on the order of 10% per year. Then, after 250 years, Earth will probably not have any humans, but if it does, and the survivors use Laplace’s rule to estimate their chances of survival, they will overshoot them by a lot. That is, they can’t actually update on extinction happening, because if it happens nobody will be there to update.
There is a magic trick where I give you a deck of cards, tell you to shuffle it, and choose a card however you want, and then I guess it correctly. Most of the time it doesn’t work, but on the 1⁄52 chance that it does, it looks really impressive (or so I’m told, I didn’t have the patience to do it enough times). There is also a scam based on a similar principle.
On the other hand, Laplace’s law is empirically really quite brutal, and in my experience tends to output probabilities that are too high. In particular, I’d assign some chance to there being no black balls, and that would eventually bring my probability of extinction close to 0, whereas Laplace’s law always predicts that an event will happen if given enough time (even if it has never happened before).
Overall, I guess I’d be more interested in trying to figure out the pathways to extinction and their probabilities. For technologies which already exist, that might involve looking at close calls, e.g., nuclear close calls.
Thanks for your comment!
I hadn’t thought to think about selection effects, thanks for pointing that out. I suppose Bostrom actually describes black balls as technologies that cause catastrophe, but doesn’t set the bar as high as extinction. Then drawing a black ball doesn’t wipe out all future observers, so perhaps selection effects don’t apply?
Also, I think in The Precipice Toby Ord makes some inferences for natural extinction risk given the length of time humanity has existed for? Though I may not be remembering correctly. I think the logic was something like “Assume we’re randomly distributed amongst possible humans. If existential risk was very high, then there’d be a very small set of worlds in which humans have been around for this long, and it would be very unlikely that we’d be in such a world. Therefore it’s more likely that our estimate of existential risk is too high”. This then seems quite similar to my model of making inferences based on not having previously drawn a black ball. I don’t think I understand selection effects too well though so I appreciate any comments on this!