Vasco, how do your estimates account for model uncertainty? I don’t understand how you can put some probability on something being possible (i.e. p(extinction|nuclear war) > 0), but end up with a number like 5.93e-12 (i.e. 1 in ~160 billion). That implies an extremely, extremely high level of confidence. Putting ~any weight on models that give higher probabilities would lead to much higher estimates.
Vasco, how do your estimates account for model uncertainty?
I tried to account for model uncertainty by assuming a 10^-6 probability of human extinction given insufficient calorie production.
I don’t understand how you can put some probability on something being possible (i.e. p(extinction|nuclear war) > 0), but end up with a number like 5.93e-12 (i.e. 1 in ~160 billion). That implies an extremely, extremely high level of confidence.
Note there are infinitely many orders of magnitude between 0 and any astronomically low number like 5.93e-12. At least in theory, I can be quite uncertain while having a low best guess. I understand that greater uncertainty (e.g. a higher ratio between the 95th and 5th percentiles), holding the median constant, tends to increase the mean of heavy-tailed distributions (like lognormals), but it is unclear to what extent this applies. I have also accounted for that by using heavy-tailed distributions whenever I thought appropriate (e.g. I modelled the soot injected into the stratosphere per equivalent yield as a lognormal).
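To illustrate this point, here is a minimal sketch with made-up numbers (not the actual model): for a lognormal with a fixed median, widening the 95th/5th percentile ratio mechanically raises the mean.

```python
# Illustrative only: fixed median, increasing 95th/5th ratio -> larger mean.
import numpy as np
from scipy import stats

median = 1.0  # arbitrary units; only the mean/median ratio matters here

for ratio_95_5 in [10, 100, 1_000, 10_000]:
    # For a lognormal, ln(X) is normal, so the 95th/5th ratio pins down sigma.
    sigma = np.log(ratio_95_5) / (2 * stats.norm.ppf(0.95))
    mean = median * np.exp(sigma**2 / 2)  # lognormal mean with mu = ln(median)
    print(f"95th/5th ratio = {ratio_95_5:>6}: mean/median = {mean:.2f}")
```

The mean/median ratio grows from about 1.3 at a 95th/5th ratio of 10 to about 50 at a ratio of 10,000, which is the effect described above.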
As a side note, 10 of the 161 forecasters (6.21 %) in the Existential Risk Persuasion Tournament (XPT), 4 experts and 6 superforecasters, predicted a nuclear extinction risk by 2100 of exactly 0. I guess these participants know the risk is higher than 0, but consider it astronomically low too.
Putting ~any weight on models that give higher probabilities would lead to much higher estimates.
I used to be persuaded by this type of argument, which is made in many contexts by the global catastrophic risk community. I think it often misses that the weight a model should receive is not independent of its predictions. I would say high extinction risk goes against the low prior established by historical conflicts.
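A minimal Bayesian sketch of this point, with made-up numbers (not anyone's actual estimates): the posterior weight on a model depends on how well its predictions fit the historical record, so a model predicting frequent catastrophic conflicts loses most of its weight after centuries in which none occurred.

```python
# Illustrative only: model weights are not independent of model predictions.
high_risk = 1e-2   # hypothetical annual chance of a catastrophic conflict (model A)
low_risk = 1e-6    # hypothetical annual chance under a more benign model (model B)
prior_a = prior_b = 0.5
years_observed = 600  # years observed with no such conflict

like_a = (1 - high_risk) ** years_observed
like_b = (1 - low_risk) ** years_observed
posterior_a = prior_a * like_a / (prior_a * like_a + prior_b * like_b)
print(f"posterior weight on the high-risk model: {posterior_a:.4f}")  # ~0.002
```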
I am also not aware of any detailed empirical quantitative models estimating the probability of extinction due to nuclear war.
That’s an odd prior. I can see a case for a prior that gets you to <10^-6, maybe even 10^-9, but how can you get to substantially below 10^-9 annual with just historical data???
Sapiens hasn’t been around for that long (no longer than a million years)! (And conflict with Homo sapiens or other human subtypes still seems like a plausible reason for the extinction of other human subtypes to me.) There have only been maybe 4 billion species total in all of geological history! Even if you are almost certain that literally no species has ever died of conflict, you still can’t get a prior much lower than 1/4,000,000,000 (~10^-9)!
EDIT: I suppose you can multiply the average lifespan of species by their number to get to a ~10^-15 or 10^-16 prior? But that seems like a much worse prior imo for multiple reasons, including that I’m not sure no species has ever died of conflict (and I strongly suspect specific ones have).
That’s an odd prior. I can see a case for a prior that gets you to <10^-6, maybe even 10^-9, but how can you get to substantially below 10^-9 annual with just historical data???
Fitting a power law to the N rightmost points of the tail distribution of annual conflict deaths as a fraction of the global population, where each point corresponds to one year from 1400 to 2000, leads to an annual probability of a conflict causing human extinction lower than 10^-9 for N no higher than 33 (for N = 33, the annual conflict extinction risk is 1.72*10^-10). The 33 rightmost points have annual conflict deaths as a fraction of the global population of at least 0.395 %. Below is how the annual conflict extinction risk evolves with the lowest annual conflict deaths as a fraction of the global population included in the power law fit (the data is here; the post is here).
The leftmost points of the tail suggest a high extinction risk because the tail distribution is quite flat for very low annual conflict deaths as a fraction of the global population.
The extinction risk decays a lot as one fits to points further and further to the right, because the actual tail distribution also decays for high annual conflict deaths as a fraction of the global population.
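For concreteness, here is a sketch of this kind of fit with placeholder data (the real series is in the linked sheet): fit a power law to the N rightmost points of the empirical tail and read off the implied probability of a year in which conflict deaths reach 100 % of the global population. The function name and the synthetic lognormal series are my own illustrative choices.

```python
# Illustrative only: power-law fit to the rightmost tail points, extrapolated
# to a deaths fraction of 1 (i.e. everyone dying in a single year).
import numpy as np

# Placeholder for annual conflict deaths as a fraction of the global population.
rng = np.random.default_rng(0)
deaths_fraction = np.sort(rng.lognormal(mean=-8, sigma=2, size=601))  # 1400-2000

def extinction_prob_from_tail(x, n_rightmost):
    """Fit log(CCDF) = a + b*log(x) to the n rightmost points and
    extrapolate to x = 1 (the whole population dying in one year)."""
    x = np.sort(x)
    ccdf = 1 - (np.arange(1, len(x) + 1) - 0.5) / len(x)  # empirical tail
    xs, ps = x[-n_rightmost:], ccdf[-n_rightmost:]
    b, a = np.polyfit(np.log(xs), np.log(ps), 1)  # slope b, intercept a
    return np.exp(a)  # at x = 1, log(x) = 0, so P(X >= 1) = exp(a)

for n in [10, 33, 100]:
    print(n, extinction_prob_from_tail(deaths_fraction, n))
```

With the actual data, restricting the fit to the rightmost points is what drives the estimate down, as described above; the placeholder numbers here will not reproduce the 1.72*10^-10 figure.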
Sapiens hasn’t been around for that long (no longer than a million years)! (And conflict with Homo sapiens or other human subtypes still seems like a plausible reason for the extinction of other human subtypes to me.) There have only been maybe 4 billion species total in all of geological history! Even if you are almost certain that literally no species has ever died of conflict, you still can’t get a prior much lower than 1/4,000,000,000 (~10^-9)!
Interesting numbers! I think that kind of argument is too agnostic, in the sense that it does not leverage the empirical evidence we have about human conflicts, and I worry it leads to predictions which are very off. For example, one could also argue the annual probability of a human born in 2024 growing to a height larger than the distance from the Earth to the Sun cannot be much lower than 10^-6 because Sapiens have only been around for 1 M years or so. However, the probability should be way way lower than that (excluding genetic engineering, very long light appendages, unreasonable interpretations of what I am referring to, like estimating the probability from the chance a spaceship with humans will collide with the Sun, etc.). One can see the probability of a (non-enhanced) human growing to such a height is much lower than 10^-6 based on the tail distribution of human heights. Since height roughly follows a normal distribution, the probability of huge heights is negligible. It might be the case that past human heights (conflicts) are not informative of future heights (conflicts), but past heights still seem to suggest an astronomically low chance of huge heights (conflicts causing human extinction).
It is also unclear from past data whether annual conflict deaths as a fraction of the global population will increase.
Below is some data on the linear regression of the logarithm of the annual conflict deaths as a fraction of the global population on the year.
There has been a slight downward trend in the logarithm of the annual conflict deaths as a fraction of the global population, with the R^2 of the linear regression on the year being 8.45 %. However, it is unclear to me whether the sign of the slope is resilient to changes in the function I used to model the ratio between the Conflict Catalog’s and the actual annual conflict deaths.
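A sketch of the regression described above, again with placeholder numbers rather than the Conflict Catalog series: regress the log of annual conflict deaths as a fraction of the global population on the year, and inspect the slope sign and R^2.

```python
# Illustrative only: trend in log(annual conflict deaths fraction) over time.
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(1400, 2001)
# Placeholder series with a slight downward trend plus noise.
log_fraction = -8 - 0.001 * (years - 1400) + rng.normal(0, 2, size=years.size)

slope, intercept = np.polyfit(years, log_fraction, 1)
residuals = log_fraction - (intercept + slope * years)
r_squared = 1 - np.var(residuals) / np.var(log_fraction)
print(f"slope = {slope:.5f} per year, R^2 = {r_squared:.3f}")
```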
Thanks for engaging! For some reason I didn’t get a notification on this comment.
Broadly, I think you’re not accounting enough for model uncertainty. I absolutely agree that arguments like mine above are too agnostic if deployed very generally. It would be foolish to throw away all the information we have in favor of very agnostic flat priors.
However, I think most of those situations are importantly disanalogous to the question at hand here. To answer most real-world questions, we have multiple lines of reasoning consistent with our less agnostic models, and can reduce model uncertainty by appealing to more than one source of confidence.
I will follow your lead and use the “Can a human born in 2024 grow to the length of one astronomical unit” question to illustrate why I think that question is importantly disanalogous to the probability of conflict causing extinction. I think there are good reasons both outside the usual language of statistical modeling, and within it. I will focus on “outside” because I think that’s a dramatically more important point, and also more generalizable. But I will briefly discuss a “within-model” reason as well, as it might be more persuasive to some readers (not necessarily yourself).
Below is Vasco’s analogy for reference:
Interesting numbers! I think that kind of argument is too agnostic, in the sense that it does not leverage the empirical evidence we have about human conflicts, and I worry it leads to predictions which are very off. For example, one could also argue the annual probability of a human born in 2024 growing to a height larger than the distance from the Earth to the Sun cannot be much lower than 10^-6 because Sapiens have only been around for 1 M years or so. However, the probability should be way way lower than that (excluding genetic engineering, very long light appendages, unreasonable interpretations of what I am referring to, like estimating the probability from the chance a spaceship with humans will collide with the Sun, etc.). One can see the probability of a (non-enhanced) human growing to such a height is much lower than 10^-6 based on the tail distribution of human heights. Since height roughly follows a normal distribution, the probability of huge heights is negligible. It might be the case that past human heights (conflicts) are not informative of future heights (conflicts), but past heights still seem to suggest an astronomically low chance of huge heights (conflicts causing human extinction).
Outside-model reasons:
For human height, we have very strong, principled scientific reasons to believe that someone born in 2024 cannot grow to a height larger than one astronomical unit. Note that none of them appeal to a normal distribution of human height:
1. Square-cube law of biomechanics. Muscles in a human-shaped body simply cannot support a very tall person, even at, say, the length of a medium-sized whale. Certainly not vertically.
2. Calories. I haven’t run the numbers, but I strongly suspect that there aren’t enough food calories on Earth for someone born in 2024 to grow all the way to the Sun. Even if there are, you can add a few zeroes of doubt by questioning the economic feasibility of, or willingness for, Earth to supply the calories towards the sole purpose of someone growing insanely tall.
3. Oxygen & atmosphere. We know enough about how human biology works to know that you can’t breathe outside of the atmosphere. I suppose you can arrange for a helmet plus some other method to pump oxygen, but you also need some way to protect the skin.
4. Structural stability. Human flesh and bone are only so stable, so I don’t think you can get very long even if (1) is solved.
5. Lifespan and “Growthspan”. Even if we magically solve all the other physical problems, as a numerical issue, growing even 2 feet/year is insanely fast for humans, and that gets you to only 240 feet (~70 meters) in a (generously) 120-year lifespan where someone never stops growing until they die. So a non-enhanced “human”, even if we no longer use Earth physics but instead “movie/storybook physics”, could not, as a numerical matter, even reach the size of a medium-sized skyscraper, never mind leave the atmosphere.
6. Etc.
I say all of this as someone without much of a natural sciences background. I suspect that if I talk to a biophysicist or astrophysicist or someone who studies anatomy, they can easily point to many more reasons why the proposed forecast is impractical, without appealing to the normal distribution of human height. All of these point strongly against human height reaching very large values, never mind literally one astronomical unit. Now, some theoretical reasons can provide us with good explanatory power about the shape of the statistical distribution as well (for example, things that point to the generator of human height being additive and thus following the Central Limit Theorem), but afaik those theoretical arguments are weaker/lower confidence than the bounding arguments.
In contrast, if the only thing I knew about human height were the empirically observed data on human height so far (e.g. it’s just displayed as a column of numbers), plus some expert assurances that the data fits a normal distribution extremely well, I would be much less confident that human height cannot reach extremal values.
Put more concretely, average human male height is maybe 172 cm with a standard deviation of 8 cm (the linked source has a ton of different studies; I’m just eyeballing the relevant numbers). Ballpark 140 million people are born a year, ~50 % of whom are men; a 1-in-70-million tail corresponds to ~5.7 sds. 5.7 sds past 172 cm is 218 cm. If we throw in 3 more sds (which corresponds to a >>1,000,000 difference in likelihood on a normal distribution), we get to 242 cm. This is a result that would be predicted as extremely unlikely from statistically modeling a normal distribution, but “allowed” by the more generous scientific bounds from 1-6 above (at least, they’re allowed at my own level of scientific sophistication; again, I’m not an expert on the natural sciences).
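A sketch that roughly reproduces this ballpark arithmetic under a literal normal model, using the figures quoted above (172 cm mean, 8 cm sd, ~140 M births a year):

```python
# Illustrative only: what a pure normal model of male height implies.
from scipy import stats

mean_cm, sd_cm = 172, 8
men_per_year = 0.5 * 140e6  # ~140 M births a year, roughly half male

# The tallest man in a yearly cohort sits at roughly the (1 - 1/70e6) quantile.
z_tallest = stats.norm.isf(1 / men_per_year)
print(f"tallest-in-cohort: ~{z_tallest:.1f} sd, ~{mean_cm + z_tallest * sd_cm:.0f} cm")

# Reaching 242 cm (~8.8 sd) is astronomically unlikely under this model.
p_242 = stats.norm.sf((242 - mean_cm) / sd_cm)
print(f"P(height >= 242 cm) per man: {p_242:.1e}; "
      f"expected such men per cohort: {p_242 * men_per_year:.1e}")
```

The pure-normal model puts the chance of anyone in a cohort reaching 242 cm far below one in a billion, which is in tension with the list of recorded people above that height linked later in the thread; that tension is the point being made here about trusting the statistical model too far into the tail.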
Am I confident that someone born in 2024 can’t grow to be 1 AU tall? Yep, absolutely.
Am I confident that someone born in 2024 can’t grow to be 242cm? Nope. I just don’t trust the statistical modeling all that much. (If you disagree and are willing to offer 1,000,000:1 odds on this question, I’ll probably be willing to bet on it).
This is why I think it’s important to be able to think about a problem from multiple angles. In particular, it often is useful (outside of theoretical physics?) to think of real physical reality as a real concrete thing with 3 dimensions, not (just) a bunch of abstractions.[1]
Statistical disagreement
I just don’t have much confidence that you can cleanly differentiate a power law or log-normal distribution from more extreme distributions from the data alone. One of my favorite mathy quotes is “any extremal distribution looks like a straight line when drawn on a log-log plot with a fat marker[2]”.
Statistically, when your sample size is <1000, it’s not hard to generate a family of distributions that have much-higher-than-observed numbers with significant but <1/10,000 probability. Theoretically, I feel like I need some (ideally multiple) underlying justifications[3] for a distribution’s shape before doing a bunch of curve fits and calling it a day. Or like, you can do the curve fit, but I don’t see how the curve fit alone gives us enough info to rule out 1 in 100,000 or 1 in 1,000,000 or 1 in 1,000,000,000 possibilities.
Now for normal distributions, or normal-ish distributions, this may not matter all that much in practice. As you say, “height roughly follows a normal distribution”, so as long as a distribution is roughly normal, small divergences don’t get you too far away (maybe with a slightly differently shaped underlying distribution that fits the data it’s possible to get a 242 cm human, maybe even 260 cm, but not 400 cm, and certainly not 4000 cm). But for extremal distributions this clearly matters a lot. Different extremal distributions predict radically different things at the tail(s).
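A sketch of why ~600 annual data points cannot settle this (made-up numbers): a distribution that matches an “ordinary” lognormal almost surely, but carries a ~1/10,000 chance per year of an extreme outcome, produces samples that look statistically indistinguishable from the pure lognormal.

```python
# Illustrative only: a rare extreme regime is essentially invisible in ~600 draws.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_years = 601

pure = rng.lognormal(mean=-8, sigma=2, size=n_years)    # "ordinary" years only
mixed = rng.lognormal(mean=-8, sigma=2, size=n_years)
extreme_years = rng.random(n_years) < 1e-4              # rare catastrophic regime
mixed[extreme_years] = 1.0                               # everyone dies that year

# Chance that a 601-year sample contains even one draw from the extreme regime.
print("P(sample reveals the extreme regime) =", 1 - (1 - 1e-4) ** n_years)  # ~6%

# A two-sample test between the two models' samples is almost always null.
print(stats.ks_2samp(pure, mixed))
```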
[1] Early on in my career as a researcher, I reviewed a white paper which made this mistake rather egregiously. Basically, they used a range of estimates for 2 different factors (cell size and number of cells per feeding tank) to come to a desired conclusion, without realizing that the two factors combined lead to a physical impossibility (the cells would then compose exactly 100 % of the tank).
[2] I usually deploy this line when arguing against people who claim they discovered a power law when I suspect something like a ~log-normal might be a better fit. But obviously it works in the other direction as well; the main issue is model uncertainty.
[3] Tho tbh, even if you did have strong, principled justifications for a distribution’s shape, I still feel like it’s hard to get >2 additional OOMs of confidence in a distribution’s underlying shape for non-synthetic data. (“These factors all sure seem relevant, and there are a lot of these factors, and we have strong principled reasons to see why they’re additive to the target, so the central limit theorem surely applies” sure seems pretty persuasive. But I don’t think it’s more than 100:1 odds persuasive to me.)
To add to this, assuming your numbers are right (I haven’t checked), there have been multiple people born since 1980 who ended up taller than 242cm, which I expect would make any single normal distribution extremely unlikely to be a good model (either a poor fit on the tails, or a poor fit on a large share of the data), given our data:
https://en.m.wikipedia.org/wiki/List_of_tallest_people
I suppose some have specific conditions that led to their height. I don’t know if all or most do.
Now for normal distributions, or normal-ish distributions, this may not matter all that much in practice. As you say, “height roughly follows a normal distribution”, so as long as a distribution is roughly normal, small divergences don’t get you too far away (maybe with a slightly differently shaped underlying distribution that fits the data it’s possible to get a 242 cm human, maybe even 260 cm, but not 400 cm, and certainly not 4000 cm).
Since height roughly follows a normal distribution, the probability of huge heights is negligible.
Right, by “the probability of huge heights is negligible”, I meant way more than 2.42 m, such that the details of the distribution would not matter. I would not get an astronomically low probability of at least such a height based on the methodology I used to get an astronomically low chance of a conflict causing human extinction. To arrive at this, I looked into the empirical tail distribution. I did not fit a distribution to the 25th to 75th percentile range, which is probably what would have suggested a normal distribution for height, and then extrapolated from there. I said I got an annual probability of conflict causing human extinction lower than 10^-9 using 33 or fewer of the rightmost points of the tail distribution. The 33rd tallest person whose height was recorded was actually 2.42 m, which illustrates that I would not have gotten an astronomically low probability of a height of at least 2.42 m.
This is why I think it’s important to be able to think about a problem from multiple angles.
I agree. What do you think is the annualised probability of a nuclear war or volcanic eruption causing human extinction in the next 10 years? Do you see any concrete scenarios where the probability of a nuclear war or volcanic eruption causing human extinction is close to Toby’s values?
I usually deploy this line [“any extremal distribution looks like a straight line when drawn on a log-log plot with a fat marker”] when arguing against people who claim they discovered a power law when I suspect something like a ~log-normal might be a better fit. But obviously it works in the other direction as well; the main issue is model uncertainty.
I think power laws overestimate extinction risk. They imply the conditional probability of going from 80 M annual deaths to extinction (roughly a 100-fold increase) would be the same as that of going from extinction to 800 billion annual deaths (another 100-fold increase), which very much overestimates the risk of large death tolls. So it makes sense that the tail distribution eventually starts to decay much faster than implied by a power law, especially if the power law is fitted to the left tail.
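A sketch of the scale invariance behind this claim, with a made-up tail exponent and lower bound: under a pure power law, the conditional probability of a further 100-fold jump is the same whether one starts from 80 million deaths or from extinction-level deaths.

```python
# Illustrative only: Pareto (power-law) tails are scale invariant.
alpha = 1.5   # hypothetical tail exponent; the point holds for any alpha
x_min = 1e6   # hypothetical lower bound of the fitted tail (1 M deaths)

def p_exceed(x):
    """P(annual deaths >= x) under a Pareto tail starting at x_min."""
    return (x / x_min) ** (-alpha)

p_80m_to_extinction = p_exceed(8e9) / p_exceed(80e6)    # 80 M -> ~8 B (everyone)
p_extinction_to_800b = p_exceed(800e9) / p_exceed(8e9)  # ~8 B -> 800 B
print(p_80m_to_extinction, p_extinction_to_800b)        # both equal 100**(-alpha)
```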
On the other hand, I agree it is unclear whether the above tail distribution suggests an annual probability of a conflict causing human extinction above or below 10^-9. Still, even my inside-view annual extinction risk from nuclear war of 5.53*10^-10 (which makes no use of the above tail distribution) is only 0.0111 % (= 5.53*10^-10/(5*10^-6)) of Toby’s value.
I did not fit a distribution to the 25th to 75th percentile range, which is probably what would have suggested a normal distribution for height, and then extrapolated from there. I said I got an annual probability of conflict causing human extinction lower than 10^-9 using 33 or fewer of the rightmost points of the tail distribution. The 33rd tallest person whose height was recorded was actually 2.42 m, which illustrates that I would not have gotten an astronomically low probability of a height of at least 2.42 m.
To be clear, I’m not accusing you of removing outliers from your data. I’m saying that you can’t rule out medium-small probabilities of your model being badly off based on all the direct data you have access to, when you have so few data points to fit your model (not due to any fault of yours, but because reality only gave you so many data points to look at).
My guess is that randomly selecting 1000 data points of human height and fitting a distribution would more likely than not recover a ~normal distribution, but this is just speculation; I haven’t done the data analysis myself.
What do you think is the annualised probability of a nuclear war or volcanic eruption causing human extinction in the next 10 years? Do you see any concrete scenarios where the probability of a nuclear war or volcanic eruption causing human extinction is close to Toby’s values?
I haven’t been able to come up with a good toy model or bounds that I’m happy with, after thinking about it for a bit (I’m sure less than you or Toby or others like Luisa). If you or other commenters have models that you like, please let me know!
(In particular I’d be interested in a good generative argument for the prior).
Am I confident that someone born in 2024 can’t grow to be 242cm? Nope. I just don’t trust the statistical modeling all that much. (If you disagree and are willing to offer 1,000,000:1 odds on this question, I’ll probably be willing to bet on it).
I do not want to take this bet, but I am open to other suggestions. For example, I think it is very unlikely that transformative AI, as defined on Metaculus, will happen in the next few years.