Vasco, how do your estimates account for model uncertainty?
I tried to account for model uncertainty assuming 10^-6 probability of human extinction given insufficient calorie production.
I don't understand how you can put some probability on something being possible (i.e. p(extinction|nuclear war) > 0), but end up with a number like 5.93e-12 (i.e. 1 in ~160 billion). That implies an extremely, extremely high level of confidence.
Note there are infinitely many orders of magnitude between 0 and any astronomically low number like 5.93e-12. At least in theory, I can be quite uncertain while having a low best guess. I understand greater uncertainty (e.g. higher ratio between the 95th and 5th percentile) holding the median constant tends to increase the mean of heavy-tailed distributions (like lognormals), but it is unclear to which extent this applies. I have also accounted for that by using heavy-tailed distributions whenever I thought appropriate (e.g. I modelled the soot injected into the stratosphere per equivalent yield as a lognormal).
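To make the point about heavy tails concrete, here is a minimal sketch (not taken from the original analysis) of how the mean of a lognormal grows with the ratio between its 95th and 5th percentiles when the median is held fixed. The function name and the example ratios are illustrative assumptions; the only facts used are the standard lognormal identities mean = median * exp(sigma^2 / 2) and p95 / p5 = exp(2 * 1.645 * sigma).

```python
import math

# 95th percentile of the standard normal distribution.
Z95 = 1.6448536269514722


def lognormal_mean(median, ratio_95_to_5):
    """Mean of a lognormal given its median and the ratio p95 / p5.

    Hypothetical helper for illustration only.
    """
    # p95 / p5 = exp(2 * Z95 * sigma), so sigma follows from the ratio.
    sigma = math.log(ratio_95_to_5) / (2 * Z95)
    # For a lognormal, mean = median * exp(sigma^2 / 2).
    return median * math.exp(sigma ** 2 / 2)


# Same median, increasingly wide 5th-95th percentile range (made-up ratios).
for ratio in (10, 100, 1000):
    print(f"p95/p5 = {ratio}: mean/median = {lognormal_mean(1.0, ratio):.3f}")
```

The mean-to-median ratio rises with the spread, but how much depends on how wide the range really is, which is the part that stays unclear in the discussion above.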
As a side note, 10 of 161 (6.21 %) forecasters of the Existential Risk Persuasion Tournament (XPT), 4 experts and 6 superforecasters, predicted a nuclear extinction risk until 2100 of exactly 0. I guess these participants know the risk is higher than 0, but consider it astronomically low too.
Putting ~any weight on models that give higher probabilities would lead to much higher estimates.
I used to be persuaded by this type of argument, which is made in many contexts by the global catastrophic risk community. I think it often misses that the weight a model should receive is not independent of its predictions. I would say high extinction risk goes against the low prior established by historical conflicts.
I am also not aware of any detailed empirical quantitative models estimating the probability of extinction due to nuclear war.
That's an odd prior. I can see a case for a prior that gets you to <10^-6, maybe even 10^-9, but how can you get to substantially below 10^-9 annual with just historical data???
Sapiens hasn't been around for longer than a million years! (and conflict with Homo sapiens or other human subtypes still seems like a plausible reason for extinction of other human subtypes to me). There have only been maybe 4 billion species total in all of geological history! Even if you have almost certainty that literally no species has ever died of conflict, you still can't get a prior much lower than 1/4,000,000,000! (10^-9).
EDIT: I suppose you can multiply average lifespan of species and their number to get to ~10^-15 or 10^-16 prior? But that seems like a much worse prior imo for multiple reasons, including that I'm not sure no existing species has died of conflict (and I strongly suspect specific ones have).
That's an odd prior. I can see a case for a prior that gets you to <10^-6, maybe even 10^-9, but how can you get to substantially below 10^-9 annual with just historical data???
Fitting a power law to the N rightmost points of the tail distribution of annual conflict deaths as a fraction of the global population leads to an annual probability of a conflict causing human extinction lower than 10^-9 for N no higher than 33 (for which the annual conflict extinction risk is 1.72*10^-10), where each point corresponds to one year from 1400 to 2000. The 33 rightmost points have annual conflict deaths as a fraction of the global population of at least 0.395 %. Below is how the annual conflict extinction risk evolves with the lowest annual conflict deaths as a fraction of the global population included in the power law fit (the data is here; the post is here).
The leftmost points of the tail suggest a high extinction risk because the tail distribution is quite flat for very low annual conflict deaths as a fraction of the global population.
The extinction risk starts to decay a lot as one uses increasingly rightmost points of the tail because the actual tail distribution also decreases for high annual conflict deaths as a fraction of the global population.
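A minimal sketch of this kind of tail fit, under loudly stated assumptions: the data below is synthetic (drawn from a Pareto distribution with a made-up exponent and scale, one value per year from 1400 to 2000), and the fit is an ordinary least-squares line through the N rightmost points of the empirical tail distribution on log-log axes. The method in the post may differ in its details, and all function names are hypothetical.

```python
import math
import random


def fit_tail_power_law(values, n_tail):
    """Least-squares line through the n_tail rightmost points of the
    empirical tail distribution P(X >= x) on log-log axes.

    Returns (slope, intercept) of log P = intercept + slope * log x.
    """
    xs = sorted(values)
    total = len(xs)
    # The i-th smallest value has empirical tail probability (total - i) / total.
    points = [(math.log(x), math.log((total - i) / total)) for i, x in enumerate(xs)]
    tail = points[-n_tail:]
    mean_x = sum(p[0] for p in tail) / n_tail
    mean_y = sum(p[1] for p in tail) / n_tail
    sxx = sum((p[0] - mean_x) ** 2 for p in tail)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in tail)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def extrapolated_tail_probability(values, n_tail, threshold=1.0):
    """Extrapolate the fitted line to P(X >= threshold).

    threshold = 1.0 means deaths equal to the whole population.
    """
    slope, intercept = fit_tail_power_law(values, n_tail)
    return math.exp(intercept + slope * math.log(threshold))


random.seed(0)
# Synthetic annual conflict deaths as a fraction of the population
# (made-up scale 1e-5 and exponent 1.2; not real data).
sample = [1e-5 * random.paretovariate(1.2) for _ in range(601)]

for n in (33, 100, 300):
    print(n, extrapolated_tail_probability(sample, n))
```

On real data the extrapolated value moves a lot with N, which is why the choice of how many tail points to include matters so much here.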
Sapiens hasn't been around for longer than a million years! (and conflict with Homo sapiens or other human subtypes still seems like a plausible reason for extinction of other human subtypes to me). There have only been maybe 4 billion species total in all of geological history! Even if you have almost certainty that literally no species has ever died of conflict, you still can't get a prior much lower than 1/4,000,000,000! (10^-9).
Interesting numbers! I think that kind of argument is too agnostic, in the sense it does not leverage the empirical evidence we have about human conflicts, and I worry it leads to predictions which are very off. For example, one could also argue the annual probability of a human born in 2024 growing to a height larger than the distance from the Earth to the Sun cannot be much lower than 10^-6 because Sapiens have only been around for 1 M years or so. However, the probability should be way way lower than that (excluding genetic engineering, very long light appendages, unreasonable interpretations of what I am referring to, like estimating the probability from the chance a spaceship with humans will collide with the Sun, etc.). One can see the probability of a (non-enhanced) human growing to such a height is much lower than 10^-6 based on the tail distribution of human heights. Since height roughly follows a normal distribution, the probability of huge heights is negligible. It might be the case that past human heights (conflicts) are not informative of future heights (conflicts), but past heights still seem to suggest an astronomically low chance of huge heights (conflicts causing human extinction).
It is also unclear from past data whether annual conflict deaths as a fraction of the global population will increase.
Below is some data on the linear regression of the logarithm of the annual conflict deaths as a fraction of the global population on the year.
There has been a slight downwards trend in the logarithm of the annual conflict deaths as a fraction of the global population, with the R^2 of the linear regression of it on the year being 8.45 %. However, it is unclear to me whether the sign of the slope is resilient against changes in the function I used to model the ratio between the Conflict Catalog's and actual annual conflict deaths.
Thanks for engaging! For some reason I didn't get a notification on this comment.
Broadly I think you're not accounting enough for model uncertainty. I absolutely agree that arguments like mine above are too agnostic if deployed very generally. It would be foolish to throw away all information we have in favor of very agnostic flat priors.
However, I think most of those situations are importantly disanalogous to the question at hand here. To answer most real-world questions, we have multiple lines of reasoning consistent with our less agnostic models, and can reduce model uncertainty by appealing to more than one source of confidence.
I will follow your lead and use the "Can a human born in 2024 grow to the length of one astronomical unit" question to illustrate why I think that question is importantly disanalogous to the probability of conflict causing extinction. I think there are good reasons both outside the usual language of statistical modeling, and within it. I will focus on "outside" because I think that's a dramatically more important point, and also more generalizable. But I will briefly discuss a "within-model" reason as well, as it might be more persuasive to some readers (not necessarily yourself).
Below is Vasco's analogy for reference:
Interesting numbers! I think that kind of argument is too agnostic, in the sense it does not leverage the empirical evidence we have about human conflicts, and I worry it leads to predictions which are very off. For example, one could also argue the annual probability of a human born in 2024 growing to a height larger than the distance from the Earth to the Sun cannot be much lower than 10^-6 because Sapiens have only been around for 1 M years or so. However, the probability should be way way lower than that (excluding genetic engineering, very long light appendages, unreasonable interpretations of what I am referring to, like estimating the probability from the chance a spaceship with humans will collide with the Sun, etc.). One can see the probability of a (non-enhanced) human growing to such a height is much lower than 10^-6 based on the tail distribution of human heights. Since height roughly follows a normal distribution, the probability of huge heights is negligible. It might be the case that past human heights (conflicts) are not informative of future heights (conflicts), but past heights still seem to suggest an astronomically low chance of huge heights (conflicts causing human extinction).
Outside-model reasons:
For human height we have very strong principled, scientific, reasons to believe that someone born in 2024 cannot grow to a height larger than one astronomical unit. Note that none of them appeal to a normal distribution of human height:
1. Square-cube law of biomechanics. Muscles in a human-shaped body simply cannot support a very tall person, even to say the length of a medium-sized whale. Certainly not vertically.
2. Calories. I haven't run the numbers, but I strongly suspect that there aren't enough food calories on Earth for someone born in 2024 to grow all the way to the sun. Even if there are, you can add a few zeroes of doubt by questioning the economic feasibility/willingness for Earth to supply the calories towards the sole purpose of someone growing insanely tall.
3. Oxygen & atmosphere. We know enough about how human biology works to know that you can't breathe outside of the atmosphere. I suppose you can arrange for a helmet + some other method to pump oxygen, but you also need some way to protect the skin.
4. Structural stability. Human flesh and bone is only so stable, so I don't think you can get very long even if (1) is solved.
5. Lifespan and "growthspan". Even if we magically solve all the other physical problems, as a numerical issue, growing even 2 feet/year is insanely fast for humans, and that gets you to only 240 feet (~70 meters) in a (generously) 120 year lifespan where someone never stops growing until they die. So a non-enhanced "human", even if we no longer use Earth physics but instead "movie/storybook physics," could not, as a numerical matter, even reach the size of a medium-sized skyscraper, never mind leave the atmosphere.
6. etc.
I say all of this as someone without much of a natural sciences background. I suspect if I talk to a biophysicist or astrophysicist or someone who studies anatomy, they can easily point to many more reasons why the proposed forecast is impractical, without appealing to the normal distribution of human height. All of these point strongly against human height reaching very large values, never mind literally one astronomical unit. Now some theoretical reasons can provide us with good explanatory power about the shape of the statistical distribution as well (for example, things that point to the generator of human height being additive and thus following the Central Limit Theorem), but afaik those theoretical arguments are weaker/lower confidence than the bounding arguments.
In contrast, if the only thing I knew about human height were the empirical observed data on human height so far (e.g. it's just displayed as a column of numbers), plus some expert assurances that the data fits a normal distribution extremely well, I would be much less confident that human height cannot reach extremal values.
Put more concretely, human average male height is maybe 172cm with a standard deviation of 8cm (the linked source has a ton of different studies; I'm just eyeballing the relevant numbers). Ballpark 140 million people are born a year, ~50% of whom are men. This corresponds to the tallest man in a birth cohort being ~5.7 SDs above the mean. 5.7 SDs past 172cm is 218cm. If we throw in 3 more SDs (which corresponds to a >>1,000,000-fold difference in likelihood under a normal distribution), we get to 242cm. This is a result that will be predicted as extremely unlikely from statistically modeling a normal distribution, but "allowed" with the more generous scientific bounds from 1-6 above (at least, they're allowed at my own level of scientific sophistication; again I'm not an expert on the natural sciences).
Am I confident that someone born in 2024 can't grow to be 1 AU tall? Yep, absolutely.
Am I confident that someone born in 2024 can't grow to be 242cm? Nope. I just don't trust the statistical modeling all that much. (If you disagree and are willing to offer 1,000,000:1 odds on this question, I'll probably be willing to bet on it).
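For readers who want to check the height arithmetic, here is a minimal sketch that evaluates the normal tail at 218cm and 242cm. It takes the eyeballed mean (172cm), standard deviation (8cm) and ~70 million male births a year from the comment above as given; the helper name is hypothetical, and the point is only what a single normal model would imply, not that the model is right.

```python
import math


def normal_tail(z):
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


MEAN_CM = 172.0           # eyeballed mean male height from the comment
SD_CM = 8.0               # eyeballed standard deviation from the comment
MEN_BORN_PER_YEAR = 70e6  # ~50% of ~140 million births

for height_cm in (218.0, 242.0):
    z = (height_cm - MEAN_CM) / SD_CM
    p = normal_tail(z)
    print(
        f"{height_cm:.0f} cm: z = {z:.2f}, P(taller) = {p:.2e}, "
        f"expected men per birth year = {p * MEN_BORN_PER_YEAR:.2e}"
    )

# Ratio of the two tail probabilities: the likelihood gap from 3 extra SDs.
ratio = normal_tail((218.0 - MEAN_CM) / SD_CM) / normal_tail((242.0 - MEAN_CM) / SD_CM)
print(f"likelihood ratio = {ratio:.2e}")
```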
This is why I think it's important to be able to think about a problem from multiple angles. In particular, it often is useful (outside of theoretical physics?) to think of real physical reality as a real concrete thing with 3 dimensions, not (just) a bunch of abstractions.[1]
Statistical disagreement
I just don't have much confidence that you can cleanly differentiate a power law or log-normal distribution from more extreme distributions from the data alone. One of my favorite mathy quotes is "any extremal distribution looks like a straight line when drawn on a log-log plot with a fat marker[2]".
Statistically, when your sample size is <1000, it's not hard to generate a family of distributions that have much-higher-than-observed numbers with significant but <1/10,000 probability. Theoretically, I feel like I need some (ideally multiple) underlying justifications[3] for a distribution's shape before doing a bunch of curve fits and calling it a day. Or like, you can do the curve fit, but I don't see how the curve fit alone gives us enough info to rule out 1 in 100,000 or 1 in 1,000,000 or 1 in 1,000,000,000 possibilities.
Now for normal distributions, or normal-ish distributions, this may not matter all that much in practice. As you say "height roughly follows a normal distribution," so as long as a distribution is ~roughly normal, some small divergences don't get you too far away (maybe with a slightly differently shaped underlying distribution that fits the data it's possible to get a 242 cm human, maybe even 260 cm, but not 400cm, and certainly not 4000 cm). But for extremal distributions this clearly matters a lot. Different extremal distributions predict radically different things at the tail(s).
[1] Early on in my career as a researcher, I reviewed a white paper which made this mistake rather egregiously. Basically they used a range of estimates for 2 different factors (cell size and number of cells per feeding tank) to come to a desired conclusion, without realizing that the two different factors combined lead to a physical impossibility (the cells will then compose exactly 100% of the tank).
[2] I usually deploy this line when arguing against people who claim they discovered a power law when I suspect something like ~log-normal might be a better fit. But obviously it works in the other direction as well; the main issue is model uncertainty.
[3] Tho tbh, even if you did have strong, principled justifications for a distribution's shape, I still feel like it's hard to get >2 additional OOMs of confidence in a distribution's underlying shape for non-synthetic data. ("These factors all sure seem relevant, and there are a lot of these factors, and we have strong principled reasons to see why they're additive to the target, so the central limit theorem surely applies" sure seems pretty persuasive. But I don't think it's more than 100:1 odds persuasive to me).
To add to this, assuming your numbers are right (I haven't checked), there have been multiple people born since 1980 who ended up taller than 242cm, which I expect would make any single normal distribution extremely unlikely to be a good model (either a poor fit on the tails, or a poor fit on a large share of the data), given our data:
https://en.m.wikipedia.org/wiki/List_of_tallest_people
I suppose some have specific conditions that led to their height. I don't know if all or most do.
Now for normal distributions, or normal-ish distributions, this may not matter all that much in practice. As you say "height roughly follows a normal distribution," so as long as a distribution is ~roughly normal, some small divergences don't get you too far away (maybe with a slightly differently shaped underlying distribution that fits the data it's possible to get a 242 cm human, maybe even 260 cm, but not 400cm, and certainly not 4000 cm).
Since height roughly follows a normal distribution, the probability of huge heights is negligible.
Right, by "the probability of huge heights is negligible", I meant way more than 2.42 m, such that the details of the distribution would not matter. I would not get an astronomically low probability of at least such a height based on the methodology I used to get an astronomically low chance of a conflict causing human extinction. To arrive at this, I looked into the empirical tail distribution. I did not fit a distribution to the 25th to 75th range, which is probably what would have suggested a normal distribution for height, and then extrapolated from there. I said I got an annual probability of conflict causing human extinction lower than 10^-9 using 33 or fewer of the rightmost points of the tail distribution. The 33rd tallest person whose height was recorded was actually 2.42 m, which illustrates I would not have gotten an astronomically low probability for at least 2.42 m.
This is why I think it's important to be able to think about a problem from multiple angles.
I agree. What do you think is the annualised probability of a nuclear war or volcanic eruption causing human extinction in the next 10 years? Do you see any concrete scenarios where the probability of a nuclear war or volcanic eruption causing human extinction is close to Toby's values?
I usually deploy this line ["any extremal distribution looks like a straight line when drawn on a log-log plot with a fat marker"] when arguing against people who claim they discovered a power law when I suspect something like ~log-normal might be a better fit. But obviously it works in the other direction as well; the main issue is model uncertainty.
I think power laws overestimate extinction risk. They imply the probability of going from 80 M annual deaths to extinction would be the same as going from extinction to 800 billion annual deaths, which very much overestimates the risk of large death tolls. So it makes sense the tail distribution eventually starts to decay much faster than implied by a power law, especially if this is fitted to the left tail.
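The scale invariance behind this claim can be checked in a few lines. This is a minimal sketch assuming a pure Pareto tail; the exponent, the minimum and the function names are made up for illustration, and only the 80 M, 8 billion and 800 billion figures come from the argument above (taking extinction as roughly 8 billion deaths).

```python
import math


def pareto_tail(x, x_min, alpha):
    """P(X >= x) for a pure power law (Pareto) tail; hypothetical parameters."""
    return (x / x_min) ** (-alpha)


def conditional_tail(x_from, x_to, x_min, alpha):
    """P(X >= x_to | X >= x_from) under the same power law."""
    return pareto_tail(x_to, x_min, alpha) / pareto_tail(x_from, x_min, alpha)


ALPHA = 1.5   # made-up tail exponent
X_MIN = 1e6   # made-up minimum annual deaths for the tail

# Going from 80 M annual deaths to ~8 billion (extinction): a factor of 100.
to_extinction = conditional_tail(80e6, 8e9, X_MIN, ALPHA)
# Going from ~8 billion to 800 billion annual deaths: also a factor of 100.
beyond_extinction = conditional_tail(8e9, 800e9, X_MIN, ALPHA)

print(to_extinction, beyond_extinction)
print(math.isclose(to_extinction, beyond_extinction))
```

Both conditional probabilities depend only on the factor of 100 and the exponent, so a pure power law assigns the same chance to an impossible death toll as to extinction itself.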
On the other hand, I agree it is unclear whether the above tail distribution suggests an annual probability of a conflict causing human extinction above/below 10^-9. Still, even my inside view annual extinction risk from nuclear war of 5.53*10^-10 (which makes no use of the above tail distribution) is only 0.0111 % (= 5.53*10^-10/(5*10^-6)) of Toby's value.
I did not fit a distribution to the 25th to 75th range, which is probably what would have suggested a normal distribution for height, and then extrapolated from there. I said I got an annual probability of conflict causing human extinction lower than 10^-9 using 33 or fewer of the rightmost points of the tail distribution. The 33rd tallest person whose height was recorded was actually 2.42 m, which illustrates I would not have gotten an astronomically low probability for at least 2.42 m.
To be clear, I'm not accusing you of removing outliers from your data. I'm saying that you can't rule out medium-small probabilities of your model being badly off based on all the direct data you have access to, when you have so few data points to fit your model (not due to your fault, but because reality only gave you so many data points to look at).
My guess is that randomly selecting 1000 data points of human height and fitting a distribution will more likely than not generate a ~normal distribution, but this is just speculation; I haven't done the data analysis myself.
What do you think is the annualised probability of a nuclear war or volcanic eruption causing human extinction in the next 10 years? Do you see any concrete scenarios where the probability of a nuclear war or volcanic eruption causing human extinction is close to Toby's values?
I haven't been able to come up with a good toy model or bounds that I'm happy with, after thinking about it for a bit (I'm sure less than you or Toby or others like Luisa). If you or other commenters have models that you like, please let me know!
(In particular I'd be interested in a good generative argument for the prior).
Am I confident that someone born in 2024 can't grow to be 242cm? Nope. I just don't trust the statistical modeling all that much. (If you disagree and are willing to offer 1,000,000:1 odds on this question, I'll probably be willing to bet on it).
I do not want to take this bet, but I am open to other suggestions. For example, I think it is very unlikely that transformative AI, as defined in Metaculus, will happen in the next few years.
As I have said:
Thanks for engaging! For some reason I didnât get a notification on this comment.
Broadly I think youâre not accounting enough for model uncertainty. I absolutely agree that arguments like mine above are too agnostic if deployed very generally. It will be foolish to throw away all information we have in favor of very agnostic flat priors.
However, I think most of those situations are importantly disanalogous to the question at hand here. To answer most real-world questions, we have multiple lines of reasoning consistent with our less agnostic models, and can reduce model uncertainty by appealing to more than one source of confidence.
I will follow your lead and use the âCan a human born in 2024 grow to the length of one astronomical unitâ question to illustrate why I think that question is importantly disanalogous to the probability of conflict causing extinction. I think there are good reasons both outside the usual language of statistical modeling, and within it. I will focus on âoutsideâ because I think thatâs a dramatically more important point, and also more generalizable. But I will briefly discuss a âwithin-modelâ reason as well, as it might be more persuasive to some readers (not necessarily yourself).
Below is Vascoâs analogy for reference:
Outside-model reasons:
For human height we have very strong principled, scientific, reasons to believe that someone born in 2024 cannot grow to a height larger than one astronomical unit. Note that none of them appeal to a normal distribution of human height:
Square-cube law of biomechanics. Muscles in a human-shaped body simply cannot support a very tall person, even to say the length of a medium-sized whale. Certainly not vertically.
Calories. I havenât ran the numbers, but I strongly suspect that there isnât enough food calories on Earth for someone born in 2024 to grow all the way to the sun. Even if there is, you can add a few zeroes of doubt by questioning the economic feasibility/âwillingness for Earth to supply the calories towards the sole purpose of someone growing insanely tall.
Oxygen&atmosphere. We know enough about how human biology works to know that you canât breathe outside of the atmosphere.
I suppose you can arrange for a helmet + some other method to pump oxygen, but you also need some way to protect the skin.
Structural stability. human flesh and bone is only so stable so I donât think you can get very long even if (1) is solved.
Lifespan and âGrowthspanâ. Even if we magically solve all the other physical problems, as a numerical issue, growing even 2 feet/âyear is insanely fast for humans, and that gets you to only 240 feet (~70 meters) in a (generously) 120 year lifespan where someone never stops growing until they die. So a non-enhanced âhumanâ, even if we no longer use Earth physics but instead âmovie/âstorybook physics,â could not, as a numerical matter, even reach the size of a medium-sized skyscraper, never mind leave the atmosphere.
etc
I say all of this as someone without much of a natural sciences background. I suspect if I talk to a biophysicist or astrophysicist or someone who studies anatomy, they can easily point to many more reasons why the proposed forecast is impractical, without appealing to the normal distribution of human height. All of these point strongly against human height reaching very large values, never mind literally one astronomical unit. Now some theoretical reasons can provide us with good explanatory power about the shape of the statstical distribution as well (for example, things that point to the generator of human height being additive and thus following the Central Limit Theorem), but afaik those theoretical arguments are weaker/âlower confidence than the bounding arguments.
In contrast, if the only thing I knew about human height is the empirical observed data on human height so far, (eg itâs just displayed as a column of numbers), plus some expert assurances that the data fits a normal distribution extremely well, I will be much less confident that human height cannot reach extremal values.
Put more concretely, human average male height is maybe 172cm with a standard deviation of 8cm (The linked source has a ton of different studies; Iâm just eyeballing the relevant numbers). Ballpark 140 million people born a year, ~50% of which are men. This corresponds to ~5.7 sds. 5.7 sds past 172cm is 218cm. If we throw in 3 more s.d.s (which corresponds to >>1,000,000 difference on likelihood on a normal distribution), we get to 242cm. This is a result that will be predicted as extremely unlikely from statistically modeling a normal distribution, but âallowedâ with the more generous scientific bounds from 1-6 above (at least, theyâre allowed at my own level of scientific sophistication; again Iâm not an expert on the natural sciences).
Am I confident that someone born in 2024 canât grow to be 1 AU tall? Yep, absolutely.
Am I confident that someone born in 2024 canât grow to be 242cm? Nope. I just donât trust the statistical modeling all that much.
(If you disagree and are willing to offer 1,000,000:1 odds on this question, Iâll probably be willing to bet on it).
This is why I think itâs important to be able to think about a problem from multiple angles. In particular, it often is useful (outside of theoretical physics?) to think of real physical reality as a real concrete thing with 3 dimensions, not (just) a bunch of abstractions.[1]
Statistical disagreement
I just don't have much confidence that you can cleanly differentiate a power law or log-normal distribution from more extreme distributions from the data alone. One of my favorite mathy quotes is "any extremal distribution looks like a straight-line when drawn on a log-log plot with a fat marker[2]".
Statistically, when your sample size is <1000, it's not hard to generate a family of distributions that have much-higher-than-observed numbers with significant but <1/10,000 probability. Theoretically, I feel like I need some (ideally multiple) underlying justifications[3] for a distribution's shape before doing a bunch of curve fits and calling it a day. Or like, you can do the curve fit, but I don't see how the curve fit alone gives us enough info to rule out 1 in 100,000 or 1 in 1,000,000 or 1 in 1,000,000,000 possibilities.
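One way to make the small-sample point concrete is the following minimal Python sketch. The setup is an assumption for illustration, not something from the thread: take any well-behaved base distribution and mix in a rare "extreme" component with weight eps (a hypothetical parameter). The chance that n independent draws contain no extreme draw at all is (1 - eps)^n, so a sample of 1000 will usually look exactly like the base distribution.

```python
def prob_sample_misses_tail(n: int, eps: float) -> float:
    """Probability that n i.i.d. draws from a mixture contain no draw
    from a rare component whose mixture weight is eps."""
    return (1.0 - eps) ** n

N_SAMPLES = 1000  # sample size mentioned in the comment above

# Hypothetical weights for the rare extreme component (illustrative only).
miss_probs = {
    eps: prob_sample_misses_tail(N_SAMPLES, eps)
    for eps in (1e-4, 1e-5, 1e-6, 1e-9)
}

for eps, p in miss_probs.items():
    print(f"eps = {eps:.0e}: P(no extreme draw in {N_SAMPLES} samples) = {p:.6f}")
```

Even at eps = 1/10,000 the sample misses the extreme component about 90% of the time, so a curve fit to such a sample cannot tell the mixture apart from the base distribution.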
Now for normal distributions, or normal-ish distributions, this may not matter all that much in practice. As you say, "height roughly follows a normal distribution," so as long as a distribution is ~roughly normal, some small divergences don't get you too far away (maybe with a slightly differently shaped underlying distribution that fits the data it's possible to get a 242 cm human, maybe even 260 cm, but not 400 cm, and certainly not 4000 cm). But for extremal distributions this clearly matters a lot. Different extremal distributions predict radically different things at the tail(s).
[1] Early on in my career as a researcher, I reviewed a white paper which made this mistake rather egregiously. Basically they used a range of estimates for 2 different factors (cell size and number of cells per feeding tank) to come to a desired conclusion, without realizing that the two factors combined led to a physical impossibility (the cells would then compose exactly 100% of the tank).
[2] I usually deploy this line when arguing against people who claim they discovered a power law when I suspect something like ~log-normal might be a better fit. But obviously it works in the other direction as well; the main issue is model uncertainty.
[3] Tho tbh, even if you did have strong, principled justifications for a distribution's shape, I still feel like it's hard to get >2 additional OOMs of confidence in a distribution's underlying shape for non-synthetic data. ("These factors all sure seem relevant, and there are a lot of these factors, and we have strong principled reasons to see why they're additive to the target, so the central limit theorem surely applies" sure seems pretty persuasive. But I don't think it's more than 100:1 odds persuasive to me.)
To add to this, assuming your numbers are right (I haven't checked), there have been multiple people born since 1980 who ended up taller than 242 cm, which I expect would make any single normal distribution extremely unlikely to be a good model (either a poor fit on the tails, or a poor fit on a large share of the data), given our data: https://en.m.wikipedia.org/wiki/List_of_tallest_people
I suppose some have specific conditions that led to their height. I don't know if all or most do.
Thanks, Linch. Strongly upvoted.
Right, by "the probability of huge heights is negligible", I meant way more than 2.42 m, such that the details of the distribution would not matter. I would not get an astronomically low probability of at least such a height based on the methodology I used to get an astronomically low chance of a conflict causing human extinction. To arrive at this, I looked into the empirical tail distribution. I did not fit a distribution to the 25th to 75th percentile range, which is probably what would have suggested a normal distribution for height, and then extrapolate from there. I said I got an annual probability of conflict causing human extinction lower than 10^-9 using 33 or fewer of the rightmost points of the tail distribution. The 33rd tallest person whose height was recorded was actually 2.42 m, which illustrates I would not have gotten an astronomically low probability for at least 2.42 m.
I agree. What do you think is the annualised probability of a nuclear war or volcanic eruption causing human extinction in the next 10 years? Do you see any concrete scenarios where the probability of a nuclear war or volcanic eruption causing human extinction is close to Toby's values?
I think power laws overestimate extinction risk. They imply the probability of going from 80 M annual deaths to extinction would be the same as going from extinction to 800 billion annual deaths, which very much overestimates the risk of large death tolls. So it makes sense the tail distribution eventually starts to decay much faster than implied by a power law, especially if this is fitted to the left tail.
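The scale-invariance point can be sketched in a few lines of Python. All parameter values below (the Pareto exponent and minimum, the lognormal mu and sigma) are hypothetical and chosen only for illustration; they are not fitted to any conflict data, and extinction is taken as ~8 billion deaths, the round figure implied by the 80 M and 800 billion numbers above.

```python
import math

def pareto_survival(x: float, x_min: float, alpha: float) -> float:
    """P(X > x) for a Pareto (power-law) tail with minimum x_min and exponent alpha."""
    return (x_min / x) ** alpha if x >= x_min else 1.0

def lognormal_survival(x: float, mu: float, sigma: float) -> float:
    """P(X > x) for a lognormal with log-mean mu and log-s.d. sigma."""
    return 0.5 * math.erfc((math.log(x) - mu) / (sigma * math.sqrt(2.0)))

def prob_100x_given_exceeds(survival, x: float) -> float:
    """P(X > 100x | X > x) for a given survival function."""
    return survival(100.0 * x) / survival(x)

# Hypothetical parameters, for illustration only.
X_MIN, ALPHA = 1e6, 0.5
MU, SIGMA = math.log(1e6), 3.0

pareto = lambda x: pareto_survival(x, X_MIN, ALPHA)
lognormal = lambda x: lognormal_survival(x, MU, SIGMA)

# 80 M -> 8 billion (extinction) versus 8 billion -> 800 billion deaths.
pareto_80m = prob_100x_given_exceeds(pareto, 80e6)
pareto_8b = prob_100x_given_exceeds(pareto, 8e9)
lognormal_80m = prob_100x_given_exceeds(lognormal, 80e6)
lognormal_8b = prob_100x_given_exceeds(lognormal, 8e9)

print(f"power law: {pareto_80m:.4f} vs {pareto_8b:.4f}")
print(f"lognormal: {lognormal_80m:.4f} vs {lognormal_8b:.4f}")
```

Under the power law the two conditional probabilities are identical (both equal 100^-alpha), whereas under the lognormal the second is smaller, i.e. the tail decays faster than a power law would imply.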
On the other hand, I agree it is unclear whether the above tail distribution suggests an annual probability of a conflict causing human extinction above/below 10^-9. Still, even my inside view annual extinction risk from nuclear war of 5.53*10^-10 (which makes no use of the above tail distribution) is only 0.0111 % (= 5.53*10^-10/(5*10^-6)) of Toby's value.
To be clear, I'm not accusing you of removing outliers from your data. I'm saying that you can't rule out medium-small probabilities of your model being badly off based on all the direct data you have access to, when you have so few data points to fit your model (not through any fault of yours, but because reality only gave you so many data points to look at).
My guess is that randomly selecting 1000 data points of human height and fitting a distribution will more likely than not generate a ~normal distribution, but this is just speculation; I haven't done the data analysis myself.
I haven't been able to come up with a good toy model or bounds that I'm happy with, after thinking about it for a bit (I'm sure less than you or Toby or others like Luisa). If you or other commenters have models that you like, please let me know!
(In particular I'd be interested in a good generative argument for the prior.)
I do not want to take this bet, but I am open to other suggestions. For example, I think it is very unlikely that transformative AI, as defined in Metaculus, will happen in the next few years.