This is some advice I wrote about doing back-of-the-envelope calculations (BOTECs) and uncertainty estimation, which are often useful as part of forecasting. This advice isn’t supposed to be a comprehensive guide by any means. The advice originated from specific questions that someone I was mentoring asked me. Note that I’m still fairly inexperienced with forecasting. If you’re someone with experience in forecasting, uncertainty estimation, or BOTECs, I’d love to hear how you would expand or deviate from this advice.
How to do uncertainty estimation?
A BOTEC estimates one number from a series of calculations. So I think a good way to estimate uncertainty is to assign a credible interval to each input of the calculation, then propagate the uncertainty in the inputs through to the output.
I recommend Squiggle for this (the Python version is https://github.com/rethinkpriorities/squigglepy/).
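As a concrete illustration, here is a minimal Monte Carlo sketch of this kind of propagation in plain Python/NumPy. The inputs (cost per unit and number of units) and their 90% intervals are invented for the example; Squiggle/squigglepy build the distributions and do the sampling for you, so in practice you wouldn’t hand-roll this.

```python
# A minimal Monte Carlo sketch of propagating input uncertainty through a
# BOTEC. The inputs (cost per unit, number of units) and their 90% intervals
# are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
Z90 = 1.6449  # z-score bounding a central 90% interval

def lognormal_from_90ci(lo, hi, size):
    """Sample a log-normal whose central 90% interval is (lo, hi)."""
    mu = (np.log(lo) + np.log(hi)) / 2
    sigma = (np.log(hi) - np.log(lo)) / (2 * Z90)
    return rng.lognormal(mu, sigma, size)

# Hypothetical BOTEC: total cost = cost per unit * number of units
cost_per_unit = lognormal_from_90ci(10, 100, N)    # 90% CI: 10 to 100
num_units = lognormal_from_90ci(1_000, 5_000, N)   # 90% CI: 1,000 to 5,000
total_cost = cost_per_unit * num_units

print("median:", np.median(total_cost))
print("90% interval:", np.percentile(total_cost, [5, 95]))
```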
How to assign a credible interval:
Normally I choose a 90% interval. This is the default in Squiggle.
If you have a lot of data about the thing (say, >10 values), and the sample of data doesn’t seem particularly biased, then it might be reasonable to use the standard deviation of the data. (Measure this in log-space if you have reason to think it’s distributed log-normally; see the next point about choosing the distribution.) Then compute the 90% credible interval as the mean ± 1.645 × the standard deviation, assuming a (log-)normal distribution.
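For instance, here is how that rule of thumb looks in code (the data values below are placeholders):

```python
# Turning a sample of data into a 90% credible interval, per the rule of
# thumb above. The data values are placeholders.
import numpy as np

data = np.array([3.1, 4.7, 2.2, 5.9, 3.8, 4.1, 2.9, 6.3, 3.5, 4.9, 5.2])
Z90 = 1.6449

# If the values look roughly normally distributed:
mean, std = data.mean(), data.std(ddof=1)
print("90% CI (normal):", (mean - Z90 * std, mean + Z90 * std))

# If the values could plausibly span orders of magnitude, work in log-space:
log_mean, log_std = np.log(data).mean(), np.log(data).std(ddof=1)
print("90% CI (log-normal):",
      (np.exp(log_mean - Z90 * log_std), np.exp(log_mean + Z90 * log_std)))
```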
How to choose the distribution:
It’s usually a choice between log-normal and normal.
If the variable seems like the sort of thing that could vary by orders of magnitude, then log-normal is best. Otherwise, normal.
You can use the data points you have, or the credible interval you chose, to inform this.
When in doubt, I’d say that most of the time (for AI-related BOTECs) a log-normal distribution is a good choice. Log-normal is also the default distribution in Squiggle when you specify a credible interval.
A uniform distribution might occasionally be useful if there are strict lower and upper bounds on the value and it varies roughly uniformly between them. You can also clip other distributions to strict bounds in Squiggle using lclip and rclip.
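Here is a hedged sketch of these choices in squigglepy; the numeric bounds are invented, and you should check the library’s README for the exact constructor and clipping syntax, which I’m assuming mirrors Squiggle’s.

```python
# Sketch of the distribution choices above using squigglepy
# (https://github.com/rethinkpriorities/squigglepy/). Bounds are invented;
# the exact API should be checked against the README.
import squigglepy as sq

# Could vary by orders of magnitude -> log-normal with a 90% CI of 1 to 100
speedup = sq.lognorm(1, 100)

# Roughly symmetric with modest spread -> normal with a 90% CI of 40 to 60
cost = sq.norm(40, 60)

# Hard bounds and roughly uniform within them -> uniform
fraction = sq.uniform(0, 1)

# Clipping a log-normal to respect a strict upper bound
clipped = sq.lognorm(1, 100, rclip=500)

samples = sq.sample(speedup, n=10_000)
```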
If you do sanity checks and they conflict with your main estimate, how do you update?
This goes without saying, but double-check the calculations. Don’t go on a deep-dive before being confident that the calculations are implemented correctly.
Account for uncertainty. Are the estimates really in conflict, or could their credible intervals overlap? (See the sketch after these steps.)
Consult other people about why this conflict occurred and how it could be resolved.
If you have an explicit model that produced your original estimate, then I think it’s best to first try to find the flaw in your model. If it’s a flaw that could be patched somehow, then patch it and see if there is still conflict.
If there’s no clear way to patch your model, then try a different model entirely. See if that alternate model’s estimate is in conflict with the sanity-check value. If there’s no conflict (or less conflict), then the new model is most likely a better one.
If there’s no alternate model that you can feasibly use, then you might resort to adjusting your estimate directly by some fudge factor, or averaging your estimate with the sanity-check value. But be sure to communicate in your write-up about the original estimates that conflicted, and explain how you resolved the conflict.
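For the overlap check mentioned above, a quick way is to compare the two estimates’ 90% intervals rather than their point estimates. The sampled distributions below are stand-ins for a model output and a sanity-check estimate, both assumed to come with their own uncertainty:

```python
# Check whether two estimates genuinely conflict by comparing their 90%
# intervals rather than their point estimates. The sampled distributions
# are stand-ins for a model output and a sanity-check estimate.
import numpy as np

rng = np.random.default_rng(1)
model_output = rng.lognormal(np.log(50), 0.5, 100_000)
sanity_check = rng.lognormal(np.log(120), 0.4, 100_000)

lo1, hi1 = np.percentile(model_output, [5, 95])
lo2, hi2 = np.percentile(sanity_check, [5, 95])
overlap = max(lo1, lo2) <= min(hi1, hi2)
print(f"model: [{lo1:.0f}, {hi1:.0f}]  check: [{lo2:.0f}, {hi2:.0f}]  overlap: {overlap}")
```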
How should you make a central estimate or best guess from a small number of data points (e.g. three)?
If the data points vary by large factors or orders of magnitude, then the geometric mean is probably best, since it’s equivalent to the arithmetic mean on a logarithmic scale.
Otherwise, the arithmetic mean is fine.
If you think that the credibility of each data point varies significantly, you should assign different weights to each data point.
I don’t know of a general, principled way to set weights; it seems pretty intuition-based. But if one data point seems twice as credible or incorporates information that is twice as reliable, for instance, then it makes sense to assign it twice as much weight.
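As a worked sketch of the options above, with three invented data points and the first judged twice as credible as the others:

```python
# Central-estimate options for a handful of data points. The values and
# weights are invented for illustration.
import numpy as np

points = np.array([2.0, 15.0, 400.0])   # spans orders of magnitude
weights = np.array([2.0, 1.0, 1.0])     # first point judged twice as credible

arithmetic = points.mean()                     # ~139: dominated by the largest value
geometric = np.exp(np.log(points).mean())      # ~22.9: the mean on a log scale
weighted_geometric = np.exp(np.average(np.log(points), weights=weights))

print(arithmetic, geometric, weighted_geometric)
```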
TL;DR: are there any forum posts or similarly accessible writing that clarify different notions of x-risk? If not, does it seem worth writing?
My impression is that prevailing notions of x-risk (i.e. what it means, not specific cause areas) have broadened or shifted over time, but there’s a lack of clarity about what notion/definition people are basing arguments on in discourse.
At the same time, discussion of x-risk sometimes seems too narrow. For example, in the most recent 80K podcast with Will MacAskill, they at one point talk about x-risk in terms of literal 100% human annihilation. IMO this is one of the least relevant notions of x-risk, for cause prioritisation purposes. Perhaps there’s a bias because literal human extinction is the most concrete/easy to explain/easy to reason about? Nowadays I frame longtermist cause prioritisation more like “what could cause the largest losses to the expected value of the future” than “what could plausibly annihilate humanity”.
Bostrom (2002) defined x-risk as “one where an adverse outcome would either annihilate Earth-originating intelligent life or permanently and drastically curtail its potential”. There is also a taxonomy in section 3 of the paper. Torres (2019) explains and analyses five different definitions of x-risk, which I think all have some merit.
To be clear, I think many people have internalised broader notions of x-risk in their thoughts and arguments, both generally and for specific cause areas. I just think it could use some clarification and a call for people to clarify themselves, e.g. in a forum post.
I’d love to see this post and generally more discussion of what kinds of x-risks and s-risks matter most. 80K’s views seem predicated on deeply held, nuanced, and perhaps unconventional views of longtermism, and it can be hard to learn all the context to catch up on those discussions.
One distinction I like is OpenPhil talking about Level 1 and Level 2 GCRs: https://www.openphilanthropy.org/blog/long-term-significance-reducing-global-catastrophic-risks
I definitely think it could be good to have a set of definitions written up on the Forum, especially if it’s briefer/easier to reference than the Torres paper or other academic papers defining X-risk. If you do end up writing something, I’d be happy to look it over before you publish!
Some discussion here, too, in the context of introducing s-risks:
https://foundational-research.org/s-risks-talk-eag-boston-2017/