Thanks for this post Luisa! Really nice resource and I wish I caught it earlier. A couple methodology questions:
Why do you choose an arithmetic mean for aggregating these estimates? It seems like there is an argument to be made that in this case we care about order-of-magnitude correctness, which would imply averaging the log probabilities. This is equivalent to taking the geometric mean (I believe) and is recommended for Fermi estimates, e.g. [here](https://www.lesswrong.com/posts/PsEppdvgRisz5xAHG/fermi-estimates).
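To make the difference concrete, here's a minimal sketch in Python (the numbers are made up for illustration, not the estimates from the post):

```python
import numpy as np

# Illustrative annual probability estimates (made up, not the post's data)
estimates = np.array([0.01, 0.004, 0.02, 0.0005])

arithmetic_mean = estimates.mean()

# Geometric mean = exponentiating the mean of the log probabilities;
# it is less dominated by the largest estimates and better preserves
# order-of-magnitude information.
geometric_mean = np.exp(np.log(estimates).mean())

print(f"arithmetic mean: {arithmetic_mean:.4f}")  # ~0.0086
print(f"geometric mean:  {geometric_mean:.4f}")   # ~0.0045
```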
Do you have a sense for how much, if at all, these estimates are confounded by time? Are all of them trying to guess the likelihood of war in the few years following the estimate, or do some have longer time horizons? (You mention this explicitly for a number of them, but I'm struggling to find it for all of them; sorry if I missed it.) If these are forecasting something close to an instantaneous yearly probability, do you think we should worry about adjusting estimates by when they were made, e.g. in case a lot has changed between 2005 and now?
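(If the horizons do differ, one rough way to put them on a common footing is the standard constant-hazard conversion; a sketch with made-up numbers below. It assumes the risk is roughly the same each year, which is of course part of what's in question.)

```python
# Convert a cumulative probability over an N-year horizon into the
# equivalent constant annual probability, assuming independent,
# identical risk each year.
def annualize(p_cumulative: float, years: float) -> float:
    return 1 - (1 - p_cumulative) ** (1 / years)

print(annualize(0.10, 30))  # a 10% chance over 30 years ~ 0.35% per year
```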
Related to the above, do you believe the risk of nuclear war is changing over time or approximately constant?
Did you consider any alternative schemes to weighting these estimates equally? I notice, for example, that the GJI estimate of US-Russia nuclear war is more than an order of magnitude lower than the rest, but GJI is also the group I'd put my money on based on forecasting track record. Do you find these estimates approximately equally credible?
Curious for your thoughts!
> Why do you choose an arithmetic mean for aggregating these estimates?

This is a good point.
I’d add that as a general rule when aggregating binary predictions one should default to the average log odds, perhaps with an extremization factor as described in Satopää et al. (2014).
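Roughly what I mean, as a sketch (the probabilities and the extremization factor `a` here are purely illustrative; Satopää et al. fit `a` empirically, and `a = 1` recovers the plain average of log odds):

```python
import numpy as np

def log_odds(p: float) -> float:
    return np.log(p / (1 - p))

def pool_log_odds(probs, a: float = 1.0) -> float:
    """Average the forecasts in log-odds space, optionally extremize by
    the factor a, then map back to a probability. a = 1 is the plain
    average of log odds (equivalently, the geometric mean of the odds)."""
    z = a * np.mean([log_odds(p) for p in probs])
    return 1 / (1 + np.exp(-z))

probs = [0.01, 0.004, 0.02, 0.0005]   # illustrative inputs only
print(pool_log_odds(probs))            # ~0.0045, plain log-odds average
print(pool_log_odds(probs, a=2.0))     # extremized: pushed further from 0.5
```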
The reasons are: a) empirically, it seems to work better; b) the way Bayes' rule works seems to suggest very strongly that log odds are the natural unit of evidence; c) apparently there are some deeper theoretical reasons ("external Bayesianism") why this is better, though the details go a bit over my head.
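To spell out (b): writing Bayes' rule in odds form and taking logs, each piece of evidence simply adds its log likelihood ratio to the prior log odds, which is why log odds feel like the natural scale to average on:

$$
\log \frac{P(H \mid E)}{P(\neg H \mid E)} \;=\; \log \frac{P(H)}{P(\neg H)} \;+\; \log \frac{P(E \mid H)}{P(E \mid \neg H)}
$$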
FYI, this post by Jaime has an extended discussion of this issue.