Identify and validate better methods of eliciting low-probability forecasts
I think this is important work, so I’m glad to hear that it’s a priority.
It’s a two-pronged approach, right? Measuring how reliable forecasters are at working with small probabilities, and using better elicitation methods to reduce the size of any resulting errors.
I suspect that when measuring how reliable forecasters are at working with small probabilities you’ll find a broad range of reliability. It would be interesting to see how the XPT forecasts change if you exclude those with poor small-probability understanding, or if you weight each response according to the forecaster’s aptitude.
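To make the re-weighting idea concrete, here’s a minimal sketch with invented forecasts and invented aptitude scores (the aptitude measure itself is assumed, not implemented, and this is not the XPT’s actual analysis):

```python
# Minimal sketch: two re-analyses of the same responses, one excluding
# low-aptitude forecasters, one weighting by aptitude. All numbers invented.
import statistics

responses = [
    # (forecast in %, small-probability aptitude score in [0, 1])
    (0.001, 0.9),
    (0.01, 0.4),
    (0.0005, 0.8),
    (0.05, 0.2),
    (0.002, 0.7),
]

# Re-analysis 1: drop forecasters below an aptitude threshold, then take the median.
threshold = 0.5
kept = [p for p, score in responses if score >= threshold]
print("median after exclusion:", statistics.median(kept))

# Re-analysis 2: aptitude-weighted mean (a weighted median would also work).
weighted = sum(p * score for p, score in responses) / sum(score for _, score in responses)
print("aptitude-weighted mean:", weighted)
```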
Using comparative judgements seems like a good avenue for exploration. Have you thought about any of the following?
Using “1-in-x” style probabilities instead of x%. This might be a more “natural” way of thinking about small probabilities (there’s a rough sketch of this, alongside the sense-check idea below, after the list).
Eliciting probabilities in steps: first get respondents to give the order of magnitude (is your prediction between 1-in-10 and 1-in-100, or between 1-in-100 and 1-in-1000, or…), then have them narrow down further. This is still more abstract than the “struck by lightning” idea, but it does not rely on the respondent’s level of lightning-strike knowledge.
Giving respondents sense-checks on the answers they have already given, and an opportunity to amend them: “your estimate of x-risk from Bio is 1% and your estimate of x-risk from natural pandemics is 0.8%, so you think 80% of the x-risk from Bio comes from natural pandemics”.
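A minimal sketch of how the “1-in-x” framing and the sense-check could fit together (illustrative Python, not anything the XPT actually uses; the numbers are the ones from the example above):

```python
# Convert a respondent's absolute estimates into "1-in-x" form and surface the
# ratio they imply, so the respondent can confirm or revise. Numbers invented.

def one_in_x(p: float) -> str:
    """Render a probability as a '1-in-x' string, e.g. 0.001 -> '1-in-1,000'."""
    return f"1-in-{round(1 / p):,}" if p > 0 else "0"

# Elicited answers, as probabilities (1% = 0.01).
bio_total = 0.01          # "x-risk from Bio is 1%"
natural_pandemic = 0.008  # "x-risk from natural pandemics is 0.8%"

print(f"Bio x-risk: {bio_total:.2%} ({one_in_x(bio_total)})")
print(f"Natural-pandemic x-risk: {natural_pandemic:.2%} ({one_in_x(natural_pandemic)})")
print(f"Implied: {natural_pandemic / bio_total:.0%} of Bio x-risk comes from "
      "natural pandemics. Does that match what you intended?")
```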
One more thing:
If we are not sure whether forecasters can tell 0.001% apart from 0.000001% (a magnitude difference of 1,000x), then we should treat a 0.000001% forecast of a catastrophic risk as if it were 0.001% and be much more cautious about potential dangers.
In theory, yes, but I think people are generally much more likely to say 0.001% when their “true” probability is 0.000001% than vice versa—maybe because we very rarely think about events of the order of 0.000001%, so 0.001% seems to cover the most unlikely events.
You might counter that we only need a small proportion of respondents to say 0.000001% when their “true” probability is 0.001% to risk undervaluing important risks. But not if we are using medians, as the XPT does.
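A toy illustration of why the median helps here, with invented numbers: even if a few respondents report 0.000001% while “meaning” 0.001%, the group median doesn’t move.

```python
import statistics

# Toy numbers: 20 respondents whose "true" answer is 0.001 (%); three of them
# misreport it as 0.000001 (%), i.e. roughly 1,000x too low.
reported = [0.001] * 17 + [0.000001] * 3

print(statistics.median([0.001] * 20))  # 0.001
print(statistics.median(reported))      # still 0.001: the median is unmoved
```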
I could be wrong about the above, but my take is that understanding the likely direction of errors in these “0.001% vs 0.000001%” scenarios probably ought to be a priority.