(I agree that geometric-mean-of-odds is an irrelevant statistic and ‘Dissolving’ AI Risk’s headline number should be the mean-of-probabilities, 9.7%. I think some commenters noticed that too.)
Question: Do you happen to understand what it means to take a geometric mean of probabilities? In re-reading the paper, I’m realizing I don’t understand the methodology at all. For example, if there is a 33% chance we live in a world with 0% probability of doom, a 33% chance we live in a world with 50% probability of doom, and a 33% chance we live in a world with 100% probability of doom… then the geometric mean is (0% x 50% x 100%)^(1/3) = 0%, right?
Edit: Apparently the paper took a geometric mean of odds ratios, not probabilities. But this still means that had a single surveyed person said 0%, the entire model would collapse to 0%, which is wrong on its face.
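A quick numerical check of that collapse (my own sketch, not the paper's code; `geomean_odds` and the sample numbers are made up for illustration):

```python
import numpy as np

def geomean_probs(ps):
    # Geometric mean of raw probabilities.
    return float(np.prod(ps) ** (1 / len(ps)))

def geomean_odds(ps):
    # Geometric mean of odds, mapped back to a probability.
    odds = np.asarray(ps) / (1 - np.asarray(ps))
    agg = np.prod(odds) ** (1 / len(odds))
    return float(agg / (1 + agg))

print(geomean_probs([0.0, 0.5, 1.0]))  # 0.0: one 0% answer zeroes the whole mean
print(geomean_odds([0.0, 0.5, 0.9]))   # 0.0: the odds version collapses the same way
print(np.mean([0.0, 0.5, 1.0]))        # 0.5: the arithmetic mean is unaffected
```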
Yeah, I agree; I think the geometric mean is degenerate unless your probability distribution approaches zero density quickly near 0% and 100%. This is an intuition pump for why the geometric mean is the wrong statistic.
Also if you’re taking the geometric mean I think you should take it of the odds ratio (as the author does) rather than the probability; e.g. this makes probability-0 symmetric with probability-1.
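A quick numerical check of that symmetry (my own sketch; the probabilities are arbitrary):

```python
import numpy as np

def geomean_odds(ps):
    # Geometric mean of odds, mapped back to a probability.
    odds = np.asarray(ps) / (1 - np.asarray(ps))
    agg = np.prod(odds) ** (1 / len(odds))
    return float(agg / (1 + agg))

ps = np.array([0.1, 0.3, 0.8])
# Aggregating P(event) and P(not event) gives complementary answers:
print(geomean_odds(ps) + geomean_odds(1 - ps))  # 1.0 exactly
# The geometric mean of raw probabilities has no such symmetry:
gm = lambda x: float(np.prod(x) ** (1 / len(x)))
print(gm(ps) + gm(1 - ps))  # ~0.79, not 1.0
```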
[To be clear I haven’t read most of the post.]
I have gripes with the methodology of the article, but I don't think highlighting the geometric mean of odds over the mean of probabilities is a major fault. The core problem is assuming independence between the predictions at each stage. The right move would have been to aggregate each forecaster's total P(doom) using the geometric mean of odds (not that I think that asking random people and aggregating their beliefs like this is particularly strong evidence).
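To make the independence point concrete, here is a toy sketch (the two forecasters and all numbers are invented; `geomean_odds` is my own helper). Aggregating stage-by-stage and then multiplying washes out the correlation between a forecaster's answers across stages, while aggregating each forecaster's total P(doom) keeps it:

```python
import numpy as np

def geomean_odds(ps):
    # Geometric mean of odds, mapped back to a probability.
    odds = np.asarray(ps, dtype=float) / (1 - np.asarray(ps, dtype=float))
    agg = np.prod(odds) ** (1 / len(odds))
    return float(agg / (1 + agg))

# Rows are forecasters, columns are stages of the doom scenario.
# One pessimist answers high everywhere; one skeptic answers low everywhere.
stages = np.array([
    [0.9, 0.9, 0.9],   # forecaster A
    [0.1, 0.1, 0.1],   # forecaster B
])

# Stage-wise: aggregate each stage across forecasters, then multiply,
# implicitly treating the stages as independent.
per_stage = np.prod([geomean_odds(stages[:, j]) for j in range(stages.shape[1])])

# Per-forecaster: each forecaster's total P(doom) first, then aggregate.
per_forecaster = geomean_odds(np.prod(stages, axis=1))

print(per_stage)       # 0.125
print(per_forecaster)  # ~0.049: each forecaster's overall view stays intact
```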
The intuition pump that the geomean aggregate breaks as soon as a single person assigns a zero percent chance is flawed:
There is an equally compelling pump in the other direction: the arithmetic mean of probabilities defers unduly to people assigning a high chance. A single dissenter among 10 experts can single-handedly floor the aggregate at one tenth of whatever probability they prefer.
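Concretely (numbers mine):

```python
import numpy as np

# Nine forecasters near 0%, one dissenter at 100%:
probs = [0.001] * 9 + [1.0]
print(np.mean(probs))  # ~0.10: the lone dissenter floors the mean at 1/10
```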
And surely if anyone is assigning a zero percent chance to something, you can safely assume they are not taking the situation seriously and ignore them.
Ultimately, we can theorize all we want, but as a matter of empirical record the best performance when predicting complex events is achieved by taking the geometric mean of odds, in terms of both log loss and Brier score. Without more compelling evidence, or a very clear theoretical reason that distinguishes between the contexts, it seems weird to argue that we should treat AI risk differently.
And if you are still worried about dissenters skewing the predictions, one common strategy is to winsorize: clip the predictions to, say, the 5th and 95th percentiles.
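For example, a minimal sketch using plain numpy percentile clipping (the sample forecasts are invented; scipy.stats.mstats.winsorize would do the same job):

```python
import numpy as np

def winsorize(ps, lower=5, upper=95):
    # Clip forecasts to the 5th and 95th percentiles of the sample.
    lo, hi = np.percentile(ps, [lower, upper])
    return np.clip(ps, lo, hi)

def geomean_odds(ps):
    # Geometric mean of odds, mapped back to a probability.
    odds = np.asarray(ps) / (1 - np.asarray(ps))
    agg = np.prod(odds) ** (1 / len(odds))
    return float(agg / (1 + agg))

probs = np.array([0.0, 0.05, 0.1, 0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 1.0])
# The raw sample contains a 0% and a 100%, which would make the geomean
# of odds degenerate; after winsorizing, the aggregate is well-behaved.
print(geomean_odds(winsorize(probs)))
```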