Congrats to the winners, readers, and writers!
Two big surprises for me:
(1) It seems like 5⁄6 of the essays are about AI risk, and not TAGI by 2043. I thought there were going to be 3 winners on each topic, but perhaps that was never stated in the rules. Rereading, it just says there would be two 1st places, two 2nd places, and two 3rd places. Seems the judges were more interested in (or persuaded by) arguments on AI safety & alignment, rather than TAGI within 20 years. A bit disappointing for everyone who wrote on the second topic. If the judges were more interested in safety & alignment forecasting, that would have been nice to know ahead of time.
(2) I’m also surprised that the Dissolving AI Risk paper was chosen. (No disrespect intended; it was clearly a thoughtful piece.)
To me, it makes perfect sense to dissolve the Fermi paradox by pointing out that the expected # of alien civilizations is a very different quantity than the probability of 0 alien civilizations. It’s logically possible to have both a high expectation and a high probability of 0.
But it makes almost no sense to me to dissolve probabilities by factoring them into probabilities of probabilities, and then take the geometric mean of that distribution. Taking the geometric mean of subprobabilities feels like a sleight of hand to end up with a lower number than what you started with, with zero new information added in the process. I feel like I must have missed the main point, so I’ll reread the paper.
Edit: After re-reading, it makes more sense to me. The paper takes the geometric mean of odds ratios in order to aggregate survey entries. It doesn't take the geometric mean of probabilities, and it doesn't slice up probabilities arbitrarily (the distribution is simply the spread across surveyed forecasters).
Edit2: As Jaime says below, the greater error is assuming independence of each stage. The original discussion got quite nerd-sniped by the geometric averaging, which is a bit of a shame, as there’s a lot more to the piece to discuss and debate.
(I agree that geometric-mean-of-odds is an irrelevant statistic and ‘Dissolving’ AI Risk’s headline number should be the mean-of-probabilities, 9.7%. I think some commenters noticed that too.)
Question: Do you happen to understand what it means to take a geometric mean of probabilities? In re-reading the paper, I’m realizing I don’t understand the methodology at all. For example, if there is a 33% chance we live in a world with 0% probability of doom, a 33% chance we live in a world with 50% probability of doom, and a 33% chance we live in a world with 100% probability of doom… then the geometric mean is (0% x 50% x 100%)^(1/3) = 0%, right?
Edit: Apparently the paper took a geometric mean of odds ratios, not probabilities. But this still means that had a single surveyed person said 0%, the entire model would collapse to 0%, which is wrong on its face.
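For concreteness, here's a minimal sketch of that collapse (Python, with made-up survey numbers, not the paper's data): a single 0% answer zeroes out the geometric mean of odds, while the arithmetic mean of probabilities barely moves.

```python
import numpy as np

# Made-up survey numbers (not from the paper); one forecaster answers 0%.
p = np.array([0.0, 0.10, 0.30])

odds = p / (1 - p)                             # odds of a 0% forecast is 0
pooled_odds = odds.prod() ** (1 / len(odds))   # geometric mean of odds = 0.0
pooled_prob = pooled_odds / (1 + pooled_odds)  # back to a probability: 0.0

print(pooled_prob)  # 0.0 -- the single 0% answer zeroes the aggregate
print(p.mean())     # ~0.133 -- the arithmetic mean is unaffected
```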
Yeah, I agree; I think the geometric mean is degenerate unless your probability distribution's density falls to 0 quickly near 0% and 100%. This is an intuition pump for why the geometric mean is the wrong statistic.
Also if you’re taking the geometric mean I think you should take it of the odds ratio (as the author does) rather than the probability; e.g. this makes probability-0 symmetric with probability-1.
[To be clear I haven’t read most of the post.]
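A quick numeric check of the symmetry point above (illustrative forecasts only, my own numbers): pooling forecasts of "doom" and of "not doom" with the geometric mean of odds gives exactly complementary answers, which the geometric mean of raw probabilities does not.

```python
import numpy as np

def pool_geomean_odds(p):
    """Geometric mean of odds, mapped back to a probability."""
    odds = p / (1 - p)
    g = odds.prod() ** (1 / len(odds))
    return g / (1 + g)

p = np.array([0.2, 0.4, 0.7])  # arbitrary forecasts

print(pool_geomean_odds(p) + pool_geomean_odds(1 - p))  # 1.0: symmetric

# Geometric mean of raw probabilities lacks this symmetry:
gp = p.prod() ** (1 / len(p))
gq = (1 - p).prod() ** (1 / len(p))
print(gp + gq)  # ~0.91, not 1.0
```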
I have gripes with the methodology of the article, but I don't think highlighting the geometric mean of odds over the mean of probabilities is a major fault. The core problem is assuming independence between the predictions at each stage. The right move would have been to aggregate the total P(doom) of each forecaster using the geometric mean of odds (not that I think that asking random people and aggregating their beliefs like this is particularly strong evidence).
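A toy illustration of why that ordering matters (my own construction, not the paper's data): pooling each stage across forecasters and then multiplying implicitly treats the stages as independent, and gives a different answer than first computing each forecaster's total P(doom) and pooling those.

```python
import numpy as np

def pool_geomean_odds(p):
    odds = p / (1 - p)
    g = odds.prod() ** (1 / len(odds))
    return g / (1 + g)

# Toy numbers: rows = forecasters, columns = conditional stages.
# Each forecaster's total P(doom) is the product of their own row.
stages = np.array([[0.9, 0.9],
                   [0.5, 0.5],
                   [0.1, 0.1]])

# Stage-wise: pool each stage across forecasters, then multiply
# (assumes the stages are independent of each other).
stagewise = np.prod([pool_geomean_odds(stages[:, j]) for j in range(2)])

# Forecaster-wise: each person's total first, then pool
# (keeps each forecaster's internal correlations intact).
forecasterwise = pool_geomean_odds(stages.prod(axis=1))

print(stagewise, forecasterwise)  # 0.25 vs ~0.20 -- the procedures disagree
```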
The intuition pump that if someone assigns a zero percent chance then the geomean aggregate breaks is flawed:
There is an equally compelling pump the other way around: the arithmetic mean of probabilities defers unduly to people assigning a high chance. A single dissenter among 10 experts single-handedly sets a floor on the aggregate: the arithmetic mean can never fall below the dissenter's prediction divided by 10.
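A one-line illustration of that floor (hypothetical numbers): nine experts near zero and one dissenter at 50%.

```python
import numpy as np

p = np.array([0.001] * 9 + [0.5])  # nine experts at 0.1%, one dissenter at 50%
print(p.mean())                    # ~0.051: the mean can't drop below 0.5 / 10 = 5%
```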
And surely if anyone is assigning a zero percent chance to something, you can safely assume they are not taking the situation seriously and ignore them.
Ultimately, we can theorize all we want, but as a matter of empirical fact the best performance when predicting complex events is achieved by taking the geometric mean of odds, in terms of both log-loss and Brier scores. Without more compelling evidence or a very clear theoretical reason that distinguishes between the contexts, it seems weird to argue that we should treat AI risk differently.
And if you are still worried about dissenters skewing the predictions, one common strategy is to winsorize: clip the predictions to, for example, the 5th and 95th percentiles before pooling.
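A minimal sketch of winsorizing (illustrative forecasts only):

```python
import numpy as np

def winsorize(p, lo=5, hi=95):
    """Clip forecasts to the [lo-th, hi-th] percentile range."""
    low, high = np.percentile(p, [lo, hi])
    return np.clip(p, low, high)

p = np.array([0.0, 0.02, 0.05, 0.05, 0.10, 0.10, 0.15, 0.20, 0.30, 0.99])
pw = winsorize(p)
# The 0% and 99% outliers are pulled inward, so a geometric mean of odds
# no longer collapses to 0 or gets dragged up by one extreme answer.
```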