I want to push back a bit against the use of 0.00000000001% in this example. In particular, I was sort of assuming that experts are kind of calibrated, and if two human experts have that sort of disagreement:
Either this is the kind of scenario in which we’re discussing how a fair coin will land, and one of the experts has seen the coin
Or something is very, very wrong
In particular, with some light selection of experts (e.g., decent Metaculus forecasters), I think you'd almost never see this kind of scenario unless someone was trolling you. And if the 0.0..001% person was willing to bet a correspondingly high amount at those odds, I would probably weight their estimate very highly. In that case I think the geometric mean would in fact be appropriate.
Though I guess it wouldn't be appropriate if you're querying random experts who can sometimes be catastrophically wrong; in that case the arithmetic mean would be more robust.
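To make the contrast concrete, here's a minimal sketch with made-up numbers (one estimate at 10^-13, the rest around 10%): the single extreme estimate drags the geometric mean down by a couple of orders of magnitude, while the arithmetic mean barely moves.

```python
import math

# Hypothetical estimates from five experts: four around 10%,
# one claiming the event is essentially impossible (1e-13, i.e. 0.00000000001%).
estimates = [0.12, 0.10, 0.08, 0.11, 1e-13]

arithmetic_mean = sum(estimates) / len(estimates)

# Geometric mean: the n-th root of the product, computed in log space
# to avoid underflow with very small probabilities.
geometric_mean = math.exp(sum(math.log(p) for p in estimates) / len(estimates))

print(f"arithmetic mean: {arithmetic_mean:.3f}")  # ~0.082, barely affected
print(f"geometric mean:  {geometric_mean:.1e}")   # ~4.0e-04, pulled far down
```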
I see what you mean, though you will find that scientific experts often end up endorsing probabilities like these. They model the situation, run the calculation and end up with 10^-12 and then say the probability is 10^-12. You are right that if you knew the experts were Bayesian and calibrated and aware of all the ways the model or calculation could be flawed, and had a good dose of humility, then you could read more into such small claimed probabilities — i.e. that they must have a mass of evidence they have not yet shared. But we are very rarely in a situation like that. Averaging a selection of Metaculus forecasters may be close, but is quite a special case when you think more broadly about the question of how to aggregate expert predictions.
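To spell out why I wouldn't take a claimed 10^-12 at face value, here is a toy calculation (the specific numbers are assumptions for illustration, not anyone's actual estimates): even a small chance that the model or calculation is flawed ends up dominating the answer.

```python
# Toy decomposition:
#   P(event) = P(model correct) * P(event | model correct)
#            + P(model wrong)   * P(event | model wrong)
p_event_given_model_correct = 1e-12  # what the expert's calculation says
p_model_wrong = 1e-3                 # assumed chance the model/calculation is flawed
p_event_given_model_wrong = 1e-2     # assumed base rate if the model is off

p_event = ((1 - p_model_wrong) * p_event_given_model_correct
           + p_model_wrong * p_event_given_model_wrong)

print(f"{p_event:.2e}")  # ~1.00e-05: dominated by the model-error term, not the 1e-12
```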
Consider that if you're aggregating expert predictions, you might be generating probabilities too soon. Instead, you could, for instance, interview the subject-matter experts, make the transcript available to expert forecasters, and then aggregate the latter's probabilities. This might produce more accurate probabilities.
I endorse Nuño’s comment re: 0.00000000001%.
While it’s pretty easy to agree that the probability of a stupid mistake/typo is greater than 0.00000000001%, it is sometimes hard to follow in practice. I think Yudkowsky communicates it well, on a more visceral level, in his Infinite Certainty essay. I came to appreciate this point on another level after doing a calibration exercise for mental arithmetic — all my errors were unpredictable “oops” moments, like misreading a plus for a minus or selecting the wrong answer after doing the calculation correctly.