I want to push back a bit against the use of 0.00000000001% in this example. In particular, I was sort of assuming that experts are kind of calibrated, and if two human experts have that sort of disagreement:
Either this is the kind of scenario in which we’re discussing how a fair coin will land, and one of the experts has seen the coin
Or something is very, very wrong
In particular, with some light selection of experts (e.g., decent Metaculus forecasters), I think you'd almost never see this kind of scenario unless someone was trolling you. And if the 0.0..001% person was willing to bet a correspondingly high amount at those odds, I would probably weight their estimate very highly. In that case I think the geometric mean would in fact be appropriate.
Though I guess it wouldn't be appropriate if you're querying random experts who can sometimes be catastrophically wrong; in that case the arithmetic mean would be more robust.
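To make the contrast concrete, here's a minimal sketch with made-up numbers (one estimate at 10^-13, the rest around 10%): the single extreme estimate drags the geometric mean down by a couple of orders of magnitude, while the arithmetic mean barely moves.

```python
import math

# Hypothetical estimates from five experts: four around 10%,
# one claiming the event is essentially impossible (1e-13, i.e. 0.00000000001%).
estimates = [0.12, 0.10, 0.08, 0.11, 1e-13]

arithmetic_mean = sum(estimates) / len(estimates)

# Geometric mean: the n-th root of the product, computed in log space
# to avoid underflow with very small probabilities.
geometric_mean = math.exp(sum(math.log(p) for p in estimates) / len(estimates))

print(f"arithmetic mean: {arithmetic_mean:.3f}")  # ~0.082, barely affected
print(f"geometric mean:  {geometric_mean:.1e}")   # ~4.0e-04, pulled far down
```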
I see what you mean, though you will find that scientific experts often end up endorsing probabilities like these. They model the situation, run the calculation and end up with 10^-12 and then say the probability is 10^-12. You are right that if you knew the experts were Bayesian and calibrated and aware of all the ways the model or calculation could be flawed, and had a good dose of humility, then you could read more into such small claimed probabilities — i.e. that they must have a mass of evidence they have not yet shared. But we are very rarely in a situation like that. Averaging a selection of Metaculus forecasters may be close, but is quite a special case when you think more broadly about the question of how to aggregate expert predictions.
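To spell out why I wouldn't take a claimed 10^-12 at face value, here is a toy calculation (the specific numbers are assumptions for illustration, not anyone's actual estimates): even a small chance that the model or calculation is flawed ends up dominating the answer.

```python
# Toy decomposition:
#   P(event) = P(model correct) * P(event | model correct)
#            + P(model wrong)   * P(event | model wrong)
p_event_given_model_correct = 1e-12  # what the expert's calculation says
p_model_wrong = 1e-3                 # assumed chance the model/calculation is flawed
p_event_given_model_wrong = 1e-2     # assumed base rate if the model is off

p_event = ((1 - p_model_wrong) * p_event_given_model_correct
           + p_model_wrong * p_event_given_model_wrong)

print(f"{p_event:.2e}")  # ~1.00e-05: dominated by the model-error term, not the 1e-12
```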
Consider that if you're aggregating expert predictions, you might be generating probabilities too soon. Instead, you could, for instance, interview the subject-matter experts, make the transcript available to expert forecasters, and then aggregate the latter's probabilities. This might produce more accurate probabilities.
I endorse Nuño’s comment re: 0.00000000001%.
While it’s pretty easy to agree that the probability of a stupid mistake/typo is greater than 0.00000000001%, it is sometimes hard to follow in practice. I think Yudkowsky communicates it well, on a more visceral level, in his Infinite Certainty essay. I came to appreciate this point on another level after doing a calibration exercise for mental arithmetic — all my errors were unpredictable “oops” moments, like misreading a plus for a minus or selecting the wrong answer after doing the calculation correctly.