I do agree that when new evidence comes in about the experts we should change how we weight them. But when we are pooling the probabilities we aren’t receiving any extra evidence about the experts (?).
Right, the evidence about the experts comes from the new evidence that’s being updated on, not from the pooling procedure. Suppose we’re pooling expert judgments, and we initially consider them all equally credible, so we use a symmetric pooling method. Then some evidence comes in. Our experts update on the evidence, and we also update on how credible each expert is, and pool their updated judgments together using an asymmetric pooling method, weighting experts by how well they anticipated the evidence we’ve seen so far. This is clearest in the case where each expert is using some model, and we believe one of their models is correct but don’t know which one (the case in which you already agreed arithmetic averages of probabilities are appropriate). If we were weighting them all equally, and then we get some evidence that expert 1 thought was twice as likely as expert 2 did, then we should now think that expert 1 is twice as likely as expert 2 to be the one with the correct model, and take a weighted arithmetic mean of their new probabilities in which we weight expert 1 twice as heavily as expert 2. When you do this, your pooled probabilities handle Bayesian updates correctly. My point was that, even outside of this particular situation, we should still be taking expert credibility into account in some way, and expert credibility should depend on how well the expert anticipated observed evidence. If two experts assign odds ratios $r_0$ and $s_0$ to some event before observing new evidence, and we pool these into the odds ratio $r_0^{1/2} s_0^{1/2}$, and then we receive some evidence causing the experts to update to $r_1$ and $s_1$, respectively, but expert r anticipated that evidence better than expert s did, then I’d think this should mean we would weight expert r more heavily, and pool their new odds ratios into $r_1^{2/3} s_1^{1/3}$, or something like that. But we won’t handle Bayesian updates correctly if we do!
The external Bayesianity property of the mean log odds pooling method means that to handle Bayesian updates correctly, we must update to the odds ratio $r_1^{1/2} s_1^{1/2}$, as if we had learned nothing about the relative credibility of the two experts.
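To make both halves of this concrete, here is a minimal Python sketch with made-up numbers (the probabilities, likelihoods, and function names are all illustrative assumptions, not anything from the data discussed here). The first part checks that reweighting experts by how well they anticipated the evidence reproduces the Bayesian update of the "one of these models is correct" mixture. The second part assumes, as external Bayesianity is usually stated, that both experts update on a shared likelihood ratio, and checks that only the unchanged equal weights commute with that update.

```python
import math

def odds(p):
    """Convert a probability to an odds ratio."""
    return p / (1 - p)

def posterior(p_x, like_x, like_not_x):
    """Bayes' rule for one expert: returns (P(X|E), P(E))."""
    p_e = p_x * like_x + (1 - p_x) * like_not_x
    return p_x * like_x / p_e, p_e

# Part 1: arithmetic pooling with credibility updates.
# Hypothetical experts, each given as (P(X), P(E|X), P(E|not X)).
experts = [(0.8, 0.9, 0.3), (0.4, 0.5, 0.2)]
weights = [0.5, 0.5]  # equal initial credibility

# Direct Bayesian update of the mixture of the two models.
joint_xe = sum(w * p * lx for w, (p, lx, ln) in zip(weights, experts))
marg_e = sum(w * (p * lx + (1 - p) * ln) for w, (p, lx, ln) in zip(weights, experts))
mixture_posterior = joint_xe / marg_e

# Same answer by reweighting each expert by the probability
# they assigned to the evidence, then averaging their posteriors.
posts = [posterior(*e) for e in experts]
new_weights = [w * p_e for w, (_, p_e) in zip(weights, posts)]
total = sum(new_weights)
new_weights = [w / total for w in new_weights]
pooled_posterior = sum(w * p for w, (p, _) in zip(new_weights, posts))

# Part 2: geometric pooling of odds (mean log odds).
def pool_odds(r, s, w):
    """Weighted geometric mean of two odds ratios, weight w on r."""
    return r ** w * s ** (1 - w)

r0, s0 = odds(0.8), odds(0.4)
lr = 3.0  # assumed shared likelihood ratio P(E|X) / P(E|not X)
r1, s1 = r0 * lr, s0 * lr

updated_pool = pool_odds(r0, s0, 0.5) * lr  # Bayesian update of the pooled odds
equal_weight = pool_odds(r1, s1, 0.5)       # pool of the updated odds, weights unchanged
reweighted = pool_odds(r1, s1, 2 / 3)       # pool of the updated odds, expert r favored

print(mixture_posterior, pooled_posterior)
print(updated_pool, equal_weight, reweighted)
```

In this sketch the two numbers on the first printed line agree, and on the second line `updated_pool` matches `equal_weight` but not `reweighted`, which is the tension described above: any shift in the weights breaks agreement with the Bayesian update of the pooled odds.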
I agree that the way I presented it I framed the extreme expert as more knowledgeable. I did this for illustrative purposes. But I believe the setting works just as well when we take both experts to be equally knowledgeable / calibrated.
I suppose one reason not to see this as unfairly biased towards mean log odds is if you generally expect experts who give more extreme probabilities to actually be more knowledgeable in practice. I gave an example in my post illustrating why this isn’t always true, but a couple commenters on my post gave models for why it’s true under some assumptions, and I suppose it’s probably true in the data you’ve been using that’s been empirically supporting mean log odds.
Throwing away the information from the extreme prediction seems bad.
I can see where you’re coming from, but I have an intuition that the geometric mean still trusts the information from outlying extreme predictions too much. That made a possible compromise solution occur to me, though to be clear, I’m not seriously endorsing it.
I notice this is very surprising to me, because averaging log odds is anything but senseless.
I called it that because of its poor theoretical properties (I’m still not convinced they arise naturally in any circumstances), but in retrospect I don’t really endorse this given the apparently good empirical performance of mean log odds.
log odds make Bayes rule additive, and I expect means to work well when the underlying objects are additive
My take on this is that multiplying odds ratios is indeed a natural operation that you should expect to be an appropriate thing to do in many circumstances, but that taking the nth root of an odds ratio is not a natural operation, and neither is taking geometric means of odds ratios, which combines both of those operations. On the other hand, while adding probabilities is not a natural operation, taking weighted averages of probabilities is.
My gut feeling is this argument relies on an adversarial setting where you might get exploited. And this probably means that you should come up with a probability range for the additional evidence your opponent might have.
So if you think their evidence is uniformly distributed over −1 and 1 bits, you should combine that with your evidence by adding that evidence to your logarithmic odds. This gives you a probability distribution over the possible values. Then use that spread to decide which bet odds are worth the risk of exploitation.
Right, but I was talking about doing that backwards. If you’ve already worked out at which odds it’s worth accepting bets in each direction, you can recover the probability that you must currently be assigning to the event in question. The arithmetic mean of the bounds on probabilities implied by the bets you’d accept is a rough approximation to this: If you would bet on X at odds implying any probability less than 2%, and you’d bet against X at odds implying any probability greater than 50%, then this is consistent with you currently assigning probability 26% to X, with a 50% chance that an adversary has evidence against X (in which case X has a 2% chance of being true), and a 50% chance that an adversary has evidence for X (in which case X has a 50% chance of being true).
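The arithmetic in that example can be checked with a short Python sketch. The function names and the 50/50 split are illustrative assumptions taken from the example above; the geometric mean of odds is included only for comparison.

```python
import math

def implied_probability(p_low, p_high, chance_evidence_for=0.5):
    """Law of total probability over what the adversary might know.

    p_low:  P(X) if the adversary has evidence against X
    p_high: P(X) if the adversary has evidence for X
    """
    return (1 - chance_evidence_for) * p_low + chance_evidence_for * p_high

def geometric_mean_of_odds(p_low, p_high):
    """Pool the same two bounds by averaging their log odds instead."""
    pooled_odds = math.sqrt((p_low / (1 - p_low)) * (p_high / (1 - p_high)))
    return pooled_odds / (1 + pooled_odds)

arithmetic = implied_probability(0.02, 0.50)    # 0.5 * 0.02 + 0.5 * 0.50
geometric = geometric_mean_of_odds(0.02, 0.50)  # about 0.125

print(round(arithmetic, 4), round(geometric, 4))
```

Under these assumptions the arithmetic mean gives the 26% from the example, while pooling the same two bounds by mean log odds gives about 12.5%, which is not the probability implied by the 50/50 story about the adversary.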
I do not understand how this is about pooling different expert probabilities. But I might be misunderstanding your point.
It isn’t. My post was about pooling multiple probabilities of the same event. One source of multiple probabilities of the same event is the beliefs of different experts, which your post focused on exclusively. But a different possible source of multiple probabilities of the same event is the bounds in each direction on the probability of some event implied by the betting behavior of a single expert.