This seems to connect to the concept of f-means: If the utility for an option is proportional to f(p), then the expected utility of your mixture model is equal to the expected utility using the f-mean of the experts' probabilities p1 and p2, defined as f⁻¹((f(p1) + f(p2))/2), as the f in the utility calculation cancels out the f⁻¹. If I recall correctly, all aggregation functions that fulfill some technical conditions on a generalized mean can be written as an f-mean.
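A minimal sketch of that cancellation (the function names and the 50/50 two-expert setup are my own illustration, not anything from the original): if utility is proportional to f(p), averaging the experts' utilities and evaluating the utility at the f-mean give the same number.

```python
import math

def f_mean(f, f_inv, ps):
    """Generalized f-mean: f_inv of the average of f applied to each p."""
    return f_inv(sum(f(p) for p in ps) / len(ps))

# Utility proportional to f(p); here f = log as one concrete choice.
f, f_inv = math.log, math.exp
p1, p2 = 0.2, 0.8

# Expected utility of the 50/50 mixture of the two experts...
mixture_utility = 0.5 * f(p1) + 0.5 * f(p2)
# ...equals the utility evaluated at the f-mean (f and f_inv cancel).
assert abs(f(f_mean(f, f_inv, [p1, p2])) - mixture_utility) < 1e-12
```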
In the first example, f is just linear, so the f-mean is the arithmetic mean. In the second example, f is equal to the expected lifespan 1/(1−(1−p)) = 1/p, which yields the harmonic mean. As such, the geometric mean would correspond to the mixture model if and only if utility were logarithmic in p, as the geometric mean is the f-mean corresponding to the logarithm.
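The three correspondences can be checked directly (the probabilities 0.2 and 0.8 are just sample values):

```python
import math

p1, p2 = 0.2, 0.8

# f(p) = p (linear/identity) gives the arithmetic mean: 0.5.
arith = (p1 + p2) / 2
# f(p) = 1/p (expected lifespan) gives the harmonic mean: 0.32.
harmonic = 1 / ((1 / p1 + 1 / p2) / 2)
# f(p) = log(p) gives the geometric mean: 0.4 (up to float rounding).
geometric = math.exp((math.log(p1) + math.log(p2)) / 2)
```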
For a binary event with “true” probability q, the expected log-score for a forecast of p is q·log(p) + (1−q)·log(1−p) = log(p^q·(1−p)^(1−q)), which equals log(√(p(1−p))) = 0.5·log(p(1−p)) for q = 0.5. So the geometric mean of odds would yield the correct utility for the log-score according to the mixture model, if all the events we forecast were essentially coin tosses (which seems like a less satisfying synthesis than I hoped for).
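The q = 0.5 identity above is easy to verify numerically (p = 0.7 is an arbitrary sample forecast):

```python
import math

def expected_log_score(p, q):
    # E[log-score] for forecast p when the event has true probability q:
    # q*log(p) + (1-q)*log(1-p) = log(p^q * (1-p)^(1-q))
    return q * math.log(p) + (1 - q) * math.log(1 - p)

# For q = 0.5 this collapses to 0.5*log(p*(1-p)).
p = 0.7
assert abs(expected_log_score(p, 0.5) - 0.5 * math.log(p * (1 - p))) < 1e-12
```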
Further questions that might be interesting to analyze from this point of view:
Is there some kind of approximate connection between the Brier score and the geometric mean of odds that could explain the empirical performance of the geometric mean on the Brier score? (There might very well not be anything, as the mixture model might not be the best way to think about aggregation).
What optimization target (under the mixture model) does extremization correspond to? Edit: As extremization is applied after the aggregation, it cannot be interpreted in terms of mixture models (if all forecasters give the same prediction, any f-mean has to have that value, but extremization yields a more extreme prediction.)
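The unanimity argument can be made concrete. Taking one common form of extremization as an illustration (raising the odds to a power k; the specific form and k = 1.5 are my assumptions, not from the original): any f-mean of identical forecasts returns that forecast, but extremization does not.

```python
def extremize(p, k=1.5):
    # Illustrative extremization: raise the odds to the power k,
    # then convert back to a probability.
    odds = p / (1 - p)
    ext = odds ** k
    return ext / (1 + ext)

# An f-mean of identical forecasts must return the forecast itself,
# but extremization pushes a unanimous 0.8 further toward 1:
assert extremize(0.8) > 0.8
```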
Note: After writing this, I noticed that UnexpectedValue’s comment on the top-level post essentially points to the same concept. I decided to still post this, as it seems more accessible than their technical paper while (probably) capturing the key insight.
Edit: Replaced “optimize” by “yield the correct utility for” in the third paragraph.
Thanks! I hadn’t heard of f-means before; it is a useful concept, and relevant here.