Hi! I’m an author of this paper and am happy to answer questions. Thanks to Jsevillamol for the summary!
A quick note regarding the context in which the extremization factor we suggest is “optimal”: rather than taking a Bayesian view of forecast aggregation, we take a robust/“worst-case” view. In brief, we consider the following setup:
(1) You choose an aggregation method.
(2) An adversary chooses an information structure (i.e. a joint probability distribution over the true answer and the partial information each expert knows) to make your aggregation method perform as poorly as possible in expectation (subject to the information structure satisfying the projective substitutes condition).
In this setup, the 1.73 extremization constant is optimal, i.e. it maximizes worst-case performance.
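In case it helps, here's a minimal sketch of what this aggregation looks like in code: convert each expert's probability to log odds, average, scale by the 1.73 constant, and convert back to a probability. (The function name and example inputs are just illustrative, not from the paper.)

```python
import math

def extremize_average_log_odds(probs, c=1.73):
    """Aggregate expert probabilities by extremizing their average log odds.

    Steps: map each probability to log odds, average the log odds,
    multiply by the extremization constant c, and map back to a probability.
    """
    log_odds = [math.log(p / (1 - p)) for p in probs]
    avg = sum(log_odds) / len(log_odds)
    return 1 / (1 + math.exp(-c * avg))

# Three experts at 60%, 70%, and 80% aggregate to roughly 0.82,
# more extreme than the unextremized average-log-odds pool (~0.71).
print(extremize_average_log_odds([0.6, 0.7, 0.8]))
```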
That said, I think it’s probably possible to do even better by using a non-linear extremization technique. Concretely, I strongly suspect that the less variance there is in experts’ forecasts, the less it makes sense to extremize (because the experts have more overlap in the information they know). I would be curious to see how low a loss it’s possible to get by taking into account not just the average log odds, but also the variance in the experts’ log odds. Hopefully we will have formal results to this effect (together with a concrete suggestion for taking variance into account) sometime soon :)
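To make that intuition concrete, here's one purely hypothetical shape a variance-aware rule could take (to be clear: this is just an illustration of the idea, not the paper's method or a forthcoming result). It shrinks the extremization factor toward 1, i.e. no extremization, as the variance of the experts' log odds goes to zero; the interpolation shape here is arbitrary.

```python
import math

def variance_aware_extremize(probs, c_max=1.73, scale=1.0):
    """Hypothetical variance-aware extremization (illustration only).

    Interpolates the extremization factor between 1 (when experts agree
    exactly) and c_max as the variance of the experts' log odds grows.
    The exponential interpolation shape is an arbitrary choice.
    """
    log_odds = [math.log(p / (1 - p)) for p in probs]
    n = len(log_odds)
    avg = sum(log_odds) / n
    var = sum((x - avg) ** 2 for x in log_odds) / n
    # c -> 1 as var -> 0; c -> c_max as var grows.
    c = 1 + (c_max - 1) * (1 - math.exp(-var / scale))
    return 1 / (1 + math.exp(-c * avg))

# Identical forecasts (zero variance) are left unextremized ...
print(variance_aware_extremize([0.7, 0.7, 0.7]))  # 0.70
# ... while disagreeing forecasts get extremized more aggressively.
print(variance_aware_extremize([0.5, 0.7, 0.9]))
```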