Perhaps this should have been “you are incentivized to pick only questions for which you think the aggregate is particularly wrong” (according to the distance implied by your scoring rule) and to neglect other questions. Essentially, it’s the same problem as for the raw Brier score:
Specifically, if someone has a brier score b^2, then they should not make a prediction on any question where the probability is between b and (1-b), even if they know the true probability exactly
but one step removed.
This is particularly noticeable and egregious for important questions where the probability is very low, for example “Will China’s Three Gorges Dam fail before 1 October 2020?”, where the difference between ~0% and 3% matters in the real world. But predicting on this question will lower your average Brier score difference: if you think the probability is ~0% and the aggregate is at 3%, the improvement is very small (0% => 0 vs. 3% => 0.0018, assuming the question resolves negatively), whereas good forecasters tend to have much larger average differences.
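To make the arithmetic concrete, here is a minimal sketch in Python. It assumes the two-outcome form of the Brier score (squared error summed over both outcomes), which is what yields 0.0018 for a 3% forecast. The track record of 20 questions with an average difference of 0.05 is a made-up illustration, not data from any platform.

```python
def brier(p: float, outcome: int) -> float:
    """Two-outcome Brier score for a binary question (0 is best, 2 is worst)."""
    return (p - outcome) ** 2 + ((1 - p) - (1 - outcome)) ** 2

# The aggregate says 3%, you say ~0%, and the question resolves negatively.
aggregate, mine, outcome = 0.03, 0.0, 0
gain = brier(aggregate, outcome) - brier(mine, outcome)

# Hypothetical track record (assumed numbers): 20 questions, average difference 0.05.
n_prior, avg_prior = 20, 0.05
avg_after = (n_prior * avg_prior + gain) / (n_prior + 1)

print(f"gain on this question: {gain:.4f}")
print(f"average difference before: {avg_prior:.4f}, after: {avg_after:.4f}")
```

Even though the forecast is as good as it can be, the average goes down, because the gain on this question is far below the forecaster's existing average.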
One solution we tried in some Foretold experiments was to pay out more for questions we considered more important (in the Brier score case, this would correspond to multiplying the Brier score difference from the aggregate by a set amount), so that correcting even small errors would be worth it.
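A rough sketch of what that multiplier does, again assuming the two-outcome Brier score. The function name `weighted_gain`, the weight of 30, and the "typical" difference of 0.05 are all hypothetical values chosen for illustration; they are not the numbers used in the experiments.

```python
def brier(p: float, outcome: int) -> float:
    """Two-outcome Brier score for a binary question (0 is best, 2 is worst)."""
    return (p - outcome) ** 2 + ((1 - p) - (1 - outcome)) ** 2

def weighted_gain(aggregate: float, mine: float, outcome: int, weight: float = 1.0) -> float:
    """Brier score improvement over the aggregate, scaled by a per-question importance weight."""
    return weight * (brier(aggregate, outcome) - brier(mine, outcome))

typical = 0.05  # assumed average difference for a good forecaster
unweighted = weighted_gain(0.03, 0.0, 0)             # ordinary question
boosted = weighted_gain(0.03, 0.0, 0, weight=30.0)   # hypothetical "important" question

print(f"unweighted: {unweighted:.4f}, boosted: {boosted:.4f}, typical: {typical:.4f}")
```

With a large enough weight, correcting the 3% aggregate down to ~0% pays about as much as a typical correction elsewhere, so the question is no longer a drag on the forecaster's average.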
Note that prediction markets have a similar problem: transaction fees and interest rates mean that if the error is small enough, you are not incentivized to correct it either.
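The market version of the problem can be sketched the same way. This is a simplified model under stated assumptions, not a description of any particular market: a share pays 1 if it resolves in your favor, a fee is charged as a fraction of winnings, and the opportunity cost of locked-up capital is compounded at an annual rate. The function name `net_edge` and all the numbers are hypothetical.

```python
def net_edge(price: float, believed_prob: float, fee_rate: float,
             annual_rate: float, years: float) -> float:
    """Expected profit per share after fees and the cost of locked-up capital."""
    # Fee is assumed to be taken from the winnings (1 - price) when the share pays out.
    expected_payout = believed_prob * (1.0 - fee_rate * (1.0 - price))
    # Interest you would have earned on the purchase price over the holding period.
    carry_cost = price * ((1.0 + annual_rate) ** years - 1.0)
    return expected_payout - price - carry_cost

# Buying "No" at 0.97 when you believe "No" is essentially certain.
short_hold = net_edge(0.97, 1.0, fee_rate=0.10, annual_rate=0.05, years=0.1)
long_hold = net_edge(0.97, 1.0, fee_rate=0.10, annual_rate=0.05, years=1.0)

print(f"net edge, short holding period: {short_hold:+.4f}")
print(f"net edge, one-year holding period: {long_hold:+.4f}")
```

Under these assumed numbers, the same 3-point mispricing is worth correcting if the question resolves soon, but not if the capital is tied up for a year.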