A related issue is that, if one is maximizing the difference between one's Brier score and the aggregate's Brier score, one is incentivized to pick questions for which one thinks the aggregate is particularly wrong. This is not necessarily a problem, but it can be.
What's the issue with this? Isn't this exactly what we want, to incentivize people to correct bad predictions? This gets us closer to prediction/betting markets.
Perhaps this should have been "you are incentivized to only pick questions for which you think the aggregate is particularly wrong" (according to the distance implied by your scoring rule), and neglect other questions. Essentially, it's the same problem as for the raw Brier score:
Specifically, if someone has a Brier score b^2, then they should not make a prediction on any question where the probability is between b and (1-b), even if they know the true probability exactly
but one step removed.
This is particularly noticeable and egregious in the case of important questions for which the probability is very low, for example "Will China's Three Gorges Dam fail before 1 October 2020?", where the difference between ~0% and 3% matters back in reality. But predicting on this question will lower your average Brier score difference (because if you think it's ~0%, the difference in Brier score will be very small: 0% => 0 vs 3% => 0.0018, whereas good forecasters tend to have much higher differences).
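To make the arithmetic concrete, here is a minimal Python sketch. It assumes the two-outcome form of the Brier score (squared error summed over both "yes" and "no", so the range is 0 to 2), since that is the form that reproduces the 0.0018 figure. The function names and the second comparison question (aggregate at 40%, forecaster at 10%) are made up for illustration.

```python
def brier(p, outcome):
    """Two-outcome Brier score for a binary question.

    p is the forecast probability of 'yes'; outcome is 1 (yes) or 0 (no).
    Squared error is summed over both outcomes, so 0 is perfect and 2 is worst.
    """
    return (p - outcome) ** 2 + ((1 - p) - (1 - outcome)) ** 2


def improvement_over_aggregate(own_p, aggregate_p, outcome):
    """Positive when the forecaster scored better (lower) than the aggregate."""
    return brier(aggregate_p, outcome) - brier(own_p, outcome)


# Dam question resolves 'no' (outcome = 0): aggregate at 3%, forecaster at 0%.
low_prob_gain = improvement_over_aggregate(0.0, 0.03, 0)

# Hypothetical comparison question: aggregate at 40%, forecaster at 10%, resolves 'no'.
mid_prob_gain = improvement_over_aggregate(0.10, 0.40, 0)

print(round(low_prob_gain, 4), round(mid_prob_gain, 4))
```

Under these assumptions the gain on the dam question is about 0.0018, against about 0.30 on the made-up mid-probability question, so a forecaster averaging their score difference across questions is better off skipping the dam question.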
One solution we tried in some Foretold experiments was to pay out more for questions we considered more important (in the Brier score case, this would correspond to multiplying the Brier score difference from the aggregate by a set amount), so that even correcting smaller errors would be worth it.
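A rough sketch of what such a payout could look like, with the caveat that the multiplier value and the function names are hypothetical and not the ones actually used in those experiments:

```python
def brier(p, outcome):
    """Two-outcome Brier score for a binary question (0 is perfect, 2 is worst)."""
    return (p - outcome) ** 2 + ((1 - p) - (1 - outcome)) ** 2


def weighted_reward(own_p, aggregate_p, outcome, importance=1.0):
    """Brier score difference from the aggregate, scaled by an importance multiplier.

    importance is a hypothetical per-question weight chosen by the organizers.
    """
    return importance * (brier(aggregate_p, outcome) - brier(own_p, outcome))


# Same low-probability correction (aggregate 3%, forecaster 0%, resolves 'no'),
# first unweighted and then on a question flagged as important (weight 100, assumed).
plain = weighted_reward(0.0, 0.03, 0)
boosted = weighted_reward(0.0, 0.03, 0, importance=100)
print(round(plain, 4), round(boosted, 4))
```

With a weight of 100, the same small correction is worth about 0.18 instead of about 0.0018, which puts it in the same range as corrections on mid-probability questions.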
Note that prediction markets still have a similar problem: transaction fees and interest rates mean that, if the error is small enough, you are not incentivized to correct it either.
Nuño might have additional thoughts, but I have a couple of concerns here.
It's possible to run into the following issues even (/especially) when people are "playing perfectly", at least in terms of trying to maximise points:
Correctly making the same forecast as the crowd doesn't have 0 value, as it makes the crowd prediction more robust to future bad predictions; however, it does not earn you any points.
You are very strongly disincentivised from posting evidence that the crowd is wrong when you are in fact correct to disagree with the crowd.
Somewhat separately, I think this particular scoring system risks people making some bad decisions from both a points perspective and a good forecasting perspective:
There's a fine line between people understanding "I get more points if I am correct and the crowd is wrong" and "I get more points if I disagree with the crowd", with the second line of reasoning potentially leading to people updating their forecasts away from the median in order to maximise their points potential.
Given how good crowds tend to be, most of the time when you think the crowd is very wrong, you are the person who is very wrong.
Edit: I re-ordered the points above to try to be clearer; not all of them are concerned with exactly the same thing.
There's a fine line between people understanding "I get more points if I am correct and the crowd is wrong" and "I get more points if I disagree with the crowd", with the second line of reasoning potentially leading to people updating their forecasts away from the median in order to maximise their points potential.
This shouldn't be a problem in the limit with a proper scoring rule.
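One way to see this, as a sketch and not a full argument: assume the reward is the expected Brier score difference from the aggregate, and that the forecaster's belief is the true probability. The aggregate's score does not depend on your report, so maximizing the expected difference is the same as minimizing your own expected Brier score, which happens when you report your actual belief. The numbers below (belief 30%, crowd 40%) and the function names are invented for illustration.

```python
def brier(p, outcome):
    """Two-outcome Brier score for a binary question (0 is perfect, 2 is worst)."""
    return (p - outcome) ** 2 + ((1 - p) - (1 - outcome)) ** 2


def expected_brier(report, belief):
    """Expected Brier score of `report` if 'yes' happens with probability `belief`."""
    return belief * brier(report, 1) + (1 - belief) * brier(report, 0)


def expected_gain(report, crowd, belief):
    """Expected improvement over the crowd; higher is better for the forecaster."""
    return expected_brier(crowd, belief) - expected_brier(report, belief)


belief, crowd = 0.30, 0.40  # assumed values

honest = expected_gain(0.30, crowd, belief)       # report what you believe
exaggerated = expected_gain(0.10, crowd, belief)  # push further away from the crowd

# Search a grid of possible reports for the one with the highest expected gain.
candidates = [i / 100 for i in range(101)]
best_report = max(candidates, key=lambda r: expected_gain(r, crowd, belief))

print(round(honest, 4), round(exaggerated, 4), best_report)
```

Under these assumptions the honest report has a small positive expected gain, the exaggerated report has a negative one, and the best report on the grid is the belief itself. This says nothing about settings where only rank matters (e.g. prizes for the top few), which is presumably part of what "in the limit" is hedging against.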