A related issue is that, if one is maximizing the difference between one’s Brier score and the aggregate’s Brier score, one is incentivized to pick questions for which one thinks the aggregate is particularly wrong. This is not necessarily a problem, but can be.
What’s the issue with this? Isn’t this exactly what we want, to incentivize people to correct bad predictions? This gets us closer to prediction/betting markets.
Perhaps this should have been “you are incentivized to only pick questions for which you think the aggregate is particularly wrong” (according to the distance implied by your scoring rule), and neglect other questions. Essentially, it’s the same problem as for the raw Brier score:
Specifically, if someone has a Brier score b^2, then they should not make a prediction on any question where the probability is between b and (1-b), even if they know the true probability exactly.
but one step removed.
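To make the quoted threshold concrete, here is a minimal sketch. It assumes the single-outcome Brier convention, (forecast - outcome)^2, and a forecaster who knows the true probability p exactly; the function names are hypothetical. Under those assumptions the expected score on a question is p(1-p), which exceeds b^2 anywhere between b and (1-b) when b is at most 0.5.

```python
def expected_brier(p: float) -> float:
    """Expected single-outcome Brier score when forecasting the true probability p."""
    # With probability p the event happens and the squared error is (1 - p)^2;
    # with probability (1 - p) it does not and the squared error is p^2.
    return p * (1 - p) ** 2 + (1 - p) * p ** 2  # algebraically equal to p * (1 - p)


def worsens_average(p: float, b: float) -> bool:
    """True if forecasting this question is expected to raise an average Brier score of b^2."""
    return expected_brier(p) > b ** 2


# Hypothetical forecaster with an average Brier score of 0.04, i.e. b = 0.2.
b = 0.2
for p in (0.01, 0.2, 0.5, 0.8, 0.99):
    print(p, round(expected_brier(p), 4), worsens_average(p, b))
```

Only the two near-certain questions (p = 0.01 and p = 0.99) are expected to help this forecaster's average, which is the selection effect described above.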
This is particularly noticeable and egregious in the case of important questions for which the probability is very low, for example “Will China’s Three Gorges Dam fail before 1 October 2020?”, where the difference between ~0% and 3% is important back in reality. But predicting on this question will lower your average Brier score difference (because if you think it’s ~0%, the difference in Brier score will be very small; 0%=>0 vs 3%=>0.0018, where good forecasters tend to have much higher differences).
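The 0.0018 figure appears to use the two-outcome form of the Brier score, which sums the squared error over both "yes" and "no" (the single-outcome form would give 0.0009). A short sketch of that arithmetic, with a hypothetical function name and assuming the dam does not fail:

```python
def brier_two_outcome(forecast: float, happened: bool) -> float:
    """Two-outcome Brier score: squared error summed over the 'yes' and 'no' outcomes."""
    outcome = 1.0 if happened else 0.0
    yes_error = (forecast - outcome) ** 2
    no_error = ((1 - forecast) - (1 - outcome)) ** 2
    return yes_error + no_error


own = brier_two_outcome(0.00, happened=False)        # forecaster who says ~0%
aggregate = brier_two_outcome(0.03, happened=False)  # aggregate at 3%
print(own, round(aggregate, 4), round(aggregate - own, 4))
```

The gap of roughly 0.0018 is the entire reward for correcting the aggregate on this question.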
One solution we tried at some Foretold experiments was to pay out more (in the Brier score case, this would correspond to multiplying the Brier score difference from the aggregate by a set amount) for questions we considered more important, so that even correcting smaller errors would be worth it.
Note that prediction markets still have a similar problem, where transaction fees and interest rates also mean that if the error is small enough you are also not incentivized to correct it.
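As a rough illustration of that weighting, here is a sketch in which the reward is the Brier score difference from the aggregate multiplied by a per-question weight. The function name, the weight of 50, and the example numbers are assumptions for illustration, not the values used in the experiments.

```python
def weighted_reward(aggregate_brier: float, own_brier: float, importance_weight: float = 1.0) -> float:
    """Reward for beating the aggregate, scaled by how important the question is.

    Lower Brier scores are better, so the reward is positive when
    own_brier is below aggregate_brier.
    """
    return importance_weight * (aggregate_brier - own_brier)


# Correcting a 3% aggregate to ~0% on a question that resolves negatively
# (two-outcome Brier scores of 0.0018 and 0.0 respectively).
unweighted = weighted_reward(0.0018, 0.0)
weighted = weighted_reward(0.0018, 0.0, importance_weight=50)  # hypothetical weight
print(round(unweighted, 4), round(weighted, 4))
```

With a large enough weight, the small correction becomes comparable to the differences available on more uncertain questions.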
Nuño might have additional thoughts, but I have a couple of concerns here.
It’s possible to run into the following issues even (/especially) when people are “playing perfectly”, at least in terms of trying to maximise points:
Correctly making the same forecast as the crowd doesn’t have 0 value, as it makes the crowd prediction more robust to future bad predictions. However, it does not earn you any points.
You are very strongly disincentivised from posting evidence that the crowd is wrong when you are in fact correct to disagree with the crowd.
Somewhat separately, I think this particular scoring system risks people making some bad decisions from both a points perspective and a good forecasting perspective:
There’s a fine line between people understanding “I get more points if I am correct and the crowd is wrong” and “I get more points if I disagree with the crowd”, with the second line of reasoning potentially leading to people updating their forecasts away from the median in order to maximise their points potential.
Given how good crowds tend to be, most of the time when you think the crowd is very wrong, you are the person who is very wrong.
Edit: I re-ordered the points above to try to be clearer; not all of them are concerned with exactly the same thing.
There’s a fine line between people understanding “I get more points if I am correct and the crowd is wrong” and “I get more points if I disagree with the crowd”, with the second line of reasoning potentially leading to people updating their forecasts away from the median in order to maximise their points potential.
This shouldn’t be a problem in the limit with a proper scoring rule.
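A small numerical check of this point, as a sketch only: under the Brier score, the crowd's expected score does not depend on what you report, so the expected difference from the crowd is maximized by reporting your actual belief, wherever the crowd sits. The function names and the grid search are illustrative assumptions, and the sketch covers expected score only, not prize structures or a small number of questions.

```python
def expected_brier(report: float, belief: float) -> float:
    """Expected single-outcome Brier score of `report` if the event has probability `belief`."""
    return belief * (1 - report) ** 2 + (1 - belief) * report ** 2


def best_report(belief: float, crowd: float, steps: int = 100) -> float:
    """Grid-search the report that maximizes expected (crowd score - own score)."""
    grid = [i / steps for i in range(steps + 1)]
    crowd_score = expected_brier(crowd, belief)  # constant with respect to the report
    return max(grid, key=lambda r: crowd_score - expected_brier(r, belief))


# The optimum is the forecaster's own belief whether the crowd is below or above it.
print(best_report(belief=0.7, crowd=0.6))
print(best_report(belief=0.7, crowd=0.9))
```

Moving the report further from the median than your belief only lowers the expected difference, which is why exaggerating disagreement is not rewarded in expectation.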