Hey all-
I’m one of the developers at Cultivate Labs, the company that builds the forecasting platform for GJOpen and CSET Foretell. Really enjoyed the post. I get the sense that some of you may already know a bunch of this, but thought it might be worth chiming in:
Re: Incentives to selectively pick questions.
In the scoring system we typically use (Relative Brier Scores aka Net Brier Points), this tends to not be an issue (as suggested in the last paragraph of that section). You’re incentivized to forecast on questions where you think you can improve the aggregate forecast, which is exactly what we want.
Using a relative score also removes the need to force people to forecast on every question, since not forecasting gives you no score, which is effectively the median score. Copying the community forecast also becomes moot, since you get the same result by not forecasting. This system also rewards “first movers”, since you can accumulate points each day: forecasts that are early & accurate will get a better score than those that are late & accurate.
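For anyone who wants to see the mechanics, here is a minimal sketch of a relative Brier score. It is an illustration, not the platform's actual scoring code. The assumptions: binary questions, the single-outcome form of the Brier score, a crowd that stays the same each day, and daily scores that are simply summed. The function names are made up for this example.

```python
from statistics import median

def brier(p, outcome):
    """Squared error of a binary forecast; outcome is 1 if the event happened, else 0."""
    return (p - outcome) ** 2

def relative_brier(p, crowd, outcome):
    """Own Brier minus the crowd's median Brier (negative = better than the median).
    Passing p=None (no forecast) scores 0, i.e. the same as the median."""
    if p is None:
        return 0.0
    return brier(p, outcome) - median(brier(q, outcome) for q in crowd)

def accumulated(daily_forecasts, crowd, outcome):
    """Sum the relative score over days (assumption: one score per day)."""
    return sum(relative_brier(p, crowd, outcome) for p in daily_forecasts)

crowd = [0.6, 0.7, 0.8]  # hypothetical crowd forecasts; the event ends up happening

# Copying the median forecast earns the same as not forecasting at all.
copy_score = relative_brier(0.7, crowd, outcome=1)
skip_score = relative_brier(None, crowd, outcome=1)

# An early accurate forecaster scores on all three days, a late one only on the last.
early = accumulated([0.9, 0.9, 0.9], crowd, outcome=1)
late = accumulated([None, None, 0.9], crowd, outcome=1)
print(copy_score, skip_score, early, late)
```

Lower is better here, so the early forecaster ends up with the more negative (better) total, while copying the median and skipping the question both come out at zero.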
Re: Incentives not to share information and to produce corrupt information
I agree with this in relation to forecaster rationales. My incentive is to not share new nuggets of information I used to formulate my forecast. A saving grace here, though, is that my forecast is still plainly visible. I could write a rationale trying to mislead people in order to encourage bad forecasts from them, but I’m unable to hide it if I forecast contrary to my misleading rationale. You still know my true beliefs—my probabilities.
Re: Discrete prizes distort forecasts
I agree that this is a challenge and it regularly concerns me that we’re creating perverse incentives. We’ve used the probabilistic rewards approach in the past and it seemed somewhat helpful. Generally, I think avoiding a top-heavy reward system is important and helpful.
One quasi-related and interesting thing, though, is that research has shown that the aggregate forecast is often not extreme enough and that you can improve the crowd's Brier score by directly extremizing the aggregate forecast.
One of the biggest/most frequent complaints that we hear about our current system is that the Brier score penalizes misses more than it rewards hits. You can see this in the Raw Score vs. Probability Assigned to True Event chart in the Wikipedia article that NunoSempere linked. We’ve discussed supporting a spherical scoring rule to make the reward/penalty more symmetrical, but haven’t pulled the trigger on it thus far.
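To put rough numbers on that complaint, here is a small sketch comparing what a confident forecast gains on a hit versus what it loses on a miss, measured against a 50% forecast. The 50% baseline is an assumption chosen for illustration, and the spherical rule is written in its standard binary form. The numbers are just what the formulas give, not a claim about how either rule would behave on the platform.

```python
from math import sqrt

def brier(p, outcome):
    """Binary Brier score (single-outcome form); lower is better."""
    return (p - outcome) ** 2

def spherical(p, outcome):
    """Binary spherical score; higher is better."""
    p_true = p if outcome == 1 else 1 - p
    return p_true / sqrt(p ** 2 + (1 - p) ** 2)

def brier_gain_and_loss(p):
    """Improvement over a 50% forecast on a hit, and worsening on a miss."""
    baseline = brier(0.5, 1)        # 0.25 whichever way the question resolves
    gain = baseline - brier(p, 1)   # event happened: the confident forecast helps
    loss = brier(p, 0) - baseline   # event did not happen: it hurts
    return gain, loss

gain, loss = brier_gain_and_loss(0.9)
print(gain, loss)  # roughly 0.24 gained on a hit vs 0.56 lost on a miss

# The same comparison can be run for the spherical rule:
print(spherical(0.9, 1), spherical(0.9, 0), spherical(0.5, 1))
```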
Re: question selection—I agree that there are some edge cases where the scoring system doesn’t have perfect incentives around question selection (Nuno’s being a good example). But for us, getting people to forecast at all in these tournaments has been a much, much bigger problem than any question selection nuances inherent in the scoring system. If improving the overall system accuracy is the primary goal, we’re much more likely (IMO) to get more juice out of focusing time/resources/effort on increasing overall participation.
Re: extremizing—I haven’t read specific papers on this (though there are probably some out there from the IARPA ACE program, if I had to guess). This might be related, but I admit I haven’t actually read it :) - https://arxiv.org/pdf/1506.06405.pdf
But we’ve seen improvements in the aggregate forecast’s Brier score if we apply very basic extremization to it (i.e., anything below 50% gets pushed closer to 0%, anything above 50% gets pushed closer to 100%). This was true even when we showed the crowd forecast to individuals. But I’ll also be the first to admit that connecting this to the idea that an overconfidence incentive is a good thing is purely speculative and is not something we’ve explicitly tested/investigated.
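As a concrete picture of what "pushing away from 50%" can look like, here is a minimal sketch using one common power transform. This is an assumed example, not necessarily the transform used on the platform, and the exponent `a` is a hypothetical tuning parameter.

```python
def extremize(p, a=2.0):
    """Push a probability away from 0.5; a > 1 extremizes, a = 1 leaves p unchanged."""
    num = p ** a
    return num / (num + (1 - p) ** a)

def brier(p, outcome):
    """Binary Brier score (single-outcome form); lower is better."""
    return (p - outcome) ** 2

aggregate = 0.7
pushed = extremize(aggregate)  # moves toward 1 because the aggregate is above 0.5

# If the event happens, the extremized forecast scores better...
better_on_hit = brier(pushed, 1) < brier(aggregate, 1)
# ...and if it does not, the extremized forecast scores worse.
worse_on_miss = brier(pushed, 0) > brier(aggregate, 0)
print(pushed, better_on_hit, worse_on_miss)
```

Whether this helps on average depends on the aggregate being on the right side of 50% often enough, which is an empirical question about the crowd rather than something the transform guarantees.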