Re: question selection—I agree that there are some edge cases where the scoring system doesn’t have perfect incentives around question selection (Nuno’s being a good example). But for us, getting people to forecast at all in these tournaments has been a much, much bigger problem than any question-selection nuances inherent in the scoring system. If improving overall system accuracy is the primary goal, we’ll get far more juice (IMO) out of focusing time/resources/effort on increasing overall participation.
Re: extremizing—I haven’t read specific papers on this (though there are probably some out there from the IARPA ACE program, if I had to guess). This might be related, but I admit I haven’t actually read it :) - https://arxiv.org/pdf/1506.06405.pdf
But we’ve seen improvements in the aggregate forecast’s Brier score when we apply very basic extremization to it (i.e., anything below 50% gets pushed closer to 0%, anything above 50% gets pushed closer to 100%). This was true even when we showed the crowd forecast to individuals. That said, I’ll be the first to admit that connecting this to the idea that an overconfidence incentive is a good thing is purely speculative, and not something we’ve explicitly tested/investigated.
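For concreteness, here’s a minimal sketch (in Python) of what this kind of basic extremization plus Brier scoring could look like. The power transform and the alpha value are just one common parameterization I’m using for illustration, not necessarily the exact transform we applied, and the numbers are made up:

```python
import numpy as np

def extremize(p, alpha=2.0):
    """Push probabilities away from 0.5 toward 0 or 1.

    One common parameterization (illustrative, not necessarily the transform
    used above): p' = p^alpha / (p^alpha + (1 - p)^alpha), with alpha > 1.
    """
    p = np.asarray(p, dtype=float)
    return p**alpha / (p**alpha + (1 - p)**alpha)

def brier_score(p, outcomes):
    """Mean squared error between forecast probabilities and binary outcomes."""
    p = np.asarray(p, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    return np.mean((p - outcomes) ** 2)

# Toy example with made-up numbers: an aggregate forecast that is
# directionally right but underconfident improves after extremization.
aggregate = np.array([0.65, 0.70, 0.35, 0.80, 0.30])
outcomes  = np.array([1,    1,    0,    1,    0])

print(brier_score(aggregate, outcomes))             # raw aggregate: ~0.093
print(brier_score(extremize(aggregate), outcomes))  # extremized: ~0.031
```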