If one had access to the individual predictions, one could take 1000 random bootstrap samples of size 1 from all the predictions, then 1000 bootstrap samples of size 2, and so on, and measure how accuracy changes as the sample size grows. This might also be possible with data from other prediction sites.
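The bootstrap idea could be sketched like this (assuming access to the individual predictions as an array of probabilities; the data here is synthetic, and the aggregation by simple mean is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_brier(predictions, outcome, k, n_boot=1000):
    """Mean Brier score of the average of k bootstrap-resampled predictions."""
    samples = rng.choice(predictions, size=(n_boot, k), replace=True)
    crowd = samples.mean(axis=1)  # aggregate forecast per bootstrap replicate
    return np.mean((crowd - outcome) ** 2)

# Toy example: 50 noisy forecasts of an event that resolved "yes" (1).
preds = rng.uniform(0.5, 0.9, size=50)
for k in (1, 2, 4, 8):
    print(k, round(bootstrap_brier(preds, 1, k), 3))
```

With a mean-aggregated crowd, larger samples mainly reduce the variance term of the Brier score, so the score should improve as k grows.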
I discussed this with Charles. It’s not possible to do exactly this with the API, but we can approximate this by looking at the final predictions just before close.
We can see that:

1. Questions with more predictors have better Brier scores (regardless of the number of predictors sampled)
2. Performance improves with the number of predictors sampled, up to ~100 predictors
To account for the different Brier scores across groups of questions, I have normalized by subtracting off the performance at 8 sampled predictors. This makes point 2 above easier to see.
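The normalization step amounts to something like the following (assuming mean Brier scores keyed by number of sampled predictors; the numbers are made up for illustration):

```python
# Hypothetical mean Brier scores by number of sampled predictors (illustrative only).
brier_by_k = {8: 0.160, 16: 0.145, 32: 0.138, 64: 0.134, 128: 0.133}

# Subtract the 8-predictor baseline so curves for easier and harder
# question groups become directly comparable.
baseline = brier_by_k[8]
normalized = {k: score - baseline for k, score in brier_by_k.items()}
```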
When I discussed this with Charles, he suggested that questions whose probability is near 0 or 1 are more popular and therefore look easier. Excluding those questions, the charts look as follows:
Amazingly, this seems to account for essentially all of the effect that makes more popular questions look “easier”!
(NB: there are only 22 questions with >= 256 predictors and 5% < p < 95%, so the error bars on that cyan line should be quite wide.)
Hi Simon, I’m working on a follow-up to this post that uses individual-level data. Could you please give some detail on how you “sampled” k predictors? As in, did you have access to individual data and could actually do the sampling? I’m not entirely sure what the x-axis in your plot means, or what the difference between “>N predictors” and “k predictors” is. Thank you!
IIRC, there is access to the histogram, which tells you how many people predicted each percentage. I then sampled k predictors from that distribution.

“k predictors” is the number of samples I drew.

“>N predictors” is the total number of people who predicted on a given question.
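A minimal sketch of that sampling procedure, assuming the histogram is given as counts per predicted probability (the histogram values and the resolution here are made up):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical histogram: predicted probability -> number of predictors at that value.
histogram = {0.05: 3, 0.20: 10, 0.35: 25, 0.50: 12, 0.80: 4}

probs = np.array(list(histogram.keys()))
counts = np.array(list(histogram.values()), dtype=float)
weights = counts / counts.sum()

def sample_k_predictors(k):
    """Draw k individual predictions from the histogram distribution."""
    return rng.choice(probs, size=k, p=weights)

def brier(forecast, outcome):
    return (forecast - outcome) ** 2

# Aggregate k sampled predictors by taking their mean forecast,
# then score it against the (hypothetical) resolution.
crowd = sample_k_predictors(8).mean()
print(brier(crowd, outcome=0))
```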