Thanks!
We're currently providing calibration and accuracy stats to our grant investigators through our Salesforce app in the hopes that they'll find that feedback useful and actionable.
I'm not sure, and I'd have to defer to decision-makers at OP. My model of them is that predictions are just one piece of evidence they look at.
Very good point!
I see a few ways of assessing "global overconfidence":
1. Lump all predictions into two bins (under and over 50%) and check that the lower point lies above the diagonal and the upper one below it. I just did this and the points are where you'd expect if we were overconfident, but the 90% credible intervals still overlap with the diagonal, so pooling all the bins in this way still provides only weak evidence of overconfidence.
2. Calculate the OC score as defined by Metaculus (scroll down to the bottom of the page and click the (+) sign next to "Details"). A score between 0 and 1 indicates overconfidence. Open Phil's score is 0.175, so this is evidence that we're overconfident. I don't know how to put a meaningful confidence/credible interval on that number, so it's hard to say how strong this evidence is.
3. Run a linear regression on the calibration curve and check that the slope is <1. When I do this for the original curve with 10 points, statsmodels' OLS method spits out [0.772, 0.996] as a 95% confidence interval for the slope, which sits entirely below 1. I see this as stronger evidence of overconfidence than the previous two. (A rough code sketch of this check and the two-bin check is below.)
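To make checks 1 and 3 concrete, here's a minimal Python sketch. The data are synthetic placeholders standing in for the real resolved predictions, and the flat Beta(1, 1) prior behind the credible intervals is just one reasonable choice, not necessarily the one used for the numbers above.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

# Placeholder data standing in for the real resolved predictions:
# `probs` are the assigned probabilities, `outcomes` are 0/1 resolutions.
rng = np.random.default_rng(0)
probs = rng.uniform(0.05, 0.95, size=500)
outcomes = rng.binomial(1, probs)

# Check 1: pool everything into two bins (under vs. over 50%) and put a
# 90% credible interval on each observed frequency (flat Beta(1, 1) prior).
# Overconfidence looks like: observed > forecast below 50%, observed < forecast above 50%.
for label, mask in [("under 50%", probs < 0.5), ("50% and over", probs >= 0.5)]:
    n, successes = mask.sum(), outcomes[mask].sum()
    lo, hi = stats.beta.ppf([0.05, 0.95], successes + 1, n - successes + 1)
    print(f"{label}: mean forecast {probs[mask].mean():.2f}, "
          f"observed {successes / n:.2f}, 90% CI [{lo:.2f}, {hi:.2f}]")

# Check 3: regress observed frequency on forecast over a 10-bin calibration
# curve and see whether the 95% CI for the slope sits entirely below 1.
bin_edges = np.linspace(0, 1, 11)
bin_idx = np.digitize(probs, bin_edges[1:-1])  # bin indices 0..9
bin_centers = np.array([probs[bin_idx == i].mean() for i in range(10)])
bin_freqs = np.array([outcomes[bin_idx == i].mean() for i in range(10)])
fit = sm.OLS(bin_freqs, sm.add_constant(bin_centers)).fit()
print("95% CI for slope:", fit.conf_int(alpha=0.05)[1])  # below 1 => evidence of overconfidence
```

With perfectly calibrated synthetic data like the above, the two-bin points should hug the diagonal and the slope CI should straddle 1; the real data's [0.772, 0.996] interval is what points toward overconfidence.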