It is true that given the primary source (presumably this), the implication is that rounding supers to 0.1 hurt them, but 0.05 didn't:
To explore this relationship, we rounded forecasts to the nearest 0.05, 0.10, or 0.33 to see whether Brier scores became less accurate on the basis of rounded forecasts rather than unrounded forecasts. [...]
For superforecasters, rounding to the nearest 0.10 produced significantly worse Brier scores [by implication, rounding to the nearest 0.05 did not]. However, for the other two groups, rounding to the nearest 0.10 had no influence. It was not until rounding was done to the nearest 0.33 that accuracy declined.
Prolonged aside:
That said, despite the absent evidence I'm confident accuracy for superforecasters (and ~anyone else – more later, and elsewhere) does numerically drop with rounding to 0.05 (or anything else), even if it has not been demonstrated to be statistically significant:
From first principles, if the estimate has signal, shaving bits of information from it by rounding should make it less accurate (and it obviously shouldn't make it more accurate, which pretty reliably sets the upper bound of our uncertainty at zero).
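A minimal simulation sketch illustrates this (not data from the paper, and it assumes an idealised forecaster whose stated probabilities equal the true event probabilities): rounding only ever adds a penalty in expectation, and the penalty grows with the coarseness of the grid.

```python
import numpy as np

# Sketch: an idealised forecaster whose stated probabilities equal the true
# event probabilities, scored with and without rounding on the same outcomes.
rng = np.random.default_rng(0)
n = 1_000_000
p_true = rng.uniform(0, 1, n)        # assumed distribution of true probabilities
outcomes = rng.random(n) < p_true    # realised 0/1 outcomes
forecast = p_true                    # forecasts carrying full signal

def brier(q, y):
    return np.mean((q - y) ** 2)

base = brier(forecast, outcomes)
print(f"unrounded Brier: {base:.5f}")
for grid in (0.05, 0.10, 1/3):
    q = np.round(forecast / grid) * grid
    print(f"rounded to {grid:.2f}: Brier {brier(q, outcomes):.5f} "
          f"(penalty {brier(q, outcomes) - base:+.5f})")
```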
Further, there seems very little motivation for the idea that we have n discrete "bins" of probability across the number line (often equidistant!) inside our heads, and that as we become better forecasters n increases. That we have some standard error to our guesses (which ~smoothly falls with increasing skill) seems significantly more plausible. As such, the "rounding" tests should be taken as loose proxies for assessing this error.
Yet if the error process is this, rather than "n real values + jitter no more than 0.025", undersampling and aliasing should introduce a further distortion. Even if you think there really are n bins someone can "really" discriminate between, intermediate values are best seen as a form of anti-aliasing ("Think it is more likely 0.1 than 0.15, but not sure – maybe it's 60–40 between them, so I'll say 0.12") which rounding ablates. In other words, "accurate to the nearest 0.1" does not mean the second decimal place carries no information.
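To put numbers on that 60–40 example: the Brier score is a proper scoring rule, so the expected score is minimised by reporting your actual credence, and the intermediate 0.12 beats either of the "bins" it sits between. A quick sketch of the arithmetic:

```python
# Expected Brier score of reporting q when your actual credence in the event is p.
def expected_brier(q, p):
    return p * (1 - q) ** 2 + (1 - p) * q ** 2

# The "60-40 between 0.10 and 0.15" forecaster: their effective credence is the mixture.
p = 0.6 * 0.10 + 0.4 * 0.15   # = 0.12

for q in (0.10, 0.12, 0.15):
    print(f"report {q:.2f}: expected Brier {expected_brier(q, p):.4f}")
# Reporting the intermediate 0.12 gives the lowest expected score; rounding it
# away to either 0.10 or 0.15 makes things (slightly) worse.
```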
Also, if you are forecasting distributions rather than point estimates (cf. Metaculus), said forecast distributions typically imply many intermediate value forecasts.
Empirically, there's much to suggest a type 2 error explanation of the lack of a "significant" drop. As you'd expect, the size of the accuracy loss grows with both how coarsely things are rounded and the performance of the forecaster. So even if relatively fine-grained rounding makes things slightly worse, we may well expect to miss it. This looks better to me on priors than these trends "hitting a wall" at a given level of granularity (so I'd guess untrained forecasters are numerically worse when rounded to 0.1, even if their worse performance means there is less signal to be lost, which in turn makes this harder to "statistically significantly" detect).
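Some rough effect sizes make the power worry concrete. If a forecaster's stated probability p is taken as their true credence, the expected Brier penalty of reporting a rounded value q instead is (q − p)²; assuming rounding errors are roughly uniform within each grid cell, that averages to grid²/12:

```python
# Back-of-envelope rounding penalties, assuming rounding errors are roughly
# uniform within each grid cell, so the expected Brier penalty is grid**2 / 12.
for grid in (0.05, 0.10, 1/3):
    print(f"grid {grid:.2f}: expected Brier penalty ~ {grid ** 2 / 12:.5f}")
# grid 0.05: ~0.00021, grid 0.10: ~0.00083, grid 0.33: ~0.00926 -- the 0.05
# penalty is roughly 40x smaller than the 0.33 one, so a study powered to
# detect the latter can easily return a null result on the former.
```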
I'd adduce other facts against it too. One is simply that superforecasters are prone not to give forecasts on a 5% scale, using intermediate values instead: given their good calibration, you'd expect them to have ironed out this Brier-score-costly jitter (also, this would be one of the few things they do worse than regular forecasters). You'd also expect discretization in things like their calibration curve (e.g. events they say happen 12% of the time in fact happen 10% of the time, whilst events they say happen 13% of the time in fact happen 15% of the time), or in other derived figures like the ROC curve.
This is ironically foxy, so I wouldn't be shocked for this to be slain by the numerical data. But I'd bet at good odds (north of 3:1) on things like "Typically, for 'superforecasts' of X%, these events happened more frequently than those forecast at (X-1)%, (X-2)%, etc."
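A hypothetical sketch of how that bet could be checked against resolved forecasts (the file and column names below are illustrative, not from any dataset in the paper):

```python
import pandas as pd

# Assumed: a table of resolved forecasts with columns 'forecast' (stated
# probability) and 'outcome' (0/1). Both names are placeholders.
df = pd.read_csv("resolved_forecasts.csv")

# Empirical frequency for each distinct stated percentage.
df["stated_pct"] = (df["forecast"] * 100).round().astype(int)
curve = df.groupby("stated_pct")["outcome"].agg(["mean", "size"])

# If intermediate values carry information, the observed frequency ('mean')
# should tend to rise with the stated percentage even between multiples of 5;
# if forecasters only 'really' use 5% bins, it should look like a staircase.
print(curve)
print("fraction of adjacent pairs that are monotone:",
      (curve["mean"].diff().dropna() >= 0).mean())
```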
It always seemed strange to me that the idea was expressed as "rounding". Replacing a 50.4% with 50% seems relatively innocuous to me; replacing 0.6% with 1% - or worse, 0.4% with 0% - seems like a very different thing altogether!
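One way to sharpen that intuition (a sketch, not from the thread): under the Brier score the three replacements cost essentially the same (the penalty is just the squared change, about 0.004² in each case), but in odds or log-score terms they are wildly different.

```python
import math

def odds(p):
    return p / (1 - p)

# Before/after pairs from the comment above: 50.4% -> 50%, 0.6% -> 1%, 0.4% -> 0%.
for before, after in [(0.504, 0.50), (0.006, 0.01), (0.004, 0.0)]:
    odds_ratio = odds(after) / odds(before) if 0 < after < 1 else float("inf")
    log_loss = -math.log(after) if after > 0 else float("inf")
    print(f"{before:.3f} -> {after:.3f}: odds multiplied by {odds_ratio:.2f}, "
          f"log loss if the event happens = {log_loss:.2f}")
# 50.4% -> 50% barely moves the odds; 0.6% -> 1% multiplies them by ~1.7;
# 0.4% -> 0% drives the log loss to infinity if the event does happen.
```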
I think I broadly agree with what you say and will not bet against your last paragraph, except for the trivial sense that I expect most studies to be too underpowered to detect those differences.