Thanks for highlighting Beadle (2022), I will add it to our review!
I wonder how FFI superforecasters were selected. It’s important to first select forecasters who perform well and then evaluate their performance on new questions, to avoid the problem of “training and testing on the same data.”
Good question! There were many differences between the approaches of FFI and the GJP. One of them is that no superforecasters were selected and grouped together during the FFI tournament.
Here is Google’s translation of a relevant passage: “In FFI’s tournament, the super forecasters consist of the 60 best participants overall. FFI’s tournament was not conducted one year at a time, but over three consecutive years, where many of the questions were not decided during the current year and the participants were not divided into experimental groups. It is therefore not appropriate to identify new groups of super forecasters along the way” (2022, 168). You can translate the entirety of section 5.4 here for further clarification on how Beadle defines superforecasters in the FFI tournament.
So it’s fair to say that FFI supers were selected and evaluated on the same data? That seems concerning. Specifically, on which questions were the top 60 selected, and on which questions were the scores below calculated? Did these sets of questions overlap?
The standardised Brier scores of FFI superforecasters (–0.36) were almost identical to those of the initial forecasts of superforecasters in the GJP (–0.37).[17] Moreover, even though regular forecasters in the FFI tournament were less accurate than GJP forecasters overall (probably because they did not update, train, or work in teams), the relative accuracy of FFI’s superforecasters compared with regular forecasters (–0.06) and with defence researchers with access to classified information (–0.10) was strikingly similar.[18]
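For readers unfamiliar with the metric: a standardised Brier score compares each forecaster to the field, question by question, so negative values mean better-than-average accuracy. A minimal sketch of one common way to compute it (the function name and the exact standardisation convention are my assumptions, not taken from Beadle or the GJP):

```python
import numpy as np

def brier(prob, outcome):
    """Brier score for a binary forecast: squared error of the
    stated probability against the 0/1 resolution. Lower is better."""
    return (prob - outcome) ** 2

def standardised_briers(probs, outcomes):
    """Standardise Brier scores question by question: subtract the mean
    score of all forecasters on that question and divide by the standard
    deviation, then average across questions per forecaster.

    probs: (n_forecasters, n_questions) array of probabilities
    outcomes: (n_questions,) array of 0/1 resolutions
    Returns one standardised score per forecaster; negative = better
    than the average forecaster.
    """
    scores = brier(probs, outcomes)              # (n_forecasters, n_questions)
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    return z.mean(axis=1)
```

For example, a forecaster who says 0.9 on a question that resolves yes gets a raw Brier score of 0.01, well below a 0.5 forecaster's 0.25, and correspondingly a negative standardised score.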
Yes, the 60 FFI supers were selected and evaluated on the same 150 questions (Beadle, 2022, 169-170). Beadle also identified the top 100 forecasters based on the first 25 questions and evaluated their performance on the remaining 125 questions, to see whether their accuracy was stable over time or due to luck. As in the GJP studies, he found that they were consistent over time (Beadle, 2022, 128-131).
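The split-sample check described above can be sketched as follows. All numbers and data here are simulated for illustration (not from the report); the point is just that if skill is persistent, forecasters selected on an early batch of questions stay ahead on held-out questions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated per-forecaster Brier scores on 150 questions:
# a baseline, plus a persistent skill component, plus question noise.
n_forecasters, n_questions = 1000, 150
skill = rng.normal(0.0, 0.05, size=(n_forecasters, 1))
scores = 0.2 + skill + rng.normal(0.0, 0.1, size=(n_forecasters, n_questions))

# Select the top 100 on the first 25 questions only (lower Brier = better)...
train_mean = scores[:, :25].mean(axis=1)
top100 = np.argsort(train_mean)[:100]

# ...then compare them to the whole field on the 125 held-out questions.
test_mean = scores[:, 25:].mean(axis=1)
advantage = test_mean.mean() - test_mean[top100].mean()
```

With persistent skill, `advantage` comes out positive; if early performance were pure luck, it would hover around zero. That out-of-sample gap is what distinguishes Beadle's stability check from selecting and scoring on the same 150 questions.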
I should note that I have not studied the report very thoroughly, so I may be mistaken about this. I’ll have a closer look when I have the time and correct the answer above if it is wrong!