Ofer comments on Challenges in evaluating forecaster performance

Ofer Sep 12, 2020, 5:26 AM
1 point
I didn’t follow that last sentence.

Notice that in the limit it’s obvious we should expect the forecasting frequency to affect the average daily Brier score: Suppose Alice makes a new forecast every day while Bob only makes a single forecast (which is equivalent to him making an initial forecast and then blindly making the same forecast every day until the question closes).
- Misha_Yagudin Sep 12, 2020, 10:43 AM
  2 points
  Re: limit, a nice example. Please notice that Bob makes his forecast on a (uniformly) random day, so when you take the expectation over the days he might forecast on, you get the average of the scores for all days, as if he had forecasted every day.
  Let $N$ be the total number of days, $P_{d} = \frac{1}{N}$ be the probability that Bob forecasted on day $d$, and ${Brier}_{d}$ be the Brier score of the forecast made on day $d$:
  $\mathbb{E}[\text{avg. Brier}] = \sum_{d} P_{d} \times \frac{{Brier}_{d} \times \text{num. days forecast will be active}}{\text{total num. of active days}} = \sum_{d} P_{d} \times \frac{{Brier}_{d} \times (N - d)}{N - d} = \sum_{d} P_{d} \times {Brier}_{d} = \frac{\sum_{d} {Brier}_{d}}{N}.$
  I am a bit surprised that it worked out here, because the example breaks the assumption of an equal expected number of days that a forecast will be active. The lack of this assumption will matter when aggregating over multiple questions [weighted by the number of active days]. Still, I hope this example gives helpful intuitions.
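The expectation above can be checked numerically. The following is only a minimal sketch: the function name `bob_expected_avg_brier`, the day indexing $d = 0, \dots, N - 1$ (chosen so that $N - d$ is never zero), and the daily scores are all made-up assumptions for illustration, not something taken from the post.

```python
from fractions import Fraction


def bob_expected_avg_brier(daily_brier):
    """Expected time-averaged Brier score for a forecaster who forecasts once,
    on a uniformly random day d, and holds that forecast until the close."""
    n = len(daily_brier)
    expected = Fraction(0)
    for d, brier_d in enumerate(daily_brier):  # d = 0 .. n-1 (assumed indexing)
        active_days = n - d          # days the single forecast stays active
        p_d = Fraction(1, n)         # uniform chance of forecasting on day d
        # Brier_d accrues on every active day and is averaged over those days.
        expected += p_d * (brier_d * active_days) / active_days
    return expected


# Hypothetical Brier scores a forecast would earn if made on each day.
daily = [Fraction(x, 100) for x in (25, 16, 9, 4, 1)]

# Matches the plain average of the daily scores, as in the derivation.
assert bob_expected_avg_brier(daily) == sum(daily) / len(daily)
```

Exact fractions are used so the equality holds without floating-point tolerance.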
  - Ofer Sep 12, 2020, 2:13 PM
    1 point
    Thanks for the explanation!
    
    I don’t think this formal argument conflicts with the claim that we should expect the forecasting frequency to affect the average daily Brier score. In the example that Flodorner gave, where the question is essentially resolved before the official resolution date, Alice will have perfect daily Brier scores, ${Brier}_{d} = 0$, for any $d > N^{'}$, while on those days Bob will have imperfect Brier scores: ${Brier}_{d} = {Brier}_{N^{'}}$.
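A small numeric illustration of this scenario, as a rough sketch only. The values of `n_days`, `n_prime`, the 0.6 forecast, and the helper `average_daily_brier` are hypothetical; the only thing taken from the example is that Alice updates to a perfect forecast after day $N^{'}$ while Bob keeps his earlier forecast.

```python
def average_daily_brier(forecasts, outcome):
    """Mean over days of (p - outcome)^2 for a binary question."""
    return sum((p - outcome) ** 2 for p in forecasts) / len(forecasts)


n_days, n_prime, outcome = 10, 4, 1  # hypothetical values

# Both forecast 0.6 up to day N'; afterwards the answer is effectively known.
alice = [0.6] * n_prime + [1.0] * (n_days - n_prime)  # updates every day
bob = [0.6] * n_days                                  # holds one forecast

alice_score = average_daily_brier(alice, outcome)
bob_score = average_daily_brier(bob, outcome)

# Alice's more frequent updating gives her the lower (better) average.
assert alice_score < bob_score
```

Here Bob's forecast day is fixed rather than drawn uniformly at random, which is why the averaging argument in the parent comment does not equalize the two scores.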
    - Misha_Yagudin Sep 12, 2020, 3:40 PM
      1 point
      Thanks for challenging me :) I wrote my takes after this discussion above.