Good point on the correlated outcomes. I think you’re right that cross-domain performance could be a good measure, especially since performance in a single domain could be driven by having a single foundational prior that turned out to be right, rather than genuine forecasting skill.
On the second point, I’m pretty sure the Metaculus results already compare your Brier score to the community’s on the same set of questions, so you could (weakly) base inter-forecaster comparisons on that difference.
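For concreteness, here’s a rough sketch of what that per-question comparison looks like; the question IDs, probabilities, and resolutions below are made up for illustration, not real Metaculus data:

```python
# Sketch of a same-question-set Brier comparison (hypothetical data).

def brier_score(forecasts: dict[str, float], outcomes: dict[str, int]) -> float:
    """Mean squared error of probabilistic forecasts against binary outcomes."""
    shared = forecasts.keys() & outcomes.keys()
    return sum((forecasts[q] - outcomes[q]) ** 2 for q in shared) / len(shared)

# Binary resolutions: 1 = resolved yes, 0 = resolved no (made-up examples).
outcomes = {"q1": 1, "q2": 0, "q3": 1, "q4": 0}

my_forecasts        = {"q1": 0.80, "q2": 0.30, "q3": 0.60, "q4": 0.10}
community_forecasts = {"q1": 0.70, "q2": 0.40, "q3": 0.55, "q4": 0.20}

# Restricting both to the same questions keeps the comparison apples-to-apples.
my_brier = brier_score(my_forecasts, outcomes)
community_brier = brier_score(community_forecasts, outcomes)

# A negative difference means the forecaster beat the community on these questions.
print(f"mine: {my_brier:.3f}, community: {community_brier:.3f}, "
      f"diff: {my_brier - community_brier:.3f}")
```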