I agree with your concerns about using a pure Brier score on open platforms. I expect that, for now, it makes the most sense within “tournaments” where participants answer every question. That said, I think some sort of objective, proper scoring rule is a prerequisite for a more advanced scoring system that conveys more useful information in open contexts.
I’ve seen some sort of “relative Brier score” referenced frequently in the associated research (definitely in the Good Judgment Project papers, at a minimum) that scores forecasters relative to the difficulty of each question, as determined by the performance of the others who forecasted it. This seems promising, and I expect there are a lot of options in that direction.
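To make the idea concrete, here is a rough sketch of one way such a relative score could work. This is my own assumption about the general shape, not the exact formula from the Good Judgment Project papers: each forecaster's Brier score on a question is compared against the median Brier score of everyone who answered that question, and the differences are averaged over the questions that forecaster actually answered. The names `brier` and `relative_brier` and the data layout are hypothetical.

```python
from statistics import mean, median


def brier(p, outcome):
    """Binary Brier score: squared error between a forecast probability and the outcome (0 or 1)."""
    return (p - outcome) ** 2


def relative_brier(forecasts, outcomes):
    """Score each forecaster relative to the crowd on the questions they answered.

    forecasts: {forecaster: {question: probability}}  (hypothetical layout)
    outcomes:  {question: 0 or 1}
    Returns {forecaster: average of (own Brier - median Brier on that question)}.
    Negative means better than the typical forecaster on the same questions.
    """
    # Collect every forecaster's Brier score per question.
    per_question = {}
    for answers in forecasts.values():
        for q, p in answers.items():
            per_question.setdefault(q, []).append(brier(p, outcomes[q]))

    # Assumed difficulty baseline: the median score on each question.
    baseline = {q: median(scores) for q, scores in per_question.items()}

    # Average the per-question differences; skip forecasters with no answers.
    return {
        name: mean(brier(p, outcomes[q]) - baseline[q] for q, p in answers.items())
        for name, answers in forecasts.items()
        if answers
    }
```

Because each forecaster is only compared against others on the same questions, someone who answers only hard questions is not penalized just for picking them, which is the main problem with a pure Brier score on an open platform. The median is only one possible baseline; the mean or a crowd aggregate would be other options in the same direction.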