In general, if we’re asking what has a “poor” track record, it would be good to think about quantification and comparison to alternatives. Note that we’d consider sites like Wikipedia as examples of institutions doing a form of truth evaluation.
Discussions of fact-checking institutions often focus on some concrete case that they got wrong, but they are bound to get some things wrong. The questions are:
What’s the overall track record over all statements (including those that seem easy/obvious)?
How well do they do against alternatives?
Analogously, people often point out particular cases where prediction markets did badly, but advocates of prediction markets claim only that they are at least as accurate overall as alternative prediction mechanisms. And right now many questions humans ask are not controversial (e.g. science questions, local questions). But AI currently says false things about these questions! So there’s lots of room for improvement without even touching the controversial stuff (though eventually one wants some relatively graceful handling of controversy).
(Thanks to Owain for most of these points.)