but it seems very odd to think of it as a Facebook feature (or other social media platform)
Yeah, maybe all of this is a bit fantastical. :)
Facebook and social media in general don’t really have an intellectual “brand”. It seems likely that if you did this as a Facebook feature, it would be more likely to get dismissed as “just another silly Facebook game.” Or if most of the people using it weren’t putting much effort into it, the predictions likely wouldn’t be that accurate, and that could undermine the effort to convince the public of its value.
That’s certainly possible. For what it’s worth, while Facebook’s Forecast was met with some amount of skepticism, I wouldn’t say it was “dismissed” out of hand. The forecasting accuracy of Forecast’s users was also fairly good: “Forecast’s midpoint brier score [...] across all closed Forecasts over the past few months is 0.204, compared to Good Judgement’s published result of 0.227 for prediction markets.”
However, it’s true that a greater integration with Facebook would probably make the feature more controversial and also result in a lower forecasting accuracy.
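For readers who don’t regularly work with Brier scores, here is a minimal sketch of what the metric measures; the probabilities and outcomes are invented for illustration, have nothing to do with Forecast’s actual data, and ignore the “midpoint” qualifier in the quote:

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probabilities and binary
    outcomes (1 = the event happened, 0 = it didn't). Lower is better;
    always answering 50% scores 0.25, which gives some sense of scale
    for the ~0.2 figures quoted above."""
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(probabilities)

# Three invented forecasts: two confident and correct, one moderately wrong.
print(brier_score([0.8, 0.3, 0.9], [1, 0, 1]))  # ~0.047
```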
Btw, Facebook is just one example—I mention this because you seem to focus exclusively on Facebook in your comment. In some ways, Twitter might be more appropriate for such features.
Am I understanding correctly that each user is given one prediction score that applies to all their content? So that means that if someone is bad (good) at predicting COVID case counts, then if they post something else it gets down- (up-) weighted, even if the something else has nothing to do with COVID
That would be the less complicated option. It might be perceived as unfair, though I’m not sure whether that would be a big problem.
I’m working under the assumption that people who make more correct forecasts in one domain will also tend to have a more accurate model of the world in other domains—on average, of course; there will be (many) exceptions. I’m not saying this is ideal; it’s just an improvement over the status quo, where forecasting accuracy practically doesn’t matter at all in determining how many people read your content.
Or do you have some system to figure out which forecasting questions count toward the recommender score for which pieces of content?
That would be the other, more complicated alternative. Perhaps this is feasible when using more coarse-grained domains like politics, medicine, technology, entertainment, et cetera, maybe in combination with machine learning.
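To make the coarse-grained variant a bit more concrete, here is a deliberately crude sketch. The keyword “classifier” is just a stand-in for whatever (possibly ML-based) topic model a platform would actually use, and the accuracy numbers are invented for illustration, not proposed values:

```python
# Crude stand-in for a topic classifier; a real platform would presumably use ML.
DOMAIN_KEYWORDS = {
    "medicine": ("covid", "vaccine", "health"),
    "politics": ("election", "senate", "policy"),
    "technology": ("ai", "software", "chip"),
}

def classify_domain(post_text):
    text = post_text.lower()
    for domain, keywords in DOMAIN_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return domain
    return None  # no recognized domain

def recommender_boost(domain_accuracy, overall_accuracy, post_text):
    """Use the author's track record in the post's domain if one exists,
    otherwise fall back to their overall forecasting accuracy."""
    domain = classify_domain(post_text)
    return domain_accuracy.get(domain, overall_accuracy)

# An author with a strong medical track record but a mediocre overall score:
print(recommender_boost({"medicine": 1.3}, 0.9, "New COVID case counts released"))  # 1.3
print(recommender_boost({"medicine": 1.3}, 0.9, "My favorite movie of 2021"))       # 0.9
```

The per-domain and global options differ only in which score gets looked up; the hard parts are the classifier and the choice of domains.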
Even then it seems weird—if someone made bad predictions about COVID in the past, that doesn’t necessarily imply that content they post now is bad.
Well, sure. But across all users there will likely be a positive correlation between past and future accuracy. I think it would be good for the world if people who made more correct forecasts about COVID in the past received more “views” than those who made more incorrect forecasts about COVID—even though it’s practically guaranteed that some people in the latter group will improve a lot (and will then be rewarded by the recommender system for that) and even come to make better forecasts than people in the former group.
Presumably the purpose of this is to teach people how to be better forecasters.
I wouldn’t say that’s the main purpose.
If you have to hide other people’s forecasts to prevent abuse, then how are you supposed to learn by watching other forecasters?
My understanding is that this is how other platforms (e.g., Metaculus) work as well. Of course, people can still write comments about what they forecasted and how they arrived at their conclusions.
Also, I think one can become better at forecasting on one’s own? (I think most people get better calibrated when they do calibration exercises on their own—they don’t need to watch other people do it.)
All the comparisons between forecasting and traditional fact-checking are weird because they seem to address different issues; forecasting doesn’t seem to be a replacement or alternative to fact-checking.
I didn’t mean to suggest that forecasting should replace fact-checking (though I can now see how our post and appendix conveyed that message). When comparing forecasting to fact-checking, I had in mind whether one should design recommendation algorithms to punish people whose statements were labeled false by fact-checkers.
In general, this seems to require a lot of editorial judgment on the part of Facebook as to what forecasting questions to use and what resolution criteria. [...] My guess is that this sort of editorial role is not something that social media platforms would be particularly enthusiastic about
Yeah, they certainly would be reluctant to do that. But given that they already do fact-checking, it doesn’t seem impossible.
Another way to game the system that you didn’t mention here: set up a bunch of accounts, make different predictions on each of them, and then abandon all the ones that got low scores, and start posting the stuff you want on the account that got a high score.
I agree that this is an issue. In practice, though, it doesn’t seem that concerning. First, the recommendation algorithm would obviously need to take into account the number of forecasts in addition to their average accuracy in order to avoid rewarding statistical flukes. (Similar to how Yelp displays restaurants with, say, an average rating of 4.5 from 100 reviews more prominently than restaurants with an average rating of 5.0 from only 5 reviews.) Thus, you would actually need to put in a lot of work to make this worthwhile (and set up, say, hundreds of accounts) or get very lucky (which is of course always possible).
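One simple way to combine the number of forecasts with their average accuracy, in the spirit of the Yelp example, is to shrink each user’s observed accuracy toward a neutral prior; the more forecasts they have, the less the prior matters. This is only a sketch, and the specific numbers are arbitrary:

```python
PRIOR_ACCURACY = 0.5  # what we assume about a forecaster with no track record
PRIOR_WEIGHT = 20     # roughly how many forecasts it takes to move that assumption

def adjusted_accuracy(mean_accuracy, num_forecasts):
    """Shrink the observed accuracy toward the prior when the sample is small."""
    return (PRIOR_WEIGHT * PRIOR_ACCURACY + num_forecasts * mean_accuracy) / (
        PRIOR_WEIGHT + num_forecasts
    )

print(adjusted_accuracy(1.0, 5))    # 0.6: five perfect forecasts only nudge the score above the 0.5 prior
print(adjusted_accuracy(0.9, 100))  # ~0.83: a longer, merely good track record scores higher
```

Under a scheme like this, the multi-account strategy only pays off if each throwaway account accumulates a substantial track record, which is the “lot of work” referred to above.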
It would probably also be prudent to build some sort of decay into the forecasting-accuracy boost (such that good forecasting accuracy from, say, 10 years ago matters less than good forecasting accuracy this year) in order to incentivize users to continue making forecasts. Otherwise, people who achieved a very high forecasting accuracy in year 1 would be inclined to stop forecasting in order to avoid regression to the mean.
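A minimal sketch of the decay idea, assuming an exponential decay with an arbitrarily chosen half-life of two years (the actual rate would be a design choice):

```python
HALF_LIFE_YEARS = 2.0  # arbitrary illustrative choice

def decayed_accuracy(forecasts):
    """forecasts: (accuracy, years_ago) pairs. Returns a weighted average in which
    a forecast from HALF_LIFE_YEARS ago counts half as much as a fresh one."""
    weights = [0.5 ** (years_ago / HALF_LIFE_YEARS) for _, years_ago in forecasts]
    weighted_sum = sum(w * accuracy for w, (accuracy, _) in zip(weights, forecasts))
    return weighted_sum / sum(weights)

# A stellar forecast from a decade ago is mostly outweighed by a decent recent one.
print(decayed_accuracy([(1.0, 10.0), (0.7, 0.5)]))  # ~0.71 rather than the undecayed average of 0.85
```

This preserves the incentive to keep forecasting without entirely discarding older track records.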
I wonder if it might make more sense to think of this as a feature on a website like FiveThirtyEight that already has an audience that’s interested in probabilistic predictions and models.
Yeah, that’s an interesting idea. On the other hand, FiveThirtyEight is much smaller, and its readers are presumably already more sophisticated, so the potential upside seems smaller.
That being said, I agree that it might make more sense to focus on platforms with a more sophisticated user base (like, say, Substack). Or focus on news outlets like, say, the Washington Post. That might even be more promising.
To clarify, when I made the comment about it being “dismissed”, I wasn’t thinking so much about media coverage as about individual Facebook users seeing prediction app suggestions in their feed. I was thinking that there are already a lot of unscientific and clickbait-y quizzes and games that get posted to Facebook, and was concerned that users might lump this in with those if it is presented in a similar way.
Yeah, they certainly would be reluctant to do that. But given that they already do fact-checking, it doesn’t seem impossible.
I agree, and I definitely admit that the existence of the Facebook Forecast app is evidence against my view. I was more focused on the idea that if the recommender algorithm is based on prediction scores, that would mean that Facebook’s choice of which questions to use would affect the recommendations across Facebook.
The forecasting accuracy of Forecast’s users was also fairly good: “Forecast’s midpoint brier score [...] across all closed Forecasts over the past few months is 0.204, compared to Good Judgement’s published result of 0.227 for prediction markets.”
For what it’s worth, as noted in Nuño’s comment, this comparison holds little weight when the questions aren’t the same or on the same time scales; I’d take it as only weak evidence against my prior that real-money prediction markets are much more accurate.
as noted in Nuño’s comment, this comparison holds little weight when the questions aren’t the same or on the same time scales
Right, definitely; I forgot to add this. I wasn’t trying to say that Forecast is more accurate than real-money prediction markets (or other forecasting platforms, for that matter), but rather that Forecast’s forecasting accuracy is at least clearly above the this-is-silly level.
Also, I think one can become better at forecasting on one’s own? (I think most people get better calibrated when they do calibration exercises on their own—they don’t need to watch other people do it.)
You also get feedback in the form of the community median prediction on Metaculus and GJOpen, which in my experience is usually useful. I do think that, in general, following the reasoning of competent individuals is very useful, but the comments, and the helpful people who enjoy teaching their skills, do a solid job of covering that.