I’m not an expert on social media or journalism, so these are just some fairly low-confidence thoughts. This seems like a really interesting idea, but it seems very odd to think of it as a Facebook feature (or other social media platform):
Facebook and social media in general don’t really have an intellectual “brand”. It seems likely that if you did this as a Facebook feature, it would be more likely to get dismissed as “just another silly Facebook game.” Or if most of the people using it weren’t putting much effort into it, the predictions likely wouldn’t be that accurate, and that could undermine the effort to convince the public of its value.
The part about promoting people with high prediction scores seems awkward. Am I understanding correctly that each user is given one prediction score that applies to all their content? So that means that if someone is bad (good) at predicting COVID case counts, then if they post something else it gets down- (up-) weighted, even if the something else has nothing to do with COVID? That’s likely to be perceived as very unfair. Or do you have some system to figure out which forecasting questions count toward the recommender score for which pieces of content? Even then it seems weird—if someone made bad predictions about COVID in the past, that doesn’t necessarily imply that content they post now is bad.
Presumably the purpose of this is to teach people how to be better forecasters. If you have to hide other people’s forecasts to prevent abuse, then how are you supposed to learn by watching other forecasters? Maybe the idea is that Facebook would produce content designed to teach forecasting—but that isn’t the kind of content that Facebook normally produces, and I’m not sure why we would expect Facebook to be particularly good at that.
All the comparisons between forecasting and traditional fact-checking are weird because they seem to address different issues; forecasting doesn’t seem to be a replacement or alternative to fact-checking. For instance, how would forecasting have helped to fight election misinformation? If you had a bunch of prediction questions about things like vote counts or the outcomes of court cases, by the time those questions resolved everything would be already over. (That’s not a problem with forecasting, since it’s not intended for those kinds of cases. But it does mean that it would not be possible to pitch this as an alternative to traditional fact-checking.)
In general, this seems to require a lot of editorial judgment on the part of Facebook as to what forecasting questions to use and what resolution criteria. (This would be an especially big issue if you were to use a user’s general forecasting score as part of the recommender algorithm—for instance, if Facebook included lots of forecasting questions about economic data, that would end up advantaging content posted by people who are interested in economics, while if the forecasting questions were about scientific discoveries instead, then it would instead advantage content posted by people who are interested in science.) My guess is that this sort of editorial role is not something that social media platforms would be particularly enthusiastic about—they were sort of forced into it by the misinformation problem, but in that case they mostly defer to reputable sources to adjudicate claims. While they could defer to reputable sources to resolve questions, I’m not sure who they would defer to to decide what questions to set up. (I’m assuming here that the platform is the one setting up the questions—is that the case?)
Another way to game the system that you didn’t mention here: set up a bunch of accounts, make different predictions on each of them, and then abandon all the ones that got low scores, and start posting the stuff you want on the account that got a high score.
I wonder if it might make more sense to think of this as a feature on a website like FiveThirtyEight that already has an audience that’s interested in probabilistic predictions and models. You could have a regular feature similar to The Riddler but for forecasting questions—each column could have several questions, you could have readers write in to make forecasts and explain their reasoning, and then publish the reasoning of the people who ended up most accurate, along with commentary.
but it seems very odd to think of it as a Facebook feature (or other social media platform)
Yeah, maybe all of this is a bit fantastical. :)
Facebook and social media in general don’t really have an intellectual “brand”. It seems likely that if you did this as a Facebook feature, it would be more likely to get dismissed as “just another silly Facebook game.” Or if most of the people using it weren’t putting much effort into it, the predictions likely wouldn’t be that accurate, and that could undermine the effort to convince the public of its value.
That’s certainly possible. For what it’s worth, while Facebook’s Forecast was met with some amount of skepticism, I wouldn’t say it was “dismissed” out of hand. The forecasting accuracy of Forecast’s users was also fairly good: “Forecast’s midpoint brier score [...] across all closed Forecasts over the past few months is 0.204, compared to Good Judgement’s published result of 0.227 for prediction markets.”
However, it’s true that tighter integration with Facebook would probably make the feature more controversial and also result in lower forecasting accuracy.
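For readers unfamiliar with the metric: the Brier score quoted above is just the mean squared difference between forecast probabilities and binary outcomes, so lower is better and always guessing 50% scores 0.25. A minimal sketch (the example numbers are illustrative, not Forecast’s data):

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities (0..1)
    and outcomes (1 if the event happened, 0 if not)."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Always guessing 0.5 scores exactly 0.25, the usual "no skill" baseline:
print(brier_score([0.5, 0.5], [1, 0]))  # 0.25

# Confident, mostly-correct forecasts score much lower (better):
print(brier_score([0.9, 0.7, 0.2], [1, 1, 0]))
```

Against that 0.25 baseline, the quoted scores of 0.204 (Forecast) and 0.227 (prediction markets) are both clearly better than chance, though as discussed below the head-to-head comparison is weak.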
By the way, Facebook is just one example—I mention this because you seem to focus exclusively on Facebook in your comment. In some ways, Twitter might be more appropriate for such features.
Am I understanding correctly that each user is given one prediction score that applies to all their content? So that means that if someone is bad (good) at predicting COVID case counts, then if they post something else it gets down- (up-) weighted, even if the something else has nothing to do with COVID
That would be the less complicated option. It might be perceived as being unfair—not sure if this will be a big problem though.
I’m working under the assumption that people who make more correct forecasts in one domain will also tend to have a more accurate model of the world in other domains—on average, of course, there will be (many) exceptions. I’m not saying this is ideal; it’s just an improvement over the status quo, where forecasting accuracy practically doesn’t matter at all in determining how many people read your content.
Or do you have some system to figure out which forecasting questions count toward the recommender score for which pieces of content?
That would be the other, more complicated alternative. Perhaps this is feasible when using more coarse-grained domains like politics, medicine, technology, entertainment, et cetera, maybe in combination with machine learning.
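As a rough sketch of what this more complicated alternative could look like (the class, the domain labels, and the fallback rule are my own illustration, not a proposed design): keep a separate accuracy score per coarse domain, and fall back to the user’s overall score for domains where they have no track record.

```python
# Hypothetical sketch: per-domain forecasting scores with a fallback.
# Scores are Brier-style, so lower is better; 0.25 is the "no skill" baseline.
DOMAINS = ["politics", "medicine", "technology", "entertainment"]

class UserScores:
    def __init__(self, overall=0.25):
        self.overall = overall      # fallback score for unseen domains
        self.by_domain = {}         # domain -> (mean Brier score, forecast count)

    def record(self, domain, brier):
        """Fold one resolved forecast's Brier score into its domain's mean."""
        mean, n = self.by_domain.get(domain, (0.0, 0))
        self.by_domain[domain] = ((mean * n + brier) / (n + 1), n + 1)

    def score_for(self, domain):
        """Score used to weight a post classified into `domain`."""
        return self.by_domain.get(domain, (self.overall, 0))[0]

u = UserScores()
u.record("medicine", 0.1)
u.record("medicine", 0.3)
print(u.score_for("medicine"))  # 0.2 (their COVID-era track record)
print(u.score_for("politics"))  # 0.25 (no track record, falls back to overall)
```

A classifier (possibly the machine-learning component mentioned above) would assign each post to one of the coarse domains before looking up the score.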
Even then it seems weird—if someone made bad predictions about COVID in the past, that doesn’t necessarily imply that content they post now is bad.
Well, sure. But across all users there will likely be a positive correlation between past and future accuracy. I think it would be good for the world if people who made more correct forecasts about COVID in the past would receive more “views” than those who made more incorrect forecasts about COVID—even though it’s practically guaranteed that some people in the latter group will improve a lot (though in that case, they will be rewarded by the recommender system in the future for that) and even make better forecasts than people in the former group.
Presumably the purpose of this is to teach people how to be better forecasters.
I wouldn’t say that’s the main purpose.
If you have to hide other people’s forecasts to prevent abuse, then how are you supposed to learn by watching other forecasters?
My understanding is that’s how other platforms, like Metaculus, work as well. Of course, people can still write comments about what they forecasted and how they arrived at their conclusions.
Also, I think one can become better at forecasting on one’s own? (I think most people get better calibrated when they do calibration exercises on their own—they don’t need to watch other people do it.)
All the comparisons between forecasting and traditional fact-checking are weird because they seem to address different issues; forecasting doesn’t seem to be a replacement or alternative to fact-checking.
I didn’t mean to suggest that forecasting should replace fact-checking (though I can now see how our post and appendix conveyed that message). When comparing forecasting to fact-checking, I had in mind whether one should design recommendation algorithms to punish people whose statements were labeled false by fact-checkers.
In general, this seems to require a lot of editorial judgment on the part of Facebook as to what forecasting questions to use and what resolution criteria. [...] My guess is that this sort of editorial role is not something that social media platforms would be particularly enthusiastic about
Yeah, they certainly would be reluctant to do that. But given that they already do fact-checking, it doesn’t seem impossible.
Another way to game the system that you didn’t mention here: set up a bunch of accounts, make different predictions on each of them, and then abandon all the ones that got low scores, and start posting the stuff you want on the account that got a high score.
I agree that this is an issue. In practice, it doesn’t seem that concerning though. First, the recommendation algorithm would obviously need to take into account the number of forecasts in addition to their average accuracy in order to minimize rewarding statistical flukes. (Similarly to how Yelp displays restaurants with, say, an average rating of 4.5 and 100 reviews more prominently than restaurants with an average rating of 5.0 but only 5 reviews.) Thus, you would actually need to put in a lot of work to make this worthwhile (and set up, say, hundreds of accounts) or get very lucky (which is of course always possible).
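One standard way to combine average accuracy with forecast count, in the spirit of the Yelp example, is to shrink each user’s mean score toward the population mean by a tunable pseudo-count. The constants below are illustrative assumptions, not proposed values:

```python
# Hypothetical sketch: shrink a user's mean Brier score (lower = better)
# toward the population mean in proportion to how few scored forecasts
# they have, so a lucky streak on 5 questions can't outrank a solid
# record on 100.
PRIOR_MEAN = 0.25    # assumed population-average Brier score
PRIOR_WEIGHT = 20    # pseudo-forecasts the prior counts for (tunable)

def adjusted_score(mean_brier, n_forecasts):
    return (PRIOR_WEIGHT * PRIOR_MEAN + n_forecasts * mean_brier) / (
        PRIOR_WEIGHT + n_forecasts
    )

# A near-perfect record on only 5 forecasts is pulled most of the way back:
print(adjusted_score(0.05, 5))    # 0.21
# A merely good record on 100 forecasts stays close to its raw value,
# and ends up ranked ahead of the lucky streak:
print(adjusted_score(0.15, 100))  # ~0.167
```

This makes the abandoned-accounts attack expensive: each surviving account still needs many resolved forecasts before its boost outweighs the prior.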
It would probably also be prudent to apply some sort of decay to the forecasting-accuracy boost (such that good forecasting accuracy, say, 10 years ago matters less than good forecasting accuracy this year) in order to incentivize users to continue making forecasts. Otherwise, people who achieved a very high forecasting accuracy in year 1 would be inclined to stop forecasting in order to avoid a regression to the mean.
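A simple way to implement such a decay (the three-year half-life below is an arbitrary assumption) is to weight each resolved forecast by an exponentially decaying factor of its age:

```python
import math

# Hypothetical sketch: each resolved forecast's Brier score is weighted by
# exp(-age * ln 2 / HALF_LIFE_YEARS), so a forecast made HALF_LIFE_YEARS
# ago counts half as much as one made today.
HALF_LIFE_YEARS = 3.0  # assumed, tunable

def decayed_weight(age_years):
    return math.exp(-age_years * math.log(2) / HALF_LIFE_YEARS)

def decayed_mean_brier(scores_with_ages):
    """scores_with_ages: list of (brier_score, age_in_years) pairs."""
    total = sum(decayed_weight(age) for _, age in scores_with_ages)
    return sum(s * decayed_weight(age) for s, age in scores_with_ages) / total

# A great score from 3 years ago only half-offsets a mediocre recent one:
print(decayed_mean_brier([(0.3, 0.0), (0.1, 3.0)]))
```

Under this scheme a user who stops forecasting sees their old record gradually lose influence, which is exactly the incentive described above.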
I wonder if it might make more sense to think of this as a feature on a website like FiveThirtyEight that already has an audience that’s interested in probabilistic predictions and models.
Yeah, that’s an interesting idea. On the other hand, FiveThirtyEight is much smaller, and its readers are presumably already more sophisticated, so the potential upside seems smaller.
That being said, I agree that it might make more sense to focus on platforms with a more sophisticated user base (like, say, Substack). Or focus on news outlets like, say, the Washington Post. That might even be more promising.
To clarify, when I made the comment about it being “dismissed”, I wasn’t thinking so much about media coverage as about individual Facebook users seeing prediction app suggestions in their feed. I was thinking that there are already a lot of unscientific and clickbait-y quizzes and games that get posted to Facebook, and was concerned that users might lump this in with those if it’s presented in a similar way.
Yeah, they certainly would be reluctant to do that. But given that they already do fact-checking, it doesn’t seem impossible.
I agree, and I definitely admit that the existence of the Facebook Forecast app is evidence against my view. I was more focused on the idea that if the recommender algorithm is based on prediction scores, that would mean that Facebook’s choice of which questions to use would affect the recommendations across Facebook.
The forecasting accuracy of Forecast’s users was also fairly good: “Forecast’s midpoint brier score [...] across all closed Forecasts over the past few months is 0.204, compared to Good Judgement’s published result of 0.227 for prediction markets.”
For what it’s worth, as noted in Nuño’s comment, this comparison holds little weight when the questions aren’t the same or on the same time scales; I’d take it as only fairly weak evidence against my prior that real-money prediction markets are much more accurate.
as noted in Nuño’s comment, this comparison holds little weight when the questions aren’t the same or on the same time scales
Right, definitely, I forgot to add this. I wasn’t trying to say that Forecast is more accurate than real-money prediction markets (or other forecasting platforms, for that matter), but rather that Forecast’s forecasting accuracy is at least clearly above the this-is-silly level.
Also, I think one can become better at forecasting on one’s own? (I think most people get better calibrated when they do calibration exercises on their own—they don’t need to watch other people do it.)
You also get feedback in the form of the community median prediction on Metaculus and GJ Open, which in my experience is usually useful. I do think that, in general, following the reasoning of competent individuals is very useful, but the comments, and the helpful people who enjoy teaching their skills, do a solid job of covering that.