Overall I like this idea, appreciate the expansiveness of the considerations discussed in the post, and would be excited to hear takes from people working at social media companies.
Thoughts on the post directly
I think some version of boosting visibility based on forecasting accuracy seems promising, but I feel uneasy about how this would be implemented. I’m concerned about (a) how this will be traded off against other qualities and (b) ensuring that current forecasting accuracy is actually a good proxy.
On (a), I think forecasting accuracy and the qualities it’s a proxy for represent a small subset of the space that determines which content I’d like to see promoted; e.g., it seems likely to be only loosely correlated with writing quality. It may be tricky to strike the right balance in how the promotion system works.
On (b):
Promoting and demoting content based on a small sample size of forecasts. In practice it often takes many resolved questions to discern which forecasters are more accurate, and I’m worried that it will be easy to increase/decrease visibility too early.
Even without a small sample size, there may be issues with many of the questions being correlated (the toy simulation below illustrates this). I’m imagining a world in which lots of people predict on correlated questions about the 2016 presidential election, then Trump supporters get a huge boost in visibility after he wins because they do well on all of them.
That said, these issues can be mitigated with iteration on the forecasting feature if the people implementing it are careful and aware of these considerations.
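To make the correlated-questions worry concrete, here is a toy simulation (all numbers are illustrative and not from the post): an overconfident forecaster who predicts 80% on questions that are really 50/50 looks excellent about half the time if all the questions hinge on the same underlying event, and almost never if the questions are independent.

```python
import random

def avg_brier(forecast, outcomes):
    """Mean squared error between a probability forecast and 0/1 outcomes."""
    return sum((forecast - o) ** 2 for o in outcomes) / len(outcomes)

def p_lucky(n_questions, correlated, n_trials=10_000):
    """How often an overconfident 80% forecaster looks excellent
    (average Brier < 0.1) when the underlying event is really 50/50."""
    lucky = 0
    for _ in range(n_trials):
        if correlated:
            # Every question resolves with the same underlying event,
            # e.g. 20 questions that all hinge on one election result.
            outcome = 1 if random.random() < 0.5 else 0
            outcomes = [outcome] * n_questions
        else:
            outcomes = [1 if random.random() < 0.5 else 0 for _ in range(n_questions)]
        if avg_brier(0.8, outcomes) < 0.1:
            lucky += 1
    return lucky / n_trials

print(p_lucky(20, correlated=True))   # ~0.5 -- a single coin flip decides
print(p_lucky(20, correlated=False))  # ~0.0 -- luck washes out over 20 questions
```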
Insofar as the intent is to incentivize people to predict on more socially relevant domains, I agree. But I think forecasting accuracy on sports, etc. is likely strongly correlated with performance in other domains. Additionally, people may feel more comfortable forecasting on things like sports than other domains which may be more politically charged.
My experience with Facebook Forecast compared to Metaculus
I’ve been forecasting regularly on Metaculus for about 9 months and Forecast for about 1 month.
I don’t feel as pressured to regularly go back and update my old predictions on Forecast as on Metaculus, since Forecast is a play-money prediction market rather than a prediction platform. On Metaculus, if I predict 60% and the community is at 50%, then don’t update for 6 months while the community moves to 95%, I’m at a huge disadvantage in terms of score relative to predictors who did update. But with a prediction market, if I buy shares at 50 cents and the price of the shares goes up to 95 cents, it just helps me. The prediction market structure makes me feel less pressured to continually update on old questions, which has both positives and negatives but seems good for a social media forecasting structure.
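A rough numerical sketch of the difference, using a time-averaged Brier score as a crude stand-in for Metaculus’ actual scoring rule (which is log-based and more involved); all numbers here are made up:

```python
def time_averaged_brier(prob_by_day, outcome):
    """Average daily Brier score over a question's lifetime (a simplification
    of platforms that average your score while the question is open)."""
    return sum((p - outcome) ** 2 for p in prob_by_day) / len(prob_by_day)

days = 180
outcome = 1  # the event happens

stale   = [0.60] * days                        # never updates
updater = [0.60] * 30 + [0.95] * (days - 30)   # moves to 95% after a month

print(time_averaged_brier(stale, outcome))    # 0.16  -- penalised every day
print(time_averaged_brier(updater, outcome))  # ~0.03 -- mostly scored at 0.95

# In a play-money market, the stale trader who bought YES at $0.50 simply
# holds shares now worth ~$0.95; not updating costs nothing beyond the
# missed opportunity to buy more.
```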
The aggregate on Forecast is often decent, but it’s occasionally horrible, more egregiously and more often than on Metaculus (e.g., this morning I bought some shares for Kelly Loeffler to win the Georgia Senate runoff at as low as ~5 points, implying 5% odds, while election betting odds currently have Loeffler at 62%). The most common reasons I’ve noticed are:
People misunderstand how the market works and bet on whichever outcome they think is most probable, regardless of the prices.
People don’t make the error described in (1) (as far as I can tell), but are overconfident.
People don’t read the resolution criteria carefully.
Political biases.
There aren’t many predictors so the aggregate can be swung easily.
As hinted at in the post, there’s an issue with being able to copy the best predictors. I’ve followed 2 of the top predictors on Forecast and usually agree with their analyses and buy into the same markets with the same positions.
Forecast currently gives points when other people forecast based on your “reasons” (aka comments), and these points are then aggregated on the leaderboard with points gained from actual predictions. I wish there were separate leaderboards for these.
Thanks, great points!
Yeah, me too. For what it’s worth, Forecast mentions our post here.
Yeah, as we discuss in this section, forecasting accuracy is surely not the most important thing. If it were up to me, I’d focus on spreading (sophisticated) content on, say, effective altruism, AI safety, and so on. Of course, most people would never agree with this. In contrast, forecasting is perhaps something almost everyone can get behind and is also objectively measurable.
I agree that the concerns you list under (b) need to be addressed.
This is a very good idea. The problems, in my view, are biggest on the business-model and audience-demand side. But there are still modest ways it could move forward. Journalism outlets are possible collaborators, but they need an incentive, perhaps by being able to make original content out of the forecasts.
To the extent prediction accuracy correlates with other epistemological skills, you could also give above-average forecasters in the audience tasks like up- and down-voting content or comments, and thereby improve user participation on news sites even if journalists did not make predictions themselves.
Thanks!
I agree.
Yeah, maybe such outlets could receive financial support for their efforts from organizations like OpenPhil or the Rockefeller Foundation—which supported Vox’s Future Perfect.
Interesting idea. More generally, it might be valuable if news outlets adopted more advanced commenting systems, perhaps with Karma and Karma-adjusted voting (e.g., similar to the EA forum). From what I can tell, downvoting isn’t even possible on most newspaper websites. However, Karma-adjusted voting and downvotes could also have negative effects, especially if coupled with a less sophisticated user base and less oversight than on the EA forum.
Agree on both points. The Economist’s World in 2021 partnership with Good Judgment is interesting here. I also think that as GJ and others do more content themselves, other content producers will start to see the potential of forecasts as a differentiated form of user-generated content they could explore. (My background is in media/publishing, so I’m more attuned to that side than to the internal dynamics of the social platforms.) If there are further discussions on this and you’re looking for participants, let me know.
Thanks for writing this! :)
Another potential outcome that comes to mind regarding such projects is a self-fulfilling prophecy effect (provided the predictions are not secret). I have no idea how much of a (positive or negative) impact it would have, though.
Thanks. :)
That’s true, though this is also an issue for other forecasting platforms—perhaps even more so for prediction markets, where you could potentially earn millions by making your prediction come true. From what I can tell, this doesn’t seem to be a problem for other forecasting platforms, probably because most forecasted events are very difficult for small groups of individuals to affect. One exception that comes to mind is match fixing.
However, our proposal might be more vulnerable to this problem because there will (ideally) be many more forecasted events, so some of them might be easier for a few individuals to affect in order to make their forecasts come true.
Some other people have mentioned Facebook’s Forecast. Have you thought about talking with them directly about these ideas? For reference, here is the main person.
Yes, we have talked with Rebecca about these ideas.
Thanks, stimulating ideas!
My quick take: forecasting is such an intellectual exercise that I’d be really surprised if it became a popular feature on social media platforms or had much effect on the epistemic competencies of the general population.
I think I’d approach it more like making math or programming or chess a more widely shared skill: lobby to introduce it in schools, organize prestigious competitions for high schools and universities, convince employers that it’s a valuable skill, and make it easy to verify the skill (I like your idea of a Coursera course plus a forecasting competition).
I’d also be surprised. :) Perhaps I’m not as pessimistic as you, though. In a way, forecasting is not that “intellectual”. Many people bet on sports games, which (implicitly) involves forecasting. Most people are also interested in weather and election forecasts and know (roughly) how to interpret them.
Of course, forecasting wouldn’t become popular because it’s intrinsically enjoyable. People would have to get incentivized to do so (the point of our post). However, people are willing to do pretty complicated things (e.g., search engine optimization) in order to boost their views, so maybe this isn’t that implausible.
As we mention in the essay, one could also make forecasting much easier and more intuitive, by e.g. not using those fancy probability distributions like on Metaculus, but maybe just a simple slider ranging from 0% to 100%.
Forecasting also doesn’t have to be very popular. Even in our best case scenario, we envision that only a few percent of users make regular forecasts. It doesn’t seem highly unrealistic that many of the smartest and most engaged social media users (e.g., journalists) would be open to forecasting, especially if it boosts their views.
But yeah, given that there is no real demand for forecasting features, it would be really difficult to convince social media executives to adopt such features.
I agree that this approach is more realistic. :) However, it would require many more resources and would take longer.
Hm, regarding sports and election betting, I think you’re right that people find it enjoyable, but then again I’d expect no effect on epistemic skills from this. Looking at the sports betting bars in my town, it doesn’t seem to be a place for people who would, e.g., ever track their performance. But I also think the online Twitter crowd is different. I’m not sure how much I’d update on YouTubers investing time into gaming YouTube’s algorithms. That seems to be more a case of investing two hours watching videos to get a recipe to implement?
Just in case you didn’t see it, Metaculus’ binary forecasts are implemented with exactly those 0%-100% sliders.
Not sure I think it would require that many more resources. I was surprised that Metaculus’ AI forecasting tournament was featured on Forbes the other day with “only” $50k in prizes. Also, from the point of view of a participant, the EA groups forecasting tournament seemed to go really well and introduced at least 6 people I know of to more serious forecasting (while being run by volunteers, with prizes in the form of $500 in donation money). The Coursera course sounds like something that’s just one grant away. Looking at Good Judgement Open, ~half of their tournaments seem to be funded by news agencies and research institutes, so reaching out to more (for-profit) orgs that could make use of forecasts and hiring good forecasters doesn’t seem so far off, either.
I also imagined that the effect on epistemic competence would mostly be that most people learn that they should defer more to the consensus of people with better forecasting ability, right? I might expect to see the same effect from having a prominent group of people who perform well at forecasting. E.g., barely anyone who’s not involved in professional math or chess or poker will pretend they could play as well as the professionals; most people would defer to them on math or poker or chess questions.
Yeah, I guess I was thinking about introducing millions of people to forecasting. But yeah, forecasting tournaments are a great idea.
I agree that a forecasting Coursera course is promising and much more realistic.
I’m not an expert on social media or journalism, so these are just some fairly low-confidence thoughts. This seems like a really interesting idea, but it seems very odd to think of it as a Facebook feature (or a feature of another social media platform):
Facebook and social media in general don’t really have an intellectual “brand”. It seems likely that if you did this as a Facebook feature, it would be more likely to get dismissed as “just another silly Facebook game.” Or, if most of the people using it weren’t putting much effort into it, the predictions likely wouldn’t be that accurate, and that could undermine the effort to convince the public of its value.
The part about promoting people with high prediction scores seems awkward. Am I understanding correctly that each user is given one prediction score that applies to all their content? So that means that if someone is bad (good) at predicting COVID case counts, then if they post something else it gets down- (up-) weighted, even if the something else has nothing to do with COVID? That’s likely to be perceived as very unfair. Or do you have some system to figure out which forecasting questions count toward the recommender score for which pieces of content? Even then it seems weird—if someone made bad predictions about COVID in the past, that doesn’t necessarily imply that content they post now is bad.
Presumably the purpose of this is to teach people how to be better forecasters. If you have to hide other people’s forecasts to prevent abuse, then how are you supposed to learn by watching other forecasters? Maybe the idea is that Facebook would produce content designed to teach forecasting—but that isn’t the kind of content that Facebook normally produces, and I’m not sure why we would expect Facebook to be particularly good at that.
All the comparisons between forecasting and traditional fact-checking are weird because they seem to address different issues; forecasting doesn’t seem to be a replacement or alternative to fact-checking. For instance, how would forecasting have helped to fight election misinformation? If you had a bunch of prediction questions about things like vote counts or the outcomes of court cases, by the time those questions resolved everything would be already over. (That’s not a problem with forecasting, since it’s not intended for those kinds of cases. But it does mean that it would not be possible to pitch this as an alternative to traditional fact-checking.)
In general, this seems to require a lot of editorial judgment on the part of Facebook as to what forecasting questions to use and what resolution criteria to apply. (This would be an especially big issue if you were to use a user’s general forecasting score as part of the recommender algorithm—for instance, if Facebook included lots of forecasting questions about economic data, that would end up advantaging content posted by people who are interested in economics, while if the forecasting questions were about scientific discoveries instead, then it would instead advantage content posted by people who are interested in science.) My guess is that this sort of editorial role is not something that social media platforms would be particularly enthusiastic about—they were sort of forced into it by the misinformation problem, but in that case they mostly defer to reputable sources to adjudicate claims. While they could defer to reputable sources to resolve questions, I’m not sure who they would defer to when deciding what questions to set up. (I’m assuming here that the platform is the one setting up the questions—is that the case?)
Another way to game the system that you didn’t mention here: set up a bunch of accounts, make different predictions on each of them, and then abandon all the ones that got low scores, and start posting the stuff you want on the account that got a high score.
I wonder if it might make more sense to think of this as a feature on a website like FiveThirtyEight that already has an audience that’s interested in probabilistic predictions and models. You could have a regular feature similar to The Riddler but for forecasting questions—each column could have several questions, you could have readers write in to make forecasts and explain their reasoning, and then publish the reasoning of the people who ended up most accurate, along with commentary.
Thanks for your detailed comment.
Yeah, maybe all of this is a bit fantastical. :)
That’s certainly possible. For what it’s worth, while Facebook’s Forecast was met with some amount of skepticism, I wouldn’t say it was “dismissed” out of hand. The forecasting accuracy of Forecast’s users was also fairly good: “Forecast’s midpoint brier score [...] across all closed Forecasts over the past few months is 0.204, compared to Good Judgement’s published result of 0.227 for prediction markets.”
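In case the metric is unfamiliar, here is a minimal sketch of the standard Brier score; note that Forecast’s “midpoint” variant presumably scores the market price partway through each question, so this is only the basic definition:

```python
def brier(forecasts, outcomes):
    """Standard Brier score: mean squared error of probabilities vs 0/1 outcomes.
    0 is perfect, 0.25 is what constant 50% guessing earns, 1 is maximally wrong."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

print(brier([0.9, 0.2, 0.7], [1, 0, 1]))  # ~0.047
```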
However, it’s true that a greater integration with Facebook would probably make the feature more controversial and also result in a lower forecasting accuracy.
Btw, Facebook is just one example—I write this because you seem to focus exclusively on Facebook in your comment. In some ways, Twitter might be more appropriate for such features.
That would be the less complicated option. It might be perceived as being unfair—not sure if this will be a big problem though.
I’m working under the assumption that people who make more correct forecasts in one domain will also tend to have a more accurate model of the world in other domains—on average, of course; there will be (many) exceptions. I’m not saying this is ideal; it’s just an improvement over the status quo, where forecasting accuracy practically doesn’t matter at all in determining how many people read your content.
That would be the other, more complicated alternative. Perhaps this is feasible when using more coarse-grained domains like politics, medicine, technology, entertainment, et cetera, maybe in combination with machine learning.
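A very rough sketch of what that alternative could look like; all names and numbers here are hypothetical, and the step of classifying a forecast or post into a domain is simply assumed:

```python
from collections import defaultdict

class DomainScores:
    """Hypothetical sketch: track a user's Brier scores per coarse domain
    ("politics", "medicine", ...) instead of one global score."""

    def __init__(self):
        self.squared_errors = defaultdict(list)

    def record(self, domain, forecast, outcome):
        self.squared_errors[domain].append((forecast - outcome) ** 2)

    def brier(self, domain):
        errs = self.squared_errors[domain]
        # Fall back to the "no information" score of 0.25 for unseen domains.
        return sum(errs) / len(errs) if errs else 0.25

# A post would first be classified into a domain (e.g. by an ML classifier),
# then boosted according to the author's track record in that domain only.
user = DomainScores()
user.record("politics", 0.9, 1)
user.record("medicine", 0.3, 1)
print(user.brier("politics"))   # 0.01 -- good record, boost political posts
print(user.brier("medicine"))   # 0.49 -- poor record, don't boost medical posts
```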
Well, sure. But across all users there will likely be a positive correlation between past and future accuracy. I think it would be good for the world if people who made more correct forecasts about COVID in the past would receive more “views” than those who made more incorrect forecasts about COVID—even though it’s practically guaranteed that some people in the latter group will improve a lot (though in that case, they will be rewarded by the recommender system in the future for that) and even make better forecasts than people in the former group.
I wouldn’t say that’s the main purpose.
My understanding is that this is how other platforms (e.g., Metaculus) work as well. Of course, people can still write comments about what they forecasted and how they arrived at their conclusions.
Also, I think one can become better at forecasting on one’s own? (I think most people get better calibrated when they do calibration exercises on their own—they don’t need to watch other people do it.)
I didn’t mean to suggest that forecasting should replace fact-checking (though I can now see how our post and appendix conveyed that message). When comparing forecasting to fact-checking, I had in mind whether one should design recommendation algorithms to punish people whose statements were labeled false by fact-checkers.
Yeah, they certainly would be reluctant to do that. But given that they already do fact-checking, it doesn’t seem impossible.
I agree that this is an issue. In practice, it doesn’t seem that concerning though. First, the recommendation algorithm would obviously need to take into account the number of forecasts in addition to their average accuracy in order to minimize rewarding statistical flukes. (Similarly to how Yelp displays restaurants with an average rating of, say, 4.5 from 100 ratings more prominently than restaurants with an average rating of 5.0 from only 5 ratings.) Thus, you would actually need to put in a lot of work to make this worthwhile (and set up, say, hundreds of accounts) or get very lucky (which is of course always possible).
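One standard way to do this, sketched below with made-up numbers, is to shrink each user’s average accuracy toward a neutral prior, so that a handful of lucky forecasts barely moves the boost:

```python
def shrunk_brier(avg_brier, n_forecasts, prior_brier=0.25, prior_weight=50):
    """Bayesian-style shrinkage: with few resolved forecasts the estimate stays
    near the 'chance' prior of 0.25; it approaches the user's true average
    only as the forecast count grows (the same idea as Yelp weighting ratings)."""
    return (avg_brier * n_forecasts + prior_brier * prior_weight) / (n_forecasts + prior_weight)

print(shrunk_brier(0.05, 5))    # ~0.23 -- 5 great forecasts barely move the needle
print(shrunk_brier(0.05, 500))  # ~0.07 -- a long track record earns the boost
```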
It would probably also be prudent to add some sort of decay to the forecasting-accuracy boosting (such that good forecasting accuracy from, say, 10 years ago matters less than good forecasting accuracy from this year) in order to incentivize users to continue making forecasts. Otherwise, people who achieved a very high forecasting accuracy in year 1 would be inclined to stop forecasting in order to avoid a regression to the mean.
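A minimal sketch of such a decay, with an arbitrary half-life chosen purely for illustration:

```python
def decayed_weight(years_ago, half_life_years=2.0):
    """Exponentially down-weight old forecasts so that accuracy from, say,
    10 years ago counts far less than accuracy from this year."""
    return 0.5 ** (years_ago / half_life_years)

def decayed_brier(history):
    """history: list of (squared_error, years_ago) pairs."""
    num = sum(err * decayed_weight(age) for err, age in history)
    den = sum(decayed_weight(age) for _, age in history)
    return num / den

print(decayed_weight(0))    # 1.0
print(decayed_weight(10))   # ~0.03 -- decade-old forecasts barely count

history = [(0.01, 0.2), (0.40, 9.0)]       # (squared error, years ago)
print(round(decayed_brier(history), 3))    # ~0.03, dominated by the recent forecast
                                           # rather than the plain average of ~0.21
```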
Yeah, that’s an interesting idea. On the other hand, FiveThirtyEight is much smaller, and its readers are presumably already more sophisticated, so the potential upside seems smaller.
That being said, I agree that it might make more sense to focus on platforms with a more sophisticated user base (like, say, Substack). Or focus on news outlets like, say, the Washington Post. That might even be more promising.
To clarify, when I made the comment about it being “dismissed”, I wasn’t thinking so much about media coverage as about individual Facebook users seeing prediction app suggestions in their feed. I was thinking that there are already a lot of unscientific and clickbait-y quizzes and games that get posted to Facebook, and I was concerned that users might lump this in with those if it is presented in a similar way.
I agree, and I definitely admit that the existence of the Facebook Forecast app is evidence against my view. I was more focused on the idea that if the recommender algorithm is based on prediction scores, that would mean that Facebook’s choice of which questions to use would affect the recommendations across Facebook.
For what it’s worth, as noted in Nuño’s comment, this comparison holds little weight when the questions aren’t the same or on the same time scales; I’d take it as only weak evidence against my prior that real-money prediction markets are much more accurate.
Right, definitely, I forgot to add this. I wasn’t trying to say that Forecast is more accurate than real-money prediction markets (or other forecasting platforms, for that matter), but rather that Forecast’s forecasting accuracy is at least clearly above the this-is-silly level.
You also get feedback in the form of the community median prediction on Metaculus and GJOpen, which in my experience is usually useful. I do think that, in general, following the reasoning of competent individuals is very useful, but the comments and the helpful people who enjoy teaching their skills do a solid job covering that.