Prediction Markets for Science
Epistemic status: untested combination of ideas I think are individually solid. This is intended to be a quick description of some obvious ideas rather than a deeply justified novel contribution; I’ll cite some things but not be especially careful to review all prior work.
Prediction markets are a simple idea, most commonly implemented as sports betting: who will win a particular game? People place bets, the outcome is determined, and then bookies pay out to the correct predictors. There are also prediction markets and forecasting platforms for arbitrary events, like Metaculus. Here I want to focus on how prediction markets can be useful for the process of doing science.
The Basics
Why are prediction markets useful to begin with? Consider sports betting again. As an observer of sports, you might think one team is more likely to beat another, but you probably don’t have a clear quantitative sense of how much more likely. And, as you could discover by listening in on conversations between fans, disagreement about those beliefs is common, often driven by conflicting models, observations, and loyalties.
Having one common betting market means that prices will fluctuate until the available information is aggregated, leaving a ‘consensus’ number that reflects overall societal uncertainty. A journalist writing about the game doesn’t need to be any good at assessing how likely each team is to win, or at judging how representative their sample of interviewed experts is; they can directly report the betting market odds as indicative of overall beliefs. [After all, if someone disagreed with the betting market odds, they would place a bet against them, which would move the odds.]
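The post doesn’t commit to a particular market mechanism, but one common choice for automated markets is Robin Hanson’s logarithmic market scoring rule (LMSR). Below is a minimal Python sketch, assuming LMSR, of a single yes/no market. The class and method names (`BinaryMarket`, `trade_to_belief`) and the liquidity parameter `b` are illustrative, not taken from any particular platform. Each trader who disagrees with the current price buys shares until the price matches their own belief, which is the bracketed argument above expressed as code.

```python
import math


def lmsr_cost(q_yes: float, q_no: float, b: float) -> float:
    """LMSR cost function C(q) = b * ln(exp(q_yes / b) + exp(q_no / b))."""
    # Factor out the larger quantity so exp() cannot overflow.
    m = max(q_yes, q_no)
    return m + b * math.log(math.exp((q_yes - m) / b) + math.exp((q_no - m) / b))


class BinaryMarket:
    """Hypothetical automated market maker for one yes/no question."""

    def __init__(self, b: float = 100.0) -> None:
        self.b = b          # liquidity: larger b means each bet moves the price less
        self.q_yes = 0.0    # YES shares sold so far
        self.q_no = 0.0     # NO shares sold so far

    def price(self) -> float:
        """Current price of a YES share, read as the market's probability of YES."""
        return 1.0 / (1.0 + math.exp((self.q_no - self.q_yes) / self.b))

    def buy(self, outcome: str, shares: float) -> float:
        """Buy shares of 'yes' or 'no' and return what the trader pays."""
        before = lmsr_cost(self.q_yes, self.q_no, self.b)
        if outcome == "yes":
            self.q_yes += shares
        elif outcome == "no":
            self.q_no += shares
        else:
            raise ValueError("outcome must be 'yes' or 'no'")
        return lmsr_cost(self.q_yes, self.q_no, self.b) - before

    def trade_to_belief(self, p: float) -> float:
        """Bet until the price equals belief p (0 < p < 1); return the amount paid."""
        if not 0.0 < p < 1.0:
            raise ValueError("belief must be strictly between 0 and 1")
        target_gap = self.b * math.log(p / (1.0 - p))  # q_yes - q_no at price p
        delta = target_gap - (self.q_yes - self.q_no)
        if delta >= 0:
            return self.buy("yes", delta)
        return self.buy("no", -delta)


market = BinaryMarket(b=100.0)
print(round(market.price(), 3))   # 0.5 before anyone bets
market.trade_to_belief(0.7)       # an optimist pushes the price up
market.trade_to_belief(0.6)       # a skeptic pushes it partway back
print(round(market.price(), 3))   # 0.6
```

In this toy the price ends at the most recent trader’s belief; in a real market many traders with different stakes keep nudging it, and `b` controls how much money it takes to move it.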
A shared market like this has several useful effects:
‘Money’ flows from bad predictors to good predictors. As prediction skill is rewarded, people will specialize in it and develop it; as poor prediction skill is punished, it will be reduced and discarded. Someone who is good at foreseeing outcomes might stop doing other jobs to focus on predicting more games, causing more results to be better known ahead of time. Someone who always bets on the home team out of loyalty might quickly learn to bet only token amounts, and thus exert little pressure on the overall social consensus.
Predictions are shared and can be jointly acted on. Most people can just track the market price and act accordingly, and discussion of what the price ‘should’ be moves to more specialized trading forums. You can skip watching games that are predicted to be blowouts, instead finding the ones where the market is uncertain (and thus are likely to be suspenseful).
Trading volume lets you know, before the outcome has happened, how much society is interested in and disagrees about the prediction.
Info that is private or difficult to justify can be added to the communal pot more easily than through conversational mechanisms.[1]
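The first effect, money flowing from bad predictors to good ones, can be made concrete with an automated market maker. The sketch below assumes an LMSR binary market (liquidity `b=100`) and, unrealistically, a known ‘true’ probability for the game, so that expected profits can be computed exactly. The function `expected_profit` and all numbers are hypothetical. A loyal fan pushes the price from 0.5 to 0.8, and a calibrated bettor then pushes it back to 0.55.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def expected_profit(p_from: float, p_to: float, q_true: float, b: float = 100.0) -> float:
    """Expected profit of moving an LMSR market's price from p_from to p_to,
    evaluated under an assumed true probability q_true (never known in practice)."""
    if p_to >= p_from:
        # Buy YES shares, each paying 1 if the event happens.
        shares = b * (logit(p_to) - logit(p_from))
        cost = b * math.log((1.0 - p_from) / (1.0 - p_to))
        return q_true * shares - cost
    # Otherwise buy NO shares, each paying 1 if the event does not happen.
    shares = b * (logit(p_from) - logit(p_to))
    cost = b * math.log(p_from / p_to)
    return (1.0 - q_true) * shares - cost


q_true = 0.55                                  # hypothetical true chance the home team wins
fan = expected_profit(0.50, 0.80, q_true)      # loyal fan bets the price up to 0.8
analyst = expected_profit(0.80, 0.55, q_true)  # calibrated bettor corrects it to 0.55
print(f"fan: {fan:.1f}, analyst: {analyst:.1f}")
```

This prints `fan: -15.4, analyst: 15.9`: in expectation, the fan’s losses fund the calibrated bettor’s gains, with the market maker’s subsidy covering the small remainder. In this model, moving the price to the true probability is always the trade with the highest expected profit, which is what makes loyalty betting costly.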
Replication Markets
When a paper is published, readers might want to know: was this a one-off fluke, or would another experiment conducted the same way produce the same results? This is a job for a prediction market, and it has worked well before on Many Labs 2, a psychology replication project. Peers in a field have a sense of which papers are solid and which are flimsy; prediction markets make that sense legible and easily shared, instead of leaving it as intuitive expertise built up over lots of research.
Note that you can ask not just “will the results be significant and in the same direction?” but also “what will the results be?”; at least in that trial, however, the second kind of market received less trading and performed much worse. This seems like a skill that can be developed over time: sports bets that were once limited to ‘who will win?’ now often deal with ‘will A win by at least this much?’, a subtler question that’s harder to estimate. We’re still in the infancy of explicitly betting on science, so the initial absence of that skill doesn’t seem very predictive of the future.
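The ‘win by at least this much?’ framing suggests one way to ask “what will the results be?” using only yes/no markets: list a ladder of contracts of the form ‘effect size ≥ t’ and read the implied distribution off their prices. This is a hypothetical sketch; the post doesn’t say how the effect-size market in that trial was structured, and the function name, thresholds, and prices below are made up.

```python
def implied_buckets(thresholds, prices):
    """Turn prices of 'effect size >= t' contracts into bucket probabilities.

    thresholds must be strictly ascending; prices[i] is the price (probability)
    of the contract 'effect >= thresholds[i]'. Returns (low, high, probability)
    tuples, where None means the bucket is unbounded on that side.
    """
    if len(thresholds) != len(prices) or not thresholds:
        raise ValueError("need one price per threshold")
    if any(hi <= lo for lo, hi in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly ascending")
    if any(later > earlier for earlier, later in zip(prices, prices[1:])):
        # 'effect >= 0.5' can never be likelier than 'effect >= 0.2'.
        raise ValueError("prices must be non-increasing in the threshold")
    buckets = [(None, thresholds[0], 1.0 - prices[0])]
    for i in range(len(thresholds) - 1):
        buckets.append((thresholds[i], thresholds[i + 1], prices[i] - prices[i + 1]))
    buckets.append((thresholds[-1], None, prices[-1]))
    return buckets


# Hypothetical prices for a replication's standardized effect size.
for low, high, prob in implied_buckets([0.0, 0.2, 0.5], [0.80, 0.45, 0.10]):
    print(low, high, round(prob, 2))
```

Prices that increase with the threshold would let a trader profit risk-free, so a liquid market should keep them ordered; the function simply rejects such inputs.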
Outcome Markets
The idea of ‘replication’ markets generalizes; you don’t need to have run the first study! You could write the ‘methods’ section of a paper and see what results the market expects. This makes more sense in some fields than others: if you write a survey and then give it to a thousand people, your design is unlikely to change in flight, but more exploratory fields often go through several non-informative (and thus unpublished) study designs on the path to a finished paper. Even so, such designs can still be usefully predicted in advance.
The main value of outcome markets is creating a field-wide consensus on experiments that haven’t been done yet. For experiments where everyone agrees on the expected result, it may not be necessary to run the experiment at all (and a journal focused on ‘novel results’ might, looking at the market history, decide not to publish it). For experiments where the field disagrees on the result and there’s significant trading, there’s obvious demand to run the experiment and generate the results that settle the bet.[2]
Note that this has two important subpoints: review and disagreement.
First, peer review is often done to studies after they are ‘finished’ from the author’s point of view. This means that any reviewer comments on how the study should have been done differently would have been more useful earlier in the process (which is where outcome markets move them). Making proposed experiments available to the field before they are run means that many suggestions can be incorporated (or not). Senior researchers with valuable guidance to offer can give advice to anyone, rather than just to the students who work with them directly.
Second, many fields, despite being motivated by controversies between research groups, write their papers ‘in isolation’. They identify an effect that they can demonstrate isn’t the result of noise, interpret that effect as supporting their broader hypothesis, and then publish. This is not a very Bayesian way to go about things; what’s interesting is not whether you think an outcome-that-happened is likely, but whether you think it’s likelier than the other side thinks. If both groups think the sun will rise tomorrow, then publishing a paper about how your hypothesis predicts the sun will rise and you observed the sun rising doesn’t do much to advance science (and might set it back, as now the other group needs to spend time writing a response).
Trading volume can fix this: proposals about whether or not the sun will rise attract essentially no disagreement, whereas proposals about controversial topics will draw significant bets from both parties. This pushes research towards double-crux and adversarial collaboration. (Note that, to bet much of their stake on a proposal, the other group needs to believe it’s well-designed and will be accurately interpreted, which incentivizes more shared designs and fewer attempts at dishonest gotchas.)
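The Bayesian point above can be written as Bayes’ rule in odds form: an observation shifts the odds between two hypotheses by the likelihood ratio P(outcome | ours) / P(outcome | theirs). A minimal Python sketch, with made-up probabilities for each group; the function names are illustrative.

```python
import math


def bits_of_evidence(p_if_ours: float, p_if_theirs: float) -> float:
    """log2 of the likelihood ratio: how far observing the outcome should shift
    the odds between our hypothesis and theirs."""
    return math.log2(p_if_ours / p_if_theirs)


def posterior_odds(prior_odds: float, p_if_ours: float, p_if_theirs: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * (p_if_ours / p_if_theirs)


# Both groups are near-certain the sun will rise: observing it settles nothing.
print(bits_of_evidence(0.9999, 0.9999))          # 0.0
# A contested prediction: we say 80%, they say 30%.
print(round(bits_of_evidence(0.8, 0.3), 2))      # 1.42
print(round(posterior_odds(1.0, 0.8, 0.3), 2))   # 2.67
```

Observing the sun rise moves the odds by 0 bits no matter what either side believed beforehand, while the contested outcome is worth about 1.4 bits. Heavy two-sided trading is a signal that an experiment sits in the second category.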
Science Popularization
Often, people want to ‘trust the science’, while the science is focused on exploring the uncertain frontier instead of educating the masses. People who give TED talks and publish mass-market books have their views well-understood, probably out of proportion with their reputation among experts. In cases where journalists disagree with the experts, it’s the journalists writing the newspapers.
Openly readable markets (even if only ‘experts’ can bet on them) make for easier transmission of the current scientific consensus (or lack thereof), even on recent and controversial questions (where the tenor of the controversy can be fairly and accurately reported). Hanson discusses how reporting on cold fusion could have more adequately conveyed the scientific community’s skepticism if there had been a common and reputable betting market.
For the long term, it seems like publicly accessible markets (rather than closed expert markets) are probably better on net; if the public really wants to know whether a particular fad diet works, they can vote with their dollars to fund studies on those diets. If a particular expert has set up a small fiefdom where they control peer review and advancement, the resulting price distortions can only be corrected once they are discovered by a larger actor outside the fiefdom who can afford to out-bet the locals.
If you’re interested in working on this, the main group that I know of doing things here is the Science Prediction Market Project, and Robin Hanson has been thinking about this for a long time, writing lots of things worth reading.
[1]
For many gambling markets, the underlying events are designed to be random or competitions between people, which invites cheating to make things ‘easy to predict’. Attempts to guard against that are thus quite important and might dominate thinking on the issue.
For gambling markets about external events or cooperative projects (like science), this is of reduced importance (while still being important!). A researcher might want to raise their personal prestige at the expense of the societal project, by inappropriately promoting their hypotheses, suppressing disagreements, or falsifying data. There still need to be strong guardrails in place to make that less likely. Prediction markets can help here by rewarding people who choose to fight bad actors and win.
[2]
While outcome markets would likely start as play-money or prestige games, you could imagine a future, more-developed version which replaces a lot of the granting pipeline, using the market mechanism itself to determine funding for experiments.
In this vision, time spent writing grant applications (which involves a bunch of politics and misrepresentation) and reviewing papers would be replaced by writing experimental designs and commenting and betting on them. If scientists spend less time on meta and more time on the object level, it’ll be because the system is more efficient (which isn’t obviously the case) or because the work is split into specialized roles: object-level scientists who act more like price-takers in the prediction markets and provide very little meta-effort, and meta-level scientists who act as traders and might provide very little object-level effort.
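As a hypothetical illustration of letting the market mechanism steer funding, one could rank proposed experiments by how uncertain their price is, weighted by how much was traded, in the spirit of funding experiments where the field disagrees and trades heavily. The scoring rule (binary entropy of the price times volume), the `Proposal` type, and all numbers are assumptions, not something the post specifies.

```python
import math
from dataclasses import dataclass


@dataclass
class Proposal:
    name: str
    price: float   # market probability of the predicted result
    volume: float  # total amount traded on the proposal


def binary_entropy(p: float) -> float:
    """Uncertainty (in bits) implied by a market price p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def priority(proposal: Proposal) -> float:
    # Hypothetical heuristic: an uncertain price alone could just mean nobody
    # has looked, so weight uncertainty by how much money changed hands.
    return binary_entropy(proposal.price) * proposal.volume


proposals = [
    Proposal("sun rises tomorrow", price=0.999, volume=5.0),
    Proposal("contested effect replicates", price=0.52, volume=4000.0),
    Proposal("niche question nobody trades", price=0.5, volume=10.0),
]
for p in sorted(proposals, key=priority, reverse=True):
    print(f"{p.name}: {priority(p):.1f}")
```

The niche proposal ranks well below the contested one despite having the most uncertain price possible, because almost nobody has bet on it.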
See also the (related but distinct) Visitor’s proposal from Moloch’s Toolbox, chapter 3 of Inadequate Equilibria:
VISITOR: Two subclasses within the profession of “scientist” are suggesters, whose piloting studies provide the initial suspicions of effects, and replicators whose job it is to confirm the result and nail things down solidly—the exact effect size and so on. When an important suggestive result arises, two replicators step forward to confirm it and nail down the exact conditions for producing it, being forbidden upon their honor to communicate with each other until they submit their findings. If both replicators agree on the particulars, that completes the discovery. The three funding bodies that sustained the suggester and the dual replicators would receive the three places of honor in the announcement. Do I need to explain how part of the function of any civilized society is to appropriately reward those who contribute to the public good?
There’s also https://socialscienceprediction.org/, which is similar.
Thanks for mentioning the Social Science Prediction Platform! We had some interest from other sciences as well.
With collaborators, we outlined some other reasons to forecast research results here: https://www.science.org/doi/10.1126/science.aaz1704. In short, forecasts can help to evaluate the novelty of a result (a double-edged sword: very unexpected results are more likely to be suspect), mitigate publication bias against null results / provide an alternative null, and over time help to improve the accuracy of forecasting. There are other reasons, as well, like identifying which treatment to test or which outcome variables to focus on (which might have the highest VoI). In the long run, if forecasts are linked to RCT results, it could also help us say more about those situations for which we don’t have RCTs—but that’s a longer-term goal. If this is an area of interest, I’ve got a podcast episode, EA Global presentation and some other things in this vein… this is probably the most detailed.
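Two of the uses mentioned above, judging how unexpected a result was and tracking forecasting accuracy over time, have standard quantitative forms. A minimal Python sketch, assuming binary ‘does the effect replicate?’ forecasts: the Brier score is the mean squared error of the forecast probabilities (lower is better), and surprisal measures, in bits, how unexpected each realized outcome was. The forecasts and outcomes are hypothetical.

```python
import math


def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes (lower is better)."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need matching, non-empty lists")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def surprisal_bits(forecast, outcome):
    """How surprising an observed 0/1 outcome was, given the forecast probability of 1."""
    p_observed = forecast if outcome == 1 else 1.0 - forecast
    return -math.log2(p_observed)


# Hypothetical forecasts that each study's main effect replicates, and what happened.
forecasts = [0.9, 0.2, 0.6]
outcomes = [1, 0, 0]
print(round(brier_score(forecasts, outcomes), 3))
print([round(surprisal_bits(p, o), 2) for p, o in zip(forecasts, outcomes)])
```

The third study, forecast at 60% but failing to replicate, carries the most surprisal; tracking Brier scores across many such rounds is one way to see whether forecasters are improving.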
I agree that there’s a lot of work in this area and decision makers actively interested in it. I’ll also add that there’s a lot of interest on the researcher side, which is key.
P.S. The SSPP is hiring web developers, if you know anyone who might be a good fit.
In my opinion, the applications of prediction markets are much more general than these. I have a bunch of AI safety inspired markets up on Manifold and Metaculus. I’d say the main purpose of these markets is to direct future research and study. I’d phrase this use of markets as “A sub-field prioritization tool”. The hope is that markets would help me integrate information such as (1) methodology’s scalability e.g. in terms of data, compute, generalizability (2) research directions’ rate of progress (3) diffusion of a given research direction through the rest of academia, and applications.
Here are a few more markets to give a sense of what other AI research-related markets are out there: Google Chatbot, $100M open-source model, retrieval in gpt-4
I think this could be a very useful tool for improving public knowledge about the uncertainties in various areas of science.
But I wonder how prediction markets can be effectively integrated into decision-making processes.
For example, if there’s a vaccine going through testing, or an intervention being studied, do you think there could be a smart way to use prediction markets to make better policy decisions?
You could also go the other way and call out interventions, policies, and other currently implemented things based on uncertainty in the prediction markets.