Forecasting for Policy (FORPOL) - Main takeaways, practical learnings & report
Tl;dr: At Czech Priorities, an EA-aligned think tank based in Prague, we recently ran a complex forecasting tournament together with Metaculus, funded by EAIF, to support Czech policymaking. Today, we are publishing a write-up of our findings from the tournament, as well as supplementary materials to help any teams or groups of forecasters looking to do the same.
______
From October 2022 to March 2023, we ran a forecasting tournament with a total of 54 questions. In March, we discussed some of our preliminary findings in this post.
Almost all of our forecasting questions were developed in cooperation with 16 different public institutions and ministerial departments. Each institution or department defined its most useful forecasting topics, participated in a workshop to define specific questions with us, and was later provided with the results. This was intended as a proof of concept of one possible approach to incorporating forecasting in public decision-making.
Once defined, our forecasting questions were posted on a private Metaculus sub-domain (in Czech), where an average of 72 forecasters addressed them as they would any other question on Metaculus (median of 18 predictions per user). Throughout the tournament, we produced 16 reports detailing the rationales and forecasts for use by the cooperating institutions. The institutions and their topics were listed in our previous post.
This approach, in combination with multiple media appearances, also allowed us to strengthen our position as one of the leading Czech institutions with expertise in foresight methods.
We’ve created a write-up detailing our steps in both talking to institutions and managing the tournament with its specifics. Here are our five overarching takeaways for groups of forecasters who want to make an impact through policymaking, and some specific lessons learned:
General takeaways
Develop partnerships with policymakers across various policy areas. Finding policymakers eager to try analytical innovations is usually not the main bottleneck.
Diversify your portfolio of foresight methods. Particularly in the EU, policymakers are becoming increasingly aware of the importance of foresight. Their familiarity with methods such as scenario planning or horizon scanning can grant more inroads into the policy process for forecasting if leveraged well.
Expect to deal with complex, long-term forecasting questions. They are requested often and can have larger impacts if used to adjust long-term plans and strategies.
Even among promising leads, only around 25% will see the process through to the end. Forecasting is not yet a priority for many policymakers, and there are many points on the way to delivering impact where the process can fail.
Don’t expect clearly measurable large impacts. Large impacts are usually difficult to trace back to individual data points (as they require more data and negotiation), while measurable uses are usually based on individual decisions, producing smaller impacts.
Practical lessons learned—Policy
Be prepared to be in the driver’s seat. While public institutions might be largely supportive of the idea of forecasting, their ability to closely cooperate for the whole duration of the several months-long forecasting tournament is limited. Keep in mind that their primary function is usually not to experiment and discuss forecasting questions and findings.
Don’t get locked in. Even if an institution is receptive to forecasting, you may eventually find out that developing feasible questions for their topics of interest is not possible (e.g., due to data availability or time horizons). In this case, do not feel obliged to submit sub-optimal questions to forecasters. You are the partner who knows what useful forecasting inputs look like.
Find the sweet spot. Policymakers will not want to include probabilistic forecasts in just one chapter of a multi-chapter policy document; it would look inconsistent. Aim for policy issues that are likely to have their own standalone discussions and outputs, where forecasting can really pop.
Give them something to think about. We received very positive feedback on including forecaster rationales and other contextualizing information in supplemental materials provided along with the pure probabilistic information. Aim for roughly 3–10 pages for a handful of questions.
Practical lessons learned—Forecasting tournaments
Help them help you. Scoring rules determine the feedback that forecasters get on their predictions. Forecasters need to understand how scores are calculated and what they mean, so that the feedback both informs and motivates them.
Mind the gap. There are numerous factors that will make significant drop-offs inevitable in forecasting tournaments (cognitive and time demands, primarily online activity, etc.). Keep this in mind when planning your forecaster recruitment strategy and goals.
Variety is the spice of life. Forecasters strongly favored a wide range of topics in the tournament. Strike a balance between greater diversity and the greater research time it demands of forecasters.
Can’t win them all. At the start, there are three important objectives: improving public understanding and acceptance of forecasting; identifying and developing top forecasters (and generally keeping forecasters engaged); and crowdsourcing forecasts useful for public policy. At various times these objectives may temporarily clash. Know which is your priority.
Rationales don’t compete. We offered additional rewards for well-thought-out rationales. While this improved the base quality of contributions, we did not observe explicit competition between forecasters in writing outstanding rationales.
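On the scoring point above, as an illustration only (Metaculus uses its own log-based, community-relative scoring, which is not reproduced here), a minimal sketch of the Brier score, one of the simplest proper scoring rules for binary questions, shows the kind of feedback forecasters need to be able to interpret:

```python
def brier_score(forecast: float, outcome: int) -> float:
    """Squared error between a probability forecast and the 0/1 outcome.

    Lower is better: 0.0 is a perfect forecast, 0.25 matches an
    uninformative 50% forecast, and 1.0 is maximally wrong.
    """
    return (forecast - outcome) ** 2


def average_brier(forecasts: list[float], outcomes: list[int]) -> float:
    """A forecaster's mean Brier score across resolved questions."""
    return sum(brier_score(f, o) for f, o in zip(forecasts, outcomes)) / len(forecasts)


# A forecaster who gave 80% to an event that happened and 30% to one
# that did not:
print(average_brier([0.8, 0.3], [1, 0]))  # -> 0.065
```

The key property worth communicating to forecasters is that a proper score like this is minimized in expectation by reporting one's true belief, so there is no incentive to hedge or exaggerate.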
Impacts
In our case, a handful of our partners have already acted on the information and judgments presented in our reports. Examples include the national foreclosure issue (some 6% of the total population have debts in arrears), where the debt relief process is being redesigned amid strong lobbying and insufficient staff capacity; and the probabilities of outlier scenarios for European macroeconomic development, which the Slovak Ministry of Finance requested to help calibrate its existing judgments.
Other partners claim to have incorporated this knowledge into the larger cycle of their policymaking process, but we haven’t yet seen any actual evidence of it. Our experience and candid discussions with policymakers and forecasters alike, however, also gave us some pointers on what pitfalls are intensified when tournaments are focused on policy questions (such as the incentives and motivations of forecasters).
In general, it seems useful to explore various approaches to growing the number of policymakers with personal experience and skills in forecasting. In our case, we found curiosity and willingness to try forecasting even in unexpected institutional locations (e.g., the Czech R&I funding body). This makes us more confident that the “external forecasts” approach (as opposed to building internal prediction tournaments or advancing the forecasting skills of public servants) is worth investigating further, precisely because it allows us to detect and draw on this interest irrespective of institutional and seniority distinctions and resource constraints.
While we hope that any readers with an interest in forecasting find our experience useful, we also want both this and any future projects of ours to make it easier for other teams to work towards similar goals. To that end, the write-up contains an Annex of “Methodological Guidelines,” where we outline in more explicit terms the questions and decisions we found important when running the project, and what they may entail.
Access our full report HERE.
Thanks for writing this up! As someone who recently ran a forecasting event at a UK Government department for my MSc research project, I fully appreciate some of your challenges (e.g. around attrition and creating a variety of questions).
In your experience, how well did the participants feel the link was between the forecasts they were making and any decisions that were being made on the area/topic? Did they feel like the forecasts would influence/be integrated effectively when a decision on the relevant area was being made? If so, did you notice any improvement in forecasting accuracy? My reason for asking is an issue that is typically raised around forecasting is that it lacks decision-relevance, and that even if forecasts are elicited they have limited influence on the final decision. It’d be interesting to know if you found that perception as well, and if not, if there were any incentive benefits (i.e. if they felt their forecast would inform decisions, then did they become more accurate/try harder).
Out of interest, was there any training provided to participants, before during or after the tournament?
Thanks for the questions—your experience certainly sounds interesting as well (coming from someone with a smidgeon of past experience in the UK)!
As for the link between decision-relevance and forecaster activity: I think it bears repeating just how actively we had to manage our partnerships to avoid ending up with virtually every question being long-term, which:
a) while not automatically reducing decision relevance, is at least heuristically tied to it (insofar as there are by default fewer incentives to act on information about the distant future than on more immediate data points); and b) presents a fundamental obstacle both to evaluating forecast accuracy itself (the questions simply linger unresolved) and to the tournament model, which seeks to reward that accuracy or a proxy of it.
That being said, from the discussions we had I feel at least somewhat confident in making two claims: a) forecasters definitely cared about who would use the predictions and to what effect, though there didn’t seem to be significant variance in turnout or accuracy (insofar as we can measure it), bar a few outlier questions (which were duds on our part); b) as a result, and based on our exit interviews with the top forecasters, I would think of decision-relevance as a binary or categorical variable rather than a continuous one. If the forecasting body continuously builds credibility by presenting questions and relaying feedback from the institutions, it activates the forecasters’ “I’m not shouting into the void” mode and delivers whatever benefits that mode brings.
At the same time, however, it is possible that none of our questions crossed the threshold of a truly high-stakes immediate question (“Is Bin Laden hiding in the compound...”), which would suddenly activate an even more desirable mode of thinking and evaluating evidence. Even if such a threshold exists, it’s questionable whether a sustainable forecasting ecosystem could be built on the other side of it (though that would be the dream scenario, of course).
As for training: in the previous tournament we ran, there was a compulsory training course on basics such as base rates, fermisation, etc. Given that many FORPOL participants had already taken part in its predecessor, and that our sign-ups indicated most were familiar with these basics from reading Superforecasting or forecasting elsewhere, we kept an updated version of the short training course available, but no longer compulsory. There was no directed training after the tournament, as we did not observe demand for it.
Lastly, perhaps one nugget of personal experience you might find relevant/relatable: when working with the institutions, it definitely was not rare to feel like the causal inference aspects (and even just eliciting cognitive models of how the policy variables interact) might have deserved a whole project to themselves.