I’m interested in people’s thoughts on:
How valuable would it be for more academics to do research into forecasting?
How valuable would it be for more non-academics to do academic-ish research into forecasting? (E.g., for more people in think tanks or EA orgs to do research on forecasting that’s closer in levels of “rigour” to the average paper than to the average EA Forum post.)
What questions about forecasting should be researched by academics, or by non-academics using academic approaches?
I imagine this could involve psychological experiments, historical research, conceptual/theoretical/mathematical work, political science literature reviews, etc.
(Note that I don’t mean “What questions should people forecast?” For that, there’s this post.)
Context
Many EAs have appreciated and drawn on Phil Tetlock’s research into forecasting. Some people in the EA community or related communities are now doing non-academic work on forecasting, such as:
building tools for forecasting (e.g. Foretold)
running forecasting projects (e.g. Foretell [no relation])
experimenting/playing around with different forecasting techniques and methods (e.g. here and here)
researching and writing about forecasting (e.g. here and here)
And some people in the EA community or related communities are doing academic work:
To forecast specific things (e.g. Grace et al.; Manheim)
About forecasting itself (e.g. Beard et al.; Baum [see also])
But I’m not aware of much academic research on forecasting itself from EAs. I imagine there might be room for more. And two things I’m considering trying to do in future are:
A PhD in psychology, focusing on forecasting or other things related to improving institutional decision-making
Research on forecasting in a job in an EA organisation or with a grant
But I don’t know how valuable that would be—e.g., maybe non-EA academics are covering this area well already, or maybe what’s really needed is just people in government/business trying to actually implement or make use of forecasting projects. Nor do I know what the important open questions or topics would be. And I haven’t engaged much with the academic literature on forecasting, other than reading Superforecasting, Expert Political Judgement, and various summaries by EAs.
So this leads me to ask the above questions, both for my own benefit and (hopefully) for the benefit of other people who might be a good fit for research into this topic. (To help capture that benefit, I’ve added this post to A central directory for open research questions.)
The 80,000 Hours profile on improving institutional decision-making has some great analysis and ideas on this, but it’s now almost 3 years old, and I’m interested in additional perspectives.
(I also posted these three questions to the Improving institutional decision-making Facebook group, and there was some discussion there.)
How open are various decision-makers to actually paying attention to forecasts?
How likely are they to just make the same decision anyway, referring to forecasts when they justify this decision and ignoring them the rest of the time?
How does this vary for different decision-makers and contexts (e.g., politicians vs civil servants vs funders vs business leaders)?
How does this vary by different approaches to forecasting (e.g. those surveyed by Beard et al.), different presentations of forecasting, and different topics?
I’m guessing a good amount of research will have already been done on these topics, but I’ve been surprised about such things before.
I imagine these questions could be answered through a mixture of:
surveys of relevant people
lab experiments
interviews with people who’ve tried implementing forecasting in relevant settings
literature reviews of potentially relevant work in political science and political economy (e.g., on what kinds of info are drawn on in political decision-making more generally)
probably also other approaches
It seems plausible that these sorts of questions aren’t as well answered by academic research as by people just actually trying to implement forecasting in relevant institutions, get relevant decision-makers to pay attention to forecasts, etc.
(This is one of my own ideas about a cluster of questions that might warrant academic/academic-style research. I’d be interested in people’s thoughts on this cluster and the cluster I suggest in a different comment, including regarding how important, tractable, and neglected these questions are.)
Not really answering your question, but there is some recent work attempting to forecast clinical trial results that may be relevant: Can Oncologists Predict the Efficacy of Treatments in Randomized Trials? Kimmelman (the senior author) is doing other work on the topic too (e.g. here). I’m not aware of much published work in this space in a biomedical context.
My guess is that key decision-makers in medicine (e.g. funders of trials) would not be very open to paying attention to forecasts (even if those forecasts were shown to be accurate to some degree), as there is a very strong culture of relying on data and in particular on RCTs.
Greenberg 2018 lists and evaluates forecasting scoring rules. Research into additional, more complex metrics that take into account, e.g.:
the importance of the questions forecasted on (perhaps using an interest score like Metaculus's)
the number of questions forecasted on (the Brier score can be misleading when based on only a few forecasts)
relative performance compared to other forecasters on similar sets of questions
might be useful for setting the right incentives in forecasting tournaments. Prediction markets address the first point through logarithmic subsidising.
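To make those metrics concrete, here is a minimal sketch in Python of an unweighted Brier score, an importance-weighted variant, and a simple relative-skill measure. The weighting scheme, function names, and toy numbers are my own illustrative assumptions, not definitions from Greenberg 2018 or from any particular platform.

```python
import numpy as np

def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and binary outcomes."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    return np.mean((probs - outcomes) ** 2)

def weighted_brier(probs, outcomes, importance):
    """Brier score with each question weighted by an importance / interest score."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    w = np.asarray(importance, dtype=float)
    return np.sum(w * (probs - outcomes) ** 2) / np.sum(w)

def relative_skill(forecaster_probs, comparison_probs, outcomes):
    """Average Brier improvement over a comparison forecast (e.g. the crowd)
    on the same questions; positive means better than the comparison."""
    f = np.asarray(forecaster_probs, dtype=float)
    c = np.asarray(comparison_probs, dtype=float)
    o = np.asarray(outcomes, dtype=float)
    return np.mean((c - o) ** 2 - (f - o) ** 2)

# Toy example: three binary questions
probs      = [0.8, 0.3, 0.9]   # forecaster's probabilities
crowd      = [0.6, 0.5, 0.7]   # crowd forecast on the same questions
outcomes   = [1,   0,   1  ]   # resolutions
importance = [3.0, 1.0, 2.0]   # e.g. an interest-style score per question

print(brier_score(probs, outcomes))              # unweighted Brier
print(weighted_brier(probs, outcomes, importance))
print(relative_skill(probs, crowd, outcomes))    # skill relative to the crowd
```

Something like relative_skill is one simple way to compare forecasters who answered overlapping question sets; a real tournament metric would also need to handle differing question sets and small sample sizes.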
How does the resolution and calibration of forecasts vary by the forecasts’ “range” (e.g., whether the forecast is for an event 6 months away vs 3 years away vs 20 years away)?
How does this vary between topics, types of forecasters, etc.?
Do people who are superforecasters for short- or medium-range questions (e.g., those that resolve within 3 years) still do better than average for long-range forecasts?
Are there approaches to forecasting that are especially useful for long-range forecasts? (Approaches that could be considered include those surveyed by Beard et al.)
All the same questions, but now for “extreme” or “extremely rare” vs more “mundane” or “common” events (e.g., forecasts of global catastrophes vs forecasts of more minor disruptions).
All the same questions, but now for forecasts that are long-range and about extreme/extremely rare outcomes.
Background for this cluster of questions: How Feasible Is Long-range Forecasting?
One rationale for this cluster of questions would be to improve our ability to forecast global catastrophic and existential risks, and/or inform how we interpret those forecasts and how much weight we give them.
This research could use very similar methodologies to those used by Tetlock, just with different forecasts. However, a very important practical limitation is that the research may inherently take many years or decades.
(This post says “Fortunately, we might soon be in a position to learn more about long-range forecasting from the EPJ [Expert Political Judgement] data, since most EPJ forecasts (including most 25-year forecasts) will have resolved by 2022”. This might reduce the marginal value of further work on this, but I imagine the marginal value could remain quite high.)
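As a rough illustration of what "calibration" and "resolution" mean in these questions: calibration (reliability) asks whether events forecast at p% happen about p% of the time, while resolution asks how much forecasts distinguish events that happen from events that don't. Below is a minimal sketch of the standard Murphy decomposition of the Brier score, run on simulated short-range and long-range forecasts; the simulated data and all numbers are purely illustrative assumptions, not results from any of the sources above.

```python
import numpy as np

def murphy_decomposition(probs, outcomes, n_bins=10):
    """Decompose the Brier score into reliability (calibration), resolution,
    and uncertainty by binning forecasts. Brier ~= reliability - resolution
    + uncertainty (approximate when forecasts within a bin differ)."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    n = len(probs)
    base_rate = outcomes.mean()

    # Assign each forecast to a probability bin (0.0-0.1, 0.1-0.2, ...)
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)

    reliability = resolution = 0.0
    for k in range(n_bins):
        mask = bins == k
        n_k = mask.sum()
        if n_k == 0:
            continue
        mean_forecast = probs[mask].mean()
        observed_freq = outcomes[mask].mean()
        reliability += n_k * (mean_forecast - observed_freq) ** 2
        resolution += n_k * (observed_freq - base_rate) ** 2
    reliability /= n
    resolution /= n
    uncertainty = base_rate * (1 - base_rate)

    return {"reliability": reliability,
            "resolution": resolution,
            "uncertainty": uncertainty,
            "brier_from_decomposition": reliability - resolution + uncertainty}

# Simulated data: well-calibrated, informative "short-range" forecasts vs
# "long-range" forecasts that carry no signal about the outcomes.
rng = np.random.default_rng(0)
short_probs = rng.uniform(0, 1, 500)
short_outcomes = rng.random(500) < short_probs   # outcomes track the forecasts
long_probs = rng.uniform(0, 1, 500)
long_outcomes = rng.random(500) < 0.5            # outcomes ignore the forecasts

print(murphy_decomposition(short_probs, short_outcomes))
print(murphy_decomposition(long_probs, long_outcomes))
```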
I have an (unfinished) essay on the topic using Metaculus and PredictionBook data. The relation between range and accuracy is negative, even within forecasts on individual questions. Specifically, a linear regression of Brier score on range in days gives 0.0019x + 0.0105. Of course, I'll look into better statistical analyses if I find time.
Thanks for sharing this; I'll try to look over it soon.
Just beware that I got feedback from two different people that it's difficult to understand.
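(For concreteness, here is a minimal sketch of the kind of regression described in the comment above, i.e. Brier score regressed on forecast range in days. The data are simulated stand-ins and the coefficients are not the real ones; the actual analysis would use resolved Metaculus and PredictionBook forecasts.)

```python
import numpy as np

# Simulated stand-in data: each entry is one resolved forecast.
rng = np.random.default_rng(1)
range_days = rng.uniform(1, 3 * 365, 1_000)                      # forecast horizon in days
brier = 0.10 + 0.0001 * range_days + rng.normal(0, 0.05, 1_000)  # made-up relationship

# Ordinary least squares: Brier ~ slope * range_days + intercept.
slope, intercept = np.polyfit(range_days, brier, 1)
print(f"Brier ≈ {slope:.4f} * range_days + {intercept:.4f}")

# A more careful analysis would, e.g., add question fixed effects to estimate the
# relation *within* individual questions, as the comment above describes.
```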