Impactful Forecasting Prize for forecast writeups on curated Metaculus questions

elifland Feb 4, 2022, 8:06 PM

91 points

cross-posted to LessWrong

TLDR

We’re giving out $4,000 to the best forecast writeups submitted via this form on these Metaculus questions by March 11, to encourage more people to forecast on impactful questions and write up their reasoning.

Motivation

We believe that forecasting on impactful questions is a great candidate for an activity more EA-interested people should partake in, given that it:

Provides direct value when done on decision-relevant questions.
Improves and demonstrates good judgment and research skills.
Can be fun: provides a concrete and gamified framing for activities that look similar to research.
Leads to generally learning more about the world.
Helps match up bright, like-minded collaborators.

This is informed by personal experience: we have learned a lot and found great collaborators through forecasting.

However, it’s difficult to start doing impactful forecasting right now. Metaculus has lots of questions, most of which aren’t selected for impact. It can be overwhelming to navigate and find questions that are both impactful and interesting. It may also be difficult to know where to start in an analysis. Additionally, it can be scary to share your thoughts publicly without a push, and even when your reasoning is good, the incentives usually work against sharing it.^[1]

How to participate

We curated 25 Metaculus questions for prediction in this Airtable. Either Eli Lifland or Misha Yagudin has forecasted on each of these to provide a starting point for further analysis. The table has cause area tags to allow filtering for interest and expertise.

To participate, do the following by March 11, 2022, 11:59 PM Anywhere on Earth:

Make a forecast on one of the selected questions.
Write up the reasoning for your forecast, on Metaculus or elsewhere (e.g. a forum/blog post). We require you to share your reasoning publicly, unless there are infohazards, in which case we will consider private submissions.
Fill out this form with a link to your writeup and contact info; anonymous writeups and email addresses are okay.

You may submit up to one writeup per question; if you submit multiple, your final one will be used. You may submit entries for as many questions as you please.

We will host meetups in this gather.town space to chat about the contest and facilitate forecasting on the questions together. The meetups will be on Wed Feb 16 and Wed Mar 2, 6:30–8:30 PM UTC. See this ICS file for calendar events.

Prize details

We will distribute a total of $4,000 to the best forecast writeups of questions in the curated Airtable. Eli and Misha will judge which writeups are most valuable. Sam Glover may help with preliminary evaluations. The first prize will be a maximum of $2,000 and we will distribute prizes for a maximum of 15 writeups.

We will rate each analysis individually, and there is no limit on the number of prizes a participant can win. Collaborative writeups are allowed and encouraged: writeups that are a collaboration between multiple forecasters should select one person as the contact, and the collaborators may distribute the prize among themselves as they please.

The primary criterion we will use for judging entries is how much each one changes our minds compared to our initial forecasts. In that sense, this can be viewed as a large forecasting amplification experiment.

The following caveats apply:

We may adjust based on a subjective notion of how strongly held our previous views were, such that changing a strongly held view is rewarded more than changing a weakly held view.
Writeups that are less counterfactually valuable, e.g. widespread news updates that would be shared anyway, will be less likely to receive prizes.
We may reward clarity and conciseness, and reserve the right to skim especially long writeups.
We will attempt to avoid double counting similar insights across writeups on multiple questions, e.g. across the QuALITY questions resolving in 2025 and 2040.
If multiple entrants share very similar arguments, we will give more credit to the one shared first.
We reserve the right to diverge from the main criterion in other ways that we think will lead to the fairest distribution of prizes.

You may also submit writeups on these questions written before this post for retroactive prizes, though we do not expect to award many of these, as we chose the questions in part for the neglectedness of contributions. We will review entries and announce the winners by March 31.

Who should participate

If you’re reading this and feel interested to any extent, we’d encourage you to browse the Airtable and try forecasting on at least 1 question.

That being said, to give examples of our target audience:

College students with plenty of free time who are interested in decision making.
Professionals with experience related to one of the highlighted questions.
Aspiring generalist researchers at any stage in their career.

Question selection

We selected 50 candidate questions by browsing Metaculus, then subjectively rated each question from 1 to 5 on 4 dimensions:

Decision importance: The importance of the decisions which will be affected by this question. This should combine cause area importance and importance within the cause area.
Decision relevance: How much of an impact would this have on actual decisions if the forecast changed by a substantial amount? This factor is re-used from Nuño Sempere’s An estimate of the value of Metaculus questions.
Ease of contribution: How easy will it be for a “median generalist forecaster” to make a contribution to the analysis on this question within a few hours? e.g. questions requiring lots of domain expertise or background reading would score low here.
Neglectedness of contributions: How few contributions have there been on this specific question so far? How in need of attention is it? This should be subjectively evaluated using the existing count of forecasts and quantity + quality of comments/writeups.^[2]

A curation score was calculated, weighing decision importance at twice the weight of each of the other three factors because it felt like the most important one.^[3] We chose a set of 25 questions based mainly on the curation score, while also aiming for a diversity of cause areas and question types.
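As a rough illustration of the weighting described above, here is a minimal sketch that assumes the score is a plain weighted sum of the four 1-5 ratings. The post does not say whether a sum or an average was used, and the names `Ratings` and `curation_score` are hypothetical, not taken from the actual selection spreadsheet.

```python
from dataclasses import dataclass


@dataclass
class Ratings:
    """Subjective 1-5 ratings for one candidate question (hypothetical structure)."""
    decision_importance: int
    decision_relevance: int
    ease_of_contribution: int
    neglectedness: int

    def __post_init__(self) -> None:
        # Each dimension is rated on a 1-5 scale, per the post.
        for name, value in vars(self).items():
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")


def curation_score(r: Ratings, importance_weight: float = 2.0) -> float:
    """Weighted sum with decision importance counted twice (assumed form)."""
    return (
        importance_weight * r.decision_importance
        + r.decision_relevance
        + r.ease_of_contribution
        + r.neglectedness
    )


# Two made-up candidate questions, for illustration only.
high_importance = Ratings(5, 3, 3, 3)
well_rounded = Ratings(3, 4, 4, 4)

print(curation_score(high_importance))  # 2*5 + 3 + 3 + 3 = 19.0
print(curation_score(well_rounded))     # 2*3 + 4 + 4 + 4 = 18.0
```

With equal weights the second question would rank higher (15 vs. 14); doubling decision importance reverses the order, which is the effect the weighting is meant to have.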

We also included 4 questions we wrote ourselves: these two on QuALITY performance in 3 and 18 years for informing AI strategy and these two on climate change.

We may add a few impactful questions to the curated Airtable in the next week or so. If so, we will write a comment announcing their addition.

Acknowledgments

Thanks to Nuño Sempere for reviewing an earlier version of the curated questions and selection criteria and to Jehan Azad for feedback on this post. All mistakes are our own.

1. ^ See also Bottlenecks to more impactful crowd forecasting.
2. ^ A more thorough evaluation could also investigate neglectedness of similar questions and the topic area generally, rather than just the Metaculus question considered.
3. ^ A more thorough selection framework might be better mathematically motivated, like the ITN framework.

Bounty (closed)Metaculus Forecasting

Will Aldred Mar 30, 2022, 1:47 AM
16 points
“To give examples of our target audience: [...] 3. Aspiring generalist researchers at any stage in their career.”

I agree that writing up forecasting reasoning is one way for aspiring generalist researchers to build generalist-type research skill, but also want to highlight some other options:
- Summarize/collect previous posts/articles/papers (I think this is probably the best skill-building activity for an aspiring generalist researcher)
- Read, then write book reviews (see posts tagged under ‘books,’ and also suggestions from Michael Aird and from Buck Shlegeris; also related is Holden Karnofsky’s ‘Reading books vs. engaging with them’)
- Build inside views (see Holden Karnofsky’s ‘Learning by writing’ and Neel Nanda’s ‘How I formed my own views about AI safety’)
- From Linch Zhang’s shortform: “Deep dive into seminal papers/blog posts and attempt to identify all the empirical and conceptual errors in past work, especially writings by either a) other respected EAs or b) other stuff that we otherwise think of as especially important.”
- Apply for jobs/internships/research training programs (and view the process of writing written responses in your applications as skill-building)
- Possibly other things suggested in Aird’s ‘Notes on EA-related research, writing, testing fit, learning, and the Forum’
- elifland Apr 1, 2022, 12:25 PM
  4 points
  Hey, thanks for sharing these other options. I agree that one of these choices makes more sense than forecasting in many cases, and likely (90%) the majority. But I still think forecasting is a solid contender and plausibly (25%) the best in the plurality of cases. Some reasons:
  1. Which activity is best likely depends a lot on which is easiest to actually start doing, because I think the primary barrier to doing most of these usefully is “just” actually getting started and completing something. Forecasting may (40%)^[1] be the most fun and least intimidating of these for many (33%+) prospective researchers because of the framing of competing on a leaderboard and the intrigue of trying to predict the future.
  2. I think the EA community has relatively good epistemics, but there is still room for improvement, and more researchers getting a forecasting background is one way to help with this (due to both epistemic training and identifying prospective researchers with good epistemics).
  3. Depending on the question, forecasting can look a lot like a bite-sized chunk of research, so I don’t think it’s mutually exclusive with some of the activities you listed, and it is especially similar to summarizations/collections: for example, Ryan summarized relevant parts of papers and then formed some semblance of an inside view in his winning entry.
  Also, I was speaking from personal experience here; e.g. Misha and I both have forecasted for a few years and enjoyed it while building skills and a track record, and are now doing ~generalist research or had the opportunity to and seriously considered it, respectively.
  1. ^
    I think this will become especially true as the UX of forecasting platforms improves; let’s say 55% that this is true 3 years from now, as I expect the UX here to improve more than the “UX” of other options like summarizing papers.
elifland Feb 4, 2022, 8:21 PM
6 points
Some meta forecasting questions:
Will the Impactful Forecasting Prize get entries from at least 25 unique people/groups?
Will the Impactful Forecasting Prize get at least 100 total entries?
And the same on Manifold markets: here and here
- elifland Feb 5, 2022, 10:41 PM
  1 point
  These results are pretty interesting! I’m surprised at how much optimism there is about 25 unique people/groups compared to 100 total entries; my intuition against expecting an average of about 4 entries per person/group was that most would only submit 1-2, but it only takes a few people submitting on many questions to drive the average up substantially.
Peter Wildeford Feb 5, 2022, 9:17 PM
5 points
This is an awesome initiative!

It would be cool to see Metaculus create a category for EA-relevant questions or otherwise make it easier to find EA-relevant questions. It would be even cooler if this were somewhat prominently tagged / highlighted.

It would also be cool to use Metaculus’s infrastructure to create a tournament around some of these questions.
- Pablo Feb 7, 2022, 3:48 PM
  4 points
  There is already an ‘effective altruism’ category. I agree it would be nice to make questions in this category more prominent. It also looks like the category is not being applied to all EA-relevant questions (e.g. all questions in the Ragnarök series). I can pay someone familiar with EA to go through the entire list of questions and apply the EA category when appropriate. Do you think this would be worth it?
  - Peter Wildeford Feb 7, 2022, 6:00 PM
    4 points
    I think so—happy to help with funding if needed
    - Pablo 23 Feb 2022 21:08 UTC
      2 points
      Done. There were a lot of EA-relevant questions. It took him over 20 hours to categorize them all.
      - Peter Wildeford 23 Feb 2022 22:29 UTC
        2 points
        Wow, thank you!
    - Pablo 7 Feb 2022 18:02 UTC
      2 points
      Okay, will do.
elifland 11 Feb 2022 23:39 UTC
4 points
Thanks to Juan Cambeiro, the questions are now also viewable as a Metaculus series.
- Misha_Yagudin 17 Feb 2022 18:10 UTC
  4 points
  Thanks to Nuño Sempere, the questions are now also viewable as a Metaforecast dashboard.
Misha_Yagudin 9 Feb 2022 18:35 UTC
3 points
We are happy to announce that the EA Infrastructure Fund approved our funding request for this prize. (A few days ago, actually.)