Literature review of Transformative Artificial Intelligence timelines

We summarize and compare several models and forecasts predicting when transformative AI will be developed.

Highlights

  • The review covers quantitative models, both outside-view and inside-view, as well as judgment-based forecasts by experts and teams of experts.

  • While we do not necessarily endorse their conclusions, the inside-view model the Epoch team found most compelling was Ajeya Cotra’s “Forecasting TAI with biological anchors”, the best-rated outside-view model was Tom Davidson’s “Semi-informative priors over AI timelines”, and the best-rated judgment-based forecast was Samotsvety’s AGI Timelines Forecast.

  • The inside-view models we reviewed predicted shorter timelines (e.g. the biological anchors model has a median of 2052), while the outside-view models predicted longer timelines (e.g. the semi-informative priors model has a median later than 2100). The judgment-based forecasts are skewed towards agreement with the inside-view models, and are often more aggressive (e.g. Samotsvety assigned a median of 2043).

Introduction

Over the last few years, we have seen many attempts to quantitatively forecast the arrival of transformative and/or general Artificial Intelligence (TAI/AGI) using very different methodologies and assumptions. Keeping track of and assessing these models’ relative strengths can be daunting for a reader unfamiliar with the field. As such, the purpose of this review is to:

  1. Provide a relatively comprehensive source of influential timeline estimates, as well as brief overviews of the methodologies of various models, so readers can make an informed decision about which seem most compelling to them.

  2. Provide a concise summary of each model’s or forecast’s distribution over arrival dates.

  3. Provide an aggregation of internal Epoch subjective weights over these models and forecasts. These weightings do not necessarily reflect team members’ “all-things-considered” timelines; rather, they are meant to give a sense of our views on the relative trustworthiness of the models.

For aggregating internal weights, we split the timelines into “model-based” and “judgment-based” timelines. Model-based timelines are given by the output of an explicit model. In contrast, judgment-based timelines are either aggregates of group predictions (on prediction markets, for example) or the timelines of some notable individuals. We split them this way because the two categories roughly correspond to “prior-forming” and “posterior-forming” predictions respectively.

In both cases, we elicit subjective probabilities from each Epoch team member. For model-based and judgment-based timelines respectively, these reflect:

  1. how likely they believe a model’s assumptions and methodology to be essentially accurate, and

  2. how likely it is that a given forecaster or aggregate of forecasters is well-calibrated on this problem.

Weights are normalized and linearly aggregated across the team to arrive at a summary probability. These numbers should not be interpreted too literally as exact credences, but rather as a rough approximation of how the team views the “relative trustworthiness” of each model or forecast.
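As a rough illustration of what "normalized and linearly aggregated" could mean in practice, here is a minimal Python sketch. It assumes each respondent's weights are rescaled to sum to 1 and then averaged across respondents with equal weight; the review's appendix may use a different procedure. The function names, the model names (`model_a`, `model_b`) and all numbers are hypothetical and are not Epoch's actual responses. The last step, which uses the pooled weights to mix the models' probabilities for a given year, is likewise an assumption about how such weights might be applied.

```python
def normalize(weights):
    """Rescale one respondent's raw weights so they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {name: w / total for name, w in weights.items()}


def aggregate_weights(responses):
    """Linear pool: average the normalized weights across respondents."""
    normalized = [normalize(r) for r in responses]
    names = normalized[0].keys()
    return {n: sum(r[n] for r in normalized) / len(normalized) for n in names}


def mixture_probability(weights, cdf_values):
    """Weighted average of each model's P(TAI by a given year)."""
    return sum(weights[n] * cdf_values[n] for n in weights)


# Made-up numbers for illustration only (two respondents, two models).
responses = [
    {"model_a": 0.6, "model_b": 0.2},
    {"model_a": 0.2, "model_b": 0.2},
]
team_weights = aggregate_weights(responses)

# Hypothetical per-model probabilities of TAI by some fixed year.
p_by_year = mixture_probability(team_weights, {"model_a": 0.5, "model_b": 0.1})
print(team_weights, p_by_year)
```

Normalizing before averaging keeps a respondent who reports larger raw numbers from dominating the pool, which is one reason to normalize per respondent rather than after summing.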

Caveats

  • Not every model or report operationalizes AGI/TAI in the same way, and so aggregated timelines should be taken with an extra pinch of salt, given that they forecast slightly different things.

  • Not every model and forecast included below yields explicit predictions for the snapshots we summarize below (CDF values by year, and quantiles). In these cases, we have done our best to interpolate based on the explicit data points given.

  • We have included models and forecasts that were explained in more detail and lent themselves easily to a probabilistic summary. This means we do not cover less thoroughly explained forecasts, such as Daniel Kokotajlo’s, or influential pieces of work without explicit forecasts, such as David Roodman’s “Modelling the Human Trajectory”.

Results

(Italicized values are interpolated from a gamma distribution fitted to known values.)

1: See this appendix for the individual weightings from respondents and the rationale behind their aggregation.

Visualization of the different forecasts, and their aggregates.
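
The note on interpolated values mentions a gamma distribution fitted to known values. The fitting procedure is not described here, so the following is only one plausible sketch: it assumes the gamma distribution is over years elapsed since a base year, and that it is pinned down by exactly two known (year, cumulative probability) points. The base year, the two data points and all function names are hypothetical. The gamma CDF is computed with the standard power series for the regularized incomplete gamma function, so only the standard library is needed.

```python
import math


def gamma_cdf(x, shape, scale=1.0):
    """Gamma CDF via the power series for the regularized incomplete gamma."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total) and n < 10_000:
        term *= z / (shape + n)
        total += term
        n += 1
    log_prefactor = shape * math.log(z) - z - math.lgamma(shape)
    return min(1.0, total * math.exp(log_prefactor))


def gamma_quantile(p, shape):
    """Quantile of a unit-scale gamma, found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while gamma_cdf(hi, shape) < p:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if gamma_cdf(mid, shape) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def fit_gamma(t1, p1, t2, p2):
    """Find (shape, scale) so that CDF(t1) = p1 and CDF(t2) = p2.

    Requires 0 < t1 < t2 and 0 < p1 < p2 < 1. The quantile ratio depends
    only on the shape and shrinks as the shape grows, so we bisect on it.
    The search is limited to shapes between 0.1 and 100 (an assumption).
    """
    target = t2 / t1
    lo, hi = 0.1, 100.0
    for _ in range(60):
        mid = math.sqrt(lo * hi)
        ratio = gamma_quantile(p2, mid) / gamma_quantile(p1, mid)
        if ratio > target:
            lo = mid  # distribution too spread out: increase the shape
        else:
            hi = mid
    shape = math.sqrt(lo * hi)
    scale = t1 / gamma_quantile(p1, shape)
    return shape, scale


# Hypothetical inputs: 25% by 2042 and 60% by 2062, counted from 2022.
base_year = 2022
fit_shape, fit_scale = fit_gamma(2042 - base_year, 0.25, 2062 - base_year, 0.60)

# Interpolated cumulative probability for a year with no stated value.
p_2050 = gamma_cdf(2050 - base_year, fit_shape, fit_scale)
print(fit_shape, fit_scale, p_2050)
```

With more than two known points the fit would be overdetermined, and a least-squares fit over the CDF values would be a natural alternative to the exact two-point match used here.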

Read the rest of the review here

Crossposted from LessWrong (35 points, 7 comments)