Red-teaming Holden Karnofsky’s AI timelines

Summary

This is a red teaming exercise on Holden Karnofsky’s AI Timelines: Where the Arguments, and the “Experts,” Stand (henceforth designated “HK’s AI Timelines”), completed in the context of the Red Team Challenge by Training for Good[1].

Our key conclusions are:

  • The predictions for the probability of transformative AI[2] (TAI) presented in “HK’s AI Timelines” seem reasonable in light of the forecasts presented in the linked technical reports, according to our own reading and numerical analysis (see 1. Prediction).

  • All but two of the technical reports of Open Philanthropy which informed Holden’s predictions were reviewed by what we judge to be credible experts (see 2. Reviewers of the technical reports).

  • The act of forecasting and discussing AI progress publicly may be an information hazard, due to the risk of shortening (real and predicted) timelines. However, the forecasts of “HK’s AI Timelines” seem more likely to lengthen than to shorten AI timelines, since they predict longer ones than the Metaculus community prediction for AGI (see 3. Information hazards).

Our key recommendations are:

  • Defining “transformative AI” more clearly, such that the predictions of “HK’s AI Timelines” could be better interpreted and verified against evidence (see Interpretation). This could involve listing several criteria (as in this Metaculus question), or a single condition (as in Bio anchors[3]).

  • Including an explanation about the inference of the predictions from the sources mentioned in the “one-table summary” (see Inference). This could include explicitly stating the weight given to each of them (quantitatively or qualitatively), and describing the influence of other sources.

  • Investigating whether research on AI timelines carried out in China might have been overlooked due to language barriers (see Representativeness).

We welcome comments on our key conclusions and recommendations, as well as on reasoning transparency, strength of arguments, and red-teaming efforts.

Author contributions

The contributions by author are as follows:

  • Simon: argument mapping, complementary background research, discussion, and structural and semantic editing of the text.

  • Vasco: argument mapping, background research, technical analysis, and structuring and writing of all sections.

Acknowledgements

Thanks to:

  • For discussions which informed this text, Kasey Shibayama and Saksham Singhi.

  • For organising the Red Team Challenge, Training for Good.

  • For comments, Aaron Bergman, Cillian Crosson, Elika Somani, Hasan Asim, Jac Liew, and Robert Praas.

Introduction

We have analysed Holden Karnofsky’s blog post AI Timelines: Where the Arguments, and the “Experts,” Stand with the goal of constructively criticising Holden’s claims and the way they were communicated. In particular, we investigated his headline prediction, the reviewers of the underlying technical reports, and potential information hazards (see sections 1 to 3 below).

The key motivations for red-teaming this particular article are:

  • AI timelines[4] being relevant to understanding the extent to which positively shaping the development of artificial intelligence is one of the world’s most pressing problems (or the single most pressing one).

  • “HK’s AI Timelines” and Holden Karnofsky arguably being influential in informing views about AI timelines and prioritisation amongst longtermist causes.

  • Holden Karnofsky arguably being influential in informing views on other important matters related to improving the world, thus making it appealing to contribute to any improvements of his ideas and writing.

1. Prediction

Holden Karnofsky estimates that:

There is more than a 10% chance we’ll see transformative AI within 15 years (by 2036); a ~50% chance we’ll see it within 40 years (by 2060); and a ~2/3 chance we’ll see it this century (by 2100).

Karnofsky bases his forecast on a number of technical reports, and we analysed it by answering the following:

  • Interpretation: are the technical reports being accurately interpreted?

  • Inference: is the forecast consistent with the interpretations of the technical reports?

  • Representativeness: are the technical reports representative of the best available evidence?

The following sections deal with each of these questions. However, for the interpretation and inference, only 3 of the 9 in-depth pieces presented in the “one-table summary” of “HK’s AI Timelines” are studied:

  • “AI experts”, a survey of AI researchers.

  • “Bio anchors”, Ajeya Cotra’s report on biological anchors.

  • “SIP”, the report on semi-informative priors.

These seem to be the only in-depth pieces that provide quantitative forecasts for the year by which TAI will be seen, which facilitates comparisons. Nevertheless, they do not cover all the evidence on which Holden Karnofsky’s forecasts were based.

Interpretation

Are the technical reports being accurately interpreted?

We interpreted the numerical predictions made by the technical reports to be essentially in agreement with those made in “HK’s AI Timelines”.

Our interpretation of the forecasts for the probability of TAI given in the aforementioned reports (see the tab “AI Timelines predictions” of this Sheets), together with the one presented in the “one-table summary” of “HK’s AI Timelines”, is provided in the table below.

| Report | Interpretation in “HK’s AI Timelines” | Our interpretation |
| --- | --- | --- |
| AI experts[5] | ~20% by 2036. ~50% by 2060. ~70% by 2100. | 25% by 2036. 49% by 2060. 69% by 2100. |
| Bio anchors[6] | >10% probability by 2036. ~50% chance by 2055. ~80% chance by 2100. | 18% by 2036. 50% by 2050. 80% by 2100. |
| SIP | 8% by 2036. 13% by 2060. 20% by 2100. | 8% by 2036. 18% by 2100. |

For all the forecasts, our interpretation is in agreement with that of “HK’s AI Timelines” (when rounded to one significant digit).

However, it is worth noting the extent to which the “most aggressive” and “most conservative” estimates of “Bio anchors” differ from the respective “best guesses”[7] (see Part 4 of the report). This is illustrated in the table below (footnotes 8-11 are all citations from Ajeya Cotra).

| Probability of TAI by the year… | Conservative estimate | Best guess | Aggressive estimate |
| --- | --- | --- | --- |
| 2036[8] | 2% | 18% | 45% |
| 2100[9] | 60% | 80% | 90% |
| Median forecast | 2090[10] | 2050 | 2040[11] |

Indeed, the uncertainty of “Bio anchors” is acknowledged by Holden Karnofsky here.

There is also the question of whether the differing definitions of transformative AI across the technical reports are comparable, and whether Holden’s interpretation of these justifies his overall estimate. We mostly agree with Holden’s claim in one of the footnotes of “HK’s AI Timelines” that:

In general, all of these [the reports’ predicted] probabilities refer to something at least as capable as PASTA, so they directionally should be underestimates of the probability of PASTA (though I don’t think this is a major issue).[12]

Regarding the first part of the above quote, the aforementioned probabilities refer:

  • In “AI experts”, to “high-level machine intelligence” (HLMI), which “is achieved when unaided machines can accomplish every task better and more cheaply than human workers”.

    • Since HLMI would perform “all of the human activities needed to speed up scientific and technological advancement” unaided, we would have PASTA.

  • In “Bio anchors”, to “transformative AI”, which “must bring the [global] growth rate to 20%-30% per year if used everywhere it would be profitable to use”[13].

    • If sustained, this growth seems fast enough “to bring us into a new, qualitatively different future”, thus being in agreement with the definition provided in “HK’s AI Timelines” for “transformative AI” (for a sense of scale, see the doubling-time sketch after this list).

    • However, PASTA could conceivably be more capable than AGI as defined in the operationalisation of this Metaculus question. If this is the case, Holden’s forecasts might not be “underestimates of the probability of PASTA”.

  • In “SIP”, to “artificial general intelligence” (AGI), i.e. “computer program(s) that can perform virtually any cognitive task as well as any human, for no more money than it would cost for a human to do it”.

    • This definition is similar to that of HLMI, although the emphasis on physical machines is less clear. However, it still seems to encompass “all of the human activities needed to speed up scientific and technological advancement”, hence we think this would also bring PASTA.
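As a rough sense check of why sustained 20%-30% annual growth would be qualitatively transformative, the back-of-the-envelope sketch below (our own illustration, not taken from “Bio anchors” or any of the other reports) compares doubling times at those rates with a ~3% rate closer to recent global growth.

```python
# Back-of-the-envelope doubling times (our illustration, not from the reports).
import math

def doubling_time(annual_growth_rate):
    """Years for output to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth_rate)

for rate in (0.03, 0.20, 0.30):  # ~recent global growth vs the 20%-30% TAI threshold
    print(f"{rate:.0%} per year -> output doubles every {doubling_time(rate):.1f} years")
# Output: roughly 23.4, 3.8, and 2.6 years respectively.
```

At 30% per year, global output would grow more than tenfold within a decade, which seems consistent with the “new, qualitatively different future” framing.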

A more concrete definition of TAI in “HK’s AI Timelines” would have been useful to understand the extent to which its predictions are comparable with those of other sources.

Moreover, in the second half of the quotation above, Holden claims that he does not think it a “major issue” that the predicted probabilities should be “underestimates of the probability of PASTA”. We think a justification for this would be valuable, especially if the timelines for PASTA are materially shorter than those for TAI as defined in the technical reports (which could potentially be a major issue).

Inference

Is the forecast consistent with the interpretations of the technical reports?

We found Holden Karnofsky’s estimate to be consistent with our interpretation of the technical reports, even when accounting for the uncertainty of the forecasts of the individual reports.

Methodology

The inference depends not only on the point estimates of the technical reports (see Interpretation), but also on their uncertainty. With this in mind, probability distributions representing the year by which TAI will be seen were fitted to the forecasts corresponding to our interpretation of the technical reports[14] (rows 2-4 and 7-9 in the table below). Moreover, “mean” and “aggregated” distributions which take all three reports into account were calculated as follows (a rough code reconstruction is given at the end of this subsection):

  • Aggregated lognormal (5) (according to this):

    • Mean[15]: mean of the means of the fitted distributions weighted by the reciprocal of their variances.

    • Standard deviation: square root of the reciprocal of the sum of the reciprocals of the variances of the fitted distributions.

  • Aggregated loguniform (10):

    • Minimum: maximum of the minima of the fitted distributions.

    • Maximum: minimum of the maxima of the fitted distributions.

  • Mean lognormal/loguniform (6 and 11):

    • Cumulative distribution function (CDF) equal to the mean of the CDFs of the fitted lognormal/loguniform distributions, weighted by the reciprocal of the variance of the fitted lognormal distributions.

The data points relative to the forecasts for 2036 and 2100 were used to estimate the parameters of such distributions. Estimates for the probability of TAI by these years are provided in the three reports and “HK’s AI Timelines”, which enables consistent comparisons. The parameters of the derived distributions are presented in the tab “Derived distributions parameters”.
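To make the procedure above concrete, here is a minimal Python sketch of one way to carry out the lognormal fitting and inverse-variance aggregation. It is our own reconstruction rather than the actual calculation in the linked Sheets: we assume the lognormal is taken over the calendar year, and only the 2036 and 2100 data points of our interpretation are used; the loguniform case (not shown) would use the fitted minima and maxima instead.

```python
# Minimal sketch (not the actual Sheets calculation) of fitting lognormal
# distributions to the 2036 and 2100 data points and aggregating them by
# inverse-variance weighting, as described in the list above.
import numpy as np
from scipy import stats

def fit_lognormal(p_2036, p_2100):
    """Return (mu, sigma) of ln(calendar year) matching the two CDF data points."""
    z_2036, z_2100 = stats.norm.ppf(p_2036), stats.norm.ppf(p_2100)
    sigma = (np.log(2100) - np.log(2036)) / (z_2100 - z_2036)
    mu = np.log(2036) - z_2036 * sigma
    return mu, sigma

def prob_tai_by(year, mu, sigma):
    """Probability of TAI by `year` under a lognormal with parameters (mu, sigma)."""
    return stats.norm.cdf((np.log(year) - mu) / sigma)

# Our interpretation of the three reports: P(TAI by 2036), P(TAI by 2100).
reports = {"AI experts": (0.25, 0.69), "Bio anchors": (0.18, 0.80), "SIP": (0.08, 0.18)}
fits = {name: fit_lognormal(*points) for name, points in reports.items()}

# Aggregated lognormal: weight each fitted distribution by the reciprocal of its variance.
precisions = {name: 1 / sigma**2 for name, (mu, sigma) in fits.items()}
total_precision = sum(precisions.values())
mu_agg = sum(precisions[name] * fits[name][0] for name in fits) / total_precision
sigma_agg = (1 / total_precision) ** 0.5

for name in reports:
    print(f"Relative weight of {name}: {precisions[name] / total_precision:.0%}")
for year in (2036, 2060, 2100):
    print(f"Aggregated lognormal: P(TAI by {year}) ≈ {prob_tai_by(year, mu_agg, sigma_agg):.0%}")
```

Under these assumptions, the sketch approximately reproduces the relative weights of roughly 30%, 65% and 5% and the aggregated-lognormal probabilities in row 5 of the table below.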

Results and discussion

The forecasts for the probability of TAI by 2036, 2060 and 2100 are presented in the table below. In addition, values for all the years from 2025 to 2100 are given in the tab “Derived distributions CDFs” for all the derived distributions.

| Distribution | Probability of TAI (%) by 2036 | by 2060 | by 2100 |
| --- | --- | --- | --- |
| 1. “HK’s AI Timelines” | > 10 | 50 | 67 |
| 2. AI experts lognormal | 25 | 41 | 69 |
| 3. Bio anchors lognormal | 18 | 40 | 80 |
| 4. SIP lognormal | 8 | 11 | 18 |
| 5. Aggregated lognormal | 8 | 27 | 77 |
| 6. Mean lognormal | 19 | 39 | 74 |
| 7. AI experts loguniform | 25 | 42 | 69 |
| 8. Bio anchors loguniform | 18 | 41 | 80 |
| 9. SIP loguniform | 8 | 12 | 18 |
| 10. Aggregated loguniform | 18 | 41 | 80 |
| 11. Mean loguniform | 19 | 40 | 74 |
| Range | 8 – 25 | 11 – 42 | 18 – 80 |

The forecasts of “HK’s AI Timelines” are aligned with those of the derived distributions[16]. These predict that the probability of TAI is:

  • By 2036, 8 % to 25 %, which is broadly consistent with the probability of more than 10 % predicted in “HK’s AI Timelines”.

  • By 2060, 11 % to 42 %, which is lower than the probability of 50 % predicted in “HK’s AI Timelines”. However, this seems reasonable for the following reasons:

    • The forecasts of “AI experts”, “Bio anchors” and “SIP” for 2060 were 49 %, > 50 %, and < 18 % (see Interpretation).

    • Giving more weight to “AI experts” and “Bio anchors” (whose timelines are longer) is consistent with the relative weights estimated from the reciprocal of the variance of the fitted lognormal distributions (see E3:E5 of tab “Derived distributions parameters”, and the formula after this list):

      • 30 % for “AI experts”.

      • 65 % for “Bio anchors”.

      • 5 % for “SIP”.

    • The forecasts of the derived distributions for 2060 are not as accurate as for years closer to either 2036 or 2100, whose data points were used to determine the parameters of the fitted distributions.

  • By 2100, 18 % to 80 %, which contains the probability of roughly 70 % predicted in “HK’s AI Timelines”. A forecast closer to the upper bound also seems reasonable:

    • The range is 67 % to 80 % excluding the forecasts which only rely on data from “SIP” (rows 4 and 9), which should arguably have a lower weight according to the above.
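For reference, the relative weights mentioned in the 2060 bullet above appear (on our reading of the Sheets; the symbols below are ours) to be the normalised reciprocals of the variances of the fitted lognormal distributions:

$$w_i = \frac{1/\sigma_i^2}{\sum_j 1/\sigma_j^2}$$

where $\sigma_i$ is the standard deviation of the fitted lognormal distribution for report $i$ (of the logarithm, following the convention of footnote 15).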

Nevertheless, we think “HK’s AI Timelines” would benefit from including an explanation of how Holden’s forecasts were derived from the sources of the “one-table summary”, for example by explicitly stating the weight given to each of those sources (quantitatively or qualitatively).

Representativeness

Are the technical reports representative of the best available evidence?

We think there may be further sources Holden Karnofsky could have considered to make his evidence base more representative, but it seemingly strikes a good balance between being representative and succinct.

Six of the nine pieces linked in the “one-table summary” of “HK’s AI Timelines” are Open Philanthropy analyses. This is noted in “HK’s AI Timelines”[17], and could reflect:

  • The value and overarching nature of Open Philanthropy’s analyses, which often cover multiple research fields.

  • The higher familiarity of Holden Karnofsky with such analyses.

These are valid reasons, but it would arguably be beneficial to consider or include other sources. For example:

We did not, however, look into whether the conclusions of these publications would significantly update Holden’s claims.

It would also be interesting to know whether:

  • Research on AI timelines carried out in China might have been overlooked due to language barriers.

  • Developments in AI research after the publication of “HK’s AI Timelines” (see e.g. 1st paragraph of this post) might have updated Holden’s AI timelines.

  • “AI experts” is representative of other surveys (e.g. Gruetzemacher 2019).

That being said:

  • The “one-table summary” of “HK’s AI Timelines” was supposed to summarise the angles on AI forecasting discussed in The “most important century” series, rather than being a collection of all the relevant sources.

  • The predictions made in “HK’s AI Timelines” arguably do not solely rely on the sources listed there.

All in all, we essentially agree with the interpretations of the technical reports, and think Holden Karnofsky’s predictions could justifiably be inferred from their results. In addition, the sources which informed the predictions appear representative of the best available evidence. Consequently, the forecasts for TAI of “HK’s AI Timelines” seem reasonable.

2. Reviewers of the technical reports

We have not analysed the reviews of the technical reports from Open Philanthropy referred to by Holden Karnofsky. However, their reviewers are seemingly credible. Brief descriptions are presented below:

For transparency, it seems worth mentioning the reasons for Past AI Forecasts not having been reviewed[18].

3. Information hazards

In the context of how to act in the absence of a robust expert consensus, “HK’s AI Timelines” argues that, based on what is known now, the “most important century” hypothesis should be taken seriously until and unless a “field of AI forecasting” develops. The following reasons are presented:

  • “We don’t have time to wait for a robust expert consensus”.

  • “Cunningham’s Law (“the best way to get a right answer is to post a wrong answer”) may be our best hope for finding the flaw in these arguments”.

  • “Skepticism this general seems like a bad idea”.

Even if the above points are true, AI forecasting could be an information hazard. As noted in Forecasting AI progress: a research agenda by Ross Gruetzemacher et al., “high probability forecasts of short timelines to human-level AI might reduce investment in safety as actors scramble to deploy it first to gain a decisive strategic advantage”[19] (see Superintelligence by Nick Bostrom).

That being said, the forecasts of “HK’s AI Timelines” seem more likely to lengthen than to shorten AI timelines[20]. On the one hand, it could be argued that they are shorter than those of most of the general public. On the other hand:

  • The median forecast for TAI of “HK’s AI Timelines”, 2060, is later than the median Metaculus community prediction for AGI (see tab “Metaculus predictions” of this Sheets), which was 2055 on the date on which “HK’s AI Timelines” was published (7 September 2021), and 2040 on 29 May 2022.

  • Holden Karnofsky’s timelines are longer than those of the other three “Public Figure Predictions” linked to Metaculus’ question about the Date of Artificial General Intelligence[21].

  1. ^

    We have not analysed in detail other posts from Cold Takes (Holden’s blog) related to AI forecasting. However, I (Vasco) read The Most Important Century in its entirety.

  2. ^

    “By “transformative AI”, I [Holden Karnofsky] mean “AI powerful enough to bring us into a new, qualitatively different future”. I specifically focus on what I’m calling PASTA: AI systems that can essentially automate all of the human activities needed to speed up scientific and technological advancement. I’ve argued that advanced AI could be sufficient to make this the most important century, via the potential for a productivity explosion as well as risks from misaligned AI”.

  3. ^

    For “Bio anchors”, see Part 1, section “Definitions for key abstractions used in the model”.

  4. ^

    AI timelines are an instance of AI forecasting, which includes predicting when human-level AI will emerge.

  5. ^

    Forecasts for “high level machine intelligence (all human tasks)” taken from analysing Fig. 1 (see tab “AI experts”).

  6. ^

    The TAI forecasts are provided in Part 4 of the Bio anchors report.

  7. ^

    Note that Holden mentions here that: “Overall, my best guesses about transformative AI timelines are similar to those of Bio Anchors”.

  8. ^

    “I think a very broad range, from ~2% to ~45%, could potentially be defensible”.

  9. ^

    “Ultimately, I could see myself arriving at a view that assigns anywhere from ~60% to ~90% probability that TAI is developed this century; this view is even more tentative and subject to revision than my view about median TAI timelines. My best guess right now is about 80%”.

  10. ^

    “~2090 for my “most conservative plausible median””.

  11. ^

    “My “most aggressive plausible median” is ~2040”.

  12. ^

    PASTA qualifies as “transformative AI”, since it is an “AI powerful enough to bring us into a new, qualitatively different future”.

  13. ^

    Such a growth rate is predicted to coincide with the emergence of AGI according to this Metaculus question. As of 25 June 2022, the community prediction for the time between the world real gross domestic product first being 25% higher than in every previous year and the development of artificial general intelligence (AGI) was one month, hence supporting Ajeya Cotra’s definition (although we are wary of reverse causation).

  14. ^

    The approach followed to determine the parameters of the fitted distributions is explained here.

  15. ^

    Here, “mean” is written in italic whenever it refers to the mean of the logarithm. Likewise for other statistics.

  16. ^

    This method is primarily used as a sense check (i.e. “Is Karnofsky’s estimate reasonable?”), and is not intended to precisely quantify deviations.

  17. ^

    “For transparency, note that many of the technical reports are Open Philanthropy analyses, and I am co-CEO of Open Philanthropy”.

  18. ^

    This was subsequently added by Holden (in this footnote) to address a key recommendation of a previous version of this analysis: “mentioning the reviews of Past AI Forecasts, or the reasons for it not having been reviewed”.

  19. ^
  20. ^

    Both the prediction and realisation of TAI.

  21. ^

    These refer to Ray Kurzweil, Eliezer Yudkowsky, and Bryan Caplan.