Samotsvety’s AI risk forecasts
Crossposted to LessWrong and Foxy Scout
Introduction
In my review of What We Owe The Future (WWOTF), I wrote:
Finally, I’ve updated some based on my experience with Samotsvety forecasters when discussing AI risk… When we discussed the report on power-seeking AI, I expected tons of skepticism but in fact almost all forecasters seemed to give >=5% to disempowerment by power-seeking AI by 2070, with many giving >=10%.
In the comments, Peter Wildeford asked:
It looks like Samotsvety also forecasted AI timelines and AI takeover risk—are you willing and able to provide those numbers as well?
We separately received a request from the FTX Foundation to forecast on 3 questions about AGI timelines and risk.
I sent out surveys to get Samotsvety’s up-to-date views on all 5 of these questions, and thought it would be valuable to share the forecasts publicly.
A few of the headline aggregate forecasts are:
25% chance of misaligned AI takeover by 2100, barring pre-APS-AI catastrophe
81% chance of Transformative AI (TAI) by 2100, barring pre-TAI catastrophe
32% chance of AGI being developed in the next 20 years
Forecasts
In each case I aggregated forecasts by removing the single most extreme forecast on each end, then taking the geometric mean of odds.
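For concreteness, here is a minimal sketch of that aggregation scheme in Python; the function name and example inputs are mine and purely illustrative, not the actual survey responses.

```python
import numpy as np

def aggregate_forecasts(probs: list[float]) -> float:
    """Drop the single most extreme forecast on each end,
    then take the geometric mean of odds of the rest."""
    trimmed = sorted(probs)[1:-1]                  # remove lowest and highest
    odds = np.array([p / (1 - p) for p in trimmed])
    geo_mean_odds = np.exp(np.log(odds).mean())    # geometric mean of odds
    return geo_mean_odds / (1 + geo_mean_odds)     # convert back to a probability

# Illustrative inputs only, not the actual forecasts:
print(aggregate_forecasts([0.03, 0.10, 0.20, 0.30, 0.915]))  # ~0.19
```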
To reduce concerns of in-group bias to some extent, I calculated a separate aggregate for those who weren’t highly-engaged EAs (HEAs) before joining Samotsvety. In most cases, these forecasters hadn’t engaged with EA much at all; in one case the forecaster was aligned but not involved with the community. Several have gotten more involved with EA since joining Samotsvety.
Unfortunately I’m unable to provide forecast rationales in this post due to forecaster time constraints, though I might in a future post. I provided my personal reasoning for relatively similar forecasts (35% AI takeover by 2100, 80% TAI by 2100) in my WWOTF review.
WWOTF questions
| Question | Aggregate (n=11) | Aggregate, non-pre-Samotsvety-HEAs (n=5) | Range |
| --- | --- | --- | --- |
| What’s your probability of misaligned AI takeover by 2100, barring pre-APS-AI catastrophe? | 25% | 14% | 3-91.5% |
| What’s your probability of Transformative AI (TAI) by 2100, barring pre-TAI catastrophe? | 81% | 86% | 45-99.5% |
FTX Foundation questions
For the purposes of these questions, FTX Foundation defined AGI as roughly “AI systems that power a comparably profound transformation (in economic terms or otherwise) as would be achieved in [a world where cheap AI systems are fully substitutable for human labor]”. See here for the full definition used.
Unlike the questions above, these do not condition on the absence of a pre-AGI/TAI catastrophe.
| Question | Aggregate (n=11) | Aggregate, non-pre-Samotsvety-HEAs (n=5) | Range |
| --- | --- | --- | --- |
| What’s the probability of existential catastrophe from AI, conditional on AGI being developed by 2070?[1] | 38% | 23% | 4-98% |
| What’s the probability of AGI being developed in the next 20 years? | 32% | 26% | 10-70% |
| What’s the probability of AGI being developed by 2100? | 73% | 77% | 45-80% |
Who is Samotsvety Forecasting?
Edited to add: Our track record is now online here.
Samotsvety Forecasting is a forecasting group started primarily by Misha Yagudin, Nuño Sempere, and myself, initially predicting as a team on INFER (then called Foretell). Over time, we invited more forecasters with very strong track records of accuracy and sensible comments, mostly from Good Judgment Open but also a few from INFER and Metaculus. Some strong forecasters were added through social connections, which means the group is a bit more EA-skewed than it would be without these additions. A few Samotsvety forecasters are also superforecasters.
How much do these forecasters know about AI?
Most forecasters have at least read Joe Carlsmith’s report on AI x-risk, Is Power-Seeking AI an Existential Risk?. Those who are short on time may have just skimmed the report and/or watched the presentation. We discussed the report section by section over the course of a few weekly meetings.
~5 forecasters also have some level of AI expertise, e.g. I did some adversarial robustness research during my last year of undergrad then worked at Ought applying AI to improve open-ended reasoning.
How much weight should we give to these aggregates?
My personal tier list for how much weight I give to AI x-risk forecasts, to the extent that I defer:
1. Individual forecasts from people who seem to generally have great judgment and have spent a ton of time thinking about AI x-risk forecasting, e.g. Cotra, Carlsmith
2. Samotsvety aggregates presented here
3. A superforecaster aggregate (I’m biased re: quality of Samotsvety vs. superforecasters, but I’m pretty confident based on personal experience)
4. Individual forecasts from AI domain experts who seem to generally have great judgment, but haven’t spent a ton of time thinking about AI x-risk forecasting (this is the one I’m most uncertain about; I could see it anywhere from 2 to 4)
Everything else I can think of I would give little weight to.[2][3]
Acknowledgments
Thanks to Tolga Bilge, Juan Cambeiro, Molly Hickman, Greg Justice, Jared Leibowich, Alex Lyzhov, Jonathan Mann, Nuño Sempere, Pablo Stafforini, and Misha Yagudin for making forecasts.
[1] Unlike the WWOTF question, this includes any existential catastrophe caused by AI, not just misaligned takeovers (this is a non-negligible consideration for me personally and, I’m guessing, for several other forecasters, though I do give most weight to misaligned takeovers).
[2] Why do I give little weight to Metaculus’s views on AI? Primarily because of the incentives to make very shallow forecasts on a ton of questions (e.g. probably <20% of Metaculus AI forecasters have done the equivalent work of reading the Carlsmith report), and secondarily because forecasts aren’t aggregated from a select group of high performers but instead from anyone who wants to make an account and predict on the question.
[3] Why do I give little weight to AI expert surveys such as When Will AI Exceed Human Performance? Evidence from AI Experts? I think most AI experts have incoherent and poor views on this because they don’t think of it as their job to spend time thinking and forecasting about what will happen with very powerful AI, and many don’t have great judgment.
I’m curious whether there’s any answer AI experts could have given that would be a reasonably big update for you.
For example is there any level of consensus against ~AGI by 2070 (or some other date) that would be strong enough to move your forecast by 10 percentage points?
Good question. I think AI researchers’ views inform, or can inform, my own. Here are a few examples from the recent NLP Community Metasurvey; I’ll quote bits from this summary.
This was surprising and updated me somewhat against shorter timelines (and higher risk), as, for example, it clashes with the “+12 OOMs Enough” premise of Kokotajlo’s argument for short timelines (see also this and his review of the Carlsmith report).
If these numbers were significantly lower or higher, it would also probably update my views.
This number is puzzling and hard to interpret. It seems appropriate in light of AI Impacts’ What do ML researchers think about AI in 2022? where “48% of respondents gave at least 10% chance of an extremely bad outcome”.
I don’t fully understand what this implies about the ML community’s views on AI alignment. But I can see myself updating positively if these concerns would lead to more safety culture, alignment research, etc.
Note that we’d probably also look at the object level reasons for why they think that. E.g., new scaling laws findings could definitely shift our/my forecast by 10%.
Fair question. I say little weight but if it was far enough from my view I would update a little. My view also may not be representative of other forecasters, as is evident from Misha’s comment.
In the original Grace et al. survey (and I think the more recent ones as well, though I haven’t read them as closely), the ML researchers clearly had very incoherent views depending on the question asked and the elicitation technique, which I think provides some evidence that they haven’t thought about it that deeply and that we shouldn’t take it too seriously (some incoherence is expected, but I think they gave wildly different answers for HLMI (human-level machine intelligence) and full automation of labor).
So I think I’d split up the thresholds by somewhat coherent vs. still very incoherent.
My current forecast for ~AGI by 2100 barring pre-AGI catastrophe is 80%. To move it to 70% based on just a survey of ML experts, I think I’d have to see something like one of:
ML experts still appear to be very incoherent, but are giving a ~10% chance of ~AGI by 2100 on average across framings.
ML experts appear to be somewhat coherent, and are giving a ~25% chance of ~AGI by 2100.
(but I haven’t thought about this a lot, these numbers could change substantially on reflection or discussion/debate)
I am told that APS, in this context, stands for “advanced, planning, strategically aware” and is from Carlsmith’s report https://arxiv.org/abs/2206.13353
Yup, I linked the text “APS-AI” in the post to the relevant section of the report. Sorry if it wasn’t that noticeable!
I appreciate the link. I didn’t make good use of it, unfortunately—instead of reading it carefully I searched the page for the acronym hoping to find an expansion, and didn’t end up reading the list of properties.
I think it would be very valuable if more reports of this kind were citable in contexts where people are sensitive to signs of credibility and prestige. In other words, I think there are contexts where if this existed as a report on SSRN or even ArXiV, or on the website of an established institution, I think it could be citable and would be valuable as such. Currently I don’t think it could be cited (or taken seriously if cited). So if there are low-cost ways of publishing this or similar reports in a more polished way, I think that would be great.
Caveats that (i) maybe you have done this and I missed it; (ii) this comment isn’t really specific to this post, but it’s been on my mind and this is the most recent post where it is applicable; and (iii) on balance it does nonetheless seem likely that the work required to turn this into a ‘polished’ report means doing so is not (close to) worthwhile.
That said: this is an excellent post and I’m very grateful for these forecasts.
Thanks for the suggestion and glad you found the forecasts helpful :)
I personally have a distaste for academic credentialist culture so am probably not the best person to turn this into a more prestigious looking report. I agree it might be valuable despite my distaste, so if anyone reading this is interested in doing so feel free to DM me and I can probably help with funding and review if you have a good writing track record.
This 25% forecast is an order of magnitude different from the Metaculus estimate of 2-2.5%
I don’t get why the aggregated results of two different groups of forecasters end up more than an order of magnitude apart.
Any idea why? Have I misunderstood something? Is one group known to be better? Or is one group more likely to be biased? Or is forecasting risks just really super unreliable and not a thing to put much weight on?
https://www.metaculus.com/questions/2568/ragnar%25C3%25B6k-seriesresults-so-far/
In terms of forecasting accuracy on Metaculus, Eli’s individual performance is comparable[1] to the community aggregate on his own, despite him having optimised for volume (he’s 10th on the heavily volume weighted leaderboard). I expect that were he to have pushed less hard for volume, he’d have significantly outperformed the community aggregate even as an individual.[2]
Assuming the other Samotsvety forecasters are comparably good, I’d expect the aggregated forecasts from the group to very comfortably outperform the community aggregate, even if they weren’t paying unusual attention to the questions (which they are).
Comparing ‘score at resolution time’, Eli looks slightly worse than the community. Comparing ‘score across all times’, Eli looks better than the community. Score across all times is a better measure of skill when comparing individuals, but does disadvantage the community prediction, because at earlier times questions have fewer predictors.
As some independent evidence of this, I comfortably outperform the community aggregate, having tried less hard than Eli to optimise for volume. Eli has beaten me in more than one competition, and I think he’s a better forecaster.
In addition to the points above, there have been a few jokes on questions like that about the scoring rule not being proper (if the world ends, you don’t get the negative points for being wrong!). Not sure how much of a factor that is, though, and I could imagine it being minimal.
My take is that we should give little weight to Metaculus. From footnote 2 of the post:
(Edited to add: I see the post you linked also includes the “Metaculus prediction” which theoretically performs significantly better than the community prediction by weighting stronger predictors more heavily. But if you look at its actual track record, it doesn’t do much better than the community. For binary questions at resolve time, it has a log score of 0.438 vs. 0.426 for community. At all times, it gets 0.280 vs. 0.261. For continuous questions at resolve time, it has a log score of 2.19 vs. 2.12. At all times, it gets 1.57 vs. 1.55.)
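(For readers unfamiliar with these numbers, the sketch below shows a generic baseline-relative binary log score, which is the flavor of metric being compared. It is not Metaculus’s exact formula; their constants, log base, and time-averaging details differ, and the helper name is mine.)

```python
import math

def binary_log_score(p_yes: float, resolved_yes: bool) -> float:
    """Log score of a binary forecast relative to a 50% baseline.
    Positive means better than a maximally uncertain forecast.
    (Generic illustration only; Metaculus's actual scoring differs in detail.)"""
    p = p_yes if resolved_yes else 1.0 - p_yes  # probability placed on what happened
    return math.log(p) - math.log(0.5)

# "At resolution time" scores only the final standing forecast;
# "across all times" averages this kind of score over the question's lifetime,
# which penalizes forecasts that start out far from the truth.
print(binary_log_score(0.80, True))   # ~0.47, better than chance
print(binary_log_score(0.80, False))  # ~-0.92, confidently wrong
```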
That said:
I wouldn’t want people to overestimate the precision of the estimates in this post! Take them as a few data points among many. I also think it’s very healthy for the community if many people are forming inside views about AI risk, though I understand it’s difficult and had a hard time with it myself for a while.
Ah the answer was in the footnotes all along. Silly me. Thank you for the reply!
Is this on specific topic areas (e.g., “TAI forecasting” or “EA topics”) or more generally?
More generally, though probably especially on AI. I’m not exactly sure how to handle some of the evidence for why I’m pretty confident on this, but I can gesture at a few data points that seem fine to share:
Nuño, Misha and I each did much better on INFER than at least a few superforecasters (and we were 3 of the top 4 on the leaderboard; no superforecasters did better than us).
I’ve seen a few superforecaster aggregate predictions that seemed pretty obviously bad in advance; I pre-registered my disagreement, and in both cases I was right.
I said on Twitter that I thought supers were overconfident on Russia’s invasion of Ukraine; supers were at 84% no invasion, I was at 55% (and Metaculus was similar, to be fair). Unfortunately, Russia went on to invade Ukraine a few weeks later.
I noticed that superforecasters might be overconfident about the rise of the Delta COVID variant, and it seems they likely were: they predicted a 14% chance the 7-day median would rise above 140k cases and a 2% (!) chance it would rise above 210k; it ended up peaking at about 150k, so this is weak evidence, but it still seems like 2% was crazy low.
I’m hesitant to make it seem like I’m bashing on superforecasters, I think they’re amazing overall relative to the general public. But I think Samotsvety is even more amazing :D I also think some superforecasters are much better forecasters than others.
This doesn’t sound like an outlandish claim to me. Still, I’m not yet convinced.
I was really into Covid forecasting at the time, so I was tempted to go back through my comment history, and I noticed that this seemed like an extremely easy call at the time. (I made this comment 15 days before yours where I was predicting >100,000 cases with 98% confidence, saying I’d probably go to 99% after more checking of my assumptions. Admittedly, >100,000 cases in a single day is significantly less than >140,000 cases for the 7-day average. Still, a confidence level of 98%+ suggests that I’d definitely have put a lot more than 14% on the latter.) This makes me suspect that maybe that particular question was quite unrepresentative of the average track record of superforecasters. Relatedly, if we only focus on instances where it’s obvious that some group’s consensus is wrong, it’s probably somewhat easy to find such instances (even for elite groups) because of the favorable selection effect at work. A thorough analysis would look at the track record on a pre-registered selection of questions.
Edit: The particular Covid question is strong evidence for “sometimes superforecasters don’t seem to be trying as much as they could.” So maybe your point is something like “On questions where we try as hard as possible, I trust us more than the average superforecaster prediction.” I think that stance might be reasonable.
As a superforecaster, I’m going to strongly agree with “sometimes superforecasters don’t seem to be trying as much as they could,” and they aren’t incentivized to do deep dives into every question.
I’d say they are individually somewhere between Metaculus and a more ideal group, which Samotsvety seems to be close to, but I’m not an insider and have limited knowledge of how you manage epistemic issues like independent elicitation before discussion. One thing Samotsvety does not have, unfortunately, is the kind of more sophisticated aggregation algorithm used by Metaculus and GJ, nor the same level of diversity as either, though overall I see those as less important than more effort by properly calibrated forecasters.
Really appreciate this deep dive!
Yeah, I think the evidence I felt comfortable sharing right now is enough to get to some confidence but perhaps not high confidence, so this is fair. The INFER point is probably stronger than the two bad predictions which is why I put it first.
I agree that a more thorough analysis looking at the track record on a pre-registered selection of questions would be great. It’s pretty hard to know because the vast majority of superforecaster predictions are private and not on their public dashboard. Speaking for myself, I’d be pretty excited about a Samotsvety vs. supers vs. [any other teams who were interested] tournament happening.
That being said, I’m confused about how you seem to be taking “I was really into Covid forecasting at the time, so I was tempted to go back through my comment history and noticed that this seemed like an extremely easy call at the time” as an update toward superforecasters being better? If anything this feels like an update against superforecasters? The point I was trying to make was that it was a foreseeably wrong prediction and you further confirmed it?
I’d also say that on the cherry-picking point, I wasn’t exactly checking the superforecaster public dashboard super often over the last few years (like maybe I’ve checked ~25-50 days total) and there are only like 5 predictions up at a time.
I think it’s fair to interpret the Covid question to some extent as superforecasters not trying, but I’m confused about how you seem to be attributing little of it to prediction error? It could be a combination of both.
Good point. I over-updated on my feeling of “this particular question felt so easy at the time” so that I couldn’t imagine why anyone who puts serious time into it would get it badly wrong.
However, on reflection, I think it’s most plausible that different types of information were salient to different people, which could have caused superforecasters to make prediction errors even if they were trying seriously. (Specifically, the question felt easy to me because I happened to have a lot of detailed info on the UK situation, which presented one of the best available examples to use for forming a reference class.)
You’re right that I essentially gave even more evidence for the claim you were making.
Thanks for writing this!
You may want to consider creating a topic for “Samotsvety”, where posts such as this could be tagged.
Good idea, I made a very quick version. Anyone should feel free to update it.
Do the ranges 3-91.5% and 45-99.5% include or exclude the highest and lowest forecasts?
Includes the highest and lowest
This is great; thanks for sharing!
The ranges on these questions seem pretty wide. Do you have thoughts on that? For hard questions (e.g., about emerging technology several decades in the future, like these questions), do superforecasters/Samotsvety often have such a wide range?
To add to Eli’s comment, I think on such complex topics it’s just common for even personal estimates to fluctuate quite a bit. For example, here is an excerpt from footnote 181 of the Carlsmith report:
My impression is wide ranges are pretty common on questions as difficult/complex as these. I think large differences can often come from very-hard-to-resolve deep disagreements in intuitions, as we’ve seen with the MIRI conversations.
If the range was too small on this type of question I might be worried about herding/anchoring. In a few cases there are 1-2 outlier forecasts on each end and the rest are relatively close together.
It could be cool to see the individual forecasts presented in a histogram.
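For what it’s worth, a histogram like that would be simple to produce; here is a minimal matplotlib sketch. The values below are hypothetical placeholders (only the 3% and 91.5% endpoints come from the reported range), not the actual individual forecasts.

```python
import matplotlib.pyplot as plt

# Hypothetical placeholder forecasts -- NOT the actual survey responses
takeover_forecasts = [0.03, 0.08, 0.12, 0.15, 0.20, 0.25, 0.30, 0.40, 0.55, 0.70, 0.915]

plt.hist(takeover_forecasts, bins=[i / 10 for i in range(11)], edgecolor="black")
plt.xlabel("P(misaligned AI takeover by 2100)")
plt.ylabel("Number of forecasters")
plt.title("Individual forecasts (hypothetical illustration)")
plt.show()
```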
I believe the text you linked for existential catastrophe (in the second table) is incorrect; I get a page-not-found error.
Substantively, I realize this is probably not something you originally asked (nor am I asking for it since presumably this’d take a bunch of time), but I’d be super curious to see what kind of uncertainty estimates folks put in this, and how combining using those uncertainties might look. If you have some intuition on what those intervals look like, that’d be interesting.
The reason I’m curious about this is probably fairly transparent, but given the pretty extensive broader community uncertainty on the topic, aggregating using those uncertainties might yield a different point estimate, and more importantly it might help people understand the problem better by seeing the large degree of uncertainty involved. For example, it’d be interesting/useful to see how much probability people put outside a 10-90% range.
Thanks, fixed.
Good question! Perhaps we can include better uncertainty information in a future post at some point. For now, regarding my personal uncertainty, I’ll quote what I wrote about a week ago: