A major flaw in the Forecasting Research Institute’s “Longitudinal Expert AI Panel” survey

The Forecasting Research Institute conducted a survey asking experts of different kinds (both technical and non-technical) many questions about AI progress. The report, which was just published, is here. I’ve only looked at the report briefly, and there is a lot in it that could be examined and discussed.
The major flaw I want to point out is in the framing of a question where survey respondents are presented with three different scenarios for AI progress: 1) the slow progress scenario, 2) the moderate progress scenario, and 3) the rapid progress scenario.
All three scenarios describe what will have happened by the end of 2030. Respondents have to choose between the three scenarios; there is no option to choose none of them.
First, two important qualifications. Here’s qualification #1, outlined on page 104:
In the following scenarios, we consider the development of AI capabilities, not adoption. Regulation, social norms, or extended integration processes could all prevent the application of AI to all tasks of which it is capable.
Qualification #2, also on page 104:
We consider a capability to have been achieved if there exists an AI system that can do it:
Inexpensively: with a computational cost not exceeding the salary of an appropriate 2025 human professional using the same amount of time to attempt the task.
Reliably: what this means is context-dependent, but typically we mean as reliably as, or more reliably than, a human or humans who do the same tasks professionally in 2025.
With that said, here is the scenario that stipulates the least amount of AI progress, the slow progress scenario (on page 105):
Slow Progress
By the end of 2030 in this slower-progress future, AI is a capable assisting technology for humans; it can automate basic research tasks, generate mediocre creative content, assist in vacation planning, and conduct relatively standard tasks that are currently (2025) performed by humans in homes and factories.
Researchers can benefit from literature reviews on almost any topic, written at the level of a capable PhD student, yet AI systems rarely produce novel and feasible solutions to difficult problems. As a result, genuine scientific breakthroughs remain almost entirely the result of human-run labs and grant cycles. Nevertheless, AI tools can support other research tasks (e.g., copy editing and data cleaning and analysis), freeing up time for researchers to focus on higher-impact tasks. AI can handle roughly half of all freelance software-engineering jobs that would take an experienced human approximately 8 hours to complete in 2025, and if a company augments its customer service team with AI, it can expect the model to be able to resolve most complaints.
Writers enjoy a small productivity boost; models can turn out respectable short stories, but full-length novels still need heavy human rewriting to avoid plot holes or stylistic drift. AI can make a 3-minute song that humans would blindly judge to be of equal quality to a song released by a current (2025) major record label. At home, an AI system can draft emails, top up your online grocery cart, or collate news articles, and—so long as the task would take a human an hour or less and is well-scoped—it performs on par with a competent human assistant. With a few prompts, AI can create an itinerary and make bookings for a weeklong family vacation that feels curated by a discerning travel agent.
Self-driving car capabilities have advanced, but none have achieved true level-5 autonomy. Meanwhile, household robots can make a cup of coffee and unload and load a dishwasher in some modern homes—but they can’t do it as fast as most humans and they require a consistent environment and occasional human guidance. In advanced factories, autonomous systems can perform specific, repetitive tasks that require precision but little adaptability (e.g., wafer handling in semiconductor fabrication facilities).
So, even in the slowest progress scenario, respondents are to imagine that by the end of 2030 we have something that is either nearly AGI or AGI outright.
This is a really strange way to frame the question. The slowest scenario is extremely aggressive, and the moderate and rapid scenarios are even more so. What was the Forecasting Research Institute hoping to learn here?
Edited on Friday, November 14, 2025 at 9:30 AM Eastern to add the following.
Here’s the question respondents are asked with regard to these scenarios (on page 104):
At the end of 2030, what percent of LEAP panelists will choose “slow progress,” “moderate progress,” or “rapid progress” as best matching the general level of AI progress?
The percent of panelists forecast by the respondents is then presented in the report as the probability the respondents assign to each scenario (e.g., on page 38).
Edit #2 on Friday, November 14, 2025 at 6:25 PM Eastern.
I just noticed that the Forecasting Research Institute made a post on the EA Forum a few days ago that presents the question results as probabilities:
By 2030, the average expert thinks there is a 23% chance of a “rapid” AI progress scenario, where AI writes Pulitzer Prize-worthy novels, collapses years-long research into days and weeks, outcompetes any human software engineer, and independently develops new cures for cancer. Conversely, they give a 28% chance of a slow-progress scenario, in which AI is a useful assisting technology but falls short of transformative impact.
If the results are going to be presented this way, it seems particularly important to consider the wording and framing of the question.
Edit #3 on Saturday, November 15, 2025 at 10:10 PM Eastern.
I just learned that, in 2023, the Forecasting Research Institute published a survey on existential risk in which the wording of a question changed the respondents’ median estimated probability by a factor of 750,000. When asked to estimate the odds of human extinction by 2100 as a percentage, the median response was 5%, or 1 in 20. When asked to estimate it as a 1-in-X chance, with some example probabilities of rare events (e.g., being struck by lightning), the median response was 1 in 15 million, or 0.00000667%. Details here.
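As a quick sanity check of that factor (the arithmetic below is mine, not the report’s):

```python
# Quick arithmetic check of the 750,000x figure (my own calculation).
pct_framing = 0.05                  # "5%" median response, i.e. 1 in 20
one_in_x_framing = 1 / 15_000_000   # "1 in 15 million" median, ~0.00000667%

ratio = pct_framing / one_in_x_framing
print(f"{ratio:,.0f}x")             # -> 750,000x
```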
Since there seems to be some doubt as to whether the way a question is worded or presented can actually bias the responses that much and whether this is really such a big deal — let there be no more doubt!
Edit #4 on Thursday, November 20, 2025 at 3:00 PM Eastern.
There is an additional concern — which is entirely separate and distinct from anything mentioned in the post above or any of the edits — with the intersubjective resolution/metaprediction framing of the question. See the volcano analogy in my comment here. (See also my Singularity example in my subsequent comment here.) I may be mistaken, but I don’t see how we can derive what the respondents’ probability for each scenario is from a question that doesn’t ask anything directly about probability.
Respondents are asked to predict “what percent of LEAP panelists will choose” each scenario at the end of 2030 (a vote share, not a probability). This implies that if they think there’s, say, a 51% chance that 30% of LEAP panelists will choose the slow scenario, they should respond to the question by saying 30% will choose the slow scenario. If they think there’s a 99% chance that 30% of LEAP panelists will choose the slow scenario, they should also respond by saying 30% will choose the slow scenario. In either case, the number in their answer is exactly the same, despite a 48-point difference in the probability they assign to this outcome. The report says that 30% is the probability respondents assign to the slow scenario, but it’s not clear that the respondents’ probability is 30%.
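Here is a minimal sketch of the problem, with made-up numbers, assuming (as in the framing above) that each respondent reports their single most likely value:

```python
# Two hypothetical respondents' beliefs about the percent of LEAP panelists
# who will choose "slow progress" (all numbers are made up for illustration).
# Respondent A is only 51% confident in the 30% outcome; B is 99% confident.
beliefs_a = {30: 0.51, 60: 0.49}   # percent of panelists -> subjective probability
beliefs_b = {30: 0.99, 60: 0.01}

# The survey records only each respondent's single best-guess number
# (here, their most likely value, per the framing in the post).
answer_a = max(beliefs_a, key=beliefs_a.get)   # -> 30
answer_b = max(beliefs_b, key=beliefs_b.get)   # -> 30

# Identical recorded answers, despite the 48-point gap in confidence,
# so the confidence cannot be recovered from the survey data.
print(answer_a, answer_b)   # 30 30
```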
The Forecasting Research Institute only asks for the predicted “vote share” for each scenario and not the estimated probabilities behind those vote share predictions. It doesn’t seem possible to derive the respondents’ probability estimates from the vote share predictions alone. By analogy, if FiveThirtyEight’s 2020 election forecast predicts that Joe Biden will win a 55% share of the national vote, this doesn’t tell you what probability the model assigns to Biden winning the election (whether it’s, say, 70%, 80%, or 90%). The model’s probability is certainly not 55%. To know the model’s probability or guess at it, you would need information other than just the predicted vote share.
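To make the election analogy concrete, here is a rough Monte Carlo sketch. The 55% figure comes from the hypothetical above; the normal-distribution assumption and the standard-deviation values are mine, not FiveThirtyEight’s:

```python
import random

def win_probability(mean_vote_share: float, std_dev: float,
                    trials: int = 100_000) -> float:
    """Estimate P(vote share > 50%) under a normal forecast distribution."""
    wins = sum(1 for _ in range(trials)
               if random.gauss(mean_vote_share, std_dev) > 50.0)
    return wins / trials

# The same 55% predicted vote share is compatible with very different
# win probabilities, depending on the uncertainty around the prediction.
for sd in (2.0, 4.0, 8.0):
    print(f"std dev {sd}: win probability ~ {win_probability(55.0, sd):.0%}")
# Roughly: sd=2 -> ~99%, sd=4 -> ~89%, sd=8 -> ~73%
```

The point estimate alone pins down none of these win probabilities; you also need the spread of the forecast.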
Edit #5 on Tuesday, December 2, 2025 at 1:00 PM Eastern.
The Forecasting Research Institute has changed the language in the report in response to the critique described above in edit #4! (It seems like titotal played an important role in this. Thank you, titotal.)
On page 32, the report now gives the survey results in the same intersubjective resolution/metaprediction wording the question was asked in, rather than as an unqualified probability:
By 2030, the average expert thinks that 23% of LEAP panelists will say the state of AI most closely mirrors an (“rapid”) AI progress scenario that matches some of these claims.
This is awesome! I’m very happy to see this. Thanks to the Forecasting Research Institute for making this change.
I see that the EA Forum post where the LEAP survey was announced has also been updated in the same way, so thanks for that as well. Great to see.
Edit #6 on Saturday, December 6, 2025 at 8:20 PM Eastern.
Titotal has published a full breakdown of the error involving the intersubjective resolution/metaprediction framing of the survey question. It’s a great post that explains the error very well. Many thanks to titotal for taking the time to write the post and for convincing the Forecasting Research Institute that this was indeed an error, which I could not do. Thanks again to the Forecasting Research Institute for revising the report and their EA Forum post about the report.
Edit #7 on Monday, December 8, 2025 at 3:45 AM Eastern.
See my comment here (on titotal’s post) for some concrete evidence that the slow progress scenario sets too high a bar for a baseline, minimum-progress scenario.
For example, the slow progress scenario stipulates household robots that can do various chores by December 2030. Metaculus, which tends to be highly aggressive and optimistic in its forecasts of AI capabilities progress, doesn’t predict the development of such robots until mid-2032. To me, this indicates the slow progress scenario stipulates too much progress.

Metaculus is already aggressive, and yet the slow progress scenario (at least on household robots, depending on exactly how you interpret it) appears to be even more aggressive than Metaculus.