I can't thank titotal enough for writing this post and for talking to the Forecasting Research Institute about the error it describes.
I'm also incredibly thankful to the Forecasting Research Institute for listening to and integrating feedback from me and, in this case, mostly from titotal. It's not nothing to be responsive to criticism and correction. I can only express appreciation for people who are willing to do this. Nobody loves criticism, but the acceptance of criticism is what it takes to move science, philosophy, and other fields forward. So, hallelujah for that.
I want to be clear that, as titotal noted, we're zeroing in here on just one of the 18 questions discussed in the report. It's unfortunate that you can work hard on something large in scope, get it almost entirely right (I haven't reviewed the rest of the report, but I'll give the benefit of the doubt), and then see the discussion focus on the one mistake you made. I don't want research or writing to be a thankless task that only elicits criticism, and I want to be thoughtful about how to raise criticism in the future.
For completeness, to make sure readers have a full understanding: I actually made three distinct and independent criticisms of this survey question and how it was reported. First, I noted that the probability of the rapid scenario was reported as an unqualified probability, rather than as the probability of that scenario being the best matching of the three ("best matching" is the wording the question used). A forecaster might, for example, think the rapid scenario is the closest match of the three while still assigning a low probability to it literally coming true, so the two numbers are not interchangeable. The Forecasting Research Institute was quick to accept this point and promise to revise the report.
Second, I raised the problem around the intersubjective resolution/metaprediction framing that titotal describes in this post. After a few attempts, I passed the baton to titotal, figuring that titotal's reputation and math knowledge would make the case more convincing. The Forecasting Research Institute has now revised the report in response, as well as their EA Forum post about the report.
Third, the primary issue I raised in my original post on this topic is a potential anchoring effect or question-wording bias in the survey question.[1] The slow progress scenario is extremely aggressive and optimistic about the amount of progress in AI capabilities between now and the end of 2030. I would personally guess that the probability of AI gaining the sort of capabilities described in the slow progress scenario by the end of 2030 is significantly less than 0.1%, or 1 in 1,000. I imagine most AI experts would say it's unlikely if presented with the scenario in isolation and asked directly about its probability.
For example, here is what is said about household robots in the slow progress scenario:
By the end of 2030 in this slower-progress future, AI is a capable assisting technology for humans; it can … conduct relatively standard tasks that are currently (2025) performed by humans in homes and factories.
Also:
Meanwhile, household robots can make a cup of coffee and unload and load a dishwasher in some modern homes—but they can't do it as fast as most humans and they require a consistent environment and occasional human guidance.
Even Metaculus, which is known to be aggressive and optimistic about AI capabilities, and which is heavily used in the effective altruist and LessWrong communities, where belief in near-term AGI is strong, puts the median date for the question "When will a reliable and general household robot be developed?" in mid-2032. The resolution criteria for the Metaculus question are compatible with the sentence in the slow progress scenario, although those criteria also stipulate many details that the slow progress scenario does not.
An expert panel surveyed in 2020 and 2021 was asked, "[5/10] years from now, what percentage of the time that currently goes into this task can be automated?" and answered 47% for dish washing in 10 years, so in 2030 or 2031. I find this a somewhat confusing framing (what does it mean for 47% of the time involved in dish washing to be automated?), but it suggests that the baseline scenario in the LEAP survey involves contested claims, not just things we can take for granted.
Adam Jonas, a financial analyst at Morgan Stanley who has a track record of being extremely optimistic about AI and robotics (sometimes mistakenly so), and whose forecasts the financial world regards as aggressive and optimistic, predicts that a "general-purpose humanoid" robot for household chores will require "technological progress in both hardware and AI models, which should take about another decade", meaning around 2035. So, even an optimist on Wall Street seems to be less optimistic than the LEAP survey's slow progress scenario.
If the baseline scenario is more optimistic about AI capabilities progress than Metaculus, the results of a previous expert survey, and a Wall Street analyst on the optimistic end of the spectrum, then it seems plausible that the baseline scenario is already more optimistic than what the LEAP panelists would have reported as their median forecast had they been asked in a different way. It seems far too aggressive as a baseline scenario. This makes it hard to know how to interpret the panelists' answers (in addition to the interpretive difficulty raised by the problem described in titotal's post above).
I have also used the term "framing effect" to describe this before, following the Forecasting Research Institute and AI Impacts, but on checking the definition of that term in psychology again, it seems to refer specifically to framing the same information as positive or negative, which doesn't apply here.