1. Are you referring to your exchange with David Mathers here?
3. I’m not sure what you’re saying here. Just to clarify what my point is: you’re arguing in the post that the slow scenario actually describes big improvements in AI capabilities. My counterpoint is that this scenario is not given a lot of weight by the respondents, suggesting that they mostly don’t agree with you on this.
You are guessing that you know how the framing affected the results, which is your right, but it is my right to guess something different, and the whole reason we run surveys is not to guess but to know. If we wanted to rely on guesses, we could have saved the Forecasting Research Institute the trouble of running the whole survey in the first place!
I don’t think this is an accurate summary of the disagreement, but I’ve tried to clarify my point twice already, so I’m going to leave it at that.
I don’t mind if you don’t respond — it’s fair to leave a discussion whenever you like — but I want to try to make my point clear for you and for anyone else who might read this post.
How do you reckon the responses to the survey question would be different if there were a significant question wording effect biasing the results? My position is: I simply don’t know and can’t say how the results would be different if the question were phrased in such a way as to better avoid a question wording effect. The reason to run surveys is to learn that and be surprised. If the question were worded and framed differently, maybe the results would be very different, maybe they would be a little different, maybe they would be exactly the same. I don’t know. Do you know? Do you actually know for sure? Or are you just guessing?
What if we consider the alternative? Let’s say the response was something like, I don’t know, 95% in favour of the slow progress scenario, 4% for the moderate scenario, and 1% for the rapid. Just to imagine something for the sake of illustration. Then you could also argue against a potential question wording effect biasing the results by appealing to the response data. You could say: well, clearly the respondents saw past the framing of the question and managed to accurately report their views anyway.
This should be troubling. If a high percentage can be used to argue against a question wording effect and a low percentage can be used to argue against a question wording effect, then no matter what the results are, you can argue that you don’t need to worry about a potential methodological problem because the results show they’re not a big deal. If any results can be used to argue against a methodological problem, then surely no results should be used to argue against a methodological problem. Does that make sense?
I don’t feel like I’m reinventing the wheel here; I’m just talking about common concerns with how surveys are designed and worded. In general, you can’t know whether a response was biased or not just by looking at the data and not the methodology.
For reference, here are the results on page 141 of the report:
Yeah, the error here was mine sorry. I didn’t actually work on the survey, and I missed that it was actually estimating the % of the panel agreeing we are in a scenario, not the chance that that scenario will win a plurality of the panel. This is my fault not Connacher’s. I was not one of the survey designers, so please do not assume from this that the people at the FRI who designed the survey didn’t understand their own questions or anything like that.
For what it’s worth, I think this is decent evidence that the question is too confusing to be useful given that I mischaracterized it even though I was one of the forecasters. So I largely, although not entirely, withdraw the claim that you should update on the survey results. (That is, I think it still constitutes suggestive evidence that you are way out of line with experts (and superforecasters), but no longer super-strong.)
I also somewhat withdraw the claim that we should take even well-designed expert surveys as strong evidence of the actual distribution of opinions. I had forgotten the magnitude of the framing effect that titotal found for the human extinction questions. That really does somewhat call the reliability of even a decently designed survey into question. That said, I don’t really see a better way to get at “what do experts think” than surveys here, and I doubt they have zero value. But people should probably test multiple framings more. Nonetheless “there could be a big framing effect because it asks for a %”, i.e. the titotal criticism, could apply to literally any survey, and I’m a bit skeptical of “surveys are a zero value method of getting at expert opinion”.
So I think I concede that you were right not to be massively moved by the survey, and I was wrong to say you should be. That said, maybe I’m wrong, but I seem to recall that you frequently imply that EA opinion on the plausibility of AGI by 2032 is way out of step with what “real experts” think. If your actual opinion is that no one has ever done a well-designed survey, then you should probably stop saying that. Or cite a survey you think is well-designed that actually shows other people are more out of step with expert opinion than you are, or say that EAs are out of step with expert opinion in your best guess, but you can’t really claim with any confidence that you are any more in line with it. My personal guess is that your probabilities are in fact several orders of magnitude away from the “real” median of experts and superforecasters, if we could somehow control for framing effects, but I admit I can’t prove this.
But I will say that if taken at face value the survey still shows a big gap between what experts think and your “under 1 in 100,000 chance of AGI by 2032” (That is, you didn’t complain when I attributed that probability to you in the earlier thread, and I don’t see any other way to interpret “more likely that JFK is secretly still alive” given you insisted you meant it literally.) Obviously, if someone thinks that the most likely outcome is that in 2030 we will be in a situation where approx. 1 in 4 people on the panel think we are already in the rapid scenario, they probably don’t think the chance of AGI by 2032 is under 1 in 100,000, since they are basically predicting that we’re going to be near the upper end of the moderate scenario, which makes it hard to give a chance of AGI by 2 years after 2030 that low. (I suppose they just could have a low opinion of the panel, and think some of the members will be total idiots, but I consider that unlikely.) I’d also say that if forecasters made the mistake I did in interpreting the question, then again, they are clearly out of step with the probability you give. I’m also still prepared to defend the survey against some of your other criticisms.
Yeah, the error here was mine sorry. I didn’t actually work on the survey, and I missed that it was actually estimating the % of the panel agreeing we are in a scenario, not the chance that that scenario will win a plurality of the panel. This is my fault not Connacher’s. I was not one of the survey designers, so please do not assume from this that the people at the FRI who designed the survey didn’t understand their own questions or anything like that.
I really appreciate that, but the report itself made the same mistake!
Here is what the report says on page 38:
Contrary to this result, the public assigns more weight to the “Rapid Progress” scenario in the General AI Progress question: the average member of the public assigns a 26% chance to the rapid scenario (95% confidence interval [25.5%, 26.4%]), compared to 23% for experts (95% confidence interval [22.1%, 23.7%]).
And the same mistake is repeated on the Forecasting Research Institute’s Substack, in a post that is cross-posted on the EA Forum:
Speed of AI progress: By 2030, the average expert thinks there is a 23% chance of a “rapid” AI progress scenario, where AI writes Pulitzer Prize-worthy novels, collapses years-long research into days and weeks, outcompetes any human software engineer, and independently develops new cures for cancer. Conversely, they give a 28% chance of a slow-progress scenario, in which AI is a useful assisting technology but falls short of transformative impact.
There are two distinct issues here: 1) the “best matching” qualifier, which contradicts these unqualified statements about probability, and 2) the intersubjective resolution/metaprediction framing of the question, which I still find confusing but am waiting to see if I can ultimately wrap my head around. (See my comment here.)
I give huge credit to Connacher Murphy for acknowledging that the probability should not be stated without the qualifier that this is only the respondents’ “best matching” scenario, and for promising to revise the report with that qualifier added. Kudos, a million times kudos. My gratitude and relief are immense. (I hope that the Forecasting Research Institute will also update the wording of the Substack post and the EA Forum post to clarify this.)
Conversely, it bothers me that Benjamin Tereick said that it’s only “slightly inaccurate” and not “a big issue” to present this survey response as the experts’ unqualified probabilities. Benjamin doesn’t work for the Forecasting Research Institute, so his statements don’t affect your organization’s reputation in my books, but I find that frustrating. In case it’s in doubt: making mistakes is absolutely fine and no problem. (Lord knows I make mistakes!) Acknowledging mistakes increases your credibility. (I think a lot of people have this backwards. I guess blame the culture we live in for that.)
For what it’s worth, I think this is decent evidence that the question is too confusing to be useful given that I mischaracterized it even though I was one of the forecasters.
You’re right!
That said, I don’t really see a better way to get at “what do experts think” than surveys here, and I doubt they have zero value.
It would be very expensive and maybe just not feasible, but in my opinion the most interesting and valuable data could be obtained from long-form, open-ended, semi-structured, qualitative research interviews.
Here’s why I say that. You know what the amazing part of this report is? The rationale examples! Specifically, the ones on pages 142 and 143. We only get morsels, but, for me, this is the main attraction, not an afterthought.
For example, we get this rationale example in support of the moderate progress scenario:
If the METR task horizon trends hold, then by 2030 AIs would be able to do software tasks that take humans months or weeks. This kind of time horizon mostly seems like the moderate progress world.
Therein lies the crux! Personally, I strongly believe the METR time horizon graph is not evidence of anything significant with regard to AGI, so it’s good to know where the disagreement lies.
Or this other rationale example for the moderate progress scenario:
Moderate sounds completely achievable even today with proper integration of the latest technologies (commercialized, announced, released & unreleased) of the SOTA labs into enterprise and consumer workflows. I’m less sure of the robot part.
This is unfathomable to me! Toby Crisford expressed similar incredulity about someone saying the same for the slow scenario, and I agree with him.
Either this respondent is interpreting the scenario description completely differently than I am (this does happen kind of frequently), or, if they’re interpreting it the same way I am, they’re expressing a view that I not only don’t believe but have a hard time fathoming how anyone could believe.
So, it turns out it’s way more interesting to find out why people disagree than to find out that they disagree.
...I’m a bit skeptical of “surveys are a zero value method of getting at expert opinion”.
Listen, I’d read surveys all day if I could. The issue is just — using this FRI survey and the AI Impacts survey as the two good examples we have — it turns out survey design is thornier and more complicated than anyone realized going in.
So I think I concede that you were right not to be massively moved by the survey, and I was wrong to say you should be.
Thank you! I appreciate it!
That said, maybe I’m wrong, but I seem to recall that you frequently imply that EA opinion on the plausibility of AGI by 2032 is way out of step with what “real experts” think. If your actual opinion is that no one has ever done a well-designed survey, then you should probably stop saying that.
Before this FRI survey, the only expert survey we had on AGI timelines was the AI Impacts survey. I found the 69-year framing effect in that survey strange. Here’s what I said about it in a comment 3 weeks ago:
I agree that the enormous gap of 69 years between High-Level Machine Intelligence and Full Automation of Labour is weird and calls the whole thing into question. But I think all AGI forecasting should be called into question anyway. Who says human beings should be able to predict when a new technology will be invented? Who says human beings should be able to predict when the new science required to invent a new technology will be discovered? Why should we think forecasting AGI beyond anything more than a wild guess is possible?
After digging into this FRI survey — and seeing titotal’s brilliant comment about the 750,000x anchoring effect for the older survey — now I’m really questioning whether I should keep citing the AI Impacts survey at all. Even though I recognized that the framing effect in the AI Impacts survey was very strange, I didn’t realize the full extent to which survey results could be artefacts of survey design. I thought of that framing effect as a very strange quirk, and a big question mark, but now it seems like the problem is bigger and more fundamental than I realized. (Independently, I’ve recently been realizing just how differently people imagine AGI or other hypothetical future AI systems, even when care is taken to give a precise definition or paint a picture with a scenario like in the FRI survey.)
There’s also the AAAI survey where 76% of experts said it’s unlikely or very unlikely that current AI approaches (including LLMs) would scale to AGI (see page 66). That doesn’t ask anything about timing. But now I’m questioning even that result, since who knows if “AGI” is well-defined or what the respondents mean by that term?
That said, maybe I’m wrong, but I seem to recall that you frequently imply that EA opinion on the plausibility of AGI by 2032 is way out of step with what “real experts” think.
I think you might be mixing up two different things. I have strong words about people in EA who aren’t aware that many experts (most, if you believe the AAAI survey) think LLMs won’t scale to AGI, who don’t consider this to be a perspective worth seriously discussing, or who have never heard of or considered this perspective before. In that sense, I think the average/typical/median EA opinion is way out of step with what AI experts think.
When it comes to anything involving timelines (specifically, expert opinion vs. EA opinion), I don’t have particularly strong feelings about that, and my words have been much less strong. This is what I said in a recent post, which you commented on:
Moreover, if I change the reference class from, say, people in the Effective Altruism Forum filter bubble to, say, AI experts or superforecasters, the median year for AGI gets pushed out past 2045, so my prediction starts to look like a lot less of an outlier. But I don’t want to herd toward those forecasts, either.
I’m not even sure I’m not an outlier compared to AI experts. I was just saying if you take the surveys at face value — which now I’m especially questioning — my view looks a lot less like an outlier than if you used the EA Forum as the reference class.
My personal guess is that your probabilities are in fact several orders of magnitude away from the “real” median of experts and superforecasters, if we could somehow control for framing effects, but I admit I can’t prove this.
The median might obscure the deeper meaning we want to get to. The views of experts might be quite diverse. For example, there might be 10% of experts who put significantly less than a 1% probability on AGI within 10 years and 10% of experts who put more than a 90% probability on it. I’m not really worried about being off from the median. I would be worried if my view, or something close to it, wasn’t shared by a significant percentage of experts — and then I would really have to think carefully.[1]
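To make that concrete, here’s a minimal sketch with entirely invented numbers (the distribution below is hypothetical, not anything from the FRI or AI Impacts data), showing how a middling median can coexist with a sizeable minority of experts who give probabilities far below 1%:

```python
import statistics

# Hypothetical distribution of expert probabilities for AGI within 10 years.
# All values are invented purely for illustration.
expert_probs = [0.001] * 10 + [0.05] * 30 + [0.25] * 30 + [0.6] * 20 + [0.95] * 10

median = statistics.median(expert_probs)
share_below_1_percent = sum(p < 0.01 for p in expert_probs) / len(expert_probs)
share_above_90_percent = sum(p > 0.9 for p in expert_probs) / len(expert_probs)

print(median)                  # 0.25 -> the median alone looks "moderate"
print(share_below_1_percent)   # 0.1  -> yet 10% give well under a 1% probability
print(share_above_90_percent)  # 0.1  -> and 10% give more than a 90% probability
```

The median alone would hide both tails, and the tails are the part I actually care about here.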
For example, Yann LeCun, one of the foremost AI researchers in the world, has said it’s impossible that LLMs will scale to AGI — not improbable, not unlikely, impossible. Richard Sutton, another legendary AI researcher, has said LLMs are a dead end. This sounds like a ~0% probability, so I don’t mind if I agree with them and have a ~0% probability as well.
Surely you wouldn’t argue that all the experts should circularly update until all their probabilities are the same, right? That sounds like complete madness. People believe what they believe for reasons, and should be convinced on the grounds of reasons. This is the right way to do it. This is the Enlightenment’s world! We’re just living in it!
Maybe something we’re clashing over here is the difference between LessWrong-style/Yudkowskyian Bayesianism and the traditional, mainstream scientific mindset. LessWrong-style/Yudkowskyian Bayesianism emphasizes personal, subjective guesses about things — a lot. It also emphasizes updating quickly, even making snap judgments.
The traditional, mainstream scientific mindset emphasizes cautious vetting of evidence before making updates. It prefers to digest information slowly, thoughtfully. There is an emphasis on not making claims or stating numbers that one cannot justify, and on not relying on subjective guesses or probabilities.
I am a hardcore proponent of the traditional, mainstream scientific mindset. I do not consider LessWrong-style/Yudkowskyian Bayesianism to be of any particular merit or value. It seems to make people join cults more often than it leads them to great scientific achievement. Indeed, Yudkowsky’s/LessWrong’s epistemology has been criticized as anti-scientific, a criticism I’m inclined to agree with.
But I will say that if taken at face value the survey still shows a big gap between what experts think and your “under 1 in 100,000 chance of AGI by 2032” (That is, you didn’t complain when I attributed that probability to you in the earlier thread, and I don’t see any other way to interpret “more likely that JFK is secretly still alive” given you insisted you meant it literally.)
This is slightly incorrect, but it’s almost not important enough to be worth correcting. I said there was a less than 1 in 10,000 chance of the rapid progress scenario in the survey, without the adoption vs. capabilities caveat (which I wasn’t given at the time) — which I take to imply a much higher bar than “merely” AGI. I also made this exact same correction once before, in a comment you replied to. But it really doesn’t matter.
On reflection, though, I probably would say the rapid progress scenario (without the adoption vs. capabilities caveat) — a widely deployed, very powerful superhuman AGI or superintelligence by December 2030 — has less than a 1 in 100,000 chance of coming to pass. Here’s my reasoning. The average chance of getting struck by lightning over 5 years is, as I understand it, about 1 in 250,000. Intuitively, me getting struck by lightning within that timeframe seems more likely than the rapid scenario (without the caveat) coming true. So, it seems like I should say the chances are less than 1 in 250,000.
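As a rough sanity check on that lightning figure (a minimal sketch; the roughly 1-in-1,200,000-per-year strike rate is a ballpark assumption I’m plugging in for illustration, not a vetted statistic):

```python
# Assumed ballpark: roughly a 1 in 1,200,000 chance per year of being struck by lightning.
p_per_year = 1 / 1_200_000

# Chance of being struck at least once over 5 years, treating years as independent.
p_five_years = 1 - (1 - p_per_year) ** 5

print(f"about 1 in {round(1 / p_five_years):,}")  # about 1 in 240,000
```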
That said, I’m not a forecaster, and I’m not sure how to calibrate subjective, intuitive probabilities significantly below 1% for unprecedented eschatological/world-historical/natural-historical events of cosmic significance that may involve the creation of new science that’s currently unknown to anyone in the world, and that can’t be predicted mechanically or statistically. What does the forecasting literature say about this?
AGI is a lower bar than the rapid scenario (in my interpretation), especially without the caveat, and the exact, nitpicky definition of AGI could easily make the probability go up or down a lot. For now, I’m sticking with significantly less than a 1 in 1,000 chance of AGI before the end of 2032 on my definition of AGI.[2] If I thought about it more, I might say the chance is less than 1 in 10,000, or less than 1 in 100,000, or less than 1 in 250,000, but I’d have to think about it more, and I haven’t yet.
What probability of AGI before the end of 2032, on my definition of AGI (see the footnote), would you give?
Let me know if there’s anything in your comment I didn’t respond to that you’d like an answer about.
If you happen to be curious, I wrote a post trying to integrate two truths: 1) if no one ever challenged the expert consensus or social consensus, the world would suck and 2) the large majority of people who challenge the expert consensus or social consensus are not able to improve upon it.
By artificial general intelligence (AGI), I mean a system that can think, plan, learn, and solve problems just like humans do, with:
At least an equal level of data efficiency (e.g. if a human can learn from one example, AGI must also be able to learn from one example and not, say, one million)
At least an equal level of reliability (e.g. if humans do a task correctly 99.999% of the time, AGI must match or exceed that)
At least an equal level of fluidity or adaptability to novel problems and situations (e.g. if a human can solve a problem with zero training examples, AGI must be able to as well)
At least an equal ability to generate novel and creative ideas
From a post a month ago:
Reliability and data efficiency are key concepts here. So are fluid intelligence, generalization, continual learning, learning efficiently from video/visual information, and hierarchical planning. In my definition of the term, these are table stakes for AGI.
It would be an interesting exercise to survey AI researchers and ask them to assign a probability for each of these problems being solved to human-level by some date, such as 2030, 2040, 2050, 2100, etc. Then, after all that, ask them to estimate the probability of all these problems being solved to human-level. This might be an inappropriately biasing framing. Or it might be a framing that actually gets the crux of the matter much better than other framings. I don’t know.
I don’t make a prediction about what the median response would be, but I suspect the distribution of responses would include a significant percentage of researchers who think the probability of solving all these problems by 2030 or 2040 is quite low. Especially if they were primed to anchor their probabilities to various 1-in-X probability events rather than only expressing them as a percentage. (Running two versions of the question, one with percentages and one with 1-in-X probabilities, could also be interesting.)
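As a toy illustration of the exercise described in this footnote (every number below is invented, and multiplying the per-problem probabilities assumes they are independent, which real judgments wouldn’t be; the point is only that comparing the per-problem answers with the all-of-them answer could be revealing):

```python
# Hypothetical per-problem probabilities of reaching human level by 2030.
# All values are invented for illustration only.
per_problem_by_2030 = {
    "data efficiency": 0.15,
    "reliability": 0.20,
    "fluidity / adaptability to novel problems": 0.15,
    "novel and creative idea generation": 0.25,
}

# Joint probability if (unrealistically) the problems were independent.
joint_if_independent = 1.0
for p in per_problem_by_2030.values():
    joint_if_independent *= p

print(round(joint_if_independent, 4))  # 0.0011, i.e. roughly 1 in 900 under these made-up numbers
```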