Morally, I am impressed that you are doing something that is in many ways socially awkward and uncomfortable because you think it is right.
BUT
I strongly object to you citing the Metaculus AGI question as significant evidence of AGI by 2030. I do not think that when people forecast that question, they are necessarily forecasting when AGI, as commonly understood or in the sense that’s directly relevant to X-risk, will arrive. Yes, the title of the question mentions AGI. But if you look at the resolution criteria, all an AI model has to do in order to resolve the question ‘yes’ is pass a couple of benchmarks involving coding and general knowledge, put together a complicated model car, and imitate a human in an adversarial Turing test. None of that constitutes being AGI in the sense of “can replace any human knowledge worker in any job”.

For one thing, it doesn’t involve any task that is carried out over a time span of days or weeks, but we know that memory and coherence over long time scales are things current models seem to be relatively bad at, compared to passing exam-style benchmarks. It also doesn’t include any component that tests the ability of models to learn new tasks at human-like speed, which, again, seems to be an issue with current models.

Now, maybe despite all this, it’s actually the case that any model that can pass the benchmark will in fact be AGI in the sense of “can permanently replace almost any human knowledge worker”, or at least will obviously be only 1-2 years of normal research progress away from that. But that is a highly substantive assumption in my view.
I know this is only one piece of evidence you cite, and maybe it isn’t actually a significant driver of your timelines, but I still think it should have been left out.
Thanks David. I agree that the Metaculus question is a mediocre proxy for AGI, for the reasons you say. We included it primarily because it shows the magnitude of the AI timelines update that we and others have made over the past few years.
In case it’s helpful context, here are two footnotes that I included in the strategy document that this post is based on, but that we cut for brevity in this EA Forum version:
We define AGI using the Morris et al./DeepMind (2024) definition (see Table 1) of “competent AGI” for the purposes of this document: an AI system that performs as well as at least 50% of skilled adults at a wide range of non-physical tasks, including metacognitive tasks like learning new skills.
This DeepMind definition of AGI is the one that we primarily use internally. I think that we may get strategically significant AI capabilities before this, though, for example via automated AI R&D.
On the Metaculus definition, I included this footnote:
The headline Metaculus forecast on AGI doesn’t fully line up with the Morris et al. (2024) definition of AGI that we use in footnote 2. For example, the Metaculus definition includes robotic capabilities, and doesn’t include being able to successfully carry out long-term planning and execution loops. But nonetheless I think this is the closest proxy for an AGI timeline that I’ve found on a public prediction market.
Thanks, that is reassuring.
Curious if you have better suggestions for forecasts to use, especially for communicating to a wider audience that’s new to AI safety.
I haven’t read it, but Zershaaneh Qureshi at Convergence Analysis wrote a recent report on pathways to short timelines.
I don’t know of anything better right now.