Morally, I am impressed that you are doing something that is in many ways socially awkward and uncomfortable because you think it is right.
BUT
I strongly object to you citing the Metaculus AGI question as significant evidence of AGI by 2030. I do not think that when people forecast that question, they are necessarily forecasting when AGI, as commonly understood or in the sense that's directly relevant to X-risk, will arrive. Yes, the title of the question mentions AGI. But if you look at the resolution criteria, all an AI model has to do in order to resolve the question "yes" is pass a couple of benchmarks involving coding and general knowledge, put together a complicated model car, and imitate a human in conversation. None of that constitutes being AGI in the sense of "can replace any human knowledge worker in any job". For one thing, it doesn't involve any task that is carried out over a time span of days or weeks, but we know that memory and coherence over long time scales is something current models seem to be relatively bad at, compared to passing exam-style benchmarks. It also doesn't include any component that tests the ability of models to learn new tasks at human-like speed, which, again, seems to be an issue with current models. Now, maybe despite all this, it's actually the case that any model that can pass the benchmark will in fact be AGI in the sense of "can permanently replace almost any human knowledge worker", or at least will obviously be only 1-2 years of normal research progress away from that. But that is a highly substantive assumption in my view.
I know this is only one piece of evidence you cite, and maybe it isn't actually a significant driver of your timelines, but I still think it should have been left out.
Thanks, David. I agree that the Metaculus question is a mediocre proxy for AGI, for the reasons you say. We included it primarily because it shows the magnitude of the AI timelines update that we and others have made over the past few years.
In case it's helpful context, here are two footnotes that I included in the strategy document that this post is based on, but that we cut for brevity in this EA Forum version:
We define AGI using the Morris, et al./DeepMind (2024) definition (see table 1) of "competent AGI" for the purposes of this document: an AI system that performs as well as at least 50% of skilled adults at a wide range of non-physical tasks, including metacognitive tasks like learning new skills.
This DeepMind definition of AGI is the one that we primarily use internally. I think that we may get strategically significant AI capabilities before this though, for example via automated AI R&D.
On the Metaculus definition, I included this footnote:
The headline Metaculus forecast on AGI doesn't fully line up with the Morris, et al. (2024) definition of AGI that we use in footnote 2. For example, the Metaculus definition includes robotic capabilities, and doesn't include being able to successfully do long-term planning and execution loops. But nonetheless I think this is the closest proxy for an AGI timeline that I've found on a public prediction market.
Thanks, that is reassuring.
Curious if you have better suggestions for forecasts to use, especially for communicating to a wider audience that's new to AI safety.
I havenât read it, but Zershaaneh Qureshi at Convergence Analysis wrote a recent report on pathways to short timelines.
I donât know of anything better right now.