No reason to assume an individual Metaculus commentator agrees with the Metaculus timeline, so I don’t think that’s very fair.
I actually think the two Metaculus questions are just bad questions. The detailed resolution criteria don’t necessarily match what we intuitively think of as AGI or transformative AI, or obviously capture anything that important, and it is just unclear whether people are forecasting on the actual resolution criteria or on their own idea of what “AGI” is.
All the tasks in both AGI questions are quite short, so it’s easy to imagine an AI beating all of them and yet not being able to replace most human knowledge workers, because it can’t handle long-running tasks. It’s also just not clear how performance on benchmark questions and the Turing test translates to competence with even short-term tasks in the real world. So even if you think AGI in the sense of “AI that can automate all knowledge work” (let alone all work) is far away, it might make sense to think we are only a few years from a system that can resolve these questions ‘yes’.
On the other hand, resolving the questions ‘yes’ could conceivably lag the invention of some very powerful and significant systems, perhaps including some that would count as AGI under a reasonable definition.
As someone points out in the comments of one of the questions, right now any mainstream LLM will fail the Turing test, however smart, because if you ask “how do I make chemical weapons” it’ll read you a stiff lecture about why it can’t do that as it would violate its principles. In theory, that could remain true even if we reach AGI. (The questions only resolve ‘yes’ if a system that can pass the Turing test is actually constructed; it’s not enough for this to be easy to do if OpenAI or whoever wants to.) And the stronger of the two questions requires that a system can do a complex manual task. Fair enough, some reasonable definitions of “AGI” do require machines that can match humans at every manual-dexterity-based cognitive task. But a system that could automate all knowledge work, yet not handle piloting a robot body, would still be quite transformative.
Which particular resolution criteria do you think it’s unreasonable to believe will be met by 2027/2032 (depending on whether it’s the weak AGI question or the strong one)?
Two of the four in particular stand out. First, the Turing test one, exactly for the reason you mention: asking the model to violate the terms of service is surely an easy way to win. That is the resolution criterion, so unless Metaculus users think that will be solved within three years,[1] the estimates should be higher. Second, the SAT-passing criterion requires “having less than ten SAT exams as part of the training data”, which is very unlikely to hold for current frontier models, and labs probably aren’t keen to share exactly what they have trained on.
it is just unclear whether people are forecasting on the actual resolution criteria or on their own idea of what “AGI” is.
No reason to assume an individual Metaculus commentator agrees with the Metaculus timeline, so I don’t think that’s very fair.
I don’t know if it is unfair. This is Metaculus! Premier forecasting website! These people should be reading the resolution criteria and judging their predictions against them. Just going off personal vibes about how much they ‘feel the AGI’ feels like a sign of epistemic rot to me. I know not every Metaculus user agrees with this, but the community prediction is shaped by the aggregate: 2027/2032 are very short timelines, and those are median community predictions. This is my main issue with the Metaculus timelines at the moment.
I actually think the two Metaculus questions are just bad questions.
I mean, I do agree with you in the sense that they don’t fully match AGI, but that’s partly because ‘AGI’ covers a bunch of different ideas and concepts. It might well be possible for a system to satisfy these conditions but not replace knowledge workers. Perhaps a new market focused on automation and employment would be better, but that also has its issues with operationalisation.
[1] On top of everything else needed to successfully pass the imitation game.
What I meant to say was unfair was basing “even Metaculus users think Aschenbrenner’s stuff is bad, and they have short timelines” on the reaction to Aschenbrenner of only one or two people.