No reason to assume an individual Metaculus commentator agrees with the Metaculus timeline, so I don't think that's very fair.
I actually think the two Metaculus questions are just bad questions. The detailed resolution criteria don't necessarily match what we intuitively think of as AGI or transformative AI, or obviously capture anything that important, and it is just unclear whether people are forecasting on the actual resolution criteria or on their own idea of what "AGI" is.
All the tasks in both AGI questions are quite short, so it's easy to imagine an AI beating all of them and yet not being able to replace most human knowledge workers, because it can't handle long-running tasks. It's also just not clear how performance on benchmark questions and the Turing test translates to competence with even short-term tasks in the real world. So even if you think AGI in the sense of "AI that can automate all knowledge work" (let alone all work) is far away, it might make sense to think we are only a few years from a system that can resolve these questions "yes".
On the other hand, resolving the questions "yes" could conceivably lag the invention of some very powerful and significant systems, perhaps including some that a reasonable definition would count as AGI.
As someone points out in the comments of one of the questions: right now, any mainstream LLM will fail the Turing test, however smart, because if you ask "how do I make chemical weapons" it'll read you a stiff lecture about why it can't do that, as it would violate its principles. In theory, that could remain true even if we reach AGI. (The questions only resolve "yes" if a system that can pass the Turing test is actually constructed; it's not enough for this to be easy to do if OpenAI or whoever wants to.) And the stronger of the two questions requires that a system can do a complex manual task. Fair enough, some reasonable definitions of "AGI" do require machines that can match humans at every manual dexterity-based cognitive task. But a system that could automate all knowledge work, but not handle piloting a robot body, would still be quite transformative.
Which particular resolution criteria do you think it's unreasonable to believe will be met by 2027/2032 (depending on whether it's the weak AGI question or the strong one)?
Two of the four in particular stand out. First, the Turing test one, exactly for the reason you mention: asking the model to violate the terms of service is surely an easy way to win. That's the resolution criterion, so unless the Metaculus users think that'll be solved in 3 years,[1] the estimates should be higher. Second, the SAT-passing criterion requires "having less than ten SAT exams as part of the training data", which is very unlikely for current frontier models, and labs probably aren't keen to share exactly what they have trained on.

[1] On top of everything else needed to successfully pass the imitation game.
it is just unclear whether people are forecasting on the actual resolution criteria or on their own idea of what "AGI" is.
No reason to assume an individual Metaculus commentator agrees with the Metaculus timeline, so I don't think that's very fair.
I don't know if it is unfair. This is Metaculus! Premier forecasting website! These people should be reading the resolution criteria and judging their predictions according to them. Just going off personal vibes on how much they "feel the AGI" feels like a sign of epistemic rot to me. I know not every Metaculus user agrees with this, but it is shaped by the aggregate: 2027/2032 are very short timelines, and those are median community predictions. This is my main issue with the Metaculus timelines atm.
I actually think the two Metaculus questions are just bad questions.
I mean, I do agree with you in the sense that they don't fully match AGI, but that's partly because "AGI" covers a bunch of different ideas and concepts. It might well be possible for a system to satisfy these conditions but not replace knowledge workers; perhaps a new market focusing on automation and employment might be better, but that also has its issues with operationalisation.
What I meant to say was unfair was basing "even Metaculus users think Aschenbrenner's stuff is bad, and they have short timelines" off the reaction to Aschenbrenner of only one or two people.