Honestly, not sure I would agree with this. Like Chollet said, this is fundamentally different from simply scaling the number of parameters (derived from pre-training) that a lot of previous scaling discourse centered around. To then take this inference-time scaling stuff, which requires a qualitatively different CoT/search-tree strategy to be appended to an LLM alongside an evaluator model, and call it scaling is a bit of a rhetorical sleight of hand.
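To make that distinction concrete, here's a rough sketch (toy Python, with made-up function names) of the generic "sample many chains of thought and re-rank them with a learned evaluator" pattern that inference-time scaling refers to. This is only an illustration of the idea, not a claim about o3's actual, unpublished internals:

```python
import random
from typing import Callable, List, Tuple

def best_of_n_with_verifier(
    generate_cot: Callable[[str], str],   # samples one chain of thought + answer
    score: Callable[[str, str], float],   # evaluator model: rates a candidate
    task: str,
    n_samples: int = 16,
) -> Tuple[str, float]:
    """Sample many reasoning traces, keep the one the evaluator rates highest.

    This is the generic inference-time search-and-re-rank loop, not a
    description of how o3 actually works (which isn't public).
    """
    candidates: List[Tuple[str, float]] = []
    for _ in range(n_samples):
        cot = generate_cot(task)
        candidates.append((cot, score(task, cot)))
    return max(candidates, key=lambda pair: pair[1])

# Toy stand-ins so the sketch runs; a real system would call an LLM to generate
# and a trained evaluator/reward model to score.
if __name__ == "__main__":
    toy_generate = lambda task: f"guess {random.randint(0, 9)}"
    toy_score = lambda task, cot: random.random()
    best, value = best_of_n_with_verifier(toy_generate, toy_score, "example task")
    print(best, round(value, 3))
```

The point is that the extra performance comes from this outer generate-and-evaluate loop, not from a bigger pre-trained model.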
While this is no doubt a big deal and a concrete step toward AGI, there are enough architectural issues around planning, multi-step tasks/projects, and actual permanent memory (not just RAG) that I'm not updating as much as most people are on this. I would also like to see if this approach works on tasks without clear, verifiable feedback mechanisms (unlike software engineering/programming or math). My timelines remain in the 2030s.
It might be fair to say that the o3 improvements are something fundamentally different to simple scaling, and that Chollet is still correct in his "LLMs will not simply scale to AGI" prediction. I didn't mean in my comment to suggest he was wrong about that.
I could imagine someone criticizing him for exaggerating how far away we were from coming up with the necessary new ideas, given the o3 results, but I'm not so interested in the debate about exactly how right or wrong the predictions of this one person were.
The interesting thing for me is this: whether he was wrong, or whether he was right and o3 really does represent a fundamentally different kind of model, the upshot for how seriously we should take o3 seems the same! It feels like a pretty big deal!
He could have reacted to this news by criticizing the way that o3 achieved its results. He already said in the Dwarkesh Patel interview that someone beating ARC wouldn't necessarily imply progress towards general intelligence if the way they achieved it went against the spirit of the task. When I clicked the link in this post, I thought it likely I was about to read an argument along those lines. But that's not what I got. Instead he was acknowledging that this was important progress.
I'm by no means an expert, but timelines in the 2030s still seem pretty close to me! I'd have thought, based on arguments from people like Chollet, that we might be a bit further off than that (although only with the low confidence of a layperson trying to interpret the competing predictions of experts who seem to radically disagree with each other).
Given all the problems you mention, and the high costs still involved in running this on simple tasks, I agree it still seems many years away. But previously I'd have put a fairly significant probability on AGI not being possible this century (as well as assigning a significant probability to it happening very soon, basically ending up highly uncertain). But it feels like these results make the idea that AGI is still 100 years away seem much less plausible than it was before.
A comment from FranƧois Chollet on this topic, posted on Bluesky on January 6, 2025:
I don't think people really appreciate how simple ARC-AGI-1 was, and what solving it really means.
It was designed as the simplest, most basic assessment of fluid intelligence possible. Failure to pass signifies a near-total inability to adapt or problem-solve in unfamiliar situations.
Passing it means your system exhibits non-zero fluid intelligence – you're finally looking at something that isn't pure memorized skill. But it says rather little about how intelligent your system is, or how close to human intelligence it is.
Sure, I think I've seen that comment before, and I'm aware Chollet also included loads of caveats in his initial write-up of the o3 results.
But going from zero fluid intelligence to non-zero fluid intelligence seems like it should be considered a very significant milestone! Even if the amount of fluid intelligence is still small.
Previously there was a question around whether the new wave of AI models was capable of any fluid intelligence at all. Now, even someone like Chollet has concluded they are, so it just becomes a question of how easily those capabilities can scale.
That's the way I'm currently thinking about it anyway. Very open to the possibility that the nearness of AGI is still being overhyped.
I agree that it's a significant milestone, or at least it might be. I just read this comment a few hours ago (and the Twitter thread it links to), and that dampens my enthusiasm. 43 million words to solve one ARC-AGI-1 puzzle is a lot.
Also, I want to understand more about how ARC-AGI-2 is different from ARC-AGI-1. Chollet has said that about half of the tasks in ARC-AGI-1 turned out to be susceptible to "brute force"-type approaches. I don't know what that means.
I think it's easy to get carried away with the implications of a result like this when you're surrounded by so many voices saying that AGI is coming within 5 years or within 10 years.
My response to FranƧois Chollet's comments on o3's high score on ARC-AGI-1 was more of an "Oh, that's really interesting!" than some big change to my views on AGI. I have to say, I was more excited about it before I knew it took 43 million words of text and over 1,000 attempts per task.
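Just to put those numbers in perspective, here's the back-of-envelope arithmetic (the tokens-per-word ratio and the exact attempt count are my own assumptions, purely for illustration):

```python
# Back-of-envelope only: the 43 million words and "over 1,000 attempts" figures
# come from the linked discussion; the other numbers are illustrative assumptions.
words_per_task = 43_000_000
attempts_per_task = 1_024      # assumed; the source only says "over 1,000"
tokens_per_word = 1.3          # rough rule of thumb for English text

tokens_per_task = words_per_task * tokens_per_word
tokens_per_attempt = tokens_per_task / attempts_per_task

print(f"~{tokens_per_task / 1e6:.0f} million tokens generated per task")  # ~56M
print(f"~{tokens_per_attempt / 1e3:.0f} thousand tokens per attempt")     # ~55K
```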
I still think no one knows how to build AGI and that (not unrelatedly) we don't know when AGI will be built.
Chollet recently started a new company focused on combining deep learning and program synthesis. That's interesting. He seems to think the major AI labs like OpenAI and Google DeepMind are also working on program synthesis, but I don't know how much publicly available evidence there is for this.
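In case it's useful, here's a heavily simplified sketch of what "combining deep learning and program synthesis" can mean in practice: a learned model proposes candidate programs, and an exact check against a task's demonstration pairs keeps only the ones that actually work. The names and the toy task are made up; this illustrates the general idea, not Chollet's actual design:

```python
from typing import Callable, List, Tuple

Grid = List[List[int]]
Program = Callable[[Grid], Grid]

def synthesize(
    propose_programs: Callable[[List[Tuple[Grid, Grid]]], List[Program]],
    demos: List[Tuple[Grid, Grid]],
) -> List[Program]:
    """Deep-learning-guided program synthesis, in caricature.

    A learned model proposes candidate programs (the deep learning half);
    an exact check against the demonstration pairs keeps only the candidates
    that reproduce them (the program synthesis / search half).
    """
    candidates = propose_programs(demos)
    return [p for p in candidates if all(p(x) == y for x, y in demos)]

# Toy example: the hidden rule is "transpose the grid". A real system would
# sample candidate programs from a trained model over a DSL, not a fixed list.
if __name__ == "__main__":
    transpose: Program = lambda g: [list(row) for row in zip(*g)]
    identity: Program = lambda g: g
    demos = [([[1, 2], [3, 4]], [[1, 3], [2, 4]])]
    survivors = synthesize(lambda _: [identity, transpose], demos)
    print(len(survivors), "candidate(s) consistent with the demos")
```

The appeal of this combination for ARC-style tasks is that the search side can verify candidates exactly against the demonstrations, while the learned model keeps the space of candidates manageable.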
I can add Chollet's company to the list of organizations I know of that have publicly said they're doing R&D related to AGI beyond just scaling LLMs. The others I know of:
The Alberta Machine Intelligence Institute and Keen Technologies, both organizations where Richard Sutton is a key person and which (if I understand correctly) are pursuing at least to some extent Sutton's "Alberta Plan for AI Research"
Numenta, a company co-founded by Jeff Hawkins, who has made aggressive statements about Numenta's ability to develop AGI in the not-too-distant future using insights from neuroscience (the main insights they think they've found are described here)
Yann LeCun's team at Meta AI, formerly FAIR; LeCun has published a roadmap to AGI, except he doesn't call it AGI
I might be forgetting one or two. I know in the past Demis Hassabis has made some general comments about DeepMind's research related to AGI, but I don't know of any specifics.
My gut sense is that all of these approaches will fail: program synthesis combined with deep learning, the Alberta Plan, Numenta's Thousand Brains Principles, and Yann LeCun's roadmap. But this is just a random gut intuition and not a serious, considered opinion.
I think the idea that we're barreling toward the imminent, inevitable invention of AGI is wrong. That idea amounts to saying that AGI is so easy to invent, and progress is happening so fast and so spontaneously, that we can hardly stop ourselves from inventing it.
It would be seen as odd to take this view in any other area of technology, probably even among effective altruists. We would be lucky if we were barreling toward imminent, inevitable nuclear fusion or a universal coronavirus vaccine or a cure for cancer or any number of technologies that don't exist yet that we'd love to have.
Why does no one claim these technologies are being developed so spontaneously, so automatically, that we would have to take serious action to prevent them from being invented soon? Why is the attitude that progress is hard, success is uncertain, and the road is long?
Given that's how technology usually works, and given that I don't see any reason for AGI to be easier or take less time (in fact, it seems like it should be harder and take longer, since the science of intelligence and cognition is among the least understood areas of science), I'm inclined to guess that most approaches will fail.
Even if the right general approach is found, it could take a very long time to figure out how to actually make concrete progress using that approach. (By analogy, many of the general ideas behind deep learning existed for decades before deep learning started to take off around 2012.)
I'm interested in Chollet's interpretation of the o3 results on ARC-AGI-1, and if there is a genuine, fundamental advancement involved (which today, after finding out those details about o3's attempts, I believe less than I did yesterday), then that's exciting. But only moderately exciting, because the advancement is only incremental.
The story that AGI is imminent and that, if we skirt disaster, we'll land in utopia is exciting and engaging. I think we live in a more boring version of reality (but still, all things considered, a pretty interesting one!) where we're still at the drawing-board stage for AGI, people are pitching different ideas (e.g., program synthesis, the Alberta Plan, the Thousand Brains Principles, energy-based self-supervised learning), the way forward is unclear, and we're mostly in the dark about the fundamental nature of intelligence and cognition. Who knows how long it will take us to figure it out.
A comment from FranƧois Chollet on this topic, posted on Bluesky on January 6, 2025:
o3 gets 3% on ARC-AGI-2.
Interesting, thanks!