The ARC performance is a huge update for me.
I've previously found Francois Chollet's arguments that LLMs are unlikely to scale to AGI pretty convincing, mainly because he had created an until-now unbeaten benchmark to back those arguments up.
But reading his linked write-up, he describes this as "not merely an incremental improvement, but a genuine breakthrough". He does not admit he was wrong, but instead paints o3 as something fundamentally different to previous LLM-based AIs, which, for the purpose of assessing the significance of o3, amounts to the same thing!
It might be fair to say that the o3 improvements are something fundamentally different to simple scaling, and that Chollet is still correct in his "LLMs will not simply scale to AGI" prediction. I didn't mean in my comment to suggest he was wrong about that.
I could imagine someone criticizing him for exaggerating how far away we were from coming up with the necessary new ideas, given the o3 results, but I'm not so interested in the debate about exactly how right or wrong the predictions of this one person were.
The interesting thing for me is: whether he was wrong, or whether he was right but o3 does represent a fundamentally different kind of model, the upshot for how seriously we should take o3 seems the same! It feels like a pretty big deal!
He could have reacted to this news by criticizing the way that o3 achieved its results. He already said in the Dwarkesh Patel interview that someone beating ARC wouldn't necessarily imply progress towards general intelligence if the way they achieved it went against the spirit of the task. When I clicked the link in this post, I thought it likely I was about to read an argument along those lines. But that's not what I got. Instead he was acknowledging that this was important progress.
I'm by no means an expert, but timelines in the 2030s still seem pretty close to me! I'd have thought, based on arguments from people like Chollet, that we might be a bit further off than that (although only with the low confidence of a layperson trying to interpret the competing predictions of experts who seem to radically disagree with each other).
Given all the problems you mention, and the high costs still involved in running this on simple tasks, I agree it still seems many years away. But previously I'd have put a fairly significant probability on AGI not being possible this century (as well as assigning a significant probability to it happening very soon, basically ending up highly uncertain). Now it feels like these results make the idea that AGI is still 100 years away seem much less plausible than it was before.