People who believe the Singularity will happen before the Brisbane Olympics are way too mean to Yann LeCun. In this post, it feels to me like you’re quoting him to do a “gotcha” and not engaging with the real substance of his argument.
Do frontier AI models have a good understanding of things like causality, time, and the physics of everyday objects? Not really: o3-mini recently told me that a natural disaster caused an economic downturn that had happened a month before the disaster. No human would make that mistake.
Yann is doing frontier AI research and has sophisticated views on the limitations of current LLMs; he even has a research program aimed at eventually overcoming those limitations with new AI systems. I think his general point, that LLMs do not understand many things virtually all adult humans understand, is still correct. Understanding, to me, does not mean that an LLM can answer correctly once; it means the model gives correct answers reliably and does not routinely make ridiculous mistakes like the one I just described.
I like the ARC-AGI-2 benchmark because it quantifies what frontier AI models lack. Ordinary humans off the street score an average of 60%, and every task has been solved by at least two such humans in two attempts or fewer. GPT-4.5 scores 0.0%, o3-mini scores 0.0%, and every model tested so far scores under 5%. It's designed to be challenging yet achievable for near-term AI models.
Benchmarks that reward memorizing large quantities of text have their place, but those benchmarks do not measure general intelligence.