Oh yeah this wasn’t against you at all! I think you’re a great researcher, and an excellent interlocutor, and I learn a lot (and am learning a lot) from both your work and your reactions to my reaction.[1] Point five was very much a reaction against a ‘vibe’ I saw in the wake of your results being published.
Like let’s take Buck’s tweet for example. We know now that a) your results aren’t technically SOTA and b) it’s not an LLM solution, it’s an LLM + your scaffolding + program search, and I think that’s importantly not the same thing.
It’s not an LLM solution, it’s an LLM + your scaffolding + program search, and I think that’s importantly not the same thing.
I feel like this is a pretty strange way to draw the line about what counts as an “LLM solution”.
Consider the following simplified dialogue as an example of why I don’t think this is a natural place to draw the line:
Human skeptic: Humans don’t exhibit real intelligence. You see, they’ll never do something as impressive as sending a human to the moon.
Humans-have-some-intelligence advocate: Didn’t humans go to the moon in 1969?
Human skeptic: That wasn’t humans sending someone to the moon that was Humans + Culture + Organizations + Science sending someone to the moon! You see, humans don’t exhibit real intelligence!
Humans-have-some-intelligence advocate: … Ok, but do you agree that if we removed the Humans from the overall approach, it wouldn’t work?
Human skeptic: Yes, but same with the culture and organization!
Humans-have-some-intelligence advocate: Sure. I’m happy to just call it humans+etc, I guess. Do you have any predictions for specific technical feats which are possible to do with a reasonable amount of intelligence that you’re confident can’t be accomplished by building some relatively straightforward organization on top of a bunch of smart humans within the next 15 years?
Human skeptic: No.
Of course, I think actual LLM skeptics often don’t answer “No” to the last question. They often do have something that they think is unlikely to occur with a relatively straightforward scaffold on top of an LLM (a model descended from the current LLM paradigm, perhaps trained with semi-supervised learning and RLHF).
I actually don’t know what in particular Chollet thinks is unlikely here. E.g., I don’t know if he has strong views about how my method would perform if run with the SOTA multimodal model two years from now.
Final final edit: Congrats on the ARC-AGI-PUB results, really impressive :)
This will be my final response on this thread, because life is very time-consuming and I’m rapidly reaching the point where I need to dive back into the technical literature and stress-test my beliefs and intuitions again. I hope Ryan and any readers have found this exchange useful or enlightening as an example of two different perspectives (hopefully) having a productive disagreement.
If you found my presentation of the scaling-skeptical position highly unconvincing, I’d recommend following the work and thoughts of Tan Zhi Xuan (find her on X here). One of my biggest updates was finding her work after she pushed back on Jacob Steinhardt here, and recently she gave a talk about her approach to Alignment. I urge readers to consider spending much more of their time listening to her than to me about AI.
I feel like this is a pretty strange way to draw the line about what counts as an “LLM solution”.
I don’t think so? Again, I wouldn’t call CICERO an “LLM solution”. Surely there’s some amount of scaffolding at which the scaffolding becomes the main thing and the LLM just a component part? The lines are blurry, for sure, but I think it’s important to separate ‘LLM-only systems’ from ‘systems that include LLMs’, because it’s very easy to conceptually scale up the former but harder to do so for the latter.
Human skeptic: That wasn’t humans sending someone to the moon that was Humans + Culture + Organizations + Science sending someone to the moon! You see, humans don’t exhibit real intelligence!
I mean, you use this as a reductio, but that’s basically the theory of Distributed Cognition, and it’s also linked to the ideas of ‘collective intelligence’, though that’s definitely not an area I’m an expert in by any means. It also reminds me a lot of Clark and Chalmers’ thesis of the Extended Mind.[1]
Of course, I think actual LLM skeptics often don’t answer “No” to the last question. They often do have something that they think is unlikely to occur with a relatively straightforward scaffold on top of an LLM (a model descended from the current LLM paradigm, perhaps trained with semi-supervised learning and RLHF).
So I can’t speak for Chollet or other LLM skeptics, but again, I think LLMs+extras (or extras+LLMs) are a different beast from LLMs on their own, and that’s possibly an important crux. Here are some things I don’t think will happen in the near-ish future (on the current paradigm):
I believe an adversarial Imitation Game, where the interrogator is aware of both the AI system’s LLM-based nature and its failure modes, is unlikely to be consistently beaten in the near future.[2]
Primarily-LLM models, in my view, are highly unlikely to exhibit autopoietic behaviour or develop agentic designs independently (i.e. without prompting/direction by a human controller).
I don’t anticipate these models exponentially increasing the rate of scientific research or AI development.[3] They’ll more likely serve as tools used by scientists and researchers themselves to frame problems, while new and novel problems will still remain difficult, bottlenecked by the real world + Hofstadter’s law.
I don’t anticipate primarily-LLM models becoming good at controlling and manoeuvring robotic bodies in the 3D world. This is especially true in a novel-test-case scenario (if someone could make a physical equivalent of ARC to test this, that’d be great).
This would be even less likely if the scaffolding remained minimal, for instance if there’s no initial sorting code explicitly stating [IF challenge == turing_test GO TO turing_test_game_module].
Finally, as an anti-RSI operationalisation: the idea of LLM-based models assisting in designing and constructing a Dyson Sphere within 15 years seems… particularly far-fetched to me.
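To make the “initial sorting code” idea above concrete, here’s a minimal hypothetical sketch (every name is invented for illustration, not taken from any real system) of scaffolding that hard-codes task routing around an LLM:

```python
# Hypothetical sketch of an "initial sorting code" scaffold.
# All names here are invented for illustration; the point is that
# the task-routing logic lives in hand-written code outside the LLM.

def turing_test_game_module() -> str:
    # A hand-tuned persona/conversation strategy would live here.
    return "turing-test module"

def arc_program_search_module() -> str:
    # LLM-generated candidate programs plus search would live here.
    return "program-search module"

def generic_llm_module() -> str:
    # Fallback: a plain LLM completion with no task-specific help.
    return "generic LLM module"

def route_challenge(challenge: str) -> str:
    """Dispatch each challenge to a task-specific, hand-written module."""
    if challenge == "turing_test":
        return turing_test_game_module()
    if challenge == "arc_puzzle":
        return arc_program_search_module()
    return generic_llm_module()
```

The worry is that the more task-specific competence lives in dispatch code like this, the less a benchmark result tells us about the LLM component on its own.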
I’m not sure if this reply was my best; it felt a little all-over-the-place, but we are touching on some deep and complex topics! So I’ll respectfully bow out now, and thanks again for the discussion and for giving me so much to think about. I really appreciate it, Ryan :)
Of course, with a new breakthrough all bets could be off, but it’s also definitionally impossible to predict those, and it’s not robust to draw straight lines on graphs to predict the future if you think breakthroughs will be needed. (Not saying you do this, but some other AIXR people definitely seem to.)
I don’t think the objection is to ARC (the benchmark); I think the objection is to specific (very strong!) claims that Chollet makes.
I think the benchmark is a useful contribution as I note in another comment.
I sincerely hope my post + comments have been somewhat more stimulating than frustrating for you.
I think my results are probably SOTA based on more recent updates.
Then you get into ideas like embodiment/enactivism, etc.
I can think of a bunch of strategies to win here, but I’m not gonna say, so it doesn’t end up in GPT-5 or 6’s training data!