Oh yeah, this wasn't against you at all! I think you're a great researcher and an excellent interlocutor, and I have learned a lot (and am still learning a lot) from both your work and your reactions to my reaction.[1] Point five was very much a reaction against a "vibe" I saw in the wake of your results being published.
Like, let's take Buck's tweet as an example. We now know that a) your results aren't technically SOTA, and b) it's not an LLM solution, it's an LLM + your scaffolding + program search, and I think that's importantly not the same thing.
I sincerely hope my post + comments have been somewhat more stimulating than frustrating for you.
I think my results are probably SOTA based on more recent updates.
I feel like this is a pretty strange way to draw the line about what counts as an "LLM solution".
Consider the following simplified dialogue as an example of why I don't think this is a natural place to draw the line:
Human skeptic: Humans don't exhibit real intelligence. You see, they'll never do something as impressive as sending a human to the moon.
Humans-have-some-intelligence advocate: Didn't humans go to the moon in 1969?
Human skeptic: That wasn't humans sending someone to the moon, that was Humans + Culture + Organizations + Science sending someone to the moon! You see, humans don't exhibit real intelligence!
Humans-have-some-intelligence advocate: ... Ok, but do you agree that if we removed the Humans from the overall approach it wouldn't work?
Human skeptic: Yes, but the same is true of the culture and organizations!
Humans-have-some-intelligence advocate: Sure, I guess. I'm happy to just call it humans+etc. Do you have any predictions for specific technical feats which are possible to do with a reasonable amount of intelligence, but that you're confident can't be accomplished by building some relatively straightforward organization on top of a bunch of smart humans within the next 15 years?
Human skeptic: No.
Of course, I think actual LLM skeptics often don't answer "No" to the last question. They often do have something that they think is unlikely to occur with a relatively straightforward scaffold on top of an LLM (a model descended from the current LLM paradigm, perhaps trained with semi-supervised learning and RLHF).
I actually don't know what in particular Chollet thinks is unlikely here. E.g., I don't know if he has strong views about the performance of my method, but run with whatever the SOTA multimodal model is in 2 years' time.
Final final edit: Congrats on the ARC-AGI-PUB results, really impressive :)
This will be my final response on this thread, because life is very time-consuming and I'm rapidly reaching the point where I need to dive back into the technical literature and stress-test my beliefs and intuitions again. I hope Ryan and any readers have found this exchange useful/enlightening as an example of two different perspectives having (hopefully) productive disagreement.
If you found my presentation of the scaling-skeptical position highly unconvincing, I'd recommend following the work and thoughts of Tan Zhi Xuan (find her on X here). One of my biggest updates was finding her work after she pushed back on Jacob Steinhardt here, and recently she gave a talk about her approach to Alignment. I urge readers to consider spending much more of their time listening to her than to me about AI.
On whether this is a strange way to draw the line about what counts as an "LLM solution": I don't think so? Again, I wouldn't call CICERO an "LLM solution". Surely there'll be some amount of scaffolding that tips over into the scaffolding being the main thing and the LLM just being a component part? It's probably all blurry lines for sure, but I think it's important to separate "LLM-only systems" from "systems that include LLMs", because it's very easy to conceptually scale up the former but harder to do the latter.
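To make the distinction concrete, here's a rough sketch of the difference I have in mind, with entirely hypothetical names (`call_llm`, `passes_all_examples`); this is not Ryan's actual pipeline, just an illustration of an LLM used directly versus an LLM used as one component inside a program-search scaffold that generates, executes, and filters candidate programs:

```python
# A minimal, purely hypothetical sketch: "LLM-only" vs "LLM + scaffolding + program search".
# `call_llm` is a stub standing in for any chat/completions API; none of this is Ryan's actual code.
from typing import Callable, List, Optional, Tuple

Example = Tuple[list, list]  # (input grid, expected output grid)


def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call."""
    return "def transform(grid):\n    return grid"  # dummy candidate program


def llm_only(task: str) -> str:
    # The model's single completion is treated as the whole solution.
    return call_llm(f"Solve this task directly:\n{task}")


def passes_all_examples(program_src: str, examples: List[Example]) -> bool:
    # The scaffolding, not the LLM, executes each candidate program and checks it
    # against the demonstrated input -> output pairs.
    namespace: dict = {}
    try:
        exec(program_src, namespace)
        transform: Callable = namespace["transform"]
        return all(transform(inp) == out for inp, out in examples)
    except Exception:
        return False


def llm_plus_program_search(examples: List[Example], n: int = 100) -> Optional[str]:
    # Here the LLM is only a proposal distribution: the outer loop samples many
    # candidate programs and keeps one that actually reproduces the examples.
    for _ in range(n):
        candidate = call_llm(f"Write `transform(grid)` mapping these examples:\n{examples}")
        if passes_all_examples(candidate, examples):
            return candidate
    return None
```

Whether the second function still counts as "an LLM solution" is exactly the blurry line I'm trying to point at.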
As for the "Humans + Culture + Organizations + Science" line: I mean, you use this as a reductio, but that's basically the theory of Distributed Cognition, and it's also linked to ideas of "collective intelligence", though that's definitely not an area I'm an expert in by any means. It also reminds me a lot of Clark and Chalmers' thesis of the Extended Mind.[1]
So I can't speak for Chollet and other LLM skeptics, but I think, again, that LLMs+extras (or extras+LLMs) are a different beast from LLMs on their own, and that this is possibly an important crux. Here are some things I don't think will happen in the near-ish future (on the current paradigm):
I believe an adversarial Imitation Game, where the interrogator is aware of both the AI system's LLM-based nature and its failure modes, is unlikely to be consistently beaten in the near future.[2]
Primarily-LLM models, in my view, are highly unlikely to exhibit autopoietic behaviour or develop agentic designs independently (i.e. without prompting/direction by a human controller).
I don't anticipate these models exponentially increasing the rate of scientific research or AI development.[3] They'll more likely serve as tools used by scientists and researchers themselves to frame problems, but new and novel problems will still remain difficult and be bottlenecked by the real world + Hofstadter's law.
I don't anticipate Primarily-LLM models becoming good at controlling and manoeuvring robotic bodies in the 3D world. This is especially true in a novel-test-case scenario (if someone could make a physical equivalent of ARC to test this, that'd be great).
Beating the Imitation Game above would be even less likely if the scaffolding remained minimal, for instance if there's no initial sorting code explicitly stating [IF challenge == turing_test GO TO turing_test_game_module] (see the toy sketch after this list).
Finally, as an anti-RSI operationalisation: the idea of LLM-based models assisting in designing and constructing a Dyson Sphere within 15 years seems... particularly far-fetched to me.
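To illustrate the kind of non-minimal scaffolding the sorting-code example above is pointing at, here's a toy sketch (purely hypothetical module names) of an explicit dispatcher that routes a recognised challenge type to a purpose-built module before the model ever sees the input:

```python
# Toy illustration only: explicit task-routing scaffolding sitting in front of an LLM.
# All module names are hypothetical.

def turing_test_game_module(message: str) -> str:
    # Purpose-built behaviour: persona management, typing delays, canned evasions, etc.
    return f"[persona-managed reply to: {message}]"


def generic_llm_module(message: str) -> str:
    # Fallback: hand the input straight to the model.
    return f"[raw LLM completion for: {message}]"


def dispatch(challenge: str, message: str) -> str:
    # The "initial sorting code": IF challenge == turing_test GO TO turing_test_game_module
    if challenge == "turing_test":
        return turing_test_game_module(message)
    return generic_llm_module(message)


print(dispatch("turing_test", "Are you a computer?"))
```

My predictions above are about systems where this kind of explicit, hand-written routing is absent or minimal.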
I'm not sure if this reply was my best; it felt a little all-over-the-place, but we are touching on some deep and complex topics! So I'll respectfully bow out now, and thank you again for the discussion and for giving me so much to think about. I really appreciate it Ryan :)
[1] Then you get into ideas like embodiment/enactivism, etc.
[2] I can think of a bunch of strategies to win here, but I'm not gonna say, so it doesn't end up in GPT-5 or 6's training data!
[3] Of course, with a new breakthrough all bets could be off, but it's also definitionally impossible to predict those, and it's not robust to draw straight lines on graphs to predict the future if you think breakthroughs will be needed. (Not saying you do this, but some other AIXR people definitely seem to.)