Thanks for sharing, toby. I had just finished listening to the podcast and was about to share it here, but it turns out you beat me to it! I think I'll do a post going into the interview (Zvi-style)[1] and bringing up the most interesting points and cruxes, and why the ARC Challenge matters. To quickly give my thoughts on some of the things you bring up:
The ARC Challenge is the best benchmark out there imo, and it's telling that labs don't release their scores on it. Chollet says in the interview that they do test on it, but because they score badly, they don't release the results.
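For anyone who hasn't looked at the tasks themselves, here is a rough toy sketch of what an ARC task involves: a handful of input/output grid pairs from which you have to induce the transformation rule and then apply it to a held-out test input. The specific grids and the "rotate" rule below are invented for illustration, but the train/test structure mirrors the JSON format of the public tasks in Chollet's ARC repo.

```python
# Toy ARC-style task: infer the rule from two demonstration pairs,
# then apply it to the test input. (Illustrative example, not a real task.)
task = {
    "train": [
        {"input": [[1, 0], [0, 0]], "output": [[0, 0], [0, 1]]},
        {"input": [[0, 2], [0, 0]], "output": [[0, 0], [2, 0]]},
    ],
    "test": [{"input": [[0, 0], [3, 0]]}],
}

def rotate_180(grid):
    """One candidate transformation: rotate the grid by 180 degrees."""
    return [row[::-1] for row in grid[::-1]]

# A naive solver searches a space of candidate programs, keeps the ones that
# explain every training pair, and applies them to the test input.
candidates = {"identity": lambda g: g, "rotate_180": rotate_180}
for name, fn in candidates.items():
    if all(fn(pair["input"]) == pair["output"] for pair in task["train"]):
        print(name, "->", fn(task["test"][0]["input"]))  # rotate_180 -> [[0, 3], [0, 0]]
```

The hard part is that the space of possible rules in the real tasks is open-ended, so memorised patterns don't get you far; you have to work out a new rule for each task from just a few examples.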
On timelines, Chollet says that OpenAI's success led the field to 1) stop sharing frontier research and 2) focus on LLMs alone, thereby setting back timelines to AGI. I'd also suggest that the "AGI in 2-3 years" claims don't make much sense to me unless you take an LLMs-plus-scaling maximalist perspective.
And to respond to some other comments here:
To huw, I think the AI Safety field is mixed. The original perspective was that ASI would be something like an AIXI model, but the success of transformers has changed that. Existing models and their descendants could be economically damaging, but taking away the existential risk undermines the astronomical value of AI Safety from an EA perspective.
To OCB, I think we just disagree about how far away LLMs are from this. I think less that ARC is "neat" and more that it shows a critical failure mode in the LLM paradigm. In the interview Chollet argues that the "scaffolding" is actually the hard part of reasoning, and I agree with him.
To Mo, I guess Chollet's perspective would be that you need "open-endedness" to be able to automate many/most work? A big crux I think here is whether "PASTA" is possible at all, or at least whether it can be used as a way to bootstrap everything else. I'm more of the perspective that science is probably the last thing that can possibly be automated, but that might depend on your definition of science.
I'm quite sceptical of Davidson's work, and probably Karnofsky's, but I'll need to revisit them in detail to treat them fairly.
The Metaculus AGI markets are, to me, crazy low. In both cases the resolution criteria are somewhat LLM-unfriendly; it seems that people are going more off "vibes" than reading the fine print. Right now, for instance, any OpenAI model would be easily discovered in a proper imitation game by asking it to do something that violates the terms of service.
I'll go into more depth in my follow-up post, and I'll edit this bit of my comment with a link once I'm done.
[1] In style only, I make no claims as to quality.
A big crux I think here is whether "PASTA" is possible at all, or at least whether it can be used as a way to bootstrap everything else.
Do you mean "possible at all using LLM technology" or do you mean "possible at all using any possible AI algorithm that will ever be invented"?
As for the latter, I think (or at least, I hope!) that there's wide consensus that whatever human brains do (individually and collectively), it is possible in principle for algorithms-running-on-chips to do those same things too. Brains are not magic, right?
As for the latter, I think (or at least, I hope!) that there's wide consensus that whatever human brains do (individually and collectively), it is possible in principle for algorithms-running-on-chips to do those same things too. Brains are not magic, right?
I think this is probably true, but I wouldn't be 100% certain about it. Brains may not be magic, but they are also very different physical entities to silicon chips, so there is no guarantee that the function of one could be efficiently emulated by the other. There could be some crucial aspect of the mind relying on a physical process which would be computationally infeasible to simulate using binary silicon transistors.
If there are any neuroscientists who have investigated this I would be interested!
OK yeah, "AGI is possible on chips but only if you have 1e100 of them or whatever" is certainly a conceivable possibility. :) For example, here's me responding to someone arguing along those lines.
If there are any neuroscientists who have investigated this I would be interested!
There is never a neuroscience consensus, but fwiw I fancy myself a neuroscientist and have some thoughts at: Thoughts on hardware / compute requirements for AGI.
One of the various points I bring up is that:
(1) if you look at how human brains, say, go to the moon, or invent quantum mechanics, and you think about what algorithms could underlie that, then you would start talking about algorithms that entail building generative models, and editing them, and querying them, and searching through them, and composing them, blah blah.
(2) if you look at a biological brain's low-level affordances, it's a bunch of things related to somatic spikes and dendritic spikes and protein cascades and releasing and detecting neuropeptides etc.
(3) if you look at a silicon chip's low-level affordances, it's a bunch of things related to switching transistors and currents going down wires and charging up capacitors and so on.
My view is: implementing (1) via (3) would involve a lot of inefficient bottlenecks where there's no low-level affordance that's a good match to the algorithmic operation we want ... but the same is true of implementing (1) via (2). Indeed, I think the human brain does what it does via some atrociously inefficient workarounds to the limitations of biological neurons, limitations which would not be applicable to silicon chips.
By contrast, many people thinking about this problem are often thinking about "how hard is it to use (3) to precisely emulate (2)?", rather than "what's the comparison between (1)-via-(3) versus (1)-via-(2)?". (If you're still not following, see my discussion here; search for "transistor-by-transistor simulation of a pocket calculator microcontroller chip".)
Another thing is that, if you look at what a single consumer GPU can do when it runs an LLM or diffusion model... well, it's not doing human-level AGI, but it's sure doing something, and I think it's a sound intuition (albeit hard to formalize) to say "well, it kinda seems implausible that the brain is doing something that's >1000× harder to calculate than that".
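As a very rough illustration of that intuition (illustrative round numbers, not measurements): a consumer GPU sustains something like 1e13-1e14 FLOP/s on ML workloads, while commonly cited estimates of brain-equivalent compute span very roughly 1e13-1e17 FLOP/s. The exact figures here are assumptions; the point is only that the gap is at most a few orders of magnitude, not an astronomical one.

```python
# Back-of-envelope only: every figure below is an assumed round number.
gpu_flops = 1e14          # assumed effective throughput of one consumer GPU (FLOP/s)
brain_flops_low = 1e13    # low end of commonly cited brain-equivalent estimates (FLOP/s)
brain_flops_high = 1e17   # high end of commonly cited brain-equivalent estimates (FLOP/s)

print(f"low-end brain/GPU ratio:  {brain_flops_low / gpu_flops:.0e}")   # 1e-01
print(f"high-end brain/GPU ratio: {brain_flops_high / gpu_flops:.0e}")  # 1e+03
# i.e. somewhere between "one GPU already suffices" and "~1000x short".
```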
Thanks for those links, this is an interesting topic I may look into more in the future.
Another thing is that, if you look at what a single consumer GPU can do when it runs an LLM or diffusion model... well, it's not doing human-level AGI, but it's sure doing something, and I think it's a sound intuition (albeit hard to formalize) to say "well, it kinda seems implausible that the brain is doing something that's >1000× harder to calculate than that".
It doesn't seem that implausible to me. In general I find the computational power required for different tasks (such as what I do in computational physics) frequently varies by many orders of magnitude. LLMs get to their level of performance by sifting through all the data on the internet, something we can't do, and yet they still perform worse than a regular human on many tasks, so clearly there's a lot of extra something going on here. It actually seems kind of likely to me that what the brain is doing is more than 3 orders of magnitude more difficult.
I don't know enough to be confident on any of this, but if AGI turns out to be impossible on silicon chips with Earth's resources, I would be surprised but not totally shocked.
Yeah, I definitely don't mean "brains are magic". Humans are generally intelligent by any meaningful definition of the words, so we have an existence proof that general intelligence can be instantiated in some physical form.
I'm more sceptical of thinking science can be "automated", though. I think progressing scientific understanding of the world is in many ways quite a creative and open-ended endeavour. It requires forming beliefs about the world, updating them in light of evidence, and sometimes making radical new shifts. It's essentially the epistemological frame problem, and I think we're way off a solution there.
I think I have a similar big crux with Aschenbrenner when he says things like "automating AI research is all it takes". I think I disagree with that anyway, but automating AI research is really, really hard! It might be "all it takes" only because that problem is already AGI-complete!
I'm confused what you're trying to say... Supposing we do in fact invent AGI someday, do you think this AGI won't be able to do science? Or that it will be able to do science, but that wouldn't count as "automating science"?
Or maybe when you said "whether 'PASTA' is possible at all", you meant "whether 'PASTA' is possible at all via future LLMs"?
Maybe you're assuming that everyone here has a shared assumption that we're just talking about LLMs, and that if someone says "AI will never do X" they obviously mean "LLMs will never do X"? If so, I think that's wrong (or at least I hope it's wrong), and I think we should be more careful with our terminology. AI is broader than LLMs. ...Well, maybe Aschenbrenner is thinking that way, but I bet that if you were to ask a typical senior person in AI x-risk (e.g. Karnofsky) whether it's possible that there will be some big AI paradigm shift (away from LLMs) between now and TAI, they would say "Well yeah, duh, of course that's possible," and then they would say that they would still absolutely want to talk about and prepare for TAI, in whatever algorithmic form it might take.
Apologies for not being clear! I'll try to be a bit clearer here, but there's probably a lot of inferential distance and we're covering some quite deep topics:
Supposing we do in fact invent AGI someday, do you think this AGI won't be able to do science? Or that it will be able to do science, but that wouldn't count as "automating science"?
Or maybe when you said "whether 'PASTA' is possible at all", you meant "whether 'PASTA' is possible at all via future LLMs"?
So on the first question, I'm going for the latter, and taking issue with the term "automation", which to me suggests a mindless, automatic process for achieving some output. But if digital functionalism were true, and we successfully made a digital emulation of a human who contributed to scientific research, I wouldn't call that "automating science"; instead, we would have created a being that can do science. That being would be creative and agentic, with the ability to formulate its own novel ideas and hypotheses about the world. It'd be limited by its ability to sample from the world, design experiments, practice good epistemology, wait for physical results, etc. It might be the case that some scientific research happens quickly, and then subsequent breakthroughs happen more slowly, and so on.
My opinions on this are also highly influenced by the work of Deutsch and Popper, who essentially argue that the growth of knowledge cannot be predicted. Since science is (in some sense) the stock of human knowledge, and since what cannot be predicted cannot be automated, scientific "automation" is in some sense impossible.
Maybe you're assuming that everyone here has a shared assumption that we're just talking about LLMs...but I bet that if you were to ask a typical senior person in AI x-risk (e.g. Karnofsky) whether it's possible that there will be some big AI paradigm shift (away from LLMs) between now and TAI, they would say "Well yeah, duh, of course that's possible," and then they would say that they would still absolutely want to talk about and prepare for TAI, in whatever algorithmic form it might take.
Agreed, AI systems are broader than LLMs, and maybe I was being a bit loose with language. On the whole, though, I think much of the case made by proponents for the importance of working on AI Safety does assume that the current paradigm plus scale is all you need, or rests on work that assumes it. For instance, Davidson's Compute-Centric Framework model for OpenPhil states right on its opening page:
In this framework, AGI is developed by improving and scaling up approaches within the current ML paradigm, not by discovering new algorithmic paradigms.
And I get off the bus with this approach immediately, because I don't think that's plausible.
As I said in my original comment, I'm working on a full post on the discussion between Chollet and Dwarkesh, which will hopefully make the AGI-sceptical position I'm coming from a bit clearer. If you end up reading it, I'd be really interested in your thoughts! :)
On the whole, though, I think much of the case made by proponents for the importance of working on AI Safety does assume that the current paradigm plus scale is all you need, or rests on work that assumes it.
Yeah, this is more true than I would like. I try to push back on it where possible, e.g. my post AI doom from an LLM-plateau-ist perspective.
There were, however, plenty of people who were loudly arguing that it was important to work on AI x-risk before "the current paradigm" was much of a thing (or in some cases long before "the current paradigm" existed at all), and I think their arguments were sound at the time and remain sound today (e.g. Alan Turing, Norbert Wiener, Yudkowsky, Bostrom, Stuart Russell, Tegmark...). (OpenPhil seems to have started working seriously on AI in 2016, which was three years before GPT-2.)
Thanks for your interesting thoughts on this!
On the timelines question, I know Chollet argues AGI is further off than a lot of people think, and maybe his views do imply that in expectation, but it also seems to me like his views introduce higher variance into the prediction, and so would also allow for the possibility of much more rapid AGI advancement than the conventional narrative does.
If you think we just need to scale LLMs to get to AGI, then you expect things to happen fast, but probably not that fast. Progress is limited by compute and by data availability.
But if there is some crucial set of ideas yet to be discovered, then that's something that could change extremely quickly. We're potentially just waiting for someone to have a eureka moment. And we'd be much less certain what exactly was possible with current hardware and data once that moment happens. Maybe we could have superhuman AGI almost overnight?