I think it is bad faith to pretend that those who argue for near-term AGI have no idea about any of this when all the well-known cases for near-term AGI (including both AI 2027 and IABIED) name continual learning as the major breakthrough required.
Can you provide citations? I wasn't quickly able to find what you're referring to.
I tried to search for the exact phrase "continual learning" in the book you mentioned (If Anyone Builds It, Everyone Dies by Eliezer Yudkowsky and Nate Soares, for the benefit of other readers of these comments) and got no results. I also did a Google search for "site:ai-2027.com continual learning" and got no results. This isn't a foolproof method, since it relies on that exact phrase being used.
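An exact-phrase search misses synonyms, which is why it isn't foolproof. Below is a minimal sketch of a broader check, assuming the text is available as a plain string. The function name `find_phrases` and the sample sentence are made up for illustration. They are not quoted from either the book or the website.

```python
import re


def find_phrases(text: str, phrases: list[str]) -> dict[str, int]:
    """Count case-insensitive, whole-phrase matches for each phrase in `text`."""
    counts = {}
    for phrase in phrases:
        # \b keeps a phrase from matching inside longer words;
        # \s+ lets multi-word phrases match across line breaks or extra spaces.
        pattern = r"\b" + r"\s+".join(map(re.escape, phrase.split())) + r"\b"
        counts[phrase] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


# Hypothetical snippet standing in for a book's text (not a real quote).
sample = "The AI had human-like long-term\nmemory, and its online learning never stopped."
print(find_phrases(sample, ["continual learning", "online learning", "long-term memory"]))
# {'continual learning': 0, 'online learning': 1, 'long-term memory': 1}
```

Widening the phrase list to include related terms such as "online learning" and "long-term memory" catches passages that an exact search for "continual learning" would report as absent.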
In any case, a specific quote from AI 2027 or the book would be helpful.
As briefly discussed in the post, I think a major problem with AI 2027's way of thinking about things is the notion that AI will be able to pull itself up by its bootstraps by conducting AI research in the very near future (within 2 years). That is impossible given AI systems' current capabilities and would require some kind of fundamental research breakthrough or some other major increase in capabilities (e.g. through scaling) very soon.
Importantly, the thesis of this post is not just that continual learning would be required for AGI, but that vastly increased data efficiency would be too. You can assume, for the sake of argument, that AI systems will be able to continually learn, but if they learn more than 2,500x more slowly than humans (or whatever the exact figure is), that precludes AGI. Data efficiency and generalization are even more important concepts for the argument of this post than continual learning.
By the way, I don't think I claimed that nobody who argues that very near-term AGI is very likely is aware of the sort of things I discussed in my post. I just claimed that there have been "no good answers" to these kinds of objections.
It's called online learning in AI 2027 and human-like long-term memory in IABIED.
At a glance, the only mentions of "long-term memory" in If Anyone Builds It, Everyone Dies (IABIED) by Yudkowsky and Soares are in the context of a short science fiction story. It's just stipulated that a fictional AI system has it. It's only briefly mentioned, and there is no explanation of what long-term memory consists of, how it works, or how it was developed. It's similar to Data's "positronic brain" in Star Trek: The Next Generation or any number of hand-waved technological concepts in sci-fi. Do you think this is a good answer to my objection? If so, can you explain why?
Unless there are other passages from the book that I'm missing (maybe because they use a different phrasing), the mentions of "long-term memory" in the book don't seem to have the centrality to the authors' arguments or predictions that you implied. I don't see textual evidence that Yudkowsky and Soares "name continual learning as the major breakthrough required", unless you count those very brief mentions in the sci-fi story.
I think one of the main problems with AI 2027's story is the impossibility of using current AI systems for AI research, as I discussed in the post. There is a chicken-and-egg problem. LLMs are currently useless at actually doing research (as opposed to just helping with research as a search engine, in exactly the same way Google helps with research). AI systems with extremely weak generalization, extremely poor data efficiency, and no continual learning (or online learning) cannot plausibly do research well. Somehow this challenge has to be overcome, and obviously an AI that can't do research can't do the research to give itself the capabilities required to do research. (That would be an AI pulling itself up by its bootstraps. Or, in the terminology of the philosopher Daniel Dennett, it would be a skyhook.) So it comes down to human researchers to solve this challenge.
From what I've seen, when AI 2027 talks about an AI discovery/innovation being made by human researchers, the authors just kind of give their subjective best guess of how long that discovery/innovation will take to be made. This is not a new or original objection, of course, but I share it with others: any of these discoveries/innovations could take a tenth as long or ten times as long as the authors guess. So AI 2027 isn't a scientific model (which I don't think the authors explicitly claim it is, although that's the impression some people seem to have gotten). Rather, it's an aggregation of a few people's intuitions. Mainly for that reason, it doesn't really serve as a persuasive piece of argumentation for most people.
It doesn't matter if they state that it is a "major breakthrough required" if they don't provide sufficient evidence that this breakthrough is in any way likely to happen in the immediate future. Yarrow has provided plenty of argumentation as to why it won't happen: if you disagree, you should feel free to cite actual counter-evidence rather than throwing false accusations of bad faith around.
I agree that that comment may be going too far in claiming "bad faith", but the article does have a pretty tedious undertone of having found some crazy gotcha that everyone is ignoring. (I'd agree that it gets at a crux and that some reasonable people, e.g. Karpathy, would align more with the OP here.)
What's your response to the substance of the argument? From my perspective, people much more knowledgeable about AI and much more qualified than me have made the same or very similar objections, prominently and in public, for some time now, and despite being fairly keyed in to these debates, I don't see people giving serious replies to these objections. I have also tried to raise these sorts of objections myself and have generally found a lack of serious engagement on the substance.
I actually do see a significant number of people, including some who are prominent in debates around AGI, giving replies that indicate a misunderstanding of these sorts of objections, or that indicate they haven't considered these sorts of objections before, or that amount to hand-waving dismissals. But I'm still trying to find the serious replies. It's possible there has been a serious and persuasive rebuttal somewhere that I missed; part of the purpose of writing a post like this is to elicit that, either from a commenter directly or from someone citing a previous rebuttal. But if you insist such a rebuttal is so obvious that these objections are tedious, I can't believe you until you make that rebuttal or cite it.
Case in point: Matrice identified something in my post that was of secondary importance, continual learning, which in the post I was willing to hand-wave away to focus on the things that I think are of primary importance, namely, 1) physical limits to scaling, 2) the inability to learn from video data, 3) the lack of abundant human examples for most human skills, 4) data inefficiency, and 5) poor generalization. So, first of all, Matrice did not identify any of the five points I actually raised in the post.
Second, Matrice made a citation that, when I followed up on it, did not actually say what Matrice claimed it said, and in no way answered the objection that current AI can't continually learn (which, to repeat, was not one of the main objections I made in my post anyway). It was literally just a short sci-fi story in which a fictional AI is simply said to be able to continually learn, with no further discussion of the topic and no details beyond that. How is that a serious response to the objection about continual learning, and especially, how is that a serious response to my post, when I didn't raise continual learning as one of my main objections?
So Matrice's reply misrepresented both the thesis of my post and the work they cited as a rebuttal to it.
If there is a better response to the substance of the objections I raised in my post than this, please let me know! I'm dying to hear it!
1) physical limits to scaling, 2) the inability to learn from video data, 3) the lack of abundant human examples for most human skills, 4) data inefficiency, and 5) poor generalization
All of those except 2) boil down to "foundation models have to learn once and for all through training on collected datasets instead of continually learning for each instantiation". See also AGI's Last Bottlenecks.
No, none of them boil down to that, and especially not (1).
I've already read the "A Definition of AGI" paper (which the blog post you linked to is based on), and it does not even mention the objections I made in this post, let alone offer a reply.
My main objection to the paper is that it makes a false inference: that tests used to assess human cognitive capabilities can be used to test whether AI systems have those same capabilities. GPT-4 scored more than 100 on an IQ test in 2023. If an AI that passes a test had the cognitive capabilities a human who passes that same test is believed to have, that score would imply GPT-4 is an AGI. The paper does not anticipate this objection or try to argue against it.
(Also, this is just a minor side point, but Andrej Karpathy did not actually say AGI is a decade away on Dwarkesh Patel's podcast. He said useful AI agents are a decade away. This is pretty clear from the interview or the transcript. Karpathy did not comment directly on the timeline for AGI, although it seems to be implied that AGI can come no sooner than AI agents.
Unfortunately, Dwarkesh or his editor or whoever titles his episodes, YouTube chapters, and clips has sometimes given inaccurate titles that badly misrepresent what the podcast guest actually said.)
How are "heterogeneous skills" based on private information and "adapting to changing situations in real time with very little data" not what continual learning means?
Here's a definition of continual learning from an IBM blog post:
Continual learning is an artificial intelligence (AI) learning approach that involves sequentially training a model for new tasks while preserving previously learned tasks. Models incrementally learn from a continuous stream of nonstationary data, and the total number of tasks to be learned is not known in advance.
Here's another definition, from an arXiv preprint:
To cope with real-world dynamics, an intelligent system needs to incrementally acquire, update, accumulate, and exploit knowledge throughout its lifetime. This ability, known as continual learning, provides a foundation for AI systems to develop themselves adaptively. In a general sense, continual learning is explicitly limited by catastrophic forgetting, where learning a new task usually results in a dramatic performance degradation of the old tasks.
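To make "catastrophic forgetting" concrete, here is a toy sketch that assumes a one-parameter linear model trained with plain SGD. The model learns task A (y = 2x). It is then trained only on task B (y = -x), with no replay of A, and its error on A jumps. The tasks, learning rate, and helper names (`train`, `mse`) are hypothetical choices for illustration. None of them is taken from the IBM post or the preprint.

```python
def train(w: float, data: list[tuple[float, float]], lr: float, epochs: int) -> float:
    """Plain SGD on the squared error of the one-parameter model y_hat = w * x."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x  # derivative of (w*x - y)**2 with respect to w
            w -= lr * grad
    return w


def mse(w: float, data: list[tuple[float, float]]) -> float:
    """Mean squared error of y_hat = w * x on `data`."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


xs = [0.5, 1.0, 1.5]
task_a = [(x, 2.0 * x) for x in xs]   # hypothetical task A: y = 2x
task_b = [(x, -1.0 * x) for x in xs]  # hypothetical task B: y = -x

w = train(0.0, task_a, lr=0.1, epochs=50)
loss_a_before = mse(w, task_a)            # near zero: task A has been learned
w = train(w, task_b, lr=0.1, epochs=50)   # train on B only, with no replay of A
loss_a_after = mse(w, task_a)             # large: task A has been overwritten
print(f"task A loss after A: {loss_a_before:.4f}, after B: {loss_a_after:.4f}")
# task A loss after A: 0.0000, after B: 10.5000
```

Continual learning methods aim to avoid exactly this trade-off. The sketch only illustrates the failure mode the definition refers to. It does not implement any particular continual learning method.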
The definition of continual learning is not related to generalization, data efficiency, the availability of training data, or the physical limits to LLM scaling.
You could have a continual learning system that is just as data-inefficient as current AI systems and just as poor at generalization. Continual learning does not solve the problem of training data being unavailable. Continual learning does not help you scale up training compute or training data if compute and data are scarce or expensive, nor does the ability to continually learn mean an AI system will automatically get all the performance improvements it would have gotten from continued scaling.
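A second toy sketch illustrates the claim that continual learning and data efficiency are separable. Both learners below learn "from a continuous stream" in the sense of the IBM definition: each updates once per incoming example and never retrains from scratch. One of them still needs far more examples than the other. The step size stands in, very crudely, for sample efficiency. The function `examples_needed` and all numbers are hypothetical.

```python
def examples_needed(lr: float, true_w: float = 3.0, tol: float = 0.01,
                    max_examples: int = 1_000_000) -> int:
    """Feed an online learner a stream of (x, y) pairs one at a time and return
    how many it consumes before its weight is within `tol` of `true_w`."""
    w = 0.0
    for n in range(1, max_examples + 1):
        x = 1.0                        # toy stream: every input is 1.0
        y = true_w * x                 # noiseless label
        w -= lr * 2 * (w * x - y) * x  # one SGD update per example; no retraining
        if abs(w - true_w) < tol:
            return n
    raise ValueError("tolerance not reached within max_examples")


fast = examples_needed(lr=0.25)   # larger steps: few examples needed
slow = examples_needed(lr=0.005)  # smaller steps: many more examples needed
print(fast, slow)
```

With these settings the first learner reaches the tolerance after 9 examples, and the second needs several hundred. Both learn continually from the same stream. Being able to learn continually says nothing, on its own, about how much data the learning takes.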
Yes, those quotes do refer to the need for a model to develop heterogeneous skills based on private information, and to adapt to changing situations in real life with very little data. I don't see your problem.
In case it's helpful, I prompted Claude Sonnet 4.5 with extended thinking to explain three of the key concepts we're discussing, and I thought it gave a pretty good answer, which you can read here. (I archived that answer here, in case that link breaks.)
I gave GPT-5 Thinking almost the same prompt (I had to add some instructions because the first response it gave was way too technical) and it gave an okay answer, which you can read here. (Archive link here.)
I tried to Google for human-written explanations of the similarities and differences first, since that's obviously preferable. But I couldn't quickly find one, probably because there's no particular reason to compare these concepts directly to each other.
No, those definitions quite clearly don't say anything about data efficiency or generalization, or the other problems I raised.
I think you have misunderstood the concept of continual learning. It doesn't mean what you seem to think it means. You seem to be confusing the concept of continual learning with some much more expansive concept, such as generality.
If I'm wrong, you should be able to quite easily provide citations that clearly show otherwise.
I don't think Karpathy would describe his view as involving any sort of discontinuity in AI development. If anything, his views are the most central no-discontinuity, straight-lines-on-graphs view (no intelligence explosion accelerating the trends, no winter decelerating the trends). And if you think the mean date for AGI is 2035, then it would take extreme confidence (on the order of a variance of less than a year) to claim AGI is less than 0.1% likely by 2032!
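The arithmetic behind that last sentence can be checked with one extra assumption that the comment does not make: treat the AGI arrival year as normally distributed with mean 2035. Requiring less than a 0.1% chance of arrival by 2032 then caps the standard deviation at just under a year. The sketch below uses Python's standard-library `statistics.NormalDist`. The variable names are illustrative only.

```python
from statistics import NormalDist

mean_year = 2035.0  # hypothetical mean AGI date from the comment
cutoff = 2032.0
p_max = 0.001       # "less than 0.1% likely by 2032"

# Assumption: arrival year T ~ Normal(mean_year, sigma).
# P(T <= cutoff) <= p_max  <=>  (cutoff - mean_year) / sigma <= z, where z = Phi^-1(p_max).
z = NormalDist().inv_cdf(p_max)       # standard normal quantile, about -3.09
sigma_max = (cutoff - mean_year) / z  # largest sigma consistent with the claim
print(f"z = {z:.3f}, sigma must be at most {sigma_max:.2f} years")
# z = -3.090, sigma must be at most 0.97 years

# For comparison, a 3-year standard deviation puts about 16% of the mass before 2032.
print(f"P(by 2032 | sigma = 3) = {NormalDist(mean_year, 3.0).cdf(cutoff):.3f}")
# P(by 2032 | sigma = 3) = 0.159
```

The normal model is only an illustration. A different distribution would give different exact numbers.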
I was only mentioning Karpathy as someone reasonable who repeatedly points out the lack of online learning and seems to have (somewhat) longer timelines because of that. This is solely based on my general impression. I agree the stated probabilities seem wildly overconfident.
I don't know what Andrej Karpathy's actual timeline for AGI is. In the Dwarkesh Patel interview that everyone has been citing, Karpathy says he thinks it's a decade until we get useful AI agents, not AGI. This implies he thinks AGI is at least a decade away, but he doesn't directly address when he thinks AGI will arrive.
After the interview, Karpathy made a clarification on Twitter where he said 10 years to AGI should come across to people as highly optimistic in the grand scheme of things, which maybe implies he does actually think AGI is 10 years away and will arrive at the same time as useful AI agents. However, it's ambiguous enough that I would hesitate to interpret it one way or another.
I could be wrong, but I didn't get the impression that continual learning or online learning was Karpathy's main reason (let alone sole reason) for thinking useful AI agents are a decade away, or for his other comments that express skepticism or pessimism (relative to people with 5-year AGI timelines) about progress in AI or AI capabilities.
Continual learning/online learning is not one of the main issues raised in my post, and while I think it is an important issue, you can hand-wave away continual learning and still have problems with scaling limits, learning from video data, human examples to imitation-learn from, data inefficiency, and generalization.
It's not just Andrej Karpathy but a number of other prominent AI researchers, such as François Chollet, Yann LeCun, and Richard Sutton, who have publicly raised objections to the idea that very near-term AGI is very likely via scaling LLMs. In fact, in the preamble of my post I linked to a previous post of mine where I discuss how a survey of AI researchers found they have a median timeline for AGI of over 20 years (and possibly much, much longer than 20 years, depending on how you interpret the survey), and how, in another survey, 76% of AI experts surveyed think scaling LLMs or other current techniques is unlikely or very unlikely to reach AGI. I'm not defending a fringe, minority position in the AI world, but in fact something much closer to the majority view than what you typically see on the EA Forum.