I didn't update my views on the survey because I haven't seen the survey. I did ask for the survey so I could see it. I haven't seen it yet. I couldn't find it on the website. I might change my mind after I see it. Who knows.
I agree the JFK scenario is extremely outlandish and would basically be impossible. I just think the rapid scenario is more outlandish and would also basically be impossible.
Everything you said about AI I just don't think is true at all. LLMs are just another narrow AI, similar to AlphaGo, AlphaStar, AlphaFold, and so on, and not a fundamental improvement in generality that gets us closer to AGI. You shouldn't have updated your AGI timelines based on LLMs. That's just a mistake. Whatever you thought in 2018 about the probability of the rapid scenario, you should think the same now, or actually even less because more time has elapsed and the necessary breakthroughs have still not been made. So, what was your probability for the rapid scenario in 2018? And what would your probability have been if someone told you to imagine there would be very little progress toward the rapid scenario between 2018 and 2025? That's what I think your probability should be.
To say that AI's capabilities were basically nothing in 2018 is ahistorical. The baseline from which you are measuring progress is not correct, so that will lead you to overestimate progress.
I also get the impression you greatly overestimate Claude's capabilities relative to the cognitive challenges of generating the behaviours described in the rapid scenario.
AI being able to do AI research doesn't affect the timeline. Here's why. AI doing AI research requires fundamental advancements in AI to a degree that would make something akin to AGI or something akin to the rapid scenario happen anyway. So, whether AI does AI research can't accelerate the point at which we reach the rapid scenario. There are no credible arguments to the contrary.
The vast majority of benchmarks are not just somewhat misleading if seen as evidence about AGI progress. They are almost completely meaningless in terms of AGI progress, with perhaps the sole exception of the ARC-AGI benchmarks. Text Q&A benchmarks are about as meaningful an indication of AGI progress as AI's ability to play Go or StarCraft.
There is also the physical impossibility problem. If continuing scaling trends is literally physically impossible, then how can the probability of the rapid scenario be more than 1 in 10,000? (By the way, I said less than 1 in 10,000, not less than 1 in 100,000, although I'm not sure it really matters.)
Someone should try to do the math on what it would take to scale RL training compute for LLMs to some level that could be considered a proxy for AGI or the sort of AI system that could make the rapid scenario possible. You will likely get some really absurd result. For example, I wouldn't be surprised if the result was that the energy required would mean we'd have to consume the entire Sun, or multiple stars, or multiple galaxies. In which case, the speed of light would render the rapid scenario impossible.
Combinatorial explosion is just that crazy. There isn't enough energy in the entire universe to brute force a 60-character password. RL is not the same thing as trying random combinations for a password, but there is an element of that in RL, and the state space of real-world environments from the perspective of an AI agent is much, much larger than the possible combinations of a 60-character password.
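To put rough numbers on the password example (these are my own illustrative figures, assuming 95 printable ASCII characters per position and the Landauer limit at room temperature as an absolute floor on energy per operation):

```python
import math

# Back-of-the-envelope sketch (my own assumed figures, not a rigorous bound):
# - 95 printable ASCII characters per position, 60 positions
# - Landauer limit at ~300 K as the minimum energy per elementary bit operation
# - ~4e69 J as a commonly cited rough estimate of the mass-energy of ordinary
#   matter in the observable universe

combinations = 95 ** 60                       # possible 60-character passwords
k_B = 1.380649e-23                            # Boltzmann constant, J/K
landauer_j_per_op = k_B * 300 * math.log(2)   # ~2.9e-21 J per bit operation
universe_energy_j = 4e69                      # rough estimate

max_ops = universe_energy_j / landauer_j_per_op
print(f"password combinations:            ~10^{math.log10(combinations):.0f}")
print(f"ops affordable at Landauer limit: ~10^{math.log10(max_ops):.0f}")
# Even at one minimal bit operation per guess, the energy budget falls short
# by roughly 28 orders of magnitude.
```

Even granting the most charitable possible assumption of one minimal bit operation per guess, the shortfall is tens of orders of magnitude, which is the point about brute force.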
Ok, there's a lot here, and I'm not sure I can respond to all of it, but I will respond to some of it.
-I think you should be moved just by my telling you about the survey. Unless you are super confident either that I am lying/mistaken about it, or that the FRI was totally incompetent in assembling an expert panel, the mere fact that I'm telling you the median expert credence in the rapid scenario is 23% in the survey ought to make you think there is at least a pretty decent chance that you are giving it several orders of magnitude less credence than the median expert/superforecaster. You should already be updating on there being a decent chance that is true, even if you don't know for sure. Unless you already believed there was a decent chance you were that far out of step with expert opinion, but I think that just means you were already probably doing the wrong thing in assigning ultra-low credence. I say "probably" because the epistemology of disagreement IS very complicated, and maybe sometimes it's ok to stick to your guns in the face of expert consensus.
-"Physical impossibility". Well, it's not literally true that you can't scale any further at all. That's why they are building all those data centers for eyewatering sums of money. Of course, they will hit limits eventually and perhaps soon, probably monetary before physical. But you admit yourself that no one has actually calculated how much compute is needed to reach AGI. And indeed, that is very hard to do. Actually, Epoch, who are far from believers in the rapid scenario as far as I can tell, think quite a lot of recent progress has come from algorithmic improvements, not scaling: https://blog.redwoodresearch.org/p/whats-going-on-with-ai-progress-and (text search for "Algorithmic improvement" or "Epoch reports that we see"). So progress could continue to some degree even if we did hit limits on scaling. As far as I can tell, most of the people who do believe in the rapid scenario actually expect scaling of training compute to at least slow down a lot relatively soon, even though they expect big increases in the near future. Of course, none of this proves that we can reach AGI with current techniques just by scaling, and I am pretty dubious of that for any realistic amount of scaling. But I don't think you should be talking like the opposite has been proven. We don't know how much compute is needed for AGI with the techniques of today or the techniques available by 2029, so we don't know whether the needed amount of compute would breach physical or financial or any other limits.
-LLM "narrowness" and the 2018 baseline: Well, I was probably a bit inexact about the baseline here. I guess what I meant was something like this. Before 2018ish, as a non-technical person, I never really heard anything about exciting AI stuff, even though I paid attention to EA a lot, and people in EA already cared a lot about AI safety and saw it as a top cause area. Since then, there has been loads of attention, literal founding fathers of the field like Hinton say there is something big going on, I find LLMs useful for work, there have been relatively hard-to-fake achievements like doing decently well on the Math Olympiad, and college students can now use AI to cheat on their essays, a task that absolutely would have been considered to involve "real intelligence" before ChatGPT. More generally, I remember a time, as someone who learnt a bit of cognitive science while studying philosophy, when the problem with AI was essentially being presented as "but we just can't hardcode all our knowledge in, and on the other hand, it's not clear neural nets can really learn natural languages". Basically, AI was seen as something that struggled with anything that involved holistic judgment based on pattern-matching and heuristics, rather than hard-coded rules. That problem now seems somewhat solved: we now seem to be able to get AIs to learn how to use natural language correctly, or play games like Go that can't be brute forced by exact calculation, but rely on pattern-recognition and "intuition". These AIs might not be general, but the techniques for getting them to learn these things might be a big part of how you build an AI that actually is, since they seem to be applicable to a large variety of kinds of data: image recognition, natural language, code, Go and many other games, information about proteins. The techniques for learning seem more general than many of the systems. That seems like relatively impressive progress for a short time to me as a layperson. I don't particularly think that should move anyone else that much, but it explains why it is not completely obvious to me why we could not reach AGI by 2030 at current rates of progress. And again, I will emphasize, I think this is very unlikely. Probably my median is that real AGI is 25 years away. I just don't think it is 1 in a million "very unlikely".
I want to emphasize here, though, that I don't really think anything under the third dash here should change your mind. That's more just an explanation of where I am coming from, and I don't think it should persuade anyone of anything really. But I definitely do think the stuff about expert opinion should make you tone down your extremely extreme confidence, even if just a bit.
I'd also say that I think you are not really helping your own cause here by expressing such an incredibly super-high level of certainty, and making some sweeping claims that you can't really back up, like that we know right now that physical limits have a strong bearing on whether AGI will arrive soon. I usually upvote the stuff you post here about AGI, because I genuinely think you raise good, tough questions for the many people around here with short timelines. (Plenty of those people probably have thought-through answers to those questions, but plenty probably don't and are just following what they see as EA consensus.) But I think you also have a tendency to overconfidence that makes it easier for people to just ignore what you say. This comes out in you doing annoying things you don't really need to do, like moving quickly in some posts from "scaling won't reach AGI" to "the AI boom is a bubble that will unravel" without much supporting argument, when obviously AI models could make vast revenues without being full AGI. It gives the impression of someone who is reasoning in a somewhat motivated manner, even as they also have thought about the topic a lot and have real insights.
I think your suspicion toward my epistemic practices is based simply on the fact that you disagree very strongly, you don't understand my views or arguments very deeply, you don't know my background or history, and you're mentalizing incorrectly.
[Edited on 2025-11-18 at 05:10 UTC to add fancy formatting.]
AI bubble
For example, I have a detailed collection of thoughts about why I think AI investment is most likely in a bubble, but I haven't posted about that in much detail on the EA Forum yet – maybe I will, or maybe it's not particularly central to these debates or on-topic for the forum. I'm not sure to what extent an AI bubble popping would even change the minds of people in EA about the prospects of near-term AGI. How relevant is it?
I asked on here to what extent the AI bubble popping would change people's views on near-term AGI and the only answer I got was that it wouldn't move the needle. So, I'm not sure if that's where the argument needs to go. Just because I briefly mention this topic in passing doesn't mean my full thoughts about the topic are really only that brief. It is hard to talk about these things and treat every topic mentioned, even off-handedly, in full detail without writing the whole Encyclopedia Britannica.
Also, I am much, much less sure about the AI bubble conclusion than I am about AGI or about the rapid scenario. It is extremely, trivially obvious that sub-AGI/pre-AGI/non-AGI systems could potentially generate a huge amount of profit and justify huge company valuations, and indeed I've written something like 100 articles about that topic over the last 8 years. I used to have a whole blog/newsletter solely about that topic and I made a very small amount of money doing freelance writing primarily about the financial prospects of AI. I actually find it a little insulting that you would think I have never considered that AI could be a big financial opportunity without AGI coming to fruition in the near term.
[Edited on 2025-11-16 at 01:05 UTC to add: I ended up covering the bubble topic here.]
LLM scaling
Here is Toby Ord on the physical limits to scaling RL training compute for LLMs:
Grok 4 was trained on 200,000 GPUs located in xAI's vast Colossus datacenter. To achieve the equivalent of a GPT-level jump through RL would (according to the rough scaling relationships above) require 1,000,000x the total training compute. To put that in perspective, it would require replacing every GPU in their datacenter with 5 entirely new datacenters of the same size, then using 5 years worth of the entire world's electricity production to train the model. So it looks infeasible for further scaling of RL-training compute to give even a single GPT-level boost.
This is not what it would take to get to AGI; it's what it would take to get from Grok 4 to Grok 5 (assuming the scaling trend were to continue as it did from Grok 3 to Grok 4).
I am willing to say that, if Toby's calculation is correct, it is very close to an absolute certainty that this level of scaling of RL training compute for LLMs – using 5x the world's current annual electricity production and 1 million datacenters – will not happen before the end of 2030.
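As a rough sanity check of the scale Toby is describing, here is a sketch with my own assumed baseline figures (not numbers taken from his post):

```python
# Sanity check of the scale described above (my own rough assumed figures,
# not numbers taken from Toby Ord's post):
# - world electricity production: roughly 30,000 TWh per year
# - the quoted jump: ~1,000,000x current total training compute, said to need
#   about 5 years of world electricity production

world_electricity_twh_per_year = 30_000              # rough recent figure
required_twh = 5 * world_electricity_twh_per_year    # ~150,000 TWh for one run
implied_current_run_twh = required_twh / 1_000_000   # energy implied for today's run

print(f"energy for the 1,000,000x run: ~{required_twh:,} TWh")
print(f"implied energy of a current frontier run: ~{implied_current_run_twh * 1000:.0f} GWh")
# ~150 GWh for a current frontier-scale run is at least a plausible order of
# magnitude, so the headline figure is internally consistent with these
# assumptions, and the scaled-up figure is plainly out of reach by 2030.
```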
My comments about extrapolating scaling to AGI potentially requiring galaxies are not really the main point I'm trying to make about scaling; they are just meant to emphasize the problem with runaway exponential growth of this kind and the error in extrapolating its long-term continuation. This is for emphasis and illustration, not a strongly held view.
A number of prominent experts, like OpenAI's former chief scientist Ilya Sutskever, have said self-supervised pre-training of LLMs has run out of steam or reached a plateau. Anthropic's CEO Dario Amodei said that Anthropic's focus has shifted from pre-training to RL training. So, at this point we are relying quite a lot on scaling up RL training for LLMs to get better as a result of training. (More discussion here.) Inference compute can also be scaled up and that's great, but you have to pay the inference cost on every query and can't amortize it across billions or trillions of queries like you can with the training cost. Plus, you run into a similar problem where, once you scale up inference compute 100x and 100x again after that, the next 100x and the next 100x after that start to become unwieldy.
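To illustrate the amortization point with a toy calculation (every number here is made up for illustration):

```python
# Toy illustration of the amortization point (all numbers are hypothetical):
# training cost is paid once and spread over every query, while inference cost
# is paid again on every query, so scaling up inference compute raises the
# marginal cost of every single query by roughly that same factor.

training_cost_usd = 1e9               # hypothetical one-time training cost
lifetime_queries = 1e12               # hypothetical number of queries served
inference_cost_per_query_usd = 0.002  # hypothetical baseline inference cost

def avg_cost_per_query(inference_scale: float) -> float:
    """Average cost per query when inference compute is scaled by inference_scale."""
    amortized_training = training_cost_usd / lifetime_queries
    return amortized_training + inference_cost_per_query_usd * inference_scale

for scale in (1, 100, 10_000):
    print(f"inference scaled {scale:>6}x -> ${avg_cost_per_query(scale):.4f} per query")
# The amortized training term stays tiny; the inference term grows linearly
# with the scaling factor and quickly dominates.
```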
Fundamental questions
As for the philosophy of mind and cognitive science topics, I have been spiritually enthralled with them since I was a teenager in the 2000s and 2010s (when I first read books like Daniel Dennett's Consciousness Explained and Douglas Hofstadter's I Am A Strange Loop, and incidentally was also interested in people talking about AGI like Ray Kurzweil and Nick Bostrom, and even was reading/watching/listening to some of Eliezer Yudkowsky's stuff back then). For a long time I wanted to devote my life to studying them, and actually I still would like to do that if I could somehow figure out a practical path for that in life. As a philosophy undergrad, I published an essay on the computational theory of mind in an undergrad journal and, unfortunately, that's the closest I've come to making any kind of contribution to the field.
I've been following AI closely for a long time and I can imagine how you might have a distorted view of things if you see generative AI as having come basically out of nowhere. I started paying attention to deep learning and deep reinforcement learning around the time DeepMind showed its results with deep RL and Atari games. I really ramped up my attention in 2017 when I started to think really seriously about self-driving cars. So, LLMs were quite a surprise for me, just as they were for many people, but they didn't drop out of the clear blue sky. I already had an intellectual context to put them in.
If you want to read some more substantive objections to the rapid scenario, there are some in the post above. I'm not sure if you read those or just focused on the forecasting part in the preamble. The rapid scenario depends on a number of fundamental improvements in AI, including (but not limited to) vastly improved generalization, vastly improved data efficiency, the ability to learn effectively and efficiently from video data, and continual learning. These are not challenges that can be solved through scaling, full stop. And the rapid scenario cannot happen without solving them, full stop. There is more to say, but that's a good start.
Expert survey
On the survey: I might update once I see it, but I need to see it first. I'd love to see it and I'll keep an open mind when I do. There are some entirely epistemically legitimate reasons for me not to update on evidence I can't see or confirm, especially when that's evidence only about other people's views and not direct evidence, and especially when whether it's actually even new information to me about other people's views depends on the details – such as who was surveyed – which I can't see and don't know.
There are strong concerns related to the concept of information cascades. For example, if you just survey the same people over and over and repackage it as new evidence, that would lead you to keep revising your credences upwards (or downwards) with no limit based on the same piece of evidence. Or people will circularly update their views: I tell you my view, you update based on that, you tell me your updated view (changed only because I told you my view), I update based on that, and so on and so forth, until we end up a significant way from where we started for no good reason. In case you think this is a silly hypothetical, I read a post or comment somewhere (I could dig it up) where someone who had been involved in the Bay Area rationalist community said they think this kind of circular updating actually happened. The imagery they gave was people sitting in a circle telling each other smaller and smaller numbers for the median date of AGI.
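Here is a deliberately crude toy model of that circular-updating worry, under the strong assumption that each person keeps treating the other's latest report as fresh, independent evidence:

```python
import math

def prob_to_logodds(p: float) -> float:
    return math.log(p / (1 - p))

def logodds_to_prob(l: float) -> float:
    return 1 / (1 + math.exp(-l))

# Toy model of circular updating (a deliberately crude assumption, not a claim
# about how any real person reasons): two people start with moderate credences,
# and each round each one treats the other's latest report as if it were fresh,
# independent evidence, adding a fraction of its log-odds to their own.

a = prob_to_logodds(0.60)   # person A's starting credence
b = prob_to_logodds(0.70)   # person B's starting credence
weight = 0.5                # how much of the other's report gets double-counted

for round_num in range(1, 7):
    a, b = a + weight * b, b + weight * a
    print(f"round {round_num}: A={logodds_to_prob(a):.3f}, B={logodds_to_prob(b):.3f}")
# Both credences climb toward certainty even though no new evidence about the
# world ever arrives; the same two initial opinions just get recycled.
```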
[Edited on 2025-11-14 at 07:52 UTC to add: I was able to find the report. It was easy to find. I was just looking on the wrong webpage before. The discussion of the question about the "rapid progress" scenario on page 104 and the responses on page 141 are confusing. Respondents are asked, "At the end of 2030, what percent of LEAP panelists will choose 'slow progress,' 'moderate progress,' or 'rapid progress' as best matching the general level of AI progress?" I find that a really strange and counterintuitive way to frame the question. How is this a proxy for the probability that the scenario will occur? The framing of the question is highly ambiguous and the answers are highly ambiguous and hard to interpret.
Three rationale examples are given for the rapid progress scenario and all three contradict the rapid progress scenario. How are the rationale examples selected? Was there not one example of a respondent who actually thought the rapid progress scenario would occur? I don't understand this.
This is precisely why I don't update on evidence before seeing it. The devil is in the details.
The rationale examples are useful and I'm glad they are included. They show problems both with the design of the survey and with the reasoning and evidence used by some of the respondents to come to conclusions about near-term AI progress. For example, the famous METR time horizon graph is erroneously interpreted in a way that overlooks the crucial caveats, some of which even METR itself highlights. Instead of only measuring what METR measures, researchers should also measure something like performance on a diverse, heterogeneous array of manually graded real-world or realistic tasks with the same success rate as humans. The result would be entirely different, i.e., approximately a flat line at zero rather than the appearance of an exponential trend.
I'll also add that asking respondents to choose only between the slow progress, moderate progress, and rapid progress scenarios is really poor survey design. All three scenarios arguably include proxies for or operationalizations of AGI, and respondents are not given the option to say no to all of them. Even the slow progress scenario says that AI "can automate basic research tasks, generate mediocre creative content, assist in vacation planning, and conduct relatively standard tasks that are currently (2025) performed by humans in homes and factories." AI can also "rarely produce novel and feasible solutions to difficult problems." And AI "can handle roughly half of all freelance software-engineering jobs that would take an experienced human approximately 8 hours to complete in 2025", write "full-length novels", "make a 3-minute song that humans would blindly judge to be of equal quality to a song released by a current (2025) major record label", and largely substitute for "a competent human assistant".
So, respondents were given a choice between AGI, AGI, and AGI, and chose AGI. This is not a useful survey! You are not giving the respondents a chance to say no! You are baking the result into the question!
Another serious problem with the survey is the percentage of respondents affiliated with effective altruism. On page 20, the report says 28% of respondents were affiliated with effective altruism and that was reweighted down to 12%. This is exactly the problem with information cascades and circular updating that I anticipated. I don't need a new survey of people affiliated with effective altruism to tell me what people affiliated with effective altruism believe about AI. I already know that.
Another significant problem is that only around 45% of the experts have technical expertise in AI. But now I'm just piling on.
You absolutely should not have told me to update on this survey before actually looking at it.]
[Edited on 2025-11-18 at 04:57 UTC to add: I made a post about the Forecasting Research Institute report, specifically about the content of the slow progress scenario and the framing of that question.]
Unfortunately, this is a debate where forecasts can't be practically falsified or settled. If January 2031 rolls around and AI has still only made modest, incremental progress relative to today, the evidence is still open to interpretation as to whether a 97-98% chance the rapid scenario wouldn't happen was more reasonable or a 99.99%+ chance it wouldn't happen. We can't agree on how to interpret similar evidence today. I have no reason to think it would be any easier to come to an agreement on that in January 2031.
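To make that concrete with a small worked example (my own illustrative calculation, not anyone else's numbers):

```python
import math

# How much would the observation "the rapid scenario did not happen by January
# 2031" actually distinguish a 2% forecast from a 0.01% forecast? A quick look
# at the likelihood ratio and log-score gap from that single outcome.

p_hi, p_lo = 0.02, 0.0001        # the two forecasts that the scenario happens
q_hi, q_lo = 1 - p_hi, 1 - p_lo  # probability each assigns to "doesn't happen"

likelihood_ratio = q_lo / q_hi                   # Bayes factor favoring the lower forecast
log_score_gap = math.log(q_lo) - math.log(q_hi)  # difference in log score, in nats

print(f"likelihood ratio: {likelihood_ratio:.4f}")                       # ~1.02
print(f"log-score advantage of the 0.01% forecast: {log_score_gap:.4f}") # ~0.02 nats
# A single non-occurrence barely separates the two forecasts, which is why the
# disagreement would remain a matter of interpretation even in 2031.
```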
It is an interesting question, as you say, how to update our views based on the views of other people – whether, when, why, and by how much. I was surprised to recently see a survey where around 20% of philosophers accept or lean toward a supernatural explanation of consciousness. I guess it's possible to live in a bubble where you can miss that lots of people think so differently than you. I would personally say that the chances that consciousness is a supernatural phenomenon are less than the chances of the rapid scenario. And that survey didn't make me revise up my credence in the supernatural at all. (What about you?)
I will say that the rapid scenario is akin to a supernatural miracle in the radical discontinuity to our sense of reality it implies. It is more or less the view that we will invent God – or many gods – before the 2032 Summer Olympics in Brisbane. You should not be so quick to chide someone for saying this is less than 0.01% likely to happen. Outside of the EA/rationalist/Bay Area tech industry bubble, this kind of thing is widely seen as completely insane and ludicrous.
In my interpretation, the rapid scenario is an even higher bar than "just" inventing AGI; it implies superhuman AGI. So, for instance, a whole brain emulation wouldn't qualify. An AGI that is "merely" human-level wouldn't qualify. I can't make a Grammy-calibre album, write a Pulitzer or Booker Prize-calibre book, or make a Hollywood movie, nor can I run a company or do scientific research, and I am a general intelligence. The rapid scenario implies superhuman AGI or superintelligence, so it's less likely than "just" AGI.
Meta discussion
Please forgive me for how long this comment is, but I suddenly felt the need to say the following.
I'm starting to get the temptation to ask you questions like, "What probability would you put on the core metaphysical and cosmological beliefs of each of the major world religions turning out to be correct?", which is a sign this conversation is getting overly meta, overly abstract, and veering into "How many angels can dance on the head of a pin?" territory. I actually am fascinated with epistemology and want to explore some of these questions more (but in another context that would be more appropriate than this comment thread). I am a bit interested in forecasting, but not fascinated, and I would kind of like to understand it better (I don't understand it very well currently). I would particularly like to understand forecasting better as it pertains to the threshold or demarcation between topics it is rigorous to forecast about, for which there is evidence of efficacy, such as elections or near-term geopolitical events, and topics for which forecasting is unrigorous, not supported by scientific evidence, and probably inappropriate, such as, "What is the probability of the core tenets of Hindu theology, such as the identity of Atman and Brahman, being correct?"
My personal contention is that actually a huge problem with the EA Forum (and also LessWrong, to an even worse extent) is how much time, energy, and attention gets sucked into these highly abstract meta debates. To me, it's like debating whether you should update your probability of whether there's peanut butter in the cupboard based on my stated probability of whether there's peanut butter in the cupboard, when we could just look in the cupboard. The abstract content of that debate is actually pretty damn interesting, and I would like to take an online course on that or something, but that's the indulgent attitude of a philosophy student and not what I think practically matters here. I simply want more people to engage substantively with the object-level points I'm making, e.g. about learning from video data, generalization, data efficiency, scaling limits, and so on. That's "looking in the cupboard". I could be wrong about everything. I could be making basic mistakes. I don't know. What can I do except try to have the conversation?
By the way, when I give my probabilities for something, I am just trying to faithfully and honestly report, as a number, what I think my subjective or qualitative sense of the likelihood of something implies. I am not necessarily making an argument that anyone else should have that same probability. I just want them to talk to me about the object-level issues. The probabilities are a side thing that I need to get out of the way to talk about that. I don't intend my reporting of my best guess at how my intuitions translate into numbers as an insult against anyone. This passes the reversibility test for me: if someone says they think their probability for something is 1,000x higher or lower than mine, I don't interpret that as an insult.
So, I don't think it is impolite for me to express the numbers that are my best guess. I do kind of accept that I will be less persuasive if I say a number that seems too extreme, which is why I've been kind of softballing/sandbagging what I say about this. Also, I think if someone says "significantly less than 1%" or even just "less than 1%" or "1%", that's enough to motivate the discussion of object-level topics and to move on from the guessing-probabilities portion of the conversation. So, it's kind of irrelevant whether I say less than 1 in 1,000, less than 1 in 10,000, less than 1 in 100,000, or less than 1 in 1 million. Yes, I get that these are very different probabilities (each one an order of magnitude lower than the last!), but from the perspective of just hurrying along to the object-level discussion, it doesn't really make a difference.
I almost would be willing to accept that I should sandbag my actual probability even more for the sake of diplomacy and persuasion, and just say "less than 1%" or something like that. But that seems a little bit morally corrupt – maybe "morally corrupt" is too strong, but, hey, I'd rather just be transparent and honest rather than water things down to be more persuasive to people who are very far away from me on this topic. (The question of how to integrate considerations about both diplomacy and frankness into one's communications is another fascinating topic, but also another diversion away from the object-level issues pertaining to the prospects of near-term AGI.)
Some people in this community sometimes like to pretend they don't have feelings and are just calculating machines running through numbers, but the emotion is betrayed anyway. The undercurrent of this conversation is that some people take offense at my views or find them irritating, and I have an incentive to placate them if I want to engage them in conversation. I accept that that is true. I am no diplomat or mediator, and I don't feel particularly competent at persuasion.
My honest motivation for engaging in these debates is mostly sheer boredom, curiosity, and a desire for intellectual enrichment and activity. Yeah, yeah, there is some plausible social benefit or moral reason to do this by course-correcting effective altruism, but I'm kind of 50/50 on my p(doom) for effective altruism anyway, and I think the chances are slim that I'm going to make a dent in that. So, if it were just a grinding chore, I wouldn't do it.
Anyway, all this is to say: please just talk to me about the object-level issues, and try to keep the rest of it (e.g. getting into the weeds of forecasting, open questions in epistemology that philosophers and other experts don't agree on, abstract meta debates) to a minimum, and only bring it up when it's really, really important. (Not just you personally, David; this is my general request.) I'm dying to talk about the object-level issues, and somehow I keep ending up talking about this meta stuff. (I am easily distractible and will talk forever about all sorts of topics, even topics that don't matter and don't relate to the issue I originally wanted to talk about, so it's my fault too.)