76% of experts saying it’s “unlikely” the current paradigm will lead to AGI leaves ample room for a majority thinking there’s a 10%+ chance it will . . . .
. . . . and the field are still mostly against you (at the 10% threshold).
I agree that the “unlikely” statistic leaves ample room for the majority of the field thinking there is a 10%+ chance, but it does not establish that the majority actually thinks that.
I would like to bring back more of the pre-ChatGPT disposition where people were more comfortable emphasizing their uncertainty, but standing by the expected value of AI safety work.
I think there are at least two (potentially overlapping) ways one could take the general concern that @Yarrow Bouchard 🔸 is identifying here. One, if accepted, leads to the substantive conclusion that EA individuals, orgs, and funders shouldn’t be nearly as focused on AI because the perceived dangers are just too remote. An alternative framing doesn’t necessarily lead there. It goes something like this: there has been a significant and worrisome decline in the quality of epistemic practices surrounding AI in EA since the advent of ChatGPT. If that framing, but not the other, is accepted, it leads in my view to a different set of recommended actions.
I flag that since I think the relevant considerations for assessing the alternative framing could be significantly different.
One need not choose between the two, because they both point toward the same path: re-examine claims with greater scrutiny. There is no excuse for the egregious flaws in works like “Situational Awareness” and AI 2027. This is not serious scholarship. To the extent the EA community gets fooled by stuff like this, its reasoning process, and its weighing of evidence, will be severely impaired.
If you get rid of all the low-quality work and retrace all the steps of the argument from the beginning, might the EA community end up in basically the same place all over again, with a similar estimation of AGI risk and a similar allocation of resources toward it? Well, sure, it might. But it might not.
If your views are largely informed by falsehoods and ridiculous claims, half-truths and oversimplifications, greedy reductionism and measurements with little to no construct validity or criterion validity, and, in some cases, a lack of awareness of countervailing ideas or the all-too-eager dismissal of inconvenient evidence, then you simply don’t know what your views would end up being if you started all over again with more rigour and higher standards. The only appropriate response is to clear house. Put the ideas and evidence into a crucible and burn away what doesn’t belong. Then, start from the beginning and see what sort of conclusions can actually be justified with what remains.
A large part of the blame lies at the feet of LessWrong and at the feet of all the people in EA who decided, in some important cases quite early on, to mingle the two communities. LessWrong promotes skepticism and suspicion of academia, mainstream/institutional science, traditional forms of critical thinking and scientific skepticism, journalism, and society at large. At the same time, LessWrong promotes reverence and obsequiousness toward its own community, positioning itself as an alternative authority to replace academia, science, traditional critical thought, journalism, and mainstream culture. Not innocently. LessWrong is obsessed with fringe thinking. The community has created multiple groups that Ozy Brennan describes as “cults”. Given how small the LessWrong community is, I’d have to guess that the rate at which the community creates cults must be multiple orders of magnitude higher than the base rate for the general population.
LessWrong is also credulous about racist pseudoscience, and, in the words of a former Head of Communications at the Centre for Effective Altruism, is largely “straight-up racist”. One of the admins of LessWrong and co-founders of Lightcone Infrastructure once said, in the context of a discussion about the societal myth that gay people are evil or malicious and a danger to children:
I think… finding out (in the 1950s) that someone maintained many secret homosexual relationships for many years is actually a signal the person is fairly devious, and is both willing and capable of behaving in ways that society has strong norms about not doing.
It obviously isn’t true about homosexuals once the norm was lifted, but my guess is that it was at the time accurate to make a directional bayesian update that the person had behaved in actually bad and devious ways.
Such statements make “rationalist” a misnomer. (I was able to partially dissuade him of this nonsense by showing him some of the easily accessible evidence he could have looked up for himself, but the community did not seem to particularly value my intervention.)
I don’t know that the epistemic practices of the EA community can be rescued as long as the EA community remains interpenetrated with LessWrong to a major degree. The purpose of LessWrong is not to teach rationality, but to disable one’s critical faculties until one is willing to accept nonsense. Perhaps it is futile to clamour for better-quality scholarship when such a large undercurrent of the EA community is committed to the idea that normal ideas of what constitutes good scholarship are wrong and that the answers to what constitutes actually good scholarship lie with Eliezer Yudkowsky, an amateur philosopher with no relevant qualifications or achievements in any field, who frequently speaks with absolute confidence and is wrong, who experts often find non-credible, who has said he literally sees himself as the smartest person on Earth, and who rarely admits mistakes (despite making many) or issues corrections. If Yudkowsky is your highest and most revered authority, if you follow him in rejecting academia, institutional science, mainstream philosophy, journalism, normal culture, and so on, then I don’t know what could possibly convince you that the untrue things you believe are untrue, since your fundamental epistemology comes down to whether Yudkowsky says something is true or not, and he’s told you to reject all other sources of truth.
To the extent the EA community is under LessWrong’s spell, it will probably remain systemically irrational forever. Only within the portions of the EA community who have broken that spell, or never come under it in the first place, is there the hope for academic standards, mainstream scientific standards, traditional critical thinking, journalistic fact-checking, culturally evolved wisdom, and so on to take hold. It would be like expecting EA to be rational about politics while 30% of the community is under the spell of QAnon, or to be rational about global health while a large part of the community is under the spell of anti-vaccination pseudoscience. It’s just not gonna happen.
But maybe my root cause analysis is wrong and the EA community can course correct without fundamentally divorcing LessWrong. I don’t know. I hope that, whatever is the root cause, whatever it takes to fix it, the EA community’s current low standards for evidence and argumentation pertaining to AGI risk get raised significantly.
I don’t think it’s a brand new problem, by the way. Around 2016, I was periodically arguing with people about AI on the main EA group on Facebook. One of my points of contention was that MIRI’s focus on symbolic AI was a dead-end and that machine learning had empirically produced much better results, and was where the AI field was now focused. (MIRI took a long time before they finally hired their first researcher to focus on machine learning.) I didn’t have any more success convincing people about that back then than I’ve been having lately with my current points of contention.
I agree though that the situation seems to have gotten much worse in recent years, and ChatGPT (and LLMs in general) probably had a lot to do with that.
I don’t think EA’s AI focus is a product only of interaction with Less Wrong (not claiming you said otherwise), but I do think people outside the Less Wrong bubble tend to be less confident AGI is imminent, and in that sense less “cautious”.
I think EA’s AI focus is largely a product of the fact that Nick Bostrom knew Will and Toby when they were founding EA, and was a big influence on their ideas. Of course, to some degree this might be indirect influence from Yudkowsky, since he was always interacting with Nick Bostrom, but it’s hard to know in what direction the influence flowed here. I was around in Oxford during the embryonic stages of EA, and while I was not involved beyond being a GWWC member, I did have the odd conversation with people who were involved, and my memory is that even then people were talking about X-risk from AI as a serious contender for the best cause area, as early as at least 2014, and maybe a bit before that. They (EDIT: by “they” here I mean “some people in Oxford, I don’t remember who”; I don’t know when Will and Toby specifically first interacted with LW folk) were involved in discussion with LW people, but I don’t think they got the idea FROM LW. Seems more likely to me they got it from Bostrom and the Future of Humanity Institute, who were just down the corridor.
What is true is that Oxford people have genuinely expressed much more caution about timelines. I.e. in What We Owe the Future, published as late as 2022, Will is still talking about how AGI might be more than 50 years away, but also how “it might come soon—within the next fifty or even twenty years.” (If you’re wondering what evidence he cites, it’s the Cotra bioanchors report.) His discussion primarily emphasizes uncertainty about exactly when AGI will arrive, and how we can’t be confident it’s not close. He cites a figure from an Open Phil report guessing an 8% chance of AGI by 2036*. I know your view is that this is all wildly wrong still, but it’s quite different from what many (not all) Less Wrong people say, who tend to regard 20 years as a long timeline. (Maybe Will has updated to shorter timelines since, of course.)
I think there is something of a divide between people who believe strongly in a particular set of LessWrong-derived ideas about the imminence of AGI, and another set of people who are mainly driven by something like “we should take positive EV bets with a small chance of paying off, and do AI stuff just in case AGI arrives soon”. Defending the point about taking positive EV bets with only a small chance of pay-off is what a huge amount of the academic work on Longtermism at the GPI in Oxford was about. (This stuff definitely has been subjected to severe levels of peer-reviewed scrutiny, as it keeps showing up in top philosophy journals with rejection rates of, like, 90%.)
*This is more evidence people were prepared to bet big on AI risk long before the idea that AGI is actually imminent became as popular as it is now. I think people just rejected the idea that useful work could only be done when AGI was definitely near, and we had near-AGI models.
eh, I think the main reason EAs believe AGI stuff is reasonably likely is because this opinion is correct, given the best available evidence[1].
Having a genealogical explanation here is sort of answering the question on the wrong meta-level, like giving a historical explanation for “why do evolutionists believe in genes” or telling a touching story about somebody’s pet pig for “why do EAs care more about farmed animal welfare than tree welfare.”
Or upon hearing “why does Google use ads instead of subscriptions?” answering with the history of their DoubleClick acquisition. That history is real, but it’s downstream of the actual explanation: the economics of internet search heavily favor ad-supported models regardless of the specific path any company took. The genealogy is epiphenomenal.
EDIT: man I’m worried my comment will be read as a soldier-mindset thing that only makes sense if you presume the “AGI likely soon” view is already correct. Which does not improve on the conversation. Please only upvote it iff a version of you that’s neutral on the object-level question would also upvote this comment.
Yeah, it’s a fair objection that even answering the why question like I did presupposes that EAs are wrong, or at least, merely luckily right. (I think this is a matter of degree, and that EAs overrated the imminence of AGI and the risk of takeover on average, but it’s still at least reasonable to believe AI safety and governance work can have very high expected value for roughly the reasons EAs do.) But I was responding to Yarrow who does think that EAs are just totally wrong, so I guess really I was saying that “conditional on a sociological explanation being appropriate, I don’t think it’s as LW-driven as Yarrow thinks”, although LW is undoubtedly important.
presupposes that EAs are wrong, or at least, merely luckily right
Right, to be clear I’m far from certain that the stereotypical “EA view” is right here.
I guess really I was saying that “conditional on a sociological explanation being appropriate, I don’t think it’s as LW-driven as Yarrow thinks”, although LW is undoubtedly important.
Sure that makes a lot of sense! I was mostly just using your comment to riff on a related concept.
I think reality is often complicated and confusing, and it’s hard to separate out contingency vs inevitable stories for why people believe what they believe. But I think the correct view is that EAs’ belief on AGI probability and risk (within an order of magnitude or so) is mostly not contingent (as of the year 2025) even if it turns out to be ultimately wrong.
The Google ads example was the best example I could think of to illustrate this. I’m far from certain that Google’s decision to use ads was actually the best source of long-term revenue (never mind being morally good lol). But it still seemed like the economics of the internet as we understand it made it implausible that Google’s use of ads was counterfactually due to their specific acquisitions.
Similarly, even if EAs ignored AI before for some reason, and never interacted with LW or Bostrom, it’s implausible that, as of 2025, people who are concerned with ambitious, large-scale altruistic impact (and have other epistemic, cultural, and maybe demographic properties characteristic of the movement) would not think of AI as a big deal. AI is just a big thing in the world that’s growing fast. Anybody capable of reading graphs can see that.
That said, specific micro-level beliefs (and maybe macro ones) within EA and AI risk might be different without influence from either LW or the Oxford crowd. For example, there might be a stronger accelerationist arm. Alternatively, people might be more queasy about the closeness with the major AI companies, and there might be a stronger and more well-funded contingent of folks interested in public messaging on pausing or stopping AI. And in general, if the movement hadn’t “woken up” to AI concerns at all pre-ChatGPT, I think we’d be in a more confused spot.
How many angels can dance on the head of a pin? An infinite number because angels have no spatial extension? Or maybe if we assume angels have a diameter of ~1 nanometre plus ~1 additional nanometre of diameter for clearance for dancing we can come up with a ballpark figure? Or, wait, are angels closer to human-sized? When bugs die do they turn into angels? What about bacteria? Can bacteria dance? Are angels beings who were formerly mortal, or were they “born” angels?[1]
AI is just a big thing in the world that’s growing fast. Anybody capable of reading graphs can see that.
Well, some of the graphs are just made-up, like those in “Situational Awareness”, and some of the graphs are woefully misinterpreted to be about AGI when they’re clearly not, like the famous METR time horizon graph.[2] I imagine that a non-trivial amount of EA misjudgment around AGI results from a failure to correctly read and interpret graphs.
And, of course, when people like titotal examine the math behind some of these graphs, like those in AI 2027, they are sometimes found to be riddled with major mistakes.
What I said elsewhere about AGI discourse in general is true about graphs in particular: the scientifically defensible claims are generally quite narrow, caveated, and conservative. The claims that are broad, unqualified, and bold are generally not scientifically defensible. People at METR themselves caveat the time horizons graph and note its narrow scope (I cited examples of this elsewhere in the comments on this post). Conversely, graphs that attempt to make a broad, unqualified, bold claim about AGI tend to be complete nonsense.
Out of curiosity, roughly what probability would you assign to there being an AI financial bubble that pops sometime within the next five years or so? If there is an AI bubble and if it popped, how would that affect your beliefs around near-term AGI?
How is correctness physically instantiated in space and time and how does it physically cause physical events in the world, such as speaking, writing, brain activity, and so on? Is this an important question to ask in this context? Do we need to get into this?
You can take an epistemic practice in EA such as “thinking that Leopold Aschenbrenner’s graphs are correct” and ask about the historical origin of that practice without making a judgement about whether the practice is good or bad, right or wrong. You can ask the question in a form like, “How did people in EA come to accept graphs like those in ‘Situational Awareness’ as evidence?” If you want to frame it positively, you could ask the question as something like, “How did people in EA learn to accept graphs like these as evidence?” If you want to frame it negatively, you could ask, “How did people in EA not learn not to accept graphs like these as evidence?” And of course you can frame it neutrally.
The historical explanation is a separate question from the evaluation of correctness/incorrectness and the two don’t conflict with each other. By analogy, you can ask, “How did Laverne come to believe in evolution?” And you could answer, “Because it’s the correct view,” which would be right, in a sense, if a bit obtuse, or you could answer, “Because she learned about evolution in her biology classes in high school and college”, which would also be right, and which would more directly answer the question. So, a historical explanation does not necessarily imply that a view is wrong. Maybe in some contexts it insinuates it, but both kinds of answers can be true.
Do you know a source that formally makes the argument that the METR graph is about AGI? I am trying to pin down the series of logical steps that people are using to get from that graph to AGI. I would like to spell out why I think this inference is wrong, but first it would be helpful to see someone spell out the inference they’re making.
Upvoted because I think this is interesting historical/intellectual context, but I think you might have misunderstood what I was trying to say in the comment you replied to. (I joined Giving What We Can in 2009 and got heavily involved in my university EA group from 2015-2018, so I’m aware that AI has been a big topic in EA for a very long time, but I’ve never had any involvement with Oxford University or had any personal connections with Toby Ord or Will MacAskill, besides a few passing online interactions.)
In my comment above, I wasn’t saying that EA’s interpenetration with LessWrong is largely to blame for the level of importance that the ideas of near-term AGI and AGI risk currently have in EA. (I also think that is largely true, but that wasn’t the point of my previous comment.) I was saying that the influence of LessWrong and EA’s embrace of the LessWrong subculture is largely to blame for the EA community accepting ridiculous stuff like “Situational Awareness”, AI 2027, and so on, despite it having glaring flaws.
Focusing on AGI risk at the level EA currently gives it could be rational, or it might not be. What is definitely true is that the EA community accepts a lot of completely irrational stuff related to AGI risk. LessWrong doesn’t believe in academia, institutional science, academic philosophy, journalism, scientific skepticism, common sense, and so on. LessWrong believes in Eliezer Yudkowsky, the Sequences, and LessWrong. So, members of the LessWrong community go completely off the rails and create or join cults at seemingly a much, much higher rate than the general population. Because they’ve been coached to reject the foundations of sanity that most people have, and to put their trust and belief in this small, fringe community.
The EA community is not nearly as bad as LessWrong. If I thought it was as bad, I wouldn’t bother trying to convince anyone in EA of anything, because I would think they were beyond rational persuasion. But EA has been infected to a very significant degree by the LessWrong irrationality. I think the level of emphasis that EA puts on subjective guesses as a source of truth and an accompanying sort of lazy, incurious approach to inquiry (why look stuff up or attempt to create a rigorous, defensible thesis when you can just guess stuff?) is one example of the LessWrong influence. Eliezer Yudkowsky quite literally, explicitly believes that his subjective guesses are a better guide to truth than the approach of traditional, mainstream scientific institutions and communities. Yudkowsky has attempted to teach his approach to subjectively guessing things to the LessWrong community (and enjoyers of Harry Potter fanfic). That approach has leeched into the EA community.
The result is you can have things like “Situational Awareness” and AI 2027 where the “data” is made-up and just consists of some random people’s random subjective guesses. This is the kind of stuff that should never be taken even a little bit seriously.
If you want to know which approach produces better results, look at the achievements of academic science — which underlie basically the entire modern world — versus the achievements of the LessWrong community — some Harry Potter fanfic and about half a dozen cults, despite believing their approach is unambiguously superior. If you adjust for time and population, the comparison still comes out favourably for science versus Yudkowskian subjective guessology. How many towns of under 5,000 people create even a single cult within even the span of 50 years? Versus the LessWrong community creating multiple cults within 16 years of its existence.
I could be totally wrong in my root cause analysis. EA may have developed these bad habits independently of LessWrong. In any case, I think it’s clear that these are bad habits, that they lead nowhere good, and that EA should clear house (i.e. stop believing in subjective guess-based or otherwise super low-quality argumentative writing) and raise the bar for the quality of arguments and evidence that are taken seriously to something a bit closer to the academic or scientific level.
I don’t have an idyllic view of academia. I don’t think it’s all black-and-white. I recently re-read a review of Colin McGinn’s ridiculous book on the philosophy of physics. On one hand, the descriptions of and quotes from the book reminded me of all the stuff that drives me crazy in academic philosophy. On the other hand, the reviewer is a philosopher and her review is published in a philosophy journal. So, there’s a push and pull.
Maybe a good analogy for academia is liberal democracy. It’s often a huge mess, full of ongoing conflicts and struggles, frequently unjust and unreasonable, but ultimately it produces an astronomical amount of value, rivalling the best of anything humans have ever done. By vouching for academia or liberal democracy, I’m not saying it’s all good, I’m just saying that the overall process is good. And the process itself (in both cases) can surely be improved, but through reform and evolution involving a lot of people with expertise, not by a charismatic outsider with a zealous following (e.g. illiberal/authoritarian strongmen, in the case of government, or someone like Yudkowsky, in the case of academia, who, incidentally, has a bit of an authoritarian attitude, not politically, but intellectually).
Can you say more about what makes something “a subjective guess” for you? When you say well under 0.05% chance of AGI in 10 years, is that a subjective guess?
Like, suppose I am asked, as a pro-forecaster, to say whether the US will invade Syria after a US military build-up involving aircraft carriers in the Eastern Med, and I look for newspaper reports of signs of this, look up the base rate of how often the US bluffs with a military build-up rather than invading, and then make a guess as to how likely an invasion is. Is that “a subjective guess”? Or am I relying on data?

What about if I am doing what AI 2027 did and trying to predict when LLMs match human coding ability on the basis of current data? Suppose I use the METR data like they did, and I do the following. I assume that if AIs are genuinely able to complete 90% of real-world tasks that take human coders 6 months, then they are likely as good at coding as humans. I project the METR data out to find a date for when we will hit 6-month tasks, theoretically if the trend continues. But then, instead of stopping, and saying that is my forecast, I remember that benchmark performance is generally a bit misleading in terms of real-world competence, and remember METR found that AIs often couldn’t complete more realistic versions of the tasks which the benchmark counted them as passing. (Couldn’t find a source for this claim, but I remember seeing it somewhere.) I decide that the point when models will hit a 90% real-world completion rate on 6-month tasks should maybe be a couple more doubling times of the 90% time-horizon METR metric forward. I move my forecast for human-level coders to, say, 15 months after the original to reflect this. Am I making a subjective guess, or relying on data? When I made the adjustment to reflect issues about construct validity, did that make my forecast more subjective? If so, did it make it worse, or did it make it better? I would say better, and I think you’d probably agree, even if you still think the forecast is bad.
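To make the arithmetic of that hypothetical concrete, here is a minimal sketch of the kind of extrapolation I have in mind. Every number in it (the current time horizon, the doubling time, the start date, the two-doubling construct-validity discount) is an illustrative placeholder, not METR’s actual data:

```python
from datetime import date, timedelta
import math

# Illustrative placeholders only -- not METR's actual figures.
current_horizon_hours = 2.0         # assumed current time-horizon metric
doubling_time_days = 210            # assumed doubling time of that metric
target_horizon_hours = 6 * 30 * 8   # "6-month" human tasks at ~8 working hours/day
start = date(2025, 1, 1)            # assumed date of the current measurement

# Naive projection: how many doublings until the metric reaches the 6-month threshold?
doublings = math.log2(target_horizon_hours / current_horizon_hours)
naive_forecast = start + timedelta(days=doublings * doubling_time_days)

# Construct-validity adjustment: push the forecast out by a couple more doublings,
# reflecting the worry that benchmark performance overstates real-world competence.
adjusted_forecast = naive_forecast + timedelta(days=2 * doubling_time_days)

print(naive_forecast, adjusted_forecast)
```

The interesting step, for the purposes of the question above, is the final adjustment: it is plainly a judgment call, and it plainly changes the number the data alone would have given.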
This geopolitical example here is not particularly hypothetical. I genuinely get paid to do this for Good Judgment, and not ONLY by EA orgs, although often it is by them. We don’t know who the clients are, but some questions have been clearly commercial in nature and of zero EA interest.
I’m not particularly offended* if you think this kind of “get allegedly expert forecasters, rather than or as well as domain experts, to predict stuff” is nonsense. I do it because people pay me and it’s great fun, rather than because I have seriously investigated its value. But I do disagree with the idea that this is distinctively a Less Wrong rationalist thing. There’s a whole history of relatively well-known work on it by the American political scientist Philip Tetlock that I think began when Yudkowsky was literally still a child. It’s out of that work that Good Judgment, the org for which I work as a forecaster, comes, not anything to do with Less Wrong. It’s true that LessWrong rationalists are often enthusiastic about it, but that’s not all that interesting on its own. (In general many Yudkowskian ideas actually seem derived from quite mainstream sources on rationality and decision-making to me. I would not reject them just because you don’t like what LW does with them. Bayesian epistemology is a real research program in philosophy for example.)
*Or at least, I am trying my best not to be offended, because I shouldn’t be, but of course I am human and objectivity about something I derive status and employment from is hard. Though I did have a cool conversation at the last EAG London with a very good forecaster who thought it was terrible that Open Phil put money into forecasting because it just wasn’t very useful or important.
What does the research literature say about the accuracy of short-term (e.g. 1-year timescales) geopolitical forecasting?
And what does the research literature say about the accuracy of long-term (e.g. longer than 5-year timescales) forecasting about technological progress?
(Should you even bother to check the literature to find out, or should you just guess how accurate you think each one probably is and leave it at that?)
When you say well under 0.05% chance of AGI in 10 years, is that a subjective guess?
Of course. And I’ll add that I think such guesses, including my own, have very little meaning or value. It may even be worse to make them than to not make them at all.
But then, instead of stopping, and saying that is my forecast, I remember that benchmark performance is generally a bit misleading in terms of real-world competence, and remember METR found that AIs often couldn’t complete more realistic versions of the tasks which the benchmark counted them as passing. (Couldn’t find a source for this claim, but I remember seeing it somewhere.)
This seems like a huge understatement. My impression is that the construct validity and criterion validity of the benchmarks METR uses, i.e. how much benchmark performance translates into real world performance, is much worse than you describe.
I think it would be closer to the truth to say that if you’re trying to predict when AI systems will replace human coders, the benchmarks are meaningless and should be completely ignored. I’m not saying that’s the absolute truth, just that it’s closer to the truth than saying benchmark performance is “generally a bit misleading in terms of real-world competence”.
Probably there’s some loose correlation between benchmark performance and real-world competence, but it’s not nearly one-to-one.
I decide that the point when models will hit a 90% real-world completion rate on 6-month tasks should maybe be a couple more doubling times of the 90% time-horizon METR metric forward. I move my forecast for human-level coders to, say, 15 months after the original to reflect this. Am I making a subjective guess, or relying on data?
Definitely making a subjective guess. For example, what if performance on benchmarks simply never generalizes to real world performance? Never, ever, ever, not in a million years never?
By analogy, what level of performance on go would AlphaGo need to achieve before you would guess it would be capable of baking a delicious croissant? Maybe these systems just can’t do what you’re expecting them to do. And a chart can’t tell you whether that’s true or not.
What about if I am doing what AI 2027 did and trying to predict when LLMs match human coding ability on the basis of current data?
AI 2027 admits the role that gut intuition plays in their forecast. For example:
Disclaimer added Dec 2025: This forecast relies substantially on intuitive judgment, and involves high levels of uncertainty. Unfortunately, we believe that incorporating intuitive judgment is necessary to forecast timelines to highly advanced AIs, since there simply isn’t enough evidence to extrapolate conclusively.
An example intuition:
Intuitively it feels like once AIs can do difficult long-horizon tasks with ground truth external feedback, it doesn’t seem that hard to generalize to more vague tasks. After all, many of the sub-tasks of the long-horizon tasks probably involved using similar skills.
Okay, and what if it is hard? What if this kind of generalization is beyond the capabilities of current deep learning/deep RL systems? What if it takes 20+ years of research to figure out? Then the whole forecast is out the window.
What’s the reward signal for vague tasks? This touches on open research problems that have existed in deep RL for many years. Why is this going to be fully solved within the next 2-4 years? Because “intuitively, it feels like” it will be?
Another example is online learning, which is a form of continual learning. AI 2027 highlights this capability:
Agent-2, more so than previous models, is effectively “online learning,” in that it’s built to never really finish training. Every day, the weights get updated to the latest version, trained on more data generated by the previous version the previous day.
But I can’t find anywhere else in any of the AI 2027 materials where they discuss online learning or continual learning. Are they thinking that online learning will not be one of the capabilities humans will have to invent? That AI will be able to invent online learning without first needing online learning to be able to invent such things? What does the scenario actually assume about online learning? Is it important or not? Is it necessary or unnecessary? And will it be something humans invent or AI invents?
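For readers who haven’t encountered the term, here is a minimal conceptual sketch of the distinction at issue: a model trained once and deployed with frozen weights, versus the never-finished-training loop the Agent-2 passage describes, where each day’s weights are trained on data generated by the previous day’s version. The functions are stubs standing in for processes the scenario leaves unspecified; this is not how any actual lab implements it:

```python
# Conceptual sketch only; train(), generate_data(), and deploy() are stubs
# standing in for processes the AI 2027 scenario does not specify.

def train(weights, data):
    return weights + [data]          # stub: "update" the weights on new data

def generate_data(weights, day):
    return f"data-from-day-{day}"    # stub: the model produces new training data

def deploy(weights):
    pass                             # stub: serve the current weights

# Standard setup: train once, then the deployed weights stay frozen.
frozen_weights = train([], "pretraining-corpus")
for day in range(365):
    deploy(frozen_weights)           # no learning after deployment

# "Online learning" as described in the quoted passage: never really finished
# training; each day's weights are trained on data generated the previous day.
weights = train([], "pretraining-corpus")
for day in range(365):
    new_data = generate_data(weights, day)
    weights = train(weights, new_data)
    deploy(weights)
```

Whether and how the second loop can actually be made to work is exactly the open question the scenario glosses over.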
When I tried to find what the AI 2027 authors have said about this, I found an 80,000 Hours Podcast interview where Daniel Kokotajlo said a few things about online learning, such as the following:
Luisa Rodriguez: OK. So it sounds like some people will think that these persistent deficiencies will be long-term bottlenecks. And you’re like, no, we’ll just pour more resources into the thing doing the thing that it does well, and that will get us a long way to —
Daniel Kokotajlo: Probably. To be clear, I’m not confident. I would say that there’s like maybe a 30% or 40% chance that something like this is true, and that the current paradigm basically peters out over the next few years. And probably the companies still make a bunch of money by making iterations on the current types of systems and adapting them for specific tasks and things like that.
And then there’s a question of when will the data efficiency breakthroughs happen, or when will the online learning breakthroughs happen, or whatever the thing is. And then this is an incredibly wealthy industry right now, and paradigm shifts of this size do seem to be happening multiple times a decade, arguably: think about the difference between the current AIs and the AIs of 2015. The whole language model revolution happened five years ago, the whole scaling laws thing like six, seven years ago. And now also AI agents — training the AIs to actually do stuff over long periods — that’s happening in the last year.
So it does feel to me like even if the literal, exact current paradigm plateaus, there’s a strong chance that sometime in the next decade — maybe 2033, maybe 2035, maybe 2030 — the huge amount of money and research going into overcoming these bottlenecks will succeed in overcoming these bottlenecks.
The other things Kokotajlo says in the interview about online learning and data efficiency are equally hazy and hand-wavy. It just comes down to his personal gut intuition. In the part I just quoted, he says maybe these fundamental research breakthroughs will happen in 2030-2035, but what if it’s more like 2070-2075, or 2130-2135? How would one come to know such a thing?
What historical precedent or scientific evidence do we have to support the idea that anyone can predict, with any accuracy, the time when new basic science will be discovered? As far as I know, this is not possible. So, what’s the point of AI 2027? Why did the authors write it and why did anyone other than the authors take it seriously?
nostalgebraist originally made this critique here, very eloquently.
(In general many Yudkowskian ideas actually seem derived from quite mainstream sources on rationality and decision-making to me. I would not reject them just because you don’t like what LW does with them. Bayesian epistemology is a real research program in philosophy for example.)
It can easily be true both that Yudkowsky’s ideas are loosely derived from or inspired by ideas that make sense and that Yudkowsky’s own ideas don’t make a lick of sense themselves. I don’t think most self-identified Bayesians outside of the LessWrong community would agree with Yudkowsky’s rejection of institutional science, for instance. Yudkowsky’s irrationality says nothing about whether (the mainstream version of) Bayesianism is a good idea or not; whether (the mainstream version of) Bayesianism, or other ideas Yudkowsky draws from, are a good idea or not says nothing about whether Yudkowsky’s ideas are irrational.
By analogy, pseudoscience and crackpot physics are often loosely derived from or inspired by ideas in mainstream science. The correctness of mainstream science doesn’t imply the correctness of pseudoscience or crackpot physics. Conversely, the incorrectness of pseudoscience or crackpot physics doesn’t imply the incorrectness of mainstream science. It wouldn’t be a defense of a crackpot physics theory that it’s inspired by legitimate physics, and the legitimacy of the ideas Yudkowsky is drawing from isn’t a defense of Yudkowsky’s bizarre views.
I think forecasting is perfectly fine within the limitations that the scientific research literature on forecasting outlines. I think Yudkowsky’s personal twist on Aristotelian science or subjectively guessing which scientific propositions are true or false and then assuming he’s right (without seeking empirical evidence) because he thinks he has some kind of nearly superhuman intelligence — I think that’s absurd and that’s obviously not what people like Philip Tetlock have been advocating.
I was not trying to defend:
The personal honor of Yudkowsky, who I’ve barely read and don’t much like, or his influence on other people’s intellectual style. I am not a rationalist, though I’ve met some impressive people who probably are.
The specific judgment calls and arguments made in AI 2027.
Using the METR graph to forecast superhuman coders (even if I probably do think this is MORE reasonable than you do, I’m not super-confident about its validity as a measure of real-world coding. But I was not trying to describe how I personally would forecast superhuman coders, just to give a hypothetical case where making a forecast more “subjective” plausibly improves it.)
Rather what I took myself to be saying was:
Judgmental forecasting is not particularly a LW thing, and it is what AI2027 was doing, whether or not they were doing it well.
You can’t really avoid what you are calling “subjectivity” when doing judgmental forecasting, at least if that means not just projecting a trend in data and having done with it, but instead letting qualitative considerations affect the final number you give.
Sometimes it would clearly make a forecast better to make it more “subjective” if that just means less driven only by a projection of a trend in data into the future.
In predicting a low chance of AGI in the near term, you are also just making an informed guess influenced by data but also by qualitative considerations, argument, gut instinct etc. At that level of description, your forecast is just as “made up” as AI2027. (But of course this is completely compatible with the claim that some of AI2027’s specific guesses are not well-justified enough or implausible.)
Now, it may be that forecasting is useless here, because no one can predict how technology will develop five years out. But I’m pretty comfortable saying that if THAT is your view, then you really shouldn’t also be super-confident the chance of near-term AGI is low. Though I do think saying “this just can’t be forecasted reliably” on its own is consistent with criticizing people who are confident AGI is near.
Strong upvoted. Thank you for clarifying your views. That’s helpful. We might be getting somewhere.
With regard to AI 2027, I get the impression that a lot of people in EA and in the wider world were not initially aware that AI 2027 was an exercise in judgmental forecasting. The AI 2027 authors did not sufficiently foreground this in the presentation of their “results”. I would guess there are still a lot of people in EA and outside it who think AI 2027 is something more rigorous, empirical, quantitative, and/or scientific than a judgmental forecasting exercise.
I think this was a case of some people in EA being fooled or tricked (even if that was not the authors’ intention). They didn’t evaluate the evidence they were looking at properly. You were quick to agree with my characterization of AI 2027 as a forecast based on subjective intuitions. However, in one previous instance on the EA Forum, I also cited nostalgebraist’s eloquent post and made essentially the same argument I just made, and someone strongly disagreed. So, I think people are just getting fooled, thinking that evidence exists that really doesn’t.
What does the forecasting literature say about long-term technology forecasting? I’ve only looked into it a little bit, but generally technology forecasting seems really inaccurate, and the questions forecasters/experts are being asked in those studies seem way easier than forecasting something like AGI. So, I’m not sure there is a credible scientific basis for the idea of AGI forecasting.
I have been saying from the beginning and I’ll say once again that my forecast of the probability and timeline of AGI is just a subjective guess and there’s a high level of irreducible uncertainty here. I wish that people would stop talking so much about forecasting and their subjective guesses. This eats up an inordinate portion of the conversation, despite its low epistemic value and credibility. For months, I have been trying to steer the conversation away from forecasting toward object-level technical issues.
Initially, I didn’t want to give any probability, timeline, or forecast, but I realized the only way to be part of the conversation in EA is to “play the game” and say a number. I had hoped that would only be the beginning of the conversation, not the entire focus of the conversation forever.
You can’t squeeze Bayesian blood from a stone of uncertainty. You can’t know what you can’t know by an act of sheer will. Most discussion of AGI forecasting is wasted effort. Most of it is mostly pointless.
What is not pointless is understanding the object-level technical issues better. If anything helps with AGI forecasting accuracy (and that’s a big “if”), this will. But it also has other important advantages, such as:
Helping us understand what risks AGI might or might not pose
Helping us understand what we might be able to do, if anything, to prepare for AGI, and what we would need to know to usefully prepare
Getting a better sense of what kinds of technical or scientific research might be promising to fund in order to advance fundamental AI capabilities
Understanding the economic impact of generative AI
Possibly helping to inform a better picture of how the human mind works
And more topics besides these.
I would consider it a worthy contribution to the discourse to play some small part in raising the overall knowledge level of people in EA about the object-level technical issues relevant to the AI frontier and to AGI. Based on track records, technology forecasting may be mostly forlorn, but, based on track records, science certainly isn’t forlorn. Focusing on the science of AI rather than on an Aristotelian approach would be a beautiful return to Enlightenment values, away from the anti-scientific/anti-Enlightenment thinking that pervades much of this discourse.
By the way, in case it’s not already clear, saying there is a high level of irreducible uncertainty does not support funding whatever AGI-related research program people in EA might currently feel inclined to fund. The number of possible ways the mind could work and the number of possible paths the future could take is large, perhaps astronomically large, perhaps infinite. To arbitrarily seize on one and say that’s the one, pour millions of dollars into that — that is not justifiable.
I think what you are saying here is mostly reasonable, even if I am not sure how much I agree: it seems to turn on very complicated issues in the philosophy of probability/decision theory, and what you should do when accurate prediction is hard, and exactly how bad predictions have to be to be valueless. Having said that, I don’t think you’re going to succeed in steering the conversation away from forecasts if you keep writing about how unlikely it is that AGI will arrive near term. Which you have done a lot, right?
I’m genuinely not sure how much EA funding for AI-related stuff even is wasted on your view. To a first approximation, EA is what Moskovitz and Tuna fund. When I look at Coefficient’s (i.e. what was previously Open Phil’s) 7 most recent AI safety and governance grants, here’s what I find:
1) A joint project of METR and RAND to develop new ways of assessing AI systems for risky capabilities.
2) “AI safety workshop field building” by BlueDot Impact
3) An AI governance workshop at ICML
4) “General support” for the Center for Governance of AI.
5) A “study on encoded reasoning in LLMs at the University of Maryland”
So is this stuff bad or good on the worldview you’ve just described? I have no idea, basically. None of it is forecasting; plausibly it all broadly falls under either empirical research on current and very near-future models, training new researchers, or governance stuff, though that depends on what “research on misalignment” means. But of course, you’d only endorse it if it is good research. If you are worried about lack of academic credibility specifically, as far as I can tell 7 out of the 20 most recent grants are to academic research in universities. It does seem pretty obvious to me that significant ML research goes on at places other than universities, though, not least the frontier labs themselves.
I don’t really know all the specifics of all the different projects and grants, but my general impression is that very little (if any) of the current funding makes sense or can be justified if the goal is to do something useful about AGI (as opposed to, say, make sure Claude doesn’t give risky medical advice). Absent concerns about AGI, I don’t know if Coefficient Giving would be funding any of this stuff.
To make it a bit concrete, there are at least five different proposed pathways to AGI, and I imagine the research Coefficient Giving funds is only relevant to one of the five pathways, if it’s even relevant to that one. But the number five is arbitrary here. The actual decision-relevant number might be a hundred, or a thousand, or a million, or infinity. It just doesn’t feel meaningful or practical to try to map out the full space of possible theories of how the mind works and apply the precautionary principle against the whole possibility space. Why not just do science instead?
By word count, I think I’ve written significantly more about object-level technical issues relevant to AGI than directly about AGI forecasts or my subjective guesses of timelines or probabilities. The object-level technical issues are what I’ve tried to emphasize. Unfortunately, commenters seem fixated on surveys, forecasts, and bets, and don’t seem to be as interested in the object-level technical topics. I keep trying to steer the conversation in a technical direction. But people keep wanting to steer it back toward forecasting, subjective guesses, and bets.
My post “Frozen skills aren’t general intelligence” mainly focuses on object-level technical issues, including some of the research problems discussed in the other post. You have the top comment on that post (besides SummaryBot) and your comment is about a forecasting survey.
People on the EA Forum are apparently just really into surveys, bets, and forecasts.
The forum is kind of a bit dead generally, for one thing.
I don’t really get on what grounds you are saying that the Coefficient grants are not going to people to do science, apart from the governance ones. I also think you are switching back and forth between: “No one knows when AGI will arrive, best way to prepare just in case is more normal AI science” and “we know that AGI is far, so there’s no point doing normal science to prepare against AGI now, although there might be other reasons to do normal science.”
If we don’t know which of infinite or astronomically many possible theories about AGI are more likely to be correct than the others, how can we prepare?
Maybe alignment techniques conceived based on our current wrong theory make otherwise benevolent and safe AGIs murderous and evil on the correct theory. Or maybe they’re just inapplicable. Who knows?
Not everything being funded here even IS alignment techniques, but also, insofar as you just want a generally better understanding of AI as a domain through science, why wouldn’t you learn useful stuff from applying techniques to current models? If the claim is that current models are too different from any possible AGI for this info to be useful, why do you think “do science” would help prepare for AGI at all? Assuming you do think that, which still seems unclear to me.
You might learn useful stuff about current models from research on current models, but not necessarily anything useful about AGI (except maybe in the slightest, most indirect way). For example, I don’t know if anyone thinks that, if we had invested 100x or 1,000x more into research on symbolic AI systems 30 years ago, we would know meaningfully more about AGI today. So, as you anticipated, the relevance of this research to AGI depends on an assumption about the similarity between a hypothetical future AGI and current models.
However, even if you think AGI will be similar to current models, or it might be similar, there might be no cost to delaying research related to alignment, safety, control, preparedness, value lock-in, governance, and so on until more fundamental research progress on capabilities has been made. If in five or ten or fifteen years or whatever we understand much better how AGI will be built, then a single $1 million grant to a few researchers might produce more useful knowledge about alignment, safety, etc. than Dustin Moskovitz’s entire net worth would produce today if it were spent on research into the same topics.
My argument about “doing basic science” vs. “mitigating existential risk” is that these collapse into the same thing unless you make very specific assumptions about which theory of AGI is correct. I don’t think those assumptions are justifiable.
Put it this way: let’s say we are concerned that, for reasons due to fundamental physics, the universe might spontaneously end. But we also suspect that, if this is true, there may be something we can do to prevent it. What we want to know is a) if the universe is in danger in the first place, b) if so, how soon, and c) if so, what we can do about it.
To know any of these three things, (a), (b), or (c), we need to know which fundamental theory of physics is correct, and what the fundamental physical properties of our universe are. Problem is, there are half a dozen competing versions of string theory, and within those versions, the number of possible variations that could describe our universe is astronomically large, 10^500, or 10^272,000, or possibly even infinite. We don’t know which variation correctly describes our universe.
Plus, a lot of physicists say string theory is a poorly conceived theory in the first place. Some offer competing theories. Some say we just don’t know yet. There’s no consensus. Everybody disagrees.
What does the “existential risk” framing get us? What action does it recommend? How does the precautionary principle apply? Let’s say you have a $10 billion budget. How do you spend it to mitigate existential risk?
I don’t see how this doesn’t just loop all the way back around to basic science. Whether there’s an existential risk, and if so, when we need to worry about it, and if when the time comes, what we can do about it, are all things we can only know if we figure out the basic science. How do we figure out the basic science? By doing the basic science. So, your $10 billion budget will just go to funding basic science, the same physics research that is getting funded anyway.
The space of possible theories about how the mind works contains at least six, plus a lot of people saying we just don’t know yet, and there are probably silly but illustrative ways to formulate it where you get very large numbers.
For instance, if we think the correct theory can be summed up in just 100 bits of information, then the number of possible theories is 2^100, or roughly 10^30.
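A quick check of that count, which is just the toy arithmetic and nothing more:

```python
# A 100-bit description has 2 settings per bit, so there are 2**100 distinct
# descriptions, i.e. roughly 1.3e30 candidate "theories" in this toy framing.
print(2**100)           # 1267650600228229401496703205376
print(f"{2**100:.1e}")  # ~1.3e+30
```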
Or we could imagine what would happen if we paid a very large number of experts from various relevant fields (e.g. philosophy, cognitive science, AI) a lot of money to spend a year coming up with a one-to-two-page description of as many original, distinct, even somewhat plausible or credible theories as they could think of. Then we group together all the submissions that were similar enough and counted them as the same theory. How many distinct theories would we end up with? A handful? Dozens? Hundreds? Thousands?
I’m aware these thought experiments are ridiculous, but I’m trying to emphasize the point that the space of possible ideas seems very large. At the frontier of knowledge in a domain like the science of the mind, which largely exists in a pre-scientific or protoscientific or pre-paradigmatic state, trying to actually map out the space of theories that might possibly be correct is a daunting task. Doing that well, to a meaningful extent, ultimately amounts to actually doing the science or advancing the frontier of knowledge yourself.
What is the right way to apply the precautionary principle in this situation? I would say the precautionary principle isn’t the right way to think about it. We would like to be precautionary, but we don’t know enough to know how to be. We’re in a situation of fundamental, wide-open uncertainty, at the frontier of knowledge, in a largely pre-scientific state of understanding about the nature of the mind and intelligence. So, we don’t know how to reduce risk — for example, our ideas on how to reduce risk might do nothing or they might increase risk.
I agree that the “unlikely” statistic leaves ample room for the majority of the field thinking there is a 10%+ chance, but it does not establish that the majority actually thinks that.
I think there are at least two (potentially overlapping) ways one could take the general concern that @Yarrow Bouchard 🔸 is identifying here. One, if accepted, leads to the substantive conclusion that EA individuals, orgs, and funders shouldn’t be nearly as focused on AI because the perceived dangers are just too remote. An alternative framing doesn’t necessarily lead there. It goes something like there has been a significant and worrisome decline in the quality of epistemic practices surrounding AI in EA since the advent of ChatGPT. If it—but not the other—framing is accepted, it leads in my view to a different set of recommended actions.
I flag that since I think the relevant considerations for assessing the alternative framing could be significantly different.
One not need choose between the two because they both point toward the same path: re-examine claims with greater scrutiny. There is no excuse for the egregious flaws in works like “Situational Awareness” and AI 2027. This is not serious scholarship. To the extent the EA community gets fooled by stuff like this, its reasoning process, and its weighing of evidence, will be severely impaired.
If you get rid of all the low-quality work and retrace all the steps of the argument from the beginning, might the EA community end up in basically the same place all over again, with a similar estimation of AGI risk and a similar allocation of resources toward it? Well, sure, it might. But it might not.
If your views are largely informed by falsehoods and ridiculous claims, half-truths and oversimplifications, greedy reductionism and measurements with little to no construct validity or criterion validity, and, in some cases, a lack of awareness of countervailing ideas or the all-too-eager dismissal of inconvenient evidence, then you simply don’t know what your views would end up being if you started all over again with more rigour and higher standards. The only appropriate response is to clear house. Put the ideas and evidence into a crucible and burn away what doesn’t belong. Then, start from the beginning and see what sort of conclusions can actually be justified with what remains.
A large part of the blame lies at the feet of LessWrong and at the feet of all the people in EA who decided, in some important cases quite early on, to mingle the two communities. LessWrong promotes skepticism and suspicion of academia, mainstream/institutional science, traditional forms of critical thinking and scientific skepticism, journalism, and society at large. At the same time, LessWrong promotes reverence and obsequence toward its own community, positioning itself as an alternative authority to replace academia, science, traditional critical thought, journalism, and mainstream culture. Not innocently. LessWrong is obsessed with fringe thinking. The community has created multiple groups that Ozy Brennan describes as “cults”. Given how small the LessWrong community is, I’d have to guess that the rate at which the community creates cults must be multiple orders of magnitude higher than the base rate for the general population.
LessWrong is also credulous about racist pseudoscience, and, in the words of a former Head of Communications at the Centre for Effective Altruism, is largely “straight-up racist”. One of the admins of LessWrong and co-founders of Lightcone Infrastructure once said, in the context of a discussion about the societal myth that gay people are evil or malicious and a danger to children:
Such statements make “rationalist” a misnomer. (I was able to partially dissuade him of this nonsense by showing him some of the easily accessible evidence he could have looked up for himself, but the community did not seem to particularly value my intervention.)
I don’t know that the epistemic practices of the EA community can be rescued as long as the EA community remains interpenetrated with LessWrong to a major degree. The purpose of LessWrong is not to teach rationality, but to disable one’s critical faculties until one is willing to accept nonsense. Perhaps it is futile to clamour for better-quality scholarship when such a large undercurrent of the EA community is committed to the idea that normal ideas of what constitutes good scholarship are wrong and that the answers to what constitutes actually good scholarship lie with Eliezer Yudkowsky, an amateur philosopher with no relevant qualifications or achievements in any field, who frequently speaks with absolute confidence and is often wrong, whom experts often find non-credible, who has said he literally sees himself as the smartest person on Earth, and who rarely admits mistakes (despite making many) or issues corrections. If Yudkowsky is your highest and most revered authority, if you follow him in rejecting academia, institutional science, mainstream philosophy, journalism, normal culture, and so on, then I don’t know what could possibly convince you that the untrue things you believe are untrue, since your fundamental epistemology comes down to whether Yudkowsky says something is true or not, and he’s told you to reject all other sources of truth.
To the extent the EA community is under LessWrong’s spell, it will probably remain systemically irrational forever. Only within the portions of the EA community that have broken that spell, or never came under it in the first place, is there hope for academic standards, mainstream scientific standards, traditional critical thinking, journalistic fact-checking, culturally evolved wisdom, and so on to take hold. It would be like expecting EA to be rational about politics while 30% of the community is under the spell of QAnon, or to be rational about global health while a large part of the community is under the spell of anti-vaccination pseudoscience. It’s just not gonna happen.
But maybe my root cause analysis is wrong and the EA community can course-correct without fundamentally divorcing LessWrong. I don’t know. I hope that, whatever the root cause is and whatever it takes to fix it, the EA community’s current low standards for evidence and argumentation pertaining to AGI risk get raised significantly.
I don’t think it’s a brand new problem, by the way. Around 2016, I was periodically arguing with people about AI on the main EA group on Facebook. One of my points of contention was that MIRI’s focus on symbolic AI was a dead-end and that machine learning had empirically produced much better results, and was where the AI field was now focused. (MIRI took a long time before they finally hired their first researcher to focus on machine learning.) I didn’t have any more success convincing people about that back then than I’ve been having lately with my current points of contention.
I agree though that the situation seems to have gotten much worse in recent years, and ChatGPT (and LLMs in general) probably had a lot to do with that.
I don’t think EA’s AI focus is a product only of interaction with Less Wrong (not claiming you said otherwise), but I do think people outside the Less Wrong bubble tend to be less confident AGI is imminent, and in that sense less “cautious”.
I think EA’s AI focus is largely a product of the fact that Nick Bostrom knew Will and Toby when they were founding EA, and was a big influence on their ideas. Of course, to some degree this might be indirect influence from Yudkowsky, since he was always interacting with Nick Bostrom, but it’s hard to know in what direction the influence flowed here. I was around in Oxford during the embryonic stages of EA, and while I was not involved beyond being a GWWC member, I did have the odd conversation with people who were involved, and my memory is that even then, people were talking about X-risk from AI as a serious contender for the best cause area, as early as at least 2014, and maybe a bit before that. They (EDIT: by “they” here I mean “some people in Oxford, I don’t remember who”; I don’t know when Will and Toby specifically first interacted with LW folk) were involved in discussion with LW people, but I don’t think they got the idea FROM LW. Seems more likely to me they got it from Bostrom and the Future of Humanity Institute, who were just down the corridor.
What is true is that Oxford people have genuinely expressed much more caution about timelines. For example, in What We Owe the Future, published as late as 2022, Will is still talking about how AGI might be more than 50 years away, but also how “it might come soon-within the next fifty or even twenty years.” (If you’re wondering what evidence he cites, it’s the Cotra bioanchors report.) His discussion primarily emphasizes uncertainty about exactly when AGI will arrive, and how we can’t be confident it’s not close. He cites a figure from an Open Phil report guessing an 8% chance of AGI by 2036*. I know your view is that this is all still wildly wrong, but it’s quite different from what many (not all) Less Wrong people say, who tend to regard 20 years as a long timeline. (Maybe Will has updated to shorter timelines since, of course.)
I think there is something of a divide between people who believe strongly in a particular set of LessWrong-derived ideas about the imminence of AGI, and another set of people who are mainly driven by something like “we should take positive EV bets with a small chance of paying off, and do AI stuff just in case AGI arrives soon”. Defending the point about taking positive EV bets with only a small chance of pay-off is what a huge amount of the academic work on longtermism at the GPI in Oxford was about. (This stuff definitely has been subjected to severe levels of peer-reviewed scrutiny, as it keeps showing up in top philosophy journals with rejection rates of like, 90%.)
*This is more evidence people were prepared to bet big on AI risk long before the idea that AGI is actually imminent became as popular as it is now. I think people just rejected the idea that useful work could only be done when AGI was definitely near, and we had near-AGI models.
eh, I think the main reason EAs believe AGI stuff is reasonably likely is because this opinion is correct, given the best available evidence[1].
Having a genealogical explanation here is sort of answering the question on the wrong meta-level, like giving a historical explanation for “why do evolutionists believe in genes” or telling a touching story about somebody’s pet pig for “why do EAs care more about farmed animal welfare than tree welfare.”
Or upon hearing “why does Google use ads instead of subscriptions?” answering with the history of their DoubleClick acquisition. That history is real, but it’s downstream of the actual explanation: the economics of internet search heavily favor ad-supported models regardless of the specific path any company took. The genealogy is epiphenomenal.
The historical explanations are thus mildly interesting but they conflate the level of why.
EDIT: man, I’m worried my comment will be read as a soldier-mindset thing that only makes sense if you presume the “AGI likely soon” view is already correct. Which does not improve the conversation. Please only upvote it iff a version of you that’s neutral on the object-level question would also upvote this comment.
Which is a different claim from whether it’s ultimately correct. Reality is hard.
Yeah, it’s a fair objection that even answering the why question like I did presupposes that EAs are wrong, or at least merely luckily right. (I think this is a matter of degree, and that EAs overrated the imminence of AGI and the risk of takeover on average, but it’s still at least reasonable to believe AI safety and governance work can have very high expected value for roughly the reasons EAs do.) But I was responding to Yarrow, who does think that EAs are just totally wrong, so I guess really I was saying that “conditional on a sociological explanation being appropriate, I don’t think it’s as LW-driven as Yarrow thinks”, although LW is undoubtedly important.
Right, to be clear I’m far from certain that the stereotypical “EA view” is right here.
Sure that makes a lot of sense! I was mostly just using your comment to riff on a related concept.
I think reality is often complicated and confusing, and it’s hard to separate out contingency vs inevitable stories for why people believe what they believe. But I think the correct view is that EAs’ belief on AGI probability and risk (within an order of magnitude or so) is mostly not contingent (as of the year 2025) even if it turns out to be ultimately wrong.
The Google ads example was the best example I could think of to illustrate this. I’m far from certain that Google’s decision to use ads was actually the best source of long-term revenue (never mind being morally good lol). But it still seemed like, given the internet as we understand it, it was implausible that Google’s reliance on ads was counterfactually due to their specific acquisitions.
Similarly, even if EAs ignored AI before for some reason, and never interacted with LW or Bostrom, it’s implausible that, as of 2025, people who are concerned with ambitious, large-scale altruistic impact (and have other epistemic, cultural, and maybe demographic properties characteristic of the movement) would not think of AI as a big deal. AI is just a big thing in the world that’s growing fast. Anybody capable of reading graphs can see that.
That said, specific micro-level beliefs (and maybe macro ones) within EA and AI risk might be different without influence from either LW or the Oxford crowd. For example, there might be a stronger accelerationist arm. Alternatively, people might be more queasy about the closeness with the major AI companies, and there might be a stronger and better-funded contingent of folks interested in public messaging on pausing or stopping AI. And in general, if the movement didn’t “wake up” to AI concerns at all pre-ChatGPT, I think we’d be in a more confused spot.
How many angels can dance on the head of a pin? An infinite number because angels have no spatial extension? Or maybe if we assume angels have a diameter of ~1 nanometre plus ~1 additional nanometre of diameter for clearance for dancing we can come up with a ballpark figure? Or, wait, are angels closer to human-sized? When bugs die do they turn into angels? What about bacteria? Can bacteria dance? Are angels beings who were formerly mortal, or were they “born” angels?[1]
Well, some of the graphs are just made-up, like those in “Situational Awareness”, and some of the graphs are woefully misinterpreted to be about AGI when they’re clearly not, like the famous METR time horizon graph.[2] I imagine that a non-trivial amount of EA misjudgment around AGI results from a failure to correctly read and interpret graphs.
And, of course, when people like titotal examine the math behind some of these graphs, like those in AI 2027, they are sometimes found to be riddled with major mistakes.
What I said elsewhere about AGI discourse in general is true about graphs in particular: the scientifically defensible claims are generally quite narrow, caveated, and conservative. The claims that are broad, unqualified, and bold are generally not scientifically defensible. People at METR themselves caveat the time horizons graph and note its narrow scope (I cited examples of this elsewhere in the comments on this post). Conversely, graphs that attempt to make a broad, unqualified, bold claim about AGI tend to be complete nonsense.
Out of curiosity, roughly what probability would you assign to there being an AI financial bubble that pops sometime within the next five years or so? If there is an AI bubble and if it popped, how would that affect your beliefs around near-term AGI?
How is correctness physically instantiated in space and time and how does it physically cause physical events in the world, such as speaking, writing, brain activity, and so on? Is this an important question to ask in this context? Do we need to get into this?
You can take an epistemic practice in EA such as “thinking that Leopold Aschenbrenner’s graphs are correct” and ask about the historical origin of that practice without making a judgement about whether the practice is good or bad, right or wrong. You can ask the question in a form like, “How did people in EA come to accept graphs like those in ‘Situational Awareness’ as evidence?” If you want to frame it positively, you could ask the question as something like, “How did people in EA learn to accept graphs like these as evidence?” If you want to frame it negatively, you could ask, “How did people in EA not learn not to accept graphs like these as evidence?” And of course you can frame it neutrally.
The historical explanation is a separate question from the evaluation of correctness/incorrectness and the two don’t conflict with each other. By analogy, you can ask, “How did Laverne come to believe in evolution?” And you could answer, “Because it’s the correct view,” which would be right, in a sense, if a bit obtuse, or you could answer, “Because she learned about evolution in her biology classes in high school and college”, which would also be right, and which would more directly answer the question. So, a historical explanation does not necessarily imply that a view is wrong. Maybe in some contexts it insinuates it, but both kinds of answers can be true.
But this whole diversion has been unnecessary.
Do you know a source that formally makes the argument that the METR graph is about AGI? I am trying to pin down the series of logical steps that people are using to get from that graph to AGI. I would like to spell out why I think this inference is wrong, but first it would be helpful to see someone spell out the inference they’re making.
Upvoted because I think this is interesting historical/intellectual context, but I think you might have misunderstood what I was trying to say in the comment you replied to. (I joined Giving What We Can in 2009 and got heavily involved in my university EA group from 2015-2018, so I’m aware that AI has been a big topic in EA for a very long time, but I’ve never had any involvement with Oxford University or had any personal connections with Toby Ord or Will MacAskill, besides a few passing online interactions.)
In my comment above, I wasn’t saying that EA’s interpenetration with LessWrong is largely to blame for the level of importance that the ideas of near-term AGI and AGI risk currently have in EA. (I also think that is largely true, but that wasn’t the point of my previous comment.) I was saying that the influence of LessWrong and EA’s embrace of the LessWrong subculture is largely to blame for the EA community accepting ridiculous stuff like “Situational Awareness”, AI 2027, and so on, despite it having glaring flaws.
Focus on AGI risk at the current level EA gives it could be rational, or it might not be. What is definitely true is that the EA community accepts a lot of completely irrational stuff related to AGI risk. LessWrong doesn’t believe in academia, institutional science, academic philosophy, journalism, scientific skepticism, common sense, and so on. LessWrong believes in Eliezer Yudkowsky, the Sequences, and LessWrong. So, members of the LessWrong community go completely off the rails and create or join cults at seemingly a much, much higher rate than the general population. Because they’ve been coached to reject the foundations of sanity that most people have, and to put their trust and belief in this small, fringe community.
The EA community is not nearly as bad as LessWrong. If I thought it was as bad, I wouldn’t bother trying to convince anyone in EA of anything, because I would think they were beyond rational persuasion. But EA has been infected to a very significant degree by the LessWrong irrationality. I think the level of emphasis that EA puts on subjective guesses as a source of truth and an accompanying sort of lazy, incurious approach to inquiry (why look stuff up or attempt to create a rigorous, defensible thesis when you can just guess stuff?) is one example of the LessWrong influence. Eliezer Yudkowsky quite literally, explicitly believes that his subjective guesses are a better guide to truth than the approach of traditional, mainstream scientific institutions and communities. Yudkowsky has attempted to teach his approach to subjectively guessing things to the LessWrong community (and enjoyers of Harry Potter fanfic). That approach has leeched into the EA community.
The result is you can have things like “Situational Awareness” and AI 2027 where the “data” is made-up and just consists of some random people’s random subjective guesses. This is the kind of stuff that should never be taken even a little bit seriously.
If you want to know which approach produces better results, look at the achievements of academic science — which underlie basically the entire modern world — versus the achievements of the LessWrong community — some Harry Potter fanfic and about half a dozen cults, despite believing their approach is unambiguously superior. If you adjust for time and population, the comparison still comes out favourably for science versus Yudkowskian subjective guessology. How many towns of under 5,000 people create even a single cult within even the span of 50 years? Versus the LessWrong community creating multiple cults within 16 years of its existence.
I could be totally wrong in my root cause analysis. EA may have developed these bad habits independently of LessWrong. In any case, I think it’s clear that these are bad habits, that they lead nowhere good, and that EA should clear house (i.e. stop believing in subjective guess-based or otherwise super low-quality argumentative writing) and raise the bar for the quality of arguments and evidence that are taken seriously to something a bit closer to the academic or scientific level.
I don’t have an idyllic view of academia. I don’t think it’s all black-and-white. I recently re-read a review of Colin McGinn’s ridiculous book on the philosophy of physics. On one hand, the descriptions of and quotes from the book reminded me of all the stuff that drives me crazy in academic philosophy. On the other hand, the reviewer is a philosopher and her review is published in a philosophy journal. So, there’s a push and pull.
Maybe a good analogy for academia is liberal democracy. It’s often a huge mess, full of ongoing conflicts and struggles, frequently unjust and unreasonable, but ultimately it produces an astronomical amount of value, rivalling the best of anything humans have ever done. By vouching for academia or liberal democracy, I’m not saying it’s all good, I’m just saying that the overall process is good. And the process itself (in both cases) can surely be improved, but through reform and evolution involving a lot of people with expertise, not by a charismatic outsider with a zealous following (e.g. illiberal/authoritarian strongmen, in the case of government, or someone like Yudkowsky, in the case of academia, who, incidentally, has a bit of an authoritarian attitude, not politically, but intellectually).
Can you say more about what makes something “a subjective guess” for you? When you say well under 0.05% chance of AGI in 10 years, is that a subjective guess?
Like, suppose I am asked, as a pro forecaster, to say whether the US will invade Syria after a US military build-up involving aircraft carriers in the Eastern Med. I look for newspaper reports of signs of this, look up the base rate of how often the US bluffs with a military build-up rather than invading, and then make a guess as to how likely an invasion is. Is that “a subjective guess”? Or am I relying on data?
Or what about if I am doing what AI 2027 did and trying to predict when LLMs match human coding ability on the basis of current data? Suppose I use the METR data like they did, and I do the following. I assume that if AIs are genuinely able to complete 90% of real-world tasks that take human coders 6 months, then they are likely as good at coding as humans. I project the METR data out to find a date for when we will hit 6-month tasks, theoretically, if the trend continues. But then, instead of stopping and saying that is my forecast, I remember that benchmark performance is generally a bit misleading in terms of real-world competence, and remember METR found that AIs often couldn’t complete more realistic versions of the tasks which the benchmark counted them as passing. (Couldn’t find a source for this claim, but I remember seeing it somewhere.) I decide that the date when models will hit a 90% completion rate on real-world 6-month tasks should maybe be a couple more doubling times of the METR time-horizon metric further out. I move my forecast for human-level coders to, say, 15 months after the original to reflect this. Am I making a subjective guess, or relying on data? When I made the adjustment to reflect issues about construct validity, did that make my forecast more subjective? If so, did it make it worse, or did it make it better? I would say better, and I think you’d probably agree, even if you still think the forecast is bad.
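To make the arithmetic of that hypothetical concrete, here is a minimal sketch of the projection I have in mind. Every number in it is a placeholder of my own (not METR’s actual figures), and the two-doubling adjustment is just the illustrative fudge factor described above.

```python
from math import log2

# Illustrative placeholder numbers only -- not METR's actual figures.
current_horizon_hours = 2.0   # assumed current time horizon (task length AIs complete reliably)
doubling_time_months = 7.0    # assumed doubling time of that horizon
target_hours = 6 * 167        # roughly six months of full-time human work

# Project the trend forward: how many doublings until the 6-month task length?
doublings_needed = log2(target_hours / current_horizon_hours)
raw_forecast_months = doublings_needed * doubling_time_months

# "Subjective" adjustment: push the date out two further doublings to reflect
# the gap between benchmark performance and real-world performance.
adjusted_forecast_months = raw_forecast_months + 2 * doubling_time_months

print(f"Raw trend projection: ~{raw_forecast_months:.0f} months out")
print(f"With construct-validity adjustment: ~{adjusted_forecast_months:.0f} months out")
```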
This geopolitical example is not particularly hypothetical. I genuinely get paid to do this for Good Judgment, and not ONLY by EA orgs, although often it is by them. We don’t know who the clients are, but some questions have been clearly commercial in nature and of zero EA interest.
I’m not particularly offended* if you think this kind of “get allegedly expert forecasters, rather than or as well as domain experts, to predict stuff” is nonsense. I do it because people pay me and it’s great fun, rather than because I have seriously investigated its value. But what I do disagree with is the idea that this is distinctively a Less Wrong rationalist thing. There’s a whole history of relatively well-known work on it by the American political scientist Philip Tetlock that I think began when Yudkowsky was literally still a child. It’s out of that work that Good Judgment, the org for which I work as a forecaster, comes, not anything to do with Less Wrong. It’s true that LessWrong rationalists are often enthusiastic about it, but that’s not all that interesting on its own. (In general, many Yudkowskian ideas actually seem derived from quite mainstream sources on rationality and decision-making to me. I would not reject them just because you don’t like what LW does with them. Bayesian epistemology is a real research program in philosophy, for example.)
*Or at least, I am trying my best not to be offended, because I shouldn’t be, but of course I am human and objectivity about something I derive status and employment from is hard. Though I did have a cool conversation at the last EAG London with a very good forecaster who thought it was terrible Open Phil put money into forecasting because it just wasn’t very useful or important.
What does the research literature say about the accuracy of short-term (e.g. 1-year timescales) geopolitical forecasting?
And what does the research literature say about the accuracy of long-term (e.g. longer than 5-year timescales) forecasting about technological progress?
(Should you even bother to check the literature to find out, or should you just guess how accurate you think each one probably is and leave it at that?)
Of course. And I’ll add that I think such guesses, including my own, have very little meaning or value. It may even be worse to make them than to not make them at all.
This seems like a huge understatement. My impression is that the construct validity and criterion validity of the benchmarks METR uses, i.e. how much benchmark performance translates into real-world performance, are much worse than you describe.
I think it would be closer to the truth to say that if you’re trying to predict when AI systems will replace human coders, the benchmarks are meaningless and should be completely ignored. I’m not saying that’s the absolute truth, just that it’s closer to the truth than saying benchmark performance is “generally a bit misleading in terms of real-world competence”.
Probably there’s some loose correlation between benchmark performance and real-world competence, but it’s not nearly one-to-one.
Definitely making a subjective guess. For example, what if performance on benchmarks simply never generalizes to real world performance? Never, ever, ever, not in a million years never?
By analogy, what level of performance on go would AlphaGo need to achieve before you would guess it would be capable of baking a delicious croissant? Maybe these systems just can’t do what you’re expecting them to do. And a chart can’t tell you whether that’s true or not.
AI 2027 admits the role that gut intuition plays in their forecast. For example:
An example intuition:
Okay, and what if it is hard? What if this kind of generalization is beyond the capabilities of current deep learning/deep RL systems? What if it takes 20+ years of research to figure out? Then the whole forecast is out the window.
What’s the reward signal for vague tasks? This touches on open research problems that have existed in deep RL for many years. Why is this going to be fully solved within the next 2-4 years? Because “intuitively, it feels like” it will be?
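To make the problem concrete, here is a toy illustration, entirely my own and not anything from AI 2027: it is easy to write a programmatic reward for a crisply specified task, and there is no agreed-upon analogue for a vague one.

```python
# Toy illustration of the reward-specification problem (my own example, not AI 2027's).
def reward_for_passing_tests(tests_passed: int, tests_total: int) -> float:
    """A crisp task ("make the unit tests pass") has an obvious, dense reward signal."""
    return tests_passed / tests_total

def reward_for_vague_task(artifact: str) -> float:
    """A vague task ("write a genuinely insightful research memo") does not.
    Any hard-coded proxy (length, keyword counts, another model's rating)
    can be gamed, which is exactly the open problem being pointed at here."""
    raise NotImplementedError("no agreed-upon reward signal exists for this")
```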
Another example is online learning, which is a form of continual learning. AI 2027 highlights this capability:
But I can’t find anywhere else in any of the AI 2027 materials where they discuss online learning or continual learning. Are they thinking that online learning will not be one of the capabilities humans will have to invent? That AI will be able to invent online learning without first needing online learning to be able to invent such things? What does the scenario actually assume about online learning? Is it important or not? Is it necessary or unnecessary? And will it be something humans invent or AI invents?
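For readers who haven’t encountered the term: in the standard ML sense, online learning means a model whose parameters keep being updated from a stream of incoming data after deployment, rather than being frozen once training ends. Below is a minimal sketch, purely my own illustration and not anything from the AI 2027 materials.

```python
import numpy as np

# Minimal sketch of online learning: a linear model whose weights are updated
# from each new example as it arrives, instead of being frozen after training.
rng = np.random.default_rng(0)
w = np.zeros(3)          # model parameters, updated continually
learning_rate = 0.01

def predict(x: np.ndarray) -> float:
    return float(w @ x)

for step in range(1000):                                   # a stream of incoming examples
    x = rng.normal(size=3)
    y = 2.0 * x[0] - 1.0 * x[2] + rng.normal(scale=0.1)    # unknown target relationship
    error = predict(x) - y
    w -= learning_rate * error * x                         # incremental (online) update

print("learned weights:", w)                               # approaches roughly [2, 0, -1]
```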
When I tried to find what the AI 2027 authors have said about this, I found an 80,000 Hours Podcast interview where Daniel Kokotajlo said a few things about online learning, such as the following:
The other things Kokotajlo says in the interview about online learning and data efficiency are equally hazy and hand-wavy. It just comes down to his personal gut intuition. In the part I just quoted, he says maybe these fundamental research breakthroughs will happen in 2030-2035, but what if it’s more like 2070-2075, or 2130-2135? How would one come to know such a thing?
What historical precedent or scientific evidence do we have to support the idea that anyone can predict, with any accuracy, the time when new basic science will be discovered? As far as I know, this is not possible. So, what’s the point of AI 2027? Why did the authors write it and why did anyone other than the authors take it seriously?
nostalgebraist originally made this critique here, very eloquently.
It can easily be true both that Yudkowsky’s ideas are loosely derived from or inspired by ideas that make sense and that Yudkowsky’s own ideas don’t make a lick of sense themselves. I don’t think most self-identified Bayesians outside of the LessWrong community would agree with Yudkowsky’s rejection of institutional science, for instance. Yudkowsky’s irrationality says nothing about whether (the mainstream version of) Bayesianism is a good idea or not; whether (the mainstream version of) Bayesianism, or other ideas Yudkowsky draws from, are good ideas or not says nothing about whether Yudkowsky’s ideas are irrational.
By analogy, pseudoscience and crackpot physics are often loosely derived from or inspired by ideas in mainstream science. The correctness of mainstream science doesn’t imply the correctness of pseudoscience or crackpot physics. Conversely, the incorrectness of pseudoscience or crackpot physics doesn’t imply the incorrectness of mainstream science. It wouldn’t be a defense of a crackpot physics theory that it’s inspired by legitimate physics, and the legitimacy of the ideas Yudkowsky is drawing from isn’t a defense of Yudkowsky’s bizarre views.
I think forecasting is perfectly fine within the limitations that the scientific research literature on forecasting outlines. I think Yudkowsky’s personal twist on Aristotelian science or subjectively guessing which scientific propositions are true or false and then assuming he’s right (without seeking empirical evidence) because he thinks he has some kind of nearly superhuman intelligence — I think that’s absurd and that’s obviously not what people like Philip Tetlock have been advocating.
I’m not actually that interested in defending:
The personal honor of Yudkowsky, who I’ve barely read and don’t much like, or his influence on other people’s intellectual style. I am not a rationalist, though I’ve met some impressive people who probably are.
The specific judgment calls and arguments made in AI 2027.
Using the METR graph to forecast superhuman coders. (Even if I probably do think this is MORE reasonable than you do, I’m not super-confident about its validity as a measure of real-world coding. In any case, I was not trying to describe how I personally would forecast superhuman coders, but just to give a hypothetical case where making a forecast more “subjective” plausibly improves it.)
Rather what I took myself to be saying was:
Judgmental forecasting is not particularly a LW thing, and it is what AI 2027 was doing, whether or not they were doing it well.
You can’t really avoid what you are calling “subjectivity” when doing judgmental forecasting, at least if that means not just projecting a trend in data and being done with it, but instead letting qualitative considerations affect the final number you give.
Sometimes it would clearly make a forecast better to make it more “subjective” if that just means less driven only by a projection of a trend in data into the future.
In predicting a low chance of AGI in the near term, you are also just making an informed guess influenced by data but also by qualitative considerations, argument, gut instinct, etc. At that level of description, your forecast is just as “made up” as AI 2027’s. (But of course this is completely compatible with the claim that some of AI 2027’s specific guesses are not well-justified or are implausible.)
Now, it may be that forecasting is useless here, because no one can predict how technology will develop five years out. But I’m pretty comfortable saying that if THAT is your view, then you really shouldn’t also be super-confident the chance of near-term AGI is low. Though I do think saying “this just can’t be forecasted reliably” on its own is consistent with criticizing people who are confident AGI is near.
Strong upvoted. Thank you for clarifying your views. That’s helpful. We might be getting somewhere.
With regard to AI 2027, I get the impression that a lot of people in EA and in the wider world were not initially aware that AI 2027 was an exercise in judgmental forecasting. The AI 2027 authors did not sufficiently foreground this in the presentation of their “results”. I would guess there are still a lot of people in EA and outside it who think AI 2027 is something more rigorous, empirical, quantitative, and/or scientific than a judgmental forecasting exercise.
I think this was a case of some people in EA being fooled or tricked (even if that was not the authors’ intention). They didn’t evaluate the evidence they were looking at properly. You were quick to agree with my characterization of AI 2027 as a forecast based on subjective intuitions. However, in one previous instance on the EA Forum, I also cited nostalgebraist’s eloquent post and made essentially the same argument I just made, and someone strongly disagreed. So, I think people are just getting fooled, thinking that evidence exists that really doesn’t.
What does the forecasting literature say about long-term technology forecasting? I’ve only looked into it a little bit, but generally technology forecasting seems really inaccurate, and the questions forecasters/experts are being asked in those studies seem way easier than forecasting something like AGI. So, I’m not sure there is a credible scientific basis for the idea of AGI forecasting.
I have been saying from the beginning and I’ll say once again that my forecast of the probability and timeline of AGI is just a subjective guess and there’s a high level of irreducible uncertainty here. I wish that people would stop talking so much about forecasting and their subjective guesses. This eats up an inordinate portion of the conversation, despite its low epistemic value and credibility. For months, I have been trying to steer the conversation away from forecasting toward object-level technical issues.
Initially, I didn’t want to give any probability, timeline, or forecast, but I realized the only way to be part of the conversation in EA is to “play the game” and say a number. I had hoped that would only be the beginning of the conversation, not the entire focus of the conversation forever.
You can’t squeeze Bayesian blood from a stone of uncertainty. You can’t know what you can’t know by an act of sheer will. Most discussion of AGI forecasting is wasted effort and mostly pointless.
What is not pointless is understanding the object-level technical issues better. If anything helps with AGI forecasting accuracy (and that’s a big “if”), this will. But it also has other important advantages, such as:
Helping us understand what risks AGI might or might not pose
Helping us understand what we might be able to do, if anything, to prepare for AGI, and what we would need to know to usefully prepare
Getting a better sense of what kinds of technical or scientific research might be promising to fund in order to advance fundamental AI capabilities
Understanding the economic impact of generative AI
Possibly helping to inform a better picture of how the human mind works
And more topics besides these.
I would consider it a worthy contribution to the discourse to play some small part in raising the overall knowledge level of people in EA about the object-level technical issues relevant to the AI frontier and to AGI. Based on track records, technology forecasting may be mostly forlorn, but, based on track records, science certainly isn’t forlorn. Focusing on the science of AI rather than on an Aristotelian approach would be a beautiful return to Enlightenment values, away from the anti-scientific/anti-Enlightenment thinking that pervades much of this discourse.
By the way, in case it’s not already clear, saying there is a high level of irreducible uncertainty does not support funding whatever AGI-related research program people in EA might currently feel inclined to fund. The number of possible ways the mind could work and the number of possible paths the future could take is large, perhaps astronomically large, perhaps infinite. To arbitrarily seize on one and say that’s the one, pour millions of dollars into that — that is not justifiable.
I think what you are saying here is mostly reasonable, even if I am not sure how much I agree: it seems to turn on very complicated issues in the philosophy of probability/decision theory, what you should do when accurate prediction is hard, and exactly how bad predictions have to be to be valueless. Having said that, I don’t think you’re going to succeed in steering conversation away from forecasts if you keep writing about how unlikely it is that AGI will arrive near term. Which you have done a lot, right?
I’m genuinely not sure how much EA funding for AI-related stuff even is wasted on your view. To a first approximation, EA is what Moskovitz and Tuna fund. When I look at Coefficient’s (i.e. what was previously Open Phil’s) 7 most recent AI safety and governance grants, here’s what I find:
1) A joint project of METR and RAND to develop new ways of assessing AI systems for risky capabilities.
2) “AI safety workshop field building” by BlueDot Impact
3) An AI governance workshop at ICML
4) “General support” for the Center for Governance of AI.
5) A “study on encoded reasoning in LLMs at the University of Maryland”
6) “Research on misalignment” here: https://www.meridiancambridge.org/labs
7) “Secure Enclaves for LLM Evaluation” here https://openmined.org/
So is this stuff bad or good on the worldview you’ve just described? I have no idea, basically. None of it is forecasting; plausibly it all broadly falls under either empirical research on current and very-near-future models, training new researchers, or governance stuff, though that depends on what “research on misalignment” means. But of course, you’d only endorse it if it is good research. If you are worried about lack of academic credibility specifically, as far as I can tell 7 out of the 20 most recent grants are to academic research in universities. It does seem pretty obvious to me that significant ML research goes on at places other than universities, though, not least the frontier labs themselves.
I don’t really know all the specifics of all the different projects and grants, but my general impression is that very little (if any) of the current funding makes sense or can be justified if the goal is to do something useful about AGI (as opposed to, say, make sure Claude doesn’t give risky medical advice). Absent concerns about AGI, I don’t know if Coefficient Giving would be funding any of this stuff.
To make it a bit concrete, there are at least five different proposed pathways to AGI, and I imagine the research Coefficient Giving funds is only relevant to one of the five pathways, if it’s even relevant to that one. But the number five is arbitrary here. The actual decision-relevant number might be a hundred, or a thousand, or a million, or infinity. It just doesn’t feel meaningful or practical to try to map out the full space of possible theories of how the mind works and apply the precautionary principle against the whole possibility space. Why not just do science instead?
By word count, I think I’ve written significantly more about object-level technical issues relevant to AGI than directly about AGI forecasts or my subjective guesses of timelines or probabilities. The object-level technical issues are what I’ve tried to emphasize. Unfortunately, commenters seem fixated on surveys, forecasts, and bets, and don’t seem to be as interested in the object-level technical topics. I keep trying to steer the conversation in a technical direction. But people keep wanting to steer it back toward forecasting, subjective guesses, and bets.
For example, I wrote a 2,000-word post called “Unsolved research problems on the road to AGI”. There are two top-level comments. The one with the most karma proposes a bet.
My post “Frozen skills aren’t general intelligence” mainly focuses on object-level technical issues, including some of the research problems discussed in the other post. You have the top comment on that post (besides SummaryBot) and your comment is about a forecasting survey.
People on the EA Forum are apparently just really into surveys, bets, and forecasts.
The forum is kind of a bit dead generally, for one thing.
I don’t really get on what grounds you are saying that the Coefficient grants are not to people to do science, apart from the governance ones. I also think you are switching back and forth between “No one knows when AGI will arrive; the best way to prepare just in case is more normal AI science” and “we know that AGI is far, so there’s no point doing normal science to prepare against AGI now, although there might be other reasons to do normal science.”
If we don’t know which of infinitely many or astronomically many possible theories about AGI are more likely to be correct than the others, how can we prepare?
Maybe alignment techniques conceived based on our current wrong theory make otherwise benevolent and safe AGIs murderous and evil on the correct theory. Or maybe they’re just inapplicable. Who knows?
Not everything being funded here even IS alignment techniques, but also, insofar as you just want a generally better understanding of AI as a domain through science, why wouldn’t you learn useful stuff from applying techniques to current models? If the claim is that current models are too different from any possible AGI for this info to be useful, why do you think “do science” would help prepare for AGI at all? Assuming you do think that, which still seems unclear to me.
You might learn useful stuff about current models from research on current models, but not necessarily anything useful about AGI (except maybe in the slightest, most indirect way). For example, I don’t know if anyone thinks that, if we had invested 100x or 1,000x more into research on symbolic AI systems 30 years ago, we would know meaningfully more about AGI today. So, as you anticipated, the relevance of this research to AGI depends on an assumption about the similarity between a hypothetical future AGI and current models.
However, even if you think AGI will be similar to current models, or that it might be similar, there might be no cost to delaying research related to alignment, safety, control, preparedness, value lock-in, governance, and so on until more fundamental research progress on capabilities has been made. If in five or ten or fifteen years or whatever we understand much better how AGI will be built, then a single $1 million grant to a few researchers might produce more useful knowledge about alignment, safety, etc. than Dustin Moskovitz’s entire net worth would produce today if it were spent on research into the same topics.
My argument about “doing basic science” vs. “mitigating existential risk” is that these collapse into the same thing unless you make very specific assumptions about which theory of AGI is correct. I don’t think those assumptions are justifiable.
Put it this way: let’s say we are concerned that, for reasons due to fundamental physics, the universe might spontaneously end. But we also suspect that, if this is true, there may be something we can do to prevent it. What we want to know is a) if the universe is in danger in the first place, b) if so, how soon, and c) if so, what we can do about it.
To know any of these three things, (a), (b), or (c), we need to know which fundamental theory of physics is correct, and what the fundamental physical properties of our universe are. Problem is, there are half a dozen competing versions of string theory, and within those versions, the number of possible variations that could describe our universe is astronomically large, 10^500, or 10^272,000, or possibly even infinite. We don’t know which variation correctly describes our universe.
Plus, a lot of physicists say string theory is a poorly conceived theory in the first place. Some offer competing theories. Some say we just don’t know yet. There’s no consensus. Everybody disagrees.
What does the “existential risk” framing get us? What action does it recommend? How does the precautionary principle apply? Let’s say you have a $10 billion budget. How do you spend it to mitigate existential risk?
I don’t see how this doesn’t just loop all the way back around to basic science. Whether there’s an existential risk, and if so, when we need to worry about it, and if when the time comes, what we can do about it, are all things we can only know if we figure out the basic science. How do we figure out the basic science? By doing the basic science. So, your $10 billion budget will just go to funding basic science, the same physics research that is getting funded anyway.
The space of possible theories about how the mind works includes at least six contenders, plus a lot of people saying we just don’t know yet, and there are probably silly but illustrative ways to formulate it where you get very large numbers.
For instance, if we think the correct theory can be summed up in just 100 bits of information, then the number of possible theories is 2^100.
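Spelled out, on the assumption that every distinct 100-bit string counts as a distinct candidate theory:

$$2^{100} \approx 1.27 \times 10^{30}$$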
Or we could imagine what would happen if we paid a very large number of experts from various relevant fields (e.g. philosophy, cognitive science, AI) a lot of money to spend a year coming up with a one-to-two-page description of as many original, distinct, even somewhat plausible or credible theories as they could think of. Then we would group together all the submissions that were similar enough and count them as the same theory. How many distinct theories would we end up with? A handful? Dozens? Hundreds? Thousands?
I’m aware these thought experiments are ridiculous, but I’m trying to emphasize the point that the space of possible ideas seems very large. At the frontier of knowledge in a domain like the science of the mind, which largely exists in a pre-scientific or protoscientific or pre-paradigmatic state, trying to actually map out the space of theories that might possibly be correct is a daunting task. Doing that well, to a meaningful extent, ultimately amounts to actually doing the science or advancing the frontier of knowledge yourself.
What is the right way to apply the precautionary principle in this situation? I would say the precautionary principle isn’t the right way to think about it. We would like to be precautionary, but we don’t know enough to know how to be. We’re in a situation of fundamental, wide-open uncertainty, at the frontier of knowledge, in a largely pre-scientific state of understanding about the nature of the mind and intelligence. So, we don’t know how to reduce risk — for example, our ideas on how to reduce risk might do nothing or they might increase risk.