Here’s a hypothesis:
The base case / historical precedent for existential AI risk is:
- AGI has never been developed
- ASI has never been developed
- Existentially deadly technology has never been developed (I don’t count nuclear war or engineered pandemics, as they’ll likely leave survivors)
- Highly deadly technology (>1M deaths) has never been cheap and easily copied
- We’ve never had supply chains so fully automated end-to-end that they could become self-sufficient with enough intelligence
- We’ve never had technology so networked that it could all be taken over by a strong enough hacker
Therefore, if you’re in the skeptic camp, you don’t have to make as much of an argument about specific scenarios where many things happen. You can just wave your arms and say it’s never happened before because it’s really hard and rare, as supported by the historical record.
In contrast, if you’re in the concerned camp, you’re making more of a positive claim about an imminent departure from historical precedent, so the burden of proof is on you. You have to present some compelling model or principles for explaining why the future is going to be different from the past.
Therefore, I think the concerned camp relying on theoretical arguments with multiple steps of logic might be a structural side effect of them having to argue against the historical precedent, rather than any innate preference for that type of argument.
“Refuted” feels overly strong to me. The essay says that market participants don’t think TAGI is coming, and those market participants have strong financial incentive to be correct, which feels unambiguously correct to me. So either TAGI isn’t coming soon, or else a lot of people with a lot of money on the line are wrong. They might well be wrong, but their stance is certainly some form of evidence, and evidence in the direction of no TAGI. Certainly the evidence isn’t bulletproof, considering the recent mispricings of NVIDIA and other semi stocks.
In my own essay, I elaborated on the same point using prices set by more-informed insiders: e.g., valuations and hiring by Anthropic/DeepMind/etc., which also seem to imply that TAGI isn’t coming soon. If they have a 10% chance of capturing 10% of the value for 10 years of doubling the world economy, that’s like $10T. And yet investment expenditures and hiring and valuations are nowhere near that scale. The fact that Google has more people working on ads than TAGI implies that they think TAGI is far off. (Or, more accurately, that marginal investments would not accelerate TAGI timelines or market share.)
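To spell out the arithmetic behind that ~$10T figure, here's a rough sketch; the ~$100T/year of added output from a doubled world economy is an illustrative round number, not a figure from the essay:

```python
# Rough expected-value sketch of the "$10T" figure above.
p_tagi         = 0.10     # 10% chance of achieving TAGI
share_captured = 0.10     # capturing 10% of the value created
years          = 10
added_output   = 100e12   # assumed ~$100T/year of extra output from a doubled world economy

expected_value = p_tagi * share_captured * years * added_output
print(f"${expected_value / 1e12:.0f}T")  # -> $10T
```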
Great comment. We didn’t explicitly allocate probability to those scenarios, and if you do, you end up with much higher numbers. Very reasonable to do so.
I think that’s a great criticism. Perhaps our conditional odds of Taiwan derailment are too high because we’re too anchored to today’s distribution of production.
One clarification/correction to what I said above: I see the derailment events 6-10 as being conditional on us being on the path to TAGI had the derailments not occurred. So steps 1-5 might not have happened yet, but we are in a world where they will happen if the derailment does not occur. (So not really conditional on TAGI already occurring, and not necessarily conditional on AGI, but probably AGI is occurring in most of those on-the-path-to-TAGI scenarios.)
Edit: More precisely, the cascade is:
- Probability of us developing TAGI, assuming no derailments
- Probability of us being derailed, conditional on otherwise being on track to develop TAGI without derailment
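As a minimal sketch of how that cascade combines (using the ~1% and ~62% figures discussed elsewhere in this thread; the code is purely illustrative):

```python
p_on_track = 0.01   # P(we would develop TAGI by 2043, assuming no derailments)
p_derailed = 0.62   # P(derailed | otherwise on track to develop TAGI)

p_tagi_2043 = p_on_track * (1 - p_derailed)
print(f"{p_tagi_2043:.2%}")  # ~0.38%, i.e. roughly the headline 0.4% figure
```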
Question: Do you happen to understand what it means to take a geometric mean of probabilities? In re-reading the paper, I’m realizing I don’t understand the methodology at all. For example, if there is a 33% chance we live in a world with 0% probability of doom, a 33% chance we live in a world with 50% probability of doom, and a 33% chance we live in a world with 100% probability of doom… then the geometric mean is (0% x 50% x 100%)^(1/3) = 0%, right?
Edit: Apparently the paper took a geometric mean of odds ratios, not probabilities. But this still means that had a single surveyed person said 0%, the entire model would collapse to 0%, which is wrong on its face.
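For concreteness, here's a minimal Python sketch of both aggregation methods (not code from the paper), using the hypothetical three-world split above:

```python
import math

def geo_mean_probs(ps):
    """Geometric mean of raw probabilities."""
    return math.prod(ps) ** (1 / len(ps))

def geo_mean_odds(ps):
    """Geometric mean of odds, converted back to a probability.
    (Undefined at p = 1, where the odds are infinite.)"""
    odds = [p / (1 - p) for p in ps]
    mean_odds = math.prod(odds) ** (1 / len(odds))
    return mean_odds / (1 + mean_odds)

# The three-world example above: arithmetic mean is 50%, geometric mean is 0%.
print(geo_mean_probs([0.0, 0.5, 1.0]))  # 0.0

# With interior probabilities the two aggregates differ but stay finite:
print(geo_mean_probs([0.01, 0.5]))      # ~0.071
print(geo_mean_odds([0.01, 0.5]))       # ~0.091

# A single 0% entry still collapses the odds-based aggregate:
print(geo_mean_odds([0.0, 0.5]))        # 0.0
```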
Great comment! Thanks especially for trying to point out the actual stages that go wrong, rather than just hand-waving at the multiple-stage fallacy, which we are all of course well aware of.
Replying to the points:

For example, the authors assign around 1% to events 1-5 happening before 2043. If they’re correct, then conditioning on events 1-5 happening before 2043, they’ll very likely only happen just before 2043. But this leaves very little time for any “derailing” to occur after that, and so the conditional probability of derailing should be far smaller than what they’ve given (62%).
From my POV, if events 1-5 have happened, then we have TAGI. It’s already done. The derailments are not things that could happen after TAGI to return us to a pre-TAGI state. They are events that happen before TAGI and modify the estimates above.
The authors might instead say that they’re not conditioning on events 1-5 literally happening when estimating conditional probability of derailing, but rather conditioning on something more like “events 1-5 would have happened without the 5 types of disruption listed”. That way, their 10% estimate for a derailing pandemic could include a pandemic in 2025 in a world which was otherwise on track for reaching AGI. But I don’t think this is consistent, because the authors often appeal to the assumption that AGI already exists when talking about the probability of derailing (e.g. the probability of pandemics being created). So it instead seems to me like they’re explicitly treating the events as sequential in time, but implicitly treating the events as sequential in logical flow, in a way which significantly decreases the likelihood they assign to TAI by 2043.
Yes, we think AGI will precede TAGI by quite some time, and therefore it’s reasonable to talk about derailments of TAGI conditional on AGI.
Congrats to the winners, readers, and writers!
Two big surprises for me:
(1) It seems like 5⁄6 of the essays are about AI risk, and not TAGI by 2043. I thought there were going to be 3 winners on each topic, but perhaps that was never stated in the rules. Rereading, it just says there would be two 1st places, two 2nd places, and two 3rd places. Seems the judges were more interested in (or persuaded by) arguments on AI safety & alignment, rather than TAGI within 20 years. A bit disappointing for everyone who wrote on the second topic. If the judges were more interested in safety & alignment forecasting, that would have been nice to know ahead of time.
(2) I’m also surprised that the Dissolving AI Risk paper was chosen. (No disrespect intended; it was clearly a thoughtful piece.)
To me, it makes perfect sense to dissolve the Fermi paradox by pointing out that the expected # of alien civilizations is a very different quantity than the probability of 0 alien civilizations. It’s logically possible to have both a high expectation and a high probability of 0.
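A toy illustration of that point (numbers made up for the example): a distribution can have a large expected number of civilizations while still assigning high probability to zero.

```python
p_barren   = 0.9         # 90% chance the underlying rate of life is effectively zero
n_if_not   = 1_000_000   # otherwise, a million civilizations

expected_n = (1 - p_barren) * n_if_not   # 100,000 -- a high expectation...
prob_zero  = p_barren                    # ...alongside a ~90% chance of none
print(expected_n, prob_zero)
```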
But it makes almost no sense to me to dissolve probabilities by factoring them into probabilities of probabilities, and then take the geometric mean of that distribution. Taking the geometric mean of subprobabilities feels like a sleight of hand to end up with a lower number than what you started with, with zero new information added in the process. I feel like I must have missed the main point, so I’ll reread the paper.
Edit: After re-reading, it makes more sense to me. The paper takes the geometric mean of odds ratios in order to aggregate survey entries. It doesn’t take the geometric mean of probabilities, and it doesn’t slice up probabilities arbitrarily (as they are the distribution over surveyed forecasters).

Edit 2: As Jaime says below, the greater error is assuming independence of each stage. The original discussion got quite nerd-sniped by the geometric averaging, which is a bit of a shame, as there’s a lot more to the piece to discuss and debate.
The end-to-end training run is not what makes learning slow. It’s the iterative reinforcement learning process of deploying in an environment, gathering data, training on that data, and then redeploying with a new data collection strategy, etc. It’s a mistake, I think, to focus only on the narrow task of updating model weights and omit the critical task of iterative data collection (i.e., reinforcement learning).
Sorry for seeming disingenuous. :(
(I think I will stop posting here for a while.)
What is Vol analysis?
Do you have any material on this? It sounds plausible to me but I couldn’t find anything with a quick search.
Nope, it’s just an unsubstantiated guess based on seeing what small teams can build today vs 30 years ago. Also based on the massive improvement in open-source libraries and tooling compared to then. Today’s developers can work faster at higher levels of abstraction compared to folks back then.
In this world we have AIs that cheaply automate half of work. That seems like it would have immense economic value and promise, enough to inspire massive new investments in AI companies....
Ah, I think we have a crux here. I think that, if you could hire—for the same price as a human—a human-level AGI, that would indeed change things a lot. I’d reckon the AGI would have a 3-4x productivity boost from being able to work 24⁄7, and would be perfectly obedient, wouldn’t be limited to working in a single field, could more easily transfer knowledge to other AIs, could be backed up and/or replicated, wouldn’t need an office or a fun work environment, can be “hired” or “fired” ~instantly without difficulty, etc.
That feels somehow beside the point, though. I think in any such scenario, there’s also going to be very cheap AIs with sub-human intelligence that would have broad economic impact too.
Absolutely agree. AI and AGI will likely provide immense economic value even before the threshold of transformative AGI is crossed.
Still, supposing that AI research today is:
- a 50⁄50 mix of capital and labor
- faces diminishing returns
- and has elastic demand
...then even a 4x labor productivity boost may not be all that path-breaking when you zoom out enough. Things will speed up, surely, but they probably won’t create transformative AGI overnight. Even AGI researchers will need time and compute to do their experiments.
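As a toy sketch of that point (the Cobb-Douglas form and the numbers here are my own assumptions, not claims from the essay): with a 50/50 capital-labor mix and diminishing returns, quadrupling effective labor while holding compute/capital fixed only doubles research output.

```python
def research_output(capital, labor, alpha=0.5):
    """Toy Cobb-Douglas production function: a 50/50 mix of capital and labor,
    with diminishing returns to each input."""
    return capital ** alpha * labor ** (1 - alpha)

baseline = research_output(capital=1.0, labor=1.0)
boosted  = research_output(capital=1.0, labor=4.0)  # 4x labor productivity
print(boosted / baseline)  # 2.0 -- output only doubles when capital is held fixed
```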
Nope, we didn’t miss the possibility of AGIs being very sample efficient in their learning. We just don’t think it’s certain, which is why we forecast a number below 100%. Sounds like your estimate is higher than ours; however, that doesn’t mean we missed the possibility.
Let me replay my understanding to you, to see if I understand. You are predicting that...
IF:
- we gathered all files stored on hard drives
- decompressed them into streams of bytes
- trained a monstrous model to predict the next chunk in each stream
- and also trained it to play every winnable computer game ever made

THEN:
- You are 50% confident we’d get AGI* using 2013 algos
- You are 80% confident we’d get AGI* using 2023 algos

WHERE:
*AGI means AI that is general; i.e., able to generalize to all sorts of data way outside its training distribution. Meaning:
- It avoids overfitting on the data despite its massive parameter count. E.g., not just memorizing every file or brute-forcing all the exploitable speedrunning bugs in a game that don’t generalize to real-world understanding.
- It can learn skills and tasks that are barely represented in the computer dataset but that real-life humans are nonetheless able to quickly understand and learn due to their general world models.
- It can be made to develop planning, reasoning, and strategy skills not well represented by next-token prediction (e.g., it would learn how to write a draft, reflect on it, and edit it, even though it’s never been trained to do that and has only been optimized to append single tokens in sequence).
- It simultaneously avoids underfitting due to any regularization techniques used to avoid the above overfitting problems.

ASSUMING:
- We don’t train on data not stored on computers
- We don’t train on non-computer games (but not a big crux if you want to posit high-fidelity basketball simulations, for example)
- We don’t train on games without win conditions (but not a big crux, as most have them)
Is this a correct restatement of your prediction?
And are your confidence levels for this resulting in AGI on the first try? Within ten tries? Within a year of trial and error? Within a decade of trial and error?
(Rounding to the nearest tenth of a percent, I personally am 0.0% confident we’d get AGI on our first try with a system like this, even with 10^50 FLOPS.)
What can superintelligent ANI tell us about superintelligent AGI?
Confidence intervals over probabilities don’t make much sense to me. The probability itself already expresses the uncertainty over the binary outcome space [event happens, event doesn’t happen].
I guess to me the idea of confidence intervals over probabilities implies two different kinds of probabilities. E.g., a reducible flavor and an irreducible flavor. I don’t see what a two-tiered system of probability adds, exactly.
No, it’s not just extrapolating base rates (that would be a big blunder). We assume that the development of proto-AGI or AGI will rapidly accelerate progress and investment, and our conditional forecasts are much more optimistic about progress than they would be otherwise.
However, it’s totally fair to disagree with us on the degree of that acceleration. Even with superhuman AGI, for example, I don’t think we’re moving away from semiconductor transistors in less than 15 years. Of course, it really depends on how superhuman this superhuman intelligence would be. We discuss this more in the essay.
despite current models learning vastly faster than humans (training time of LLMs is not a human lifetime, and covers vastly more data)
Some models learning some things faster than humans does not imply AGI will learn all things faster than humans. Self-driving cars, for example, are taking much longer to learn to drive than teenagers do.
Agree that:
- The odds of AGI by 2043 are much, much higher than the odds of transformative AGI by 2043
- AGI will rapidly accelerate progress toward transformative AGI
- The odds of transformative AGI by 2053 are higher than by 2043
We didn’t explicitly forecast 2053 in the paper, just 2043 (0.4%) and 2100 (41%). If I had to guess without much thought I might go with 3%. It’s a huge advantage to get 10 extra years to build fabs, make algorithms efficient, collect vast training sets, train from slow/expensive real-world feedback, and recover from rare setbacks.
My mental model is some kind of S curve where progress in the short term is extremely unlikely, progress in the medium term is more likely, and after a while, the longer it takes to happen, the less likely it is to happen in any given year, as that suggests that some ingredient is still missing and hard to get.
I think you may be right that twenty years is before the S of my S curve really kicks in. Twenty just feels so short with everything that needs to be solved and scaled. I’m much more open-minded about forty.
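To make the shape of that S curve concrete, here's a toy logistic sketch; the midpoint and steepness are made-up parameters, chosen only so the curve roughly passes through our 0.4% (2043) and 41% (2100) numbers.

```python
import math

def p_tagi_by(year, midpoint=2104, steepness=0.09):
    """Toy logistic CDF for cumulative P(transformative AGI by `year`).
    Parameters are illustrative, not from the paper."""
    return 1 / (1 + math.exp(-steepness * (year - midpoint)))

for year in (2043, 2100):
    print(year, f"{p_tagi_by(year):.1%}")  # ~0.4% and ~41%
```

A fuller model would presumably cap the asymptote below 100%, since a plain logistic implies transformative AGI is eventually certain.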
In particular, it seems like some of your estimates make more sense to me if I read them as saying “Well there will likely exist some task that AI systems can’t do.” But I think such claims aren’t very relevant for transformative AI, which would in turn lead to AGI.
By the same token, if the AIs were looking at humans they might say “Well there will exist some tasks that humans can’t do” and of course they’d be right, but the relevant thing is the single non-cherry-picked variable of overall economic impact. The AIs would be wrong to conclude that humans have slow economic growth because we can’t do some tasks that AIs are great at, and the humans would be wrong to conclude that AIs will have slow economic growth because they can’t do some tasks we are great at. The exact comparison is only relevant for assessing things like complementarity, which make large impacts happen strictly more quickly than they would otherwise.
(This might be related to me disliking AGI though, and then it’s kind of on OpenPhil for asking about it. They could also have asked about timelines to 100000x electricity production and I’d be making broadly the same arguments, so in some sense it must be me who is missing the point.)
Yep. We’re using the main definition supplied by Open Philanthropy, which I’ll paraphrase as “nearly all human work at human cost or less by 2043.”
If the definition was more liberal, e.g., AGI as smart as humans, or AI causing world GDP to rise by >100%, we would have forecasted higher probabilities. We expect AI to get wildly more powerful over the next decades and wildly change the face of human life and work. The public is absolutely unprepared. We are very bullish on AI progress, and we think AI safety is an important, tractable, and neglected problem. Creating new entities with the potential to be more powerful than humanity is a scary, scary thing.
One small clarification: the skeptical group was not all superforecasters. There were two domain experts as well. I was one of them.
I’m sympathetic to David’s point here. Even though the skeptic camp was selected for their skepticism, I think we still get some information from the fact that many hours of research and debate didn’t move their opinions. I think there are plausible alternative worlds where the skeptics come in with low probabilities (by construction), but update upward by a few points after deeper engagement reveals holes in their early thinking.