Thanks for these comments, Greg, and sorry for taking a while to get round to them.
This is conservative. Why not “GPT-5”? (In which case the 100,000x efficiency gain becomes 10,000,000,000x.)
Of course there’s some ambiguity in what “as capable as a human being” means, since present-day LLMs are already superhuman in some domains (like general knowledge), and before AI systems are smarter in every important way than humans, they will be smarter in increasingly many but not all ways. But in the broader context of the piece, we’re interested in AI systems which effectively substitute for a human researcher, and I just don’t think GPT-5 will be that good. Do you disagree or were we just understanding the claim differently?
See APM section for how misaligned ASI takeover could lead to extinction.
Are you missing a quotation here?
Why is this [capturing more wealth before AI poses meaningful catastrophic risk] likely? Surely we need a Pause to be able to do this?
It’s a conditional, so we’re not claiming it’s more likely than not that AI generates a lot of wealth before reaching very high “catastrophic risk potential” (if ever), but I do think it’s plausible. One scenario where this looks likely is the one described by Epoch in this post, where AI services are diffusely integrated into the world economy. I think it would be more likely if we do not see something like a software intelligence explosion (i.e. “takeoff” from automating AI R&D). It would also be made more likely by laws and regulations which successfully restrict dangerous uses of AI.
A coordinated pause might block a lot of the wealth-generating effects of AI, if most of those effects come from frontier models. But a pause (generally or on specific applications/uses) could certainly make the scenario we mention more likely (and even if it didn’t, that in itself wouldn’t make it a bad idea).
Expect these to be more likely to cause extinction than a good future? (Given Vulnerable World)
Not sure how to operationalise that question. I think most individual new technologies (historically and in the future) will make the world better, and I think the best world we can feasibly get to at the current technology level is much less good than the best world we can get to with sustained tech progress. How likely learning more unknown unknowns is (in general) to cause extinction is partly a function of whether there are “recipes for ruin” hidden in the tech tree, and then how society handles them. So I think I’d prefer “a competent and well-prepared society continues to learn new unknown unknowns (i.e. novel tech or other insights)” over “we indefinitely stop the kind of tech progress/inquiry that could yield unknown unknowns” over “a notably incompetent or poorly-prepared society learns lots of new unknown unknowns all at once”.
If superintelligence is catastrophically misaligned, then it will take over, and the other challenges won’t be relevant.
I expect we agree on this at least in theory, but maybe worth noting explicitly: if you’re prioritising between some problems, and one problem completely undermines everything else if you fail on it, it doesn’t follow that you should fully prioritise work on that problem. Though I do think the work going into preventing AI takeover is embarrassingly inadequate to the importance of the problem.
It [ensuring that we get helpful superintelligence earlier in time] increases takeover risk(!)
Emphasis here on the “helpful” (with respect to the challenges we list earlier, and a background level of frontier progress). I don’t think we should focus efforts on speeding up frontier progress in the broad sense. This appendix to this report discusses the point that speeding up specific AI applications is rarely if ever worthwhile, because it involves speeding up AI progress in general.
We need at least 13 9s of safety for ASI, and the best current alignment techniques aren’t even getting 3 9s...
Can you elaborate on this? How are we measuring the reliability of current alignment techniques here? If you roughly know the rate of failure of components of a system, and you can build in redundancy, and isolate failures before they spread, you can get away with any given component failing somewhat regularly. I think if you can confidently estimate the failure rates of different (sub-)components of the system, you’re already in a good place, because then you can build AIs the way engineers build and test bridges, airplanes, and nuclear power stations. I don’t have an informed view on whether we’ll reach that level of confidence in how to model the AIs (which is indeed reason to be pretty freaked out).
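To make the redundancy point concrete, here is a toy calculation (the failure rates are made up, and it assumes, unrealistically, that redundant components fail independently):

```python
import math

def system_nines(component_fail_rate: float, redundancy: int) -> float:
    """Nines of reliability for a system that fails only when every
    redundant copy of a component fails, assuming independent failures."""
    system_fail_rate = component_fail_rate ** redundancy
    return -math.log10(system_fail_rate)

# A single component failing 1% of the time gives ~2 nines on its own,
# but ~6 nines with triple redundancy (under the independence assumption).
print(round(system_nines(0.01, 1), 1))  # 2.0
print(round(system_nines(0.01, 3), 1))  # 6.0
```

The independence assumption does most of the work here; correlated failures (the natural worry when the redundant "components" are copies of the same model) would erase most of those gains.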
This whole section (the whole paper?) assumes that an intelligence explosion is inevitable.
Sure — “assumes that” in the sense of “is conditional on”. I agree that most of the points we raise are less relevant if we don’t get an intelligence explosion (as the title suggests). Not “assumes” as in “unconditionally asserts”. We say: “we think an intelligence explosion is more likely than not this century, and may well begin within a decade.” (where “intelligence explosion” is informally understood as a very rapid and sustained increase in the collective capabilities of AI systems). Agree it’s not inevitable, and there are levers to pull which influence the chance of an intelligence explosion.
But could also just lead to Mutually Assured AI Malfunction (MAIM).
Is this good or bad, on your view? Seems more stabilising than a regime which favours AI malfunction “first strikes”?
This [bringing forward the start of the intelligence explosion] sounds like a terrible and reckless idea! Because we don’t know exactly where the thresholds are for recursive self-improvement to kick in.
I agree it would be reckless if it accidentally made a software intelligence explosion happen sooner or be more likely! And I think it’s a good point that we don’t know much about the thresholds for accelerating progress from automating AI R&D. Suggests we should be investing more in setting up relevant measures and monitoring them carefully (+ getting AI developers to report on them).
Yes, unless we stop it happening (and we should!)
See comment above! Probably we disagree on how productive and feasible a pause on frontier development is (just going off the fact that you are working on pushing for it and I am not), but perhaps we should have emphasised more that pausing is an option.
The problem is that by the time the “if” is verified to have occurred, it could well be too late to do the “then” (e.g. once a proto-ASI has already escaped onto the internet).
I think we’re operating with different pictures in our head here. I agree that naive “if-then” policies could easily kick in too late to prevent deceptively aligned AI doing some kind of takeover (in particular because the deceptively aligned AI could know about and try to avoid triggering the “if” condition). But most “if-then” policies I am imagining are not squarely focused on avoiding AI takeover (nor is most of the piece).
Need a moratorium now, not unworkable “if-then” commitments!
It’s not clear to me that if-then policies are less “workable” than a blanket moratorium on frontier AI development, in terms of the feasibility of implementing them. I guess you could be very pessimistic about whether any if-then commitments would help at all, which it sounds like you are.
This [challenges downstream of ASI] is assuming ASI is alignable! (The whole Not just misalignment section is).
Again, it’s true that we’d only face most of the challenges we list if we avoid full-blown AI takeover, but we’re not asserting that ASI is alignable with full confidence. I agree that if you are extremely confident that ASI is not alignable, then all these downstream issues matter less. I currently think it’s more likely than not that we avoid full-blown AI takeover, which makes me think it’s worth considering downstream issues.
“Do you disagree or were we just understanding the claim differently?”
I disagree, assuming that “GPT-5” means “the increase above GPT-4 relative to the increase GPT-4 was above GPT-3” (which I think is what you are getting at in the paper?), rather than whatever the thing actually called GPT-5 turns out to be like. And assuming it has an “o-series style” reasoning model built on top of it, and whatever other scaffolding is needed to make it agentic (computer use etc.).
“a notably incompetent or poorly-prepared society learns lots of new unknown unknowns all at once”
I think that is, unfortunately, where we are heading!
“It [ensuring that we get helpful superintelligence earlier in time] increases takeover risk(!)”
“Emphasis here on the ‘helpful’”
I think the problem is the word “ensuring”, when there’s no way we can ensure it. The result is increasing risk when people take this as a green light to go faster and bring forward the time where we take the (most likely fatal) gamble on ASI.
“We need at least 13 9s of safety for ASI, and the best current alignment techniques aren’t even getting 3 9s...”
“Can you elaborate on this? How are we measuring the reliability of current alignment techniques here?”
I’m going by published results where various techniques are reported, and show things like 80% reduction in harmful outputs, 90% reduction in deception, 99% reduction in jailbreaks etc.
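To spell out the arithmetic behind the gap (using an illustrative, made-up baseline failure rate): a reduction of X only adds log10(1/(1-X)) nines to whatever baseline you start from, so even the best figures of this kind land far below 13 nines.

```python
import math

def nines_after_reduction(baseline_fail_rate: float, reduction: float) -> float:
    """Nines of safety once a technique removes `reduction` of failures."""
    residual_fail_rate = baseline_fail_rate * (1.0 - reduction)
    return -math.log10(residual_fail_rate)

# Illustrative (made-up) baseline: the model misbehaves on 10% of adversarial
# prompts. A 99% reduction leaves a 0.1% failure rate, i.e. ~3 nines,
# a long way short of 13.
print(round(nines_after_reduction(0.10, 0.99), 1))  # 3.0
```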
“Is this good or bad, on your view? Seems more stabilising than a regime which favours AI malfunction ‘first strikes’?”
Yeah. Although an international non-proliferation treaty would be far better. Perhaps MAIM might prompt this though?
but perhaps we should have emphasised more that pausing is an option.
Yes!
“But most ‘if-then’ policies I am imagining are not squarely focused on avoiding AI takeover”
They should be! We need strict red lines in the evals program[1].
See replies in the other thread. Thanks again for engaging!
[1] Red lines that fall short of things like “found in the wild, escaped from the lab”(!)