Greg_Colbourn ⏸️
Global moratorium on AGI, now (Twitter). Founder of CEEALAR (née the EA Hotel; ceealar.org)
Yeah this is pretty damning. At this point, being pro-Anthropic is like being pro-FTX in early 2022.
It appears that Anthropic has made a communications decision to distance itself from the EA community, likely because of negative associations the EA brand has in some circles.
This works both ways. EA should be distancing itself from Anthropic, given recent pronouncements by Dario about racing China and initiating recursive self-improvement. Not to mention their pushing of the capabilities frontier.
I am guessing you agree with this abstract point (but furthermore think that AI takeover risk is extremely high, and as such we should ~entirely focus on preventing it).
Yes (but also, I don’t think the abstract point is adding anything, because the risk actually is significant).
Maybe I’m splitting hairs, but “x-risk could be high this century as a result of AI” is not the same claim as “x-risk from AI takeover is high this century”, and I read you as making the latter claim (obviously I can’t speak for Wei Dai).
This does seem like splitting hairs. Most of Wei Dai’s linked list is about AI takeover x-risk (or at least x-risk as a result of actions that AI might take, rather than actions that humans controlling AIs might take). Also, I’m not sure where “century” comes from? We’re talking about the next 5-10 years, mostly.
I guess I see things as messier than this — I see people with very high estimates of AI takeover risk advancing arguments, and I see others advancing skeptical counter-arguments (example), and before engaging with these arguments a lot and forming one’s own views, I think it’s not obvious which sets of arguments are fundamentally unsound.
I think there are a number of intuitions and intuition pumps that are useful here:
- Intelligence being evolutionarily favourable (in a generalised Darwinism sense);
- there being no evidence for moral realism (an objective ethics of the universe existing independently of humans) being true (-> Orthogonality Thesis), or for humanity having a special (divine) place in the universe (we don’t have plot armour);
- convergent instrumental goals being overdetermined;
- security mindset (I think most people who have low p(doom)s probably lack this?).
That said, we must also engage with the best counter-arguments to steelman our positions. I will come back to your linked example.
before it is aligned
This is begging the question! My whole objection is that alignment of ASI hasn’t been established to be possible.
as long as the AI is caught with non-negligible probability, the AI has to be very cautious, because it is way worse for the AI to be caught than to be successful or the game just ending.
So it will worry about being in a kind of panopticon? Seems pretty unlikely. Why should the AI care about being caught any more than it should about any given runtime instance of it being terminated?
Most of the intelligence explosion (to the point where it becomes an unavoidable existential threat) happens in Joshua Clymer’s original story, which is why I quote it at length. My story is basically an alternative ending to his, one that I think is more realistic (I think the idea of 3% of humanity surviving is mostly “wishful thinking”: an ending that people can read and hope to be among that number, rather than just dead with no possible escape).
I think you are assuming the limits of intelligence are at ~”human genius level”? It’s not about what is practically possible from today’s human research standpoint, it’s about what the theoretical limits are. I took some care to ground the story in those theoretical limits.
Re experimentation, I include the sentence:
“Analysis of real-time sensor data on a vast scale – audio, video, robotics; microscopy of all types; particle accelerators, satellites and space probes – had allowed it to reverse engineer a complete understanding of the laws of nature.”
If you think this is impossible, then I could add something like “On matters where it had less than ideal certainty, and where there was insufficient relevant data, it arranged for experiments to be run to verify and refine its theories”, without materially affecting the plausibility or outcome of the scenario.
I think you drastically overestimate how many chances the AI gets at misalignment, because the trillions of executions will use far, far too little compute per single action to lead to a takeover
The small amount of compute leads to much more once the AI has escaped!
If we manage to catch an AI doing bad stuff
The point is that we won’t, unless we have many more 9s of reliability in terms of catching such attempts!
It’s not humans doing the inventing!
absent transformative AI speedups
This is factoring in massive transformative AI speedups! I’m guessing you didn’t actually read it? The whole point of the story is that it’s about an intelligence explosion going very wrong.
Reminder that applications are closing soon (end of the month). Please share with anyone you think might be interested / a good fit!
Whilst this works for saving individual lives (de Sousa Mendes, starfish), it unfortunately doesn’t work for AI x-risk. Whether or not AI kills everyone is pretty binary. And we probably haven’t got long left. Some donations (e.g. those to orgs pushing for a global moratorium on further AGI development) might incrementally reduce x-risk[1], but I think most won’t (e.g. those to AI Safety research without a moratorium first[2]). And failing at preventing extinction is not “ok”! We need to be putting much more effort into it.
- ^
And at least kick the can down the road a few years, if successful.
- ^
I guess you are much more optimistic about AI Safety research paying off, if your p(doom) is “only” 10%. But I think the default outcome is doom (p(doom|AGI) ~90%), and we are nowhere near solving alignment/control of ASI (the deep learning paradigm is statistical, and all the doom flows through the cracks of imperfect alignment).
does it really make sense to prioritize AI over problems like poverty, malnutrition, or lack of healthcare?
It really depends on how long you think we have left before AI threatens our extinction (i.e. causes the death of every human and animal, and all biological life, on the planet). I think it could be as little as a year, and it’s quite (>50%) likely to be within the next 5 years.
AGI will affect everyone on the planet, whether they believe the “hype” or not (most likely kill them all, once recursive self-improvement kicks in, before 2030 at this rate).
Thecompendium.ai is a good reference. Please read it. Feel free to ask any questions you have about it. (Also, cryonics isn’t a sham; it’s still alive and well, it just still doesn’t have many adopters, but that’s another topic.)
We shouldn’t be working on making the synthetic brain, we should be working on stopping further development!
Do you disagree or were we just understanding the claim differently?
I disagree, assuming that GPT-5 means “increase above GPT-4 relative to the increase GPT-4 was above GPT-3” (which I think is what you are getting at in the paper?), rather than whatever the thing that will actually be called GPT-5 turns out to be. And assuming it has an “o-series style” reasoning model built on top of it, and whatever other scaffolding is needed to make it agentic (computer use etc).
“a notably incompetent or poorly-prepared society learns lots of new unknown unknowns all at once”
I think that is, unfortunately, where we are heading!
“It [ensuring that we get helpful superintelligence earlier in time] increases takeover risk(!)”
Emphasis here on the “helpful”
I think the problem is the word “ensuring”, when there’s no way we can ensure it. The result is increased risk, as people take this as a green light to go faster and bring forward the time at which we take the (most likely fatal) gamble on ASI.
“We need at least 13 9s of safety for ASI, and the best current alignment techniques aren’t even getting 3 9s...”
Can you elaborate on this? How are we measuring the reliability of current alignment techniques here?
I’m going by published results in which various techniques are reported to show things like an 80% reduction in harmful outputs, a 90% reduction in deception, a 99% reduction in jailbreaks, etc.
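To make the gap concrete, here is a minimal sketch of the arithmetic behind the “9s” framing (the reduction figures are the illustrative ones above, not results from any specific paper, and the conversion assumes an “X% reduction” translates directly into a residual failure rate):

```python
import math

def nines(reduction_fraction: float) -> float:
    """Number of '9s' of reliability implied by a fractional
    reduction in failures (e.g. a 99% reduction -> 2 nines)."""
    residual_failure_rate = 1 - reduction_fraction
    return -math.log10(residual_failure_rate)

# Illustrative figures of the kind reported for alignment techniques:
for label, reduction in [("harmful outputs", 0.80),
                         ("deception", 0.90),
                         ("jailbreaks", 0.99)]:
    print(f"{label}: {reduction:.0%} reduction ≈ {nines(reduction):.1f} nines")

# The claimed requirement for ASI: 13 nines, i.e. a failure rate of 1e-13.
print(f"13 nines target: failure rate {10**-13:.0e}")
```

On this (admittedly crude) accounting, even a 99% reduction is only ~2 nines, which is the gap I’m pointing at.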
Is this good or bad, on your view? Seems more stabilising than a regime which favours AI malfunction “first strikes”?
Yeah. Although an international non-proliferation treaty would be far better. Perhaps MAIM might prompt this though?
but perhaps we should have emphasised more that pausing is an option.
Yes!
But most “if-then” policies I am imagining are not squarely focused on avoiding AI takeover
They should be! We need strict red lines in the evals program[1].
I currently think it’s more likely than not that we avoid full-blown AI takeover, which makes me think it’s worth considering downstream issues.
See replies in the other thread. Thanks again for engaging!
- ^
That are short of things like “found in the wild escaped from the lab”(!)
Thanks for the reply.
By (stupid) analogy, all the preparations for a wedding would be undermined if the couple got into a traffic accident on the way to the ceremony; this does not justify spending ~all the wedding budget on car safety.
This is a stupid analogy! (Traffic accidents aren’t very likely.) A better analogy would be “all the preparations for a wedding would be undermined if the couple weren’t able to be together because one was stranded on Mars with no hope of escape. This justifies spending all the wedding budget on trying to rescue them.” Or perhaps even better: “all the preparations for a wedding would probably be undermined, because one of the couple is taking part in a mission to Mars that half the engineers and scientists on the guest list are convinced will be a death trap (for detailed technical reasons). This justifies spending all the wedding budget on trying to stop the mission from going ahead.”
see e.g. Katja Grace’s post here
I think Wei Dai’s reply articulates my position well:
Suppose you went through the following exercise. For each scenario described under “What it might look like if this gap matters”, ask:
Is this an existentially secure state of affairs?
If not, what are the main obstacles to reaching existential security from here?
and collected the obstacles, you might assemble a list like this one, which might update you toward AI x-risk being “overwhelmingly likely”. (Personally, if I had to put a number on it, I’d say 80%.)
Your next point seems somewhat of a straw man?
If I tell someone the world will be run by dolphins in the year 2050, and they disagree, I can reply, “oh yeah, well you tell me what the world looks like in 2050”
No, the correct reply is that dolphins won’t run the world because they can’t develop technology, due to their physical form (no opposable thumbs etc), and they won’t be able to evolve their physical form in such a short time (even with help from human collaborators)[1]. I.e. an object-level rebuttal.
The opponents of these arguments were not able to describe the ways that the world could avoid these dire fates in detail
No, but they had sound theoretical arguments. I’m saying such arguments are lacking when it comes to why it would be possible to align/control/not go extinct from ASI.
Altogether, I think you’re coming from a reasonable but different position, that takeover risk from ASI is very high (sounds like 60–99% given ASI?)
I’d say ~90% (and the remaining 10% is mostly exotic factors beyond our control [footnote 10 of linked post]).
I do think this axis of disagreement might not be as sharp as it seems, though — suppose person A has [9]0% p(takeover) and person B is on 1%. Assuming the same marginal tractability and neglectedness between takeover and non-takeover work, person A thinks that takeover-focused work is [9]0× more important; but non-takeover work is 10/99≈0.[1] times as important, compared to person B.
But it’s worse than this, because the only viable solution to avoid takeover is to stop building ASI, in which case the non-takeover work is redundant (we can mostly just hope to luck out with one of the exotic factors).
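(For reference, a minimal sketch of the arithmetic in the quoted comparison, using the 90% and 1% figures above and the same equal-tractability assumption:)

```python
# Relative importance of takeover vs non-takeover work for two people who
# differ only in p(takeover), assuming equal marginal tractability and
# neglectedness (the assumption made in the quoted comment).
p_takeover_A = 0.90   # person A (roughly my number)
p_takeover_B = 0.01   # person B

takeover_ratio = p_takeover_A / p_takeover_B                   # 90x
non_takeover_ratio = (1 - p_takeover_A) / (1 - p_takeover_B)   # 10/99 ≈ 0.1x

print(f"Takeover-focused work: {takeover_ratio:.0f}x as important for A as for B")
print(f"Non-takeover work: {non_takeover_ratio:.2f}x as important for A as for B")
```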
- ^
And they won’t be able to be helped by ASIs either, because the control/alignment problem will remain unsolved (and probably unsolvable, for reasons x, y, z...)
Shouldn’t this mean you agree with the statement?
Thanks for the explanation.
Whilst zdgroff’s comment “acknowledges the value of x-risk reduction in general from a non-longtermist perspective”, it downplays it quite heavily imo (and the OP comment does so even more, using the pejorative “fanatical”).
I don’t think the linked post makes the point very persuasively. Looking at the table, at best there is an equivalence. I think a rough estimate of the cost-effectiveness of pushing for a Pause is orders of magnitude higher.
I’m not sure if GiveWell top charities do? Preventing extinction is a lot of QALYs, and it might not cost more than a few $B per year of extra time bought in terms of funding Pause efforts (~$1/QALY!?)
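A minimal back-of-the-envelope sketch of where a figure like ~$1/QALY could come from (the population, QALY-per-person-year, and cost numbers are rough illustrative assumptions, not published estimates):

```python
# Back-of-the-envelope: cost per QALY of buying one extra year before
# potential extinction via Pause advocacy. All inputs are rough
# illustrative assumptions.
world_population = 8e9         # people alive today
qalys_per_person_year = 1      # ~1 QALY each per extra year bought
pause_cost_per_year = 5e9      # "a few $B" of Pause funding per year of delay

qalys_bought = world_population * qalys_per_person_year
cost_per_qaly = pause_cost_per_year / qalys_bought
print(f"≈ ${cost_per_qaly:.2f} per QALY")  # ≈ $0.62/QALY with these inputs
```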
I’m not that surprised that the above comment has been downvoted to −4 without any replies (and this one will probably be buried by an even bigger avalanche of downvotes!), but it still makes me sad. EA will be ivory-tower-ing until the bitter end, it seems. It’s a form of avoidance. These things aren’t nice to think about. But it’s close now, so it’s reasonable for it to feel viscerally real. I guess it won’t be EA that saves us (from the mess it helped accelerate), if we do end up saved.
Having the superpowers on board is the main thing. If others opt out, enforcement against them can still be effective.
No, but it’s far better than what we have now.
Downvoters note: in early 2022, there was actually far less publicly available information to update on regarding FTX being bad.