Global moratorium on AGI, now (Twitter). Founder of CEEALAR (née the EA Hotel; ceealar.org)
[Separating out this paragraph into a new comment as I’m guessing it’s what led to the downvotes, and I’d quite like the point of the parent paragraph to stand alone. Not sure if anyone will see this now though.]

I think it’s imperative to get the leaders of AGI companies to realise that they are in a suicide race (and that AGI will likely kill them too). The default outcome of AGI is doom. At an extinction risk of 1%, it might seem reasonable (even though that is still 80M lives in expectation) to pull the trigger on AGI for a 99% chance of utopia. But this is totally wrong-headed, and is arguably contributing massively to current x-risk.
Also, in general I’m personally much more sceptical of such a moonshot paying off, given shorter timelines and the possibility that x-safety from ASI may well be impossible. I think OP was 2022's best idea for AI Safety. 2024's is PauseAI.
People from those orgs were aware, but none were keen enough about the idea to go as far as attempting a pilot run (e.g. the 2-week retreat idea). I think general downside risk aversion was probably a factor. This was in the pre-ChatGPT days of a much narrower Overton Window though, so maybe it’s time for the idea to be revived? On the other hand, maybe it’s much less needed now that there is government involvement, and national AI Safety Institutes attracting top talent.
At vastly superhuman capabilities (including intelligence and rationality), it should be easier to reduce existential-level mistakes to tiny levels. They would have vastly more capability for assessing and mitigating risks, and for moral reflection.
They are still human though, and humans are famous for making mistakes, even the most intelligent and rational of us. It’s even regarded by many as part of what being human is: being fallible. That’s not (too much of) a problem at current power differentials, but it is when we’re talking of solar-system-rearranging powers exercised for millions of subjective years without catastrophic error...
a temporary pause only delays the inevitable doom.
Yes. The pause should be indefinite, or at least last until there is global consensus to proceed, with democratic acceptance of whatever risk remains.
Perhaps. But remember they will be smarter than us, so controlling them might not be so easy (especially if they gain access to enough computing power to speed themselves up massively). And they need not be hostile, just curious, to accidentally doom us.
Because of the crazy high power differential, and propensity for accidents (can a human really not mess up on an existential scale if acting for millions of years subjectively at superhuman capability levels?). As I say in my comment above:
Even the nicest human could accidentally obliterate the rest of us if uplifted to superintelligence and left running for subjective millions of years (years of our time). “Whoops, I didn’t expect that to happen from my little physics experiment”; “Uploading everyone into a hive mind is what my extrapolations suggested was for the best (and it was just so boring talking to you all at one word per week of my time)”.
I agree that they would most likely be safer than ML-derived ASI. What I’m saying is that they still won’t be safe enough to prevent an existential catastrophe. It might buy us a bit more time (if uploads happen before ASI), but that might only be measured in years. Moratorium >> mind uploads > ML-derived ASI.
I think there is an unstated assumption here that uploading is safe. And by safe, I mean existentially safe for humanity. If in addition to being uploaded, a human is uplifted to superintelligence, would they—indeed any given human in such a state—be aligned enough with humanity as a whole to not cause an existential disaster? Arguably humans right now are only relatively existentially safe because power imbalances between them are limited.

Even the nicest human could accidentally obliterate the rest of us if uplifted to superintelligence and left running for subjective millions of years (years of our time). “Whoops, I didn’t expect that to happen from my little physics experiment”; “Uploading everyone into a hive mind is what my extrapolations suggested was for the best (and it was just so boring talking to you all at one word per week of my time)”.
Although safety for the individual being uploaded would be far from guaranteed either.
If you’re on X, please share my tweet re the book giveaway.
Good point re it being a quantitative matter. I think the current priority is to kick the can down the road a few years with a treaty. Once that’s done we can see about kicking the can further. Without a full solution to x-safety given AGI (dealing with alignment, misuse and coordination), maybe all we can do is keep kicking the can down the road.
“woah, AI is powerful, I better be the one to build it”
I think this ship has long since sailed. The (Microsoft) OpenAI, Google DeepMind and (Amazon) Anthropic race is already enough to end the world. They have enough money, and all the best talent. If anything, if governments enter the race that might actually slow things down, by further dividing talent and the hardware supply. We need an international AGI non-proliferation treaty. I think any risks from governments joining the race are more than outweighed by the chance of them working toward a viable treaty.
It’s not even (necessarily) a default instrumental goal. It’s collateral damage resulting from other instrumental goals. It may just go straight for dismantling the Sun, knowing that we won’t be able to stop it. Or straight for ripping apart the planet with nanobots (no need for a “poison everyone simultaneously” step).
I do not agree that it is absolutely clear that the default goal of an AGI is for it to kill literally everyone, as the OP asserts.
The OP says
goals that entail killing literally everyone (which is the default)
[my emphasis in bold]. This is a key distinction. No one is saying that the default goal will be killing humans; the whole issue is one of collateral damage—it will end up with (to us) arbitrary goals that result in convergent instrumental goals that lead to us all dead as collateral damage (e.g. turning the planet into “computronium”, or dismantling the Sun for energy).
No one is saying p(doom) is 100%, but there is good reason to think that it is 50% or more: that the default outcome of AGI is doom. It doesn’t default to somehow everything being ok, to alignment solving itself, or to the alignment work done today (or by 2030) being enough if we get a foom tomorrow (or by 2030). I’ve not seen any compelling argument to that effect.

Thanks for the links. I think a lot of the problem with the proposed solutions is that they don’t scale to ASI, and aren’t watertight. Having 99.999999% alignment in the limit of ASI performing billions of actions a minute still means everyone dead after a little while. RLHF’d GPT-4 is only safe because it is weak. Alignment at the level that is typical human-to-humanity, or what is represented by the “common sense” that can be picked up from training data, is still nowhere near sufficient. Uplifting any given human to superintelligence would also lead to everyone dead before too long, due to the massive power imbalance, even if it’s just by accident (“whoops, I was just doing some physics experiments; didn’t think that would happen”; “I thought it would be cool if everyone became a post-human hive mind; I thought they’d like it”).

And quite apart from alignment, we still need to eliminate catastrophic risks from misuse (jailbreaks, open-sourced unaligned base model weights) and coordination failure (how to avoid chaos when everyone is wishing for different things from their genies). Those alone are enough to justify shutting it all down now.
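The 99.999999% point can be made concrete with a quick back-of-the-envelope calculation. The action rate and the independence assumption below are illustrative assumptions of mine, not claims from the comment:

```python
# Back-of-the-envelope: per-action reliability vs. catastrophic-error rate.
# Illustrative assumptions: an ASI takes 1e9 actions per minute, each with
# an independent 1e-8 chance of an existential-scale mistake
# (i.e. "99.999999% alignment" per action).

p_failure = 1e-8          # per-action chance of a catastrophic error
actions_per_minute = 1e9  # assumed action rate

# Probability of getting through one minute with no catastrophic action:
survival_per_minute = (1 - p_failure) ** actions_per_minute
print(f"P(no catastrophe in 1 minute) ~ {survival_per_minute:.2e}")

# Expected time until the first catastrophic action:
expected_minutes = 1 / (p_failure * actions_per_minute)
print(f"Expected time to first catastrophe ~ {expected_minutes:.2f} minutes")
```

Even at eight nines of per-action reliability, the expected time to a first catastrophic action is a fraction of a minute, and the chance of surviving even one minute is well under 0.01%; this is just the point that tiny per-action error rates compound at superhuman action speeds.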
I’ve not come across any arguments that debunk the risk in anywhere near the same rigour (and I still have a $1000 bounty open here). Please link to the “careful thought on the matter” from the other side that you mention (or add here). I’m with Richard Ngo when he says:
I’m often cautious when publicly arguing that AGI poses an existential risk, because our arguments aren’t as detailed as I’d like. But I should remember that the counterarguments are *much* worse—I’ve never seen a plausible rebuttal to the core claims. That’s terrifying.
There has already been much written on this, enough for there to be a decent level of consensus (and indeed there is, here on EAF/LW).
Here’s my attempt at a concise explanation for why the default outcome of AGI is doom.