You seem to be lumping people like Richard Ngo, who is fairly epistemically humble, in with people who are absolutely sure that the default path leads to us all dying. It is only the latter that I’m criticizing.
I agree that AI poses an existential risk, in the sense that it is hard to rule out that the default path poses a serious chance of the end of civilization. That’s why I work on this problem full-time.
I do not agree that it is absolutely clear that default instrumental goals of an AGI entail it killing literally everyone, as the OP asserts.
(I provide some links to views dissenting from this extreme confidence here.)
I do not agree that it is absolutely clear that the default goal of an AGI is for it to kill literally everyone, as the OP asserts.
The OP says
goals that entail killing literally everyone (which is the default)
[my emphasis in bold]. This is a key distinction. No one is saying that the default goal will be killing humans; the whole issue is one of collateral damage: it will end up with (to us) arbitrary goals that result in convergent instrumental goals that leave us all dead as collateral damage (e.g. turning the planet into “computronium”, or dismantling the Sun for energy).
Sure, I understand that it’s a supposed default instrumental goal and not a terminal goal. Sorry that my wording didn’t make that distinction clear. I’ve now edited it to do so, but I think my overall points stand.
It’s not even (necessarily) a default instrumental goal. It’s collateral damage as the result of other instrumental goals. It may just go straight for dismantling the Sun, knowing that we won’t be able to stop it. Or straight for ripping apart the planet with nanobots (no need for a “poison everyone simultaneously” step).
Fair enough, I edited it again. I still think the larger points stand unchanged.
No one is saying p(doom) is 100%, but there is good reason to think that it is 50% or more, i.e. that the default outcome of AGI is doom. It doesn’t default to somehow everything being OK: to alignment solving itself, or to the alignment that has been done today (or by 2030) being enough if we get a foom tomorrow (or by 2030). I’ve not seen any compelling argument to that effect.
Thanks for the links. I think a lot of the problem with the proposed solutions is that they don’t scale to ASI, and aren’t watertight. Having 99.999999% alignment in the limit of ASI performing billions of actions a minute still means everyone dead after a little while. RLHF’d GPT-4 is only safe because it is weak.
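To put rough numbers on the 99.999999% point, here is a back-of-the-envelope sketch in Python. It rests on assumptions that are mine for illustration, not established facts: that actions are independent, that any single misaligned action counts as a catastrophic failure, and that “billions of actions a minute” means 10^9 per minute. The function and constant names are hypothetical.

```python
import math

def prob_no_failure(failure_rate: float, n_actions: float) -> float:
    """Probability that n_actions independent actions all go right."""
    # Equivalent to (1 - failure_rate) ** n_actions, computed via log1p so
    # that a tiny failure_rate is not lost to floating-point rounding.
    return math.exp(n_actions * math.log1p(-failure_rate))

# Assumed numbers, for illustration only.
FAILURE_RATE = 1e-8        # "99.999999% alignment" read as a per-action rate
ACTIONS_PER_MINUTE = 1e9   # low end of "billions of actions a minute"

expected_failures_per_minute = FAILURE_RATE * ACTIONS_PER_MINUTE
p_clean_minute = prob_no_failure(FAILURE_RATE, ACTIONS_PER_MINUTE)
p_clean_hour = prob_no_failure(FAILURE_RATE, 60 * ACTIONS_PER_MINUTE)

print(f"expected failures per minute: {expected_failures_per_minute:.1f}")
print(f"P(no failure in one minute):  {p_clean_minute:.2e}")
print(f"P(no failure in one hour):    {p_clean_hour:.2e}")
```

Under these assumptions you would expect about 10 failures a minute, and the chance of getting through even one minute cleanly is roughly e^-10, i.e. a few in 100,000. The result is only as strong as the independence and every-failure-is-fatal assumptions.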
Alignment at the level that is typical human-to-humanity, or what is represented by “common sense” that can be picked up from training data, is still nowhere near sufficient. Uplifting any given human to superintelligence would also lead to everyone dead before too long, due to the massive power imbalance, even if it’s just by accident (“whoops I was just doing some physics experiments; didn’t think that would happen”; “I thought it would be cool if everyone became a post-human hive mind; I thought they’d like it”).
And quite apart from alignment, we still need to eliminate catastrophic risks from misuse (jailbreaks, open-sourced unaligned base model weights) and coordination failure (how to avoid chaos when everyone is wishing for different things from their genies). Those alone are enough to justify shutting it all down now.