I’ll go further and say that I think those two claims are held by many in the AI safety world (in which I count myself) with a degree of confidence that goes way beyond what any argument provided by anyone, anywhere, can justify, and I think this is a huge epistemic failure of that part of the AI safety community.
I strongly downvoted the OP for making these broad, sweeping, controversial claims as if they were established fact and obviously correct, rather than one possible way the world could be that requires good arguments to establish, and for not attempting any serious understanding of, or engagement with, the viewpoints of people who disagree that shutting these organizations down would be the best thing for the world.
I would like the AI safety community to work much harder on its epistemic standards.
Much has already been written on this, enough for there to be a decent level of consensus (which there indeed is here, on EAF/LW).
These essays are well known and I’m aware of basically all of them. I deny that there’s a consensus on the topic, that the essays you link are representative of the range of careful thought on the matter, or that the arguments in these essays are anywhere near rigorous enough to meet my criterion: justifying the degree of confidence expressed in the OP (and some of the posts you link).
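I’ve not come across any arguments that debunk the risk in anywhere near the same rigour (and I still have a $1000 bounty open here). Please link to the “careful thought on the matter” from the other side that you mention (or add here). I’m with Richard Ngo when he says:
I’m often cautious when publicly arguing that AGI poses an existential risk, because our arguments aren’t as detailed as I’d like. But I should remember that the counterarguments are *much* worse—I’ve never seen a plausible rebuttal to the core claims. That’s terrifying.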
You seem to be lumping people like Richard Ngo, who is fairly epistemically humble, in with people who are absolutely sure that the default path leads to us all dying. It is only the latter that I’m criticizing.
I agree that AI poses an existential risk, in the sense that it is hard to rule out that the default path poses a serious chance of the end of civilization. That’s why I work on this problem full-time.
I do not agree that it is absolutely clear that default instrumental goals of an AGI entail it killing literally everyone, as the OP asserts.
(I provide some links to views dissenting from this extreme confidence here.)
The OP says:
goals that entail killing literally everyone (which is the default)
[my emphasis in bold]. This is a key distinction. No one is saying that the default goal will be killing humans; the whole issue is one of collateral damage. The AI will end up with (to us) arbitrary goals that give rise to convergent instrumental goals, which leave us all dead as collateral damage (e.g. turning the planet into “computronium”, or dismantling the Sun for energy).
Sure, I understand that it’s a supposed default instrumental goal and not a terminal goal. Sorry that my wording didn’t make that distinction clear. I’ve now edited it to do so, but I think my overall points stand.
It’s not even (necessarily) a default instrumental goal. It’s collateral damage resulting from other instrumental goals. It may just go straight for dismantling the Sun, knowing that we won’t be able to stop it. Or straight for ripping apart the planet with nanobots (no need for a “poison everyone simultaneously” step).
Fair enough, I edited it again. I still think the larger points stand unchanged.
No one is saying p(doom) is 100%, but there is good reason to think that it is 50% or more, i.e. that the default outcome of AGI is doom. It doesn’t default to somehow everything being OK: to alignment solving itself, or to the alignment work done to date (or by 2030) being enough if we get a foom tomorrow (or by 2030). I’ve not seen any compelling argument to that effect.
Thanks for the links. I think a big problem with the proposed solutions is that they don’t scale to ASI and aren’t watertight. Having 99.999999% alignment, in the limit of an ASI performing billions of actions a minute, still means everyone dead after a little while. RLHF’d GPT-4 is only safe because it is weak.
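To make the compounding-error arithmetic concrete, here is a minimal Python sketch. It rests on an illustrative assumption not stated in the thread: “99.999999% alignment” is read as each action independently having a 1e-8 chance of being catastrophically misaligned, and 1e9 actions a minute is taken as the low end of “billions”. The constants and function names are hypothetical.

```python
import math

# Hypothetical parameters (assumptions, not figures from the thread):
# "99.999999% alignment" read as a 1e-8 chance that any single action
# is catastrophically misaligned, with actions assumed independent.
P_FAIL_PER_ACTION = 1e-8
ACTIONS_PER_MINUTE = 1e9  # low end of "billions of actions a minute"


def p_all_safe(p_fail: float, n_actions: float) -> float:
    """Probability that n independent actions are all safe: (1 - p_fail) ** n."""
    # log1p keeps precision when p_fail is tiny.
    return math.exp(n_actions * math.log1p(-p_fail))


def minutes_until_likely_failure(p_fail: float, actions_per_minute: float,
                                 threshold: float = 0.5) -> float:
    """Minutes until P(at least one failure) reaches `threshold`."""
    # Solve (1 - p_fail) ** (rate * t) == 1 - threshold for t.
    return math.log1p(-threshold) / (actions_per_minute * math.log1p(-p_fail))


if __name__ == "__main__":
    expected = P_FAIL_PER_ACTION * ACTIONS_PER_MINUTE
    print(f"expected failures per minute: {expected:.1f}")
    p_minute = p_all_safe(P_FAIL_PER_ACTION, ACTIONS_PER_MINUTE)
    print(f"P(no failure in one minute): {p_minute:.2e}")
    minutes = minutes_until_likely_failure(P_FAIL_PER_ACTION, ACTIONS_PER_MINUTE)
    print(f"seconds until a failure is more likely than not: {60 * minutes:.1f}")
```

Under these assumptions the expected number of failures is about 10 per minute, the chance of a full minute passing with none is roughly e^-10 (about 4.5 × 10^-5), and a failure becomes more likely than not within about 4 seconds. Independence between actions is a simplifying assumption.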
Alignment at the level typical of humans toward humanity, or of the “common sense” that can be picked up from training data, is still nowhere near sufficient. Uplifting any given human to superintelligence would also lead to everyone being dead before too long, due to the massive power imbalance, even if only by accident (“whoops, I was just doing some physics experiments; didn’t think that would happen”; “I thought it would be cool if everyone became a post-human hive mind; I thought they’d like it”).
And quite apart from alignment, we still need to eliminate catastrophic risks from misuse (jailbreaks, open-sourced unaligned base-model weights) and from coordination failure (how to avoid chaos when everyone is wishing for different things from their genies). Those alone are enough to justify shutting it all down now.
It’s easy to ask other people to do work, but do you have things to read on the range of careful thought on this topic?
To be clear, mostly I’m not asking for “more work”, I’m asking people to use much better epistemic hygiene. I did use the phrase “work much harder on its epistemic standards”, but by this I mean please don’t make sweeping, confident claims as if they are settled fact when there’s informed disagreement on those subjects.
Nevertheless, some examples of the sort of informed disagreement I’m referring to:
The mere existence of many serious alignment researchers who are optimistic about scalable oversight methods such as debate.
This post by Matthew Barnett arguing we’ve been able to specify values much more successfully than MIRI anticipated.
Shard theory, developed mostly by Alex Turner and Quintin Pope, calling into question the utility-argmaxer framework that has been used to justify many historical concerns about instrumental convergence leading to AI takeover.
This comment by me arguing ChatGPT is pretty aligned compared to MIRI’s historical predictions, because it does what we mean and not what we say.
A detailed set of objections from Quintin Pope to Eliezer’s views, which Eliezer responded to by saying it’s “kinda long”, and engaged with extremely superficially before writing it off.
This post by Stuhlmüller and Byun, along with many other articles, arguing that process oversight is a viable alignment strategy that converges with, rather than opposes, capabilities.
Notably, the extreme doomer contingent has largely failed even to understand, never mind engage with, some of these arguments, frequently lazily pattern-matching and misrepresenting them as more basic misconceptions. A typical example is thinking Matthew Barnett and I have been saying that GPT understanding human values is evidence against the MIRI/doomer worldview (after all, “the AI knows what you want but does not care, as we’ve said all along”), when in fact we’re saying there’s evidence we have actually pointed GPT successfully at those values.
It’s fine if you have a different viewpoint. Just don’t express that viewpoint as if it’s self-evidently right when there’s serious disagreement on the matter among informed, thoughtful people. An article like the OP which claims that labs should shut down should at least try to engage with the views of someone who thinks the labs should not shut down, and not just pretend such people are fools unworthy of mention.
Aw shit, I’m very sorry for how I phrased it! I realized that it sounds like I’m digging at you. To be clear, I was asking for any links to discussions of alternative views, because I’m curious and haven’t heard them. What I meant is that it’s very easy for me to ask you to do work, by summarizing other people’s opinions. So it was a caveat that you don’t have to elaborate too much.
Going to retract that comment to prevent misunderstanding. Thanks for the links.
Oh lol, thanks for explaining! Sorry for misunderstanding you. (It’s a pretty amusing misunderstanding though, I think you’d agree.)