Could you list some of your favorite/the strongest-according-to-you (collections of) arguments against AI risk?
Sure! It would depend on what you mean by “an argument against AI risk”:
If you mean “What’s the main argument that makes you more optimistic about AI outcomes?”, I made a list of these in 2018.
If you mean “What’s the likeliest way you think it could turn out that aligning AGI is unnecessary in order to do a pivotal act / initiate an as-long-as-needed reflection?”, I’d currently guess it’s using strong narrow-AI systems to accelerate you to Drexlerian nanotechnology (which can then be used to build powerful things like “large numbers of fast-running human whole-brain emulations”).
If you mean “What’s the likeliest way you think it could turn out that humanity’s current trajectory is basically OK / no huge actions or trajectory changes are required?”, I’d say that the likeliest scenario is one where AGI kills all humans, but this isn’t a complete catastrophe for the future value of the reachable universe because the AGI turns out to be less like a paperclip maximizer and more like a weird sentient alien that wants to fill the universe with extremely-weird-but-awesome alien civilizations. This sort of scenario is discussed in Superintelligent AI is necessary for an amazing future, but far from sufficient.
If you mean “What’s the likeliest way you think it could turn out that EAs are focusing too much on AI and should focus on something else instead?”, I’d guess it’s if we should focus more on biotech. E.g., this conjunction could turn out to be true: (1) AGI is 40+ years away; (2) by default, it will be easy for small groups of crazies to kill all humans with biotech in 20 years; and (3) EAs could come up with important new ways to avoid disaster if we made this a larger focus (though it’s already a reasonably large focus in EA).
Another way it could be bad that EAs are focusing on AI is if EAs are accelerating AGI capabilities / shortening timelines way more than we’re helping with alignment (or otherwise increasing the probability of good outcomes).
Here are a few non-MIRI perspectives if you’re interested:
What does it mean to align AI with human values?
The implausibility of intelligence explosion
Against the singularity hypothesis
Book Review: Reframing Superintelligence
The first of those, Mitchell's article, is… really bad.
It’s mostly a summary of Yudkowsky/Bostrom ideas, but with a bunch of the ideas garbled and misunderstood.
Mitchell says that one of the core assumptions of AI risk arguments is "that any goal could be 'inserted' by humans into a superintelligent AI agent". But that's not true; in fact, a lot of the risk comes from the fact that we have no idea how to 'insert' a goal into an AGI system.
The paperclip maximizer hypothetical here is a misunderstanding of the original idea. (Though it's faithful to the version Bostrom gives in Superintelligence.) And that misunderstanding seems to have led Mitchell to misunderstand a bunch of other things about the alignment problem. Picking one of many examples of just-plain-false claims:
“And importantly, in keeping with Bostrom’s orthogonality thesis, the machine has achieved superintelligence without having any of its own goals or values, instead waiting for goals to be inserted by humans.”
The article also says that “research efforts on alignment are underway at universities around the world and at big AI companies such as Google, Meta and OpenAI”. I assume Google here means DeepMind, but what alignment research at Meta does Mitchell have in mind??
Also: “Many researchers are actively engaged in alignment-based projects, ranging from attempts at imparting principles of moral philosophy to machines, to training large language models on crowdsourced ethical judgments.”
… That sure is a bad picture of what looks difficult about alignment.
Chollet's essay ("The implausibility of intelligence explosion") is quite bad. A response here: A reply to Francois Chollet on intelligence explosion
I disagree with Thorstad and Drexler, but those two resources are much better than the Mitchell and Chollet pieces.