This article is… really bad.
It’s mostly a summary of Yudkowsky/Bostrom ideas, but with a bunch of the ideas garbled and misunderstood.
Mitchell says that one of the core assumptions of AI risk arguments is “that any goal could be ‘inserted’ by humans into a superintelligent AI agent”. But that’s not true, and in fact a lot of the risk comes from the fact that we have no idea how to ‘insert’ a goal into an AGI system.
The paperclip maximizer hypothetical here is a misunderstanding of the original idea. (Though it’s faithful to the version Bostrom gives in Superintelligence.) And the misunderstanding seems to have caused Mitchell to misunderstand a bunch of other things about the alignment problem. Picking one of many examples of just-plain-false claims:
“And importantly, in keeping with Bostrom’s orthogonality thesis, the machine has achieved superintelligence without having any of its own goals or values, instead waiting for goals to be inserted by humans.”
The article also says that “research efforts on alignment are underway at universities around the world and at big AI companies such as Google, Meta and OpenAI”. I assume Google here means DeepMind, but what alignment research at Meta does Mitchell have in mind??
Also: “Many researchers are actively engaged in alignment-based projects, ranging from attempts at imparting principles of moral philosophy to machines, to training large language models on crowdsourced ethical judgments.”
… That sure is a bad picture of what looks difficult about alignment.
This essay is quite bad. A response here: “A reply to Francois Chollet on intelligence explosion”.
I disagree with Thorstad and Drexler, but those resources seem much better.