Tbh the whole piece is my go-to for skepticism about AI. In particular, the analogy with alchemy seems apropos given that concepts like sentience are very ill-posed.
What would you say are good places to get up to speed on what we’ve learned about AI risk and the alignment problem in the past 8 years? Thanks much!
In particular, the analogy with alchemy seems apropos given that concepts like sentience are very ill-posed.
I took another look at that section; it was interesting to learn more about the alchemists.
I think most AI alignment researchers consider ‘sentience’ to be unimportant for questions of AI existential risk—it doesn’t turn out to matter whether or not an AI is conscious or has qualia or anything like that. [1] What matters a lot more is whether AI can model the world and gain advanced capabilities, and AI systems today are making pretty quick progress along both these dimensions.
What would you say are good places to get up to speed on what we’ve learned about AI risk and the alignment problem in the past 8 years?
My favorite overview of the general topic is the AGI Safety Fundamentals course from EA Cambridge. I found taking the actual course to be very worthwhile, but they also make the curriculum freely available online. Weeks 1-3 are mostly about AGI risk and link to a lot of great readings on the topic. The weeks after that are mostly about looking at different approaches to solving AI alignment.
As for what has changed specifically in the last 8 years: I probably can’t do the topic justice, but here are a couple of things that jump out at me:
The “inner alignment” problem has been identified and articulated. Most of the problems from Bostrom’s Superintelligence (2014) fall under the category of what we now call “outer alignment”, as the inner alignment problem wasn’t really known at that time. Outer alignment isn’t solved yet, but substantial work has been done on it. Inner alignment, on the other hand, is something many researchers consider to be more difficult.
Links on inner alignment: Canonical post on inner alignment, Article explainer, Video explainer
AI has advanced more rapidly than many people anticipated. People used to point to tasks that ML models and other computer programs couldn’t yet do as evidence that we were still a long way from anything resembling AGI. But AI has now passed many of those milestones.
Here are some of those previously unsolved problems, along with the AI advances since 2015 that solved them: beating humans at Go (AlphaGo), beating humans at StarCraft (AlphaStar), predicting the 3D structure of proteins (AlphaFold), advanced linguistic/conversational abilities (GPT-3, PaLM), generalizing knowledge to competence in new tasks (XLand), artistic creation (DALL·E 2), and multi-modal capabilities like combined language + vision + robotics (SayCan, Socratic Models, Gato).
Because of these rapid advances, many people have updated their estimates of when transformative AI will arrive, moving them years earlier than they previously thought. This cuts down on the time we have to solve the alignment problem.
--
[1]: It matters a lot whether the AI is sentient for moral questions around how we should treat advanced AI. But those are separate questions from AI x-risk.