AI safety is the study of ways to reduce risks posed by artificial intelligence.
Interventions that aim to reduce these risks can be split into:
Technical alignment - research on how to align AI systems with human values or moral goals
AI governance - reducing AI risk through, for example, global coordination on regulating AI development or incentives for corporations to be more cautious in their AI research
AI forecasting - predicting AI capabilities ahead of time
Reading on why AI might be an existential risk
Hilton, Benjamin (2023) Preventing an AI-related catastrophe, 80,000 Hours, March.
Cotra, Ajeya (2022) Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover, Effective Altruism Forum, July 18.
Carlsmith, Joseph (2022) Is Power-Seeking AI an Existential Risk?, arXiv, June 16.
Yudkowsky, Eliezer (2022) AGI Ruin: A List of Lethalities, LessWrong, June 5.
Ngo, Richard et al. (2023) The alignment problem from a deep learning perspective, arXiv, February 23.
Arguments against AI safety
AI safety and AI risk are sometimes referred to as a Pascal’s Mugging[1], implying that the risks are tiny and that, for any stated level of ignorable risk, the payoffs could be exaggerated to force the issue to remain a top priority. A response to this is that in a survey of 700 ML researchers, the median answer to the question of the probability that the long-run effect of advanced AI on humanity will be “extremely bad (e.g., human extinction)” was 5%, with 48% of respondents giving 10% or higher[2]. These probabilities are too high (by at least five orders of magnitude) to be considered Pascalian.
Further reading on arguments against AI safety
Grace, Katja (2022) Counterarguments to the basic AI x-risk case, Effective Altruism Forum, October 14.
Garfinkel, Ben (2020) Scrutinising classic AI risk arguments, 80,000 Hours Podcast, July 9.
AI safety as a career
80,000 Hours’ medium-depth investigation rates technical AI safety research as a “priority path”, placing it among the most promising career opportunities the organization has identified so far.[3][4] Richard Ngo and Holden Karnofsky also offer advice for those interested in working on AI safety[5][6].
Further reading
Gates, Vael (2022) Resources I send to AI researchers about AI safety, Effective Altruism Forum, June 13.
Krakovna, Victoria (2017) Introductory resources on AI safety research, Victoria Krakovna’s Blog, October 19.
Ngo, Richard (2019) Disentangling arguments for the importance of AI safety, Effective Altruism Forum, January 21.
Rice, Issa; Naik, Vipul (2024) Timeline of AI safety, Timelines Wiki.
Related entries
AI alignment | AI governance | AI forecasting | AI takeoff | AI race | Economics of artificial intelligence | AI interpretability | AI risk | cooperative AI | building the field of AI safety
- ^
https://twitter.com/amasad/status/1632121317146361856 The CEO of Replit, a coding organisation involved in ML tools
- ^
- ^
Todd, Benjamin (2023) The highest impact career paths our research has identified so far, 80,000 Hours, May 12.
- ^
Hilton, Benjamin (2023) AI safety technical research, 80,000 Hours, June 19.
- ^
Ngo, Richard (2023) AGI safety career advice, Effective Altruism Forum, May 2.
- ^
Karnofsky, Holden (2023) Jobs that can help with the most important century, Effective Altruism Forum, February 12.