Resources I send to AI researchers about AI safety
This is my list of resources I send to machine learning (ML) researchers when presenting arguments about AI safety. New resources have been coming out fast, and I’ve also been user-testing these, so the top part of this post contains my updated (Nov 2022) recommendations. The rest of the post (originally posted June 2022) has been reorganized but mostly left for reference; I make occasional additions to it (last updated June 2023).
Core recommended resources
Core readings for ML researchers[1]
Overall
FAQ on Catastrophic AI Risks (2023), Yoshua Bengio
“The Alignment Problem from a Deep Learning Perspective” by Richard Ngo et al. (2022), 65m
Arguments for risk from advanced AI systems
“Why I Think More NLP Researchers Should Engage with AI Safety Concerns” by Sam Bowman (2022), 15m (Note: stop at the section “The new lab”)
“Researcher Perceptions of Current and Future AI” by Vael Gates (2022), 48m (Note: Skip the Q&A). [2]
Orienting
“More is Different for AI” by Jacob Steinhardt (2022), 30m (Note: intro and first three posts only)
“AI Timelines/Risk Projections as of Sep. 2022” (Note: first 3 pages only), 5m
“Frequent Arguments About Alignment” by John Schulman (2021), 15m
Research directions
See Ngo et al. (2022), above
Watch “Current Work in AI Alignment” by Paul Christiano (2019), 30m (Note: here is the transcript)
(These are much less vetted, added quickly.) Research in goal misgeneralization (Shah et al., 2022); specification gaming (Krakovna et al., 2020); mechanistic interpretability (Olsson et al., 2022; Meng et al., 2022) and interpretability using unsupervised methods (Burns et al., 2022; thread); eliciting latent knowledge (ELK); and ML safety divided into robustness, monitoring, alignment, and external safety (Hendrycks et al., 2022)
Core readings for the public
“The Case For Taking AI Seriously As A Threat to Humanity” by Kelsey Piper (2020), 30m
“The Alignment Problem” by Brian Christian (2020), book
Core readings for EAs
(These readings are more philosophical and involve x-risk and discussion of AGI-like systems, so I expect ML researchers to like them less (I have some limited data suggesting this), but they’re anecdotally well-liked by EAs.)
Watch “Existential Risk from Power-Seeking AI” by Joe Carlsmith (2021), 37m (Note: watch the first 37 min and skip the Q&A. Here is the transcript)
This talk is based on the full report (audio): Carlsmith (2021)
“The Most Important Century” by Holden Karnofsky
Key piece: “AI Timelines” by Holden Karnofsky (2021), 13m
“Why AI Alignment Could be Hard with Modern Deep Learning” by Ajeya Cotra (2021), 30m (Note: feel free to skip the section “How deep learning works at a high level”)
“80,000 Hours Podcast: Preventing an AI-related Catastrophe” (2022), 2.5h
Highly recommended for the motivated: AGISF Technical Curriculum
Getting involved for EAs
If you haven’t read Charlie’s writeup about research or Gabe’s writeup about engineering, they’re worth a look! Richard Ngo’s AGI safety career advice is also good. If you’re interested in theory, see John Wentworth’s writeup about independent research, and Vivek wrote some alignment exercises to try (also see John Wentworth’s work in general). With respect to outreach, I’d try to use a more technical pitch than what Vael used; I think Sam Bowman’s pitch is pretty great, and Marius also has a nice writeup of his pitch (not specific to NLP).
Full list of recommended resources
These reading choices are drawn from the various other reading lists (also Victoria Krakovna’s); this is not original in any way, just something to draw from if you’re trying to send someone some of the more accessible resources.
Public-oriented
“The Case For Taking AI Seriously As A Threat to Humanity” by Kelsey Piper (2020), 30m
“The Alignment Problem” by Brian Christian (2020), book
“80,000 Hours Podcast: Preventing an AI-related Catastrophe” (2022), 2.5h
“The Most Important Century” by Holden Karnofsky
Key piece: “AI Timelines” by Holden Karnofsky (2021), 13m
AI Safety YouTube channel by Robert Miles
(Alternatives: “Of Myths and Moonshine” by Stuart Russell (2014), 5m // “Human Compatible” by Stuart Russell (2019), book // the Wikipedia page just got substantially rewritten: https://en.wikipedia.org/wiki/AI_alignment)
Central Arguments
“FAQ on Catastrophic AI Risks” by Yoshua Bengio (2023)
“Why I Think More NLP Researchers Should Engage with AI Safety Concerns” by Sam Bowman (2022), 15m (Note: feel free to stop at the section “The new lab”)
“AI Safety and Neighboring Communities: A Quick-Start Guide, as of Summer 2022” by Sam Bowman (2022)
“More is Different for AI” by Jacob Steinhardt (2022), 30m (Note: intro and first three posts are most important)
Watch “Existential Risk from Power-Seeking AI” by Joe Carlsmith (2021), 37m (Note: watch the first 37 min and skip the Q&A. Here is the transcript)
This talk is based on the full report: Carlsmith (2021)
“Why AI Alignment Could be Hard with Modern Deep Learning” by Ajeya Cotra (2021), 30m (Note: feel free to skip the section “How deep learning works at a high level”)
“Researcher Perceptions of Current and Future AI” by Vael Gates (2022), 48m (Note: Skip the Q&A).
“The Alignment Problem From a Deep Learning Perspective” by Richard Ngo (2022), 40m
“Without Specific Countermeasures, the Easiest Path to Transformative AI Likely Leads to AI Takeover” by Ajeya Cotra (2022), 101m
“AI Timelines/Risk Projections as of Sep. 2022” (Note: first 3 pages are most important)
Technical Work on AI alignment
See Ngo et al. (2022) above
Watch “Current Work in AI Alignment” by Paul Christiano (2019), 30m (transcript)
AI Alignment / Safety Organizations (longform) and (shortform)
“Goal Misgeneralization: Why Correct Specifications Aren’t Enough For Correct Goals” by Shah et al. (2022)
“Discovering Latent Knowledge in Language Models Without Supervision” by Burns et al. (2023), Tweet thread
Eliciting Latent Knowledge from the Alignment Research Center
Interpretability work aimed at alignment: Chris Olah’s work (e.g. “Multimodal Neurons in Artificial Neural Networks” by Goh et al. (2021) and “In-context Learning and Induction Heads” by Olsson et al. (2022)), and David Bau’s work (e.g. “Locating and Editing Factual Associations in GPT” by Meng et al. (2022))
“Optimal Policies Tend to Seek Power” by Turner et al. (2021)
“Specification Gaming: The Flip Side of AI Ingenuity” by DeepMind Safety Research (2020), 8m
“The Off-Switch Game” by Hadfield-Menell et al. (2016)
“Corrigibility” by Soares et al. (2015)
“Unsolved Problems in ML Safety” by Hendrycks, D., et al. (2021), 60m
“Concrete Problems in AI Safety” by Amodei, D., et al. (2016)
AI Safety Resources by Victoria Krakovna (DeepMind)
AGI Safety Fundamentals
Alignment Newsletter and ML Safety Newsletter
How does this lead to x-risk / killing people though?
“AI Could Defeat All of Us Combined” by Holden Karnofsky (2022), 20m
“AI Suggested 40,000 New Possible Chemical Weapons in Just Six Hours” by Justine Calma (2022)
“AGI Ruin: A List of Lethalities” by Eliezer Yudkowsky (2022)
“Survey on AI Existential Risk Scenarios” by Clarke, C., et al. (2021), 8m
Further reading:
“What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)” by Andrew Critch (2021), 26m
“What Failure Looks Like” by Ben Pace (2020), 8m
Prologue of “Life 3.0” by Max Tegmark (2017)
Forecasting (When might advanced AI be developed?)
“AI Timelines” by Holden Karnofsky (2021), 13m
“Forecasting Transformative AI” by Holden Karnofsky (2021), 13m
“Updates and Lessons from AI forecasting” by Jacob Steinhardt (2021), 15m
“AI Forecasting One Year In” by Jacob Steinhardt (2022), 15m
“Forecasting ML Benchmarks in 2023” by Jacob Steinhardt (2022), 15m
“When Will AI Exceed Human Performance? Evidence From AI Experts” by Grace, K., et al. (2018), 20m
“The Scaling Hypothesis” by Gwern (2020)
Calibration and Forecasting
“Superforecasting in a Nutshell” by Luke Muehlhauser (2021)
Calibration Training, 30m
Register predictions about AI risk on either of these two questions: Date of Weak AGI and Date of AGI, 30m
“AI Safety and Timelines” by Metaculus user Sergio (2022), 14m
Common Misconceptions
“Frequent Arguments About Alignment” by John Schulman (2021), 15m
Counterarguments to AI safety (messy doc):
“Arguments against advanced AI safety”
Collection of public surveys about AI
“Surveys of public opinion on AI” on AI Impacts Wiki
Miscellaneous older text
Text I’m no longer using, but which I still refer to sometimes.
If you’re interested in getting into this:
The strongest academic center is probably UC Berkeley’s Center for Human-Compatible AI. Mostly there are researchers distributed at different institutions e.g. Sam Bowman at NYU, Dan Hendrycks and Jacob Steinhardt at UC Berkeley, Dylan Hadfield-Menell at MIT, David Krueger at Cambridge, Alex Turner at Oregon State, etc. Also, a lot of the work is done by industry and nonprofits: Anthropic, Redwood Research, OpenAI’s safety team, DeepMind’s Safety team, Alignment Research Center, Machine Intelligence Research Institute, independent researchers in various places. Consider also the Cooperative AI Foundation, and Andrew Critch’s article on AI safety areas.
There is money in the space! If you want to do AI alignment research, you can be funded by either Open Philanthropy (students, faculty; one can also just email them directly) or LTFF with your research proposal. Updated funding sources: Foundational Research Grants, Lightspeed Grants, SFF if you already have a charity.
If you wanted to rapidly learn more about the theoretical technical AI alignment space, walking through this curriculum is one of the best resources. A lot of the interesting theoretical stuff is happening online, at LessWrong / Alignment Forum (Introductory Content), since this field is still pretty pre-paradigmatic and people are still working through a lot of the ideas.
And if you’re interested in what the career pathway looks like, check out Rohin Shah (DeepMind)’s FAQ here! An additional guide is here. Also: guide to get involved in AI safety research engineering.
Introduction to large-scale risks to humanity, including “existential risks” that could lead to the extinction of humanity
The first third of this summary of Toby Ord’s book “The Precipice: Existential Risk and the Future of Humanity” (copied below)
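Chapter 3 is on natural risks, including risks of asteroid and comet impacts, supervolcanic eruptions, and stellar explosions. Ord argues that we can appeal to the fact that we have already survived for 2,000 centuries as evidence that the total existential risk posed by these threats from nature is relatively low (less than one in 2,000 per century).
Chapter 4 is on anthropogenic risks, including risks from nuclear war, climate change, and environmental damage. Ord estimates these risks as significantly higher, each posing about a one in 1,000 chance of existential catastrophe within the next 100 years. However, the odds are much higher that climate change will result in non-existential catastrophes, which could in turn make us more vulnerable to other existential risks.
Chapter 5 is on future risks, including engineered pandemics and artificial intelligence. Worryingly, Ord puts the risk of engineered pandemics causing an existential catastrophe within the next 100 years at roughly one in thirty. With any luck the COVID-19 pandemic will serve as a “warning shot,” making us better able to deal with future pandemics, whether engineered or not. Ord’s discussion of artificial intelligence is more worrying still. The risk here stems from the possibility of developing an AI system that both exceeds every aspect of human intelligence and has goals that do not coincide with our flourishing. Drawing upon views held by many AI researchers, Ord estimates that the existential risk posed by AI over the next 100 years is an alarming one in ten.
Chapter 6 turns to questions of quantifying particular existential risks (some of the probabilities cited above do not appear until this chapter) and of combining these into a single estimate of the total existential risk we face over the next 100 years. Ord’s estimate of the latter is one in six.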
How AI could be an existential risk
AI alignment researchers disagree a weirdly high amount about how AI could constitute an existential risk, so I hardly think the question is settled. Some plausible ones people are considering (copied from the paper)
“Superintelligence”
A single AI system with goals that are hostile to humanity quickly becomes sufficiently capable for complete world domination, and causes the future to contain very little of what we value, as described in “Superintelligence”. (Note from Vael: Where the AI has an instrumental incentive to destroy humans and uses its planning capabilities to do so, for example via synthetic biology or nanotechnology.)
Part 2 of “What failure looks like”
This involves multiple AIs accidentally being trained to seek influence, and then failing catastrophically once they are sufficiently capable, causing humans to become extinct or otherwise permanently lose all influence over the future. (Note from Vael: I think we might have to pair this with something like “and in loss of control, the environment then becomes uninhabitable to humans through pollution or consumption of important resources for humans to survive”)
Part 1 of “What failure looks like”
This involves AIs pursuing easy-to-measure goals, rather than the goals humans actually care about, causing us to permanently lose some influence over the future. (Note from Vael: I think we might have to pair this with something like “and in loss of control, the environment then becomes uninhabitable to humans through pollution or consumption of important resources for humans to survive”)
War
Some kind of war between humans, exacerbated by developments in AI, causes an existential catastrophe. AI is a significant risk factor in the catastrophe, such that no catastrophe would have occurred without the developments in AI. The proximate cause of the catastrophe is the deliberate actions of humans, such as the use of AI-enabled, nuclear, or other weapons. See Dafoe (2018) for more detail. (Note from Vael: Though there’s a recent argument that it may be unlikely for nuclear weapons to cause an extinction event, and instead it would just be catastrophically bad. One could still do it with synthetic biology though, probably, to get all of the remote people.)
Misuse
Intentional misuse of AI by one or more actors causes an existential catastrophe (excluding cases where the catastrophe was caused by misuse in a war that would not have occurred without developments in AI). See Karnofsky (2016) for more detail.
Other
There are also two related communities that care about these issues, which you might find interesting:
Effective Altruism community, whose strong internet presence is on the EA Forum. If you’re interested in working on an AI safety career, you can apply to schedule a one-on-one coaching call here.
Rationalist community. The most(?) popular blog from this community is from Scott Alexander (first blog, second blog), and the Rationalist’s main online forum is LessWrong. Amusingly, they also write fantastic fanfiction (e.g. Harry Potter and the Methods of Rationality) and I think some of their nonfiction is fantastic.
Governance, aimed at highly capable systems in addition to today’s systems
It seemed like a lot of your thoughts about AI risk went through governance, so I wanted to mention what the space looks like (spoiler: it’s preparadigmatic) if you haven’t seen it yet!
Read the posts by US Policy Careers! And reach out to connect with the DC community via the forms on their writeups (e.g. this form from this page).
AI Governance and Coordination by 80,000 Hours
Horizon Fellowship
Center for Security and Emerging Technology (CSET). See also CSET Foundational Research Grants, which is technically-oriented but gives flavor for some of the work.
AI governance curriculum (highly recommended)
The longtermist AI governance landscape: a basic overview and more personal posts of how to get involved (also the search term “AI Policy”)
AI Governance: Opportunity and Theory of Impact / AI Governance: A Research Agenda by Allan Dafoe, and GovAI generally
See also: Legal Priorities Project, and Gillian Hadfield (U. Toronto)
The intersection of governance and technical AI safety work
see also What does it take to catch a Chinchilla? Verifying Rules on Large-Scale Neural Network Training via Compute Monitoring
and follow-up work Tools for Verifying Neural Models’ Training Data
AI Safety in China
Tianxia 天下 and Concordia Consulting 安远咨询 are the main organizations in the space. If you’re interested in getting involved in those communities, let me know and I can connect you!
China-related AI safety and governance paths
ChinAI Newsletter
AI Safety community building, student-focused (see academic efforts above)
AI Safety Hub has a good set of resources if you reach out to them
EA Cambridge’s AGI Safety Fundamentals (AGISF) program
Stanford Existential Risk Initiative (SERI), Swiss Existential Risk Initiative (CHERI), Cambridge Existential Risk Initiative (CERI)
An article about Stanford EA and Stanford SAIA
Global Challenges Project
(if they’re interested in my work specifically) Transcripts on Interviews with AI Researchers, by Vael Gates
Levelling Up in AI Safety Research Engineering: https://forum.effectivealtruism.org/posts/S7dhJR5TDwPb5jypG/levelling-up-in-ai-safety-research-engineering
How to Pursue a Career in Technical AI Alignment: https://forum.effectivealtruism.org/posts/7WXPkpqKGKewAymJf/how-to-pursue-a-career-in-technical-ai-alignment
Center for AI Safety
If they’re curious about other existential / global catastrophic risks:
Large-scale risks from synthetic biology
Calma J. AI suggested 40,000 new possible chemical weapons in just six hours. The Verge, 2022
Kupferschmidt K. (2017) “How Canadian researchers reconstituted an extinct poxvirus for $100,000 using mail-order DNA”. Science, AAAS.
“Reducing global catastrophic biological risks” by 80,000 Hours
Email I sent to some Stanford students, with further resources[3]
“List of Lists of Concrete Biosecurity Project Ideas”
Talk to an expert!
Large-scale risks from nuclear
Nuclear close calls, by the Future of Humanity Institute
“Nuclear Security” by 80,000 Hours
Why I don’t think we’re on the right timescale to worry most about climate change:
“Climate change” by 80,000 Hours
List for “Preventing Human Extinction” class
I’ve also included a list of resources that I had students read through for the Stanford first-year course “Preventing Human Extinction”.
When might advanced AI be developed?
Grace, K., Salvatier, J., Dafoe, A., Zhang, B., & Evans, O. (2018). When will AI exceed human performance? Evidence from AI experts. Journal of Artificial Intelligence Research, 62, 729-754.
Why might advanced AI be a risk?
Cotra, A. (2021, Sep 21). Why AI alignment could be hard with modern deep learning. Cold Takes. https://www.cold-takes.com/why-ai-alignment-could-be-hard-with-modern-deep-learning/
Krakovna, V., Uesato, J., Mikulik, V., Rahtz, M., Everitt, T., Kumar, R., Kenton, Z., Leike, J., & Legg, S. (2020) Specification gaming: the flip side of AI ingenuity. DeepMind Safety Research. https://medium.com/@deepmindsafetyresearch/specification-gaming-the-flip-side-of-ai-ingenuity-c85bdb0deeb4
Thinking about making advanced AI go well (technical)
Christiano, P. (2019). Current work in AI alignment [Lecture]. YouTube. https://www.youtube.com/watch?v=-vsYtevJ2bc
Choose one:
Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., & Mané, D. (2016). Concrete problems in AI safety. arXiv preprint arXiv:1606.06565
Hendrycks, D., Carlini, N., Schulman, J., & Steinhardt, J. (2021). Unsolved Problems in ML Safety. arXiv preprint arXiv:2109.13916.
Thinking about making advanced AI go well (governance)
Dafoe, A. (2020). AI Governance: Opportunity and Theory of Impact. Center for the Governance of AI. September 15, 2020.
Optional (large-scale risks from AI)
Karnofsky, H. (n.d.) The “most important century” blog post series (few page summary). Cold Takes. https://www.cold-takes.com/most-important-century/
Ngo, R. (2020). AGI Safety from First Principles. Alignment Forum. https://www.alignmentforum.org/s/mzgtmmTKKn5MuCzFJ
Zwetsloot, R., & Dafoe, A. (2019). Thinking About Risks From AI: Accidents, Misuse and Structure. Lawfare. February 11, 2019.
Miles, R. (2021, June 24). Intro to AI Safety, Remastered [Video]. YouTube. https://www.youtube.com/watch?v=pYXy-A4siMw
Clarke, S., Carlier, A., & Schuett, J. (2021). Survey on AI existential risk scenarios. Alignment Forum. https://www.alignmentforum.org/posts/WiXePTj7KeEycbiwK/survey-on-ai-existential-risk-scenarios
Carlsmith, J. (2021). Is power-seeking AI an existential risk?. Alignment Forum. https://www.alignmentforum.org/posts/HduCjmXTBD4xYTegv/draft-report-on-existential-risk-from-power-seeking-ai
Natural science sources
Calma J. AI suggested 40,000 new possible chemical weapons in just six hours. The Verge, 2022
Olah, C., Cammarata, N., Schubert, L., Goh, G., Petrov, M., & Carter, S. (2020). Zoom in: An introduction to circuits. Distill, 5(3), e00024-001. (Important work in the field of AI interpretability, a subfield of AI safety)
[1] See https://www.lesswrong.com/posts/gpk8dARHBi7Mkmzt9/what-ai-safety-materials-do-ml-researchers-find-compelling
[2] I swear I didn’t set out to self-promote here—it’s just doing weirdly well on user testing for both EAs and ML researchers at the moment (this is partly because it’s relatively current; I expect it’ll do less well over time)
Note: I’ve written a new version of this talk that goes over the AI risk arguments through March 2023, and there’s a new website talking about my interview findings (ai-risk-discussions.org).
[3] Hi X,
[warm introduction]
In the interests of increasing options, I wanted to reach out and say that I’d be particularly happy to help you explore synthetic biology pathways more, if you were so inclined. I think it’s pretty plausible we’ll get another, worse pandemic in our lifetimes, and that it’s worth investing a career or part of a career in working on it. Especially since so few people will make that choice, a single person probably matters a lot compared to entering other, more popular careers.
No worries if you’re not interested though—this is just one option out of many. I’m emailing you in a batch instead of individually so that hopefully you feel empowered to ignore this email and be done with this class :P. Regardless, thanks for a great quarter and hope you have great summers!
If you are interested:
I’m happy to talk on Zoom, get you connected up with resources (reading list, 80K, job board) and researchers at Stanford (e.g. Megan Palmer’s lab, Daniel Greene, Prof. Luby). [also mention 80K coaching if relevant]
A lot of the students who are interested in this at Stanford are affiliated with Stanford EA (I’d sign up for a one-on-one), and there’s some very cool people working on these issues at the community-building (“Tessa Alexanian: How Biology Has Changed”, “Biosecurity as an EA cause area”), grantmaking, start-up (see next point) and governance levels (see job board).
There’s a lot of room for new (startup / nonprofit) projects to be started—consider Alvea (website), and this list, and other lists contained in these posts. Plus: job board!