Four reasons I find AI safety emotionally compelling
‘I don’t feel emotionally motivated to work on AI safety, even though I’m intellectually convinced that it’s important.’
It always surprises me when people say this because I find my work at Nonlinear on AI safety incredibly motivating. I’m sharing my reasons in the hope that they’ll resonate with some of you, and that these ideas will help bring your emotional drives into greater harmony with your abstract convictions.
1. The dramatic scale of AGI is inspiring
When I was a kid, I wanted to save the world. Like many EAs, I was obsessed with stories of superheroes who could use their powers to save whole cities from catastrophe. I aspired to be like Gandhi, or Martin Luther King, and to do something really big and important; something that would forge widespread, lasting change. But as I got older, these dreams collided violently with the harsh reality: changing things is really really hard. I became more realistic and scaled down my ambitions.
I started off working on global health. In this area, the most I could hope for was (for example) to reduce infant mortality by a percent or two in a single country—and that’s if my career was an astonishing success. This is significant (that’s a lot of lives saved!), but it’s far more modest than my childhood ambitions. I resigned myself to the fact that while I could make significant improvements to the world, the problems we face would remain immense and oppressive.
Similarly, at one time, I felt most motivated to work on animal welfare. But again: even if I was unusually successful in my work, I could only hope to make a small dent in the problem. I might (for example) secure slightly better conditions for factory-farmed chickens in the US, but there would still be factory farms. And even if factory farms were abolished, what about the suffering of animals in the wild?
This situation was demotivating over time: the thought of so much intense pain was emotionally crushing, and I knew that even if I worked extremely hard and improved some animals’ lives, those improvements would be dwarfed by the suffering that remained.
With AGI, on the other hand, my ambitions to really change things have returned. AI is likely to change the world massively: for better or for worse. When we talk about AI, we often focus on its dangers: the fact that misaligned AI could kill us all or cause astronomical suffering. But there is a more positive side: if we manage to align AGI—if we get it right—it could fix so much. A superintelligent AI aligned with human values could end poverty, war, oppression, abuse, suffering itself—you name it. It could improve the world far more than any other intervention or advance to date.
This could be the most important cause in all of history. This time, if enough people work hard, if we’re thoughtful and strategic, we could really achieve utopian improvements to the human condition. I find this vision incredibly inspiring.
2. AI safety means you don’t have to “choose” a cause
Because aligned AI could have such wide-reaching effects, I find it easier to wholeheartedly dedicate myself to AI alignment. I used to feel terrible about all the problems I wasn’t helping with. As EAs, we know that we are always in triage and need to prioritize ruthlessly, but it can still feel painful, even if we know it’s the right thing to do.
I was working on poverty and animal welfare, but then I’d think—what about domestic abuse? What about the lobsters you see in tanks? What about depression? What about human rights abuses? What about North Korea? Or just plain old cancer, heart disease, death?
In the past, when I thought about one of these problems, I had to regretfully set it aside, and tell myself ‘yes, this is bad; but what you’re working on is more impactful. Stay focused!’
Now, though, whenever I remember that billions are still poor, or wild animals are being torn apart by predators, or that so many people still live under oppression and tyranny, I think to myself ‘AI could fix this’. Before, other problems were distracting; now, they only add fuel to my motivation to help AI go right. An aligned superintelligence could solve every problem that (for now) we neglect in order to work on aligning superintelligence; so choosing to work on AI safety seems like less of an agonizing trade-off.
3. It’s globally and locally helpful
If you work on AI risk, you don’t need to make trade-offs between your self-interest and your altruistic desires, because if AGI is developed in your lifetime, it will benefit or harm everyone.
For most EA cause areas, you need to be motivated by pure altruism; when you work on the cause, you are not working towards your personal goals. Working on animal welfare or global development benefits others, but it doesn’t benefit us. But when I work on AI safety, this could directly benefit me and my loved ones, as well as countless others. If a misaligned superintelligence kills us all within my lifetime, then it won’t just harm strangers living on the other side of the world, or unknown future people—it will harm me and everyone I care about!
Similarly, if we create an aligned superintelligence, it could solve problems that directly affect me: sickness, sadness, death. AI could make us immortal! This would be pretty motivating by itself, even if I didn’t have any purely altruistic motivation.
4. It’s awe-inspiring to try to build something so good
There is something deeply awe-inspiring about building something that is so much smarter, stronger, and better than us. We talk a lot about how much AI could surpass us in skill or intelligence; but (if we succeed), our superintelligence will also surpass us in goodness. It could embody all of our human virtues without our human vices: our partiality, our irrationality, the biases baked into us by evolution. It could be genuinely impartial and benevolent in a way that humans can’t.
I want something that good to exist.
These and so many other reasons are why I find AI safety one of the most compelling cause areas I’ve ever worked in.
Let me know other reasons you find AI x/s-risks emotionally compelling in the comments. It would be great to compile the best ones and have a piece you can re-read or point people towards when they need a boost.
Reminder that once/if this post reaches 25 upvotes, you can listen to it on your podcast player using the Nonlinear Library.
This post was written collaboratively by Kat Woods and Amber Dawn Ace as part of Nonlinear’s experimental Writing Internship program. The ideas are Kat’s; Kat explained them to Amber, and Amber wrote them up. We would like to offer this service to other EAs who want to share their as-yet unwritten ideas or expertise.
If you would be interested in working with Amber to write up your ideas, fill out this form.
This is really strongly giving off suspicious convergence vibes, especially “AI safety means you don’t have to choose a cause”.
Also, “AI is better than us” is kind of scary religious talk. It sounds like we are worshipping a god and trying to summon it :)
I also got the same feeling, but then discarded it because this is not supposed to be a prioritisation argument, simply a motivational one.
It doesn’t need to (suspiciously) claim that AI safety just so happens to also be best for your other interests, just that it helps there too, and that that’s nice to know :)
So long as you make your commitments based on solid rational reasoning, it’s ok to lean into sources of motivation that wouldn’t be intellectually persuasive but motivate you nonetheless.
I really appreciate the sentiment behind this—I get the sense that working on AI safety can feel very doom-y at times, and appreciate any efforts to alleviate that mental stress.
But I also worry that leaning into these specific reasons may lead to intellectual blindspots. E.g., believing that aligned AI will make every other cause redundant leads me to emotionally discount considerations such as the temporal discount rate or tractability. If you can justify your work as a silver bullet, then how much longer would you be willing to work on it, even when it seems impossible? Where does one draw the line?
My main point here is that these reasons can be great motivators, but should only be called upon after someone has intellectually mapped out the reasons why they are working on AI and what would need to change for them to stop working on it.
On some level, I get all of this, and try to think about it to inspire myself, but ultimately I find it very hard to actually be comforted by this sort of thing. I’m sort of hesitant to write about why: partly because I hope other people can find comfort in this without me Eeyoring all over the comments, and partly because, like everything I write about this topic, it is going to make me sound very weird in some ways. But it is important to how I, and maybe some other people, relate to this weird big thing when we look at it too long.
Adding these details makes everything feel even less real. There’s of course a little rational basis for this:
https://www.readthesequences.com/Burdensome-Details
But it feels even harder to grasp than I think even this suggests. There was a post on here a little while ago that I can’t quite manage to find, which basically talked about how AGI risk, as a genre of prediction, looks really suspicious and rapture-y in a way that should maybe raise alarm bells. I’m not so sure about this: I think this genre of prediction is something of an attractor for many people, and suspicious because of it, but even that isn’t the end of the world (no pun intended). I would kind of rather that occasionally some group of smart, dedicated people throw their life’s work at some false-alarm rapture (at least if the dynamics of the relevant groups they are in are otherwise healthy and reasonable) than, well, this:
https://astralcodexten.substack.com/p/heuristics-that-almost-always-work
But this genre does make it harder for people like me to viscerally feel anything about the event at all. The most I can usually manage is seeing it as like, some sort of big digital asteroid headed for us, not as an unborn god.
When I actually do take AGI seriously emotionally, my reactions are really weird. I’m probably not alone in this, but over the last few months, starting with Yudkowsky’s doom post, and escalating with all of the capabilities developments that got released, this risk did suddenly feel emotionally real to me in a way it never did before, and mostly this was just emotionally devastating. Probably in part because it is much easier for me to feel the digital-asteroid thing than the unborn-god thing. It has led me to realize that I have some really weird feelings about the whole thing.
One interesting thing I have sometimes seen people point out is that, selfishly, you really ought to hope that AGI comes sooner, because if it comes during your lifetime, you have a chance of living indefinitely, and if it kills you, well, you would have died only a little while afterwards anyway. Therefore, if you really hope that it does come later, because that increases the odds that it goes well, it means you are a really genuinely altruistic person. I do hope that AGI comes later rather than sooner, but on reflection, this is not an altruistic feeling, but just a really really confusing one.
I would prefer that AGI arrive one day after my natural lifespan ends than one day before. Maybe I am just not enough of a techno-optimist transhumanist type in my heart of hearts, but it just seems so horrible to me that I might be there when everything and everyone I love ends, even if there’s a very good chance that won’t be the result and things will instead be great. I think this might be sort of similar to how parents are able to stand the idea that their children will one day die. Maybe it is easier to cope with when they think to themselves “but I’ll be dead by then anyway”, even if this comfort ultimately isn’t grounded in anything very different—I think if a parent learned they would die fifty years later, they would suddenly be much more scared of their child’s death, even if it were to happen at the same time.
I am also much more sad at the idea that something like factory farming could be ended by everything ending, rather than by us actually fixing anything, than I am made hopeful by the idea that the future could be great. I really don’t understand any of these feelings; none of them are what I would have predicted, but they really suck, both personally and for my priorities, and I’m perversely grateful that, mostly by purposely reading less and less about AGI recently, I am managing to make it feel more unreal to myself again.
Maybe the takeaway from all of this is that I am personally a real bummer; I’m struggling with some other mental health problems right now anyway that maybe distort how I’m able to relate to this issue. Also, I probably just shouldn’t work on AI safety. But anyway, these are some of the weird ways my reactions to this issue have gone in practice.
I think I mainly agree with the other comments (from Devin Kalish and Miranda Zhang), but on net I’m still glad that this post exists. Many thanks for writing it :)
Specifically, I think positive/x-hope (as opposed to negative/x-risk) styled framings can be valuable. Because:
There’s some literature in behavioral psychology and in risk studies which says, roughly, that a significant fraction of people will kind of shut down and ignore the message when told about a threat. See, e.g., normalcy bias and ostrich effect.
Mental contrasting has been shown to work (and the study replicates, as far as I can tell). Essentially, the research behind mental contrasting tells us that people tend to be strongly motivated by working toward a desired future.[1]
Anecdotally, I’ve found that some people do just resonate strongly with “inspirational visions” and the idea of helping build something really cool.
Technically, mental contrasting only works if the perceived chance of success is high. I’m not sure what percentage “high” maps to in this context, but there is an issue here for folks who have a non-low P(doom) via misaligned AI… To counter this, there is the option of resorting to the Dark Arts, self-deception in particular, but that discussion is beyond the scope of this comment. See the Dark Arts tag on LessWrong if interested.