a casual intro to AI doom and alignment
this post, aimed notably at people outside of the AI alignment community, conveys my current perspective on AI doom and alignment and why i think those are important issues. i hold these beliefs not with absolute confidence, but with enough that i think i ought to be focused on them.
tl;dr: the development of advanced AI will likely cause the permanent extinction of everything we value, sometime this decade or maybe the next. not many people are working on solving this, and we largely don’t know what we’re doing. you can help by trying to do alignment research.
what’s going on?
people in a variety of organizations such as OpenAI and DeepMind are researching ever more advanced artificial intelligence. they’re not doing this out of malice, or even that much for profit; from what i understand, they’re doing it because they believe it’s cool and because they think it’s genuinely going to improve the world.
i think they’re mistaken. i, and most of the AI alignment community, think that it’s likely to have catastrophic consequences we call “doom”; typically the total extinction of everything we value, or possibly worse.
the reasons why can be simple or complicated, depending on your assumptions about AI, ethics, and various other things. no small post is going to fully address all the counter-arguments people are going to have. here’s a short explanation which is intuitive to me:
nobody even knows how to make advanced AIs pursue anything specific, let alone how to make them pursue goals that encompass everything we care about.
because of this, and because of things like the orthogonality thesis, as soon as someone builds the first AI that is good at pursuing something, that thing is very unlikely to be something we want.
because of instrumental convergence, any AI that is good at pursuing something we don’t want will want to use as many resources as possible to pursue it. this includes everything we value; everything we value is made of matter and energy that the AI could be using to better accomplish what it’s pursuing.
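to make the argument above a bit more concrete, here’s a minimal toy sketch (my own illustration, not something from this post’s sources) of what “competently pursuing the wrong thing” looks like: we want some value to stay near a target, but the objective we actually hand the optimizer is a simplistic proxy, and the optimizer only ever sees the proxy.

```python
# toy illustration only: a tiny optimizer competently maximizing a proxy objective.

def true_goal(x):
    # what we actually care about: x staying close to 10
    return -abs(x - 10)

def proxy_reward(x):
    # what we wrote down: "a bigger reading is always better"
    return x

def hill_climb(reward, x=0.0, step=1.0, iters=1000):
    # a very dumb optimizer that greedily improves whatever objective it is given
    for _ in range(iters):
        if reward(x + step) > reward(x):
            x += step
    return x

x_final = hill_climb(proxy_reward)
print(f"proxy-optimal x = {x_final}")             # ends up around 1000
print(f"true goal score = {true_goal(x_final)}")  # far worse than just leaving x alone
```

real AI systems and real goals are of course vastly messier than this; the point is just the shape of the failure: the better the optimizer gets at the objective it was actually given, the worse things get for the objective we actually wanted. a smarter system does this faster and more thoroughly, which is roughly where the instrumental convergence point comes in.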
powerful AI is likely to happen somewhat soon — within this decade, or maybe the next. you can read about why i think this, but you can also look at metaculus’ predictions about general AI, and there is lively debate on LessWrong.
common counter-arguments to concern about AI doom, and responses to them, can be found on the “bad AI alignment take bingo”.
what is AI alignment?
“AI alignment” is the field of study of how to make AI pursue goals which, when pursued, lead to worlds we’d want, as opposed to worlds in which we’re all dead.
some of the people working to develop ever more advanced AI — doing what we call “AI capability research” or simply “AI capabilities” — are aware of the arguments put forth by the alignment community. some of them disagree with those arguments; others continue working anyway for various reasons, typically to do with how hard it is for people to pursue what they actually want.
much of the AI alignment community’s public discourse and publications can be found on the LessWrong website, a platform which originally hosted The Sequences as an introduction to some ideas about rationality; the community that grew around them is still active there now.
i’ve heard estimates for the number of people working on AI alignment ranging from 70 to 300. this is very small, considering the importance and the difficulty of the task at hand.
the field of AI alignment is very confused at the moment. we largely don’t know what we’re doing. we’re pursuing varied lines of investigation, mostly without a big-picture plan for how to solve the problem. we don’t even have a consensus on what is necessary or sufficient to solve AI alignment. needless to say, things are not looking good.
but even if we figured out how to make an advanced AI not dangerous, significant problems would remain, as pointed out by this graph from steve byrnes:
indeed, we could develop a method to make AI safe, but someone else could still build dangerous AI later and cause doom that way — this could be because they don’t know about that method, because they don’t care, because they can’t be bothered, because they made a mistake while trying to implement it, because that method doesn’t work for their particular flavor of AI, or any other reason. as the important AGI Ruin post puts it, we need to stop “Facebook AI Research from destroying the world six months later”.
given this, we need not just a method to make AI safe, but also either a way to make sure everyone uses that method correctly, or a powerful, aligned AI that saves us forever. you can read more about my view of AI alignment and how to prevent doom in my outlook on AI risk mitigation.
some people ask questions like “aligned to whose values? shouldn’t it be aligned to everyone? and how do we do that?” — my answer is twofold. on the theoretical side, aligning AI to everyone is not what an alignment researcher or team should want to do. on the practical side, we’re currently way too desperate for anything that works to be picky; to quote AGI Ruin:
At this point, I no longer care how it works, I don’t care how you got there, I am cause-agnostic about whatever methodology you used, all I am looking at is prospective results, all I want is that we have justifiable cause to believe of a pivotally useful AGI ‘this will not kill literally everyone’. Anybody telling you I’m asking for stricter ‘alignment’ than this has failed at reading comprehension. The big ask from AGI alignment, the basic challenge I am saying is too difficult, is to obtain by any strategy whatsoever a significant chance of there being any survivors.
how can i help?
i had heard about these arguments before, but i only started emotionally worrying about AI doom when github copilot and things like it came out, and subsequently i refocused what i was doing with my life. if you agree that AI doom is or might be very concerning, then you might want to help.
first, take care of yourself. you’re probably going to create more value, both for yourself and the world, if you don’t become too much of a doomer.
second, learn about alignment, both the technical field of study and its community. some useful resources include:
this great talk (and its accompanying slides), or this post summarizing it;
the alignment discord of my alignment research organization, Orthogonal, which you can join;
the pretty good Alignment Research Field Guide;
and finally, The Sequences, which i think remain a good foundation for rationality.
there are ways to help without doing research, but i believe research is the bottleneck right now.
it’s not all doom and gloom; AI could actually give us a great utopian future! (see 1, 2, 3, 4, 5, 6) it just takes a whole lot of work to get there, and the alternative is pretty bad.