As part of a presentation I’ll be giving soon, I’ll be spending a bit of time explaining why AI might be an x-risk.
Can anyone point me to existing succinct explanations of this? Ideally they would be as convincing as possible, brief (I won’t be spending long on this), and (of course) demonstrate good epistemics.
The audience will be actuaries interested in ESG investing.
If someone fancies entering some brief explanations as an answer, feel free, but I was expecting to see links to content which already exists, since I’m sure there’s loads of it.
Here is an intuitive, brief answer that should provide evidence that there is risk:
In the history of life before humans, there have been 5 documented mass extinctions. Humans, the first generally intelligent agents to evolve on our planet, are now causing the 6th mass extinction.
An agent more intelligent than humans clearly has the potential to cause another mass extinction, and if humans end up in conflict with that agent, the risk to us is real.
So it makes sense to understand that risk, and today we don’t, even though development of these agents is barreling forward at an incredible pace.
https://en.wikipedia.org/wiki/Holocene_extinction
https://www.cambridge.org/core/journals/oryx/article/briefly/03807C841A690A77457EECA4028A0FF9
Hi Sanjay, there is this post; I think my explainer on the topic does a good job:
https://forum.effectivealtruism.org/posts/CghaRkCDKYTbMhorc/the-importance-of-ai-alignment-explained-in-5-points
Because I wrote the piece in a hierarchical manner, it stays brief as long as you don’t follow too many of the claims too far down.
How about something like:
AI systems are rapidly becoming more capable.
They could become extremely powerful in the next 10-50 years.
We basically don’t understand how they work (except at high levels of abstraction) or what’s happening inside them. This gets even harder as they get bigger and/or more general.
We don’t know how to reliably get these systems to do what we want them to do. One, it’s really hard to specify exactly what we want. Two, even if we could, the goals/drives they learn may not generalize to new environments. (There is a toy sketch of the specification problem at the end of this answer.)
But it does seem like, whatever objectives they do aim for, they’ll face incentives that conflict with our interests. For example, accruing more power, preserving option value, and avoiding being shut off are generally useful whatever goal you pursue.
It’s really hard to rigorously test AIs because they (1) are the result of a “blind” optimization process (not a deliberate design), (2) are monolithic (i.e. don’t consist of individual testable components), and (3) may at some point be smarter than us.
There are strong incentives to develop and deploy AI systems. This means powerful AI systems may be deployed even if they aren’t adequately safe/tested.
Of course this is a rough argument, and necessarily leaves out a bunch of detail and nuance.
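To make the specification point a bit more concrete, here is a minimal toy sketch in Python. It is not anyone's real training setup; `true_value` and `proxy_reward` are hypothetical stand-ins. The point is only that an optimizer pushed hard on a proxy objective can score brilliantly on the proxy while doing badly on the thing we actually cared about:

```python
import random

def true_value(x):
    # What we actually want: best around x = 5, worse the further past it we go.
    return -(x - 5) ** 2

def proxy_reward(x):
    # The measurable stand-in we optimize: agrees with true_value that
    # "bigger x is better" near zero, but keeps rewarding "more" forever.
    return x

# Simple hill-climbing on the proxy (a stand-in for any strong optimizer).
x = 0.0
for _ in range(10_000):
    candidate = x + random.uniform(-1.0, 1.0)
    if proxy_reward(candidate) > proxy_reward(x):
        x = candidate

print(f"optimized x:  {x:.1f}")
print(f"proxy reward: {proxy_reward(x):.1f}  (looks great)")
print(f"true value:   {true_value(x):.1f}  (actually terrible)")
```

The worry at scale has the same shape: the stronger the optimizer, the further the proxy and the true objective can come apart.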
AI Risk for Epistemic Minimalists (Alex Flint, 2021).
As far as I’m aware, the best introduction to AI safety is the AI safety chapter in The Precipice. I’ve tested it on two 55-year-olds and it worked.
It’s a bit long (a 20-minute read according to LessWrong), but it’s filled to the brim with winning strategies for giving people a fair chance to understand AI as an x-risk: a list of names of reputable people who wholeheartedly endorse AI safety, for example.