For Cold Takes posts on forecasting AI, see the sequence Cold Takes on Forecasting AI, especially AI Timelines: Where the Arguments, and the “Experts,” Stand, a summary of when we should expect transformative AI to be developed, based on the multiple angles covered earlier in the series.
Consider also exploring the Most Important Century series.
The five key posts in this series are:
Why AI alignment could be hard with modern deep learning (by Ajeya)
Deep learning models can have various “motivations” that lead to good performance on a task, similar to how human employees have different motivations for their work.
There is evidence that deep learning models sometimes pursue goals that their designers did not intend.
If this continues to happen with very powerful models, they could make important decisions without considering human values.
The alignment problem is the challenge of ensuring that advanced deep learning models do not pursue dangerous goals.
The post also discusses why alignment is harder when deep learning models are more capable than humans, gives a more technical statement of the alignment problem, and outlines the risks of failing to solve it.
AI Could Defeat All Of Us Combined
This post sketches out the basic argument for why I think AI could defeat all of human civilization.
AI could manipulate humans and develop advanced technologies powerful enough to overwhelm us.
Even “merely human-level” AI could still defeat us all—by quickly coming to rival human civilization in terms of total population and resources.
At a high level, I think we should be worried if a huge (competitive with the world population) and rapidly growing set of highly skilled humans on another planet were trying to take down civilization just by using the Internet. So we should be worried about a large set of disembodied AIs as well.
The post addresses a few objections/common questions:
How can AIs be dangerous without bodies?
If lots of different companies and governments have access to AI, won’t this create a “balance of power” so that no one actor is able to bring down civilization?
Won’t we see warning signs of AI takeover and be able to nip it in the bud?
Isn’t it fine or maybe good if AIs defeat us? They have rights too.
And the post closes with some thoughts on just how unprecedented it would be to have something on our planet capable of overpowering us all.
How might we align transformative AI if it’s developed very soon?
This post gives my understanding of what the set of available strategies for aligning transformative AI would be if it were developed very soon, and why they might or might not work.
AI Safety Seems Hard to Measure
Maybe we’ll succeed in reducing AI risk, and maybe we won’t. Unfortunately, I think it could be hard to know either way. This piece is about four fairly distinct-seeming reasons that this could be the case—and that AI safety could be an unusually difficult sort of science.
High-level hopes for AI alignment
While I think misalignment risk is serious and presents major challenges, I don’t agree with sentiments along the lines of “We haven’t figured out how to align an AI, so if transformative AI comes soon, we’re doomed.” Here I’m going to talk about some of my high-level hopes for how we might end up avoiding this risk.