I agree with Aidan’s suggestion that Human Compatible is probably the best introduction to risks from AI (for both non-technical readers and readers with CS backgrounds). It’s generally accessible and engagingly written, it’s up-to-date, and it covers a number of different risks. Relative to many other accounts, I think it also has the virtue of focusing less on any particular development scenario and expressing greater optimism about the feasibility of alignment. If someone’s too pressed for time to read Human Compatible, the AI risk chapter in The Precipice would then be my next best bet. Another very readable option, mainly for non-CS people, would be the AI risk chapters in The AI Does Not Hate You: I think they may actually be the cleanest distillation of the “classic” AI risk argument.
For people with CS backgrounds who are hoping for a more technical understanding of the problems safety/alignment researchers are trying to solve, I think that Concrete Problems in AI Safety, Scalable Agent Alignment via Reward Modeling, and Rohin Shah’s blog post sequence on “value learning” are especially good picks, although none of these resources frames safety/alignment research as something that’s intended to reduce existential risks.
I think that AI Governance: A Research Agenda would be the natural starting point for social scientists, especially if they have a substantial interest in risks beyond alignment.
Of course, for anyone interested in digging into arguments around AI risk, I think that Superintelligence is still a really important read. (Even beyond its central AI risk argument, it also has a ton of interesting ideas on the future of intelligent life, ethics, and the strategic landscape that other resources don’t.) But it’s not where I think people should start.