I gave a talk introducing AI alignment / risks from advanced AI in June 2022, aimed at a generally technical audience. However, given how fast AI has been moving, I felt I needed an updated talk. I've made a new one closely based on Richard Ngo's Twitter thread, itself based on The Alignment Problem from a Deep Learning Perspective. There's still too much text, but these slides are updated through March 2023 and have a more technical lens.
People are welcome to use this without attribution, and I hope it's useful for any fieldbuilders who want to improve on it! I'm also happy to give this talk if people would like me to—the slides take about 45 minutes, with whatever time remains going to discussion.
New talk slides: The Alignment Problem: Potential Risks from Highly-Capable AI
Main thesis slide:
Appendix
Bonus data that I collected after the talk (which was given to AI safety academics)
Comments:
Great talk! I liked the clear description of the relative resources going into alignment and improvement of capabilities.
“Not influential” only because I have already read a lot on this topic :-)
Sorry to be a downer; I just don't believe in this stuff. I'm not a materialist.
I think the alignment problem is one that we, as humans, may not be able to figure out.
I want to highlight aisafety.training, which I think is currently the single most useful link to give to anyone who wants to join the effort of AI safety research.
Whoever gave me a disagreement vote, I'd be interested to hear why. No pressure, though.
I didn't give a disagreement vote, but I do disagree on aisafety.training being the "single most useful link to give anyone who wants to join the effort of AI safety research", because there are a lot of different resources out there and I think "most useful" depends on the audience. I do think it's a useful link, but "most useful" is a hard bar to meet!
I agree that it's not the most useful link for everyone. I can see how my initial message was ambiguous about this. What I meant is that, of all the links I know, I expect this to be the most useful on average.
Like, if I met someone and had a conversation with them, and I had to constrain myself to giving them only a single link, I might pick another resource based on their personal situation. But if I wrote a post online or gave a talk to a broad audience, and I had to pick only one link to share, it would be this one.