Very good summary! I've been working on a (much drier) series of posts explaining different AI risk scenarios: https://forum.effectivealtruism.org/posts/KxDgeyyhppRD5qdfZ/link-post-how-plausible-are-ai-takeover-scenarios
But I think I might adopt 'Sycophant'/'Schemer' as more descriptive names for WFLL1/WFLL2 (outer/inner alignment failure) going forward.
I also liked that you emphasised how much the optimist vs. pessimist case depends on hard-to-articulate intuitions about things like how easily findable deceptive models are and how easy incremental course correction is. I called this the 'hackability' of alignment: https://www.lesswrong.com/posts/zkF9PNSyDKusoyLkP/investigating-ai-takeover-scenarios#Alignment__Hackability_