Stuart Russell’s “assistance game” research agenda, started in 2016, is now widely seen as mostly irrelevant to modern deep learning—see former student Rohin Shah’s review here, as well as Alex Turner’s comments here.
The second link just takes me to Alex Turner’s shortform page on LW, where ctrl+f-ing “assistance” gets me no results. Searching for “CIRL”, I do find this comment, which criticizes the CIRL/assistance games research program but does not claim that it is irrelevant to modern deep learning. For what it’s worth, I think it’s plausible that Alex Turner thinks assistance games are mostly irrelevant to modern deep learning (and plausible that he doesn’t) - I merely object that the link doesn’t provide good evidence for that claim.
The first link is to Rohin Shah’s reviews of Human Compatible and of some assistance games / CIRL research papers. ctrl+f-ing “deep” gets me two irrelevant results, plus one description of a paper “which is inspired by [the CIRL] paper and does a similar thing with deep RL”. It would be hard to write such a paper if CIRL (aka assistance games) were mostly irrelevant to modern deep learning. The closest thing I can find is in the summary of Human Compatible, which says “You might worry that the proposed solution [of making AI via CIRL / assistance games] is quite challenging: after all, it requires a shift in the entire way we do AI.” But requiring a shift in the way we do AI doesn’t make a research program irrelevant to modern deep learning. By analogy: in 2016, it would have been true to say that moving the main thrust of AI research to language modelling, so as to produce helpful chatbots, required a shift in the entire way we did AI, but research into deeply learned large language models was not irrelevant to deep learning as of 2016. In fact, it sprang out of 2016-era deep learning.
Yeah, I don’t think it’s accurate to say that I see assistance games as mostly irrelevant to modern deep learning, and I especially don’t think that it makes sense to cite my review of Human Compatible to support that claim.
The one quote that Daniel mentions about shifting the entire way we do AI is a paraphrase of something Stuart says, and is responding to the paradigm of writing down fixed, programmatic reward functions. And in fact, we have now changed that dramatically through the use of RLHF, for which a lot of early work was done at CHAI, so I think this reflects positively on Stuart.
I’ll also note that in addition to the “Learning to Interactively Learn and Assist” paper Daniel cited above, which does CIRL with deep RL, I also wrote a paper with several CHAI colleagues that applied deep RL to solve assistance games.
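For readers who haven’t seen the formalism: roughly speaking (I’m writing this from memory, so treat the details as approximate), an assistance game / CIRL game is a two-player game

$$M = \langle \mathcal{S},\ \{\mathcal{A}^H, \mathcal{A}^R\},\ T(s' \mid s, a^H, a^R),\ \{\Theta,\ R(s, a^H, a^R; \theta)\},\ P_0(s_0, \theta),\ \gamma \rangle$$

in which the human $H$ and the robot $R$ both act to maximize the same discounted sum of rewards $R(s, a^H, a^R; \theta)$, but only the human observes the reward parameter $\theta$; the robot has to infer it from the prior $P_0$ and from the human’s behavior.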
My position is that you can roughly decompose the overall problem into two subproblems: (1) in theory, what should an AI system do? (2) Given a desire for what the AI system should do, how do we make it do that?
The formalization of assistance games is more about (1): it says that AI systems should behave more like assistants than like autonomous agents (basically the point of my paper linked above). The two subproblems are mostly independent, and since deep learning is an answer to (2) while assistance games are an answer to (1), you can use deep learning to solve assistance games.
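To make that concrete, here’s a deliberately tiny toy assistance game of my own construction (not taken from any of the papers above; all names and numbers are made up for illustration). The point is just that once the game is written down, the robot’s side of it is an ordinary partially observable RL problem; anything from the standard toolbox, deep RL included, can in principle be pointed at it:

```python
import numpy as np

class TinyAssistanceGame:
    """A one-step toy assistance game (illustrative only). A hidden
    preference theta in {0, 1} says which of two items the human wants.
    The human signals by reaching for their preferred item with
    probability 1 - eps; the robot then fetches an item and is scored
    by the *human's* reward. Since theta is never observed directly,
    this is a POMDP from the robot's perspective."""

    def __init__(self, eps=0.1, seed=0):
        self.eps = eps
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.theta = int(self.rng.integers(2))  # hidden human preference
        # The human's action is informative about theta but noisy.
        human_action = self.theta if self.rng.random() > self.eps else 1 - self.theta
        return human_action  # this is all the robot observes

    def step(self, robot_action):
        # The robot is rewarded with the human's reward, as in the
        # assistance-game setup: there is no separate robot objective.
        return 1.0 if robot_action == self.theta else 0.0

env = TinyAssistanceGame()
total = 0.0
for _ in range(1000):
    obs = env.reset()
    total += env.step(obs)  # hand-coded policy: trust the human's signal
print(f"average reward: {total / 1000:.2f}")  # ~0.90 with eps = 0.1
```

The hand-coded “trust the human” policy is just to keep the example short; in the deep RL papers above, the robot’s policy is instead trained end-to-end and has to learn the inference about theta implicitly.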
I’d also say that the current form factor of ChatGPT, Claude, Bard, etc. is very assistance-flavored, which seems like a clear predictive success at least. On the other hand, it seems unlikely that CHAI’s work on CIRL had much causal impact on this, so in hindsight the research looks less useful to have done.
All this being said, I view (2) as the more pressing problem for alignment, and so I spend most of my time on that, which implies not working on assistance games as much any more. So I think it’s overall reasonable to take me as mildly against work on assistance games (but not to take me as saying that it is irrelevant to modern deep learning).
I asked Alex “no chance you can comment on whether you think assistance games are mostly irrelevant to modern deep learning?”
His response was “i think it’s mostly irrelevant, yeah, with moderate confidence”. He then told me he’d lost his EA forum credentials and said I should feel free to cross-post his message here.
(For what it’s worth, as people may have guessed, I disagree with him—I think you can totally do CIRL-type stuff with modern deep learning, to the extent you can do anything with modern deep learning.)