Yeah, I don’t think it’s accurate to say that I see assistance games as mostly irrelevant to modern deep learning, and I especially don’t think that it makes sense to cite my review of Human Compatible to support that claim.
The one quote that Daniel mentions, about shifting “the entire way we do AI”, is a paraphrase of something Stuart says, and it is responding to the paradigm of writing down fixed, programmatic reward functions. And in fact, we have now changed that dramatically through the use of RLHF, for which a lot of the early work was done at CHAI, so I think this reflects positively on Stuart.
I’ll also note that, in addition to the “Learning to Interactively Learn and Assist” paper Daniel cited above, which does CIRL with deep RL, I also wrote a paper with several CHAI colleagues that applied deep RL to solve assistance games.
My position is that you can roughly decompose the overall problem into two subproblems: (1) in theory, what should an AI system do? (2) given what we want the AI system to do, how do we make it actually do that?
The formalization of assistance games is more about (1): it says that AI systems should behave more like assistants than like autonomous agents (basically the point of my paper linked above). The two subproblems are mostly independent, so since deep learning is an answer to (2) while assistance games are an answer to (1), you can use deep learning to solve assistance games.
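For readers who haven’t seen the formalism, here is a minimal sketch along the lines of the original CIRL paper (the exact notation varies a bit across the papers mentioned above):

$$\mathcal{M} = \langle S, \{A^H, A^R\}, T, \Theta, R, P_0, \gamma \rangle$$

This is a two-player game with identical payoffs: the human $H$ and the robot $R$ act in a shared state space $S$ with transition function $T$, the reward parameters $\theta \in \Theta$ are observed by the human but not the robot, both players are scored by the same reward $R(s, a^H, a^R; \theta)$, and $P_0$ gives the initial distribution over states and $\theta$. Nothing in this definition commits you to a particular solution method, which is exactly why deep RL (an answer to (2)) can be plugged in to solve the game.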
I’d also say that the current form factor of ChatGPT, Claude, Bard, etc. is very assistance-flavored, which at least looks like a clear predictive success. On the other hand, it seems unlikely that CHAI’s work on CIRL had much causal impact on this, so in hindsight the research looks less useful to have done.
All this being said, I view (2) as the more pressing problem for alignment, so I spend most of my time on that, which means not working on assistance games much anymore. So I think it’s overall reasonable to read me as mildly against work on assistance games (but not as saying that it is irrelevant to modern deep learning).