To add onto Jacob and Paul’s comments, I think that while HRAD is more mature in the sense that more work has gone into solving HRAD problems and critiquing possible solutions, the gap seems much smaller to me when it comes to the justification for thinking HRAD is promising vs justification for Paul’s approach being promising. In fact, I think the arguments for Paul’s work being promising are more solid than those for HRAD, despite it only being Paul making those arguments—I’ve had a much harder time understanding anything more nuanced than the basic case for HRAD I gave above, and a much easier time understanding why Paul thinks his approach is promising.
Daniel, while re-reading one of Paul’s posts from March 2016, I just noticed the following:
[ETA: By the end of 2016 this problem no longer seems like the most serious.]
…
[ETA: while robust learning remains a traditional AI challenge, it is not at all clear that it is possible. And meta-execution actually seems like the ingredient furthest from existing ML practice, as well as having non-obvious feasibility.]
My interpretation of this is that between March 2016 and the end of 2016, Paul revised his estimate of the difficulty of his approach upwards. (I think given the context, he means that other problems, namely robust learning and meta-execution, are harder, not that informed oversight has become easier.) I wanted to point this out to make sure you updated on his update. Clearly Paul still thinks his approach is more promising than HRAD, but perhaps not by as much as before.
the gap seems much smaller to me when it comes to the justification for thinking HRAD is promising vs justification for Paul’s approach being promising
This seems wrong to me. For example, in the “learning to reason from humans” approaches, the goal isn’t just to learn to reason from humans, but to do it in a way that maintains competitiveness with unaligned AIs. Suppose a human overseer disapproves of their AI using some set of potentially dangerous techniques. How can we then ensure that the resulting AI is still competitive? Once someone points this out, proponents of the approach, to continue thinking their approach is promising, would need to give some details about how they intend to solve this problem. After that, the justification for thinking the approach is promising becomes more subtle and harder to understand. I think conversations like this have occurred for MIRI’s approach far more than Paul’s, which may be a large part of why you find Paul’s justifications easier to understand.
This doesn’t match my experience of why I find Paul’s justifications easier to understand. In particular, I’ve been following MIRI since 2011, and my experience has been that I didn’t find MIRI’s arguments (about specific research directions) convincing in 2011*, and since then have had a lot of people try to convince me from a lot of different angles. I think pretty much all of the objections I have are ones I generated myself, or would have generated myself. Although, the one major objection I didn’t generate myself is the one that I feel most applies to Paul’s agenda.
( * There was a brief period shortly after reading the Sequences when I found them extremely convincing, but I think I was much more credulous then than I am now. )
I think the argument along these lines that I’m most sympathetic to is that Paul’s agenda fits more into the paradigm of typical ML research, and so is more likely to fail for reasons that are in many people’s collective blind spot (because we’re all blinded by the same paradigm).
That actually didn’t cross my mind before, so thanks for pointing it out. After reading your comment, I decided to look into Open Phil’s recent grants to MIRI and OpenAI, and noticed that of the four technical advisors Open Phil used for the MIRI grant investigation (Paul Christiano, Jacob Steinhardt, Christopher Olah, and Dario Amodei), all either have an ML background or currently advocate an ML-based approach to AI alignment. For the OpenAI grant, however, Open Phil didn’t seem to have similarly engaged technical advisors who might be predisposed to be critical of the potential grantee (e.g., HRAD researchers), and in fact two of the Open Phil technical advisors are also employees of OpenAI (Paul Christiano and Dario Amodei). I have to say this doesn’t look very good for Open Phil in terms of making an effort to avoid potential blind spots and bias.
(Speaking for myself, not OpenPhil, who I wouldn’t be able to speak for anyways.)
For what it’s worth, I’m pretty critical of deep learning, which is the approach OpenAI wants to take, and still think the grant to OpenAI was a pretty good idea; and I can’t really think of anyone more familiar with MIRI’s work than Paul who isn’t already at MIRI (note that Paul started out pursuing MIRI’s approach and shifted in an ML direction over time).
That being said, I agree that the public write-up on the OpenAI grant doesn’t reflect that well on OpenPhil, and it seems correct for people like you to demand better moving forward (although I’m not sure that adding HRAD researchers as TAs is the solution; also note that OPP does consult regularly with MIRI staff, though I don’t know if they did for the OpenAI grant).
I can’t really think of anyone more familiar with MIRI’s work than Paul who isn’t already at MIRI (note that Paul started out pursuing MIRI’s approach and shifted in an ML direction over time).
The Agent Foundations Forum would have been a good place to look for more people familiar with MIRI’s work. Aside from Paul, I see Stuart Armstrong, Abram Demski, Vadim Kosoy, Tsvi Benson-Tilsen, Sam Eisenstat, Vladimir Slepnev, Janos Kramar, Alex Mennen, and many others. (Abram, Tsvi, and Sam have since joined MIRI, but weren’t employees of it at the time of the Open Phil grant.)
That being said, I agree that the public write-up on the OpenAI grant doesn’t reflect that well on OpenPhil, and it seems correct for people like you to demand better moving forward
I had previously seen some complaints about the way the OpenAI grant was made, but until your comment, hadn’t thought of a possible group blind spot due to a common ML perspective. If you have any further insights on this and related issues (like why you’re critical of deep learning but still think the grant to OpenAI was a pretty good idea, what your objections to Paul’s AI alignment approach are, and how Open Phil could have done better), would you please write them down somewhere?
I think there’s something to this—thanks.