Thanks, I think this is a useful clarification. I’m actually not sure if I even clearly distinguished these cases in my thinking when I wrote my previous comments, but I agree the thing you quoted is primarily relevant to when end-to-end stories will be externally validated. (By which I think you mean something like: they would lead to an ‘objective’ solution, e.g. maths proof, if executed without major changes.)
The extent to which we agree depends on what counts as an end-to-end story. For example, consider someone working on ML transparency who claims their research is valuable for AI alignment. My guess is:
- If literally everything they can say when queried is "I don't know how transparency helps with AI alignment, I just saw the term in some list of relevant research directions", then we're both quite pessimistic about the value of that work.
- If they say something like "I've made the deliberate decision not to focus on research for which I can fully argue it will be relevant to AI alignment right now. Instead, I just focus on understanding ML transparency as best as I can, because I think there are many scenarios in which understanding transparency will be beneficial.", and then they say something showing they understand longtermist thought on AI risk, then I'm not necessarily pessimistic. I'd expect they won't come up with their own research agenda in the next two years, but depending on the circumstances I might well be optimistic about that person's impact over their whole career, and I wouldn't necessarily recommend that they change their approach. I'm not sure what you'd think, but I think I initially read you as being pessimistic in such a case, and this was partly what I was reacting against.
- If they give an end-to-end story for how their work fits within AI alignment, then all else equal I consider that a good sign. However, depending on the circumstances I might still think the best long-term strategy for that person is to postpone the direct pursuit of that end-to-end story and instead focus on targeted deliberate practice of some of the relevant skills, or at least to complement the direct pursuit with such deliberate practice. For example, if someone is very junior and their story says that mathematical logic is important for their work, I might recommend they grab a logic textbook and work through all the exercises. My guess is we disagree on such cases, but that the disagreement is somewhat gradual; i.e. we both agree about extreme cases, but I'd more often recommend more substantial deliberate practice.