Think this is at least partially my fault. I included a phrase “(in the metric of papers written, say)” when discussing progress in the above post, but I didn’t really think this was the main metric you were judging things on. I’ll edit that out.
The sense in which it felt “bemusingly unfair” was that the natural situation it brought to mind was taking a bright grad student, telling them to work on AI safety and giving them no more supervision, then waiting 1-3 years. In that scenario I’d be ecstatic to see something like what MIRI have done.
I don’t actually think that’s the claim that was intended either, though. I think the write-up was trying to measure something like the technical impressiveness of the theorems proved (of course I’m simplifying a bit). There is at least something reasonable in assessing this, in that it is common in academia, and I think it is often a decent proxy for “how good are the people doing this work?”, particularly if they’re optimising for that metric. In doing so it also provided some useful information to me, because I hadn’t seriously tried to assess this.
However, it isn’t the metric I actually care about. I’m interested in their theory-building rather than their theorem-proving. I wouldn’t say I’m extremely impressed by them on that metric, but I’m impressed enough that, when I interpreted the claim as being about theory-building, it felt quite unfair.
I’m very interested to know whether you think this is a fair perspective on what was actually being assessed.
I feel like I care a lot about theory-building, and at least some of the other internal and external reviewers care a lot about it as well. As an example, consider External Review #1 of Paper #3 (particularly the section starting “How significant do you feel these results are for that?”). Here are some snippets (link to document here):
The first paragraph suggests that this problem is motivated by the concern of assigning probabilities to computations. This can be viewed as an instance of the more general problems of (a) modeling a resource-bounded decision maker computing probabilities and (b) finding techniques to help a resource-bounded decision maker compute probabilities. I find both of these problems very interesting. But I think that the model here is not that useful for either of these problems. Here are some reasons why:
It’s not clear why the properties of uniform coherence are the “right” ones to focus on. Uniform coherence does imply that, for any fixed formula, the probability converges to some number, which is certainly a requirement that we would want. This is implied by the second property of uniform coherence. But that property considers not just constant sequences of formulas, but sequences where the nth formula implies the (n+1)st. Why do we care about such sequences? [...]
The issue of computational complexity is not discussed in the paper, but it is clearly highly relevant. [...]
Several more points are raised, followed by (emphasis mine):
I see no obvious modification of uniformly coherent schemes that would address these concerns. Even worse, despite the initial motivation, the authors do not seem to be thinking about these motivational issues.
For another example, see External Review #1 of Paper #4 (I’m avoiding commenting on internal reviews because I want to be sensitive to breaking anonymity).
On the website, it is promised that this paper makes a step towards figuring out how to come up with “logically non-omniscient reasoners”. [...]
This surely sounds impressive, but there is the question whether this is a correct interpretation of Theorem 5. In particular, one could imagine two cases: a) we are predicting a single type of computation, and b) we are predicting several types of computations. In case (a), why would the delays matter in asymptotic convergence in the first place? [...] In case (b), the setting that is studied is not a good abstraction: in this case there should be some “contextual information” available to the learner, otherwise the only way to distinguish between two types of computations will be based on temporal relation, which is a very limiting assumption here.
To end with some thoughts of my own: in general, when theory-building, I think it is very important to consider both the relevance of the theoretical definitions to the original problem of interest and the richness of what can actually be said. I don’t think that definitions can be assessed independently of the theory that can be built from them. At the risk of self-promotion, I think that my own work here, which makes both definitional and theoretical contributions relevant to ML + security, does a good job of putting forth definitions and justifying them (by showing that we can get unexpectedly strong results in the setting considered, via a nice and fairly general algorithm, and that these results have unexpected and important implications for initially unrelated-seeming problems). I also claim that this work is relevant to AI safety, but perhaps others will disagree.
Thanks for taking the time to highlight these. This is helpful, and shows that I hadn’t quite done my homework in the above characterisation of the difference.
I agree then that the review was at least significantly concerned with theory-building. I had originally read this basket of concerns as more about clarity of communication (which I think is a big issue with MIRI’s work), but I grant that there’s actually quite a lot of overlap between the issues. See also my recent reply to Anna elsewhere in the comment thread.
I like your own thoughts at the end. I do think that the value of definitions depends on what you can build on them (although I’m not sure whether “richness” is the right characterisation—it seems that sometimes the right definition makes the correct answer to a question you care about extremely clear, without necessarily any real sophistication in the middle).
I think that work of the type you link to is important, and roughly the type I want the majority of work in the next decade to be (disclaimer: I haven’t yet read it carefully). I am still interested in work which tries to build ahead and get us a better theory for systems which are in important ways more powerful than current systems. I think it’s harder to ground this well (basically you’re paying a big nearsightedness penalty), but there’s time-criticality to doing it early if it’s needed to inform swathes of later work.
Here’s my current high-level take on the difference in our perspectives:
There is an ambiguity in whether MIRI’s work is actually useful theory-building that they are just doing a poor job of communicating clearly, or whether it’s not building something useful.
I tend towards giving them the benefit of the doubt / hedging that they are doing something valuable.
The Open Phil review takes a more sceptical position, that if they can’t clearly express the value of the work, maybe there is not so much to it.
Also, I realized it might not be clear why I thought the quotes above are relevant to whether the reviews addressed the “theory-building” aspect. The point is that the quoted parts of the reviews seem to me to be directly engaging with whether the definitions make sense / the results are meaningful, which is a question about the adequacy of the theory for addressing the claimed questions, and not about its technical impressiveness. (I could imagine you don’t feel this addresses what you meant by theory-building, but in that case you’ll have to be more specific for me to understand what you have in mind.)