Thank you very much for the write-up of your reasoning!
Having just had (prior to reading this post) a two-hour conversation on this topic, I’m glad to see we agree on the main points.
The main thing I think you didn’t say is that not only is it important that a diversity of approaches is worked on (and funded), but also that, as the field is rapidly growing, it could easily be the case that the current distribution of resources into research approaches will be set in place [That’s not quite right, see Owen CB’s comment below]. While the ML-based approaches at OpenAI and DeepMind and so on are promising, I would be unhappy if they grew to be the only approach—I don’t feel we’re confident enough to know whether they’re the right approach. As such, we have a comparative advantage relative to all future versions of ourselves in funding other approaches and shaping the overall research space.
I’m likely to donate to MIRI myself this year, but first I will check out other approaches being worked on. I believe that Paul Christiano is not funding constrained (but will ask), and will look through e.g. the list of FLI grantees, and check up on whose work appears promising and is funding constrained. In the case where I have sufficient time to review their work in detail, I may offer to purchase an impact certificate from the researcher(s) who I think did the best research with their grant money.
(including the bizarre-ness of OpenPhil’s analysis of number-of-papers-written, which is not how one measures progress of fundamentals research.)
What in the grant write-up makes you think the focus was on number-of-papers-written? I was one of the reviewers and that was definitely not our process.
(Disclaimer: I’m a scientific advisor for OpenPhil, all opinions here are my own.)
Think this is at least partially my fault. I included a phrase “(in the metric of papers written, say)” when discussing progress in the above post, but I didn’t really think this was the main metric you were judging things on. I’ll edit that out.
The sense in which it felt “bemusingly unfair” was that the natural situation it brought to mind was taking a bright grad student, telling them to work on AI safety and giving them no more supervision, then waiting 1-3 years. In that scenario I’d be ecstatic to see something like what MIRI have done.
I don’t actually think that’s the claim that was intended either, though. I think the write-up was trying to measure something like the technical impressiveness of the theorems proved (of course I’m simplifying a bit). There is at least something reasonable in assessing this, in that it is common in academia, and I think is often a decent proxy for “How good are the people doing this work?”, particularly if they’re optimising for that metric. In doing so it also provided some useful information to me, because I hadn’t seriously tried to assess this.
However, it isn’t the metric I actually care about. I’m interested in their theory-building rather than their theorem-proving. I wouldn’t say I’m extremely impressed by them on that metric, but at least enough that when I interpreted the claim as being about theory-building, I felt it was quite unfair.
Very interested to know whether you think this is a fair perspective on what was actually being assessed.
I feel like I care a lot about theory-building, and at least some of the other internal and external reviewers care a lot about it as well. As an example, consider External Review #1 of Paper #3 (particularly the section starting “How significant do you feel these results are for that?”). Here are some snippets (link to document here):
The first paragraph suggests that this problem is motivated by the concern of assigning probabilities to computations. This can be viewed as an instance of the more general problems of (a) modeling a resource-bounded decision maker computing probabilities and (b) finding techniques to help a resource-bounded decision maker compute probabilities. I find both of these problems very interesting. But I think that the model here is not that useful for either of these problems. Here are some reasons why:
It’s not clear why the properties of uniform coherence are the “right” ones to focus on. Uniform coherence does imply that, for any fixed formula, the probability converges to some number, which is certainly a requirement that we would want. This is implied by the second property of uniform coherence. But that property considers not just constant sequences of formulas, but sequence where the nth formula implies the (n+1)st. Why do we care about such sequences? [...]
The issue of computational complexity is not discussed in the paper, but it is clearly highly relevant. [...]
Several more points are raised, followed by (emphasis mine):
I see no obvious modification of uniformly coherent schemes that would address these concerns. Even worse, despite the initial motivation, the authors do not seem to be thinking about these motivational issues.
For another example, see External Review #1 of Paper #4 (I’m avoiding commenting on internal reviews because I want to be sensitive to breaking anonymity).
On the website, it is promised that this paper makes a step towards figuring out how to come up with “logically non-omniscient reasoners”. [...]
This surely sounds impressive, but there is the question whether this is a correct interpretation of Theorem 5. In particular, one could imagine two cases: a) we are predicting a single type of computation, and b) we are predicting several types of computations. In case (a), why would the delays matter in asymptotic convergence in the first place? [...] In case (b), the setting that is studied is not a good abstraction: in this case there should be some “contextual information” available to the learner, otherwise the only way to distinguish between two types of computations will be based on temporal relation, which is a very limiting assumption here.
To end with some thoughts of my own: in general, when theory-building I think it is very important to consider both the relevance of the theoretical definitions to the original problem of interest, and the richness of what can actually be said. I don’t think that definitions can be assessed independently of the theory that can be built from them. At the danger of self-promotion, I think that my own work here, which makes both definitional and theoretical contributions relevant to ML + security, does a good job of putting forth definitions and justifying them (by showing that we can get unexpectedly strong results in the setting considered, via a nice and fairly general algorithm, and that these results have unexpected and important implications for initially unrelated-seeming problems). I also claim that this work is relevant to AI safety but perhaps others will disagree.
Thanks for taking the time to highlight these. This is helpful, and shows that I hadn’t quite done my homework in the above characterisation of the difference.
I agree then that the review was at least significantly concerned with theory-building. I had originally read this basket of concerns as more about clarity of communication (which I think is a big issue with MIRI’s work), but I grant that there’s actually quite a lot of overlap between the issues. See also my recent reply to Anna elsewhere in the comment thread.
I like the thoughts of your own at the end. I do think that the value of definitions depends on what you can build on them (although I’m not sure whether “richness” is the right characterisation—it seems that sometimes the right definition makes the correct answer to a question you care about extremely clear, without necessarily any real sophistication in the middle).
I think that work of the type you link to is important, and roughly the type I want the majority of work in the next decade to be (disclaimer: I haven’t yet read it carefully). I am still interested in work which tries to build ahead and get us a better theory for systems which are in important ways more powerful than current systems. I think it’s harder to ground this well (basically you’re paying a big nearsightedness penalty), but there’s time-criticality of doing it early if it’s needed to inform swathes of later work.
Here’s my current high-level take on the difference in our perspectives:
There is an ambiguity in whether MIRI’s work is actually useful theory-building that they are just doing a poor job of communicating clearly, or whether it’s not building something useful.
I tend towards giving them the benefit of the doubt / hedging that they are doing something valuable.
The Open Phil review takes a more sceptical position, that if they can’t clearly express the value of the work, maybe there is not so much to it.
Also, I realized it might not be clear why I thought the quotes above are relevant to whether the reviews addressed the “theory-building” aspect. The point is it seems to me that the quoted parts of the reviews are directly engaging with whether the definitions make sense / the results are meaningful, which is a question about the adequacy of the theory for addressing the claimed questions, and not of its technical impressiveness. (I could imagine you don’t feel this addresses what you meant by theory-building, but in that case you’ll have to be more specific for me to understand what you have in mind.)
Thanks for pointing that out! I’ve been conflating your comments with other conversations I’ve had with people about MIRI, and have removed my sentence. I just read through the OpenPhil report carefully again.
I think that I disagree with OpenPhil’s stated conclusions, but due to having looked at different papers (I had forgotten that the ‘unsupervised grad student’ comment referred just to the three papers submitted, and I’d mis-remembered exactly which papers they were). After conversations with a few early-stage researchers in other fields, I think that some of the other papers might be notably more impressive (e.g. the Grain-of-Truth paper accepted to UAI).
I understand the key example of MIRI’s theory-building approach to be the extensive Logical Inductors paper, but haven’t heard much feedback on the usefulness/impressiveness from non-MIRI researchers yet. I’d be quite interested to know if you have read it and updated up/down about MIRI as a result (as I’m considering donating to MIRI this year partly based on this).
After conversations with researchers whose opinions I respect, I’ve been led to believe certain other papers are very impressive (e.g. the Grain-of-Truth paper accepted to UAI).
Could you say more about the credibility of these researchers’ opinions? E.g. what fields are they in, how successful in their fields, how independent of MIRI?
Pretty independent of MIRI, early-stage researchers, other areas of theoretical CS (formal logic, game theory). I didn’t mean for this to be strong evidence, and have changed the wording.
I think it’s quite unlikely that the current distribution will be set in place. And actually on purely current distributions I’m not sure the MIRI approach is underrepresented. On the other hand I think it’s likely that the current distribution will influence future distribution, which is what’s relevant; I’m trying to push back a little against an expected trend towards ML-based approaches representing a very large share of the work.
Yes, you’re right about it not being ‘set in place’. I more meant to say that, while funding and interest have grown significantly (OpenAI and DeepMind have in principle billions of dollars of spending power each and are now significantly interested in this topic), MIRI failed to reach its $800k minimal fundraising target this year, and so I expect that the main approaches to AI that are being followed elsewhere will get the most attention in the future.
OpenAI and DeepMind have in principle billions of dollars of spending power each and are now significantly interested in this topic
While I think there is a true point in this vicinity (it will be a lot easier to fund ML-based approaches, including at these organizations, but also others), this seems to be overstating the relevant resources and the effort going into safety topics. OpenAI has been funded with a billion dollars (although it might receive more funding later), and its annual spending must of course be lower. And both of these organizations have primary aims of advancing AI, with limited efforts on safety issues thus far.