Many thanks for the reply, Rob, and apologies for missing the AMA - although this discussion may work better in this thread anyway.
Respectfully, my reading of the Open Phil report suggests it is more broadly adverse than you suggest: in broad strokes, the worries are 1) that the research MIRI is undertaking probably isn't that helpful for mitigating AI risk; and 2) that the research output MIRI has produced along these lines is in any case unimpressive. I am sympathetic to both lines of criticism, but I am more worried by the latter than the former: AI risk is famously recondite, thus a diversity of approaches seems desirable.
Some elements of Open Phil's remarks on the latter concern seem harsh to me - in particular, the remark that the suite of papers presented would be equivalent to 1-3 years' work from an unsupervised grad student is inapposite given selection, and especially given the heartening progress of papers being presented at UAI (although one of these is by Armstrong, who I gather is principally supported by FHI).
Yet others are frankly concerning. It is worrying that many of the papers produced by MIRI were considered unimpressive. It is even more worrying that despite the considerable efforts Open Phil made to review MIRI's efficacy - commissioning academics to review the work, having someone spend a hundred hours looking at it, etc. - they remain unconvinced of the quality of your work. That they emphasize fairly research-independent considerations in offering a limited grant (e.g. involvement in the review process, germinating SPARC, hedging against uncertainty over approaches) is hardly a ringing endorsement; that they expressly benchmark MIRI's research quality as below that of a higher-end academic grantee, likewise; the comparison to other grants Open Phil has made in the AI space (e.g. $1.1M to FLI, $5.5M for a new center at UC Berkeley), even more so.
It has been remarked on this forum before that MIRI is a challenging organisation to evaluate, as the output (technical research in computer science) is opaque to most without a particular quantitative background. MIRI's predictions and responses to Open Phil imply a more extreme position: even domain experts are unlikely to appreciate the value of MIRI's work without considerable back-and-forth with MIRI itself. I confess scepticism at this degree of inferential distance, particularly given that the Open Phil staff involved in this report included several people who previously worked with MIRI.
I accept MIRI may not be targeting conventional metrics of research success (e.g. academic publications). Yet across most proxy indicators (e.g. industry involvement, academic endorsement, collaboration) for MIRI "doing good research", the evidence remains pretty thin on the ground - and, as covered above, direct assessment of research quality by domain experts is mixed at best. I look forward to the balance of evidence shifting favourably: the new conference papers are promising, ditto the buzz around logical induction (although I note the blogging is by people already in MIRI's sphere of influence/former staff, and MIRI's previous "blockbuster result" in decision theory has thus far underwhelmed). Yet this hope, alongside the earnest assurances of MIRI that - if only experts gave them the time - they would be persuaded of their value, is not a promissory note that easily justifies an organisation with a turnover of $2M/year, nor fundraising for over a million dollars more.
I take this opportunity to note I have made an even-odds bet with Carl Shulman for $1000, donated to the charity of the winner's choice, over whether Open Phil's next review of MIRI has a more favourable evaluation of their research.
I am wiser, albeit poorer: the bet resolved in Carl's favour. I will edit this comment with the donation destination he selects, with further lamentations from me in due course.
Carl has gotten back to me with where he would like to donate his gains, ill-gotten through picking on epistemic inferiors - akin to crocodiles in the rivers of the Serengeti picking off particularly frail or inept wildebeest on their crossing. The $1000 will go to MIRI.
With cognitive function mildly superior to that of the median geriatric wildebeest, I can take some solace that these circumstances imply this sum is better donated by him than by me, and that MIRI is doing better on a crucial problem for the far future than I had supposed.
Thanks for the response, Gregory. I was hoping to see more questions along these lines in the AMA, so I'm glad you followed up.
Open Phil's grant write-up is definitely quite critical, and not an endorsement. One of Open Phil's main criticisms of MIRI is that they don't think our agent foundations agenda is likely to be useful for AI alignment; but their reasoning behind this is complicated, and neither Open Phil nor MIRI has yet had time to write up our thoughts in any detail. I suggest pinging me to say more about this once MIRI and Open Phil have put up more write-ups on this topic, since the hope is that the write-ups will also help third parties better evaluate our research methods on their merits.
I think Open Phil's assessment that the papers they reviewed were "technically unimpressive" is mainly based on the papers "Asymptotic Convergence in Online Learning with Unbounded Delays" and (to a lesser extent) "Inductive Coherence." These are technically unimpressive, in the sense that they're pretty easy results to get once you're looking for them. (The proof in "Asymptotic Convergence..." was finished in less than a week.) From my perspective the impressive step is Scott Garrabrant (the papers' primary author) getting from the epistemic state (1) "I notice AIXI fails in reflection tasks, and that this failure is deep and can't be easily patched" to:
(2) "I notice that one candidate for 'the ability AIXI is missing that would fix these deep defects' is 'learning mathematical theorems while respecting patterns in whether a given theorem can be used to (dis)prove other theorems.'"
(3) "I notice that another candidate for 'the ability AIXI is missing that would fix these deep defects' is 'learning mathematical theorems while respecting empirical patterns in whether a claim looks similar to a set of claims that turned out to be theorems.'"
(4) "I notice that the two most obvious and straightforward ways to formalize these two abilities don't let you get the other ability for free; in fact, the obvious and straightforward algorithm for the first ability precludes possessing the second ability, and vice versa."
In contrast, I think the reviewers were mostly assessing how difficult it would be to get from 2/3/4 to a formal demonstration that there's at least one real (albeit impractical) algorithm that can actually exhibit ability 2, and one that can exhibit ability 3. This is a reasonable question to look at, since it's a lot harder to retrospectively assess how difficult it is to come up with a semiformal insight than how difficult it is to formalize the insight; but those two papers weren't really chosen for being technically challenging or counter-intuitive. They were chosen because they help illustrate two distinct easy/straightforward approaches to logical uncertainty that turned out to be hard to reconcile, and also because (speaking with the benefit of hindsight) conceptually disentangling these two kinds of approaches turned out to be one of the key insights leading to "Logical Induction."
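As a deliberately simplified illustration of the difference between abilities 2 and 3 (a toy sketch only, not the construction in either paper; the class names and the "tag" similarity measure are invented for illustration): one predictor propagates known implications between claims, the other guesses a new claim's probability from the track record of similar-looking claims, and the naive version of each ignores the information the other relies on.

```python
from collections import defaultdict

class DeductivePredictor:
    """Ability-2 flavour: respect logical patterns, i.e. known implications between claims."""
    def __init__(self, implications):
        self.implications = implications   # claim -> set of claims it implies
        self.beliefs = {}                  # claim -> probability

    def assert_claim(self, claim, prob):
        self.beliefs[claim] = prob
        # An implied claim should be believed at least as strongly as whatever implies it.
        for implied in self.implications.get(claim, ()):
            self.beliefs[implied] = max(self.beliefs.get(implied, 0.0), prob)

class EmpiricalPredictor:
    """Ability-3 flavour: guess a new claim's probability from similar-looking past claims."""
    def __init__(self):
        self.outcomes = defaultdict(list)  # similarity tag -> list of 0/1 outcomes

    def observe(self, tag, was_theorem):
        self.outcomes[tag].append(1 if was_theorem else 0)

    def predict(self, tag):
        seen = self.outcomes[tag]
        # Laplace-smoothed frequency of "claims with this tag turned out to be theorems".
        return (sum(seen) + 1) / (len(seen) + 2)

# The two predictors key off entirely different information, and naively gluing
# them together is not straightforward - which is the tension described above.
deductive = DeductivePredictor({"A": {"B"}})
deductive.assert_claim("A", 0.9)
print(deductive.beliefs["B"])                         # 0.9: respects the implication A -> B

empirical = EmpiricalPredictor()
for outcome in (1, 1, 0, 1):
    empirical.observe("goldbach-like", outcome)
print(round(empirical.predict("goldbach-like"), 2))   # 0.67: respects the empirical pattern
```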
"I confess scepticism at this degree of inferential distance, particularly given that the Open Phil staff involved in this report included several people who previously worked with MIRI."
I wasn't surprised that there's a big inferential gap for most of Open Phil's technical advisors - we haven't talked much with Chris/Dario/Jacob about the reasoning behind our research agenda. I was surprised by how big the gap was for Daniel Dewey, Open Phil's AI risk program officer. Daniel's worked with us before and has a lot of background in alignment research at FHI, and we spent significant time trying to understand each other's views, so this was a genuine update for me about how non-obvious our heuristics are to high-caliber researchers in the field, and about how much background researchers at MIRI and FHI have in common. This led to a lot of wasted time: I did a poor job addressing Daniel's questions until late in the review process.
I'm not sure what prior probability you should have assigned to "the case for MIRI's research agenda is too complex to be reliably communicated in the relevant timeframe." Evaluating how promising basic research is for affecting the long-run trajectory of the field of AI is inherently a lot more complicated than evaluating whether AI risk is a serious issue, for example. I don't have as much experience communicating the former, so the arguments are still rough. There are a few other reasons MIRI's research focus might have more inferential distance than the typical alignment research project:
(a) We've been thinking about these problems for over a decade, so we've had time to arrive at epistemic states that depend on longer chains of reasoning. Similarly, we've had time to explore and rule out various obvious paths (that turn out to be dead ends).
(b) Our focus is on topics we don't expect to jibe well with academia and industry, often because they look relatively intractable and unimportant from standard POVs.
(c) "High-quality nonstandard formal intuitions" are what we do. This is what put us ahead of the curve on understanding the AI alignment problem, and the basic case for MIRI (from the perspective of people like Holden who see our early analysis and promotion of the alignment problem as our clearest accomplishment) is that our nonstandard formal intuitions may continue to churn out correct and useful insights about AI alignment when we zero in on subproblems. MIRI and FHI were unusual enough to come up with the idea of AI alignment research in the first place, so they're likely to come up with relatively unusual approaches within AI alignment.
Based on the above, I think the lack of mutual understanding is moderately surprising rather than extremely surprising. Regardless, it's clear that we need to do a better job communicating how we think about choosing open problems to work on.
"I note the blogging is by people already in MIRI's sphere of influence/former staff, and MIRI's previous 'blockbuster result' in decision theory has thus far underwhelmed."
I don't think we've ever worked with Scott Aaronson, though we're obviously on good terms with him. Also, our approach to decision theory stirred up a lot of interest from professional decision theorists at last year's Cambridge conference; expect more about this in the next few months.
"...is not a promissory note that easily justifies an organisation with a turnover of $2M/year, nor fundraising for over a million dollars more."
I think this is a reasonable criticism, and I'm hoping our upcoming write-ups will help address this. If your main concern is that Open Phil doesn't think our work on logical uncertainty, reflection, and decision-theoretic counterfactuals is likely to be safety-relevant, keep in mind that Open Phil gave us $500k expecting this to raise our 2016 revenue from $1.6-2 million (the amount of 2016 revenue we projected absent Open Phil's support) to $2.1-2.5 million, in part to observe the ROI of the added $500k. We've received around $384k in our fundraiser so far (with four days to go), which is maybe 35-60% of what we'd expect based on past fundraiser performance. (E.g., we received $597k in our 2014 fundraisers and $955k in our 2015 ones.) Combined with our other non-Open-Phil funding sources, that means we've so far received around $1.02M in 2016 revenue outside Open Phil, which is solidly below the $1.6-2M range we've been planning around.
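A rough back-of-the-envelope check of those figures (this compares the partial 2016 fundraiser against final past-fundraiser totals; the quoted 35-60% presumably compares against past fundraisers at the equivalent point in time, with four days still to go, so the ranges differ somewhat):

```python
# Back-of-the-envelope check of the fundraising figures above. Comparing the
# partial 2016 total against *final* past-fundraiser totals only approximates
# the quoted 35-60% range.
fundraiser_2016_so_far = 384_000
fundraiser_2014_total = 597_000
fundraiser_2015_total = 955_000

print(f"vs 2014 final total: {fundraiser_2016_so_far / fundraiser_2014_total:.0%}")  # ~64%
print(f"vs 2015 final total: {fundraiser_2016_so_far / fundraiser_2015_total:.0%}")  # ~40%

# 2016 revenue outside Open Phil so far, against the low end of the planning range:
non_open_phil_revenue_2016 = 1_020_000
planning_range_low = 1_600_000
print(f"shortfall vs low end of planning range: ${planning_range_low - non_open_phil_revenue_2016:,}")  # $580,000
```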
There are a lot of reasons donors might be retracting; I'd be concerned if the reason is that they're expecting Open Phil to handle MIRI's funding on their own, or that they're interpreting some action of Open Phil's as a signal that Open Phil wants broadly Open-Phil-aligned donors to scale back support for MIRI.
(In all of the above, I'm speaking only for myself; Open Phil staff and advisors don't necessarily agree with the above, and might frame things differently.)
"Also, our approach to decision theory stirred up a lot of interest from professional decision theorists at last year's Cambridge conference; expect more about this in the next few months."
A quick note to say that comments that have made their way back to me from relevant circles agree with this.
Also, my own impression - from within academia, but outside decision theory and AI - is that the level of recognition of, and respect for, MIRI's work is steadily rising in academia, although inferential gaps like the ones Nate describes certainly exist, plus more generic cultural gaps. I've heard positive comments about MIRI's work from academics I wouldn't have expected even to have heard of MIRI. And my impression, from popping by things like Cambridge's MIRIx discussion group, is that they're populated for the most part by capable people with standard academic backgrounds who have become involved based on the merits of the work rather than any existing connection to MIRI (although I imagine some are or were LessWrong readers).
Nate, my thanks for your reply. I regret that I may not have expressed myself clearly enough for your reply to target precisely the worries I raised; I also regret that, insofar as your reply overcomes my poor expression, it makes my worries grow deeper.
If I read your approach to the Open Phil review correctly, you submitted some of the more technically unimpressive papers for review because they demonstrated the lead author developing some interesting ideas about research direction, and because they in some sense led up to the "big result" (Logical Induction). If so, this looks like a pretty surprising error: one of the standard worries facing MIRI, given its fairly slender publication record, is the technical quality of the work, and it seemed pretty clear that assessing this was the objective behind sending papers out for evaluation. Under whatever constraints Open Phil provided, I'd have sent the "best by academic lights" papers I had.
In candour, I think "MIRI barking up the wrong tree" and/or (worse) "MIRI not doing that much good research" is a much better explanation for what is going on than "inferential distance". I struggle to imagine a fairer (or more propitious-to-MIRI) hearing than the Open Phil review: it involved two people (Dewey and Christiano) who previously worked with you guys; Dewey spent over 100 hours trying to understand the value of your work; and they commissioned external experts in the field to review it.
Suggesting that the fairly adverse review that resulted may be a product of a lack of understanding makes MIRI seem more like a mystical tradition than a research group. If MIRI is unable to convince someone like Dewey, the prospects of it making the necessary collaborations or partnerships with the wider AI community look grim.
"I don't think we've ever worked with Scott Aaronson, though we're obviously on good terms with him. Also, our approach to decision theory stirred up a lot of interest from professional decision theorists at last year's Cambridge conference; expect more about this in the next few months."
I had Aaronson down as within MIRI's sphere of influence, but if I overstated, I apologize. (I am correct that Qiaochu Yuan previously worked for you, right?)
I look forward to seeing MIRI producing or germinating some concrete results in decision theory. The "underwhelming blockbuster" I referred to above was the TDT/UDT etc. work, which MIRI widely hyped but which has since languished in obscurity.
"There are a lot of reasons donors might be retracting; I'd be concerned if the reason is that they're expecting Open Phil to handle MIRI's funding on their own, or that they're interpreting some action of Open Phil's as a signal that Open Phil wants broadly Open-Phil-aligned donors to scale back support for MIRI."
It may simply be the usual (albeit regrettable) pattern of donors jockeying to be the "last resort" - I guess it would depend on what the usual distribution of donations is with respect to fundraising deadlines.
If donors are retracting, I would speculate Open Phil's report may be implicated. One potential model would be donors interpreting Open Phil's fairly critical support as an argument against funding further growth by MIRI, and thus pulling back so that MIRI's overall revenue hovers at previous years' levels (I don't read in the Open Phil report a particular revenue target they wanted you guys to have). Perhaps a simpler explanation would be that having a large and respected org do a fairly in-depth review and give a fairly mixed verdict makes previously enthusiastic donors update to be more tepid, and perhaps direct their donations to other players in the AI space.
With respect, I doubt I will change my mind due to MIRI producing further write-ups, and if donors are pulling back in part "due to" Open Phil, I doubt it will change their minds either. It may be that "high-quality non-standard formal insights" are what you guys do, but the value of that is pretty illegible on its own: it needs to be converted into tangible accomplishments (e.g. good papers, esteem from others in the field, interactions with industry) - first to convince people there is actually something there, but also because this is probably the most plausible route to this comparative advantage having any impact.
Thus far this has not happened to a degree commensurate with MIRI's funding base. I wrote four-and-a-half years ago that I was disappointed in MIRI's lack of tangible accomplishments: I am even more disappointed to find that my remarks now follow fairly similar lines. Happily, it can be fixed - if the logical induction result "takes off" as I infer you guys hope it does, it will likely fix itself. Unless and until then, I remain sceptical about MIRI's value.
"Under whatever constraints Open Phil provided, I'd have sent the 'best by academic lights' papers I had."
We originally sent Nick Beckstead what we considered our four most important 2015 results, at his request; these were (1) the incompatibility of the "Inductive Coherence" framework and the "Asymptotic Convergence in Online Learning with Unbounded Delays" framework; (2) the demonstration in "Proof-Producing Reflection for HOL" that a non-pathological form of self-referential reasoning is possible in a certain class of theorem-provers; (3) the reflective oracles result presented in "A Formal Solution to the Grain of Truth Problem," "Reflective Variants of Solomonoff Induction and AIXI," and "Reflective Oracles"; and (4) Vadim Kosoy's "Optimal Predictors" work. The papers we listed under 1, 2, and 4 then got used in an external review process they probably weren't very well-suited for.
I think this was more or less just an honest miscommunication. I told Nick in advance that I only assigned an 8% probability to external reviewers thinking the "Asymptotic Convergence..." result was "good" on its own (and only a 20% probability for "Inductive Coherence"). My impression of what happened is that Open Phil staff interpreted my pushback as saying that I thought the external reviews wouldn't carry much Bayesian evidence (but that the internal reviews still would), where what I was trying to communicate was that I thought the papers didn't carry very much Bayesian evidence about our technical output (and that I thought the internal reviewers would need to speak to us about technical specifics in order to understand why we thought they were important). Thus, we were surprised when their grant decision and write-up put significant weight on the internal reviews of those papers (and they were surprised that we were surprised). This is obviously really unfortunate, and another good sign that I should have committed more time and care to clearly communicating my thinking from the outset.
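To spell out what "doesn't carry much Bayesian evidence" means here, a toy calculation (the likelihood numbers are purely illustrative, not anyone's actual estimates): if an unimpressed review is nearly as likely when the overall research program is strong as when it is weak, the posterior barely moves; if the paper had instead been selected to showcase technical strength, the same unimpressed review would be much more diagnostic.

```python
# Purely illustrative Bayes arithmetic - the likelihoods below are invented
# for illustration, not anyone's actual estimates.

def posterior(prior, p_obs_given_h, p_obs_given_not_h):
    """P(H | observation), given a prior on H and the likelihood of the
    observation under H and under not-H."""
    joint_h = prior * p_obs_given_h
    joint_not_h = (1 - prior) * p_obs_given_not_h
    return joint_h / (joint_h + joint_not_h)

prior = 0.5  # starting credence that the research program is strong

# Unimpressed review of a paper NOT selected for stand-alone impressiveness:
# roughly as likely either way, so the update is small.
print(round(posterior(prior, p_obs_given_h=0.8, p_obs_given_not_h=0.9), 2))  # 0.47

# Unimpressed review of a paper that WAS selected to showcase technical
# strength: far less likely if the program is strong, so the update is large.
print(round(posterior(prior, p_obs_given_h=0.2, p_obs_given_not_h=0.9), 2))  # 0.18
```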
Regarding picking better papers for external review: we only put out 10 papers directly related to our technical agendas between Jan 2015 and Mar 2016, so the option space is pretty limited, especially given the multiple constraints Open Phil wanted to meet. Optimizing for technical impressiveness and non-obviousness as a stand-alone result, I might have instead gone with Critch's bounded Löb paper and the grain of truth problem paper over the AC/IC results. We did submit the grain of truth problem paper to Open Phil, but they decided not to review it because it didn't meet other criteria they were interested in.
"If MIRI is unable to convince someone like Dewey, the prospects of it making the necessary collaborations or partnerships with the wider AI community look grim."
I'm less pessimistic about building collaborations and partnerships, in part because we're already on pretty good terms with other folks in the community, and in part because I think we have different models of how technical ideas spread. Regardless, I expect that with more and better communication, we can (upon re-evaluation) raise Open Phil staff's credence that the work we're doing is important.
More generally, though, I expect this task to get easier over time as we get better at communicating about our research. There's already a body of AI alignment research (and, perhaps, methodology) that requires the equivalent of multiple university courses to understand, but there aren't curricula or textbooks for teaching it. If we can convince a small pool of researchers to care about the research problems we think are important, this will let us bootstrap to the point where we have more resources for communicating information that requires a lot of background and sustained scholarship, as well as more of the institutional signals that this stuff warrants a time investment.
I can maybe make the time expenditure thus far less mysterious if I mention a couple more ways I erred in trying to communicate my model of MIRI's research agenda:
My early discussion with Daniel was framed around questions like "What specific failure mode do you expect to be exhibited by advanced AI systems iff their programmers don't understand logical uncertainty?" I made the mistake of attempting to give straight/non-evasive answers to those sorts of questions and letting the discussion focus on that evaluation criterion, rather than promptly saying "MIRI's research directions mostly aren't chosen to directly address a specific failure mode in a notional software system" and "I don't think that's a good heuristic for identifying research that's likely to be relevant to long-run AI safety."
I fell prey to the transparency illusion pretty hard, and that was completely my fault. Mid-way through the process, Daniel made a write-up of what he had gathered so far; this write-up revealed a large number of miscommunications and places where I thought I had transmitted a concept of mine but Daniel had come away with a very different concept. It's clear in retrospect that we should have spent a lot more time on having Daniel explain back what he thought I meant, and I had all the tools to predict this in foresight; but I foolishly assumed that wouldn't be necessary in this case.
(I plan to blog more about the details of these later.)
I think these are important mistakes that show I hadn't sufficiently clarified several concepts in my own head, or spent enough time understanding Daniel's position. My hope is that I can do a much better job of avoiding these sorts of failures in the next round of discussion, now that I have a better model of where Open Phil's staff and advisors are coming from and what the review process looks like.
"(I am correct that Qiaochu Yuan previously worked for you, right?)"
Yeah, though that was before my time. He did an unpaid internship with us in the summer of 2013, and we've occasionally contracted him to tutor MIRI staff. Qiaochu's also a lot socially closer to MIRI; he attended three of our early research workshops.
"Unless and until then, I remain sceptical about MIRI's value."
I think that's a reasonable stance to take, and that there are other possible reasonable stances here too. Some of the variables I expect EAs to vary on include "level of starting confidence in MIRI's mathematical intuitions about complicated formal questions" and "general risk tolerance." A relatively risk-intolerant donor is right to wait until we have clearer demonstrations of success; and a relatively risk-tolerant donor who starts without a very high confidence in MIRI's intuitions about formal systems might be pushed under a donation threshold by learning that an important disagreement has opened up between us and Daniel Dewey (or between us and other people at Open Phil).
Also, thanks for laying out your thinking in so much detail - I suspect there are other people who had more or less the same reaction to Open Phil's grant write-up but haven't spoken up about it. I'd be happy to talk more about this over email, too, including answering Qs from anyone else who wants more of my thoughts on this.
Relevant update: Daniel Dewey and Nick Beckstead of Open Phil have listed MIRI as one of ten "reasonably strong options in causes of interest" for individuals looking for places to donate this year.
Why do people keep betting against Carl Shulman!