2016 AI Risk Literature Review and Charity Comparison

Larks, Dec 13, 2016, 4:36 AM

Introduction

I’ve long been concerned about AI Risk. Now that there are a few charities working on the problem, it seems desirable to compare them, to determine where scarce donations should be sent. This is a similar role to that which GiveWell performs for global health charities, and somewhat similar to that of a securities analyst with regard to possible investments. However, while people have evaluated individual organisations, I haven’t seen anyone else attempt to compare them, so hopefully this is valuable to others.

I’ve attempted to do so. This is a very big undertaking, and I am very conscious of the many ways in which this is not up to the task. The only thing I wish more than the skill and time to do it better is that someone else would do it! If people find this useful enough to warrant doing again next year I should be able to do it much more efficiently, and spend more time on the underlying model of how papers translate into risk-reduction value.

My aim is basically to judge the output of each organisation in 2016 and compare it to their budget. This should give a sense for the organisations’ average cost-effectiveness. Then we can consider factors that might increase or decrease the marginal cost-effectiveness going forward.
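As a rough illustration of the comparison described above, the sketch below divides a budget by a judged output score to get an average cost per unit of output. Everything in it is an assumption made for illustration: the organisation names, budgets and scores are invented placeholders rather than actual assessments, and `cost_per_output_unit` is a hypothetical helper.

```python
def cost_per_output_unit(budget_usd: float, output_score: float) -> float:
    """Average cost of one unit of judged research output (lower is better)."""
    if output_score <= 0:
        raise ValueError("output_score must be positive")
    return budget_usd / output_score


# PLACEHOLDER data: hypothetical organisations, budgets and output scores.
organisations = {
    "Org A": {"budget_usd": 2_000_000, "output_score": 10.0},
    "Org B": {"budget_usd": 1_000_000, "output_score": 4.0},
}

# Rank organisations from cheapest to most expensive per unit of output.
ranked = sorted(
    organisations,
    key=lambda name: cost_per_output_unit(**organisations[name]),
)

for name in ranked:
    cost = cost_per_output_unit(**organisations[name])
    print(f"{name}: ${cost:,.0f} per unit of output")
```

This only captures average cost-effectiveness; the marginal adjustments discussed in the rest of the document would have to be layered on top by judgement.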

This organisation-centric approach is in contrast to a researcher-centric approach, where we would analyse which researchers do good work, and then donate wherever they are. An extreme version of the other approach would be to simply give money directly to researchers: e.g. if I like Logical Induction, I would simply fund Scott Garrabrant directly and ignore MIRI. I favour the organisation-centric approach because it helps keep organisations accountable. Additionally, if researcher skill is the only thing that matters for research output, it doesn’t really matter which organisations end up getting the money and employing the researchers, assuming broadly the same researchers are hired. Different organisations might hire different researchers, but then we are back at judging institutional quality rather than individual researcher quality.

Judging organisations on their historical output is naturally going to favour more mature organisations. A new startup, whose value all lies in the future, will be disadvantaged. However, many of the newer organisations seem to have substantial funding, which should be sufficient for them to operate for a few years and produce a decent amount of research. This research can then form the basis of a more informed evaluation after a year or two of operation, where we decide whether it is the best place for further funding. If the initial endowment is insufficient to produce a track record of research, incremental funding seems unlikely to make a difference. You might disagree here if you thought there were strong threshold effects; maybe $6m will allow a critical mass of top talent to be hired, but $5.9m will not.

Unfortunately there is no standardised metric: as we can’t directly measure the change in risk, there is no equivalent of incremental life-years (as for GiveWell) or profits (as for investment). So this is going to involve a lot of judgement.

This judgement involves analysing a large number of papers relating to Xrisk that were produced during 2016. Hopefully the year-to-year volatility of output is sufficiently low that this is a reasonable metric. I also attempted to include papers from December 2015, to take into account the fact that I’m missing the last month’s worth of output from 2016, but I can’t be sure I did this successfully. I then evaluated them for their contribution to a variety of different issues: for example, how much new insight does this paper add to our thinking about the strategic landscape, or how much progress does this paper make on moral uncertainty issues an AGI is likely to face. I also attempted to judge how replaceable a paper was: is this a paper that would likely have been created anyway by non-safety-concerned AI researchers?

I have not made any attempt to compare AI Xrisk research to any other kind of research; this article is aimed at people who have already decided to focus on AI Risk. It also focuses on published articles, at the expense of both other kinds of writing (like blog posts or outreach) and non-textual output, like conferences. There are a lot of rabbit holes and it is hard enough to get to the bottom of even one!

This article focuses on AI risk work. Some of the organisations surveyed (FHI, FLI, GCRI etc.) also work on other risks. As such, if you think other types of Existential Risk research are similarly important to AI risk work, you should give organisations like GCRI or GPP some credit for this.

Prior Literature

The Open Philanthropy Project did a review of MIRI here. The verdict was somewhat equivocal—there were many criticisms, but they ended up giving MIRI $500,000, which while less than they could have given, was nonetheless rather more than the default, zero. However, the disclosures section is painful to read. Typically we would hope that analysts and subjects would not live in the same house—or be dating coworkers. This is in accordance with their anti-principles, which explicitly de-prioritise external comprehensibility and avoiding conflicts of interest. Worse from our perspective, the report makes no attempt to compare donations to MIRI to donations to any other organisation.

Owen Cotton-Barratt recently wrote a piece explaining his choice of donating to MIRI. However, this also contained relatively little evaluation of alternatives to MIRI. While there is an implicit endorsement through his decision to donate, as a Research Fellow at FHI it is inappropriate for him to donate to FHI, so his decision has little information value with regard to the FHI-MIRI tradeoff.

Methodology & General Remarks

Here are some technical details which arose during this project.

Technical vs Non-Technical Papers

Public Policy / Public Outreach

In some ways AI Xrisk fits very naturally into a policy discussion. It’s basically concerned with a negative externality of AI research, which suggests the standard economics toolkit of Pigovian taxes and property rights / liability allocation. The unusual aspects of the issue (like irreversibility) suggest outright regulation could be warranted. This is certainly close to many people’s intuitions about trying to use the state as a vector to solve massive coordination problems.

However, I now think this is a mistake.

My impression is that policy on technical subjects (as opposed to issues that attract strong views from the general population) is generally made by the government and civil servants in consultation with, and being lobbied by, outside experts and interests. Without expert (e.g. top ML researchers at Google, CMU & Baidu) consensus, no useful policy will be enacted.

Pushing directly for policy seems if anything likely to hinder expert consensus. Attempts to directly influence the government to regulate AI research seem very adversarial, and risk being pattern-matched to ignorant opposition to GM foods or nuclear power. We don’t want the ‘us-vs-them’ situation, that has occurred with climate change, to happen here. AI researchers who are dismissive of safety law, regarding it as an imposition and encumbrance to be endured or evaded, will probably be harder to convince of the need to voluntarily be extra-safe—especially as the regulations may actually be totally ineffective. The only case I can think of where scientists are relatively happy about punitive safety regulations, nuclear power, is one where many of those initially concerned were scientists themselves.

Given this, I actually think policy outreach to the general population is probably negative in expectation.

‘Papers to point people to’

Summary articles like Concrete Problems can be useful both for establishing which ideas are the most important to work on (vs a previous informal understanding thereof) and to provide something to point new researchers towards, providing actionable research topics. However progress here is significantly less additive than for object-level research: New theorems don’t generally detract from old theorems, but new pieces-to-be-pointed-to often replace older pieces-to-be-pointed-to.

Technical papers

Generally I value technical papers (mathematics research) highly. I think these are a credible signal of quality, both to me as a donor and also to mainstream ML researchers, and additive: each result can build on previous work. What is more, they are vital: mankind cannot live on strategy white papers alone!

Marginal Employees

In their recent evaluation of MIRI, the Open Philanthropy Project asked to focus on papers written by recent employees, to get an impression of the quality of marginal employees, whom incremental donations would presumably be funding. However, I think this is plausibly a mistake. When evaluating public companies, we are always concerned about threats to their core business which may be obscured by smaller, more rapidly growing, segments. We don’t want companies to ‘buy growth’ in an attempt to cover up core weakness. A similar principle seems plausible here: an organisation might hire new employees that are all individually productive, thereby covering up a reduction in productivity/focus/output from the founders or earlier employees. In the absence of these additional hires, the (allegedly sub-marginal) existing employees would be more productive, in order to ensure the organisation did not fall below a minimum level of output. I think this is why a number of EA organisations seem to have seen sublinear returns to scale.

Paper authorship allocation

Virtually all AI Xrisk related papers are co-authored, frequently between organisations. This raises the question of how to allocate credit between institutions. In general in academia the first author has done most of the work, with a sharp drop off (though this is not the case in fields like economics, where an alphabetic ordering is used).

In cases where authors had multiple affiliations, I assigned credit to X-risk organisations over universities. In a few cases where an author was affiliated with multiple organisations I used my judgement e.g. assigning Stuart Armstrong to FHI not MIRI.

This policy could be criticized if you thought external co-authors were a good thing, for expanding the field, and hence should not be a discount.
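To make the credit-allocation idea concrete, here is a minimal sketch of one way to split a paper’s credit between institutions by author position. The geometric drop-off and the `decay` parameter are assumptions chosen for illustration (the text above only says the first author does most of the work, with a sharp drop off), and the organisation names are hypothetical.

```python
from collections import defaultdict


def author_weights(n_authors: int, decay: float = 0.5) -> list[float]:
    """Geometric weights by author position, normalised to sum to 1.

    `decay` is an ASSUMED parameter: each author receives `decay` times
    the weight of the author listed before them.
    """
    if n_authors < 1:
        raise ValueError("a paper needs at least one author")
    raw = [decay ** position for position in range(n_authors)]
    total = sum(raw)
    return [w / total for w in raw]


def credit_by_org(affiliations: list[str], decay: float = 0.5) -> dict[str, float]:
    """Sum the author weights for each organisation, in author order."""
    credit: dict[str, float] = defaultdict(float)
    weights = author_weights(len(affiliations), decay)
    for org, weight in zip(affiliations, weights):
        credit[org] += weight
    return dict(credit)


# Hypothetical three-author paper: first author at "Org A", the rest at "Org B".
example = credit_by_org(["Org A", "Org B", "Org B"])
print(example)
```

A flatter `decay` (closer to 1) would suit fields with alphabetical author ordering, where position carries no information about contribution.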

Ability to scale

No organisation I asked to review this document suggested they were not looking for more donations.

My impression is that very small organisations actually scale worse than larger organisations, because very small organisations have a number of advantages that are lost as they grow—large organisations have already lost these advantages. These include a lack of internal coordination problems and highly motivated founders who are willing to work for little or nothing.

Notably, for both MIRI and FHI I think their best work for the year was produced by non-founders, suggesting this point has been passed for them, which is a positive.

This means I am relatively more keen to fund either very small or relatively large organisations, unless it is necessary to prevent the small organisation from going under.

Professionalism & Reputational Risks

Organisations should consider reputational risks. To the extent that avoiding AI Risk largely consists in persuading a wide range of actors to take the threat seriously and act appropriately, actions that jeopardize this by making the movement appear silly or worse should be avoided.

In the past EA organisations have been wanting in this regard. There have been at least two major negative PR events, and a number of near misses.

One contributor to this risk is that from a PR perspective there is no clear distinction between the actions organisational leaders take in their role representing their organisations and the actions they take as private persons. If a politician or CEO behaves immorally in their personal life, this is taken (perhaps unfairly) as a mark against the organisation—claims that statements or actions do not reflect the views of their employer are simply not credible. Indeed, leaders are often judged by unusually stringent standards—they should behave in a way that not merely *is not immoral*, but is *unquestionably not immoral*. This includes carefully controlling political statements, as even mainstream views can have negative ramifications. I think Jaan’s policy is prudent:

as a general rule, i try to steer clear of hot political debates (and signalling tribal affiliations), because doing that seems instrumentally counter-productive for my goal of x-risk reduction. source

Xrisk organisations should consider having policies in place to prevent senior employees from espousing controversial political opinions on Facebook or otherwise publishing materials that might bring their organisation into disrepute. They should also ensure that senior employees do not attempt to take advantage of their position. This requires organisations to bear in mind that they need to maintain the respect of the world at large, and that actions which appear acceptable within the Bay Area may not be so to the wider world.

Financial Controls

If money is to be spent wisely it must first not be lost or stolen. A number of organisations have had worrying financial mismanagement in the past. For example, MIRI suffered a major theft by an ex-employee in 2009, though apparently they have recovered the money.

However, I’m not sure this information is all that useful to a potential donor—the main predictor of whether I’m aware of some financial mismanagement in an organisation’s past is simply how familiar I am with the organisation, followed by how old they are, which it is unfair to punish them for.

It might be worth giving FHI some credit here, as they are old enough, and I’m familiar enough with them, that the absence of evidence of mismanagement may actually constitute some non-trivial evidence of its absence. Generalising this point, university affiliation might be a protective factor, though this is not always the case, as shown when SCI’s parent, Imperial College, misplaced $333,000, and it can also raise fungibility issues.

Communication with donors

All organisations should write a short document annually (preferably early December), laying out their top goals for the coming year in a clear and succinct manner, and then briefly describing how successful they thought they were at achieving the previous year’s goals. The document should also contain a simple table showing total income and expenditure for the last 3 years, projections for the next year, and the number of employees. Some organisations have done a good job of this; others could improve.

Public companies frequently do Non Deal Roadshows (NDRs), where some combination of CEO, CFO and IR will travel to meet with investors, answering their questions as well as giving investors a chance to judge management quality.

While it would be unduly expensive, both in terms of time and money, for Xrisk organisations to host such tours, when senior members are visiting cities with major concentrations of potential donors (e.g. NYC, London, Oxford, Bay Area) they should consider hosting informal events where people can ask questions; GiveWell already hosts such an annual meeting in NYC. This could help improve accountability and reduce donor alienation.

Literature and Organisational Review

Technical Safety Work Focused Organisations

MIRI

MIRI is the largest pure-play AI existential risk group. Based in Berkeley, it focuses on mathematics research that is unlikely to be produced by academics, trying to build the foundations for the development of safe AIs. Their agent foundations work is basically trying to work out the correct way of thinking about agents and learning/decision making by spotting areas where our current models fail and seeking to improve there.

Much of their work this year seems to involve trying to address self-reference in some way—how can we design, or even just model, agents that are smart enough to think about themselves? This work is technical, abstract, and requires a considerable belief in their long-term vision, as it is rarely locally applicable.

During the year they announced something of a pivot, towards spending more of their time on ML work, in addition to their previous agent-foundations focus. I think this is possibly a mistake; while more directly relevant, this work seems significantly more addressable by mainstream ML researchers than their agent-foundations work, though to be fair mainstream ML researchers have generally not actually done it. Additionally, this work seems somewhat outside of their expertise. In any event, at this early stage the new research direction has not produced (and wouldn’t have been expected to produce) any research to judge it by.

Virtually all of MIRI’s work, especially on the agent foundations side, does very well on replaceability; it seems unlikely that anyone not motivated by AI safety would produce this work. Even within those concerned about friendly AI, few not at MIRI would produce this work.

Parametric Bounded Löb’s Theorem and Robust Cooperation of Bounded Agents offers a cool and substantive result. Basically, by proving a bounded version of Löb’s theorem we can ensure that proof-finding agents will be able to utilise Löbian reasoning. This is especially useful for agents that need to model other agents, as it allows two ‘equally powerful’ agents to come to conclusions about each other. In terms of improving our understanding of general reasoning agents, this seems like a reasonable step forward, especially for self-improving agents, who need to reason about *even more powerful agents*. It could also help game-theoretic approaches to getting useful work from unfriendly AIs, as it shows the danger that separate AIs could causally cooperate in this fashion, though I don’t think MIRI would necessarily agree on that point.

Inductive Coherence was an interesting attempt to solve the problem of reasoning about logical uncertainty. I didn’t really follow the maths. Asymptotic Convergence in Online Learning with Unbounded Delays was another attempt to solve the issue from another angle. According to MIRI it’s basically superseded by the next paper anyway, so I didn’t invest too much time in these papers.

Logical Induction is a very impressive paper. Basically they make a lot of progress on the problem of logical uncertainty by setting up a financial market of Arrow-Debreu securities for logical statements, and then specifying that you shouldn’t be very exploitable in this market. From this, a huge number of desirable, and somewhat surprising, properties follow. The paper provides a model of a logical agent that we can work with to prove other results, before we actually have a practical implementation of that agent. Hopefully this also helps cause some differential progress towards more transparent AI techniques.

A Formal Solution to the Grain of Truth Problem provides a class of Bayesian agents whose priors assign positive probability to the other agents in the class. The mathematics behind the paper seem pretty impressive, and the result seems useful: ultimately AIs will have to be able to locate themselves in the world, and to think about other AIs. Producing an abstract formal way of modelling these issues now helps us make progress before such AI is actually developed, and thinking about abstract general systems is often easier than messy particular instantiations. The lead author on this paper was Jan Leike, who was at ANU (now at DeepMind/FHI), so MIRI only gets partial credit here.

Alignment for Advanced ML Systems is a high-level strategy / ‘point potential researchers towards this so they understand what to work on’ piece. I’d say it’s basically midway between Concrete Problems and Value Learning Problem; more explicit about the Xrisk / Value Learning problems than the former, but more ML than the latter. It discusses a variety of issues and summarises some of the literature, including Reward Hacking, Scalable Oversight, Domesticity, Ambiguity Identification, Robustness to Distributional Shift, and Value Extrapolation.

Formalizing Convergent Instrumental Goals is a cute paper that basically formalises the classic Omohundro paper on the subject, showing that AGI won’t by default leave humans alone and co-exist from the Oort Cloud. Apparently some people didn’t find Omohundro’s initial argument intuitively obvious; this nice formalisation hopefully renders the conclusion even clearer. However, I wouldn’t consider the model developed here (which is purposefully very bare-bones in the interests of generality) as a foundation for future work; this is a one-and-done paper.

Defining Human Values for Value Learners produces a model of human values basically as concepts that abstract from lower experiences like pain or hunger in order to better promote them—pain in turn abstracting from evolutionary goals in order to better promote the germline. It’s a nice idea, but I doubt we will get to the correct model this way—as opposed to more ML-inspired routes.

MIRI also sponsored a series of MIRIx workshops, helping external researchers engage with MIRI’s ideas. One of these led to Self-Modification in Rational Agents, where Tom Everitt et al basically formalise an intuitive result, from LessWrong and no doubt elsewhere (that Gandhi does not want to want to murder), in nice ML style. Given how much good work has come out of ANU, however, perhaps the MIRIx workshop should not get that much counterfactual credit.

MIRI submitted a document to the White House’s Request for Information on AI safety. The submission seems pretty good, but it’s hard to tell what if any impact it had. The submission was not referenced in the final White House report, but I don’t think that’s much evidence.

MIRI’s lead researcher is heavily involved as an advisor (and partial owner) in a startup that is trying to develop more intuitive mathematical explanations; MIRI also paid him to develop content about AI risk for that platform. He also published a short eBook, which was very funny but somewhat pornographic and not very related to AI. I think this is probably not very helpful for MIRI’s reputation as a serious research institution.

MIRI spent around $1,650,000 in 2015, and $1,750,000 in 2016.

FHI

FHI is a well-established research institute, affiliated with Oxford and led by Nick Bostrom. Compared to the other groups we are reviewing they have a large staff and large budget. As a relatively mature institution they produced a decent amount of research over the last year that we can evaluate.

Their research is more varied than MIRI’s, including strategic work, work directly addressing the value-learning problem, and corrigibility work.

Stuart Armstrong has two notable papers this year, Safely Interruptible Agents and Off-policy Monte Carlo agents with variable behaviour policies, both on the theme of Interruptibility: how to design an AI such that it can be interrupted after being launched and its behaviour altered. In the long run this won’t be enough (we will have to solve the AI alignment problem eventually), but this might help provide more time, or maybe a saving throw. Previously I had thought I understood this research agenda: it was about making an AI indifferent to a red button through cleverly designed utility functions or priors. With these latest two papers I’m less sure, as they seem to be concerned only with interruptions during the training phase, and do not prevent the AI from predicting or trying to prevent interruption. However they seem to be working on a coherent program, so I trust that this research direction makes sense. Also importantly, one of the papers was coauthored with Laurent Orseau of DeepMind. I think these sorts of collaborations with leading AI researchers are incredibly valuable.

Jan Leike, who recently joined DeepMind, is also affiliated with FHI, and as a fan of his work (including the Grain of Truth paper described above) I am optimistic about what he will produce, if also pessimistic about my ability to judge it. Exploration Potential seems to provide a metric to help AIs explore in a goal-aware fashion, which is desirable, but there seems to be still a long way to go before this problem is solved.

Strategic Openness returns to a question FHI addressed before, namely what are the costs and benefits of open AI research, as opposed to secretive or proprietary research. This paper is as comprehensive as one would expect from Bostrom. Much of the material is obvious when you read it, but collecting a sufficient number of individually trivial things can produce a valuable result. The paper seems like a valuable strategic contribution; unfortunately it may simply be ignored by people who want to set up Open AI groups. It does well on replaceability; it seems unlikely that anyone not motivated by AI safety would produce this work. It might benefit from the extra prestige of being published in a journal.

FHI also published Learning the Preferences of Ignorant, Inconsistent Agents, on how to infer values from ignorant and inconsistent agents. This provides simple functional forms for hyperbolic discounting or ignorant agents, some cute examples of learning what combinations of preferences and bias could have yielded behaviour, and a survey to show the model agrees with ordinary people’s intuitions. However, while it and the related paper from 2015, Learning the Preferences of Bounded Agents, are some of the best work I have seen on the subject, they have no solution to the problem of too many free parameters; the paper shows possible combinations of bias+value that could account for actions, but no way to differentiate between the two. FHI had the lead author (Owain Evans) but the second and third authors were not FHI. The paper came out in December 2015, but we are offsetting our year by one month so this still falls within the time period. Presumably the work was done earlier in 2015, but equally presumably there is other research FHI is working on now that I can’t see.

The other FHI research consists of three main collaborations with the Global Priorities Project on strategic policy-orientated research, led by Owen Cotton-Barratt. Underprotection of Unpredictable Statistical Lives Compared to Predictable Ones basically lays the groundwork for regulation / internalisation of low-probability, high-impact risks. The core idea is pretty obvious but obvious things still need stating, and the point about competition from irrational competitors is good and probably non-intuitive to many; the same issue occurs when profit-motivated western firms attempt to compete with ‘strategic’ Asian competitors (e.g. the Chinese steel industry, or various Japanese firms). While it discusses using insurance to solve some issues, it doesn’t mention how setting the attach point=total assets can solve some incentive alignment problems. More notably, it does not address the danger that industry-requested regulation can lead to regulatory capture. The article is also behind a paywall, which seems likely to reduce its impact. The working paper, Beyond risk-benefit analysis: pricing externalities for gain-of-function research of concern, deals with a similar issue. Overall, while I think this work is quite solid economic theory, and addresses a neglected overall topic, I think it is unlikely that this approach will make much difference to AI risk, though it could be useful for biosecurity or the like. They also produced Global Catastrophic Risks, which, mea culpa, I have not read.

FHI spent £1.1m in 2016 (they were unable to provide me 2015 numbers due to a staff absence). Assuming their cost structure is fundamentally sterling based, this corresponds to around $1,380,000.
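For readers who want to check the conversion, the sketch below backs out the exchange rate implied by the two figures above and re-applies it. The rate is derived purely from those two numbers rather than taken from market data, and `gbp_to_usd` is a hypothetical helper.

```python
def gbp_to_usd(amount_gbp: float, usd_per_gbp: float) -> float:
    """Convert a sterling amount to dollars at the given rate."""
    return amount_gbp * usd_per_gbp


fhi_spend_gbp = 1_100_000  # 2016 spend in pounds, as quoted above
fhi_spend_usd = 1_380_000  # approximate dollar figure, as quoted above

# Exchange rate implied by the two quoted figures (USD per GBP).
implied_rate = fhi_spend_usd / fhi_spend_gbp
print(f"Implied rate: {implied_rate:.4f} USD per GBP")

# Re-applying the implied rate recovers the quoted dollar figure.
print(f"Converted spend: ${gbp_to_usd(fhi_spend_gbp, implied_rate):,.0f}")
```

Using a different rate, for example a full-year average rather than a late-2016 spot rate, would shift the dollar figure accordingly.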

OpenAI

OpenAI, Musk’s AI research company, apparently has $1bn pledged. I doubt incremental donations are best spent here, even though they seem to be doing some good work, like Concrete Problems in AI Safety.

If this document proves useful enough to produce again next year, I’ll aim to include a longer section on OpenAI.

Center for Human-Compatible AI

The Center for Human-Compatible AI, founded by Stuart Russell in Berkeley, launched in August.

As they are extremely new, there is no track record to judge and compare—the publications on their publications page appear to all have been produced prior to the founding of the institute. I think there is a good chance that they will do good work—Russell has worked on relevant papers before, like Cooperative Inverse Reinforcement Learning, which addresses how an AI should explore if it has a human teaching it the value of different outcomes. I think the two Evans et al papers (one, two) offer a more promising approach to this specific question, because they do not assume the AI can directly observe the value of an outcome, but the Russell paper may be useful for corrigibility—see for example Dylan Hadfield-Menell’s The Off Switch. Information Gathering Actions over Human Internal State seems also potentially relevant to the value learning problem.

However, they have adequate (over $5m) initial funding from the Open Philanthropy Project, the Leverhulme Trust, CITRIS and the Future of Humanity Institute. If they cannot produce a substantial amount of research over the next year with this quantity of funding, it seems unlikely that any more would help (though if you believed in convex returns to funding it might, for example due to threshold / critical mass effects), and if they do we can review them then. As such I wish them good luck.

Strategy / Outreach focused organisations

FLI

The Future of Life Institute was founded to do outreach, including running the Puerto Rico conference. Elon Musk donated $10m for the organisation to re-distribute; given the size of the donation it has rightfully come to somewhat dominate their activity. The 2015 grant program recommended around $7m of grants to a variety of researchers and institutions. Some of the grants were spread over several years.

My initial intention was to evaluate FLI as an annual grant-making organisation, judging them by their research portfolio. I have read over 26 papers thus supported. However, I am now sceptical this is the correct way to think about FLI.

In terms of directly funding the most important research, donating to FLI is unlikely to be the best strategy. Their grant pool is allocated by a board of (anonymous) AI researchers. While grants are made in accordance with the excellent research priorities document, much of the money has historically funded, and is likely to continue to fund, shorter term research projects than donors may otherwise prioritise. The valuable longer-term projects they do fund tend to be at institutes like MIRI or FHI, so donors wishing to support this could simply donate directly. Of course some of their funding does support valuable long-term research that is not done at the other organisations. For example they supported the Steinhardt et al paper Avoiding Imposters and Delinquents: Adversarial Crowdsourcing and Peer Prediction on how to get useful work from unreliable agents, which I do not see how donors could have supported except through FLI.

Also unfortunately I simply don’t have time to review all the research they supported, which is entirely my own fault. Hopefully the pieces I read (which include many by MIRI and other institutes mentioned in this piece) are representative of their overall portfolio.

Rather, I think FLI’s main work is in consensus-building. They successfully got a large number of leading AI researchers to sign the Open Letter, which was among other things referenced in the White House report on AI, though the letter took place before the 2016 time frame we are looking at. The ‘mainstreaming’ of the cause over the last two years is plausibly partly attributable to FLI; unfortunately it is very hard to judge to what extent.

They also work on non-AI existential risks, in particular nuclear war. In keeping with the focus of this document and my own limitations, I will not attempt to evaluate their output here, but other potential donors should keep it in mind.

CSER

CSER is an existential risk focused group located in Cambridge. Its founding was announced in late 2012 - the organisation existed in some form in March 2014.

They were significantly responsible for the award of £10m by the Leverhulme Trust to fund a new research institute in Cambridge, the Leverhulme Centre for the Future of Intelligence. To the extent CSER is responsible, this is good leverage of financial resources. However, other organisations are also performing outreach, and there is a limit to how many child organisations you can spawn in the same city: if the first had trouble hiring, might not the second?

In August of 2016 CSER published an update, including a list of research they currently have underway, to be published online shortly.

While they have held a large number of events, including conferences, as of December 2016 there is no published research on their website. When I reached out to CSER they said they had various pieces in the process of peer review, but were not in a position to publicly share them; hopefully that will change next year or later in December.

In general I think it is best for research to be as open as possible. If not shared publicly it cannot exert very much influence, and we cannot evaluate the impact of the organisation. It is somewhat disappointing that CSER has not produced any public-facing research over the course of multiple years; apparently they have had trouble hiring.

As such I encourage CSER to publish (even if not peer reviewed, though not at the expense of peer review) so it can be considered for future donations.

GCRI

The Global Catastrophic Risk Institute is run by Seth Baum and Tony Barrett. They have produced work on a variety of existential risks.

This includes strategic work, for example On the Promotion of Safe and Socially Beneficial Artificial Intelligence, which provides a nuanced discussion of how to frame the issue so as not to alienate key stakeholders. For example, it argues that an ‘AI arms race’ is bad framing, inimical to creating a culture of safety. It also highlights the military and auto industry as possible forces for a safety culture. This paper significantly informed my own thinking on the subject.

Another strategic piece is A Model of Pathways to Artificial Superintelligence Catastrophe for Risk and Decision Analysis, which applies risk tree analysis to AI. This seems to be a very methodical approach to the problem.

In both cases the work does not seem replaceable—it seems unlikely that industry participants would produce such work.

They also produced some work on ensuring adequate food supply in the event of disaster and on the ethics of space exploration, which both seem valuable, but I’m not qualified to judge.

Previously Seth Baum suggested that one of their main advantages lay in skill at stakeholder engagement; while I certainly agree this is very important, it’s hard to evaluate from the outside.

GCRI operates on a significantly smaller budget than some of the other organisations; they spent $98,000 in 2015 and approximately $170,000 in 2016.

Global Priorities Project

The Global Priorities Project is a small group at Oxford focusing on strategic work, including advising governments and publishing research. For ease of reference, here is that section again:

Underprotection of Unpredictable Statistical Lives Compared to Predictable Ones basically lays the groundwork for regulation / internalisation of low-probability, high-impact risks. The core idea is pretty obvious but obvious things still need stating, and the point about competition from irrational competitors is good and probably non-intuitive to many; the same issue occurs when profit-motivated western firms attempt to compete with ‘strategic’ Asian competitors (e.g. the Chinese steel industry, or various Japanese firms). While it discusses using insurance to solve some issues, it doesn’t mention how setting the attach point=total assets can solve some incentive alignment problems. More notably, it does not address the danger that industry-requested regulation can lead to regulatory capture. The article is also behind a paywall, which seems likely to reduce its impact. The working paper, Beyond risk-benefit analysis: pricing externalities for gain-of-function research of concern, deals with a similar issue. Overall, while I think this work is quite solid economic theory and addresses a neglected overall topic, I think it is unlikely that this approach will make much difference to AI risk, though it could be useful for biosecurity or the like. They also produced Global Catastrophic Risks, which, mea culpa, I have not read.

Its operations have now been absorbed within CEA, which raises donor fungibility concerns. Historically CEA partially addressed this by allocating unrestricted donations in proportion to restricted donations. However, they rescinded this policy. As a result, I am not sure how donors could ensure the GPP actually counterfactually benefited from increased donations.
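To make the former pro-rata policy concrete, here is a minimal sketch of what allocating unrestricted donations in proportion to restricted donations could look like. The function name, the project names and all the figures are hypothetical illustrations, not CEA's actual accounts or the exact rule they used.

```python
def allocate_unrestricted(restricted, unrestricted):
    """Split an unrestricted pool across projects pro rata to restricted gifts.

    `restricted` maps a (hypothetical) project name to the restricted
    donations it received; `unrestricted` is the pool to be shared out.
    """
    total = sum(restricted.values())
    if total <= 0:
        raise ValueError("need a positive total of restricted donations")
    return {
        name: unrestricted * amount / total
        for name, amount in restricted.items()
    }


# Illustrative numbers only: project_a received three times the restricted
# donations of project_b, so it receives three times the unrestricted money.
restricted = {"project_a": 300_000, "project_b": 100_000}
shares = allocate_unrestricted(restricted, 200_000)
print(shares)
```

Under a rule like this, an extra restricted dollar to one project also pulls some unrestricted money towards it, which is why the rule partially addresses fungibility. Without it, management can offset a restricted gift by moving unrestricted funds elsewhere.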

GPP has successfully produced nuanced pieces of research aimed at providing a foundation for future policy. Doing so requires an unemotional evaluation of the situation, and a certain apolitical attitude, to ensure your work can influence both political parties. While the GPP people seem well suited to this task, CEA executives have on a number of occasions promoted a partisan view of EA, which hopefully will not affect the work of the GPP.

GPP clearly collaborates closely with FHI. GPP’s noteworthy publications this year were coauthored with FHI, and lead author Owen Cotton-Barratt lists dual affiliation. CEA and FHI also share offices. As such it seems likely that, if all GPP’s supporters decided to donate to FHI instead of CEA, GPP’s researchers might simply end up being employed by FHI. This would entail some restructuring costs but the long term impact on research output does not seem very large.

AI Impacts

note: this section added 2016-12-14.

AI Impacts is a small group that does high-level strategy work, especially on AI timelines, loosely associated with MIRI.

They seem to have done some quite interesting work, for example the article on the intelligence capacity of current hardware, which argues that current global computing hardware could only support a relatively small number of EMs. This was quite surprising to me, and would make me much more confident about the prospects for humanity if we developed EMs soon; a small initial number would allow us to adapt before their impact became overwhelming. They also successfully found a significant error in previously published Xrisk work that significantly undermined its conclusion (which had been that the forecasts of experts and non-experts did not significantly differ).

Unfortunately they do not seem to have a strong record of publishing. My impression is their work has received relatively little attention, partly because of this, though as the intended end-user of the research appears to be people who are already very interested in AI safety, maybe they do not need much distribution.

They were supported by an FLI grant, and apparently do not need additional funding at this time.

Xrisks Institute

The X-Risks Institute appears to be mainly involved in publishing magazine articles, as opposed to academic research. As I think popular outreach (as opposed to academic outreach) is quite low value for AI Risk, and potentially counterproductive if done poorly, I have not reviewed their work in great detail.

X-Risks Net

The X-risks net have produced a variety of strategic maps that summarise the landscape around various existential risks.

CFAR, 80K, REG

All three organisations are ‘meta’ in some way: CFAR attempts to equip people to do the required research; 80K helps people choose effective careers; and REG spends money to raise even more.

I think CFAR’s mission, especially SPARC, is very interesting; I have donated there in the past. Nate Soares (head of MIRI) credits them with producing at least one counterfactual incremental researcher, though as MIRI now claims to be dollar-constrained, presumably absent CFAR they would have hired the current marginal candidate earlier instead.

“it led directly to MIRI hires, at least one of which would not have happened otherwise” source

They also recently announced a change of strategy, towards a more direct AI focus.

However I do not know how to evaluate them, so choose to say nothing rather than do a bad job.

Other Papers

Arguably the best paper of the year, Concrete Problems in AI Safety, was not published by any of the above organisations; it was a collaboration between researchers at Google Brain, Stanford, Berkeley and Paul Christiano (who is now at OpenAI). It is a high-level strategy / literature review / ‘point potential researchers towards this so they understand what to work on’ piece, focusing on problems that are relevant to and addressable by mainstream ML researchers. It discusses a variety of issues and summarises some of the literature, including Reward Hacking, Scalable Oversight (including original work by Paul Christiano), Domesticity/Low impact, Safe Exploration, and Robustness to Distributional Shift. It was mentioned in the White House policy document.

The AGI Containment Problem, on AI box design, is also interesting, again not produced by any of the above organisations. It goes through in some detail many problems that a box would have to address in a significantly more concrete and organised way than previous treatments of the subject.

Conclusion

I think the most valuable papers this year were basically

Concrete Problems, which is not really linked to any of the above organisations, though Paul Christiano was partly funded by FLI.
Parametric Bounded Löb, which is MIRI.
Logical Induction, which is MIRI.
Grain of Truth, which is partially MIRI, but the lead author was independent, subsequently at DeepMind/FHI. Additionally, some of the underlying ideas are due to Paul Christiano, now at OpenAI.
Interruptibility, which is an FHI/DeepMind collaboration.

In general I am more confident the FHI work will be useful than the MIRI work, as it more directly addresses the issue. It seems quite likely general AI could be developed via a path that renders the MIRI roadmap unworkable (e.g. if the answer is just to add enough layers to your neural net), though MIRI’s recent pivot towards ML work seems intended to address this.

However, the MIRI work is significantly less replaceable—and FHI is already pretty irreplaceable! I basically believe that if MIRI were not pursuing it no-one else would. And if MIRI is correct, their work is more vital than FHI’s.

To achieve this output, MIRI spent around $1,750,000, while FHI spent around $1,400,000.

Hopefully my deliberations above prove useful to some readers. Here is my eventual decision, rot13'd so you can come to your own conclusions first if you wish:

Qbangr gb obgu gur Znpuvar Vagryyvtrapr Erfrnepu Vafgvghgr naq gur Shgher bs Uhznavgl Vafgvghgr, ohg fbzrjung ovnfrq gbjneqf gur sbezre. V jvyy nyfb znxr n fznyyre qbangvba gb gur Tybony Pngnfgebcuvp Evfxf Vafgvghgr.
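For readers unfamiliar with rot13, it replaces each letter with the one 13 places further along the alphabet, so applying it twice returns the original text. A minimal Python sketch of a decoder follows; the helper name `rot13` and the sample string are illustrative choices, and the sketch relies on the `rot_13` text transform in the standard-library `codecs` module.

```python
import codecs


def rot13(text: str) -> str:
    """Rotate each ASCII letter 13 places; other characters pass through."""
    return codecs.decode(text, "rot_13")


# Illustrative sample only; paste the encoded line above in its place
# once you have formed your own view.
sample = "Uryyb, jbeyq"
print(rot13(sample))
```

Because the shift is half the alphabet, the same function both encodes and decodes.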

However I wish to emphasise that all the above organisations seem to be doing good work on the most important issue facing mankind. It is the nature of making decisions under scarcity that we must prioritise some over others, and I hope that all organisations will understand that this necessarily involves negative comparisons at times.

Neglected questions

Here are some issues that seem to have not been addressed much by research during 2016:

Neglected Problems that have to be solved eventually

The problem of Reward Hacking / wireheading—but see Self-Modification of Policy and Utility Function in Rational Agents

Ontology Identification—how can an AI match up its goal structure with its representation of the world

Normative Uncertainty—how should an AI act if it is uncertain about the true value (except inasmuch as it is implicitly addressed by Value Learning papers like Owain’s)

Value Extrapolation—how do we go from a person’s actual values, which might be contradictory, to some sort of reflective equilibria? And how do we combine the values of multiple people?

Stolen Future: how do we ensure first mover advantages don’t allow a small group of people, whose values do not reflect those of wider humanity, past and present, to gain undue influence?

Neglected Problems that would probably be helpful to solve

Domesticity—How to design an AI that tries not to affect the world much, voluntarily staying in its box.

Differential progress—is it advantageous to promote a certain type of AI development above others?

Disclosures

I shared a draft of this document with representatives of MIRI, FHI, FLI, GPP, CSER and GCRI. CSER was unable to review the document due to their annual conference. I’m very grateful to Greg Lewis, Alex Flint and Jess Riedel for helping review this document. Any remaining inadequacies and mistakes are my own.

I interned at MIRI back when it was SIAI, volunteered very briefly at GWWC (part of CEA) and once applied for a job at FHI. I am personal friends with people at many of the above organisations.

I added the section on AI Impacts and corrected some typos 2016-12-14.

References

Amran Siddiqui, Alan Fern, Thomas Dietterich and Shubhomoy Das; Finite Sample Complexity of Rare Pattern Anomaly Detection; http://auai.org/uai2016/proceedings/papers/226.pdf

Andrew Critch; Parametric Bounded Löb’s Theorem and Robust Cooperation of Bounded Agents; http://arxiv.org/abs/1602.04184

Anthony Barrett and Seth Baum; A Model of Pathways to Artificial Superintelligence Catastrophe for Risk and Decision Analysis; http://sethbaum.com/ac/fc_AI-Pathways.html

Anthony M. Barrett and Seth D. Baum; Risk analysis and risk management for the artificial superintelligence research and development process; http://sethbaum.com/ac/fc_AI-RandD.html

Aristide C Y Tossou and Christos Dimitrakakis; Algorithms for Differentially Private Multi-Armed Bandits; https://arxiv.org/pdf/1511.08681.pdf

Bas Steunebrink, Kristinn Thorisson, Jurgen Schmidhuber; Growing Recursive Self-Improvers; http://people.idsia.ch/~steunebrink/Publications/AGI16_growing_recursive_self-improvers.pdf

Carolyn Kim, Ashish Sabharwal and Stefano Ermon; Exact sampling with integer linear programs and random perturbations; https://cs.stanford.edu/~ermon/papers/kim-sabharwal-ermon.pdf

Chang Liu, Jessica Hamrick, Jaime Fisac, Anca Dragan, Karl Hedrick, Shankar Sastry and Thomas Griffiths; Goal Inference Improves Objective and Perceived Performance in Human Robot Collaboration; http://www.jesshamrick.com/publications/pdf/Liu2016-Goal_Inference_Improves_Objective.pdf

Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dan Mané; Concrete Problems in AI Safety; https://arxiv.org/abs/1606.06565

David Silk; Limits to Verification and validation and artificial intelligence; https://arxiv.org/abs/1604.06963

Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell; Cooperative Inverse Reinforcement Learning; https://arxiv.org/abs/1606.03137

Dylan Hadfield-Menell; The Off Switch; https://intelligence.org/files/csrbai/hadfield-menell-slides.pdf

Ed Felten and Terah Lyons; The Administration’s Report on the Future of Artificial Intelligence; https://www.whitehouse.gov/blog/2016/10/12/administrations-report-future-artificial-intelligence

Federico Pistono, Roman V Yampolskiy; Unethical Research: How to Create a Malevolent Artificial Intelligence; https://arxiv.org/ftp/arxiv/papers/1605/1605.02817.pdf

Fereshte Khani, Martin Rinard and Percy Liang; Unanimous prediction for 100% precision with application to learning semantic mappings; https://arxiv.org/abs/1606.06368

Jacob Steinhardt, Percy Liang; Unsupervised Risk Estimation with only Structural Assumptions; cs.stanford.edu/~jsteinhardt/publications/risk-estimation/preprint.pdf

Jacob Steinhardt, Gregory Valiant and Moses Charikar; Avoiding Imposters and Delinquents: Adversarial Crowdsourcing and Peer Prediction; https://arxiv.org/abs/1606.05374

James Babcock, Janos Kramar, Roman Yampolskiy; The AGI Containment Problem; https://arxiv.org/pdf/1604.00545v3.pdf

Jan Leike, Jessica Taylor, Benya Fallenstein; A Formal Solution to the Grain of Truth Problem; http://www.auai.org/uai2016/proceedings/papers/87.pdf

Jan Leike, Tor Lattimore, Laurent Orseau and Marcus Hutter; Thompson Sampling is Asymptotically Optimal in General Environments; https://arxiv.org/abs/1602.07905

Jan Leike; Exploration Potential; https://arxiv.org/abs/1609.04994

Jan Leike; Nonparametric General Reinforcement Learning; https://jan.leike.name/publications/Nonparametric%20General%20Reinforcement%20Learning%20-%20Leike%202016.pdf

Jessica Taylor, Eliezer Yudkowsky, Patrick LaVictoire, Andrew Critch; Alignment for Advanced Machine Learning Systems; https://intelligence.org/files/AlignmentMachineLearning.pdf

Jessica Taylor; Quantilizers: A Safer Alternative to Maximizers for Limited Optimization; http://www.aaai.org/ocs/index.php/WS/AAAIW16/paper/view/12613

Joshua Greene, Francesca Rossi, John Tasioulas, Kristen Brent Venable, Brian Williams; Embedding Ethical Principles in Collective Decision Support Systems; http://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/view/12457

Joshua Greene; Our driverless dilemma; http://science.sciencemag.org/content/352/6293/1514

Kaj Sotala; Defining Human Values for Value Learners; http://www.aaai.org/ocs/index.php/WS/AAAIW16/paper/view/12633

Kristinn R. Thórisson, Jordi Bieger, Thröstur Thorarensen, Jóna S. Sigurðardóttir and Bas R. Steunebrink; Why Artificial Intelligence Needs a Task Theory (And What It Might Look Like); http://people.idsia.ch/~steunebrink/Publications/AGI16_task_theory.pdf

Kristinn Thorisson; About Understanding;

Laurent Orseau and Stuart Armstrong; Safely Interruptible Agents; http://www.auai.org/uai2016/proceedings/papers/68.pdf

Lun-Kai Hsu, Tudor Achim and Stefano Ermon; Tight variational bounds via random projections and i-projections; https://arxiv.org/abs/1510.01308

Marc Lipsitch, Nicholas Evans, Owen Cotton-Barratt; Underprotection of Unpredictable Statistical Lives Compared to Predictable Ones; http://onlinelibrary.wiley.com/doi/10.1111/risa.12658/full

Nate Soares and Benya Fallenstein; Aligning Superintelligence with Human Interests: A Technical Research Agenda; https://intelligence.org/files/TechnicalAgenda.pdf

Nate Soares; MIRI OSTP submission; https://intelligence.org/2016/07/23/ostp/

Nate Soares; The Value Learning Problem; https://intelligence.org/files/ValueLearningProblem.pdf

Nathan Fulton, André Platzer; A logic of proofs for differential dynamic logic: Toward independently checkable proof certificates for dynamic logics; http://nfulton.org/papers/lpdl.pdf

Nick Bostrom; Strategic Implications of Openness in AI Development; http://www.nickbostrom.com/papers/openness.pdf

Owain Evans, Andreas Stuhlmuller, Noah Goodman; Learning the Preferences of Bounded Agents; https://www.fhi.ox.ac.uk/wp-content/uploads/nips-workshop-2015-website.pdf

Owain Evans, Andreas Stuhlmuller, Noah Goodman; Learning the Preferences of Ignorant, Inconsistent Agents; https://arxiv.org/abs/1512.05832

Owen Cotton-Barratt, Sebastian Farquhar, Andrew Snyder-Beattie; Beyond Risk-Benefit Analysis: Pricing Externalities for Gain-of-Function Research of Concern; http://globalprioritiesproject.org/2016/03/beyond-risk-benefit-analysis-pricing-externalities-for-gain-of-function-research-of-concern/

Owen Cotton-Barratt, Sebastian Farquhar, John Halstead, Stefan Schubert, Andrew Snyder-Beattie; Global Catastrophic Risks 2016; http://globalprioritiesproject.org/2016/04/global-catastrophic-risks-2016/

Peter Asaro; The Liability Problem for Autonomous Artificial Agents; https://www.aaai.org/ocs/index.php/SSS/SSS16/paper/view/12699

Phil Torres; Agential Risks: A Comprehensive Introduction; http://jetpress.org/v26.2/torres.pdf

Roman V. Yampolskiy; Artificial Intelligence Safety and Cybersecurity: a Timeline of AI Failures; https://arxiv.org/ftp/arxiv/papers/1610/1610.07997.pdf

Roman V. Yampolskiy; Taxonomy of Pathways to Dangerous Artificial Intelligence; https://arxiv.org/abs/1511.03246

Roman Yampolskiy; Verifier Theory from Axioms to Unverifiability of Mathematics; http://128.84.21.199/abs/1609.00331

Scott Garrabrant, Tsvi Benson-Tilsen, Andrew Critch, Nate Soares and Jessica Taylor; Logical Induction; https://intelligence.org/2016/09/12/new-paper-logical-induction/

Scott Garrabrant, Benya Fallenstein, Abram Demski, Nate Soares; Inductive Coherence; https://arxiv.org/abs/1604.05288

Scott Garrabrant, Benya Fallenstein, Abram Demski, Nate Soares; Uniform Coherence; https://arxiv.org/abs/1604.05288

Scott Garrabrant, Nate Soares and Jessica Taylor; Asymptotic Convergence in Online Learning with Unbounded Delays; https://arxiv.org/abs/1604.05280

Scott Garrabrant, Siddharth Bhaskar, Abram Demski, Joanna Garrabrant, George Koleszarik and Evan Lloyd; Asymptotic Logical Uncertainty and the Benford Test; http://arxiv.org/abs/1510.03370

Seth Baum and Anthony Barrett; The most extreme risks: Global catastrophes; http://sethbaum.com/ac/fc_Extreme.html

Seth Baum; On the Promotion of Safe and Socially Beneficial Artificial Intelligence; https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2816323

Seth Baum; The Ethics of Outer Space: A Consequentialist Perspective; http://sethbaum.com/ac/2016_SpaceEthics.html

Shengjia Zhao, Sorathan Chaturapruek, Ashish Sabharwal and Stefano Ermon; Closing the gap between short and long xors for model counting; https://arxiv.org/abs/1512.08863

Soenke Ziesche and Roman V. Yampolskiy; Artificial Fun: Mapping Minds to the Space of Fun; https://arxiv.org/abs/1606.07092

Stephanie Rosenthal, Sai Selvaraj, Manuela Veloso; Verbalization: Narration of Autonomous Robot Experience.; http://www.ijcai.org/Proceedings/16/Papers/127.pdf

Stephen M. Omohundro; The Basic AI Drives; https://selfawaresystems.files.wordpress.com/2008/01/ai_drives_final.pdf

Stuart Armstrong; Off-policy Monte Carlo agents with variable behaviour policies; https://www.fhi.ox.ac.uk/wp-content/uploads/monte_carlo_arXiv.pdf

Tom Everitt and Marcus Hutter; Avoiding wireheading with value reinforcement learning; https://arxiv.org/abs/1605.03143

Tom Everitt, Daniel Filan, Mayank Daswani, and Marcus Hutter; Self-Modification of Policy and Utility Function in Rational Agents; http://www.tomeveritt.se/papers/AGI16-sm.pdf

Tsvi Benson-Tilsen, Nate Soares; Formalizing Convergent Instrumental Goals; http://www.aaai.org/ocs/index.php/WS/AAAIW16/paper/view/12634

Tudor Achim, Ashish Sabharwal, Stefano Ermon; Beyond parity constraints: Fourier analysis of hash functions for inference; http://www.jmlr.org/proceedings/papers/v48/achim16.html

Vincent Conitzer, Walter Sinnott-Armstrong, Jana Schaich Borg, Yuan Deng and Max Kramer; Moral Decision Making Frameworks for Artificial Intelligence; https://users.cs.duke.edu/~conitzer/moralAAAI17.pdf

Vincent Muller and Nick Bostrom; Future Progress in Artificial Intelligence: A survey of Expert Opinion; www.nickbostrom.com/papers/survey.pdf

Vittorio Perera, Sai P. Selvaraj, Stephanie Rosenthal, Manuela Veloso; Dynamic Generation and Refinement of Robot Verbalization; http://www.cs.cmu.edu/~mmv/papers/16roman-verbalization.pdf

Zuhe Zhang, Benjamin Rubinstein, Christos Dimitrakakis; On the Differential Privacy of Bayesian Inference; https://arxiv.org/abs/1512.06992
