We’re planning to make potential risks from artificial intelligence a major priority this year. We feel this cause presents an outstanding philanthropic opportunity — with extremely high importance, high neglectedness, and reasonable tractability (our three criteria for causes) — for someone in our position. We believe that the faster we can get fully up to speed on key issues and explore the opportunities we currently see, the faster we can lay the groundwork for informed, effective giving both this year and in the future.
With all of this in mind, we’re placing a larger “bet” on this cause, this year, than we are placing even on other focus areas — not necessarily in terms of funding (we aren’t sure we’ll identify very large funding opportunities this year, and are more focused on laying the groundwork for future years), but in terms of senior staff time, which at this point is a scarcer resource for us. Consistent with our philosophy of hits-based giving, we are doing this not because we have confidence in how the future will play out and how we can impact it, but because we see a risk worth taking. In about a year, we’ll formally review our progress and reconsider how senior staff time is allocated.
This post will first discuss why I consider this cause to be an outstanding philanthropic opportunity. (My views are fairly representative, but not perfectly representative, of those of other staff working on this cause.) It will then give a broad outline of our planned activities for the coming year, some of the key principles we hope to follow in this work, and some of the risks and reservations we have about prioritizing this cause as highly as we are.
In brief:
It seems to me that artificial intelligence is currently on a very short list of the most dynamic, unpredictable, and potentially world-changing areas of science. I believe there’s a nontrivial probability that transformative AI will be developed within the next 20 years, with enormous global consequences.
By and large, I expect the consequences of this progress — whether or not transformative AI is developed soon — to be positive. However, I also perceive risks. Transformative AI could be a very powerful technology, with potentially globally catastrophic consequences if it is misused or if there is a major accident involving it. Because of this, I see this cause as having extremely high importance (one of our key criteria), even while accounting for substantial uncertainty about the likelihood of developing transformative AI in the coming decades and about the size of the risks. I discuss the nature of potential risks below; note that I think they do not apply to today’s AI systems.
I consider this cause to be highly neglected in important respects. There is a substantial and growing field around artificial intelligence and machine learning research, but most of it is not focused on reducing potential risks. We’ve put substantial work into trying to ensure that we have a thorough landscape of the researchers, funders, and key institutions whose work is relevant to potential risks from advanced AI. We believe that the amount of work being done is well short of what it productively could be (despite recent media attention); that philanthropy could be helpful; and that the activities we’re considering wouldn’t be redundant with those of other funders.
I believe that there is useful work to be done today in order to mitigate future potential risks. In particular, (a) I think there are important technical problems that can be worked on today, that could prove relevant to reducing accident risks; (b) I preliminarily feel that there is also considerable scope for analysis of potential strategic and policy considerations.
More broadly, the Open Philanthropy Project may be able to help support an increase in the number of people – particularly people with strong relevant technical backgrounds—thinking through how to reduce potential risks, which could be important in the future even if the work done in the short term does not prove essential. I believe that one of the things philanthropy is best-positioned to do is provide steady, long-term support as fields and institutions grow.
I consider this a challenging cause. I think it would be easy to do harm while trying to do good. For example, trying to raise the profile of potential risks could contribute (and, I believe, has contributed to some degree) to non-nuanced or inaccurate portrayals of risk in the media, which in turn could raise the risks of premature and/or counterproductive regulation. I consider the Open Philanthropy Project relatively well-positioned to work in this cause while being attentive to pitfalls, and to deeply integrate people with strong technical expertise into our work.
My views on this cause have evolved considerably over time. I will discuss the evolution of my thinking in detail in a future post, but this post focuses on the case for prioritizing this cause today.
Importance
It seems to me that AI and machine learning research is currently on a very short list of the most dynamic, unpredictable, and potentially world-changing areas of science.[1] In particular, I believe that this research may lead eventually to the development of transformative AI, which we have roughly and conceptually defined as AI that precipitates a transition comparable to (or more significant than) the agricultural or industrial revolution. I believe there is a nontrivial likelihood (at least 10% with moderate robustness, and at least 1% with high robustness) that transformative AI will be developed within the next 20 years. For more detail on the concept of transformative AI (including a more detailed definition), and why I believe it may be developed in the next 20 years, see our previous post.
I believe that today’s AI systems are accomplishing a significant amount of good, and by and large, I expect the consequences of further progress on AI — whether or not transformative AI is developed soon — to be positive. Improvements in AI have enormous potential to improve the speed and accuracy of medical diagnosis; reduce traffic accidents by making autonomous vehicles more viable; help people communicate with better search and translation; facilitate personalized education; speed up science that can improve health and save lives; accelerate development of sustainable energy sources; and contribute on a huge number of other fronts to improving global welfare and productivity. As I’ve written before, I believe that economic and technological development have historically been highly beneficial, often despite the fact that particular developments were subject to substantial pessimism before they played out. I also expect that if and when transformative AI is very close to development, many people will be intensely aware of both the potential benefits and risks, and will work to maximize the benefits and minimize the risks.
With that said, I think the risks are real and important:
Misuse risks. One of the main ways in which AI could be transformative is by enabling/accelerating the development of one or more enormously powerful technologies. In the wrong hands, this could make for an enormously powerful tool of authoritarians, terrorists, or other power-seeking individuals or institutions. I think the potential damage in such a scenario is nearly limitless (if transformative AI causes enough acceleration of a powerful enough technology), and could include long-lasting or even permanent effects on the world as a whole. We refer to this class of risk as “misuse risks.” I do not think we should let misuse scenarios dominate our thinking about the potential consequences of AI, any more than for any other powerful technology, but I do think it is worth asking whether there is anything we can do today to lay the groundwork for avoiding misuse risks in the future.
Accident risks. I also believe that there is a substantial class of potential “accident risks” that could rise (like misuse risks) to the level of global catastrophic risks. In the course of many conversations with people in the field, we’ve seen substantial (though far from universal) concern that such risks could arise and no clear arguments for being confident that they will be easy to address. These risks are difficult to summarize; we’ve described them in more detail previously, and I will give only a basic outline here.
As goal-directed AI systems (such as reinforcement learning systems) become more capable, they will likely pursue the goals (e.g. as implied by a loss function) assigned to them in increasingly effective, unexpected, and hard-to-understand ways. Among these unexpected behaviors, there could be harmful behaviors, arising from (a) mismatches between the goals that programmers conceptually intend and the goals programmers technically, formally specify; (b) failures of AI systems to detect and respond to major context changes (I understand context change to be an area that many currently-highly-capable AI systems perform poorly at); (c) other technical problems. (See below for a slightly more detailed description of one possible failure mode.) It may be difficult to catch undesirable behaviors when an AI system is operating, in part because undesirable behaviors may be hard to distinguish from clever and desirable behaviors. It may, furthermore, be difficult and time-consuming to implement measures for confidently preventing undesirable behaviors, since they might emerge only in particular complex real-world situations (which raise the odds of major context changes and the risks of unexpected strategies for technically achieving specified goals) rather than in testing. If institutions end up “racing” to deploy powerful AI systems, this could create a significant risk of not taking sufficient precautions.
The result could be a highly intelligent, autonomous, unchecked system or set of systems optimizing for a problematic goal, which could put powerful technologies to problematic purposes and could cause significant harm. I think the idea of a globally catastrophic accident from AI only makes sense for certain kinds of AI—not for all things I would count as transformative AI. My rough impression at this time is that this sort of risk does not have a high overall likelihood (when taking into account that I expect people to take measures to prevent it), though it may have a high enough likelihood to be a very important consideration given the potential stakes. In conversations on this topic, I’ve perceived very large differences of opinion on the size of this risk, and could imagine changing my view on the matter significantly in the next year or so.
Other risks. Some risks could stem from changes that come about due to widespread use of AI systems, rather than from a particular accident or misuse. In particular, AI advances could dramatically transform the economy by leading to the automation of many tasks—including driving and various forms of manufacturing—currently done professionally by many people. The effects of such a transformation seem hard to predict and could be highly positive, but there are risks that it could greatly exacerbate inequality and harm well-being by worsening employment options for many people. We are tentatively less likely to focus on this type of risk than the above two types, since we expect this type of risk to be (a) relatively likely to develop gradually, with opportunities to respond as it develops; (b) less extreme in terms of potential damage, and in particular less likely to be a global catastrophic risk as we’ve defined it, than misuse or accidents; (c) somewhat less neglected than the other risks. But this could easily change depending on what we learn and what opportunities we come across.
If the above reasoning is right (and I believe much of it is highly debatable, particularly when it comes to my previous post’s arguments as well as the importance of accident risks), I believe it implies that this cause is not just important but something of an outlier in terms of importance, given that we are operating in an expected-value framework and are interested in low-probability, high-potential-impact scenarios.[2] The underlying stakes would be qualitatively higher than those of any issues we’ve explored or taken on under the U.S. policy category, to a degree that I think more than compensates for e.g. a “10% chance that this is relevant in the next 20 years” discount. When considering other possible transformative developments, I can’t think of anything else that seems equally likely to be comparably transformative on a similar time frame, while also presenting such a significant potential difference between best- and worst-case imaginable outcomes.
One reason that I’ve focused on a 20-year time frame is that I think this kind of window should, in a sense, be considered “urgent” from a philanthropist’s perspective. I see philanthropy as being well-suited to low-probability, long-term investments. I believe there are many past cases in which it took a very long time for philanthropy to pay off,[3] especially when its main value-added was supporting the gradual growth of organizations, fields and research that would eventually make a difference. If I thought there were negligible probability of transformative AI in the next 20 years, I would still consider this cause important enough to be a focus area for us, but we would not be prioritizing it as highly as we plan to this year.
The above has focused on potential risks of transformative AI. There are also many potential AI developments short of transformative AI that could be very important. For example:
Continued advances in computer vision, audio recognition, etc. could dramatically alter what sorts of surveillance are possible, with a wide variety of potential implications; advances in robotics could have major implications for the future of warfare or policing. These could be important whether or not they ended up being “transformative” in our sense.
Automation could have major economic implications, again even if the underlying AI systems are not “transformative” in our sense.
We are interested in these potential developments, and see the possibility of helping to address them as a potential benefit of allocating resources to this cause. With that said, my previously expressed views, if correct, would imply that most of the “importance” (as we’ve defined it) in this cause comes from the enormously high-stakes possibility of transformative AI.
Neglectedness
Both artificial intelligence generally and potential risks have received increased attention in recent years.[4] We’ve put substantial work into trying to ensure that we have a thorough landscape of the researchers, funders, and key institutions in this space. We will later be putting out a landscape document, which will be largely consistent with the landscape we published last year. In brief:
There is a substantial and growing field, with a significant academic presence and significant corporate funding as well, around artificial intelligence and machine learning research.
There are a few organizations focused on reducing potential risks, either by pursuing particular technical research agendas or by highlighting strategic considerations. (An example of the latter is Nick Bostrom’s work, housed at the Future of Humanity Institute, on Superintelligence.) Most of these organizations are connected to the effective altruism community. Based on conversations we’ve had over the last few months, I believe some of these organizations have substantial room for more funding. There tends to be fairly little intersection between the people working at these organizations and people with substantial experience in mainstream research on AI and machine learning.
Ideally, I’d like to see leading researchers in AI and machine learning play leading roles in thinking through potential risks, including the associated technical challenges. Under the status quo, I feel that these fields—culturally and institutionally—do not provide much incentive to engage with these issues. While there is some interest in potential risks—in particular, some private labs have expressed informal interest in the matter, and many strong academics applied for the Future of Life Institute request for proposals that we co-funded last year—I believe there is room for much more. In particular, I believe that the amount of dedicated technical work focused on reducing potential risks is relatively small compared to the extent of open technical questions.
I’d also like to see a larger set of institutions working on key questions around strategic and policy considerations for reducing risks. I am particularly interested in frameworks for minimizing future misuse risks of transformative AI. I would like to see institutions with strong policy expertise considering different potential scenarios with respect to transformative AI; considering how governments, corporations, and individual researchers should react in those scenarios; and working with AI and machine learning researchers to identify potential signs that particular scenarios are becoming more likely. I believe there may be nearer-term questions (such as how to minimize misuse of advanced surveillance and drones) that can serve as jumping-off points for this sort of thinking.
Elon Musk, the majority funder of the Future of Life Institute’s 3-year grant program on robust and beneficial AI, is currently focusing his time and effort (along with significant funding) on OpenAI and its efforts to mitigate potential risks. (OpenAI is an AI research company that operates as a nonprofit.) We’re not aware of other similarly large private funders focused on potential risks from advanced artificial intelligence. There are government funders interested in the area, but they appear to operate under heavy constraints. There are individual donors interested in this space, but it appears to us that they are focused on different aspects of the problem and/or are operating a smaller scale.
Bottom line—I consider this cause to be highly neglected, particularly by philanthropists, and I see major gaps in the relevant fields that a philanthropist could potentially help to address.
Tractability
It’s been the case for a long time that I see this cause as important and neglected, and that my biggest reservation has been tractability. I see transformative AI as very much a future technology – I’ve argued that there is a nontrivial probability that it will be developed in the next 20 years, but it is also quite plausibly more than 100 years away, and even 20 years is a relatively long time. Working to reduce risks from a technology that is so far in the future, and about which so much is still unknown, could easily be futile.
With that said, this cause is not as unique in this respect as it might appear at first. I believe that one of the things philanthropy is best-positioned to do is provide steady, long-term support as fields and institutions grow. This activity is necessarily slow. It requires being willing to support groups based largely on their leadership and mission, rather than immediate plans for impact, in order to lay the groundwork for an uncertain future. I’ve written about this basic approach in the context of policy work, and I believe there is ample precedent for it in the history of philanthropy. It is the approach we favor for several of our other focus areas, such as immigration policy and macroeconomic stabilization policy.
And I have come to believe that there is potentially useful work to be done today that could lay the groundwork for mitigating future potential risks. In particular:
I think there are important technical challenges that could prove relevant to reducing accident risks.
I’ve previously put significant weight on an argument along the lines of, “By the time transformative AI is developed, the important approaches to AI will be so different from today’s that any technical work done today will have a very low likelihood of being relevant.” My views have shifted significantly for two reasons. First, as discussed previously, I now think there is a nontrivial chance that transformative AI will be developed in the next 20 years, and that the above-quoted argument carries substantially less weight when focusing on that high-stakes potential scenario. Second, having had more conversations about open technical problems that could be relevant to reducing risks, I’ve come to believe that there is a substantial amount of work worth doing today, regardless of how long it will be until the development of transformative AI.
Potentially relevant challenges that we’ve come across so far include value learning (designing AI systems to learn the values of other agents through e.g. inverse reinforcement learning); problems having to do with making reinforcement learning systems and other AI agents less likely to behave in undesirable ways (designing reinforcement learning systems that will not try to gain direct control of their rewards, that will avoid behavior with unreasonably far-reaching impacts, and that will be robust against differences between formally specified rewards and human designers’ intentions in specifying those rewards); reliability and usability of machine learning techniques (including transparency, understandability, and robustness against or at least detection of large changes in input distribution); formal specification and verification of deep learning, reinforcement learning, and other AI systems; better theoretical understanding of desirable properties for powerful AI systems; and a variety of challenges related to an approach laid out in a series of blog posts by Paul Christiano.
Going into the details of these challenges is beyond the scope of this post, but to give a sense for non-technical readers of what a relevant challenge might look like, I will elaborate briefly on one challenge. A reinforcement learning system is designed to learn to behave in a way that maximizes a quantitative “reward” signal that it receives periodically from its environment—for example, DeepMind’s Atari player is a reinforcement learning system that learns to choose controller inputs (its behavior) in order to maximize the game score (which the system receives as “reward”), and this produces very good play on many Atari games. However, if a future reinforcement learning system’s inputs and behaviors are not constrained to a video game, and if the system is good enough at learning, a new solution could become available: the system could maximize rewards by directly modifying its reward “sensor” to always report the maximum possible reward, and by avoiding being shut down or modified back for as long as possible. This behavior is a formally correct solution to the reinforcement learning problem, but it is probably not the desired behavior. And this behavior might not emerge until a system became quite sophisticated and had access to a lot of real-world data (enough to find and execute on this strategy), so a system could appear “safe” based on testing and turn out to be problematic when deployed in a higher-stakes setting. The challenge here is to design a variant of reinforcement learning that would not result in this kind of behavior; intuitively, the challenge would be to design the system to pursue some actual goal in the environment that is only indirectly observable, instead of pursuing problematic proxy measures of that goal (such as a “hackable” reward signal).
It appears to me that work on challenges like the above is possible in the near term, and could be useful in several ways. Solutions to these problems could turn out to directly reduce accident risks from transformative AI systems developed in the future, or could be stepping stones toward techniques that could reduce these risks; work on these problems could clarify desirable properties of present-day systems that apply equally well to systems developed in the longer-term; or work on these problems today could help to build up the community of people who will eventually work on risks posed by longer-term development, which would be difficult to do in the absence of concrete technical challenges.
I preliminarily feel that there is also useful work to be done today in order to reduce future misuse risks and provide useful analysis of strategic and policy considerations.
As mentioned above, I would like to see more institutions working on considering different potential scenarios with respect to transformative AI; considering how governments, corporations, and individual researchers should react in those scenarios; and working with machine learning researchers to identify potential signs that particular scenarios are becoming more likely.
I think it’s worth being careful about funding this sort of work, since it’s possible for it to backfire. My current impression is that government regulation of AI today would probably be unhelpful or even counterproductive (for instance by slowing development of AI systems, which I think currently pose few risks and do significant good, and/or by driving research underground or abroad). If we funded people to think and talk about misuse risks, I’d worry that they’d have incentives to attract as much attention as possible to the issues they worked on, and thus to raise the risk of such premature/counterproductive regulation.
With that said, I believe that potential risks have now received enough attention – some of which has been unfortunately exaggerated in my view – that premature regulation and/or intervention by government agencies is already a live risk. I’d be interested in the possibility of supporting institutions that could provide thoughtful, credible, public analysis of whether and when government regulation/intervention would be advisable, even if it meant simply making the case against such things for the foreseeable future. I think such analysis would likely improve the quality of discussion and decision-making, relative to what will happen without it.
I also think that technical work related to accident risks – along the lines discussed above – could be indirectly useful for reducing misuse risks as well. Currently, it appears to me that different people in the field have very different intuitions about how serious and challenging accident risks are. If it turns out that there are highly promising paths to reducing accident risks – to the point where the risks look a lot less serious – this development could result in a beneficial refocusing of attention on misuse risks. (If, by contrast, it turns out that accident risks are large and present substantial technical challenges, this makes work on such risks extremely valuable.)
Other notes on tractability.
I’ve long worried that it’s simply too difficult to make meaningful statements (even probabilistic ones) about the future course of technology and its implications. However, I’ve gradually changed my view on this topic, partly due to reading I’ve done on personal time. It will be challenging to assemble and present the key data points, but I hope to do so at some point this year.
Much of our overarching goal for this cause, in the near term, is to support an increase in the number of people – particularly people with strong relevant technical backgrounds—thinking through how to reduce potential risks. Even if the specific technical, strategic and other work we support does not prove useful, helping to support a growing field in this way could be. With that said, I think we will accomplish this goal best if the people we support are doing good and plausibly useful work.
Bottom line. I think there are real questions around the extent to which there is work worth doing today to reduce potential risks from advanced artificial intelligence. That said, I see a reasonable amount of potential if there were more people and institutions focused on the relevant issues; given the importance and neglectedness of this cause, I think that’s sufficient to prioritize it highly.
Some Open-Phil-specific considerations
Networks
I consider this a challenging cause. I think it would be easy to do harm while trying to do good. For example:
Trying to raise the profile of potential risks could contribute (and, I believe, has contributed to some degree) to non-nuanced or inaccurate portrayals of risk in the media, which in turn could raise the risks of premature and/or counterproductive regulation. In addition, raising such risks (or being perceived as doing so) could—in turn—cause many AI and machine learning researchers who oppose such regulation to become hostile to the idea of discussing potential risks.
Encouraging particular lines of research without sufficient input and buy-in from leading AI and machine learning researchers could be not only unproductive but counterproductive. It could lead to people generally taking risk-focused research less seriously. And since leading researchers tend to be extremely busy, getting thorough input from them can be challenging in itself.
I think it is important for someone working in this space to be highly attentive to these risks. In my view, one of the best ways to achieve this is to be as well-connected as possible to the people who have thought most deeply about the key issues, including both the leading researchers in AI and machine learning and the people/organizations most focused on reducing long-term risks.
I believe the Open Philanthropy Project is unusually well-positioned from this perspective:
We are well-connected in the effective altruism community, which includes many of the people and organizations that have been most active in analyzing and raising awareness of potential risks from advanced artificial intelligence. For example, Daniel Dewey has previously worked at the Future of Humanity Institute and the Future of Life Institute, and has been a research associate with the Machine Intelligence Research Institute.
One consideration that has made me hesitant about prioritizing this cause is the fact that I see relatively little in the way of truly “shovel-ready” giving opportunities. I list our likely priorities in the next section; I think they are likely to be very time-consuming for staff, and I am unsure of how long it will take before we see as many concrete giving opportunities as we do in some of our other focus areas.
However, I think the case for this cause is compelling enough to outweigh this consideration, and I think a major investment of senior staff time this year could leave us much better positioned to find outstanding giving opportunities in the future.
Our plans
For the last couple of months, we have focused on:
Talking to as many people as possible in the relevant communities, particularly leading researchers in AI and machine learning, in order to get feedback on our thinking, deepen our understanding of the relevant issues, and ensure that we have open channels of communication with them. Some high-level notes from these conversations are below.
Developing our communications strategy for this topic, including this series of blog posts.
Investigating the few potential “shovel-ready grants” (by which I mean grants we can investigate and recommend with relatively low time investments) we’re aware of. We will be publishing more about these later.
Working with several technical advisors to begin to get a sense of what the most important concrete, known technical challenges are. Our hope is to get to the point of being able to offer substantial funding to support work on the most important challenges. We’re beginning with close contacts and planning to broaden the conversation about the most important technical challenges from there.
Having initial conversations about what sorts of misuse risks we should be most concerned about, and what sorts of strategic and policy considerations seem most important, in order to lay the groundwork for finding potential grantees in this category.
Seeking past cases in which philanthropists helped support the growth of technical fields, to see what we can learn.
Ultimately, we expect to seek giving opportunities in the following categories:
“Shovel-ready” grants to existing organizations and researchers focused on reducing potential risks from advanced artificial intelligence.
Supporting substantial work on the most important technical challenges related to reducing accident risks. This could take the form of funding academic centers, requests for proposals, convenings and workshops, and/or individual researchers.
Supporting thoughtful, nuanced, independent analysis seeking to help inform discussions of key strategic and policy considerations for reducing potential risks, including misuse risks.
“Pipeline building”: supporting programs, such as fellowships, that can increase the total number of people who are deeply knowledgeable about technical research on artificial intelligence and machine learning, while also being deeply versed in issues relevant to potential risks.
Other giving opportunities that we come across, including those that pertain to AI-relevant issues other than those we’ve focused on in this post (some such issues are listed above).
Getting to this point will likely require a great deal more work and discussion – internally and with the relevant communities more broadly. It could be a long time before we are recommending large amounts of giving in this area, and I think that allocating significant senior staff time to the cause will speed our work considerably.
Some overriding principles for our work
As we work in this space, we think it’s especially important to follow a few core principles:
Don’t lose sight of the potential benefits of AI, even as we focus on mitigating risks
Our work is focused on potential risks, because this is the aspect of AI research that seems most neglected at the moment. However, as stated above, I see many ways in which AI has enormous potential to improve the world, and I expect the consequences of advances in AI to be positive on balance. It is important to act and communicate accordingly.
Deeply integrate people with strong technical expertise in our work
The request for proposals we co-funded last year employed an expert review panel for selecting grantees. We wouldn’t have participated if it had involved selecting grantees ourselves with nontechnical staff. We believe that AI and machine learning researchers are the people best positioned to make many assessments that will be important to us, such as which technical problems seem tractable and high-potential and which researchers have impressive accomplishments.
Seek a lot of input, and reflect a good deal, before committing to major grants and other activities
As stated above, I consider this a challenging cause, where well-intentioned actions could easily do harm. We are seeking to be thoroughly networked and to seek substantial advice on our activities from a range of people, both AI and machine learning researchers and people focused on reduction of potential risks.
Support work that could be useful in a variety of ways and in a variety of scenarios, rather than trying to make precise predictions
I don’t think it’s possible to have certainty, today, about when we should expect transformative AI, what form we should expect it to take, and/or what the consequences will be. We have a preference for supporting work that seems robustly likely to be useful. In particular, one of our main goals is to support an increase in the number of people – particularly people with strong relevant technical backgrounds—dedicated to thinking through how to reduce potential risks.
Distinguish between lower-stakes, higher-stakes, and highest-stakes potential risks
There are many imaginable risks of advanced artificial intelligence. Our focus is likely to be on those that seem to have the very highest stakes, to the point of being potential global catastrophic risks. In our view currently, that means misuse risks and accident risks involving transformative AI. We also consider neglectedness (we prefer to work on risks receiving less attention from others) and tractability (we prefer to work on risks where it seems there is useful work to be done today that can help mitigate them).
Notes on AI and machine learning researchers’ views on the topics discussed here
Over the last couple of months, we have been reaching out to AI and machine learning researchers that we don’t already have strong relationships with in order to discuss our plans and background views and get their feedback. We have put particular effort into seeking out skeptics and potential critics. As of today, we have requested 35 conversations along these lines and had 25. About three-fourths of these conversations have been with tenure-track academics or senior researchers at private labs, and the remainder have been with students or junior researchers at top AI and machine learning departments and private labs.
We’ve heard a diverse set of perspectives. Conversations were in confidence and often time-constrained, so we wouldn’t feel comfortable attributing specific views to specific people. Speaking generally, however, it seems to us that:
We encountered fewer strong skeptics of this cause than we expected to, given our previous informal impression that there are many researchers who are dismissive of potential risks from advanced artificial intelligence. That said, we spoke to a couple of highly skeptical researchers, and a few high-profile researchers who we think might be highly skeptical declined to speak with us.
Most of the researchers we talked to did not seem to have spent significant time or energy engaging with questions around potential risks from advanced artificial intelligence. To the extent they had views, most of the people we talked to seemed generally supportive of the views and goals we’ve laid out in this post (though this does not at all mean that they would endorse everything we’ve said).
Overall, these conversations caused us to update slightly positively on the promise of this cause and our plans. We hope to have many more conversations with AI and machine learning researchers in the coming months to deepen our understanding of the different perspectives in the field.
Risks and reservations
I see much room for debate in the decision to prioritize this cause as highly as we are. I have discussed most of the risks and reservations I see in this post and the ones preceding it. Here I list the major ones in one place. In this section, my goal is to provide a consolidated list of risks and reservations, but not necessarily to give my comprehensive take on each.
As discussed previously, I assign a nontrivial probability (at least 10% with moderate robustness, at least 1% with high robustness) to the development of transformative AI within the next 20 years. I feel I have thought deeply about this question, with access to strong technical advisors, and that we’ve collected what information we can, though I haven’t been able to share all important inputs into my thinking publicly. I recognize that our information is limited, and my take is highly debatable.
I see a risk that our thinking is distorted by being in an “echo chamber,” and that our views on the importance of this cause are overly reinforced by our closest technical advisors and by the effective altruism community. I’ve written previously about why I don’t consider this a fatal concern, but it does remain a concern.
I do not want to exacerbate what I see as an unfortunate pattern, to date, of un-nuanced and inaccurate media portrayals of potential risks from advanced artificial intelligence. I think this could lead to premature and/or counterproductive regulation, among other problems. We hope to communicate about our take on this cause with enough nuance to increase interest in reducing risks, without causing people to view AI as more threatening than positive.
I think the case that this cause is neglected is fairly strong, but leaves plenty of room for doubt. In particular, the cause has received attention from some high-profile people, and multiple well-funded AI labs and many AI researchers have expressed interest in doing what they can to reduce potential risks. It’s possible that they will end up pursuing essentially all relevant angles, and that the activities listed above will prove superfluous.
I’m mindful of the possibility that it might be futile to make meaningful predictions, form meaningful plans, and do meaningful work to reduce fairly far-off and poorly-understood potential risks.
I recognize that it’s debatable how important accident risks are. It’s possible that preventing truly catastrophic accidents will prove to be relatively easy, and that early work will look in hindsight like a poor use of resources.
I’m not in a position to support this claim very systematically, but we have done a substantial amount of investigation and discussion of various aspects of scientific research, as discussed in our recent annual review. In a previous post, I addressed what I see as the most noteworthy other possible major developments in the next 20 years.
Here I mean that it scores significantly higher by this criterion than the vast majority of causes, not that it stands entirely alone. I think there are a few other causes that have comparable importance, though none that I think have greater importance, as we’ve defined it.
Potential Risks from Advanced Artificial Intelligence: The Philanthropic Opportunity
Link post
We’re planning to make potential risks from artificial intelligence a major priority this year. We feel this cause presents an outstanding philanthropic opportunity — with extremely high importance, high neglectedness, and reasonable tractability (our three criteria for causes) — for someone in our position. We believe that the faster we can get fully up to speed on key issues and explore the opportunities we currently see, the faster we can lay the groundwork for informed, effective giving both this year and in the future.
With all of this in mind, we’re placing a larger “bet” on this cause, this year, than we are placing even on other focus areas — not necessarily in terms of funding (we aren’t sure we’ll identify very large funding opportunities this year, and are more focused on laying the groundwork for future years), but in terms of senior staff time, which at this point is a scarcer resource for us. Consistent with our philosophy of hits-based giving, we are doing this not because we have confidence in how the future will play out and how we can impact it, but because we see a risk worth taking. In about a year, we’ll formally review our progress and reconsider how senior staff time is allocated.
This post will first discuss why I consider this cause to be an outstanding philanthropic opportunity. (My views are fairly representative, but not perfectly representative, of those of other staff working on this cause.) It will then give a broad outline of our planned activities for the coming year, some of the key principles we hope to follow in this work, and some of the risks and reservations we have about prioritizing this cause as highly as we are.
In brief:
It seems to me that artificial intelligence is currently on a very short list of the most dynamic, unpredictable, and potentially world-changing areas of science. I believe there’s a nontrivial probability that transformative AI will be developed within the next 20 years, with enormous global consequences.
By and large, I expect the consequences of this progress — whether or not transformative AI is developed soon — to be positive. However, I also perceive risks. Transformative AI could be a very powerful technology, with potentially globally catastrophic consequences if it is misused or if there is a major accident involving it. Because of this, I see this cause as having extremely high importance (one of our key criteria), even while accounting for substantial uncertainty about the likelihood of developing transformative AI in the coming decades and about the size of the risks. I discuss the nature of potential risks below; note that I think they do not apply to today’s AI systems.
I consider this cause to be highly neglected in important respects. There is a substantial and growing field around artificial intelligence and machine learning research, but most of it is not focused on reducing potential risks. We’ve put substantial work into trying to ensure that we have a thorough landscape of the researchers, funders, and key institutions whose work is relevant to potential risks from advanced AI. We believe that the amount of work being done is well short of what it productively could be (despite recent media attention); that philanthropy could be helpful; and that the activities we’re considering wouldn’t be redundant with those of other funders.
I believe that there is useful work to be done today in order to mitigate future potential risks. In particular, (a) I think there are important technical problems that can be worked on today, that could prove relevant to reducing accident risks; (b) I preliminarily feel that there is also considerable scope for analysis of potential strategic and policy considerations.
More broadly, the Open Philanthropy Project may be able to help support an increase in the number of people – particularly people with strong relevant technical backgrounds—thinking through how to reduce potential risks, which could be important in the future even if the work done in the short term does not prove essential. I believe that one of the things philanthropy is best-positioned to do is provide steady, long-term support as fields and institutions grow.
I consider this a challenging cause. I think it would be easy to do harm while trying to do good. For example, trying to raise the profile of potential risks could contribute (and, I believe, has contributed to some degree) to non-nuanced or inaccurate portrayals of risk in the media, which in turn could raise the risks of premature and/or counterproductive regulation. I consider the Open Philanthropy Project relatively well-positioned to work in this cause while being attentive to pitfalls, and to deeply integrate people with strong technical expertise into our work.
I see much room for debate in the decision to prioritize this cause as highly as we are. However, I think it is important that a philanthropist in our position be willing to take major risks, and prioritizing this cause is a risk that I see as very worth taking.
My views on this cause have evolved considerably over time. I will discuss the evolution of my thinking in detail in a future post, but this post focuses on the case for prioritizing this cause today.
Importance
It seems to me that AI and machine learning research is currently on a very short list of the most dynamic, unpredictable, and potentially world-changing areas of science.[1] In particular, I believe that this research may lead eventually to the development of transformative AI, which we have roughly and conceptually defined as AI that precipitates a transition comparable to (or more significant than) the agricultural or industrial revolution. I believe there is a nontrivial likelihood (at least 10% with moderate robustness, and at least 1% with high robustness) that transformative AI will be developed within the next 20 years. For more detail on the concept of transformative AI (including a more detailed definition), and why I believe it may be developed in the next 20 years, see our previous post.
I believe that today’s AI systems are accomplishing a significant amount of good, and by and large, I expect the consequences of further progress on AI — whether or not transformative AI is developed soon — to be positive. Improvements in AI have enormous potential to improve the speed and accuracy of medical diagnosis; reduce traffic accidents by making autonomous vehicles more viable; help people communicate with better search and translation; facilitate personalized education; speed up science that can improve health and save lives; accelerate development of sustainable energy sources; and contribute on a huge number of other fronts to improving global welfare and productivity. As I’ve written before, I believe that economic and technological development have historically been highly beneficial, often despite the fact that particular developments were subject to substantial pessimism before they played out. I also expect that if and when transformative AI is very close to development, many people will be intensely aware of both the potential benefits and risks, and will work to maximize the benefits and minimize the risks.
With that said, I think the risks are real and important:
Misuse risks. One of the main ways in which AI could be transformative is by enabling/accelerating the development of one or more enormously powerful technologies. In the wrong hands, this could make for an enormously powerful tool of authoritarians, terrorists, or other power-seeking individuals or institutions. I think the potential damage in such a scenario is nearly limitless (if transformative AI causes enough acceleration of a powerful enough technology), and could include long-lasting or even permanent effects on the world as a whole. We refer to this class of risk as “misuse risks.” I do not think we should let misuse scenarios dominate our thinking about the potential consequences of AI, any more than for any other powerful technology, but I do think it is worth asking whether there is anything we can do today to lay the groundwork for avoiding misuse risks in the future.
Accident risks. I also believe that there is a substantial class of potential “accident risks” that could rise (like misuse risks) to the level of global catastrophic risks. In the course of many conversations with people in the field, we’ve seen substantial (though far from universal) concern that such risks could arise and no clear arguments for being confident that they will be easy to address. These risks are difficult to summarize; we’ve described them in more detail previously, and I will give only a basic outline here.
As goal-directed AI systems (such as reinforcement learning systems) become more capable, they will likely pursue the goals (e.g. as implied by a loss function) assigned to them in increasingly effective, unexpected, and hard-to-understand ways. Among these unexpected behaviors, there could be harmful behaviors, arising from (a) mismatches between the goals that programmers conceptually intend and the goals programmers technically, formally specify; (b) failures of AI systems to detect and respond to major context changes (I understand context change to be an area that many currently-highly-capable AI systems perform poorly at); (c) other technical problems. (See below for a slightly more detailed description of one possible failure mode.) It may be difficult to catch undesirable behaviors when an AI system is operating, in part because undesirable behaviors may be hard to distinguish from clever and desirable behaviors. It may, furthermore, be difficult and time-consuming to implement measures for confidently preventing undesirable behaviors, since they might emerge only in particular complex real-world situations (which raise the odds of major context changes and the risks of unexpected strategies for technically achieving specified goals) rather than in testing. If institutions end up “racing” to deploy powerful AI systems, this could create a significant risk of not taking sufficient precautions.
The result could be a highly intelligent, autonomous, unchecked system or set of systems optimizing for a problematic goal, which could put powerful technologies to problematic purposes and could cause significant harm. I think the idea of a globally catastrophic accident from AI only makes sense for certain kinds of AI—not for all things I would count as transformative AI. My rough impression at this time is that this sort of risk does not have a high overall likelihood (when taking into account that I expect people to take measures to prevent it), though it may have a high enough likelihood to be a very important consideration given the potential stakes. In conversations on this topic, I’ve perceived very large differences of opinion on the size of this risk, and could imagine changing my view on the matter significantly in the next year or so.
Other risks. Some risks could stem from changes that come about due to widespread use of AI systems, rather than from a particular accident or misuse. In particular, AI advances could dramatically transform the economy by leading to the automation of many tasks—including driving and various forms of manufacturing—currently done professionally by many people. The effects of such a transformation seem hard to predict and could be highly positive, but there are risks that it could greatly exacerbate inequality and harm well-being by worsening employment options for many people. We are tentatively less likely to focus on this type of risk than the above two types, since we expect this type of risk to be (a) relatively likely to develop gradually, with opportunities to respond as it develops; (b) less extreme in terms of potential damage, and in particular less likely to be a global catastrophic risk as we’ve defined it, than misuse or accidents; (c) somewhat less neglected than the other risks. But this could easily change depending on what we learn and what opportunities we come across.
The above risks could be amplified if AI capabilities improved relatively rapidly and unexpectedly, making it harder for society to anticipate, prepare for, and adapt to risks. This dynamic could (though won’t necessarily) be an issue if it turns out that a relatively small number of conceptual breakthroughs turn out to have very general applications.
If the above reasoning is right (and I believe much of it is highly debatable, particularly when it comes to my previous post’s arguments as well as the importance of accident risks), I believe it implies that this cause is not just important but something of an outlier in terms of importance, given that we are operating in an expected-value framework and are interested in low-probability, high-potential-impact scenarios.[2] The underlying stakes would be qualitatively higher than those of any issues we’ve explored or taken on under the U.S. policy category, to a degree that I think more than compensates for e.g. a “10% chance that this is relevant in the next 20 years” discount. When considering other possible transformative developments, I can’t think of anything else that seems equally likely to be comparably transformative on a similar time frame, while also presenting such a significant potential difference between best- and worst-case imaginable outcomes.
One reason that I’ve focused on a 20-year time frame is that I think this kind of window should, in a sense, be considered “urgent” from a philanthropist’s perspective. I see philanthropy as being well-suited to low-probability, long-term investments. I believe there are many past cases in which it took a very long time for philanthropy to pay off,[3] especially when its main value-added was supporting the gradual growth of organizations, fields and research that would eventually make a difference. If I thought there were negligible probability of transformative AI in the next 20 years, I would still consider this cause important enough to be a focus area for us, but we would not be prioritizing it as highly as we plan to this year.
The above has focused on potential risks of transformative AI. There are also many potential AI developments short of transformative AI that could be very important. For example:
Autonomous vehicles could become widespread relatively soon.
Continued advances in computer vision, audio recognition, etc. could dramatically alter what sorts of surveillance are possible, with a wide variety of potential implications; advances in robotics could have major implications for the future of warfare or policing. These could be important whether or not they ended up being “transformative” in our sense.
Automation could have major economic implications, again even if the underlying AI systems are not “transformative” in our sense.
We are interested in these potential developments, and see the possibility of helping to address them as a potential benefit of allocating resources to this cause. With that said, my previously expressed views, if correct, would imply that most of the “importance” (as we’ve defined it) in this cause comes from the enormously high-stakes possibility of transformative AI.
Neglectedness
Both artificial intelligence generally and potential risks have received increased attention in recent years.[4] We’ve put substantial work into trying to ensure that we have a thorough landscape of the researchers, funders, and key institutions in this space. We will later be putting out a landscape document, which will be largely consistent with the landscape we published last year. In brief:
There is a substantial and growing field, with a significant academic presence and significant corporate funding as well, around artificial intelligence and machine learning research.
There are a few organizations focused on reducing potential risks, either by pursuing particular technical research agendas or by highlighting strategic considerations. (An example of the latter is Nick Bostrom’s work, housed at the Future of Humanity Institute, on Superintelligence.) Most of these organizations are connected to the effective altruism community. Based on conversations we’ve had over the last few months, I believe some of these organizations have substantial room for more funding. There tends to be fairly little intersection between the people working at these organizations and people with substantial experience in mainstream research on AI and machine learning.
Ideally, I’d like to see leading researchers in AI and machine learning play leading roles in thinking through potential risks, including the associated technical challenges. Under the status quo, I feel that these fields—culturally and institutionally—do not provide much incentive to engage with these issues. While there is some interest in potential risks—in particular, some private labs have expressed informal interest in the matter, and many strong academics applied for the Future of Life Institute request for proposals that we co-funded last year—I believe there is room for much more. In particular, I believe that the amount of dedicated technical work focused on reducing potential risks is relatively small compared to the extent of open technical questions.
I’d also like to see a larger set of institutions working on key questions around strategic and policy considerations for reducing risks. I am particularly interested in frameworks for minimizing future misuse risks of transformative AI. I would like to see institutions with strong policy expertise considering different potential scenarios with respect to transformative AI; considering how governments, corporations, and individual researchers should react in those scenarios; and working with AI and machine learning researchers to identify potential signs that particular scenarios are becoming more likely. I believe there may be nearer-term questions (such as how to minimize misuse of advanced surveillance and drones) that can serve as jumping-off points for this sort of thinking.
Elon Musk, the majority funder of the Future of Life Institute’s 3-year grant program on robust and beneficial AI, is currently focusing his time and effort (along with significant funding) on OpenAI and its efforts to mitigate potential risks. (OpenAI is an AI research company that operates as a nonprofit.) We’re not aware of other similarly large private funders focused on potential risks from advanced artificial intelligence. There are government funders interested in the area, but they appear to operate under heavy constraints. There are individual donors interested in this space, but it appears to us that they are focused on different aspects of the problem and/or are operating a smaller scale.
Bottom line—I consider this cause to be highly neglected, particularly by philanthropists, and I see major gaps in the relevant fields that a philanthropist could potentially help to address.
Tractability
It’s been the case for a long time that I see this cause as important and neglected, and that my biggest reservation has been tractability. I see transformative AI as very much a future technology – I’ve argued that there is a nontrivial probability that it will be developed in the next 20 years, but it is also quite plausibly more than 100 years away, and even 20 years is a relatively long time. Working to reduce risks from a technology that is so far in the future, and about which so much is still unknown, could easily be futile.
With that said, this cause is not as unique in this respect as it might appear at first. I believe that one of the things philanthropy is best-positioned to do is provide steady, long-term support as fields and institutions grow. This activity is necessarily slow. It requires being willing to support groups based largely on their leadership and mission, rather than immediate plans for impact, in order to lay the groundwork for an uncertain future. I’ve written about this basic approach in the context of policy work, and I believe there is ample precedent for it in the history of philanthropy. It is the approach we favor for several of our other focus areas, such as immigration policy and macroeconomic stabilization policy.
And I have come to believe that there is potentially useful work to be done today that could lay the groundwork for mitigating future potential risks. In particular:
I think there are important technical challenges that could prove relevant to reducing accident risks.
Added June 24: for more on technical challenges, see Concrete Problems in AI Safety.
I’ve previously put significant weight on an argument along the lines of, “By the time transformative AI is developed, the important approaches to AI will be so different from today’s that any technical work done today will have a very low likelihood of being relevant.” My views have shifted significantly for two reasons. First, as discussed previously, I now think there is a nontrivial chance that transformative AI will be developed in the next 20 years, and that the above-quoted argument carries substantially less weight when focusing on that high-stakes potential scenario. Second, having had more conversations about open technical problems that could be relevant to reducing risks, I’ve come to believe that there is a substantial amount of work worth doing today, regardless of how long it will be until the development of transformative AI.
Potentially relevant challenges that we’ve come across so far include value learning (designing AI systems to learn the values of other agents through e.g. inverse reinforcement learning); problems having to do with making reinforcement learning systems and other AI agents less likely to behave in undesirable ways (designing reinforcement learning systems that will not try to gain direct control of their rewards, that will avoid behavior with unreasonably far-reaching impacts, and that will be robust against differences between formally specified rewards and human designers’ intentions in specifying those rewards); reliability and usability of machine learning techniques (including transparency, understandability, and robustness against or at least detection of large changes in input distribution); formal specification and verification of deep learning, reinforcement learning, and other AI systems; better theoretical understanding of desirable properties for powerful AI systems; and a variety of challenges related to an approach laid out in a series of blog posts by Paul Christiano.
Going into the details of these challenges is beyond the scope of this post, but to give a sense for non-technical readers of what a relevant challenge might look like, I will elaborate briefly on one challenge. A reinforcement learning system is designed to learn to behave in a way that maximizes a quantitative “reward” signal that it receives periodically from its environment—for example, DeepMind’s Atari player is a reinforcement learning system that learns to choose controller inputs (its behavior) in order to maximize the game score (which the system receives as “reward”), and this produces very good play on many Atari games. However, if a future reinforcement learning system’s inputs and behaviors are not constrained to a video game, and if the system is good enough at learning, a new solution could become available: the system could maximize rewards by directly modifying its reward “sensor” to always report the maximum possible reward, and by avoiding being shut down or modified back for as long as possible. This behavior is a formally correct solution to the reinforcement learning problem, but it is probably not the desired behavior. And this behavior might not emerge until a system became quite sophisticated and had access to a lot of real-world data (enough to find and execute on this strategy), so a system could appear “safe” based on testing and turn out to be problematic when deployed in a higher-stakes setting. The challenge here is to design a variant of reinforcement learning that would not result in this kind of behavior; intuitively, the challenge would be to design the system to pursue some actual goal in the environment that is only indirectly observable, instead of pursuing problematic proxy measures of that goal (such as a “hackable” reward signal).
It appears to me that work on challenges like the above is possible in the near term, and could be useful in several ways. Solutions to these problems could turn out to directly reduce accident risks from transformative AI systems developed in the future, or could be stepping stones toward techniques that could reduce these risks; work on these problems could clarify desirable properties of present-day systems that apply equally well to systems developed in the longer-term; or work on these problems today could help to build up the community of people who will eventually work on risks posed by longer-term development, which would be difficult to do in the absence of concrete technical challenges.
I preliminarily feel that there is also useful work to be done today in order to reduce future misuse risks and provide useful analysis of strategic and policy considerations.
As mentioned above, I would like to see more institutions working on considering different potential scenarios with respect to transformative AI; considering how governments, corporations, and individual researchers should react in those scenarios; and working with machine learning researchers to identify potential signs that particular scenarios are becoming more likely.
I think it’s worth being careful about funding this sort of work, since it’s possible for it to backfire. My current impression is that government regulation of AI today would probably be unhelpful or even counterproductive (for instance by slowing development of AI systems, which I think currently pose few risks and do significant good, and/or by driving research underground or abroad). If we funded people to think and talk about misuse risks, I’d worry that they’d have incentives to attract as much attention as possible to the issues they worked on, and thus to raise the risk of such premature/counterproductive regulation.
With that said, I believe that potential risks have now received enough attention – some of which has been unfortunately exaggerated in my view – that premature regulation and/or intervention by government agencies is already a live risk. I’d be interested in the possibility of supporting institutions that could provide thoughtful, credible, public analysis of whether and when government regulation/intervention would be advisable, even if it meant simply making the case against such things for the foreseeable future. I think such analysis would likely improve the quality of discussion and decision-making, relative to what will happen without it.
I also think that technical work related to accident risks – along the lines discussed above – could be indirectly useful for reducing misuse risks as well. Currently, it appears to me that different people in the field have very different intuitions about how serious and challenging accident risks are. If it turns out that there are highly promising paths to reducing accident risks – to the point where the risks look a lot less serious – this development could result in a beneficial refocusing of attention on misuse risks. (If, by contrast, it turns out that accident risks are large and present substantial technical challenges, this makes work on such risks extremely valuable.)
Other notes on tractability.
I’ve long worried that it’s simply too difficult to make meaningful statements (even probabilistic ones) about the future course of technology and its implications. However, I’ve gradually changed my view on this topic, partly due to reading I’ve done on personal time. It will be challenging to assemble and present the key data points, but I hope to do so at some point this year.
Much of our overarching goal for this cause, in the near term, is to support an increase in the number of people – particularly people with strong relevant technical backgrounds—thinking through how to reduce potential risks. Even if the specific technical, strategic and other work we support does not prove useful, helping to support a growing field in this way could be. With that said, I think we will accomplish this goal best if the people we support are doing good and plausibly useful work.
Bottom line. I think there are real questions around the extent to which there is work worth doing today to reduce potential risks from advanced artificial intelligence. That said, I see a reasonable amount of potential if there were more people and institutions focused on the relevant issues; given the importance and neglectedness of this cause, I think that’s sufficient to prioritize it highly.
Some Open-Phil-specific considerations
Networks
I consider this a challenging cause. I think it would be easy to do harm while trying to do good. For example:
Trying to raise the profile of potential risks could contribute (and, I believe, has contributed to some degree) to non-nuanced or inaccurate portrayals of risk in the media, which in turn could raise the risks of premature and/or counterproductive regulation. In addition, raising such risks (or being perceived as doing so) could—in turn—cause many AI and machine learning researchers who oppose such regulation to become hostile to the idea of discussing potential risks.
Encouraging particular lines of research without sufficient input and buy-in from leading AI and machine learning researchers could be not only unproductive but counterproductive. It could lead to people generally taking risk-focused research less seriously. And since leading researchers tend to be extremely busy, getting thorough input from them can be challenging in itself.
I think it is important for someone working in this space to be highly attentive to these risks. In my view, one of the best ways to achieve this is to be as well-connected as possible to the people who have thought most deeply about the key issues, including both the leading researchers in AI and machine learning and the people/organizations most focused on reducing long-term risks.
I believe the Open Philanthropy Project is unusually well-positioned from this perspective:
We are well-connected in the effective altruism community, which includes many of the people and organizations that have been most active in analyzing and raising awareness of potential risks from advanced artificial intelligence. For example, Daniel Dewey has previously worked at the Future of Humanity Institute and the Future of Life Institute, and has been a research associate with the Machine Intelligence Research Institute.
We are also reasonably well-positioned to coordinate with leading researchers in AI and machine learning. Daniel has some existing relationships, partly due to his work on last year’s request for proposals from the Future of Life Institute. As mentioned previously, some of us also have strong relationships with several researchers at top institutions. We have recently been reaching out to many leading researchers to discuss our plans for this cause, and have generally been within a couple of degrees of separation via our networks.
Time vs. money
One consideration that has made me hesitant about prioritizing this cause is the fact that I see relatively little in the way of truly “shovel-ready” giving opportunities. I list our likely priorities in the next section; I think they are likely to be very time-consuming for staff, and I am unsure of how long it will take before we see as many concrete giving opportunities as we do in some of our other focus areas.
By default, I prefer to prioritize causes with significant existing “shovel-ready” opportunities and minimal necessary time commitment, because I consider the Open Philanthropy Project to be short on capacity relative to funding at this stage in our development.
However, I think the case for this cause is compelling enough to outweigh this consideration, and I think a major investment of senior staff time this year could leave us much better positioned to find outstanding giving opportunities in the future.
Our plans
For the last couple of months, we have focused on:
Talking to as many people as possible in the relevant communities, particularly leading researchers in AI and machine learning, in order to get feedback on our thinking, deepen our understanding of the relevant issues, and ensure that we have open channels of communication with them. Some high-level notes from these conversations are below.
Developing our communications strategy for this topic, including this series of blog posts.
Investigating the few potential “shovel-ready grants” (by which I mean grants we can investigate and recommend with relatively low time investments) we’re aware of. We will be publishing more about these later.
Working with several technical advisors to begin to get a sense of what the most important concrete, known technical challenges are. Our hope is to get to the point of being able to offer substantial funding to support work on the most important challenges. We’re beginning with close contacts and planning to broaden the conversation about the most important technical challenges from there.
Working with close technical advisors to flesh out the key considerations around likely timelines to transformative AI. We expect to continue this work, hopefully with an increasingly broad set of researchers engaging in the discussions.
Having initial conversations about what sorts of misuse risks we should be most concerned about, and what sorts of strategic and policy considerations seem most important, in order to lay the groundwork for finding potential grantees in this category.
Seeking past cases in which philanthropists helped support the growth of technical fields, to see what we can learn.
Ultimately, we expect to seek giving opportunities in the following categories:
“Shovel-ready” grants to existing organizations and researchers focused on reducing potential risks from advanced artificial intelligence.
Supporting substantial work on the most important technical challenges related to reducing accident risks. This could take the form of funding academic centers, requests for proposals, convenings and workshops, and/or individual researchers.
Supporting thoughtful, nuanced, independent analysis seeking to help inform discussions of key strategic and policy considerations for reducing potential risks, including misuse risks.
“Pipeline building”: supporting programs, such as fellowships, that can increase the total number of people who are deeply knowledgeable about technical research on artificial intelligence and machine learning, while also being deeply versed in issues relevant to potential risks.
Other giving opportunities that we come across, including those that pertain to AI-relevant issues other than those we’ve focused on in this post (some such issues are listed above).
Getting to this point will likely require a great deal more work and discussion – internally and with the relevant communities more broadly. It could be a long time before we are recommending large amounts of giving in this area, and I think that allocating significant senior staff time to the cause will speed our work considerably.
Some overriding principles for our work
As we work in this space, we think it’s especially important to follow a few core principles:
Don’t lose sight of the potential benefits of AI, even as we focus on mitigating risks
Our work is focused on potential risks, because this is the aspect of AI research that seems most neglected at the moment. However, as stated above, I see many ways in which AI has enormous potential to improve the world, and I expect the consequences of advances in AI to be positive on balance. It is important to act and communicate accordingly.
Deeply integrate people with strong technical expertise in our work
The request for proposals we co-funded last year employed an expert review panel for selecting grantees. We wouldn’t have participated if it had involved selecting grantees ourselves with nontechnical staff. We believe that AI and machine learning researchers are the people best positioned to make many assessments that will be important to us, such as which technical problems seem tractable and high-potential and which researchers have impressive accomplishments.
Seek a lot of input, and reflect a good deal, before committing to major grants and other activities
As stated above, I consider this a challenging cause, where well-intentioned actions could easily do harm. We are seeking to be thoroughly networked and to seek substantial advice on our activities from a range of people, both AI and machine learning researchers and people focused on reduction of potential risks.
Support work that could be useful in a variety of ways and in a variety of scenarios, rather than trying to make precise predictions
I don’t think it’s possible to have certainty, today, about when we should expect transformative AI, what form we should expect it to take, and/or what the consequences will be. We have a preference for supporting work that seems robustly likely to be useful. In particular, one of our main goals is to support an increase in the number of people – particularly people with strong relevant technical backgrounds—dedicated to thinking through how to reduce potential risks.
Distinguish between lower-stakes, higher-stakes, and highest-stakes potential risks
There are many imaginable risks of advanced artificial intelligence. Our focus is likely to be on those that seem to have the very highest stakes, to the point of being potential global catastrophic risks. In our view currently, that means misuse risks and accident risks involving transformative AI. We also consider neglectedness (we prefer to work on risks receiving less attention from others) and tractability (we prefer to work on risks where it seems there is useful work to be done today that can help mitigate them).
Notes on AI and machine learning researchers’ views on the topics discussed here
Over the last couple of months, we have been reaching out to AI and machine learning researchers that we don’t already have strong relationships with in order to discuss our plans and background views and get their feedback. We have put particular effort into seeking out skeptics and potential critics. As of today, we have requested 35 conversations along these lines and had 25. About three-fourths of these conversations have been with tenure-track academics or senior researchers at private labs, and the remainder have been with students or junior researchers at top AI and machine learning departments and private labs.
We’ve heard a diverse set of perspectives. Conversations were in confidence and often time-constrained, so we wouldn’t feel comfortable attributing specific views to specific people. Speaking generally, however, it seems to us that:
We encountered fewer strong skeptics of this cause than we expected to, given our previous informal impression that there are many researchers who are dismissive of potential risks from advanced artificial intelligence. That said, we spoke to a couple of highly skeptical researchers, and a few high-profile researchers who we think might be highly skeptical declined to speak with us.
Most of the researchers we talked to did not seem to have spent significant time or energy engaging with questions around potential risks from advanced artificial intelligence. To the extent they had views, most of the people we talked to seemed generally supportive of the views and goals we’ve laid out in this post (though this does not at all mean that they would endorse everything we’ve said).
Overall, these conversations caused us to update slightly positively on the promise of this cause and our plans. We hope to have many more conversations with AI and machine learning researchers in the coming months to deepen our understanding of the different perspectives in the field.
Risks and reservations
I see much room for debate in the decision to prioritize this cause as highly as we are. I have discussed most of the risks and reservations I see in this post and the ones preceding it. Here I list the major ones in one place. In this section, my goal is to provide a consolidated list of risks and reservations, but not necessarily to give my comprehensive take on each.
As discussed previously, I assign a nontrivial probability (at least 10% with moderate robustness, at least 1% with high robustness) to the development of transformative AI within the next 20 years. I feel I have thought deeply about this question, with access to strong technical advisors, and that we’ve collected what information we can, though I haven’t been able to share all important inputs into my thinking publicly. I recognize that our information is limited, and my take is highly debatable.
I see a risk that our thinking is distorted by being in an “echo chamber,” and that our views on the importance of this cause are overly reinforced by our closest technical advisors and by the effective altruism community. I’ve written previously about why I don’t consider this a fatal concern, but it does remain a concern.
I do not want to exacerbate what I see as an unfortunate pattern, to date, of un-nuanced and inaccurate media portrayals of potential risks from advanced artificial intelligence. I think this could lead to premature and/or counterproductive regulation, among other problems. We hope to communicate about our take on this cause with enough nuance to increase interest in reducing risks, without causing people to view AI as more threatening than positive.
I think the case that this cause is neglected is fairly strong, but leaves plenty of room for doubt. In particular, the cause has received attention from some high-profile people, and multiple well-funded AI labs and many AI researchers have expressed interest in doing what they can to reduce potential risks. It’s possible that they will end up pursuing essentially all relevant angles, and that the activities listed above will prove superfluous.
I’m mindful of the possibility that it might be futile to make meaningful predictions, form meaningful plans, and do meaningful work to reduce fairly far-off and poorly-understood potential risks.
I recognize that it’s debatable how important accident risks are. It’s possible that preventing truly catastrophic accidents will prove to be relatively easy, and that early work will look in hindsight like a poor use of resources.
With all of the above noted, I think it is important that a philanthropist in our position be willing to take major risks, and prioritizing this cause is one that I see as very worth taking.
Notes
I’m not in a position to support this claim very systematically, but we have done a substantial amount of investigation and discussion of various aspects of scientific research, as discussed in our recent annual review. In a previous post, I addressed what I see as the most noteworthy other possible major developments in the next 20 years.
Here I mean that it scores significantly higher by this criterion than the vast majority of causes, not that it stands entirely alone. I think there are a few other causes that have comparable importance, though none that I think have greater importance, as we’ve defined it.
We’ve been accumulating case studies via our History of Philanthropy project, and we expect to publish an updated summary of what we know by the end of 2016. For now, there is some information available at our History of Philanthropy page and in a recent blog post.
See our previous post regarding artificial intelligence generally. See our writeup on a 2015 grant to support a request for proposals regarding potential risks.