As we announced back in October, I have taken on the senior leadership role at MIRI as its CEO. It’s a big pair of shoes to fill, and an awesome responsibility that I’m honored to take on.
We think it’s very unlikely that the AI alignment field will be able to make progress quickly enough to prevent human extinction and the loss of the future’s potential value, that we expect will result from loss of control to smarter-than-human AI systems.
However, developments this past year like the release of ChatGPT seem to have shifted the Overton window in a lot of groups. There’s been a lot more discussion of extinction risk from AI, including among policymakers, and the discussion quality seems greatly improved.
This provides a glimmer of hope. While we expect that more shifts in public opinion are necessary before the world takes actions that sufficiently change its course, it now appears more likely that governments could enact meaningful regulations to forestall the development of unaligned, smarter-than-human AI systems. It also seems more possible that humanity could take on a new megaproject squarely aimed at ending the acute risk period.
As such, in 2023, MIRI shifted its strategy to pursue three objectives:
Policy: Increase the probability that the major governments of the world end up coming to some international agreement to halt progress toward smarter-than-human AI, until humanity’s state of knowledge and justified confidence about its understanding of relevant phenomena has drastically changed; and until we are able to secure these systems such that they can’t fall into the hands of malicious or incautious actors.[2]
Communications: Share our models of the situation with a broad audience, especially in cases where talking about an important consideration could help normalize discussion of it.
Research: Continue to invest in a portfolio of research. This includes technical alignment research (though we’ve become more pessimistic that such work will have time to bear fruit if policy interventions fail to buy the research field more time), as well as research in support of our policy and communications goals.[3]
We see the communications work as instrumental support for our policy objective. We also see candid and honest communication as a way to bring key models and considerations into the Overton window, and we generally think that being honest in this way tends to be a good default.
Although we plan to pursue all three of these priorities, it’s likely that policy and communications will be a higher priority for MIRI than research going forward.[4]
The rest of this post will discuss MIRI’s trajectory over time and our current strategy. In one or more future posts, we plan to say more about our policy/comms efforts and our research plans.
Note that this post will assume that you’re already reasonably familiar with MIRI and AGI risk; if you aren’t, I recommend checking out Eliezer Yudkowsky’s recent short TED talk,
along with some of the resources cited on the TED page:
Throughout its history, MIRI’s goal has been to ensure that the long-term future goes well, with a focus on increasing the probability that humanity can safely navigate the transition to a world with smarter-than-human AI. If humanity can safely navigate the emergence of these systems, we believe this will lead to unprecedented levels of prosperity.
How we’ve approached that mission has varied a lot over the years.
When MIRI was first founded by Eliezer Yudkowsky and Brian and Sabine Atkins in 2000, its goal was to try to accelerate to smarter-than-human AI as quickly as possible, on the assumption that greater-than-human intelligence entails greater-than-human morality. In the course of looking into the alignment problem for the first time (initially called “the Friendly AI problem”), Eliezer came to the conclusion that he was completely wrong about “greater intelligence implies greater morality”, and MIRI shifted its focus to the alignment problem around 2003.
MIRI has continued to revise its strategy since then. In ~2006–2012, our primary focus was on trying to establish communities and fields of inquiry focused on existential risk reduction and domain-general problem-solving ability (e.g., the rationality community). Starting in 2013, our focus was on Agent Foundations research and trying to ensure that the nascent AI alignment field outside MIRI got off to a good start. In 2017–2020, we shifted our primary focus to a new engineering-heavy set of alignment research directions.
Now, after several years of reorienting and exploring different options, we’re making policy and communications our top focus, as that currently seems like the most promising way to serve our mission.
At a high level, MIRI is a place where folks who care deeply about humanity and its future, and who share a loose set of intuitions about the challenges we face in safely navigating the development of smarter-than-human AI systems, have teamed up to work towards a brighter future. In the spirit of “keep your identity small”, we don’t want to build more content than that into MIRI-the-organization: if an old “stereotypical” MIRI belief or strategy turns out to be wrong, we should just ditch the failed belief/strategy and move on.
With that context in view, I’ll next say a bit about what, concretely, MIRI has been up to recently, and what we plan to do next.
MIRI in 2021–2022
In our last strategy update (posted in December 2020), Nate Soares wrote that the research push we began in 2017 “has, at this point, largely failed, in the sense that neither Eliezer nor I have sufficient hope in it for us to continue focusing our main efforts there[...] We are currently in a state of regrouping, weighing our options, and searching for plans that we believe may yet have a shot at working.”
In 2021–2022, we focused on:
the aforementioned process of trying to find more promising plans,
smaller-scale support for our Agent Foundations and our 2017-initiated research programs (along with some exploration of other research ideas),[5]
dialoguing with other people working on AI x-risk, and
Our sense is that these and other MIRI write-ups were helpful for better communicating our perspective on the strategic landscape, which in turn is important for understanding how we’ve changed our strategy. Some of the main points we tried to communicate were:
Our default expectation is that humanity will soon develop AI systems that are smarter than humans.[7] We think the world isn’t ready for this, and that rushing ahead will very likely cause human extinction and the destruction of the future’s potential value.
The world’s situation currently looks extremely dire to us. For example, Eliezer wrote in late 2021, “I consider the present gameboard to look incredibly grim, and I don’t actually see a way out through hard work alone. We can hope there’s a miracle that violates some aspect of my background model, and we can try to prepare for that unknown miracle”.
The primary reason we think the situation is dire is because humanity’s current technical understanding of how to align smarter-than-human systems seems vastly lower than what’s likely to be required. Not much progress has been made to date (relative to what’s required), including at MIRI; and we see the field to date as mostly working on topics orthogonal to the core difficulties, or approaching the problem in a way that assumes away these core difficulties.
It would be a mistake to give up, but it would also be a mistake to start piling on optimistic assumptions and mentally living in a more-optimistic (but not real) world. If, somehow, we achieve a breakthrough, it’s likely to come from an unexpected direction, and we’re likely to be in a better position to take advantage of it if we’ve kept in touch with reality.
In his TIME article, Eliezer Yudkowsky describes what he sees as the sort of policy that would be minimally required in order for governments to prevent the world from destroying itself: an indefinite worldwide moratorium on new large training runs, enforced by an international agreement with actual teeth. The LessWrong mirror of Eliezer’s TIME piece goes into more detail on this, adding several clarifying addenda.
In the absence of sufficient political will/consensus to impose such a moratorium, we think the best policy objective to focus on would be building an “off switch”—that is, assembling the legal and technical capability needed to make it possible to shut down a dangerous project or impose an indefinite moratorium on the field, if policymakers decide at a future date that it’s necessary to do so. Having the option to shut things down would be an important step in the right direction.
The difficulties of proactively enacting and effectively enforcing such an international agreement are obvious to the point that, until recently, MIRI had much more hope in finding some AI-alignment-mediated solution to AGI proliferation, in spite of how little relevant technical progress has been made to date.
However, a combination of recent developments have changed our minds about this. First, our hope in anyone finding technical solutions in time has declined, and second, the moderate success of our communications efforts has increased our hope in that direction.
Although we still think the situation looks bleak, it now looks a little less bleak than we thought it did a year ago; and the source of this new hope lies in the way the public conversation about AI has recently shifted.
New developments in 2023
In the past, MIRI has mostly spent its time on alignment research and outreach to technical audiences—that is, outreach to the sort of people who might do relevant technical work.
Several developments this year have updated us toward thinking that we should prioritize outreach activities with an emphasis on influencing policymakers and groups that may influence policymakers, including AI researchers and the general public:
1. GPT-3.5 and GPT-4 were more impressive than some of us expected. We already had short timelines, but these launches were a further pessimistic update for some of us about how plausible it is that humanity could build world-destroying AGI with relatively few (or no) additional algorithmic advances.
2. The general public and policymakers have been more receptive to arguments about existential risk from AGI than we expected, and their responses have been pretty reasonable. For example, we were positively surprised by the reception to Eliezer’s February interview on Bankless and his March piece in TIME magazine (which was TIME’s highest-traffic page for a week). Eliezer’s piece in TIME mentions that we’d been surprised by how many non-specialists’ initial reactions to hearing about AI risk were quite reasonable and grounded.
3. More broadly, there has been a shift in the Overton window toward “take extinction risk from AGI more seriously”, including within ML. Geoffrey Hinton and Yoshua Bengio’s public statements seemed pivotal here; likewise the one-sentence statement signed by the CEOs of DeepMind, Anthropic, and OpenAI, and by hundreds of other AI scientists and public intellectuals:
Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.
A recent example of this 2023 shift was my participation in one of the U.S. Senate’s bipartisan AI Insight Forums. It was heartening to see how far the discourse has come—Leader Schumer opened the event by asking attendees for our probability that AI could lead to a doomsday scenario, using the term “p(doom)”, while Senators and their staff listened and took notes. We would not have predicted this level of interest and reception at the beginning of the year.
Even if policymakers turn out not to be immediately receptive to arguments for halting large training runs, this may still be a critical time for establishing precedents and laying policy groundwork that could be built on if policymakers and their constituents become more alarmed in the future, e.g., in the unfortunate event of a major, but not existential, AI related disaster.
Policy efforts like this seem very unlikely to save us, but all other plans we know of seem even less likely to succeed. As a result, we’ve started building capacity for our public outreach efforts, and ramping up these efforts.
Looking forward
In the coming year, we plan to continue our policy, communications, and research efforts. We are somewhat limited in our ability to enact policy programs by our 501(c)3 status, but we work closely with others at organizations with different structures that are more free to engage in advocacy work. We are growing our communications team and expanding our capacity to broadcast the basic arguments about AI x-risk, and of course, we’re continuing our alignment research programs and expanding the research team to communicate better about our results as well as exploring new ideas.
If you are interested in working directly with MIRI as we grow, please watch our careers page for job listings.
As Nate has written about in Superintelligent AI is necessary for an amazing future, but far from sufficient, we would consider it an enormous tragedy if humanity never developed artificial superintelligence. However, regulators may have a difficult time determining when we’ve reached the threshold “it’s now safe to move forward on AI capabilities.”
One alternative, proposed by Nate, would be for researchers to stop trying to pursue de novo AGI, and instead pursue human whole-brain emulation or human cognitive enhancement. This helps largely sidestep the issue of bureaucratic legibility, since the risks are far lower and success criteria are a lot clearer; and it could allow us to realize many of the near-term benefits of aligned AGI (e.g., for existential risk reduction).
Various people at MIRI have different levels of hope about this. Nate and Eliezer both believe that humanity should not be attempting technical alignment at its current level of cognitive ability, and should instead pursue human cognitive enhancement (e.g., via uploading), and then having smarter (trans)humans figure out alignment.
I personally would like to note my dissenting perspective on this overall choice.
While I agree MIRI can contribute in the short term with a comms and policy push towards effective regulation, in the medium to long term I think research is our greater comparative advantage and I think we should keep substantial focus here (as well as increase empirical research staff). We should continue doing research but shift our focus more towards technical work which can support regulatory efforts (ex. safety standards, etc), including empirical work but also for example drawing on our Agent Foundations experience to produce theoretical frameworks. I think stronger technical (ideally empirically grounded) research arguments targeting scientific government advisors, regulators and lab decision makers (rather than public-oriented comms using philosophical arguments) on why there is AI risk and why mitigation makes sense, are more critically missing from the picture and a component MIRI can perhaps uniquely contribute. I also personally expect more impact to come from influencing lab decision makers and creating more of an academic/research consensus on safety risks rather than hoping for substantial regulatory success in the shorter term of the next 1–3 years.
Further, solving critical open questions in the safety standards space on how to regulate using metrics (and how to do this scientifically and correctly) seems to me a priority for at least the next 1–2 years. I think it’s important to have a diversity of perspectives including non-lab organizations like MIRI creating this knowledge base.
We also helped set up a co-working space and related infrastructure for other AI x-risk orgs like Redwood Research. We’re fans of Redwood, and often direct researchers and engineers to apply to work there if they don’t obviously fit the far more unusual and constrained research niches at MIRI.
Where “soon” means, roughly, “there’s a lot of uncertainty here, but it’s a very live possibility that AGI is only a few years away; and it no longer seems likely to be (for example) 30+ years away.” In a fall 2023 poll of most MIRI researchers, we expect AGI (according to the definition from this Metaculus market) in a median of 9 years and a mean of 14.6 years. One researcher was an outlier at 52 years; the majority predicted under ten years.
MIRI 2024 Mission and Strategy Update
As we announced back in October, I have taken on the senior leadership role at MIRI as its CEO. It’s a big pair of shoes to fill, and an awesome responsibility that I’m honored to take on.
There have been several changes at MIRI since our 2020 strategic update, so let’s get into it.[1]
The short version:
We think it’s very unlikely that the AI alignment field will be able to make progress quickly enough to prevent human extinction and the loss of the future’s potential value, that we expect will result from loss of control to smarter-than-human AI systems.
However, developments this past year like the release of ChatGPT seem to have shifted the Overton window in a lot of groups. There’s been a lot more discussion of extinction risk from AI, including among policymakers, and the discussion quality seems greatly improved.
This provides a glimmer of hope. While we expect that more shifts in public opinion are necessary before the world takes actions that sufficiently change its course, it now appears more likely that governments could enact meaningful regulations to forestall the development of unaligned, smarter-than-human AI systems. It also seems more possible that humanity could take on a new megaproject squarely aimed at ending the acute risk period.
As such, in 2023, MIRI shifted its strategy to pursue three objectives:
Policy: Increase the probability that the major governments of the world end up coming to some international agreement to halt progress toward smarter-than-human AI, until humanity’s state of knowledge and justified confidence about its understanding of relevant phenomena has drastically changed; and until we are able to secure these systems such that they can’t fall into the hands of malicious or incautious actors.[2]
Communications: Share our models of the situation with a broad audience, especially in cases where talking about an important consideration could help normalize discussion of it.
Research: Continue to invest in a portfolio of research. This includes technical alignment research (though we’ve become more pessimistic that such work will have time to bear fruit if policy interventions fail to buy the research field more time), as well as research in support of our policy and communications goals.[3]
We see the communications work as instrumental support for our policy objective. We also see candid and honest communication as a way to bring key models and considerations into the Overton window, and we generally think that being honest in this way tends to be a good default.
Although we plan to pursue all three of these priorities, it’s likely that policy and communications will be a higher priority for MIRI than research going forward.[4]
The rest of this post will discuss MIRI’s trajectory over time and our current strategy. In one or more future posts, we plan to say more about our policy/comms efforts and our research plans.
Note that this post will assume that you’re already reasonably familiar with MIRI and AGI risk; if you aren’t, I recommend checking out Eliezer Yudkowsky’s recent short TED talk,
along with some of the resources cited on the TED page:
“A.I. Poses ‘Risk of Extinction,’ Industry Leaders Warn”, New York Times
“We must slow down the race to god-like AI”, Financial Times
“Pausing AI Developments Isn’t Enough. We Need to Shut it All Down”, TIME
“AGI Ruin: A List of Lethalities”, AI Alignment Forum
MIRI’s mission
Throughout its history, MIRI’s goal has been to ensure that the long-term future goes well, with a focus on increasing the probability that humanity can safely navigate the transition to a world with smarter-than-human AI. If humanity can safely navigate the emergence of these systems, we believe this will lead to unprecedented levels of prosperity.
How we’ve approached that mission has varied a lot over the years.
When MIRI was first founded by Eliezer Yudkowsky and Brian and Sabine Atkins in 2000, its goal was to try to accelerate to smarter-than-human AI as quickly as possible, on the assumption that greater-than-human intelligence entails greater-than-human morality. In the course of looking into the alignment problem for the first time (initially called “the Friendly AI problem”), Eliezer came to the conclusion that he was completely wrong about “greater intelligence implies greater morality”, and MIRI shifted its focus to the alignment problem around 2003.
MIRI has continued to revise its strategy since then. In ~2006–2012, our primary focus was on trying to establish communities and fields of inquiry focused on existential risk reduction and domain-general problem-solving ability (e.g., the rationality community). Starting in 2013, our focus was on Agent Foundations research and trying to ensure that the nascent AI alignment field outside MIRI got off to a good start. In 2017–2020, we shifted our primary focus to a new engineering-heavy set of alignment research directions.
Now, after several years of reorienting and exploring different options, we’re making policy and communications our top focus, as that currently seems like the most promising way to serve our mission.
At a high level, MIRI is a place where folks who care deeply about humanity and its future, and who share a loose set of intuitions about the challenges we face in safely navigating the development of smarter-than-human AI systems, have teamed up to work towards a brighter future. In the spirit of “keep your identity small”, we don’t want to build more content than that into MIRI-the-organization: if an old “stereotypical” MIRI belief or strategy turns out to be wrong, we should just ditch the failed belief/strategy and move on.
With that context in view, I’ll next say a bit about what, concretely, MIRI has been up to recently, and what we plan to do next.
MIRI in 2021–2022
In our last strategy update (posted in December 2020), Nate Soares wrote that the research push we began in 2017 “has, at this point, largely failed, in the sense that neither Eliezer nor I have sufficient hope in it for us to continue focusing our main efforts there[...] We are currently in a state of regrouping, weighing our options, and searching for plans that we believe may yet have a shot at working.”
In 2021–2022, we focused on:
the aforementioned process of trying to find more promising plans,
smaller-scale support for our Agent Foundations and our 2017-initiated research programs (along with some exploration of other research ideas),[5]
dialoguing with other people working on AI x-risk, and
writing up many of our background views.[6]
The most central write-ups from this period include:
Late 2021 MIRI Conversations
AGI Ruin: A List of Lethalities
A central AI alignment problem: capabilities generalization, and the sharp left turn
Six Dimensions of Operational Adequacy in AGI Projects
Our sense is that these and other MIRI write-ups were helpful for better communicating our perspective on the strategic landscape, which in turn is important for understanding how we’ve changed our strategy. Some of the main points we tried to communicate were:
Our default expectation is that humanity will soon develop AI systems that are smarter than humans.[7] We think the world isn’t ready for this, and that rushing ahead will very likely cause human extinction and the destruction of the future’s potential value.
The world’s situation currently looks extremely dire to us. For example, Eliezer wrote in late 2021, “I consider the present gameboard to look incredibly grim, and I don’t actually see a way out through hard work alone. We can hope there’s a miracle that violates some aspect of my background model, and we can try to prepare for that unknown miracle”.
The primary reason we think the situation is dire is because humanity’s current technical understanding of how to align smarter-than-human systems seems vastly lower than what’s likely to be required. Not much progress has been made to date (relative to what’s required), including at MIRI; and we see the field to date as mostly working on topics orthogonal to the core difficulties, or approaching the problem in a way that assumes away these core difficulties.
It would be a mistake to give up, but it would also be a mistake to start piling on optimistic assumptions and mentally living in a more-optimistic (but not real) world. If, somehow, we achieve a breakthrough, it’s likely to come from an unexpected direction, and we’re likely to be in a better position to take advantage of it if we’ve kept in touch with reality.
In his TIME article, Eliezer Yudkowsky describes what he sees as the sort of policy that would be minimally required in order for governments to prevent the world from destroying itself: an indefinite worldwide moratorium on new large training runs, enforced by an international agreement with actual teeth. The LessWrong mirror of Eliezer’s TIME piece goes into more detail on this, adding several clarifying addenda.
In the absence of sufficient political will/consensus to impose such a moratorium, we think the best policy objective to focus on would be building an “off switch”—that is, assembling the legal and technical capability needed to make it possible to shut down a dangerous project or impose an indefinite moratorium on the field, if policymakers decide at a future date that it’s necessary to do so. Having the option to shut things down would be an important step in the right direction.
The difficulties of proactively enacting and effectively enforcing such an international agreement are obvious to the point that, until recently, MIRI had much more hope in finding some AI-alignment-mediated solution to AGI proliferation, in spite of how little relevant technical progress has been made to date.
However, a combination of recent developments have changed our minds about this. First, our hope in anyone finding technical solutions in time has declined, and second, the moderate success of our communications efforts has increased our hope in that direction.
Although we still think the situation looks bleak, it now looks a little less bleak than we thought it did a year ago; and the source of this new hope lies in the way the public conversation about AI has recently shifted.
New developments in 2023
In the past, MIRI has mostly spent its time on alignment research and outreach to technical audiences—that is, outreach to the sort of people who might do relevant technical work.
Several developments this year have updated us toward thinking that we should prioritize outreach activities with an emphasis on influencing policymakers and groups that may influence policymakers, including AI researchers and the general public:
1. GPT-3.5 and GPT-4 were more impressive than some of us expected. We already had short timelines, but these launches were a further pessimistic update for some of us about how plausible it is that humanity could build world-destroying AGI with relatively few (or no) additional algorithmic advances.
2. The general public and policymakers have been more receptive to arguments about existential risk from AGI than we expected, and their responses have been pretty reasonable. For example, we were positively surprised by the reception to Eliezer’s February interview on Bankless and his March piece in TIME magazine (which was TIME’s highest-traffic page for a week). Eliezer’s piece in TIME mentions that we’d been surprised by how many non-specialists’ initial reactions to hearing about AI risk were quite reasonable and grounded.
3. More broadly, there has been a shift in the Overton window toward “take extinction risk from AGI more seriously”, including within ML. Geoffrey Hinton and Yoshua Bengio’s public statements seemed pivotal here; likewise the one-sentence statement signed by the CEOs of DeepMind, Anthropic, and OpenAI, and by hundreds of other AI scientists and public intellectuals:
A recent example of this 2023 shift was my participation in one of the U.S. Senate’s bipartisan AI Insight Forums. It was heartening to see how far the discourse has come—Leader Schumer opened the event by asking attendees for our probability that AI could lead to a doomsday scenario, using the term “p(doom)”, while Senators and their staff listened and took notes. We would not have predicted this level of interest and reception at the beginning of the year.
Even if policymakers turn out not to be immediately receptive to arguments for halting large training runs, this may still be a critical time for establishing precedents and laying policy groundwork that could be built on if policymakers and their constituents become more alarmed in the future, e.g., in the unfortunate event of a major, but not existential, AI related disaster.
Policy efforts like this seem very unlikely to save us, but all other plans we know of seem even less likely to succeed. As a result, we’ve started building capacity for our public outreach efforts, and ramping up these efforts.
Looking forward
In the coming year, we plan to continue our policy, communications, and research efforts. We are somewhat limited in our ability to enact policy programs by our 501(c)3 status, but we work closely with others at organizations with different structures that are more free to engage in advocacy work. We are growing our communications team and expanding our capacity to broadcast the basic arguments about AI x-risk, and of course, we’re continuing our alignment research programs and expanding the research team to communicate better about our results as well as exploring new ideas.
If you are interested in working directly with MIRI as we grow, please watch our careers page for job listings.
Thanks to Rob Bensinger, Gretta Duleba, Matt Fallshaw, Alex Vermeer, Lisa Thiergart, and Nate Soares for your valuable thoughts on this post.
As Nate has written about in Superintelligent AI is necessary for an amazing future, but far from sufficient, we would consider it an enormous tragedy if humanity never developed artificial superintelligence. However, regulators may have a difficult time determining when we’ve reached the threshold “it’s now safe to move forward on AI capabilities.”
One alternative, proposed by Nate, would be for researchers to stop trying to pursue de novo AGI, and instead pursue human whole-brain emulation or human cognitive enhancement. This helps largely sidestep the issue of bureaucratic legibility, since the risks are far lower and success criteria are a lot clearer; and it could allow us to realize many of the near-term benefits of aligned AGI (e.g., for existential risk reduction).
Various people at MIRI have different levels of hope about this. Nate and Eliezer both believe that humanity should not be attempting technical alignment at its current level of cognitive ability, and should instead pursue human cognitive enhancement (e.g., via uploading), and then having smarter (trans)humans figure out alignment.
Lisa comments:
In the case of our Agent Foundations research team, team size stayed the same, but we didn’t put any effort into trying to expand the team.
We also helped set up a co-working space and related infrastructure for other AI x-risk orgs like Redwood Research. We’re fans of Redwood, and often direct researchers and engineers to apply to work there if they don’t obviously fit the far more unusual and constrained research niches at MIRI.
Where “soon” means, roughly, “there’s a lot of uncertainty here, but it’s a very live possibility that AGI is only a few years away; and it no longer seems likely to be (for example) 30+ years away.” In a fall 2023 poll of most MIRI researchers, we expect AGI (according to the definition from this Metaculus market) in a median of 9 years and a mean of 14.6 years. One researcher was an outlier at 52 years; the majority predicted under ten years.