What if AI development goes well?

Summary: This post argues that three common objections to an approach in AI governance called AI ideal governance are misguided. Firstly, developing AI ideal governance theories is not necessarily an anti-pluralistic activity or one that promotes an irrational focus on the ends of our actions. Secondly, AI ideal governance theories are not necessarily de-motivating. Thirdly, AI ideal governance theories can be useful in developing plans for action.

Most of the discussion of advanced AI in the EA community has been focussed on what could go wrong, from the possibility of AI cementing authoritarianism to it leading to outright human extinction. However, if we manage to safely develop AI, there is also a fair amount that can go right.

The extent to which life in the future could be ‘profoundly good’ is often neglected. The future may well include vast amounts of wealth, the expansion of the liveable portion of the galaxy, or even the further development of the human form. Nonetheless, I sympathise with those who might initially feel somewhat reluctant to focus on what could go right when what could go wrong seems like the more pressing issue. With this in mind, this post will assess three possible arguments for scepticism about an approach in the field of AI governance, called AI ideal governance, that focuses on what could go right. I conclude that these arguments fall short, and that AI ideal governance is a valuable approach to thinking about AI.

As a field, AI governance focuses on the norms, institutions and policies that are required to ensure the safe development of AI.[1] The sub-field of AI ideal governance was first identified by Allan Dafoe in his 2018 research agenda. It considers what the institutions and norms governing AI should look like in the future, rather than what they look like now. For example, imagine that we wanted to assess the role of private research labs in the safe development of AI. A traditional approach in AI governance may begin by evaluating the current regulation and incentive structures of these institutions. In contrast, a potential AI ideal governance approach might take a step back and ask questions such as: What would a private research lab committed to AI safety have in its constitution? Which norms or values would it be guided by?[2]

The term ‘ideal’ may conjure an association with utopianism, which, for my purposes, can simply be understood as the approach of developing positive blueprints for institutions in the far future.[3] Utopianism has a long and storied history, from the term’s coinage by Thomas More through to socialist utopianism and even the technological aspirations of Silicon Valley.[4] However, I don’t think that all models developed through AI ideal governance need be utopian. This is because we can be more or less realistic about our ideal model. For example, in his recent post on ideal governance, Holden Karnofsky discusses the method of sortition as a possible means to improve the governance of certain political institutions. I think it’s a stretch to call this a utopian suggestion; part of the reason we may favour sortition is to help address people’s imperfections or inadequacies. Nonetheless, it’s pretty clearly grounded in a view of what such institutions should look like in the future, so it’s fair to call it a discussion in ideal governance.

Objection 1. AI ideal governance theories are dangerous

The most stark and immediately pressing objection to AI ideal governance theories is that they are dangerous. Historically, this objection traces its intellectual lineage to arguments against utopian thought made by liberal thinkers during the Cold War, who commonly associated utopianism with socialist ideas and the Soviet Union. There are multiple reasons why someone could think that AI ideal governance theories are dangerous, but I’ll address two fairly common ones here. Both of these arguments focus on the more utopian AI ideal governance theories. I’m interested in arguments for why there’s something about (at least certain kinds of) ideal theories that makes them intrinsically dangerous, so as to avoid a historical debate about whether particular uses of ideal theory (e.g., in 20th-century utopian political projects) have been dangerous. Although I ultimately conclude that AI ideal governance theories are not intrinsically dangerous, I do think there are important lessons to be learnt from each objection that can help us develop better plans.

1.1. AI ideal governance theories are anti-pluralistic

Wittgenstein, Elizabeth Taylor, Bertrand Russell, Thomas Merton, Yogi Berra, Allen Ginsburg, Harry Wolfson, Thoreau, Casey Stengel, The Lubavitcher Rebbe, Picasso, Moses, Einstein, Hugh Heffner, Socrates, Henry Ford, Lenny Bruce, Baba Ram Dass, Gandhi, Sir Edmund Hillary, Raymond Lubitz, Buddha, Frank Sinatra, Columbus, Freud, Norman Mailer, Ayn Rand, Baron Rothschild, Ted Williams, Thomas Edison, H.L. Mencken, Thomas Jefferson, Ralph Ellison, Bobby Fischer, Emma Goldman, Peter Kropotkin, you, and your parents. Is there really one kind of life which is best for each of these people?

The above quote, dated cultural figures and all, comes from the (often ignored) final section of Robert Nozick’s famous Anarchy, State and Utopia (1999, p. 310). It nicely illustrates the pluralism issue by highlighting the difficulty of drawing up ideal institutions to govern a large group of diverse individuals. To the extent that some AI ideal governance theories draw up such institutions, they might face the same problem. Previous utopians often haven’t grappled with this fact, contributing to the impression that utopias often don’t allow for people to live different kinds of lives. This is clearly a moral problem for those with liberal intuitions about the importance of respecting people’s choices to live their lives as they please. Those sympathetic to Nick Bostrom’s estimates of the potential number of future sentient beings might also think that this pluralism issue is particularly pressing when thinking about the long-term future. 10^58 is a large number of sentient beings, and if a future governance structure required them to sacrifice their individuality, this could lead to a large amount of suffering.[5]

Although there have been many utopias that don’t pay enough attention to people’s different lives, I struggle to see why this is a necessary problem of utopianism or of the AI ideal governance approach more generally. Firstly, it’s not clear that this objection would immediately apply to many of the institutions involved in AI governance. For example, one could focus on drawing up plans for institutions without creating guidelines about people’s personal lives, perhaps by drawing on the liberal principle of neutrality between different conceptions of the good. Secondly, there could be utopias that aim to maximise individual freedoms and thereby allow people to live the lives that they choose. For example, in an earlier study on utopianism, Karnofsky drew up a number of possible utopias and ran a poll to find out which ones people would most like to live in. The utopia that was ‘reminiscent of Nozick’s’ went as follows:

Everything is set up to give people freedom. If you aren’t interfering with someone else’s life, you can do whatever you want. People can sell anything, buy anything, choose their daily activities, and choose the education their children receive. Thanks to advanced technology and wealth, in this world everyone can afford whatever they want (education, food, housing, entertainment, etc.) Everyone feels happy, wealthy, and fulfilled, with strong friendships and daily activities that they enjoy.

Examples like this suggest that it is possible to create utopias that respect pluralism. Incidentally, the results of Karnofsky’s poll vindicate the view that utopias that don’t respect pluralism aren’t popular: the Nozickian utopia that respects freedom did much better than its less liberal alternatives. Though I don’t think the anti-pluralism objection addressed in this section gives us reason to abandon AI ideal governance, being mindful that anti-pluralistic utopias tend to be unpopular can help us plan better for our future.

1.2. AI ideal governance theories promote a misguided focus on the ends of our actions

Another possible worry one might have about AI ideal governance theories is that aiming for certain fixed destinations might promote a misguided focus on the ends of our actions. Developing utopias could help justify the neglect (or even the sanctioning) of suffering in the nearer term, when no justification is really possible. This is the objection that comes closest to the concerns of many Cold War liberals. Famously, Popper calls utopianism an ‘all too attractive theory’, highlighting that it ‘leads to violence’.[6] It’s beyond the scope of this post to assess whether putting ends before means could ever be justified (for example through a form of utilitarianism), and instead I’m going to focus on how AI ideal governance theories might promote a form of irrationality.[7]

One reason that AI ideal theories might promote irrationality is by painting a vision of the future so intoxicating and appealing that it encourages us to forgo traditional safeguards and sacrifice other concerns. More utopian ideal theories are often presented through fiction, or through a practice of worldbuilding. A good writer able to create a tangible and coherent world may encourage a feeling of longing in the reader.[8]

I think there are a couple of counter-tendencies that should lead us to think that this concern is overstated. Firstly, there is no reason why we should treat utopias as fixed plans rather than as possible destinations that can be updated if we think of something better.[9] The misunderstanding that utopias have to function as unchangeable final destinations may result from the historical context of utopias being used as a legitimating ideology, but in a different context utopias need not have such rigidity.

Secondly, AI ideal governance theories could help tackle some existing irrationalities. This could be achieved by framing an issue in a new light, or illustrating the possibility that a particular set of actions opens up. One way in which ideal theories may actually boost rationality is by functioning similarly to philosophical thought experiments.[10] Such experiments might allow us to further understand the institutional implications of maximising the presence of a certain virtue, or the complexity involved in balancing different desires for a future civilization.

The two tendencies I raise here cannot guarantee that a particular AI ideal governance theory will never inspire irrational action on the part of its readers, but they give us reason to think such a concern may be overstated. They also highlight the importance of combining ideal governance theories with a healthy epistemic discourse, in which genuinely examining one’s views and changing one’s mind is possible.

Objection 2. AI ideal governance theories are de-motivating

Another possible objection one might have to AI ideal governance theories is that the practice of coming up with visions of how institutions should be is de-motivating. Even if this initially seems counter-intuitive, there are a number of reasons it might be the case. Firstly, a vague blueprint for how an institution might be better, without an accompanying roadmap, may encourage lamentation at the state of the world rather than help to create something better.[11] In particular, developing utopias may reveal the gulf between them and our current world, exposing the scale of the challenge and making it seem more daunting. Secondly, ideal theories can often look weird and may be difficult to relate to, which may simply turn people off. There has also been a relevant discussion within the EA community around branding, with one view being that EA should be at least somewhat mindful of how weird it comes across.[12] Thirdly, it might be that other ways of talking about AI are simply more motivating. One recent suggestion in the EA community has been that the best way to motivate people to action may be just to talk about the possibility that inaction in the face of serious risks could lead to an existential catastrophe.[13] This approach seemingly requires less agreement (as most people would agree that extinction is at least prima facie bad), and can be used to appeal to people’s selfish sensibilities.

In this section, I’ll focus on offering a response to the third reason after briefly addressing the other two here. Firstly, as I explain in the next section, I think AI ideal governance theories can help to provide a roadmap. This can let us move beyond lamentation and towards concrete political action. In addition, I think there could be a useful division of labour here, and that it is rather uncharitable to expect all ideal governance theories to provide an accompanying policy programme. I think the second reason is a very important consideration in developing ideal governance theories, and I have no real response other than to say that weird ideal governance theories may well be ineffective.

As for the third reason, there is reason to think that ideal theories will be motivating to a certain kind of person. People’s imaginations are captured by different things, and it doesn’t seem implausible that ideal theories might motivate some people to a greater extent than, say, a sole focus on a possible existential catastrophe would. A common argument in favour of developing such theories is that they allow us to move beyond short-term concerns and to consider possibilities that had not previously been voiced. Consideration of ideals can exercise the imagination and offer hope. One way in which they can do this is by offering concrete suggestions about what the implementation of a certain policy or principle might look like.

Personally speaking, I have found Anders Sandberg’s recent discussions of grand futures particularly engaging in thinking through the possible futures that humanity might enjoy.[14] Another popular example of motivational utopianism can be found in Toby Ord’s The Precipice (2020, p. 112). In his discussion of humanity’s ‘potential for flourishing’, Ord writes about the limits of our understanding of pleasure:

Most of these peaks [of pleasure] fade all too quickly. The world deadens; we settle into our routines; our memories dim. But we have seen enough to know that life can offer something far grander and more alive than the standard fare. If humanity can survive, we may one day learn to dwell more and more deeply in such vitality; to brush off more and more dust; to make a home amidst the beauty of the world. Sustaining such heights might not be easy, or simple. It could require changes in our psychology that we should approach with caution. But we know of nothing, in principle, that stands in our way, and much to recommend the exploration.[15]

Although these motivational writings leave us with the work of figuring out how to reach such a point, their appeal may give us good inspiration to do so.

There are also historical examples that show the power of ideal visions. Much has been written about the role that utopianism has played in the development of Silicon Valley, where founders are often motivated by the possibility of a grander future.[16] A concrete example of this is Elon Musk naming SpaceX’s drone ships after ships from science fiction novels. In addition, in a policy context, thinking about how the future might go well may allow a candidate to come across as engaging, progressive and forward-looking. One famous example of the ability of ideal visions to motivate is Al Gore’s speech highlighting the potential of the internet. The speech, often mis-referred to as ‘Al Gore invents the internet’, waxes lyrical about the ‘planetary information network that transmits messages and images with the speed of light’, allowing us to ‘transcend the barriers of time and distance’ and making possible ‘robust and sustainable economic progress, strong democracies, better solutions to global and local environmental challenges, improved health care’. These examples should give us reason to think that AI ideal theories could be similarly motivating.

Objection 3. AI ideal governance theories are not helpful for action

The third objection I’m going to address here is the claim that AI ideal governance theories are not helpful in guiding action. A potential sceptic may grant my case above that utopias have a degree of motivational value, but argue that this only means a small number of them ever need to be written; the vast majority of people should instead be working on more practical endeavours to influence policymakers. Here, I want to tackle this scepticism by illustrating the concrete role that ideal governance theories could play in developing AI strategy. I argue that AI ideal theories can help provide general orientations to guide policymaking and can even provide actionable objectives.

AI ideal governance theories are useful because they allow us to set positive objectives.[17] Having these positive objectives gives us something to aim for, which can in turn orient action. These plans wouldn’t necessarily have to be incredibly precise, and could instead be used to choose a broad positioning that is refined later. In previous mappings of the AI governance space, useful distinctions have been drawn between research at different levels of abstraction, from ‘macro strategy’ all the way down to ‘standards-setting’ and the evaluation of a particular regulation after it has been implemented. Setting broad objectives through AI ideal governance theories could form part of AI governance macro strategy, with further consideration of how to actually reach a given ideal, or how to enact a certain strategy, left to a later stage in the division of labour. One method that has been particularly popular recently for setting such broad objectives is worldbuilding, as promoted by the Future of Life Institute’s recent contest.

Alternatively, one could develop AI ideal governance theories that operate on shorter time-scales and that can provide more immediately actionable objectives. For example, in 2021, the World Economic Forum published a ‘Positive AI Economic Futures’ report that detailed six possible scenarios for AI’s impact on the employment market. Increased automation from AI might lead us to automate the mundane tasks that humans don’t enjoy, leaving humans to perform ‘fulfilling jobs’, such as those that involve specialised skills (e.g., empathy or critical thinking). On the other hand, we might favour the automation of all work (assuming this is possible) so that humans are able to enjoy their leisure time. Sketching out these different positive visions allows us to decide which would be more appealing. This choice then has important policy consequences. If one favours the former option of only automating mundane jobs, this could lead us to try to develop AIs that can work with humans. In contrast, if the automation of all work is our aim, policies such as a universal basic income might serve it better.

However, an important objection to my argument that AI ideal governance theories are helpful in setting goals for action is that it’s very difficult to set positive objectives for AI development because AI pathways are so uncertain.[18] This is a particularly challenging objection to the practice of developing utopian visions involving AI, as it seems unlikely that any fixed model of how AI develops will turn out to be entirely correct.

Here, I suggest two possible responses. Firstly, developing macro-strategies for humanity’s future using AI ideal governance theories need not rely on particular controversial technical claims about AI development. A more convincing strategy might admit multiple different ways of reaching the ideal, thereby limiting its contingency on any one of them. Secondly, the process of developing plans might be useful even if those plans are never used.[19] Plans can help clarify our motivations and illustrate what’s appealing about a particular policy. For example, it may not be possible to automate all work, even if we decided that it would be ideal to do so. This limits how action-guiding the plan is in this particular scenario, but the fact that we find the scenario in which work is no longer a feature of our lives appealing is still an important finding. This could then inform future action, for example in exploring whether any of the positive features of leisure could be better distributed by attempting to reduce inequality.

In conclusion, I think that the objections I’ve tried to raise fairly here do not give us reason to dismiss the approach of AI ideal governance. I have argued that AI ideal governance theories are not necessarily dangerous, that they can be motivating, and that they can be helpful in guiding action. I hope to have another forum post appearing soon illustrating further directions in AI ideal governance and open questions that I think would be valuable for the sub-field to answer, which may be of particular interest to those who got this far in this post.

Acknowledgements: This post benefited greatly from the support of (amongst others) LP Bova, Miles Brundage, Joseph Carlsmith, Ross Gruetzemacher, Julia Karbing and Risto Uuk.


  1. ↩︎

    AI governance is not limited to a focus on government institutions, and can also include analysis of non-state actors like NGOs and intergovernmental organisations. For further discussion, see here.

  2. ↩︎

    These questions are similar to questions of institutional design. A great piece in this field is here. Discussion of values governing AI creates an overlap with the (much more well-developed) field of AI ethics. I do not spend time here differentiating the fields, other than to say that the institutional focus of AI ideal governance is often not present in AI ethics.

  3. ↩︎

    There is a long-standing debate about how to define utopianism which I do not enter into here. Compared to some authors, my definition is fairly narrow. Much of what I say will still be relevant to those who understand utopianism in a broader manner.

  4. ↩︎

    I draw upon literature in the utopian tradition to make my argument. Not all of these utopias are examples of AI ideal governance, and not all of them deal with AI; they should simply be understood as illustrative.

  5. ↩︎

    This is Bostrom’s estimate, which includes artificial beings, from Superintelligence (2017, p. 103).

  6. ↩︎

    See ‘Utopia and Violence’ (1986) by Karl Popper, p. 5.

  7. ↩︎

    Utopias and ‘ends justify the means’ worries are also discussed here.

  8. ↩︎

    For a broader discussion of fiction and EA, see here.

  9. ↩︎

    On this (and other) function of utopia, see here.

  10. ↩︎

    On the ability of philosophical thought experiments to re-frame issues and function as intuition pumps, see Daniel Dennett’s Intuition Pumps and other Tools for Thinking (2013).

  11. ↩︎

    For more on political philosophy and lamentation, see here.

  12. ↩︎

    On EA and weirdness, see here.

  13. ↩︎

    On simplifying EA pitches to warnings about existential catastrophes, see here.

  14. ↩︎

    For an example of Sandberg’s discussion of grand futures, see this talk.

  15. ↩︎

    For a nice forum post that collects some of these more grandiose visions of possible futures, see here.

  16. ↩︎

    On Silicon Valley and utopianism, see Howard Segal’s Technological Utopianism in American Culture (2005).

  17. ↩︎

    In writings on this topic, there is sometimes a misguided suggestion that AI ideal governance theories are the only way to set objectives. This isn’t true: the objective of reducing the chance of an existential catastrophe doesn’t stem from ideal theory, for example, but it is clearly an objective.

  18. ↩︎

    This also could be an objection to the motivational function of AI ideal governance theories. If we think that we have no idea if these utopias can come to be, this might make us less motivated by them.

  19. ↩︎

    On the value of planning, see this piece.