Towards more cooperative AI safety strategies

This post is written in a spirit of constructive criticism. It’s phrased fairly abstractly, in part because it’s a sensitive topic, but I welcome critiques and comments below. The post is structured in terms of three claims about the strategic dynamics of AI safety efforts; my main intention is to raise awareness of these dynamics, rather than advocate for any particular response to them. Disclaimer: I work at OpenAI, although this is a personal post that was not reviewed by OpenAI.

Claim 1: The AI safety community is structurally power-seeking.

By “structurally power-seeking” I mean that it tends to take actions which significantly increase its power. This does not imply that people in the AI safety community are selfish or power-hungry, or even that these strategies are misguided. Taking the right actions for the right reasons often involves accumulating some amount of power. However, from the perspective of an external observer, it’s difficult to know how much to trust stated motivations, especially when they often lead to the same outcomes as self-interested power-seeking.

Some prominent examples of structural power-seeking include:

  • Trying to raise a lot of money.

  • Trying to gain influence within governments, corporations, etc.

  • Trying to control the ways in which AI values are shaped.

  • Favoring people who are concerned about AI risk for jobs and grants.

  • Trying to ensure non-release of information (e.g. research results, model weights).

  • Trying to recruit (high school and college) students.

To be clear, you can’t get anything done without being structurally power-seeking to some extent. However, I do think that the AI safety community is more structurally power-seeking than other analogous communities, such as most other advocacy groups. Some reasons for this disparity include:

  1. The AI safety community is more consequentialist and more focused on effectiveness than most other communities. When reasoning on a top-down basis, seeking power is an obvious strategy for achieving one’s desired consequences (but can be aversive to deontologists or virtue ethicists).

  2. The AI safety community feels a stronger sense of urgency and responsibility than most other communities. Many in the community believe that the rest of the world won’t take action until it’s too late, and that it’s therefore necessary to have a centralized plan.

  3. The AI safety community is more focused on elites with homogeneous motivations than most other communities. In part this is because it’s newer than (e.g.) the environmentalist movement; in part it’s because the risks involved are more abstract; in part it’s a founder effect.

Again, these are intended as descriptions rather than judgments. Traits like urgency and consequentialism are often appropriate. But the fact that the AI safety community is structurally power-seeking to an unusual degree makes it important to grapple with another point:

Claim 2: The world has strong defense mechanisms against (structural) power-seeking.

In general, we should think of the wider world as being very cautious about perceived attempts to gain power; and we should expect that such attempts will often encounter backlash. In the context of AI safety, some types of backlash have included:

  1. Strong public criticism of decisions not to release models.

  2. Strong public criticism of centralized funding (e.g. billionaire philanthropy).

  3. Various journalism campaigns taking a “conspiratorial” angle on AI safety.

  4. Strong criticism from the AI ethics community about “whose values” AIs will be aligned to.

  5. The development of an accelerationist movement focused on open-source AI.

These defense mechanisms often apply regardless of stated motivations. That is, even if there are good arguments for a particular policy, people will often look at the net effect on overall power balance when judging it. This is a useful strategy in a world where arguments are often post-hoc justifications for power-seeking behavior.

To be clear, it’s not necessary to avoid these defense mechanisms at all costs. It’s easy to overrate the effect of negative publicity; and attempts to avoid that publicity are often more costly than the publicity itself. But reputational costs do accumulate over time, and also contribute to a tribalist mindset of “us vs them” (as seen most notably in the open-source debate) which makes truth-seeking harder.

Note that most big companies (especially AGI companies) are strongly structurally power-seeking too, and this is a big reason why society at large is so skeptical of and hostile to them. I focused on AI safety in this post both because companies being power-seeking is an idea that’s mostly “priced in”, and because I think that these ideas are still useful even when dealing with other power-seeking actors.

Claim 3: The variance of (structurally) power-seeking strategies will continue to increase.

Those who currently take AGI and ASI seriously have opportunities to make investments (of money, time, social capital, etc.) which will yield much more power in the future if AI continues to become a much bigger deal.

But increasing attention to AI will also lead to increasingly high-stakes power struggles over who gets to control it. So far, we’ve seen relatively few such power struggles, because most people don’t yet believe that control over AI is an important type of power. That will change. To some extent this has already happened (with AI safety advocates being involved in the foundation of three leading AGI labs), but as power struggles become larger-scale, more people who are extremely good at winning them will become involved. That makes AI safety strategies which require power-seeking more difficult to carry out successfully.

How can we mitigate this issue? Two things come to mind. Firstly, focusing more on legitimacy. Work that focuses on informing the public, or creating mechanisms to ensure that power doesn’t become too concentrated even in the face of AGI, is much less likely to be perceived as power-seeking.

Secondly, prioritizing competence. Ultimately, humanity is mostly in the same boat: we’re the incumbents who face displacement by AGI. Right now, many people are making predictable mistakes because they don’t yet take AGI very seriously. We should expect this effect to decrease over time, as AGI capabilities and risks become less speculative. This consideration makes it less important that decision-makers are currently concerned about AI risk, and more important that they’re broadly competent, and capable of responding sensibly to confusing and stressful situations, which will become increasingly common as the AI revolution speeds up.

EDIT: A third thing, which may be the most important takeaway in practice: the mindset that it’s your job to “ensure” that things go well, or come up with a plan that’s “sufficient” for things to go well, inherently biases you towards trying to control other people—because otherwise they might be unreasonable enough to screw up your plan. But trying to control others will very likely backfire for all the reasons laid out above. Worse, it might get you stuck in a self-reinforcing negative loop: the more things backfire, the more worried you are, and so the more control you try to gain, causing further backfiring… So you shouldn’t be in that mindset unless you’re literally the US President (and maybe not even then). Instead, your job is to make contributions such that, if the wider world cooperates with you, then things are more likely to go well. AI safety is in the fortunate position that, as AI capabilities steadily grow, more and more people will become worried enough to join our coalition. Let’s not screw that up.

Crossposted from LessWrong (206 points, 130 comments)