Cognitive assets and defensive acceleration


Organizations aiming to create transformative AI systems (hereafter simplified as “developers”) will face a series of crucial decisions as the technologies they create grow increasingly powerful. One such decision that I think will have a significant impact on humanity’s trajectory is how developers choose to spend their “cognitive assets”. I use this term to describe the quantity and quality of artificial cognition at a lab’s disposal over a given period of time.

My view is that developers should dedicate a significant portion of their cognitive assets towards defensive acceleration (d/​acc). This refers to the development of defensive and safety-oriented technologies and governance mechanisms aimed at increasing humanity’s chances of avoiding catastrophic risks while still reaping the immense benefits of advanced AI systems.

Cognitive asset growth

Under the current large language model (LLM) paradigm, the term “cognitive assets” describes the total number of tokens a lab can generate over a period of time, and how much “value” those tokens can create in the real world. The quantity of tokens generated is primarily determined by the computing power at the lab’s disposal, while the quality (read: usefulness) of those tokens depends on the capabilities of the lab’s AI systems.
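As a rough illustration (a simplification for intuition, not a precise model), cognitive assets over some period can be thought of as the product of these two factors:

$$\text{cognitive assets over period } T \;\approx\; \underbrace{\text{tokens generated in } T}_{\text{quantity, set by compute}} \times \underbrace{\text{average value per token}}_{\text{quality, set by capabilities}}$$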

Developers’ cognitive assets will likely grow over the next decade, potentially at a rapid pace. Computational power will become cheaper; developers will spend increasingly large amounts of money on training cutting-edge AI systems; new algorithmic discoveries will improve systems’ performance and training efficiency; and at some point, a potentially vast number of AI systems could assist with (or autonomously execute) various stages of the AI R&D process. Every large training run, algorithmic discovery, and chip advancement increases labs’ cognitive assets and unlocks new ways to spend them.

How cognitive assets can be spent

Cognitive assets can be spent in many ways. For example, they can be sold to customers at a profit. We already see this — developers such as OpenAI and Anthropic rent access to their systems and compute, often on a per-token basis. So far, the societal impacts of these transactions have been largely positive; going forward, I expect tremendous value to be created from such systems, as entrepreneurs and businesses use them to build products and services that fulfil consumer demand and power economic growth.

But while I believe that most of these uses will benefit society, I fear that developers will face strong incentives to grow and spend their cognitive assets in ways that increase the odds of existential catastrophe.

Competitive pressures, financial incentives, and the desire for prestige could push developers to dramatically accelerate the rate of AI progress. Concretely, this could involve spending their cognitive assets on developing new training algorithms, running ML experiments, or manufacturing large volumes of chips — activities which themselves directly serve to increase developers’ cognitive assets. This feedback loop could result in an “explosion” of AI systems that vastly outperform humans at almost all cognitive tasks. If this transition occurs quickly (e.g., months to years), and/​or if we don’t put significant effort towards preparing for the worst-case scenarios, I think there’s a non-trivial chance that humanity will end up permanently disempowered.

Crucially, I don’t think this will necessarily happen, nor do I think we are helpless to do anything about it. Yet absent significant effort towards figuring out how to align and govern transformative AI systems, I expect the default rate of AI progress over the next decade or two to introduce unacceptably high levels of risk (e.g., at least a high-single-digit chance of existential catastrophe).

Reducing p(doom) by spending cognitive assets on defensive acceleration

Developers are well-placed to influence this trajectory. By dedicating a portion of their cognitive assets to defensive acceleration, they can proactively work to mitigate these risks while still benefiting from the fruits of increasingly transformative AI systems. Therefore, it would be prudent for developers to make deliberate plans for how they will spend their cognitive assets in ways that increase humanity’s security against the worst-case outcomes of rapid AI progress.

Creating such a plan is difficult: new capabilities can emerge unexpectedly, scientific breakthroughs can cause discontinuous jumps in the size and scope of cognitive assets, and predicting the shape and severity of possible risks from AI progress is notoriously tricky. But if developers want to credibly demonstrate a commitment to responsible stewardship of extremely powerful technologies, they should pledge a non-trivial fraction of their cognitive assets towards helping humanity avoid or defend against existential threats from transformative AI systems, such as by:

  • Conducting research that furthers our scientific understanding of how to align powerful AI systems with human intentions, and/​or control AI systems aiming to subvert humanity;

  • Designing policies and governance frameworks that can improve humanity’s capacity to safely navigate the development of transformative AI;

  • Developing defense-oriented technologies, and/or improving existing security in a variety of safety-critical domains (e.g., cybersecurity, biosecurity).

Holden Karnofsky’s writing on the “deployment problem” explores some of these mechanisms and tactics for improving the state of AI safety and security in greater detail. However, a key question remains: What governance frameworks should be established to incentivize developers to implement these risk-reduction strategies from the outset?

To address this challenge, developers could take concrete actions such as updating their charters or entrusting a fraction of their assets to a safety-focused decision-making body. These steps could help safety-conscious labs like OpenAI achieve their stated goal of ensuring that the “benefits of, access to, and governance of AGI” are “widely and fairly shared”.

Setting up these processes and governance frameworks will undoubtedly be difficult — as is the challenge of specifying in advance what actually counts as defensive acceleration — but I think developers aiming to build world-transforming technology should make a serious attempt at it regardless.

Difficulties and pitfalls of this framework

Of course, these actions have the potential to interfere with developers’ competitive prospects. Committing cognitive assets to defensive uses could put prosocial developers at a competitive disadvantage. These concerns might manifest as a classic collective action problem, wherein each lab would, out of rational self-interest, prefer that such commitments be widely adopted, but no lab is sufficiently incentivized to make them unilaterally amid fierce competitive pressures. Nevertheless, developers could coordinate by agreeing to shared principles via a third party, such as the Frontier Model Forum, and there is precedent for developers unilaterally making these sorts of commitments. For instance (and to their credit), OpenAI has already committed 20% of the compute they have secured to date to solving the problem of aligning superintelligent AI systems.

Is the prospect of technological progress just too sweet?

By the time AI systems start producing tremendous wealth for their creators, the siren song of selling cognitive assets solely to the highest bidders might be too tempting, and the allure of technological progress too sweet. But in anticipation of that point, I hope those in charge of frontier AI projects carefully reflect on the stakes involved when deciding how to spend their cognitive assets. While the path towards transformative AI is highly uncertain, developers aspiring to safely usher in this technology should seriously consider making a credible commitment to defensive acceleration.


Acknowledgements

I’d like to thank Aaron Gertler, Alex Lawsen, Jeremy Klemin, Max Nadeau, and Trevor Levin for their valuable feedback and suggestions.