AI acceleration from a safety perspective: Trade-offs and considerations

This post was written by Marius Hobbhahn and Tilman Räuker.

Over the last few years, we have encountered stances on AI safety and acceleration ranging from “all forms of AI acceleration are bad” through “sometimes acceleration can be justified” to “if it means control over AI technologies, acceleration could be a necessary evil”.
In this post, we want to look at different considerations and trade-offs regarding AI acceleration and possible pitfalls.

You are welcome to add further reasons, trade-offs, etc.

TL;DR: We investigate different views on AI acceleration & alignment and what considerations they depend on. We could not identify one view as clearly superior. We think that the interaction with non-aligned actors, e.g. AI researchers/companies, can make it necessary to apply strategies that increase acceleration.

Ozzie has recently posted 13 different stances on AGI. Our post tries to pick up some of his suggested “next steps”. He also gave feedback on an early draft, for which we are thankful.

A big clarification:

We know that different people and institutions within the EA space hold different beliefs regarding the issue of AI acceleration. Our aim is NOT to single out and criticize them but rather to have a civil discussion on an issue that seems pretty important to AI alignment.
We don’t have an inside view into the decision-making that led to the foundation of different institutions but we are optimistic that with multiple founders, grantmakers and advisors involved, a lot of time went into thinking about the possible downsides of AI acceleration.

Definition:

We think of AI alignment as the process of aligning the goals of artificial systems with the broader goals of humanity.

We loosely define AI acceleration as everything that increases the pace of AI development/capabilities, i.e. shortens timelines to transformative AI (TAI).

In this post, we primarily look at AI acceleration that comes as a byproduct of alignment efforts. Examples include accidentally developing more efficient training techniques while doing alignment research or actively improving the state of the art but only giving access to aligned actors.

We will use the term “non-aligned actor” for institutions/researchers who are not directly or indirectly concerned with AI safety. There is not a clear threshold when an institution becomes aligned/non-aligned, but one example of a rather non-aligned actor is NVIDIA. Even though one research area of theirs is concerned with safety, we assume that this is not the main focus of their research.

Different views:

We broadly identified four escalating views on AI acceleration.

1. AI acceleration is bad under all circumstances

AGI might be really terrible. Thus, everything that makes it come earlier is bad.

2. To make relevant AI systems safe we need to work with the state of the art, which can sometimes lead to minimal acceleration as a side effect

If you work with the state of the art, e.g. LLMs, it is plausible that you sometimes solve problems that other institutions have not encountered before. This might, for example, be specific knowledge about prompt engineering or how to set up GPU clusters efficiently.

In the best case, aligned models show state-of-the-art performance and everyone adopts them.

3. We should further the state-of-the-art to increase control over relevant AI systems

By being in control of relevant knowledge about AI algorithms or relevant architecture, aligned actors could control who gets access and thus decrease the risk of misalignment.

In the best case, aligned actors have sufficient power to decide who gets access to the most capable models or compute.

4. AI is likely good. The sooner the better.

If you think we already have good answers to all alignment questions or are confident that we will find them in time, then acceleration is good. Since there are still a lot of open questions, the vast majority of EAs don’t hold this view.

Dependencies:

The following factors can lead to different views. They are not supposed to be an exhaustive list—we welcome additional suggestions in the comments.

How hard is alignment?

If you think that alignment is a very hard problem and we should have as much time as possible, view 1 becomes more plausible. If you think alignment might turn out to be easier than anticipated or might get easier with larger models, views 2 & 3 become more plausible.

How much AI acceleration can AI safety researchers create counterfactually?

The AI safety community is really small compared to conventional AI research in terms of people and money. To get a simple estimate of the share of AI research that focuses on safety, we compare the number of papers released. We estimate that about 1 in 100 papers focuses on safety (see appendix). Furthermore, the AI safety community might be especially interested in and focused on TAI/AGI and therefore disproportionately likely to create acceleration.

The more you think that AI safety researchers can make a meaningful difference to the overall progress of AI, the more plausible views 1 & 3 become compared to 2.

How probable is accidental acceleration?

Assume an aligned organization furthers the state-of-the-art. They have good intentions and decide to only share their methods with other aligned organizations. If everything goes well, this does not increase acceleration by non-aligned actors. However, the information might leak.

Merely reaching a new state-of-the-art could already accelerate other research, e.g. if you make better hardware, NVIDIA might speed up its own work because feasibility has been shown.

The same premise holds for pure safety research: MIRI decided to go ‘nondisclosed-by-default’ due to the possibility of acceleration through their results.

Eleuther.ai also discusses the issue, focusing on a large language model (LLM). While they do not seem to aim to further the state of the art, they want to open up research on LLMs by releasing a model similar to GPT-3 to make safety research on LLMs possible. Furthermore, they claim that most of the damage done through GPT-3 came from showing feasibility.

Results of state-of-the-art research can therefore be considered infohazards.

The more probable you find accidental acceleration, the more you should favor view 1 over 2 and especially 3.

Which kind of models will be adopted by a broader audience?

It is plausible that most actors don’t care a lot about safety or alignment but do care about performance. The higher the performance of aligned models compared to unaligned ones, the likelier adoption by the broader public becomes.
It might be the case that there is a trade-off between capabilities and alignment and that aligned models therefore can’t ever reach state-of-the-art performance (as discussed e.g. in A dilemma for prosaic AI alignment).

The more plausible you find this perspective on adoption, the more you should favor view 2.

How much pressure can EAs create on other players to be more aligned?

Assume aligned organizations control some state-of-the-art technology, e.g. a large language-model API or compute. In a high-pressure scenario, other companies adapt their mission to be aligned in order to get access. In a low-pressure scenario, other companies just ignore the aligned organization and nothing happens.

The more pressure aligned organizations can create, the more you should favor view 3.

How reckless are non-aligned actors?

There are different possible scenarios of how reckless non-aligned actors will be with AI systems. You might think that they will care about safety due to the large possible negative consequences of unaligned AI.

On the other hand, non-aligned actors might not care about safety due to profit incentives, lack of understanding or external pressures.

The more reckless you think non-aligned actors are, the more you should favor controlling access to AI tech, i.e. view 3 over 1 & 2.

What is your take-off model?

Different take-off models result in different timelines. Short timelines until TAI should favor view 1 since every tiny bit of acceleration is important time lost. Longer timelines favor views 2 & 3 since there is more time for unaligned actors to do harm with increasingly powerful systems.

Unstable actors or the debate on who creates the first TAI

We have little expertise in other countries’ AI policy but we would assume that some of them are less well-meaning actors than most Western countries. For example, we think it is more likely that China would use an AI to gain an advantage even if everyone else is worse off.

The higher you think the probability of unstable actors developing the first TAI is, the more you should favor view 3, in order to control access and prevent them from doing so.

Type of research

Obviously, the kind of research that an organization does matters as well. If they have a very clear and plausible theory of change for how their acceleration leads to relevant increases in alignment, views 2 & 3 might be more plausible. If their theory is “let’s accelerate and see what happens” that’s probably bad and strengthens view 1.

However, we think that even controlled acceleration just for the sake of testing safety-relevant techniques on more powerful models is already rather dangerous and is similar to gain-of-function research in other fields such as biosecurity. The risks of accidental release therefore carry over to the AI domain.

How powerful is TAI/AGI?

Predictions about AI systems range from “better than humans but not by much” to “insanely powerful”. The more powerful AI is, the higher the stakes are. In a pre-alignment world, this favors view 1 since we want to minimize the chance of powerful unaligned actors with potentially large negative impacts. In a post-alignment world, this favors view 4 since we should get to good outcomes as fast as possible.

Potential Biases:

Effective Altruists are unfortunately not immune to biases, some of which we want to highlight. However, we think that the founding process, discussions with other EAs, and input from funders should mitigate these biases a lot. Therefore, we think these biases are much less important than the considerations above.

Money: There is a lot of money in AI. EAs are not immune to the desire to be rich. Most money comes from improving the state-of-the-art and much less from safety work, at least for now.
Rationalization: It’s easy to say that “someone else would have found the new technique if I didn’t” and it’s hard or impossible to evaluate this counterfactual.
Tired of being negative about the future: Some EAs might be sick of being the bearer of bad news all the time. The message “I think AI is good, we just need to make it safe” makes for much more pleasant human interactions than “You’re gonna kill us all. Stop doing the thing that might make you rich”.

Conclusion:

Our main goal with this post is to highlight different considerations on AI acceleration in the context of AI safety research. Broadly, we think that view 1 (non-acceleration) would be the default if everyone in AI worked on alignment. Views 2 (side-effects) & 3 (control) come from the interaction with non-aligned actors: because other actors continually advance the state of the art, the safety community sometimes has to make decisions that could accelerate AI.

We don’t think any of the first three views from above are obviously superior. However, we think it’s entirely plausible that some considerations might completely dominate others when investigated in more detail.

Appendix

How big is AI safety compared to the rest of the field?

To estimate this, we use the arXiv search engine and search for papers with specific keywords. As a set of safety keywords, we used the suggested set from this Metaculus question and applied it to the year 2021, which yielded 533 results. If we use a general query of “Machine Learning”, “Artificial Intelligence”, we get 35,000 results. With additional keywords, we can easily push the count as high as 47,000 results. Also, note that these numbers seem to increase slightly over time, maybe because people add keywords to some papers.

We are aware that this is not the best method for estimating the share of safety to non-safety work (e.g. compared to estimating funding, or estimating employees) but this might be a good starting point for someone to further explore these questions.
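The arithmetic behind the "about 1 in 100" figure can be sketched as follows. This is only a rough illustration using the counts quoted above; the variable and function names are ours, and we assume (without having checked) that the general query can be treated as the total number of AI papers for 2021.

```python
# Counts reported above for arXiv searches restricted to 2021.
safety_papers = 533
general_papers_low = 35_000   # "Machine Learning", "Artificial Intelligence" query
general_papers_high = 47_000  # same query with additional keywords


def safety_share(safety: int, total: int) -> float:
    """Fraction of papers that match the safety keywords (hypothetical helper)."""
    return safety / total


# A smaller denominator gives the larger share, and vice versa.
share_high = safety_share(safety_papers, general_papers_low)
share_low = safety_share(safety_papers, general_papers_high)

print(f"Safety share: {share_low:.1%} to {share_high:.1%}")
print(f"Roughly 1 in {round(1 / share_low)} to 1 in {round(1 / share_high)} papers")
```

Under these assumptions the share comes out at roughly 1.1% to 1.5%, i.e. the same order of magnitude as 1 in 100. The result is only as good as the keyword sets, so it should be read as a ballpark figure.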

A dilemma for prosaic AI alignment (link)

By Rohin: “If we try to train an AI system directly using such a scheme, it will likely be uncompetitive, since it seems likely that the most powerful AI systems will probably require cutting-edge algorithms, architectures, objectives, and environments, at least some of which will be replaced by new versions from the safety scheme. Alternatively, we could first train a general AI system, and then use our alignment scheme to finetune it into an aligned AI system. However, this runs the risk that the initial training could create a misaligned mesa optimizer, that then deliberately sabotages our finetuning efforts.”

Eleuther AI take on their LLM

They state: “Most (>99%) of the damage of GPT-3’s release was done the moment the paper was published”. Also, Connor has written a lot about it, so you can check that out.