AGI in a vulnerable world

Link post

Written by Asya Bergal for AI Impacts.

I’ve been thinking about a class of AI-takeoff scenarios where a very large number of people can build dangerous, unsafe AGI before anyone can build safe AGI. This seems particularly likely if:

  • It is considerably more difficult to build safe AGI than it is to build unsafe AGI.

  • AI progress is software-constrained rather than compute-constrained.

  • Compute available to individuals grows quickly and unsafe AGI turns out to be more of a straightforward extension of existing techniques than safe AGI is.

  • Organizations are bad at keeping software secret for a long time, i.e. it’s hard to get a considerable lead in developing anything.

    • This may be because information security is bad, or because actors are willing to go to extreme measures (e.g. extortion) to get information out of researchers.

Another related scenario is one where safe AGI is built first, but isn’t defensively advantaged enough to protect against harms by unsafe AGI created soon afterward.

The intuition behind this class of scenarios comes from an extrapolation of what machine learning progress looks like now. It seems like large organizations make the majority of progress on the frontier, but smaller teams are close behind and able to reproduce impressive results with dramatically fewer resources. I don’t think the large organizations making AI progress are (currently) well-equipped to keep software secret if motivated and well-resourced actors put effort into acquiring it. There are strong openness norms in the ML community as a whole, which means knowledge spreads quickly. I worry that there are strong incentives for progress to continue to be very open, since decreased openness can hamper an organization’s ability to recruit talent. If compute available to individuals increases a lot, and building unsafe AGI is much easier than building safe AGI, we could suddenly find ourselves in a vulnerable world.

I’m not sure if this is a meaningfully distinct or underemphasized class of scenarios within the AI risk space. My intuition is that there is more attention on incentives failures within a small number of actors, e.g. via arms races. I’m curious for feedback about whether many-people-can-build-AGI is a class of scenarios we should take seriously and if so, what things society could do to make them less likely, e.g. invest in high-effort info-security and secrecy work. AGI development seems much more likely to go existentially badly if more than a small number of well-resourced actors are able to create AGI.

No comments.