When AI companies have human-level AI systems, will they use them for alignment research, or will they use them (mostly) to advance capabilities instead?
It’s not clear that this is a crux for the plan of automating alignment research to work out.
In particular, suppose an AI company currently spends 5% of its resources on alignment research and will continue spending 5% once it has human-level systems. You might think this suffices for alignment to keep pace with capabilities, since the alignment labor force will get more powerful at the same time that alignment gets more difficult (and more important) due to higher levels of capability.
This doesn’t mean the plan will necessarily work; that depends on the relative difficulty of advancing capabilities versus alignment. I’d naively guess that the probability of success just keeps going up the more resources you devote to alignment.
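To make the arithmetic concrete, here is a minimal toy model (my own illustrative sketch, not from the post; the starting labor pool and the yearly doubling rate are made-up assumptions). It just shows that a constant 5% share implies absolute alignment effort growing in lockstep with capabilities effort:

```python
# Toy model of a constant alignment share as effective labor grows.
# All numbers (starting pool, doubling rate) are illustrative assumptions.

ALIGNMENT_FRACTION = 0.05  # the constant 5% share assumed above


def effective_labor(year: int, base: float = 1_000.0, growth: float = 2.0) -> float:
    """Total effective research labor (humans + AIs), assumed to double each year."""
    return base * growth**year


for year in range(6):
    total = effective_labor(year)
    alignment = total * ALIGNMENT_FRACTION
    capabilities = total - alignment
    print(f"year {year}: capabilities={capabilities:8.0f}, alignment={alignment:7.0f}")
# Alignment labor also doubles every year: the *fraction* is flat, but the
# absolute alignment workforce scales with overall AI-driven labor growth.
```

Whether this growth is enough depends, as above, on how fast the difficulty of alignment scales with capability level.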
There are some reasons for thinking automation of labor is particularly compelling in the alignment case relative to the capabilities case:
- There might be scalable solutions to alignment which resolve the research problem more or less indefinitely, while I expect that capabilities research looks more like continuously making better and better algorithms.
- Safety research might benefit relatively more from labor (rather than compute) than capabilities research does. Two reasons for this (a toy model sketch follows this list):
  - Safety currently seems relatively more labor-bottlenecked.
  - We can in principle solve a large fraction of safety/alignment with fully theoretical safety research, without any compute, while it seems harder to do purely theoretical capabilities research.
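One way to formalize the labor-vs-compute point is a Cobb-Douglas production sketch. This is my own hedged assumption, not a model from the post: suppose research output scales as labor^α · compute^(1−α), where the α values and the labor multiplier below are made up for illustration. If alignment has a higher labor share α than capabilities, then multiplying labor (via AI automation) while holding compute fixed speeds alignment up more:

```python
# Cobb-Douglas sketch of labor vs. compute bottlenecks. The labor shares
# (alpha) and the 10x labor multiplier are made-up illustrative assumptions.

AI_LABOR_MULTIPLIER = 10.0  # assume AIs multiply effective research labor 10x
ALPHA_CAPABILITIES = 0.4    # assumed labor share: capabilities is compute-heavy
ALPHA_ALIGNMENT = 0.8       # assumed labor share: alignment is labor-heavy


def speedup(labor_multiplier: float, alpha: float) -> float:
    """Output multiplier when labor scales by labor_multiplier, compute fixed.

    With output = labor**alpha * compute**(1 - alpha), scaling labor by k
    while holding compute constant multiplies output by k**alpha.
    """
    return labor_multiplier**alpha


print(f"capabilities speedup: {speedup(AI_LABOR_MULTIPLIER, ALPHA_CAPABILITIES):.1f}x")
print(f"alignment speedup:    {speedup(AI_LABOR_MULTIPLIER, ALPHA_ALIGNMENT):.1f}x")
# ~2.5x vs ~6.3x: the more labor-bottlenecked field gains more from automation.
```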
I do think that pausing further capabilities progress for even just a few years once we have human-ish-level AIs, while we focus on safety, would massively improve the situation. This currently seems unlikely to happen.
Another way to put this is that automating alignment research is a response in the following dialogue:
Bob: We won’t have enough time to solve alignment because AI takeoff will go very fast due to AIs automating AI R&D (and AI labor generally accelerating AI progress through other mechanisms).
Alice: Actually, as AIs are accelerating AI R&D, they could also be accelerating alignment work, so it’s not clear that accelerating AI progress due to AI R&D acceleration makes the situation very different. As AI progress speeds up, alignment progress might speed up by a similar amount. Or it could speed up by a greater amount due to compute bottlenecks hitting capabilities harder.