This is kind of off-topic, but I remember a few years ago I asked Nate about the possibility of competition within AI alignment, and he said one day he’d like to set up something like competing departments within MIRI. The issue with that at the time was that an AI alignment organization responding to “you should have competitors” by internalizing the competition, rather than accepting external rivals, still amounts to “trust us to do a good job”. Things have since changed, what with MIRI being much more reticent to publish much of its research, so now it’s “trust us to do a good job” no matter what MIRI actually does.
Divergence of efforts in AI alignment could lead to an arms race, and that’s bad. At the same time, we can’t discourage competition in AI alignment altogether. Determining what counts as ‘healthy’ competition in AI alignment seems extremely complicated. I just thought I’d bring this up, since competition in AI alignment is at least somewhat necessary while also posing a risk of a race to the bottom, in a way that, for example, bednet distribution doesn’t.
Divergence of efforts in AI alignment could lead to an arms race
Can you be a bit more concrete about what this would look like? Is it because different approaches to alignment can also lead to insights into capabilities, or is there something else more insidious?
Naively, it’s easy to see why an arms race in AI capabilities is bad, but competition in AI alignment seems basically good.
An alignment arms race is only bad if there is concomitant capabilities development that would make a wrong alignment protocol counterproductive. Different approaches to alignment can lead to insights into capabilities, and that’s something to be concerned about, but that concern is already captured in existing analyses of capabilities arms-race scenarios.
If there are two or more alignment agencies, but only one of their approaches can fit the advanced AI systems actually being developed, each has an incentive to race to complete its alignment agenda before the others complete theirs. This rushing could be especially bad if any agency fails to take the time to verify that its approach will actually align AI as intended. In addition, if the competition becomes hostile enough, alignment agencies won’t check each other’s work in good faith, and in general there won’t be enough trust for anyone to let others review their alignment work.
If one or more of the agencies racing to the finish line doesn’t let anyone check its work, and its strategy is invalid or unsound, then implementing that strategy in an AI system would fail to produce alignment where alignment was expected. In other words, because of unchecked mistakes, what looks like an alignment competition inadvertently becomes a misalignment race.
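To make the dynamic concrete, here’s a minimal sketch with entirely made-up numbers: two agencies each either ‘verify’ (slower, more likely sound) or ‘rush’ (faster, less likely sound), the faster agency’s approach is the one deployed, and the outcome depends on whether that approach is actually sound.

```python
import random

# All parameters below are hypothetical, chosen only to illustrate the dynamic.
P_SOUND = {"verify": 0.9, "rush": 0.4}  # chance a finished approach actually works
SPEED   = {"verify": 1.0, "rush": 2.0}  # relative rate of finishing first

def simulate(strategy_a, strategy_b, trials=100_000):
    """Fraction of trials in which the deployed approach turns out sound."""
    aligned = 0
    for _ in range(trials):
        # The faster agency wins the race in proportion to its speed.
        p_a_wins = SPEED[strategy_a] / (SPEED[strategy_a] + SPEED[strategy_b])
        winner = strategy_a if random.random() < p_a_wins else strategy_b
        if random.random() < P_SOUND[winner]:
            aligned += 1
    return aligned / trials

for a in ("verify", "rush"):
    for b in ("verify", "rush"):
        print(f"{a:6} vs {b:6}: {simulate(a, b):.3f}")
# With these numbers, (verify, verify) yields ~0.90 good outcomes and
# (rush, rush) only ~0.40: racing destroys most of the value for everyone.
```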
I’m not saying competition in AI alignment is either good or bad by default. What I am saying is that there appear to be particular conditions under which competition in AI alignment makes things worse, and that those conditions should be avoided. To summarize, at least some of those conditions are:
1. Competition in AI alignment becomes a ‘race.’
2. One or more agencies in AI alignment themselves become untrustworthy.
3. Even if in principle all AI alignment agencies should be able to trust each other, in practice they end up mistrusting each other.
How can one incentivise the right kind of behaviour here? This isn’t a zero-sum game: we can all win, or we can all lose. How do we inculcate the market with that knowledge, such that the belief that only one of us can win doesn’t make us all more likely to lose?
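One way to make the non-zero-sum structure concrete is a toy stag-hunt payoff matrix (all numbers purely illustrative): mutual cooperation is best for everyone, but if you expect the other side to race, racing becomes your best response, which is exactly how the belief that only one can win pushes everyone toward the bad equilibrium.

```python
# Toy stag-hunt payoffs for two alignment efforts (illustrative numbers only).
# Keys: (my_strategy, their_strategy); values: (my_payoff, their_payoff).
PAYOFFS = {
    ("cooperate", "cooperate"): (10, 10),  # shared, verified progress: all win
    ("cooperate", "race"):      (0, 3),    # scooped by an unchecked rival
    ("race",      "cooperate"): (3, 0),
    ("race",      "race"):      (1, 1),    # mutual rushing: everyone nearly loses
}

def best_response(their_strategy):
    """My payoff-maximizing reply to a fixed opponent strategy."""
    return max(("cooperate", "race"),
               key=lambda mine: PAYOFFS[(mine, their_strategy)][0])

print(best_response("cooperate"))  # -> cooperate (mutual cooperation is stable)
print(best_response("race"))       # -> race (expecting a race makes racing safer)
```

On this reading, the suggestions below can be seen as attempts to change the off-diagonal payoffs so that cooperating stays the best response even against a racer.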
Off the top of my head:
Some sort of share trading scheme.
Some guarantee from different AI companies that whichever one reaches AI first will employ people from the others.