I think “competitors” for key EA orgs, your point #2, are key here. No matter how smart and committed you are, without competitors there is less pressure on you to correct your faults and become the best version of yourself.
Competitors for key EA orgs will also be well-positioned (in some cases, perhaps in the best possible position) to dialogue with the orgs they compete with, improving them and likely also the EA “public sphere.”
I don’t think an independent auditor that works across EA orgs and mainly focuses on logic would be as high a value-add as competitors for specific orgs. The auditor is not going to be enough of a domain expert to competently evaluate the work of a bunch of different orgs. But I think it’s worth thinking more about. Would be curious if you or anyone has more ideas about the specifics of that.
This is kind of off-topic, but I remember a few years ago, regarding the possibility of competition within AI alignment, I asked Nate, and he said one day he’d like to set up something like competing departments within MIRI. The issue at the time was that when an AI alignment organization responds to the suggestion that it should have competitors by promising to internalize competition, that response still cashes out to “trust us to do a good job.” Things have since changed, what with MIRI being much more reticent to publish much of its research, so it’s almost like “trust us to do a good job” now no matter what MIRI actually does.
Divergence of efforts in AI alignment could lead to an arms race, and that’s bad. At the same time, we can’t discourage competition in AI alignment altogether. Determining what counts as ‘healthy’ competition in AI alignment seems extremely complicated. I just thought I’d bring this up, since competition in AI alignment is at least somewhat necessary while also posing a risk of a race to the bottom, in a way that, for example, bednet distribution doesn’t.
Divergence of efforts in AI alignment could lead to an arms race
Can you be a bit more concrete about what this would look like? Is this because different approaches to alignment can also lead to insights into capabilities, or is there something else more insidious?
Naively it’s easy to see why an arms race in AI capabilities is bad, but competition for AI alignment seems basically good.
An alignment arms race is only bad if there is concomitant capabilities development that would make a wrong alignment protocol counterproductive. Different approaches to alignment can lead to insights into capabilities, and that’s something to be concerned about, but that concern is already captured in existing analyses of capabilities arms-race scenarios.
If there are two or more alignment agencies, but only one of their approaches can fit with advanced AI systems as actually developed, each would race to complete its alignment agenda before the other agencies could complete theirs. This rushing could be especially bad if any agency doesn’t take the time to verify that its approach will actually align AI as intended. In addition, if the competition becomes hostile enough, AI alignment agencies won’t check each other’s work in good faith, and in general there won’t be enough trust for anyone to let anyone else check the work they’ve done on alignment.
If one or more of these agencies racing to the finish line don’t let anyone check their work, and their strategies are invalid or unsound, then implementing one of those strategies in an AI system would fail to lead to alignment when it was expected to. In other words, because of mistakes made along the way, what looks like an alignment competition inadvertently becomes a misalignment race.
I’m not saying competition in AI alignment is either good or bad by default. What I am saying is that there appear to be particular conditions under which competition in AI alignment would make things worse, and that those conditions should be avoided. To summarize, it appears to me at least some of those conditions are:
1. Competition in AI alignment becomes a ‘race.’
2. One or more agencies in AI alignment themselves become untrustworthy.
3. Even if in principle all AI alignment agencies should be able to trust each other, in practice they end up mistrusting each other.
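For what it’s worth, the race dynamic behind these conditions can be sketched as a toy two-player game. All of the numbers below (payoffs, soundness probabilities, the overconfidence parameter) are made-up assumptions for illustration only; the point is just that with accurate beliefs, verifying beats racing, while overconfidence about one’s own rushed work makes racing look like the better move:

```python
# Toy model: two alignment agencies each choose VERIFY or RACE.
# Whoever finishes first has its approach implemented (a tie between
# identical strategies is a coin flip). All numbers are illustrative.

WIN = 10.0        # private value of being the agency whose approach is adopted
ALIGNED = 5.0     # shared value if the implemented approach actually works
DISASTER = -50.0  # shared value if it silently fails

P_TRUE = {"VERIFY": 0.9, "RACE": 0.5}  # true chance the approach is sound
P_OVERCONFIDENT = 0.9                  # "our rushed work is surely fine"

def payoff_a(a: str, b: str, p_own_rushed: float) -> float:
    """Expected payoff to agency A for the strategy profile (a, b).

    p_own_rushed is A's (possibly overconfident) estimate of the
    soundness of its own rushed approach."""
    # Probability that A's approach is the one implemented.
    p_a_implements = 0.5 if a == b else (1.0 if a == "RACE" else 0.0)
    # Soundness of A's approach (maybe an overconfident estimate)
    # versus B's approach (A has no illusions about B's work).
    p_a_sound = P_TRUE["VERIFY"] if a == "VERIFY" else p_own_rushed
    p_b_sound = P_TRUE[b]
    p_sound = p_a_implements * p_a_sound + (1 - p_a_implements) * p_b_sound
    shared = p_sound * ALIGNED + (1 - p_sound) * DISASTER
    return p_a_implements * WIN + shared

# With accurate beliefs, verifying beats racing against a verifier:
print(payoff_a("VERIFY", "VERIFY", P_TRUE["RACE"]))  # ≈ +4.5
print(payoff_a("RACE",   "VERIFY", P_TRUE["RACE"]))  # ≈ -12.5

# With overconfidence about one's own rushed work, racing *looks* better:
print(payoff_a("RACE", "VERIFY", P_OVERCONFIDENT))   # ≈ +9.5
```

Under accurate beliefs the shared downside of a rushed, unverified approach dominates the private value of winning; it is only the mistaken belief that one’s own rushed strategy is sound (condition 2 above) that makes the race seem worth running.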
How can one incentivise the right kind of behaviour here? This isn’t a zero-sum game: we can all win, and we can all lose. How do we impress that on the market, so that the belief that only one of us can win doesn’t make us all more likely to lose?
Off the top of my head:
Some sort of share trading scheme.
Some guarantee from different AI companies that whichever one reaches AI first will employ people from the others.
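As a toy illustration of why mechanisms like these might help (the payoff value and sharing fractions below are made-up assumptions, not a worked-out mechanism design): if whichever organization “wins” has pre-committed, via share swaps or hiring guarantees, to pass some fraction of the winner’s payoff to the others, the private premium on winning shrinks, and with it the incentive to race.

```python
# Toy arithmetic: how payoff-sharing shrinks the private premium on winning.
# WIN and the share fractions are illustrative assumptions.

WIN = 10.0  # private value of being the organization whose approach "wins"

def race_premium(share: float) -> float:
    """Extra private payoff from winning rather than losing, when the
    winner has pre-committed to give `share` of its payoff to the loser."""
    winner_keeps = (1 - share) * WIN
    loser_gets = share * WIN
    return winner_keeps - loser_gets  # = (1 - 2 * share) * WIN

for s in (0.0, 0.25, 0.5):
    print(f"share={s:.2f}: race premium = {race_premium(s):+.1f}")
```

At a 50% share the premium vanishes entirely: winning and losing pay the same, so there is nothing left to race for.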
I basically agree with this (competitors for specific orgs being a higher value-add than a cross-org auditor). I have a bunch of thoughts about healthy competition in the EA sphere that I’ve been struggling to write up.