Another problem with the differential development argument is that even if you buy that “alignment can be solved”, it’s not like a vaccine you can apply to all AI so that it all suddenly turns beneficial. Other people, companies, and nations will surely continue to train and deploy AI models, and why would they all apply your alignment principles or tools?
I’ve heard two arguments in response to this concern: (1) the first aligned AGI will then kill off all other forms of AGI and make all AI-related problems go away, and (2) there are more good people than bad people in the world, so once techniques for alignment become available, everyone will naturally adopt them. Both of these seem like fairy tales to me.
In other words, the premise that any amount of AI capabilities research is OK so long as we “solve alignment” has serious issues, and you don’t even have to believe in AGI for this to bother you.
Re (1), this relates to the strategy-stealing assumption: your aligned AI can use whatever strategies unaligned AIs use to maintain and grow their power. Killing the competition is one strategy, but there are many others, including defensive actions and earning money/resources.
Edit: I implicitly assumed that it’s okay to have unaligned AIs as long as you have enough aligned ones around. For example, we may not need aligned companies if we have (minimally) aligned government and law enforcement.
I don’t think the strategy-stealing assumption holds here: it’s pretty unlikely that we’ll build a fully aligned ‘sovereign’ AGI even if we solve alignment; it seems easier to make something corrigible/limited instead, i.e. something that is by design less powerful than would be possible if we were just pushing capabilities.
I don’t mean to imply that we’ll build a sovereign AI (I doubt it too).
Corrigible is closer to what I meant: corrigible, but not necessarily limited. I.e. minimally intent-aligned AIs that won’t kill you but, by the strategy-stealing assumption, can still compete with unaligned AIs.
I’m curious to dig into this a bit more and hear why these seem like fairy tales to you (I’m not saying that I disagree...).
I wonder if this comes down to different ideas of what “solve alignment” means (I see you put it in quotes...).
1) Are you perhaps thinking that realistic “solutions to alignment” will carry a significant alignment tax? Otherwise, why wouldn’t ~everyone adopt alignment techniques (ones that align AI systems with their preferences/values)?
2) Another source of ambiguity: there are a lot of different things people mean by “alignment”, including:
* AI is aligned with objectively correct values
* AI is aligned with a stakeholder and consistently pursues their interests
* AI does a particular task as intended/expected
Is it one of these in particular (or something else) that you have in mind here?
I agree that it’s not trivial to assume everyone will use aligned AI.
Let’s suppose the goal of alignment research is to make aligned AI equally easy/cheap to build as unaligned AI, i.e. no additional cost. If we then suppose aligned AI also has a nonzero benefit, people are incentivized to use it.
The above seems to be the perspective in this alignment research overview https://www.effectivealtruism.org/articles/paul-christiano-current-work-in-ai-alignment.
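To make the incentive argument concrete, here is a toy sketch in Python. The decision rule and numbers are my own illustrative assumptions (not anything from the thread or the linked overview): if the alignment tax is ~zero and aligned AI carries any extra benefit, a cost-benefit-driven actor prefers the aligned option.

```python
# Toy model of the adoption-incentive argument above (illustrative only).
# Assumption: actors deploy whichever option maximizes benefit minus cost.

def prefers_aligned(cost_aligned, cost_unaligned, benefit_aligned, benefit_unaligned):
    """Return True if a cost-benefit-driven actor would deploy the aligned system."""
    return (benefit_aligned - cost_aligned) > (benefit_unaligned - cost_unaligned)

# Zero alignment tax (equal cost) plus any nonzero extra benefit -> aligned wins.
print(prefers_aligned(cost_aligned=1.0, cost_unaligned=1.0,
                      benefit_aligned=1.1, benefit_unaligned=1.0))  # True

# A significant alignment tax can flip the decision, which is the worry in (1) above.
print(prefers_aligned(cost_aligned=1.5, cost_unaligned=1.0,
                      benefit_aligned=1.1, benefit_unaligned=1.0))  # False
```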
More ink could be spilled on whether aligning AI has a nonzero commercial benefit. I feel that efforts like prompting and InstructGPT are suggestive. But this may not apply to all alignment efforts.