Another probably very silly question: in what sense isn’t AI alignment just plain inconceivable to begin with? I mean, given the premise that we could and did create a superintelligence many orders of magnitude superior to ourselves, how could it even make sense to have any type of fail-safe mechanism to ‘enslave it’ to our own values? A priori, it sounds like trying to put shackles on God. We can barely manage to align ourselves as a species.
If an AI is built to value helping humans, and if that value can remain intact, then it wouldn’t need to be “enslaved”; it would want to be nice of its own accord. However, I agree with what I take to be the thrust of your question, which is that the chances seem slim that an AI would continue to care about human concerns after many rounds of self-improvement. It seems too easy for things to slide askew from what humans wanted, one way or another, especially if there’s a competitive environment with complex interactions among agents.
The main way I currently see AI alignment working out is to create an AI that is itself responsible for alignment. My perspective is that humans are flawed and cannot control (or at least cannot properly control) something that is smarter than they are, just as a single ant cannot control a human.
This in turn also means that we’ll eventually need to give up control and let the AI make the decisions, with no way for a human to interfere.
If this is the case, the direction of AI alignment would be to create this “Guardian AGI”. I’m still not sure how to go about this, and maybe this idea is already out there and people are working on it. Or maybe there are strong arguments against this direction. Either way, it’s an important question and I’d love for other people to give their take on it.
That argument sounds right to me. A recent paper made a similar case: https://arxiv.org/abs/2303.16200
What is the plan in this case? Indefinite Pause and scaling back of compute allowances? (Kind of hate that we might be living in the Dune universe.)
Wish I knew! Corporations and countries are shaped by the same survival-of-the-fittest dynamic, and they’ve turned out less than perfect but mostly fine. AI could be far more intelligent, though, and it seems unlikely that our current oversight mechanisms would naturally handle that case. Technical alignment research seems like the better path.
But what if technical alignment research concludes that alignment of superintelligent AI (SAI) is impossible? That’s the depressing scenario I’m starting to contemplate (I think we should at least have a Manhattan Project on alignment, following a global Pause, to be sure though).