This comment will focus on the specific approach you set out, rather than the high-level question, although I’m also interested in seeing comments from others on how difficult it is to solve alignment, and why.
The approach you’ve set out resembles Coherent Extrapolated Volition (CEV), an earlier proposal by Yudkowsky which Bostrom discusses in Superintelligence. I’m not sure what the consensus is on CEV, but here are a few thoughts I’ve carried around since I last thought about it (several years ago now).
How do we choose the correct philosophers and intellectuals? For example, would we want Nietzsche or Wagner on the list, given their (arguable) links to the Nazis?
How do we extrapolate? (i.e. how do we determine whether the intellectuals on the list would want a given action to happen?)
For example, Plato arguably favoured dictatorship over democracy, but recent history suggests that democracies have fared better than dictatorships. Should we extrapolate that Plato would prefer democracy if he lived today? How would we know?
Another example, perhaps a bit closer to home: some philosophers might argue that under some forms of utilitarianism the ends justify the means, and that it is appropriate to steal resources in order to fund activities which are in the best long-term interests of humanity. Even if those philosophers say they don’t believe this, they might just be pandering to society’s expectations, and the AI might extrapolate that they would endorse it if unfettered.
In other words, I don’t think this clearly guards against power-seeking behaviour.
“How do we choose the correct philosophers?” Choose nearly all of them; don’t be selective. Because the AI must get approval from every philosopher, this will be a severe constraint, but it ensures that the AI’s actions will be unambiguously good. Even if the AI has to make contentious extrapolations about some of the philosophers, I don’t think it would be free to do anything awful.
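As a toy illustration of the unanimity rule I have in mind (a minimal sketch: the philosophers, actions, and the `approves` predicate are all hypothetical stand-ins for whatever extrapolation the AI would actually do):

```python
# Toy sketch of the unanimity constraint: an action is permitted only if
# every modelled philosopher is extrapolated as approving of it.
# The `approves(philosopher, action)` predicate is a placeholder.

def permitted_actions(candidate_actions, philosophers, approves):
    """Return only the actions that every philosopher approves of."""
    return [
        action
        for action in candidate_actions
        if all(approves(philosopher, action) for philosopher in philosophers)
    ]

# Stand-in data: only the action approved by everyone survives the filter.
philosophers = ["utilitarian", "deontologist", "virtue ethicist"]
approvals = {
    ("utilitarian", "cure disease"): True,
    ("deontologist", "cure disease"): True,
    ("virtue ethicist", "cure disease"): True,
    ("utilitarian", "seize resources"): True,
    ("deontologist", "seize resources"): False,
    ("virtue ethicist", "seize resources"): False,
}
approves = lambda p, a: approvals[(p, a)]

print(permitted_actions(["cure disease", "seize resources"], philosophers, approves))
# -> ['cure disease']
```

The point of the sketch is just that the filter is conjunctive: adding more philosophers can only shrink the set of permitted actions, never expand it.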
Under that constraint, I wonder if the AI would be free to do anything at all.
Ok, maybe don’t include every philosopher. But I think it would be good to include people with a diverse range of views: utilitarians, deontologists, animal rights activists, human rights activists, etc. I’m uncomfortable with the thought of AI unilaterally imposing a contentious moral philosophy (like extreme utilitarianism) on the world.
Even with my constraints, I think AI would be free to solve many huge problems, e.g. climate change, pandemics, natural disasters, and extreme poverty.
Assuming it could be implemented, I definitely think your approach would help prevent the imposition of serious harms.
I still intuitively think the AI could just get stuck, though, given the range of contradictory views even in fairly mainstream moral and political philosophy. It would need a process for making decisions under moral uncertainty, which might entail putting additional weight on the views of certain philosophers. But because this is (as far as I know) a very recent area of ethics, what little existing work there is could be quite badly flawed.
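To make the worry concrete, here is a minimal sketch of the sort of weighting procedure I have in mind (roughly “maximise expected choiceworthiness”); the views, weights, and scores below are all made up, and choosing the weights is exactly where flawed existing work could do damage:

```python
# Toy sketch of decision-making under moral uncertainty: each moral view
# scores each action, and the views are weighted by a credence assigned to
# them. All numbers below are invented for illustration only.

def expected_choiceworthiness(action, scores, weights):
    """Credence-weighted average of how each view rates the action."""
    return sum(weights[view] * scores[view][action] for view in weights)

weights = {"utilitarian": 0.5, "deontologist": 0.3, "virtue ethicist": 0.2}
scores = {
    "utilitarian":     {"cure disease": 0.9, "seize resources": 0.6},
    "deontologist":    {"cure disease": 0.8, "seize resources": -1.0},
    "virtue ethicist": {"cure disease": 0.7, "seize resources": -0.5},
}

actions = ["cure disease", "seize resources"]
best = max(actions, key=lambda a: expected_choiceworthiness(a, scores, weights))
print(best)  # -> cure disease
```

Unlike the unanimity filter above, this procedure always picks something, but the output depends entirely on how the credences are set.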
I think a superintelligent AI would be able to find solutions that involve no moral uncertainty. For example, I can’t imagine a philosopher who would object to bioengineering a cure for a disease.
I don’t think you need to commit to including everyone. If the point holds for any subset of people, then the argument you gesture at in your post goes through. I have had similar thoughts to those you suggest in the post. If we gave the AI the goal ‘do what Barack Obama would do if properly informed and at his most lucid’, I don’t really see why we should have high confidence in a treacherous turn or in the AI misbehaving catastrophically. The main response to this seems to be pointing to examples, from limited computer games, of AI not doing what we intend. I agree something similar might happen with advanced AI, but I don’t see why it is guaranteed to, or why any of the arguments I have seen lend weight to any particular probability estimate of catastrophe.
It also seems like increased capabilities would in a sense increase alignment (with Obama), because more advanced AIs would have a better idea of what Obama would do.