The argument for doom by default seems to rest on a default misunderstanding of human values as the programmer attempts to communicate them to the AI.
I don’t think this is correct. The argument rests on AIs having any values which aren’t human values (e.g. maximising paperclips), not just misunderstood human values.
Maximising paperclips is a misunderstood human value. Some lazy factory owner says, gee, wouldn’t it be great if I could get an AI to make my paperclips for me? Then they build an AGI and ask it to make paperclips, and it turns everything into paperclips, because its utility function doesn’t reflect its owner’s true desire to also have a world.
If there is a flaw here, it’s probably somewhere in thinking that AGI will get built as some sort of intermediate tool, and that it will be easy to rub the lamp and ask the genie to do something in easy-to-misunderstand natural language.
Presumably the programmer will make some effort to embed the right set of values in the AI. If this is an easy task, doom is probably not the default outcome.
AI pessimists have argued human values will be difficult to communicate due to their complexity. But as AI capabilities improve, AI systems get better at learning complex things.
Both the instrumental convergence thesis and the complexity of value thesis are key parts of the argument for AI pessimism as it’s commonly presented. Are you claiming that they aren’t actually necessary for the argument to be compelling? (If so, why were they included in the first place? This sounds a bit like justification drift.)