This isn’t a complete answer, but I think it is useful to have a list of prosaic alignment failures to make the basic issue more concrete. Examples include fairness (bad data leading to inferences that reflect bad values), recommendation systems going awry, etc. I think Catherine Olsson has a long list of these, but I don’t know where it is. We should generically expect some sort of amplification of these failures as AI strength increases; it’s conceivable the amplification is in the good direction, but at a minimum we shouldn’t be confident of that.
If someone is skeptical about AIs getting smart enough for this to matter, you can point to the various examples of existing superhuman systems (game-playing programs, dog-breed classifiers that beat experts, medical imaging systems that beat teams of experts, etc.). Narrow superintelligence should already be enough to worry about, depending on how such systems are deployed.
Note: your link is broken.
Fixed, thanks!