When I introduce AI risk to someone, I generally start by talking about how we don’t actually know what’s going on inside our ML systems, how bad we are at making their goals match what we actually want, and how we have no way of verifying that a system has actually adopted the goal we told it to optimize for.
Next I say this is a problem because, as the state of the art advances, we’re going to give these systems more and more power to make decisions for us, and if they’re optimizing for goals different from ours, that could have terrible effects.
I then note that we’ve already seen this happen with YouTube’s recommendation algorithm a few years ago. YouTube told it to maximize time spent on the platform, expecting it would simply show people videos they liked. In reality, it learned that certain videos could radicalize viewers toward a political extreme, and that radicalized viewers were far easier to keep on the platform: just show them videos of people they agree with doing good things & being right, and of people they disagree with doing bad things & being wrong. This has since been fixed, but the point stands: we thought we were telling the system to do one thing, and it did something we really didn’t want. If such a system had more power (for instance, running drone swarms or acting as the CEO-equivalent of a business), it would be far harder both to understand what it was doing wrong and to actually change its code.
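To make the pattern concrete, here is a deliberately toy sketch of reward misspecification. The video names and numbers are entirely invented, and this has nothing to do with YouTube’s real system; the point is only that an optimizer sees the proxy metric we wrote down, never the goal we had in mind:

```python
# Toy illustration of reward misspecification (all data invented).
# The objective we *wrote down*: maximize predicted watch time.
# The goal we *meant*: show people videos they genuinely like.

videos = {
    "cooking_tutorial": {"enjoyment": 8, "watch_minutes": 12},
    "cat_compilation":  {"enjoyment": 9, "watch_minutes": 10},
    "outrage_bait":     {"enjoyment": 3, "watch_minutes": 45},  # infuriating, but hard to look away from
    "us_vs_them_rant":  {"enjoyment": 2, "watch_minutes": 50},
}

def recommend(catalog):
    # The optimizer only ever sees the proxy metric it was given;
    # "enjoyment" is invisible to it.
    return max(catalog, key=lambda v: catalog[v]["watch_minutes"])

print(recommend(videos))  # -> "us_vs_them_rant", the video users enjoy least
```

Nothing in the code is malicious; the divergence falls out automatically once the proxy and the true goal come apart.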
I then say the situation becomes even worse if the AI is smarter than the typical human. There are many people with malicious goals and only average intelligence who nonetheless stay in positions of power by politically outmaneuvering their rivals. If an AI is better than these people at manipulating humans (which seems very likely, given that the thing AI systems are best known for today is manipulating humans into doing what the company they serve wants), then attempting to remove it from power is hopeless.