AI risk
I am concerned about AI risk so I don’t like including this, but I do think it “polls badly” among my friends who take GiveWell etc pretty seriously. I wonder if it could be reframed to sound less objectionable.
You know, my take on this is that instead of resisting comparisons to Terminator and The Matrix, they should just be embraced (mostly). “Yeah, like that! We’re trying to prevent those things from happening. More or less.”
The thing is, when you’re talking about something that sounds kind of far out, you can take one of two tactics: you can try to engineer the concept and your language around it so that it sounds more normal/ordinary, or you can just embrace the fact that it is kind of crazy, and use language that makes it clear you understand that perception.
So like, “AI Apocalypse Prevention”?
I think something about properly testing powerful new technologies and making sure they’re not used to hurt people sounds pretty intuitive. I think people intuitively get that anything with military applications can cause serious accidents or be misused by bad actors.
Unfortunately this isn’t a very good description of the concern about AI, and so even if it “polls better” I’d be reluctant to use it.
I’m aware that one problem with AI risk or AI safety as terms is that they don’t distinguish the AI alignment problem, the EA community’s primary concern about advanced AI, from other AI-related ethics or security concerns. I got interesting answers to a question I recently asked on LessWrong about who else has this same attitude towards this kind of conceptual language.
“AI is the new Nuclear Weapons. We don’t want an arms race which leads to unsafe technologies” perhaps?
When I introduce AI risk to someone, I generally start by talking about how we don’t actually know what’s going on inside our ML systems, how bad we are at specifying goals that match what we actually want, and how we have no way of verifying that the systems have actually adopted the goals we’re telling them to optimize for.
Next I say this is a problem because, as the state of the art in AI progresses, we’re going to give these systems more and more power to make decisions for us, and if they’re optimizing for goals different from ours, this could have terrible effects.
I then note that we’ve already seen this happen with YouTube’s algorithm a few years ago: they told it to maximize the time spent on the platform, thinking it would just show people videos they liked. In reality, it learned that there were a few videos it could show people which would radicalize them toward a political extreme, and once people were radicalized it became far easier to predict which videos would keep them on the platform the longest: those which showed people they agreed with doing good things & being right, and people they disagreed with doing bad things & being wrong. This has since been fixed, but the point is that we thought we were telling it to do one thing, and it did something we really didn’t want it to do. If this kind of system had more power (for instance, running drone swarms or acting as the CEO-equivalent of a business), it would be far harder both to understand what it was doing wrong and to actually change its code.
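To make the “we told it one thing, it optimized another” point concrete, here is a minimal toy sketch of a proxy objective going wrong. The video names, watch times, and satisfaction scores are all invented for illustration; this is not a model of YouTube’s actual recommender.

```python
# Toy illustration of a proxy objective: the designers want "show people
# videos they like", but the reward the system is literally given is
# "maximize watch time". All numbers below are made up for the example.

videos = [
    # (name, expected watch minutes, satisfaction the designers actually care about)
    ("cat compilation",        8,  0.9),
    ("cooking tutorial",       12, 0.8),
    ("extreme political rant", 25, 0.2),  # keeps people glued, but isn't what anyone intended
]

def proxy_reward(video):
    """The objective the system is actually told to optimize."""
    _, watch_minutes, _ = video
    return watch_minutes

def intended_value(video):
    """The thing the designers actually wanted."""
    _, _, satisfaction = video
    return satisfaction

# The optimizer faithfully maximizes the proxy...
recommended = max(videos, key=proxy_reward)

# ...and the result scores worst on the objective we actually cared about.
print("recommended:", recommended[0])
print("proxy reward (watch minutes):", proxy_reward(recommended))
print("intended value (satisfaction):", intended_value(recommended))
```

Running this picks the “extreme political rant” every time, because the proxy and the intended goal come apart exactly where it matters.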
I then say the situation becomes even worse if the AI is smarter than the typical human. There are many people with malicious goals who are only about as smart as the average person, yet manage to stay in positions of power by politically outmaneuvering their rivals. If an AI is better than these people at manipulating humans (which seems very likely, given that the thing AIs are best known for nowadays is manipulating humans into doing what the company they serve wants), then attempting to remove it from power becomes hopeless.
Why do they object to it?
My experience has been that people who don’t participate in EA at all receive “AI risk” better, in general, than near-termists in EA do.
I expect long-termists care as much as, if not more than, near-termists in EA about what people outside of EA think of AI risk as a concept.
I also recently asked a related question on LessWrong about the distinction between AI risk and AI alignment as concepts.
Avoid catastrophic industrial/research accidents?