[Discussion] Best intuition pumps for AI safety
When I introduce people to AI safety I usually get one of three responses:
a) “that makes a lot of sense. What can we do about it?”,
b) “I get it rationally, but intuitively I don’t feel it”,
c) “I just don’t buy it. I don’t think machines can be smarter than humans”, “I still think that we can just program them the way we want” or something along these lines.
I get the last response even after giving the standard arguments for why a stop button won’t work, why superhuman intelligence is plausible or why intelligence doesn’t imply morality. So my hypothesis is that they find the thought of unaligned superhuman AI so unintuitive that they are unwilling to actually consider the arguments.
Thus, my question is: What are the best intuition pumps for AI safety?
I’m personally looking for Carl Shulman-style common-sense arguments similar to those in his 80K podcast appearance. He argues that buying insurance for a gain-of-function lab would probably cost billions of dollars, which gives us a better intuition about the risk involved.
I have recently started making the following argument. If you think that AI won’t be smarter than humans but agree that we cannot perfectly control AI in the same way that we cannot perfectly control humans, then you should be willing to pay as much money towards aligning AI as society spends on aligning humans, e.g. terror defense, prisons, police, and the justice system.
According to Investopedia, the US alone spends $175 billion on counterterrorism and $118 billion on police per year.
This paper from 2004 estimates that 70 rich nations spent more than $360 billion combined on the justice system in 1997.
Thus, if we adjust for inflation and missing countries, we will likely get a lower bound of at least $1 trillion spent per year on aligning humans. What we currently spend on AI safety is many orders of magnitude below this.
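For anyone who wants to check the arithmetic, here is a rough sketch of the sum in Python. Only the three dollar figures come from the sources quoted above. The inflation multiplier (about 1.7x since 1997) and the AI safety spending figure are placeholder assumptions of mine, so treat the output as a sanity check rather than an estimate.

```python
import math

# Figures quoted in the post, in billions of USD per year.
US_COUNTERTERRORISM = 175        # Investopedia figure, as quoted
US_POLICE = 118                  # Investopedia figure, as quoted
JUSTICE_70_NATIONS_1997 = 360    # 2004 paper, in 1997 dollars

# ASSUMPTION (not from the post): rough inflation multiplier since 1997.
INFLATION_SINCE_1997 = 1.7


def human_alignment_spend(inflation=INFLATION_SINCE_1997):
    """Naive sum in billions of USD.

    Ignores any overlap between the US police figure and the
    70-nation justice figure, and does not add missing countries.
    """
    return US_COUNTERTERRORISM + US_POLICE + JUSTICE_70_NATIONS_1997 * inflation


def orders_of_magnitude_gap(ai_safety_spend_billions, inflation=INFLATION_SINCE_1997):
    """log10 of the ratio between the naive sum and AI safety spending."""
    return math.log10(human_alignment_spend(inflation) / ai_safety_spend_billions)


total = human_alignment_spend()
print(f"Naive total: about ${total:.0f} billion per year")

# HYPOTHETICAL placeholder: $0.1 billion per year on AI safety.
gap = orders_of_magnitude_gap(0.1)
print(f"Gap under that placeholder: about {gap:.1f} orders of magnitude")
```

Under these assumptions the naive sum comes out a bit below $1 trillion, so the "at least $1 trillion" bound depends on the adjustment for missing countries. The size of the gap depends entirely on what you plug in for AI safety spending.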
Do you think this argument makes sense? Feedback and further suggestions are welcome. Your argument can also address different concerns that people typically have about AI safety.
It seems like most of the work is being done here:
If I were adopting my skeptic-hat, I don’t think I would buy that assumption. (Or like, sure, we can’t perfectly control AI, but your argument assumes that we are at least as unable to control AI as we are unable to control humans, which I wouldn’t buy.) AI systems are programs; programs are (kind of) determined entirely by their source code, which we perfectly control, why should they be as hard to control as humans? You wouldn’t make the same assumption for, say, Google Maps; what’s the difference?
So what would your pitch for skeptics look like? Just ask which assumptions they don’t buy, rebut and iterate?
Yup
There was a paper on this recently:
AI Risk Skepticism
Roman V. Yampolskiy
Abstract:
As a meta-comment, I like the title “What are the best intuition pumps for AI safety?” for this post, rather than “How do you convince skeptics?”. The former feels more like scout mindset to me, the latter like soldier mindset.
An intuition pump takes the form of a question or thought experiment that a person can think through themselves; an argument to “convince” a “skeptic” feels more like an attempt to push someone toward a view without letting them think too carefully.
Makes sense. I changed it. Thanks!
One of my theories here is that it’s helpful to pivot quickly towards “here’s an example concrete research problem that seems hard but not impossible, and people are working on it, and not knowing the solution seems obviously problematic”. This is good for several reasons, including “pattern-matching to serious research, safety engineering, etc., rather than pattern-matching to sci-fi comics”, providing a gentler on-ramp (as opposed to wrenching things like “your children probably won’t die of natural causes” or whatever), providing food for thought, etc. Of course this only works if you can engage in the technical arguments. Brian Christian’s book is the extreme of this approach.
One thing I could imagine happening in these situations is that people close themselves off to object-level arguments to a degree, and maybe for (somewhat) good reason:
to the general public, the idea of AI being a serious (existential) risk is probably still very weird
people may have an impression that believing in such things correlates with being gullible
people may be hesitant towards “being convinced” of something they haven’t fully thought through themselves
I remember once when I was younger talking to a Christian fanatic of sorts, who kept coming up with new arguments for why the Bible must obviously be true due to the many correct predictions it has apparently made, plus some argument about irreducible complexity. In the moment, I couldn’t really tell if/where/why his arguments failed. I found them somewhat hard to follow and just knew the conclusion would be something that is both weird and highly unlikely (for reasons other than his concrete arguments). So my impression then was “there surely is something wrong about his claims, but in this very moment I’m lacking the means to identify the weaknesses”.
I sometimes find myself in similar situations when some person tries to get me to sign something or to buy some product they’re offering. They tend to make very convincing arguments about why I should definitely do it. I often have no good arguments against that. Still, I tend to resist in many of these situations because I haven’t yet heard or had a chance to find the best counterarguments.
When somebody who has thought a lot about AI safety and is very convinced of its importance talks to people to whom this whole area is new and strange, I can imagine similar defenses being present. If this is true, more/better/different arguments may not necessarily be helpful to begin with. Some things that could help:
social proof (“these well respected people and organizations think this is important”)
slightly weaker claims that people have an easier time agreeing with
maybe some meta-level argument about why the unintuitiveness is misguided (although this could probably also be taken as an attack)