A response to Matthews on AI Risk
Dylan Matthews has written lots of useful exploratory material about EA. Of my favourite journalistic articles about effective altruism, Matthews has written about half. So I was surprised to see that after attending the recent EA Global conference, Matthews wrote that he left worried, largely because of the treatment of AI risk, a topic that seems important to me. Matthews’s writing, though as clear as ever, had some issues with its facts and background research that I think compromised major parts of his argument.
Matthews’ critique of AI risk mitigation was mixed in with a wide range of his experiences and impressions from the event, but his criticism was still more substantial than most, and is already gaining thousands of social media shares. His main points, it seems to me, were that AI risk reduction efforts are:
self-serving
self-defeating
based on Pascal’s Mugging, a riddle in expected value thinking.
Let’s take these in the reverse order.
Conference-goers told Matthews that reducing AI risk is enormously important even if the probability of successfully mitigating it is arbitrarily small. Matthews reasonably identified this as an argument from arbitrarily small probabilities of astronomical gains, known as Pascal’s Mugging. He attributes discussion of the idea to Bostrom, and uses it to challenge the notion of funding an AI risk charity like MIRI. Pascal’s Mugging is an interesting and contentious issue in decision theory. But does Matthews realise that this style of argument was thoroughly disavowed by MIRI years ago? If Matthews had read Bostrom’s piece about Pascal’s Mugging to the end, and followed Bostrom’s link, he would have seen that the idea originated with MIRI’s founder Eliezer Yudkowsky. In Eliezer’s original piece, non-credible offers of astronomical utility were presented not as something to act on, but as an unresolved puzzle.
Matthews says that to want to reduce existential risk, you have to distinguish between a probability of success of 10^-15 and one of 10^-50, and then throws his hands in the air, exclaiming that surely no one could achieve such precision. This would be fine if Matthews had presented arguments that the likelihood of doing useful AI safety research was even lower than, say, one in a hundred.
But Matthews’ reservations about AI safety efforts were only a paragraph in length, and did not have such force. First, he professed some uncertainty about whether AI is possible. However, this should not seriously shrink the odds of reducing AI risk. The median AI researcher estimates even odds of human-level AI between 2035 and 2050, so the prospect that AI is possible and achievable within decades is large enough to worry about. Second, he doubts whether intelligence is sufficient to give a computer dominion over humans. But intelligence is exactly what has always given humans dominion over animals. A superintelligent AI could covertly gain extreme financial power (as trading algorithms already do), hack hardware (as academics already do) and control military devices (as drone software does), at the very least. Third, he asks whether artificial intelligences might function just as tools, rather than as agents. But an ultra-powerful AI tool would still permit one human to wield power over all others, a problem that would still require some combination of technical and other risk-reduction research. Fourth, he asks how we ought to define friendliness in the context of machines. But this question has previously been of interest to MIRI researchers, and will probably return to the fore as work on the underlying mathematical problems facilitates progress. All up, Matthews has weakly argued for uncertainty about the impact of AI and AI safety research, but then supposes that we therefore can’t tell whether the probability of success is 10^-15 or 10^-50. If we’re the kind of people who want to quantify our uncertainty, then this is a complete non sequitur. If we’re uncertain about Matthews’ propositions, we ought to place our guesses somewhere closer to 50%. To do otherwise would be to mistake deep uncertainty for deep scepticism. And if the prospects are decent, then as Rob Wiblin, a speaker at the conference, has previously explained, Pascal’s Mugging is not needed:
“While there are legitimate question marks over whether existential risk reduction really does offer a very high expected value, and we should correct for ‘regression to the mean’, cognitive biases and so on, I don’t think we have any reason to discard these calculations altogether. The impulse to do so seems mostly driven by a desire to avoid the weirdness of the conclusion, rather than actually having a sound reason to doubt it.
A similar activity which nobody objects to on such theoretical grounds is voting, or political campaigning. Considering the difference in vote totals and the number of active campaigners, the probability that someone volunteering for a US presidential campaign will swing the outcome seems somewhere between 1 in 100,000 and 1 in 10,000,000. The US political system throws up significantly different candidates for a position with a great deal of power over global problems. If a campaigner does swing the outcome, they can therefore have a very large and positive impact on the world, at least in subjective expected value terms.
While people may doubt the expected value of joining such a campaign on the grounds that the difference between the candidates isn’t big enough, or the probability of changing the outcome too small, I have never heard anyone say that the ‘low probability, high payoff’ combination means that we must dismiss it out of hand.”
Since there is a wide range of proposed actions for reducing risks from artificial intelligence, some of which I will mention below, it would take extensive argumentation to suggest that the probability of success for any of them was much lower than that of swinging an election. So there would seem to be no need to involve riddles in decision theory to decide whether AI safety research is worth doing.
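To make the comparison concrete, here is a rough expected-value sketch. Every number in it is an assumption chosen for illustration only; none comes from Matthews, Wiblin, or any published estimate.

```python
# Illustrative expected-value comparison; all figures are assumptions.

def expected_value(p_success: float, payoff: float) -> float:
    """Expected value of an action: probability of success times payoff."""
    return p_success * payoff

# Wiblin's campaign-volunteer case: roughly a one-in-a-million chance of
# swinging an election with a large payoff (in arbitrary 'units of good').
volunteer_ev = expected_value(1e-6, 1e9)

# A hypothetical AI safety effort: if its chance of mattering is not
# dramatically lower than the volunteer's, and the payoff at stake is
# larger, no 10^-50-style probabilities are needed for it to compare well.
ai_safety_ev = expected_value(1e-5, 1e10)

print(volunteer_ev, ai_safety_ev)  # 1000.0 100000.0 (arbitrary units)
```

The point of the sketch is only that ordinary, election-sized probabilities are enough to carry the argument; nothing here depends on vanishingly small chances.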
Claiming that AI risk-reduction research would be self-defeating, Matthews says: “It’s hard to think of ways to tackle this problem today other than doing more AI research, which itself might increase the likelihood of the very apocalypse this camp frets over”. But this sells the efforts of AI risk reducers far short. First, they are making political efforts, such as rallying researchers and briefing politicians. Second, there is work on strategy and technological forecasting. Overall, achieving differential progress of safety technology relative to raw intelligence has been the main point of the AI risk reduction project for years. It remains a key fixture—see Russell’s recent talk, where he advocated promoting inverse reinforcement learning while placing less emphasis on the fine-tuning of deep neural networks. But even if Matthews disagreed with Russell’s assessment, that would only be a disagreement with one specific plan for AI risk reduction, not with the validity of the enterprise altogether. There is a wide range of other approaches to the safety problem, such as rallying risk-aware researchers and politicians, and building clear strategies and timelines, that seem even more unambiguously good, and it would be odd—to say the least—if every one of these turned out to increase the risk of apocalypse, and if no new safe courses of action could be discovered either.
Last, Matthews argues that AI risk reduction talk could be self-serving or biased: “At the risk of overgeneralizing, the computer science majors have convinced each other that the best way to save the world is to do computer science research.” He later returns to the issue: “The movement has a very real demographic problem, which contributes to very real intellectual blinders of the kind that give rise to the AI obsession.” The problem here is that AI risk reducers can’t win. If they’re not computer scientists, they’re decried as uninformed non-experts, and if they are computer scientists, they’re promoting and serving themselves. In reality, they’re a healthy mixture. At MIRI, Eliezer originally wanted to build AI, and has had to switch to working on AI safety measures. Bostrom, who began as a philosopher, ended up writing about AI because it seems not only interesting but also an important problem. Interestingly, where Eliezer gets criticised for his overly enthusiastic writing and warmth for science fiction, Bostrom, in order to avoid bias, has avoided it entirely. Russell began as an AI professor, and it was only when he took a sabbatical that he realised that, despite Eliezer’s grating writing style, he was onto something. These stories would seem to describe efforts to overcome bias more than succumbing to it.
Let’s take on one final argument of Matthews’ that also sums up the whole situation. According to Matthews, those concerned about AI risk presuppose that unborn people count equally to people alive now. To begin with, that’s stronger than what Eliezer argues. Eliezer has argued that future people need only be valued within some reasonable factor of present people to be overwhelmingly important. If our unborn children, great-grandchildren and so on for a dozen generations were even 10% as important as us, then they would be more important than the people currently living. If population grows in that time, or we give some moral weight to generations beyond that, or we weight our descendants more equally to ourselves, then their value grows far beyond ours. Even losing only the billions of people currently alive to a disaster, AI or otherwise, would be terrible. The problem also seems neglected, as the comparison and prioritisation of such disasters lacks an established field in which to receive proper academic attention. If future generations also count, then the loss would be far worse still, and the question is just how much.
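To make the arithmetic behind that 10% claim explicit (assuming, purely for illustration, that each of the dozen future generations is roughly the same size as the present one, with population $N$):

$$12 \times 0.1 \times N = 1.2\,N > N,$$

so even at a tenth of the weight per person, the next dozen generations already outweigh everyone alive today.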
If Matthews just wanted to say that it’s a bit awkward that people are still citing Pascal’s Mugging arguments, then that would be fine. But if he was writing a piece whose main focus was his reservations about AI risk, and which would be widely distributed as a critique of it, then he should have stress-tested those reservations against the people working for the organisations being criticised, who were eminently accessible to him at the recent conference. Unfortunately, it’s not straightforward to undo the impact of a poorly thought-through and highly shareable opinion piece. At any rate, Matthews can be one of the first to read this counter-critique, and I’m happy to correct any errors.
In conclusion, covering AI risk is hard. AI risk reduction efforts are run by a mixture of computer science experts and others, and would be criticised whatever their composition. Even if one gives some priority to presently alive people over our descendants, existential risks are important, and we have no reason to be so sceptical that Pascal’s Mugging would be needed to justify AI risk reduction.
Whether this is right or wrong—and Ryan is certainly correct that Dylan Matthews’ piece didn’t offer a knock-down argument against focusing on AI risk, which I doubt it was intended to do—it’s worth noting that the article in question wasn’t only about this issue. It focused primarily on Matthews’ worries about the EA movement as a whole, following EA Global San Francisco. These included a lack of diversity, the risks of the focus on meta-charities and movement building (which can of course be valuable, but can also lead to self-servingness and self-congratulation), and the attitude of some of those focused on x-risks to people focused on global poverty. On this last, here was my comment from Facebook:
I think you are selling Matthews short on Pascal’s Mugging. I don’t think his point was that you must throw up your hands because of the uncertainty, but that he believes friendly AI researchers have approximately the same amount of evidence that AI research done today will have a 10^-15 chance of saving the existence of future humanity as they have for any other infinitesimal but positive chance.
Anyone feel free to correct me, but I believe in such a scenario spreading your prior evenly wouldn’t just mean splitting the difference between 10^-15 and 10^-50, but spreading your belief over all positive outcomes below some reasonable barrier and (potentially) above another* (and this isn’t taking into account the non-zero, even if unlikely, probability that despite caution AI research is indeed speeding up our doom). What those numbers are is very difficult to tell, but if the estimation of those boundaries is off, which given the track record of predictions about future technology is not implausible, then all current donations could end up doing basically nothing. In other words, his critique is not that we must give up in the face of uncertainty, but that the justification for AI risk reduction being valuable right now depends on a number of assumptions with rather large error bars.
Despite what appeared to him to be this large uncertainty, he seemed to encounter many people who brushed aside, or seemingly belittled, all other possible cause areas, and this rubbed him the wrong way. I believe that was his point about Pascal’s Mugging. And while you criticized him for not acknowledging that MIRI does not endorse Pascal’s Mugging reasoning in support of AI research, he never said in the article that they did. He said many people at the conference replied to him with that type of reasoning (and as a fellow attendee, I can attest to a similar experience).
*Normally, I believe, it would be all logically possible outcomes, but obviously it’s unreasonable to believe a $1000 donation, which was his example, has, say, a 25% chance of success given everything we know about how much such work costs, etc. However, where the lower bound on this estimate lies is far less clear.
It’s complicated, but I don’t think it makes sense to have a probability distribution over probability distributions, because it collapses. We should just have a probability distribution over outcomes. We choose our prior estimate for chance of success based on other cases of people attempting to make safer tech.
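A minimal way to see the "collapse" (my notation, not anything from the thread): if $f(p)$ is a density over the unknown chance of success $p$, then for a single success-or-failure outcome

$$\Pr(\text{success}) = \int_0^1 p \, f(p) \, \mathrm{d}p = \mathbb{E}[p],$$

so the whole distribution over probabilities reduces to one number, and we may as well just hold a single distribution over outcomes.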
In fairness, for people who adhere to expected value thinking to the fullest extent (some of whom would have turned out to the conference), arguments purely on the basis of the scope of potential impact would be persuasive. But if such arguments are even annoying folks at EA Global, then people probably ought to stop using them.
I did mean over outcomes. I was referring to this:
That seems mistaken to me, but it could be because I’m misinterpreting it. I was reading it as saying we should split the difference between the two probabilities of success Matthews proposed. However, I thought he was suggesting, and believe it is correct, that we shouldn’t just pick the midpoint between the two, because the smaller number was just an example. His real point, I think, was that any tiny probability of success seems equally reasonable from the vantage point of now. If true, we’d then have to spread our prior evenly over that whole range instead of picking the midpoint between 10^-15 and 10^-50. And given that it’s very difficult to put a lower bound on the reasonable range, while a $1000 donation being a good investment depends on a specific lower bound higher than he believes can be justified with evidence, some people came across as unduly confident.
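For what it's worth, here is a minimal numerical sketch of that sensitivity; the bounds and the choice of priors are purely illustrative assumptions of mine, not figures anyone in the thread proposed:

```python
# Illustrative only: how the expected chance of success depends on the
# assumed bounds and on how the prior is spread across them.
import numpy as np

rng = np.random.default_rng(0)
lower, upper = 1e-50, 1e-2   # hypothetical bounds on the chance of success
n = 1_000_000

# Prior spread evenly over p itself:
p_uniform = rng.uniform(lower, upper, size=n)

# Prior spread evenly over orders of magnitude (log10 of p):
p_loguniform = 10 ** rng.uniform(np.log10(lower), np.log10(upper), size=n)

print(p_uniform.mean())      # ~5e-3: pulled almost entirely to the upper bound
print(p_loguniform.mean())   # ~9e-5: much smaller, and it shifts if the
                             #        assumed lower bound moves
```

With these assumed bounds, the answer swings by orders of magnitude depending on how the prior is spread, which illustrates how much rides on the error bars.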
Let me be very clear, I was not annoyed by them, even if I disagree, but people definitely used this reasoning. However, as I often point out, extrapolating from me to other humans is not a good idea even within the EA community.
I think it’s very good Matthews brought this point up, so the movement can make sure we remain tolerant and inclusive of people who are mostly on our side but differ on a few small points. That applies especially to those focused on x-risk, if he finds them to be the most aggressive, but really I think it should apply to all of us.
That being said, I wish he had himself refrained from being divisive with allegations that x-risk is self-serving for those in CS. Your point about CS concentrators being “damned if you do, damned if you don’t” is great. Similarly, the point (you made on Facebook?) about many people converting from other areas into computer science as they realize the risk is a VERY strong counterargument to his. But more generally, it seems like he is applying asymmetric standards here. The x-risk crowd no more deserves his label of biased and self-serving than the animal rights crowd or the global poverty crowd; many of the people in those subsets also began there, so any rebuttal could label them as self-serving for promoting their favored cause if we wanted. Ad hominem is a dangerous road to go down, and I wish he would refrain from critiquing the people and stick to critiquing the arguments (which actually promotes good discussion from people like you and Scott Alexander in regards to his pseudo-probability calculation, even if we’ve been down this road before).
Predictions about when we will achieve human-level AI have been wildly inaccurate in the past[1]. I don’t think the predictions of current AI researchers are a particularly useful data point.
Assuming that we do in fact achieve human-level AI at some point, then if we’re going to avoid Pascal’s Mugging we need compelling evidence that the path from human-level AI → superintelligent/singularity/end-of-humanity AI is (a) likely (i.e. p >> 10^-50), (b) likely to be bad for humanity, and (c) something we have a credible chance of altering in a way that benefits us.
I’ve seen some good arguments for (b), much less so for (a) and (c). Are there good arguments for these, or am I missing some other line of reasoning (very possible) that makes a non-Pascal’s Mugging argument for AI research?
[1] https://intelligence.org/files/PredictingAI.pdf
I agree that they’re not reliable, but there’s not much better. We’re basically citing the same body of surveys. On a compromise reading, I suppose they suggest that AI will likely happen anywhere between this decade and a few centuries from now, with most of the weight on this century, which sounds right to me.
a) AI is already superhuman in a number of increasingly complex domains, ranging from backgammon and chess to Jeopardy, driving, and image recognition. Computing power is still increasing, though less quickly than before and in a more parallel direction. Algorithms are also getting better. There’s also a parallel path to superintelligence through brain emulation. So AI becoming superhuman in some domains is science fact already.
Once AI gets more intelligent than one human in certain important domains, such as i) information security, ii) trading, iii) manipulating military hardware, or iv) persuasion, it will have significant power. See Scott’s colourful descriptions. So p < 10^-2 cannot be right here.
c) This is harder. The best achievements of the AI risk community so far are making some interesting theoretical discoveries regarding cooperation and decision theory (MIRI), attracting millions in donations from eccentric billionaires (FLI), convening dozens of supportive AI experts (FLI), writing a popular book (Bostrom) and meeting with high levels of government in the UK (FHI) and Germany (CSER). This is great, though none of it yet shows the path to friendly AI. There are suggestions for how to make friendly AI. Even if there weren’t, there’d be a nontrivial chance that this emerging x-risk-aware apparatus would find them, given that it is young and quickly gaining momentum. MIRI’s approach would require a lot more technical exploration, while a brain emulation approach would require far more resources, as well as progress in hardware and brain-scanning technology. I think this is the substance that has to be engaged with to push this discussion forward, and potentially also to improve AI safety efforts.
While Ryan’s rebuttal to (a) below sways me a little bit, I have reservations until I learn more about counterarguments to predictions of an intelligence explosion. The biggest bottleneck here is (c), which gets to the heart of why I wouldn’t be ready to focus primarily on A.I. safety research among all causes right now. Thanks to you and Ryan for also illustrating my concerns, ones I can point to in the future when discussing A.I. safety research.
I haven’t explored the debate over AI risk in the EA movement in depth, so I’m not informed enough to take a strong position. But Kosta’s comment gets at one of the things that has puzzled me—as basically an interested outsider—about the concern for x-risk in EA. A very strong fear of human extinction seems to treat humanity as innately important. But in a hedonic utilitarian framework, humanity is only contingently important to the extent that the continuation of humanity improves overall utility. If an AI or AIs could improve overall utility by destroying humanity (perhaps after determining that humans feel more suffering than pleasure overall, or that humans cause more suffering than pleasure overall, or that AIs feel more pleasure and less suffering than humans and so should use all space and resources to sustain as many AIs as possible), then hedonic utilitarians (and EAs, to the extent that they are hedonic utilitarians) should presumably want AIs to do this.
I’m sure there are arguments that an AI that destroys humanity would end up lowering utility, but I don’t get the impression that x-risk-centered EAs only oppose the destruction of humanity if it turns out humanity adds more pleasure to the aggregate. I would have expected to see EAs arguing something more like, “Let’s make sure an AI only destroys us if destroying us turns out to raise aggregate good,” but instead the x-risk EAs seem to be saying something more like, “Let’s make sure an AI doesn’t destroy us.”
But maybe the x-risk-centered EAs aren’t hedonic utilitarians, or they mostly tend to think an AI destroying humanity would lower overall utility and that’s why they oppose it, or there’s something else that I’m missing – which is probably the case, since I haven’t investigated the debate in detail.
Cautious support for giving an AI control is not opposed to x-risk reduction. An existential risk is one that would curtail the potential of Earth-originating life, so x-risk reduction means protecting that potential. Turning civilisation over to AIs or ems might be inevitable, but it would still be safety-critical.
A non-careful transition to AI is bad for utilitarians and many others because of its irreversibility. Once you codify values (a definition of happiness and whatever else) in an AI, they’re stuck, unless you’ve programmed in a way for the AI to reflect on its values. When combined with Bostrom’s argument in Astronomical Waste, that the eventual awesomeness of a technologically mature civilisation matters more than exactly when it is achieved, this gives a strong reason for caution.
I forgot to mention that your post did help to clarify points and alleviate some of my confusion. Particularly the idea that an ultra-powerful AI tool (which may or may not be sentient) “would still permit one human to wield power over all others.”
The hypothetical of an AI wiping out all of humanity because it figures out (or thinks it figures out) that doing so will increase overall utility is just one extreme possibility. There must be a lot of credible-seeming scenarios opposed to this one, in which an AI could be used to increase overall suffering. (Unless the assumption was that a superintelligent being or device couldn’t help but come around to a utilitarian perspective, no matter how it was initially programmed!)
Also, like Scott Alexander wrote on his post about this, x-risk reduction is not all about AI.
Still, from a utilitarian perspective, it seems like talking about “AI friendliness” should mean friendliness to overall utility, which won’t automatically mean friendliness to humanity or human rights. But again, I imagine plenty of EAs do make that distinction, and I’m just not aware of it because I haven’t looked that far into it. And anyway, that’s not a critique of AI risk being a concern for EAs; at most, it’s a critique of some instances of rhetoric.
Brian Tomasik is a self-described “negative-leaning” hedonic utilitarian who is a prominent thinker for effective altruism. He’s written about how humanity might have values which lead us to generate much suffering in the future, but he also worries a machine superintelligence might end up doing the same. There are myriad reasons he thinks this, which I can’t do justice to here. I believe right now he thinks the best course of action is to try steering the values of present-day humanity, as much of it or at least a crucially influential subset, towards neglecting suffering less. He also believes in doing foundational research to better ascertain the chances of a singleton promulgating suffering throughout space in the future. To this end he both does research with and funds colleagues at the Foundational Research Institute.
His whole body of work concerning future suffering is referred to as “astronomical suffering” considerations, a sort of complementary utilitarian consideration to Bostrom’s astronomical waste argument. You can read more of Tomasik’s work on the far future and related topics here. Note that some of it is advanced and may require background reading to understand all the premises in some of his essays, but he usually provides citations for this.
Worth noting that the negative-leaning position is pretty fringe though, especially in mainstream philosophy. Personally, I avoid it.
If you’re a hedonic utilitarian, you might retain some uncertainty over this, and think it’s best to at least hold off on destroying humanity for a while out of deference to other moral theories, and because of the option value.
Even if someone took the view you describe, though, it’s not clear that it would be a helpful one to communicate, because talking about “AI destroying humanity” does a good job of successfully communicating concern about the scenarios you’re worried about (where AI destroys humanity without this being a good outcome) to other people. As the exceptions are things people generally won’t even think of, caveating might well cause more confusion than clarity.
An ‘option value’ argument assumes that (a) the AI wouldn’t take that uncertainty into account and (b) the AI wouldn’t be able to recreate humanity at some later point if it decided that this was in fact the correct maximising course. Even if it set us back by fully 10,000 years (very roughly the time from the dawn of civilisation up to now), that wouldn’t obviously be that bad in the long run. Indeed, for all we know this could have already happened...
In other words, in the context of an ultra-powerful ultra-well-resourced ultra-smart AI, there are few things in this world that are truly irreversible, and I see little need to give special ‘option value’ to humanity’s or even civilisation’s existence.
Agree with the rest of your post re. rhetoric, and that’s generally what I’ve assumed is going on here when this has puzzled me also.
Agree with this. I was being a bit vague about what the option value was, but I was thinking of something like the value of not locking in a value set that on reflection we would disagree with. I think this covers some but not all of the scenarios Rhys was discussing.