Note: I'm crossposting an Alignment Forum post, written by David Krueger, which contained a list I thought was worth building out further. There are some good comments at the original link!
I think one reason machine learning researchers don't think AI x-risk is a problem is that they haven't given it the time of day. And on some level, they may be right in not doing so!
We all need to do meta-level reasoning about what to spend our time and effort on. Even giving an idea or argument the time of day requires it to cross a somewhat high bar, if you value your time. Ultimately, in evaluating whether it's worth considering a putative issue (like the extinction of humanity at the hands (graspers?) of a rogue AI), one must rely on heuristics; by giving the argument the time of day, you've already conceded a significant amount of resources to it! Moreover, you risk privileging the hypothesis or falling victim to Pascal's Mugging.
Unfortunately, the case for x-risk from out-of-control AI systems seems to fail many powerful and accurate heuristics. This can put proponents of this issue in a similar position to flat-earth conspiracy theorists at first glance. My goal here is to enumerate heuristics that arguments for AI takeover scenarios fail.
Ultimately, I think machine learning researchers should not refuse to consider AI x-risk when presented with a well-made case by a person they respect or have a personal relationship with, but I'm ambivalent as to whether they have an obligation to consider the case if they've only seen a few headlines about Elon. I do find it a bit hard to understand how one doesn't end up thinking about the consequences of super-human AI, since it seems obviously impactful and fascinating. But I'm a very curious (read "distractable") person...
A list of heuristics that say not to worry about AI takeover scenarios:
Outsiders not experts: This concern is being voiced exclusively by non-experts like Elon Musk, Stephen Hawking, and the talkative crazy guy next to you on the bus.
Ludditism has a poor track record: For every new technology, there's been a pack of alarmist naysayers and doomsday prophets. And then instead of falling apart, the world got better.
ETA: No concrete threat model: When someone raises a hypothetical concern, but can't give you a good explanation for how it could actually happen, it's much less likely to actually happen. Is the paperclip maximizer the best you can do?
It's straight out of science fiction: AI researchers didn't come up with this concern, Hollywood did. Science fiction is constructed based on entertaining premises, not realistic capabilities of technologies.
It's not empirically testable: There's no way to falsify the belief that AI will kill us all. It's purely a matter of faith. Such beliefs don't have good track records of matching reality.
It's just too extreme: Whenever we hear an extreme prediction, we should be suspicious. To the extent that extreme changes happen, they tend to be unpredictable. While extreme predictions sometimes contain a seed of truth, reality tends to be more mundane and boring.
It has no grounding in my personal experience: When I train my AI systems, they are dumb as doorknobs. You're telling me they're going to be smarter than me? In a few years? So smart that they can outwit me, even though I control the very substrate of their existence?
It's too far off: It's too hard to predict the future and we can't really hope to anticipate specific problems with future AI systems; we're sure to be surprised! We should wait until we can envision more specific issues, scenarios, and threats, not waste our time on what comes down to pure speculation.
I'm pretty sure this list is incomplete, and I plan to keep adding to it as I think of or hear new suggestions! Suggest away!!
Also, to be clear, I am writing these descriptions from the perspective of someone who has had very limited exposure to the ideas underlying concerns about AI takeover scenarios. I think a lot of these reactions indicate significant misunderstandings about what people working on mitigating AI x-risk believe, as well as matters of fact (e.g. a number of experts have voiced concerns about AI x-risk, and a significant portion of the research community seems to agree that these concerns are at least somewhat plausible and important).
As a meta-comment, I think it's quite unhelpful that some of these "good heuristics" are written as intentional strawmen where the author doesn't believe the assumptions hold. E.g., the author doesn't believe that there are no insiders talking about X-risk. If you're going to write a post about good heuristics, maybe try to make the good heuristic arguments actually good? This kind of post mostly just alienates me from wanting to engage in these discussions, which is a problem given that I'm one of the more senior AGI safety researchers.
Presumably there are two categories of heuristics, here: ones which relate to actual difficulties in discerning the ground truth, and ones which are irrelevant or stem from a misunderstanding. I think it seems bad that this list implicitly casts the heuristics as being in the latter category, and rather than linking to why each is irrelevant or a misunderstanding it does something closer to mocking the concern.
For example, I would decompose the "It's not empirically testable" heuristic into two different components. The first is something like "it's way easier to do good work when you have tight feedback loops, and a project that relates to a single shot opportunity without a clear theory simply cannot have tight feedback loops." This was the primary reason I stayed away from AGI safety for years, and still seems to me like a major challenge to research work here. [I was eventually convinced that it was worth putting up with this challenge, however.]
The second is something like "only trust claims that have been empirically verified", which runs into serious problems with situations where the claims are about the future, or running the test is ruinously expensive. A claim that "putting lamb's blood on your door tonight will cause your child to be spared" is one that you have to act on (or not) before you get to observe whether or not it will be effective, and so whether or not this heuristic helps depends on whether or not it's possible to have any edge ahead of time on figuring out which such claims are accurate.
Yes, the mocking is what bothers me. In some sense the wording of the list means that people on both sides of the question could come away feeling justified without a desire for further communication: AGI safety folk since the arguments seem quite bad, and AGI safety skeptics since they will agree that some of these heuristics can be steel-manned into a good form.
I wouldn't have used the word "Good" in the title if the author hadn't, and I share your annoyance with the way the word is being used.
As for the heuristics themselves, I think that even the strawmen capture beliefs that many people hold, and I could see it being useful for people to have a "simple" version of a given heuristic to make this kind of list easier to hold in one's mind. However, I also think the author could have achieved simplicity without as much strawmanning.
Phil Trammell makes a similar point about how we should update from the fact that smart people refuse to engage with AI risk arguments:
"The upshot here seems to be that when a lot of people disagree with the experts on some issue, one should often give a lot of weight to the popular disagreement, even when one is among the experts and the people's objections sound insane. Epistemic humility can demand more than deference in the face of peer disagreement: it can demand deference in the face of disagreement from one's epistemic inferiors, as long as they're numerous. They haven't engaged with the arguments, but there is information to be extracted from the very fact that they haven't bothered engaging with them."
I'd second the suggestion of checking out the comments.
There are also a few extra comments on the LessWrong version of the post, which aren't on the Alignment Forum version.
Here's the long-winded comment I made there:
I think this list is interesting and potentially useful, and I think I'm glad you put it together. I also generally think it's a good and useful norm for people to seriously engage with the arguments they (at least sort-of/overall) disagree with.
But I'm also a bit concerned about how this is currently presented. In particular:
This is titled "A list of good heuristics that the case for AI x-risk fails".
The heuristics themselves are stated as facts, not as something like "People may believe that..." or "Some claim that..." (using words like "might" could also help).
A comment of yours suggests you've already noticed this. But I think it'd be pretty quick to fix.
Your final paragraph, a very useful caveat, comes after listing all the heuristics as facts.
I think these things will have relatively small downsides, given the likely quite informed and attentive audience here. But a bunch of psychological research I read a while ago (2015-2017) suggests there could be some degree of downsides. E.g.:
And also:
Based on that sort of research (for a tad more info on it, see here), I'd suggest:
Renaming this to something like "A list of heuristics that suggest the case for AI x-risk is weak" (or even "fails", if you've said something like "suggest" or "might")
Rephrasing the heuristics to be stated as disputable (or even false) claims, rather than facts. E.g., "Some people may believe that this concern is being voiced exclusively by non-experts like Elon Musk, Stephen Hawking, and the talkative crazy guy next to you on the bus." ETA: Putting them in quote marks might be another option for that.
Moving what's currently the final paragraph caveat to before the list of heuristics.
Perhaps also adding sub-points about the particularly disputable dot points. E.g.:
"(But note that several AI experts have now voiced concern about the possibility of major catastrophes from advanced AI systems, although there's still not consensus on this.)"
I also recognise that several of the heuristics really do seem good, and probably should make us at least somewhat less concerned about AI. So I'm not suggesting trying to make the heuristics all sound deeply flawed. I'm just suggesting perhaps being more careful not to end up with some readers' brains, on some level, automatically processing all of these heuristics as definite truths that definitely suggest AI x-risk isn't worthy of attention.
Sorry for the very unsolicited advice! It's just that preventing gradual slides into false beliefs (including from well-intentioned efforts that do actually contain the truth in them!) is sort of a hobby-horse of mine.
[And then I replied to myself with the following]
Also, one other heuristic/proposition that, as far as I'm aware, is simply factually incorrect (rather than "flawed but in debatable ways" or "actually pretty sound") is "AI researchers didn't come up with this concern, Hollywood did. Science fiction is constructed based on entertaining premises, not realistic capabilities of technologies." So there it may also be worth pointing out in some manner that, in reality, quite early on prominent AI researchers raised concerns somewhat similar to those discussed now.
E.g., I. J. Good apparently wrote in 1959:
(I would include the original author's name somewhere in the crosspost, especially at the top.)
Done.
I'm not sure whether this counts as a heuristic or not, but anyway... I think an important point is that it's not clear what the incentive problem is, or (more weakly) that the incentive problem is nowhere near as bad as suggested in classical AI risk arguments. Designers of AI systems have very strong incentives for them not to be as useless as painted in classical AI risk arguments. If you were running a paperclip company, you would very strongly want to figure out a way to make a paperclip maximiser that doesn't use people's atoms to make paperclips, and there would be lots of opportunities to learn as you go on this front; there would be incentives for capabilities to develop in tow with safety.
The incentive problem is made better by the fact that the market for AI development is quite concentrated: there are <10 major players in the field who currently look most likely to make very advanced AGIs. Thus, you only need to get coordination on safety from <10 actors. In contrast, biotechnology is comparatively a complete nightmare from a coordination point of view, requiring coordination among pretty much all states. Climate change also seems difficult on this front, though less bad than bio.