I think writing this sort of thing up is really good; thanks for this, Nuno. :)
I also feel uneasy about the social pressure in my particular social bubble. I think that the social pressure is for me to just accept Nate Soares’ argument here that Carlsmith’s method is biased, rather than to probabilistically incorporate it into my calculations. As in “oh, yes, people know that conjunctive chains of reasoning have been debunked, Nate Soares addressed that in a blogpost saying that they are biased”.
It sounds like your social environment might be conflating four different claims:
“I personally find Nate’s arguments for disjunctiveness compelling, so I have relatively high p(doom).”
“Nate’s arguments have debunked the idea that AI risk is conjunctive, in the sense that he’s given a completely ironclad argument for this that no remotely reasonable person could disagree with.”
“Nate and Eliezer have debunked the idea that multiple-stages-style reasoning is generically reliable (e.g., in the absence of very strong prior reasons to think that a class of scenarios is conjunctive / unlikely on priors).”
“Nate has shown that it’s unreasonable to treat anything as conjunctive, and that we can therefore take for granted that AI risk is non-conjunctive without even thinking about the real world or the problem structure.”
I agree with 1 and 3, but very much disagree with 2 and 4:
Regarding 2: I think Nate’s arguments are good, but “debunked” just seems too strong here.
Regarding 4: I’d claim that it’s clearly just correct to think about how conjunctive vs. disjunctive things like AGI risk are, as opposed to assuming a priori that they must be one or the other. (Hence Nate giving arguments for disjunctiveness, as opposed to just asserting it as a brute fact.)
If your social environment is promoting 2 and 4, then kudos for spotting this and resisting the pressure to say false stuff.
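To make the disagreement between claims 3 and 4 a bit more concrete, here’s a toy illustration of the mechanical point Nate’s “multi-stage” posts press on: in a conjunctive decomposition, small per-stage shadings compound multiplicatively. The numbers below are invented purely for illustration, not Carlsmith’s or Nate’s actual estimates.

```python
# Toy illustration only: how small per-stage shading compounds in a
# conjunctive decomposition. Numbers are invented for this example.

stages = 6          # number of steps that all have to happen
p_per_stage = 0.8   # your "true" conditional credence at each stage
p_shaded = 0.7      # the same credence shaded down by 0.1 at each stage

print(round(p_per_stage ** stages, 3))  # 0.262
print(round(p_shaded ** stages, 3))     # 0.118 -- small per-stage bias roughly halves the total
```

None of this settles whether AGI doom actually has a conjunctive structure; it just shows why arguing about the real-world problem structure (claim 3) matters far more than adopting a blanket rule either way (claim 4).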
As deep learning attains more and more success, I think that some of the old concerns port over. But I am not sure which ones, to what extent, and in which context. This leads me to reduce some of my probability.
On net I think the deep learning revolution increases p(doom), mostly because it’s a surprisingly opaque and indirect way of building intelligent systems, that gives you relatively few levers to control internal properties of the reasoner you SGDed your way to.
Deep learning also increases the probability of things like “roughly human-level AGIs run around for a few years before we see anything strongly superhuman”, but this doesn’t affect my p(doom) much because I don’t see a specific path for leveraging this to prevent the world from being destroyed when we do reach superintelligence. Not knowing what’s going on in your AGI system’s brain continues to be a real killer.
Some concerns that apply to a more formidable Eurisko but which may not apply by default to near-term AI systems:
Alien values
Maximalist desire for world domination
Convergence to a utility function
Very competent strategizing, of the “treacherous turn” variety
Self-improvement
etc.
Some of those points may be true, but they don’t update me a ton on p(doom) unless I see a plausible, likely-to-happen-in-real-life path from “nearer-term systems have less-scary properties” to “this prevents us from building the scary systems further down the road”.
I think the usual way this “well, GPT-3 isn’t that scary” reasoning goes wrong is that it mistakes a reason to have longer timelines (“current systems and their immediate successors seem pretty weak”) for a reason to expect humanity to never build AGI at all. It’s basically repeating the same structural mistake that caused people to not seriously think about AGI risk in 1980, 1990, 2000, and 2010: “AI today isn’t scary, so it seems silly to worry about future AI”.
Longer timelines do give us more time to figure out alignment, which lowers p(doom); but you still have to actually do the alignment work in order to cash out those microhopes, so it matters how much more hopeful you feel with an extra (e.g.) five years of time for humanity to work on the alignment problem.
For a given operationalization of AGI, e.g., good enough to be forecasted on, I think that there is some possibility that we will reach such a level of capabilities, and yet that this will not be very impressive or world-changing, even if it would have looked like magic to previous generations.
If this means, e.g., “AI that can do everything a human physicist or heart surgeon can do, that can’t do other sciences and can’t outperform humans”, then I’d be really, really surprised if we ever see a state of affairs like that.
What, concretely, are some examples of world-saving things you think AI might be able to do 5+ years before world-endangering AGI is possible?
(Or, if you think the pre-danger stuff won’t be world-saving, then why does it lower your p(doom)? Just by giving humanity a bit more time to notice the smoke?)
Or, in other words, once I concede that AGI could be as transformative as the industrial revolution, I don’t have to concede that it would be maximally transformative.
This is true, but I don’t use the industrial revolution as a frame for thinking about AGI. I don’t see any reason to think AGI would be similar to the industrial revolution, except insofar as they’re both “important technology-mediated things that happened in history”. AGI isn’t likely to have a similar impact to steam engines because science and reasoning aren’t similar capabilities to a steam engine.
no community which has spent the same amount of effort looking for arguments against.
I think there might have been more effort spent seeking arguments against, but it’s shallower effort: the kinds of arguments you find when you’re trying to win a water-cooler argument or write a pop-sci editorial are different from the kinds of arguments you find when you’re working full-time on the thing.
But my individual impression is that the selection effects argument packs a whole lot of punch behind it.
I don’t think it has that much punch, partly just because I don’t think EAs and rationalists are as unreasonable as Christian apologists (e.g., we’re much more inclined to search for flaws in our own arguments). And partly because the field is just pretty old at this stage, and a lot of effort has gone into arguments in both directions by now.
Even if there were 10x as much effort going into “seek out new reasons to worry about AGI” as there were going into “seek out new reasons to relax about AGI”, this matters a lot less when you’re digging for the 1000th new argument than when you’re digging for the 10th. There just aren’t as many stones left unturned, and you can look at the available arguments yourself rather than wondering whether there are huge considerations the entire planet is missing.
It is interesting that when people move to the Bay area, this is often very “helpful” for them in terms of updating towards higher AI risk. I think that this is a sign that a bunch of social fuckery is going on.
I think there’s some social and psychological fuckery going on, but less than occurs in the opposite direction in the non-EA, non-rationalist, etc. superculture.
Partly I just think that because of my own experience. The idea of AIs rising up to overthrow humanity and kill us all just sounds really weird to me. I think this is inherently a really hard risk to take seriously on an emotional level—because it’s novel, because it sounds like science fiction, because it triggers our “anthropomorphize this” mental modules while violating many of the assumptions that we’re used to taking for granted in human-like reasoners.
Sitting with the idea for sufficiently long, properly chewing on it and metabolizing it, considering and analyzing concrete scenarios, and having peers to talk about this stuff with—all of that makes me feel much more equipped to weigh the arguments for and against the risks without leaning on shallow pattern-matching.
But that’s just autobiography; if you had an easy time seriously entertaining AI risk before joining the community, and now you feel “pressured to accept AGI risk” rather than “free to agree or disagree”, then by default I assume you’re just right about your own experience.
The actual drive in the background was a lot more like “Keep running workshops that wow people” with an additional (usually consciously (!) hidden) thread about luring people into being scared about AI risk in a very particular way and possibly recruiting them to MIRI-type projects.
Even from the very beginning CFAR simply COULD NOT be honest about what it was doing or bring anything like a collaborative tone to its participants. We would infantilize them by deciding what they needed to hear and practice basically without talking to them about it or knowing hardly anything about their lives or inner struggles, and we’d organize the workshop and lectures to suppress their inclination to notice this and object.
That paints a pretty fucked up picture of early-CFAR’s dynamics. I’ve heard a lot of conflicting stories about CFAR in this respect, usually quite vague (and there are nonzero ex-CFAR staffers who I just flatly don’t trust to report things accurately). I’d be interested to hear from Anna or other early CFAR staff about whether this matches their impressions of how things went down. It unfortunately sounds to me like a pretty realistic way this sort of thing can play out.
On my view, CFAR workshops can do a lot of useful things even if they aren’t magic bullets that instantly make all participants way more rational. E.g.:
Community-building is just plain useful, and shared culture and experiences is legit just an excellent way to build it. There’s no reason this has to be done in a dishonest way, and I suspect that some of the pressure to misclassify “community-building” stuff as “rationality-enhancing” stuff comes from the fact that the benefits of the former sound squishier.
I’ve been to two CFAR workshops (a decent one in late 2013, and an excellent one in late 2018; neither seemed dishonest or trying-to-wow-me), and IME a lot of the benefit came from techniques, language, and felt affordances to try lots of things in the months and years after the workshop—along with a bunch of social connections where it was common knowledge “it’s normal to try to understand and solve your problems, X Y Z are standard ways to get traction on day-to-day problems, etc.”.
The latter also doesn’t seem like a thing that requires any lying, and I’m very skeptical of the idea that it’s useful-for-epistemics-on-net to encourage workshop participants to fuzz out parts of their models, heavily defer to Mysterious Unquestionable Rationality Experts, treat certain questions as inherently bad to think about or discuss, etc. In my experience, deliberate fuzziness in one narrow part of people’s models often leaks out to distort thinking about many other things.
The switch Anna pushed back in 2016 to CFAR being explicitly about xrisk was in fact a shift to more honesty; it just abysmally failed the is/ought distinction in my opinion.
I don’t understand this part; how does it fail the is/ought distinction?
“Rationality for its own sake for the sake of existential risk” is doublespeak gibberish.
I think this is awkwardly/cutely phrased, but is contentful (and actually a good idea) rather than being doublespeak.
The way I’d put it (if I’m understanding the idea right) is: “we’re trying to teach rationality in a high-integrity, principled way, because we think rationality is useful for existential risk reduction”.
“Rationality for its own sake” is a different sort of optimization target than “rationality when it feels locally useful for an extrinsic goal”, among other things because it’s easy to be miscalibrated about when rationality is more or less useful instrumentally. Going all-in on rationality (and not trying to give yourself wiggle room to be more or less rational at different times) can be a good idea, even if at base we’re justifying policies like this for instrumental reasons on the meta level, as opposed to justifying the policy with rationales like “it’s just inherently morally good to always be rational, don’t ask questions, it just is”.
Philosophical summersaults won’t save the fact that the energy behind a statement like that is more about controlling others’ impressions than it is about being goddamned honest about what the desire and intention really is.
Hm, this case seems like a stretch to me, which makes me a bit more skeptical about the speaker’s other claims. But it’s also possible I’m misunderstanding what the intent behind “Rationality for its own sake for the sake of existential risk” originally was, or how it was used in practice?
For instance, I think that having 70 or 80%+ probabilities on AI catastrophe within our lifetimes is probably just incorrect, insofar as a probability can be incorrect.
I think you’re wrong about this, though the “within our lifetime” part is important: I think the case for extremely high risk is very strong, but reasoning about timelines seems way more uncertain to me.
(And way less important: if the human race is killed a few years after I die of non-AGI causes, that’s no less of a tragedy.)
Surely not. Neither of those make any arguments about AI, just about software generally. If you literally think those two are sufficient arguments for concluding “AI kills us with high probability” I don’t see why you don’t conclude “Powerpoint kills us with high probability”.
Yep! To be explicit, I was assuming that general intelligence is very powerful, that you can automate it, and that it isn’t (e.g.) friendly by default.
I’m not sure I understand what statements like “general intelligence is very powerful” mean even though it seems to be a crucial part of the argument. Can you explain more concretely what you mean by this? E.g. What is “general intelligence”? What are the ways in which it is and isn’t powerful?
By “general intelligence” I mean “whatever it is that lets human brains do astrophysics, category theory, etc. even though our brains evolved under literally zero selection pressure to solve astrophysics or category theory problems”.
Human brains aren’t perfectly general, and not all narrow AIs/animals are equally narrow. (E.g., AlphaZero is more general than AlphaGo.) But it sure is interesting that humans evolved cognitive abilities that unlock all of these sciences at once, with zero evolutionary fine-tuning of the brain aimed at equipping us for any of those sciences. Evolution just stumbled into a solution to other problems, that happened to generalize to billions of wildly novel tasks.
To get more concrete:
AlphaGo is a very impressive reasoner, but its hypothesis space is limited to sequences of Go board states rather than sequences of states of the physical universe. Efficiently reasoning about the physical universe requires solving at least some problems (which might be solved by the AGI’s programmer, and/or solved by the algorithm that finds the AGI in program-space; and some such problems may be solved by the AGI itself in the course of refining its thinking) that are different in kind from what AlphaGo solves.
E.g., the physical world is too complex to simulate in full detail, unlike a Go board state. An effective general intelligence needs to be able to model the world at many different levels of granularity, and strategically choose which levels are relevant to think about, as well as which specific pieces/aspects/properties of the world at those levels are relevant to think about.
More generally, being a general intelligence requires an enormous amount of laserlike strategicness about which thoughts you do or don’t think: a large portion of your compute needs to be ruthlessly funneled into exactly the tiny subset of questions about the physical world that bear on the question you’re trying to answer or the problem you’re trying to solve. If you fail to be ruthlessly targeted and efficient in “aiming” your cognition at the most useful-to-you things, you can easily spend a lifetime getting sidetracked by minutiae / directing your attention at the wrong considerations / etc.
And given the variety of kinds of problems you need to solve in order to navigate the physical world well / do science / etc., the heuristics you use to funnel your compute to the exact right things need to themselves be very general, rather than all being case-specific. (Whereas we can more readily imagine that many of the heuristics AlphaGo uses to avoid thinking about the wrong aspects of the game state, or thinking about the wrong topics altogether, are Go-specific heuristics.)
GPT-3 is a very impressive reasoner in a different sense (it successfully recognizes many patterns in human language, including a lot of very subtle or conjunctive ones like “when A and B and C and D and E and F and G and H and I are all true, humans often say X”), but it too isn’t doing the “model full physical world-states and trajectories thereof” thing (though an optimal predictor of human text would need to be a general intelligence, and a superhumanly capable one at that).
Some examples of abilities I expect humans to only automate once we’ve built AGI (if ever):
The ability to perform open-heart surgery with a high success rate, in a messy non-standardized ordinary surgical environment.
The ability to match smart human performance in a specific hard science field, across all the scientific work humans do in that field.
In principle, I suspect you could build a narrow system that is good at those tasks while lacking the basic mental machinery required to do par-human reasoning about all the hard sciences. In practice, I very strongly expect humans to find ways to build general reasoners to perform those tasks, before we figure out how to build narrow reasoners that can do them. (For the same basic reason evolution stumbled on general intelligence so early in the history of human tech development.)
(Of course, if your brain has all the basic mental machinery required to do other sciences, that doesn’t mean that you have the knowledge required to actually do well in those sciences. An artificial general intelligence could lack physics ability for the same reason many smart humans can’t solve physics problems.)
When I say “general intelligence is very powerful”, a lot of what I mean is that science is very powerful, and that having all the sciences at once is a lot more powerful than the sum of each science’s impact.
(E.g., because different sciences can synergize, and because you can invent new scientific fields and subfields, and more generally chain one novel insight into dozens of other new insights that critically depended on the first insight.)
Another large piece of what I mean is that general intelligence is a very high-impact sort of thing to automate because AGI is likely to blow human intelligence out of the water immediately, or very soon after its invention.
80K gives the (non-representative) example of how AlphaGo and its immediate successors compared to the human ability range on Go:
In the span of a year, AI had advanced from being too weak to win a single [Go] match against the worst human professionals, to being impossible for even the best players in the world to defeat.
I expect “general STEM AI” to blow human science ability out of the water in a similar fashion. Reasons for this include:
Software (unlike human intelligence) scales with more compute.
Current ML uses far more compute to find reasoners than to run reasoners. This is very likely to hold true for AGI as well. (A rough illustration of the size of this gap is sketched just after this list.)
We probably have more than enough compute already, and are mostly waiting on new ideas for how to get to AGI efficiently, as opposed to waiting on more hardware to throw at old ideas.
Empirically, humans aren’t near a cognitive ceiling, and even narrow AI often suddenly blows past the human reasoning ability range on the task it’s designed for. It would be weird if scientific reasoning were an exception.
Empirically, human brains are full of cognitive biases and inefficiencies. It’s doubly weird if scientific reasoning is an exception even though it’s visibly a mess with tons of blind spots, inefficiencies, motivated cognitive processes, and historical examples of scientists and mathematicians taking decades to make technically simple advances.
Empirically, human brains are extremely bad at some of the most basic cognitive processes underlying STEM. E.g., consider that human brains can barely do basic mental math at all.
Human brains underwent no direct optimization for STEM ability in our ancestral environment, beyond things like “can distinguish four objects in my visual field from five objects”. In contrast, human engineers can deliberately optimize AGI systems’ brains for math, engineering, etc. capabilities.
More generally, the sciences (and many other aspects of human life, like written language) are a very recent development. So evolution has had very little time to refine and improve on our reasoning ability in many of the ways that matter.
Human engineers have an enormous variety of tools available to build general intelligence that evolution lacked. This is often noted as a reason for optimism that we can align AGI to our goals, even though evolution failed to align humans to its “goal”. It’s additionally a reason to expect AGI to have greater cognitive ability, if engineers try to achieve great cognitive ability.
The hypothesis that AGI will outperform humans has a disjunctive character: there are many different advantages that individually suffice for this, even if AGI doesn’t start off with any other advantages. (E.g., speed, math ability, scalability with hardware, skill at optimizing hardware...)
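As a back-of-the-envelope sketch of the “far more compute to find reasoners than to run reasoners” point, here are commonly cited rough figures for GPT-3 (treat everything as order-of-magnitude illustration, not a precise claim):

```python
# Order-of-magnitude sketch using publicly cited rough figures for GPT-3.

train_flops = 3.14e23               # approximate total training compute
params = 1.75e11                    # ~175B parameters
infer_flops_per_token = 2 * params  # standard approximation: ~2 FLOPs per parameter per generated token

tokens_equivalent = train_flops / infer_flops_per_token
print(f"{tokens_equivalent:.1e}")   # ~9e11: training compute ~= running the model for ~a trillion tokens
```

The upshot for the argument above: once you’ve paid the one-time cost of finding a reasoner, the marginal cost of running many copies of it (or running it faster) is comparatively tiny, which is part of why “software scales with more compute” bites.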
Nitpick: doesn’t the argument you made also assume that there’ll be a big discontinuity right before AGI? That seems necessary for the premise about “extremely novel software” (rather than “incrementally novel software”) to hold.
I do think that AGI will be developed by methods that are relatively novel. Like, I’ll be quite surprised if all of the core ideas are >6 years old when we first achieve AGI, and I’ll be more surprised still if all of the core ideas are >12 years old.
(Though at least some of the surprise does come from the fact that my median AGI timeline is short, and that I don’t expect us to build AGI by just throwing more compute and data at GPT-n.)
Separately and with more confidence, I’m expecting discontinuities in the cognitive abilities of AGI. If AGI is par-human at heart surgery and physics, I predict that this will be because of “click” moments where many things suddenly fall into place at once, and new approaches and heuristics (both on the part of humans and on the part of the AI systems we build), not just because of a completely smooth, incremental, and low-impact-at-each-step improvement to the knowledge and thought-habits of GPT-3.
“Superhuman AI isn’t just GPT-3 but thinking faster and remembering more things” (for example) matters for things like interpretability, since if we succeed shockingly well at finding ways to reasonably thoroughly understand what GPT-3’s brain is doing moment-to-moment, this is less likely to be effective for understanding what the first AGI’s brain is doing moment-to-moment insofar as the first AGI is working in very new sorts of ways and doing very new sorts of things.
I’m happy to add more points like these to the stew so they can be talked about. “Your list of reasons for thinking AGI risk is high didn’t explicitly mention X” is a process we can continue indefinitely long if we want to, since there are always more background assumptions someone can bring up that they disagree with. (E.g., I also didn’t explicitly mention “intelligence is a property of matter rather than of souls imparted into particular animal species by God”, “AGI isn’t thousands of years in the future”, “most random goals would produce bad outcomes if optimized by a superintelligence”...)
Which specific assumptions should be included depends on the conversational context. I think it makes more sense to say “ah, I personally disagree with [X], which I want to flag as a potential conversational direction since your comment didn’t mention [X] by name”, as opposed to speaking as though there’s an objectively correct level of granularity.
Which was responding to a claim in the OP that no EA can rationally have a super high belief in AGI risk:
For instance, I think that having 70 or 80%+ probabilities on AI catastrophe within our lifetimes is probably just incorrect, insofar as a probability can be incorrect.
The challenge the OP was asking me to meet was to point at a missing model piece (or a disagreement, where the other side isn’t obviously just being stupid) that can cause a reasonable person to have extreme p(AGI doom), given other background views the OP isn’t calling obviously stupid. (E.g., the OP didn’t say that it’s obviously stupid for anyone to have a confident belief that AGI will be a particular software project built at a particular time and place.)
The OP didn’t issue a challenge to list all of the relevant background views (relative to some level of granularity or relative to some person-with-alternative-views, which does need to be specified if there’s to be any objective answer), so I didn’t try to explicitly write out obvious popularly held beliefs like “AGI is more powerful than PowerPoint”. I’m happy to do that if someone wants to shift the conversation there, but hopefully it’s obvious why I didn’t do that originally.
Fair! Sorry for the slow reply, I missed the comment notification earlier.
I could have been clearer in what I was trying to point at with my comment. I didn’t mean to fault you for not meeting an (unmade) challenge to list all your assumptions—I agree that would be unreasonable.
Instead, I meant to suggest an object-level point: that the argument you mentioned seems pretty reliant on a controversial discontinuity assumption—enough that the argument alone (along with other, largely uncontroversial assumptions) doesn’t make it “quite easy to reach extremely dire forecasts about AGI.” (Though I was thinking more about 90%+ forecasts.)
(That assumption—i.e. the main claims in the 3rd paragraph of your response—seems much more controversial/non-obvious among people in AI safety than the other assumptions you mention, as evidenced by researchers criticizing it and researchers doing prosaic AI safety work.)
(Minor: I really liked your top-level comment but almost didn’t read this second comment because I didn’t immediately realize you split up your comment due to (I suppose) running out of space. Maybe worth it to add a “[cont.]” or something in such cases in future.)
Ok, so thinking about this, one trouble with answering your comment is that you have a self-consistent worldview which has contrary implications to some of the stuff I hold, but I feel that you are not giving answers with reference to stuff that I already hold, but rather to stuff that further references that worldview.
Let me know if this feels way off.
So I’m going to just pick one object-level argument and dig in to that:
As deep learning attains more and more success, I think that some of the old concerns port over. But I am not sure which ones, to what extent, and in which context. This leads me to reduce some of my probability.
On net I think the deep learning revolution increases p(doom), mostly because it’s a surprisingly opaque and indirect way of building intelligent systems, that gives you relatively few levers to control internal properties of the reasoner you SGDed your way to.
Well, I think that the question is: increased p(doom) compared to what? E.g., what were your default expectations before the DL revolution?
Compared to equivalent progress in a seed AI which has a utility function
Deep learning seems like it has some advantages, e.g.: it is [doing the kinds of things that were reinforced during its training in the past], which seems safer than [optimizing a utility function programmed into its core, where we don’t really know how to program utility functions].
E.g., GPT-3 seems wildly more safe than a seed AGI that had already reached that level of capabilities
“Oh yes, we just had to put in a desire to predict the world, an impulse for curiosity, in addition to the standard self-preservation drive and our experimental caring for humans module, and then just let it explore the internet” sounds fairly terrifying.
I feel that you are not giving answers with reference to stuff that I already hold, but rather to stuff that further references that worldview.
Sounds right to me! I don’t know your worldview, so I’m mostly just reporting my thoughts on stuff, not trying to do anything particularly sophisticated.
what were your default expectations before the DL revolution?
I personally started thinking about ML and AGI risk in 2013, and I didn’t have much of a view of “how are we likely to get to AGI?” at the time.
My sense is that MIRI-circa-2010 wasn’t confident about how humanity would get to AI, but expected it would involve gaining at least some more (object-level, gearsy) insight into how intelligence works. “Just throw more compute at a slightly tweaked version of one of the standard old failed approaches to AGI” wasn’t MIRI’s top-probability scenario.
From my perspective, humanity got “unlucky” in three different respects:
AI techniques started working really well early, giving us less time to build up an understanding of alignment.
Techniques started working for reasons other than us acquiring and applying gearsy new insights into how reasoning works, so the advances in AI didn’t help us understand how to do alignment.
And the specific methods that worked are more opaque than most pre-deep-learning AI, making it hard to see how you’d align the system even in principle.
E.g., GPT-3 seems wildly more safe than a seed AGI that had already reached that level of capabilities
Seems like the wrong comparison; the question is whether AGI built by deep learning (that’s at the “capability level” of GPT-3) is safer than seed AGI (that’s at the “capability level” of GPT-3).
I don’t think GPT-3 is an AGI, or has the same safety profile as baby AGIs built by deep learning. (If there’s an efficient humanly-reachable way to achieve AGI via deep learning.) So an apples-to-apples comparison would either think about hypothetical deep-learning AGI vs. hypothetical seed AGI, or it would look at GPT-3 vs. hypothetical narrow AI built on the road to seed AGI.
If we can use GPT-3 or something very similar to GPT-3 to save the world, then it of course matters that GPT-3 is way safer than seed AGI. But then the relevant argument would look something like “maybe the narrow AI tech that you get on the path to deep-learning AGI is more powerful and/or more safe than the narrow AI tech that you get on the path to seed AGI”, as opposed to “GPT-3 is safer than a baby god” (the latter being something that’s true whether or not the baby god is deep-learning-based).
Sure! It would depend on what you mean by “an argument against AI risk”:
If you mean “What’s the main argument that makes you more optimistic about AI outcomes?”, I made a list of these in 2018.
If you mean “What’s the likeliest way you think it could turn out that aligning AGI is unnecessary in order to do a pivotal act / initiate an as-long-as-needed reflection?”, I’d currently guess it’s using strong narrow-AI systems to accelerate you to Drexlerian nanotechnology (which can then be used to build powerful things like “large numbers of fast-running human whole-brain emulations”).
If you mean “What’s the likeliest way you think it could turn out that humanity’s current trajectory is basically OK / no huge actions or trajectory changes are required?”, I’d say that the likeliest scenario is one where AGI kills all humans, but this isn’t a complete catastrophe for the future value of the reachable universe because the AGI turns out to be less like a paperclip maximizer and more like a weird sentient alien that wants to fill the universe with extremely-weird-but-awesome alien civilizations. This sort of scenario is discussed in Superintelligent AI is necessary for an amazing future, but far from sufficient.
If you mean “What’s the likeliest way you think it could turn out that EAs are focusing too much on AI and should focus on something else instead?”, I’d guess it’s if we should focus more on biotech. E.g., this conjunction could turn out to be true: (1) AGI is 40+ years away; (2) by default, it will be easy for small groups of crazies to kill all humans with biotech in 20 years; and (3) EAs could come up with important new ways to avoid disaster if we made this a larger focus (though it’s already a reasonably large focus in EA).
Another way it could be bad that EAs are focusing on AI is if EAs are accelerating AGI capabilities / shortening timelines way more than we’re helping with alignment (or otherwise increasing the probability of good outcomes).
It’s mostly a summary of Yudkowsky/Bostrom ideas, but with a bunch of the ideas garbled and misunderstood.
Mitchell says that one of the core assumptions of AI risk arguments is “that any goal could be ‘inserted’ by humans into a superintelligent AI agent”. But that’s not true, and in fact a lot of the risk comes from the fact that we have no idea how to ‘insert’ a goal into an AGI system.
The paperclip maximizer hypothetical here is a misunderstanding of the original idea. (Though it’s faithful to the version Bostrom gives in Superintelligence.) And the misunderstanding seems to have caused Mitchell to misunderstand a bunch of other things about the alignment problem. Picking one of many examples of just-plain-false claims:
“And importantly, in keeping with Bostrom’s orthogonality thesis, the machine has achieved superintelligence without having any of its own goals or values, instead waiting for goals to be inserted by humans.”
The article also says that “research efforts on alignment are underway at universities around the world and at big AI companies such as Google, Meta and OpenAI”. I assume Google here means DeepMind, but what alignment research at Meta does Mitchell have in mind??
Also: “Many researchers are actively engaged in alignment-based projects, ranging from attempts at imparting principles of moral philosophy to machines, to training large language models on crowdsourced ethical judgments.”
… That sure is a bad picture of what looks difficult about alignment.
But my individual impression is that the selection effects argument packs a whole lot of punch behind it.
I don’t think it has that much punch, partly just because I don’t think EAs and rationalists are as unreasonable as Christian apologists (e.g., we’re much more inclined to search for flaws in our own arguments). And partly because the field is just pretty old at this stage, and a lot of effort has gone into arguments in both directions by now.
Idk, I can’t help but notice that your title at MIRI is “Research Communications”, but there is nobody paid by the “Machine Intelligence Skepticism Institute” to put forth claims that you are wrong.
Since we’re talking about p(doom), this sounds like a claim that my job at MIRI is to generate arguments for worrying more about AGI, and we haven’t hired anybody whose job it is to generate arguments for worrying less.
Well, I’m happy to be able to cite that thing I wrote with a long list of reasons to worry less about AGI risk!
I’m not claiming the list is exhaustive or anything, or that everyone would agree with what should go on such a list. It’s the reasons that update me the most, not the reasons that I’d expect to be most convincing to some third party.
But the very existence of public lists like this seems like something your model was betting against. Especially insofar as it’s a novel list with items MIRI came up with itself that reflect MIRI-ish perspectives, not a curated list of points other groups have made.
I like the post you linked but I’m not sure this is much of a rebuttal to Nuno’s point. This is a single post, saying the situation is not maximally bad, against a much larger corpus of writings and communications by you and MIRI emphasizing risks from AGI.
If you think that AGI risk is extremely high (as I do), then the intellectually honest thing to do is to write out the main considerations that cause you to think it’s that high. This includes any major considerations that cause you to not think it’s even higher.
One of Nuno’s points in the OP was, paraphrasing: ‘I worry that the doomers are only citing strong arguments for doom, and not citing strong arguments against doom. Either because (a) doomers are flinching away from thinking about arguments against doom, or because (b) they’re strategically withholding arguments against doom in the hope of manipulating others into having doomier views. Whichever of (a) or (b) is true, it follows that I should discount doomer arguments somewhat as filtered evidence.’
The existence of my “reasons I’m less doomy than I could have been” post is meaningful evidence against (a) and (b). It can still be part of an eleven-dimensional chess game to (consciously or unconsciously) trick people, but it’s nontrivial Bayesian evidence that we’re doing the epistemically cooperative thing.
If we had a huge number of posts pushing for optimism, then that would be even more evidence against (a) and (b). But that would also be evidence that we have way lower p(doom), or that we’re trying to trick people into thinking we have lower p(doom) by giving excessive time to arguments that we think are way weaker than the counter-arguments.
Be wary of setting a trap where there’s no possible way for you to take claims of high p(doom) seriously, because when someone gives more arguments for doom than for hope you assume they’re trying to trick you by filtering out secret strong reasons for hope, and when someone gives you similar numbers of arguments for doom and for hope you assume they can’t really think p(doom) is that high.
Yeah, I don’t think that your paraphrase was accurate. I don’t need to posit a conscious (strategically withholding) or subconscious (flinching away) conspiracy, in the same way that I don’t need a conscious conspiracy to explain why there are so many medieval proofs of god. So the problem may not be at the individual but at the collective level.
Be wary of setting a trap where there’s no possible way for you to take claims of high p(doom) seriously, because when someone gives more arguments for doom than for hope you assume they’re trying to trick you by filtering out secret strong reasons for hope, and when someone gives you similar numbers of arguments for doom and for hope you assume they can’t really think p(doom) is that high.
I briefly touched on this at the end of the post and in this comment thread. In short:
Eehh, you can’t just ignore your evidence being filtered
Strong kinds of evidence (e.g., empirical evidence, mathematical proof, very compelling arguments) would still move my needle; weak or fuzzy arguments much less
I can still process evidence from my own eyes, e.g., observe progress, tap into sources that I think are less filtered, think about this for myself, etc.
I can still “take claims of high p(doom) seriously” in the sense of believing that people reporting them hold that as a sincere belief.
Though that doesn’t necessarily inspire a compulsion to defer to those beliefs.
That all seems right to me, and compatible with what I was saying. The part of Sphor’s comment that seemed off to me was “against a much larger corpus of writings and communications by you and MIRI emphasizing risks from AGI”: one blog post is a small data point to weigh against lots of other data points, but the relevant data to weigh it against isn’t “MIRI wrote other things that emphasize risks from AGI” in isolation, as though “an organization or individual wrote a lot of arguments for X” on its own is strong reason to discount those arguments as filtered.
The thing doing the work has to be some background model of the arguers (or of some process upstream of the arguers), not a raw count of how often someone argues for a thing. Otherwise you run into the “damned if you argue a lot for X, damned if you don’t argue a lot for X” problem.
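To make that concrete, here’s a toy Bayes calculation (illustrative numbers only) showing that the update is driven by your model of how the arguer generates reports, not by a raw count of pro-doom arguments published:

```python
# Toy filtered-evidence model. All probabilities are invented for illustration.

def posterior(prior, p_report_given_h, p_report_given_not_h):
    """Update P(doom) after observing 'this person published an argument for doom'."""
    joint_h = prior * p_report_given_h
    joint_not_h = (1 - prior) * p_report_given_not_h
    return joint_h / (joint_h + joint_not_h)

prior = 0.3

# Model A: the arguer reports pro-doom arguments much more readily in worlds where doom is likely.
print(round(posterior(prior, 0.8, 0.2), 2))    # 0.63 -- substantial update

# Model B: the arguer reports pro-doom arguments no matter what (fully filtered source).
print(round(posterior(prior, 0.95, 0.95), 2))  # 0.3 -- no update at all
```

In both cases the observed output is the same; everything is carried by the likelihoods, i.e., by the background model of the arguer.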
That’s not the point Nuno was making above. He said in the OP that there are selection effects at the level of arguments, not that doomers were trying to trick people. You replied to it saying that this argument doesn’t have much punch, because you trust EAs and Rationalists and think the field is established enough to have had arguments flowing in both directions. He replied by pointing out that MIRI promotes AI risk as an organization and there’s no equivalent organization putting out arguments against AI risk. You said this doesn’t apply because you once wrote a post saying not to be maximally pessimistic about AI. I said this doesn’t mean much because the vast majority of writing by you and MIRI emphasizes AI risks. I don’t know what your response to this specific line of criticism is.
Thanks for winding back through the conversation so far, as you understood it; that helped me understand better where you’re coming from.
He replied by pointing out that MIRI promotes AI risk as an organization and there’s no equivalent organization putting out arguments against AI risk.
Nuno said: “Idk, I can’t help but notice that your title at MIRI is ‘Research Communications’, but there is nobody paid by the ‘Machine Intelligence Skepticism Institute’ to put forth claims that you are wrong.”
I interpreted that as Nuno saying: MIRI is giving arguments for stuff, but I cited an allegation that CFAR is being dishonest, manipulative, and one-sided in their evaluation of AI risk arguments, and I note that MIRI is a one-sided doomer org that gives arguments for your side, while there’s nobody paid to raise counter-points.
My response was a concrete example showing that MIRI isn’t a one-sided doomer org that only gives arguments for doom. That isn’t a proof that we’re correct about this stuff, but it’s a data point against “MIRI is a one-sided doomer org that only gives arguments for doom”. And it’s at least some evidence that we aren’t doing the specific dishonest thing Nuno accused CFAR of doing, which got a lot of focus in the OP.
I said this doesn’t mean much because the vast majority of writing by you and MIRI emphasizes AI risks.
The specific thing you said was: “I like the post you linked but I’m not sure this is much of a rebuttal to Nuno’s point. This is a single post, saying the situation is not maximally bad, against a much larger corpus of writings and communications by you and MIRI emphasizing risks from AGI.”
My reply mostly wasn’t an objection to “I’m not sure this is much of a rebuttal to Nuno’s point” or “This is a single post”. My objection was to “against a much larger corpus of writings and communications by you and MIRI emphasizing risks from AGI”. As I said to Nuno upthread:
[… O]ne blog post is a small data point to weigh against lots of other data points, but the relevant data to weigh it against isn’t “MIRI wrote other things that emphasize risks from AGI” in isolation, as though “an organization or individual wrote a lot of arguments for X” on its own is strong reason to discount those arguments as filtered.
The thing doing the work has to be some background model of the arguers (or of some process upstream of the arguers), not a raw count of how often someone argues for a thing. Otherwise you run into the “damned if you argue a lot for X, damned if you don’t argue a lot for X” problem.
Regarding those models of MIRI and other orgs in the space, and of upstream processes that might influence us:
I think you and Nuno are just wrong to think of MIRI as “an org organized around trying to make people more pessimistic about AI outcomes”, any more than FHI is an org organized around trying to make people think anthropics, whole-brain emulation, and biosecurity are really important. Those are things that people at FHI tend to believe, but that’s because researchers there (rightly or wrongly) looked at the arguments and reached that conclusion, while at the same time looking at other topics and concluding they weren’t very important (e.g., brain-computer interfaces, nuclear fusion, and asteroid risk). If FHI researchers had evaluated the arguments differently, the organization would have continued existing, just with a different set of research interests.
Similarly, MIRI was originally an accelerationist org, founded with a goal of advancing humanity to AGI as quickly as possible. We had an incentive to think AGI is important, but not (AFAICT) to think AGI is scary. “Oh wait, AGI is scary” is a conclusion Eliezer came to in the first few years of MIRI’s existence, via applying more scrutiny to his assumption that AGI would go great by default.
I’m all in favor of asking questions like “What were the myopic financial incentives in this case?” and seeing how much behavior this predicts. But I think applying that lens in an honest way should sometimes result in “oh weird, that behavior is the opposite of the one I’d naively have predicted with this model”, as opposed to it being a lens that can explain every observation equally.
MIRI deciding that AGI is scary and risky, in an excited techno-optimist social environment and funding landscape, seems like a canonical example of something different from that going on.
(Which doesn’t mean MIRI was right, then or now. People can be wrong for reasons other than “someone was paying me to be wrong”.)
Our first big donor, Peter Thiel, got excited about us because he thought of us as techno-optimists, and stopped supporting us within a few years when he concluded we were too dour about humanity’s prospects. This does not strike me as a weird or surprising outcome, except insofar as it’s weird someone in Thiel’s reference class took an interest in MIRI even temporarily.
I don’t think MIRI has more money today than if we were optimists about AI. I also don’t think we crystal-ball-predicted that funders like Open Phil would exist 5 or 15 years in the future, or that they’d have any interest in “superintelligent AI destroys the world” risk if they did exist. Nor do I think we’ve made more money, expanded more, felt better about ourselves, or had more-fun social lives by being open in 2020-2023 about the fact that we’ve become even more pessimistic and think things are going terribly, both at MIRI and in the alignment field at large.
Speaking to the larger question: is there a non-epistemic selection effect in the world at large, encouraging humanity to generate more arguments for AI risk than against it? This does not follow from the mere observation of a bunch of arguments for AI risk, because that observation is also predicted by those arguments being visibly correct, and accepted and shared because of their correctness.
For different groups, I’d guess that...
Random academics probably have a myopic incentive to say things that sound pretty respectable and normal, as opposed to wild and sensational. Beyond that, I don’t think there’s a large academia-wide incentive to either be pro-tech or anti-tech, or to have net-optimistic or net-pessimistic beliefs about AI in particular. There is a strong incentive to just ignore the topic, since it’s hard to publish papers about it in top journals or conferences.
Journalists do have an incentive to say things that sound sensational, both positive (“AI could transform the world in amazing positive way X!”) and negative (“AI could transform the world in horrifying way Y”). I’d guess there’s more myopic incentive to go negative than positive, by default. That said, respected newspapers tend to want to agree with academics and sound respectable and normal, which will similarly encourage a focus on small harms and small benefits. I don’t know how these different forces are likely to balance out, though I can observe empirically that I see a wide range of views expressed, including a decent number of articles worrying about AI doom.
The social network MIRI grew out of (transhumanists, Marginal-Revolution-style libertarians, extropians, techno-utopians, etc.) has strong myopic social incentives to favor accelerationism, “tech isn’t scary”, “regulation and safety concerns cause way more harm than the tech itself”, etc. The more optimistic you are about the default outcome of rapid technological progress, the better.
Though I think these incentives have weakened over the last 20 years, in large part due to MIRI persuading a lot of transhumanists to worry about misaligned AGI in particular, as a carve-out from our general techno-optimism.
EA circa 2010 probably had myopic incentives to not worry much about AGI doom, because “AGI breaks free of our control and kills us all” sounds weird and crankish, and didn’t help EA end malaria or factory farming any faster. And indeed, the earliest write-ups on AI risk by Open Phil and others strike me as going out of their way to talk about milder risks and be pretty cautious, abstract, and minimal in how they addressed “superintelligence takes over and kills us”, much more so than recent material like Cold Takes and the 80K Podcast. (Even though it’s not apparent to me that there’s more evidence for “superintelligence takes over and kills us” now than there was in 2014.)
EA circa 2023 probably has myopic incentives to have “medium-sized” probabilities of AI doom—unlike in the early days, EA leadership and super-funders like Open Phil nowadays tend to be very worried about AGI risk, which creates both financial and social incentives to look similarly worried about AI. The sweet spot is probably to think AI is a big enough risk to take seriously, but not as big as the weirder orgs like MIRI think. Within EA, this is a respected and moderate-sounding position, whereas in ML or academia even having a 10% chance of AGI doom might make you sound pretty crazy.
(Obviously none of this is true across the board, and different social networks within EA will have totally different local social incentives—some EA friend groups will think you’re dumb if you think AI risk is worth thinking about at all, some will think you’re dumb if your p(doom) is below 90%, and so on for a variety of different probability ranges. There’s a rich tapestry of diverging intuitions about which views are crazy here.)
The myopic incentives for ML itself, both financial and social, probably skew heavily toward “argue against ML being scary or dangerous at all”, mitigated mainly by a desire to sound moderate and respectable.
The “moderate and respectable” goal pushes toward ML people acknowledging that there are some risks, but relatively minor ones — this feels like a safe and sober middle ground between “AI is totally risk-free and there will be no problems” and “there’s a serious risk of AI causing a major global catastrophe”.
“Moderate and respectable” also pushes against ML people arguing against AGI risk because it pushes for ML people to just not talk about the subject at all. (Though I’d guess this is a smaller factor than “ML people don’t feel like they have a strong argument, and don’t want to broach the topic if there isn’t an easy powerful rebuttal”. People tend to be pretty happy to dunk on views they think are crazy—e.g., on social media—if they have a way of pointing at something about the view that their peers will be able to see is clearly wrong.)
I would say that the most important selection effect is ML-specific (favoring lower p(doom)), because ML is “the experts” who smart people would most naturally want to defer to, is a lot larger than the AI x-risk ecosystem (and especially larger than the small part of the x-risk ecosystem that has way higher p(doom) than Nuno), and ML researchers can focus a large share of their attention on generating arguments for “ML is not scary or dangerous at all”, whereas journalists, academia-at-large, etc. have their attention split between AI and a thousand other topics.
But mostly my conclusion from all this, and from the history of object-level discussion so far, is that there just aren’t super strong myopic incentives favoring either “humanity only generates arguments for higher p(doom)” or “humanity only generates arguments for lower p(doom)”. There’s probably some non-epistemic tilt toward “humanity generates more arguments against AI risk than for AI risk”, at least within intellectual circles (journalism may be another matter entirely). But I don’t think the arguments are so impenetrably difficult to evaluate on their own terms, or so scarce (on anyone’s side), that it ends up mattering much.
From inside MIRI, it appears much more plausible to me that we’ve historically understated how worried we are about AI, than that we’ve historically overstated it. (Which seems like a mistake to me now.) And I think our arguments are good on their own terms, and the reasoning checks out. Selection effects strike me as a nontrivial but minor factor in all of this.
I don’t think everyone has access to the same evidence as me, so I don’t think everyone should have probabilities as doomy as mine. But the above hopefully explains why I disagree with “the selection effects argument packs a whole lot of punch behind it”, as well as “having 70 or 80%+ probabilities on AI catastrophe within our lifetimes is probably just incorrect, insofar as a probability can be incorrect”.
I take the latter to be asserting, not just that Nuno thinks he lacks enough evidence to have 70% p(doom in our lifetime), but that he places vanishingly small probability on anyone else having the evidence required to have an extreme belief about this question.
Showing that this is overconfident on Nuno’s part requires a lot less evidence than providing a full decomposition of all the factors going into my level of worry about AGI: it should be easier for us to reach agreement that the other point of view isn’t crazy than for us to reach agreement about all the object-level questions.
I’m sorry, I’m not sure I understood correctly. Are you saying you agree there are selection effects, but you object to how you think Nuno and I are modeling MIRI and the processes generating MIRI-style models on AGI?
I’m confused by your phrasing “there are selection effects”, because it sounds so trivial to me. Every widespread claim faces some nonzero amount of (non-epistemic) selection bias.
E.g., I’d assume that twelve-syllable sentences get asserted at least slightly less often than eleven-syllable sentences, because they’re a bit more cumbersome. This is a non-epistemic selection effect, but it doesn’t cause me to worry that I’ll be unable to evaluate the truth of eleven- or twelve-syllable sentences for myself.
There are plenty of selection effects in the world, but typically they don’t put us into a state of epistemic helplessness; they just imply that it takes a bit of extra effort to dig up all the relevant arguments (since they’re out there, some just take some more minutes to find on Google).
When the world has already spent decades arguing about a question, and there are plenty of advocates for both sides of the question, selection effects usually mean “it takes you some more minutes to dig up all the key arguments on Google”, not “we must default to uncertainty no matter how strong the arguments look”. AI risk is pretty normal in that respect, on my view.
What’s an example of “here is an argument for why that reason doesn’t apply” that you think is wrong?
And are you claiming that Nate or I are “assigning maximal probability to AI doom”, or doing this kind of qualitative black-and-white reasoning? If so, why?
Rereading the post, I think that it has a bunch of statements about what Soares believes, but it doesn’t have that many mechanisms, pathways, counter-considerations, etc.
E.g.,:
The world’s overall state needs to be such that AI can be deployed to make things good. A non-exhaustive list of things that need to go well for this to happen follows
The world needs to admit of an AGI deployment strategy (compatible with realistic alignable-capabilities levels for early systems) that prevents the world from being destroyed if executed.
At least one such strategy needs to be known and accepted by a leading organization.
Somehow, at least one leading organization needs to have enough time to nail down AGI, nail down alignable AGI, actually build+align their system, and deploy their system to help.
This very likely means that there needs to either be only one organization capable of building AGI for several years, or all the AGI-capable organizations need to be very cautious and friendly and deliberately avoid exerting too much pressure upon each other.
It needs to be the case that no local or global governing powers flail around (either prior to AGI, or during AGI development) in ways that prevent a (private or public) group from saving the world with AGI.
This is probably a good statement of what Soares thinks needs to happen, but it is not an argument for it, so I am left to evaluate the statements, and the claim that they are conjunctive, by their intuitive plausibility.
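Here is a toy illustration of why the conjunctive-vs-correlated modeling choice matters so much (the per-step probabilities below are made up, purely for illustration, and are not taken from Soares’s post):

```python
# Toy illustration (made-up numbers): combining the five conditions above
# under two different dependence assumptions.

steps = [0.6, 0.5, 0.5, 0.4, 0.7]  # hypothetical P(each condition goes well)

# Treat the conditions as independent conjuncts: multiply them all.
p_independent = 1.0
for p in steps:
    p_independent *= p
print(f"P(all conditions hold | independent): {p_independent:.2f}")  # ~0.04

# Treat them as strongly correlated: under perfect positive dependence the
# conjunction is as likely as its weakest link (min is also an upper bound).
print(f"P(all conditions hold | perfectly correlated): {min(steps):.2f}")  # 0.40
```

The same per-step plausibilities give wildly different answers depending on whether the list is treated as a chain of independent requirements or as a cluster of highly correlated ones, and that structural question is exactly what the post asserts rather than argues for.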
I think I might be a bit dense here.
E.g.:
It needs to be the case that no local or global governing powers flail around (either prior to AGI, or during AGI development) in ways that prevent a (private or public) group from saving the world with AGI.
Idk, he later mentions the US government’s COVID response, but I think the relevant branch of the government for dealing with AGI threats would probably be the Department of Defense, which seems much more competent, and seems capable of plays like blocking exports of semiconductor manufacturing equipment to China.
Deep learning also increases the probability of things like “roughly human-level AGIs run around for a few years before we see anything strongly superhuman”, but this doesn’t affect my p(doom) much because I don’t see a specific path for leveraging this to prevent the world from being destroyed when we do reach superintelligence
But it does leave a path open to prevent doom: not reaching superintelligence! I.e., a global moratorium on AGI.
Social/moral consensus? There is precedent with e.g. recombinant DNA or human genetic engineering (if only the AI Asilomar conference had been similarly focused on a moratorium!). It might be hard to enforce globally indefinitely, but we might at least be able to kick the can down the road a couple of decades (as seems to have happened with the problematic bio research).
(It should be achieved without such AGIs running around, if we want to minimise x-risk. Indeed, we should have started on this already! I’m starting to wonder whether it might actually be the best option we have, given the difficulty, or perhaps impossibility(?) of alignment.)
Don’t get me wrong, I’d love to live in a glorious transhuman future (like e.g. Iain M. Banks’s Culture), but I just don’t think it’s worth the risk of doom, as things stand. Maybe after a few decades of moratorium, when we know a lot more, we can reassess (and hopefully we will still be able to have life extension, so will personally still be around).
It now seems unfortunate that the AI x-risk prevention community was seeded from the transhumanist/techno-utopian community (e.g. Yudkowsky and Bostrom). This historical contingency is probably a large part of the reason why a global moratorium on AGI has never been seriously proposed/attempted.
This historical contingency is probably a large part of the reason why a global moratorium on AGI has never been seriously proposed/attempted.
Seems very surprising if true — the Yudkowskians are the main group that worries we’re screwed without a global moratorium, and the main group that would update positively if there were a way to delay AGI by a few decades. (Though they aren’t the main group that thinks it’s tractable to coordinate such a big delay.)
From my perspective Bostrom and Yudkowsky were the ones saying from the get-go that rushing to AGI is bad. E.g., in Superintelligence:
One effect of accelerating progress in hardware, therefore, is to hasten the arrival of machine intelligence. As discussed earlier, this is probably a bad thing from the impersonal perspective, since it reduces the amount of time available for solving the control problem and for humanity to reach a more mature stage of civilization.
(Though he flags that this is a “tentative conclusion” that “could be overturned, for example if the threats from other existential risks or from post-transition coordination failures turn out to be extremely large”. If we were thinking about going from “AGI in 100 years” to “AGI in 300 years”, I might agree; if we’re instead going from “AGI in 15 years” to “AGI in 40 years”, then the conclusion seems way less tentative to me, given how unsolved the alignment problem is!)
The transhumanists were the ones who centered a lot of the early discussion around differential technological development, a.k.a. deliberately trying to slow down scary tech (e.g. AGI) so it comes after anti-scary tech (e.g. alignment), or attempting to accelerate alignment work to the same effect.
The idea that Bostrom or Yudkowsky ever thought “the alignment problem is a major issue, but let’s accelerate to AGI as quickly as possible for the sake of reaching the Glorious Transhumanist Future sooner” seems like revisionism to me, and I’m skeptical that the people putting less early emphasis on differential technological development back in 2014, in real life, would have somehow performed better in this counterfactual.
The idea that Bostrom or Yudkowsky ever thought “the alignment problem is a major issue, but let’s accelerate to AGI as quickly as possible for the sake of reaching the Glorious Transhumanist Future sooner” seems like revisionism to me
I’m not saying this is (was) the case. It’s more subtle than that. It’s the kind of background worldview that makes people post this (or talk of “pivotal acts”) rather than this.
The message of differential technological development clearly hasn’t had the needed effect. There has been no meaningful heed paid to it by the top AI companies. What we need now is much stronger statements. i.e. ones that use the word “moratorium”. Why isn’t MIRI making such statements? It doesn’t make sense to go to 0 hope of survival without even seriously attempting a moratorium (or at the very least, publicly advocating for one).
I think the blunt MIRI-statement you’re wanting is here:
Capabilities work is currently a bad idea
Nate’s top-level view is that ideally, Earth should take a break on doing work that might move us closer to AGI, until we understand alignment better.
That move isn’t available to us, but individual researchers and organizations who choose not to burn the timeline are helping the world, even if other researchers and orgs don’t reciprocate. You can unilaterally lengthen timelines, and give humanity more chances of success, by choosing not to personally shorten them.
Nate thinks capabilities work is currently a bad idea for a few reasons:
He doesn’t buy that current capabilities work is a likely path to ultimately solving alignment.
Insofar as current capabilities work does seem helpful for alignment, it strikes him as helping with parallelizable research goals, whereas our bottleneck is serial research goals. (See A note about differential technological development.)
Nate doesn’t buy that we need more capabilities progress before we can start finding a better path.
[...]
On Nate’s view, the field should do experiments with ML systems, not just abstract theory. But if he were magically in charge of the world’s collective ML efforts, he would put a pause on further capabilities work until we’ve had more time to orient to the problem, consider the option space, and think our way to some sort of plan-that-will-actually-probably-work. It’s not as though we’re hurting for ML systems to study today, and our understanding already lags far behind today’s systems’ capabilities.
[...]
Nate thinks that DeepMind, OpenAI, Anthropic, FAIR, Google Brain, etc. should hit the pause button on capabilities work (or failing that, at least halt publishing). (And he thinks any one actor can unilaterally do good in the process, even if others aren’t reciprocating.)
Tangentially, I’ll note that you might not want MIRI to say “that move isn’t available to us”, if you think that it’s realistic to get the entire world to take a break on AGI work, and if you think that saying pessimistic things about this might make it harder to coordinate. (Because, e.g., this might require a bunch of actors to all put a lot of sustained work into building some special institution or law, that isn’t useful if you only half-succeed; and Alice might not put in this special work if she thinks Bob is unconditionally unwilling to coordinate, or if she’s confident that Carol is confident that Dan won’t coordinate.)
But this seems like a very unlikely possibility to me, so I currently see more value in just saying MIRI’s actual take; marginal timeline-lengthening actions can be useful even if we can’t actually put the whole world on pause for 20 years.
This is good, but I don’t think it goes far enough. And I agree with your comments re “might not want MIRI to say ‘that move isn’t available to us’”. It might not be realistic to get the entire world to take a break on AGI work, but it’s certainly conceivable, and I think maybe at this point more realistic than expecting alignment to be solved in time (or at all?). It seems reasonable to direct marginal resources toward pushing for a moratorium on AGI rather than more alignment work (although I still think the latter should at least be tried too!).
Your and Nate’s statement still implicitly assumes that AGI capabilities orgs are “on our side”. The evidence is that they are clearly not. Demis is voicing caution at the same time that Google leadership have started a race with OpenAI (Microsoft). It’s out of Demis’ (and his seemingly toothless ethics board’s) hands. What’s needed is less acceptance of what has been tantamount to “existential safety washing”, and more realpolitik. Better now might be to appeal directly to the public and policymakers. Or to find a way to strategise with those who have power. For example, should the UN Security Council be approached somehow? This isn’t “defection”.
I’m saying all this because I’m not afraid of treading on any toes. I don’t depend on EA money (or anyone’s money) for my livelihood or career[1]. I’m financially independent. In fact, my life is pretty good, apart from facing impending doom from this! I mean, I don’t need to work to survive[2], and I’ve got an amazing partner and a supportive family. All that is missing is existential security! I’d be happy to have “completed it mate” (i.e., I’ve basically done this with the normal life of house, car, spouse, family, financial security, etc.); but I haven’t: remaining is this small issue of surviving for a normal lifespan, having my children survive and thrive, and ensuring the continuation of the sentient universe as we know it...
I think maybe at this point more realistic than expecting alignment to be solved in time (or at all?).
I think it’s a lot more realistic to solve alignment than to delay AGI by 50 years. I’d guess that delaying AGI by 10 years is maybe easier than alignment, but it also doesn’t solve anything unless we can use those 10 years to figure out alignment as well. For that matter, delaying by 50 years also requires that we solve alignment in that timeframe, unless we’re trying to buy time to do some third other thing.
The difficulty of alignment is also a lot more uncertain than the difficulty of delaying AGI: it depends more on technical questions that are completely unknown from our current perspective. Delaying AGI by decades is definitely very hard, whereas the difficulty of alignment is mostly a question mark.
All of that suggests to me that alignment is far more important as a way to spend marginal resources today, but we should try to do both if there are sane ways to pursue both options today.
If you want MIRI to update from “both seem good, but alignment is the top priority” to your view, you should probably be arguing (or gathering evidence) against one or more of these claims:
AGI alignment is a solvable problem.
Absent aligned AGI, there isn’t a known clearly-viable way for humanity to achieve a sufficiently-long reflection (including centuries of delaying AGI, if that turned out to be needed, without permanently damaging or crippling humanity).
(There are alternatives to aligned AGI that strike me as promising enough to be worth pursuing. E.g., maybe humans can build Drexlerian nanofactories without help from AGI, and can leverage this for a pivotal act. But these all currently seem to me like even bigger longshots than the alignment problem, so I’m not currently eager to direct resources away from (relatively well-aimed, non-capabilities-synergistic) alignment research for this purpose.)
Humanity has never succeeded in any political task remotely as difficult as the political challenge of creating an enforced and effective 50+ year global moratorium on AGI. (Taking into account that we have no litmus test for what counts as an “AGI” and we don’t know what range of algorithms or what amounts of compute you’d need to exclude in order to be sure you’ve blocked AGI. So a regulation that blocks AGI for fifty years would probably need to block a ton of other things.)
EAs have not demonstrated the ability to succeed in political tasks that are way harder than any political task any past humans have succeeded on.
Even a 10-year delay is worth a huge amount (in expectation). We may well have a very different view of alignment by then (including perhaps being pretty solid on its impossibility? Or perhaps a detailed plan for implementing it? (Or even the seemingly very unlikely “…there’s nothing to worry about”)), which would allow us to iterate on a better strategy (we shouldn’t assume that our outlook will be the same after 10 years!)
but we should try to do both if there are sane ways to pursue both options today.
Yes! (And I think there are sane ways).
If you want MIRI to update from “both seem good, but alignment is the top priority” to your view, you should probably be arguing (or gathering evidence) against one or more of these claims:
AGI alignment is a solvable problem.
There are people working on this (e.g. Yampolskiy, Landry & Ellen), and this is definitely something I want to spend more time on (note that the writings so far could definitely do with a more accessible distillation).
Absent aligned AGI, there isn’t a known clearly-viable way for humanity to achieve a sufficiently-long reflection
I really don’t think we need to worry about this now. AGI x-risk is an emergency—we need to deal with that emergency first (e.g. kick the can down the road 10 years with a moratorium on AGI research); then when we can relax a little, we can have the luxury to think about long term flourishing.
Humanity has never succeeded in any political task remotely as difficult as the political challenge of creating an enforced and effective 50+ year global moratorium on AGI.
I think this can definitely be argued against (and I will try to write more as/when I make a more fleshed-out post calling for a global AGI moratorium). For a start, without all the work on nuclear proliferation and risk, we may well not be here today. Yes, there has been proliferation, but there hasn’t been an all-out nuclear exchange yet! It’s now 77 years since a nuclear weapon was used in anger. That’s a pretty big result, I think! Also, global taboos around bio topics such as human genetic engineering are well established. If such a taboo is established, enforcement becomes a lesser concern, as you are then only fighting against isolated rogue elements rather than established megacorporations. Katja Grace discusses such taboos in her post on slowing down AI.
EAs have not demonstrated the ability to succeed in political tasks that are way harder than any political task any past humans have succeeded on.
Fair point. I think we should be thinking much wider than EA here. This needs to become mainstream, and fast.
Also, I should say that I don’t think MIRI should necessarily be diverting resources to work on a moratorium. Alignment is your comparative advantage so you should probably stick to that. What I’m saying is that you should be publicly and loudly calling for a moratorium. That would be very easy for you to do (a quick blog post/press release). But it could have a huge effect in terms of shifting the Overton Window on this. As I’ve said, it doesn’t make sense for this not to be part of any “Death with Dignity” strategy. The sensible thing when faced with ~0% survival odds is to say “FOR FUCK’S SAKE CAN WE AT LEAST TRY AND PULL THE PLUG ON HUMANS DOING AGI RESEARCH!?!”, or even “STOP BUILDING AGI YOU FUCKS!” [Sorry for the language, but I think it’s appropriate given the gravity of the situation, as assumed by talk of 100% chance of death etc.]
Even a 10-year delay is worth a huge amount (in expectation). We may well have a very different view of alignment by then (including perhaps being pretty solid on its impossibility?
Agreed on all counts! Though as someone who’s been working in this area for 10 years, I have a newfound appreciation for how little intellectual progress can easily end up happening in a 10-year period...
(Or even the seemingly very unlikely “…there’s nothing to worry about”)
I have a lot of hopes that seem possible enough to me to be worth thinking about, but this specific hope isn’t one of them. Alignment may turn out to be easier than expected, but I think we can mostly rule out “AGI is just friendly by default”.
But it could have a huge effect in terms of shifting the Overton Window on this.
In which direction?
:P
I’m joking, though I do take seriously that there are proposals that might be better signal-boosted by groups other than MIRI. But if you come up with a fuller proposal you want lots of sane people to signal-boost, do send it to MIRI so we can decide if we like it; and if we like it as a sufficiently-realistic way to lengthen timelines, I predict that we’ll be happy to signal-boost it and say as much.
As I’ve said, it doesn’t make sense for this not to be part of any “Death with Dignity” strategy. The sensible thing when faced with ~0% survival odds is to say “FOR FUCK’S SAKE CAN WE AT LEAST TRY AND PULL THE PLUG ON HUMANS DOING AGI RESEARCH!?!”, or even “STOP BUILDING AGI YOU FUCKS!” [Sorry for the language, but I think it’s appropriate given the gravity of the situation, as assumed by talk of 100% chance of death etc.]
I strongly agree and think it’s right that people… like, put some human feeling into their words, if they agree about how fucked up this situation is? (At least if they find it natural to do so.)
I think there’s some social and psychological fuckery going on, but less than occurs in the opposite direction in the non-EA, non-rationalist, etc. superculture.
Yes, but you could think that the fuckery in the EA/rat community is concentrated on the topic of AI, and that the EA/rat communities can develop defenses against normal social fuckery but not vice versa.
Hey, thanks for your lengthy comment. For future reference I would have found it more convenient if you had an individual comment for each consideration :)
Longer timelines do give us more time to figure out alignment, which lowers p(doom); but you still have to actually do the alignment in order to get out the microhopes, so it matters how much more hopeful you feel with an extra (e.g.) five years of time for humanity to work on the alignment problem.
If this means, e.g., “AI that can do everything a human physicist or heart surgeon can do, that can’t do other sciences and can’t outperform humans”, then I’d be really, really surprised if we ever see a state of affairs like that.
What, concretely, are some examples of world-saving things you think AI might be able to do 5+ years before world-endangering AGI is possible?
(Or, if you think the pre-danger stuff won’t be world-saving, then why does it lower your p(doom)? Just by giving humanity a bit more time to notice the smoke?)
This is true, but I don’t use the industrial revolution as a frame for thinking about AGI. I don’t see any reason to think AGI would be similar to the industrial revolution, except insofar as they’re both “important technology-mediated things that happened in history”. AGI isn’t likely to have a similar impact to steam engines because science and reasoning aren’t similar capabilities to a steam engine.
I think there might have been more effort spent seeking arguments against, but it’s shallower effort: the kinds of arguments you find when you’re trying to win a water-cooler argument or write a pop-sci editorial are different from the kinds of arguments you find when you’re working full-time on the thing.
I don’t think it has that much punch, partly just because I don’t think EAs and rationalists are as unreasonable as Christian apologists (e.g., we’re much more inclined to search for flaws in our own arguments). And partly because the field is just pretty old at this stage, and a lot of effort has gone into arguments in both directions by now.
Even if there were 10x as much effort going into “seek out new reasons to worry about AGI” as there were going into “seek out new reasons to relax about AGI”, this matters a lot less when you’re digging for the 1000th new argument than when you’re digging for the 10th. There just aren’t as many stones left unturned, and you can look at the available arguments yourself rather than wondering whether there are huge considerations the entire planet is missing.
I think there’s some social and psychological fuckery going on, but less than occurs in the opposite direction in the non-EA, non-rationalist, etc. superculture.
Partly I just think that because of my own experience. The idea of AIs rising up to overthrow humanity and kill us all just sounds really weird to me. I think this is inherently a really hard risk to take seriously on an emotional level—because it’s novel, because it sounds like science fiction, because it triggers our “anthropomorphize this” mental modules while violating many of the assumptions that we’re used to taking for granted in human-like reasoners.
Sitting with the idea for sufficiently long, properly chewing on it and metabolizing it, considering and analyzing concrete scenarios, and having peers to talk about this stuff with—all of that makes me feel much more equipped to weigh the arguments for and against the risks without leaning on shallow pattern-matching.
But that’s just autobiography; if you had an easy time seriously entertaining AI risk before joining the community, and now you feel “pressured to accept AGI risk” rather than “free to agree or disagree”, then by default I assume you’re just right about your own experience.
[cont.]
That paints a pretty fucked up picture of early-CFAR’s dynamics. I’ve heard a lot of conflicting stories about CFAR in this respect, usually quite vague (and there are nonzero ex-CFAR staffers who I just flatly don’t trust to report things accurately). I’d be interested to hear from Anna or other early CFAR staff about whether this matches their impressions of how things went down. It unfortunately sounds to me like a pretty realistic way this sort of thing can play out.
On my view, CFAR workshops can do a lot of useful things even if they aren’t magic bullets that instantly make all participants way more rational. E.g.:
Community-building is just plain useful, and shared culture and experiences is legit just an excellent way to build it. There’s no reason this has to be done in a dishonest way, and I suspect that some of the pressure to misclassify “community-building” stuff as “rationality-enhancing” stuff comes from the fact that the benefits of the former sound squishier.
I’ve been to two CFAR workshops (a decent one in late 2013, and an excellent one in late 2018; neither seemed dishonest or trying-to-wow-me), and IME a lot of the benefit came from techniques, language, and felt affordances to try lots of things in the months and years after the workshop—along with a bunch of social connections where it was common knowledge “it’s normal to try to understand and solve your problems, X Y Z are standard ways to get traction on day-to-day problems, etc.”.
The latter also doesn’t seem like a thing that requires any lying, and I’m very skeptical of the idea that it’s useful-for-epistemics-on-net to encourage workshop participants to fuzz out parts of their models, heavily defer to Mysterious Unquestionable Rationality Experts, treat certain questions as inherently bad to think about or discuss, etc. In my experience, deliberate fuzziness in one narrow part of people’s models often leaks out to distort thinking about many other things.
I don’t understand this part; how does it fail the is/ought distinction?
I think this is awkwardly/cutely phrased, but is contentful (and actually a good idea) rather than being doublespeak.
The way I’d put it (if I’m understanding the idea right) is: “we’re trying to teach rationality in a high-integrity, principled way, because we think rationality is useful for existential risk reduction”.
“Rationality for its own sake” is a different sort of optimization target than “rationality when it feels locally useful for an extrinsic goal”, among other things because it’s easy to be miscalibrated about when rationality is more or less useful instrumentally. Going all-in on rationality (and not trying to give yourself wiggle room to be more or less rational at different times) can be a good idea, even if at base we’re justifying policies like this for instrumental reasons on the meta level, as opposed to justifying the policy with rationales like “it’s just inherently morally good to always be rational, don’t ask questions, it just is”.
Hm, this case seems like a stretch to me, which makes me a bit more skeptical about the speaker’s other claims. But it’s also possible I’m misunderstanding what the intent behind “Rationality for its own sake for the sake of existential risk” originally was, or how it was used in practice?
I think you’re wrong about this, though the “within our lifetime” part is important: I think the case for extremely high risk is very strong, but reasoning about timelines seems way more uncertain to me.
(And way less important: if the human race is killed a few years after I die of non-AGI causes, that’s no less of a tragedy.)
If we bracket the timelines part and just ask about p(doom), I think https://www.lesswrong.com/posts/Ke2ogqSEhL2KCJCNx/security-mindset-lessons-from-20-years-of-software-security and https://intelligence.org/2017/11/25/security-mindset-ordinary-paranoia/ make it quite easy to reach extremely dire forecasts about AGI. Getting extremely novel software right on the first try is just that hard. (And we have to do this eventually, even if we luck out and get a few years to play with weak AGIs before strong AGIs arrive.)
Surely not. Neither of those make any arguments about AI, just about software generally. If you literally think those two are sufficient arguments for concluding “AI kills us with high probability” I don’t see why you don’t conclude “Powerpoint kills us with high probability”.
Yep! To be explicit, I was assuming that general intelligence is very powerful, that you can automate it, and that it isn’t (e.g.) friendly by default.
I’m not sure I understand what statements like “general intelligence is very powerful” mean even though it seems to be a crucial part of the argument. Can you explain more concretely what you mean by this? E.g. What is “general intelligence”? What are the ways in which it is and isn’t powerful?
By “general intelligence” I mean “whatever it is that lets human brains do astrophysics, category theory, etc. even though our brains evolved under literally zero selection pressure to solve astrophysics or category theory problems”.
Human brains aren’t perfectly general, and not all narrow AIs/animals are equally narrow. (E.g., AlphaZero is more general than AlphaGo.) But it sure is interesting that humans evolved cognitive abilities that unlock all of these sciences at once, with zero evolutionary fine-tuning of the brain aimed at equipping us for any of those sciences. Evolution just stumbled into a solution to other problems, that happened to generalize to billions of wildly novel tasks.
To get more concrete:
AlphaGo is a very impressive reasoner, but its hypothesis space is limited to sequences of Go board states rather than sequences of states of the physical universe. Efficiently reasoning about the physical universe requires solving at least some problems (which might be solved by the AGI’s programmer, and/or solved by the algorithm that finds the AGI in program-space; and some such problems may be solved by the AGI itself in the course of refining its thinking) that are different in kind from what AlphaGo solves.
E.g., the physical world is too complex to simulate in full detail, unlike a Go board state. An effective general intelligence needs to be able to model the world at many different levels of granularity, and strategically choose which levels are relevant to think about, as well as which specific pieces/aspects/properties of the world at those levels are relevant to think about.
More generally, being a general intelligence requires an enormous amount of laserlike strategicness about which thoughts you do or don’t think: a large portion of your compute needs to be ruthlessly funneled into exactly the tiny subset of questions about the physical world that bear on the question you’re trying to answer or the problem you’re trying to solve. If you fail to be ruthlessly targeted and efficient in “aiming” your cognition at the most useful-to-you things, you can easily spend a lifetime getting sidetracked by minutiae / directing your attention at the wrong considerations / etc.
And given the variety of kinds of problems you need to solve in order to navigate the physical world well / do science / etc., the heuristics you use to funnel your compute to the exact right things need to themselves be very general, rather than all being case-specific. (Whereas we can more readily imagine that many of the heuristics AlphaGo uses to avoid thinking about the wrong aspects of the game state, or thinking about the wrong topics altogether, are Go-specific heuristics.)
GPT-3 is a very impressive reasoner in a different sense (it successfully recognizes many patterns in human language, including a lot of very subtle or conjunctive ones like “when A and B and C and D and E and F and G and H and I are all true, humans often say X”), but it too isn’t doing the “model full physical world-states and trajectories thereof” thing (though an optimal predictor of human text would need to be a general intelligence, and a superhumanly capable one at that).
Some examples of abilities I expect humans to only automate once we’ve built AGI (if ever):
The ability to perform open-heart surgery with a high success rate, in a messy non-standardized ordinary surgical environment.
The ability to match smart human performance in a specific hard science field, across all the scientific work humans do in that field.
In principle, I suspect you could build a narrow system that is good at those tasks while lacking the basic mental machinery required to do par-human reasoning about all the hard sciences. In practice, I very strongly expect humans to find ways to build general reasoners to perform those tasks, before we figure out how to build narrow reasoners that can do them. (For the same basic reason evolution stumbled on general intelligence so early in the history of human tech development.)
(Of course, if your brain has all the basic mental machinery required to do other sciences, that doesn’t mean that you have the knowledge required to actually do well in those sciences. An artificial general intelligence could lack physics ability for the same reason many smart humans can’t solve physics problems.)
When I say “general intelligence is very powerful”, a lot of what I mean is that science is very powerful, and that having all the sciences at once is a lot more powerful than the sum of each science’s impact.
(E.g., because different sciences can synergize, and because you can invent new scientific fields and subfields, and more generally chain one novel insight into dozens of other new insights that critically depended on the first insight.)
Another large piece of what I mean is that general intelligence is a very high-impact sort of thing to automate because AGI is likely to blow human intelligence out of the water immediately, or very soon after its invention.
80K gives the (non-representative) example of how AlphaGo and its immediate successors compared to the human ability range on Go.
I expect “general STEM AI” to blow human science ability out of the water in a similar fashion. Reasons for this include:
Software (unlike human intelligence) scales with more compute.
Current ML uses far more compute to find reasoners than to run reasoners. This is very likely to hold true for AGI as well.
We probably have more than enough compute already, and are mostly waiting on new ideas for how to get to AGI efficiently, as opposed to waiting on more hardware to throw at old ideas.
Empirically, humans aren’t near a cognitive ceiling, and even narrow AI often suddenly blows past the human reasoning ability range on the task it’s designed for. It would be weird if scientific reasoning were an exception.
See also AlphaGo Zero and the Foom Debate.
Empirically, human brains are full of cognitive biases and inefficiencies. It’s doubly weird if scientific reasoning is an exception even though it’s visibly a mess with tons of blind spots, inefficiencies, motivated cognitive processes, and historical examples of scientists and mathematicians taking decades to make technically simple advances.
Empirically, human brains are extremely bad at some of the most basic cognitive processes underlying STEM. E.g., consider that human brains can barely do basic mental math at all.
Human brains underwent no direct optimization for STEM ability in our ancestral environment, beyond things like “can distinguish four objects in my visual field from five objects”. In contrast, human engineers can deliberately optimize AGI systems’ brains for math, engineering, etc. capabilities.
More generally, the sciences (and many other aspects of human life, like written language) are a very recent development. So evolution has had very little time to refine and improve on our reasoning ability in many of the ways that matter.
Human engineers have an enormous variety of tools available to build general intelligence that evolution lacked. This is often noted as a reason for optimism that we can align AGI to our goals, even though evolution failed to align humans to its “goal”. It’s additionally a reason to expect AGI to have greater cognitive ability, if engineers try to achieve great cognitive ability.
The hypothesis that AGI will outperform humans has a disjunctive character: there are many different advantages that individually suffice for this, even if AGI doesn’t start off with any other advantages. (E.g., speed, math ability, scalability with hardware, skill at optimizing hardware...) A toy numerical sketch of this disjunctive structure follows after this list.
See also Sources of advantage for digital intelligence.
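Here is the toy numerical sketch of that disjunctive structure promised above (the probability assigned to each advantage is made up, purely for illustration):

```python
# Toy illustration (made-up numbers): a disjunctive claim only needs one of
# several advantages to hold, so it tolerates individual failures.

advantages = {
    "speed": 0.5,
    "math ability": 0.5,
    "scalability with hardware": 0.6,
    "skill at optimizing hardware": 0.4,
}

# Assuming (unrealistically) that the advantages are independent:
p_none = 1.0
for p in advantages.values():
    p_none *= (1 - p)
print(f"P(at least one advantage holds): {1 - p_none:.2f}")  # ~0.94
```

Positive correlation between the advantages would pull that number down somewhat, but the qualitative point stands: a disjunction of several individually-plausible advantages is much harder to block than any single one of them.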
Nitpick: doesn’t the argument you made also assume that there’ll be a big discontinuity right before AGI? That seems necessary for the premise about “extremely novel software” (rather than “incrementally novel software”) to hold.
I do think that AGI will be developed by methods that are relatively novel. Like, I’ll be quite surprised if all of the core ideas are >6 years old when we first achieve AGI, and I’ll be more surprised still if all of the core ideas are >12 years old.
(Though at least some of the surprise does come from the fact that my median AGI timeline is short, and that I don’t expect us to build AGI by just throwing more compute and data at GPT-n.)
Separately and with more confidence, I’m expecting discontinuities in the cognitive abilities of AGI. If AGI is par-human at heart surgery and physics, I predict that this will be because of “click” moments where many things suddenly fall into place at once, and new approaches and heuristics (both on the part of humans and on the part of the AI systems we build), not just because of a completely smooth, incremental, and low-impact-at-each-step improvement to the knowledge and thought-habits of GPT-3.
“Superhuman AI isn’t just GPT-3 but thinking faster and remembering more things” (for example) matters for things like interpretability: if we succeed shockingly well at finding ways to reasonably thoroughly understand what GPT-3’s brain is doing moment-to-moment, those methods are less likely to be effective for understanding what the first AGI’s brain is doing moment-to-moment, insofar as the first AGI is working in very new sorts of ways and doing very new sorts of things.
I’m happy to add more points like these to the stew so they can be talked about. “Your list of reasons for thinking AGI risk is high didn’t explicitly mention X” is a process we can continue indefinitely long if we want to, since there are always more background assumptions someone can bring up that they disagree with. (E.g., I also didn’t explicitly mention “intelligence is a property of matter rather than of souls imparted into particular animal species by God”, “AGI isn’t thousands of years in the future”, “most random goals would produce bad outcomes if optimized by a superintelligence”...)
Which specific assumptions should be included depends on the conversational context. I think it makes more sense to say “ah, I personally disagree with [X], which I want to flag as a potential conversational direction since your comment didn’t mention [X] by name”, as opposed to speaking as though there’s an objectively correct level of granularity.
Like, the original thing I said was:
Which was responding to a claim in the OP that no EA can rationally have a super high belief in AGI risk:
The challenge the OP was asking me to meet was to point at a missing model piece (or a disagreement, where the other side isn’t obviously just being stupid) that can cause a reasonable person to have extreme p(AGI doom), given other background views the OP isn’t calling obviously stupid. (E.g., the OP didn’t say that it’s obviously stupid for anyone to have a confident belief that AGI will be a particular software project built at a particular time and place.)
The OP didn’t issue a challenge to list all of the relevant background views (relative to some level of granularity or relative to some person-with-alternative-views, which does need to be specified if there’s to be any objective answer), so I didn’t try to explicitly write out obvious popularly held beliefs like “AGI is more powerful than PowerPoint”. I’m happy to do that if someone wants to shift the conversation there, but hopefully it’s obvious why I didn’t do that originally.
Fair! Sorry for the slow reply, I missed the comment notification earlier.
I could have been clearer in what I was trying to point at with my comment. I didn’t mean to fault you for not meeting an (unmade) challenge to list all your assumptions—I agree that would be unreasonable.
Instead, I meant to suggest an object-level point: that the argument you mentioned seems pretty reliant on a controversial discontinuity assumption—enough that the argument alone (along with other, largely uncontroversial assumptions) doesn’t make it “quite easy to reach extremely dire forecasts about AGI.” (Though I was thinking more about 90%+ forecasts.)
(That assumption—i.e. the main claims in the 3rd paragraph of your response—seems much more controversial/non-obvious among people in AI safety than the other assumptions you mention, as evidenced by researchers criticizing it and researchers doing prosaic AI safety work.)
(Minor: I really liked your top-level comment but almost didn’t read this second comment because I didn’t immediately realize you split up your comment due to (I suppose) running out of space. Maybe worth it to add a “[cont.]” or something in such cases in future.)
Added!
Ok, so thinking about this, one trouble with answering your comment is that you have a self-consistent worldview with implications contrary to some of the stuff I hold, but I feel that you are not giving answers with reference to stuff that I already hold, but rather to stuff that further references that worldview.
Let me know if this feels way off.
So I’m going to just pick one object-level argument and dig in to that:
Well, I think the question is: increased p(doom) compared to what? E.g., what were your default expectations before the DL revolution?
Compared to equivalent progress in a seed AI which has a utility function
Deep learning seems like it has some advantages, e.g.: it is [doing the kinds of things that were reinforced during its training in the past], which seems safer than [optimizing a utility function programmed into its core, where we don’t really know how to program utility functions].
E.g., GPT-3 seems wildly safer than a seed AGI that had already reached that level of capabilities.
“Oh yes, we just had to put in a desire to predict the world, an impulse for curiosity, in addition to the standard self-preservation drive and our experimental caring for humans module, and then just let it explore the internet” sounds fairly terrifying.
Compared to no progress in deep learning at all
Sure, I agree
Compared to something else
Depends on the something else.
Sounds right to me! I don’t know your worldview, so I’m mostly just reporting my thoughts on stuff, not trying to do anything particularly sophisticated.
I personally started thinking about ML and AGI risk in 2013, and I didn’t have much of a view of “how are we likely to get to AGI?” at the time.
My sense is that MIRI-circa-2010 wasn’t confident about how humanity would get to AI, but expected it would involve gaining at least some more (object-level, gearsy) insight into how intelligence works. “Just throw more compute at a slightly tweaked version of one of the standard old failed approaches to AGI” wasn’t MIRI’s top-probability scenario.
From my perspective, humanity got “unlucky” in three different respects:
AI techniques started working really well early, giving us less time to build up an understanding of alignment.
Techniques started working for reasons other than us acquiring and applying gearsy new insights into how reasoning works, so the advances in AI didn’t help us understand how to do alignment.
And the specific methods that worked are more opaque than most pre-deep-learning AI, making it hard to see how you’d align the system even in principle.
Seems like the wrong comparison; the question is whether AGI built by deep learning (that’s at the “capability level” of GPT-3) is safer than seed AGI (that’s at the “capability level” of GPT-3).
I don’t think GPT-3 is an AGI, or has the same safety profile as baby AGIs built by deep learning. (If there’s an efficient humanly-reachable way to achieve AGI via deep learning.) So an apples-to-apples comparison would either think about hypothetical deep-learning AGI vs. hypothetical seed AGI, or it would look at GPT-3 vs. hypothetical narrow AI built on the road to seed AGI.
If we can use GPT-3 or something very similar to GPT-3 to save the world, then it of course matters that GPT-3 is way safer than seed AGI. But then the relevant argument would look something like “maybe the narrow AI tech that you get on the path to deep-learning AGI is more powerful and/or more safe than the narrow AI tech that you get on the path to seed AGI”, as opposed to “GPT-3 is safer than a baby god” (the latter being something that’s true whether or not the baby god is deep-learning-based).
Could you list some of your favorite/the strongest-according-to-you (collections of) arguments against AI risk?
Sure! It would depend on what you mean by “an argument against AI risk”:
If you mean “What’s the main argument that makes you more optimistic about AI outcomes?”, I made a list of these in 2018.
If you mean “What’s the likeliest way you think it could turn out that aligning AGI is unnecessary in order to do a pivotal act / initiate an as-long-as-needed reflection?”, I’d currently guess it’s using strong narrow-AI systems to accelerate you to Drexlerian nanotechnology (which can then be used to build powerful things like “large numbers of fast-running human whole-brain emulations”).
If you mean “What’s the likeliest way you think it could turn out that humanity’s current trajectory is basically OK / no huge actions or trajectory changes are required?”, I’d say that the likeliest scenario is one where AGI kills all humans, but this isn’t a complete catastrophe for the future value of the reachable universe because the AGI turns out to be less like a paperclip maximizer and more like a weird sentient alien that wants to fill the universe with extremely-weird-but-awesome alien civilizations. This sort of scenario is discussed in Superintelligent AI is necessary for an amazing future, but far from sufficient.
If you mean “What’s the likeliest way you think it could turn out that EAs are focusing too much on AI and should focus on something else instead?”, I’d guess it’s if we should focus more on biotech. E.g., this conjunction could turn out to be true: (1) AGI is 40+ years away; (2) by default, it will be easy for small groups of crazies to kill all humans with biotech in 20 years; and (3) EAs could come up with important new ways to avoid disaster if we made this a larger focus (though it’s already a reasonably large focus in EA).
Another way it could be bad that EAs are focusing on AI is if EAs are accelerating AGI capabilities / shortening timelines way more than we’re helping with alignment (or otherwise increasing the probability of good outcomes).
Here are a few non-MIRI perspectives if you’re interested:
What does it mean to align AI with human values?
The implausibility of intelligence explosion
Against the singularity hypothesis
Book Review: Reframing Superintelligence
This article is… really bad.
It’s mostly a summary of Yudkowsky/Bostrom ideas, but with a bunch of the ideas garbled and misunderstood.
Mitchell says that one of the core assumptions of AI risk arguments is “that any goal could be ‘inserted’ by humans into a superintelligent AI agent”. But that’s not true, and in fact a lot of the risk comes from the fact that we have no idea how to ‘insert’ a goal into an AGI system.
The paperclip maximizer hypothetical here is a misunderstanding of the original idea. (Though it’s faithful to the version Bostrom gives in Superintelligence.) And the misunderstanding seems to have caused Mitchell to misunderstand a bunch of other things about the alignment problem. Picking one of many examples of just-plain-false claims:
“And importantly, in keeping with Bostrom’s orthogonality thesis, the machine has achieved superintelligence without having any of its own goals or values, instead waiting for goals to be inserted by humans.”
The article also says that “research efforts on alignment are underway at universities around the world and at big AI companies such as Google, Meta and OpenAI”. I assume Google here means DeepMind, but what alignment research at Meta does Mitchell have in mind??
Also: “Many researchers are actively engaged in alignment-based projects, ranging from attempts at imparting principles of moral philosophy to machines, to training large language models on crowdsourced ethical judgments.”
… That sure is a bad picture of what looks difficult about alignment.
This essay is quite bad. A response here: A reply to Francois Chollet on intelligence explosion
I disagree with Thorstad and Drexler, but those resources seem much better.
Idk, I can’t help but notice that your title at MIRI is “Research Communications”, but there is nobody paid by the “Machine Intelligence Skepticism Institute” to put forth claims that you are wrong.
edit: removed superfluous “that”.
Since we’re talking about p(doom), this sounds like a claim that my job at MIRI is to generate arguments for worrying more about AGI, and we haven’t hired anybody whose job it is to generate arguments for worrying less.
Well, I’m happy to be able to cite that thing I wrote with a long list of reasons to worry less about AGI risk!
(Link)
I’m not claiming the list is exhaustive or anything, or that everyone would agree with what should go on such a list. It’s the reasons that update me the most, not the reasons that I’d expect to be most convincing to some third party.
But the very existence of public lists like this seems like something your model was betting against. Especially insofar as it’s a novel list with items MIRI came up with itself that reflect MIRI-ish perspectives, not a curated list of points other groups have made.
Nice
It also occurs to me that I’ve written an argument in favor of worrying about x-risk as well, here.
I like the post you linked but I’m not sure this is much of a rebuttal to Nuno’s point. This is a single post, saying the situation is not maximally bad, against a much larger corpus of writings and communications by you and MIRI emphasizing risks from AGI.
If you think that AGI risk is extremely high (as I do), then the intellectually honest thing to do is to write out the main considerations that cause you to think it’s that high. This includes any major considerations that cause you to not think it’s even higher.
One of Nuno’s points in the OP was, paraphrasing: ‘I worry that the doomers are only citing strong arguments for doom, and not citing strong arguments against doom. Either because (a) doomers are flinching away from thinking about arguments against doom, or because (b) they’re strategically withholding arguments against doom in the hope of manipulating others into having doomier views. Whether (a) is true or (b), it follows that I should discount doomer arguments somewhat as filtered evidence.’
The existence of my “reasons I’m less doomy than I could have been” post is meaningful evidence against (a) and (b). It can still be part of an eleven-dimensional chess game to (consciously or unconsciously) trick people, but it’s nontrivial Bayesian evidence that we’re doing the epistemically cooperative thing.
If we had a huge number of posts pushing for optimism, then that would be even more evidence against (a) and (b). But that would also be evidence that we have way lower p(doom), or that we’re trying to trick people into thinking we have lower p(doom) by giving excessive time to arguments that we think are way weaker than the counter-arguments.
Be wary of setting a trap where there’s no possible way for you to take claims of high p(doom) seriously, because when someone gives more arguments for doom than for hope you assume they’re trying to trick you by filtering out secret strong reasons for hope, and when someone gives you similar numbers of arguments for doom and for hope you assume they can’t really think p(doom) is that high.
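Here is a toy Bayes calculation of what “discount doomer arguments somewhat as filtered evidence” can look like quantitatively (all of the likelihoods below are made up, purely for illustration):

```python
# Toy model (made-up numbers): how much weight to give the observation
# "this source published a pro-doom argument" as you become more confident
# that the source filters which arguments it publishes.

p_publish_if_strong = 0.9         # P(pro-doom argument published | doom case is strong)
p_publish_if_weak_honest = 0.2    # P(published | case is weak, source reports both sides)
p_publish_if_weak_filtered = 0.8  # P(published | case is weak, source only publishes pro-doom)

for p_filtering in (0.0, 0.5, 0.9):
    p_publish_if_weak = (p_filtering * p_publish_if_weak_filtered
                         + (1 - p_filtering) * p_publish_if_weak_honest)
    bayes_factor = p_publish_if_strong / p_publish_if_weak
    print(f"P(source filters) = {p_filtering:.1f} -> Bayes factor toward 'strong case' = {bayes_factor:.2f}")
# Prints roughly 4.50, 1.80, 1.22: suspected filtering weakens the update,
# but it doesn't flip it or zero it out on its own.
```

On numbers like these, the sensible response to suspected filtering is to discount the evidence in proportion to how much filtering you suspect, rather than either ignoring the filter or treating every filtered source as uninformative.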
Yeah, I don’t think that your paraphrase was accurate. I don’t need to posit a conscious (strategically withholding) or subconscious (flinching away) conspiracy, in the same way that I don’t need a conscious conspiracy to explain why there are so many medieval proofs of god. So the problem may not be at the individual but at the collective level.
I like Unconscious Economics as an illustration of some of these dynamics.
Incidentally, that’s a good example of normal reasoning I’d consider fuzzy, and which would bring my probabilities down.
I briefly touched on this at the end of the post and in this comment thread. In short:
Eehh, you can’t just ignore your evidence being filtered.
Strong kinds of evidence (e.g., empirical evidence, mathematical proof, very compelling arguments) would still move my needle; weak or fuzzy arguments much less so.
I can still process evidence from my own eyes, e.g., observe progress, tap into sources that I think are less filtered, think about this for myself, etc.
I can still “take claims of high p(doom) seriously” in the sense of believing that people reporting them hold that as a sincere belief.
Though that doesn’t necessarily inspire a compulsion to defer to those beliefs.
That all seems right to me, and compatible with what I was saying. The part of Sphor’s comment that seemed off to me was “against a much larger corpus of writings and communications by you and MIRI emphasizing risks from AGI”: one blog post is a small data point to weigh against lots of other data points, but the relevant data to weigh it against isn’t “MIRI wrote other things that emphasize risks from AGI” in isolation, as though “an organization or individual wrote a lot of arguments for X” on its own is strong reason to discount those arguments as filtered.
The thing doing the work has to be some background model of the arguers (or of some process upstream of the arguers), not a raw count of how often someone argues for a thing. Otherwise you run into the “damned if you argue a lot for X, damned if you don’t argue a lot for X” problem.
That’s not the point Nuno was making above. He said in the OP that there are selection effects at the level of arguments, not that doomers were trying to trick people. You replied to it saying that this argument doesn’t have much punch, because you trust EAs and Rationalists and think the field is established enough to have had arguments flowing in both directions. He replied by pointing out that MIRI promotes AI risk as an organization and there’s no equivalent organization putting out arguments against AI risk. You said this doesn’t apply because you once wrote a post saying not to be maximally pessimistic about AI. I said this doesn’t mean much because the vast majority of writing by you and MIRI emphasizes AI risks. I don’t know what your response to this specific line of criticism is.
Thanks for winding back through the conversation so far, as you understood it; that helped me understand better where you’re coming from.
Nuno said: “Idk, I can’t help but notice that your title at MIRI is ‘Research Communications’, but there is nobody paid by the ‘Machine Intelligence Skepticism Institute’ to put forth claims that you are wrong.”
I interpreted that as Nuno saying: “MIRI is giving arguments for stuff, but I cited an allegation that CFAR is being dishonest, manipulative, and one-sided in their evaluation of AI risk arguments, and I note that MIRI is a one-sided doomer org that gives arguments for your side, while there’s nobody paid to raise counter-points.”
My response was a concrete example showing that MIRI isn’t a one-sided doomer org that only gives arguments for doom. That isn’t a proof that we’re correct about this stuff, but it’s a data point against “MIRI is a one-sided doomer org that only gives arguments for doom”. And it’s at least some evidence that we aren’t doing the specific dishonest thing Nuno accused CFAR of doing, which got a lot of focus in the OP.
The specific thing you said was: “I like the post you linked but I’m not sure this is much of a rebuttal to Nuno’s point. This is a single post, saying the situation is not maximally bad, against a much larger corpus of writings and communications by you and MIRI emphasizing risks from AGI.”
My reply mostly wasn’t an objection to “I’m not sure this is much of a rebuttal to Nuno’s point” or “This is a single post”. My objection was to “against a much larger corpus of writings and communications by you and MIRI emphasizing risks from AGI”. As I said to Nuno upthread:
Regarding those models of MIRI and other orgs in the space, and of upstream processes that might influence us:
I think you and Nuno are just wrong to think of MIRI as “an org organized around trying to make people more pessimistic about AI outcomes”, any more than FHI is an org organized around trying to make people think anthropics, whole-brain emulation, and biosecurity are really important. Those are things that people at FHI tend to believe, but that’s because researchers there (rightly or wrongly) looked at the arguments and reached that conclusion, while at the same time looking at other topics and concluding they weren’t very important (e.g., brain-computer interfaces, nuclear fusion, and asteroid risk). If FHI researchers had evaluated the arguments differently, the organization would have continued existing, just with a different set of research interests.
Similarly, MIRI was originally an accelerationist org, founded with a goal of advancing humanity to AGI as quickly as possible. We had an incentive to think AGI is important, but not (AFAICT) to think AGI is scary. “Oh wait, AGI is scary” is a conclusion Eliezer came to in the first few years of MIRI’s existence, via applying more scrutiny to his assumption that AGI would go great by default.
I’m all in favor of asking questions like “What were the myopic financial incentives in this case?” and seeing how much behavior this predicts. But I think applying that lens in an honest way should sometimes result in “oh weird, that behavior is the opposite of the one I’d naively have predicted with this model”, as opposed to it being a lens that can explain every observation equally.
MIRI deciding that AGI is scary and risky, in an excited techno-optimist social environment and funding landscape, seems like a canonical example of something different from that going on.
(Which doesn’t mean MIRI was right, then or now. People can be wrong for reasons other than “someone was paying me to be wrong”.)
Our first big donor, Peter Thiel, got excited about us because he thought of us as techno-optimists, and stopped supporting us within a few years when he concluded we were too dour about humanity’s prospects. This does not strike me as a weird or surprising outcome, except insofar as it’s weird someone in Thiel’s reference class took an interest in MIRI even temporarily.
I don’t think MIRI has more money today than if we were optimists about AI. I also don’t think we crystal-ball-predicted that funders like Open Phil would exist 5 or 15 years in the future, or that they’d have any interest in “superintelligent AI destroys the world” risk if they did exist. Nor do I think we’ve made more money, expanded more, felt better about ourselves, or had more-fun social lives by being open in 2020-2023 about the fact that we’ve become even more pessimistic and think things are going terribly, both at MIRI and in the alignment field at large.
Speaking to the larger question: is there a non-epistemic selection effect in the world at large, encouraging humanity to generate more arguments for AI risk than against it? This does not follow from the mere observation of a bunch of arguments for AI risk, because that observation is also predicted by those arguments being visibly correct, and accepted and shared because of their correctness.
For different groups, I’d guess that...
Random academics probably have a myopic incentive to say things that sound pretty respectable and normal, as opposed to wild and sensational. Beyond that, I don’t think there’s a large academia-wide incentive to either be pro-tech or anti-tech, or to have net-optimistic or net-pessimistic beliefs about AI in particular. There is a strong incentive to just ignore the topic, since it’s hard to publish papers about it in top journals or conferences.
Journalists do have an incentive to say things that sound sensational, both positive (“AI could transform the world in amazing positive way X!”) and negative (“AI could transform the world in horrifying way Y”). I’d guess there’s more myopic incentive to go negative than positive, by default. That said, respected newspapers tend to want to agree with academics and sound respectable and normal, which will similarly encourage a focus on small harms and small benefits. I don’t know how these different forces are likely to balance out, though I can observe empirically that I see a wide range of views expressed, including a decent number of articles worrying about AI doom.
The social network MIRI grew out of (transhumanists, Marginal-Revolution-style libertarians, extropians, techno-utopians, etc.) has strong myopic social incentives to favor accelerationism, “tech isn’t scary”, “regulation and safety concerns cause way more harm than the tech itself”, etc. The more optimistic you are about the default outcome of rapid technological progress, the better.
Though I think these incentives have weakened over the last 20 years, in large part due to MIRI persuading a lot of transhumanists to worry about misaligned AGI in particular, as a carve-out from our general techno-optimism.
EA circa 2010 probably had myopic incentives to not worry much about AGI doom, because “AGI breaks free of our control and kills us all” sounds weird and crankish, and didn’t help EA end malaria or factory farming any faster. And indeed, the earliest write-ups on AI risk by Open Phil and others strike me as going out of their way to talk about milder risks and be pretty cautious, abstract, and minimal in how they addressed “superintelligence takes over and kills us”, much more so than recent material like Cold Takes and the 80K Podcast. (Even though it’s not apparent to me that there’s more evidence for “superintelligence takes over and kills us” now than there was in 2014.)
EA circa 2023 probably has myopic incentives to have “medium-sized” probabilities of AI doom—unlike in the early days, EA leadership and super-funders like Open Phil nowadays tend to be very worried about AGI risk, which creates both financial and social incentives to look similarly worried about AI. The sweet spot is probably to think AI is a big enough risk to take seriously, but not as big as the weirder orgs like MIRI think. Within EA, this is a respected and moderate-sounding position, whereas in ML or academia even having a 10% chance of AGI doom might make you sound pretty crazy.
(Obviously none of this is true across the board, and different social networks within EA will have totally different local social incentives—some EA friend groups will think you’re dumb if you think AI risk is worth thinking about at all, some will think you’re dumb if your p(doom) is below 90%, and so on for a variety of different probability ranges. There’s a rich tapestry of diverging intuitions about which views are crazy here.)
The myopic incentives for ML itself, both financial and social, probably skew heavily toward “argue against ML being scary or dangerous at all”, mitigated mainly by a desire to sound moderate and respectable.
The “moderate and respectable” goal pushes toward ML people acknowledging that there are some risks, but relatively minor ones — this feels like a safe and sober middle ground between “AI is totally risk-free and there will be no problems” and “there’s a serious risk of AI causing a major global catastrophe”.
“Moderate and respectable” also pushes against ML people arguing against AGI risk because it pushes for ML people to just not talk about the subject at all. (Though I’d guess this is a smaller factor than “ML people don’t feel like they have a strong argument, and don’t want to broach the topic if there isn’t an easy powerful rebuttal”. People tend to be pretty happy to dunk on views they think are crazy—e.g., on social media—if they have a way of pointing at something about the view that their peers will be able to see is clearly wrong.)
I would say that the most important selection effect is ML-specific (favoring lower p(doom)): ML researchers are “the experts” whom smart people would most naturally want to defer to, the field is much larger than the AI x-risk ecosystem (and especially larger than the small part of the x-risk ecosystem with a way higher p(doom) than Nuno’s), and ML researchers can focus a large share of their attention on generating arguments for “ML is not scary or dangerous at all”, whereas journalists, academia-at-large, etc. have their attention split between AI and a thousand other topics.
But mostly my conclusion from all this, and from the history of object-level discussion so far, is that there just aren’t super strong myopic incentives favoring either “humanity only generates arguments for higher p(doom)” or “humanity only generates arguments for lower p(doom)”. There’s probably some non-epistemic tilt toward “humanity generates more arguments against AI risk than for AI risk”, at least within intellectual circles (journalism may be another matter entirely). But I don’t think the arguments are so impenetrably difficult to evaluate on their own terms, or so scarce (on anyone’s side), that it ends up mattering much.
From inside MIRI, it appears much more plausible to me that we’ve historically understated how worried we are about AI, than that we’ve historically overstated it. (Which seems like a mistake to me now.) And I think our arguments are good on their own terms, and the reasoning checks out. Selection effects strike me as a nontrivial but minor factor in all of this.
I don’t think everyone has access to the same evidence as me, so I don’t think everyone should have probabilities as doomy as mine. But the above hopefully explains why I disagree with “the selection effects argument packs a whole lot of punch behind it”, as well as “having 70 or 80%+ probabilities on AI catastrophe within our lifetimes is probably just incorrect, insofar as a probability can be incorrect”.
I take the latter to be asserting, not just that Nuno thinks he lacks enough evidence to have 70% p(doom in our lifetime), but that he places vanishingly small probability on anyone else having the evidence required to have an extreme belief about this question.
Showing that this is overconfident on Nuno’s part requires a lot less evidence than providing a full decomposition of all the factors going into my level of worry about AGI: it should be easier for us to reach agreement that the other point of view isn’t crazy than for us to reach agreement about all the object-level questions.
I’m sorry, I’m not sure I understood correctly. Are you saying you agree there are selection effects, but you object to how you think Nuno and I are modeling MIRI and the processes generating MIRI-style models on AGI?
I’m confused by your phrasing “there are selection effects”, because it sounds so trivial to me. Every widespread claim faces some nonzero amount of (non-epistemic) selection bias.
E.g., I’d assume that twelve-syllable sentences get asserted at least slightly less often than eleven-syllable sentences, because they’re a bit more cumbersome. This is a non-epistemic selection effect, but it doesn’t cause me to worry that I’ll be unable to evaluate the truth of eleven- or twelve-syllable sentences for myself.
There are plenty of selection effects in the world, but typically they don’t put us into a state of epistemic helplessness; they just imply that it takes a bit of extra effort to dig up all the relevant arguments (since they’re out there, some just take some more minutes to find on Google).
When the world has already spent decades arguing about a question, and there are plenty of advocates for both sides of the question, selection effects usually mean “it takes you some more minutes to dig up all the key arguments on Google”, not “we must default to uncertainty no matter how strong the arguments look”. AI risk is pretty normal in that respect, on my view.
Re: Arguments against conjunctiveness
So the thing here is that I don’t find Nate’s argument particularly compelling, and after seeing the following pattern a few times:
1. Here is a reason to think that AI might not happen/might not cause an existential risk.
2. Here is an argument for why that reason doesn’t apply, which could range from wrong to somewhat compelling to very compelling.
3. [Advocate proceeds to take the argument on 2. as sort of permission in their mind to assign maximal probability to AI doom]
I grow tired of it, and it starts to irk me.
What’s an example of “here is an argument for why that reason doesn’t apply” that you think is wrong?
And are you claiming that Nate or I are “assigning maximal probability to AI doom”, or doing this kind of qualitative black-and-white reasoning? If so, why?
Nate’s post, for reference, was: AGI ruin scenarios are likely (and disjunctive)
Rereading the post, I think that it has a bunch of statements about what Soares believes, but it doesn’t have that many mechanisms, pathways, counter-considerations, etc.
E.g.:
This is probably a good statement of what Soares thinks needs to happen, but it is not a case for that, so I am left to evaluate the statements and the claim that they are conjunctive with reference to their intuitive plausibility.
I think I might be a bit dense here.
E.g.:
Idk, he later mentions the US government’s COVID response, but I think the relevant branch of the government for dealing with AGI threats would probably be the Department of Defense, which seems much more competent, and seems capable of plays like blocking exports of semiconductor manufacturing equipment to China.
But it does leave a path open to prevent doom: not reaching superintelligence! I.e., a global moratorium on AGI.
(The rest of the comment is great btw :))
What causes the moratorium to be adopted, and how is it indefinitely enforced in all countries in the world?
(Also, if this can be achieved with roughly human-level AGIs running around, why can’t it be achieved without such AGIs running around?)
Social/moral consensus? There is precedent with e.g. recombinant DNA or human genetic engineering (if only the AI Asilomar conference had been similarly focused on a moratorium!). It might be hard to indefinitely enforce globally, but we might at least be able to kick the can down the road a couple of decades (as seems to have happened with the problematic bio research).
(It should be achieved without such AGIs running around, if we want to minimise x-risk. Indeed, we should have started on this already! I’m starting to wonder whether it might actually be the best option we have, given the difficulty, or perhaps impossibility(?), of alignment.)
Don’t get me wrong, I’d love to live in a glorious transhuman future (like e.g. Iain M. Banks’s Culture), but I just don’t think it’s worth the risk of doom, as things stand. Maybe after a few decades of moratorium, when we know a lot more, we can reassess (and hopefully we will still be able to have life extension, so we will personally still be around).
It now seems unfortunate that the AI x-risk prevention community was seeded from the transhumanist/techno-utopian community (e.g. Yudkowsky and Bostrom). This historical contingency is probably a large part of the reason why a global moratorium on AGI has never been seriously proposed/attempted.
Seems very surprising if true — the Yudkowskians are the main group that worries we’re screwed without a global moratorium, and the main group that would update positively if there were a way to delay AGI by a few decades. (Though they aren’t the main group that thinks it’s tractable to coordinate such a big delay.)
From my perspective Bostrom and Yudkowsky were the ones saying from the get-go that rushing to AGI is bad. E.g., in Superintelligence:
(Though he flags that this is a “tentative conclusion” that “could be overturned, for example if the threats from other existential risks or from post-transition coordination failures turn out to be extremely large”. If we were thinking about going from “AGI in 100 years” to “AGI in 300 years”, I might agree; if we’re instead going from “AGI in 15 years” to “AGI in 40 years”, then the conclusion seems way less tentative to me, given how unsolved the alignment problem is!)
The transhumanists were the ones who centered a lot of the early discussion around differential technological development, a.k.a. deliberately trying to slow down scary tech (e.g. AGI) so it comes after anti-scary tech (e.g. alignment), or attempting to accelerate alignment work to the same effect.
The idea that Bostrom or Yudkowsky ever thought “the alignment problem is a major issue, but let’s accelerate to AGI as quickly as possible for the sake of reaching the Glorious Transhumanist Future sooner” seems like revisionism to me, and I’m skeptical that the people putting less early emphasis on differential technological development back in 2014, in real life, would have somehow performed better in this counterfactual.
I’m not saying this is (was) the case. It’s more subtle than that. It’s the kind of background worldview that makes people post this (or talk of “pivotal acts”) rather than this.
The message of differential technological development clearly hasn’t had the needed effect. There has been no meaningful heed paid to it by the top AI companies. What we need now are much stronger statements, i.e. ones that use the word “moratorium”. Why isn’t MIRI making such statements? It doesn’t make sense to go to 0 hope of survival without even seriously attempting a moratorium (or at the very least, publicly advocating for one).
I think the blunt MIRI-statement you’re wanting is here:
Tangentially, I’ll note that you might not want MIRI to say “that move isn’t available to us”, if you think that it’s realistic to get the entire world to take a break on AGI work, and if you think that saying pessimistic things about this might make it harder to coordinate. (Because, e.g., this might require a bunch of actors to all put a lot of sustained work into building some special institution or law, that isn’t useful if you only half-succeed; and Alice might not put in this special work if she thinks Bob is unconditionally unwilling to coordinate, or if she’s confident that Carol is confident that Dan won’t coordinate.)
But this seems like a very unlikely possibility to me, so I currently see more value in just saying MIRI’s actual take; marginal timeline-lengthening actions can be useful even if we can’t actually put the whole world on pause for 20 years.
This is good, but I don’t think it goes far enough. And I agree with your comments re “might not want MIRI to say ‘that move isn’t available to us’”. It might not be realistic to get the entire world to take a break on AGI work, but it’s certainly conceivable, and I think maybe at this point more realistic than expecting alignment to be solved in time (or at all?). It seems reasonable to direct marginal resources toward pushing for a moratorium on AGI rather than more alignment work (although I still think this should at least be tried too!).
Your and Nate’s statement still implicitly assumes that AGI capabilities orgs are “on our side”. The evidence is that they are clearly not. Demis is voicing caution at the same time that Google leadership have started a race with OpenAI (Microsoft). It’s out of Demis’ (and his seemingly toothless ethics board’s) hands. Less accepting of what has been tantamount to “existential safety washing”, and more realpolitik, is needed. Better now might be to directly appeal to the public and policymakers. Or find a way to strategise with those with power. For example, should the UN Security Council be approached somehow? This isn’t “defection”.
I’m saying all this because I’m not afraid of treading on any toes. I don’t depend on EA money (or anyone’s money) for my livelihood or career[1]. I’m financially independent. In fact, my life is pretty good, all apart from facing impending doom from this! I mean, I don’t need to work to survive[2], I’ve got an amazing partner and a supportive family. All that is missing is existential security! I’d be happy to have “completed it mate” (i.e. I’ve basically done this with the normal life of house, car, spouse, family, financial security etc); but I haven’t: remaining is this small issue of surviving for a normal lifespan, having my children survive and thrive / ensuring the continuation of the sentient universe as we know it...
Although I still care about my reputation in EA, to be fair (can’t really avoid this as a human).
All my EA work is voluntary.
I think it’s a lot more realistic to solve alignment than to delay AGI by 50 years. I’d guess that delaying AGI by 10 years is maybe easier than alignment, but it also doesn’t solve anything unless we can use those 10 years to figure out alignment as well. For that matter, delaying by 50 years also requires that we solve alignment in that timeframe, unless we’re trying to buy time to do some third other thing.
The difficulty of alignment is also a lot more uncertain than the difficulty of delaying AGI: it depends more on technical questions that are completely unknown from our current perspective. Delaying AGI by decades is definitely very hard, whereas the difficulty of alignment is mostly a question mark.
All of that suggests to me that alignment is far more important as a way to spend marginal resources today, but we should try to do both if there are sane ways to pursue both options today.
If you want MIRI to update from “both seem good, but alignment is the top priority” to your view, you should probably be arguing (or gathering evidence) against one or more of these claims:
AGI alignment is a solvable problem.
Absent aligned AGI, there isn’t a known clearly-viable way for humanity to achieve a sufficiently-long reflection (including centuries of delaying AGI, if that turned out to be needed, without permanently damaging or crippling humanity).
(There are alternatives to aligned AGI that strike me as promising enough to be worth pursuing. E.g., maybe humans can build Drexlerian nanofactories without help from AGI, and can leverage this for a pivotal act. But these all currently seem to me like even bigger longshots than the alignment problem, so I’m not currently eager to direct resources away from (relatively well-aimed, non-capabilities-synergistic) alignment research for this purpose.)
Humanity has never succeeded in any political task remotely as difficult as the political challenge of creating an enforced and effective 50+ year global moratorium on AGI. (Taking into account that we have no litmus test for what counts as an “AGI” and we don’t know what range of algorithms or what amounts of compute you’d need to exclude in order to be sure you’ve blocked AGI. So a regulation that blocks AGI for fifty years would probably need to block a ton of other things.)
EAs have not demonstrated the ability to succeed in political tasks that are way harder than any political task any past humans have succeeded at.
Even a 10-year delay is worth a huge amount (in expectation). We may well have a very different view of alignment by then (including perhaps being pretty solid on its impossibility? Or perhaps a detailed plan for implementing it? (Or even the seemingly very unlikely “...there’s nothing to worry about”)), which would allow us to iterate on a better strategy (we shouldn’t assume that our outlook will be the same after 10 years!).
Yes! (And I think there are sane ways).
There are people working on this (e.g. Yampolskiy, Landry & Ellen), and this is definitely something I want to spend more time on (note that the writings so far could definitely do with a more accessible distillation).
I really don’t think we need to worry about this now. AGI x-risk is an emergency—we need to deal with that emergency first (e.g. kick the can down the road 10 years with a moratorium on AGI research); then when we can relax a little, we can have the luxury to think about long term flourishing.
I think this can definitely be argued against (and I will try to write more as/when I make a more fleshed-out post calling for a global AGI moratorium). For a start, without all the work on nuclear non-proliferation and risk reduction, we may well not be here today. Yes, there has been proliferation, but there hasn’t been an all-out nuclear exchange yet! It’s now 77 years since a nuclear weapon was used in anger. That’s a pretty big result, I think! Also, global taboos around bio topics such as human genetic engineering are well established. If such a taboo is established, enforcement becomes a lesser concern, as you are then only fighting against isolated rogue elements rather than established megacorporations. Katja Grace discusses such taboos in her post on slowing down AI.
Fair point. I think we should be thinking much wider than EA here. This needs to become mainstream, and fast.
Also, I should say that I don’t think MIRI should necessarily be diverting resources to work on a moratorium. Alignment is your comparative advantage so you should probably stick to that. What I’m saying is that you should be publicly and loudly calling for a moratorium. That would be very easy for you to do (a quick blog post/press release). But it could have a huge effect in terms of shifting the Overton Window on this. As I’ve said, it doesn’t make sense for this not to be part of any “Death with Dignity” strategy. The sensible thing when faced with ~0% survival odds is to say “FOR FUCK’S SAKE CAN WE AT LEAST TRY AND PULL THE PLUG ON HUMANS DOING AGI RESEARCH!?!”, or even “STOP BUILDING AGI YOU FUCKS!” [Sorry for the language, but I think it’s appropriate given the gravity of the situation, as assumed by talk of 100% chance of death etc.]
Agreed on all counts! Though as someone who’s been working in this area for 10 years, I have a newfound appreciation for how little intellectual progress can easily end up happening in a 10-year period...
I have a lot of hopes that seem possible enough to me to be worth thinking about, but this specific hope isn’t one of them. Alignment may turn out to be easier than expected, but I think we can mostly rule out “AGI is just friendly by default”.
In which direction?
:P
I’m joking, though I do take seriously that there are proposals that might be better signal-boosted by groups other than MIRI. But if you come up with a fuller proposal you want lots of sane people to signal-boost, do send it to MIRI so we can decide if we like it; and if we like it as a sufficiently-realistic way to lengthen timelines, I predict that we’ll be happy to signal-boost it and say as much.
I strongly agree and think it’s right that people… like, put some human feeling into their words, if they agree about how fucked up this situation is? (At least if they find it natural to do so.)
Yes, but you could think that the fuckery in the EA/rat community is concentrated on the topic of AI, and that the EA/rat communities can develop defenses against normal social fuckery but not vice versa.
Hey, thanks for your lengthy comment. For future reference, I would have found it more convenient if you had posted an individual comment for each consideration :)
I could do that, though some of my points are related, which might make it confusing when karma causes the comments to get rearranged out of order.