Thanks for posting! I’m sympathetic to the broad intuition that any one person being at the sweet spot where they make a decisive impact seems unlikely, but I’m not sold on most of the specific arguments given here.
I don’t see why this is the relevant standard. “Just” avoiding egregiously unintended behavior seems sufficient for avoiding the worst accidents (and is clearly possible, since humans do it often).
Also, I don’t think I’ve heard these decent reasons—what are they?
It’s also unclear that we only have one chance at this. Optimistically (but not that optimistically?), incremental progress and failsafes can allow for effectively multiple chances. (The main argument against seems to involve assumptions of very discontinuous or abrupt AI progress, but I haven’t seen very strong arguments for expecting that.)
Agree, but also unclear why this is the relevant standard. A smaller set of actors agreeing on a more limited goal might be enough to help.
Yup, though we should make sure not to double-count this, since this point was also included earlier (which isn’t to say you’re necessarily double-counting).
This also seems like an unnecessarily high standard, since regulations have been passed and enforced before without unanimous support from affected companies.
Also, getting acceptance from all major governments does seem very hard but not quite as hard as the above quotes make it sound. After all, many major governments (developed Western ones) have relatively similar cultural sensibilities, and ambitious efforts to prevent unilateral actions have previously gotten very broad acceptance (e.g. many actors could have made and launched nukes, done large-scale human germline editing, or maybe done large-scale climate engineering, but to my knowledge none of those have happened).
Yup, though this is also potential double-counting.
Yeah, I share the view that the “Recalls” are the weakest part—I was mostly trying to get my fuzzy, accumulated-over-many-years vague sense of “whoa no we’re being way too confident about this” into a more postable form. Seeing your criticisms, I think the main issue is a bit of a motte-and-bailey sort of thing, where I’m kind of responding to a Yudkowskian model but smuggling in a more moderate perspective’s odds (i.e. Yudkowsky thinks we need to get it right on the first try, but Grace and MacAskill may be agnostic there).
I may think more about this! I do think there’s something there, sort of between the parts you’re quoting, by which I mean: yes, we could get agreement to a narrower standard than solving ethics, but even just making ethical progress at all, or coming up with standards that go anywhere good/predictable politically, seems hard. Like, the political dimension and the technical/problem-specification dimension both seem super hard, in a way where we’d have to trust ourselves to be extremely competent across both, and our actual testable experiments against either outcome are mostly a wash (i.e. we can’t get a US congressperson elected yet, or get affordable lab-grown meat on grocery store shelves, so doing harder versions of both at once seems... I dunno, I might hedge my portfolio far beyond that!).
I think you’re pointing to a real phenomenon here (though I might not call it an “optimism bias”—EAs also tend to be unusually pessimistic about some things).
I have pretty strong disagreements with a lot of the more concrete points in the post though, I’ve tried to focus on the most important ones below.
(I think you may have missed the factor of 0.01, the relative risk reduction you postulated? I get 8 billion * 0.06 * 0.01 * 0.1 * 0.1 = 48,000. So AI safety would look worse by a factor of 100 compared to your numbers.)
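For anyone checking the arithmetic, here’s a quick sketch. The inputs are the probabilities postulated in the original post, not my own estimates:

```python
# Expected lives saved under the post's postulated inputs.
population = 8_000_000_000        # people alive today
p_doom = 0.06                     # postulated chance of AI existential catastrophe
relative_risk_reduction = 0.01    # the postulated 1% relative reduction
p_solution_findable = 0.1
p_solution_executed = 0.1

expected_lives_saved = (population * p_doom * relative_risk_reduction
                        * p_solution_findable * p_solution_executed)
print(expected_lives_saved)  # ~48,000
```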
But anyway, I strongly disagree with those numbers, and I’m pretty confused as to what kind of model generates them. Specifically, you seem to be extremely confident that we can’t solve AI X-risk (<1/10,000 chance if we multiply together the 1% relative reduction with your two 10% chances). On the other hand, you think we’ll most likely be fine by default (94%). So you seem to be saying that there probably isn’t any problem in the first place, but if there is, then we should be extremely certain that it’s basically intractable. This seems weird to me. Why are you so sure that there isn’t a problem which would lead to catastrophe by default, but which could be solved by e.g. 1,000 AI safety researchers working for 10 years? To get to your level of certainty (<1/10,000 is a lot!), you’d need a very detailed model of AI X-risk IMO, more detailed than I think anyone has written about. A lot of the uncertainty people tend to have about AI X-risk comes specifically from the fact that we’re unsure what the main sources of risk are, so it’s unclear how you’d exclude the possibility that there are significant sources of risk that are reasonably easy to address.
As to why I’m not convinced by the argument that leads you to the <1/10,000 chance: the methodology of “split my claim into a conjunction of subclaims, then assign reasonable-sounding probabilities to each, then multiply” often just doesn’t work well (there are exceptions, but this certainly isn’t one of them IMO). You can get basically arbitrary results by splitting up the claim in different ways, since what probabilities are “reasonable-sounding” isn’t very consistent in humans.
I can’t speak for all longtermists of course, but that is decidedly not an argument I want to make (and FWIW, my impression is that this is not the key objection most longtermists would raise). If you convinced me that our chances of preventing an AI existential catastrophe were <1/10,000, and that additionally we’d very likely die in a few centuries anyway (not sure just how likely you think that is?), then I would probably throw the expected value calculations out the window and start from scratch trying to figure out what’s important. Basically for exactly the reasons you mention: at some point this starts feeling like a Pascal’s mugging, and that seems fishy and confusing.
But I think the actual chances we prevent an AI existential catastrophe are way higher than 1/10,000 (more like 1/10 in terms of the order of magnitude). And I think conditioned on that, our chances of surviving for billions of years are pretty decent (very spontaneous take: >=50%). Those feel like cruxes to me way more than whether we should blindly do expected value calculations with tiny probabilities, because my probabilities aren’t tiny.
I agree it’s possible in a very weak sense, but I think we can say something stronger about just how unlikely this is (over the next millennium or two): Nothing like this has happened over the past 65 million years (where I’m counting the asteroid back then as “unstoppable” even though I think we could stop that soon after AGI). So unless you think that alien invasions are reasonably likely to happen soon (but weren’t likely before we sent out radio waves, for example), this scenario seems to be firmly in the “not really worth thinking about” category.
This may seem really nitpicky, but I think it’s important when we talk about how likely it is that we’ll continue living for billions of years. You give several scenarios for how things could go badly, but it would be just as easy to list scenarios for how things could go well. Listing very unlikely scenarios, especially just on one side, actively makes our impression of the overall probabilities worse.
Ah yeah, you’re right—I think basically I put in the percent rather than the probability. So it would indeed be very expensive to be competitive with AMF. Though so is everything else, so that’s not hugely surprising.
As for the numbers, yeah, it does just strike me as really, really unlikely that we can solve AI x-risk right now. 1/10,000 does feel about right to me. I certainly wouldn’t expect everyone else to agree though! I think some people would put the odds much higher, and others (like Tyler Cowen maybe?) would put them a bit lower. Probably the 1% step is the step I’m least confident in—wouldn’t surprise me if the (hard to find, hard to execute) solutions that are findable would reduce risk significantly more.
EDIT: tried to fix the math and switched the “relative risk reduction term” to 10%. I feel like among findable, executable interventions there’s probably a lot of variance, and it’s plausible some of the best ones do reduce risk by 10% or so. And 1/1000 feels about as plausible as 1/10000 to me. So, somewhere in there.
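To make the sensitivity to that one term concrete, here’s a small sketch comparing the original 1% relative-reduction term with the revised 10% one (same postulated inputs as before, not my own estimates):

```python
# Compare expected lives saved under the original vs. revised relative risk reduction.
population = 8_000_000_000
p_doom = 0.06
p_find, p_execute = 0.1, 0.1

results = {}
for rrr in (0.01, 0.10):  # original 1% term vs. revised 10% term
    results[rrr] = population * p_doom * rrr * p_find * p_execute
    print(f"relative reduction {rrr:.0%}: {results[rrr]:,.0f} expected lives saved")
```

The revised term moves the estimate up by a factor of ten, which is why the overall bottom line shifts roughly from 1/10,000 toward 1/1,000.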
I think Erik wasn’t commenting so much on this number, but rather its combination with the assumption that there is a 94% chance things are fine by default.
I.e. you are assuming that there is a 94% chance it’s trivially easy, and 6% chance it’s insanely hard.
Very few problems have such a bimodal nature, and I also would be interested to understand what’s generating it for you.
I think you should be substantially more optimistic about the effects of aligned AGI. Once we have aligned AGI, high-end cognitive labor basically becomes very cheap, since once an AI system is trained, it is relatively cheap to deploy it en masse. Some of these AI scientists would presumably work on making AIs cheaper, if not more capable, which limits to a functionally infinite supply of high-end scientists. Given a functionally infinite supply of high-end scientists, we will quickly discover basically everything that can be discovered through parallelizable scientific labor, which is, if not everything, I think at least quite a few things (e.g. I have pretty high confidence that we could solve aging, develop extremely good vaccines to protect against biorisk, etc.). Moreover, this is only a lower bound; I think AGI will probably become significantly smarter than the smartest human relatively quickly, so we will probably do even better than the aforementioned scenario.
To me, “aligned” does a lot of work here. Like yes, if it’s perfectly aligned and totally general, the benefits are mind boggling. But maybe we just get a bunch of AI that are mostly generating pretty good/safe outputs, but a few outputs here and there lower the threshold required for random small groups to wreak mass destruction, and then at least one of those groups blows up the biome.
But yeah given the premise we get AGI that mostly does what we tell it to, and we don’t immediately tell it to do anything stupid, I do think it’s very hard to predict what will happen but it’s gonna be wild (and indeed possibly really good).
Strong upvote—I found your perspective really fresh:
“The most likely case to me is that if AI x-risk is solved or turns out not to be a serious issue, and we just keep facing x-risks in proportion to how strong our technology gets, forever. Eventually we draw a black ball and all die.”
Lots of us are considering a career pivot into AI safety. Is it...actually tractable at all? How hopeful should we be about it? No idea.
Thank you! My perspective is: “figuring out if it’s tractable is at least tractable enough that it’s worth a lot more time/attention going there than is currently”, but not necessarily “working on it is far and away the best use of time/money/attention for altruistic purposes”, and almost certainly not “working on it is the best use of time/money/attention under a wide variety of ethical frameworks and it should dominate a healthy moral parliament”.
It’s hard to say. Considering there are fewer than 300 people estimated working on AI Safety and it’s still just starting to gain traction, I wouldn’t expect us to know a ton about it yet.
Even in established fields people are expected to usually take years or even decades before they can produce truly great research.
Psychology was still using lobotomies until 55 years ago. We’ve learned a lot since then and there’s still much more to learn. It took a similar amount of time for AI capabilities to get to where they are now. AI Safety is much newer and could look completely different in 10 years. Or, if nobody works on it or the people working on it are unable to make progress, it could look relatively similar.
Data point: I wasn’t there for this but Justis is a friend of mine, and on an interpersonal level he’s one of the chillest, highest-contentment-set-point people I know. He doesn’t brim over with cheerleading or American dynamism, but my default assumption is if someone calls him a downer they can’t mean interpersonal affect.
Re optimism bias
Towards the top of the post, I think you made a claim that EAs are often very optimistic (particularly agentic ones doing ambitious things or in ‘elitist’ positions).
I just wanted to flag that this isn’t my impression of many EAs who I think are doing ambitious projects, I think a disproportionate number of agentic people I know in EA are pretty pessimistic in general.
I think the optimism thing and something like desire to try hard / motivation / enthusiasm for projects are getting a bit confused here, but low confidence.
Justis, do you, as someone involved in AI safety research, think that AI safety researchers would mostly dislike the total termination of AI research (assuming they all found great alternative jobs, etc.)?
Hmm. I think reactions to that would vary really widely between researchers, and be super sensitive to when it happened, why, whether it was permanent, and other considerations.
I wonder if they are truly against AGI or ASI, or if they just want the safe versions? I am not sure if there are really two positions here (one for AI, one against), or really just one with caveats.
I wonder how much of this is an EA thing vs idiosyncrasies of the org you trialed at, or for that matter, West Coast American culture overall. Fwiw, my own experience is that I worked at three non-EA tech companies (Epic, Impossible Foods, and Google), and broadly people seemed more positive/confident in the organization than people I know in EA orgs. Certainly EA funders seem more pessimistic (though I’ve never talked to top VCs).
This seems quite bad and I’m sorry you had to go through that. The org’s actions feel rather unprofessional to me, tbh.
Yeah I think a lot of it is West Coast American culture! I imagine EA would have super different vibes if it were mostly centered in New York.
For a contrasting opinion by Kat Woods and Amber Dawn, here’s this post: Two reasons we might be closer to solving alignment than it seems.
Link: https://forum.effectivealtruism.org/posts/RkpdA8763yGtEovj9/two-reasons-we-might-be-closer-to-solving-alignment-than-it
(Comment to flag that I looked back over this and just totally pretended 4,000 was equal to 1,000. Whoops. Don’t think it affects the argument very strongly, but I have multiplied the relevant dollar figures by 4.)
double comment
Thanks!