Confirmed, there are still people around :)
Arepo
I would strongly push back on the idea that a world where it’s unlikely and we can’t change that is uninteresting. In that world, all the other possible global catastrophic risks become far more salient as potential flourishing-defeaters.
Thanks for the shout-out :) If you mentally replace the ‘multiplanetary’ state with ‘post-AGI’ in this calculator, I do think it models the set of concerns Will’s talking about here pretty well.
Thanks Ozzie! I’ll definitely try this out if I ever finish my current WIP :)
Questions that come to mind:
Will it automatically improve as new versions of the underlying model families are released?
Will you be actively developing it?
Feature suggestion: could/would you add a check for obviously relevant literature and ‘has anyone made basically this argument before’?
A 10% chance of transformative AI this decade justifies current EA efforts to make AI go well.
Not necessarily. It depends on
a) your credence distribution of TAI after this decade,
b) your estimate of annual risk per year of other catastrophes, and
c) your estimate of the comparative longterm cost of other catastrophes.
I don’t think it’s unreasonable to think, for example, that
there’s a very long tail to when TAI might arrive, given that its prospects of arriving in 2-3 decades are substantially related to its prospects of arriving this decade (e.g. if we scale current models substantially and they still show no signs of becoming TAI, that undermines the case for future scaling getting us there under the same paradigm); or
the more pessimistic annual risk estimates of 1-2% per year that I discussed in the previous essay are correct, and that future civilisations will have a sufficiently increased difficulty of flourishing for a collapse to cost nearly 50% of the expected cost of extinction.
And either of these beliefs (and others) would suggest we’re relatively overspending on AI.
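To make that concrete, here’s a minimal toy comparison in Python. All the numbers are purely illustrative placeholders (the 10% chance of catastrophe given TAI, the 1.5% annual non-AI risk, and the 0.5 collapse cost are not my actual credences); it’s just a sketch of how (a)-(c) can flip the comparison:

```python
# Toy expected-cost comparison; illustrative numbers only, not real estimates.
p_tai_this_decade = 0.10            # the 10% figure from the quote
p_catastrophe_given_tai = 0.10      # placeholder: chance TAI goes existentially badly

annual_other_risk = 0.015           # placeholder: 1.5%/yr non-AI global catastrophe risk
years = 10
collapse_cost = 0.5                 # placeholder: long-term cost of a collapse, as a
                                    # fraction of the value lost to outright extinction

# Chance of at least one non-AI catastrophe this decade (independent years assumed)
p_other_this_decade = 1 - (1 - annual_other_risk) ** years

expected_ai_loss = p_tai_this_decade * p_catastrophe_given_tai
expected_other_loss = p_other_this_decade * collapse_cost

print(f"Expected AI-related loss this decade:     {expected_ai_loss:.3f}")
print(f"Expected non-AI catastrophe loss:         {expected_other_loss:.3f}")
```

On these made-up numbers the non-AI term dominates; the point isn’t that these values are right, only that the conclusion is sensitive to (a)-(c).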
It’s also important to understand that Hendrycks and Yudkowsky were simply describing/predicting the geopolitical equilibrium that follows from their strategies, not independently advocating for the airstrikes or sabotage.
This is grossly disingenuous. Yudkowsky frames his call for airstrikes as what we ‘need’ to do, and describes them in the context of the hypothetical ‘if I had infinite freedom to write laws’. Hendrycks is slightly less direct in actively calling for it, claiming that it’s the default, but the document clearly states the intent of supporting it: ‘we outline measures to maintain the conditions for MAIM’.
These aren’t the words of people dispassionately observing a phenomenon—they are both clearly trying to bring about the scenarios they describe when the lines they’ve personally drawn are crossed.
But the expected value of existential risk reduction is—if not infinite, which I think it clearly is in expectation—extremely massive.
I commented something similar on your blog, but as soon as you allow that one decision is infinite in expectation you have to allow that all outcomes are, since whatever possibility of infinite value you have given that action must still be present without it.
If you think the Bostrom number of 10^52 happy people has a .01% chance of being right, then you’ll get 10^48 expected future people if we don’t go extinct, meaning reducing odds of existential risks by 1/10^20 creates 10^28 extra lives.
Reasoning like this seems kind of scope insensitive to me. In the real world, it’s common to see expected payoffs declining as offered rewards get larger, and I don’t see any reason to think this pattern shouldn’t typically generalise to most such prospects, even when the offer is astronomically large.
The odds are not trivial that if we get very advanced AI, we’ll basically eliminate any possibility of human extinction for billions of years.
I think the stronger case is just security in numbers. Get a civilisation around multiple star systems and capable of proliferating, and the odds of its complete destruction rapidly get indistinguishable from 0.
I agree with Yarrow’s anti-‘truth-seeking’ sentiment here. That phrase seems to primarily serve as an epistemic deflection device indicating ‘someone whose views I don’t want to take seriously and don’t want to justify not taking seriously’.
I agree we shouldn’t defer to the CEO of PETA, but CEOs aren’t—often by their own admission—subject matter experts so much as people who can move stuff forwards. In my book the set of actual experts is certainly murky, but includes academics, researchers, sometimes forecasters, sometimes technical workers—sometimes CEOs but only in particular cases—anyone who’s spent several years researching the subject in question.
Sometimes, as you say, they don’t exist, but in such cases we don’t need to worry about deferring to them. When they do, it seems foolish not to upweight their views relative to our own unless we’ve done the same, or unless we have very concrete reasons to think they’re inept or systemically biased (and perhaps even then).
I agree that the OP is too confident/strongly worded, but IMO this
which is more than enough to justify EA efforts here.
could be dangerously wrong. As long as AI safety consumes resources that might counterfactually have gone to e.g. nuclear disarmament or stronger international relations, it might well be harmful in expectation.
This is doubly true for warlike AI ‘safety’ strategies like Aschenbrenner’s call to intentionally arms-race China, Hendrycks, Schmidt and Wang’s call to ‘sabotage’ countries that cross some ill-defined threshold, and Yudkowsky’s call for airstrikes on data centres. I think such ‘AI safety’ efforts are very likely increasing existential risk.
Well handled, Peter! I’m curious how much of that conversation was organic, and how much was scripted or at least telegraphed in advance?
I’m warming to CoGi
That makes some sense, but leaves me with questions like
Which projects were home runs, and how did you tell that a) they were successful at achieving their goals and b) that their goals were valuable?
Which projects were failures that you feel were justifiable given your knowledge state at the time?
What do these past projects demonstrate about the team’s competence to work on future projects?
What budget was allocated to these projects, and how, and do you expect future projects to have structurally similar budgets?
Are there any other analogies you could draw between past and possible future projects that would enable us to update on the latter’s probability of success?
MIRI is hardly unique even in the EA/rat space in having special projects—Rethink Priorities, for e.g., seem to be very fluid in what they work on; Founders Pledge and Longview are necessarily driven to some degree by the interests of their major donors; Clean Air Task Force have run many different political campaigns, each seemingly unlike the previous ones in many ways; ALLFED are almost unique in their space, so have huge variance in the projects they work on; and there are many more with comparable flexibility.
And many EA organisations in the space that don’t explicitly have such a strategy have nonetheless pivoted after learning of a key opportunity in their field, or realising an existing strategy was failing.
In order to receive funds—at least from effectiveness-minded funders—all these orgs have to put a certain amount of effort into answering questions like those above.
And ok, you say you’re not claiming to be entitled to dollars, but it still seems reasonable to ask why a rational funder should donate to MIRI over e.g. any of the above organisations—and to hope that MIRI has some concrete answers.
IMO it would help to see a concrete list of MIRI’s outputs and budget for the last several years. My understanding is that MIRI has intentionally withheld most of its work from the public eye for fear of infohazards, which might be reasonable for soliciting funding from large private donors but seems like a poor strategy for raising substantial public money, both prudentially and epistemically.
If there are particular projects you think are too dangerous to describe, it would still help to give a sense of what the others were, a cost breakdown for those, anything you can say about the more dangerous ones (e.g. number of work hours that went into them, what class of project they were, whether they’re still live, any downstream effect you can point to, and so on).
You might want to consider EA Serbia, which I was told in answer to a similar question has a good community, at least big enough to have their own office. I didn’t end up going there, so can’t comment personally, but it’s on the same latitude as northern Italy, so likely to average pretty warm—though it’s inland, so ‘average’ is likely to contain cold winters and very hot summers.
(but in the same thread @Dušan D. Nešić (Dushan) mentioned that air conditioning is ubiquitous)
Might get confused with SoGive.
Should our EA residential program prioritize structured programming or open-ended residencies?
You can always host structured programs, perhaps on a regular cycle, but doing so to the exclusion of open-ended residencies seems to be giving up much of the counterfactual value the hotel provided. It seems like a strong overcommitment to a concern about AI doom in the next low-single-digit years, which remains (rightly IMO) a niche belief even in the EA world, despite heavy selection within the community for it.
Having said that, to some degree it sounds like you’ll need to follow the funding, and prioritise keeping operations running. If that funding is likely to be conditional on a short-term AI safety focus then you can always shift focus if the world doesn’t end in 2027 - though I would strive to avoid being long-term locked into that particular view.
[ETA] I’m not sure the poll is going to give you very meaningful results. I’m at approx the opposite end of it from @Chris Leong, but his answer sounds largely consistent with mine, primarily with a different emotional focus.
Thanks for the extensive reply! Thoughts in order:
I would also note that #3 could be much worse than #2 if #3 entails spreading wild animal suffering.
I think this is fair, though if we’re not fixing that issue then it seems problematic for any pro-longtermism view, since it implies the ideal outcome is probably destroying the biosphere. Fwiw I also find it hard to imagine humans populating the universe with anything resembling ‘wild animals’, given the level of control we’d have in such scenarios, and our incentives to exert it. That’s not to say we couldn’t wind up with something much worse though (planetwide factory farms, or some digital fear-driven economy adjacent to Hanson’s Age of Em).
I’m having a hard time wrapping my head around what the “1 unit of extinction” equation is supposed to represent.
It’s whatever the cost of extinction today would be in expected future value. The cost can be negative if wild-animal-suffering proliferates, and some trajectory changes could have a negative cost of more than 1 UoE if they make the potential future more than twice as good, and vice versa (a positive cost of more than 1 UoE if they flip the expected value of the future from positive to negative).
But in most cases I think its use is to describe non-extinction catastrophes as having a cost C such that 0 < C < 1 UoE.
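A toy illustration of the unit, with entirely made-up numbers (the 0.9 flourishing probability below is a placeholder, not an estimate):

```python
# Toy illustration of 'units of extinction' (UoE); all numbers are made up.
V = 1.0  # extinction today forfeits V in expected future value: 1 UoE by definition

# Suppose (purely for illustration) that without a collapse we'd eventually flourish
# with probability ~1, but after a collapse that probability falls to 0.9.
p_flourish_baseline = 1.0
p_flourish_after_collapse = 0.9

# Then the collapse costs the lost share of that expectation: ~0.1 UoE.
collapse_cost = (p_flourish_baseline - p_flourish_after_collapse) * V
print(f"Collapse cost ≈ {collapse_cost:.2f} UoE")

# A trajectory change that doubled the expected value of the future would have a
# 'cost' of -1 UoE, i.e. a benefit as large as preventing extinction outright.
```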
the parable of the apple tree is more about P(recovery) than it is about P(flourishing|recovery)
Good point. I might write a v2 of this essay at some stage, and I’ll try and think of a way to fix that if so.
“Resources get used up, so getting back to a level of technology the 2nd time is harder than the 1st time.”
...
“A higher probability of catastrophe means there’s a higher chance that civilization keeps getting set back by catastrophes without ever expanding to the stars.”
I’m not sure I follow your confusion here, unless it’s a restatement of what you wrote in the previous bullet. The latter statement, if I understand it accurately, is closer to my primary thesis. The first statement could be true if
a) Recovery is hard; or
b) Developing technology beyond ‘recovery’ is hard
I don’t have a strong view on a), except that it worries me that so many people who’ve looked into it think it could be very hard, yet x-riskers still seem to write it off as trivial on long timelines without much argument.
b) is roughly a subset of my thesis, though one could believe the main source of friction increase would come when society runs out of technological information from previous civilisations.
I’m not sure if I’m clearing anything up here...
“we might still have a greater expected loss of value from those catastrophes”—This seems unlikely to me, but I’d like to see some explicit modeling.
So would I, though modelling it sensibly is extremely hard. My previous sequence’s model was too simple to capture this question, despite being probably too complicated for what most people would consider practical use. To answer comparative value loss, you need to look at at least:
Risk per year of non-AI catastrophes of various magnitudes
Difficulty of recovery from other catastrophes
Difficulty of flourishing given recovery from other catastrophes
Risk per year of AI catastrophes of various magnitudes
Effect of AI-catastrophe risk reduction on other catastrophes? E.g. does benign AI basically lock in a secure future, or would we retain the capacity and willingness to launch powerful weapons at each other?
How likely is it that the AI outcome is largely predetermined, such that developing benign AI once would be strong evidence that if society subsequently collapsed and developed it again, it would be benign again?
The long-term nature of AI catastrophic risk. Is it a one-and-done problem if it goes well? Or does making a non-omnicidal AI just give us some breathing space until we create its successor, at which point we have to solve the problem all over again?
Effect of other catastrophe risk reduction on AI catastrophe. E.g. does reducing global nuclear arsenals meaningfully reduce the risk that AI goes horribly wrong by accident? Or do we think most of the threat is from AI that deliberately plans our destruction, and is smart enough not to need existing weaponry?
The long-term moral status of AI. Is a world where it replaces us as good or better than a world where we stick around on reasonable value systems?
Expected changes to human-descendant values given flourishing after other catastrophes
My old model didn’t have much to say on any beyond the first three of these considerations.
Though if we return to the much simpler model and handwave a bit: if we suppose that annual non-extinction catastrophic risk is between 1 and 2%, then cumulative risk over 10-20 years is roughly 10-35%. If we also suppose that the chance of flourishing after collapse drops by 10 or more percentage points, that puts it in the realm of ‘a substantially bigger threat than the more conservative AI x-riskers consider AI to be, but substantially smaller than the most pessimistic views of AI x-risk’.
It could be somewhat more important if the chance of flourishing after collapse drops by substantially more than that (as I think it does), and much more important if we could achieve reductions in catastrophic risk that persist beyond the 10-20-year period (e.g. by moving towards stable global governance or at least substantially reducing nuclear arsenals).
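For transparency, here’s the back-of-envelope version of that handwave in code; the 10-percentage-point drop in flourishing chances is the same illustrative lower bound as above, not a considered estimate:

```python
# Back-of-envelope version of the handwave above; inputs are illustrative, not estimates.
def cumulative_risk(annual_risk: float, years: int) -> float:
    """Chance of at least one catastrophe over the period, assuming independent years."""
    return 1 - (1 - annual_risk) ** years

for annual in (0.01, 0.02):
    for years in (10, 20):
        p_catastrophe = cumulative_risk(annual, years)
        # Assume each collapse knocks ~10 percentage points off the chance of eventual
        # flourishing, i.e. costs ~0.1 units of extinction (UoE) in expectation.
        expected_loss = p_catastrophe * 0.1
        print(f"{annual:.0%}/yr over {years} years: P(catastrophe) ≈ {p_catastrophe:.0%}, "
              f"expected loss ≈ {expected_loss:.3f} UoE")
```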
Very helpful, thanks! A couple of thoughts:
EA grantmaking appears on a steady downward trend since 2022 / FTX.
It looks like this is driven entirely by the GiveWell/global health and development reduction, and that actually the other fields have been stable or even expanding.

Also, in an ideal world we’d see funding from Longview and Founders Pledge. I also gather there’s a new influx of money into the effective animal welfare space from some other funder, though I don’t know their name.
Kudos to whoever wrote these summaries. They give a great sense of the contents and, at least with mine, capture the essence of it much more succinctly than I could!
Most of these aren’t so much well-formed questions, as research/methodological issues I would like to see more focus on:
Operationalisations of AI safety that don’t exacerbate geopolitical tensions with China—or ideally that actively seek ways to collaborate with China on reducing the major risks.
Ways to materially incentivise good work and disincentivise bad work within nonprofit organisations, especially effectiveness-minded organisations
Looking for ways to do data-driven analyses of political work, especially advocacy; correct me if wrong, but the recommendations in EA space for political advocacy seem to necessarily boil down to a lot of gut-instincting on whether someone having successfully executed Project A makes their work on Project B have high expected value
Research into the difficulty of becoming a successful civilisation after recovery from civilisational collapse (I wrote more about this here)
How much scope is there for more work or more funding in the nuclear safety space, and what is its current state? Last I heard, it had lost a bunch of funding, such that highly skilled/experienced diplomats in the space were having to find unrelated jobs. Is that still true?
I’m sympathetic to that camp, but I think it has major epistemic issues that largely go unaddressed:
It systemically biases away from extreme probabilities (it’s hard to assert a probability below 10^-3, for e.g., but many real-world probabilities are that small, and many post-hoc credences look like they should have been below this)
By focusing on very specific pathways towards some outcome, it diverts attention towards easily definable issues, and hence away from the prospects of more complex pathways of causing the same or value-equivalent outcomes.[1]
It strongly emphasises point credence estimates over distributions, the latter of which are IMO well worth the extra effort, at least whenever you’re broadcasting your credences to the rest of the world.
By the way, I find this a strange remark:
This sounds like exactly the sort of criticism that’s most valuable for a project like this! If their methodology were sound it might be more valuable to present a more holistic set of criticisms and some contrary credences, but David and titotal aren’t exactly nitpicking syntactic errors—IMO they’re finding concrete reasons to be deeply suspicious of virtually every step of the AI 2027 methodology.
For e.g. I think it’s a huge concern that the EA movement have been pulling people away from non-extinction global catastrophic work because they focused for so long on extinction being the only plausible way we could fail to become interstellar, subject to the latter being possible. I’ve been arguing for years now that the extinction focus is too blunt a tool, at least for the level of investigation the question has received from longtermists and x-riskers.