I do research at Longview Philanthropy. Previously I was a Research scholar at FHI and assistant to Toby Ord. Philosophy at Cambridge before that.
I also do a podcast about EA called Hear This Idea.
Nice post! Copying the comment I left on the draft (edited for clarity) —
I agree with both conclusions, but I don’t think your argument is the strongest reason to buy those conclusions.
My picture of how large-scale space expansion goes involves probes (not humans) being sent out after AGI. Then a reasonable default might be that the plans and values embedded in humanity’s first large-scale space settlement initiatives are set by the plans and values of some very large and technologically advanced political faction at the time (capable of launching such a significant initiative by force or unanimity), rather than a smaller number of humans who were early to settle some part of the Solar System.
I then picture most human-originating life to not resemble biological humans (more like digital people). In this case it’s very hard to imagine how farming animals would make any sense.
Even with shorter-term and human-led space settlement, like bases on the Moon and Mars, I expect it to make very little logistical sense to farm animals (regardless of the psychological profile of whoever is doing the settlement). The first settlements will be water and space and especially labour constrained, and raising animals is going to look needlessly painful and inefficient without the big economies of scale of factory farms.
That said, if animals are farmed in early settlements, then note that smaller animals tend to be the most efficient at converting feed into human-palatable calories (and also the most space-efficient). For that reason some people suggest insect farming (e.g. crickets, mealworms), which does seem much more likely than livestock or poultry! But another option is bioreactors of the kind being developed on Earth. In theory they could become more efficient than animals and would then make most practical sense (since the capital cost to build the reactor isn’t going to matter; taking anything into space is already crazy expensive). Also a lot of food will probably be imported as payload early on; unsure if that’s relevant.
So I think I’m saying the cultural attitudes of early space settlers are probably less important than the practical mechanisms by which most of space is eventually settled. Especially if most future people are not biological humans, which kind of moots the question.
I do think it’s valuable and somewhat relieving to point out that animal farming could plausibly remain an Earth-only problem!
I endorse many (more) people focusing on x-risk and it is a motivation and focus of mine; I don’t endorse “we should act confidently as if x-risk is the overwhelmingly most important thing”.
Honestly, I think the explicitness of my points misrepresents what it really feels like to form a view on this, which is to engage with lots of arguments and see what my gut says at the end. My gut is moved by the idea of existential risk reduction as a central priority, and it feels uncomfortable being fanatical about it and suggesting others do the same. But it struggles to credit particular reasons for that.
To actually answer the question: (6), (5), and (8) stand out, and feel connected.
Agree.
In this spirit, here are some x-risk sceptical thoughts:
You could reasonably think human extinction this century is very unlikely. One way to reach this conclusion is simply to work through the most plausible causes of human extinction, and reach low odds for each. Vasco Grilo does this for (great power) conflict and nuclear winter, John Halstead suggests extinction risk from extreme climate change is very low here, and the background rate of extinction from natural sources can be bounded by (among other things) observing how long humans have already been around for. That leaves extinction risk from AI and (AI-enabled) engineered pandemics, where discussion is more scattered and inconclusive. Here and here are some reasons for scepticism about AI existential risk.
Even if the arguments for AI x-risk are sound, then it’s not clear how they are arguments for expecting literal human extinction over outcomes like ‘takeover’ or ‘disempowerment’. It’s hard to see why AI takeover would lead to smouldering ruins, versus continued activity and ‘life’, just a version not guided by humans or their values.
So “existential catastrophe” probably shouldn’t just mean “human extinction”. But then it is surprisingly slippery as a concept. Existential risk is the risk of existential catastrophe, but it’s difficult to give a neat and intuitive definition of “existential catastrophe” such that “minimise existential catastrophe” is a very strong guide for how to do good. Hilary Greaves discusses candidate definitions here.
From (1), you might think that if x-risk reduction this century should be a near-top priority, then most of its importance comes from mitigating non-extinction catastrophes, like irreversible dystopias. But few current efforts are explicitly framed as ways to avoid dystopian outcomes, and it’s less clear how to do that, other than by preventing AI disempowerment or takeover (assuming those things are dystopian).
But then isn’t x-risk work basically just about AI, and maybe also biorisk? Shouldn’t specific arguments for those risks and ways to prevent them therefore matter more than more abstract arguments for the value of mitigating existential risks in general?
Many strategies to mitigate x-risks trade off uncomfortably against other goods. Of course they require money and talent, but it’s hard to argue the world is spending too much on e.g. preventing engineered pandemics. But (to give a random example), mitigating x-risk from AI might require strong AI control measures. If we also end up thinking things like AI autonomy matter, that could be an uncomfortable (if worthwhile) price to pay.
It’s not obvious that efforts to improve prospects for the long-run future should focus on preventing unrecoverable disasters. There is a strong preemptive argument for this; roughly that humans are likely to recover from less severe disasters, and so retain most of their prospects (minus the cost of recovering, which is assumed to be small in terms of humanity’s entire future). The picture here is one on which the value of the future is roughly bimodal: either we mess up irrecoverably and achieve close to zero of our potential, or we reach roughly our full potential. But that bimodal picture isn’t obviously true. It might be comparably important to find ways to turn a mediocre-by-default future into a really great future, for instance.
A related picture that “existential catastrophe” suggests is that the causes of losing all our potential are fast and discrete events (bangs) rather than gradual processes (whimpers). But why are bangs more likely than whimpers? (See e.g. “you get what you measure” here).
Arguments for prioritising x-risk mitigation often involve mistakes, like strong ‘time of perils’ assumptions and apples to oranges comparisons. A naive case for prioritising x-risk mitigation might go like this: “reducing x-risk this century by 1 percentage point is worth one percentage point of the expected value of the entire future conditional on no existential catastrophes. And the entire future is huge, it’s like lives. So reducing x-risk by even a tiny fraction, say , this century saves (a huge number of) lives in expectation. The same resources going to any work directed at saving lives within this century cannot save such a huge number of lives in expectation even if it saved 10 billion people.” This is too naive for a couple reasons:
This assumes this century is the only time where an existential catastrophe could occur. Better would be “the expected value of the entire future conditional on no existential catastrophe this century”, which could be much lower.
This compares long-run effects with short-run effects without attempting to evaluate the long-run effects of interventions not deliberately targeted at reducing existential catastrophe this century.
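A rough sketch of the first objection, under loudly stated assumptions: the functions below are hypothetical, every number is a placeholder rather than a figure from the argument above, and the model (a constant, independent catastrophe risk in every later century, with each century worth the same) is only a toy. It shows how the value of reducing this century's risk depends on what you assume about later centuries. It does not attempt to model the second objection about long-run effects of other interventions.

```python
def expected_future_centuries(risk_per_century, horizon):
    """Expected number of future centuries survived, out of `horizon`,
    if each century carries the same independent catastrophe risk."""
    survival = 1.0
    total = 0.0
    for _ in range(horizon):
        survival *= 1.0 - risk_per_century  # chance of still being around
        total += survival
    return total


def value_of_risk_reduction(delta, value_per_century, later_risk, horizon):
    """Expected value gained by cutting this century's risk by `delta`
    (an absolute probability), given the risk faced in later centuries."""
    return delta * value_per_century * expected_future_centuries(later_risk, horizon)


if __name__ == "__main__":
    # Placeholder inputs: 1 percentage point reduction, 10,000-century horizon.
    naive = value_of_risk_reduction(0.01, 1.0, later_risk=0.0, horizon=10_000)
    ongoing = value_of_risk_reduction(0.01, 1.0, later_risk=0.1, horizon=10_000)
    print(naive, ongoing)
```

With zero risk after this century (a strong 'time of perils' assumption) the gain scales with the whole horizon; with ongoing risk it is capped by roughly the expected number of centuries survived, which can be far smaller.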
Naive analysis of the value of reducing existential catastrophe also doesn’t account for ‘which world gets saved’. This feels especially relevant when assessing the value of preventing human extinction, where you might expect the worlds where extinction-preventing interventions succeed in preventing extinction are far less valuable than the expected value of the world conditional on no extinction (since narrowly avoiding extinction is bad news about the value of the rest of the future). Vasco Grilo explores this line of thinking here, and I suggest some extra thoughts here.
The fact that some existential problems (e.g. AI alignment) seem, on our best guess, just about solvable with an extra push from x-risk motivated people doesn’t itself say much about the chance that x-risk motivated people make the difference in solving those problems (if we’re very uncertain about how difficult the problems are). Here are some thoughts about that.
These thoughts make me hesitant about confidently acting as if x-risk is overwhelmingly important, even compared to other potential ways to improve the long-run future, or other framings on the importance of helping navigate the transition to very powerful AI.
But I still think existential risk matters greatly as an action-guiding idea. I like this snippet from the FAQ page for The Precipice:
But for most purposes there is no need to debate which of these noble tasks is the most important—the key point is just that safeguarding humanity’s longterm potential is up there among the very most important priorities of our time.
[Edited a bit for clarity after posting]
Thanks for the comment, Owen.
I agree with your first point and I should have mentioned it.
On your second point, I am assuming that ‘solving’ the problem means solving it by a date, or before some other event (since there’s no time in my model). But I agree this is often going to be the right way to think, and a case where the value of working on a problem with increasing resources can be smooth, even under certainty.
Ah thanks, good spot. You’re right.
Another way to express (to avoid a stacked fraction) is ; i.e. percentage change in resources. I’ll update the post to reflect this.
Just noticed I missed the deadline — will you be accepting late entries?
Edit: I had not in fact missed the deadline
Here’s a framing which I think captures some (certainly not all) of what you’re saying. Imagine graphing out percentiles for your credence distribution over values the entire future can take. We can consider the effect that extinction mitigation has on the overall distribution, and the change in expected value which the mitigation has. In the diagrams below, the shaded area represents the difference made by extinction mitigation.
The closest thing to a ‘classic’ story in my head looks like the diagram below, on which (i) the long-run future is basically bimodal, between ruin and near-best futures, and (ii) the main effect of extinction mitigation is to make near-best futures more likely.
A rough analogy: you are a healthy and otherwise cautious 22-year-old, but you find yourself trapped on a desert island. You know the only means of survival is a perilous week-long journey on your life raft to the nearest port, but you think there is a good chance you don’t survive the journey. Supposing you make the journey alive, then your distribution over your expected lifespan from this point (ignoring the possibility of natural lifespan enhancement) basically just shifts to the left as above (though with negligible weight on living <1 year from now).
A possibility you raise is that the main effect of preventing extinction is only to make worlds more likely which are already close to zero value, as below.
A variant on this possibility is that, if you knew some option to prevent human extinction were to be taken, your new distribution would place less weight on near-zero futures, but less weight on the best futures also. So your intervention affects many percentiles of your distribution, in a way which could make the net effect unclear.
One reason might be causal: the means required to prevent extinction might themselves seal off the best futures. In a variant of the shipwreck example, you could imagine facing the choice between making the perilous week-long journey, or waiting it out for a ship to find you in 2 months. Suppose you were confident that, if you wait it out, you will be found alive, but at the cost of reducing your overall life expectancy (maybe because of long-run health effects).
The above possibilities (i) assume that your distribution over the value of the future is roughly bimodal, and (ii) ignore worse-than-zero outcomes. If we instead assume a smooth distribution, and include some possibility of worse-than-zero worlds, we can ask what effect mitigating extinction has.
Here’s one possibility: the share of your distribution assigned to effectively-zero-value worlds gets ‘pinched’, giving more weight both to better-than-zero worlds and to worse-than-zero worlds. Here you’d need to explain why this is a good thing to do.
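To make the ‘pinching’ case concrete, here is a minimal sketch using a three-outcome toy distribution. The outcome values (-1, 0, 1), the probabilities, and the function names are all illustrative assumptions, not estimates. The point is only that moving mass away from zero changes expected value by an amount whose sign depends on how that mass splits between good and bad worlds.

```python
from fractions import Fraction


def expected_value(dist):
    """`dist` maps an outcome's value to its probability."""
    return sum(value * prob for value, prob in dist.items())


def pinch_zero(dist, delta, share_to_good, good_value, bad_value):
    """Move `delta` probability mass off the zero-value outcome, sending
    `share_to_good` of it to `good_value` and the rest to `bad_value`."""
    new = dict(dist)
    new[0] = new.get(0, 0) - delta
    new[good_value] = new.get(good_value, 0) + delta * share_to_good
    new[bad_value] = new.get(bad_value, 0) + delta * (1 - share_to_good)
    return new


# Hypothetical prior: 10% bad worlds, 40% near-zero worlds, 50% good worlds.
prior = {-1: Fraction(1, 10), 0: Fraction(4, 10), 1: Fraction(5, 10)}

even_split = pinch_zero(prior, Fraction(2, 10), Fraction(1, 2), 1, -1)
mostly_good = pinch_zero(prior, Fraction(2, 10), Fraction(3, 4), 1, -1)

if __name__ == "__main__":
    print(expected_value(prior), expected_value(even_split), expected_value(mostly_good))
```

In this toy setup an even split leaves expected value unchanged, so the case for mitigation has to rest on a reason to think the freed-up mass lands disproportionately on better-than-zero worlds.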
So an obvious question here is how likely it is that extinction mitigation is more like the ‘knife-edge’ scenario of a healthy person trapped in a survive-or-die predicament. I agree that the ‘classic’ picture of the value of extinction mitigation can mislead about how obvious this is for a bunch of reasons. Though (as other commenters seem to have pointed out) it’s unclear how much to rely on relatively uninformed priors, versus the predicament we seem to find ourselves in when we look at the world.
I’ll also add that, in the case of AI risk, I think that framing literal human extinction as the main test of whether the future will be good seems like a mistake, in particular because I think literal human extinction is much less likely than worlds where things go badly for other reasons.
Curious for thoughts, and caveat that I read this post quickly and mostly haven’t read the comments.
I’m fascinated by the logistics here. I’m imagining you’ll need a very flat route? And also that you won’t be able to stop at all (including at lights)?? Will you be doing a big loop, or point to point?
Anyway, rooting for you!
Thanks, I think both those points make sense. On the second point about value of information, the future for animals without humans would likely still be bad (because of wild animal suffering), and a future with humans could be less bad for animals (because we alleviate both wild and farmed animal suffering). So I don’t think it’s necessarily true that something as abstract as ‘a clearer picture of the future’ can’t be worth the price of present animal suffering, since one of the upshots of learning that picture might be to choose to live on and reduce overall animal suffering over the long run. Although of course you could just be very sceptical that the information value alone would be enough to justify another ⩾ half-century of animal suffering (and it certainly shouldn’t be used as an excuse to wait around and not do things to urgently reduce that suffering). Though I don’t know exactly what you’re pointing at re “defensive capabilities” of factory farming.
I also think I share your short-term (say, ⩽ 25-year) pessimism about farmed animals. But in the longer run, I think there are some reasons for hope (if alt proteins get much cheaper and better, if humans do eventually decide to move away from animal agriculture for roughly ethical reasons, despite the track record of activism so far).
Of course there is a question of what to do if you are much more pessimistic even over the long-run for animal (or nonhuman) welfare. Even here, if “cause the end of human civilisation” were a serious option, I’d be very surprised if there weren’t many other serious options available to end factory farming without also causing the worst calamity ever.
(Don’t mean to represent you as taking a stand on whether extinction would be good fwiw)
I agree that, right now, we’re partly in the dark about whether the future will be good if humanity survives. But if humanity survives, and continues to commit moral crimes, then there will still be humans around to notice that problem. And I expect that those humans will be better informed about (i) ways to end those moral crimes, and (ii) the chance those efforts will eventually succeed.
If future efforts to end moral crimes succeed, then of course it would be a great mistake to go extinct before that point. But even for the information value of knowing more about the prospects for humans and animals (and everything else that matters), it seems well worth staying alive.
I think it is worth appreciating the number and depth of insights that FHI can claim significant credit for. In no particular order:
The concept of existential risk, and arguments for treating x-risk reduction as a global priority (see: The Precipice)
Arguments for x-risk from AI, and other philosophical considerations around superintelligent AI (see: Superintelligence)
Arguments for the scope and importance of humanity’s long-term future (since called longtermism)
Observer selection effects and ‘anthropic shadow’
Bounding natural extinction rates with statistical methods
Dissolving the Fermi paradox
The reversal test in applied ethics
‘Comprehensive AI services’ as an alternative to unipolar outcomes
The concept of existential hope
Note especially how much of the literal terminology was coined on (one imagines) a whiteboard in FHI. “Existential risk” isn’t a neologism, but I understand it was Nick who first suggested it be used in a principled way to point to the “loss of potential” thing. “Existential hope”, “vulnerable world”, “unilateralist’s curse”, “information hazard”, all (as far as I know) tracing back to an FHI publication.
It’s also worth remarking on the areas of study that FHI effectively incubated, and which are now full-blown fields of research:
The ‘Governance of AI Program’ was launched in 2017, to study questions around policy and advanced AI, beyond the narrowly technical questions. That project was spun out of FHI to become the Centre for the Governance of AI. As far as I understand, it was the first serious research effort on what’s now called “AI governance”.
From roughly 2019 onwards, the working group on biological risks seems to have been fairly instrumental in making the case for biological risk reduction as a global priority, specifically because of engineered pandemics.
If research on digital minds (and their implications) grows to become something resembling a ‘field’, then the small team and working groups on digital minds can make a claim to precedence, as well as early and more recent published work.
FHI was staggeringly influential; more than many realise.
The singer-songwriter José González has mentioned being inspired by The Precipice and apparently other EA-related ideas. Take the charmingly scout mindset ‘Head On’:
Speak up
Stand down
Pick your battles
Look around
Reflect
Update
Pause your intuitions and deal with it
Head on
[Copied from an email exchange with Vasco, slightly embellished]
I think the probability of a flat universe is ~0 because the distribution describing our knowledge about the curvature of the universe is continuous, whereas a flat universe corresponds to a discrete curvature of 0.
Sure, if you put infinitesimal weight on a flat universe in your prior (true if your distribution is continuous over a measure of spatial curvature and you think it’s infinite only if spatial curvature = 0), then no observation of (local) curvature is going to be enough. On your framing, I think the question is just why the distribution needs to be continuous? Consider: “the falloff of light intensity / gravity etc is very close to being proportional to $1/r^2$, but presumably the exponent isn’t exactly 2 since our distribution over $n$ for $1/r^n$ is continuous”.
all the evidence for infinity is coming from having some weight on infinity in our prior.
‘All’ in the sense that you need nonzero non-infinitesimal weight on infinity in your prior, but not in the sense that your prior is the only thing influencing your credence in infinity. Presumably observations of local flatness do actually upweight hypotheses about the universe being infinite, or at least keep them open if you are open to the possibility in the first place. And I could imagine other things counting as more indirect evidence, such as how well or poorly our best physical theories fit with infinity.
[Added] I think this speaks to something interesting about a picture of theoretical science suggested by a subjective Bayesian attitude to belief-forming in general, on which we start with some prior distribution(s) over some big (continuous?) hypothesis space(s), and observations tell us how to update our priors. But you might think that’s a weird way to figure out which theories to believe, because e.g. (i) the hypothesis space is indefinitely large such that you should have infinitesimal or very small credence in any given theory; (ii) the hypothesis space is unknown in some important way, in which case you can’t assign credences at all, or (iii) theorists value various kinds of simplicity or elegance which are hard to cash out in Bayesian terms in a non-arbitrary way. I don’t know where I come down on this but this is a case where I’m unusually sympathetic to such critiques (which I associate with Popper/Deutsch[1]).
[Continuing email] I do agree that “the universe is infinite in extent” (made precise) is different from “for any size, we can’t rule out the universe being at least that big”, and that the first claim is of a different kind. For instance, your distribution over the size of the universe could have an infinite mean while implying certainty that the universe has some finite size (e.g. if that distribution over the size of the universe is where ).
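As an illustration of that last point, here is one standard toy distribution with the stated property. It is not necessarily the example the comment originally gave, and the function name is hypothetical. Put probability 2^-k on the universe having size 2^k (in arbitrary units) for k = 1, 2, 3, and so on. Every possible size is finite and the probabilities sum to 1, yet each term adds exactly 1 to the mean, so the mean diverges.

```python
from fractions import Fraction


def partial_sums(max_k):
    """Toy distribution: P(size = 2**k) = 2**-k for k = 1, 2, 3, ...
    Returns (probability mass, contribution to the mean) over k <= max_k."""
    mass = Fraction(0)
    mean = Fraction(0)
    for k in range(1, max_k + 1):
        p = Fraction(1, 2**k)
        mass += p
        mean += p * 2**k  # each term contributes exactly 1 to the mean
    return mass, mean


if __name__ == "__main__":
    for max_k in (10, 100, 1000):
        mass, mean = partial_sums(max_k)
        print(max_k, float(mass), mean)
```

The mass approaches 1 while the partial mean grows without bound as more terms are included, which is the sense in which an infinite expected size is compatible with certainty that the size is finite.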
That does put us in a weird spot though, where all the action seems to be in your choice of prior.
I don’t know how relevant it is that the axiom of infinity is independent of ZFC, unless you think that all true mathematical claims are made true by actual physical things in the world (JS Mill believed something like this I think). Then you might have thought you have independent reason to believe (i) the axioms, and if so believing that (ii) you’d be forced to believe in an actual physical infinity. But that has the same suspect “synthetic a priori” character as ontological arguments for God’s existence, and is moot in any case because (ii) is false!
For what it’s worth, as a complete outsider I feel surprised by how little serious discussion there is in e.g. astrophysics / philosophy of physics etc around whether the universe is infinite in some way. It seems like such a big deal; indeed an infinitely big deal!
Though I don’t think these views would have much constructive to say about how much credence to put on the universe being infinite, since they’d probably reject the suggestion that you can or should be trying to figure out what credence to put on it. Paging @ben_chugg since I think he could say if I’m misrepresenting the view.
Very cool! Feel free to share your paper if you’re able, I’d be curious to see.
I don’t know how to interpret the image, but this makes sense:
With a [small] attack surface (grid) for each actor, the budget multiplication should have no effect on loss rates, because all vulnerabilities are found and it’s just a matter of who found them first, which is not affected by budget multiplication. However, with a [large attack surface], the multiplication of budgets strictly benefits the attacker, because the defenders will ~never check the same squares that the attacker checks.
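Here is a toy model of the quoted claim. It is my own assumption, not a reconstruction of the paper or its image. The grid has `n_squares` squares, each a potential vulnerability. Attacker and defender each check squares in their own fixed random order, and a budget of `b` means checking `b` squares over one period, so the i-th square checked is reached at time i / b. The attacker scores a loss on a square if they reach it before the defender does, or if the defender never reaches it. All names here are hypothetical.

```python
import random


def count_losses(n_squares, attacker_budget, defender_budget, seed=0):
    """Count squares the attacker reaches strictly before the defender."""
    rng = random.Random(seed)
    # Search orders depend only on the seed, not on the budgets.
    attacker_order = rng.sample(range(n_squares), n_squares)
    defender_order = rng.sample(range(n_squares), n_squares)
    attacker_visits = min(attacker_budget, n_squares)
    defender_visits = min(defender_budget, n_squares)
    # 1-based position at which the defender reaches each square it checks.
    defender_rank = {
        sq: j for j, sq in enumerate(defender_order[:defender_visits], start=1)
    }
    losses = 0
    for i, sq in enumerate(attacker_order[:attacker_visits], start=1):
        j = defender_rank.get(sq)
        # Compare i / attacker_budget < j / defender_budget using integers.
        if j is None or i * defender_budget < j * attacker_budget:
            losses += 1
    return losses


if __name__ == "__main__":
    for n in (20, 10_000):
        base = 20 if n == 20 else 50
        for multiplier in (1, 4):
            b = base * multiplier
            print(n, multiplier, count_losses(n, b, b))
```

In this sketch, when both sides already cover the whole small grid, multiplying both budgets leaves the outcome unchanged because only the relative order of arrival matters. On the large grid the two sides rarely check the same squares, so the attacker's count of unpatched finds grows roughly in proportion to the multiplier.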
Copying a comment from Substack:
If offence and defence both get faster, but all the relative speeds stay the same, I don’t see how that in itself favours offence (we get ICBMs, but the same rocketry + guidance etc tech means missile defence gets faster at the same rate). But ideas like this make sense, e.g. if there are any fixed lags in defence (like humans don’t get much faster at responding but need to be involved in defensive moves) then speed favours offence in that respect.
That is to say there could be a ‘faster is different’ effect, where in the AI case things might move too chaotically fast — faster than the human-friendly timescales of previous tech — to effectively defend. For instance, your model of cybersecurity might be a kind of cat-and-mouse game, where defenders are always on the back foot looking for exploits, but they patch them with a small (fixed) time lag. The lag might be insignificant historically, until the absolute lag begins to matter. Not sure I buy this though.
A related vague theme is that more powerful tech in some sense ‘turns up the volatility/variance’. And then maybe there’s some ‘risk of ruin’ asymmetry if you could dip below a point that’s irrecoverable, but can’t rise irrecoverably above a point. Going all in on such risky bets can still be good on expected value grounds, while also making it much more likely that you get wiped out, which is the thing at stake.
Also, embarrassingly, I realise I don’t have a very good sense of how exactly people operationalise the ‘offence-defence balance’. One way could be something like ‘cost to attacker of doing $1M of damage in equilibrium’, or in terms of relative spending like Garfinkel and Dafoe do (“if investments into cybersecurity and into cyberattacks both double, should we expect successful attacks to become more or less feasible”). Or maybe something about the cost-per-attacker spending to hold on to some resource (or cost-per-defender spending to seize it).
This is important because I don’t currently know how to say that some technology is more or less defence-dominant than another, other than in a hand-wavy intuitive way. But in hand-wavy terms it sure seems like bioweapons are more offence-dominant than, say, fighter planes. Because it’s already the case that you need to spend a lot of money to prevent most of the damage someone could cause with not much money at all.
I see the AI stories — at least the ones I find most compelling — as being kinda openly idiosyncratic and unprecedented. The prior from previous new tech very much points against them, as you show. But the claim is just: yes, but we have stories about why things are different this time ¯\_(ツ)_/¯
Great post.
What a great resource, thanks for putting it together!
Opinionated lists like this feel significantly more useful than comprehensive but unordered lists of relevant resources, because: (i) for most literatures, you’re likely to get most of the good insights from reading a small standout minority of everything written; and (ii) it’s often not obvious to an outsider which resources are best in this respect. I hadn’t heard of many of the books you rate highly.
Incidentally: consider reformatting the papers to not be headers? It makes the navigation bar feel cluttered to me.
Congrats Toby, excited to see what you get up to in the new role! And thanks for all your work on Amplify.
Thanks Vasco!