Nice argument, I hadn’t heard that before!
I think what we should be talking about is whether we hit the “point of no return” this century for extinction of Earth-originating intelligent life. Where that could mean: Homo sapiens and most other mammals get killed off in an extinction event this century; then technologically-capable intelligence never evolves again on Earth; so all life dies off within a billion years or so. (In a draft post that you saw of mine, this is what I had in mind.)
The probability of this might be reasonably high. There I’m at idk 1%-5%.
In the first of these, I think most of the EV comes from whether technologically-capable intelligence evolves or not. I’m at more likely than not on that (for, say, extinction via bio-catastrophe), but not above 90%.
Thanks! I haven’t read your stuff yet, but it seems like good work; and this has been a reason in my mind for being more in favour of trajectory change than total extinction reduction for a while. It would only reduce the value of extinction risk reduction by an OOM at most, though?
I’m sympathetic to something in the Mediocrity direction (for AI-built civilisations as well as human-built civilisations), but I think it’s very hard to have a full-blooded Mediocrity principle if you also think that you can take actions today to meaningfully increase or decrease the value of Earth-originating civilisation. Suppose that Earth-originating civilisation’s value is V, and if we all worked on it we could increase that to V+ or decrease it to V-. If so, then which is the right value for the alien civilisation? Choosing V rather than V+ or V- (or V+++ or V--- etc.) seems pretty arbitrary.
Rather, we should think about how good our prospects are compared to a random draw civilisation. You might think we’re doing better or worse, but if it’s possible for us to move the value of the future around, then it seems we should be able to reasonably think that we’re quite a bit better (or worse) than the random draw civ.
We can then work out the other issues once we have more time to think about them.
Fin and I talk a bit about the “punting” strategy here.
I think it works often, but not in all cases.
For example, the AI capability level that poses a meaningful risk of human takeover comes earlier than the AI capability level that poses a meaningful risk of AI takeover. That’s because some humans already start with loads of power, and the amount of strategic intelligence you need to take over, if you already have loads of power, is less than the strategic capability you need if you’re starting off with almost none (which will be true of the ASI).
Moral error as an existential risk
Agreed!
"My view is that Earth-originating civilisation, if we become spacefaring, will attain around 0.0001% of all value."
So you think:
1. People with your values control one millionth of future resources, or less? This seems pessimistic!
2. But maybe you think it’s just you who has your values and everyone else would converge on something subtly different—different enough to result in the loss of essentially all value. Then the 1-in-1-million would no longer seem so pessimistic.
But if so, then suppose I’m Galactic Emperor and about to turn everything into X, best by my lights… do you really take a 99.9% chance of extinction, and a 0.1% chance of stuff optimised by you, instead?
3. And if so, do you think that Tyler-now has different values than Tyler-2026? Or are you worried that he might have slightly different values, such that you should be trying to bind yourself to the mast in various ways?
4. Having such a low v(future) feels hard to maintain in light of model uncertainty and moral uncertainty.
E.g. what probability do you have that:
i. People in general just converge on what’s right?
ii. People don’t converge, but a significant enough fraction converge with you that you and others end up with more than one millionth of resources?
iii. You are able to get most of what you want via trade with others?
Yeah, I think that lock-in this century is quite a bit more likely than extinction this century. (Especially if we’re talking about hitting a point of no return for total extinction.)
That’s via two pathways:
- AGI-enforced institutions (including AGI-enabled immortality of rulers).
- Defence-dominance of star systems
I do think that “path dependence” (a broader idea than lock-in) is a big deal, but most of the long-term impact of that goes via a billiards dynamic: path-dependence on X, today, affects some lock-in event around X down the road. (Where e.g. digital rights and space governance are plausible here.)
Surely any of our actions changes who exists in the future? So we aren’t in fact benefiting them?
(Whereas we can benefit specific aliens, e.g. by leaving our resources for them—our actions today don’t affect the identities of those aliens.)
"If you have a fixed population and imagine increasing the resources they have available, I assume that the value of the outcome is a strictly concave function of the resource base."
Certainly given current levels of technology, but perhaps not given future technology (e.g. indefinite life-extension technology), at least if individual wellbeing is proportional to number of happy years lived.
"Doubling the population might double the value of the outcome, although it’s not clear that this constitutes a doubling of resources."
I was thinking you’d need twice as many resources to have twice as many people?
"And why should it matter if the relationship between value and resources is strictly concave? Isn’t the key question something like whether there are potentially realizable futures that are many orders of magnitude more valuable than the default or where we are now? Answering yes seems compatible with thinking that the function relating resources to value is strictly concave and asymptotes, so long as it asymptotes somewhere suitably high up on the scale of value."
Yes, in principle, but I think that if you have the upper-bound view, then you do so on the basis of common-sense intuition. But if so, then I think the upper bound is probably really low in cosmic scales—like, if we already have a Common-sense Eutopia within the solar system, I think we’d be more than 50% of the way from 0 to the upper bound.
Yeah, I think the issue (for me) is not just about fanaticism. Offer me a choice between Common-sense Eutopia and a gamble with a 90% chance of extinction and a 10% chance of a Common-sense Eutopia 20 times the size, and it seems problematic to choose the gamble.
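As a quick sanity check of why a fully linear view picks the gamble here, a minimal sketch (the numbers are just those from the example above):

```python
# Toy expected-value comparison, assuming (contra the bounded view) that value
# is linear in the size of the Eutopia.
v_eutopia = 1.0                                  # value of Common-sense Eutopia (arbitrary unit)
gamble_ev = 0.9 * 0.0 + 0.1 * (20 * v_eutopia)   # 90% extinction (value 0), 10% a 20x-sized Eutopia

print(gamble_ev > v_eutopia)   # True (2.0 > 1.0): the linear view picks the gamble
```

On a bounded or steeply diminishing view, the 20x-sized Eutopia is worth much less than 20 times the smaller one, and the certain option can win.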
(To be clear—other views, on which value is diminishing, are also really problematic. We’re in impossibility theorem territory, and I see the whole thing as a mess; I don’t have a positive view I’m excited about.)
Re WWOTF: You can (and should) think that there’s huge amounts of value at stake in the future, and even think that there’s much more value at stake in the future than there is in the present century, without thinking that value is linear in number of happy people. It diminishes the case a bit, but nowhere near enough for longtermism to not go through.
Yeah, thanks for pushing me to be clearer: I meant “convergence” as shorthand to refer to “fully accurate, motivational convergence”. So I mean a scenario where people have the correct moral views, on everything that matters significantly, and are motivated to act on those moral views. I’ll try to say FAM-convergence from now on.
I agree with the framing.
Quantitatively, the willingness to pay to avoid extinction even just from the United States is truly enormous. The value of a statistical life in the US — used by the US government to estimate how much US citizens are willing to pay to reduce their risk of death — is around $10 million. The willingness to pay from the US as a whole to avoid a 0.1 percentage point chance of a catastrophe that would kill everyone in the US is therefore over $1 trillion (the rough arithmetic is sketched below). I don’t expect these amounts to be spent on global catastrophic risk reduction, but they show how much latent desire there is to reduce global catastrophic risk, which I’d expect to become progressively mobilised with increasing indications that various global catastrophic risks, such as biorisks, are real. [I think my predictions around this are pretty different from some others’, who expect the world to be almost totally blindsided. Timelines and the gradualness of AI takeoff are of course relevant here.]

In contrast, many areas of better futures work are likely to remain extraordinarily neglected. The amount of even latent interest in, for example, ensuring that resources outside of our solar system are put to their best use, or that misaligned AI produces a somewhat-better future than it would otherwise have done even if it kills us all, is tiny, and I don’t expect society to mobilise massive resources towards these issues even if there were indications that those issues were pressing.
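A rough version of that willingness-to-pay arithmetic; the US population figure is my assumed input, not a number from the text:

```python
# Back-of-the-envelope willingness-to-pay calculation.
vsl = 10e6                # value of a statistical life, ~$10 million
us_population = 330e6     # assumed US population, roughly 330 million
risk_reduction = 0.001    # avoiding a 0.1 percentage point chance of a catastrophe killing everyone in the US

wtp = vsl * us_population * risk_reduction
print(f"${wtp:,.0f}")     # ~$3,300,000,000,000, i.e. well over $1 trillion
```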
In some cases, what people want will be actively opposed to what is in fact best, if what’s best involves self-sacrifice on the part of those alive today, or with power today.
And then I think the neglectedness consideration beats the tractability consideration. Here are some pretty general reasons for optimism on expected tractability:
- In general, tractability doesn’t vary by as much as importance and neglectedness.
- In cause areas where very little work has been done, it’s hard for expected tractability to be very low. Because of how little we know about tractability in unexplored cause areas, we should often put significant credence on the idea that the cause will turn out to be fairly tractable; this is enough to warrant some investment into the cause area — at least enough to find out how tractable the area is.
- There are many distinct sub-areas within better futures work. It seems unlikely to me that tractability in all of them is very low, and unlikely that their tractability is very highly correlated.
- There’s a reasonable track record of early-stage areas with seemingly low tractability turning out to be surprisingly tractable. A decade ago, work on risks from AI takeover and engineered pathogens seemed very intractable; there was very little that one could fund, and very little in the way of promising career paths. But this changed over time, in significant part because of (i) research work improving our strategic understanding, and shedding light on what interventions were most promising; (ii) scientific developments (e.g. progress in machine learning) making it clearer what interventions might be promising; (iii) the creation of organisations that could absorb funding and talent. All these same factors could well be true for better futures work, too.
Of these considerations, it’s the last that personally moves me the most. It doesn’t feel long ago that work on AI takeover risk felt extraordinarily speculative and low-tractability, when there was almost nowhere one could work for or donate to outside of the Future of Humanity Institute or the Machine Intelligence Research Institute. In the early days, I was personally very sceptical about the tractability of the area. But I’ve been proved wrong. Via years of foundational work — both research work figuring out what the most promising paths forward are, and the founding of new organisations that are actually squarely focused on the goal of reducing takeover risk or biorisk, rather than on a similar but tangential goal — the area has become tractable, and now there are dozens of great organisations that one can work for or donate to.
Why so confident that:
- It’ll be a singleton AI that takes over?
- It will not be conscious?
I’m at 80% or more that there will be a lot of conscious AIs, if AI takes over.
The easiest way, in my view, to make a near-optimal future very likely, conditional on non-extinction, is if value is bounded above.
There’s an argument that this is the common sense view. E.g. consider:
Common-sense Eutopia: In the future, there is a very large population with very high well-being; those people are able to do almost anything they want as long as they don’t harm others. They have complete scientific and technological understanding. War and conflict are things of the past. Environmental destruction has been wholly reversed; Earth is now a natural paradise. However, society is limited only to the solar system, and will come to an end once the Sun has exited its red giant phase, in about five billion years.
Does this seem to capture less than one part in 10^22 of all possible value? (Because there are ~10^22 affectable stars, so civilisation could be over 10^22 times as big.) On my common-sense moral intuitions, no.
Making this argument stronger: Normally, quantities of value are defined in terms of the value of risky gambles. So what it means to say that Common-sense Eutopia is less than one part in 10^22 of all possible value is that a gamble with a one in 10^22 chance of producing an ideal-society-across-all-the-stars, and a (1 − 1/10^22) chance of near-term extinction, is better than producing Common-sense Eutopia for certain.
But that seems wild. Of all the issues facing classical utilitarianism, this seems the most problematic to me.
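To spell out the comparison being run, here’s a minimal sketch; the values are stand-ins rather than real estimates:

```python
# Saying Common-sense Eutopia captures less than one part in 10^22 of all possible
# value amounts to preferring this gamble to Eutopia-for-certain.
v_max = 1.0             # value of an ideal society spread across all ~10^22 affectable stars
p = 1e-22               # the gamble's chance of delivering that ideal outcome
gamble_ev = p * v_max   # the remaining (1 - 1e-22) probability is near-term extinction, valued at 0

v_eutopia = 0.5e-22     # any value below v_max / 10^22, per the claim being examined
print(gamble_ev > v_eutopia)   # True: on that claim, the gamble beats Eutopia-for-certain
```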
Average utilitarianism is approx linear in resources as long as at least one possible individual’s wellbeing is linear in resources.
I.e. we create Mr Utility Monster, who has wellbeing that is linear in resources, and give all resources to benefiting Mr Monster. Total value is the same as it would be under total utilitarianism, just divided by a constant (namely, the number of people who’ve ever lived).
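A minimal sketch of that point, with arbitrary stand-in numbers:

```python
# Average utilitarianism with a "utility monster" whose wellbeing is linear in resources.
def average_value(resources, k=1.0, n_people=100_000_000_000, baseline_wellbeing=1.0):
    """Average wellbeing when all resources go to one individual whose wellbeing
    is k * resources, while everyone else who has ever lived stays at a fixed baseline."""
    total_wellbeing = k * resources + (n_people - 1) * baseline_wellbeing
    return total_wellbeing / n_people

# The total is (roughly) proportional to resources, and the average just divides that
# total by a constant, so average value grows (approximately) linearly with resources.
print(average_value(1e15), average_value(2e15))
```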
Unpacking this: on linear-in-resources (LIR) views, we could lose out on most value if we (i) capture only a small fraction of resources that we could have done, and/or (ii) use resources in a less efficient way than we could have done. (Where on a LIR view, there is some use of resources that has the highest value/unit of resources, and everything should be used in that way.)
Plausibly, at least, only a tiny % of possible ways of using resources come close to the value produced by the highest value-per-unit use. So, the thinking goes, merely getting non-extinction isn’t yet getting you close to a near-best future—instead you really need to get from a non-extinction future to that optimally-used-resources future, and if you don’t then you lose out on almost all value.
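One way to make those two failure modes explicit, as a sketch under the LIR assumption with made-up numbers:

```python
# On a linear-in-resources (LIR) view, realised value factors (roughly) into how many
# resources are captured and how efficiently they are used relative to the best possible use.
def realised_value(resources_captured, efficiency, best_value_per_unit=1.0):
    """resources_captured: fraction of attainable resources actually secured (failure mode i).
    efficiency: value per unit achieved, as a fraction of the best possible use (failure mode ii)."""
    return resources_captured * efficiency * best_value_per_unit

ceiling = realised_value(1.0, 1.0)     # everything captured and put to its best use
actual = realised_value(0.1, 0.01)     # capture 10% of resources, use them at 1% efficiency
print(actual / ceiling)                # 0.001: almost all value lost, despite non-extinction
```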
Discussion topic: People vary a lot on how likely they think it is that, post-AGI, different people will converge on the same moral views, and on the extent of that convergence. I feel fairly sceptical that convergence is highly likely; I certainly don’t think we should bank on it.
[See my response to Andreas below. Here I meant “convergence” as shorthand to refer to “fully accurate, motivational convergence”.]
Thanks! I appreciate you clarifying this, and for being clear about it. Views along these lines are what I always expect subjectivists to have, and they never do, and then I feel confused.