Future Matters #1: AI takeoff, longtermism vs. existential risk, and probability discounting

The remedies for all our diseases will be discovered long after we are dead; and the world will be made a fit place to live in. It is to be hoped that those who live in those days will look back with sympathy to their known and unknown benefactors.
— John Stuart Mill

Future Matters is a newsletter about longtermism. Each month we collect and summarize longtermism-relevant research, share news from the longtermism community, and feature a conversation with a prominent longtermist. You can also subscribe on Substack, listen on your favorite podcast platform and follow on Twitter.

Research

Scott Alexander’s “Long-termism” vs. “existential risk” worries that “longtermism” may be a worse brand (though not necessarily a worse philosophy) than “existential risk”. It seems much easier to make someone concerned about transformative AI by noting that it might kill them and everyone else, than by pointing out its effects on people in the distant future. We think that Alexander raises a valid worry, although we aren’t sure the worry favors the “existential risk” branding over the “longtermism” branding as much as he suggests: existential risks are, after all, defined as risks to humanity’s long-term potential. Both of these concepts, in fact, attempt to capture the core idea that what ultimately matters is mostly located in the far future: existential risk uses the language of “potential” and emphasizes threats to it, whereas longtermism instead expresses the idea in terms of value and the duties it creates. Maybe the “existential risk” branding seems to address Alexander’s worry better because it draws attention to the threats to this value, which are disproportionately (but not exclusively) located in the short-term, while the “longtermism” branding emphasizes instead the determinants of value, which are in the far future.^[1]

In General vs AI-specific explanations of existential risk neglect, Stefan Schubert asks why we systematically neglect existential risk. The standard story invokes general explanations, such as cognitive biases and coordination problems. But Schubert notes that people seem to have specific biases that cause them to underestimate AI risk, e.g. because AI scenarios sound outlandish and counter-intuitive. If unaligned AI is the greatest source of existential risk in the near-term, then these AI-specific biases could explain most of our neglect.

Max Roser’s The future is vast is a powerful new introduction to longtermism. His graphical representations do well to convey the scale of humanity’s potential, and have made it onto the Wikipedia entry for longtermism.

Thomas Kwa’s Effectiveness is a conjunction of multipliers makes the important observation that (1) a person’s impact can be decomposed into a series of impact “multipliers” and that (2) these terms interact multiplicatively, rather than additively, with each other.^[2] For example, donating 80% instead of 10% multiplies impact by a factor of 8 and earning $1m/year instead of $250k/year multiplies impact by a factor of 4; but doing both of these things multiplies impact by a factor of 32. Kwa shows that many other common EA choices are best seen as multipliers of impact, and notes that multipliers related to judgment and ambition are especially important for longtermists.
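
The arithmetic in this example can be checked with a minimal sketch. The function and variable names are our own illustration rather than anything from Kwa's post, and the only inputs are the figures quoted above.

```python
from math import prod

def combined_multiplier(multipliers):
    """Combine independent impact multipliers by multiplying them."""
    return prod(multipliers)

# Figures from the example above (percentages and dollars per year).
donation_multiplier = 80 / 10                # donating 80% instead of 10%
income_multiplier = 1_000_000 / 250_000      # earning $1m/year instead of $250k/year

both = combined_multiplier([donation_multiplier, income_multiplier])

# For contrast: what a (mistaken) additive model would suggest.
additive_guess = donation_multiplier + income_multiplier

print(f"multiplicative: {both:.0f}x, additive guess: {additive_guess:.0f}x")
```

The gap between the two numbers widens with every extra multiplier, which is one way to read the post's emphasis on getting several factors right at once.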

The first installment in a series on “learning from crisis”, Jan Kulveit’s Experimental longtermism: theory needs data (co-written with Gavin Leech) recounts the author’s motivation to launch Epidemic Forecasting, a modelling and forecasting platform that sought to present probabilistic data to decisionmakers and the general public. Kulveit realized that his “longtermist” models had relatively straightforward implications for the COVID pandemic, such that trying to apply them to this case (1) had the potential to make a direct, positive difference to the crisis and (2) afforded an opportunity to experimentally test those models. While the first of these effects had obvious appeal, Kulveit considers the second especially important from a longtermist perspective: attempts to think about the long-term future lack rapid feedback loops, and disciplines that aren’t tightly anchored to empirical reality are much more likely to go astray. He concludes that longtermists should engage more often in this type of experimentation, and generally pay more attention to the longtermist value of information that “near-termist” projects can sometimes provide.

Rhys Lindmark’s FTX Future Fund and Longtermism considers the significance of the Future Fund within the longtermist ecosystem by examining trends in EA funding over time. Interested readers should look at the charts in the original post for more details, but roughly it looks like Open Philanthropy has allocated about 20% of its budget to longtermist causes in recent years, accounting for about 80% of all longtermist grantmaking. On the assumption that Open Phil gives $200 million to longtermism in 2022, the Future Fund lower bound target of $100 million already positions it as the second-largest longtermist grantmaker, with roughly a 30% share. Lindmark’s analysis prompted us to create a Metaculus question on whether the Future Fund will give more than Open Philanthropy to longtermist causes in 2022. At the time of publication (22 April 2022), the community predicts that the Future Fund is 75% likely to outspend Open Philanthropy.
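
The rough 30% figure can be reconstructed from the numbers quoted above. This is a sketch under a simplifying assumption of ours: that Open Philanthropy's 80% share applies to the assumed $200 million before the Future Fund is counted. Lindmark's own charts may use different inputs.

```python
# All dollar figures in millions; values are those quoted in the paragraph above.
open_phil = 200.0        # assumed Open Phil longtermist giving in 2022
open_phil_share = 0.80   # Open Phil's share of longtermist grantmaking (our assumption: pre-Future Fund)
future_fund = 100.0      # Future Fund lower-bound target

# Giving by all other funders implied by the 80% share.
other_funders = open_phil / open_phil_share - open_phil

# Shares once the Future Fund is added to the total.
total = open_phil + other_funders + future_fund
future_fund_share = future_fund / total

print(f"other funders: ${other_funders:.0f}M")
print(f"Future Fund share: {future_fund_share:.1%}")
```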

Holden Karnofsky’s Debating myself on whether “extra lives lived” are as good as “deaths prevented” is an engaging imaginary dialogue between a proponent and an opponent of Total Utilitarianism. Karnofsky manages to cover many of the key debates in population ethics—including those surrounding the Intuition of Neutrality, the Procreation Asymmetry, the Repugnant and Very Repugnant Conclusions, and the impossibility of Theory X—in a highly accessible yet rigorous manner. Overall, this blog post struck us as one of the best popular, informal introductions to the topic currently available.

Matthew Barnett shares thoughts on the risks from SETI. People underestimate the risks from passive SETI—scanning for alien signals without transmitting anything. We should consider the possibility that alien civilizations broadcast messages designed to hijack or destroy their recipients. At a minimum, we should treat alien signals with as much caution as we would a strange email attachment. However, current protocols are to publicly release any confirmed alien messages, and no one seems to have given much thought to managing downside risk. Overall, Barnett estimates a 0.1–0.2% chance of extinction from SETI over the next 1,000 years. Now might be a good opportunity for longtermists to figure out, and advocate for, some more sensible policies.

Scott Alexander provides an epic commentary on the long-running debate about AI Takeoff Speeds. Paul Christiano thinks it more likely that improvements in AI capabilities, and the ensuing transformative impacts on the world, will happen gradually. Eliezer Yudkowsky thinks there will be a sudden, sharp jump in capabilities, around the point we build AI with human-level intelligence. Alexander presents the two perspectives with more clarity than their main proponents, and isolates some of the core disagreements. It’s the best summary of the takeoff debate we’ve come across.

Buck Shlegeris points out that takeoff speeds have a huge effect on what it means to work on AI x-risk. In fast takeoff worlds, AI risk will never be much more widely accepted than it is today, because everything will look pretty normal until we reach AGI. The majority of AI alignment work that is done before this point will be from the sorts of existential risk–motivated people working on alignment now. In slow takeoff worlds, by contrast, AI researchers will encounter and tackle many aspects of the alignment problem “in miniature”, before AI is powerful enough to pose an existential risk. So a large fraction of alignment work will be done by researchers motivated by normal incentives, because making AI systems that behave well is good for business. In these worlds, existential risk–motivated researchers today need to be strategic, and identify and prioritise aspects of alignment that won’t be solved “by default” in the course of AI progress. In the comments, John Wentworth argues that there will be stronger incentives to conceal alignment problems than to solve them. Therefore, contra Shlegeris, he thinks AI risk will remain neglected even in slow takeoff worlds.

Linchuan Zhang’s Potentially great ways forecasting can improve the longterm future identifies several different paths via which short-range forecasting can be useful from a longtermist perspective. These include (1) improving longtermist research by outsourcing research questions to skilled forecasters; (2) improving longtermist grantmaking by predicting how potential grants will be assessed by future evaluators; (3) improving longtermist outreach by making claims more legible to outsiders; and (4) improving the longtermist training and vetting pipeline by tracking forecasting performance in large-scale public forecasting tournaments.

Zhang’s companion post, Early-warning Forecasting Center: What it is, and why it’d be cool, proposes the creation of an organization whose goal is to make short-range forecasts on questions of high longtermist significance. A foremost use case is early warning for AI risks, biorisks, and other existential risks. Besides outlining the basic idea, Zhang discusses some associated questions, such as why the organization should focus on short- rather than long-range forecasting, why it should be a forecasting center rather than a prediction market, and how the center should be structured.

Dylan Matthews’s The biggest funder of anti-nuclear war programs is taking its money away looks at the reasons prompting the MacArthur Foundation to announce its exit from grantmaking in nuclear security. (For reference: in 2018, the Foundation accounted for 45% of all philanthropic funding in the field.) The decision was partly based on the conclusions of what appears to be a flawed report by the consulting firm ORS Impact, which “repeatedly seemed to blame the MacArthur strategy for not overcoming structural forces that one foundation could never overcome”. Fortunately, there are some hopeful developments in this space, as we report in the next section.

Matthews also examines Congress’s epic pandemic funding failure. Per one recent estimate, COVID-19 cost the US upwards of $10 trillion. The Biden administration proposed spending $65 billion to reduce the risk of future pandemics, including major investments in vaccine manufacturing capacity, therapeutics, and early-warning systems. Congress isn’t keen, and is agreeing to a mere $2 billion of spending: better than nothing, but nowhere near enough to materially reduce pandemic risk.

Alene Anello’s Who is protecting animals in the long-term future? describes a bizarre educational program, funded by the United States Department of Agriculture, that stimulates students to think about ways to raise chickens on Mars. Although factory farming doesn’t strike us as particularly likely to persist for more than a few centuries, either on Earth or elsewhere in the universe,^[3] we do believe that other scenarios involving defenseless moral patients (including digital sentients) warrant serious longtermist concern.

Over the past few weeks, several posts on the EA Forum have raised various concerns regarding the recent influx of funding to the effective altruism community. We agree with Stefan Schubert that George Rosenfeld’s Free-spending EA might be a big problem for optics and epistemics is the strongest of these critical articles. Rosenfeld’s first objection (“optics”) is that, realities aside, many people, including committed effective altruists, are starting to perceive lots of EA spending as not just wasteful, but also self-serving. Besides exposing the movement to damaging external criticism, this perception may repel proto-EAs and, over time, alter the composition of our community. Rosenfeld’s second objection (“epistemics”) is that, because one can now get plenty of money by becoming a group organizer or by participating in other EA activities, it has become more difficult to think critically about the movement. Rosenfeld concludes by sharing some suggestions on how to mitigate these problems.

News

Open Philanthropy has launched the Century Fellowship, offering generous support to early-career individuals doing longtermist-relevant work. Applications to join the 2022 cohort are open until the end of the year and will be assessed on a rolling basis.

The Centre for the Governance of AI is hiring an Operations Manager and Associate. Applications are open until May 15th.

William MacAskill’s long-awaited book, What We Owe The Future, is available to pre-order. It will be released on August 16th in the United States and on September 1st in the United Kingdom.

The Cambridge Existential Risks Initiative published a collection of cause profiles to accompany their 2022 Summer Research Fellowship. It includes overviews of climate change, AI safety, nuclear risk, and meta, as well as other supplementary articles.

The 80,000 Hours Podcast released two relevant conversations: one with Joan Rohlfing on how to avoid catastrophic nuclear blunders, and one with Sam Bankman-Fried on taking a high-risk approach to entrepreneurship and altruism.

Upon learning that the MacArthur Foundation was leaving the field of nuclear security, Longview Philanthropy decided to launch its own nuclear security grantmaking program. Carl Robichaud—who until 2021 was Program Officer at the Carnegie Corporation, running the second-largest nuclear security grantmaking program—will be joining full-time next year. Provided that promising enough opportunities are found, Longview expects to make at least $10 million in grants—and this amount may grow substantially depending on what new opportunities they are able to identify. Longview is also hiring for a co-lead on the program. They are looking for applicants with a “strong understanding of the implications of longtermism” and you, dear reader of this newsletter, might be just the right candidate. Apply here.

Last month, we wrote about the Future Fund’s project ideas competition. The awards have now been announced. Six submissions each received a prize of $5,000.

A working group on civilizational refugees composed of Linchuan Zhang, Ajay Karpur and an anonymous collaborator is looking for a technically competent volunteer or short-term contractor to help them refine and sharpen their plans.

Rethink Priorities has a number of positions open in Operations and Research. Since the start of 2021, RP has grown from 15 to 40 staff, and plans to have 60 by the end of the year.

Eli Lifland and Misha Yagudin awarded prizes to some particularly impactful forecasting writing:

Ryan Beck on whether genetic engineering will raise IQ by at least 10 points by 2050.
qassiov on whether synthetic biological weapons will infect at least 100 people by 2030.
FJehn on when carbon capture will cost less than $50 per ton.
rodeoflagellum on how many gene-edited babies will be born by 2030.

The Berlin Hub, an initiative inspired by the EA Hotel, plans to convert a full hotel or similar building into a co-living space later this year. Express your interest here.

Conversation with Petra Kosonen

Petra Kosonen is a doctoral candidate in philosophy at the University of Oxford. Her DPhil thesis, supervised by Andreas Mogensen and Teruji Thomas, focuses on population ethics and decision theory, especially issues surrounding probability fanaticism. Previously, she studied at the University of Glasgow and the University of Edinburgh. Later this year, she will be starting a postdoc at the newly launched Population Wellbeing Initiative, which aspires to be the world’s leading centre for research on utilitarianism. She is also a Global Priorities Fellow at the Forethought Foundation and a participant in the FTX fellowship.

Future Matters: Some of your research focuses on what you call “probability discounting” and whether it undermines longtermism. Could you tell us what you mean by “probability discounting” and your motivation for looking at this?

Petra Kosonen: Probability discounting is the idea that we should ignore tiny probabilities in practical decision-making. Probability discounting has been proposed in response to cases that involve very small probabilities of huge payoffs, like Pascal’s Mugging.

For those who’re not familiar with this case, it goes like this: A stranger approaches you and promises to use magic that will give you a thousand quadrillion happy days in the seventh dimension if you pay him a small amount of money. Should you do that? Well, there is a very small, but non-zero, probability that the stranger is telling the truth. And if he is telling the truth, then the payoff is enormous. Provided that the payoff is sufficiently great, the offer has positive expected utility, or at least that’s the idea. Also, the mugger points out that if you have a non-zero credence in the mugger being able to deliver any finite amount of utility, then the mugger can always increase the payoff until the offer has positive expected utility—at least if your utilities are unbounded.
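
The mugger's escalation argument can be illustrated with a minimal sketch. The credence and the cost below are hypothetical numbers chosen only for illustration; the payoff of a thousand quadrillion happy days comes from the case as described above, and utilities are assumed to be unbounded and linear in happy days.

```python
from fractions import Fraction

def expected_gain(p_truth, payoff, cost):
    """Expected utility of paying the mugger, relative to walking away."""
    return p_truth * payoff - cost

# Hypothetical inputs (not from the interview).
p_truth = Fraction(1, 10**30)   # tiny but non-zero credence that the mugger is honest
cost = 5                        # utility given up by paying

# The promised payoff from the case: a thousand quadrillion happy days.
initial_offer = 10**18
gain_initial = expected_gain(p_truth, initial_offer, cost)

# Any payoff above cost / p_truth gives the offer positive expected utility,
# so with unbounded utilities the mugger can always raise the offer enough.
break_even_payoff = cost / p_truth
gain_escalated = expected_gain(p_truth, 2 * break_even_payoff, cost)

print(gain_initial < 0, gain_escalated > 0)
```

With these particular numbers the first offer has negative expected utility, but doubling the break-even payoff flips the sign, which is the step the mugger relies on.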

Probability discounting avoids the counterintuitive implication that you should pay the mugger by discounting the tiny probability of the mugger telling the truth down to zero. More generally, probability discounting is one way to avoid fanaticism, a term used to refer to the philosophical view that for every bad outcome, there is a tiny probability of a horrible outcome that is worse, and that for every good outcome, there is a tiny probability of a great payoff that is better. Other possible ways of avoiding fanaticism are, for example, having bounded utilities or conditionalising on knowledge before maximising expected utility.

Future Matters: Within probability discounting, you distinguish between “naive discounting” and other forms of discounting. What do you mean by “naive discounting”?

Petra Kosonen: Naive discounting is one of the simplest ways of cashing out probability discounting. On this view, there is some threshold probability such that outcomes whose probabilities are below this threshold are ignored by conditionalising on not obtaining these outcomes.

One obvious problem with naive discounting is where this threshold is located. When are probabilities small enough to be discounted? Some have suggested possible thresholds. For example, Buffon suggested that the threshold should be one-in-ten-thousand. And Condorcet gave an amusingly specific threshold: 1 in 144,768. Buffon chose his threshold because it was the probability of a 56-year-old man dying in one day—an outcome reasonable people usually ignore. Condorcet had a similar justification. More recently, Monton has suggested a threshold of 1 in 2 quadrillion—significantly lower than the thresholds given by the historical thinkers. Monton thinks that the threshold is subjective within reason: there is no single threshold for everybody.

Another problem for naive discounting comes from individuating outcomes. The problem is that if we individuate outcomes very finely by giving a lot of information about them, then all outcomes will have probabilities that are below the threshold. One possible solution is to individuate outcomes by utilities. The idea is that outcomes are considered “the same outcome” if their utilities are the same. This doesn’t fully solve the problem though. In some cases, all outcomes might have zero probability. Imagine for example an ideally shaped dart that is thrown on a dartboard. The probability that it hits a particular point may be zero.

Lastly, one problem for naive discounting is that it violates dominance. Imagine a lottery that gives you a tiny probability of some prize and otherwise nothing, and compare this to a lottery that surely gives you nothing. The former lottery dominates the latter one, but naive discounting says they are equally good.
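
The dominance violation can be made concrete with a small sketch. It assumes one simple reading of naive discounting: drop every outcome whose probability is below the threshold, conditionalise on the remaining outcomes, and then take expected utility. The lottery numbers are hypothetical; the threshold is Buffon's one-in-ten-thousand.

```python
def naive_discounted_value(lottery, threshold):
    """Expected utility after ignoring outcomes with probability below
    `threshold` and conditionalising on the outcomes that remain.

    `lottery` is a list of (probability, utility) pairs.
    """
    kept = [(p, u) for p, u in lottery if p >= threshold]
    total = sum(p for p, _ in kept)
    if total == 0:
        # Every outcome was discounted: the individuation problem described above.
        raise ValueError("all outcomes fall below the threshold")
    return sum(p * u for p, u in kept) / total

threshold = 1e-4                                   # Buffon's threshold

# Hypothetical lotteries: a tiny chance of a prize versus nothing for sure.
risky = [(1e-6, 1_000_000.0), (1 - 1e-6, 0.0)]
sure_nothing = [(1.0, 0.0)]

v_risky = naive_discounted_value(risky, threshold)
v_sure = naive_discounted_value(sure_nothing, threshold)

print(v_risky, v_sure)
```

The first lottery is never worse and is sometimes better, yet both receive the same value once the prize outcome is discounted away.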

Future Matters: Are there forms of probability discounting that avoid the problems of naive discounting?

Petra Kosonen: One could solve the previous dominance violation by considering very-small-probability outcomes as tie-breakers in cases where the prospects are otherwise equally good. This is not enough to avoid violating dominance though, because the resulting view still violates dominance in a more complicated case. There are also many other ways of cashing out probability discounting. Naive discounting ignores very-small-probability outcomes. Instead, one could ignore states of the world that have tiny probabilities of occurring. The different versions of this kind of “state discounting” have other problems, though. For example, they give cyclic preference orderings or violate dominance principles in other ways.

There is also tail discounting. On this view, you should first order all the possible outcomes of a prospect in terms of betterness. Then you should ignore the edges, that is, the very best and the very worst outcomes. Tail discounting solves the problems with individuating outcomes and dominance violations. But it also has one big problem: it can be money-pumped. This means that someone with this view would end up paying for something they could have kept for free—which makes it less plausible as a theory of instrumental rationality.

Future Matters: Why do you think that probability discounting, in any of its forms, does not undermine longtermism?

Petra Kosonen: In one of my papers I go through three arguments against longtermism from discounting small probabilities. I focus on existential risk mitigation as a longtermist intervention. The first argument is a very obvious one: that the probabilities of existential risks are so tiny that we should just ignore existential risks. This is the “Low Risks Argument”. But it does not seem to be the case that the risks are so small. Even in the next 100 years many existential risks are estimated to be above any reasonable discounting thresholds. For example, Toby Ord has estimated that net existential risk in the next 100 years is 1 in 6. The British astronomer Sir Martin Rees has an even more pessimistic view. He thinks that the odds are no better than fifty-fifty that our present civilisation survives to the end of the century. And the risks from specific sources also seem to be relatively high. Some estimates Ord gives include, for example, a 1 in 30 risk from engineered pandemics and a 1 in 10 risk from unaligned artificial intelligence. (See Michael Aird’s database for many more estimates.)

But now we come to the problem of how outcomes should be individuated. Although the risks in the next 100 years are above any reasonable discounting thresholds, the probability of an existential catastrophe due to a pandemic on the 4th of January 2055 at 13:00-14:00 might be tiny. Similarly, the risk might be tiny at 14:00-15:00, and so on. Of course ignoring a high net existential risk on the basis of individuating outcomes this finely would be mad. But it is difficult to see how naive discounting can avoid this implication. Even if we individuate outcomes by utilities, we might end up individuating outcomes too finely because every second that passes could add a little bit of utility to the world.
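
A rough calculation shows how quickly fine-grained individuation pushes probabilities below a threshold. The sketch assumes, purely for illustration, that Ord's 1 in 30 estimate for engineered pandemics is spread evenly over every hour of the century, and it uses Buffon's one-in-ten-thousand threshold; neither assumption comes from the interview.

```python
century_risk = 1 / 30        # Ord's estimate for engineered pandemics, quoted above
threshold = 1e-4             # Buffon's one-in-ten-thousand threshold

# Number of one-hour slots in 100 years (ignoring leap days).
hours_in_century = 100 * 365 * 24

# Illustrative assumption: the risk is divided evenly across the hourly slots.
per_hour_risk = century_risk / hours_in_century

print(f"per-century risk above threshold: {century_risk >= threshold}")
print(f"per-hour risk: {per_hour_risk:.2e}, above threshold: {per_hour_risk >= threshold}")
```

A lower threshold would need finer slicing to produce the same effect, but the structure of the problem is unchanged.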

I mentioned earlier that tail discounting can solve the problem of outcome individuation. But what does it say about existential risk mitigation? Consider one type of existential risk: human extinction. Tail discounting probably wouldn’t tell us to ignore the possibility of a near-term human extinction even if its probability was tiny. Recall that tail discounting only ignores the very best and the very worst outcomes, provided that their probabilities are tiny. As long as there are sufficiently high probabilities of both better and worse outcomes than human extinction, human extinction will be a “normal” outcome in terms of value. So we should not ignore it on this view.

The second argument against longtermism that I discuss in the paper concerns the size of the future. For longtermism to be true, it also needs to be true that there is in expectation a great number of individuals in the far future; otherwise it would not be the case that relatively small changes in the probability of an existential catastrophe have great expected value. The “small future argument” states that once we ignore very-small-probability scenarios, such as space settlement and digital minds, the expected number of individuals in the far future is too small for longtermism to be true. Again, consider tail discounting. Space settlement and digital minds might be the kind of unlikely best-case-scenarios that tail discounting ignores. So is the small future argument right if you accept tail discounting? No, it does not seem so. Even if you ignore these scenarios, in expectation there seem to be enough individuals in the far future, at least if we take the far future to start in 100 years. This is true even on the relatively conservative numbers that Hilary Greaves and Will MacAskill use in their paper “The case for strong longtermism”.

The final argument against longtermism that I discuss in the paper states that the probability of making a difference to whether an existential catastrophe occurs or not is so small that we should ignore it. This is the “no difference argument”. Earlier I mentioned the idea of state discounting on which you should ignore states that are associated with tiny probabilities. State discounting captures the idea of the no difference argument naturally: there is one state in which an existential catastrophe happens no matter what you do, one state in which an existential catastrophe does not happen no matter what you do, and a third state in which your actions can make a difference to whether the catastrophe happens or not. And, if the third state is associated with a tiny probability, then you should ignore it.

I think the no difference argument is the strongest of the three arguments against longtermism that I discuss. Plausibly, at least for many of us, the probability of making a difference is indeed small, possibly less than some reasonable discounting thresholds. But there are some responses to this argument. First, as I mentioned earlier, the different versions of state discounting face problems like cyclic preference orderings and dominance violations. So we might want to reject state discounting for these reasons. Secondly, state discounting faces collective action type problems. For example, imagine an asteroid heading towards the Earth. There are multiple asteroid defence systems and (unrealistically) each has a tiny probability of hitting the asteroid and preventing a catastrophe. But the probability of preventing a catastrophe is high if enough of them try. Suppose that attempting to stop the asteroid involves some small cost. State discounting would then recommend against attempting to stop the asteroid, because the probability of making a difference is tiny for each individual. Consequently, the asteroid will almost certainly hit the Earth.
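
The asteroid case can be sketched numerically. The per-system probability and the number of systems below are hypothetical, the threshold is Buffon's one-in-ten-thousand, and the attempts are assumed to be independent, which the example above does not state explicitly.

```python
def p_at_least_one_success(p_each, n):
    """Probability that at least one of n independent attempts succeeds."""
    return 1 - (1 - p_each) ** n

threshold = 1e-4       # Buffon's one-in-ten-thousand threshold
p_each = 1e-5          # hypothetical chance that a single system stops the asteroid
n_systems = 500_000    # hypothetical (and unrealistic) number of defence systems

# Each system's chance of making a difference is below the threshold,
# so state discounting tells each one not to bother.
each_is_discounted = p_each < threshold

# Yet if all of them try, success is very likely.
collective = p_at_least_one_success(p_each, n_systems)

print(each_is_discounted, f"{collective:.4f}")
```

The individually negligible probabilities add up to a near-certain collective success, which is why ignoring each one separately leads to the bad outcome described above.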

To solve these kinds of cases, state discounting would need to somehow take into account the choices other people face. But if it does so, then it no longer undermines longtermism. This is because plausibly we collectively, for example the Effective Altruism movement, can make a non-negligible difference to whether an existential catastrophe happens or not. So, my response to the no difference argument is that if there is a solution to the collective action problems, then this solution will also block the argument against longtermism. But if there is no solution to these problems, then state discounting is significantly less plausible as a theory. Either way, we don’t need to worry about the no difference argument.

To sum up, my overall conclusion is that discounting small probabilities doesn’t undermine longtermism.

Future Matters: Thanks, Petra!

For helpful feedback on our first issue, we thank Sawyer Bernath, Ryan Carey, Evelyn Ciara, Alex Lawsen, Howie Lempel, Garrison Lovely and David Mears. We owe a special debt of gratitude to Fin Moorhouse for invaluable technical advice and assistance.

^[1] It’s also possible that Alexander is using “existential risk” to just mean “risk of human extinction”.
^[2] As the author acknowledges, this point has been made before; see e.g. this talk by Toby Ord or this paper by Will MacAskill. But Kwa’s post presents the insight with particular clarity and vividness, and it may help even those already familiar with it better internalize it.
^[3] For roughly the reasons articulated in this comment by Robert Wiblin.