Long Reflection Reading List

Last updated: April 26, 2024

This is a reading list on the long reflection and the closely related, more recently coined notions of ASI governance, reflective governance and grand challenges.

I claim that this area outscores regular AI safety on importance[1] while being significantly more neglected (and roughly the same in terms of tractability), making it perhaps the highest priority EA cause area.

I don’t claim to be the ideal person to have made this reading list. The story behind how it came about is that two months ago, Will MacAskill wrote: “I think there’s a lot of excitement about work in this broad area that isn’t yet being represented in places like the Forum. I’d be keen for more people to start learning about and thinking about these issues.” Intrigued, I spent some time trying to learn about the issues Will[2] was pointing to. I then figured I’d channel the spirit of “EAs should post more summaries and collections”: this reading list is an attempt to make the path easier for others to follow. Accordingly, it starts at the introductory level, but by the end the reader will be at the frontier of publicly available knowledge. (The frontier as far as I’m aware, at least, and at the time of writing.[3])

DALL·E depiction of some people reflecting on what to do with the cosmos. They’ve been at it a long time.

Intro

What might we be aiming for?

Is there moral truth? What should we do if not? What are human values, and how might they fit in?

The intention with this section is to overview some of the philosophical background to the long reflection idea. Note that there is a large body of literature on the moral realism vs. antirealism debate, and metaethics more broadly, that exists beyond what’s listed here.

How to think about utopia?

Ideally, I would include some readings on how division or aggregation might work for building a utopia, since this seems like an obvious and important point. For instance, should the light cone be divided such that every person (or every moral patient more broadly, perhaps with the division taking moral weight into account) gets to live in a sliver of the light cone that’s optimized for their preferences? Should everybody’s preferences be aggregated somehow, so that everyone can live together happily in the overall light cone? Something else? However, I was unable to find any real discussion of this point. Let me know in the comments if there are writings I’m missing. For now, I’ll include the two most relevant things I could find as well as a more run-of-the-mill piece on preference aggregation theory.
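For readers who want a concrete handle on what “aggregated somehow” could mean, here is a minimal sketch of one textbook aggregation rule, the Borda count, in Python. The choice of rule, the toy agents, and the option names are purely my own illustration and aren’t drawn from the readings in this section.

```python
from collections import defaultdict

def borda_count(rankings: list[list[str]]) -> dict[str, int]:
    """Aggregate ranked preferences with the Borda rule.

    Each ranking lists options from most to least preferred; an option
    in position i (0-indexed) among n options scores n - 1 - i points.
    """
    scores: dict[str, int] = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for i, option in enumerate(ranking):
            scores[option] += n - 1 - i
    return dict(scores)

# Three toy agents ranking three ways of using the light cone.
rankings = [
    ["hedonium", "wilderness", "diverse mix"],
    ["diverse mix", "wilderness", "hedonium"],
    ["wilderness", "diverse mix", "hedonium"],
]
print(borda_count(rankings))
# {'hedonium': 2, 'wilderness': 4, 'diverse mix': 3} -> "wilderness" wins
```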

How to think about avoiding worst-case futures?

How large could the future be?

How to think about counterfactuals?

How much of our future light cone will be colonized by aliens if we don’t colonize it ourselves?

There is a rich literature in the vicinity of this question. For those wanting to work on the long reflection it’s probably not necessary to get into the details of the models/​arguments: a sense for the state of the debate should suffice.

The Fermi paradox

Grabby aliens

What, if anything, can we say about alien values?[5]

Corruption from within

In “Beyond Maxipok”, Cotton-Barratt writes: “perhaps there are some ingredients which could be cancerous and distort an otherwise good reflective process from the inside.” This section is about the most salient ways such distortion could arise.

Human safety problems

Metaphilosophy; AI philosophical competence

Values hijack; epistemic hijack

The previous section was about how a reflective process could get distorted from within. This section is about one way—perhaps the most salient way—a reflective process could become corrupted by bad actors /​ from the outside.

On AGI deployment

The readings in this section discuss what the world around the time of AGI might look like. Things could be very alien, and move very fast. How can we raise the chance that good deliberative processes prevail in such a world?

Economic explosion

Intelligence explosion

What do the leading AI companies say they’d do with AGI?

(Ideally this subsection would include all the leading AI companies’ stated plans for how they’d employ AGI if they developed it, but to date only OpenAI has stated such plans, as far as I’m aware.)

The deployment problem

Decisive strategic advantage

Who (or what) will control the future?

The singleton hypothesis

A race to the bottom?

Institutional design for the long reflection

What can we learn from existing institutions?

There are a few institutions whose design we could maybe bootstrap off for designing the long reflection. Below, I try to point out these institutions; there are likely some (many?) I’m missing—let me know in the comments if so. Questions to keep in mind as you read are: What exactly happened to result in the creation of [institution]? How was [institution]’s precise nature determined? (For example, for the U.S. constitution, how were the details of its articles decided?) How long did it all take? (In other words, given these historical examples, and given our best guesses as to AI timelines, when do we need to start building infrastructure for the long reflection?)

Case study writing on lessons we can learn from the above institutions’ formations is a particularly promising direction for work on the long reflection, in my view. The following is an example of excellent writing in a similar vein.

Existing proposals for new AI governance institutions

The proposals here are aimed at reducing catastrophic risk from AI, but institutions for that purpose could(?) quite naturally extend into supporting a democratic reflective process aided by advanced AI—i.e., the long reflection.

International coordination on regulation

Democratic AGI development

The issue of credible compliance

The motivating question for this subsection is: supposing that an AI leader (e.g., the U.S.) wanted to make an agreement to share power and respect other actors’ (e.g., other countries’) sovereignty in the event that it develops superintelligence first, how could it legibly guarantee future compliance with that agreement so that the commitment is credible to these other actors?[7]

(The AI leader might want to make such an agreement partly for ethical reasons, and partly to decrease the incentive for competitors to race and cut corners on safety.)

There doesn’t appear to be any work on this exact issue. However, some work in adjacent areas may be applicable.

Credible compliance: Potential directions forward

The Windfall Clause

The Windfall Clause is a proposed mechanism for distributing AGI profits for social good. It seems somewhat applicable to the issue of credible compliance: there are presumably lessons to be drawn from understanding the clause and how it has been received.

AIs making credible commitments

AI systems can in theory precommit[8] to actions in a way that humans cannot. I cannot show you my source code to prove that I will act in a certain way in a certain situation, whereas an AI might be able to.

Leveraging this to solve the issue of credible compliance might look something like the leading AI company building into its frontier model a commitment to share power and respect other actors’ sovereignty,[9] and displaying this commitment for other actors to see. (Exactly how such a commitment could be built into a model is likely a thorny technical challenge.)

Noteworthy subarea: Cryptographic technologies

Cryptographic technologies like blockchain are potentially relevant to credible commitment; a toy sketch of one such primitive follows the reading below.

  • A Tour of Emerging Cryptographic Technologies – Garfinkel (2021)

    • Just “Non-intrusive agreement verification could become feasible” (pp. 48–49) and “It could become possible to solve collection [sic] action problems that existing institutions cannot” (pp. 53–57). (Follow the page numbers at the bottom of each page rather than those in the left sidebar.)
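To make the connection slightly more concrete, here is a minimal commit-and-reveal sketch in Python, about the simplest cryptographic commitment primitive there is. The framing as a “policy commitment”, and all of the names in the code, are my own illustrative choices rather than anything from Garfinkel’s paper.

```python
import hashlib
import secrets

def commit(policy: bytes) -> tuple[bytes, bytes]:
    """Commit to a policy now without revealing it.

    Returns (commitment, nonce): the commitment can be published
    immediately; the nonce stays secret until reveal time.
    """
    nonce = secrets.token_bytes(32)
    return hashlib.sha256(nonce + policy).digest(), nonce

def verify(commitment: bytes, nonce: bytes, policy: bytes) -> bool:
    """Check that a revealed policy matches the earlier commitment."""
    return hashlib.sha256(nonce + policy).digest() == commitment

# Toy usage: an actor publishes a commitment today and reveals the
# underlying policy later; anyone can then check that the two match.
policy = b"share power and respect other actors' sovereignty"
c, n = commit(policy)
assert verify(c, n, policy)
```

Of course, a hash commitment only proves what was promised, not that the promise will be kept; that second, harder part is what the readings in this subsection are about.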

The issue of opposers and defectors

Interstellar coordination

I would ideally include a reading on the problems—coordination or otherwise—an institution or governance regime might run into if its constituents are light-years apart, given that it seems plausible humanity should start expanding before the long reflection is over. (On the latter point, Cotton-Barratt writes: “I’m not using that term [‘the long reflection’] because it’s not obvious that ‘the reflection’ is a distinct phase: it could be that reflection is an ongoing process into the deep future, tackling new (perhaps local) questions as they arise.”) However, I was unable to find anything of great relevance. Let me know in the comments if there are writings I’m missing. I list the one semi-relevant piece I have seen below.

Note that work in this niche is less pressing than other work on institutional design, in my view, as it’s the kind of thing that could be solved during the long reflection (whereas the other things like democratic control and credible compliance need to be in place from the outset). Work in this niche is more pressing the shorter and less controlled one believes the long reflection is likely to be.

  • Succession – Ngo (2023)

    • A couple of points to be aware of:

      • Ngo’s story involves long-distance communication both within a civilization and between civilizations. It’s the former that’s relevant—potentially—to the long reflection.

      • The being/civilization that’s the first-person character seems to mostly be executing on a plan rather than coming up with one (i.e., they’ve mostly finished their long reflection).

        • Although it’s interesting to note that their knowledge exchange with the aliens does change their plan a little, which resonates with Cotton-Barratt’s idea about reflection being an ongoing process into the deep future.


Mesa-topics

These are topics that aren’t directly part of establishing a long reflection, but which those working, or aiming to work, on the long reflection might want to be aware of.[10] Essentially, each of these topics points to an open problem that is arguably a crucial consideration, one that has to be solved, or dissolved, for the future to be close to maximally valuable. A successful long reflection is a meta-solution: a process that enables these problems to be solved. Under each topic I list the top resource(s), in my opinion, for getting up to speed.[11]

Welfare and moral weights

Theory

Different biological species

Digital minds

Societal considerations

Technical considerations

Is positive experience possible?[12]

Infinite ethics

Decision theory

Commitment races

Updatelessness

Acausal trade /​ Evidential cooperation in large worlds

Acausal threats

Anthropics

Theory

Key problems

The simulation hypothesis

Boltzmann brains

Everett branches /​ Quantum many-worlds

  • David Wallace on the many-worlds theory of quantum mechanics and its implications – Wiblin & Harris (2021)

    • One way this topic could be important, which Wallace didn’t emphasize much, turns on whether the amount of “stuff”—for want of a better word—increases as the number of branches increases (rather than the total amount of stuff remaining constant, with individual branches becoming ever smaller slivers of the total). If it does, and if humanity succeeds at creating a utopia, it could become a moral priority for us to trigger as many branch splittings as possible.

Subtopic: Quantum immortality

Content warning: eternity; suffering. See Raemon’s note.

Cause X

  1. ^

    A crude simplification of my model, which I don’t currently have time to write up in full: if a middling outcome is 1% the value of a great outcome, then going from extinction to a middling outcome is 1/99 as valuable as going from a middling outcome to a great one.
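    Spelling out that arithmetic (my own rendering of the footnote’s assumption, normalizing extinction to 0 and a great outcome to 1):

    $$\frac{V_{\text{middling}} - V_{\text{extinction}}}{V_{\text{great}} - V_{\text{middling}}} = \frac{0.01 - 0}{1 - 0.01} = \frac{1}{99}.$$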

  2. ^

  3. ^

    If this reading list turns out to be useful then maybe I’ll keep it updated, or maybe someone more qualified than me will step into that role.

  4. ^

    Original discussion of the long reflection indicated that it could be a lengthy process of 10,000 years or more. More recent discussion I’m aware of, which is nonpublic (hence no corresponding reading), i) takes seriously the possibility that the long reflection could last just weeks rather than years or millennia, and ii) notes that wall-clock time is probably not the most useful way to think about the length of the reflection, given that the reflection process, if it happens at all, will likely involve many superfast AIs doing the bulk of the cognitive labor.

  5. ^

    The key overall question here is “how much worse is a light cone colonized by aliens compared to a light cone colonized by humanity, by our lights?” I don’t have a post to link to that answers this question, but I’ve been involved in some nonpublic discussion. My all-things-considered belief is that, according to our values, an alien-colonized light cone would be something like 30% as valuable as a human-colonized light cone. (This 30% figure is highly uncertain and low resilience, though.)

  6. ^

    Some readers may have seen Karnofsky’s “Racing through a minefield: the AI deployment problem” and wonder how that post is different to the one I’ve included in the reading list. So, “Racing through a minefield” is a distillation of “Nearcast-based ‘deployment problem’ analysis”: I include the latter because I consider the extra detail worth knowing about.

  7. ^

    This motivating question is closely adapted from MacAskill’s quick take.

  8. ^

    The terminological difference between “commitment” and “precommitment” is explained here.

  9. ^

    Compliance that routes through AI precommitment in this way is more complicated than standard credible commitment between AIs, which does not have a step involving humans. For compliance to be credible, the humans behind the leading AI model presumably must not be able to override its precommitments. There is a tension here with corrigibility that may render this direction a non-starter.

  10. ^

    Mesa- is a Greek prefix that means the opposite of meta-. To “go meta” is to go one level up; to “go mesa” is to go one level down. So a mesa-topic is a topic one level down from the one you were on.

  11. ^

    With a bias towards resources that explain why their topic is decision-relevant. In practice, this means that more of the resources I list are EA-sphere writings than would otherwise be the case.

  12. ^

    For readers who think the answer is obviously “yes,” I point you to “Narrative Self-Deception: The Ultimate Elephant in the Brain?” (Vinding, 2018). (I used to think the answer was obviously yes; I changed my mind when I became proficient at meditation /​ paying close attention to experience.)

  13. ^

    H/​T Michael St. Jules for making me aware of this lecture.

  14. ^

    H/​T Ben West for making me aware of FNC.