Long Reflection Reading List
Last updated: June 5, 2024
This is a reading list on the long reflection and the closely related, more recently coined notions of ASI governance, reflective governance and grand challenges.
I claim that this area outscores regular AI safety on importance[1] while being significantly more neglected (and roughly the same in terms of tractability), making it perhaps the highest priority EA cause area.
I don’t claim to be the ideal person to have made this reading list. The story behind how it came about is that two months ago, Will MacAskill wrote: “I think there’s a lot of excitement about work in this broad area that isn’t yet being represented in places like the Forum. I’d be keen for more people to start learning about and thinking about these issues.” Intrigued, I spent some time trying to learn about the issues Will[2] was pointing to. I then figured I’d channel the spirit of “EAs should post more summaries and collections”: this reading list is an attempt to make the path easier for others to follow. Accordingly, it starts at the introductory level, but by the end the reader will be at the frontier of publicly available knowledge. (The frontier as far as I’m aware, at least, and at the time of writing.[3])
Intro
Long reflection – EA Forum Wiki
Quotes about the long reflection – MichaelA (2020)[4]
The Precipice – Ord (2020)
Just chapter 7, including endnotes.
Beyond Maxipok — good reflective governance as a target for action – Cotton-Barratt (2024)
New Frontiers in Effective Altruism – MacAskill (2024)
This was a talk given at EAG Bay Area 2024. It doesn’t appear to be available as a recording yet, but I’ll add it if and when it goes up.
Quick take on Grand Challenges – MacAskill (2024)
The part about hiring is no longer relevant, but the research projects MacAskill outlines still give a sense for what good future work on grand challenges / the long reflection might look like.
Criticism of the long reflection idea:
‘Long Reflection’ Is Crazy Bad Idea – Hanson (2021)
Objections: What about “long reflection” and the division of labor? – Vinding (2022)
Just the highlighted section.
The Long Reflection as the Great Stagnation – Larks (2022)
Lukas Gloor’s comment is worth reading as well.
What might we be aiming for?
Is there moral truth? What should we do if not? What are human values, and how might they fit in?
This section is intended to give an overview of some of the philosophical background to the long reflection idea. Note that the literature on the moral realism vs. antirealism debate, and on metaethics more broadly, extends far beyond what’s listed here.
Moral Uncertainty and the Path to AI Alignment with William MacAskill – AI Alignment Podcast by the Future of Life Institute (2018)
See also Shah (2018)’s summary and commentary.
See also this comment exchange between Michael Aird and Lukas Gloor (2020), which zooms in on the realism vs. antirealism wager and how it relates to the long reflection.
Complexity of value – LessWrong Wiki
Moral ~realism – Cotton-Barratt (2024)
Wei Dai’s and Lukas Gloor’s comments are worth reading in addition to the main post.
Why should ethical anti-realists do ethics? – Carlsmith (2023)
Coherent extrapolated volition – Arbital
On the limits of idealized values – Carlsmith (2021)
The comment exchange between Carlsmith and Richard Ngo is also worth reading.
How to think about utopia?
Hedonium and computronium – EA Forum Wiki
Terms that tend to come up in discussions of utopia.
Why Describing Utopia Goes Badly – Karnofsky (2021)
Visualizing Utopia – Karnofsky (2021)
Characterising utopia – Ngo (2020)
Actually possible: thoughts on Utopia – Carlsmith (2021)
Deep Utopia – Bostrom (2024)
(If and when someone writes a summary of this book I’ll add it to this reading list.)
Ideally, I would include some readings on how division or aggregation might work for building a utopia, since this seems like an obvious and important point. For instance, should the light cone be divided such that every person (or every moral patient more broadly, perhaps with the division taking moral weight into account) gets to live in a sliver of the light cone that’s optimized for their preferences? Should everybody’s preferences be aggregated somehow, so that everyone can live together happily in the overall light cone? Something else? However, I was unable to find any real discussion of this point. Let me know in the comments if there are writings I’m missing. For now, I’ll include the two most relevant things I could find as well as a more run-of-the-mill piece on preference aggregation theory.
Archipelago and Atomic Communitarianism – Alexander (2014)
Investigating the Long Reflection – Muehlhauser (2023)
Just the “When to stop reflecting” section.
Social Choice Theory – Stanford Encyclopedia of Philosophy
Just section 3, “The aggregation of preferences”.
How to think about avoiding worst-case futures?
Cause prioritization for downside-focused value systems – Gloor (2018)
The option value argument doesn’t work when it’s most needed – Oswald-Drummond (2023)
How large could the future be?
Astronomical Waste: The Opportunity Cost of Delayed Technological Development – Bostrom (2003)
The Edges of Our Universe – Ord (2021)
Anders Sandberg on war in space, whether civilisations age, and the best things possible in our universe – Wiblin & Harris (2023)
Listen up to 02:23:08 (i.e., stop once you reach the “Room-temperature semiconductors” section).
How to think about counterfactuals?
How much of our future light cone will be colonized by aliens if we don’t colonize it ourselves?
There is a rich literature in the vicinity of this question. For those wanting to work on the long reflection, it’s probably not necessary to get into the details of the models/arguments: a sense for the state of the debate should suffice.
The Fermi paradox
Eternity in six hours: intergalactic spreading of intelligent life and sharpening the Fermi paradox – Armstrong & Sandberg (2013)
Just the abstract.
Dissolving the Fermi Paradox – Sandberg, Drexler & Ord (2018)
Just the abstract.
Quantifying anthropic effects on the Fermi paradox – Finnveden (2019)
Just the summary.
Grabby aliens
What If Humanity Is Among The First Spacefaring Civilizations? – PBS Space Time (2022)
Watch up to 15:17.
This is the Robin Hanson paper the video is based on. I recommend the video over the paper.
Replicating and extending the grabby aliens model – Cook (2022)
Just the summary.
What, if anything, can we say about alien values?[5]
The Grabby Values Selection Thesis: What values do space-faring civilizations plausibly have? – Buhler (2023)
kokotajlod’s comments are worth reading in addition to the main post.
Why we may expect our successors not to care about suffering – Buhler (2023)
Will Aldred’s (i.e., my) comment is also worth reading, in my humble opinion.
Corruption from within
In “Beyond Maxipok”, Cotton-Barratt writes “perhaps there are some ingredients which could be cancerous and distort an otherwise good reflective process from the inside.” This section aims to be about the most salient ways that could arise.
Human safety problems
Two Neglected Problems in Human-AI Safety – Dai (2018)
Three AI Safety Related Ideas – Dai (2018)
Just sections 1 and 2. (Section 3 will appear later in this reading list.)
This expansive comment thread featuring Rohin Shah, Paul Christiano and Dai is also worth reading.
Christiano, in his first comment, mentions corrigibility, IDA, and agent foundations. For those unfamiliar, see corrigibility explained here. IDA and agent foundations are probably not worth rabbit-holing into—they won’t appear again in this reading list—but for readers who are interested I point you to the explainers here and here, respectively.
Morality is Scary – Dai (2021)
Daniel Kokotajlo’s comment is also worth reading.
Decoupling deliberation from competition – Christiano (2021)
The comments, especially the top exchange between Wei Dai and Christiano, are also worth reading.
Two sources of human misalignment that may resist a long reflection: malevolence and ideological fanaticism – Althaus (2024)
Althaus’s follow-up comment is also worth reading.
Metaphilosophy; AI philosophical competence
A comment by Wei Dai (2023) that helps frame this subsection.
(No need to read the replies.)
Three AI Safety Related Ideas – Dai (2018)
Just section 3. (Short.)
This reading might not fully make sense to non-technical folks. The next one explores the same point but in less technical language.
AI doing philosophy = AI generating hands? – Dai (2024)
The Argument from Philosophical Difficulty – Dai (2019)
Some Thoughts on Metaphilosophy – Dai (2019)
Meta Questions about Metaphilosophy – Dai (2023)
The comment threads with Connor Leahy and Anthony DiGiovanni are worth reading in addition to the main post.
Essay competition on the Automation of Wisdom and Philosophy — $25k in prizes – Cotton-Barratt (2024)
I would call this post an early-stage research agenda in AI philosophical competence. (Note: the post is pertinent even if the competition is no longer open.)
Values hijack; epistemic hijack
The previous section was about how a reflective process could get distorted from within. This section is about one way—perhaps the most salient way—a reflective process could become corrupted by bad actors / from the outside.
Two Neglected Problems in Human-AI Safety – Dai (2018)
Just section 2, “How to defend against intentional attempts by AIs to corrupt human values?”
This comment exchange between Rohin Shah and Dai is worth reading as well.
Persuasion Tools: AI takeover without AGI or agency? – Kokotajlo (2020)
Project ideas: Epistemics – Finnveden (2024)
On AGI deployment
The readings in this section discuss what the world around the time of AGI might look like. Things could be very alien, and move very fast. How can we raise the chance that good deliberative processes prevail in such a world?
Economic explosion
All Possible Views About Humanity’s Future Are Wild – Karnofsky (2021)
The Duplicator: Instant Cloning Would Make the World Economy Explode – Karnofsky (2021)
Digital People Would Be An Even Bigger Deal – Karnofsky (2021)
Intelligence explosion
What a Compute-Centric Framework Says About Takeoff Speeds – Davidson (2023)
Quoting Davidson on how to read the report:
“Non-technical readers should first watch this video presentation (slides), then read this blog post, and then play around with the Full Takeoff Model here.
Moderately technical readers should first read the short summary, then play around with the Full Takeoff Model here, and then read the long summary.
If you have a background in growth economics, or are particularly mathsy, you might want to read this concise mathematical description of the Full Takeoff Model.”
At this point I’ll highlight one of Will MacAskill’s suggested research projects: “Figuring out what a good operationalisation of transformative AI would be, for the purpose of creating an early tripwire to alert the world of an imminent intelligence explosion.”
What do the leading AI companies say they’d do with AGI?
(Ideally this subsection would include all the leading AI companies’ stated plans for how they’d employ AGI if they developed it, but to date only OpenAI has stated such plans, as far as I’m aware.)
Planning for AGI and beyond – Altman (2023)
Comments on OpenAI’s “Planning for AGI and beyond” – Soares (2023)
Thoughts on the OpenAI Strategy – Gooen (2024)
A proposal for importing society’s values – Leike (2023)
As context, Leike co-led OpenAI’s Superalignment team at the time he wrote this proposal. It is worth noting, though, that the proposal “does not necessarily represent [his] employer’s views or plans.”
The deployment problem
Nearcast-based “deployment problem” analysis – Karnofsky (2022)[6]
What’s going on with ‘crunch time’? – Hadshar (2023)
Decisive strategic advantage
Superintelligence 7: Decisive strategic advantage – Grace (2014)
Review of Soft Takeoff Can Still Lead to DSA – Kokotajlo (2021)
Who (or what) will control the future?
The singleton hypothesis
What is a Singleton? – Bostrom (2005)
Will transformative AI result in a singleton (as opposed to a multipolar world)? – Metaculus
A race to the bottom?
The Future of Human Evolution – Bostrom (2004)
Meditations on Moloch – Alexander (2014)
Will Humanity Choose Its Future? – Assadi (2023)
Guive Assadi on Whether Humanity Will Choose Its Future – Moorhouse (2023)
This podcast episode overviews and discusses the above paper.
The Age of Em – Hanson (2016)
This is a long book that goes deep into the weeds of one particular race to the bottom scenario. I include it only as further reading.
Institutional design for the long reflection
What can we learn from existing institutions?
There are a few institutions whose design we could maybe bootstrap off for designing the long reflection. Below, I try to point out these institutions; there are likely some (many?) I’m missing—let me know in the comments if so. Questions to keep in mind as you read are: What exactly happened to result in the creation of [institution]? How was [institution]’s precise nature determined? (For example, for the U.S. constitution, how were the details of its articles decided?) How long did it all take? (In other words, given these historical examples, and given our best guesses as to AI timelines, when do we need to start building infrastructure for the long reflection?)
The U.S. Constitution
Constitution of the United States – Wikipedia
The UN
United Nations – Wikipedia
Charter of the United Nations – Wikipedia
Notable treaties
Kyoto Protocol – Wikipedia
Montreal Protocol – Wikipedia
Treaty on the Non-Proliferation of Nuclear Weapons – Wikipedia
Case study writing on lessons we can learn from the above institutions’ formations is a particularly promising direction for work on the long reflection, in my view. The following is an example of excellent writing in a similar vein.
Existing proposals for new AI governance institutions
The proposals here are aimed at reducing catastrophic risk from AI, but institutions for that purpose could(?) quite naturally extend into supporting a democratic reflective process aided by advanced AI—i.e., the long reflection.
International coordination on regulation
International Institutions for Advanced AI – Ho et al. (2023)
This was a white paper from Google DeepMind: see here.
International coordination on development
Multinational AGI Consortium (MAGIC): A Proposal for International Coordination on AI – Hausenloy, Miotti & Dennis (2023)
UK-led democratic AGI:
Securing Liberal Democratic Control of AGI through UK Leadership – Phillips & Rajkumar (2023)
Import AI 321: Open source GPT3; giving away democracy to AGI companies; GPT-4 is a political artifact – Clark (2023)
Just the highlighted section.
Responses to comments on our democratic control of AGI paper – Phillips (2023)
The issue of credible compliance
The motivating question for this subsection is: supposing that an AI leader (e.g., the U.S.) wanted to make an agreement to share power and respect other actors’ (e.g., other countries’) sovereignty in the event that it develops superintelligence first, how could it legibly guarantee future compliance with that agreement so that the commitment is credible to these other actors?[7]
(The AI leader might want to make such an agreement partly for ethical reasons, and partly to decrease the incentive for competitors to race and cut corners on safety.)
There doesn’t appear to be any work on this exact issue. However, some work in adjacent areas may be applicable.
Credible compliance: Potential directions forward
Project ideas: Governance during explosive technological growth – Finnveden (2024)
Just the short section on “Dubiously enforceable promises”.
The Windfall Clause
The windfall clause is a proposed mechanism for distributing AGI profits for social good. It seems somewhat applicable to the issue of credible compliance: there are presumably lessons to be learned from understanding the clause and how it has been received.
The Windfall Clause – O’Keefe et al. (2020)
I recommend reading the conference paper and then the main paper.
Distributing the Benefits of AI via the Windfall Clause with Cullen O’Keefe – FLI Podcast (2020)
On the windfall clause – Leike (2020)
A Windfall Clause for CEO could worsen AI race dynamics – Larks (2023)
Will Any Major AI Company Commit to an AI Windfall Clause by 2025? – Metaculus
The analogy here is: “If we were to hand an agreement proposal—one about sharing power with other countries in the event the U.S. government develops/controls superintelligence—to U.S. government decision-makers, how likely would they be to enact the proposed agreement? What could make them more likely to enact it?”
AIs making credible commitments
AI systems can in theory precommit[8] to actions in a way that humans cannot. I cannot show you my source code to prove that I will act in a certain way in a certain situation, whereas an AI might be able to.
One path through which this could be leveraged to solve the issue of credible compliance: the leading AI company builds into its frontier model the commitment to share power and respect other actors’ sovereignty,[9] and displays this commitment for other actors to see. (Exactly how such a commitment could be built into a model is likely a thorny technical challenge.)
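To make the shape of the idea concrete, here is a purely illustrative toy sketch (mine, not drawn from any of the listed readings; the class and function names are hypothetical, and the “verification” step glosses over the genuinely hard part, namely establishing what a frontier model will actually do):

```python
import hashlib
import inspect


class CommittedAgent:
    """Toy agent whose policy hard-codes a power-sharing commitment."""

    COMMITMENT = "share power and respect other actors' sovereignty"

    def act(self, proposed_action: str) -> str:
        # The commitment lives inside the policy itself, not in an external
        # (and therefore overridable) wrapper.
        if "seize unilateral control" in proposed_action:
            return f"refuse (commitment: {self.COMMITMENT})"
        return f"execute: {proposed_action}"


def published_hash() -> str:
    """Hash of the agent's source code, as published for other actors to inspect."""
    return hashlib.sha256(inspect.getsource(CommittedAgent).encode()).hexdigest()


def counterparty_verifies(deployed_source: str, claimed_hash: str) -> bool:
    """A rival actor checks that the deployed policy matches the published one."""
    return hashlib.sha256(deployed_source.encode()).hexdigest() == claimed_hash


if __name__ == "__main__":
    agent = CommittedAgent()
    print(agent.act("seize unilateral control of global compute"))  # refused
    print(counterparty_verifies(inspect.getsource(CommittedAgent), published_hash()))  # True
```

In the toy version, verification is just a hash comparison of published source code; for a frontier model, establishing that the deployed system matches what was published, and that its operators cannot override the commitment, is exactly the open problem gestured at in the parenthetical above and in footnote 9.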
Cooperation, Conflict, and Transformative Artificial Intelligence: A Research Agenda – Clifton (2020)
Just section 3, “Credibility”.
Noteworthy subarea: Cryptographic technologies
Cryptographic technologies like blockchain are potentially relevant to credible commitment.
A Tour of Emerging Cryptographic Technologies – Garfinkel (2021)
Just “Non-intrusive agreement verification could become feasible” (pp. 48–49) and “It could become possible to solve collection [sic] action problems that existing institutions cannot” (pp. 53–57). (Follow the page numbers at the bottom of each page rather than those in the left sidebar.)
The issue of opposers and defectors
Investigating the Long Reflection – Muehlhauser (2023)
Just the “Who will oppose the Long Reflection” and “Coordinating the Long Reflection” sections.
Interstellar coordination
I would ideally include a reading on the problems—coordination or otherwise—an institution or governance regime might run into if its constituents are light-years apart, given that it seems plausible humanity should start expanding before the long reflection is over. (On the latter point, Cotton-Barratt writes: “I’m not using that term [‘the long reflection’] because it’s not obvious that ‘the reflection’ is a distinct phase: it could be that reflection is an ongoing process into the deep future, tackling new (perhaps local) questions as they arise.”) However, I was unable to find anything of great relevance. Let me know in the comments if there are writings I’m missing. I list the one semi-relevant piece I have seen below.
Note that work in this niche is less pressing than other work on institutional design, in my view, as it’s the kind of thing that could be solved during the long reflection (whereas the other things like democratic control and credible compliance need to be in place from the outset). Work in this niche is more pressing the shorter and less controlled one believes the long reflection is likely to be.
Succession – Ngo (2023)
A couple of points to be aware of:
Ngo’s story involves long-distance communication both within a civilization and between civilizations. It’s the former that’s relevant—potentially—to the long reflection.
The being/civilization that’s the first-person character seems to mostly be executing on a plan rather than coming up with one (i.e., they’ve mostly finished their long reflection).
Although it’s interesting to note that their knowledge exchange with the aliens does change their plan a little, which resonates with Cotton-Barratt’s idea about reflection being an ongoing process into the deep future.
Mesa-topics
These are topics that aren’t directly part of establishing a long reflection, but which those doing work or aiming to do work on the long reflection might want to be aware of.[10] Essentially, each of these topics points to an open problem that is arguably a crucial consideration which has to be solved, or dissolved, for the future to be close to maximally valuable. A successful long reflection is a meta-solution, a process that enables these problems to be solved. Under each topic I list the top resource(s), in my opinion, for getting up to speed.[11]
Welfare and moral weights
Theory
Welfare and moral weights – St. Jules (2024)
Social Choice Theory – Stanford Encyclopedia of Philosophy
Just section 4, “The aggregation of welfare measures or qualitative ratings”.
Different biological species
A comment by Carl Shulman (2023).
Note: this is the Rethink report Shulman is responding to.
The Moral Weight Project Sequence – Fischer (2022)
Digital minds
Societal considerations
Sharing the World with Digital Minds – Shulman & Bostrom (2020)
Propositions Concerning Digital Minds and Society – Bostrom & Shulman (2022)
Technical considerations
Moral consideration for AI systems by 2030 – Sebo & Long (2023)
A Definition of Happiness for Reinforcement Learning Agents – Daswani & Leike (2015)
Is positive experience possible?[12]
Tranquilism – Gloor (2017)
Infinite ethics
On infinite ethics – Carlsmith (2022)
Problems for Impartiality (lecture; handout; see especially the summary table at 45:40) – Russell (2022)[13]
Decision theory
Commitment races
The Commitment Races problem – Kokotajlo (2019)
Updatelessness
Updatelessness doesn’t solve most problems – Soto (2024)
Yudkowsky’s and Habryka’s comments are worth reading as well as the main post. In fact, I’d recommend reading Habryka’s comment first because it helps frame the post.
UDT shows that decision theory is more puzzling than ever – Dai (2023)
Note: “UDT” is short for “updateless decision theory”.
Further note: see here a breakdown of the different decision theory axes.
Is the potential astronomical waste in our universe too small to care about? – Dai (2014)
Acausal trade / Evidential cooperation in large worlds
Acausal trade – EA Forum Wiki
Three reasons to cooperate – Christiano (2022)
Cooperating with aliens and AGIs: An ECL explainer – Nguyen, Aldred & Wasil (2024)
Acausal threats
Distant superintelligences can coerce the most probable environment of your AI – Yudkowsky (2015)
Christiano’s comments are worth reading as well.
See also this related note (just the short, highlighted part of the post).
Anthropics
Theory
Nick Bostrom on Anthropic Selection and Living in a Simulation – Carroll (2020)
And for a deeper dive:
Anthropic Bias – Bostrom (2002)
UDASSA
The Absolute Self-Selection Assumption – Christiano (2011)
Anthropics and the Universal Distribution – Carlsmith (2021)
FNC (full non-indexical conditioning)[14]
Key problems
The simulation hypothesis
Are You Living In a Computer Simulation? – Bostrom (2003)
Simulation arguments – Carlsmith (2022)
Beyond Astronomical Waste – Dai (2018)
The Moral Status of Independent Identical Copies – Dai (2009)
Boltzmann brains
Are You a Boltzmann Brain? – PBS Space Time (2017)
And for a deeper dive:
Boltzmann brain – Wikipedia
Are you in a Boltzmann simulation? – Armstrong (2018)
Types of Boltzmann Brains – Turchin & Yampolskiy (2019)
Everett branches / Quantum many-worlds
David Wallace on the many-worlds theory of quantum mechanics and its implications – Wiblin & Harris (2021)
One way this topic could be important, which wasn’t emphasized so much by Wallace, is that if it’s true that the amount of “stuff”—for want of a better word—increases as the number of branches increases (rather than the total amount of stuff remaining constant, with individual branches becoming ever smaller slivers of the total), then, if humanity succeeds at creating a utopia, it could become a moral priority for us to trigger as many branch splittings as possible.
Subtopic: Quantum immortality
Content warning: eternity; suffering. See Raemon’s note.
Forever and Again: Necessary Conditions for “Quantum Immortality” and its Practical Implications – Turchin (2018)
See also this comment by Natália Mendonça.
In my opinion, Scott Alexander’s “Cryonics without freezers: resurrection possibilities in a Big World” (2012) does a better job of explaining the link between quantum immortality and cryonics. I therefore recommend that post to those who may be interested.
Quantum suicide and immortality – Wikipedia
Cause X
Cause X – EA Forum Wiki
1. ^ A crude simplification of my model, which I don’t currently have time to write up in full: if a middling outcome is 1% the value of a great outcome, then going from extinction to a middling outcome is 1/99 as valuable as going from a middling outcome to a great one.
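Spelled out (with $V$ standing for the value of a great outcome; notation introduced here purely for illustration):

$$\frac{V_{\text{middling}} - V_{\text{extinction}}}{V_{\text{great}} - V_{\text{middling}}} = \frac{0.01V - 0}{V - 0.01V} = \frac{0.01}{0.99} = \frac{1}{99}.$$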
2. ^
3. ^ If this reading list turns out to be useful then maybe I’ll keep it updated, or maybe someone more qualified than me will step into that role.
4. ^ Original discussion of the long reflection indicated that it could be a lengthy process of 10,000 years or more. More recent discussion I’m aware of, which is nonpublic hence no corresponding reading, i) takes seriously the possibility that the long reflection could last just weeks rather than years or millennia, and ii) notes that wall-clock time is probably not the most useful way to think about the length of the reflection, given that the reflection process, if it happens at all, will likely involve many superfast AIs doing the bulk of the cognitive labor.
5. ^ The key overall question here is “how much worse is a light cone colonized by aliens compared to a light cone colonized by humanity, by our lights?” I don’t have a post to link to that answers this question, but I’ve been involved in some nonpublic discussion. My all-things-considered belief is that, according to our values, an alien-colonized light cone would be something like 30% as valuable as a human-colonized light cone. (This 30% figure is highly uncertain and low resilience, though.)
6. ^ Some readers may have seen Karnofsky’s “Racing through a minefield: the AI deployment problem” and wonder how that post is different from the one I’ve included in the reading list. So, “Racing through a minefield” is a distillation of “Nearcast-based ‘deployment problem’ analysis”: I include the latter because I consider the extra detail worth knowing about.
7. ^ This motivating question is closely adapted from MacAskill’s quick take.
8. ^ The terminological difference between “commitment” and “precommitment” is explained here.
9. ^ Compliance that routes through AI precommitment in this way is more complicated than standard credible commitment between AIs, which does not have a step involving humans. For compliance to be credible, the humans behind the leading AI model presumably must not be able to override its precommitments. There is a tension here with corrigibility that may render this direction a non-starter.
10. ^ Mesa- is a Greek prefix that means the opposite of meta-. To “go meta” is to go one level up; to “go mesa” is to go one level down. So a mesa-topic is a topic one level down from the one you were on.
11. ^ With a bias towards resources that explain why their topic is decision-relevant. In practice, this means that more of the resources I list are EA-sphere writings than would otherwise be the case.
12. ^ For readers who think the answer is obviously “yes,” I point you to “Narrative Self-Deception: The Ultimate Elephant in the Brain?” (Vinding, 2018). (I used to think the answer was obviously yes; I changed my mind when I became proficient at meditation / paying close attention to experience.)
13. ^ H/T Michael St. Jules for making me aware of this lecture.
14. ^ H/T Ben West for making me aware of FNC.
Many of the posts in this list seem really relevant to the cluster of things you’re pointing at!
On some of the philosophical background assumptions, I would consider adding my ambitiously titled post The Moral Uncertainty Rabbit Hole, Fully Excavated. (It’s the last post in my metaethics/anti-realism sequence.)
Since the post is long and says it doesn’t work maximally well as a standalone piece (without two other posts from earlier in my sequence), it didn’t get much engagement when I published it, so I feel like I should do some advertising for it here.
As the title indicates, I’m trying to answer questions in that post that many EAs don’t ask themselves because they think about moral uncertainty or moral reflection in an IMO somewhat lazy way.
The post starts with a conundrum for the concept of moral uncertainty:
This insight has implications because we’re now conflating a few different things under the “moral uncertainty” label:
Metaethical uncertainty (i.e., our remaining probability on moral realism) and the strength of possible wagers for acting as though moral realism is true even if our probability in it is low.
Uncertainty over the values we’d choose after long reflection (our “idealized values”, which most people would be motivated to act upon even if moral realism is false).
Related to how we’d get to idealized values, the possibility of having under-defined values, i.e., the possibility that, because moral realism is false, even idealized moral reflection may lead to different endpoints based on very small changes to the procedure, or that a person’s reflection doesn’t “terminate” because their subjective feeling of uncertainty never goes away inside the envisioned reflection procedure.
My post is all about further elaborating on these distinctions and spelling out their implications for effective altruists.
I start out by introducing the notion of a moral reflection procedure to explain what moral reflection in an idealized setting could look like:
For reflection strategies (how to behave inside a reflection procedure), I discuss a continuum from “conservative” to “open-minded” reflection strategies.
Comparing these two reflection strategies is a core theme of the post, and one takeaway I get to is that neither end of the spectrum is superior to the other. Instead, I see moral reflection as a bit of an art, and we just have to find our personal point on the spectrum.
Relatedly, there’s also the question of “What’s the benefit of reflection now?” vs. “How much do we want to just leave things to future selves or hypothetical future selves in a reflection procedure?” (The point being that it is not by-default obvious that moral reflection has to be postponed!)
This is then followed by a discussion on whether “idealized values” are chosen or discovered.
Why do I think this? There’s more in my post, but here are some of the interesting bits, which seem especially relevant to the topic of “long reflection”:
I further discuss the notion of “having under-defined values.” This happens if someone defers to moral reflection with the expectation that it’ll terminate with a specific answer, but they’re predisposed to following reflection strategies that are open-ended enough that the reflection will, in practice, have under-defined outcomes.
Having under-defined values isn’t necessarily a problem – I discuss the pros and cons of it in the post.
Towards the end of the post, there’s a section where I discuss the IMO most sophisticated wager for “acting as though moral realism is true” (the wager for naturalist moral realism, rather than the one for non-naturalist/irreducible-normativity-based moral realism which I discussed earlier in my sequence). In that discussion, I conclude that this naturalist moral realism wager actually often doesn’t overpower what we’d do anyway under anti-realism. (The reasoning here is that naturalist moral realism feels somewhat watered down compared to non-naturalist moral realism, so that it’s actually “built on the same currency” as how we’d anyway structure our reasoning under moral anti-realism. Consequently, whether naturalist moral realism is true isn’t too different from the question of whether idealized values are chosen or discovered – it’s just that now we’re also asking about the degree of moral convergence between different people’s reflection.)
Anyway, that section is hard to summarize, so I recommend just reading it in full in the post (it has pictures and a fun “mountain analogy.”)
Lastly, I end the post with some condensed takeaways in the form of advice for someone’s moral reflection:
Thanks, lots of interesting articles in this list that I missed despite my interest in this area.
One suggestion I have is to add some studies of failed attempts at building/reforming institutions, otherwise one might get a skewed view of the topic. (Unfortunately I don’t have specific readings to suggest.)
A related topic you don’t mention here (maybe due to lack of writings on it?) is whether humanity should pause AI development and have a long (or even short!) reflection about what it wants to do next, e.g., resume AI development, or do something else like subsidize intelligence enhancement (e.g., embryo selection) for everyone who wants it, so that more people can meaningfully participate in deciding the fate of our world. (I note that many topics on this reading list are impossible for most humans to fully understand, perhaps even with AI assistance.)
This neglect is itself perhaps one of the most important puzzles of our time. With AGI very plausibly just a few years away, why aren’t more people throwing money or time/effort at this cluster of problems just out of self interest? Why isn’t there more intellectual/academic interest in these topics, many of which seem so intrinsically interesting to me?
I think it’s a combination of all of the following:
Many people seem to believe in something like “AI will be a big deal, but the singularity is much further off (or will never happen)”.
People treat the singularity in far mode even if they admit belief.
Previously committed people (especially academics) don’t shift their interests or research areas much based on events in the world, though they do rebrand their prior interests. It requires new people entering fields to actually latch onto new areas, and there hasn’t been enough time for this.
People who approach these topics from an altruistic perspective often come away with the view “probably we can mostly let the AIs/future figure this out; other topics seem more pressing and more possible to make progress on.”
There aren’t clear shovel ready projects.
It would probably be worth it for someone to write out the ethical implications of K-complexity-weighted utilitarianism/UDASSA for how to think about far-future ethics.
A few things that come to mind about this question (these are all ~hunches and maybe only semi-related, sorry for the braindump):
The description length of earlier states of the universe is probably shorter, which means that the “claw” that locates minds earlier in a simple universe is also shorter. This implies that lives earlier in time in the universe would be more important, and that we don’t have to care about exact copies as much.
This is similar to the reasons why not to care too much about Boltzmann brains.
We might have to aggregate preferences of agents with different beliefs (possible) and different ontologies/metaphysical stances (not sure about this), probably across ontological crises.
I have some preliminary writings on this, but nothing publishable yet.
The outcome of UDASSA depends on the choice of Turing machine. (People say it’s only up to a constant, but that constant can be pretty big.)
So we either find a way of classifying Turing machines by simplicity without relying on a single Turing machine to give us that notion, or we start out with some probability distribution over Turing machines and do some “2-level Solomonoff induction”, where we update both the probability of each Turing machine and the probabilities of each hypothesis for each Turing machine (one way of writing this down is sketched just after this list).
This leads to selfishness for whoever is computing Solomonoff induction, because the Turing machine where the empty program just outputs their observations receives the highest posterior probability.
If we use UDASSA/K-utilitarianism to weigh minds, there’s a pressure/tradeoff to make one’s preferences simpler.
If we endorse some kind of total utilitarianism, and there are increasing marginal returns to energy/matter or spacetime investment into minds with respect to degree of moral patienthood, then we’d expect to end up with very few large minds; if there are decreasing marginal returns, we end up with many small minds.
Theorems like Gibbard-Satterthwaite and Hylland imply that robust preference aggregation that resists manipulation is really hard. You can circumvent this by randomly selecting a dictator, but I think this would become unnecessary if we operate in an open-source game theory context, where algorithms can inspect each other’s reasons for a vote.
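One way of writing down the “2-level Solomonoff induction” idea above (a sketch under assumptions I’m introducing here: $\mu$ is some prior over universal Turing machines $M$, $p$ ranges over programs/hypotheses, and $\ell(p)$ is program length):

$$P(M, p \mid x) \;\propto\; \mu(M)\, 2^{-\ell(p)}\, \mathbf{1}\!\left[\,M(p)\text{ outputs a string beginning with } x\,\right],$$

so that observing $x$ updates both the weight on each machine, $P(M \mid x) = \sum_p P(M, p \mid x)$, and the weights on hypotheses within each machine, $P(p \mid M, x) = P(M, p \mid x)\,/\,P(M \mid x)$.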
I’m surprised you didn’t mention reflective equilibrium! Formalising reflective equilibrium and value formation with meta-preferences would be major steps in a long reflection.
I have the intuition that Grand Futures talks about this problem somewhere[1], but I don’t remember/know where.
Which, given its length, isn’t that out there.
Are there any readings about how a long reflection could be realistically and concretely achieved?
Great resource, thanks for putting this together!
I think collections like this are helpful, but it’s misleading to say it presents the “frontier of publicly available knowledge.”
Taking just the first section on moral truth as an example, it seems like a huge overstatement to say this collection of podcasts and forum posts gets people to the frontier of this subject. Philosophers have spent a long time on this, writing thousands of papers. And at a glance, it seems like all of OP’s linked resources don’t even intend to give an overview of the literature on meta-ethics. They instead present their own personal perspectives.
And all of the resources in this section are EA/rationalist affiliated. Surely there have been some people who’ve said intelligent things about the nature of morality prior to Yudkowsky’s birth, right? Neglecting these voices seems like an oversight, especially given the stated goal of getting readers to the frontier of publicly available knowledge.
Going forward, I’d suggest making more modest claims about what can be accomplished by a reading list like this and expanding the range of perspectives that’s considered worth listening to.