I am the main organizer of Effective Altruism Cambridge (UK), a group of people who are thinking hard about how to help others the most and address the world’s most pressing problems through their careers.
Previously, I worked at organizations such as EA France (community director), Existential Risk Alliance (research fellow), and the Center on Long-Term Risk (events and community associate).
I’ve conducted research on various longtermist topics (some of it posted here on the EA Forum) and recently finished a Master’s in moral philosophy.
I’ve also written some stuff on LessWrong.
You can give me anonymous feedback here. :)
Jim Buhler
Some governance research ideas to prevent malevolent control over AGI and why this might matter a hell of a lot
Predicting what future people value: A terse introduction to Axiological Futurism
Why we may expect our successors not to care about suffering
Future technological progress does NOT correlate with methods that involve less suffering
The Grabby Values Selection Thesis: What values do space-faring civilizations plausibly have?
What the Moral Truth might be makes no difference to what will happen
What values will control the Future? Overview, conclusion, and directions for future work
Great piece, thanks!
Since you devoted a subsection to moral circle expansion as a way of reducing s-risks, I guess you consider that its beneficial effects outweigh the backfire risks you mention (at least if MCE is done “in the right way”). CRS’ 2020 End-of-Year Fundraiser post also expresses optimism regarding the impact of increasing moral consideration for artificial minds (the only remaining doubts seem to be about when and how to do it).
I wonder how confident we should be, at this point, that MCE is positive for reducing s-risks. Have you – or other researchers – made estimates confirming this, for instance? :)
EDIT: Your piece Arguments for and against moral advocacy (2017) already raises relevant considerations but perhaps your view on this issue is clearer now.
Very interesting, Wei! Thanks a lot for the comment and the links.
TL;DR of my response: Your argument assumes that the first two conditions I list are met by default, which I think is a strong assumption (Part 1). Assuming that is the case, however, your point suggests there might be a selection effect favoring agents that act in accordance with the moral truth, which might be stronger than the selection effect I depict for values that are more expansion-conducive than the moral truth. This is something I haven’t seriously considered, and it made me update! Nonetheless, for your argument to be valid and strong, the orthogonality thesis has to be almost completely false, and I think we need more solid evidence to challenge that thesis (Part 2).
Part 1: Strong assumption

You write:

“This came about because there are facts about what preferences one should have, just like there exist facts about what decision theory one should use or what prior one should have, and species that manage to build intergalactic civilizations (or the equivalent in other universes) tend to discover all of these facts.”
My understanding is that, in this scenario, the seven conditions I listed are met because it is actually trivial for a super-capable intergalactic civilization to meet them (or even because meeting them is required for it to become intergalactic in the first place, as you suggest later).
I think this is plausible for the following conditions:

#3 They find something they recognize as a moral truth.
#4 They (unconditionally) accept it, even if it is highly counterintuitive.
#5 The thing they found is actually the moral truth. No normative mistake.
#6 They succeed at acting in accordance with it. No practical mistake.
#7 They stick to this forever. No value drift.
You might indeed expect that the most powerful civs figure out how to overcome these challenges, and that those who don’t are left behind.[1] This is something I haven’t seriously considered before, so thanks!
However, recall the first two conditions:

#1 There is a moral truth.
#2 It is possible to “find it” and recognize it as such.
How capable a civilization is doesn’t matter when it comes to how likely these two are to be met. And while most metaethical debates focus only on #1, saying #1 is true is a much weaker claim than saying both #1 and #2 are true (see, e.g., the naturalism vs. non-naturalism controversy, which I think is only one piece of the puzzle).
Part 2: Challenging the orthogonality thesis
Then, about the scenario you depict, you say:

“There are occasional paperclip maximizers that arise, but they are a relatively minor presence or tend to be taken over by more sophisticated minds.”
Maybe, but what I argue is that they are (occasional) “sophisticated minds” with values that are more expansion-conducive than the (potential) moral truth (e.g., because they have simple unconstrained goals such as “let’s just maximize for more life” or “for expansion itself”), and that they’re the ones who tend to take over.
But then you make this claim, which, if true, seems to sort of debunk my argument:

“you can’t become a truly powerful civilization without being able to ‘do philosophy’ and be generally motivated by the results.”
(Given the context in your comment, I assume that by “being able to do philosophy”, you mean “being able to do things like finding the moral truth”.)
But I don’t think this claim is true.[1] However, you made me update, and I might update more once I read the posts of yours that you linked! :)
[1] I remain skeptical because this would imply the orthogonality thesis is almost completely false. Assuming there is a moral truth and that it is possible to “find” it and recognize it as such, I tentatively still believe that extremely powerful agents/civs with motivations misaligned with the moral truth are very plausible and not rare. You can at least imagine scenarios where they started aligned but then value drifted (without that making them significantly less powerful).
Thanks for writing this, Jamie!
Concerning the “SHOULD WE FOCUS ON MORAL CIRCLE EXPANSION?” question, I think something like the following sub-question is also relevant: Will MCE lead to a “near miss” of the values we want to spread?

Magnus Vinding (2018) argues that someone who cares about a given sentient being is absolutely not guaranteed to want what we think is best for that sentient being. While he argues from a suffering-focused perspective, the problem is still the same under any ethical framework.
For instance, future people who “care” about wild animals and AS will likely care about things that have nothing to do with their subjective experiences (e.g., their “freedom” or their “right to life”), which might lead them to do things that are arguably bad (e.g., creating a lot of faithful simulations of the Amazon rainforest), although well-intentioned.
Even in a scenario where most people genuinely care about the welfare of non-humans, their standards for considering such welfare positive might be incredibly low.
Interesting! Thank you for writing this up. :)
It does seem plausible that, by evolutionary forces, biological nonhumans would care about the proliferation of sentient life about as much as humans do, with all the risks of great suffering that entails.
What about the grabby aliens, more specifically? Do they not, in expectation, care about proliferation (even) more than humans do?
All else being equal, it seems—at least to me—that civilizations with very strong pro-life values (i.e., ones that think that perpetuating life is good and necessary, regardless of its quality) colonize, in expectation, more space than compassionate civilizations willing to do the same only under certain conditions regarding others’ subjective experiences.
Then, unless we believe that the emergence of dominant pro-life values in any random civilization is very unlikely in the first place (a priori, I see more reasons to assume the exact opposite), shouldn’t we assume that space is mainly being colonized by “life-maximizing aliens” who care about nothing but perpetuating life (including sentient life) as much as possible?
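(To make that selection effect concrete, here is a purely illustrative toy calculation. The expansion speeds, time horizon, and equal starting numbers below are hypothetical assumptions of mine, not figures from anyone's work.)

```python
# Toy model: two kinds of civilizations start out equally common, but
# "life-maximizing" ones expand slightly faster than "compassionate" ones
# (hypothetical numbers). Colonized volume grows roughly with the cube of
# the expansion radius, so even a small speed difference compounds.

life_max_speed = 0.9       # expansion speed (fraction of c), assumed
compassionate_speed = 0.8  # assumed slightly slower due to extra constraints
years = 1e9                # arbitrary time horizon

volume_life_max = (life_max_speed * years) ** 3
volume_compassionate = (compassionate_speed * years) ** 3

share = volume_life_max / (volume_life_max + volume_compassionate)
print(f"Share of colonized volume held by life-maximizers: {share:.0%}")
# ~59% here; the share grows further if life-maximizing values also arise
# more often than compassionate ones, or if the speed gap is larger.
```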
Since I’ve never read such an argument anywhere else (and am far from being an expert in this field), I guess that it has a problem that I don’t see.
EDIT: Just to be clear, I’m just trying to understand what the grabby aliens are doing, not to come to any conclusion about what we should do vis-à-vis the possibility of human-driven space colonization. :)
Thanks, Oscar!
“predicting future (hopefully wiser and better-informed) values for moral antirealists”
Any reason to believe moral realists would be less interested in this empirical work? You seem to assume the goal is to update our values based on those of future people. While this can be a motivation (it is one of Danaher’s (2021) motivations), we might also worry—independently of whether we are moral realists or antirealists—that the expected future evolution of values doesn’t point towards something wiser and better-informed (since that’s not what evolution is “optimizing” for; see relevant examples in this comment), and want to change this trajectory.
Anticipating what could happen seems instrumentally useful for anyone who has long-term goals, no matter their take on meta-ethics, right?
Yeah, so Danaher (2021) coined the term axiological futurism, but research on this topic existed long before that. For instance, I find these two pieces particularly insightful:
Robin Hanson (1998) Burning the Cosmic Commons: Evolutionary Strategies for Interstellar Colonization
Nick Bostrom (2004) The Future of Human Evolution
They explore how compassionate values might be selected against because of evolutionary pressures, and be replaced by values more competitive for, e.g., space colonization races. In The Age of Em, Robin Hanson forecasts what would happen if whole brain emulation comes before de novo AGI, and arrives at similar conclusions.
I don’t think we can say they made “successful predictions” and settled the debate, but it seems like they came up with quite important considerations.
I intend to elaborate more on this kind of work in future posts within this sequence. :)
Interesting, thanks Ben! I definitely agree that this is the crux.
I’m sympathetic to the claim that “this algorithm would be less efficient than quicksort” and that this claim is generalizable.[1] However, if true, I think it only implies that suffering is—by default—inefficient as a motivation for an algorithm.
Right after making my crux claim, I reference some of Tobias Baumann’s (2022a, 2022b) work, which gives some examples of how significant amounts of suffering may be instrumentally useful/required in cases such as scientific experiments where sentience plays a key role (where the suffering is not due to it being a strong motivator for an efficient algorithm, but to other reasons). Interestingly, his “incidental suffering” examples are more similar to the factory farming and human slavery examples than to the quicksort example.
[1] To be fair, it’s been a while since I’ve read about stuff like suffering subroutines (see, e.g., Tomasik 2019) and their plausibility, and people might have raised considerations going against that claim.
Thanks for the comment!
Right now, in rich countries, we seem to live in an unusual period that Robin Hanson (2009) calls “the Dream Time”. You can survive valuing pretty much whatever you want, which is why there isn’t much selection pressure on values. This likely won’t go on forever, especially if humanity starts colonizing space.
(Re religion. This is anecdotal, but since you brought up this example: in the past, I think religious people would have been much less successful at spreading their values if they had been more concerned about the suffering of the people they were trying to convert. The growth of religion was far from being a harm-free process.)
Thanks for giving arguments pointing the other way! I’m not sure #1 is relevant to our context here, but #2 is definitely worth considering. In the second post of the present sequence, I argue that something like #2 probably doesn’t pan out, and we discuss an interesting counter-argument in this comment thread.
Unfortunately we are unable to sponsor visas, so applicants must be eligible to work in the US.
Isn’t it possible to simply contract (rather than employ) those who have or can get an ESTA, such that there’s no need for a visa?
I completely agree with 3, and it’s indeed worth clarifying. Even ignoring this, the possibility of humans being more compassionate than pro-life grabby aliens might actually be an argument against human-driven space colonization, since compassion—especially when combined with scope sensitivity—might increase agential s-risks related to potential catastrophic cooperation failures between AIs (see, e.g., Baumann and Harris 2021, 46:24), which are the most worrying s-risks according to Jesse Clifton’s preface to CLR’s agenda. A space filled with life-maximizing aliens who don’t give a crap about welfare might be better than one filled with compassionate humans who create AGIs that might do the exact opposite of what they want (because of escalating conflicts and stuff). Obviously, the uncertainty here remains huge.
Besides, 1 and 2 seem to be good counter-considerations, thanks! :)
I’m not sure I get why “Singletons about non-life-maximizing values are also convergent”, though. Can you—or anyone else reading this—point me to any reference that would help me understand this?
Thanks, Vasco! Perhaps a nitpick, but suffering still doesn’t seem to be the limiting factor per se here. If farmed animals were philosophical zombies (i.e., not sentient but with the exact same needs), that wouldn’t change the fact that one needs to keep them in conditions that are good enough to make a profit out of them. The limiting factor is their physical needs, not their suffering itself. Do you agree?
I think the distinction is important because it suggests that suffering itself appears as a limiting factor only insofar as it is strong evidence of physical needs that are not met. And while both strongly correlate in the present, I argue that we should expect this to change.
Interesting! Thanks for writing this. Seems like a helpful summary of ideas related to s-risks from AI.
Another important normative reason for dedicating some attention to s-risks is that the future (conditional on humanity’s survival) is more likely than commonly appreciated to be negative, or at least not very positive, from virtually any plausible moral perspective, e.g., classical utilitarianism (see DiGiovanni 2021; Anthis 2022).
While this does not speak in favor of prioritizing s-risks per se, it obviously speaks against prioritizing x-risks, which seem to be their biggest longtermist “competitors” at the moment.
(I have two unrelated remarks I’ll make in separate comments.)