I am the main organizer of Effective Altruism Cambridge (UK), a group of people who are thinking hard about how to help others the most and address the world’s most pressing problems through their careers.
Previously, I worked at organizations such as EA France (community director), the Existential Risk Alliance (research fellow), and the Center on Long-Term Risk (events and community associate).
I’ve conducted research on various longtermist topics (some of it posted here on the EA Forum) and recently finished a Master’s in moral philosophy.
I’ve also written some stuff on LessWrong.
You can give me anonymous feedback here. :)
Jim Buhler
Great piece, thanks!
Since you devoted a subsection to moral circle expansion as a way of reducing s-risks, I guess you consider that its beneficial effects outweigh the backfire risks you mention (at least if MCE is done “in the right way”). CRS’ 2020 End-of-Year Fundraiser post also expresses optimism regarding the impact of increasing moral consideration for artificial minds (the only remaining doubts seem to be about when and how to do it).
I wonder how confident we should be, at this point, that MCE is net positive for reducing s-risks. Have you – or other researchers – made estimates confirming this, for instance? :)
EDIT: Your piece Arguments for and against moral advocacy (2017) already raises relevant considerations but perhaps your view on this issue is clearer now.
Michael’s definition of risks of disappointing futures doesn’t include s-risks though, right?
> a disappointing future is when humans do not go extinct and civilization does not collapse or fall into a dystopia, but civilization[1] nonetheless never realizes its potential.
I guess adding up the two types gives us something like “risks of a negative (or nearly negative) future.”
Thank you for writing this.
> According to a survey of quantitative predictions, disappointing futures appear roughly as likely as existential catastrophes.
It looks like Bostrom and Ord included risks of disappointing futures in their x-risk estimates, which might make this conclusion a bit skewed, don’t you think?
Good point, and it is consistent with CLR’s s-risks definition. :)
Interesting! Thank you for writing this up. :)
It does seem plausible that, due to evolutionary forces, biological nonhumans would care about the proliferation of sentient life about as much as humans do, with all the risks of great suffering that entails.
What about the grabby aliens, more specifically? Do they not, in expectation, care about proliferation (even) more than humans do?
All else being equal, it seems (at least to me) that civilizations with very strong pro-life values (i.e., that think perpetuating life is good and necessary, regardless of its quality) colonize, in expectation, more space than compassionate civilizations willing to do the same only under certain conditions regarding others’ subjective experiences.
Then, unless we believe that the emergence of dominant pro-life values in any random civilization is very unlikely in the first place (a priori, I see more reasons to assume the exact opposite), shouldn’t we assume that space is mainly being colonized by “life-maximizing aliens” who care about nothing but perpetuating life (including sentient life) as much as possible? (I sketch this selection effect with toy numbers at the end of this comment.)
Since I’ve never read such an argument anywhere else (and am far from being an expert in this field), I guess it has a problem that I don’t see.
EDIT: Just to be clear, I’m just trying to understand what the grabby aliens are doing, not to come to any conclusion about what we should do vis-à-vis the possibility of human-driven space colonization. :)
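To make the selection effect I have in mind more concrete, here is a toy calculation. All the numbers (share of pro-life civilizations, expansion rates) are invented purely for illustration; nothing here is an empirical claim:

```python
# Toy illustration of the selection effect: even if pro-life values
# ("perpetuating life is good regardless of its quality") are a minority view
# among emerging civilizations, they can dominate colonized volume if they
# make civilizations expand more. All numbers are made up.

fraction_pro_life = 0.2        # assumed share of civilizations with pro-life values
expansion_pro_life = 10.0      # assumed volume colonized per pro-life civilization
expansion_compassionate = 1.0  # compassionate civs colonize only under conditions

volume_pro_life = fraction_pro_life * expansion_pro_life
volume_compassionate = (1 - fraction_pro_life) * expansion_compassionate

share = volume_pro_life / (volume_pro_life + volume_compassionate)
print(f"{share:.0%} of colonized space is pro-life")  # ~71%, despite only 20% of civs
```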
I completely agree with 3, and it’s indeed worth clarifying. Even ignoring this, the possibility of humans being more compassionate than pro-life grabby aliens might actually be an argument against human-driven space colonization, since compassion (especially when combined with scope sensitivity) might increase agential s-risks related to potential catastrophic cooperation failures between AIs (see, e.g., Baumann and Harris 2021, 46:24), which are the most worrying s-risks according to Jesse Clifton’s preface to CLR’s agenda. A space filled with life-maximizing aliens who don’t give a crap about welfare might be better than one filled with compassionate humans who create AGIs that end up doing the exact opposite of what they want (because of escalating conflicts and the like). Obviously, the uncertainty here stays huge.
Besides, points 1 and 2 seem to be good counter-considerations, thanks! :)
I’m not sure I get why “Singletons about non-life-maximizing values are also convergent”, though. Can you (or anyone else reading this) point me to a reference that would help me understand this?
Thanks for writing this Jamie!
Concerning the “SHOULD WE FOCUS ON MORAL CIRCLE EXPANSION?” question, I think something like the following sub-question is also relevant: Will MCE lead to a “near miss” of the values we want to spread?

Magnus Vinding (2018) argues that someone who cares about a given sentient being is absolutely not guaranteed to want what we think is best for this sentient being. While he argues from a suffering-focused perspective, the problem is the same under any ethical framework.
For instance, future people who “care” about wild animals and AS will likely care about things that have nothing to do with their subjective experiences (e.g., their “freedom” or their “right to life”), which might lead them to do things that are arguably bad (e.g., creating a lot of faithful simulations of the Amazon rainforest), however well intentioned.
Even in a scenario where most people genuinely care about the welfare of non-humans, their standards for considering such welfare positive might be incredibly low.
I haven’t received anything on my side. I think a confirmation by email would be nice, yes. Otherwise, I’ll send the application a second time just in case.
For EA group retreats, is it better to apply for the CEA event support you introduced, or for CEA’s group support funding?
Thanks :)
> Caspar Oesterheld’s work on Evidential Cooperation in Large Worlds (ECL) shows that some fairly weak assumptions about the shape of the universe are enough to arrive at the conclusion that there is one optimal system of ethics: the compromise between all the preferences of all agents who cooperate with each other acausally. That would solve ethics for all practical purposes. It would therefore have enormous effects on a wide variety of fields because of how foundational ethics is.
ECL recommends that agents maximize a compromise utility function averaging their own utility function and those of the agents that action-correlate with them (their “copies”). The compromise between me and my copies would look different from the compromise between you and your copies, right? So I could “solve ethics” for myself, but not for you, and vice versa. Ethics could be “solved” for everyone only if all agents in the multiverse action-correlated with each other to the exact same degree, which seems exceedingly unlikely. Am I missing something?
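Here’s a minimal sketch of what I mean; the agents, correlation weights, and utility functions are all invented for illustration:

```python
# Toy sketch of an ECL-style compromise utility function: each agent maximizes
# a weighted average of its own utility function and those of the agents that
# action-correlate with it. All weights and utility functions are invented.

def compromise_utility(outcome, correlated_agents):
    """Average the utilities of action-correlated agents, weighted by
    (assumed) degree of correlation."""
    total = sum(weight for weight, _ in correlated_agents)
    return sum(weight * u(outcome) for weight, u in correlated_agents) / total

u_me = lambda o: o["welfare"]                              # say I care about welfare
u_you = lambda o: o["life"]                                # say you care about proliferation
u_shared_copy = lambda o: 0.5 * o["welfare"] + 0.5 * o["life"]

outcome = {"welfare": 1.0, "life": 0.3}

# My compromise (with my copies) differs from yours (with your copies):
print(compromise_utility(outcome, [(1.0, u_me), (0.6, u_shared_copy)]))   # ~0.87
print(compromise_utility(outcome, [(1.0, u_you), (0.6, u_shared_copy)]))  # ~0.43
# Different numbers: "ethics" gets "solved" relative to each agent's
# correlation class, not once and for all.
```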
(Not a criticism of your proposal. I’m just trying to refine my understanding of ECL) :)
Thanks for the reply! :)
By “copies”, I meant “agents which action-correlate with you” (i.e., those which will cooperate if you cooperate), not “agents sharing your values”. Sorry for the confusion.
Do you think all agents who think superrationally action-correlate? This seems like a very strong claim to me. My impression is that the set of agents with a decision-algorithm similar enough to mine to (significantly) action-correlate with me is a very small subset of all superrationalists. As your post suggests, even your past self doesn’t fully action-correlate with you (although you don’t need “full correlation” for cooperation to be worthwhile, of course).
In a one-shot prisoner’s dilemma, would you cooperate with anyone who agrees that superrationality is the way to go?
In his paper on ECL, Caspar Oesterheld says (section 2, p.9): “I will tend to make arguments from similarity of decision algorithms rather than from common rationality, because I hold these to be more rigorous and more applicable whenever there is no authority to tell my collaborators and me about our common rationality.”
However, he also often uses “the agents with a decision-algorithm similar enough to mine to (significantly) action-correlate with me” and “all superrationalists” interchangeably, which confuses me a lot.
Oh interesting! OK, so I guess there are two possibilities.
1) Either by “superrationalists”, you mean something stronger than “agents taking acausal dependences into account in PD-like situations”, which I thought was roughly Caspar’s definition in his paper. And then I’d be even more confused.
2) Or you really think that taking acausal dependences into account is, by itself, sufficient to create a significant correlation between two decision-algorithms. In that case, how do you explain that I would defect against you and exploit you in a one-shot PD (very sorry, I just don’t believe we correlate ^^), despite being completely on board with superrationality? How is that not a proof that common superrationality is insufficient?
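To spell out my worry with toy numbers (the payoffs and correlation levels are made up): even granting common superrationality, cooperating only beats defecting if my cooperating is strong enough evidence that you cooperate too.

```python
# Toy one-shot prisoner's dilemma: does cooperating pay, given how strongly
# your action correlates with mine? Payoffs are mine; all numbers invented.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def expected_payoff(my_move, p_mirror):
    """Simplistic model: with probability p_mirror you mirror my move
    (action-correlation); otherwise you play the opposite move."""
    other = "D" if my_move == "C" else "C"
    return (p_mirror * PAYOFF[(my_move, my_move)]
            + (1 - p_mirror) * PAYOFF[(my_move, other)])

for p in (0.0, 0.5, 0.8, 1.0):
    print(p, expected_payoff("C", p), expected_payoff("D", p))
# With these payoffs, cooperating beats defecting only when p_mirror > 5/7.
# Merely agreeing that superrationality is correct doesn't, by itself, give
# us that much correlation.
```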
(Btw, happy to jump on a call to talk about this if you’d prefer that over writing.)
Insightful! Thanks for writing this.
> Perhaps it will be possible to design AGI systems with goals that are cleanly separated from the rest of their cognition (e.g. as an explicit utility function), such that learning new facts and heuristics doesn’t change the systems’ values.
In that case, value lock-in is the default (unless corrigibility/uncertainty is somehow part of what the AGI values), such that there’s no need for the “stable institution” you keep mentioning, right?
> But the one example of general intelligence we have — humans — instead seem to store their values as a distributed combination of many heuristics, intuitions, and patterns of thought. If the same is true for AGI, it is hard to be confident that new experiences would not occasionally cause their values to shift.

Therefore, it seems to me that most of your doc assumes we’re in this scenario? Is that the case? Or did I wildly misunderstand something?
Interesting! Thanks for writing this. Seems like a helpful summary of ideas related to s-risks from AI.
Another important normative reason for dedicating some attention to s-risks is that the future (conditional on humanity’s survival) is more likely to be negative (or at least not very positive) than is generally appreciated, from virtually any plausible moral perspective, e.g., classical utilitarianism (see DiGiovanni 2021; Anthis 2022).
While this does not speak in favor of prioritizing s-risks per se, it obviously speaks against prioritizing x-risks, which seem to be their biggest longtermist “competitors” at the moment.
(I have two unrelated remarks I’ll make in separate comments.)
(emphasis is mine)
> For something to constitute an “s-risk” under this definition, the suffering involved not only has to be astronomical in scope (e.g., “more suffering than has existed on Earth so far”),[5] but also significant compared to other sources of expected future suffering. This last bit ensures that “s-risks,” assuming sufficient tractability, are always a top priority for suffering-focused longtermists.
Nitpick but you also need to assume sufficient likelihood, right? One might very well be a suffering-focused longtermist and think s-risks are tractable but so exceedingly unlikely that they’re not a top priority relative to, e.g., near-term animal suffering (especially if they prefer risk aversion over classic expected value reasoning).
(I’m not arguing that someone who thinks that is right. I actually think they’re probably very wrong. Just wanted to make sure we agree the definition of s-risks doesn’t include any claim about their likelihood.) :)
I really like the section “S-risk reduction is separate from alignment work”! I’ve been surprised by the extent to which people dismiss s-risks on the pretext that “alignment work will solve them anyway” (which is both insufficient and untrue, as you pointed out).
I guess some of the technical work to reduce s-risks (e.g., preventing the “accidental” emergence of conflict-seeking preferences) can be considered a very specific kind of AI intent alignment (that only a few cooperative AI people are working on afaik) where we want to avoid worst-case scenarios.
But otherwise, do you think it’s fair to say that most s-risk work is focused on AI capability issues (as opposed to intent alignment, in Paul Christiano’s (2019) typology)? Even if the AI is friendly to humans’ values, that doesn’t mean it’d be capable of avoiding the failure modes / near-misses you refer to.
I usually frame things this way in discussions to make things clearer, but I’m wondering if that’s the best framing...
Very interesting post, thanks for writing this!
> 1. Simulations are not the most efficient way for A and B to reach their agreement. Rather, writing out arguments or formal proofs about each other is much more computationally efficient, because nested arguments naturally avoid stack overflows in a way that nested simulations do not. In short, each of A and B can write out an argument about each other that self-validates without an infinite recursion. There are several ways to do this, such as using Löb’s Theorem-like constructions (as in this 2019 JSL paper), or even more simply and efficiently using Payor’s Lemma (as in this 2023 LessWrong post).
I’m wondering to what extent this is exactly the same as Evidential Cooperation in Large Worlds, where you don’t need simulations because you cooperate only with the agents that are decision-entangled with you (i.e., those you can prove will cooperate if you cooperate). While not needing simulations is an advantage, the big limitation of Evidential Cooperation in Large Worlds is that the sample of agents you can cooperate with is fairly small (since they need to be decision-entangled with you).
The whole point of nesting simulations—and classic acausal trade—is to create some form of artificial/“indirect” decision-entanglement with agents who would otherwise not be entangled with you (by creating a channel of “communication” that makes the players able to see what the other is actually playing, so you can start implementing a tit-for-tat strategy). Without those simulations, you’re limited to the agents you can prove will necessarily cooperate if you cooperate (without any way to verify/coordinate via mutual simulation). (Although one might argue that you can hardly simulate agents you can’t prove anything about or are not (close to) decision-entangled with, anyway.)
So is your idea basically Evidential Cooperation in Large Worlds explained in another way or is it something in between that and classic acausal trade?
Thanks for writing this! :)
Another potential outcome that comes to mind regarding such projects is a self-fulfilling prophecy effect (provided the predictions are not secret). I have no idea how much of a (positive or negative) impact it would have, though.