I am the main organizer of Effective Altruism Cambridge (UK), a group of people who are thinking hard about how to help others the most and address the world’s most pressing problems through their careers.
Previously, I worked in organizations such as EA France (community director), Existential Risk Alliance (research fellow), and the Center on Long-Term Risk (events and community associate).
I’ve conducted research on various longtermist topics (some of it posted here on the EA Forum) and recently finished a Master’s in moral philosophy.
I’ve also written some stuff on LessWrong.
You can give me anonymous feedback here. :)
Jim Buhler
Interesting, makes sense! Thanks for the clarification and for your thoughts on this! :)
If I want to prove that technological progress generally correlates with methods that involve more suffering, yes! Agreed.
But while the post suggests that this is a possibility, its main point is that suffering itself is not inefficient, such that there is no reason to expect progress and methods that involve less suffering to correlate by default (a much weaker claim).
This makes me realize that the crux is perhaps the part below, more than the claim we discuss above.
While I tentatively think the "the most efficient solutions to problems don't seem like they involve suffering" claim is true if we limit ourselves to the present and the past, I think it is false once we consider the long-term future, which makes the argument break apart.

Future solutions are more efficient insofar as they overcome past limitations. In the relevant examples, enslaved humans and exploited animals, suffering itself is not a limiting factor. It is rather the physical limitations of these biological beings, relative to machines that could do a better job at their tasks.
I don’t see any inevitable dependence between their suffering and these physical limitations. If human slaves and exploited animals were not sentient, this wouldn’t change the fact that machines would do a better job.
Sorry for the confusion and thanks for pushing back! Helps me clarify what the claims made in this post imply and don’t imply. :)
I do not mean to argue that the future will be net negative. (I even make this disclaimer twice in the post, aha.) :)
I simply argue that one particular argument for assuming it will be positive (the claimed convergence between efficiency and methods that involve less suffering) is unsupported.
There are many other arguments/considerations to take into account to assess the sign of the future.
Thanks!
Are you thinking about this primarily in terms of actions that autonomous advanced AI systems will take for the sake of optimisation?
Hmm… not sure. I feel like my claims are very weak and hold even in future worlds without autonomous advanced AIs.
"One large driver of humanity's moral circle expansion/moral improvement has been technological progress which has reduced resource competition and allowed groups to expand concern for others' suffering without undermining themselves."

Agreed, but this is more similar to argument (A) fleshed out in this footnote, which is not the one I'm assailing in this post.
Thanks Vasco! Perhaps a nitpick but suffering still doesn’t seem to be the limiting factor per se, here. If farmed animals were philosophical zombies (i.e., were not sentient but still had the exact same needs), that wouldn’t change the fact that one needs to keep them in conditions that are ok enough to be able to make a profit out of them. The limiting factor is their physical needs, not their suffering itself. Do you agree?
I think the distinction is important because it suggests that suffering itself appears as a limiting factor only insofar as it is strong evidence of physical needs that are not met. And while both strongly correlate in the present, I argue that we should expect this to change.
Interesting, thanks Ben! I definitely agree that this is the crux.
I’m sympathetic to the claim that “this algorithm would be less efficient than quicksort” and that this claim is generalizable.[1] However, if true, I think it only implies that suffering is—by default—inefficient as a motivation for an algorithm.
Right after making my crux claim, I reference some of Tobias Baumann's (2022a, 2022b) work which gives some examples of how significant amounts of suffering may be instrumentally useful/required in cases such as scientific experiments where sentience plays a key role (where the suffering is not due to it being a strong motivator for an efficient algorithm, but for other reasons). Interestingly, his "incidental suffering" examples are more similar to the factory farming and human slavery examples than to the Quicksort example.
[1] To be fair, it's been a while since I've read about stuff like suffering subroutines (see, e.g., Tomasik 2019) and its plausibility, and people might have raised considerations going against that claim.
Thanks a lot, Alene! That’s motivating :)
Thanks, Maxime! This is indeed a relevant consideration I thought a tiny bit about, and Michael St. Jules also brought that up in a comment on my draft.
First of all, it is important to note that UCC affects the neglectedness (and potentially also the probability) of "late s-risks" only, i.e., those that happen far enough from now for the UCC selection to actually have time to occur. So let's consider only these late s-risks.
We might want to differentiate between three different cases:
1. Extreme UCC (where suffering is not just ignored but ends up being valued, as in the scenario I depict in this footnote). In this case, all kinds of late s-risks seem not only more neglected but also more likely.
2. Strong UCC (where agents end up being roughly indifferent to suffering; this is the case your comment assumes, I think). In this case, while all kinds of late s-risks seem more neglected, late s-risks from conflict do indeed seem less likely. However, this doesn't seem to apply to (at least) near-misses and incidental risks.
3. Weak UCC (where agents still care about suffering but much less than we do). In this case, same as above, except perhaps for the “late s-risks from conflict” part. I don’t know how weak UCC would change conflict dynamics.
The more we expect #2 rather than #1 or #3, the more your point applies, I think (with the above caveat on near-misses and incidental risks). I might definitely have missed something, though. It's a bit complicated.
Thanks for the comment!
Right now, in rich countries, we seem to live in an unusual period Robin Hanson (2009) calls “the Dream Time”. You can survive valuing pretty much whatever you want, which is why there isn’t much selection pressure on values. This likely won’t go on forever, especially if Humanity starts colonizing space.
(Re religion. This is anecdotal, but since you brought up this example: in the past, I think religious people would have been much less successful at spreading their values if they had been more concerned about the suffering of the people they were trying to convert. The growth of religion was far from a harm-free process.)
Thanks Will! :)
I think I haven’t really thought about this possibility.
I know nothing about how things like false vacuum decay work (thankfully, I guess), about how tractable triggering it is, or about how the minds of the agents trying to trigger it would operate. And my immediate impression is that these things matter a lot to whether my responses to the first two "obvious objections" sort of apply here as well, and to whether "decay-conducive values" might be competitive.
However, I think we can at least confidently say that, at least in the intra-civ selection context (see my previous post), a potential selection effect non-trivially favoring "decay-conducive values" during the space colonization process seems much less straightforward and obvious than the selection effect progressively favoring agents that are more and more upside-focused (on long time scales with many bits of selection). The selection steps are not the same in these two cases, and the potential dynamic that might lead decay-conducive values to take over seems more complex and fragile.
Thanks for giving arguments pointing the other way! I’m not sure #1 is relevant to our context here, but #2 is definitely worth considering. In the second post of the present sequence, I argue that something like #2 probably doesn’t pan out, and we discuss an interesting counter-argument in this comment thread.
Thanks Miranda! :)
I personally think the strongest argument for reducing malevolence is its relevance for s-risks (see section Robustness: Highly beneficial even if we fail at alignment), since I believe s-risks are much more neglected than they should be.
And the strongest counter-considerations for me would be:
1. Uncertainty regarding the value of the future. I'm generally much more excited about making the future go better rather than "bigger" (reducing X-risk does the latter), so the more reducing malevolence does the latter rather than the former, the less certain I am it should be a priority. (Again, this applies to any kind of work that reduces X-risks, though.)
2. Info/attention hazards. Perhaps the best way to avoid these malevolence scenarios is to ignore them and avoid making them more salient.
Interesting question you asked, thanks! I added a link to this comment in a footnote.
Right, so assuming no early value lock-in and that the values of the AGI are (at least somewhat) controlled/influenced by its creators, I imagine these creators having values that are grabby to varying extents, with these values competing against one another in the big tournament that is cultural evolution.
For simplicity, say there are only two types of creators: the pure grabbers (who value grabbing (quasi-)intrinsically) and the safe grabbers (who are in favor of grabbing only if it is done in a "safe" way, whatever that means).

Since we're assuming there hasn't been any early value lock-in, the AGI isn't committed to some form of compromise between the values of the pure and safe grabbers. Therefore, you can imagine that the AGI allows for competition and helps both groups accomplish what they want proportionally to their size, or something like that. From there, I see two plausible scenarios:
A) The pure and safe grabbers are two cleanly separated groups running a space expansion race against one another, and we should—all else equal—expect the pure grabbers to win, for the same reasons why we should—all else equal—expect the AGI race to be won by the labs optimizing for AI capabilities rather than for AI safety.
B) The safe grabbers “infiltrate” the pure grabbers in an attempt to make their space-expansion efforts “safer”, but are progressively selected against since they drag the pure-grabby project down. The few safe grabbers who might manage not to value drift and not to get kicked out of the pure grabbers are those who are complacent and not pushing really hard for more safety.
The reason why the intra-civ grabby values selection is currently fairly weak on Earth, as you point out, is that humans haven't even started colonizing space, which makes something like A or B very unlikely to have happened yet. Arguably, the process that may eventually lead to something like A or B hasn't even begun for real. We're unlikely to notice a selection for grabby values before people actually start running something like a space expansion race. And most of those we might expect to want to somehow get involved in the potential[1] space expansion race are currently focused on the race to AGI, which makes sense. This latter race seems more relevant/pressing right now.
[1] It seems like this race will happen (or actually be worth running) if, and only if, AGI has non-locked-in values and is corrigible(-ish) and aligned(-ish) with its creators, as we suggested.
Thanks a lot for this comment! I linked to it in a footnote. I really like this breakdown of different types of relevant evolutionary dynamics. :)
Thanks for the comment! :) You’re assuming that the AGI’s values will be pretty much locked-in forever once it is deployed such that the evolution of values will stop, right? Assuming this, I agree. But I can also imagine worlds where the AGI is made very corrigible (such that the overseers stay in control of the AGI’s values) and where intra-civ value evolution continues/accelerates. I’d be curious if you see reasons to think these worlds are unlikely.
If you had to remake this 3D sim of the expansion of grabby aliens based on your beliefs, what would look different, exactly? (Sorry, I know you already answer this indirectly throughout the post, at least partially.)
Do you have any reading to suggest on that topic? I’d be curious to understand that position more :)
Insightful! Thanks for taking the time to write these.
"failing to act in perfect accord with the moral truth does not mean you're not influenced by it at all. Humans fail your conditions 4-7 and yet are occasionally influenced by moral facts in ways that matter."
Agreed, and I didn't mean to argue against that, so thanks for clarifying! Note, however, that the more you expect the moral truth to be fragile/complex, the further from it you should expect agents' actions to be.
"you expect intense selection within civilizations, such that their members behave so as to maximize their own reproductive success."
Hmm… I don't think the "such that..." part logically follows. I don't think this is how selection effects work. All I'm saying is that those who are the most bullish on space colonization will colonize more space.
I’m not sure what to say regarding your last two points. I think I need to think/read more, here. Thanks :)
Very interesting, Wei! Thanks a lot for the comment and the links.
TL;DR of my response: Your argument assumes that the first two conditions I list are met by default, which I think is a strong assumption (Part 1). Assuming that is the case, however, your point suggests there might be a selection effect favoring agents that act in accordance with the moral truth, which might be stronger than the selection effect I depict for values that are more expansion-conducive than the moral truth. This is something I hadn't seriously considered, and it made me update! Nonetheless, for your argument to be valid and strong, the orthogonality thesis has to be almost completely false, and I think we need more solid evidence to challenge that thesis (Part 2).
Part 1: Strong assumption

"This came about because there are facts about what preferences one should have, just like there exist facts about what decision theory one should use or what prior one should have, and species that manage to build intergalactic civilizations (or the equivalent in other universes) tend to discover all of these facts."
My understanding is that this scenario says the seven conditions I listed are met because it is actually trivial for a super-capable intergalactic civilization to meet those (or even required for it to become intergalactic in the first place, as you suggest later).
I think this is plausible for the following conditions:
#3 They find something they recognize as a moral truth.
#4 They (unconditionally) accept it, even if it is highly counterintuitive.
#5 The thing they found is actually the moral truth. No normative mistake.
#6 They succeed at acting in accordance with it. No practical mistake.
#7 They stick to this forever. No value drift.
You might indeed expect that the most powerful civs figure out how to overcome these challenges, and that those who don’t are left behind.[1] This is something I haven’t seriously considered before, so thanks!
However, recall the first two conditions:
#1 There is a moral truth.
#2 It is possible to "find it" and recognize it as such.
How capable a civilization is doesn't matter when it comes to how likely these two are to be met. And while most metaethical debates focus only on #1, saying #1 is true is a much weaker claim than saying #1 and #2 are both true (see, e.g., the naturalism vs. non-naturalism controversy, which I think is only one piece of the puzzle).
Part 2: Challenging the orthogonality thesis
Then, you say that in this scenario you depict:

"There are occasional paperclip maximizers that arise, but they are a relatively minor presence or tend to be taken over by more sophisticated minds."
Maybe, but what I argue is that they are (occasional) "sophisticated minds" with values that are more expansion-conducive than the (potential) moral truth (e.g., because they have simple unconstrained goals such as "let's just maximize for more life" or "for expansion itself"), and that they're the ones who tend to take over.
But then you make this claim, which, if true, seems to sort of debunk my argument:

"you can't become a truly powerful civilization without being able to 'do philosophy' and be generally motivated by the results."
(Given the context in your comment, I assume that by “being able to do philosophy”, you mean “being able to do things like finding the moral truth”.)
But I don't think this claim is true.[1] However, you made me update, and I might update more once I read the posts of yours that you linked! :)
[1] I remain skeptical because this would imply the orthogonality thesis is almost completely false. Assuming there is a moral truth and that it is possible to "find" it and recognize it as such, I tentatively still believe that extremely powerful agents/civs with motivations misaligned with the moral truth are very plausible and not rare. You can at least imagine scenarios where they started out aligned but then value drifted (without that making them significantly less powerful).
Interesting, thanks for sharing your thoughts on the process and stuff! (And happy to see the post published!) :)