I’m Anthony DiGiovanni, a suffering-focused AI safety researcher at the Center on Long-Term Risk. I (occasionally) write about altruism-relevant topics on my Substack. All opinions my own.
Thanks for asking — you can read more about these two sources of s-risk in Section 3.2 of our new intro to s-risks article. (We also discuss “near miss” there, but our current best guess is that such scenarios are significantly less likely than other s-risks of comparable scale.)
I agree with your reasoning here—while I think working on s-risks from AI conflict is a top priority, I wouldn’t give Dawn’s argument for it. This post gives the main arguments for why some “rational” AIs wouldn’t avoid conflicts by default, and some high-level ways we could steer AIs into the subset that would.
I’ve found this super useful over the past several months—thanks!
Given that you can just keep doing better and better essentially indefinitely, and that GPT is not anywhere near the upper limit, talking about the difficulty of the task isn’t super meaningful.
I don’t understand this claim. Why would the difficulty of the task not be super meaningful when training to performance that isn’t near the upper limit?
In “Against neutrality...,” he notes that he’s not arguing for a moral duty to create happy people, and that it’s just good “other things equal.” But, given that the moral question under opportunity costs is what practically matters, what are his thoughts on this view?: “Even if creating happy lives is good in some (say) aesthetic sense, relieving suffering has moral priority when you have to choose between these.” E.g., does he have any sympathy for the intuition that, if you could either press a button that treats someone’s migraine for a day or one that creates a virtual world with happy people, you should press the first one?
(I could try to shorten this if necessary, but worry about the message being lost from editorializing.)
I am (clearly) not Tobias, but I’d expect many people familiar with EA and LW would get something new out of Ch 2, 4, 5, and 7-11. Of these, seems like the latter half of 5, 9, and 11 would be especially novel if you’re already familiar with the basics of s-risks along the lines of the intro resources that CRS and CLR have published. I think the content of 7 and 10 is sufficiently crucial that it’s probably worth reading even if you’ve checked out those older resources, despite some overlap.
Anecdote: My grad school personal statement mentioned “Concrete Problems in AI Safety” and Superintelligence, though at a fairly vague level about the risks of distributional shift or the like. I got into some pretty respectable programs. I wouldn’t take this as strong evidence, of course.
I’m fine with other phrasings, and am also concerned about value lock-in and s-risks, though I think these can be thought of as a class of x-risks.
I’m not keen on classifying s-risks as x-risks because, for better or worse, most people really just seem to mean “extinction or permanent human disempowerment” when they talk about “x-risks.” I worry that a motte-and-bailey can happen here, where (1) people include s-risks within x-risks when trying to get people on board with focusing on x-risks, but then (2) their further discussion of x-risks basically equates them with non-s-x-risks. The fact that the “dictionary definition” of x-risks would include s-risks doesn’t solve this problem.
e.g. 2 minds with equally passionate complete enthusiasm (with no contrary psychological processes or internal currencies to provide reference points) respectively for and against their own experience, or gratitude and anger for their birth (past or future). They can respectively consider a world with and without their existences completely unbearable and beyond compensation. But if we’re in the business of helping others for their own sakes rather than ours, I don’t see the case for excluding either one’s concern from our moral circle.
…
But when I’m in a mindset of trying to do impartial good I don’t see the appeal of ignoring those who would desperately, passionately want to exist, and their gratitude in worlds where they do.
I don’t really see the motivation for this perspective. In what sense, or to whom, is a world without the existence of the very happy/fulfilled/whatever person “completely unbearable”? Who is “desperate” to exist? (Concern for reducing the suffering of beings who actually feel desperation is, clearly, consistent with pure NU, but by hypothesis this is set aside.) Obviously not themselves. They wouldn’t exist in that counterfactual.
To me, the clear case for excluding intrinsic concern for those happy moments is:
“Gratitude” just doesn’t seem like compelling evidence in itself that the grateful individual has been made better off. You have to compare to the counterfactual. In daily cases with existing people, gratitude is relevant as far as the grateful person would have otherwise been dissatisfied with their state of deprivation. But that doesn’t apply to people who wouldn’t feel any deprivation in the counterfactual, because they wouldn’t exist.
I take it that the thrust of your argument is, “Ethics should be about applying the same standards we apply across people as we do for intrapersonal prudence.” I agree. And I also find the arguments for empty individualism convincing. Therefore, I don’t see a reason to trust as ~infallible the judgment of a person at time T that the bundle of experiences of happiness and suffering they underwent in times T-n, …, T-1 was overall worth it. They’re making an “interpersonal” value judgment, which, despite being informed by clear memories of the experiences, still isn’t incorrigible. Their positive evaluation of that bundle can be debunked by, say, this insight from my previous bullet point that the happy moments wouldn’t have felt any deprivation had they not existed.
In any case, I find upon reflection that I don’t endorse tradeoffs of contentment for packages of happiness and suffering for myself. I find I’m generally more satisfied with my life when I don’t have the “fear of missing out” that a symmetric axiology often implies. Quoting myself:
Another takeaway is that the fear of missing out seems kind of silly. I don’t know how common this is, but I’ve sometimes felt a weird sense that I have to make the most of some opportunity to have a lot of fun (or something similar), otherwise I’m failing in some way. This is probably largely attributable to the effect of wanting to justify the “price of admission” (I highly recommend the talk in this link) after the fact. No one wants to feel like a sucker who makes bad decisions, so we try to make something we’ve already invested in worth it, or at least feel worth it. But even for opportunities I don’t pay for, monetarily or otherwise, the pressure to squeeze as much happiness from them as possible can be exhausting. When you no longer consider it rational to do so, this pressure lightens up a bit. You don’t have a duty to be really happy. It’s not as if there’s a great video game scoreboard in the sky that punishes you for squandering a sacred gift.
...Having said that, I do think the “deeper intuition that the existing Ann must in some way come before need-not-ever-exist-at-all Ben” plausibly boils down to some kind of antifrustrationist or tranquilist intuition. Ann comes first because she has actual preferences (/experiences of desire) that get violated when she’s deprived of happiness. Not creating Ben doesn’t violate any preferences of Ben’s.
certainly don’t reflect the kinds of concerns expressed by Setiya that I was responding to in the OP
I agree with you that the attempts to accommodate the procreation asymmetry without lexically disvaluing suffering don’t hold up to scrutiny. Setiya’s critique missed the mark pretty hard, e.g. this part just completely ignores that this view violates transitivity:
But the argument is flawed. Neutrality says that having a child with a good enough life is on a par with staying childless, not that the outcome in which you have a child is equally good regardless of their well-being. Consider a frivolous analogy: being a philosopher is on a par with being a poet—neither is strictly better or worse—but it doesn’t follow that being a philosopher is equally good, regardless of the pay.
appeal to some form of partiality or personal prerogative seems much more appropriate to me than denying the value of the beneficiaries
I don’t think this solves the problem, at least if one has the intuition (as I do) that it’s not the current existence of the people who are extremely harmed to produce happy lives that makes this tradeoff “very repugnant.” It doesn’t seem any more palatable to allow arbitrarily many people in the long-term future (rather than the present) to suffer for the sake of sufficiently many more added happy lives. Even if those lives aren’t just muzak and potatoes, but very blissful. (One might think that is “horribly evil” or “utterly disastrous.” And it isn’t just a theoretical concern, because in practice increasing the extent of space settlement would in expectation enable both many miserable lives and many more blissful lives.)
ETA: Ideally I’d prefer these discussions not involve labels like “evil” at all. Though I sympathize with wanting to treat this with moral seriousness!
I think such views have major problems, but I don’t talk about those problems in the book. (Briefly: If you think that any X outweighs any Y, then you seem forced to believe that any probability of X, no matter how tiny, outweighs any Y. So: you can either prevent a one in a trillion trillion trillion chance of someone with a suffering life coming into existence, or guarantee a trillion lives of bliss. The lexical view says you should do the former. This seems wrong, and I think doesn’t hold up under moral uncertainty, either. There are ways of avoiding the problem, but they run into other issues.)
It really isn’t clear to me that the problem you sketched is so much worse than the problems with total symmetric, average, or critical-level axiology, or the “intuition of neutrality.” In fact this conclusion seems much less bad than the Sadistic Conclusion or variants of that, which affect the latter three. So I find it puzzling how much attention you (and many other EAs writing about population ethics and axiology generally; I don’t mean to pick on you in particular!) devoted to those three views. And I’m not sure why you think this problem is so much worse than the Very Repugnant Conclusion (among other problems with outweighing views), either.
I sympathize with the difficulty of addressing so much content in a popular book. But this is a pretty crucial axiological debate that’s been going on in EA for some time, and it can determine which longtermist interventions someone prioritizes.
The Asymmetry endorses neutrality about bringing into existence lives that have positive wellbeing, and I argue against this view for much of the population ethics chapter, in the sections “The Intuition of Neutrality”, “Clumsy Gods: The Fragility of Identity”, and “Why the Intuition of Neutrality is Wrong”.
You seem to be using a different definition of the Asymmetry than Magnus is, and I’m not sure it’s a much more common one. On Magnus’s definition (which is also used by e.g. Chappell; Holtug, Nils (2004), “Person-affecting Moralities”; and McMahan (1981), “Problems of Population Theory”), bringing into existence lives that have “positive wellbeing” is at best neutral. It could well be negative.
The kind of Asymmetry Magnus is defending here doesn’t imply the intuition of neutrality, and so isn’t vulnerable to your critiques like violating transitivity, or relying on a confused concept of necessarily existing people.
Are you saying that from your and Teo’s POVs, there’s a way to ‘improve a mental state’ that doesn’t amount to decreasing suffering (/preventing it)?
No, that’s precisely what I’m denying. So, the reason I mentioned that “arbitrary” view was that I thought Jack might be conflating my/Teo’s view with one that (1) agrees that happiness intrinsically improves a mental state, but (2) denies that improving a mental state in this particular way is good (while improving a mental state via suffering-reduction is good).
Such an understanding seems plausible in a self-intimating way when one valence state transitions to the next, insofar as we concede that there are states of more or less pleasure, outside of negatively valenced states.
It’s prima facie plausible that there’s an improvement, sure, but upon reflection I don’t think my experience that happiness has varying intensities implies that moving from contentment to more intense happiness is an improvement. Analogously, you can increase the complexity and artistic sophistication of some painting, say, but if no one ever observes it (which I’m comparing to no one suffering from the lack of more intense happiness), there’s no “improvement” to the painting.
It seems that one could do this all the while maintaining that such improvements are never capable of outweighing the mitigation of problematic, suffering states.
You could, yeah, but I think “improvement” has such a strong connotation to most people that something of intrinsic value has been added. So I’d worry that using that language would be confusing, especially to welfarist consequentialists who think (as seems really plausible to me) that you should do an act to the extent that it improves the state of the world.
Some things I liked about What We Owe the Future, despite my disagreements with the treatment of value asymmetries:
The thought experiment of imagining that you live one big super-life composed of all sentient beings’ experiences is cool, as a way of probing moral intuitions. (I’d say this kind of thought experiment is the core of ethics.)
It seems better than e.g. Rawls’ veil of ignorance because living all lives (1) makes it more salient that the possibly rare extreme experiences of some lives still exist even if you’re (un)lucky enough not to go through them, and (2) avoids favoring average-utilitarian intuitions.
Although the devil is very much in the details of what measure of (dis)value the total view totals up, the critiques of average, critical level, and symmetric person-affecting views are spot-on.
There’s some good discussion of avoiding lock-in of bad (/not-reflected-upon) values as a priority that most longtermists can get behind.
I was already inclined to think dominant values can be very contingent on factors that don’t seem ethically relevant, like differences in reproduction rates (biological or otherwise) or flukes of power imbalances. So I didn’t update much from reading about this. But I have the impression that many longtermists are a bit too complacent about future people converging to the values we’d endorse with proper reflection (strangely, even when they’re less sympathetic to moral realism than I am). And the vignettes about e.g. Benjamin Lay were pretty inspiring.
Relatedly, it’s great that premature space settlement is acknowledged as a source of lock-in / reduction of option value. Lots of discourse on longtermism seems to gloss over this.
I think one crux here is that Teo and I would say, calling an increase in the intensity of a happy experience “improving one’s mental state” is a substantive philosophical claim. The kind of view we’re defending does not say something like, “Improvements of one’s mental state are only good if they relieve suffering.” I would agree that that sounds kind of arbitrary.
The more defensible alternative is that replacing contentment (or absence of any experience) with increasingly intense happiness / meaning / love is not itself an improvement in mental state. And this follows from intuitions like “If a mind doesn’t experience a need for change (and won’t do so in the future), what is there to improve?”
Is it thought experiments such as the ones Magnus has put forward? I think these argue that alleviating suffering is more pressing than creating happiness, but I don’t think these argue that creating happiness isn’t good.
I think they do argue that creating happiness isn’t intrinsically good, because you can always construct a version of the Very Repugnant Conclusion that applies to a view that says suffering is weighed some finite X times more than happiness, and I find those versions almost as repugnant. E.g. suppose that on classical utilitarianism we prefer to create 100 purely miserable lives plus some large N micro-pleasure lives over creating 10 purely blissful lives. On this new view, we’d prefer to create 100 purely miserable lives plus X*N micro-pleasure lives over the 10 purely blissful lives. Another variant you could try is a symmetric lexical view where only sufficiently blissful experiences are allowed to outweigh misery. But while some people find that dissolves the repugnance of the VRC, I can’t say the same.
Increasing the X, or introducing lexicalities, to try to escape the VRC just misses the point, I think. The problem is that (even super-awesome/profound) happiness is treated as intrinsically commensurable with miserable experiences, as if giving someone else happiness in itself solves the miserable person’s urgent problem. That’s just fundamentally opposed to what I find morally compelling.
(I like the monk example given in the other response to your question, anywho. I’ve written about why I find strong SFE compelling elsewhere, like here and here.)
You could try to use your Pareto improvement argument here, i.e. that it’s better if parents still have a preference for their child not to have been killed, but also not to feel any sort of pain related to it.
Yeah, that is indeed my response; I have basically no sympathy for the perspective that considers the pain intrinsically necessary in this scenario, or any scenario. This view seems to clearly conflate intrinsic with instrumental value. “Disrespect” and “grotesqueness” are just not things that seem intrinsically important to me, at all.
having a preference that the child wasn’t killed, but also not feeling any sort of hedonic pain about it...is this contradictory?
Depends how you define a preference, I guess, but the point of the thought experiment is to suspend your disbelief about the flow-through effects here. Just imagine that literally nothing changes about the world other than that the suffering is relieved. This seems so obviously better than the default that I’m at a loss for a further response.
This only applies to flavors of the Asymmetry that treat happiness as intrinsically valuable, such that you would pay to add happiness to a “neutral” life (without relieving any suffering by doing so). If the reason you don’t consider it good to create new lives with more happiness than suffering is that you don’t think happiness is intrinsically valuable, at least not at the price of increasing suffering, then you can’t get Dutch booked this way. See this comment.
My understanding is that:
Spite (as a preference we might want to reduce in AIs) has just been relatively well-studied compared to other malevolent preferences. If this subfield of AI safety were more mature there might be less emphasis on spite in particular.
(Less confident, haven’t thought that much about this:) It seems conceptually more straightforward what sorts of training environments are conducive to spite, compared to fanaticism (or fussiness or little-to-lose, for that matter).