Researcher at the Center on Long-Term Risk. I (occasionally) write about altruism-relevant topics on my Substack. All opinions my own.
Anthony DiGiovanni
[Question] Neartermist crucial considerations?
(This post was coauthored by Jesse Clifton — crossposting from LW doesn’t seem to show this, unfortunately.)
Winning isn’t enough
As nicely discussed in this comment, the key ideas of UDT and LDT seem to have been predated by, respectively, “resolute choice” and Spohn’s variant of CDT. (It’s not entirely clear to me how UDT or LDT are formally specified, though, and in my experience people seem to equivocate between different senses of “UDT”.)
Updateless decision theory and logical decision theory
It seems to me that you need to weight the probability functions in your set according to some intuitive measure of your plausibility, according to your own priors.
The concern motivating the use of imprecise probabilities is that you don’t always have a unique prior you’re justified in using to compare the plausibility of these distributions. In some cases you’ll find that any choice of unique prior, or unique higher-order distribution for aggregating priors, involves an arbitrary choice. (E.g., arbitrary weights assigned to conflicting intuitions about plausibility.)
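Here is a toy illustration of that worry. Every number is made up. It assumes two hypothetical candidate priors `p1` and `p2` over three outcomes, and two hypothetical actions `A` and `B`. If you collapse the set into one distribution by mixing the priors with weight `w` on `p1`, which action comes out ahead depends entirely on `w`. So an arbitrary choice of weight fixes the verdict.

```python
# Toy example: all priors, payoffs, and weights below are hypothetical.
# Outcomes are indexed 0 = good, 1 = neutral, 2 = bad.
p1 = [0.6, 0.3, 0.1]   # an "optimistic" candidate prior
p2 = [0.2, 0.3, 0.5]   # a "pessimistic" candidate prior

payoff_A = [10.0, 0.0, -10.0]  # high-variance action
payoff_B = [2.0, 1.0, 0.0]     # low-variance action


def expected_value(prior, payoff):
    """Expected payoff of an action under a single precise prior."""
    return sum(p * x for p, x in zip(prior, payoff))


def mixed_ev(w, payoff):
    """EV under the mixture w * p1 + (1 - w) * p2 (EV is linear in the prior)."""
    return w * expected_value(p1, payoff) + (1 - w) * expected_value(p2, payoff)


def preferred(w):
    """Which action a precise EV maximizer picks, given mixing weight w on p1."""
    return "A" if mixed_ev(w, payoff_A) > mixed_ev(w, payoff_B) else "B"


# Under the set {p1, p2} itself, each action has a range of EVs, not one number.
ev_range_A = (min(expected_value(p, payoff_A) for p in (p1, p2)),
              max(expected_value(p, payoff_A) for p in (p1, p2)))
ev_range_B = (min(expected_value(p, payoff_B) for p in (p1, p2)),
              max(expected_value(p, payoff_B) for p in (p1, p2)))

for w in (0.4, 0.6):
    print(w, preferred(w))
print("A:", ev_range_A, "B:", ev_range_B)
```

With these numbers, `w = 0.4` favors B and `w = 0.6` favors A, with the crossover near `w ≈ 0.51`. Across the set itself, A's EV spans roughly [-3, 5] and B's spans roughly [0.7, 1.5]. Neither action is better under every candidate prior.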
[Question] What are your cruxes for imprecise probabilities / decision rules?
I don’t think you need to be ambiguity- or risk-averse to be worried about the robustness of long-term causes. You could think that (1) the long term is extremely complex and (2) any paths to impact on such a complex system that humans can currently conceive of will be too brittle to model errors.
It’s becoming increasingly apparent to me how strong an objection to longtermist interventions this comment is. I’d be very keen to see more engagement with this model.
My own current take: I hold out some hope that our ability to forecast long-term effects, at least under some contingencies within our lifetimes, will be not-terrible enough. And I’m more sympathetic to straightforward EV maximization than you are. But the probability of systematically having a positive long-term impact by choosing any given A over B seems much smaller than longtermists act as if it is. In particular, it does seem to be in Pascal’s mugging territory.
My understanding is that:
Spite (as a preference we might want to reduce in AIs) has just been relatively well-studied compared to other malevolent preferences. If this subfield of AI safety were more mature there might be less emphasis on spite in particular.
(Less confident, haven’t thought that much about this:) It seems conceptually more straightforward what sorts of training environments are conducive to spite, compared to fanaticism (or fussiness or little-to-lose, for that matter).
Thanks for asking — you can read more about these two sources of s-risk in Section 3.2 of our new intro to s-risks article. (We also discuss “near miss” there, but our current best guess is that such scenarios are significantly less likely than other s-risks of comparable scale.)
I agree with your reasoning here—while I think working on s-risks from AI conflict is a top priority, I wouldn’t give Dawn’s argument for it. This post gives the main arguments for why some “rational” AIs wouldn’t avoid conflicts by default, and some high-level ways we could steer AIs into the subset that would.
I’ve found this super useful over the past several months—thanks!
Given that you can just keep doing better and better essentially indefinitely, and that GPT is not anywhere near the upper limit, talking about the difficulty of the task isn’t super meaningful.
I don’t understand this claim. Why would the difficulty of the task not be super meaningful when training to performance that isn’t near the upper limit?
In “Against neutrality...,” he notes that he’s not arguing for a moral duty to create happy people, only that doing so is good “other things equal.” But given that the moral question under opportunity costs is what matters in practice, what are his thoughts on this view: “Even if creating happy lives is good in some (say) aesthetic sense, relieving suffering has moral priority when you have to choose between these”? For example, does he have any sympathy for the intuition that, if you could press either a button that treats someone’s migraine for a day or one that creates a virtual world with happy people, you should press the first one?
(I could try to shorten this if necessary, but worry about the message being lost from editorializing.)
I am (clearly) not Tobias, but I’d expect many people familiar with EA and LW would get something new out of Ch. 2, 4, 5, and 7–11. Of these, it seems like the latter half of 5, 9, and 11 would be especially novel if you’re already familiar with the basics of s-risks along the lines of the intro resources that CRS and CLR have published. I think the content of 7 and 10 is crucial enough that it’s probably worth reading even if you’ve checked out those older resources, despite some overlap.
Anecdote: My grad school personal statement mentioned “Concrete Problems in AI Safety” and Superintelligence, though at a fairly vague level about the risks of distributional shift or the like. I got into some pretty respectable programs. I wouldn’t take this as strong evidence, of course.
I’m fine with other phrasings and am also concerned about value lock-in and s-risks though I think these can be thought of as a class of x-risks
I’m not keen on classifying s-risks as x-risks because, for better or worse, most people really just seem to mean “extinction or permanent human disempowerment” when they talk about “x-risks.” I worry that a motte-and-bailey can happen here, where (1) people include s-risks within x-risks when trying to get people on board with focusing on x-risks, but then (2) their further discussion of x-risks basically equates them with non-s-x-risks. The fact that the “dictionary definition” of x-risks would include s-risks doesn’t solve this problem.
e.g. 2 minds with equally passionate complete enthusiasm (with no contrary psychological processes or internal currencies to provide reference points) respectively for and against their own experience, or gratitude and anger for their birth (past or future). They can respectively consider a world with and without their existences completely unbearable and beyond compensation. But if we’re in the business of helping others for their own sakes rather than ours, I don’t see the case for excluding either one’s concern from our moral circle.
…
But when I’m in a mindset of trying to do impartial good I don’t see the appeal of ignoring those who would desperately, passionately want to exist, and their gratitude in worlds where they do.
I don’t really see the motivation for this perspective. In what sense, or to whom, is a world without the existence of the very happy/fulfilled/whatever person “completely unbearable”? Who is “desperate” to exist? (Concern for reducing the suffering of beings who actually feel desperation is, clearly, consistent with pure NU, but by hypothesis this is set aside.) Obviously not themselves. They wouldn’t exist in that counterfactual.
To me, the clear case for excluding intrinsic concern for those happy moments is:
“Gratitude” just doesn’t seem like compelling evidence in itself that the grateful individual has been made better off. You have to compare to the counterfactual. In everyday cases with existing people, gratitude is relevant insofar as the grateful person would otherwise have been dissatisfied with their state of deprivation. But that doesn’t apply to people who wouldn’t feel any deprivation in the counterfactual, because they wouldn’t exist.
I take it that the thrust of your argument is, “Ethics should apply the same standards across people that we apply in intrapersonal prudence.” I agree. And I also find the arguments for empty individualism convincing. Therefore, I don’t see a reason to trust as ~infallible the judgment of a person at time T that the bundle of experiences of happiness and suffering they underwent in times T-n, …, T-1 was overall worth it. They’re making an “interpersonal” value judgment, which, despite being informed by clear memories of the experiences, still isn’t incorrigible. Their positive evaluation of that bundle can be debunked by, say, the insight from my previous bullet point: the happy moments wouldn’t have felt any deprivation had they not existed.
In any case, I find upon reflection that I don’t endorse tradeoffs of contentment for packages of happiness and suffering for myself. I find I’m generally more satisfied with my life when I don’t have the “fear of missing out” that a symmetric axiology often implies. Quoting myself:
Another takeaway is that the fear of missing out seems kind of silly. I don’t know how common this is, but I’ve sometimes felt a weird sense that I have to make the most of some opportunity to have a lot of fun (or something similar), otherwise I’m failing in some way. This is probably largely attributable to the effect of wanting to justify the “price of admission” (I highly recommend the talk in this link) after the fact. No one wants to feel like a sucker who makes bad decisions, so we try to make something we’ve already invested in worth it, or at least feel worth it. But even for opportunities I don’t pay for, monetarily or otherwise, the pressure to squeeze as much happiness from them as possible can be exhausting. When you no longer consider it rational to do so, this pressure lightens up a bit. You don’t have a duty to be really happy. It’s not as if there’s a great video game scoreboard in the sky that punishes you for squandering a sacred gift.
If I understand correctly, you’re arguing that we either need to:
Put precise estimates on the consequences of what we do for net welfare across the cosmos, and maximize EV w.r.t. these estimates, or
Go with our gut … which is just implicitly putting precise estimates on the consequences of what we do for net welfare across the cosmos, and maximizing EV w.r.t. these estimates.
I think this is a false dichotomy,[1] even for those who are very confident in impartial consequentialism and risk-neutrality (as I am!). If (as suggested by titotal’s comment) you worry that precise estimates of net welfare conditional on different actions are themselves vibes-based, you have option 3: Suspend judgment on the consequences of what we do for net welfare across the cosmos, and instead make decisions for reasons other than “my [explicit or implicit] estimate of the effects of my action on net welfare says to do X.” (Coherence theorems don’t rule this out.)
What might those other reasons be? A big one is moral uncertainty: If you truly think impartial consequentialism doesn’t give you compelling reasons either way, because our estimates of net welfare are hopelessly arbitrary, it seems better to follow the verdicts of other moral views you put some weight on. Another alternative is to reflect more on what your reasons for action are exactly, if not “maximize EV w.r.t. vibes-based estimates.” You can ask yourself, what does it mean to make the world a better place impartially, under deep uncertainty? If you’ve only looked at altruistic prioritization from the perspective of options 1 or 2, and didn’t realize 3 was on the table, I find it pretty plausible that (as a kind of bedrock meta-normative principle) you ought to clarify the implications of option 3. Maybe you can find non-vibes-based decision procedures for impartial consequentialists. ETA: Ch. 5 of Bradley (2012) is an example of this kind of research, not to say I necessarily endorse his conclusions.
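One concrete candidate for such a procedure, offered only as an illustration and not as the rule endorsed above, is the maximality rule from the imprecise-probability literature. Under it, an option is permissible if no alternative has strictly higher expected value under every distribution in the set. When the set is wide, several options can survive, and the choice among them is left to other reasons (e.g., other moral views you put weight on). The priors, actions, and payoffs below are hypothetical.

```python
# Maximality under a set of priors; all numbers are hypothetical.
# Outcomes are indexed 0 = good, 1 = neutral, 2 = bad.
priors = [
    [0.6, 0.3, 0.1],
    [0.2, 0.3, 0.5],
]

actions = {
    "A": [10.0, 0.0, -10.0],
    "B": [2.0, 1.0, 0.0],
    "C": [1.0, 0.5, 0.0],
}


def expected_value(prior, payoff):
    """Expected payoff of an action under a single precise prior."""
    return sum(p * x for p, x in zip(prior, payoff))


def dominates(y, x, priors):
    """True if payoff y has strictly higher EV than payoff x under every prior."""
    return all(expected_value(p, y) > expected_value(p, x) for p in priors)


def permissible(actions, priors):
    """Options that no other option beats under every prior in the set."""
    return sorted(
        name
        for name, payoff in actions.items()
        if not any(
            dominates(other, payoff, priors)
            for other_name, other in actions.items()
            if other_name != name
        )
    )


print(permissible(actions, priors))
```

Here C is ruled out because B beats it under both priors. A and B each come out ahead under one prior, so maximality treats them as incomparable rather than forcing a ranking between them.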
(Just to be clear, I totally agree with your claim that we shouldn’t dismiss shrimp welfare — I don’t think we’re clueless about that, though the tradeoffs with other animal causes might well be difficult.)
This is also my reply to Michael’s comments here and here.