Ryan Greenblatt comments on Matthew_Barnett’s Quick takes

Ryan Greenblatt 5 May 2024 17:20 UTC
4 points
1 ∶ 0
In other words, agents optimizing for their own happiness, or the happiness of those they care about, seem likely to be the primary force behind the creation of hedonium-like structures. They may not frame it in utilitarian terms, but they will still be striving to maximize happiness and well-being for themselves and others they care about regardless. And it seems natural to assume that, with advanced technology, they would optimize pretty hard for their own happiness and well-being, just as a utilitarian might optimize hard for happiness when creating hedonium.
Suppose that a single misaligned AI takes control and it happens to care somewhat about its own happiness while not having any more “altruistic” tendencies that I would care about or you would care about. (I think it is less likely than not that misaligned AIs which seize control would care substantially about their own happiness, but let’s suppose this for now.) (I’m saying “single misaligned AI” for simplicity; I get that a messier coalition might be in control.) It now has access to vast amounts of computation after sending out huge numbers of probes to take control over all available energy. This is enough computation to run absolutely absurd amounts of stuff.
What are you imagining it spends these resources on which is competitive with optimized goodness? Running >10^50 copies of itself which are heavily optimized for being as happy as possible while spending?
If a small number of agents have a vast amount of power, and these agents don’t (eventually, possibly after a large amount of thinking) want to do something which is de facto like the values I end up caring about upon reflection (which is probably, though not certainly, vaguely like utilitarianism in some sense), then from my perspective it seems very likely that the resources will be squandered.
If you’re imagining something like:
1. It thinks carefully about what would make “it” happy.
2. It realizes it cares about having as many diverse good experience moments as possible in a non-indexical way.
3. It realizes that heavy self-modification would result in these experience moments being better and more efficient, so it creates new versions of “itself” which are radically different and produce more efficiently good experiences.
4. It realizes it doesn’t care much about the notion of “itself” here and mostly just focuses on good experiences.
5. It runs vast numbers of such copies with diverse experiences.
Then this is just something like utilitarianism by another name, reached via a different line of reasoning.
I thought your view was that step (2) in this process won’t go like this. E.g., currently self-ish entities will retain indexical preferences. If so, then I don’t see where the goodness can plausibly come from.
The fact that our current world isn’t well described by the idea that what matters most is the number of explicit utilitarians, strengthens my point here.
When I look at very rich people (people with >$1 billion), it seems like the dominant way they make the world better via spending money (not via making money!) is via thoughtful altruistic giving, not via consumption.
Perhaps your view is that with the potential for digital minds this situation will change?
(Also, it seems very plausible to me that the dominant effect on current welfare is driven mostly by the effect on factory farming and other animal welfare.)
I expect this trend to further increase as people get much, much wealthier and some fraction (probably most) of them get much, much smarter and wiser with intelligence augmentation.