> Of course, in this scenario, it would still be nice if AIs cared about exactly what we care about; but even if they don't, we aren't necessarily made worse off as a result of building them. If they share our preferences, that would simply be a nice bonus for us. The future could still be bright for humans even if the universe is eventually filled with entities whose preferences we do not ultimately share.
From a scope sensitive (linear returns) longtermist perspective, we’re potentially much worse off.
If we built aligned AIs, we would acquire 100% of the value (from humanity's perspective). If we built misaligned AIs that end up keeping humans alive and happy but don't directly care about anything we value, we might directly acquire vastly less than this, perhaps a millionth of the scope sensitive value. (We might recover some value, e.g. 10%, via acausal trade; I'm not counting this in the direct value.)
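To make the arithmetic explicit, here is a minimal sketch; the fractions are the hypothetical figures from the paragraph above, not estimates from any model:

```python
# Hypothetical figures from the paragraph above, treated as assumptions.
aligned_value = 1.0        # aligned AIs: 100% of the value, from humanity's perspective
misaligned_direct = 1e-6   # misaligned AIs: ~1 millionth of the scope sensitive value
acausal_recovery = 0.10    # value possibly recovered via acausal trade (not counted as "direct")

print(f"direct shortfall: {aligned_value / misaligned_direct:,.0f}x")                         # 1,000,000x
print(f"with trade recovery: {aligned_value / (misaligned_direct + acausal_recovery):.0f}x")  # ~10x
```

Even granting the speculative 10% recovery, the aligned outcome still comes out roughly an order of magnitude ahead on this accounting.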
Perhaps you think this view is worth dismissing because either:
1. You think humanity wouldn't do things which are better than what AIs would do, so the difference is unimportant. (E.g., because humanity is 99.9% selfish. I'm skeptical of this particular argument; I think humanity is more like 50% selfish, and the naive billionaire extrapolation suggests more like 90% selfish.)
2. You think the scope sensitive (linear returns) perspective isn't worth putting a huge amount of weight on.
To be clear, it's important not to equivocate between:
1. AI takeover might be violent and clearly horrible for existing people.
2. AI takeover might result in resources being allocated in massively suboptimal ways from the perspective of scope sensitive humans.
I think both are probably true, but these are separate claims.
> Perhaps you think this view is worth dismissing because either:
>
> You think humanity wouldn't do things which are better than what AIs would do, so the difference is unimportant. (E.g., because humanity is 99.9% selfish. I'm skeptical; I think humanity is more like 50% selfish, and the naive billionaire extrapolation suggests more like 90% selfish.)
From an impartial (non-selfish) perspective, yes, I'm not particularly attached to human economic consumption relative to AI economic consumption. In general, my utilitarian intuitions are such that I don't have a strong preference for humans over most "default" unaligned AIs, except insofar as this conflicts with my preferences for existing people (including myself, my family, friends, etc.).
I'd additionally point out that AIs could be altruistic too. Indeed, it seems plausible to me that they'll be even more altruistic than humans, since the AI training process is likely to deliberately select for altruism, whereas human evolution directly selected for selfishness (at least at the gene level, if not the personal level too).
This is a topic we’ve touched on several times before, and I agree you’re conveying my views — and our disagreement — relatively accurately overall.
> You think the scope sensitive (linear returns) perspective isn't worth putting a huge amount of weight on.
I also think this, yes. For example, we could consider the following bets:
1. A 99% chance of 1% control over the universe, and a 1% chance of 0% control.
2. A 10% chance of 90% control over the universe, and a 90% chance of 0% control.
According to a scope sensitive calculation, the second gamble is better than the first (an expected 9% of the universe versus about 1%). Yet, from a personal perspective, I'd prefer (1) under a wide variety of assumptions.
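As a minimal sketch of why, assuming (purely for illustration) a log-style utility for the "personal" perspective, which the comment above doesn't actually specify:

```python
import math

# The two bets: (probability of winning, fraction of the universe controlled if you win).
# Losing yields ~0% control in both cases.
bets = {"bet 1": (0.99, 0.01),   # 99% chance of 1% control
        "bet 2": (0.10, 0.90)}   # 10% chance of 90% control

def linear_ev(p, share):
    """Scope sensitive (linear returns): value scales directly with share."""
    return p * share

def log_ev(p, share, floor=1e-9):
    """One illustrative risk-averse utility: log of share, with a tiny floor so
    that ending up with ~nothing is very bad but not minus infinity."""
    return p * math.log(share + floor) + (1 - p) * math.log(floor)

for name, (p, share) in bets.items():
    print(f"{name}: linear EV = {linear_ev(p, share):.4f}, log EV = {log_ev(p, share):.2f}")

# linear EV: bet 1 ~ 0.0099, bet 2 = 0.0900 -> the scope sensitive view prefers bet 2
# log EV:    bet 1 ~ -4.77,  bet 2 ~ -18.66 -> the risk-averse view strongly prefers bet 1
```

Many other concave utilities give the same ordering; it's the linearity in share of the universe that makes (2) come out ahead.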