Rohin Shah comments on Analyzing the moral value of unaligned AIs

Rohin Shah 30 Apr 2024 19:29 UTC
2 points
0 ∶ 0
I was arguing that trying to preserve the present generation of humans looks good according to (2), not (1).
I was always thinking about (1), since that seems like the relevant thing. When I agreed with you that generational value drift seems worrying, that’s because it seems bad by (1). I did not mean to imply that I should act to maximize (2). I agree that if you want to act to maximize (2) then you should probably focus on preserving the current generation.
In my post, I fairly explicitly argued that the rough level of utilitarian values exhibited by humans is likely not very contingent, in the sense of being unusually high compared to other possibilities—and this was a crucial element of my thesis. This idea was particularly important for the section discussing whether unaligned AIs will be more or less utilitarian than humans.
Fwiw, I reread the post again and still failed to find this idea in it, and am still pretty confused at what argument you are trying to make.
At this point I think we’re clearly failing to communicate with each other, so I’m probably going to bow out, sorry.
- Matthew_Barnett 30 Apr 2024 19:41 UTC
  2 points
  0 ∶ 0
  Fwiw, I reread the post again and still failed to find this idea in it
  I’m baffled by your statement here. What did you think I was arguing when I discussed whether “aligned AIs are more likely to have a preference for creating new conscious entities, furthering utilitarian objectives”? The conclusion of that section was that aligned AIs are plausibly not more likely to have such a preference, and therefore, human utilitarian preferences here are not “unusually high compared to other possibilities” (the relevant alternative possibility here being unaligned AI).
  This was a central part of my post that I discussed at length. The idea that unaligned AIs might be similarly utilitarian or even more so, compared to humans, was a crucial part of my argument. If indeed unaligned AIs are very likely to be less utilitarian than humans, then much of my argument in the first section collapses, which I explicitly acknowledged.
  I consider your statement here to be a valuable data point about how clear my writing was and how likely I am to get my ideas across to others who read the post. That said, I believe I discussed this point more-or-less thoroughly.
  ETA: Claude 3’s summary of this argument in my post:
  The post argued that the level of utilitarian values exhibited by humans is likely not unusually high compared to other possibilities, such as those of unaligned AIs. This argument was made in the context of discussing whether aligned AIs are more likely to have a preference for creating new conscious entities, thereby furthering utilitarian objectives.
  The author presented several points to support this argument:
  Only a small fraction of humans are total utilitarians, and most humans do not regularly express strong preferences for adding new conscious entities to the universe.
  Some human moral intuitions directly conflict with utilitarian recommendations, such as the preference for habitat preservation over intervention to improve wild animal welfare.
  Unaligned AI preferences are unlikely to be completely alien or random compared to human preferences if the AIs are trained on human data. By sharing moral concepts with humans, unaligned AIs could potentially be more utilitarian than humans, given that human moral preferences are a mix of utilitarian and anti-utilitarian intuitions.
  Even in an aligned AI scenario, the consciousness of AIs will likely be determined mainly by economic efficiency factors during production, rather than by moral considerations.
  The author concluded that these points undermine the idea that unaligned AI moral preferences will be clearly less utilitarian than the moral preferences of most humans, which are already not very utilitarian. This suggests that the level of utilitarian values exhibited by humans is likely not unusually high compared to other possibilities, such as those of unaligned AIs.
  - Rohin Shah 30 Apr 2024 20:02 UTC
    2 points
    0 ∶ 0
    I agree it’s clear that you claim that unaligned AIs are plausibly about as utilitarian as humans, maybe more so.
    What I didn’t find was discussion of how contingent utilitarianism is in humans.
    Though actually rereading your comment (which I should have done in addition to reading the post) I realize I completely misunderstood what you meant by “contingent”, which explains why I didn’t find it in the post (I thought of it as meaning “historically contingent”). Sorry for the misunderstanding.
    Let me backtrack like 5 comments and try again.