> For an intervention to be a longtermist priority, there needs to be some kind of concrete story for how it improves the long-term future.
I disagree with this. With existential risk from unaligned AI, I don’t think anyone has ever told a very clear story about how AI will actually get misaligned, get loose, and kill everyone. People have speculated about components of the story, but generally not in a super concrete way, and it isn’t clear how standard AI safety research would address a very specific disaster scenario. I don’t think this is a problem: we shouldn’t expect to know all the details of how things go wrong in advance, and it is worthwhile to do a lot of preparatory research that might be helpful so that we’re not fumbling through basic things during a critical period. I think the same applies to digital minds.
Your points here do not engage with the argument, made by @Zach Stein-Perlman early on in the week, that we can just punt solving AI welfare to the future (i.e., to the long reflection / to once we have aligned superintelligent advisors), and in the meantime continue focusing our resources on AI safety (i.e., on raising the probability that we make it to a long reflection).
I think this viewpoint is overly optimistic about how likely we are to avoid lock-in, and about the relevance of superintelligent advisors. I discuss some of the issues around lock-in in a contribution to the debate week. In brief, I think it is possible that digital minds will be sufficiently integrated within the next few decades that they will hold power within social relationships that would be extremely difficult to disentangle. I also think that AGI may be useful for drawing inferences from our assumptions, but won’t be particularly helpful at setting the right assumptions.
> With existential risk from unaligned AI, I don’t think anyone has ever told a very clear story about how AI will actually get misaligned, get loose, and kill everyone.
This should be evidence against AI x-risk![1] Even in the atmospheric-ignition case before the Trinity test, they had more concrete models to work with. If we can’t build a concrete model here, that implies we don’t have a concrete or convincing case for why it should be prioritised at all, imo. It’s similar to the point in my footnotes that you need to argue for both p and p->q, not just the latter. This is what I would expect to see if the case for p were unconvincing or incorrect.
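To spell out the structure of that point, here is one way to formalise it (the readings of p and q below are my illustrative glosses, not the footnote’s wording):

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Illustrative reading of the "p and p -> q" point (the glosses below are assumptions, not quotes):
%   p : an AI system in fact becomes misaligned and gets loose
%   q : an existential catastrophe follows
\begin{align*}
  &\text{Premise 1: } p \\
  &\text{Premise 2: } p \rightarrow q \\
  &\text{Conclusion: } q
\end{align*}
% A detailed case for the conditional (Premise 2) alone does not support q;
% the argument also has to establish p.
\end{document}
```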
> I don’t think this is a problem: we shouldn’t expect to know all the details of how things go wrong in advance
Yeah, I agree with this. But uncertainty and cluelessness about the future should decrease one’s confidence that one is working on the most important thing in the history of humanity, one would think.
> and it is worthwhile to do a lot of preparatory research that might be helpful so that we’re not fumbling through basic things during a critical period. I think the same applies to digital minds.
I’m all in favour of research, but how much should that research get funded? Can it be justified above other potential uses of money and resources more generally? Should it be an EA priority as defined by the AWDW framing? These questions were (almost) entirely unargued for.
> For an intervention to be a longtermist priority, there needs to be some kind of concrete story for how it improves the long-term future.
> I disagree with this. With existential risk from unaligned AI, I don’t think anyone has ever told a very clear story about how AI will actually get misaligned, get loose, and kill everyone.
When I read the passage you quoted, I thought of e.g. Critch’s description of RAAPs (robust agent-agnostic processes) and Christiano’s “What failure looks like”, both of which seem pretty detailed to me without necessarily fitting the “AI gets misaligned, gets loose, and kills everyone” meme. Both Critch and Christiano seem to me to be explicitly pushing back against considering only that meme, and Critch in particular thinks work in this area is ~neglected (as of 2021; I haven’t kept up with goings-on since). I suppose Gwern’s writeup comes closest to your description, and I can’t imagine it being more concrete; curious to hear if you have a different reaction.
[1] Not dispositive evidence perhaps, but a consideration.