> It is not good enough to simply say that an issue might have a large scale impact and therefore think it should be an EA priority [...]
I think that this is wrong. The fact that something might have a huge scale and we might be able to do something about it is enough for it to be taken seriously and provides prima facie evidence that it should be a priority. I think it is vastly preferable to preempt problems before they occur rather than try to fix them once they have. For one, AI welfare is a very complicated topic that will take years or decades to sort out. AI persons (or things that look like AI persons) could easily be here in the next decade. If we don’t start thinking about it soon, then we may be years behind when it happens.
I feel like you are talking past the critique. For an intervention to be a longtermist priority, there needs to be some kind of concrete story for how it improves the long-term future. Sure, AI welfare may be a large-scale problem which takes decades to sort out (if tackled by unaided humans), but that alone does not mean it should be worked on presently. Your points here do not engage with the argument, made by @Zach Stein-Perlman early on in the week, that we can just punt solving AI welfare to the future (i.e., to the long reflection / to once we have aligned superintelligent advisors), and in the meantime continue focusing our resources on AI safety (i.e., on raising the probability that we make it to a long reflection).
(There is an argument going in the opposite direction that a long reflection might not happen following alignment success, and so doing AI welfare work now might indeed make a difference to what gets locked in for the long term. I am somewhat sympathetic to this argument, as I wrote here, but I still don’t think it delivers a knockdown case for making AI welfare work a priority.)
Likewise, for an intervention to be a neartermist priority, there has to be some kind of quantitative estimate demonstrating that it is competitive with the current neartermist priorities (or will soon be, if nothing is done) in terms of suffering prevented per dollar spent, or similar. Factory farming seems like the obvious thing to compare AI welfare against. (Note: I’m not sure whether you are trying to argue that AI welfare should be both a neartermist and a longtermist priority, as some have.) I’ve been surprised that nobody has tried coming up with such an estimate this week, however rough.
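To gesture at what I mean, here is a minimal sketch of such a comparison, with every number a placeholder I have made up purely for illustration (the variable names and values are mine, not anyone’s actual estimates):

```python
# Extremely rough, illustrative BOTEC comparing AI welfare work against a
# factory-farming benchmark. Every number below is a made-up placeholder,
# not an estimate; the point is only to show which cruxes such an estimate
# forces into the open.

# Hypothetical benchmark: welfare-years improved per dollar by a strong
# farmed-animal intervention (placeholder).
farmed_welfare_years_per_dollar = 10.0

# Hypothetical AI-welfare inputs (all placeholders):
p_digital_minds_at_scale = 0.05       # chance morally relevant digital minds exist at scale in the relevant window
digital_mind_years_at_stake = 1e12    # mind-years affected if they do
fraction_improved_by_research = 1e-6  # share of that welfare improved by marginal research done now
research_spend_dollars = 1e8          # spending required for that improvement
moral_weight_ratio = 1.0              # value of a digital-mind welfare-year relative to a farmed-animal one

ai_welfare_years_per_dollar = (
    p_digital_minds_at_scale
    * digital_mind_years_at_stake
    * fraction_improved_by_research
    * moral_weight_ratio
    / research_spend_dollars
)

print(f"Factory farming benchmark: {farmed_welfare_years_per_dollar:.3g} welfare-years per dollar")
print(f"AI welfare (placeholder inputs): {ai_welfare_years_per_dollar:.3g} welfare-years per dollar")
```

Whatever inputs one prefers, writing the comparison down like this at least makes the cruxes explicit: the probability that morally relevant digital minds arrive at scale, the tractability of marginal research done now, and the moral-weight conversion between digital minds and farmed animals.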
(Note also: I’m unsure how much of our disagreement is simply because of the “should be a priority” wording. I agree with JWS’s current “It is not good enough…” statement, but would think it wrong if the “should” were replaced with “could.” Similarly, I agree with you as far as: “The fact that something might have a huge scale and we might be able to do something about it is enough for it to be taken seriously.”)
[ETA: On a second read, this comment of mine seems a bit more combative than I intended—sorry about that.]
> For an intervention to be a longtermist priority, there needs to be some kind of concrete story for how it improves the long-term future.
I disagree with this. With existential risk from unaligned AI, I don’t think anyone has ever told a very clear story about how AI will actually get misaligned, get loose, and kill everyone. People have speculated about components of the story, but generally not in a super concrete way, and it isn’t clear how standard AI safety research would address a very specific disaster scenario. I don’t think this is a problem: we shouldn’t expect to know all the details of how things go wrong in advance, and it is worthwhile to do a lot of preparatory research that might be helpful so that we’re not fumbling through basic things during a critical period. I think the same applies to digital minds.
> Your points here do not engage with the argument, made by @Zach Stein-Perlman early on in the week, that we can just punt solving AI welfare to the future (i.e., to the long reflection / to once we have aligned superintelligent advisors), and in the meantime continue focusing our resources on AI safety (i.e., on raising the probability that we make it to a long reflection).
I think this viewpoint is overly optimistic about the probability of lock-in / the relevance of superintelligent advisors. I discuss some of the issues around lock-in in a contribution to the debate week. In brief, I think it is possible that digital minds will be sufficiently integrated in the next few decades that they will have power in social relationships that will be extremely difficult to disentangle. I also think that AGI may be useful for drawing inferences from our assumptions, but won’t be particularly helpful at setting the right assumptions.
> With existential risk from unaligned AI, I don’t think anyone has ever told a very clear story about how AI will actually get misaligned, get loose, and kill everyone.
This should be evidence against AI x-risk![1] Even for the atmospheric-ignition worry before the Trinity test, they had more concrete models to use. If we can’t build a concrete model here, then that implies we don’t have a concrete or convincing case for prioritising it at all, imo. It’s similar to the point in my footnotes that you need to argue for both p and p → q, not just the latter. This is what I would expect to see if the case for p were unconvincing or incorrect.
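To spell out the structure of that footnote point (a minimal sketch): modus ponens needs both premises, and the credence an argument of this shape can deliver in its conclusion is capped by one’s credence in the weaker premise:

$$
\frac{p \qquad p \rightarrow q}{q}
\qquad\qquad
P\big(p \wedge (p \rightarrow q)\big) \le \min\{P(p),\, P(p \rightarrow q)\}
$$

So however persuasive the conditional "if the threat model is real, it should be prioritised", the overall case is only as strong as the case for the threat model itself.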
> I don’t think this is a problem: we shouldn’t expect to know all the details of how things go wrong in advance
Yeah, I agree with this. But the uncertainty and cluelessness about the future should decrease one’s confidence that one is working on the most important thing in the history of humanity, one would think.
> and it is worthwhile to do a lot of preparatory research that might be helpful so that we’re not fumbling through basic things during a critical period. I think the same applies to digital minds.
I’m all in favour of research, but how much should that research get funded? Can it be justified above other potential uses of money and general resources? Should it be an EA priority as defined by the AWDW framing? These questions were left (almost) entirely unargued for.

[1] Not dispositive evidence perhaps, but a consideration.
> For an intervention to be a longtermist priority, there needs to be some kind of concrete story for how it improves the long-term future.
> I disagree with this. With existential risk from unaligned AI, I don’t think anyone has ever told a very clear story about how AI will actually get misaligned, get loose, and kill everyone.
When I read the passage you quoted, I thought of, e.g., Critch’s description of RAAPs and Christiano’s “What failure looks like”, both of which seem pretty detailed to me without necessarily fitting the “AI gets misaligned, gets loose, and kills everyone” meme; both Critch and Christiano seem to me to be explicitly pushing back against considering only that meme, and Critch in particular thinks work in this area is ~neglected (as of 2021; I haven’t kept up with goings-on since). I suppose Gwern’s write-up comes closest to your description, and I can’t imagine it being more concrete; curious to hear if you have a different reaction.