I generally agree that the formal thesis for the debate week set a high bar that is difficult to defend, and I think that this is a good statement of the case for that. Even if you think that AI welfare is important (which I do!), the field doesn’t have the existing talent pipelines or clear strategy to absorb $50 million in new funding each year. Putting that much in over the next few years could easily make things worse. It is also possible that AI welfare could attract non-EA money, and that it should aim for that rather than try to take money that would otherwise go to EA cause areas.
That said, there are other points that I disagree with:
> It is not good enough to simply say that an issue might have a large scale impact and therefore think it should be an EA priority, it is not good enough to simply defer to Carl Shulman’s views if you yourself can’t argue why you think it’s “pretty likely… that there will be vast numbers of AIs that are smarter than us” and why those AIs deserve moral consideration.
I think that this is wrong. The fact that something might have a huge scale and we might be able to do something about it is enough for it to be taken seriously and provides prima facie evidence that it should be a priority. I think it is vastly preferrable to preempt problems before they occur rather than try to fix them once they have. For one, AI welfare is a very complicated topic that will take years or decades to sort out. AI persons (or things that look like AI persons) could easily be here in the next decade. If we don’t start thinking about it soon, then we may be years behind when it happens.
AI people (of some form or other) are not exactly a purely hypothetical technology, and the epistemic case for them doesn’t seem fundamentally different from the case for thinking that AI safety will be an existential issue in the future, that the average intensively farmed animal leads a net-negative life, or that any given global health intervention won’t have significant unanticipated negative side effects. We’re dealing with deep uncertainties no matter what we do.
Additionally, it might be much harder to try to lobby for changes once things have gone wrong.
I wish some groups had been actively lobbying against intensified animal agriculture in the 1930s (or the 1880s). It may not have been tractable, and the case may not have been clear, but it may have been possible to outlaw some terrible practices before they were adopted. We might have that opportunity now with AI welfare. Perhaps this means that we only need a small core group, but I do think some people should make it a priority.
> It is not good enough to simply say that an issue might have a large scale impact and therefore think it should be an EA priority [...]
> I think that this is wrong. The fact that something might have a huge scale and we might be able to do something about it is enough for it to be taken seriously and provides prima facie evidence that it should be a priority. I think it is vastly preferrable [sic] to preempt problems before they occur rather than try to fix them once they have. For one, AI welfare is a very complicated topic that will take years or decades to sort out. AI persons (or things that look like AI persons) could easily be here in the next decade. If we don’t start thinking about it soon, then we may be years behind when it happens.
I feel like you are talking past the critique. For an intervention to be a longtermist priority, there needs to be some kind of story for how it improves the long-term future. Sure, AI welfare may be a large-scaled problem which takes decades to sort out (if tackled by unaided humans), but that alone does not mean it should be worked on presently. Your points here do not engage with the argument, made by @Zach Stein-Perlman early on in the week, that we can just punt solving AI welfare to the future (i.e., to the long reflection / to once we have aligned superintelligent advisors), and in the meantime continue focusing our resources on AI safety (i.e., on raising the probability that we make it to a long reflection).
(There is an argument going in the opposite direction that a long reflection might not happen following alignment success, and so doing AI welfare work now might indeed make a difference to what gets locked in for the long-term. I am somewhat sympathetic to this argument, as I wrote here, but I still don’t think it delivers a knockdown case for making AI welfare work a priority.)
Likewise, for an intervention to be a neartermist priority, there has to be some kind of quantitative estimate demonstrating that it is competitive—or will soon be competitive, if nothing is done—in terms of suffering prevented per dollar spent, or similar, with the current neartermist priorities. Factory farming seems like the obvious thing to compare AI welfare against. I’ve been surprised by how nobody has tried coming up with such an estimate this week, however rough. (Note: I’m not sure if you are trying to argue that AI welfare should be both a neartermist and longtermist priority, as some have.)
(Note also: I’m unsure how much of our disagreement is simply because of the “should be a priority” wording. I agree with JWS’s current “It is not good enough…” statement, but would think it wrong if the “should” were replaced with “could.” Similarly, I agree with you as far as: “The fact that something might have a huge scale and we might be able to do something about it is enough for it to be taken seriously.”)
[ETA: On a second read, this comment of mine seems a bit more combative than I intended—sorry about that.]
> For an intervention to be a longtermist priority, there needs to be some kind of concrete story for how it improves the long-term future.
I disagree with this. With existential risk from unaligned AI, I don’t think anyone has ever told a very clear story about how AI will actually get misaligned, get loose, and kill everyone. People have speculated about components of the story, but generally not in a super concrete way, and it isn’t clear how standard AI safety research would address a very specific disaster scenario. I don’t think this is a problem: we shouldn’t expect to know all the details of how things go wrong in advance, and it is worthwhile to do a lot of preparatory research that might be helpful so that we’re not fumbling through basic things during a critical period. I think the same applies to digital minds.
> Your points here do not engage with the argument, made by @Zach Stein-Perlman early on in the week, that we can just punt solving AI welfare to the future (i.e., to the long reflection / to once we have aligned superintelligent advisors), and in the meantime continue focusing our resources on AI safety (i.e., on raising the probability that we make it to a long reflection).
I think this viewpoint is overly optimistic about the probability of locking in / the relevance of superintelligent advisors. I discuss some of the issues of locking in in a contribution to the debate week. In brief, I think that it is possible that digital minds will be sufficiently integrated in the next few decades that they will have power in social relationships that will be extremely difficult to disentangle. I also think that AGI may be useful in drawing inferences from our assumptions, but won’t be particularly helpful at setting the right assumptions.
> With existential risk from unaligned AI, I don’t think anyone has ever told a very clear story about how AI will actually get misaligned, get loose, and kill everyone.
This should be evidence against AI x-risk![1] Even in the atmospheric ignition case in Trinity, they had more concrete models to use. If we can’t build a concrete model here, then it implies we don’t have a concrete/convincing case for why it should be prioritised at all, imo. It’s similar to the point in my footnotes that you need to argue for both p and p->q, not just the latter. This is what I would expect to see if the case for p was unconvincing/incorrect.
> I don’t think this is a problem: we shouldn’t expect to know all the details of how things go wrong in advance
Yeah I agree with this. But the uncertainty and cluelessness in the future should decrease one’s confidence that they’re working on the most important thing in the history of humanity, one would think.
> and it is worthwhile to do a lot of preparatory research that might be helpful so that we’re not fumbling through basic things during a critical period. I think the same applies to digital minds.
I’m all in favour of research, but how much should that research get funded? Can it be justified above other potential uses of money and general resources? Should it be an EA priority as defined by the AWDW framing? These were (almost) entirely unargued for.
> For an intervention to be a longtermist priority, there needs to be some kind of concrete story for how it improves the long-term future.
> I disagree with this. With existential risk from unaligned AI, I don’t think anyone has ever told a very clear story about how AI will actually get misaligned, get loose, and kill everyone.
When I read the passage you quoted I thought of e.g. Critch’s description of RAAPs and Christiano’s what failure looks like, both of which seem pretty detailed to me without necessarily fitting the “AI gets misaligned, gets loose and kills everyone” meme; both Critch and Christiano seem to me to be explicitly pushing back against consideration of only that meme, and Critch in particular thinks work in this area is ~neglected (as of 2021, I haven’t kept up with goings-on). I suppose Gwern’s writeup comes closest to your description, and I can’t imagine it being more concrete; curious to hear if you have a different reaction.
To add to the intensive animal agriculture analogy: this time, people are designing them, which provides a lot of reason to believe early intervention can affect AI welfare compared to animal agriculture.
> Even if you think that AI welfare is important (which I do!), the field doesn’t have the existing talent pipelines or clear strategy to absorb $50 million in new funding each year.
Yep, completely agree here, and as Siebe pointed out I did go to the extreme end of ‘make the changes right now’. It could be structured in a more gradual way, and potentially with more external funding.
> The fact that something might have a huge scale and we might be able to do something about it is enough for it to be taken seriously and provides prima facie evidence that it should be a priority.
I agree in principle on the huge scale point, but much less so on the ‘might be able to do something’ part. I think we need a lot more than that: we need something tractable to get going, especially for something to be considered a priority. The general form of argument I’ve seen this week is that AI Welfare could have a huge scale and therefore should be an EA priority, without much to flesh out the ‘do something’ part.
> AI persons (or things that look like AI persons) could easily be here in the next decade... AI people (of some form or other) are not exactly a purely hypothetical technology,
I think I disagree empirically here. Counterfeit “people” might be here soon, but I am not moved much by arguments that digital ‘life’ with full agency, self-awareness, autopoiesis, moral values, moral patienthood, etc. will be here in the next decade. Especially not easily here. I definitely think that case hasn’t been made, and I think (contra Chris in the other thread) that claims of this sort should have been made much more strongly during AWDW.
> We might have that opportunity now with AI welfare. Perhaps this means that we only need a small core group, but I do think some people should make it a priority.
A small number of people should, I agree. Funding Jeff Sebo and Rob Long? Sounds great. Giving them 438 research assistants and $49M in funding taken from other EA causes? Hell to the naw. We weren’t discussing whether AI Welfare should be a priority for some EAs; we were discussing the specific terms set out in the week’s statement, and I feel like I’m the only person during this week who paid any attention to them.
Secondly, the ‘we might have that opportunity’ is very unconvincing to me. It is about as convincing to me as saying in 2008 that “If CERN is turned on, it may create a black hole that destroys the world. Nobody else is listening. We might only have the opportunity to act now!” It’s just not enough to be action-guiding in my opinion.
I’m pretty aware the above is unfair to strong advocates of AI Safety and AI Welfare, but at the moment that’s roughly where the quality of arguments this week has stood from my viewpoint.
[1] Not dispositive evidence perhaps, but a consideration.
Thanks for the extensive reply, Derek :)