Some quick thoughts on AI consciousness work; I may write up something more rigorous later.
Normally when people have criticisms of the EA movement they talk about its culture or point at community health concerns.
I think the aspect of EA that makes me sadder is that there seem to be a few extremely important issues, on an impartial welfarist view, that don’t get much attention at all, despite having been identified at some point by some EAs. I do think that EA has done a decent job of pointing at the most important issues relative to basically every other social movement that I’m aware of, but I’m going to complain about one of its shortcomings anyway.
It looks to me like we could build advanced AI systems in the next few years, and in most worlds we have little idea of what’s actually going on inside them. The systems may tell us they are conscious, or say that they don’t like the tasks we tell them to do, but right now we can’t really trust their self-reports. There’ll be a clear economic incentive to ignore self-reports that create a moral obligation to use the systems in less useful or efficient ways. I expect the number of deployed systems to be very large, and it seems plausible that we lock in the suffering of these systems in a similar way to factory farming. I think there are stronger arguments for the topic’s importance that I won’t dive into right now, but the simplest case is just that the “big if true-ness” of this area seems very high.
My impression is that our wider society and community are not orienting in a sane way to this topic. I don’t remember ever coming across a junior EA seriously considering directing their career to work in this area. 80k has a podcast with Rob Long and a very brief problem profile (that seems kind of reasonable), AI consciousness (iirc) doesn’t feature in EA virtual programs or any intro fellowship that I’m aware of, and there haven’t been many (or any?) talks about it at EAG in the last year. I do think that most organisations could turn around and ask “well, what concrete action do you actually want our audience to take?” and my answers are kind of vague and unsatisfying right now. I think we were at a similar point with alignment a few years ago, and my impression is that it had to be on the community’s mind for a while before we were able to pour substantial resources into it (though the field of alignment feels pretty sub-optimal to me and I’m interested in working out how to do a better job this time round).
I get that there aren’t shovel-ready directions to push people to work on right now, but insofar as our community and its organisations brand themselves substantially as the groups identifying and prioritising the world’s most pressing problems, it sure does feel to me like more people should have this topic on their minds.
There are some people I know of dedicating some of their resources to making progress in this area, and I am pretty optimistic about the people involved—the ones that I know of seem especially smart and thoughtful.
I don’t want all of EA to jump into this rn, and I’m optimistic about having a research agenda in this space that I’m excited about, and maybe even a vague plan about what one might do about all this, by the end of this year. After that, I think we’ll be better positioned to do field building. I am excited about people who feel especially well placed moving into this area, in particular people with some familiarity with both mainstream theories of consciousness and ML research (particularly designing and running empirical experiments). Feel free to reach out to me or apply for funding at the LTFF.
(quick thoughts, may be missing something obvious)
Relative to the scale of the long-term future, the number of AIs deployed in the near term is very small, so to me it seems like there’s pretty limited upside to improving that. In the long term, it seems like we will have AIs to figure out the nature of consciousness for us.
Maybe I’m missing the case that lock-in is plausible; it currently seems pretty unlikely to me because the singularity seems like it will transform the ways the AIs are running. So in my mind it mostly matters what happens after the singularity.
I’m also not sure about the tractability, but the scale is my major crux.
I do think understanding AI consciousness might be valuable for alignment; I’m just arguing against work on near-term AI suffering.
I agree with your “no lock-in” view in the case of alignment going well: in that world, we’d surely use the aligned superintelligence to help us with things like understanding AI sentience and making sure that sentient AIs aren’t suffering.
In the case of misalignment and humanity losing control of the future, I don’t think I understand the view that there wouldn’t be lock-in. I may well be missing something, but I can’t see why there wouldn’t be lock-in of things related to suffering risk, for example whether or not the ASI creates sentient subroutines which help it achieve its goals but which incidentally suffer. Outcomes like that could in theory be steered away from even if we fail at alignment, given that the ASI’s future actions (even if they’re very hard to predict exactly) are decided by how we build it. And we could likely steer away from them more effectively if we better understood AI sentience (because then we’d know more about things like what kinds of subroutines can suffer).
Edit: I have a lot of sympathy for the take above but I tried to write up my response around why I think lock-ins are pretty plausible.
I’m not sure rn whether the majority of downside comes from lock-in but I think that’s what I’m most immediately concerned about.
I assume by singularity you mean an intelligence explosion or extremely rapid economic growth. I think my default story for how this happens in the current paradigm involves people using AIs in existing institutions (or institutions that look pretty similar to today’s) in markets that look pretty similar to current markets, which (on my view) are unlikely to care about the moral patienthood of AIs, in ways pretty similar to current market failures.
On the “markets still exist and we do things kind of like how we do now” view: I agree that in principle we’d be better positioned to make progress on problems generally if we had something like PASTA, but I feel like you need to tell a reasonable story for one of:
how governance works post-TAI so that you can easily enact improvements like eliminating AI suffering
why current markets do allow for things like factory farming and slavery but wouldn’t allow for violation of AI preferences
I’m guessing your view is that progress will be highly discontinuous and society will look extremely different post-singularity to how it does now (kind of like going from before the agricultural revolution to now), whereas my view is more like going from before the industrial revolution to now.
I’m not really sure where the cruxes are on this view or how to reason about it well, but my high-level argument is that the “god-like AGI which has significant responsibility but still checks in with its operators” will still need to make some trade-offs across various factors. Unless it’s doing some CEV-type thing, outcomes will be fairly dependent on the goals that you give it, and it’s not clear to me that the median world leader or CEO gives the AGI goals that concern the AI’s wellbeing (or its subsystems’ wellbeing), even if it’s relatively cheap to evaluate it. I am more optimistic about an AGI controlled by a person sampled from a culture that has already set up norms around how to orient to the moral patienthood of AI systems than one that needs to figure it out on the fly. I do feel much better about worlds where some kind of reflection process is overdetermined.
My views here are pretty fuzzy and are often influenced substantially by thought experiments like “If a random tech CEO could effectively control all the world’s scientists and have them run at 10x speed and had 100 trillion dollars, does factory farming still exist?” which isn’t a very high epistemic bar to beat. (I also don’t think I’ve articulated my models very well and I may take another stab at this later on.)
I have some tractability concerns, but my understanding is that few people are actually trying to solve the problem right now, and when few people are trying it’s pretty hard for me to get a sense of how tractable a thing is. So my priors on similarly shaped problems are doing most of the work (which leaves me feeling quite confused).
I’m really glad you wrote this; I’ve been worried about the same thing. I’m particularly worried about how few people are working on it given the potential scale and urgency of the problem. It also seems like an area where the EA ecosystem has a strong comparative advantage: it deals with issues many in this field are familiar with, requires a blend of technical and philosophical skills, and is still too weird and nascent for the wider world to touch (for now). I’d be very excited to see more research and work done here, ideally quite soon.
Very strong +1 to all this. I honestly think it’s the most neglected area relative to its importance right now. It seems plausible that the vast majority of future beings will be digital, so it would be surprising if longtermism does not imply much more attention to the issue.
I hadn’t seen this until now, but it’s good to see that you’ve come to the same conclusion I have. I’ve just started my DPhil in Philosophy and plan on working on AI mental states and welfare.