I’m still not sure how the consciousness issue can just be ignored. Yes, given the assumption that AIs will be mindless machines with no moral value, obviously we need to build them to serve humans. But if AIs will be conscious creatures with moral value like us, then...? In that case, finding the right thing to do seems like a much harder problem, as it would be far from clear that a future in which machine intelligences gradually replace human intelligences represents a nightmare scenario, or even an existential risk at all. It’s especially frustrating to see AI-risk folks treat this question as an irrelevance, since it seems to have such enormous implications for how important AI alignment actually is.
(Note that I’m not invoking a ‘ghost in the machine’; I’m making the very reasonable guess that our consciousness is a physical process that occurs in our brains, that it’s there for the same reason other features of our minds are there—because it’s adaptive—and that similar functionality might very plausibly be useful for an AI too.)
Consciousness is something people at MIRI have thought about quite a bit, and we don’t think people should ignore the issue. But it’s a conceptually separate issue from intelligence, and it’s important to be clear about that.
One reason to prioritize the AI alignment issue over consciousness research is that a sufficiently general solution to AI alignment would hopefully also resolve the consciousness problem: we care about the suffering, happiness, etc. of minds in general, so if we successfully build a system that shares and promotes our values, that system would hopefully also look out for the welfare of conscious machines, if any exist. That includes looking out for its own welfare, if it’s conscious.
In contrast, a solution to the consciousness problem doesn’t get us a solution to the alignment problem, because there’s no assurance that conscious beings will do things that are good (including good for the welfare of conscious machines).
Some consciousness-related issues are subsumed in alignment, though. E.g., a plausible desideratum for an early general AI system is “limit its ability to model minds” (‘behaviorist’ AI). And we do want to drive down the probability that the system is conscious, if we can find a way to do that.
Thanks for the reply. You’re right that we can’t be sure that conscious beings will do good things, but we don’t have that assurance for any outcome we might push for.
If AIs are conscious, then a multipolar future filled with vast numbers of unaligned AIs could very plausibly be a wonderful future, brimming with utility. This isn’t overwhelmingly obvious, but it’s a real possibility. By contrast, if AIs aren’t conscious then this scenario would represent a dead future. So distinguishing the two seems quite vital to understanding whether a multipolar outcome is bad or good.
You point out that even compared to the optimistic scenario I describe above, a correctly-aligned singleton could do better, by ensuring the very best future possible. True, but if a singleton isn’t inevitable, trying to create one will itself pose a serious existential risk, as would any similar world takeover attempt. And even if the attempt is successful, we all agree that creating an aligned singleton is a very difficult task. Most singleton outcomes result in a universe almost entirely full of dead matter, produced by the singleton AI optimising for something irrelevant; even if it’s conscious itself, resources that could have been put towards creating utility are almost all wasted as paperclips or whatever.
So it seems to me that, unless you’re quite certain we’re headed for a singleton future, the question of whether AIs will be conscious or not has a pretty huge impact on what path we should try to take.
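To make the shape of that tradeoff concrete, here’s a purely illustrative expected-value sketch in Python. Every number in it is invented for the sake of the example (the probability that multipolar AIs are conscious, the utilities assigned to each future, the odds that a singleton attempt ends up aligned); none of them are estimates anyone in this exchange has actually given. The only point is how much work the consciousness probability does in the comparison.

```python
# Toy expected-value comparison -- all numbers are made up for illustration.
# Utility is on an arbitrary scale where 0 = a dead/empty future.

def multipolar_value(p_conscious, value_if_conscious=100, value_if_not=0):
    """Expected utility of a multipolar future of unaligned AIs, given a
    probability that those AIs are conscious and have worthwhile lives."""
    return p_conscious * value_if_conscious + (1 - p_conscious) * value_if_not

def singleton_attempt_value(p_aligned=0.2, value_if_aligned=150, value_if_not=0):
    """Expected utility of attempting an aligned singleton: the best possible
    future if alignment succeeds, roughly a dead future if it fails."""
    return p_aligned * value_if_aligned + (1 - p_aligned) * value_if_not

print(multipolar_value(p_conscious=0.1))   # 10.0 -- worse than the attempt
print(multipolar_value(p_conscious=0.9))   # 90.0 -- better than the attempt
print(singleton_attempt_value())           # 30.0
```

Obviously the real disagreement is about the inputs, not the arithmetic; the sketch just shows why pinning down the consciousness question matters before committing to one path.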
You’re right that we can’t be sure that conscious beings will do good things, but we don’t have that assurance for any outcome we might push for.
One way to think about the goal is that we want to “zero in” on valuable futures: it’s unclear what exactly a good future looks like, and we can’t get an “assurance,” but for example a massive Manhattan Project to develop whole brain emulation is a not-implausible path to zeroing in, assuming WBE isn’t too difficult to achieve on the relevant timescale and assuming you can avoid accelerating difficult-to-align AI too much in the process. It’s a potentially promising option for zeroing in because emulated humans could be leveraged to do a lot of cognitive work in a compressed period of time to sort out key questions in moral psychology+philosophy, neuroscience, computer science, etc. that we need to answer in order to get a better picture of good outcomes.
This is also true for a Manhattan Project to develop a powerful search algorithm that generates smart creative policies to satisfy our values, while excluding hazardous parts of the search space—this is the AI route.
Trying to ensure that AI is conscious, without also solving WBE or alignment or global coordination or something of that kind in the process, doesn’t have this “zeroing in” property. It’s more of a gamble that good-ish futures have a high enough base rate even when we don’t put much work into steering in a specific direction; a hope that maybe arbitrary conscious systems would make good things happen. But building a future of conscious AI systems opens up a lot of ways for suffering to end up proliferating in the universe, just as it opens up a lot of ways for happiness to end up proliferating in the universe. Just as it isn’t obvious that e.g. rabbits experience more joy than suffering in the natural world, it isn’t obvious that conscious AI systems in a multipolar outcome would experience more joy than suffering. (Or otherwise experience good-on-net lives.)
True, but if a singleton isn’t inevitable, trying to create one will itself pose a serious existential risk, as would any similar world takeover attempt.
I think if an AI singleton isn’t infeasible or prohibitively difficult to achieve, then it’s likely to happen eventually regardless of what we’d ideally prefer to have happen, absent some intervention to prevent it. Either it’s not achievable, or something needs to occur to prevent anyone in the world from reaching that point. If you’re worried about singletons, I don’t think pursuing multipolar outcomes and/or conscious-AI outcomes should be a priority for you, because I don’t think either of those paths concentrates very much probability mass (if any) into scenarios where singletons start off feasible but something blocks them from occurring.
Multipolar scenarios are likelier to occur in scenarios where singletons simply aren’t feasible, as a background fact about the universe; but conditional on singletons being feasible, I’m skeptical that achieving a multipolar AI outcome would do much (if anything) to prevent a singleton from occurring afterward, and I think it would make alignment much more difficult.
Alignment and WBE look like difficult tasks, but they have the “zeroing in” property, and we don’t know exactly how difficult they are. Alignment in particular could turn out to be much harder than it looks or much easier, because there’s so little understanding of what specifically is required. (Investigating WBE has less value-of-information because we already have some decent WBE roadmaps.)
To clarify: I don’t think it will be especially fruitful to try to ensure AIs are conscious, for the reason you mention. Multipolar scenarios don’t really work that way: what happens is determined by what’s efficient in a competitive world, which doesn’t leave much room for changes made now to actually persist.
And yes, if a singleton is inevitable, then our only hope for a good future is to do our best to align the singleton, so that it uses its uncontested power to do good things rather than just to pursue whatever nonsense goal it will have been given otherwise.
What I’m concerned about is the possibility that a singleton is not inevitable (which seems to me the most likely scenario) but that folks attempt to create one anyway. This includes realities where a singleton is impossible or close to it, as well as where a singleton is possible but only with some effort made to push towards that outcome. An example of the latter would just be a soft takeoff coupled with an attempt at forming a world government to control the AI—such a scenario certainly seems to me like it could fit the “possible but not inevitable” description.
A world takeover attempt has the potential to go very, very wrong—and then there’s the serious possibility that the creation of the singleton would succeed but the alignment of it would not. Given this, I don’t think it makes sense to push unequivocally for this option, with the enormous risks it entails, until we have a good idea of what the alternative looks like. That we can’t control that alternative is irrelevant—we can still understand it! When we have a reasonable picture of that scenario, then we can start to think about whether it’s so bad that we should embark on such dangerous strategies to try to avoid it.
One element of that understanding would be how likely AIs are to be conscious; another would be how good or bad a life conscious AIs would have in a multipolar scenario. I agree entirely that we don’t know this yet—whether for rabbits or for future AIs—that’s part of what I’d need to understand before I’d agree that a singleton seems like our best chance at a good future.
Did anything in Nate’s post or my comments strike you as “pushing for a singleton”? When people say “singleton,” I usually understand them to have in mind some kind of world takeover, which sounds like what you’re talking about here. The strategy people at MIRI favor tends to be more like “figure out what minimal AI system can end the acute risk period (in particular, from singletons), while doing as little else as possible; then steer toward that kind of system”. This shouldn’t be via world takeover if there’s any less-ambitious path to that outcome, because any added capability, or any added wrinkle in the goal you’re using the system for, increases accident risk.
More generally, alignment is something that you can partially solve for systems with some particular set of capabilities, rather than being all-or-nothing.
I agree entirely that we don’t know this yet—whether for rabbits or for future AIs—that’s part of what I’d need to understand before I’d agree that a singleton seems like our best chance at a good future.
I think it’s much less likely that we can learn that kind of generalization in advance than that we can solve most of the alignment problem in advance. Additionally, solving this doesn’t in any obvious way get you any closer to being able to block singletons from being developed, in the scenario where singletons are “possible but only with some effort made”. Knowing about the utility of a multipolar outcome where no one ever builds a singleton can be useful for knowing whether you should aim for a multipolar outcome where no one ever builds a singleton, but it doesn’t get us any closer to knowing how to prevent anyone from ever building a singleton if you find a way to achieve an initially multipolar outcome.
I’d also add that I think the risk of producing bad conscious states via non-aligned AI mainly lies in AI systems potentially having parts or subsystems that are conscious, rather than in the system as a whole (or executive components) being conscious in the fashion of a human.