I might elaborate on this at some point, but I thought I’d write down some general reasons why I’m more optimistic than many EAs on the risk of human extinction from AI. I’m not defending these reasons here; I’m mostly just stating them.
Skepticism of foom: I think it’s unlikely that a single AI will take over the whole world and impose its will on everyone else. I think it’s more likely that millions of AIs will be competing for control over the world, in a similar way that millions of humans are currently competing for control over the world. Power or wealth might be very unequally distributed in the future, but I find it unlikely that it will be distributed so unequally that there will be only one relevant entity with power. In a non-foomy world, AIs will be constrained by norms and laws. Absent severe misalignment among almost all the AIs, I think these norms and laws will likely include a general prohibition on murdering humans, and there won’t be a particularly strong motive for AIs to murder every human either.
Skepticism that value alignment is super-hard: I haven’t seen any strong arguments that value alignment is very hard, in contrast to the straightforward empirical evidence that e.g. GPT-4 seems to be honest, kind, and helpful after relatively little effort. Most conceptual arguments I’ve seen for why we should expect value alignment to be super-hard rely on strong theoretical assumptions that I am highly skeptical of. I have yet to see significant empirical successes from these arguments. I feel like many of these conceptual arguments would, in theory, apply to humans, and yet human children are generally value aligned by the time they reach young adulthood (at least, value aligned enough to avoid killing all the old people). Unlike humans, AIs will be explicitly trained to be benevolent, and we will have essentially full control over their training process. This provides much reason for optimism.
Belief in a strong endogenous response to AI: I think most people will generally be quite fearful of AI and will demand that we be very cautious when deploying these systems widely. I don’t see a strong reason to expect companies to remain unregulated and rush to cut corners on safety, absent something like a world war that presses people to develop AI as quickly as possible at all costs.
Not being a perfectionist: I don’t think we need our AIs to be perfectly aligned with human values, or perfectly honest, similar to how we don’t need humans to be perfectly aligned and honest. Individual humans are usually quite selfish, frequently lie to each other, and are often cruel, and yet the world mostly gets along despite this. This is true even when there are vast differences in power and wealth between humans. For example, some groups in the world have almost no power relative to the United States, and residents of the US don’t particularly care about them either, and yet those groups survive anyway.
Skepticism of the analogy to other species: it’s generally agreed that humans dominate the world at the expense of other species. But that’s not surprising, since humans evolved independently of other animal species. And we can’t really communicate with other animal species, since they lack language. I don’t think AI is analogous to this situation. AIs will mostly be born into our society, rather than being created outside of it. (Moreover, even in this very pessimistic analogy, humans still spend >0.01% of our GDP on preserving wild animal species, and the vast majority of animal species have not gone extinct despite our giant influence on the natural world.)
ETA: feel free to ignore the below, given your caveat, though if you later choose to write an expanded form of any of these arguments, you may find it helpful to have some early objections on record.
Correct me if I’m wrong, but it seems like most of these reasons boil down to not expecting AI to be superhuman in any relevant sense (since if it is, effectively all of them break down as reasons for optimism)? To wit:
Resource allocation is relatively equal (and relatively free of violence) among humans because even humans that don’t very much value the well-being of others don’t have the power to actually expropriate everyone else’s resources by force. (We have evidence of what happens when those conditions break down to any meaningful degree; it isn’t super pretty.)
I do not think GPT-4 is meaningful evidence about the difficulty of value alignment. In particular, the claim that “GPT-4 seems to be honest, kind, and helpful after relatively little effort” seems to be treating GPT-4’s behavior as meaningfully reflecting its internal preferences or motivations, which I think is “not even wrong”. I think it’s extremely unlikely that GPT-4 has preferences over world states in a way that most humans would consider meaningful, and in the very unlikely event that it does, those preferences almost certainly aren’t centrally pointed at being honest, kind, and helpful.
re: endogenous response to AI—I don’t see how this is relevant once you have ASI. To the extent that it might be relevant, it’s basically conceding the argument: that the reason we’ll be safe is that we’ll manage to avoid killing ourselves by moving too quickly. (Note that we are currently moving at pretty close to max speed, so this is a prediction that the future will be different from the past. One that some people are actively optimizing for, but also one that other people are optimizing against.)
re: perfectionism—I would not be surprised if many current humans, given superhuman intelligence and power, created a pretty terrible future. Current power differentials do not meaningfully let individual players flip every single other player the bird at the same time. Assuming that this will continue to be true is again assuming the conclusion (that AI will not be superhuman in any relevant sense). I also feel like there’s an implicit argument here about how value isn’t fragile that I disagree with, but I might be reading into it.
I’m not totally sure what analogy you’re trying to rebut, but I think that human treatment of animal species, as a piece of evidence for how we might be treated by future AI systems that are analogously more powerful than we are, is extremely negative, not positive. Human efforts to preserve animal species are a drop in the bucket compared to the casual disregard with which we optimize over them and their environments for our benefit. I’m sure animals sometimes attempt to defend their territory against human encroachment. Has the human response to this been to shrug and back off? Of course, there are some humans who do care about animals having fulfilled lives by their own values. But even most of those humans do not spend their lives tirelessly optimizing for their best understanding of the values of animals.
Correct me if I’m wrong, but it seems like most of these reasons boil down to not expecting AI to be superhuman in any relevant sense
No, I certainly expect AIs will eventually be superhuman in virtually all relevant respects.
Resource allocation is relatively equal (and relatively free of violence) among humans because even humans that don’t very much value the well-being of others don’t have the power to actually expropriate everyone else’s resources by force.
Can you clarify what you are saying here? If I understand you correctly, you’re saying that humans have relatively little wealth inequality because there’s relatively little inequality in power between humans. What does that imply about AI?
I think there will probably be big inequalities in power among AIs, but I am skeptical of the view that there will be only one (or even a few) AIs that dominate over everything else.
I do not think GPT-4 is meaningful evidence about the difficulty of value alignment.
I’m curious: does that mean you also think that alignment research performed on GPT-4 is essentially worthless? If not, why?
I think it’s extremely unlikely that GPT-4 has preferences over world states in a way that most humans would consider meaningful, and in the very unlikely event that it does, those preferences almost certainly aren’t centrally pointed at being honest, kind, and helpful.
I agree that GPT-4 probably doesn’t have preferences in the same way humans do, but it sure appears to be a limited form of general intelligence, and I think future AGI systems will likely share many underlying features with GPT-4, including, to some extent, cognitive representations inside the system.
I think our best guess of future AI systems should be that they’ll be similar to current systems, but scaled up dramatically, trained on more modalities, with some tweaks and post-training enhancements, at least if AGI arrives soon. Are you simply skeptical of short timelines?
re: endogenous response to AI—I don’t see how this is relevant once you have ASI.
To be clear, I expect we’ll get AI regulations before we get to ASI. I predict that regulations will increase in intensity as AI systems get more capable and start having a greater impact on the world.
Note that we are currently moving at pretty close to max speed, so this is a prediction that the future will be different from the past.
Every industry in history initially experienced little to no regulation; as people became more acquainted with it, regulation increased. I expect AI will follow a similar trajectory. I think this is in line with historical evidence, rather than contradicting it.
re: perfectionism—I would not be surprised if many current humans, given superhuman intelligence and power, created a pretty terrible future. Current power differentials do not meaningfully let individual players flip every single other player the bird at the same time.
I agree. If you turned a random human into a god, or a random small group of humans into gods, then I would be pretty worried. However, in my scenario, there aren’t going to be single AIs that suddenly become gods. Instead, there will be millions of different AIs, and the AIs will smoothly increase in power over time. During this time, we will be able to experiment and do alignment research to see what works and what doesn’t at making AIs safe. I expect AI takeoff will be fairly diffuse, and AIs will probably be respectful of norms and laws because no single AI can take over the world by itself. Of course, the way I think about the future could be wrong on a lot of specific details, but I don’t see a strong reason to doubt the basic picture I’m presenting, as of now.
My guess is that your main objection here is that you think foom will happen, i.e. there will be a single AI that takes over the world and imposes its will on everyone else. Can you elaborate more on why you think that will happen? I don’t think it’s a straightforward consequence of AIs being smarter than humans.
I’m not totally sure what analogy you’re trying to rebut, but I think that human treatment of animal species, as a piece of evidence for how we might be treated by future AI systems that are analogously more powerful than we are, is extremely negative, not positive.
My main argument is that we should reject the analogy itself. I’m not really arguing that the analogy provides evidence for optimism, except in a very weak sense. I’m just saying: AIs will be born into and shaped by our culture; that’s quite different from what happened between animals and humans.
Individual humans are usually quite selfish, frequently lie to each other, and are often cruel, and yet the world mostly gets along despite this. This is true even when there are vast differences in power and wealth between humans. For example some groups in the world have almost no power relative to the United States, and residents in the US don’t particularly care about them either, and yet they survive anyway.
Okay, so these are two analogies: individual humans, and groups/countries.
First off, “surviving” doesn’t seem like the right thing to evaluate; “significant harm” or “being exploited” seems closer to the mark.
Can you give some examples where individual humans have a clear decisive strategic advantage (i.e. very low risk of punishment), and where the low-power individual isn’t at a high risk of serious harm? Because the examples I can think of are all pretty bad: dictators, slaveholders, husbands in highly patriarchal societies. Sexual violence is extremely prevalent and pretty much always occurs in contexts of large power differences.
I find the US example unconvincing, because I find it hard to imagine the US benefiting more from aggressive use of force than from trade and soft economic exploitation. The US doesn’t have the power to successfully occupy countries anymore. When there were bigger power differences due to technology, we had the age of colonialism.
Can you give some examples where individual humans have a clear decisive strategic advantage (i.e. very low risk of punishment), and where the low-power individual isn’t at a high risk of serious harm?
Why are we assuming a low risk of punishment? Risk of punishment depends largely on social norms and laws, and I’m saying that AIs will likely adhere to a set of social norms.
I think the central question is whether these social norms will include the norm “don’t murder humans”. I think such a norm will probably exist, unless almost all AIs are severely misaligned. I think severe misalignment is possible; one can certainly imagine it happening. But I don’t find it likely, since people will care a lot about making AIs ethical, and I’m not yet aware of any strong reasons to think alignment will be super-hard.