ETA: feel free to ignore the below, given your caveat, though you may find it helpful if you choose to write an expanded form of any of the arguments later to have some early objections.
Correct me if I’m wrong, but it seems like most of these reasons boil down to not expecting AI to be superhuman in any relevant sense (since if it is, effectively all of them break down as reasons for optimism)? To wit:
Resource allocation is relatively equal (and relatively free of violence) among humans because even humans that don’t very much value the well-being of others don’t have the power to actually expropriate everyone else’s resources by force. (We have evidence of what happens when those conditions break down to any meaningful degree; it isn’t super pretty.)
I do not think GPT-4 is meaningful evidence about the difficulty of value alignment. In particular, the claim that “GPT-4 seems to be honest, kind, and helpful after relatively little effort” seems to be treating GPT-4’s behavior as meaningfully reflecting its internal preferences or motivations, which I think is “not even wrong”. I think it’s extremely unlikely that GPT-4 has preferences over world states in a way that most humans would consider meaningful, and in the very unlikely event that it does, those preferences almost certainly aren’t centrally pointed at being honest, kind, and helpful.
re: endogenous response to AI—I don’t see how this is relevant once you have ASI. To the extent that it might be relevant, it’s basically conceding the argument: that the reason we’ll be safe is that we’ll manage to avoid killing ourselves by moving too quickly. (Note that we are currently moving at pretty close to max speed, so this is a prediction that the future will be different from the past. One that some people are actively optimizing for, but also one that other people are optimizing against.)
re: perfectionism—I would not be surprised if many current humans, given superhuman intelligence and power, created a pretty terrible future. Current power differentials do not meaningfully let individual players flip every single other player the bird at the same time. Assuming that this will continue to be true is again assuming the conclusion (that AI will not be superhuman in any relevant sense). I also feel like there’s an implicit argument here about how value isn’t fragile that I disagree with, but I might be reading into it.
I’m not totally sure what analogy you’re trying to rebut, but I think that human treatment of animal species, as a piece of evidence for how we might be treated by future AI systems that are analogously more powerful than we are, is extremely negative, not positive. Human efforts to preserve animal species are a drop in the bucket compared to the casual disregard with which we optimize over them and their environments for our benefit. I’m sure animals sometimes attempt to defend their territory against human encroachment. Has the human response to this been to shrug and back off? Of course, there are some humans who do care about animals having fulfilled lives by their own values. But even most of those humans do not spend their lives tirelessly optimizing for their best understanding of the values of animals.
> Correct me if I’m wrong, but it seems like most of these reasons boil down to not expecting AI to be superhuman in any relevant sense
No, I certainly expect AIs will eventually be superhuman in virtually all relevant respects.
> Resource allocation is relatively equal (and relatively free of violence) among humans because even humans that don’t very much value the well-being of others don’t have the power to actually expropriate everyone else’s resources by force.
Can you clarify what you are saying here? If I understand you correctly, you’re saying that humans have relatively little wealth inequality because there’s relatively little inequality in power between humans. What does that imply about AI?
I think there will probably be big inequalities in power among AIs, but I am skeptical of the view that there will be only one (or even a few) AIs that dominate over everything else.
> I do not think GPT-4 is meaningful evidence about the difficulty of value alignment.
I’m curious: does that mean you also think that alignment research performed on GPT-4 is essentially worthless? If not, why?
> I think it’s extremely unlikely that GPT-4 has preferences over world states in a way that most humans would consider meaningful, and in the very unlikely event that it does, those preferences almost certainly aren’t centrally pointed at being honest, kind, and helpful.
I agree that GPT-4 probably doesn’t have preferences in the same way humans do, but it sure appears to be a limited form of general intelligence, and I think future AGI systems will likely share many underlying features with GPT-4, including, to some extent, cognitive representations inside the system.
I think our best guess of future AI systems should be that they’ll be similar to current systems, but scaled up dramatically, trained on more modalities, with some tweaks and post-training enhancements, at least if AGI arrives soon. Are you simply skeptical of short timelines?
> re: endogenous response to AI—I don’t see how this is relevant once you have ASI.
To be clear, I expect we’ll get AI regulations before we get to ASI. I predict that regulations will increase in intensity as AI systems get more capable and start having a greater impact on the world.
> Note that we are currently moving at pretty close to max speed, so this is a prediction that the future will be different from the past.
Every industry in history initially experienced little to no regulation. However, after people became more acquainted with the industry, regulations on the industry increased. I expect AI will follow a similar trajectory. I think this is in line with historical evidence, rather than contradicting it.
> re: perfectionism—I would not be surprised if many current humans, given superhuman intelligence and power, created a pretty terrible future. Current power differentials do not meaningfully let individual players flip every single other player the bird at the same time.
I agree. If you turned a random human into a god, or a random small group of humans into gods, then I would be pretty worried. However, in my scenario, there aren’t going to be single AIs that suddenly become gods. Instead, in my scenario, there will be millions of different AIs, and the AIs will smoothly increase in power over time. During this time, we will be able to experiment and do alignment research to see what works and what doesn’t at making the AIs safe. I expect AI takeoff will be fairly diffuse, and AIs will probably be respectful of norms and laws because no single AI can take over the world by itself. Of course, the way I think about the future could be wrong on a lot of specific details, but I don’t see a strong reason to doubt the basic picture I’m presenting, as of now.
My guess is that your main objection here is that you think foom will happen, i.e. there will be a single AI that takes over the world and imposes its will on everyone else. Can you elaborate more on why you think that will happen? I don’t think it’s a straightforward consequence of AIs being smarter than humans.
> I’m not totally sure what analogy you’re trying to rebut, but I think that human treatment of animal species, as a piece of evidence for how we might be treated by future AI systems that are analogously more powerful than we are, is extremely negative, not positive.
My main argument is that we should reject the analogy itself. I’m not really arguing that the analogy provides evidence for optimism, except in a very weak sense. I’m just saying: AIs will be born into and shaped by our culture; that’s quite different from what happened between animals and humans.