You should compare against human nature, which was optimized for something quite different from utilitarianism. If I add up the pros and cons of the thing humans were optimized for and compare it against the thing AIs will be optimized for, I don’t see why it comes out with humans on top, from a utilitarian perspective. Can you elaborate on your reasoning here?
I can’t quickly elaborate in a clear way, but some messy combination of:
I can currently observe humans, which screens off a bunch of the comparison and lets me do direct analysis.
I can directly observe AIs and make predictions about future training methods, and their values seem to result from a much more heavily optimized and precise process with less “slack” in some sense. (Perhaps this is related to the genetic bottleneck; I’m unsure.)
AIs will be primarily trained in things which look extremely different from “cooperatively achieving high genetic fitness”.
Current AIs seem to use the vast, vast majority of their reasoning power for purposes which aren’t directly related to their final applications. I predict this will also apply for internal high level reasoning of AIs. This doesn’t seem true for humans.
Humans seem optimized for something which isn’t that far off from utilitarianism from some perspective? Make yourself survive, make your kin group survive, make your tribe survive, etc? I think utilitarianism is often a natural generalization of “I care about the experience of XYZ, it seems arbitrary/dumb/bad to draw the boundary narrowly, so I should extend this further” (This is how I get to utilitarianism.) I think the AI optimization looks considerably worse than this by default.
(Again, note that I said in my comment above: “Some of these can be defeated relatively easily if we train AIs specifically to be good successors, but the default AIs which end up with power over the future will not have this property.” I edited this into my prior comment, so you might have missed it, sorry.)
I can currently observe humans, which screens off a bunch of the comparison and lets me do direct analysis.
I’m in agreement that this consideration makes it hard to do a direct comparison. But I think this consideration should mostly make us more uncertain, rather than making us think that humans are better than the alternative. Analogy: if you rolled a die, and I didn’t see the result, the expected value is not low just because I am uncertain about what happened. What matters here is the expected value, not necessarily the variance.
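To make the die analogy concrete, here is a minimal sketch (the function name `die_stats` is just illustrative, and a fair six-sided die is assumed). It computes the expected value and the variance separately, which shows that not having seen the roll affects how spread out the outcomes are, not where their average sits.

```python
from fractions import Fraction

def die_stats(sides=6):
    """Return (expected value, variance) of a fair die with `sides` faces."""
    faces = range(1, sides + 1)
    p = Fraction(1, sides)  # each face is equally likely
    mean = sum(p * f for f in faces)
    var = sum(p * (f - mean) ** 2 for f in faces)
    return mean, var

mean, var = die_stats()
print(mean, var)  # 7/2 35/12
```

The expected value stays at 3.5 whether or not anyone has looked at the die; the uncertainty only shows up in the variance term.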
I can directly observe AIs and make predictions about future training methods, and their values seem to result from a much more heavily optimized and precise process with less “slack” in some sense. (Perhaps this is related to the genetic bottleneck; I’m unsure.)
I guess I am having trouble understanding this point.
AIs will be primarily trained in things which look extremely different from “cooperatively achieving high genetic fitness”.
Sure, but the question is why being different makes it worse along the relevant axes that we were discussing. The question is not just “will AIs be different than humans?” to which the answer would be “Obviously, yes”. We’re talking about why the differences between humans and AIs make AIs better or worse in expectation, not merely different.
Current AIs seem to use the vast, vast majority of their reasoning power for purposes which aren’t directly related to their final applications. I predict this will also apply for internal high level reasoning of AIs. This doesn’t seem true for humans.
I am having a hard time parsing this claim. What do you mean by “final applications”? And why won’t this be true for future AGIs that are at human-level intelligence or above? And why does this make a difference to the ultimate claim that you’re trying to support?
Humans seem optimized for something which isn’t that far off from utilitarianism from some perspective? Make yourself survive, make your kin group survive, make your tribe survive, etc? I think utilitarianism is often a natural generalization of “I care about the experience of XYZ, it seems arbitrary/dumb/bad to draw the boundary narrowly, so I should extend this further” (This is how I get to utilitarianism.) I think the AI optimization looks considerably worse than this by default.
This consideration seems very weak to me. Early AGIs will presumably be directly optimized for something like consumer value, which looks a lot closer to “utilitarianism” to me than the implicit values in gene-centered evolution. When I talk to GPT-4, I find that it’s way more altruistic and interested in making others happy than most humans are. This seems somewhat like utilitarianism to me, at least more so than your description of what human evolution was optimizing for. But maybe I’m just not understanding the picture you’re painting well enough. Or maybe my model of AI is wrong.
I’m in agreement that this consideration makes it hard to do a direct comparison. But I think this consideration should mostly make us more uncertain, rather than making us think that humans are better than the alternative.
Actually, I was just trying to say “I can see what humans are like, and it seems pretty good relative to my current guesses about AIs, in ways that don’t just update me up about AIs.” Sorry about the confusion.
I think utilitarianism is often a natural generalization of “I care about the experience of XYZ, it seems arbitrary/dumb/bad to draw the boundary narrowly, so I should extend this further” (This is how I get to utilitarianism.) I think the AI optimization looks considerably worse than this by default.
Why is this different between AIs and humans? Do you expect AIs to care less about experience than humans, maybe because humans get reward during lifetime learning but AIs don’t get reward during in-context learning?
I can directly observe AIs and make predictions about future training methods, and their values seem to result from a much more heavily optimized and precise process with less “slack” in some sense. (Perhaps this is related to the genetic bottleneck; I’m unsure.)
Can you say more about how slack (or genetic bottleneck) would affect whether AIs have values that are good by human lights?
Current AIs seem to use the vast, vast majority of their reasoning power for purposes which aren’t directly related to their final applications. I predict this will also apply for internal high level reasoning of AIs. This doesn’t seem true for humans.
In what sense do AIs use their reasoning power in this way? How does that affect whether they will have values that humans like?
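On the point that AIs will be trained on things very different from “cooperatively achieving high genetic fitness”: they might well be trained to cooperate with other copies on tasks, if this is the way they’ll be deployed in practice?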