Thanks, interesting idea, I think I mostly disagree and would like to see AI progress specifically slowed/halted while continuing to have advances in space exploration, biology, nuclear power, etc, and hope that if we later get safe TAI we won't have become too anti-technology/anti-growth to expand a lot. But I hadn't thought about this before and there probably is something to this; I just think it is most likely swamped by the risks from AI. It is a good reason to be careful, in pause-AI-type pitches, to be narrowly focused on frontier AI models rather than tech and science in general.
I suppose when I think about pro-expansion things I would like to see, they are really only ones that do not (IMO) increase x-risks: better institutions, more pro-natalism, space exploration, maybe cognitive enhancement.
Thanks, interesting idea, I think I mostly disagree and would like to see AI progress specifically slowed/halted while continuing to have advances in space exploration, biology, nuclear power, etc
Of those technologies, AI seems to be the only one that could be transformative, in the sense of sustaining dramatic economic growth and bringing about a giant, vibrant cosmic future. In other words, it seems you're saying we should slow down the most promising technology - the only technology that could actually take us to the future you're advocating for - but make sure not to slow down the less promising ones. The fact that people want to slow down (and halt!) precisely the technology that is most promising is basically the whole reason I'm worried here - I think my argument would be much less strong if we were talking about slowing down something like nuclear power.
I hadn't thought about this before and there probably is something to this; I just think it is most likely swamped by the risks from AI.
It's important to be clear about what we mean when we talk about the risks from AI. Do you mean:
1. The risk that AI could disempower humanity in particular?
2. The risk that AI could derail a large, vibrant cosmic civilization?
I think AI does pose a large risk in the sense of (1), but (2) is more important from a total utilitarian perspective, and it doesn't seem particularly likely to me that AIs pose a large risk in the sense of (2) (as the AIs themselves, after disempowering humanity, would presumably go on to create a big, vibrant civilization).
If you care about humanity as a species in particular, I understand the motive behind slowing down AI. On the other hand, if you're a total utilitarian (or you're concerned about the present generation of humans who might otherwise miss out on the benefits of AI), then I'm not convinced, as you seem to be, that the risks from AI outweigh the considerations that I mentioned.
It is a good reason to be careful, in pause-AI-type pitches, to be narrowly focused on frontier AI models rather than tech and science in general.
Again, the frontier AI models, in my view, are precisely what is most promising from a pro-growth perspective. So if you are worried about EAs choking off economic growth and spurring cosmic NIMBYism by establishing norms against growth, it seems from my perspective that you should be most concerned about attempts to obstruct frontier AI research.
What's the argument for why an AI future will create lots of value by total utilitarian lights?
At least for hedonistic total utilitarianism, I expect that a large majority of expected-hedonistic-value (from our current epistemic state) will be created by people who are at least partially sympathetic to hedonistic utilitarianism or other value systems that value a similar type of happiness in a scope-sensitive fashion. And I'd guess that humans are more likely to have such values than AI systems. (At least conditional on my thinking that such values are a good idea, on reflection.)
Objective-list theories of welfare seem even less likely to be endorsed by AIs. (Since they seem pretty niche to human values.)
There are certainly some values you could have that would mainly be concerned that we got any old world with a large civilization. Or that would think it morally appropriate to be happy that someone got to use the universe for what they wanted, and morally inappropriate to be too opinionated about who that should be. But I don't think that looks like utilitarianism.
We can similarly ask, "Why would an em future create lots of value by total utilitarian lights?" The answer I'd give is: it would happen for essentially the same reasons biological humans might do such a thing. For example, some biological humans are utilitarians. But some ems might be utilitarians too. Therefore, both could create lots of value by total utilitarian lights.
In order to claim that ems have a significantly lower chance of creating lots of value by total utilitarian lights than biological humans, you'd need to posit a distinction between ems and biological humans that makes this possibility plausible. Some candidate distinctions, such as the idea that ems would not be conscious because they're on a computer, seem implausible, at least in any form that could imply the conclusion. So, at least as far as I can tell, I cannot identify any such distinction; and thus, ems seem similarly likely to create lots of value by total utilitarian lights, compared to biological humans.
The exact same analysis can likewise be carried over to the case for AIs. Some biological humans are utilitarians, but some AIs might be utilitarians too. Therefore, both could create lots of value by total utilitarian lights.
In order to claim that AIs have a significantly lower chance of creating lots of value by total utilitarian lights than biological humans, you'd need to posit a distinction between AIs and biological humans that makes this possibility plausible. A number of candidate distinctions have been given to me in the past. These include:
The idea that AIs will not be conscious
The idea that AIs will care less about optimizing for extreme states of moral value
The idea that AIs will care more about optimizing imperfectly specified utility functions, which won't produce much utilitarian moral value
In each case I generally find that the candidate distinction is either poorly supported, or it does not provide strong support for the conclusion. So, just as with ems, I find the idea that AIs will have a significantly lower chance of creating lots of value by total utilitarian lights than biological humans to be weak. I do not claim that there is definitely no such distinction that would convince me of this premise. But I have yet to hear one that has compelled me so far.
Here's one line of argument:
Positive argument in favor of humans: It seems pretty likely that whatever I'd value on-reflection will be represented in a human future, since I'm a human. (And accordingly, I'm similar to many other humans along many dimensions.)
If AI values were sampled ~randomly (whatever that means), I think that the above argument would be basically enough to carry the day in favor of humans.
But here's a salient positive argument for why AIs' values will be similar to mine: People will be training AIs to be nice and helpful, which will surely push them towards better values.
However, I also expect people to be training AIs for obedience and, in particular, training them to not disempower humanity. So if we condition on a future where AIs disempower humanity, we evidently didn't have that much control over their values. This significantly weakens the strength of the argument "they'll be nice because we'll train them to be nice".
In addition: human disempowerment is more likely to succeed if AIs are willing to egregiously violate norms, such as by lying, stealing, and killing. So conditioning on human disempowerment also updates me somewhat towards egregiously norm-violating AI. That makes me feel less good about their values.
Another argument is that, in the near term, we'll train AIs to act nicely on short-horizon tasks, but we won't particularly train them to deliberate and reflect on their values well. So even if "AIs' best-guess stated values" are similar to "my best-guess stated values", there's less reason to believe that "AIs' on-reflection values" are similar to "my on-reflection values". (Whereas the basic argument for my being similar to humans still works okay: "my on-reflection values" vs. "other humans' on-reflection values".)
Edit: Oops, I accidentally switched to talking about "my on-reflection values" rather than "total utilitarian values". The former is ultimately what I care more about, though, so it is what I'm more interested in. But sorry for the switch.
The ones I would say are something like (approximately in priority order):
AI's values could result mostly from playing the training game or other relatively specific optimizations they performed in training, which might result in extremely bizarre values from our perspective.
More generally, AI values might be highly alien, in a way where caring about experience seems very strange to them.
AIs by default will be optimized for very specific commercial purposes, with narrow specializations and a variety of hyperspecific heuristics, and the resulting values and generalizations of these will be problematic.
I care ultimately about what I would think is good upon (vast amounts of) reflection and there are good a priori reasons to think this is similar to what other humans (who care about using vast amounts of compute) will end up thinking is good.
As a sub-argument, I might care about things which are much more specific than "lots of good diverse experience". And divergences from what I care about (even conditioning on something roughly utilitarian) might result in massive discounts from my perspective.
I care less about my values and preferences in worlds where they seem relatively contingent, e.g. they aren't broadly shared on reflection by reasonable fractions of humanity.
AIs don't have a genetic bottleneck and thus can learn much more specific drives that perform well, while evolution had to make values more discoverable and adaptable.
E.g. various things about empathy.
AIs might have extremely low levels of cognitive diversity in their training environments as far as co-workers go, which might result in very different attitudes towards caring about diverse experience.
Some of these can be defeated relatively easily if we train AIs specifically to be good successors, but the default AIs which end up with power over the future will not have this property.
Also, I should note that this isn't a very strong list, though in aggregate it's sufficient to make me think that human control is perhaps 4x better than AI control. (For reference, I'd say that me personally being in control is maybe 3x better than human control.) I disagree with a MIRI-style view about the disvalue of AI and the extent of fragility of value that seems implicit.
AI's values could result mostly from playing the training game or other relatively specific optimizations they performed in training
Don't humans also play the training game when being instructed to be nice/good/moral? (Humans don't do it all the time, and maybe some humans don't do it at all; but then again, I don't think every AI would play the training game all the time either.)
AIs by default will be optimized for very specific commercial purposes, with narrow specializations and a variety of hyperspecific heuristics, and the resulting values and generalizations of these will be problematic.
You should compare against human nature, which was optimized for something quite different from utilitarianism. If I add up the pros and cons of the thing humans were optimized for and compare it against the thing AIs will be optimized for, I don't see why it comes out with humans on top, from a utilitarian perspective. Can you elaborate on your reasoning here?
I care ultimately about what I would think is good upon (vast amounts of) reflection and there are good a priori reasons to think this is similar to what other humans (who care about using vast amounts of compute) will end up thinking is good.
What are these a priori reasons and why don't they similarly apply to AI?
AIs don't have a genetic bottleneck and thus can learn much more specific drives that perform well, while evolution had to make values more discoverable and adaptable.
I haven't thought about this one much, but it seems like an interesting consideration.
AIs might have extremely low levels of cognitive diversity in their training environments as far as co-workers go, which might result in very different attitudes towards caring about diverse experience.
This consideration feels quite weak to me, although you also listed it last, so I guess you might agree with my assessment.
You should compare against human nature, which was optimized for something quite different from utilitarianism. If I add up the pros and cons of the thing humans were optimized for and compare it against the thing AIs will be optimized for, I don't see why it comes out with humans on top, from a utilitarian perspective. Can you elaborate on your reasoning here?
I can't quickly elaborate in a clear way, but some messy combination of:
I can currently observe humans, which screens off a bunch of the comparison and lets me do direct analysis.
I can directly observe AIs and make predictions about future training methods, and their values seem to result from a much more heavily optimized and precise thing with less "slack" in some sense. (Perhaps this is related to the genetic bottleneck; I'm unsure.)
AIs will be primarily trained on things which look extremely different from "cooperatively achieving high genetic fitness".
Current AIs seem to use the vast, vast majority of their reasoning power for purposes which aren't directly related to their final applications. I predict this will also apply to the internal high-level reasoning of AIs. This doesn't seem true for humans.
Humans seem optimized for something which isn't that far off from utilitarianism, from some perspective? Make yourself survive, make your kin group survive, make your tribe survive, etc.? I think utilitarianism is often a natural generalization of "I care about the experience of XYZ; it seems arbitrary/dumb/bad to draw the boundary narrowly, so I should extend this further". (This is how I get to utilitarianism.) I think the AI optimization looks considerably worse than this by default.
(Again, note that I said in my comment above: "Some of these can be defeated relatively easily if we train AIs specifically to be good successors, but the default AIs which end up with power over the future will not have this property." I edited this into my prior comment, so you might have missed it, sorry.)
I can currently observe humans, which screens off a bunch of the comparison and lets me do direct analysis.
I'm in agreement that this consideration makes it hard to do a direct comparison. But I think this consideration should mostly make us more uncertain, rather than making us think that humans are better than the alternative. Analogy: if you rolled a die and I didn't see the result, the expected value is not low just because I am uncertain about what happened. What matters here is the expected value, not necessarily the variance.
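To spell out the die analogy with a quick worked example (this is just the standard fair-die calculation, not anything specific to the AI case): for a fair six-sided die, hiding the outcome leaves the expected value untouched; it only means our credence stays spread out rather than collapsing onto the realized number.

$$\mathbb{E}[X] = \frac{1+2+3+4+5+6}{6} = 3.5, \qquad \operatorname{Var}(X) = \frac{1^2+2^2+\dots+6^2}{6} - 3.5^2 = \frac{91}{6} - 12.25 \approx 2.92.$$

Not observing the roll shows up in the variance of our estimate, not in the mean, which is the sense in which uncertainty on its own shouldn't push the expected value down.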
I can directly observe AIs and make predictions about future training methods, and their values seem to result from a much more heavily optimized and precise thing with less "slack" in some sense. (Perhaps this is related to the genetic bottleneck; I'm unsure.)
I guess I am having trouble understanding this point.
AIs will be primarily trained on things which look extremely different from "cooperatively achieving high genetic fitness".
Sure, but the question is why being different makes it worse along the relevant axes that we were discussing. The question is not just "will AIs be different than humans?", to which the answer would be "Obviously, yes". We're talking about why the differences between humans and AIs make AIs better or worse in expectation, not merely different.
Current AIs seem to use the vast, vast majority of their reasoning power for purposes which aren't directly related to their final applications. I predict this will also apply to the internal high-level reasoning of AIs. This doesn't seem true for humans.
I am having a hard time parsing this claim. What do you mean by "final applications"? And why won't this be true for future AGIs that are at human-level intelligence or above? And why does this make a difference to the ultimate claim that you're trying to support?
Humans seem optimized for something which isn't that far off from utilitarianism, from some perspective? Make yourself survive, make your kin group survive, make your tribe survive, etc.? I think utilitarianism is often a natural generalization of "I care about the experience of XYZ; it seems arbitrary/dumb/bad to draw the boundary narrowly, so I should extend this further". (This is how I get to utilitarianism.) I think the AI optimization looks considerably worse than this by default.
This consideration seems very weak to me. Early AGIs will presumably be directly optimized for something like consumer value, which looks a lot closer to "utilitarianism" to me than the implicit values in gene-centered evolution. When I talk to GPT-4, I find that it's way more altruistic and interested in making others happy than most humans are. This seems at least a little bit like utilitarianism to me - at least more than your description of what human evolution was optimizing for. But maybe I'm just not understanding the picture you're painting well enough. Or maybe my model of AI is wrong.
I'm in agreement that this consideration makes it hard to do a direct comparison. But I think this consideration should mostly make us more uncertain, rather than making us think that humans are better than the alternative.
Actually, I was just trying to say "I can see what humans are like, and it seems pretty good relative to my current guesses about AIs, in ways that don't just update me up about AIs" - sorry about the confusion.
I think utilitarianism is often a natural generalization of "I care about the experience of XYZ; it seems arbitrary/dumb/bad to draw the boundary narrowly, so I should extend this further". (This is how I get to utilitarianism.) I think the AI optimization looks considerably worse than this by default.
Why is this different between AIs and humans? Do you expect AIs to care less about experience than humans, maybe because humans get reward during lifetime learning but AIs don't get reward during in-context learning?
I can directly observe AIs and make predictions about future training methods, and their values seem to result from a much more heavily optimized and precise thing with less "slack" in some sense. (Perhaps this is related to the genetic bottleneck; I'm unsure.)
Can you say more about how slack (or genetic bottleneck) would affect whether AIs have values that are good by human lights?
They might well be trained to cooperate with other copies on tasks, if this is the way they'll be deployed in practice?
Current AIs seem to use the vast, vast majority of their reasoning power for purposes which aren't directly related to their final applications. I predict this will also apply to the internal high-level reasoning of AIs. This doesn't seem true for humans.
In what sense do AIs use their reasoning power in this way? How does that affect whether they will have values that humans like?
I am a human. Other humans might end up in a similar spot on reflection.
(Also, I care less about values of mine which are highly contingent wrt humans.)
"Human" is just one category you belong to. You're also a member of the category "intelligent beings", which you will share with AGIs. Another category you share with near-future AGIs is "beings who were trained on 21st century cultural data". I guess 12th century humans aren't in that category, so maybe we don't share their values?
Perhaps the category that matters is your nationality. Or maybe it's "beings in the Milky Way", and you wouldn't trust people from Andromeda? (To be clear, this is rhetorical, not an actual suggestion.)
My point here is that I think your argument could benefit from some rigor by specifying exactly what about being human makes someone share your values in the sense you are describing. As it stands, this reasoning seems quite shallow to me.
Currently, humans seem much closer to me on a values level than GPT-4 base. I think this is also likely to be true of future AIs, though I understand why you might not find this convincing.
I think the architecture (learning algorithm, etc.) and training environment between me and other humans seem vastly more similar than between me and likely AIs.
I don't think I'm going to flesh this argument out to an extent to which you'd find it sufficiently rigorous or convincing, sorry.
I don't think I'm going to flesh this argument out to an extent to which you'd find it sufficiently rigorous or convincing, sorry.
Getting a bit meta for a moment, I'm curious (if you'd like to answer) whether you feel that you won't explain your views rigorously in a convincing way here mainly because (1) you are uncertain about these specific views, (2) you think your views are inherently difficult or costly to explain despite nonetheless being very compelling, (3) you think I can't understand your views easily because I'm lacking some bedrock intuitions that are too costly to convey, or (4) some other option.
My views are reasonably messy, complicated, hard to articulate, and based on a relatively diffuse set of intuitions. I think we also reason in a pretty different way about the situation than you seem to (option 3). I think it wouldn't be impossible to try to write up a post on my views, but I would need to consolidate and think about how exactly to express where I'm at. (Maybe 2-5 person-days of work.) I haven't really consolidated my views or reached something close to reflective equilibrium.
I also just think that arguing about pure philosophy very rarely gets anywhere and is very hard to make convincing in general.
I'm somewhat uncertain on the "inside view/mechanistic" level. (But my all-considered view partially involves deferring to some people, which makes me overall less worried that I should immediately reconsider my life choices.)
I think my views are compelling, but I'm not sure if I'd say "very compelling".
My guess now of where we most disagree is regarding the value of a world where AIs disempower humanity and go on to have a vast, technologically super-advanced, rapidly expanding civilisation. I think this would quite likely be ~0 value, since we don't really understand consciousness at all, and my guess is that AIs aren't yet conscious, and that if we relatively quickly get to TAI in the current paradigm they probably still won't be moral patients. As a sentientist, I don't really care whether there is a huge future if humans (or something sufficiently related to humans, e.g. if we carefully study consciousness for a millennium and create digital people we are very confident have morally important experiences to be our successors) aren't in it.
So yes, I agree frontier AI models are where the most transformative potential lies, but I would prefer to get there far later, once we understand alignment and consciousness far better (while other, less important tech progress continues in the meantime).
My guess now of where we most disagree is regarding the value of a world where AIs disempower humanity and go on to have a vast, technologically super-advanced, rapidly expanding civilisation. I think this would quite likely be ~0 value, since we don't really understand consciousness at all, and my guess is that AIs aren't yet conscious, and that if we relatively quickly get to TAI in the current paradigm they probably still won't be moral patients.
Thanks. I disagree with this for the following reasons:
AIs will get more complex over time, even in our current paradigm. Eventually I expect AIs will have highly sophisticated cognition that I'd feel comfortable calling conscious, on our current path of development (I'm an illusionist about phenomenal consciousness, so I don't think there's a fact of the matter anyway).
If we slowed down AI, I don't think that would necessarily translate into a higher likelihood that future AIs will be conscious. Why would it?
In the absence of a strong argument that slowing down AI makes future AIs more likely to be conscious, I still think the considerations I mentioned are stronger than the counter-considerations you've mentioned here, and I think they should push us towards trying to avoid entrenching norms that could hamper future growth and innovation.