I’m curious why there hasn’t been more work exploring a pro-AI or pro-AI-acceleration position from an effective altruist perspective. Some points:
Unlike existential risk from other sources (e.g. an asteroid), AI x-risk is unique because humans would be replaced by other beings, rather than completely dying out. This means you can’t simply apply a naive argument that AI threatens total extinction of value to make the case that AI safety is astronomically important, in the sense that you can for other x-risks. You generally need additional assumptions.
Total utilitarianism is generally seen as non-speciesist, and therefore has no intrinsic preference for human values over unaligned AI values. If AIs are conscious, there don’t appear to be strong prima facie reasons for preferring humans to AIs under hedonistic utilitarianism. Under preference utilitarianism, it doesn’t necessarily matter whether AIs are conscious.
Total utilitarianism generally recommends large population sizes. Accelerating AI can be modeled as a kind of “population accelerationism”. Extremely large AI populations could be preferable under utilitarianism compared to small human populations, even those with high per-capita incomes (a toy calculation below illustrates this). Indeed, human populations have recently stagnated due to low population growth rates, and AI promises to lift this bottleneck.
Therefore, AI accelerationism seems straightforwardly recommended by total utilitarianism under some plausible theories.
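To make the population point concrete, here is a toy calculation (the functional form, the symbols, and the subsistence-level framing are my own illustrative assumptions, not something the argument above commits to). Suppose total resources R are fixed, each of N individuals consumes c = R/N, and per-capita welfare is u(c) = ln(c/c_s), where c_s is the consumption level at which a life is barely worth living. Then total utility is

$$U(N) = N \ln\!\left(\frac{R}{N c_s}\right), \qquad \frac{dU}{dN} = \ln\!\left(\frac{R}{N c_s}\right) - 1 = 0 \;\Rightarrow\; c^{*} = \frac{R}{N^{*}} = e \cdot c_s.$$

Under these assumptions total utility peaks when per-capita consumption is only about 2.7 times the subsistence level, so a very large population at modest incomes outscores a small population at high per-capita incomes.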
Here’s a non-exhaustive list of guesses for why I think EAs haven’t historically been sympathetic to arguments like the one above, and have instead generally advocated AI safety over AI acceleration (at least when these two values conflict):
A belief that AIs won’t be conscious, and therefore won’t have much moral value compared to humans.
But why would we assume AIs won’t be conscious? For example, if Brian Tomasik is right, consciousness is somewhat universal, rather than being restricted to humans or members of the animal kingdom.
I also haven’t actually seen much EA literature defend this assumption explicitly, which would be odd if this belief is the primary reason EAs have for focusing on AI safety over AI acceleration.
A presumption in favor of human values over unaligned AI values for some reasons that aren’t based on strict impartial utilitarian arguments. These could include the beliefs that: (1) Humans are more likely to have “interesting” values compared to AIs, and (2) Humans are more likely to be motivated by moral arguments than AIs, and are more likely to reach a deliberative equilibrium of something like “ideal moral values” compared to AIs.
Why would humans be more likely to have “interesting” values than AIs? It seems very plausible that AIs will have interesting values even if their motives seem alien to us. AIs might have even more “interesting” values than humans.
It seems to me like wishful thinking to assume that humans are strongly motivated by moral arguments and would settle upon something like “ideal moral values”.
A belief that population growth is inevitable, so it is better to focus on AI safety.
But a central question here is why pushing for AI safety—in the sense of AI research that enhances human interests—is better than the alternative on the margin. What reason is there to think AI safety now is better than pushing for greater AI population growth now? (Potential responses to this question are outlined in other bullet points above and below.)
AI safety has lasting effects due to a future value lock-in event, whereas accelerationism would have, at best, temporary effects.
Are you sure there will ever actually be a “value lock-in event”?
Even if there is at some point a value lock-in event, wouldn’t pushing for accelerationism also plausibly affect the values that are locked in? For example, the value of “population growth is good” seems more likely to be locked in, if you advocate for that now.
A belief that humans would be kinder and more benevolent than unaligned AIs.
Humans seem pretty bad already. For example, humans are responsible for factory farming. It’s plausible that AIs could be even more callous and morally indifferent than humans, but the bar already seems low.
I’m also not convinced that moral values will be a major force shaping “what happens to the cosmic endowment”. It seems to me that the forces shaping economic consumption matter more than moral values.
A bedrock heuristic that it would be extraordinarily bad if “we all died from AI”, and therefore we should pursue AI safety over AI accelerationism.
But it would also be bad if we all died from old age while waiting for AI, and missed out on all the benefits that AI offers to humans, which is a point in favor of acceleration. Why would this heuristic be weaker?
An adherence to person-affecting views in which the values of currently-existing humans are what matter most; and a belief that AI threatens to kill existing humans.
But in this view, AI accelerationism could easily be favored since AIs could greatly benefit existing humans by extending our lifespans and enriching our lives with advanced technology.
An implicit acceptance of human supremacism, i.e. the idea that what matters is propagating the interests of the human species, or preserving the human species, even at the expense of individual interests (either within humanity or outside humanity) or the interests of other species.
But isn’t EA known for being unusually anti-speciesist compared to other communities? Peter Singer is often seen as a “founding father” of the movement, and a huge part of his ethical philosophy was about how we shouldn’t be human supremacists.
More generally, it seems wrong to care about preserving the “human species” in an abstract sense relative to preserving the current generation of actually living humans.
A belief that most humans are biased towards acceleration over safety, and therefore it is better for EAs to focus on safety as a useful correction mechanism for society.
But was an anti-safety bias common for previous technologies? I think something closer to the opposite is probably true: most humans seem, if anything, biased towards being overly cautious about new technologies rather than overly optimistic.
A belief that society is massively underrating the potential for AI, which favors extra work on AI safety, since it’s so neglected.
But if society is massively underrating AI, shouldn’t this also favor accelerating AI? There doesn’t seem to be an obvious asymmetry between these two values.
An adherence to negative utilitarianism, which would favor obstructing AI, along with any other technology that could enable the population of conscious minds to expand.
This seems like a plausible moral argument to me, but it doesn’t seem like a very popular position among EAs.
A heuristic that “change is generally bad” and AI represents a gigantic change.
I don’t think many EAs would defend this heuristic explicitly.
Added: AI represents a large change to the world. Delaying AI therefore preserves option value.
This heuristic seems like it would have favored advocating delaying the industrial revolution, and all sorts of moral, social, and technological changes to the world in the past. Is that a position that EAs would be willing to bite the bullet on?
My understanding is that relatively few EAs are actual hardcore classic hedonist utilitarians. I think this is ~sufficient to explain why more haven’t become accelerationists.
Have you cornered a classic hedonist utilitarian EA and asked them? Have you cornered three? What did they say?
Don’t know why this is being disagree-voted. I think point 1 is basically correct: you don’t have to diverge far from being a “hardcore classic hedonist utilitarian” to not support the case Matthew makes in the OP.
I think a more important reason is the additional value of the information and the option value. It’s very likely that the change resulting from AI development will be irreversible. Since we’re still able to learn about AI as we study it, taking additional time to think and plan before training the most powerful AI systems seems to reduce the likelihood of being locked into suboptimal outcomes. Increasing the likelihood of achieving “utopia” rather than landing in “mediocrity” by 2 percent seems far more important than speeding up utopia by 10 years.
It’s very likely that whatever change comes from AI development will be irreversible.
I think all actions are in a sense irreversible, but large changes tend to be less reversible than small changes. In this sense, the argument you gave seems reducible to “we should generally delay large changes to the world, to preserve option value”. Is that a reasonable summary?
In this case I think it’s just not obvious that delaying large changes is good. Would it have been good to delay the industrial revolution to preserve option value? I think this heuristic, if used in the past, would have generally demanded that we “pause” all sorts of social, material, and moral progress, which seems wrong.
I don’t think we would have been able to use the additional information we would have gained from delaying the industrial revolution, but I think if we could have, the answer might be “yes”. It’s easy to see in hindsight that it went well overall, but that doesn’t mean that the correct ex ante attitude shouldn’t have been caution!
AI x-risk is unique because humans would be replaced by other beings, rather than completely dying out. This means you can’t simply apply a naive argument that AI threatens total extinction of value
Paul Christiano wrote a piece a few years ago about ensuring that misaligned ASI is a “good successor” (in the moral value sense),[1] as a plan B to alignment (Medium version; LW version). I agree it’s odd that there hasn’t been more discussion since.[2]
Here’s a non-exhaustive list of guesses for why I think EAs haven’t historically been sympathetic [...]: A belief that AIs won’t be conscious, and therefore won’t have much moral value compared to humans.
accelerationism would have, at best, temporary effects
I’m confused by this point, and for me this is the overriding crux between my view and yours. Do you really not think accelerationism could have permanent effects, through making AI takeover, or some other irredeemable outcome, more likely?
Are you sure there will ever actually be a “value lock-in event”?
Although, Paul’s argument routes through acausal cooperation—see the piece for details—rather than through the ASI being morally valuable in itself. (And perhaps OP means to focus on the latter issue.) In Paul’s words:
Clarification: Being good vs. wanting good
We should distinguish two properties an AI might have:
- Having preferences whose satisfaction we regard as morally desirable.
- Being a moral patient, e.g. being able to suffer in a morally relevant way.
These are not the same. They may be related, but they are related in an extremely complex and subtle way. From the perspective of the long-run future, we mostly care about the first property.
Under purely longtermist views, accelerating AI by 1 year increases available cosmic resources by 1 part in 10 billion. This is tiny. So the first order effects of acceleration are tiny from a longtermist perspective.
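One crude way to see where a number of this order of magnitude can come from (a schematic sketch; the linear-decline assumption and the roughly 10^10-year horizon are my simplifications, not the commenter’s): if the resources a civilization can ultimately reach decline roughly linearly to zero over a horizon T set by cosmic expansion carrying galaxies out of reach, then starting the expansion one year earlier changes the total by about

$$\frac{\Delta R}{R} \approx \frac{\Delta t}{T} \approx \frac{1\ \text{year}}{10^{10}\ \text{years}} = 10^{-10},$$

i.e. one part in ten billion.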
Thus, a purely longtermist perspective doesn’t care about the direct effects of delay/acceleration and the question would come down to indirect effects.
I can see indirect effects going either way, but delay seems better on current margins (this might depend on how optimistic you are about current AI safety progress, governance/policy progress, and whether you think humanity retaining control relative to AIs is good or bad). All of these topics have been explored and discussed to some extent.
I expect there hasn’t been much investigation of accelerating AI to advance the preferences of currently existing people because this exists at a point on the crazy train that very few people are at. See also the curse of cryonics:
the “curse of cryonics” is when a problem is both weird and very important, but it’s sitting right next to other weird problems that are even more important, so everyone who’s able to notice weird problems works on something else instead.
Under purely longtermist views, accelerating AI by 1 year increases available cosmic resources by 1 part in 10 billion. This is tiny.
Tiny compared to what? Are you assuming we can take some other action whose consequences don’t wash out over the long-term, e.g. because of a value lock-in? In general, these assumptions just seem quite weak and underspecified to me.
What exactly is the alternative action that has vastly greater value in expectation, and why does it have greater value? If what you mean is that we can try to reduce the risk of extinction instead, keep in mind that my first bullet point preempted pretty much this exact argument:
Unlike existential risk from other sources (e.g. an asteroid), AI x-risk is unique because humans would be replaced by other beings, rather than completely dying out. This means you can’t simply apply a naive argument that AI threatens total extinction of value to make the case that AI safety is astronomically important, in the sense that you can for other x-risks. You generally need additional assumptions.
What exactly is the alternative action that has vastly greater value in expectation, and why does it have greater value?
Ensuring human control throughout the singularity rather than having AIs get control very obviously has relatively massive effects. Of course, we can debate the sign here; I’m just making a claim about the magnitude.
I’m not talking about extinction of all smart beings on earth (AIs and humans), which seems like a small fraction of existential risk.
(Separately, the badness of such extinction seems maybe somewhat overrated, because intelligent life would pretty likely just re-evolve in the next 300 million years. Intelligent life doesn’t seem that contingent. Also aliens.)
I think it remains the case that the value of accelerating AI progress is tiny relative to other apparently available interventions, such as ensuring that AIs are sentient or improving their expected well-being conditional on their being sentient. The case for focusing on how a transformative technology unfolds, rather than on when it unfolds,[1] seems robust to a relatively wide range of technologies and assumptions. Still, this seems worth further investigation.
Indeed, it seems that when the transformation unfolds is primarily important because of how it unfolds, insofar as the quality of a transformation is partly determined by its timing.
I’m claiming that it is not actually clear that we can take actions that don’t merely wash out over the long-term. In this case, you cannot simply assume that we can meaningfully and predictably affect how valuable the long-term future will be in, for example, billions of years. I agree that, yes, if you assume we can meaningfully affect the very long-run, then all actions that merely have short-term effects will have “tiny” impacts by comparison. But the assumption that we can meaningfully and predictably affect the long-run is precisely the thing that needs to be argued. I think it’s important for EAs to try to be more rigorous about their empirical claims here.
Moreover, actions that have short-term effects can generally be assumed to have longer term effects if our actions propagate. For example, support for larger population sizes now would presumably increase the probability that larger population sizes exist in the very long run, compared to the alternative of smaller population sizes with high per capita incomes. It seems arbitrary to assume this effect will be negligible but then also assume other competing effects won’t be negligible. I don’t see any strong arguments for this position.
I was trying to hint at prima facie plausible ways in which the present generation can increase the value of the long-term future by more than one part in billions, rather than “assume” that this is the case, though of course I never gave anything resembling a rigorous argument.
I do agree that the “washing out” hypothesis is a reasonable default and that one needs a positive reason for expecting our present actions to persist into the long-term. One seemingly plausible mechanism is influencing how a transformative technology unfolds: it seems that the first generation that creates AGI has significantly more influence on how much artificial sentience there is in the universe a trillion years from now than, say, the millionth generation. Do you disagree with this claim?
I’m not sure I understand the point you make in the second paragraph. What would be the predictable long-term effects of hastening the arrival of AGI in the short-term?
I was trying to hint at prima facie plausible ways in which the present generation can increase the value of the long-term future by more than one part in billions, rather than “assume” that this is the case, though of course I never gave anything resembling a rigorous argument.
As I understand it, the argument originally given was that there was a tiny effect of pushing for AI acceleration, which seems outweighed by unnamed and gigantic “indirect” effects in the long-run from alternative strategies of improving the long-run future. I responded by trying to get more clarity on what these gigantic indirect effects actually are, how we can predictably bring them about, and why we would think it’s plausible that we could bring them about in the first place. From my perspective, the shape of this argument looks something like:
Your action X has this tiny positive near-term effect (ETA: or a tiny direct effect)
My action Y has this large positive long-term effect (ETA: or a large indirect effect)
Therefore, Y is better than X.
Do you see the flaw here? Well, both X and Y could have long-term effects! So, it’s not sufficient to compare the short-term effect of X to the long-term effect of Y. You need to compare both effects, on both time horizons. As far as I can tell, I haven’t seen any argument in this thread that analyzed and compared the long-term effects in any detail, except perhaps in Ryan Greenblatt’s original comment, in which he linked to some other comments about a similar topic in a different thread (but I still don’t see what the exact argument is).
More generally, I think you’re probably trying to point to some concept you think is obvious and clear here, and I’m not seeing it, which is why I’m asking you to be more precise and rigorous about what you’re actually claiming.
I’m not sure I understand the point you make in the second paragraph. What would be the predictable long-term effect of hastening the arrival of AGI in the short-term?
In my original comment I pointed towards a mechanism. Here’s a more precise characterization of the argument:
Total utilitarianism generally supports, all else being equal, larger population sizes with low per capita incomes over small population sizes with high per capita incomes.
To the extent that our actions do not “wash out”, it seems reasonable to assume that pushing for large population sizes now would make it more likely in the long-run that we get large population sizes with low per-capita incomes compared to a small population size with high per capita incomes. (Keep in mind here that I’m not making any claim about the total level of resources.)
To respond to this argument you could say that in fact our actions do “wash out” here, so as to make the effect of pushing for larger population sizes rather small in the long run. But in response to that argument, I claim that this objection can be reversed and applied to almost any alternative strategy for improving the future that you might think is actually better. (In other words, I actually need to see your reasons for why there’s an asymmetry here; and I don’t currently see these reasons.)
Alternatively, you could just say that total utilitarianism is unreasonable and a bad ethical theory, but my original comment was about analyzing the claim about accelerating AI from the perspective of total utilitarianism, which, as a theory, seems to be relatively popular among EAs. So I’d prefer to keep this discussion grounded within that context.
Yes, I agree that we should consider the long-term effects of each intervention when comparing them. I focused on the short-term effects of hastening AI progress because it is those effects that are normally cited as the relevant justification in EA/utilitarian discussions of that intervention. For instance, those are the effects that Bostrom considers in ‘Astronomical waste’. Conceivably, there is a separate argument that appeals to the beneficial long-term effects of AI capability acceleration. I haven’t considered this argument because I haven’t seen many people make it, so I assume that accelerationist types tend to believe that the short-term effects dominate.
I think Bostrom’s argument merely compares a pure x-risk (such as a huge asteroid hurtling towards Earth) relative to technological acceleration, and then concludes that reducing the probability of a pure x-risk is more important because the x-risk threatens the eventual colonization of the universe. I agree with this argument in the case of a pure x-risk, but as I noted in my original comment, I don’t think that AI risk is a pure x-risk.
If, by contrast, all we’re doing by doing AI safety research is influencing something like “the values of the agents in society in the future” (and not actually influencing the probability of eventual colonization), then this action seems to plausibly just wash out in the long-term. In this case, it seems very appropriate to compare the short-term effects of AI safety to the short-term effects of acceleration.
Let me put it another way. We can think about two (potentially competing) strategies for making the future better, along with their relevant short and possible long-term effects:
Doing AI safety research
Short-term effects: makes it more likely that AIs are kind to current or near-future humans
Possible long-term effect: makes it more likely that AIs in the very long-run will share the values of the human species, relative to some unaligned alternative
Accelerating AI
Short-term effect: helps current humans by hastening the arrival of advanced technology
Possible long-term effect: makes it more likely that we have a large population size at low per capita incomes, relative to a low population size with high per capita income
My opinion is that both of these long-term effects are very speculative, so it’s generally better to focus on a heuristic of doing what’s better in the short-term, while keeping the long-term consequences in mind. And when I do that, I do not come to a strong conclusion that AI safety research “beats” AI acceleration, from a total utilitarian perspective.
Your action X has this tiny positive near-term effect.
My action Y has this large positive long-term effect.
Therefore, Y is better than X.
To be clear, this wasn’t the structure of my original argument (though it might be Pablo’s). My argument was more like “you seem to be implying that action X is good because of its direct effect (literal first order acceleration), but actually the direct effect is small when considered from a particular perspective (longtermism), so from that perspective we need to consider indirect effects, and the analysis for that looks pretty different”.
Note that I wasn’t really trying to argue much about the sign of the indirect effect, though people have indeed discussed this in some detail in various contexts.
I agree your original argument was slightly different than the form I stated. I was speaking too loosely, and conflated what I thought Pablo might be thinking with what you stated originally.
I think the important claim from my comment is “As far as I can tell, I haven’t seen any argument in this thread that analyzed and compared the long-term effects in any detail, except perhaps in Ryan Greenblatt’s original comment, in which he linked to some other comments about a similar topic in a different thread (but I still don’t see what the exact argument is).”
I think the important claim from my comment is “As far as I can tell, I haven’t seen any argument in this thread that analyzed and compared the long-term effects in any detail, except perhaps in Ryan Greenblatt’s original comment, in which he linked to some other comments about a similar topic in a different thread (but I still don’t see what the exact argument is).”
Explicitly confirming that this seems right to me.
Moreover, actions that have short-term effects can generally be assumed to have longer term effects if our actions propagate.
I don’t disagree with this. I was just claiming that the “indirect” effects dominate (by indirect, I just mean effects other than shifting the future closer in time).
There is still the question of indirect/direct effects.
I was just claiming that the “indirect” effects dominate (by indirect, I just mean effects other than shifting the future closer in time).
I understand that. I wanted to know why you thought that. I’m asking for clarity. I don’t currently understand your reasons. See this recent comment of mine for more info.
I generally agree that we should be more concerned about this. In particular, I find it suspect when people happily endorse Shut Up and Multiply-style reasoning but reject this consideration.
A more extreme version of this is that, given the massively greater efficiency with which a digital consciousness could convert matter and energy to utilons (IIRC naively about 3 orders of magnitude according to Bostrom, before any increase from greater coordination), on strict expected value reasoning you have to be extremely confident that this won’t happen—or at least have a much stronger rebuttal than ‘AI won’t necessarily be conscious’.
Separately, I think there might be a case for accelerationism even if you think it increases the risk of AI takeover and that AI takeover is bad, on the grounds that in many scenarios advancing faster might still increase the probability of human descendants getting through the time of perils before some other threat destroys us (every year we remain in our current state is another year in which we run the risk of, for example, a global nuclear war or civilisation-ending pandemic).
A more extreme version of this is that, given the massively greater efficiency with which a digital consciousness could convert matter and energy to utilons
I have a post where I conclude the above may well apply not only to digital consciousness, but also to animals (the metric behind the numbers is spelled out just after the quoted list below):
I calculated the welfare ranges per calorie consumption for a few species.
They vary a lot. The values for bees and pigs are 4.88 k and 0.473 times as high as that for humans.
They are higher for non-human animals:
5 of the 6 species I analysed have values higher than that of humans.
The lower the calorie consumption, the higher the median welfare range per calorie consumption.
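Spelling out the metric behind those numbers (the notation is mine; the figures are from the quoted post): for a species s with welfare range W_s and calorie consumption C_s, define

$$\rho_s = \frac{W_s}{C_s}, \qquad \text{relative value} = \frac{\rho_s}{\rho_{\text{human}}},$$

so the quoted 4.88 k figure for bees means ρ_bee is roughly 4,880 times ρ_human: per calorie consumed, bees are estimated to have a far larger welfare range than humans.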
A lot of these points seem like arguments that it’s possible that unaligned AI takeover will go well, e.g. there’s no reason to think that AIs won’t be conscious, or that they won’t have interesting moral values, etc.
My stance is that we (more-or-less) know humans are conscious and have moral values that, while they have failed to prevent large amounts of harm, seem to have the potential to be good. AIs may be conscious and may have welfare-promoting values, but we don’t know that yet. We should try to better understand whether AIs are worthy successors before transitioning power to them.
Probably a core point of disagreement here is whether, presented with a “random” intelligent actor, we should expect it to promote welfare or prevent suffering “by default”. My understanding is that some accelerationists believe that we should. I believe that we shouldn’t. Moreover I believe that it’s enough to be substantially uncertain about whether this is or isn’t the default to want to take a slower and more careful approach.
My stance is that we (more-or-less) know humans are conscious and have moral values that, while they have failed to prevent large amounts of harm, seem to have the potential to be good.
I claim there’s a weird asymmetry here where you’re happy to put trust into humans because they have the “potential” to do good, but you’re not willing to say the same for AIs, even though they seem to have the same type of “potential”.
Whatever your expectations about AIs, we already know that humans are not blank slates that may or may not be altruistic in the future: we actually have a ton of evidence about the quality and character of human nature, and it doesn’t make humans look great. Humans are not mainly described as altruistic creatures. I mentioned factory farming in my original comment, but one can examine the way people spend their money (i.e. not mainly on charitable causes), or the history of genocides, war, slavery, and oppression for additional evidence.
Probably a core point of disagreement here is whether, presented with a “random” intelligent actor, we should expect it to promote welfare or prevent suffering “by default”.
I don’t expect humans to “promote welfare or prevent suffering” by default either. Look at the current world. Have humans, on net, reduced or increased suffering? Even if you think humans have been good for the world, it’s not obvious. Sure, it’s easy to dismiss the value of unaligned AIs if you compare against some idealistic baseline; but I’m asking you to compare against a realistic baseline, i.e. actual human nature.
It seems like you’re just substantially more pessimistic than I am about humans. I think factory farming will be ended, and though it seems like humans have caused more suffering than happiness so far, I think their default trajectory will be to eventually stop doing that, and to ultimately do enough good to outweigh their ignoble past. I don’t think this is certain by any means, but I think it’s a reasonable extrapolation. (I maybe don’t expect you to find it a reasonable extrapolation.)
Meanwhile I expect the typical unaligned AI may seize power for some purpose that seems to us entirely trivial, and may be uninterested in doing any kind of moral philosophy, and/or may not place any terminal (rather than instrumental) value in paying attention to other sentient experiences in any capacity. I do think humans, even with their kind of terrible track record, are more promising than that baseline, though I can see why other people might think differently.
Sure, it’s easy to dismiss the value of unaligned AIs if you compare against some idealistic baseline; but I’m asking you to compare against a realistic baseline, i.e. actual human nature.
I haven’t read your entire post about this, but I understand you believe that if we created aligned AI, it would get essentially “current” human values, rather than e.g. some improved / more enlightened iteration of human values. If instead you believed the latter, that would set a significantly higher bar for unaligned AI, right?
If instead you believed the latter, that would set a significantly higher bar for unaligned AI, right?
That’s right, if I thought human values would improve greatly in the face of enormous wealth and advanced technology, I’d definitely be open to seeing humans as special and extra valuable from a total utilitarian perspective. Note that many routes through which values could improve in the future could apply to unaligned AIs too. So, for example, I’d need to believe that humans would be more likely to reflect, and be more likely to do the right type of reflection, relative to the unaligned baseline. In other words it’s not sufficient to argue that humans would reflect a little bit; that wouldn’t really persuade me at all.
I think there is very likely at some point going to be some sort of transition to a world where AIs are effectively in control. It seems worth it to slow down on the margin to try to shape this transition as best we can, especially slowing it down as we get closer to AGI and ASI. It would be surprising to me if making the transfer of power more voluntary/careful led to worse outcomes (or only led to slightly better outcomes such that the downsides of slowing down a bit made things worse).
Delaying the arrival of AGI by a few years as we get close to it seems good regardless of parameters like the value of involuntary-AI-disempowerment futures. But delaying the arrival by 100s of years seems more likely bad due to the tradeoff with other risks.
It would be surprising to me if making the transfer of power more voluntary/careful led to worse outcomes (or only led to slightly better outcomes such that the downsides of slowing down a bit made things worse).
Two questions here:
Why would accelerating AI make the transition less voluntary? (In my own mind, I’d be inclined to reverse this sentiment a bit: delaying AI by regulation generally involves forcibly stopping people from adopting AI. Force might be justified if it brings about a greater good, but that’s not the argument here.)
I can understand being “careful”. Being careful does seem like a good thing. But “being careful” generally trades off against other values in almost every domain I can think of, and there is such a thing as too much of a good thing. What reason is there to think that pushing for “more caution” is better on the margin compared to acceleration, especially considering society’s default response to AI in the absence of intervention?
So in the multi-agent slowly-replacing case, I’d argue that individual decisions don’t necessarily represent a voluntary decision on behalf of society (I’m imagining something like this scenario). In the misaligned power-seeking case, it seems obvious to me that this is involuntary. I agree that it technically could be a collective voluntary decision to hand over power more quickly, though (and in that case I’d be somewhat less against it).
I think emre’s comment lays out the intuitive case for being careful / taking your time, as does Ryan’s. I think the empirics are a bit messy once you take into account benefits of preventing other risks but I’d guess they come out in favor of delaying by at least a few years.
A presumption in favor of human values over unaligned AI values for some reasons that aren’t based on strict impartial utilitarian arguments. These could include the beliefs that: (1) Humans are more likely to have “interesting” values compared to AIs, and (2) Humans are more likely to be motivated by moral arguments than AIs, and are more likely to reach a deliberative equilibrium of something like “ideal moral values” compared to AIs.
I don’t think this is a crux. Even if you prefer unaligned AI values over likely human values (weighted by power), you’d probably prefer doing research on further improving AI values over speeding things up.
I think misaligned AI values should be expected to be worse than human values, because it’s not clear that misaligned AI systems would care about e.g. their own welfare.
Inasmuch as we expect misaligned AI systems to be conscious (or whatever we need to care about them) and also to be good at looking after their own interests, I agree that it’s not clear from a total utilitarian perspective that the outcome would be bad.
But the “values” of a misaligned AI system could be pretty arbitrary, so I don’t think we should expect that.
So I think it’s likely you have some very different beliefs from most people/EAs/myself, particularly:
Thinking that humans/humanity is bad, and AI is likely to be better
Thinking that humanity isn’t driven by ideational/moral concerns[1]
Thinking that AI is very likely to be conscious, moral (as in, making better moral judgements than humans), and that the current/default trend in the industry is very likely to make AIs conscious moral agents in a way humans aren’t
I don’t know if the total utilitarian/accelerationist position in the OP is yours or not. I think Daniel is right that most EAs don’t have this position. I think maybe Peter Singer gets closest to this in his interview with Tyler on the “would you side with the Aliens or not” question here. But the answer to your descriptive question is simply that most EAs don’t have the combination of moral and empirical views about the world that would make the argument you present valid and sound, so that’s why there isn’t much talk in EA about naïve accelerationism.
Going off the vibe I get from this view, though, I think it’s a good heuristic that if your moral view sounds like a movie villain’s monologue it might be worth reflecting on, and a lot of this post reminded me of the Earth-Trisolaris Organisation from Cixin Liu’s The Three-Body Problem. If someone’s honest moral view is “Eliminate human tyranny! The world belongs to Trisolaris AIs!” then I don’t know what else there is to do except quote Zvi’s phrase “please speak directly into this microphone”.
Another big issue I have with this post is that some of the counter-arguments just seem a bit like ‘nu-uh’, see:
But why would we assume AIs won’t be conscious?
Why would humans be more likely to have “interesting” values than AIs?
But it would also be bad if we all died from old age while waiting for AI, and missed out on all the benefits that AI offers to humans, which is a point in favor of acceleration. Why would this heuristic be weaker?
These (and other examples) are considerations for sure, but they need to be argued for. I don’t think one can just state them and then say “therefore, ACCELERATE!”. I agree that AI Safety research needs to be more robust and the philosophical assumptions and views made more explicit, but one could already think of some counters to the questions that you raise, and I’m sure you already have them. For example, you might take a view (à la Peter Godfrey-Smith) that a certain biological substrate is necessary for consciousness.
Similarly, on total utilitarianism’s emphasis on larger population sizes: agreed, to the extent that the greater population increases total utility, but this is the repugnant conclusion again. There’s a stopping point even in that scenario where an ever larger population decreases total utility, which is why in Parfit’s scenario it’s full of potatoes and muzak rather than humans crammed into battery cages like factory-farmed animals. Empirically, naïve accelerationism may tend toward the latter case in practice, even if there’s a theoretical case to be made for it.
There’s more I could say, but I don’t want to make this reply too long, and I think as Nathan said it’s a point worth discussing. Nevertheless it seems our different positions on this are built on some wide, fundamental divisions about reality and morality itself, and I’m not sure how those can be bridged, unless I’ve wildly misunderstood your position.
I don’t think humanity is bad. I just think people are selfish, and generally driven by motives that look very different from impartial total utilitarianism. AIs (even potentially “random” ones) seem about as good in expectation, from an impartial standpoint. In my opinion, this view becomes even stronger if you recognize that AIs will be selected on the basis of how helpful, kind, and useful they are to users. (Perhaps notice how different this selection criteria is from the evolutionary criteria used to evolve humans.)
I understand that most people are partial to humanity, which is why they generally find my view repugnant. But my response to this perspective is to point out that if we’re going to be partial to a group on the basis of something other than utilitarian equal consideration of interests, it makes little sense to choose to be partial to the human species as opposed to the current generation of humans or even myself. And if we take this route, accelerationism seems even more strongly supported than before, since developing AI and accelerating technological progress seems to be the best chance we have of preserving the current generation against aging and death. If we all died, and a new generation of humans replaced us, that would certainly be pretty bad for us.
Which sounds more like a movie villain’s monologue?
The idea that everyone currently living needs to sacrificed, and die, in order to preserve the human species
The idea that we should try to preserve currently living people, even if that means taking on a greater risk of not preserving the values of the human species
To be clear, I also just totally disagree with the heuristic that “if your moral view sounds like a movie villain’s monologue it might be worth reflecting”. I don’t think that fiction is generally a great place for learning moral philosophy, albeit with some notable exceptions.
Anyway, the answer to these moral questions may seem obvious to you, but I don’t think they’re as obvious as you’re making them seem.
I think the fact that people are partial to humanity explains a large fraction of the disagreement people have with me. But, fair enough, I exaggerated a bit. My true belief is a more moderate version of that claim.
When discussing why EAs in particular disagree with me, to overgeneralize by a fair bit, I’ve noticed that EAs are happy to concede that AIs could be moral patients, but are generally reluctant to admit AIs as moral agents, in the way they’d be happy to accept humans as independent moral agents (e.g. newborns) into our society. I’d call this “being partial to humanity”, or at least, “being partial to the values of the human species”.
(In my opinion, this partiality seems so prevalent and deep in most people that to deny it seems a bit like a fish denying the existence of water. But I digress.)
To test this hypothesis, I recently asked three questions on Twitter about whether people would be willing to accept immigration through a portal to another universe from three sources:
“a society of humans who are very similar to us”
“a society of people who look & act like humans, but each of them only cares about their family”
“a society of people who look & act like humans, but they only care about maximizing paperclips”
I emphasized that in each case, the people are human-level in their intelligence, and also biological.
The results are preliminary (and I’m not linking here to avoid biasing the results, as voting has not yet finished), but so far my followers, who are mostly EAs, are much more happy to let the humans immigrate to our world, compared to the last two options. I claim there just aren’t really any defensible reasons to maintain this choice other than by implicitly appealing to a partiality towards humanity.
My guess is that if people are asked to defend their choice explicitly, they’d largely talk about some inherent altruism or hope they place in the human species, relative to the other options; and this still looks like “being partial to humanity”, as far as I can tell, from almost any reasonable perspective.
I think the fact that people are partial to humanity explains a large fraction of the disagreement people have with me.
Maybe, it’s hard for me to know. But I predict most of the pushback you’re getting from relatively thoughtful longtermists isn’t due to this.
I’ve noticed that EAs are happy to concede that AIs could be moral patients, but are generally reluctant to admit AIs as moral agents, in the way they’d be happy to accept humans as independent moral agents (e.g. newborns) into our society.
I agree with this.
I’d call this “being partial to humanity”, or at least, “being partial to the values of the human species”.
I think “being partial to humanity” is a bad description of what’s going on because (e.g.) these same people would be considerably more on board with aliens. I think the main thing going on is that people have some (probably mistaken) levels of pessimism about how AIs would act as moral agents which they don’t have about (e.g.) aliens.
To test this hypothesis, I recently asked three questions on Twitter about whether people would be willing to accept immigration through a portal to another universe from three sources:
“a society of humans who are very similar to us”
“a society of people who look & act like humans, but each of them only cares about their family”
“a society of people who look & act like humans, but they only care about maximizing paperclips”
...
I claim there just aren’t really any defensible reasons to maintain this choice other than by implicitly appealing to a partiality towards humanity.
This comparison seems to me to be missing the point. Minimally I think what’s going on is not well described as “being partial to humanity”.
Here’s a comparison I prefer:
A society of humans who are very similar to us.
A society of humans who are very similar to us in basically every way, except that they have a genetically-caused and strong terminal preference for maximizing the total expected number of paper clips (over the entire arc of history) and only care about other things instrumentally. They are sufficiently committed to paper clip maximization that this will persist on arbitrary reflection (e.g. they’d lock in this view immediately when given this option) and let’s also suppose that this view is transmitted genetically and in a gene-drive-y way such that all of their descendants will also only care about paper clips. (You can change paper clips to basically anything else which is broadly recognized to have no moral value on its own, e.g. gold twisted into circles.)
A society of beings (e.g. aliens) who are extremely different from humans in basically every way, except that they also have something pretty similar to the concepts of “morality”, “pain”, “pleasure”, “moral patienthood”, “happiness”, “preferences”, “altruism”, and “careful reasoning about morality (moral thoughtfulness)”. And the society overall also has a roughly similar relationship with these concepts (e.g. the level of “altruism” is similar). (Note that having the same relationship as humans to these concepts is a pretty low bar! Humans aren’t that morally thoughtful!)
I think I’m almost equally happy with (1) and (3) on this list and quite unhappy with (2).
If you changed (3) to instead be “considerably more altruistic”, I would prefer (3) over (1).
I think it seems weird to describe my views on the comparison I just outlined as “being partial to humanity”: I actually prefer (3) over (2) even though (2) are literally humans!
(Also, I’m not that committed to having concepts of “pain” and “pleasure”, but I’m relatively committed to having concepts which are something like “moral patienthood”, “preferences”, and “altruism”.)
Below is a mild spoiler for a story by Eliezer Yudkowsky:
To make the above comparison about different beings more concrete: in the case of Three Worlds Collide, I would basically be fine giving the universe over to the super-happies relative to humans (I think mildly better than humans?), and I think it seems only mildly worse than humans to hand it over to the baby-eaters. In both cases, I’m pricing in some amount of reflection and uplifting which doesn’t happen in the actual story of Three Worlds Collide, but would likely happen in practice. That is, I’m imagining seeing these societies prior to their singularity and then, based just on observations of their societies at this point, deciding how good they are (pricing in the fact that the society might change over time).
To be clear, it seems totally reasonable to call this “being partial to some notion of moral thoughtfulness about pain, pleasure, and preferences”, but these concepts don’t seem that “human” to me. (I predict these occur pretty frequently in evolved life that reaches a singularity for instance. And they might occur in AIs, but I expect misaligned AIs which seize control of the world are worse from my perspective than if humans retain control.)
When I say that people are partial to humanity, I’m including an irrational bias towards thinking that humans, or evolved beings, are unusually thoughtful or ethical compared to the alternatives (I believe this is in fact an irrational bias, since the arguments I’ve seen for thinking that unaligned AIs will be less thoughtful or ethical than aliens seem very weak to me).
In other cases, when people irrationally hold a certain group X to a higher standard than a group Y, it is routinely described as “being partial to group Y over group X”. I think this is just what “being partial” means, in an ordinary sense, across a wide range of cases.
For example, if I proposed aligning AI to my local friend group, with the explicit justification that I thought my friends are unusually thoughtful, I think this would be well-described as me being “partial” to my friend group.
To the extent you’re seeing me as saying something else about how longtermists view the argument, I suspect you’re reading me as saying something stronger than what I originally intended.
In that case, my main disagreement is thinking that your twitter poll is evidence for your claims.
More specifically:
I claim there just aren’t really any defensible reasons to maintain this choice other than by implicitly appealing to a partiality towards humanity.
Like you claim there aren’t any defensible reasons to think that what humans will do is better than literally maximizing paper clips? This seems totally wild to me.
Like you claim there aren’t any defensible reasons to think that what humans will do is better than literally maximizing paper clips?
I’m not exactly sure what you mean by this. There were three options, and human paperclippers were only one of these options. I was mainly discussing the choice between (1) and (2) in the comment, not between (1) and (3).
Here’s my best guess at what you’re saying: it sounds like you’re repeating that you expect humans to be unusually altruistic or thoughtful compared to an unaligned alternative. But the point of my previous comment was to state my view that this bias counted as “being partial towards humanity”, since I view the bias as irrational. In light of that, what part of my comment are you objecting to?
To be clear, you can think the bias I’m talking about is actually rational; that’s fine. But I just disagree with you for pretty mundane reasons.
[Incorporating what you said in the other comment]
Also, to be clear, I agree that the question of “how much worse/better is it for AIs to get vast amounts of resources without human society intending to grant those resources to the AIs from a longtermist perspective” is underinvestigated, but I think there are pretty good reasons to systematically expect human control to be a decent amount better.
Then I think it’s worth concretely explaining what these reasons are to believe that human control will be a decent amount better in expectation. You don’t need to write this up yourself, of course. I think the EA community should write these reasons up. Because I currently view the proposition as non-obvious, and despite being a critical belief in AI risk discussions, it’s usually asserted without argument. When I’ve pressed people in the past, they typically give very weak reasons.
I don’t know how to respond to an argument whose details are omitted.
Then I think it’s worth concretely explaining what these reasons are to believe that human control will be a decent amount better in expectation. You don’t need to write this up yourself, of course.
+1, but I don’t generally think it’s worth counting on “the EA community” to do something like this. I’ve been vaguely trying to pitch Joe on doing something like this (though there are probably better uses of his time) and his recent blog posts touch on similar topics.
Here’s my best guess at what you’re saying: it sounds like you’re repeating that you expect humans to be unusually altruistic or thoughtful compared to an unaligned alternative.
There, I’m just saying that human control is better than literal paperclip maximization.
This response still seems underspecified to me. Is the default unaligned alternative paperclip maximization in your view? I understand that Eliezer Yudkowsky has given arguments for this position, but it seems like you diverge significantly from Eliezer’s general worldview, so I’d still prefer to hear this take spelled out in more detail from your own point of view.
“a society of people who look & act like humans, but they only care about maximizing paperclips”
And then you say:
so far my followers, who are mostly EAs, are much more happy to let the humans immigrate to our world, compared to the last two options. I claim there just aren’t really any defensible reasons to maintain this choice other than by implicitly appealing to a partiality towards humanity.
So, I think more human control is better than more literal paperclip maximization, the option given in your poll.
My overall position isn’t that the AIs will certainly be paperclippers, I’m just arguing in isolation about why I think the choice given in the poll is defensible.
I have the feeling we’re talking past each other a bit. I suspect talking about this poll was kind of a distraction. I personally have the sense of trying to convey a central point, and instead of getting the point across, I feel the conversation keeps slipping into talking about how to interpret minor things I said, which I don’t see as very relevant.
I will probably take a break from replying for now, for these reasons, although I’d be happy to catch up some time and maybe have a call to discuss these questions in more depth. I definitely see you as trying a lot harder than most other EAs in trying to make progress on these questions collaboratively with me.
I’d be very happy to have some discussion on these topics with you Matthew. For what it’s worth, I really have found much of your work insightful, thought-provoking, and valuable. I think I just have some strong, core disagreements on multiple empirical/epistemological/moral levels with your latest series of posts.
That doesn’t mean I don’t want you to share your views, or that they’re not worth discussion, and I apologise if I came off as too hostile. An open invitation to have some kind of deeper discussion stands.[1]
Also, to be clear, I agree that the question of “how much worse/better is it for AIs to get vast amounts of resources without human society intending to grant those resources to the AIs from a longtermist perspective” is underinvestigated, but I think there are pretty good reasons to systematically expect human control to be a decent amount better.
Under preference utilitarianism, it doesn’t necessarily matter whether AIs are conscious.
I’m guessing preference utilitarians would typically say that only the preferences of conscious entities matter. I doubt any of them would care about satisfying an electron’s “preference” to be near protons rather than ionized.
Strongly agree that there should be more explicit defences of this argument.
One way of doing this in a co-operative way might be working on co-operative AI stuff, since it seems to increase the likelihood that misaligned AI goes well, or at least less badly.
My personal reason for not digging into this is that my naive model of how good the AI future will be is: quality_of_future * amount_of_the_stuff.
And there is a distinction I haven’t seen you acknowledge: while high “quality” doesn’t require humans to be around, I ultimately judge quality by my values. (Things being conscious is an example. But this also includes things like not copy-pasting the same thing all over, not wiping out aliens, and presumably many other things I am not aware of. IIRC Yudkowsky talks about cosmopolitanism being a human value.)
Because of this, my impression is that if we hand over the future to a random AI, the “quality” will be very low. And so we can currently have a much larger impact by focusing on increasing the quality. Which we can do by delaying “handing over the future to AI” and picking a good AI to hand over to. I.e., alignment.
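In symbols, the argument in the last two paragraphs is roughly (q and A are just my shorthand for quality_of_future and amount_of_the_stuff):

$$V = q \times A.$$

If handing the future to a random AI drives q close to zero, then V is small no matter how large A is, so on this model work that raises q (delaying the handover and picking a good AI to hand over to) has far more leverage than work that mainly increases A or brings it forward.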
(Still, I agree it would be nice if there was a better analysis of this, which exposed the assumptions.)
And there is a distinction I haven’t seen you acknowledge: while high “quality” doesn’t require humans to be around, I ultimately judge quality by my values.
Is there any particular reason why you are partial towards humans generically controlling the future, relative to this particular current generation of humans? To me, it seems like being partial to one’s own values, one’s community, and especially one’s own life, generally leads to an even stronger argument for accelerationism, since the best way to advance your own values is generally to actually “be there” when AI happens.
In my opinion, the main relevant alternative to this view is to be partial to the human species, as opposed to being partial to either one’s current generation, or oneself. And I think the human species is kind of a weird category to be partial to, relative to those other things. Do you disagree?
In my opinion, the main relevant alternative to this view is to be partial to the human species, as opposed to being partial to either one’s current generation, or oneself. And I think the human species is kind of a weird category to be partial to, relative to those other things. Do you disagree?
I agree with this.
the best way to advance your own values is generally to actually “be there” when AI happens.
I (strongly) disagree with this. Me being alive is a relatively small part of my values. And since I am not the director of the world, me personally being around to influence things is unlikely to have a decisive impact on things I value.
In more detail: Sure, all else being equal, me being there when AI happens is mildly helpful. But the outcome of building AI seems to be a function of, among other things, (i) values of the people building it + (ii) how much reflection they can do on those values + (iii) the environment dynamics these people are subject to (e.g., the current race dynamics between AI companies). And over time, I expect the potential decrease in (i) to be far outweighed by gains in (ii) and (iii).
The first issue is about (i), that it is not actually me building the AGI, either now or in the future. But I am willing to grant that (all else being equal) current generation is more likely to have values closer to my values.
However, I expect that the factors are (ii) and (iii) are just as influential. Regarding (ii), it seems we keep making progress at philosophy, ethics, etc, and to me, this currently far outweighs the value drift in (i).
Regarding (iii), my impression is that the current situation is so bad that it can’t get much worse, and we might as well wait. This of course depends on how likely you think we are likely to get a bad outcome if we either (a) get superintelligence without additional progress on alignment or (b) get widespread human-level AI with no progress on alignment, institution design, etc.
Me being alive is a relatively small part of my values.
I agree some people (such as yourself) might be extremely altruistic, and therefore might not care much about their own life relative to other values they hold, but this position is fairly uncommon. Most people care a lot about their own lives (and especially the lives of their family and friends) relative to other things they care about. We can empirically test this hypothesis by looking at how people choose to spend their time and money; and the results are generally that people spend their money on themselves, their family and their friends.
since I am not the director of the world, me personally being around to influence things is unlikely to have a decisive impact on things I value.
You don’t need to be director of the world to have influence over things. You can just be a small part of the world to have influence over things that you care about. This is essentially what you’re already doing by living and using your income to make decisions, to satisfy your own preferences. I’m claiming this situation could and probably will persist into the indefinite future, for the agents that exist in the future.
I’m very skeptical that there will ever be a moment in time during which there will be a “director of the world”, in a strong sense. And I doubt the developer of the first AGI will become the director of the world, even remotely (including versions of them that reflect on moral philosophy etc.). You might want to read my post about this.
One potential argument against accelerating AI is that it will increase the chance of catastrophes which will then lead to overregulating AI (e.g. in the same way that nuclear power arguably was overregulated).
Are you sure there will ever actually be a “value lock-in event”?
Even if there is at some point a value lock-in event, wouldn’t pushing for accelerationism also plausibly affect the values that are locked in? For example, the value of “population growth is good” seems more likely to be locked in, if you advocate for that now.
A belief that humans would be kinder and more benevolent than unaligned AIs.
Humans seem pretty bad already. For example, humans are responsible for factory farming. It’s plausible that AIs could be even more callous and morally indifferent than humans, but the bar already seems low.
I’m also not convinced that moral values will be a major force shaping “what happens to the cosmic endowment”. It seems to me that the forces shaping economic consumption matter more than moral values.
A bedrock heuristic that it would be extraordinarily bad if “we all died from AI”, and therefore we should pursue AI safety over AI accelerationism.
But it would also be bad if we all died from old age while waiting for AI, and missed out on all the benefits that AI offers to humans, which is a point in favor of acceleration. Why would this heuristic be weaker?
An adherence to person-affecting views in which the values of currently-existing humans are what matter most; and a belief that AI threatens to kill existing humans.
But in this view, AI accelerationism could easily be favored since AIs could greatly benefit existing humans by extending our lifespans and enriching our lives with advanced technology.
An implicit acceptance of human supremacism, i.e. the idea that what matters is propagating the interests of the human species, or preserving the human species, even at the expense of individual interests (either within humanity or outside humanity) or the interests of other species.
But isn’t EA known for being unusually anti-speciesist compared to other communities? Peter Singer is often seen as a “founding father” of the movement, and a huge part of his ethical philosophy was about how we shouldn’t be human supremacists.
More generally, it seems wrong to care about preserving the “human species” in an abstract sense relative to preserving the current generation of actually living humans.
A belief that most humans are biased towards acceleration over safety, and therefore it is better for EAs to focus on safety as a useful correction mechanism for society.
But was an anti-safety bias common for previous technologies? I think something closer to the opposite is probably true: most humans seem, if anything, biased towards being overly cautious about new technologies rather than overly optimistic.
A belief that society is massively underrating the potential for AI, which favors extra work on AI safety, since it’s so neglected.
But if society is massively underrating AI, then this should favor accelerating AI too? There doesn’t seem to be an obvious asymmetry between these two values.
An adherence to negative utilitarianism, which would favor obstructing AI, along with any other technology that could enable the population of conscious minds to expand.
This seems like a plausible moral argument to me, but it doesn’t seem like a very popular position among EAs.
A heuristic that “change is generally bad” and AI represents a gigantic change.
I don’t think many EAs would defend this heuristic explicitly.
Added: AI represents a large change to the world. Delaying AI therefore preserves option value.
This heuristic seems like it would have favored advocating delaying the industrial revolution, and all sorts of moral, social, and technological changes to the world in the past. Is that a position that EAs would be willing to bite the bullet on?
My understanding is that relatively few EAs are actual hardcore classic hedonist utilitarians. I think this is ~sufficient to explain why more haven’t become accelerationists.
Have you cornered a classic hedonist utilitarian EA and asked them? Have you cornered three? What did they say?
Don’t know why this is being disagree-voted. I think point 1 is basically correct: you don’t have to diverge far from being a “hardcore classic hedonist utilitarian” to not support the case Matthew makes in the OP.
I think a more important reason is the additional value of information and of option value. It’s very likely that the change resulting from AI development will be irreversible. Since we’re still learning about AI as we study it, taking additional time to think and plan before training the most powerful AI systems seems to reduce the likelihood of being locked into suboptimal outcomes. Increasing the likelihood of achieving “utopia” rather than landing in “mediocrity” by 2 percent seems far more important than speeding up utopia by 10 years.
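To make that comparison concrete, here is a minimal back-of-the-envelope sketch; all numbers are placeholders chosen for illustration, not estimates from the comment above (and “2 percent” is read as 2 percentage points):

```python
# Toy comparison: shifting the odds of a locked-in "utopia" vs. reaching it sooner.
# All numbers are illustrative placeholders, not estimates.
UTOPIA_VALUE_PER_YEAR = 1.0    # value of a utopian year
MEDIOCRE_VALUE_PER_YEAR = 0.1  # value of a mediocre year
HORIZON_YEARS = 1e9            # how long the locked-in outcome persists
P_UTOPIA = 0.5                 # baseline chance of utopia

# Option A: raise the chance of utopia by 2 percentage points.
gain_from_better_odds = 0.02 * (UTOPIA_VALUE_PER_YEAR - MEDIOCRE_VALUE_PER_YEAR) * HORIZON_YEARS

# Option B: reach the same lottery 10 years sooner (roughly 10 extra years of the
# expected outcome, with the long-run horizon otherwise unchanged).
expected_value_per_year = P_UTOPIA * UTOPIA_VALUE_PER_YEAR + (1 - P_UTOPIA) * MEDIOCRE_VALUE_PER_YEAR
gain_from_speedup = 10 * expected_value_per_year

# With any long horizon, the probability shift dwarfs the speed-up.
print(gain_from_better_odds / gain_from_speedup)  # prints roughly 3.3 million under these placeholders
```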
I think all actions are in a sense irreversible, but large changes tend to be less reversible than small changes. In this sense, the argument you gave seems reducible to “we should generally delay large changes to the world, to preserve option value”. Is that a reasonable summary?
In this case I think it’s just not obvious that delaying large changes is good. Would it have been good to delay the industrial revolution to preserve option value? I think this heuristic, if used in the past, would have generally demanded that we “pause” all sorts of social, material, and moral progress, which seems wrong.
I don’t think we would have been able to use the additional information we would have gained from delaying the industrial revolution, but if we could have, I think the answer might be “yes”. It’s easy to see in hindsight that it went well overall, but that doesn’t mean that the correct ex ante attitude shouldn’t have been caution!
Paul Christiano wrote a piece a few years ago about ensuring that misaligned ASI is a “good successor” (in the moral value sense),[1] as a plan B to alignment (Medium version; LW version). I agree it’s odd that there hasn’t been more discussion since.[2]
I’ve wondered about this myself. My take is that this area was overlooked a year ago, but there’s now some good work being done. See Jeff Sebo’s Nov ’23 80k podcast episode, as well as Rob Long’s episode, and the paper that the two of them co-authored at the end of last year: “Moral consideration for AI systems by 2030”. Overall, I’m optimistic about this area becoming a new forefront of EA.
I’m confused by this point, and for me this is the overriding crux between my view and yours. Do you really not think accelerationism could have permanent effects, through making AI takeover, or some other irredeemable outcome, more likely?
I’m not sure there’ll be a lock-in event, in the way I can’t technically be sure about anything, but such an event seems clearly probable enough that I very much want to avoid taking actions that bring it closer. (Insofar as bringing the event closer raises the chance it goes badly, which I believe to be a likely dynamic. See, for example, the Metaculus question, “How does the level of existential risk posed by AGI depend on its arrival time?”, or discussion of the long reflection.)
Although, Paul’s argument routes through acausal cooperation—see the piece for details—rather than through the ASI being morally valuable in itself. (And perhaps OP means to focus on the latter issue.) In Paul’s words:
There was a little discussion a few months ago, here, but none of what was said built on Paul’s article.
It’s worth emphasizing that moral welfare of digital minds is quite a different (though related) topic to whether AIs are good successors.
Fair point, I’ve added a footnote to make this clearer.
Under purely longtermist views, accelerating AI by 1 year increases available cosmic resources by 1 part in 10 billion. This is tiny. So the first order effects of acceleration are tiny from a longtermist perspective.
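Presumably the arithmetic behind that figure is something like the following (this is my reconstruction, assuming the window for using cosmic resources is on the order of 10 billion years, e.g. because of cosmic expansion and stellar burn-out; the comment does not spell out its derivation):

$$\frac{1 \text{ year of timing shift}}{\sim 10^{10} \text{ years of usable cosmic resources}} \approx 10^{-10} = \text{1 part in 10 billion}.$$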
Thus, a purely longtermist perspective doesn’t care about the direct effects of delay/acceleration and the question would come down to indirect effects.
I can see indirect effects going either way, but delay seems better on current margins (this might depend on how much optimism you have about current AI safety progress and governance/policy progress, and on whether you think humanity retaining control relative to AIs is good or bad). All of these topics have been explored and discussed to some extent.
When focusing on the welfare/preferences of currently existing people, I think it’s unclear whether accelerating AI looks good or bad; it depends on optimism about AI safety, how you trade off old people versus young people, and death via violence versus death from old age. (Misaligned AI takeover killing lots of people is by no means assured, but seems reasonably likely by default.)
I expect there hasn’t been much investigation of accelerating AI to advance the preferences of currently existing people because this exists at a point on the crazy train that very few people are at. See also the curse of cryonics:
Tiny compared to what? Are you assuming we can take some other action whose consequences don’t wash out over the long-term, e.g. because of a value lock-in? In general, these assumptions just seem quite weak and underspecified to me.
What exactly is the alternative action that has vastly greater value in expectation, and why does it have greater value? If what you mean is that we can try to reduce the risk of extinction instead, keep in mind that my first bullet point preempted pretty much this exact argument:
Ensuring human control throughout the singularity rather than having AIs get control very obviously has relatively massive effects. Of course, we can debate the sign here, I’m just making a claim about the magnitude.
I’m not talking about extinction of all smart beings on earth (AIs and humans), which seems like a small fraction of existential risk.
(Separately, the badness of such extinction seems maybe somewhat overrated because pretty likely intelligent life will just re-evolve in the next 300 million years. Intelligent life doesn’t seem that contingent. Also aliens.)
For what it’s worth, I think my reply to Pablo here responds to your comment fairly adequately too.
I think it remains the case that the value of accelerating AI progress is tiny relative to other apparently available interventions, such as ensuring that AIs are sentient or improving their expected well-being conditional on their being sentient. The case for focusing on how a transformative technology unfolds, rather than on when it unfolds,[1] seems robust to a relatively wide range of technologies and assumptions. Still, this seems worth further investigation.
Indeed, it seems that when the transformation unfolds is primarily important because of how it unfolds, insofar as the quality of a transformation is partly determined by its timing.
I’m claiming that it is not actually clear that we can take actions that don’t merely wash out over the long-term. In this case, you cannot simply assume that we can meaningfully and predictably affect how valuable the long-term future will be in, for example, billions of years. I agree that, yes, if you assume we can meaningfully affect the very long-run, then all actions that merely have short-term effects will have “tiny” impacts by comparison. But the assumption that we can meaningfully and predictably affect the long-run is precisely the thing that needs to be argued. I think it’s important for EAs to try to be more rigorous about their empirical claims here.
Moreover, actions that have short-term effects can generally be assumed to have longer term effects if our actions propagate. For example, support for larger population sizes now would presumably increase the probability that larger population sizes exist in the very long run, compared to the alternative of smaller population sizes with high per capita incomes. It seems arbitrary to assume this effect will be negligible but then also assume other competing effects won’t be negligible. I don’t see any strong arguments for this position.
I was trying to hint at prima facie plausible ways in which the present generation can increase the value of the long-term future by more than one part in billions, rather than “assume” that this is the case, though of course I never gave anything resembling a rigorous argument.
I do agree that the “washing out” hypothesis is a reasonable default and that one needs a positive reason for expecting our present actions to persist into the long-term. One seemingly plausible mechanism is influencing how a transformative technology unfolds: it seems that the first generation that creates AGI has significantly more influence on how much artificial sentience there is in the universe a trillion years from now than, say, the millionth generation. Do you disagree with this claim?
I’m not sure I understand the point you make in the second paragraph. What would be the predictable long-term effects of hastening the arrival of AGI in the short-term?
As I understand, the argument originally given was that there was a tiny effect of pushing for AI acceleration, which seems outweighed by unnamed and gigantic “indirect” effects in the long-run from alternative strategies of improving the long-run future. I responded by trying to get more clarity on what these gigantic indirect effects actually are, how we can predictably bring them about, and why we would think it’s plausible that we could bring them about in the first place. From my perspective, the shape of this argument looks something like:
Your action X has this tiny positive near-term effect (ETA: or a tiny direct effect)
My action Y has this large positive long-term effect (ETA: or a large indirect effect)
Therefore, Y is better than X.
Do you see the flaw here? Well, both X and Y could have long-term effects! So, it’s not sufficient to compare the short-term effect of X to the long-term effect of Y. You need to compare both effects, on both time horizons. As far as I can tell, I haven’t seen any argument in this thread that analyzed and compared the long-term effects in any detail, except perhaps in Ryan Greenblatt’s original comment, in which he linked to some other comments about a similar topic in a different thread (but I still don’t see what the exact argument is).
More generally, I think you’re probably trying to point to some concept you think is obvious and clear here, and I’m not seeing it, which is why I’m asking you to be more precise and rigorous about what you’re actually claiming.
In my original comment I pointed towards a mechanism. Here’s a more precise characterization of the argument:
Total utilitarianism generally supports, all else being equal, larger population sizes with low per capita incomes over small population sizes with high per capita incomes.
To the extent that our actions do not “wash out”, it seems reasonable to assume that pushing for large population sizes now would make it more likely in the long-run that we get large population sizes with low per-capita incomes compared to a small population size with high per capita incomes. (Keep in mind here that I’m not making any claim about the total level of resources.)
To respond to this argument you could say that in fact our actions do “wash out” here, so as to make the effect of pushing for larger population sizes rather small in the long run. But in response to that argument, I claim that this objection can be reversed and applied to almost any alternative strategy for improving the future that you might think is actually better. (In other words, I actually need to see your reasons for why there’s an asymmetry here; and I don’t currently see these reasons.)
Alternatively, you could just say that total utilitarianism is unreasonable and a bad ethical theory, but my original comment was about analyzing the claim about accelerating AI from the perspective of total utilitarianism, which, as a theory, seems to be relatively popular among EAs. So I’d prefer to keep this discussion grounded within that context.
Thanks for the clarification.
Yes, I agree that we should consider the long-term effects of each intervention when comparing them. I focused on the short-term effects of hastening AI progress because it is those effects that are normally cited as the relevant justification in EA/utilitarian discussions of that intervention. For instance, those are the effects that Bostrom considers in ‘Astronomical waste’. Conceivably, there is a separate argument that appeals to the beneficial long-term effects of AI capability acceleration. I haven’t considered this argument because I haven’t seen many people make it, so I assume that accelerationist types tend to believe that the short-term effects dominate.
I think Bostrom’s argument merely compares a pure x-risk (such as a huge asteroid hurtling towards Earth) relative to technological acceleration, and then concludes that reducing the probability of a pure x-risk is more important because the x-risk threatens the eventual colonization of the universe. I agree with this argument in the case of a pure x-risk, but as I noted in my original comment, I don’t think that AI risk is a pure x-risk.
If, by contrast, all we’re doing by doing AI safety research is influencing something like “the values of the agents in society in the future” (and not actually influencing the probability of eventual colonization), then this action seems to plausibly just wash out in the long-term. In this case, it seems very appropriate to compare the short-term effects of AI safety to the short-term effects of acceleration.
Let me put it another way. We can think about two (potentially competing) strategies for making the future better, along with their relevant short and possible long-term effects:
Doing AI safety research
Short-term effects: makes it more likely that AIs are kind to current or near-future humans
Possible long-term effect: makes it more likely that AIs in the very long-run will share the values of the human species, relative to some unaligned alternative
Accelerating AI
Short-term effect: helps current humans by hastening the arrival of advanced technology
Possible long-term effect: makes it more likely that we have a large population size at low per capita incomes, relative to a low population size with high per capita income
My opinion is that both of these long-term effects are very speculative, so it’s generally better to focus on a heuristic of doing what’s better in the short-term, while keeping the long-term consequences in mind. And when I do that, I do not come to a strong conclusion that AI safety research “beats” AI acceleration, from a total utilitarian perspective.
To be clear, this wasn’t the structure of my original argument (though it might be Pablo’s). My argument was more like “you seem to be implying that action X is good because of its direct effect (literal first-order acceleration), but actually the direct effect is small when considered from a particular perspective (longtermism), so from that perspective we need to consider indirect effects, and the analysis for that looks pretty different”.
Note that I wasn’t really trying to argue much about the sign of the indirect effect, though people have indeed discussed this in some detail in various contexts.
I agree your original argument was slightly different than the form I stated. I was speaking too loosely, and conflated what I thought Pablo might be thinking with what you stated originally.
I think the important claim from my comment is “As far as I can tell, I haven’t seen any argument in this thread that analyzed and compared the long-term effects in any detail, except perhaps in Ryan Greenblatt’s original comment, in which he linked to some other comments about a similar topic in a different thread (but I still don’t see what the exact argument is).”
Explicitly confirming that this seems right to me.
I don’t disagree with this. I was just claiming that the “indirect” effects dominate (by indirect, I just mean effects other than shifting the future closer in time).
There is still the question of indirect/direct effects.
I understand that. I wanted to know why you thought that. I’m asking for clarity. I don’t currently understand your reasons. See this recent comment of mine for more info.
(I don’t think I’m going to engage further here, sorry.)
I generally agree that we should be more concerned about this. In particular, I find people who will happily endorse “Shut Up and Multiply” sentiment but reject this consideration to be suspect in their reasoning.
A more extreme version of this is that, given the massively greater efficiency with which a digital consciousness could convert matter and energy to utilons (IIRC naively about 3 orders of magnitude according to Bostrom, before any increase from greater coordination), on strict expected value reasoning you have to be extremely confident that this won’t happen—or at least have a much stronger rebuttal than ‘AI won’t necessarily be conscious’.
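To make the expected-value point concrete, here is a minimal sketch; the 1000x multiplier stands in for the “about 3 orders of magnitude” figure mentioned above, and the credence is a made-up placeholder:

```python
# Illustrative only: why a large efficiency gap dominates a naive expected-value
# calculation even at a modest credence that digital minds matter morally.
human_utilons_per_unit_resource = 1.0
digital_utilons_per_unit_resource = 1000.0  # the "~3 orders of magnitude" figure
p_digital_minds_matter = 0.05               # made-up credence, for illustration

ev_human_future = human_utilons_per_unit_resource
ev_digital_future = p_digital_minds_matter * digital_utilons_per_unit_resource

# Even at 5% credence, the naive expected value of the digital future is far larger,
# which is why one would need to be extremely confident it won't happen.
print(ev_digital_future / ev_human_future)  # prints roughly 50
```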
Separately, I think there might be a case for accelerationism even if you think it increases the risk of AI takeover and that AI takeover is bad, on the grounds that in many scenarios advancing faster might still increase the probability of human descendants getting through the time of perils before some other threat destroys us (every year we remain in our current state is another year in which we run the risk of, for example, a global nuclear war or civilisation-ending pandemic).
Hi,
I have a post where I conclude the above may well apply not only to digital consciousness, but also to animals:
A lot of these points seem like arguments that it’s possible that unaligned AI takeover will go well, e.g. there’s no reason not to think that AIs are conscious, or will have interesting moral values, or etc.
My stance is that we (more-or-less) know humans are conscious and have moral values that, while they have failed to prevent large amounts of harm, seem to have the potential to be good. AIs may be conscious and may have welfare-promoting values, but we don’t know that yet. We should try to better understand whether AIs are worthy successors before transitioning power to them.
Probably a core point of disagreement here is whether, presented with a “random” intelligent actor, we should expect it to promote welfare or prevent suffering “by default”. My understanding is that some accelerationists believe that we should. I believe that we shouldn’t. Moreover I believe that it’s enough to be substantially uncertain about whether this is or isn’t the default to want to take a slower and more careful approach.
I claim there’s a weird asymmetry here where you’re happy to put trust into humans because they have the “potential” to do good, but you’re not willing to say the same for AIs, even though they seem to have the same type of “potential”.
Whatever your expectations about AIs, we already know that humans are not blank slates that may or may not be altruistic in the future: we actually have a ton of evidence about the quality and character of human nature, and it doesn’t make humans look great. Humans are not mainly described as altruistic creatures. I mentioned factory farming in my original comment, but one can examine the way people spend their money (i.e. not mainly on charitable causes), or the history of genocides, war, slavery, and oppression for additional evidence.
I don’t expect humans to “promote welfare or prevent suffering” by default either. Look at the current world. Have humans, on net, reduced or increased suffering? Even if you think humans have been good for the world, it’s not obvious. Sure, it’s easy to dismiss the value of unaligned AIs if you compare against some idealistic baseline; but I’m asking you to compare against a realistic baseline, i.e. actual human nature.
It seems like you’re just substantially more pessimistic than I am about humans. I think factory farming will be ended, and though it seems like humans have caused more suffering than happiness so far, I think their default trajectory will be to eventually stop doing that, and to ultimately do enough good to outweigh their ignoble past. I don’t think this is certain by any means, but I think it’s a reasonable extrapolation. (I maybe don’t expect you to find it a reasonable extrapolation.)
Meanwhile I expect the typical unaligned AI may seize power for some purpose that seems to us entirely trivial, and may be uninterested in doing any kind of moral philosophy, and/or may not place any terminal (rather than instrumental) value in paying attention to other sentient experiences in any capacity. I do think humans, even with their kind of terrible track record, are more promising than that baseline, though I can see why other people might think differently.
I haven’t read your entire post about this, but I understand you believe that if we created aligned AI, it would get essentially “current” human values, rather than e.g. some improved / more enlightened iteration of human values. If instead you believed the latter, that would set a significantly higher bar for unaligned AI, right?
That’s right, if I thought human values would improve greatly in the face of enormous wealth and advanced technology, I’d definitely be open to seeing humans as special and extra valuable from a total utilitarian perspective. Note that many routes through which values could improve in the future could apply to unaligned AIs too. So, for example, I’d need to believe that humans would be more likely to reflect, and be more likely to do the right type of reflection, relative to the unaligned baseline. In other words it’s not sufficient to argue that humans would reflect a little bit; that wouldn’t really persuade me at all.
(edit: my point is basically the same as emre’s)
I think there is very likely at some point going to be some sort of transition to a world where AIs are effectively in control. It seems worth it to slow down on the margin to try to shape this transition as best we can, especially slowing it down as we get closer to AGI and ASI. It would be surprising to me if making the transfer of power more voluntary/careful led to worse outcomes (or only led to slightly better outcomes such that the downsides of slowing down a bit made things worse).
Delaying the arrival of AGI by a few years as we get close to it seems good regardless of parameters like the value of involuntary-AI-disempowerment futures. But delaying its arrival by hundreds of years seems more likely to be bad, due to the trade-off with other risks.
Two questions here:
Why would accelerating AI make the transition less voluntary? (In my own mind, I’d be inclined to reverse this sentiment a bit: delaying AI by regulation generally involves forcibly stopping people from adopting AI. Force might be justified if it brings about a greater good, but that’s not the argument here.)
I can understand being “careful”. Being careful does seem like a good thing. But “being careful” generally trades off against other values in almost every domain I can think of, and there is such a thing as too much of a good thing. What reason is there to think that pushing for “more caution” is better on the margin compared to acceleration, especially considering society’s default response to AI in the absence of intervention?
So in the multi-agent slowly-replacing case, I’d argue that individual decisions don’t necessarily represent a voluntary decision on behalf of society (I’m imagining something like this scenario). In the misaligned power-seeking case, it seems obvious to me that this is involuntary. I agree that it technically could be a collective voluntary decision to hand over power more quickly, though (and in that case I’d be somewhat less against it).
I think emre’s comment lays out the intuitive case for being careful / taking your time, as does Ryan’s. I think the empirics are a bit messy once you take into account benefits of preventing other risks but I’d guess they come out in favor of delaying by at least a few years.
I don’t think this is a crux. Even if you prefer unaligned AI values over likely human values (weighted by power), you’d probably prefer doing research on further improving AI values over speeding things up.
I think misaligned AI values should be expected to be worse than human values, because it’s not clear that misaligned AI systems would care about eg their own welfare.
Inasmuch as we expect misaligned AI systems to be conscious (or whatever we need to care about them) and also to be good at looking after their own interests, I agree that it’s not clear from a total utilitarian perspective that the outcome would be bad.
But the “values” of a misaligned AI system could be pretty arbitrary, so I don’t think we should expect that.
So I think it’s likely you have some very different beliefs from most people/EAs/myself, particularly:
Thinking that humans/humanity is bad, and AI is likely to be better
Thinking that humanity isn’t driven by ideational/moral concerns[1]
That AI is very likely to be conscious, moral (as in, making better moral judgements than humans), and that the current/default trend in the industry is very likely to make them conscious moral agents in a way humans aren’t
I don’t know if the total utilitarian/accelerationist position in the OP is yours or not. I think Daniel is right that most EAs don’t have this position. I think maybe Peter Singer gets closest to this in his interview with Tyler, on the “would you side with the Aliens or not” question, here. But the answer to your descriptive question is simply that most EAs don’t have the combination of moral and empirical views about the world to make the argument you present valid and sound, so that’s why there isn’t much talk in EA about naïve accelerationism.
Going off the vibe I get from this view though, I think it’s a good heuristic that if your moral view sounds like a movie villain’s monologue it might be worth reflecting, and a lot of this post reminded me of the Earth-Trisolaris Organisation from Cixin Liu’s Three Body Problem. If someone’s honest moral view is “Eliminate human tyranny! The world belongs to AIs!” (with “AIs” swapped in for “Trisolaris”), then I don’t know what else there is to do except quote Zvi’s phrase “please speak directly into this microphone”.
Another big issue I have with this post is that some of the counter-arguments just seem a bit like “nu-uh”, see:
These (and other examples) are considerations for sure, but they need to be argued for. I don’t think one can just state them and then say “therefore, ACCELERATE!”. I agree that AI Safety research needs to be more robust and the philosophical assumptions and views made more explicit, but one could already think of some counters to the questions that you raise, and I’m sure you already have them. For example, you might take a view (à la Peter Godfrey-Smith) that a certain biological substrate is necessary for consciousness.
Similarly, on total utilitarianism’s emphasis on larger population sizes: agreed, to the extent that the greater population increases total utility, but this is the repugnant conclusion again. There’s a stopping point even in that scenario where an ever larger population decreases total utility, which is why in Parfit’s scenario the world is full of potatoes and muzak rather than humans crammed into battery cages like factory-farmed animals. Empirically, naïve accelerationism may tend toward the latter case in practice, even if there’s a theoretical case to be made for it.
There’s more I could say, but I don’t want to make this reply too long, and I think as Nathan said it’s a point worth discussing. Nevertheless it seems our different positions on this are built on some wide, fundamental divisions about reality and morality itself, and I’m not sure how those can be bridged, unless I’ve wildly misunderstood your position.
this is me-specific
I don’t think humanity is bad. I just think people are selfish, and generally driven by motives that look very different from impartial total utilitarianism. AIs (even potentially “random” ones) seem about as good in expectation, from an impartial standpoint. In my opinion, this view becomes even stronger if you recognize that AIs will be selected on the basis of how helpful, kind, and useful they are to users. (Perhaps notice how different this selection criteria is from the evolutionary criteria used to evolve humans.)
I understand that most people are partial to humanity, which is why they generally find my view repugnant. But my response to this perspective is to point out that if we’re going to be partial to a group on the basis of something other than utilitarian equal consideration of interests, it makes little sense to choose to be partial to the human species as opposed to the current generation of humans or even myself. And if we take this route, accelerationism seems even more strongly supported than before, since developing AI and accelerating technological progress seems to be the best chance we have of preserving the current generation against aging and death. If we all died, and a new generation of humans replaced us, that would certainly be pretty bad for us.
Which sounds more like a movie villain’s monologue?
The idea that everyone currently living needs to be sacrificed, and die, in order to preserve the human species
The idea that we should try to preserve currently living people, even if that means taking on a greater risk of not preserving the values of the human species
To be clear, I also just totally disagree with the heuristic that “if your moral view sounds like a movie villain’s monologue it might be worth reflecting”. I don’t think that fiction is generally a great place for learning moral philosophy, albeit with some notable exceptions.
Anyway, the answer to these moral questions may seem obvious to you, but I don’t think they’re as obvious as you’re making them seem.
This is not why people disagree IMO.
I think the fact that people are partial to humanity explains a large fraction of the disagreement people have with me. But, fair enough, I exaggerated a bit. My true belief is a more moderate version of that claim.
When discussing why EAs in particular disagree with me, to overgeneralize by a fair bit, I’ve noticed that EAs are happy to concede that AIs could be moral patients, but are generally reluctant to admit AIs as moral agents, in the way they’d be happy to accept humans as independent moral agents (e.g. newborns) into our society. I’d call this “being partial to humanity”, or at least, “being partial to the values of the human species”.
(In my opinion, this partiality seems so prevalent and deep in most people that to deny it seems a bit like a fish denying the existence of water. But I digress.)
To test this hypothesis, I recently asked three questions on Twitter about whether people would be willing to accept immigration through a portal to another universe from three sources:
“a society of humans who are very similar to us”
“a society of people who look & act like humans, but each of them only cares about their family”
“a society of people who look & act like humans, but they only care about maximizing paperclips”
I emphasized that in each case, the people are human-level in their intelligence, and also biological.
The results are preliminary (and I’m not linking here to avoid biasing the results, as voting has not yet finished), but so far my followers, who are mostly EAs, are much more happy to let the humans immigrate to our world, compared to the last two options. I claim there just aren’t really any defensible reasons to maintain this choice other than by implicitly appealing to a partiality towards humanity.
My guess is that if people are asked to defend their choice explicitly, they’d largely talk about some inherent altruism or hope they place in the human species, relative to the other options; and this still looks like “being partial to humanity”, as far as I can tell, from almost any reasonable perspective.
Maybe, it’s hard for me to know. But I predict most of the pushback you’re getting from relatively thoughtful longtermists isn’t due to this.
I agree with this.
I think “being partial to humanity” is a bad description of what’s going on because (e.g.) these same people would be considerably more on board with aliens. I think the main thing going on is that people have some (probably mistaken) levels of pessimism about how AIs would act as moral agents which they don’t have about (e.g.) aliens.
This comparison seems to me to be missing the point. Minimally I think what’s going on is not well described as “being partial to humanity”.
Here’s a comparison I prefer:
A society of humans who are very similar to us.
A society of humans who are very similar to us in basically every way, except that they have a genetically-caused and strong terminal preference for maximizing the total expected number of paper clips (over the entire arc of history) and only care about other things instrumentally. They are sufficiently committed to paper clip maximization that this will persist on arbitrary reflection (e.g. they’d lock in this view immediately when given this option), and let’s also suppose that this view is transmitted genetically and in a gene-drive-y way such that all of their descendants will also only care about paper clips. (You can change paper clips to basically anything else which is broadly recognized to have no moral value on its own, e.g. gold twisted into circles.)
A society of beings (e.g. aliens) who are extremely different in basically every way to humans, except that they also have something pretty similar to the concepts of “morality”, “pain”, “pleasure”, “moral patienthood”, “happiness”, “preferences”, “altruism”, and “careful reasoning about morality (moral thoughtfulness)”. And the society overall also has a roughly similar relationship with these concepts (e.g. the level of “altruism” is similar). (Note that having the same relationship as humans to these concepts is a pretty low bar! Humans aren’t that morally thoughtful!)
I think I’m almost equally happy with (1) and (3) on this list and quite unhappy with (2).
If you changed (3) to instead be “considerably more altruistic”, I would prefer (3) over (1).
I think it seems weird to call my views on the comparison I just outlined as “being partial to humanity”: I actually prefer (3) over (2) even though (2) are literally humans!
(Also, I’m not that committed to having concepts of “pain” and “pleasure”, but I’m relatively committed to having concepts which are something like “moral patienthood”, “preferences”, and “altruism”.)
Below is a mild spoiler for a story by Eliezer Yudkowsky:
To make the above comparison about different beings more concrete, in the case of Three Worlds Collide, I would basically be fine giving the universe over to the super-happies relative to humans (I think mildly better than humans?), and I think it seems only mildly worse than humans to hand it over to the baby-eaters. In both cases, I’m pricing in some amount of reflection and uplifting which doesn’t happen in the actual story of Three Worlds Collide, but would likely happen in practice. That is, I’m imagining seeing these societies prior to their singularity and then, based on just observations of their societies at this point, deciding how good they are (pricing in the fact that the society might change over time).
To be clear, it seems totally reasonable to call this “being partial to some notion of moral thoughtfulness about pain, pleasure, and preferences”, but these concepts don’t seem that “human” to me. (I predict these occur pretty frequently in evolved life that reaches a singularity for instance. And they might occur in AIs, but I expect misaligned AIs which seize control of the world are worse from my perspective than if humans retain control.)
When I say that people are partial to humanity, I’m including an irrational bias towards thinking that humans, or evolved beings, are unusually thoughtful or ethical compared to the alternatives (I believe this is in fact an irrational bias, since the arguments I’ve seen for thinking that unaligned AIs will be less thoughtful or ethical than aliens seem very weak to me).
In other cases, when people irrationally hold a certain group X to a higher standard than a group Y, it is routinely described as “being partial to group Y over group X”. I think this is just what “being partial” means, in an ordinary sense, across a wide range of cases.
For example, if I proposed aligning AI to my local friend group, with the explicit justification that I thought my friends are unusually thoughtful, I think this would be well-described as me being “partial” to my friend group.
To the extent you’re seeing me as saying something else about how longtermists view the argument, I suspect you’re reading me as saying something stronger than what I originally intended.
In that case, my main disagreement is thinking that your twitter poll is evidence for your claims.
More specifically:
Like you claim there aren’t any defensible reasons to think that what humans will do is better than literally maximizing paper clips? This seems totally wild to me.
I’m not exactly sure what you mean by this. There were three options, and human paperclippers were only one of these options. I was mainly discussing the choice between (1) and (2) in the comment, not between (1) and (3).
Here’s my best guess at what you’re saying: it sounds like you’re repeating that you expect humans to be unusually altruistic or thoughtful compared to an unaligned alternative. But the point of my previous comment was to state my view that this bias counted as “being partial towards humanity”, since I view the bias as irrational. In light of that, what part of my comment are you objecting to?
To be clear, you can think the bias I’m talking about is actually rational; that’s fine. But I just disagree with you for pretty mundane reasons.
[Incorporating what you said in the other comment]
Then I think it’s worth concretely explaining what these reasons are to believe that human control will be a decent amount better in expectation. You don’t need to write this up yourself, of course. I think the EA community should write these reasons up. Because I currently view the proposition as non-obvious, and despite being a critical belief in AI risk discussions, it’s usually asserted without argument. When I’ve pressed people in the past, they typically give very weak reasons.
I don’t know how to respond to an argument whose details are omitted.
+1, but I don’t generally think it’s worth counting on “the EA community” to do something like this. I’ve been vaguely trying to pitch Joe on doing something like this (though there are probably better uses of his time), and his recent blog posts touch on similar topics.
Also, it’s usually only a crux for longtermists, which is probably one of the reasons why no one has gotten around to this.
You didn’t make this clear, so I was just responding generically.
Separately, I think I feel a pretty similar intuition for case (2): people literally only caring about their families seems pretty clearly worse.
There, I’m just saying that human control is better than literal paperclip maximization.
This response still seems underspecified to me. Is the default unaligned alternative paperclip maximization in your view? I understand that Eliezer Yudkowsky has given arguments for this position, but it seems like you diverge significantly from Eliezer’s general worldview, so I’d still prefer to hear this take spelled out in more detail from your own point of view.
Your poll says:
And then you say:
So, I think more human control is better than more literal paperclip maximization, the option given in your poll.
My overall position isn’t that the AIs will certainly be paperclippers, I’m just arguing in isolation about why I think the choice given in the poll is defensible.
I have the feeling we’re talking past each other a bit. I suspect talking about this poll was kind of a distraction. I personally have the sense of trying to convey a central point, and instead of getting the point across, I feel the conversation keeps slipping into talking about how to interpret minor things I said, which I don’t see as very relevant.
I will probably take a break from replying for now, for these reasons, although I’d be happy to catch up some time and maybe have a call to discuss these questions in more depth. I definitely see you as trying a lot harder than most other EAs in trying to make progress on these questions collaboratively with me.
I’d be very happy to have some discussion on these topics with you Matthew. For what it’s worth, I really have found much of your work insightful, thought-provoking, and valuable. I think I just have some strong, core disagreements on multiple empirical/epistemological/moral levels with your latest series of posts.
That doesn’t mean I don’t want you to share your views, or that they’re not worth discussion, and I apologise if I came off as too hostile. An open invitation to have some kind of deeper discussion stands.[1]
I’d like to try out the new dialogue feature on the Forum, but that’s a weak preference
Agreed, sorry about that.
Also, to be clear, I agree that the question of “how much worse/better is it for AIs to get vast amounts of resources without human society intending to grant those resources to the AIs from a longtermist perspective” is underinvestigated, but I think there are pretty good reasons to systematically expect human control to be a decent amount better.
I’m guessing preference utilitarians would typically say that only the preferences of conscious entities matter. I doubt any of them would care about satisfying an electron’s “preference” to be near protons rather than ionized.
Perhaps. I don’t know what most preference utilitarians believe.
Are you familiar with Brian Tomasik? (He’s written about suffering of fundamental particles, and also defended preference utilitarianism.)
Strongly agree that there should be more explicit defences of this argument.
One way of doing this in a co-operative way might be working on co-operative AI stuff, since it seems to increase the likelihood that misaligned AI goes well, or at least less badly.
My personal reason for not digging into this is that my naive model of how good the AI future will be is: quality_of_future * amount_of_the_stuff.
And there is a distinction I haven’t seen you acknowledge: while high “quality” doesn’t require humans to be around, I ultimately judge quality by my values. (Things being conscious is an example. But this also includes things like not copy-pasting the same thing all over, not wiping out aliens, and presumably many other things I am not aware of. IIRC Yudkowsky talks about cosmopolitanism being a human value.)
Because of this, my impression is that if we hand over the future to a random AI, the “quality” will be very low. And so we can currently have a much larger impact by focusing on increasing the quality. Which we can do by delaying “handing over the future to AI” and picking a good AI to hand over to. I.e., alignment.
(Still, I agree it would be nice if there was a better analysis of this, which exposed the assumptions.)
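A minimal toy version of that model, with entirely made-up numbers, just to make explicit the assumption doing the work (namely that the “quality” of a future handed to a random AI is judged to be near zero):

```python
# Toy model from the comment above: value_of_future = quality * amount_of_stuff.
# The specific numbers are illustrative placeholders, not estimates.
def value_of_future(quality: float, amount_of_stuff: float) -> float:
    return quality * amount_of_stuff

aligned_handover = value_of_future(quality=0.8, amount_of_stuff=1.0)    # we pick what to hand over to
random_ai_handover = value_of_future(quality=0.01, amount_of_stuff=2.0)  # more "stuff", but low judged quality

# Under these assumptions, raising quality (alignment) matters far more
# than raising the amount of stuff (acceleration).
print(aligned_handover, random_ai_handover)  # 0.8 vs 0.02 under these assumptions
```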
Is there any particular reason why you are partial towards humans generically controlling the future, relative to this particular current generation of humans? To me, it seems like being partial to one’s own values, one’s community, and especially one’s own life, generally leads to an even stronger argument for accelerationism, since the best way to advance your own values is generally to actually “be there” when AI happens.
In my opinion, the main relevant alternative to this view is to be partial to the human species, as opposed to being partial to either one’s current generation, or oneself. And I think the human species is kind of a weird category to be partial to, relative to those other things. Do you disagree?
I agree with this.
I (strongly) disagree with this. Me being alive is a relatively small part of my values. And since I am not the director of the world, me personally being around to influence things is unlikely to have a decisive impact on things I value.
In more detail: Sure, all else being equal, me being there when AI happens is mildly helpful. But the outcome of building AI seems to be a function of, among other things, (i) values of the people building it + (ii) how much reflection they can do on those values + (iii) the environment dynamics these people are subject to (e.g., the current race dynamics between AI companies). And over time, I expect the potential decrease in (i) to be far outweighed by gains in (ii) and (iii).
The first issue is about (i), that it is not actually me building the AGI, either now or in the future. But I am willing to grant that (all else being equal) current generation is more likely to have values closer to my values.
However, I expect that factors (ii) and (iii) are just as influential. Regarding (ii), it seems we keep making progress at philosophy, ethics, etc., and to me, this currently far outweighs the value drift in (i).
Regarding (iii), my impression is that the current situation is so bad that it can’t get much worse, and we might as well wait. This of course depends on how likely you think we are to get a bad outcome if we either (a) get superintelligence without additional progress on alignment or (b) get widespread human-level AI with no progress on alignment, institution design, etc.
I agree some people (such as yourself) might be extremely altruistic, and therefore might not care much about their own life relative to other values they hold, but this position is fairly uncommon. Most people care a lot about their own lives (and especially the lives of their family and friends) relative to other things they care about. We can empirically test this hypothesis by looking at how people choose to spend their time and money; and the results are generally that people spend their money on themselves, their family and their friends.
You don’t need to be director of the world to have influence over things. You can be just a small part of the world and still have influence over the things you care about. This is essentially what you’re already doing by living and using your income to make decisions, to satisfy your own preferences. I’m claiming this situation could and probably will persist into the indefinite future, for the agents that exist in the future.
I’m very skeptical that there will ever be a moment in time during which there will be a “director of the world”, in a strong sense. And I doubt the developer of the first AGI will become the director of the world, even remotely (including versions of them that reflect on moral philosophy etc.). You might want to read my post about this.
Great points, Matthew! I have wondered about this too. Relatedly, readers may want to check out the sequence Otherness and control in the age of AGI by Joe Carlsmith, in particular “Does AI risk ‘other’ the AIs?”.
One potential argument against accelerating AI is that it will increase the chance of catastrophes which will then lead to overregulating AI (e.g. in the same way that nuclear power arguably was overregulated).