This other Ryan Greenblatt is my old account[1]. Here is my LW account.
[1] Account lost to the mists of time and expired university email addresses.
because it feels very differently about “99% of humanity is destroyed, but the remaining 1% are able to rebuild civilisation” and “100% of humanity is destroyed, civilisation ends”
Maybe? This depends on what you think about the probability that intelligent life re-evolves on earth (it seems likely to me) and how good you feel about the next intelligent species on earth vs humans.
the particular focus on extinction increases the threat from AI and engineered biorisks
IMO, most x-risk from AI probably doesn’t come from literal human extinction but instead from AI systems acquiring most of the control over long-run resources while some/most/all humans survive, but fair enough.
Where the main counterargument is that now the groups in power can be immortal and digital minds will be possible.
See also: AGI and Lock-in
What about “Is Power-Seeking AI an Existential Risk?”?
I don’t know if you’d count it as quantitative, but it is detailed.
My views are reasonably messy, complicated, hard to articulate, and based on a relatively diffuse set of intuitions. I think we also reason in a pretty different way about the situation than you seem to (3). I think it wouldn’t be impossible to try to write up a post on my views, but I would need to consolidate and think about how exactly to express where I’m at. (Maybe 2-5 person days of work.) I haven’t really consolidated my views or reached something close to reflective equilibrium.
I also just think that arguing about pure philosophy very rarely gets anywhere and is very hard to make convincing in general.
I’m somewhat uncertain on the “inside view/mechanistic” level. (But my all-considered view involves partially deferring to some people, which makes me overall less worried that I should immediately reconsider my life choices.)
I think my views are compelling, but I’m not sure if I’d say “very compelling”
I’m in agreement that this consideration makes it hard to do a direct comparison. But I think this consideration should mostly make us more uncertain, rather than making us think that humans are better than the alternative.
Actually, I was just trying to say “I can see what humans are like, and it seems pretty good relative to my current guesses about AIs in ways that don’t just update me up about AIs”; sorry about the confusion.
Currently, humans seem much closer to me on a values level than GPT-4 base does. I think this is also likely to be true of future AIs, though I understand why you might not find this convincing.
I think the architecture (learning algorithm, etc.) and training environment between me and other humans seems vastly more similar than between me and likely AIs.
I don’t think I’m going to flesh this argument out to an extent to which you’d find it sufficiently rigorous or convincing, sorry.
You should compare against human nature, which was optimized for something quite different from utilitarianism. If I add up the pros and cons of the thing humans were optimized for and compare it against the thing AIs will be optimized for, I don’t see why it comes out with humans on top, from a utilitarian perspective. Can you elaborate on your reasoning here?
I can’t quickly elaborate in a clear way, but some messy combination of:
I can currently observe humans, which screens off a bunch of the comparison and lets me do direct analysis.
I can directly observe current AIs and make predictions about future training methods, and their values seem to result from a much more heavily optimized and precise process with less “slack” in some sense. (Perhaps this is related to the genetic bottleneck; I’m unsure.)
AIs will primarily be trained on things which look extremely different from “cooperatively achieving high genetic fitness”.
Current AIs seem to use the vast, vast majority of their reasoning power for purposes which aren’t directly related to their final applications. I predict this will also apply to the internal high-level reasoning of AIs. This doesn’t seem true for humans.
Humans seem optimized for something which isn’t that far off from utilitarianism from some perspective? Make yourself survive, make your kin group survive, make your tribe survive, etc.? I think utilitarianism is often a natural generalization of “I care about the experience of XYZ, it seems arbitrary/dumb/bad to draw the boundary narrowly, so I should extend this further.” (This is how I get to utilitarianism.) I think the AI optimization looks considerably worse than this by default.
(Again, note that I said in my comment above: “Some of these can be defeated relatively easily if we train AIs specifically to be good successors, but the default AIs which end up with power over the future will not have this property.” I edited this in to my prior comment, so you might have missed it, sorry.)
What are these a priori reasons and why don’t they similarly apply to AI?
I am a human. Other humans might end up in a similar spot on reflection.
(Also I care less about values of mine which are highly contingent wrt humans.)
The ones I would say are something like (approximately in priority order):
AIs’ values could result mostly from playing the training game or from other relatively specific optimizations they performed in training, which might result in extremely bizarre values from our perspective.
More generally, AI values might be highly alien in a way where caring about experience seems very strange to them.
AIs by default will be optimized for very specific commercial purposes with narrow specializations and a variety of hyperspecific heuristics, and the resulting values and generalizations of these will be problematic.
I care ultimately about what I would think is good upon (vast amounts of) reflection and there are good a priori reasons to think this is similar to what other humans (who care about using vast amounts of compute) will end up thinking is good.
As a sub argument, I might care specifically about things which are much more specific than “lots of good diverse experience”. And, divergences from what I care about (even conditioning on something roughly utilitarian) might result in massive discounts from my perspective.
I care less about my values and preferences in worlds where they seem relatively contingent, e.g. they aren’t broadly shared on reflection by reasonable fractions of humanity.
AIs don’t have a genetic bottleneck and thus can learn much more specific drives that perform well while evolution had to make values more discoverable and adaptable.
E.g. various things about empathy.
AIs might have extremely low levels of cognitive diversity in their training environments as far as co-workers go which might result in very different attitudes as far as caring about diverse experience.
Some of these can be defeated relatively easily if we train AIs specifically to be good successors, but the default AIs which end up with power over the future will not have this property.
Also, I should note that this isn’t a very strong list, though in aggregate it’s sufficient to make me think that human control is perhaps 4x better than AI control. (For reference, I’d say that me personally being in control is maybe 3x better than human control.) I disagree with a MIRI-style view about the disvalue of AI and the extent of fragility of value that seems implicit.
Another relevant consideration along these lines is that people who selfishly desire high wealth might mostly care about positional goods which are similar to current positional goods. Usage of these positional goods won’t burn much of any compute (resources for potential minds) even if these positional goods become insanely valuable in terms of compute. E.g., land values of interesting places on earth might be insanely high and people might trade vast amounts of computation for this land, but ultimately, the computation will be spent on something else.
why you care about the small fraction of resources spent on altruism
I’m also not sold it’s that small.
Regardless, it doesn’t seem like we’re making progress here.
If AI alignment causes high per capita incomes (because it enriches humans with a small population size), then plausibly this is worse than having a far larger population of unaligned AIs who have lower per capita consumption, from a utilitarian point of view.
Both seem negligible relative to the expected amount of compute spent on optimized goodness, in my view.
Also, I’m not sold that there will be more AIs, it depends on pretty complex details about AI preferences. I think it’s likely AIs won’t have preferences for their own experiences given current training methods and will instead have preferences for causing certain outcomes.
It’s possible we’re using these words differently, but I guess I’m not sure why you’re downplaying the value of economic consumption here
Ah, sorry, I was referring to the process of AI labor being used to produce the economic output not having much total moral value. I thought you were arguing that aligned AIs being used to produce goods would be where most value is coming from, because of the vast numbers of such AIs laboring relative to other entities. Sorry, by “from incidental economic consumption” I actually meant “incidentally (as a side effect of) economic consumption”. This is in response to things like:
Consequently, in a scenario where AIs are aligned with human preferences, the consciousness of AIs will likely be determined mainly by economic efficiency factors during production, rather than by moral considerations. To put it another way, the key factor influencing whether AIs are conscious in this scenario will be the relative efficiency of creating conscious AIs compared to unconscious ones for producing the goods and services demanded by future people. As these efficiency factors are likely to be similar in both aligned and unaligned scenarios, we are led to the conclusion that, from a total utilitarian standpoint, there is little moral difference between these two outcomes.
As far as the other thing you say, I still disagree, though for different (related) reasons:
As a consequence, I simply do not agree with the intuition that economic consumption is a rounding error compared to the much smaller fraction of resources spent on altruistic purposes.
I don’t agree with “much smaller”, and I think “rounding error” is reasonably likely as far as the selfish preferences of currently existing humans or the AIs that seize control go. (These entities might (presumably altruistically) create entities which then selfishly satisfy their preferences, but that seems pretty different.)
My main counterargument is that selfish preferences will result in wildly fewer entities if such entities aren’t into (presumably altruistically) making more entities, and thus will be extremely inefficient. Of course, it’s possible that you have AIs with non-indexical preferences but which are de facto selfish in other ways.
E.g., for humans you have 10^10 beings which are probably radically inefficient at producing moral value. For AIs it’s less clear and depends heavily on how you operationalize selfishness.
I have a general view like “in the future, the main way you’ll get specific things that you might care about is via people trying specifically to make those things because optimization is extremely powerful”.
I’m probably not going to keep responding as I don’t think I’m comparatively advantaged at fleshing this out. And doing this in a comment section seems suboptimal. If this is anyone’s crux for working on AI safety, though, consider contacting me and I’ll consider setting you up with someone who I think understands my views and would be willing to go through the relevant arguments with you. The same offer applies to you Matthew, particularly if this is a crux, but I think we should use a medium other than EA Forum comments.
Do you have an argument for why humans are more likely to try to create morally valuable lives compared to unaligned AIs?
TBC, the main point I was trying to make was that you didn’t seem to be presenting arguments about what seems to me like the key questions. Your summary of your position in this comment seems much closer to arguments about the key questions than I interpreted your post being. I interpreted your post as claiming that most value would result from incidental economic consumption under either humans or unaligned AIs, but I think you maybe don’t stand behind this.
Separately, I think the “maybe AIs/humans will be selfish and/or not morally thoughtful” argument mostly just hits both unaligned AIs and humans equally hard such that it just gets normalized out. And then the question is more about how much you care about the altruistic and morally thoughtful subset.
(E.g., the argument you make in this comment seemed to me like about 1⁄6 of your argument in the post and it’s still only part of the way toward answering the key questions from my perspective. I think I partially misunderstood the emphasis of your argument in the post.)
I do have arguments for why I think human control is more valuable than control by AIs that seized control from humans, but I’m not going to explain them in detail in this comment. My core summary would be something like: “I expect substantial convergence toward my utilitarian-ish views among morally thoughtful humans who reflect, and I expect notably less convergence between me and AIs. I expect that AIs have somewhat messed up, complex, and specific values in ways which might make them not care about things we care about as a result of current training processes, while I don’t have such an argument for humans.”
As far as what I do think the key questions are, I think they are something like:
What preferences do humans/AIs have for radically longer lives, massive self-enhancement, and potentially long periods of reflection?
How much do values/views diverge/converge between different altruistically minded humans who’ve thought about it for extremely long durations?
Even if various entities are into creating “good experiences”, how much do these views diverge in what is best? My guess would be that even if two entities are each maximizing good experiences from their own perspective, the relative goodness per unit of compute can be much lower from the other entity’s perspective (e.g., easily 100x lower, maybe more).
How similar are my views on what is good after reflection to other humans vs AIs?
How much should we care about worlds where morally thoughtful humans reach radically different conclusions on reflection?
Structurally, what sorts of preferences do AI training processes impart on AIs, conditional on these AIs successfully seizing power? I also think this is likely despite humanity likely resisting to at least some extent.
It seems like your argument is something like “who knows about AI preferences, also, they’ll probably have similar concepts as we do” and “probably humanity will just have the same observed preferences as they currently do”.
But I think we can get much more specific guesses about AI preferences such that this weak indifference principle seems unimportant, and I think human preferences will change radically, e.g. preferences will change far more in the next 10 million years than in the last 2000 years.
Note that I’m not making an argument that human control has greater value in this comment, just trying to explain why I don’t think your argument is very relevant. I might try to write up something about my overall views here, but it doesn’t seem like my comparative advantage and it currently seems non-urgent from my perspective. (Though embarrassing for the field as a whole.)
If I had to pick a second consideration I’d go with:
After millions of years of life (or much more) and massive amounts of cognitive enhancement, the way post-humans might act isn’t clearly well predicted by just looking at their current behavior.
Again, I’d like to stress that my claim is:
Also, to be clear, none of the considerations I listed make a clear and strong case for unaligned AI being less morally valuable, but they do make the case that the relevant argument here is very different from the considerations you seem to be listing. In particular, I think value won’t be coming from incidental consumption.
Maybe the most important single consideration is something like:
Value can be extremely dense in computation optimized for value, relative to the density of value from AIs used for economic activity (instead of for value).
So, we should focus on the question of entities trying to create morally valuable lives (or experience or whatever relevant similar property we care about) and then answer this.
(You do seem to talk about “will AIs have more/less utilitarian impulses than humans”, but you seem to talk about this almost entirely from the perspective of growing the economy rather than question like how good the lives will be.)
Hmm, this is more of a claim than a consideration, but I’d highlight:
It seems likely to me that the vast, vast majority of moral value (from this sort of utilitarian perspective) will be produced via people trying to improve moral value rather than incidentally via economic production. This applies for both aligned and unaligned AI. I expect that only a tiny fraction of available computation goes toward optimizing economic production, that only a smaller fraction of this is morally relevant, and that the weight on this moral relevance is much lower than that of computation specifically optimized for moral relevance when operating from a similar perspective. This bullet is somewhere between a consideration and a claim, though it seems like possibly our biggest disagreement. I think it’s possible that this disagreement is driven by some of the other considerations I list.
The main thing this claim disputes is:
Consequently, in a scenario where AIs are aligned with human preferences, the consciousness of AIs will likely be determined mainly by economic efficiency factors during production, rather than by moral considerations.
(and some related points).
Sorry, I don’t think this exactly addresses your comment. I’ll maybe try to do a better job in a bit. I think a bunch of the considerations I mention are relatively diffuse, but important in aggregate.
One additional meta-level point which I think is important: I think that existing writeups of why human control would have more moral value than unaligned AI control from a longtermist perspective are relatively weak and often specific writeups are highly flawed. (For some discussion of flaws, see this sequence.)
I just think that this write-up misses what seem to me to be key considerations, I’m not claiming that existing work settles the question or is even robust at all.
And it’s somewhat surprising and embarrassing that this is the state of the current work, given that longtermism is reasonably common and arguments for working on AI x-risk from a longtermist perspective are also common.
The “footprints on the future” thing could be referencing this post.
(Edit: to be clear, this link is not an endorsement.)
I’m not sure that I buy that critics lack motivation. At least in the space of AI, there will be (and already are) people with immense financial incentive to ensure that x-risk concerns don’t become very politically powerful.
Of course, it might be that the best move for these critics won’t be to write careful and well reasoned arguments for whatever reason (e.g. this would draw more attention to x-risk so ignoring it is better from their perspective).
Edit: this is mentioned in the post, but I’m a bit surprised because this isn’t emphasized more.