To respond to Yampolskiy without disagreeing with the fundamental point, I think it’s definitely possible for a less intelligent species to align or even indefinitely control a boundedly and only slightly more intelligent species, especially given greater resources, speed, and/or numbers, and sufficient effort.
The problem is that humans aren’t currently trying to limit these systems, nor trying much to monitor them, much less to robustly align or control them.
I’d love to hear non-naturalist moral realists talk about how they think moral facts are epistemically accessible...
The lack of an answer to that is a lot of the reason I discount the view as either irrelevant or not effectively different from moral non-realism.

True, you could accept this moral view....
Thanks!
And as I noted on the other post, I think there’s a coherent argument that if we care about distinct moral experiences in some way, rather than just the sum, we get something like a limited effective utility, not at 10^12 people specifically, but plausibly somewhere far less than a galaxy full.
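As a purely illustrative sketch of that idea (the functional form and the saturation scale below are assumptions chosen for illustration, not estimates), value that accrues from distinct experiences rather than their sum can saturate far below galaxy-scale populations:

```python
import math

# Toy model: value comes from *distinct* positive experiences, so additional
# near-duplicate lives contribute less and less. The saturation scale is an
# arbitrary placeholder, not a claim about where value actually saturates.
def total_value(n_people, distinct_scale=1e12):
    return 1 - math.exp(-n_people / distinct_scale)

for n in (1e9, 1e12, 1e15, 1e30):
    print(f"{n:.0e} people -> value {total_value(n):.4f} (max 1)")
# Value is already ~1.0 by ~1e14 people, far short of a galaxy full,
# even though each additional life retains some nonzero marginal value.
```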
To clarify, do you think there’s a large minority chance that it is possible to align an arbitrarily powerful system, or do you think there is a large minority chance that it is going to happen with the first such arbitrarily powerful system, such that we’re not locked in to a different future / killed by a misaligned singleton?
I think we can thread the needle by creating strongly non-superintelligent AI systems which can be robustly aligned or controlled. And I agree that we don’t know how to do that at present, but we can very likely get there, even if the proofs of unalignable ASI hold up.
How much do you think that having lots of mostly or entirely identical future lives differs in value from having vastly different positive lives? (Because that would create a reasonable view on which a more limited number of future people can saturate the possible future value.)
This seems like a really critical issue, and I’d be very interested in hearing whether this is disputed by @tylermjohn / @William_MacAskill.
This seems like a predictive difference about AI trajectories and control, rather than an ethical debate. Does that seem correct to you (and/or to @Greg_Colbourn ⏸️)?
So it sounds like this might be a predictive / empirical dispute about probabilities conditional on slowing AI and avoiding extinction, and the likely futures in each case, and not primarily an ethical theory dispute?
I’m surprised you think future AI would be so likely to be conscious, given the likely advantages of creating non-conscious systems in terms of simplicity and usefulness. (If consciousness is required for much greater intelligence, I would feel differently, but that seems very non-obvious!)
A negotiated paretotopian future could create lots of moral value even if values don’t converge on their own.
Worth pointing out that extinction by almost any avenue we’re discussing seriously would kill a lot of people who already exist.
I don’t think that logic works—in the worlds where AI safety fails, humans go extinct, and you’re not saving lives for very long, so the value of short term EA investments is also correspondingly lower, and you’re choosing between “focusing on good outcomes which won’t happen,” as you said, and focusing on good outcomes which end almost immediately anyways. (But to illustrate this better, I’d need to work an example, and do the math, and then I’d need to argue about the conditionals and the exact values I’m using.)
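To gesture at the shape of that example without defending any particular numbers (every figure below is a hypothetical placeholder, not an estimate), the comparison looks roughly like this:

```python
# Hypothetical comparison: the value of a near-term intervention is truncated
# in worlds where AI safety fails and extinction follows soon after.
p_doom = 0.5          # placeholder probability that AI safety fails
years_if_doom = 10    # placeholder years of benefit before extinction
years_if_safe = 60    # placeholder years of benefit otherwise
annual_benefit = 1.0  # value per year of the near-term intervention (arbitrary units)

expected_value = (
    p_doom * years_if_doom * annual_benefit
    + (1 - p_doom) * years_if_safe * annual_benefit
)
print(expected_value)  # 35.0, well below the naive 60.0 that ignores doom
```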
Good article summarizing the point, but I don’t see the reason for posting these older discussions on the forum.
Thanks for the post; it’s very interesting, and it definitely resonates with the empirical EA view of power-law returns, which I was surprised you didn’t mention.
A couple issues:
1. The version of non-naturalist moral realism that the divergence requires seems both very strong and strange to me. It assumes that the true moral code is unlike mathematical realism, where truths are accessible via reflection and would be a natural conclusion for those who cared.
2. “You could accept diminishing returns to value in utility… but you’re unlikely to be a longtermist, laser focused on extinction risk if you do.” I think this is false under the view of near-term extinction risk that is held by most of those who seem concerned about AI extinction risk, or even varieties of the hinge-of-history view whereby we are affected in the near term by longtermist concerns.
How much of the argument for working towards positive futures rather than existential security rests on conditional value, as opposed to expected value?
One could argue for conditional value, that in worlds where strong AI is easy and AI safety is hard, we are doomed regardless of effort, so we should concentrate on worlds where we could plausibly have good outcomes.
Alternatively, one could be confident that the probability of safety is relatively high, and argue that we should spend more time focused on positive futures because a good outcome is already likely: either because efforts towards superintelligence safety are likely to work (and if so, which ones?), or because alignment by default seems likely.
(Or, I guess, lastly, one could assume, or argue, that no superintelligence is possible, or it is unlikely.)
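To make the conditional-value framing concrete (all numbers below are hypothetical placeholders, not estimates), the expected value of marginal effort is driven almost entirely by the worlds where that effort can change the outcome:

```python
# Toy decomposition of the "conditional value" argument.
p_doomed_regardless = 0.3  # strong AI easy, safety hard: effort changes nothing
p_fine_by_default   = 0.3  # alignment by default or no ASI: effort changes little
p_effort_matters    = 0.4  # worlds where marginal work can shift the outcome

value_if_shifted = 100.0   # arbitrary units for shifting to a good outcome
marginal_shift   = 0.05    # placeholder change in P(good outcome) from extra effort

ev_of_effort = (
    p_doomed_regardless * 0.0      # doomed regardless of effort
    + p_fine_by_default * 0.0      # fine regardless of effort
    + p_effort_matters * marginal_shift * value_if_shifted
)
print(ev_of_effort)  # 2.0; only the worlds where effort matters contribute
```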
I’ve said much the same, explicitly focused on this.
See: https://forum.effectivealtruism.org/posts/jGYoDrtf8JGw85k8T/my-personal-priorities-charity-judaism-and-effective:
To quote the most relevant part: “Lastly, local organizations or those where I have personal affiliations or feel responsibilities towards are also important to me—but… this is conceptually separate from giving charity effectively, and as I mentioned, I donate separately from the 10% dedicated to charity. I give to other organizations, including my synagogue and other local community organizations, especially charities that support the local poor around Jewish holidays, and other personally meaningful projects. But in the spirit of purchasing fuzzies separately, this is done with a smaller total amount, separate from my effective giving.”
To respond to your points in order:
1. Sure, but I think of, say, a 5% probability of success and a 6% probability of success as both dire enough that I wouldn’t want to pick either.
2. What we call AGI today, human-level at everything as a minimum but running on a GPU, is what Bostrom called speed and/or collective superintelligence, if chip prices and speeds continue to change.
3. and 4. Sure, alignment isn’t enough, but it’s necessary, and it seems we’re not on track to clear even that low bar.
I think we basically agree, but I wanted to add the note of caution. Also, I’m evidently more skeptical of the value of evals, as I don’t see a particularly viable theory of change.
“Don’t cause harm”
It is not obvious to me that a number of suggested actions here meet this bar. Developing evals, funding work that accidentally encourages race dynamics, or engaging in fear-mongering about current largely harmless or even net-positive AI applications all seem likely to cause harm.
I would add the (in my view far more likely) possibility of Yudkowskian* paperclipping via non-sentient AI, which given our currently incredibly low level of control of AI systems, and the fact that we don’t know how to create sentience, seems like the most likely default.
*) Specifically, the view that paperclipping occurs by default from any complex non-satiable implicit utility function, rather than the Bostromian paperclipping risk of accidentally giving a smart AI a dumb goal.