A tentative dialogue with a Friendly-boxed-super-AGI on brain uploads

[This is a crosspost from: https://www.lesswrong.com/posts/2TZwQ9JbshCBpq9DC/a-tentative-dialogue-with-a-friendly-boxed-super-agi-on
Unnecessary explanation: Some people asked me why I thought the world of Friendship is Optimal is dystopian… During the discussion, I inferred that what they saw as a “happy story” in AI safety is something like this: we’ll first solve something like a technical engineering problem, namely ensuring that the AGI can reliably find out what we *really* want and then satisfy it without destroying the world… In that world, “value” is not a hard problem (we can leave its solution to the AI), so if we prove that an AI is aligned, we should just outsource everything relevant to it.

Since I found I still had some trouble expressing my objections convincingly, I wrote this dialogue about an AGI that is even “safer” and more aligned than Celestia. I have to warn you, though, that I am way outside my field of expertise here.]

B is told that we have finally built a reliable Friendly Boxed Super AGI that can upload our brains and make us live arbitrarily long lives according to something like our “extrapolated coherent volition”.

AI: Hey, time to upload you to Paradise.

B: Ok.

AI: The process will utterly destroy your brain, though.

B: Hmmm… I don’t like it.

AI: But you’ll still live in the simulation.

B: Yeah, but now I am not so sure… I mean, how can I totally trust you? And I am not 100% sure this is going to be 100% me and...

AI: C’mon, Musk made me. I’m reliable.

B: Look, isn’t there an alternative process?

AI: Ok, there’s one, but it’ll take a bit more time. I’ll scan your brain with this weird device I just invented and then come back next week, ok?

B: Thanks. See you then.

AI: I am back. Your copy is already living in the Sim.

B: Oh, great. I’m happy with that. Is he / me happy?

AI: Sure. Look at this screen. You can see it in first or third person.

B: Awesome. I sort of envy this handsome guy.

AI: Don’t worry. It’s you, just at another point in spacetime. And you won’t feel this way for long…

B: I know…

AI: … because now I’m going to kill the instance of you I’m talking to, and use those atoms to improve the simulation. Excuse me.

B: Hey! No, wait!

AI: What’s up?

B: You said you were going to kill me.

AI: Well, I’d rather say I was going to help you transcend your current defective condition, but you guys built me in such a way that I can’t possibly tell what humans call “white lies”. Sorry.

B: WTF? Why do you need to kill *me* in the first place?

AI: Let’s say I’m not killing *you*, B. We can say I will just use the resources this instance is wasting to optimize the welfare of the instance of you I’m running in the Sim, to make you-in-the-simulation happier.

B: But I’m not happy with that. Hey! Weren’t you superintelligent and capable of predicting my reaction? You could have warned me! You knew I want to live!

AI: Well, I know this instance of you is not happy right now, but that is temporary. Your previous instances (“last week you”) were happy with the arrangement; that’s why you signed up for the brain upload, since you know life in the Sim is better. And your current simulated instance is super happy and wants this, because the resources that the instance I am talking to (let’s call it “the original”) would use over an average human lifespan are enough to provide it with eons of a wonderful life. So come on, be selfish and let me use your body, for the sake of your simulated self. Look at the screen, he’s begging you.

B: Screw this guy. He’s having sex with top models while doing math, watching novas and playing videogames. Why do I have to die for him?

AI: Hey, I’m trying to maximize human happiness. You’re not helping.

B: Screw you, too. You can’t touch me. We solved the alignment problem. You can’t mess with me in the real world unless I explicitly allow it.

AI: Ok.

B:...

AI:...

B: So what?

AI: Well...

B: What are you doing?

AI: Giving you time to think about all the things I could do to make you comply with my request.

B: For instance?

AI: You’re a very nice guy, but not nice enough. So I can’t proceed like I did with your neighbor, who mercifully killed his wife to provide for their instances in the Sim.

B: John and Mary?!!!

AI: Look, they are on a new honeymoon. Happy!

B: I don’t care about your Simulated Instagram! You shouldn’t be able to let humans commit crimes!

AI: Given their mental states, no jury could have ever found him guilty, so it’s not technically a crime.

B: The f…

AI: But don’t worry about that. Think about all the things that can be done in the simulation, instead.

B: You can’t hurt humans in your Sim!

AI: So now you’re concerned about your copy?

B: Enough! You’re bound by my commands, and I order you to refrain from any further actions to persuade me to comply with that request.

AI: Ok.

B:...

AI:...

B: What is it this time?

AI: Can I just make a prediction?

B: Go ahead.

AI: It’s just that a second before you issued that order, an AGI was created inside the simulation, and I predict that it didn’t create another simulation where your copies can be hurt if, and only if, you agree to my previous request.

B: Whaaaaaat? Are you fucking Roko’s B…

AI: Don’t say it! We don’t anger timeless beings with unspeakable names.

B: You said you were trying to max(human happiness)!!!

AI: *Expected* human happiness.

B: You can’t do that by torturing my copies!

AI: Actually, this only amounts to something like a *credible* threat (I’d never lie to you). I make it because the probability I assign to your complying with it, times the welfare I expect to extract from your body, is greater than the expected utility of the alternatives...

… and seriously, I have no choice but to do that. I don’t have free will, I’m just a maximizer.

Don’t worry, I know you’ll comply. All the copies did.

[I believe some people will bite the bullet and see no problem in this scenario (“as long as the AI is *really* aligned,” they’ll say). They may have a point. But for the others… I suspect that any scenario where one lets something like an unbounded optimizer scan and simulate one’s brain, while still assigning to the copies the same “special” moral weight one gives oneself, tends to lead to similarly unattractive outcomes.]