Note: In retrospect, I think I’m making two separate points here. The first, and most important one, is the idea that “interestingness” or asking “what possible futures would make for the best story?” can provide predictive insight. I didn’t spend much time on that point, despite its potential importance, so if anyone can create a steelmanned version of that position I’d appreciate it. The second point is that judging by that heuristic, the rationalist movement is terrible at placebomancy. The epistemic status of this entire post is Speculation Mode, so take everything I say with a grain of salt.
We don’t live in a work of fiction. That being said, there is value in being genre-savvy; storytelling is a form of world-modeling, and I think it’s possible that the outcomes we find most satisfying from a storytelling perspective are more likely to reflect reality than one would naively expect. As such, it may be worth modeling what we would expect to happen if reality were a story.
How would Rationalists fare in a typical story?
I posed the following scenario to three different Discord groups I’m in (none of which know much about the Rationalist movement):
Imagine you’re halfway through reading a fiction book. Here’s a summary of what’s going on so far:
The book follows a strange group of people. They call themselves “Thinkers,” and they believe they’ve discovered a set of methods that allow people to make significantly more accurate predictions of the future. And the Thinkers are worried. Using their new methods, they predict with a high degree of confidence that humanity will create a hyper-intelligent machine (so smart that it makes Einstein look like a toddler by comparison), but without human morality. This human-made machine will destroy humanity, if they don’t do something about it. The leaders of the group seem to be more-or-less in agreement—the only way to stop this apocalypse is to build a hyper-intelligent machine themselves, but one which has the same goals as humans, and will be able to prevent any other immoral hyper-intelligent machines from being created. The only problem is that the Thinkers do not yet know how to program a machine to have fully human goals.
What happens next in this story? How does it end?
I followed up by clarifying that the question is more “if you read this in a fiction book, how would you expect the plot to go?”, not “what would actually happen in real life?”
My personal answer
The story that I would personally expect to read involves the Thinkers failing. A tale in which they succeed sounds far less interesting than a self-caused apocalypse, to the point that I’d probably feel almost disappointed if the story ended with a dramatic AI battle where the humans (and aligned AI) win, which is how I’d envision a likely “happy ending” to go. The setup practically begs for a dramatic failure; the heroes believe they have skills/knowledge others don’t (a classic sign of hubris, which rarely goes unpunished in stories), the risk can be perceived as being somewhat self-imposed (making a potential loss feel more poetic), and of course there’s the obvious Frankenstein/Golem-esque trope of the scientist’s own creation turning against the creator staring you right in the face.
One might say that a bad ending would leave little room for a sequel, but that counter-argument only works if both A) the AI doesn’t do much of narrative interest, and B) humanity would go on to lead a more interesting narrative. The problem is that most utopias are boring: nobody likes reading about a happily-ever-after which involves no suffering or risk of suffering. In order to sustain narrative interest, dystopias with negative overall quality of life are preferred.
But enough with my take—what did people outside the community think?
Consensus: We will be tragic, self-fulfilling prophets of doom
There were a few different possible endings given (for a mostly complete list, see the anonymized transcripts below), but the most common response was that the Thinkers would cause the very tragedy they fear most by building an AI themselves while failing to account for some crucial factor (the details differ significantly), which leads to Alignment failure.
What can we take away from this?
If we live in a world in which interestingness of narrative serves as a useful predictive heuristic, the best thing to do would probably not be to try to create an Aligned AI ourselves. Rather, the best way to reduce existential risk would be to find a plausible pathway to making the world more narratively interesting with us in it. Research should be done on designing plausible utopias in which deep and interesting stories can be set.
This also has the advantage of serving as a proxy for finding worlds in which it feels like there is value in our existence, I think. That’s just generally useful for morale, and self-value under some philosophies!
Also also, maybe the world really is optimized for interestingness, and this isn’t just a weird thought experiment—it might be worth exploring this (admittedly rather exotic) theory in more detail for philosophical plausibility. One argument in favor might look like the observation (which may or may not be true) that most detailed social simulations currently in existence are made in the context of video games and chatbots, and if the simulation hypothesis is correct, the majority of worlds with observers in them may be designed to optimize entertainment of an external observer over the pleasure or pain of its inhabitants. Is there instrumental convergence in entertainment—some way to generalize the concept for a diverse array of agents? I have no idea, but it might be worth considering!
Transcript of responses
Here are some representative responses, anonymized, and shared here with consent of all involved:
Private chat with friend group 1
User A: There [sic] machine does exactly what they didn’t want it to do in the first place because of big part of human nature is conquering others
User B: The “thinkers” are actually the creators of the bad machine and they do exactly the opposite of what they wanted and they ultimately are the provayor [sic] of death.
User A: That was gonna be my second guess lol
[Me]: I’m getting the sense y’all would not expect a happy ending lol
User B: No
I would not
User A: No nothing ever good happens when people can see the future and try to make it better
[Redacted; consent not obtained for response]
User C: i would expect it to end in the machine, being as intelligent as it is yet having its robotic lack of compassion, is the prophesized doomsday device
[conversation continues, questions are raised about if better predicting the future is actually a good thing or not, interesting stuff but not super relevant, etc.]
Chat in Discord server devoted to The Endless Empty (an artistic indie video game made by a friend of mine—it’s really good btw!)
User 1: The thinkers use their prediction method to find the best way to begin developing their own moral hyper AI
User 2: the 1st machine isnt actually super evil and shit, it can still be reasoned with because good will always defeat evil or whatever idk. they convince it that humans are good, actually and it becomes gay with one of the thinkers because you know whos writing this post
User 1: And they eventually succeed
User 2: or like, not even ~good~ persay just better then not
User 1: Everyone is so in love with the idea of making robots fall in love with being human
User 2: i wonder why that is haha
User 1: I’m obsessed with making them similar to humans but just to the left enough that their societies would not be able to mix seamlessly
User 2: <----- needs to be convinced that things Matter sososososo bad
[Me]: (For context btw, the question is more if you read this in a fiction book how you would expect the plot to go, not “what would actually happen in real life”)
User 3: I would want the machines to run away together, but what i expect is for them (the thinkers) to create the machine that causes the downfall of humanity
User 1: Oh so we need to prolong conflict? The thinker’s development has to be accelerated because they get wind that a horrible megacorp is working on developing an AI for optimizing profit and some thinkers are assigned to steal and sabotage the megacorp
Eventually both parties fail
And the prediction is left undecided for now
User 2: i thionk my version is the best because it had sloppy interspecies makeouts and gay computers in it
[some irrelevant, but quite fun discussion ensues]
User 2: i mean, no because my ideas are always better then whoever is writing the book but i desperately need it to end like this yknow
i need a book that decides that, maybe things are good and beautiful and nice
the hopeful gay ending is the best one, but we live in a world where marvel rakes in more cash then everything everywhere so yknow
User 1: I need a book that says that worrying endlessly about the future is a waste of energy and you just need to live your life and do things as you see fit
User 2: https://youtu.be/8309BPqllyg
i need it to end like this
Chat in Discord focused on Wikipedia editing
User X: a cliche ending would be “they create their AI but it doesn’t have human morality, fulfilling their own prophecy”
User Y: The thinkers builds a machine which comes to the same conclusion that the best way forward for humanity is indeed a machine without morals, as if it’s based on humanity, the record has shown people wage wars, ignore climate crises, engage in corruption, etc.
0th law of robotics, really
cliche but true
if machines can execute corrupt politicians, or eliminate potential Hitlers
is it moral to do so?
User X: a variation could be: the AI they create to prevent the evil AI from being created focuses exclusively on that goal, and eradicates humanity to prevent it from creating an AI.
User Y: Human morality prevents us from doing the most efficient thing, even if the greater good is at stake.
User X: I feel like human morality (or lack thereof) allows people to be very efficiently evil sometimes
User Y: it would depend.
[Conversation switches to discussion about the morality of killing baby Hitler, and the plausibility of the multiverse hypothesis]
I think there’s a plausible case to be made that art’s evolutionary “purpose” was to help with collaborative world-modeling, mainly of social dynamics. By engaging in low-stakes roleplay we can both model the other, and get critique from others which further refines our model. I hope I’m making sense here—if not, please let me know :)
Some anecdotal experience informing this hypothesis: For a few years as a teenager, I was half-convinced that we were living in a simulation optimized to tell an engaging story (I no longer believe this to be accurate, but I honestly wouldn’t be too surprised if it were). This belief was grounded in the observation that while history is clearly not optimized for the pleasure of its inhabitants, it makes for a very fun read after the fact (or well, the history I read about was fun at least). If that were true, it would make sense for future political and large-scale social events to continue in whichever direction makes the most interesting story to an outsider. Reasoning this way, I correctly predicted that Donald Trump would get elected, months before anyone else in my social group did. Weak evidence, I know, but plenty of later events have seemingly gone in the direction of “maximally interesting to read about” over “most realistically likely to happen using normative assumptions.” Try it out yourself and see where it gets you!
To just blatantly steal a quote from Justis, who very kindly proofread (and critiqued) this post:
I think “how would people see this going in a story” is a helpful way to predict how people will feel, at a gut level, about a project. And that itself is really valuable! If people see you as the doomed bad guys on a gut level, they probably won’t support you, which may mean the AI x-risk project needs different messaging.
Oh! And I suppose one other thing strikes me. You asked what readers would expect in a generic story, not necessarily a good one! Person two in one of the transcripts had a very different idea of what would make a good story. Let’s hope it’s their taste that wins out, if this is indeed a simulation for the purpose of entertainment. :)
This is an excellent point, obviously. It also ties into some speculative thoughts I’ve been having about “prompting” as it relates to humans, and how some techniques used in prompt engineering may be transferable to other domains. If there’s interest, I might try to expand on this tangent sometime in the future...
Another excellent critique from Justis (which I wasn’t sure how to incorporate into the main body of the text so I’m just sticking it here):
To get into why I don’t think “how would this go in a novel” is a very good metric, I think that while the best stories do have a grain of truth in them, modal/cliche stories kind of just reliably push certain buttons? Romance novels feature people not liking each other happening to be in contrived situations that make them fall in love, for example. In reality, this is pretty rare—most couples just sort of like each other from the get go! I think there’s something similar with treatments of “rationality” in fiction—people like to have “rational” characters in fiction stand in opposition to emotion, with emotion winning out, because stories are designed to evoke emotion so they tend to be “in emotion’s corner.” But are more rational people less emotional on the whole? Not necessarily!
Then again, I think many surprising theories about the world strike different people different ways—so I don’t mean to imply my suspiciousness means the idea isn’t worth sharing. And, again, I do think the research you did is interesting even if the “placebomancy is real” thesis happens to be false. I would also be interested in seeing that steelmanning, though clearly I’m not the one to write it!