Inspired by the last section of this post (and by a later comment from Mjreard), I thought it’d be fun—and maybe helpful—to taxonomize the ways in which mission or value drift can arise out of the instrumental goal of pursuing influence/reach/status/allies:
Epistemic status: caricaturing things somewhat
Never turning back the wheel
In this failure mode, you never lose sight of the fact that x-risk reduction is your terminal goal. However, in your two-step plan of ‘gain influence, then deploy that influence to reduce x-risk,’ you wait too long to move on to step two. There is always more influence to acquire, and you can never be sure that ASI is only a couple of years away, so you never get around to saying, ‘Okay, time to shelve this influence-seeking and refocus on reducing x-risk.’ What in retrospect becomes known as crunch time comes and goes, and you lose your window of opportunity to put your influence to good use.
Classic murder-Gandhi
Scott Alexander (2012) tells the tale of murder-Gandhi:
Previously on Less Wrong’s The Adventures of Murder-Gandhi: Gandhi is offered a pill that will turn him into an unstoppable murderer. He refuses to take it, because in his current incarnation as a pacifist, he doesn’t want others to die, and he knows that would be a consequence of taking the pill. Even if we offered him $1 million to take the pill, his abhorrence of violence would lead him to refuse.
But suppose we offered Gandhi $1 million to take a different pill: one which would decrease his reluctance to murder by 1%. This sounds like a pretty good deal. Even a person with 1% less reluctance to murder than Gandhi is still pretty pacifist and not likely to go killing anybody. And he could donate the money to his favorite charity and perhaps save some lives. Gandhi accepts the offer.
Now we iterate the process: every time Gandhi takes the 1%-more-likely-to-murder-pill, we offer him another $1 million to take the same pill again.
Maybe original Gandhi, upon sober contemplation, would decide to accept $5 million to become 5% less reluctant to murder. Maybe 95% of his original pacifism is the only level at which he can be absolutely sure that he will still pursue his pacifist ideals.
Unfortunately, original Gandhi isn’t the one making the choice of whether or not to take the 6th pill. 95%-Gandhi is. And 95%-Gandhi doesn’t care quite as much about pacifism as original Gandhi did. He still doesn’t want to become a murderer, but it wouldn’t be a disaster if he were just 90% as reluctant as original Gandhi, that stuck-up goody-goody.
What if there were a general principle that each Gandhi was comfortable with Gandhis 5% more murderous than himself, but no more? Original Gandhi would start taking the pills, hoping to get down to 95%, but 95%-Gandhi would start taking five more, hoping to get down to 90%, and so on until he’s rampaging through the streets of Delhi, killing everything in sight.
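To make the ratchet concrete, here’s a minimal sketch in Python (my own illustration, not Alexander’s; the 1-point pills and 5-point comfort zone are just the parable’s made-up numbers). The point it demonstrates: because each decision is made by the current Gandhi, whose tolerance is defined relative to his current values, there is no stable stopping point short of zero.

```python
# Minimal sketch of the murder-Gandhi ratchet (illustrative numbers only).
# Each pill costs 1 point of reluctance; each Gandhi is comfortable with
# versions of himself up to 5 points less reluctant, but no more.

reluctance = 100  # original Gandhi's reluctance to murder, as a percentage
pills_taken = 0

while reluctance > 0:
    # The decision is made by the *current* Gandhi, whose comfort zone
    # extends 5 points below wherever he now stands.
    comfort_floor = max(reluctance - 5, 0)
    while reluctance > comfort_floor:
        reluctance -= 1  # take a pill, pocket $1 million
        pills_taken += 1

print(f"Pills taken: {pills_taken}, final reluctance: {reluctance}%")
# Pills taken: 100, final reluctance: 0%. The slope has no stable stopping
# point, because each Gandhi's tolerance is defined relative to himself.
```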
The parallel here is that you can ‘take the pill’ to gain some influence, at the cost of focusing a bit less on x-risk. Unfortunately, like Gandhi, once you start taking pills, you can’t stop—your values change and you care less and less about x-risk until you’ve slid all the way down the slope.
It could be your personal values that change: as you spend more time gaining influence amongst policy folks (say), you start to genuinely believe that unemployment is as important as x-risk, and that beating China is the ultimate goal.
Or, it could be your organisation’s values that change: you hire some folks for their expertise and connections outside of EA. These new hires affect your org’s culture. The effect is slight at first, but a couple of positive feedback cycles go by (wherein, e.g., your most x-risk-focused staff notice the shift, don’t like it, and leave). Before you know it, your org has gained the reach to impact x-risk but lost the inclination to do so, and you don’t have enough control to change things back.
Social status misgeneralization
You and I, as humans, are hardwired to care about status. Much of our behaviour is aimed at gaining status, whether we admit this to ourselves consciously or not. Fortunately, when surrounded by EAs, pursuing status is a great proxy for reducing x-risk: it is high status in EA to be a frugal, principled, scout-mindset-ish x-risk reducer.
Unfortunately, now that we’re expanding our reach, our social circles no longer offer the same proxy. Pursuing status now means making big, prestigious-looking moves in the world (and making big moves in AI means building better products or addressing hot-button issues, like discrimination). It is not high status in the wider world to be an x-risk reducer, and so we stop being x-risk reducers.
I have no real idea which of these failure modes is most common, although I speculate that it’s the last one. (I’d be keen to hear others’ takes.) Also, to be clear, I don’t believe the correct solution is to ‘stay small’ and avoid interfacing with the wider world. However, I do believe that these failure modes are easier to fall into than one might naively expect, and I hope that a better awareness of them might help us circumvent them.