My bargain with the EA machine

Last summer I was thinking about what I wanted to do after finishing grad school. While I had several career options in the back of my mind, I only had a good sense of two of them (quantitative finance and academia). It struck me that the amount of exploration I had done so far had been woefully inadequate for making a decision about how I’d spend 80,000 hours of my life. Shortly thereafter, I decided that I would take a year off to try various potential careers. I’ll be kicking off this adventure in a month; some things I might try are AI alignment theory, empirical alignment work, philosophy, forecasting, and politics.

Sometime in 2023, I plan to make a career decision. When doing so, what will I be optimizing for? The EA answer is that I should pick the career that would allow me to have the most positive impact on the world. This will be part of my calculation, but it won’t be everything. I intrinsically value enjoying my life and career – not just as a means to the end of generating impact, but for its own sake.

Many people in EA depart from me here: they see choices that do not maximize impact as personal mistakes. Imagine a button that, if pressed, would cause you to always take the impact-maximizing action for the rest of your life, even when doing so entails great personal sacrifice. Many (most?) longtermist EAs I talk to say they would press this button – and I believe them. That’s not true of me; I’m partially aligned with EA values (since impact is an important consideration for me), but not fully aligned.

I like to think about this in terms of an elliptical cloud. Each point in the ellipse represents a plausible outcome of my career. The x-coordinate represents the impact I have on the world; the y-coordinate represents how good the career is according to my utility function (what I believe to be my extrapolated volition, to be more precise), which you can think of as combining an altruistic component (to the extent I care about impact) with a selfish component (to the extent that I want to prioritize my own happiness as part of my career choice).

Career A ranks higher than Career B on the x-axis if it’s more impactful by standard EA/​utilitarian metrics. A ranks higher than B on the y-axis if, upon reflecting enough on my preferences and values, I would choose A over B.

(Why an ellipse? Data drawn from a bivariate normal distribution will form an elliptical cloud. This is a simplified model in that vein; most of what I’ll be saying will also be true for other natural shape choices.)
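(For concreteness, here’s a minimal sketch – my own, not part of the original model – of what such a cloud looks like: sample plausible career outcomes from a bivariate normal over (impact, value-to-me). The means, scales, and correlation below are arbitrary illustrative choices; the positive correlation encodes being partially, but not fully, aligned.)

```python
# A minimal illustrative sketch: sample career outcomes from a bivariate normal
# over (impact, value to me). All numbers below are arbitrary choices.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
correlation = 0.4  # partial alignment: impact and my values correlate, imperfectly
cov = [[1.0, correlation], [correlation, 1.0]]
outcomes = rng.multivariate_normal([0.0, 0.0], cov, size=2000)

plt.scatter(outcomes[:, 0], outcomes[:, 1], s=5, alpha=0.3)
plt.xlabel("impact (EA/utilitarian metric)")
plt.ylabel("how good the career is by my values")
plt.title("Elliptical cloud of plausible career outcomes")
plt.show()
```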

Note that this is my personal ellipse; you have your own, and it might be shaped pretty differently from mine. If you would want to press the button in my thought experiment, then your ellipse is probably pretty close to a line.

That’s why we call these people “aligned”

People whose ellipse looks like this are a perfect fit for the EA community. EA is geared toward giving people the resources to maximize their impact, which is exactly what these people are looking for.

On the other hand, many EAs deliberately make choices that significantly reduce their impact, and endorse these choices. For wealthy earners-to-give, giving away only 10% (or even only 50%?) of their wealth often constitutes such a choice. For some (but not all) EAs, having kids is such a choice. Such people often have ellipses that are shaped similarly to mine.

I love the EA community, but the shape of my ellipse complicates my relationship with it. My “default” career trajectory (i.e., what I probably would have done if not for exposure to the EA community) is being a professor in a non-impactful area; it might be represented by this red point.

Pretty good in terms of my values – not so much in terms of impact. EA would rather I do something very different; indeed, according to EA values, the optimal place for me to be is as far to the right as possible.

(What is this point that’s all the way to the right? It might be working on AI alignment theory, exhausting myself every day, working just at the threshold where I don’t get burned out but do a few more hours of work than I’d enjoy. Or maybe not quite this, because of secondary effects, but probably something in this direction.)

On the other hand, I’d be unhappy with this outcome. I look at this arrow and think “that arrow goes down.” That’s not what I want; I want to move up.

But observe: the best (highest) point according to my values is also to the right of the “default” red circle! This is true for a couple reasons. First, I do care about impact, even if it’s not the only consideration. Second, I really enjoy socializing and working with other EAs, more so than with any other community I’ve found. The career outcomes that are all the way up (and pretty far to the right) are ones where I do cool work at a longtermist office space, hanging out with the awesome people there during lunch and after work.[1]

It’s natural to conceptualize this situation in terms of an implicit bargain, where my counterparty is what I might call the “EA machine”. That’s my term for the infrastructure that’s set up to nudge people into high-impact careers and provide them with the resources they need to pursue these careers. We each have something to give to the other: I can give “impact” (e.g. through AI safety research), while the EA machine can give me things that make me happy: primarily social community and good working conditions, but also things like housing, money, status, and career capital. Both the EA machine and I want my career to end up on the Pareto frontier, between the two green points. I’m only willing to accept a bargain that would allow me to attain a higher point than what I would attain by default – but besides that, anything is on the table.
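(If you want the Pareto frontier spelled out, here’s a hedged illustration of my own: given a sampled cloud of (impact, value-to-me) outcomes like the one above, the frontier is the set of outcomes not dominated by any other outcome that scores at least as well on both axes. The numbers are again arbitrary.)

```python
# Hedged illustration: extract the Pareto frontier of a sampled outcome cloud --
# the outcomes not dominated on both axes by any other outcome.
import numpy as np

def pareto_frontier(points: np.ndarray) -> np.ndarray:
    """Return the Pareto-optimal subset of (impact, value) points, both maximized."""
    # Sort by impact descending (ties broken by value descending), then sweep,
    # keeping each point whose value strictly beats the best value seen so far.
    order = np.lexsort((-points[:, 1], -points[:, 0]))
    best_value = -np.inf
    keep = []
    for idx in order:
        if points[idx, 1] > best_value:
            keep.append(idx)
            best_value = points[idx, 1]
    return points[keep]

rng = np.random.default_rng(0)
cov = [[1.0, 0.4], [0.4, 1.0]]  # arbitrary illustrative correlation
outcomes = rng.multivariate_normal([0.0, 0.0], cov, size=2000)
frontier = pareto_frontier(outcomes)  # the arc between the two "green points"
print(f"{len(frontier)} of {len(outcomes)} outcomes are Pareto-optimal")
```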

(I don’t have a full model of what exactly the bargaining looks like. A fleshed-out model might involve the EA machine expending resources to change the shape of the ellipse and then me choosing the point. But the bargaining metaphor feels appropriate to me even in the absence of a complete model.)

What’s the “fair” point along this frontier? This is an interesting question that I don’t really know how to approach, so I’ll leave it to the side.

A more pressing concern for me is: how do I make sure that I don’t end up all the way to the right, at the “impactful toil” point? The naïve answer is that I wouldn’t end up there: if offered a job that would make me much less happy, I wouldn’t take it. But I don’t think it’s that straightforward, because part of the bargain with the EA machine is that the EA machine can change your values. During my year off, I’ll be surrounded by people who care deeply about maximizing their impact, to the exclusion of everything else. People change to be more similar to the people around them, and I’m no exception.

Do I want my values changed to be more aligned with what’s good for the world? This is a hard philosophical question, but my tentative answer is: not inherently – only to the extent that it lets me do better according to my current values. This means that I should be careful – and being careful involves noticing when my values are changing. I’m not really scared of starting to value impact more, but I am scared of valuing less the things I currently care about. It seems pretty bad for me if the EA machine systematically makes things that currently bring me joy stop bringing me joy.

If you’d like, you can think about this hypothetical situation as the EA machine cutting off the top of the ellipse (thanks to Drake Thomas for this framing). If they do that, I guess I might as well move all the way to the right:

If you’d like a more concrete model, you can think of the top part of the ellipse as representing hypothetical lives in which I spend my free time on various non-impactful things that I enjoy. If the EA machine takes these things away from me, then I may as well spend all my time working, since nothing else would bring me joy.

I don’t think anyone has anything like this as their goal. EAs are super nice and would be pretty sad if they discovered that becoming part of the EA community had this effect on me. But organizations of people sometimes produce outcomes that no one in particular desires or intends, and to the extent that the EA machine is oriented toward the “goal” of maximizing impact, it seems plausible that mechanisms such as “cutting off the top of the ellipse” would arise through no one’s fault in particular.

To prevent this outcome, I’ve made a list of non-EA things I currently care a lot about:

  • My family and non-EA friends

  • Puzzles and puzzle hunts

  • Spending time in nature

  • Doing random statistical analysis on things that don’t matter at all, such as marble racing

My plan is to periodically reflect on how much I care about these things – to notice if I start caring about them less. If I start caring about my family less, that’s a red flag; for the other bullet points, interests come and go, so I wouldn’t say that waning interest is a red flag. But maybe it’s a yellow flag: something to notice and reflect on.

I don’t expect the EA machine to dramatically change my values in a way that current-me wouldn’t endorse, because I’m pretty good at not being pressured into beliefs. But social pressure can be a strong force, which is why I’d like to be careful.

(What are some other plans that might be helpful? Ben Pace suggested scheduling breaks from the EA community. Duncan Sabien suggested doing occasional sanity checks with people outside the EA community that I’m not making bad decisions. Vael Gates suggested applying techniques like internal double crux or focusing to introspect about my feelings. These all seem like great suggestions, and I welcome others in the comments.)

Most top EA community builders who I talk to are surprised to learn that I wouldn’t press the button. I think I’m pretty normal in this regard, and that it’s useful for community builders to know that many EAs who can and want to do productive work have ellipses that look like mine.

The fact that such people exist may have implications for community building and EA/longtermist spaces more broadly. Concretely, in many conversations in EA circles, especially at longtermist retreats I’ve been to, there is an unstated assumption that everyone’s goal is to maximize their impact. This assumption has benefits, such as setting high expectations and creating an atmosphere in which EAs are expected to act on their moral beliefs. But it also sometimes makes me (and, I imagine, others) feel a bit uncomfortable or excluded. To the extent that EA wants to get people (including people who aren’t fully aligned) excited about working on important causes, this may be a substantial drawback. X-risk is an all-hands-on-deck issue, and EA may be well-served by being more inclusive of such people. I’m not saying that this is one of the most important ways for the EA community to improve, but I thought it might be useful to flag anyway.

Thanks to Sam Marks, Aaron Scher, Ben Pace, Drake Thomas, Duncan Sabien, and Vael Gates for thoughts and comments!

  1. ^

    If an EA org generously pays someone who cares a lot about money to do impactful work, they are essentially creating an opportunity for the person to move up by moving to the right. I’m not super motivated by money, so this isn’t a big deal to me, but I see this as one potentially positive effect of EA having lots of money.