A simple argument for trying less hard
People often make arguments against “trying hard” (working very hard, pushing yourself to the brink, being intensely goal-directed, and so on) by pointing to the risks of burnout or of losing some kind of wholesomeness[1].
But there’s another, very simple argument against it that I have not seen anyone fully make explicit[2], even though I think it’s very important. It goes like this:
We face a lot of uncertainty about the sign of our impact.
Therefore, we should be very vigilant about our epistemics to make sure that we are not having a negative impact in expectation.
But trying hard deeply distorts our epistemics—it makes us more prone to motivated reasoning about what we’re doing, and leaves us with less slack to reflect on it.
Therefore, all else being equal, we should try less hard.
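As a toy illustration of the steps above (with made-up numbers, and not a claim about real magnitudes), here is a minimal Python sketch. It assumes that the size of our impact scales linearly with effort, and that the probability of that impact being positive falls linearly as effort rises. Both functional forms, and the names expected_impact, best_effort, p_good_at_rest and distortion, are hypothetical choices made only for this example.

```python
def expected_impact(effort, p_good_at_rest=0.6, distortion=0.1):
    """Toy expected impact of working at a given effort level in [0, 1].

    Assumptions (illustrative only):
    - the magnitude of impact is proportional to effort;
    - the chance that the impact is positive starts at p_good_at_rest
      and drops by `distortion` per unit of effort.
    """
    p_good = p_good_at_rest - distortion * effort
    p_good = min(1.0, max(0.0, p_good))  # keep it a valid probability
    # +effort with probability p_good, -effort otherwise
    return effort * (2.0 * p_good - 1.0)


def best_effort(p_good_at_rest=0.6, distortion=0.1, steps=100):
    """Grid-search the effort level in [0, 1] with the highest toy expected impact."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda e: expected_impact(e, p_good_at_rest, distortion))


if __name__ == "__main__":
    for d in (0.0, 0.1, 0.2):
        print(f"distortion={d}: best effort = {best_effort(distortion=d)}")
```

Under these assumptions, maximal effort is best when there is no distortion, while a large enough distortion makes an intermediate effort level best. The sketch says nothing about how large the distortion actually is, which is the empirical question.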
Crucially, this argument applies much more strongly to people working in “longtermist areas”, which is generally not true of other critiques of trying hard. For example, global health EAs whose terminal value is short-term welfare also face uncertainty about the impact of their actions, but much less (especially about the sign) than people trying to improve the long-term future. So this argument suggests that it’s especially dangerous for longtermists to try very hard.[3][4]
I’ll go through the steps in a little bit more detail.
Uncertainty
Much has been said in EA about cluelessness and crucial considerations, but I’ll highlight a few specific concerns that could make lots of current AI safety work[5] net negative (with no claim to novelty):
AI governance interventions are obviously high-variance: bad regulation can easily make things worse, many interventions could increase the risk of great power conflict, increased political polarization around AI could be really bad, more centralization of power increases authoritarianism risk, and so on. And technical work can have flow-through effects on these variables that outweigh its direct effects.[6]
Activist work can polarize people against the cause.[7]
Human takeover might be worse than AI takeover, and many AI safety interventions effectively attempt to make human takeover more likely relative to AI takeover.[8]
If powerful AI will be well-described as doing humanlike roleplaying, trying to control it could make it eventually dislike its “oppressors”, or make it less “mentally healthy” in some way. And even without that assumption, AI safety work could lead to an adversarial relationship with AI in other ways.
Future AIs may be moral patients themselves, which would substantially reduce the value of preventing human extinction, and increase the downside risk (including S-risk) of “AI control”-style interventions.
Useless work could contribute to “safety-washing” or a false sense of security.
There are cultural concerns around scale, professionalization and “mainstreaming”[9]: decreases in integrity and epistemic virtue could be very bad for achieving good outcomes.
Capabilities externalities could accelerate AI progress, which many think is bad. People have raised this worry about RLHF historically, and raise it about interpretability and evals nowadays. Most infamously, AI safety activity contributed, to varying extents, to the founding of all three of DeepMind, OpenAI and Anthropic.
These are very difficult to evaluate (and underargued here). My point is not that these are all valid worries—I’m skeptical of several of them. But I don’t think they can be neglected.
Epistemic distortion
In the past, I’ve felt a sense of being overwhelmed at all these considerations, and felt tempted to just avoid thinking about them—but that can’t be the answer. We have to take the uncertainty seriously. Even if I don’t currently have the capacity to go into some kind of deep reflection, I should attempt to make my actions as robust to the uncertainty as I can—for example, by making sure I can course-correct, and by keeping my epistemics in good shape.
Unfortunately, trying very hard conflicts with this—the harder we push toward a goal, the more we bend the evidence to justify it, and the less mental room we have left to step back and question what we’re doing.[10]
And I think there’s something stronger, too—in an active inference framework, beliefs and desires are both just expectations about the world. Experientially, this rings true to me—the feelings of frustration at not getting what I want and at being taken off guard by something I hadn’t even been paying attention to are very similar. It seems deeply hard to distinguish between what we want and what we believe.
This is a bit more speculative, but sometimes I think people don’t fully absorb this point: It’s not psychological, it’s neurological. There’s a sense in which wanting anything distorts us away from pure self-supervised prediction of the world and compresses us internally into living in a specific hypothesis—a “gut-level” vision of the world, that gets upweighted on the level of our base perceptions. So we may not be able to fully adjust for it by only manipulating psychological factors, e.g. consciously trying to be more objective or less selfish.
Conclusion
To be clear, I have a lot of respect for people who try extremely hard. I wouldn’t be able to do it, and I’m often in awe of them.[11] I’m also not trying to make a statement about how strong the update from this should be (I don’t even know these spaces well enough to have a precise sense of how hard various groups of people are trying). Maybe arguments for trying even harder actually outweigh this on the current margins; I wouldn’t know.
But I have the sense that this simple consideration is underrated, and I hope this post can provide a reference point for it and make people take it into account in their personal deliberations.
[1]
Also, some apparently seminal academic philosophy stuff that seems interesting: Moral Saints by Susan Wolf, and Bernard Williams’ work.
[2]
The closest thing to it is probably the strain of thought around “What should you change in response to an ‘emergency’? And AI risk” and “Slack gives you space to notice/reflect on subtle things”, but that still seems centered on the MIRI-style mindset of (strawmanning here) “more AI safety is definitely good and we just need to think hard to find ‘true’ AI safety work” (which isn’t really my mindset). That is, the uncertainty about what to do comes more from their specific inside view that alignment is very hard, rather than from model-agnostic EA/philosophy-style cluelessness. So I think the argument in this post is a more general one that should be convincing to more people (nowadays, there are obviously a lot of non-MIRI-cluster people who are trying extremely hard on AI safety stuff, e.g. this post that I saw recently).
There’s also the “Maximization is perilous” angle, but that’s more about naive optimization in general, and not about facing huge uncertainty specifically (e.g. it applies equally to global health EAs).
Also, shoutout to “Slack matters more than any outcome” for a personally inspirational framing on related issues.
[3]
Which is a little counterintuitive, because we usually see the opposite—longtermist EAs being more intense. Although longtermists also have a bigger moral scope and often more urgency (vis-à-vis AI timelines), so may reasonably trade off more sharply against other values and personal well-being.
[4]
The same argument also works for virtue and general emotional health, but that’s out of scope for this post.
[5]
Of course, AI safety interventions are extremely heterogeneous, but that just increases the extent to which individual decision-making is crucial (as opposed to deferring to people).
[6]
Holden Karnofsky: “Most things that touch policy at all in any way will move us along that spectrum in one direction or another, so therefore have a high chance of being negative [...]
And then most things that you can do in AI at all will have some impact on policy. Even just alignment research: policy will be shaped by what we’re seeing from alignment research, how tractable it looks, what the interventions look like.” (h/t Anthony DiGiovanni)
[7]
Holden Karnofsky: “there’s also a lot of micro ways in which you could do harm. Just literally working in safety and being annoying, you might do net harm. You might just talk to the wrong person at the wrong time, get on their nerves. I’ve heard lots of stories of this. Just like, this person does great safety work, but they really annoyed this one person, and that might be the reason we all go extinct” (h/t Anthony DiGiovanni)
[8]
Among other things.
[9]
I associate these with people like Richard Ngo (and here) and Oliver Habryka.
[10]
This is well-known in psychology. Also, Opus 4.8 wrote that sentence.
[11]
I definitely don’t want to imply that if only it weren’t for this argument, I would be an extremely hard worker too, haha.
I’d never thought of this argument, and it’s obviously correct in retrospect.
Although “trying less hard” might not quite be pointing at the right thing. Reflecting on your epistemics / course of action could still be considered a form of “trying hard”, so maybe it would be better to describe it as “rocketing in a particular pre-determined direction less hard”.
Thank you Michael! Really appreciate you saying that.
I guess I did say at the end of the “Epistemic distortion” section that consciously trying to be more objective may not be enough to replicate the effects of trying less hard. But yeah, I’m pretty unhappy with the framing/structure overall, e.g. it could’ve been more concrete and clear about the mechanisms.
Hi Elias.
Great point.
If interventions decreasing the risk of large catastrophes mostly affected the longterm future, why would the same not apply to global health interventions?
Interventions with negligible longterm effects could still decrease welfare due to effects on soil invertebrates?
Thanks Vasco!
Not sure what you’re asking exactly—I’m just saying that if you’re not a longtermist, you don’t face as much uncertainty about how to achieve good outcomes, so the argument doesn’t apply as much to you.
Makes sense.
You seem to suggest that longterm effects may not be relevant for global health interventions while being relevant for AI safety interventions (or others which are typically referred to as longtermist interventions). I meant to question this. If I thought longterm effects were relevant for AI safety interventions (I do not), I would think they would be relevant for global health interventions too.
Interesting post!
I suspect that the epistemic health that matters is on the level of the group and not the individual. I.e. I care about the net effect of the group’s actions, and I don’t much care whether individuals are rational, as long as the group acts rationally. With that framing, would it be better if everyone individually ‘tried hard’ or not?
Seems like it’d net out to being better if the correct course of action produces much more utility than the aggregate disutility of the rest of the actions we are likely to take. However, if it’s very easy to do badly wrong, doing less might be the better strategy.
One other point though: When you take negative responsibility seriously (as we should), you dispense with the idea of a neutral option, or the choice to do nothing. Bracketing the epistemics-distorting point, there isn’t necessarily a difference in the expected (sign-neutral) impact of doing ‘nothing’ and trying hard.
Thanks Toby!
Yeah, I think there are a lot of interesting things to be said here. Some points:
- I’m not sure why we should expect the group to be well-calibrated when no individual is? That sounds a little magical to me. This isn’t a market dynamic where ground-level feedback directly lowers the viability/status of incorrect hypotheses and so makes the collective smarter than any individual: we get no direct feedback about our effect on the long-term future whatsoever, and certainly none that would literally force us to stop what we’re doing (the way a business can fail if it loses touch with what the market wants).
It feels more viable to me for every individual to act with epistemic virtue (or explicitly defer to someone who does so).
- For a slightly tangential point, it’s interesting to think about what the optimal social structures of deferral would be, but I’ll note that my guess is that the most influential people also tend to be among the people who work/try the hardest? That’s a big part of why people are successful, after all. So if anything, this is a cause for more worry.
- On your “other point”: Yeah, I certainly don’t endorse doing nothing. The epistemic distortion is just one consideration that happens to push towards doing less (since it lowers the EV of doing anything, if we haven’t considered it before). It needs to be weighed against the many other considerations that exist.
Curious what you think!
“I’m not sure why we should expect the group to be well-calibrated when no individual is?”—Something something marketplace of ideas? An analogy is a court, where the prosecution and the defence both have their conclusions assigned to them beforehand. They are both epistemically vicious, in opposite ways. Then the idea is that the best arguments win on their merits. I’m not sure quite how this analogy would fit though (I’ve had false starts writing a blog post on this a few times).
“I certainly don’t endorse doing nothing” Yep I get that this argument is only one consideration, but my point extends to “trying less hard” as well I think.
“Yep I get that this argument is only one consideration, but my point extends to “trying less hard” as well I think.” I’m not sure I understand what you mean, could you explain it more?
“Something something marketplace of ideas?”
Yeah, this is an empirical question—theoretically, we might see better arguments rise in status and influence people more, and of course to some extent we do (e.g. AI risk rising in status over the last 10 years). But there are other factors influencing the status of ideas too, like some kind of general action-bias / power-seeking-bias. Concretely, I feel that most of the bullet points in the “uncertainty” section of the post are underrepresented in the discourse—curious if you disagree.
Also, one more point on the deferral thing—one way in which I think I’m weird is that I would genuinely raise my esteem of someone (on a gut-level) if they said “I’m hopelessly biased on this topic, I can’t think about it, don’t listen to me”. Unfortunately, you never see this. I would really like to see that more.
E.g. if someone who tries extremely hard said “I can’t really clearly think about what I’m doing because I’m working so hard, so be careful about listening to me” it would be very beautiful to me. Poetically speaking, it would be… accepting that they are a human weapon, forged for a purpose, impaired by that sacrifice, baring it for the world to see. There would be a slight feeling of heartbreak and love for them in me, and I might very well value them more than before. As I argue in the “Epistemic distortion” section, there’s at least to some extent a deep tradeoff between doing and thinking—so admitting that they are trading off against thinking, crippling their mind on a deep level, could make people respect them more by showing how much they are sacrificing for the “doing” status hierarchy. It could be heroic.
That’s just a fantasy I have about how things could work.
Or perhaps the conclusion could just be “if you want to try very hard, don’t forget to have good epistemics and course correct if needed”?
This doesn’t quite work if you accept Elias’s argument that trying hard distorts your epistemics.