Judgments often smuggle in implicit standards
Our brains often use sleight-of-hand to hide fear-based motivation behind a guise of objectivity. This is particularly linked to the word “good”, which does a lot of work in a lot of people’s psychologies. For example, people often think that they, or their work, is “not good enough”. By itself, that sentence doesn’t make sense: good enough for what? Imagine going on a hike and commenting along the way “this rock isn’t heavy enough” or “this stream isn’t wide enough” without any background context. That sounds bizarre, and rightly so—the relevant threshold is very different depending on the context of the judgment. In other words, judgments are inherently two-place functions: they take in both some property and some threshold, and evaluate whether the property is above the threshold.
Of course, people often don’t need to make the threshold explicit—if the reason you’re gathering rocks is to anchor down your tent, you can just say “this rock isn’t heavy enough” without further elaboration (although even then, miscommunications are common—heavy enough to withstand a stiff breeze? Or a gale? Or a storm?). But most judgments that people make of each other or themselves don’t have a clear threshold attached. Think of a girl standing in front of a mirror, saying to herself “I’m not beautiful enough”. Not beautiful enough to win a modeling competition? Or to convince a specific crush to go out with her? Or to appear in public without people making mean comments? The part of her mind which is making this evaluation doesn’t include that criterion, because it would weaken the forcefulness of its conclusion—it just spits out a judgment which feels like an objective evaluation, because the threshold is hidden. (The same is true if she just thinks “I’m not very beautiful”—not top 1%? 10%? 50%? What makes any of these thresholds important anyway?)
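For readers who find the function framing easier to see in code, here is a minimal sketch of the rock example. Everything in it is hypothetical: the function name `heavy_enough`, the wind labels, and the threshold numbers are made up for illustration and are not claims about real tents. The point is only the shape: the honest judgment takes two arguments, while the judgment that "feels objective" is the same function with the threshold baked in where the caller can't see it.

```python
from functools import partial


def heavy_enough(weight_kg: float, threshold_kg: float) -> bool:
    """Two-place judgment: needs both the property and the threshold."""
    return weight_kg >= threshold_kg


# Hypothetical thresholds for anchoring a tent; the numbers are invented.
THRESHOLDS_KG = {"stiff breeze": 2.0, "gale": 5.0, "storm": 12.0}

rock_kg = 4.0

# The same rock gets different verdicts depending on which threshold is meant.
verdicts = {name: heavy_enough(rock_kg, t) for name, t in THRESHOLDS_KG.items()}

# One-place version: the threshold is fixed in advance and hidden from the
# caller, so the output looks like a plain fact about the rock.
feels_objective = partial(heavy_enough, threshold_kg=THRESHOLDS_KG["storm"])

print(verdicts)
print(feels_objective(rock_kg))
```

Calling `feels_objective(rock_kg)` gives no hint that a storm-level standard was chosen, which is the move the mirror example describes: the verdict arrives without the criterion that produced it.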
Making the threshold explicit isn’t always going to change the judgment, but it’ll often make us realize that we’re holding ourselves to an unreasonably high standard. Here’s an exercise which might help, by nudging you to do the opposite. Think of the world—the whole thing, the beauty and the horror, the joy and the tragedy—and say out loud to yourself “everything’s okay, in comparison to how bad things could be”. No matter how bad you think it is now, it could all be much worse, right? Think of the satisfaction you’d feel if you thought you were on track for that much worse world, and suddenly learned that you were in our current world instead!
Now say the same about your life—“everything’s okay, in comparison to how bad it could be”. Imagine the version of yourself who’d love to be in your position, and how they’d feel if they learned that they could. Lastly, try both of those again, but thinking about the future: “everything’s okay, no matter what happens from now on”. When I do this, I visualize moving the threshold for what counts as “okay” up and down, first measuring down from perfection, then up from hell, until the one-place judgment of “is this okay?” feels like a totally different type of thing from the two-place judgments which are actually relevant to the decisions I face.
Does that feel weird? For me it does—I feel a sense of internal resistance. A part of me says “if you believe this, you’ll stop trying to make your life better!” I think that part is kinda right, but also a little hyperactive. I’m not committing to the view that everything’s okay, I’m just… trying it on for a second; trying to separate them enough to notice that it’s possible in principle. I notice that this resistance part feels a kind of frantic nervous energy at the thought of not applying high standards to myself—in other words, it feels fear-based motivation to constantly aim high. My medium-term goal is for you to consistently notice this energy, and say to it in response “hey, wait—if I set such high thresholds I’m going to make myself miserable and burned out, and actually be less likely to achieve whichever goal they were originally designed to promote”. And in order to do that, you need to realize that high thresholds aren’t a part of reality—they’re a part of your motivational strategy, and one that may well be counterproductive.[1]
(I should also flag that, as is often the case, some people face precisely the opposite problem: convincing themselves that everything is okay by applying an unreasonably low standard, because it’s scary to face the possibility of bad outcomes, especially those caused by their own actions. In some sense, though, the underlying issue is the same: setting an arbitrary threshold of “okayness” in a way that provides an illusion of objectivity. So as an additional exercise, I recommend saying to yourself “everything could be much better, no matter how good it is now”, while imagining learning that the current situation is either much better or much worse than you thought, and seeing what resistance comes up.)
A similar phenomenon commonly arises with the word “should”. That’s also a two-place function: you should do X in order to achieve Y. Without specifying the goal, what does a “should” imperative even mean? This case is a little more complicated, because the way that morality is implemented in human brains is via making “should” seem like a one-place function. Or in other words: no matter what else you’re trying to achieve, your brain tells you that you “should” also try to “be good”. And now we’re back to the same question: what does it mean to be good? And why do our brains keep pushing us towards it? My best guess is in the next post.
[1] People often ask me whether I’m feeling optimistic or pessimistic about progress on AI alignment or governance. But that question feels unnatural to me. It’s like asking a car racer how his engine feels, as a proxy for how likely he is to win the race. The answer is that most of the time my feelings are humming away as close as I can get them to the RPM which lets me move fastest and take corners most gracefully and accelerate hardest out of every twist in the road. And that’d feel approximately the same if my credence on things going badly were half or double what it is (with some exceptions for times where I’ve decided to sit down and reconsider my overall strategy).