EA should blurt

A lot of EAs are reporting that some things seem like early signs of character or judgment flaws in SBF — an argument that seems wrong, an action that seems unjustified, etc. — now that they can reexamine those data points with the benefit of hindsight.

But the mental motions involved in “revisit the past and do a mental search for warning signs confirming that a Bad Person is bad” are pretty different from the mental motions involved in noticing and responding to problems before the person seems Bad at all.

“Noticing red flags” often isn’t what it feels like from the inside to properly notice, respond to, and propagate warning signs that someone you respect is fucking up in a surprising way.

Things usually feel like “red flags” after you’re suspicious, rather than before.

You’re hopefully learning some real-world patterns via this “reinterpret old data points in a new light” process. But you aren’t necessarily training the relevant skills and habits by doing this.

From my perspective, the whole idea that the relevant skillset is specifically about spotting Bad Actors is itself sort of confused. Like, EAs might indeed have too low a prior on bad actors existing, but also, the idea that the world is sharply divided into Fully Good Actors and Fully Bad Actors is part of what protected SBF in the first place!

It kept us from doing mundane epistemic accounting before he seemed Bad. If you’re discouraged from just raising a minor local Criticism or Objection for its own sake — if you need some larger thesis or agenda or axe to grind, before it’s OK to say “hey wait, I don’t get X” — then it will be a lot harder to update incrementally and spot problems early.

(And, incidentally, a lot harder to trust your information sources! EA will inevitably make slower intellectual progress insofar as we don’t trust each other to just say what’s on our mind like an ordinary group of acquaintances working on a project together, and instead have to try to correct for various agendas or strategies we think the other party might be implementing.)

(Even if nobody’s lying, we have to worry about filtered evidence, where people are willing to say X if they believe X but unwilling to say not-X if they believe not-X.)


Suppose that I say “the mental motions needed to spot SBF’s issues early are mostly the same as the mental motions needed to notice when Eliezer’s saying something that doesn’t seem to make sense, casually update at least a little against Eliezer’s judgment in this domain, and naively blurt out ‘wait, that doesn’t currently make sense to me, what about objection X?’”.

(Or if you don’t have much respect for Eliezer, pick someone you do have respect for — Holden Karnofsky, or Paul Graham, or Peter Singer, or whoever.)

I imagine some people’s reaction to that being: “But wait! Are you saying that Eliezer/Holden/whoever is a bad actor?? That seems totally wrong, what about evidence A B C X Y Z...”

Which seems to me to be missing the point:

1. The processes required to catch bad actors reliably are often (though not always) similar to the processes required to correct innocent errors by good actors.

You do need to also have “bad actor” in your hypothesis space, or you’ll be fooled forever even as you keep noting weird data points. (More concretely, since “bad actor” is vague verbiage: you need to have probability mass on people being liars, promise-breakers, Machiavellian manipulators, etc.)

But in practice, I think most of the problem lies in people not noticing or sharing the data points in the first place. Certainly in SBF’s case, I (and I think most EAs) had never even heard of any of the red flags about SBF, as opposed to us hearing a ton of flags and trying to explain them away.

So something went wrong in the processes “notice when something is off”, “blurt out when you notice something is off”, and “propagate interesting blurtings so others can hear about them”, more so than in the process “realize that someone might be a bad actor if a long list of publicly discussed things already seem off about them”.

(Though I assume some EAs — ones with more insider knowledge about SBF than me — made the latter mistake too.)

2. If a community only activates its “blurt out objections when you think you see an issue” reflex when it thinks it might be in the presence of bad actors, then (a) it will be way harder for the community to notice when a bad actor is present, but also (b) a ton of other dysfunctions will become way likelier in the community.

I think (b) is where most of the action is.

EA has a big problem, I claim — relative to its goals and relative to what’s possible, not necessarily relative to the average intellectual community — with...

  • excessive deference;

  • passivity and “taking marching orders” (rather than taking initiative);

  • not asking questions or raising objections;

  • learned helplessness;

  • lack of social incentive to blurt things out when you’re worried you might be wrong;

  • lack of social incentive to build up your own inside-view model (especially one that disagrees with all the popular views among elite EAs);

  • general lack of error-correction and propagation-of-information-about-errors;

  • excessive focus on helping EA’s image (“protecting the brand”), over simple inquiry into obvious questions that interest or confuse you.

I think EA leadership is unusually correct, and I think it legit can be hard for new EAs to come up with arguments that haven’t already been extensively considered at some point in the past, somewhere on the public Internet or in unpublished Google Docs or wherever. So I think it’s easy to understand why a lot of EAs are wary of looking stupid by blurting out their naive first-pass objections to things.

But I think that not blurting those things out turns out to have really serious costs at the community level. (Even in cases where a myopic Causal Decision Theorist would say it’s individually rational.)

First, because it means that the EA with a nagging objection never learns why their objection is right or wrong, and therefore permanently has a hole in their model of reality.

And second, because a lot of how EA ended up unusually correct in the first place, was people autistically blurting out objections to “obvious-seeming” claims.

If we keep the cached conclusions of that process but ditch the methods that got us here, we’re likely to drift away from truth over time (or at least fail to advance the frontier of knowledge nearly as much as we could).

EA is not “finished”. We have not solved the problem of “figure out a plan that saves the world”, such that the main obstacle is Implementing Existing Ideas. The main obstacle continues to be Figuring Things Out.

EAs should note and propagate criticisms and objections to their Favorite Ideas and Favorite People just because they’re curious about what the answer is.

(And aren’t hindered by Modest Epistemology or Worry About Looking Dumb or Worry About Making EA Look Bad, so they’re free to blurt without first doing a complicated calculus about whether it’s Okay to say the first thought that popped into their head.)

They shouldn’t need to suspect that their Favorite Idea is secretly false/bad, or that their Favorite Person is secretly evil/corrupt, in order to notice an anomaly and go “huh, what’s that about?” and naively raise the issue (including raising it in public).

Most Bayesian updating is incremental; and when a single piece of evidence is obviously decisive, it’s less likely that EAs will be the only ones who notice it, so it matters less whether we spot the thing first. The ambiguous, hard-to-resolve cases that require unusual heuristics, experience, or domain knowledge are most of where we can hope to improve the world.
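
As a toy illustration of the “incremental” point (my own sketch, with made-up numbers, not anything from the original argument): even when no single anomaly is anywhere near decisive, a handful of weak updates compound quickly.

```python
# Minimal sketch with illustrative numbers: many weak pieces of evidence,
# none decisive on its own, compound into a large posterior shift.

prior_odds = 1 / 99        # 1% prior that something is seriously off
likelihood_ratio = 2.0     # each anomaly is only 2:1 evidence of trouble

odds = prior_odds
for n in range(1, 11):
    odds *= likelihood_ratio
    probability = odds / (1 + odds)
    print(f"after {n:2d} weak anomalies: P(something is off) = {probability:.0%}")

# Ten 2:1 updates multiply to ~1024:1, taking a 1% prior to roughly 91%.
```

The specific numbers don’t matter; the point is that the win comes from reliably registering lots of small updates, not from waiting for a single knock-down data point.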

If EAs want to outperform, they need to be good at the micro-level updates, and at building up good intuitions about areas via many, many repeated instances of poking at small things and seeing how reality shakes out.

I think we need to fix that process in EA — practice it more at the individual level, and find ways to better incentivize it at the group level.

Not just when there’s a big Generate A Far-Mode Criticism Of EA contest, or a clear Bad Guy to criticize, but when you just see an Eliezer-comment or Rob-comment or Toby-comment that doesn’t quite make sense to you and you blurt out that tiny note of dissonance, even if you fully expect that there’s a perfectly good response you just aren’t thinking of.

(Or no good response, but it’s OK because Eliezer Yudkowsky and Rob Bensinger and Toby Ord are not perfect angelic beings and people make mistakes.)

I do think that EA leadership isn’t dumb, and has thought a lot about the Big Questions, such that you’ll often be able to beat the larger intellectual market and guess at important truths if you try exercises like “attempt to come up with a good reason why Carl Shulman / Holden Karnofsky / etc. might be doing X, even though X isn’t what I’d do at a glance”.

But I don’t think this exercise should be required in order to blurt out a first-order objection. Noticing when something seems false is a lot easier than doing that and generating a plausible hypothesis about another human’s brain. And if you do come up with a plausible-sounding hypothesis, well, blurting out your first-order objection is a great way to test whether your hypothesis is correct!