The legibility trap: how “fund the most rigorous thinkers” can quietly select for the wrong people


Disclosure: I wrote this with help from an LLM as a drafting and editing partner. The argument, the experience, the factual claims, and every editorial decision are mine; I directed the drafting, verified the sources, and revised heavily. But enough of the final wording originated in that collaboration that it likely crosses the 10% threshold, so I’m flagging it.

Cross-posted and adapted from a piece I wrote for a general audience. I’ve rewritten it for this community because the merits-first version is the one worth your time.

Epistemic status: High confidence that speed-and-legibility constraints shape philanthropic portfolios. Medium confidence that this systematically disadvantages high-proximity organizations relative to their expected value. Lower confidence on the specific reforms I propose. I’m posting because I think the central claim is testable, and I’d like help sharpening it or finding evidence against it.

I spent a decade directing philanthropy at Google.org, helping deploy roughly $700M. I now build AI tools for the social sector and co-lead a nonprofit. I’m not writing this as an opponent of effective altruism. I share several of its core commitments: evidence matters, marginal dollars should be scrutinized, and good intentions are not enough. My concern is narrower, and I think the next wave of AI philanthropy could make an avoidable version of an old mistake.

What I’m not arguing

Expected-value reasoning works. GiveWell-directed money has almost certainly saved tens of thousands of lives that less disciplined philanthropy would not have. Cost-effectiveness analysis is one of this movement’s genuinely great contributions, and the world would be better if more philanthropy were subjected to it. I don’t believe rigor is bad, that measurement is inherently oppressive, or that we should fund on narrative and good intentions. I’ve watched good intentions waste enormous amounts of money up close.

One clarification, because the historical thread in this post is easy to misread as a genetic fallacy. I’m not claiming statistics is tainted because eugenicists built it. Galton coined the word “eugenics” and developed regression; Pearson and Fisher each held the eugenics chair he endowed; none of that makes regression false. The empirical core of this post stands without the history at all. But there’s a live, present-tense pattern worth naming, not just a dead one, and I’ll make that case explicitly later rather than smuggle it in.

The claim

When a field decides that the people best equipped to solve its hardest problems are the people easiest to recognize as rigorous, it builds a selection mechanism. And that mechanism can do something its designers do not intend: it filters out the people with the most direct knowledge of the problem. My hypothesis is that in many under-resourced social-change domains, proximity and elite-funder legibility are in tension. The leaders closest to a problem are often harder for a distant funder to evaluate quickly, not because they’re weaker but because the signals they carry are less familiar to the funder.

Here’s how I define the terms:

Legibility is how easily a funder can assess someone using the tools the funder already trusts: a clean theory of change, a logic model, a quantified track record, a credential the funder recognizes, a warm intro from someone in the funder’s network.

Proximity is direct lived or practitioner knowledge of the problem, the kind held by someone who has survived the thing, or organized the people who have.

The trap: the org with the deepest proximity is often among the least legible to a new funder, because its leaders didn’t come up through the institutions that manufacture legibility. So a system optimized to move money toward the most assessable opportunities can, without anyone deciding it, move money away from the closest knowledge.

I lived both sides of this.

The view from both seats

For ten years at Google.org I was the funder. I moved money at speed with this logic: to deploy fast, you fund what you can diligence fast, and what you can diligence fast is whatever is already legible to you. I was good at it. I also watched the legibility filter operate on people I knew were extraordinary.

One example I think about often. I made a $2M grant to a Black-led organization reducing gun violence, a group organizing the people closest to the shootings around an approach with real evidence behind it. The grant wasn’t large enough to require executive sign-off under our own thresholds. The institution asked me to get a senior vice president to approve it anyway. The work made the system reach for a higher signature than its own rules required. I don’t think anyone involved was acting in bad faith. The discomfort was structural: the org was less legible to the institution than its dollar figure warranted, and the institution compensated by demanding more authority.

Then I left to build. Now I run organizations that funders can’t diligence quickly, because they’re built on lived experience that doesn’t map onto a standard logic model. The view from the other side of the table is what convinced me the filter is real and mostly invisible to the people applying it.

There’s a further point here, and it isn’t a critique from outside EA. I worked closely with Open Philanthropy’s criminal justice program during the years it ran one of its largest domestic reform efforts: more than $130M in grants under Chloe Cockburn. We sat in the same funders collaborative, ran events together, shared grant due diligence. Some of you know Nuño Sempere’s critical review, which put the work at a fraction of the cost-effectiveness of top GiveWell charities, and Open Phil’s own parting rationale cited the stronger cost-effectiveness of global health. That’s real. I won’t pretend the program was a cost-per-QALY triumph. But look at what happened when Open Phil stepped back. It didn’t wind the work down or hand it to generalist analysts. It spun it out as Just Impact, a fund whose explicit mission is building the power of “highly strategic, directly-impacted leaders.” An EA-aligned funder revising a cause downward on its own metrics still chose, for the part it wanted to keep, a proximity-led structure.

Why this should bother EAs specifically

You already have the counterexample in your own canon. Direct cash transfers are among the most rigorously evidenced interventions in global development, and the entire premise of GiveDirectly is that the person in poverty knows how to allocate a dollar better than an expert program designed around a single outcome. I’m not citing this from the outside. Google.org was one of GiveDirectly’s earliest funders and my team directly supported the work, putting in over $10M from 2012 onward to support both the transfers and the research that validates them. The evidence supports a careful version of the claim. In a Rwandan youth-employment benchmark, cost-equivalent cash was superior across several economic outcomes, while the training program beat cash on business knowledge. In a child-nutrition benchmark, neither the in-kind program nor cost-equivalent cash moved core child outcomes within a year, though cash produced greater consumption and asset accumulation and a larger transfer improved dietary diversity and growth. So the lesson isn’t “cash always wins.” It’s that recipient choice performs well enough that complex expert-designed programs should often have to beat cost-equivalent cash, not merely beat no treatment. One of the best-evidenced things this movement endorses already rests on a bet that proximity can beat expert design. You apply it to recipients of cash. I’m asking you to consider it for the people who allocate.

The analogy isn’t perfect. A household deciding how to use cash is not the same as a community deciding how to allocate philanthropic capital across public goods, externalities, and collective-action problems. Cash can’t fix broken public services, and proximity doesn’t dissolve coordination failures. A 2025 RCT of monthly payments to low-income U.S. families with young children found no measurable improvement in child development outcomes and somewhat higher maternal anxiety, so the recipient-knows-best result has real limits. The narrow claim survives all of it: “the people closest to a problem can allocate better than distant experts” is something your own best evidence supports in important cases. That makes proximity decision-relevant, not sentiment to be stripped out.

The mechanism, stated as a crux

My central empirical claim: holding cause area and intervention quality constant, philanthropic capital deployed under speed-and-legibility constraints will systematically underfund high-proximity organizations relative to their true expected value, because assessment cost is correlated with distance from the funder’s network rather than with impact.

If that’s true, a field that scales fast on a “tech-caliber talent” frame doesn’t just risk reproducing existing hierarchies. It selects for them.

If it’s false, the cleanest disconfirming evidence would be one of two things: funders using rapid, legibility-weighted diligence show no proximity skew in their portfolios once you control for measured cost-effectiveness; or high-proximity orgs that do get funded systematically underperform legible ones at equivalent funding levels. I haven’t found that evidence, but I’d welcome it.
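To make the mechanism concrete, here is a toy simulation. It is a sketch under stated assumptions, not evidence: the population, the `Org` fields, the cost formula, and the diligence budget are all hypothetical. Impact is drawn independently of distance (quality held constant), and assessment hours depend only on distance from the funder's network. The funder diligences the cheapest-to-assess orgs first until the hours run out, then funds the best of what it has seen.

```python
import random
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Org:
    impact: float        # true expected value per dollar (made-up units)
    distance: float      # 0 = inside the funder's network, 1 = far outside it
    assess_hours: float  # diligence time the funder needs to evaluate the org


def make_orgs(n: int, seed: int = 0) -> list[Org]:
    """Build a hypothetical population where impact is independent of distance."""
    rng = random.Random(seed)
    orgs = []
    for _ in range(n):
        impact = rng.gauss(1.0, 0.3)
        distance = rng.random()
        # Assumption being illustrated: assessment cost tracks distance, not impact.
        orgs.append(Org(impact, distance, 10 + 40 * distance))
    return orgs


def fund_fast(orgs: list[Org], hours: float, slots: int) -> list[Org]:
    """Assess cheapest-first within a time budget, then fund the best assessed orgs."""
    assessed = []
    for org in sorted(orgs, key=lambda o: o.assess_hours):
        if org.assess_hours > hours:
            break  # every remaining org is at least this expensive to assess
        hours -= org.assess_hours
        assessed.append(org)
    return sorted(assessed, key=lambda o: o.impact, reverse=True)[:slots]


def fund_with_full_information(orgs: list[Org], slots: int) -> list[Org]:
    """Benchmark: a funder who could see every org's true impact at no cost."""
    return sorted(orgs, key=lambda o: o.impact, reverse=True)[:slots]


orgs = make_orgs(200)
fast = fund_fast(orgs, hours=1000, slots=20)
ideal = fund_with_full_information(orgs, slots=20)
print("mean distance, fast portfolio: ", round(mean(o.distance for o in fast), 3))
print("mean distance, ideal portfolio:", round(mean(o.distance for o in ideal), 3))
print("mean impact,   fast portfolio: ", round(mean(o.impact for o in fast), 3))
print("mean impact,   ideal portfolio:", round(mean(o.impact for o in ideal), 3))
```

In this toy world the fast portfolio cannot beat the full-information one on impact, and it clusters near the funder's network even though nobody chose to exclude distant orgs. Whether real portfolios behave this way is exactly the empirical question; the simulation only shows that the skew follows from the assumptions.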

The third-wave context

Nan Ransohoff’s widely read essay on the “third wave” of American philanthropy argues that AI-linked wealth could generate tens of billions a year in new giving, and that the sector needs more talent, better infrastructure, and faster ways to move money. I agree with a lot of it. My friend and former colleague Andrew Dunckelman, deputy director at the Gates Foundation, wrote a response that sharpened the part I think matters most. The question isn’t only how fast the money can move; it’s who decides what counts as progress, and whether new institutions end up donor-legible but not field-legitimate.

My worry is narrow and specific. The third-wave frame imports Silicon Valley’s selection function wholesale: fund the tech-caliber builders, model the allocators on Sequoia, and be wary by default of people from traditional philanthropy. In that ecosystem of funders, allocators, and builders, the people closest to the problems appear only as the problems. If you build the next $50B of philanthropic infrastructure on a frame where proximity is invisible, the legibility trap stops being an accident of diligence and becomes part of the architecture.

This is adjacent to the democratic critique of EA that already exists on this forum, but it is not the same argument. I’m not primarily claiming that allocation should be more democratic because democracy is intrinsically good, though I often think it is. I’m making an efficiency claim: proximity may be decision-relevant evidence about expected impact, and current diligence systems may systematically discount it. If I’m right, this isn’t only a fairness problem. It’s a mispricing.

What I’m actually proposing

Here are three proposals, each meant to be tested rather than taken on faith.

1. Treat proximity as a measurable input to expected value, not a tiebreaker after it. Most cost-effectiveness models implicitly assume the funder can see the opportunity. When assessment cost is correlated with distance from impact, that assumption biases the portfolio. Build legibility cost into the model explicitly and you can at least see the skew you’re introducing.
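One possible reading of “build legibility cost into the model,” sketched in code. Everything here is an assumption for illustration: the `Opportunity` fields, the dollar figures, and the idea of a per-opportunity diligence cap standing in for a speed constraint. The point is only that once diligence cost is an explicit term, the model can report which opportunities were never assessed and what that may have cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opportunity:
    name: str
    expected_impact: float  # outcome units (hypothetical)
    grant_usd: float
    diligence_usd: float    # staff time and expense needed to assess it


def impact_per_dollar(opp: Opportunity) -> float:
    """Cost-effectiveness with assessment cost counted as part of the spend."""
    return opp.expected_impact / (opp.grant_usd + opp.diligence_usd)


def skew_report(opps: list[Opportunity], diligence_cap_usd: float) -> dict:
    """Compare the best opportunity the funder can see with the best overall."""
    seen = [o for o in opps if o.diligence_usd <= diligence_cap_usd]
    if not seen:
        raise ValueError("no opportunity fits under the diligence cap")
    best_seen = max(seen, key=impact_per_dollar)
    best_overall = max(opps, key=impact_per_dollar)
    return {
        "funded": best_seen.name,
        "best_overall": best_overall.name,
        "never_assessed": [o.name for o in opps if o not in seen],
        "forgone_impact_per_dollar": (
            impact_per_dollar(best_overall) - impact_per_dollar(best_seen)
        ),
    }


# Made-up numbers: the proximate org costs six times as much to assess,
# but still wins once that cost is included in the denominator.
opps = [
    Opportunity("legible_org", expected_impact=100.0, grant_usd=100_000, diligence_usd=5_000),
    Opportunity("proximate_org", expected_impact=150.0, grant_usd=100_000, diligence_usd=30_000),
]
report = skew_report(opps, diligence_cap_usd=10_000)
print(report)
```

The design choice worth noticing is that the cap acts before any comparison happens. The proximate org does not lose on cost-effectiveness; it is never scored. Making `never_assessed` an output is what lets a funder see the skew rather than assume it away.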

2. Fund rigorous evaluations of participatory allocation. Participatory grantmaking, ceding allocation decisions to people with lived proximity, is promising but not proven. The Ford Foundation’s own review concedes the better-outcomes hypothesis “has yet to be backed up by a solid body of evidence,” and a 2023 evidence review reached the same verdict. That’s a reason to run the comparison, not dismiss it. Some of it could be RCTs; some could be randomized reviewer-panel designs, quasi-experimental comparisons across grantmaking processes, or retrospective portfolio analyses scored against pre-specified outcomes. We have decades of rigorous evidence on cash and almost none comparing participatory and expert allocation at equivalent scale. That’s a gap EA is unusually well-positioned to close, and the result would be decision-relevant either way.

3. Back bridgers, and define them operationally so the proposal can be tested rather than admired. The strongest versions of proximity-led philanthropy are not anti-institutional, and this is where the false binary between “elite analyst” and “community voice” breaks down. At Google.org, some of the best work I funded ran through intermediated systems that combined local trust with measurable structure. The Grow with Google Small Business Fund worked through Community Development Financial Institutions: they sit far closer to small businesses than a corporate funder ever could, but they also carry the underwriting discipline, reporting infrastructure, and repeatable process to move serious capital. That fund deployed $180M across 62 CDFIs, enabled roughly 131,000 small-business loans totaling $1.6B, and directed 56% of funds to minority-owned businesses. That is not “trust whoever has lived experience.” It is proximity plus structure plus measurable outcomes, which is exactly what I mean by a bridger.

By bridger I mean a person or institution with four traits, each of which is assessable:

Community legitimacy: credible, accountable relationships with the people most affected by the problem.
Execution capacity: the ability to manage money, hire, report, and deliver.
Translational fluency: the ability to speak to funders and to communities without reducing either to the other.
Evaluation openness: willingness to be measured, audited, and compared against alternatives.

I expect a selection criterion built on those four traits to track impact better than one built on credential familiarity and network proximity, and it should be harder to game than vague appeals to authenticity because all four are observable.
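As a sketch of what “operational” could look like, here is a minimal rubric over the four traits. The 0 to 4 scale, the floor of 3, and the rule that every trait must clear the floor are my assumptions for illustration, not a validated instrument. The conjunctive rule is deliberate: a bridger in the sense above needs all four traits, so a high score on one should not be able to offset a low score on another.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class BridgerScores:
    """Hypothetical reviewer scores, each on a 0-4 scale."""
    community_legitimacy: int
    execution_capacity: int
    translational_fluency: int
    evaluation_openness: int

    def __post_init__(self):
        for value in astuple(self):
            if not 0 <= value <= 4:
                raise ValueError("each trait must be scored from 0 to 4")


def weakest_trait(scores: BridgerScores) -> int:
    """The lowest of the four scores; averaging would hide a missing trait."""
    return min(astuple(scores))


def is_bridger(scores: BridgerScores, floor: int = 3) -> bool:
    """Conjunctive test: every trait must meet the (assumed) floor."""
    return weakest_trait(scores) >= floor


# Illustrative profiles, not real organizations.
trusted_organizer = BridgerScores(4, 1, 2, 3)    # deep legitimacy, thin execution
polished_outsider = BridgerScores(1, 4, 4, 4)    # strong structure, little legitimacy
local_intermediary = BridgerScores(3, 4, 3, 4)   # proximity plus structure

for label, s in [
    ("trusted_organizer", trusted_organizer),
    ("polished_outsider", polished_outsider),
    ("local_intermediary", local_intermediary),
]:
    print(label, is_bridger(s))
```

A rubric like this is only as good as the scoring behind it, and the floor is arbitrary until it is checked against outcomes. That is the part I would want tested.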

The pattern I actually want to name

I put this after the proposals because they stand without it. The deeper claim is harder to prove, and I want it judged separately from them.

When a system rewards measurable cognitive performance and then lets that performance stand in for moral standing, it hands the people at the top a way out of a hard question. If my intelligence earned me the right to decide, I never have to ask whether I should be the one deciding. The ranking does the moral work so the person doesn’t have to, and the trait that justifies the power is exactly the trait it feels unwelcome to examine.

That move is old, and naming its history is not a genetic fallacy, because I’m not using the history to discredit the tools. I’m pointing at a recurring structure. Eugenics was never only a set of statistical methods. It was a project: rank human beings on a single scale of measurable worth, then use the ranking to decide who deserves to flourish and who deserves to decide.

The tools became respectable. The project’s logic, measure the trait, let the trait confer standing, did not disappear; it recurs in IQ and the g-factor, in the SAT (whose designer sat on the advisory council of the American Eugenics Society), and, I’d argue, in a present-day culture that treats raw cognitive horsepower as the thing that entitles a person to hold the pen. Galton endowed what became UCL’s Galton Chair of Eugenics; Pearson and Fisher each held it. The same instinct, measured intelligence as the arbiter of human standing, runs through a long line of otherwise brilliant people.

I’m not asking you to accept that this history discredits regression, or EA, or anything else. The tools are fine. The claim is narrower: “the highest-scoring, most rigorous reasoners should be the ones to decide” is not a neutral, ahistorical principle. It can become another instance of a pattern that has reliably seated a particular kind of person at the table and called it merit. You can believe every theorem in the textbook and still hold that suspicion. I’m only insisting on the suspicion.

What I’m asking

I’m not asking for less rigor, but for rigor applied to philanthropy’s own selection mechanism. The question was never whether the third wave of philanthropy will be smart enough to deploy this money well. The people deciding will be brilliant. The question is whether they’ll build a system that can see, fund, and trust the people who actually know, or whether “move fast” will keep quietly resolving to “fund who we can already read.”

I think it’s the latter by default, and I think this community has both the tools and the stated values to make it otherwise. I’d like to be argued out of the parts of this I have wrong. In particular I’d welcome:

Evidence that rapid grantmaking does, or does not, skew toward elite-legible organizations once you control for measured cost-effectiveness.
Examples of funders already measuring proximity or field legitimacy well, so I’m not reinventing a wheel.
Better ways to define proximity that keep it falsifiable rather than vague.
Reasons to expect bridgers, as defined above, to underperform analysts or traditional grantmakers.
Existing evaluations of participatory or community-led allocation I should read.

A note on provenance: this argument began as a piece for a general audience, which makes the moral case more directly and leans harder on the historical context I’ve kept light here. If you want that version, it’s at https://www.linkedin.com/pulse/i-have-two-harvard-degrees-theyre-wrong-qualification-justin-steele-0czsc. The two are aimed at different rooms and read differently in tone. I think the empirical claim above should stand or fall on its own terms either way, and that’s the version I’d most like pressure-tested.
