Advice to pivot into AI Safety is likely miscalibrated

Link post

tl;dr

AI career advice orgs, prominently 80,000 Hours, encourage career moves into AI safety roles, including mid‑career pivots. I analyse the quality of this advice from the private satisfaction, public-good, and counterfactual equilibrium perspectives, and learn the following things.

Rational Failure: If you value personal, direct impact highly, it can be rational to attempt a pivot that will probably fail (e.g., $≪ 50 %$ success chance).
Opacity: To work out whether this advice is doing net good, we would need data on both success rates and candidate quality distributions, which are currently unavailable.
Misalignment: The optimal success rate for the field (maximizing total impact) differs from the optimal rate for individuals (maximizing personal EV). The advice ecosystem appears calibrated to neither.
Counterfactual impact: Counterfactually, the value of a pivot is much lower than it naively appears; in highly contested roles you need not only to be the best but to be the best by a wide enough margin to justify all the effort of the other people you are displacing.
Donations: If you donate at the moment and would pause donations while taking a career sabbatical, that is very likely a negative EV move in public goods terms. If the EV of a pivot is uncertain, donating the attempt costs (e.g., the sabbatical expenses you were willing to pay) provides a guaranteed positive counterfactual impact and doesn’t require all this fancy modelling.

The analysis here should not be astonishing; we know advice calibration is hard, and we know that the collective and private goals might be misaligned. Grinding through the details nonetheless reveals some less obvious implications; for example, not only are the number of jobs and the number of applicants important, but your model of the distribution of talent ends up having a huge effect. The fact that the latter two factors (number of applicants and talent distribution) are inscrutable to the people who make the risky choice to do the career pivoting is why I think that advice about careers is miscalibrated.

The problem bites for mid-career professionals with high switching costs. It is likely less severe for early career professionals who have lower switching costs, or for people doing something other than switching jobs (e.g. if you are starting up a new organisation the model would be different). I propose some mitigations, both personal and institutional. I made an interactive widget to help individuals evaluate their own pivots.

This was a draft for Draft Amnesty Week. I revisited it to edit for style.

Epistemic status

A solid back-of-the-envelope analysis of some relatively well-understood but frequently under-regarded concepts.

Here’s a napkin model of the AI-safety career-advice economy—or rather, three models of increasing complexity. They show how advice can sincerely recommend gambles that mostly fail, and why—without better data—we can’t tell whether that failure rate is healthy (leading to impact at low cost) or wasteful (potentially destroying happiness and even impact). In other words, it’s hard to know whether our altruism is “effective.”

In AI Safety in particular, there’s an extra credibility risk idiosyncratic to this system. AI Safety is, loosely speaking, about managing the risks of badly aligned mechanisms producing perverse outcomes. As such, it’s particularly incumbent on our field to avoid mechanisms that are badly aligned and produce perverse outcomes; otherwise we aren’t taking our own risk model seriously.

To keep things simple, we ignore as many complexities as possible.

We evaluate decisions in terms of cause impact, which we assume we can price in donation-equivalent dollars. This is a public good.
Individual private goods are the candidate’s own gains and losses. We assume the candidate’s job satisfaction and remuneration (i.e. personal utility) can be costed in dollars.

Part A—Private career pivot decision

An uncertain career pivot is a gamble, so we model it the same way we model other gambles.

Meet Alice

Alice is a senior software engineer in her mid-30s, making her a mid-career professional. She has been donating roughly 10% of her income to effective charities and now wonders whether to switch lanes entirely to achieve impact via technical AI safety work in one of those AI Safety jobs she’s seen advertised. She has saved six months of runway funds to explore AI-safety roles (research engineering, governance, or technical coordination). Each month out of work costs her foregone income and reduced career prospects. Her question is simple: Is this pivot worth the costs?

To build the model, Alice needs to estimate four things:

Upside

Annual Surplus ( $Δ u$ ): This is the key number. It’s the difference in Alice’s total annual utility between the new AI safety role ( $u_{1}$ ) and her current baseline ( $u_{0}$ ). This surplus combines the change in her salary and her impact—indirectly via donations and directly by doing some fancy AI safety job.
- $u = w + α (I + d)$ , where $w$ is wage, $I$ is impact, $d$ is donations, and $α$ is Alice’s weighting of impact versus consumption.
- $Δ u := u_{1} - u_{0}$ .

Downside

Runway ( $ℓ$ ): The maximum time she’s willing to try, in years.
Burn Rate ( $c$ ): This is Alice’s net opportunity cost per year while on sabbatical (e.g., foregone pay, depleted savings), measured in k$/year.

Odds

Application Rate ( $r$ ): The number of distinct job opportunities she can apply for per year.
Success Probability ( $p$ ): Her average probability of getting an offer from a single application. We assume these are independent and identically distributed (i.i.d.).

Caution

The i.i.d. assumption (each job is independent) is likely optimistic. In reality, applications are correlated: if Alice is a good fit for one role, she’s likely a good fit for others (and vice versa). We formalise this in the next section with candidate quality distributions, which capture the idea that you don’t know your “ranking” in the field, but most people are, by definition, not at the top of it.

Timing

Discount Rate ( $ρ$ ): A continuous rate per year that captures her time preference. A higher $ρ$ means she values immediate gains more—for example, if she expects short AGI timelines, $ρ$ might be high.

Modeling the Sabbatical: The Decision Threshold

With these inputs, we can calculate the total expected value (EV) of her sabbatical gamble. The full derivation is in Appendix A, but here’s the result:

$Δ E V_ρ (p) = \frac{1 - e^{- (r p + ρ) ℓ}}{r p + ρ} (\frac{Δ u r p}{ρ} - c) .$

This formula looks complex, but its logic is simple. The entire decision hinges on the sign of the bracketed term, $(\frac{Δ u r p}{ρ} - c)$ . This is a direct comparison between the expected gain rate (the upside $Δ u$ , multiplied by the success rate $r p$ , and adjusted for discounting $1 / ρ$ ) and the burn rate ( $c$ ). The prefactor scales that value according to the length of Alice’s runway and her discount rate.

The EV is positive if and only if the gain rate exceeds the burn rate. This means Alice’s decision boils down to a simple question: Is her per-application success probability, $p$ , high enough to make the gamble worthwhile?

We can find the exact break-even probability, $p^{*}$ , by setting the gain rate equal to the burn rate. This gives a much simpler formula for Alice’s decision threshold:

$p^{*} = \frac{c ρ}{r Δ u} .$

If Alice believes her actual $p$ is greater than this $p^{*}$ , the pivot has a positive expected value. If $p < p^{*}$ , she should not take the sabbatical, at least on these terms.
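As a minimal sketch of the two formulas above, the following Python evaluates $Δ E V_ρ (p)$ and $p^{*}$ and checks that the EV changes sign at the threshold. The function names and the input values are illustrative assumptions, not taken from the linked calculator.

```python
import math

def delta_ev(p, delta_u, r, rho, c, runway):
    """Expected value of the sabbatical gamble (same units as delta_u and c)."""
    hazard = r * p + rho  # offer arrival rate plus discount rate
    prefactor = (1 - math.exp(-hazard * runway)) / hazard
    return prefactor * (delta_u * r * p / rho - c)

def break_even_p(delta_u, r, rho, c):
    """Per-application success probability at which the EV is zero."""
    return c * rho / (r * delta_u)

# Hypothetical inputs chosen only for illustration (k$/year, per year, years)
params = dict(delta_u=30.0, r=12.0, rho=0.25, c=60.0)
p_star = break_even_p(**params)

for p in (0.5 * p_star, p_star, 2 * p_star):
    ev = delta_ev(p, runway=0.5, **params)
    print(f"p = {p:.4f}  EV = {ev:+.3f}")
```

Below $p^{*}$ the EV is negative, at $p^{*}$ it is zero up to rounding, and above it the EV is positive, whatever runway is passed in.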

What This Model Tells Alice

This simple threshold $p^{*}$ gives us a clear way to think about her decision:

The bar gets higher: The threshold $p^{*}$ increases with higher costs ( $c$ ), shorter timelines, or greater impatience ( $ρ$ ). If her sabbatical is expensive or she’s in a hurry, she needs to be more confident of success.
The bar gets lower: The threshold $p^{*}$ decreases with more opportunities ( $r$ ) or a higher upside ( $Δ u$ ). If the job offers a massive impact gain or she can apply to many roles, she can tolerate a lower chance of success on any single one.
Runway doesn’t change the threshold: Notice that the runway length $ℓ$ isn’t in the $p^{*}$ formula. A longer runway gives her more expected value (or loss) if she does take the gamble, but it doesn’t change the break-even probability itself.
The results are fragile to uncertainty: This model is highly sensitive to her estimates. If she overestimates her potential impact (a high $Δ u$ ) or underestimates her time preference (a low $ρ$ ), she’ll calculate a $p^{*}$ that is artificially low, making the pivot look much safer than it is.^[1]
The key unknown: Even with a perfectly calculated $p^{*}$ , Alice still faces the hardest part: estimating her own actual success probability, $p$ .

That $p$ is, essentially, her chance of getting an offer. It depends not only on the number of jobs available but crucially on the number and quality of the other applicants.

All that said, this is a relatively “optimistic” model. If Alice attaches a high value to getting her hands dirty in AI safety work, she might be willing to accept a remarkably low $p$ ; we’ll see that in the worked example. Hold that thought, though, because I’ll argue that this personal decision rule can be pretty bad at maximizing total impact.

Caution

If you are using these calculations for real, be aware that our heuristics are likely overestimating Alice’s chances. Job applications are not i.i.d. The effective number of independent shots is lower than the raw application count, which reduces the effective $r$ : if your skills don’t match the first job, they are also less likely to match the second, because the jobs might be similar to each other.
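One hedged way to see the size of this effect is to assume that $p$ is a fixed but unknown property of the candidate (a latent "fit") shared across all her applications. The sketch below compares the chance of at least one offer under that assumption with the i.i.d. view that uses only the average $p$ . The two-type mixture and all numbers are hypothetical.

```python
import math

def p_any_offer(p, r, runway):
    """Chance of at least one offer when offers arrive at Poisson rate r * p."""
    return 1 - math.exp(-r * p * runway)

r, runway = 24, 0.5  # illustrative application rate (per year) and runway (years)

# Hypothetical mixture: 10% of candidates are a strong fit (p = 0.21),
# 90% are a weak fit (p = 0.01). The population mean of p is 0.03.
types = [(0.10, 0.21), (0.90, 0.01)]
mean_p = sum(weight * p for weight, p in types)

iid_view = p_any_offer(mean_p, r, runway)
correlated_view = sum(weight * p_any_offer(p, r, runway) for weight, p in types)

print(f"i.i.d. view using mean p:  {iid_view:.3f}")
print(f"shared latent fit:         {correlated_view:.3f}")
```

Because $1 - e^{- r p ℓ}$ is concave in $p$ , averaging over candidate types always gives a lower overall success chance than plugging in the average $p$ , so the i.i.d. calculation is an upper bound under this assumption.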

Worked example

Let’s plug in some plausible numbers for Alice. She’s a successful software engineer who earns $w_{0} = 180$ k$/year, donates $d_{0} = 18$ k$/year after tax, and has no on-the-job impact, $I_{0} = 0$ (i.e. no net harm, no net good). A target role offers $w_{1} = 120$ , $d_{1} = 0$ , and $I_{1} = 100$ . Set $α = 1$ , runway $ℓ = 0.5$ years, application rate $r = 24$ /year, discount $ρ = 1 / 3$ , and burn $c = 50$ . Then

$Δ u = (120 + 0 + 100) - (180 + 18 + 0) = 22$

and

$p^{*} = \frac{c ρ}{r Δ u} = \frac{50 \cdot \frac{1}{3}}{24 \cdot 22} \approx 3.16 \% .$

Over six months, the chance of at least one success at $p^{*}$ is $q^{*} = 1 - e^{- r p^{*} ℓ} \approx 31.5 \%$ . Alice’s expected actual sabbatical length is $E [τ] = \frac{1 - e^{- r p^{*} ℓ}}{r p^{*}} \approx 0.416$ years (about 5.0 months), and, conditional on success, it’s $E [τ ∣ success] \approx 0.234$ years (about 2.8 months). Under these assumptions, the sabbatical breaks even at $p = p^{*}$ because the job offers enough upside to compensate for a greater-than-even risk of failure.
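The arithmetic in this example can be reproduced with a few lines of Python. This is a sketch using the inputs stated above; the variable names are my own, and the conditional mean uses the standard formula for an exponential waiting time truncated at the runway.

```python
import math

w0, d0, I0 = 180.0, 18.0, 0.0    # current role (k$/year)
w1, d1, I1 = 120.0, 0.0, 100.0   # target role (k$/year)
alpha, runway, r, rho, c = 1.0, 0.5, 24.0, 1 / 3, 50.0

delta_u = (w1 + alpha * (I1 + d1)) - (w0 + alpha * (I0 + d0))
p_star = c * rho / (r * delta_u)

lam = r * p_star                        # offer arrival rate at break-even
q_star = 1 - math.exp(-lam * runway)    # chance of at least one offer in the runway
mean_tau = q_star / lam                 # E[min(T, runway)] with T ~ Exp(lam)
# E[T | T < runway] for an exponential waiting time truncated at the runway
mean_tau_success = 1 / lam - runway * math.exp(-lam * runway) / q_star

print(f"delta_u = {delta_u:.0f} k$/yr, p* = {p_star:.2%}, q* = {q_star:.1%}")
print(f"E[tau] = {12 * mean_tau:.1f} months")
print(f"E[tau | success] = {12 * mean_tau_success:.1f} months")
```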

We plot a few values for Alice to visualize the trade-offs for different upsides $Δ u$ .

To play around with the assumptions, try the interactive Pivot EV Calculator (source at danmackinlay/career_pivot_calculator).

Part B—Field-level model

tl;dr

In a world with a heavy-tailed distribution of candidate impact, the field benefits from many attempts because a few “hits” dominate. In light-tailed worlds, the same encouragement becomes destructive. We simply don’t know which world we’re in.

So far, this has been Alice’s private perspective. Let’s zoom out to the field level and consider: What if everyone followed Alice’s decision rule? Is the resulting number of applicants healthy for the field? What’s the optimal number of people who should try to pivot?

From personal gambles to field strategy

Our goal is to move beyond Alice’s private break-even ( $p^{*}$ ) and calculate the field’s welfare-maximizing applicant pool size ( $K^{*}$ ). This $K^{*}$ is how many “Alices” the field can afford to have rolling the dice before the costs of failures outweigh the value of successes.

To analyze this, we must shift our model in three ways:

Switch to a Public Ledger: From a field-level perspective, private wages and consumption are just transfers; they therefore drop out of the analysis. What matters is the net production of public goods (i.e., impact).
Distinguish Public vs. Private Costs: The costs are now different.
- Private Cost (Part A): $c$ included Alice’s full opportunity cost (foregone wages, etc.).
- Public Cost (Part B): We now use $γ$ , which captures only the foregone public goods during a sabbatical (e.g., $γ = I_{0} + d_{0} + ε$ , or baseline impact + baseline donations + externalities).
Move from Dynamic Search to Static Contest: Instead of one person’s dynamic search, we’ll use a static “snapshot” model of the entire field for one year. We assume there are $N$ open roles and $K$ total applicants that year.

Reconciling the individual and field-level Models

In Part A, Alice saw jobs arriving one-by-one (a Poisson process with rate $r$ ). In Part B, we are modeling an annual “contest” with $K$ applicants competing for $N$ jobs.

We can bridge these two views by setting $N \approx r$ . This treats the entire year’s worth of job opportunities as a single “batch” to be filled from the pool of $K$ candidates who are “on the market” that year.

This is a standard simplification. It allows us to stop worrying about the timing of individual applications and focus on the quality of the matches, which is determined by the size of the applicant pool ( $K$ ). We can then compare the total Present Value (PV) of the benefits (better hires) against the total PV of the costs (failed sabbaticals).

If $N$ jobs are available annually (which we’ve already equated to Alice’s application rate $r$ ) and $K$ total applicants are competing for them, a simple approximation for the per-application success probability is that it’s proportional to the ratio of jobs to applicants.

For the rest of this analysis, we’ll assume a simple mapping: $p \approx N / K$ . This allows us to plot both models on the same chart: as the field becomes more crowded ( $K$ increases), the individual chance of success ( $p$ ) for any single application shrinks.

The Field-Level Model: Assumptions

Here’s the minimally complicated version of our new model:

There are $K$ total applicants and $N$ open roles per year.
Each applicant $k$ has a true, fixed potential impact $J^{(k)}$ drawn i.i.d. from the talent distribution $F$ .
Employers perfectly observe $J^{(k)}$ and hire the $N$ best candidates. (This is a strong, optimistic assumption about hiring efficiency).
Applicants do not know their own $J^{(k)}$ , only the distribution $F$ .

The intuition is that the field benefits from a larger pool $K$ because it increases the chance of finding high-impact candidates. But the field also pays a price for every failed applicant.

Benefits vs. Costs on the Public Ledger

Let’s define the two sides of the field’s welfare equation.

The Marginal Benefit (MV) of a Larger Pool

The benefit of a larger pool $K$ is finding better candidates. We care about the marginal value of adding one more applicant to the pool, which we define as ${M V}_{K}$ . This is the expected annual increase in impact from widening the pool from $K$ to $K + 1$ . (Formally, ${M V}_{K} := E [S_{N, K + 1}] - E [S_{N, K}]$ , where $S_{N, K}$ is the total impact of the top $N$ hires from a pool of $K$ ).

The Marginal Cost (MC) of a Larger Pool

The cost is simpler. When $K > N$ , adding one more applicant results (on average) in one more failed pivot. This failed pivot costs the field the foregone public good during the sabbatical. We define the social burn rate per year as $γ$ . To compare this to the annual benefit ${M V}_{K}$ , we need the total present value of this foregone impact. We call this $L_{fail, δ}$ (the PV of one failed attempt). (This cost is derived in Appendix B as $L_{fail, δ} = γ \frac{1 - e^{- δ ℓ}}{δ}$ ).
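A sketch of where this expression comes from, assuming that a failed attempt lasts the full runway $ℓ$ and forgoes public good at the constant rate $γ$ , discounted continuously at rate $δ$ (the full derivation is in Appendix B and may differ in detail):

$L_{fail, δ} = \int_{0}^{ℓ} γ e^{- δ t} \, dt = γ \frac{1 - e^{- δ ℓ}}{δ} .$

For small $δ ℓ$ this is approximately $γ ℓ$ , the undiscounted public good foregone over the runway.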

We do not model employer congestion from reviewing lots of applicants; the rationale is that it’s empirically small because employers stop looking at candidates when they’re overwhelmed [@Horton2021Jobseekers].[^2] Note, however, that we also assume employers perfectly observe $J^{(k)}$ , which means we’re being optimistic about the field’s ability to sort candidates. Maybe we could model a noisy search process?

Field-Level trade-offs

We can now find the optimal pool size $K^{*}$ . The total public welfare $W (K)$ peaks when the marginal benefit of one more applicant equals the marginal cost.

As derived in Appendix B, the total welfare $W (K)$ is maximized when the present value of the annual benefit stream from the marginal applicant ( ${M V}_{K} / δ$ ) equals the total present value of their failure cost ( $L_{fail, δ}$ ):

$\frac{{M V}_{K}}{δ} = L_{fail, δ} .$

Substituting the expression for $L_{fail, δ}$ and cancelling the discount rate $δ$ , we get a very clean threshold:

${M V}_{K} = γ (1 - e^{- δ ℓ}) .$

This equation is the core of the field-level problem. The optimal pool size $K^{*}$ is the point where the expected annual marginal benefit ( ${M V}_{K}$ ) falls to the level of the total foregone public good from a single failed sabbatical attempt.

The Importance of Tail Distributions

How quickly does ${M V}_{K}$ shrink? Extreme value theory tells us it depends entirely on the tail of the candidate-quality distribution, $F$ . The shape of the tail determines how quickly returns from widening the applicant pool diminish.

We consider two families (the specific formulas are in Appendix B):

Light tails (e.g., Exponential): In this world, candidates vary, but the best isn’t transformatively better than average. Returns diminish quickly: the marginal value ${M V}_{K}$ shrinks hyperbolically (roughly as $1 / K$ ).
Heavy tails (e.g., Fréchet): This captures the “unicorn” intuition. Returns diminish much more slowly. If the tail is heavy enough, ${M V}_{K}$ decays extremely slowly, justifying a very wide search.
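For the light-tailed case the $1 / K$ decay can be checked exactly. The sketch below assumes Exponential talent with mean $μ$ and uses the standard order-statistics identity that the $j$ -th largest of $K$ exponential draws has expectation $μ (H_{K} - H_{j - 1})$ , where $H_{n}$ is the $n$ -th harmonic number. The function names are mine; the formulas the post actually uses are in Appendix B.

```python
from fractions import Fraction

def harmonic(n):
    """n-th harmonic number H_n as an exact fraction (H_0 = 0)."""
    return sum(Fraction(1, i) for i in range(1, n + 1))

def expected_top_n_sum(N, K, mu=1):
    """E[S_{N,K}] for K i.i.d. Exponential(mean mu) draws, with K >= N."""
    h_k = harmonic(K)
    return mu * sum(h_k - harmonic(j - 1) for j in range(1, N + 1))

def marginal_value(N, K, mu=1):
    """MV_K = E[S_{N,K+1}] - E[S_{N,K}]."""
    return expected_top_n_sum(N, K + 1, mu) - expected_top_n_sum(N, K, mu)

N, mu = 3, 100
for K in (10, 100, 1000):
    mv = marginal_value(N, K, mu)
    print(K, float(mv), float(Fraction(mu * N, K + 1)))
```

Under this assumption the marginal value is exactly $μ N / (K + 1)$ , so each tenfold widening of the pool cuts the value of the marginal applicant roughly tenfold.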

Implications for Optimal Pool Size

This difference in diminishing returns hugely affects the optimal pool size $K^{*}$ . The full solutions for $K^{*}$ are in Appendix B.

With light tails, there’s a finite pool size beyond which ramping up the hype (increasing $K$ ) reduces net welfare. Every extra applicant burns $L_{fail, δ}$ in foregone public impact while adding an ${M V}_{K}$ that shrinks rapidly.
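To make the finite optimum concrete, here is a hedged sketch for Exponential talent, where the marginal value is ${M V}_{K} = μ N / (K + 1)$ . Setting this equal to the threshold $γ (1 - e^{- δ ℓ})$ and treating $K$ as continuous gives a closed form for $K^{*}$ . The population mean `mu` and the choice $δ = 1 / 3$ are assumptions for illustration only, so the output will not match the figures, which use the Appendix B parameters.

```python
import math

def optimal_pool_exponential(mu, N, gamma, delta, runway):
    """Pool size at which mu*N/(K+1) falls to gamma*(1 - exp(-delta*runway)).

    Never returns less than N, because below K = N nobody fails and
    extra applicants are costless in this model.
    """
    threshold = gamma * (1 - math.exp(-delta * runway))
    return max(N, mu * N / threshold - 1)

# Hypothetical population mean impact; N, gamma and runway follow Alice's example,
# and delta is assumed equal to her private discount rate.
mu, N, gamma, delta, runway = 10.0, 24, 18.0, 1 / 3, 0.5

k_star = optimal_pool_exponential(mu, N, gamma, delta, runway)
print(f"K* is roughly {k_star:.0f} applicants for {N} roles")
```

Doubling `mu` roughly doubles $K^{*}$ , while a longer runway or larger foregone public good $γ$ shrinks it.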

With heavy tails, it’s different. As the tail gets heavier, $K^{*}$ explodes. In very heavy-tailed worlds, very wide funnels can still be net positive. We may decide it’s worth, as a society, spending a lot of resources to find the few unicorns.

We set the expected impact per hire per year to $μ_{imp} = 100$ (impact dollars/yr) to match Alice’s hypothetical target role. This is just for exposition.

We can, of course, plot this.

This plot shows total net welfare $W (K)$ and marks the maximum $K^{*}$ for each family, so we can see where total welfare peaks. The dashed line at $K = N$ shows where failures begin: for $K > N$ , $K - N$ people each impose a public cost of $L_{fail, δ}$ . The markers show $K^{*} = \arg\max_{K} W (K)$ , the pool size beyond which widening further would reduce total impact.
Units: $B (K)$ is in impact dollars per year and is converted to PV by multiplying by $H_{δ} = \frac{1}{δ}$ . The subtraction uses the discounted per-failure cost $L_{fail, δ} = γ \frac{1 - e^{- δ ℓ}}{δ}$ .
Fréchet curves use the large- $K$ asymptotic $B (K) \approx s K^{1 / α} C_{N}$ (with $s = μ_{imp} / Γ (1 - 1 / α)$ ). We could get the exact $B (K)$ for Fréchet, but the asymptotic is good enough to illustrate the qualitative behaviour.
We treat all future uncertainties about role duration, turnover, or project lifespan as already captured in the overall discount rate $δ$ .

We can combine these perspectives to visualize the tension between private incentives and public welfare.

This visualization combines the private and public views by assuming an illustrative mapping from pool size to success probability: $p \approx β N / K$ , where $β$ bundles screening efficiency; in this figure $β = 1$ . The black curve (left axis) shows a candidate’s private expected value (EV) as a function of success probability $p$ . The coloured curves (right axis) show the field’s welfare, $W (K)$ . The private break-even point $p^{*}$ (black dashed line) can fall far to the left of the field-optimal point $p (K^{*})$ (coloured vertical lines). This gap marks the region where individuals may be rationally incentivized, at the candidate level, to enter even though, at the field level, the pool is already saturated or oversaturated.

<div>

If your applicant pool does not have heavy tails, widening the funnel likely increases social loss.

</div>

Part C—Counterfactual Impact and Equilibrium

Part A modeled Alice’s pivot as a private gamble, where the upside was the absolute impact ( $I_{1}$ ) of the new role. Part B zoomed out, analyzing the field-level optimum ( $K^{*}$ ) and showing how the marginal value ( ${M V}_{K}$ ) of adding one more applicant shrinks as the pool ( $K$ ) grows.

Now we connect these two views. What if Alice is a sophisticated applicant? She understands the field-level dynamics from Part B and wants to maximize her true counterfactual impact—not just the absolute impact of the role she fills. She must update her private EV calculation from Part A.

This introduces a feedback loop: Alice’s personal incentive to apply now depends on the crowd size ( $K$ ) and the talent distribution ( $F$ ), and those factors in turn affect the crowd size.

The Counterfactual Impact Model

In Part A, Alice’s upside $Δ u$ included the absolute impact $I_{1}$ . But her true impact is counterfactual: it’s the value she adds compared to the next-best person who would have been hired if she hadn’t applied.

How can she estimate this? She doesn’t know her own quality ( $J^{(k)}$ ) relative to the pool. If we assume she is, from the field’s perspective, just one more “draw” from the talent distribution $F$ (formally: applicants are exchangeable), then her expected counterfactual impact from the decision to apply is exactly the marginal value of adding one more applicant: ${M V}_{K}$ (from Part B).

This ${M V}_{K}$ is her ex-ante expected impact before the gamble is resolved. But her EV calculation (from Part A) needs the upside conditional on success.

Let $I_{C F}$ be this value: the expected counterfactual impact given she succeeds and gets the job. Let $q_{K}$ be her overall probability of success in a pool of size $K$ . (In the static model of Part B, if she joins a pool of $K$ others, there are $K + 1$ applicants for $N$ slots, so $q_{K} \approx N / (K + 1)$ ). If her attempt fails (with probability $1 - q_{K}$ ), her counterfactual impact is zero.

Therefore, the ex-ante expected impact ${M V}_{K}$ is simply the probability of success multiplied by the value of that success:

${M V}_{K} = (q_{K} \cdot I_{C F}) + ((1 - q_{K}) \cdot 0) = q_{K} \cdot I_{C F} .$

This gives us the key value Alice needs. Alice’s expected counterfactual impact, conditional on success, is:

$I_{C F} = \frac{{M V}_{K}}{q_{K}} .$

Alice can now recalibrate her private decision from Part A. She defines a counterfactual private surplus, $Δ u_{C F}$ , by replacing the naive, absolute impact $I_{1}$ with her sophisticated, counterfactual estimate $I_{C F}$ .

This changes the dynamics entirely. In Part A, the value of the upside ( $Δ u$ ) was fixed. Now, the value of the upside ( $Δ u_{C F}$ ) itself depends on the pool size $K$ .

As $K$ grows, both ${M V}_{K}$ (the marginal value) and $q_{K}$ (the success chance) decrease. How $I_{C F}$ behaves depends on which of the two decreases faster, a property determined by the tail of the impact distribution $F$ .
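Using Alice’s numbers from the worked example, a short sketch shows how large $I_{C F}$ must be before the counterfactual surplus $Δ u_{C F}$ turns positive. The function names are illustrative. A positive surplus is necessary but not sufficient for a positive EV, which also requires $p > p^{*}$ .

```python
def counterfactual_surplus(i_cf, w0=180.0, d0=18.0, I0=0.0,
                           w1=120.0, d1=0.0, alpha=1.0):
    """Delta u_CF: Part A's surplus with I_1 replaced by I_CF (k$/year)."""
    return (w1 + alpha * (i_cf + d1)) - (w0 + alpha * (I0 + d0))

def break_even_impact(w0=180.0, d0=18.0, I0=0.0,
                      w1=120.0, d1=0.0, alpha=1.0):
    """Value of I_CF at which the counterfactual surplus is exactly zero."""
    return (w0 - w1) / alpha + I0 + d0 - d1

print(break_even_impact())            # pay cut plus foregone donations
print(counterfactual_surplus(100.0))  # recovers the naive surplus from Part A
print(counterfactual_surplus(40.0))   # a modest I_CF leaves the surplus negative
```

With these inputs the hurdle is the 60 k$/year pay cut plus the 18 k$/year of foregone donations, so $I_{C F}$ has to exceed 78 k$/year before the pivot has any upside at all.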

The Dynamics of Counterfactual Impact

The behaviour of $I_{C F}$ leads to radically different incentives depending on the tail shape of the talent pool.

Case 1: Light Tails

In a light-tailed world (e.g., an Exponential distribution), talent is relatively clustered. The math shows (see the appendix) that ${M V}_{K}$ and $q_{K}$ shrink at the same rate, causing their ratio to be constant.

$I_{C F} = μ \quad \text{(light tail)}$

(where $μ$ is the population’s average impact).

Intuition: As the pool $K$ grows, the quality of the $N$ th-best hire—the person we displace—rises almost as fast as the quality of the average successful hire (us). The gap between us and the person we displace remains small and roughly constant. Our counterfactual impact is just the average impact, $μ$ .

Implication: If the average impact $μ$ is modest, $I_{C F}$ may not be enough to offset a large pay cut (like Alice’s). In this world, we’d expect pivots with high private costs to be EV-negative for the average applicant, regardless of how large the pool gets.

Case 2: Heavy Tails

In a heavy-tailed world (e.g., Fréchet), “unicorns” with transformative impact exist. Here, ${M V}_{K}$ shrinks much more slowly than $q_{K}$ . As shown in the appendix, for a Fréchet distribution with shape $α$ , the result is:

$I_{C F} \propto K^{1 / α} \quad \text{(heavy tail)}$

The expected counterfactual impact conditional on success actually increases as the field gets more crowded.

Intuition: Success in a very large, competitive pool ( $K$ ) is a powerful signal. It suggests we aren’t just “good,” but potentially a “unicorn.” We aren’t just displacing the $N$ th-best person; we’re potentially displacing someone much further down the tail.

Implication: In this world, the rate of decay of the field value with funnel size can be slow. The potential upside $I_{C F}$ can grow large enough to easily offset significant private costs (like pay cuts and foregone donations). As a corollary, if we believe in unicorns, it can still make sense to risk large private costs with a low chance of success in order to discover whether we are, in fact, a unicorn.
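Both cases can be checked exactly in the special case of a single open role ( $N = 1$ ), which I use here only because the expected maximum has a closed form. For Exponential talent with mean $μ$ , $E [\max] = μ H_{K}$ ; for Fréchet talent with mean $μ$ and shape greater than 1, the maximum of $K$ draws is again Fréchet with its scale multiplied by $K^{1 / α}$ , so $E [\max] = μ K^{1 / α}$ . The parameter is called `shape` in the code to avoid confusion with Alice’s impact weight.

```python
def i_cf_exponential(K, mu):
    """I_CF = MV_K / q_K for N = 1 with Exponential(mean mu) talent."""
    mv = mu / (K + 1)   # mu * (H_{K+1} - H_K)
    q = 1 / (K + 1)     # one slot, K + 1 exchangeable applicants
    return mv / q

def i_cf_frechet(K, mu, shape):
    """I_CF = MV_K / q_K for N = 1 with Frechet talent (mean mu, shape > 1)."""
    mv = mu * ((K + 1) ** (1 / shape) - K ** (1 / shape))
    q = 1 / (K + 1)
    return mv / q

mu = 100.0
for K in (10, 100, 1000, 10000):
    light = i_cf_exponential(K, mu)
    heavy = i_cf_frechet(K, mu, shape=2.0)
    print(f"K = {K:>5}  light tail: {light:8.1f}  heavy tail: {heavy:8.1f}")
```

The light-tailed column stays at $μ$ , while the heavy-tailed column grows roughly like $(μ / α) K^{1 / α}$ for large $K$ .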

Alice Revisited

With light‑tailed assumptions, $I_{C F}$ equals the population mean $μ$ and is too small to offset Alice’s pay cut and lost donations; her counterfactual surplus is negative regardless of $K$ . Under heavy‑tailed assumptions, $I_{C F}$ rises with $K$ ; across a broad range of conditions, the pivot can become attractive despite large pay cuts (i.e. if Alice might truly be a unicorn). The sign and size of this effect hinge on the tail parameter and scale, which are currently unmeasured.

Visualizing Private Incentives vs. Public Welfare

We can now visualize the dynamics of private, public, and counterfactual private valuations by assuming an illustrative mapping between pool size and success probability: $p \approx β N / K$ . This allows us to see how the incentives change as the field gets more crowded (moving left on the x-axis).

Visualizing the Misalignment

This visualization brings all three models together, using Alice’s parameters to illustrate the tensions between private incentives and public good. The plot is dense, but it reveals three key dynamics:

The Information Gap (A vs. C): The “Naive EV” (A, black solid line) is far more optimistic than the “Counterfactual EV” (C, dashed colored lines). An applicant using the simple model from Part A—ignoring her displacement effect—will drastically overestimate the personal EV of pivoting.
The Cost Barrier (C): In lighter-tailed worlds (Exponential, purple; Fréchet $α = 3.0$ , red), the sophisticated applicant’s EV is always negative. Alice’s financial losses (her $50 k$ burn rate plus $18 k$ in foregone donations) dominate any plausible counterfactual impact. Only in heavy-tailed “unicorn” worlds (Fréchet $α = 2.0$ , green; $α = 1.8$ , orange) does the pivot become EV-positive.
The Structural Misalignment (B vs. C): This is the core problem. In the heavy-tailed worlds where pivoting is privately rational (green, orange), the socially optimal pool size ( $K^{*}$ , the peak of the solid welfare lines) is far smaller than the private equilibrium ( $K_{e q}$ , where the dashed EV lines cross zero). The system incentivizes massive over-entry.

Why Does This Happen? The Misalignment Mechanism

The plot shows that a misalignment exists; our model explains why it’s structural. It boils down to a conflict between the private cost of trying and the social cost of failing.

We can identify the two different “stop” signals:

The Social Optimum ( $K^{*}$ ): The field’s total welfare (Part B) peaks when the marginal impact of one more applicant ( ${M V}_{K}$ ) drops to equal the social cost of the applicant’s (likely) failed attempt. For Alice, this cost is the public good she stops producing during her sabbatical—primarily her donations.
- Social Cost of Failure ( $γ$ ): $18 k$ /year.
The Private Equilibrium ( $K_{e q}$ ): A sophisticated Alice (Part C) stops applying when her private EV becomes zero. This happens when the marginal impact she can expect to have ( ${M V}_{K}$ , which she shares with other applicants) drops to equal her effective private cost hurdle per application. As derived in Appendix C, this hurdle is $\frac{c ρ}{r}$ .
- Private Cost Hurdle ( $\frac{c ρ}{r}$ ): $\frac{50 k \cdot 1 / 3}{24} \approx 0.69 k$ (or $690).

This is the misalignment.

The field implicitly “wants” applicants to stop when the marginal impact benefit drops below $18,000. But because the job search is so efficient (high $r$ ), Alice’s private cost for one more “shot at the prize” is only $690.

She is rationally incentivized to keep trying long after her entry is creating a net social loss. This is what drives the massive gap in our heavy-tailed model: the social optimum $K^{*}$ might be around 11,600, but the private equilibrium $K_{e q}$ balloons to over 178,000.
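The size of the wedge between the two stop signals follows directly from Alice’s parameters. A minimal sketch (variable names are mine):

```python
c, rho, r = 50.0, 1 / 3, 24.0  # burn rate (k$/yr), discount rate, applications per year
gamma = 18.0                   # public good foregone per year: her donations (k$/yr)

private_hurdle = c * rho / r   # k$ of expected impact needed per marginal shot
wedge = gamma / private_hurdle

print(f"private hurdle = {1000 * private_hurdle:.0f} $")
print(f"social cost is about {wedge:.0f}x the private hurdle")
```

On these numbers the social stop signal is roughly 26 times higher than the private one, which is why the two pool sizes end up so far apart.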

This leads to a clear, if sobering, takeaway for a mid-career professional like Alice. Given her high opportunity cost (including foregone donations), donating is the robustly higher-EV option unless she has strong, specific evidence that both (a) the talent pool is extremely heavy-tailed and (b) she is likely to be one of the “unicorns” in that tail.

Implications and Solutions

Our takeaway is that the AI safety career funnel, and likely other high-impact career funnels, is miscalibrated; not in the sense that we are definitely producing net harm right this minute, but in the sense that the feedback mechanisms do not exist to make sure that we do not produce net harm. It’s hard to know how bad this is, but Alice’s $18 k$ social cost versus the $690$ private hurdle is worrisome if it is typical.

That is to say, all this work on alignment is misaligned. Organizations influencing the funnel size don’t internalize the costs borne by unsuccessful applicants. This incentivizes maximizing application volume (a visible proxy) rather than welfare-maximizing matches—a classic setup for Goodhart’s Law.

That said, it’s not even clear what we should be aligned to; the equilibria for maximizing personal life satisfaction for applicants, or maximizing total field impact, may differ.

If we want to make a credible claim to be impact-driven, it is the latter, the net public good, that needs to be prioritized, because “please donate so that a bunch of monied professionals can self-actualize” is not a great pitch.

For Individuals: Knowing the Game

For mid-career individuals, the decision is high-stakes. (For early-career individuals, costs $c$ are lower, making the gamble more favourable, but the need to estimate $p$ remains.)

Calculate your threshold ( $p^{*}$ ): Use the model in Part A (and the linked calculator). Without strong evidence that $p > p^{*}$ is true, a pivot involving significant unpaid time is likely EV-negative.
Seek cheap signals about whether you are in fact a unicorn: Look for personalized evidence of fit—such as applying to a few roles before leaving your current job—before committing significant resources.
Use grants as signals: Organizations like Open Philanthropy offer career transition grants. These serve as information gates. If received, a grant lowers the private cost ( $c$ ). If denied, it is a valuable calibration signal. If a major funder declines to underwrite the transition, candidates should update $p$ downwards. (If you don’t get that Open Phil transition grant, don’t quit your current job.)
Change the game by doing something other than applying for a job:
1. Achieving impact by getting AI Safety on the agenda at your current job (if it’s tech-related) is often overlooked. You can have a huge impact by influencing your current employer to take AI safety seriously, without needing to pivot careers.
2. Founding or joining a new organization can bring multiple roles into existence, reducing the need to compete for existing roles.

For Organizations: Transparency and Feedback

Employers and advice organizations control the information flow. Unless they provide evidence-based estimates of success probabilities, their generic encouragement should be treated with scepticism.

Publish stage-wise acceptance rates (Base Rates). Employers must publish historical data (applicants, interviews, offers) by track and seniority. This is the single most impactful intervention for anchoring $p$ .
Provide informative feedback and rank. Employers should provide standardized feedback or an indication of relative rank (e.g., “top quartile”). This feedback is costly, but this cost must be weighed against the significant systemic waste currently externalized onto applicants and the long-term credibility of the field.
Track advice calibration. Advice organizations should track and publish their forecast calibration (e.g., Brier scores) regarding candidate success. If an advice organization doesn’t track outcomes, its advice cannot be calibrated except by coincidence.
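As an illustration of what tracking calibration could look like, here is a minimal Brier score computation. The forecasts and outcomes are made-up numbers, not data from any advice organization, and comparing against a constant base-rate forecast is just one possible baseline.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical cohort: advised probability of a successful pivot, and what happened
forecasts = [0.6, 0.5, 0.7, 0.4, 0.6]
outcomes = [0, 0, 1, 0, 0]

base_rate = sum(outcomes) / len(outcomes)
advice_score = brier_score(forecasts, outcomes)
baseline_score = brier_score([base_rate] * len(outcomes), outcomes)

print(f"advice Brier score:    {advice_score:.3f}")
print(f"base-rate Brier score: {baseline_score:.3f}")
```

Lower is better. In this made-up cohort the advice scores worse than simply quoting the base rate, which is the kind of signal that published acceptance rates would make visible.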

For the Field: Systemic Calibration

To optimize the funnel size, the field needs to measure costs and impact tails.

Estimate applicant costs ( $c ℓ$ ). Advice organizations or funders should survey applicants (successful and unsuccessful) to estimate typical pivot costs.
Track realized impact proxies. Employers should analyze historical cohorts to determine if widening the funnel is still yielding significantly better hires, or if returns are rapidly diminishing.
Experiment with mechanism design. In capacity-constrained rounds, implementing soft caps—pausing applications after a certain number—can reduce applicant-side waste without significantly harming match quality [@Horton2024Reducing].

Where next?

I’d like feedback from people deeper in the AI safety career ecosystem about what I’ve gotten wrong. Is the model here sophisticated enough to capture the main dynamics? I’d love to chat with people from 80,000 Hours, MATS, FHI, CHAI, Redwood Research, Anthropic, etc., about this. What is your model of the candidate impact distribution, the tail behaviour, and the costs? What have I missed? I’m open to the possibility that this is well understood and being actively managed behind the scenes, but I haven’t seen it laid out this way anywhere.

Appendices

See the original post.

To be consistent we need to take this to be a local linear approximation at your current wage and impact level; so we are implicitly looking at marginal utility.