AI career advice orgs, most prominently 80,000 Hours, encourage career
moves into AI safety roles, including mid‑career pivots. I analyse the
quality of this advice from the private satisfaction, public-good, and
counterfactual equilibrium perspectives, and reach the following
conclusions.
Rational Failure: If you value personal, direct impact highly,
it can be rational to attempt a pivot that will probably fail
(e.g., ≪50% success chance).
Opacity: To tell whether our advice is producing good outcomes, we
would need data on both success rates and candidate quality
distributions, neither of which is currently available.
Misalignment: The optimal success rate for the field
(maximizing total impact) differs from the optimal rate for
individuals (maximizing personal EV). The advice ecosystem appears
calibrated to neither.
Counterfactual impact: Counterfactually, the value of a pivot
is much lower than it naïvely appears; in highly contested roles you
need not only to be the best but to be the best by a wide enough
margin to justify all the effort of the other people you are
displacing.
Donations: If you donate at the moment and would pause
donations while taking a career sabbatical, that is very likely a
negative EV move in public goods terms. If the EV of a pivot is
uncertain, donating the attempt costs (e.g., the sabbatical
expenses you were willing to pay) provides a guaranteed positive
counterfactual impact and doesn’t require all this fancy
modelling.
The analysis here should not be astonishing; we know advice
calibration is hard, and we know that collective and private goals
might be misaligned. Grinding through the details nonetheless reveals
some less obvious implications. For example, not only do the number of
jobs and the number of applicants matter, but your model of the
distribution of talent ends up having a huge effect. Because the
latter two factors (number of applicants and talent distribution) are
inscrutable to the people making the risky choice to pivot careers, I
think career advice is miscalibrated.
The problem bites for mid-career professionals with high switching
costs. It is likely less severe for early career professionals who
have lower switching costs, or for people doing something other than
switching jobs (e.g. if you are starting up a new organisation the
model would be different). I propose some mitigations, both personal
and institutional. I made an interactive
widget to help
individuals evaluate their own pivots.
This was a draft for Draft Amnesty Week. I revisited it to edit for style.
Epistemic status
A solid back-of-the-envelope analysis of some relatively
well-understood but frequently under-regarded concepts.
Here’s a napkin model of the AI-safety career-advice economy—or
rather, three models of increasing complexity. They show how advice can
sincerely recommend gambles that mostly fail, and why—without better
data—we can’t tell whether that failure rate is healthy (leading to
impact at low cost) or wasteful (potentially destroying happiness and
even impact). In other words, it’s hard to know whether our altruism is
“effective.”
In AI Safety in particular, there’s an extra credibility risk
idiosyncratic to this system. AI Safety is, loosely speaking, about
managing the risks of badly aligned mechanisms
producing perverse outcomes. As such, it’s particularly incumbent on
our field to avoid mechanisms that are badly aligned and produce
perverse outcomes; otherwise we aren’t taking our own risk model
seriously.
To keep things simple, we ignore as many complexities as possible.
We evaluate decisions in terms of cause impact, which we assume we
can price in donation-equivalent dollars. Impact is a public good.
Individual private goods are the candidate’s own gains and losses.
We assume the candidate’s job satisfaction and remuneration
(i.e. personal utility) can be costed in dollars.
Part A—Private career pivot decision
An uncertain career pivot is a gamble, so we model it the same way we
model other gambles.
Meet Alice
Alice is a senior software engineer in her mid-30s, making her a
mid-career professional. She has been donating roughly 10% of her
income to effective charities and now wonders whether to switch lanes
entirely to achieve impact via technical AI safety work in one of
those AI Safety jobs she’s seen
advertised. She has saved
six months of runway funds to explore AI-safety roles: research
engineering, governance, or technical coordination. Each month out of
work costs her foregone income and reduced career prospects. Her
question is simple: Is this pivot worth the costs?
To build the model, Alice needs to estimate four things:
Upside
Annual Surplus (\(\Delta u\)): This is the key number. It’s the
difference in Alice’s total annual utility between the new AI
safety role (\(u_1\)) and her current baseline (\(u_0\)). This surplus
combines the change in her salary and the change in her impact,
indirectly via donations and directly by doing some fancy AI safety job:
\[ u = w + \alpha(I + d), \qquad \Delta u := u_1 - u_0, \]
where \(w\) is wage, \(I\) is impact, \(d\) is donations, and \(\alpha\)
is Alice’s weighting of impact versus consumption.
Downside
Runway (ℓ): The maximum time she’s willing to try, in
years.
Burn Rate (c): This is Alice’s net opportunity cost per year
while on sabbatical (e.g., foregone pay, depleted savings), measured
in k$/year.
Odds
Application Rate (\(r\)): The number of distinct job
opportunities she can apply for per year.
Success Probability (\(p\)): Her average probability of getting
an offer from a single application. We assume applications are
independent and identically distributed (i.i.d.).
Caution
The i.i.d. assumption (each job is independent) is likely optimistic.
In reality, applications are correlated: if Alice is a good fit for
one role, she’s likely a good fit for others (and vice versa). We
formalise this in the next section with candidate quality
distributions, which capture the idea that you don’t know your
“ranking” in the field, but that, by definition, most people are not
at the top of it.
Timing
Discount Rate (ρ): A continuous rate per year that captures
her time preference. A higher ρ means she values immediate
gains more—for example, if she expects short AGI timelines, ρ
might be high.
Modeling the Sabbatical: The Decision Threshold
With these inputs, we can calculate the total expected value (EV) of her
sabbatical gamble. The full derivation is in Appendix
A, but here’s the result:
\[ \Delta\mathrm{EV}_\rho(p) = \frac{1 - e^{-(rp+\rho)\ell}}{rp + \rho}\left(\frac{\Delta u\, r p}{\rho} - c\right). \]
This formula looks complex, but its logic is simple. The entire decision
hinges on the sign of the bracketed term,
\(\left(\frac{\Delta u\, r p}{\rho} - c\right)\). This is a direct
comparison between the expected gain rate (the upside \(\Delta u\),
multiplied by the success rate \(rp\) and converted to a present value by
the factor \(1/\rho\)) and the burn rate (\(c\)). The prefactor scales
that value according to the length of Alice’s runway and her discount
rate.
The EV is positive if and only if the gain rate exceeds the burn rate.
This means Alice’s decision boils down to a simple question: Is her
per-application success probability, p, high enough to make the gamble
worthwhile?
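As a sanity check on the closed form, here is a minimal Python sketch that evaluates \(\Delta\mathrm{EV}_\rho(p)\) directly and compares it with a brute-force integral of the discounted cash flows. The integrand is our reading of the setup (offers arrive as a Poisson process with rate \(rp\); while still searching Alice burns \(c\) per year; an offer is worth the perpetuity \(\Delta u/\rho\)), not a transcription of Appendix A, and the function names are hypothetical.

```python
import math


def delta_ev(p, r, rho, ell, du, c):
    """Closed-form EV of the sabbatical gamble.

    p: per-application success probability; r: applications per year;
    rho: continuous discount rate; ell: runway (years);
    du: annual surplus if the pivot succeeds; c: burn rate per year.
    """
    lam = r * p  # Poisson rate at which offers arrive
    prefactor = (1 - math.exp(-(lam + rho) * ell)) / (lam + rho)
    return prefactor * (du * lam / rho - c)


def delta_ev_numeric(p, r, rho, ell, du, c, steps=10_000):
    """Midpoint-rule integral of the same cash flows over [0, ell].

    At time t Alice is still searching with probability exp(-lam t); while
    searching she burns c per year, and an offer (rate lam) pays du / rho.
    Everything is discounted by exp(-rho t).
    """
    lam = r * p
    dt = ell / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        total += math.exp(-(lam + rho) * t) * (lam * du / rho - c) * dt
    return total


# Alice-like inputs (k$/year, years) at a trial p of 5%
args = dict(p=0.05, r=24, rho=1 / 3, ell=0.5, du=22, c=50)
print(delta_ev(**args), delta_ev_numeric(**args))
```

The two agree to numerical precision; the brute-force version is easier to adapt if, say, the burn rate should change over the runway.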
We can find the exact break-even probability, p∗, by setting the
gain rate equal to the burn rate. This gives a much simpler formula for
Alice’s decision threshold:
\[ p^* = \frac{c\rho}{r\,\Delta u}. \]
If Alice believes her actual p is greater than this p∗, the pivot
has a positive expected value. If p<p∗, she should not take the
sabbatical, at least on these terms.
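The threshold is one line of arithmetic. A hedged sketch with hypothetical helper names, using Alice’s worked-example inputs, that also confirms the closed-form EV is numerically zero at \(p^*\):

```python
import math


def break_even_p(c, rho, r, du):
    """p* = c * rho / (r * du); infinite if the upside is not positive."""
    if du <= 0:
        return math.inf  # no success probability can justify a non-positive upside
    return c * rho / (r * du)


def delta_ev(p, r, rho, ell, du, c):
    """Closed-form EV of the sabbatical gamble."""
    lam = r * p
    return (1 - math.exp(-(lam + rho) * ell)) / (lam + rho) * (du * lam / rho - c)


c, rho, r, du, ell = 50, 1 / 3, 24, 22, 0.5  # k$/year and years
p_star = break_even_p(c, rho, r, du)
print(f"p* = {p_star:.2%}")  # p* = 3.16%
print(f"EV at p*: {delta_ev(p_star, r, rho, ell, du, c):.1e}")
```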
What This Model Tells Alice
This simple threshold p∗ gives us a clear way to think about her
decision:
The bar gets higher: The threshold p∗ increases with higher
costs (c), shorter timelines, or greater impatience (ρ). If
her sabbatical is expensive or she’s in a hurry, she needs to be
more confident of success.
The bar gets lower: The threshold p∗ decreases with more
opportunities (r) or a higher upside (Δu). If the job
offers a massive impact gain or she can apply to many roles, she can
tolerate a lower chance of success on any single one.
Runway doesn’t change the threshold: Notice that the runway
length ℓ isn’t in the p∗ formula. A longer runway gives her
more expected value (or loss) if she does take the gamble, but
it doesn’t change the break-even probability itself.
The results are fragile to uncertainty: This model is highly
sensitive to her estimates. If she overestimates her potential
impact (a high Δu) or underestimates her time preference (a
low ρ), she’ll calculate a p∗ that is artificially low,
making the pivot look much safer than it is.[1]
The key unknown: Even with a perfectly calculated p∗, Alice
still faces the hardest part: estimating her own actual success
probability, p.
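These comparative statics are easy to check numerically. The sketch below (hypothetical helpers, Alice’s inputs as the baseline) doubles each input in turn to see which way \(p^*\) moves, then confirms that changing the runway \(\ell\) rescales the EV without flipping its sign at a given \(p\).

```python
import math


def break_even_p(c, rho, r, du):
    return c * rho / (r * du)


def delta_ev(p, r, rho, ell, du, c):
    lam = r * p
    return (1 - math.exp(-(lam + rho) * ell)) / (lam + rho) * (du * lam / rho - c)


base = dict(c=50, rho=1 / 3, r=24, du=22)
p0 = break_even_p(**base)

# Doubling c or rho raises the bar; doubling r or du lowers it.
for name in base:
    doubled = dict(base, **{name: 2 * base[name]})
    print(f"double {name:>3}: p* {p0:.4f} -> {break_even_p(**doubled):.4f}")

# The runway changes the size of the EV, not its sign at a given p.
for ell in (0.25, 0.5, 1.0, 2.0):
    above = delta_ev(1.2 * p0, base["r"], base["rho"], ell, base["du"], base["c"])
    below = delta_ev(0.8 * p0, base["r"], base["rho"], ell, base["du"], base["c"])
    print(f"ell = {ell}: EV at 1.2 p* = {above:+.2f}, at 0.8 p* = {below:+.2f}")
```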
That p is, essentially, her chance of getting an offer. It depends not
only on the number of jobs available but crucially on the number and
quality of the other applicants.
All that said, this is a relatively “optimistic” model. If Alice
attaches a high value to getting her hands dirty in AI safety work, she
might be willing to accept a remarkably low p; we’ll see that in the
worked example. Hold that thought, though, because I’ll argue that this
personal decision rule can be pretty bad at maximizing total impact.
Caution
If you are using these calculations for real, be aware that our
heuristics likely overestimate Alice’s chances. Job applications are
not i.i.d. The effective number of independent shots is lower than the
raw application count, which reduces the effective \(r\): if your skills
don’t match the first job, they are also less likely to match the
second, because the jobs might be similar to each other.
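To see how correlation eats into the effective number of shots, here is a small Monte Carlo sketch under a hypothetical one-factor model (our assumption, not the author’s): each application’s fit score mixes a shared latent quality \(Z\) with independent noise, and the offer threshold is set so that every single application still succeeds with probability \(p\). Only the correlation changes between runs.

```python
import math
import random
from statistics import NormalDist


def p_at_least_one(p, n, corr, trials=20_000, seed=0):
    """Monte Carlo P(at least one offer from n applications).

    Hypothetical one-factor model: score_i = sqrt(corr) * Z + sqrt(1 - corr) * e_i,
    with Z and e_i standard normal. Each score is itself standard normal, so
    an offer (score above the threshold) has marginal probability p.
    """
    rng = random.Random(seed)
    threshold = NormalDist().inv_cdf(1 - p)
    a, b = math.sqrt(corr), math.sqrt(1 - corr)
    hits = 0
    for _ in range(trials):
        z = rng.gauss(0, 1)  # the candidate's shared latent quality
        if any(a * z + b * rng.gauss(0, 1) > threshold for _ in range(n)):
            hits += 1
    return hits / trials


p, n = 0.0316, 12  # Alice's p* and about six months of applications at r = 24/yr
print(f"i.i.d. formula: {1 - (1 - p) ** n:.3f}")
for corr in (0.0, 0.5, 0.9):
    print(f"corr = {corr}: {p_at_least_one(p, n, corr):.3f}")
```

With zero correlation the simulation matches the i.i.d. formula; as correlation rises, the chance of at least one offer falls towards the single-application probability, even though each application’s odds are unchanged.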
Worked example
Let’s plug in some plausible numbers for Alice. She’s a successful
software engineer who earns 180k$/year, donates 18k$/year after tax,
and has no on-the-job impact (i.e. no net harm, no net good). Working
in k$/year, that means \(w_0 = 180\), \(d_0 = 18\), and \(I_0 = 0\). A
target role offers \(w_1 = 120\), \(d_1 = 0\), and \(I_1 = 100\). Set
\(\alpha = 1\), runway \(\ell = 0.5\) years, application rate
\(r = 24\)/year, discount \(\rho = 1/3\), and burn \(c = 50\). Then
\[ \Delta u = (120 + 0 + 100) - (180 + 18 + 0) = 22, \qquad p^* = \frac{c\rho}{r\,\Delta u} = \frac{50 \cdot \tfrac{1}{3}}{24 \cdot 22} \approx 3.16\%. \]
Over six months, the chance of at least one success at \(p^*\) is
\[ q^* = 1 - e^{-r p^* \ell} \approx 31.5\%. \]
Alice’s expected actual sabbatical length is
\[ \mathbb{E}[\tau] = \frac{1 - e^{-r p^* \ell}}{r p^*} \approx 0.416 \text{ years } (\approx 5.0 \text{ months}), \]
and, conditional on success, it’s
\(\mathbb{E}[\tau \mid \text{success}] \approx 0.234\) years (\(\approx 2.8\) months).
Under these assumptions, the sabbatical breaks even exactly at \(p^*\),
by construction: the job offers enough upside to compensate for a
roughly 68.5% chance of failure.
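The worked example can be reproduced in a few lines. This sketch uses only the inputs stated above (money in k$/year); the conditional-duration line uses the standard mean of an exponential waiting time truncated at the runway \(\ell\), which we assume is how the \(\approx 0.234\)-year figure was obtained.

```python
import math

# Alice's inputs; money in k$/year, time in years, alpha = 1
w0, d0, I0 = 180, 18, 0      # current job
w1, d1, I1 = 120, 0, 100     # target AI safety role
alpha, ell, r, rho, c = 1, 0.5, 24, 1 / 3, 50


def utility(w, impact, donations):
    """u = w + alpha * (I + d)."""
    return w + alpha * (impact + donations)


du = utility(w1, I1, d1) - utility(w0, I0, d0)
p_star = c * rho / (r * du)
lam = r * p_star                          # offer rate at the threshold
q_star = 1 - math.exp(-lam * ell)         # P(at least one offer within the runway)
e_tau = q_star / lam                      # E[min(tau, ell)]: time actually spent searching
# Mean of an exponential waiting time conditioned on tau < ell
e_tau_success = 1 / lam - ell * math.exp(-lam * ell) / q_star

print(f"du = {du}, p* = {p_star:.2%}, q* = {q_star:.1%}")
print(f"E[tau] = {e_tau:.3f} yr, E[tau | success] = {e_tau_success:.3f} yr")
# du = 22, p* = 3.16%, q* = 31.5%
# E[tau] = 0.416 yr, E[tau | success] = 0.234 yr
```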
We plot a few values for Alice to visualize the trade-offs for different
upsides Δu.
In a world with a heavy-tailed distribution of candidate impact, the
field benefits from many attempts because a few “hits” dominate. In
light-tailed worlds, the same encouragement becomes destructive. We
simply don’t know which world we’re in.
So far, this has been Alice’s private perspective. Let’s zoom out to the
field level and consider: What if everyone followed Alice’s decision
rule? Is the resulting number of applicants healthy for the field?
What’s the optimal number of people who should try to pivot?
From personal gambles to field strategy
Our goal is to move beyond Alice’s private break-even (p∗) and
calculate the field’s welfare-maximizing applicant pool size (K∗).
This K∗ is how many “Alices” the field can afford to have rolling the
dice before the costs of failures outweigh the value of successes.
To analyze this, we must shift our model in three ways:
Switch to a Public Ledger: From a field-level perspective,
private wages and consumption are just transfers; they therefore
drop out of the analysis. What matters is the net production of
public goods (i.e., impact).
Distinguish Public vs. Private Costs: The costs are now
different.
Private Cost (Part A): \(c\) included Alice’s full opportunity
cost (foregone wages, etc.).
Public Cost (Part B): We now use \(\gamma\), which captures
only the foregone public goods during a sabbatical (e.g.,
\(\gamma = I_0 + d_0 + \varepsilon\), i.e. baseline
impact + baseline donations + externalities).
Move from Dynamic Search to Static Contest: Instead of one
person’s dynamic search, we’ll use a static “snapshot” model of the
entire field for one year. We assume there are N open roles and
K total applicants that year.
Reconciling the individual and field-level Models
In Part A, Alice saw jobs arriving one-by-one (a Poisson process with
rate r). In Part B, we are modeling an annual “contest” with K
applicants competing for N jobs.
We can bridge these two views by setting N≈r. This treats
the entire year’s worth of job opportunities as a single “batch” to be
filled from the pool of K candidates who are “on the market” that
year.
This is a standard simplification. It allows us to stop worrying about
the timing of individual applications and focus on the quality of
the matches, which is determined by the size of the applicant pool
(K). We can then compare the total Present Value (PV) of the
benefits (better hires) against the total PV of the costs (failed
sabbaticals).
If N jobs are available annually (which we’ve already equated to
Alice’s application rate r) and K total applicants are competing
for them, a simple approximation for the per-application success
probability is that it’s proportional to the ratio of jobs to
applicants.
For the rest of this analysis, we’ll assume a simple mapping:
p≈N/K. This allows us to plot both models on the same chart:
as the field becomes more crowded (K increases), the individual
chance of success (p) for any single application shrinks.
The Field-Level Model: Assumptions
Here’s the minimally complicated version of our new model:
There are K total applicants and N open roles per year.
Each applicant k has a true, fixed potential impact J(k)
drawn i.i.d. from the talent distribution F.
Employers perfectly observe J(k) and hire the N best
candidates. (This is a strong, optimistic assumption about hiring
efficiency).
Applicants do not know their own J(k), only the distribution
F.
The intuition is that the field benefits from a larger pool K because
it increases the chance of finding high-impact candidates. But the field
also pays a price for every failed applicant.
Benefits vs. Costs on the Public Ledger
Let’s define the two sides of the field’s welfare equation.
The Marginal Benefit (MV) of a Larger Pool
The benefit of a larger pool \(K\) is finding better candidates. We care
about the marginal value of adding one more applicant to the pool,
which we define as \(\mathrm{MV}_K\). This is the expected annual
increase in impact from widening the pool from \(K\) to \(K+1\).
(Formally, \(\mathrm{MV}_K := \mathbb{E}[S_{N,K+1}] - \mathbb{E}[S_{N,K}]\),
where \(S_{N,K}\) is the total impact of the top \(N\) hires from a pool
of \(K\).)
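A quick way to build intuition for \(\mathrm{MV}_K\) is to simulate it. The sketch below assumes, purely for illustration, exponential talent with mean \(\mu\) (one of the two families discussed below). It hires the top \(N\) of \(K\) or \(K+1\) draws from the same simulated pool and compares the average gain with the closed form \(\mu N/(K+1)\), which follows from standard exponential order statistics for \(K \ge N\).

```python
import heapq
import random


def mv_monte_carlo(n_roles, k_pool, draw, trials=10_000, seed=1):
    """Estimate MV_K = E[S_{N,K+1}] - E[S_{N,K}] by simulation.

    Each trial draws K + 1 candidates and compares the top-N total with and
    without the last one, so both pools share the same first K draws.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        pool = [draw(rng) for _ in range(k_pool + 1)]
        with_extra = sum(heapq.nlargest(n_roles, pool))
        without = sum(heapq.nlargest(n_roles, pool[:-1]))
        total += with_extra - without
    return total / trials


def mv_exponential(n_roles, k_pool, mu):
    """Closed form for exponential talent with mean mu, valid for K >= N."""
    return mu * n_roles / (k_pool + 1)


mu, n_roles, k_pool = 100.0, 5, 20   # illustrative values only
estimate = mv_monte_carlo(n_roles, k_pool, lambda g: g.expovariate(1 / mu))
print(f"simulated MV_K = {estimate:.1f}, closed form = {mv_exponential(n_roles, k_pool, mu):.1f}")
```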
The Marginal Cost (MC) of a Larger Pool
The cost is simpler. When \(K > N\), adding one more applicant results
(on average) in one more failed pivot. This failed pivot costs the
field the foregone public good during the sabbatical. We define the
social burn rate per year as \(\gamma\). To compare this to the annual
benefit \(\mathrm{MV}_K\), we need the total present value of this
foregone impact. We call this \(L_{\text{fail},\delta}\) (the PV of one
failed attempt). (This cost is derived in Appendix B as
\(L_{\text{fail},\delta} = \gamma\,\frac{1 - e^{-\delta\ell}}{\delta}\).)
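The failure cost is a finite annuity. A minimal sketch checking the closed form against a direct integral of \(\gamma e^{-\delta t}\) over the runway; Alice’s \(\gamma\) is taken as her paused donations (\(I_0 = 0\), externalities ignored), and \(\delta = 1/3\) is an illustrative assumption borrowed from her \(\rho\).

```python
import math


def l_fail(gamma, delta, ell):
    """PV of public good forgone in a failed sabbatical: gamma (1 - e^{-delta ell}) / delta."""
    return -gamma * math.expm1(-delta * ell) / delta   # expm1 stays accurate as delta -> 0


def l_fail_numeric(gamma, delta, ell, steps=10_000):
    """Midpoint rule for the integral of gamma * exp(-delta t) over [0, ell]."""
    dt = ell / steps
    return sum(gamma * math.exp(-delta * (i + 0.5) * dt) * dt for i in range(steps))


print(f"L_fail = {l_fail(18, 1 / 3, 0.5):.2f} k$")   # L_fail = 8.29 k$
```

As \(\delta \to 0\) the cost tends to \(\gamma\ell\), the undiscounted total.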
We do not model employer congestion from reviewing lots of applicants;
the rationale is that it’s empirically small because employers stop
looking at candidates when they’re overwhelmed
[@Horton2021Jobseekers].[2] Note, however, that we also assume
employers perfectly observe \(J(k)\), which means we’re being
optimistic about the field’s ability to sort candidates. Maybe we could
model a noisy search process?
Field-Level trade-offs
We can now find the optimal pool size \(K^*\). The total public welfare
\(W(K)\) peaks when the marginal benefit of one more applicant equals
the marginal cost.
As derived in Appendix B, the total welfare \(W(K)\) is maximized when
the present value of the annual benefit stream from the marginal
applicant (\(\mathrm{MV}_K/\delta\)) equals the total present value of
their failure cost (\(L_{\text{fail},\delta}\)):
\[ \frac{\mathrm{MV}_K}{\delta} = L_{\text{fail},\delta}. \]
Substituting the expression for \(L_{\text{fail},\delta}\) and cancelling
the discount rate \(\delta\), we get a very clean threshold:
\[ \mathrm{MV}_K = \gamma\left(1 - e^{-\delta\ell}\right). \]
This equation is the core of the field-level problem. The optimal pool
size \(K^*\) is the point where the expected annual marginal benefit
(\(\mathrm{MV}_K\)) falls to the annualized value
(\(\delta L_{\text{fail},\delta}\)) of the public good foregone in a
single failed sabbatical attempt.
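For the light-tailed (exponential) family, the threshold can be solved by hand. Assuming \(\mathrm{MV}_K = \mu N/(K+1)\) for \(K \ge N\) (the exponential order-statistics result, stated here as our assumption rather than quoted from Appendix B), welfare keeps rising while \(\mu N/(K+1) \ge \gamma(1 - e^{-\delta\ell})\), so
\[ K^* = \left\lfloor \frac{\mu N}{\gamma\,(1 - e^{-\delta\ell})} \right\rfloor. \]
The Python sketch below evaluates it with illustrative inputs: \(N = 24\) from the \(N \approx r\) mapping, \(\mu = 100\), \(\gamma = 18\), \(\ell = 0.5\), and an assumed \(\delta = 1/3\).

```python
import math


def k_star_exponential(mu, n_roles, gamma, delta, ell):
    """Welfare-maximizing pool size for exponential talent with mean mu.

    Adding the (K+1)-th applicant is worthwhile while
    mu * N / (K + 1) >= gamma * (1 - exp(-delta * ell)).
    """
    threshold = -gamma * math.expm1(-delta * ell)   # gamma (1 - e^{-delta ell})
    return max(n_roles, math.floor(mu * n_roles / threshold))


k = k_star_exponential(mu=100, n_roles=24, gamma=18, delta=1 / 3, ell=0.5)
print(k, f"(about {k / 24:.0f} applicants per role)")
```

Because \(K^*\) scales linearly with \(\mu N/\gamma\), doubling the average impact or halving the social burn rate roughly doubles the affordable pool.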
The Importance of Tail Distributions
How quickly does \(\mathrm{MV}_K\) shrink? Extreme value theory tells us
it depends entirely on the tail of the candidate-quality distribution,
\(F\). The shape of the tail determines how quickly returns from
widening the applicant pool diminish.
We consider two families (the specific formulas are in Appendix
B):
Light tails (e.g., Exponential):
In this world, candidates vary, but the best isn’t transformatively
better than average. Returns diminish quickly: the marginal value
\(\mathrm{MV}_K\) shrinks hyperbolically (roughly as \(1/K\)).
Heavy tails (e.g., Fréchet):
This captures the “unicorn” intuition. Returns diminish much more
slowly. If the tail is heavy enough, \(\mathrm{MV}_K\) decays
extremely slowly, justifying a very wide search.
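The difference in decay rates is easy to see numerically. This sketch compares the exact exponential \(\mathrm{MV}_K\) with a finite difference of the Fréchet asymptotic \(B(K) \approx s K^{1/\alpha} C_N\). The form of \(C_N\) used here, \(\sum_{j=1}^{N}\Gamma(j - 1/\alpha)/\Gamma(j)\), is our assumption (the standard large-\(K\) constant for the sum of the \(N\) largest Fréchet draws); Appendix B may define it differently, but it only affects the level, not the rate.

```python
import math

MU, N = 100.0, 24   # illustrative: mean impact and number of roles


def mv_exponential(k):
    """Exact MV_K for exponential talent (K >= N): mu N / (K + 1)."""
    return MU * N / (k + 1)


def mv_frechet(k, alpha):
    """Finite difference B(K+1) - B(K) of the Frechet asymptotic s K^(1/alpha) C_N."""
    s = MU / math.gamma(1 - 1 / alpha)   # scale giving mean MU (needs alpha > 1)
    c_n = sum(math.gamma(j - 1 / alpha) / math.gamma(j) for j in range(1, N + 1))
    return s * c_n * ((k + 1) ** (1 / alpha) - k ** (1 / alpha))


# How much MV_K shrinks when the pool doubles from K = 1,000 to 2,000
print(f"exponential   : {mv_exponential(2000) / mv_exponential(1000):.3f}")
for alpha in (3.0, 2.0, 1.8):
    ratio = mv_frechet(2000, alpha) / mv_frechet(1000, alpha)
    print(f"Frechet a={alpha}: {ratio:.3f}")   # tends to 2 ** (1/alpha - 1)
```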
Implications for Optimal Pool Size
This difference in diminishing returns hugely affects the optimal pool
size K∗. The full solutions for K∗ are in Appendix
B.
With light tails, there’s a finite pool size beyond which ramping
up the hype (increasing \(K\)) reduces net welfare. Every extra applicant
burns \(L_{\text{fail},\delta}\) in foregone public impact while adding an
\(\mathrm{MV}_K\) that shrinks rapidly.
With heavy tails, it’s different. As the tail gets heavier, K∗
explodes. In very heavy-tailed worlds, very wide funnels can still be
net positive. We may decide it’s worth, as a society, spending a lot of
resources to find the few unicorns.
We set the expected impact per hire per year to \(\mu_{\text{imp}} = 100\)
(impact dollars/yr) to match Alice’s hypothetical target role. This is
just for exposition.
We can, of course, plot this.
This plot shows total net welfare \(W(K)\) and marks the maximum
\(K^*\) for each family, so we can see where total welfare peaks. The
dashed line at \(K = N\) shows where failures begin
(\(K > N \Rightarrow K - N\) people each impose a public cost of
\(L_{\text{fail},\delta}\)). The markers show \(K^* = \arg\max_K W(K)\),
the pool size beyond which widening further would reduce total impact.
Units: \(B(K)\) is in impact dollars per year and is converted to PV
by multiplying by \(H_\delta = 1/\delta\). The subtraction uses the
discounted per-failure cost
\(L_{\text{fail},\delta} = \gamma\,\frac{1 - e^{-\delta\ell}}{\delta}\).
Fréchet curves use the large-\(K\) asymptotic
\(B(K) \approx s K^{1/\alpha} C_N\) (with
\(s = \mu_{\text{imp}}/\Gamma(1 - 1/\alpha)\)). We could get the exact
\(B(K)\) for Fréchet, but the asymptotic is good enough to illustrate
the qualitative behaviour.
We treat all future uncertainties about role duration, turnover, or
project lifespan as already captured in the overall discount rate
δ.
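Reading the units note, the plotted welfare is presumably \(W(K) = B(K)/\delta - \max(0, K - N)\,L_{\text{fail},\delta}\). Below is a hedged Python version with illustrative inputs (\(N = 24\), a mean impact of 100 treated as the population mean, \(\gamma = 18\), \(\ell = 0.5\), and an assumed \(\delta = 1/3\)). For the exponential family it scans \(W(K)\) exactly and takes the argmax; for Fréchet it solves the asymptotic first-order condition \(\mathrm{MV}_K = \gamma(1 - e^{-\delta\ell})\) in closed form, with an assumed \(C_N = \sum_{j=1}^{N}\Gamma(j - 1/\alpha)/\Gamma(j)\). The parameters are illustrative, so the outputs are not meant to reproduce the plotted curves.

```python
import math

# Illustrative inputs (assumptions): roles, mean impact, social burn, discount, runway
N, MU, GAMMA, DELTA, ELL = 24, 100.0, 18.0, 1 / 3, 0.5
L_FAIL = -GAMMA * math.expm1(-DELTA * ELL) / DELTA   # PV cost of one failure
THRESHOLD = DELTA * L_FAIL                           # = gamma (1 - e^{-delta ell})


def welfare_exponential(k_max):
    """W(K) = B(K)/delta - max(0, K - N) L_fail for K = 0..k_max (exponential talent)."""
    w, b = [0.0], 0.0
    for k in range(1, k_max + 1):
        b += MU if k <= N else MU * N / k            # B(K) - B(K-1) = MV_{K-1}
        w.append(b / DELTA - max(0, k - N) * L_FAIL)
    return w


def k_star_frechet(alpha):
    """Solve (s C_N / alpha) K^(1/alpha - 1) = THRESHOLD for K (large-K asymptotic)."""
    s = MU / math.gamma(1 - 1 / alpha)
    c_n = sum(math.gamma(j - 1 / alpha) / math.gamma(j) for j in range(1, N + 1))
    return (s * c_n / (alpha * THRESHOLD)) ** (alpha / (alpha - 1))


w = welfare_exponential(5_000)
k_exp = max(range(len(w)), key=w.__getitem__)
print(f"exponential: K* = {k_exp}")
for alpha in (3.0, 2.0, 1.8):
    print(f"Frechet alpha={alpha}: K* ~ {k_star_frechet(alpha):,.0f}")
```

The brute-force scan and the first-order condition agree for the exponential case, which is a useful check before trusting the Fréchet closed form.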
We can combine these perspectives to visualize the tension between
private incentives and public welfare.
This visualization combines the private and public views by assuming an
illustrative mapping from pool size to success probability:
\(p \approx \beta N/K\), where \(\beta\) bundles screening efficiency; in
this figure \(\beta = 1\). The black curve (left axis) shows a candidate’s
private expected value (EV) as a function of success probability \(p\).
The coloured curves (right axis) show the field’s welfare, \(W(K)\). The
private break-even point \(p^*\) (black dashed line) can fall far to the
left of the field-optimal point \(p(K^*)\) (coloured vertical lines). This
gap marks the region where individuals may be rationally incentivized,
at the candidate level, to enter even though, at the field level, the
field is already saturated or oversaturated.
If your applicant pool does not have heavy tails, widening the funnel
likely increases social loss.
Part C—Counterfactual Impact and Equilibrium
Part A modeled Alice’s pivot as a private gamble, where the upside was
the absolute impact (\(I_1\)) of the new role. Part B zoomed out,
analyzing the field-level optimum (\(K^*\)) and showing how the
marginal value (\(\mathrm{MV}_K\)) of adding one more applicant shrinks
as the pool (\(K\)) grows.
Now we connect these two views. What if Alice is a sophisticated
applicant? She understands the field-level dynamics from Part B and
wants to maximize her true counterfactual impact—not just the
absolute impact of the role she fills. She must update her private EV
calculation from Part A.
This introduces a feedback loop: Alice’s personal incentive to apply now
depends on the crowd size (\(K\)) and the talent distribution (\(F\)),
and the resulting entry decisions in turn determine the crowd size.
The Counterfactual Impact Model
In Part A, Alice’s upside \(\Delta u\) included the absolute impact
\(I_1\). But her true impact is counterfactual: it’s the value
she adds compared to the next-best person who would have been hired if
she hadn’t applied.
How can she estimate this? She doesn’t know her own quality (\(J(k)\))
relative to the pool. If we assume she is, from the field’s perspective,
just one more “draw” from the talent distribution \(F\) (formally:
applicants are exchangeable), then her expected counterfactual impact
from the decision to apply is exactly the marginal value of adding
one more applicant: \(\mathrm{MV}_K\) (from Part B).
This \(\mathrm{MV}_K\) is her ex-ante expected impact before the
gamble is resolved. But her EV calculation (from Part A) needs the
upside conditional on success.
Let \(I_{\text{CF}}\) be this value: the expected counterfactual impact
given she succeeds and gets the job. Let \(q_K\) be her overall
probability of success in a pool of size \(K\). (In the static model of
Part B, if she joins a pool of \(K\) others, there are \(K+1\) applicants
for \(N\) slots, so \(q_K \approx N/(K+1)\).) If her attempt fails (with
probability \(1 - q_K\)), her counterfactual impact is zero.
Therefore, the ex-ante expected impact \(\mathrm{MV}_K\) is simply the
probability of success multiplied by the value of that success:
\[ \mathrm{MV}_K = q_K \cdot I_{\text{CF}} + (1 - q_K) \cdot 0 = q_K \cdot I_{\text{CF}}. \]
This gives us the key value Alice needs. Alice’s expected
counterfactual impact, conditional on success, is:
\[ I_{\text{CF}} = \frac{\mathrm{MV}_K}{q_K}. \]
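The exchangeability argument can be checked by simulation. This sketch (exponential talent with mean \(\mu\), an illustrative assumption; the function name is hypothetical) adds Alice as one more i.i.d. draw to \(K\) rivals, records her counterfactual impact (top-\(N\) total with her minus without her), and estimates \(q_K\), \(\mathrm{MV}_K\), and \(I_{\text{CF}}\). By construction \(\mathrm{MV}_K = q_K \cdot I_{\text{CF}}\); for exponential talent \(I_{\text{CF}}\) also comes out close to \(\mu\), anticipating the light-tail result.

```python
import heapq
import random


def simulate_counterfactual(k_rivals, n_roles, mu, trials=20_000, seed=2):
    """Estimate (q_K, MV_K, I_CF) for exponential talent with mean mu.

    Alice is one more i.i.d. draw joining k_rivals others (requires k_rivals >= n_roles).
    Her counterfactual impact is the top-N total with her minus the total without her.
    """
    rng = random.Random(seed)
    hired, cf_sum = 0, 0.0
    for _ in range(trials):
        top = heapq.nlargest(n_roles, (rng.expovariate(1 / mu) for _ in range(k_rivals)))
        alice = rng.expovariate(1 / mu)
        cf = sum(heapq.nlargest(n_roles, top + [alice])) - sum(top)
        if cf > 0:            # she displaced the N-th best rival
            hired += 1
            cf_sum += cf
    q_k = hired / trials
    mv_k = cf_sum / trials             # unconditional mean (zero when not hired)
    i_cf = cf_sum / max(hired, 1)      # mean conditional on being hired
    return q_k, mv_k, i_cf


q_k, mv_k, i_cf = simulate_counterfactual(k_rivals=20, n_roles=5, mu=100.0)
print(f"q_K = {q_k:.3f} (N/(K+1) = {5 / 21:.3f}), MV_K = {mv_k:.1f}, I_CF = {i_cf:.1f}")
```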
Alice can now recalibrate her private decision from Part A. She defines
a counterfactual private surplus, \(\Delta u_{\text{CF}}\), by replacing
the naive, absolute impact \(I_1\) with her sophisticated,
counterfactual estimate \(I_{\text{CF}}\).
This changes the dynamics entirely. In Part A, the value of the upside
(\(\Delta u\)) was fixed. Now, the value of the upside
(\(\Delta u_{\text{CF}}\)) itself depends on the pool size \(K\).
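For Alice specifically, the counterfactual surplus is Part A’s \(\Delta u\) with \(I_1\) swapped for \(I_{\text{CF}}\). A short sketch with her worked-example numbers (hypothetical helper names) makes the hurdle explicit: her pay cut and paused donations cost 78 k$/year, so \(\Delta u_{\text{CF}} = I_{\text{CF}} - 78\), and \(I_{\text{CF}}\) must exceed 78 before any success probability can justify the pivot.

```python
import math


def delta_u_cf(i_cf, w0=180, d0=18, i0=0, w1=120, d1=0, alpha=1):
    """Counterfactual surplus (w1 + alpha (I_CF + d1)) - (w0 + alpha (I0 + d0)), in k$/year."""
    return (w1 + alpha * (i_cf + d1)) - (w0 + alpha * (i0 + d0))


def break_even_p(du, c=50, rho=1 / 3, r=24):
    """Part A threshold p* = c rho / (r du); no p suffices if du <= 0."""
    return c * rho / (r * du) if du > 0 else math.inf


for i_cf in (50, 78, 100, 200):
    du = delta_u_cf(i_cf)
    print(f"I_CF = {i_cf:>3}: du_CF = {du:+}, p* = {break_even_p(du):.2%}")
```

Plugging in \(I_{\text{CF}} = I_1 = 100\) recovers the naive \(\Delta u = 22\), a useful consistency check.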
As \(K\) grows, both \(\mathrm{MV}_K\) (the marginal value) and \(q_K\)
(the success chance) decrease. How \(I_{\text{CF}}\) behaves depends on
which of the two decreases faster, a property determined by the tail of
the impact distribution \(F\).
The Dynamics of Counterfactual Impact
The behaviour of \(I_{\text{CF}}\) leads to radically different
incentives depending on the tail shape of the talent pool.
Case 1: Light Tails
In a light-tailed world (e.g., an Exponential distribution), talent is
relatively clustered. The math shows (see the Appendix) that
\(\mathrm{MV}_K\) and \(q_K\) shrink at roughly the same rate, so their
ratio stays constant:
\[ I_{\text{CF}} = \mu \quad \text{(light tail)} \]
(where μ is the population’s average impact).
Intuition: As the pool K grows, the quality of the Nth-best
hire—the person we displace—rises almost as fast as the quality of
the average successful hire (us). The gap between us and the person
we displace remains small and roughly constant. Our counterfactual
impact is just the average impact, μ.
Implication: If the average impact \(\mu\) is modest,
\(I_{\text{CF}}\) may not be enough to offset a large pay cut (like
Alice’s). In this world, we’d expect pivots with high private costs to
be EV-negative for the average applicant, regardless of how large the
pool gets.
Case 2: Heavy Tails
In a heavy-tailed world (e.g., Fréchet), “unicorns” with transformative
impact exist. Here, \(\mathrm{MV}_K\) shrinks much more slowly than
\(q_K\). As shown in the appendix, for a Fréchet distribution with shape
\(\alpha\), the result is:
\[ I_{\text{CF}} \propto K^{1/\alpha} \quad \text{(heavy tail)} \]
The expected counterfactual impact conditional on success actually
increases as the field gets more crowded.
Intuition: Success in a very large, competitive pool (K) is a
powerful signal. It suggests we aren’t just “good,” but potentially a
“unicorn.” We aren’t just displacing the Nth-best person; we’re
potentially displacing someone much further down the tail.
Implication: In this world, the field value can decay only slowly
as the funnel widens. The potential upside \(I_{\text{CF}}\)
can grow large enough to easily offset significant private costs (like
pay cuts and foregone donations). As a corollary, if we believe in
unicorns, it can still make sense to risk large private costs with a low
chance of success in order to discover whether we are, in fact, a
unicorn.
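The contrast between the two cases can be sketched numerically. Using \(q_K = N/(K+1)\) throughout, the exponential case gives \(I_{\text{CF}} = \mu\) exactly, while for Fréchet we take a finite difference of the asymptotic \(B(K) \approx sK^{1/\alpha}C_N\), with an assumed \(C_N = \sum_{j=1}^{N}\Gamma(j - 1/\alpha)/\Gamma(j)\). The parameters (\(N = 24\), \(\mu = 100\), \(\alpha = 2\)) are illustrative only.

```python
import math

N, MU = 24, 100.0   # illustrative: roles and mean impact


def i_cf_exponential(k):
    """I_CF = MV_K / q_K with MV_K = mu N / (K + 1) and q_K = N / (K + 1)."""
    return (MU * N / (k + 1)) / (N / (k + 1))


def i_cf_frechet(k, alpha):
    """I_CF from the Frechet asymptotic B(K) ~ s K^(1/alpha) C_N (C_N form assumed)."""
    s = MU / math.gamma(1 - 1 / alpha)
    c_n = sum(math.gamma(j - 1 / alpha) / math.gamma(j) for j in range(1, N + 1))
    mv_k = s * c_n * ((k + 1) ** (1 / alpha) - k ** (1 / alpha))
    return mv_k / (N / (k + 1))


for k in (1_000, 10_000, 100_000):
    print(f"K = {k:>7,}: exponential {i_cf_exponential(k):.1f}, "
          f"Frechet(alpha=2) {i_cf_frechet(k, 2.0):,.0f}")
```

Each tenfold increase in \(K\) multiplies the Fréchet \(I_{\text{CF}}\) by roughly \(10^{1/\alpha}\) while the exponential value stays flat.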
Alice Revisited
With light‑tailed assumptions, \(I_{\text{CF}}\) equals the population
mean \(\mu\) and is too small to offset Alice’s pay cut and lost
donations; Alice’s counterfactual surplus is negative regardless of
\(K\). Under heavy‑tailed assumptions, \(I_{\text{CF}}\) rises with
\(K\); across a broad range of conditions, the pivot can become
attractive despite large pay cuts (i.e. if Alice might truly be a
unicorn). The sign and size of this effect hinge on the tail parameter
and scale, which are currently unmeasured.
Visualizing Private Incentives vs. Public Welfare
We can now visualize the dynamics of private, public, and counterfactual
private valuations by assuming an illustrative mapping between pool size
and success probability: p≈βN/K. This allows us to see how
the incentives change as the field gets more crowded (moving left on the
x-axis).
Visualizing the Misalignment
This visualization brings all three models together, using Alice’s
parameters to illustrate the tensions between private incentives and
public good. The plot is dense, but it reveals three key dynamics:
The Information Gap (A vs. C): The “Naive EV” (A, black solid
line) is far more optimistic than the “Counterfactual EV” (C, dashed
colored lines). An applicant using the simple model from Part
A—ignoring her displacement effect—will drastically
overestimate the personal EV of pivoting.
The Cost Barrier (C): In light-tailed worlds (Exponential,
purple; Fréchet α=3.0, red), the sophisticated applicant’s EV
is always negative. Alice’s financial losses (her 50k burn rate
plus 18k in foregone donations) dominate any plausible
counterfactual impact. Only in heavy-tailed “unicorn” worlds
(Fréchet α=2.0, green; α=1.8, orange) does the pivot
become EV-positive.
The Structural Misalignment (B vs. C): This is the core problem.
In the heavy-tailed worlds where pivoting is privately rational
(green, orange), the socially optimal pool size (\(K^*\), the peak
of the solid welfare lines) is far smaller than the private
equilibrium (\(K_{\text{eq}}\), where the dashed EV lines cross zero).
The system incentivizes massive over-entry.
Why Does This Happen? The Misalignment Mechanism
The plot shows that a misalignment exists; our model explains why
it’s structural. It boils down to a conflict between the private cost
of trying and the social cost of failing.
We can identify the two different “stop” signals:
The Social Optimum (\(K^*\)): The field’s total welfare (Part B)
peaks when the marginal impact of one more applicant
(\(\mathrm{MV}_K\)) drops to equal the social cost of the applicant’s
(likely) failed attempt. For Alice, this cost is the public good she
stops producing during her sabbatical, primarily her donations.
Social Cost of Failure (\(\gamma\)): 18k/year.
The Private Equilibrium (\(K_{\text{eq}}\)): A sophisticated Alice
(Part C) stops applying when her private EV becomes zero. This
happens when the marginal impact she can expect to have
(\(\mathrm{MV}_K\), which she shares with other applicants) drops to
equal her effective private cost hurdle per application. As
derived in Appendix C, this hurdle is \(c\rho/r\).
The field implicitly “wants” applicants to stop when the marginal impact
benefit drops below $18,000. But because the job search is so
efficient (high r), Alice’s private cost for one more “shot at the
prize” is only $690.
She is rationally incentivized to keep trying long after her entry is
creating a net social loss. This is what drives the massive gap in our
heavy-tailed model: the social optimum K∗ might be around 11,600, but
the private equilibrium Keq balloons to over 178,000.
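The two stop signals are simple arithmetic on Alice’s inputs. By Part A’s break-even condition, \(p^*\,\Delta u = c\rho/r\): that is the expected surplus each application must deliver. A sketch comparing it with her social burn rate (\(\gamma \approx d_0\), since \(I_0 = 0\) and externalities are ignored); the text rounds the hurdle to $690.

```python
# Alice's worked-example inputs, money in k$/year
c, rho, r = 50, 1 / 3, 24
gamma = 18   # social burn rate: paused donations (I0 = 0, externalities ignored)

private_hurdle = c * rho / r   # expected surplus per application implied by p* du = c rho / r
print(f"private hurdle per application: ${private_hurdle * 1000:,.0f}")   # $694
print(f"social burn rate gamma: ${gamma * 1000:,.0f}/yr")                 # $18,000/yr
print(f"ratio: {gamma / private_hurdle:.0f}x")                            # 26x
```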
This leads to a clear, if sobering, takeaway for a mid-career
professional like Alice. Given her high opportunity cost (including
foregone donations), donating is the robustly higher-EV option unless she has strong, specific evidence that both (a) the talent pool
is extremely heavy-tailed and (b) she is likely to be one of the
“unicorns” in that tail.
Implications and Solutions
Our takeaway is that the AI safety career funnel, and likely other
high-impact career funnels, is miscalibrated; not in the sense that we
are definitely producing net harm right this minute, but in the sense
that the feedback mechanisms do not exist to make sure that we do not
produce net harm. It’s hard to know how bad this is, but Alice’s
18k/year social cost versus her $690 private hurdle is worrisome if it
is typical.
That is to say, all this work on alignment is misaligned. Organizations
influencing the funnel size don’t internalize the costs borne by
unsuccessful applicants. This incentivizes maximizing application volume
(a visible proxy) rather than welfare-maximizing matches—a classic
setup for Goodhart’s Law.
That said, it’s not even clear what we should be aligned to; the
equilibria for maximizing personal life satisfaction for applicants, or
maximizing total field impact, may differ.
If we want to make a credible claim to be impact-driven, it is the
latter, the net public good, that needs to be prioritized, because
“please donate so that a bunch of monied professionals can
self-actualize” is not a great pitch.
For Individuals: Knowing the Game
For mid-career individuals, the decision is high-stakes. (For
early-career individuals, costs c are lower, making the gamble more
favourable, but the need to estimate p remains.)
Calculate your threshold (p∗): Use the model in Part A (and
the linked calculator). Without strong evidence that p>p∗ is
true, a pivot involving significant unpaid time is likely
EV-negative.
Seek cheap signals about whether you are in fact a unicorn: Look
for personalized evidence of fit—such as applying to a few roles
before leaving your current job—before committing significant
resources.
Use grants as signals: Organizations like Open Philanthropy
offer career transition grants. These serve as information gates. If
received, a grant lowers the private cost (c). If denied, it is a
valuable calibration signal. If a major funder declines to
underwrite the transition, candidates should update p downwards.
(If you don’t get that Open Phil transition grant, don’t quit your
current job.)
Change the game by doing something other than applying for a
job:
Achieving impact by getting AI Safety on the agenda at your
current job (if it’s tech-related) is often overlooked. You can
have a huge impact by influencing your current employer to take
AI safety seriously, without needing to pivot careers.
Founding or joining a new organization can bring multiple roles
into existence, reducing the need to compete for existing roles.
For Organizations: Transparency and Feedback
Employers and advice organizations control the information flow. Unless
they provide evidence-based estimates of success probabilities, their
generic encouragement should be treated with scepticism.
Publish stage-wise acceptance rates (Base Rates). Employers must
publish historical data (applicants, interviews, offers) by track
and seniority. This is the single most impactful intervention for
anchoring p.
Provide informative feedback and rank. Employers should provide
standardized feedback or an indication of relative rank (e.g., “top
quartile”). This feedback is costly, but this cost must be weighed
against the significant systemic waste currently externalized onto
applicants and the long-term credibility of the field.
Track advice calibration. Advice organizations should track and
publish their forecast calibration (e.g., Brier scores) regarding
candidate success. If an advice organization doesn’t track outcomes,
its advice cannot be calibrated except by coincidence.
For the Field: Systemic Calibration
To optimize the funnel size, the field needs to measure costs and impact
tails.
Estimate applicant costs (cℓ). Advice organizations or
funders should survey applicants (successful and unsuccessful) to
estimate typical pivot costs.
Track realized impact proxies. Employers should analyze
historical cohorts to determine if widening the funnel is still
yielding significantly better hires, or if returns are rapidly
diminishing.
Experiment with mechanism design. In capacity-constrained
rounds, implementing soft caps—pausing applications after a
certain number—can reduce applicant-side waste without
significantly harming match quality [@Horton2024Reducing].
Where next?
I’d like feedback from people deeper in the AI safety career ecosystem
about what I’ve gotten wrong. Is the model here sophisticated enough to
capture the main dynamics? I’d love to chat with people from 80,000
Hours, MATS, FHI, CHAI, Redwood Research, Anthropic, etc., about this.
What is your model about the candidate impact distribution, the tail
behaviour, and the costs? What have I got wrong? What have I missed? I’m
open to the possibility that this is well understood and being actively
managed behind the scenes, but I haven’t seen it laid out this way
anywhere.
Further reading
Resources that complement the mechanism-design view of the AI safety
career ecosystem:
SPAR AI—Safety Policy and Alignment
Research program. An example of a program that provides structured
training and, implicitly, some “negative previews” of the grind of
AI safety work.
MATS
retrospectives
—LessWrong. Transparency on acceptance rates, alumni experiences,
and obstacles faced in this training program.
Why not just send people to
Bluedot
on FieldBuilding Substack. A critique of naive funnel-building and
the hidden costs of over-sending candidates to “default” programs.
80,000 Hours career change
guides.
Practical content on managing costs, transition grants, and
opportunity cost—useful for calibrating c in the pivot-EV model.
Forecasting in personal
decisions, 80k. Advice on making and updating stage-wise probability forecasts;
relevant to candidate calibration.
To be consistent we need to take this to be a local linear
approximation at your current wage and impact level; so we are
implicitly looking at marginal utility.
Advice to pivot into AI Safety is likely miscalibrated
This was a draft for Draft Amnesty Week. I revisited it to edit for style.
Here’s a napkin model of the AI-safety career-advice economy—or rather, three models of increasing complexity. They show how advice can sincerely recommend gambles that mostly fail, and why—without better data—we can’t tell whether that failure rate is healthy (leading to impact at low cost) or wasteful (potentially destroying happiness and even impact). In other words, it’s hard to know whether our altruism is “effective.”
In AI Safety in particular, there’s an extra credibility risk idiosyncratic to this system. AI Safety is, loosely speaking, about managing the risks of badly aligned mechanisms producing perverse outcomes. As such, it’s particularly incumbent on our field to avoid mechanisms that are badly aligned and produce perverse outcomes; otherwise we aren’t taking our own risk model seriously.
To keep things simple, we ignore as many complexities as possible.
We evaluate decisions in terms of cause impact, which we assume we can price in donation-equivalent dollars. This is a public good.
Individual private goods are the candidate’s own gains and losses. We assume the candidate’s job satisfaction and remuneration (i.e. personal utility) can be costed in dollars.
Part A—Private career pivot decision
An uncertain career pivot is a gamble, so we model it the same way we model other gambles.
To build the model, Alice needs to estimate four things:
Upside
Annual Surplus (Δu): This is the key number. It’s the difference in Alice’s total annual utility between the new AI safety role (u1) and her current baseline (u0). This surplus combines the change in her salary and her impact—indirectly via donations and directly by doing some fancy AI safety job.
$u = w + \alpha(I + d)$, where $w$ is wage, $I$ is on-the-job impact, $d$ is donations, and $\alpha$ is Alice’s weighting of impact versus consumption.
$\Delta u := u_1 - u_0$.
Downside
Runway (ℓ): The maximum time she’s willing to try, in years.
Burn Rate (c): This is Alice’s net opportunity cost per year while on sabbatical (e.g., foregone pay, depleted savings), measured in k$/year.
Odds
Application Rate ($r$): The number of distinct job opportunities she can apply for per year.
Success Probability ($p$): Her average probability of getting an offer from a single application. We assume application outcomes are independent and identically distributed (i.i.d.).
Timing
Discount Rate (ρ): A continuous rate per year that captures her time preference. A higher ρ means she values immediate gains more—for example, if she expects short AGI timelines, ρ might be high.
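Here is a minimal Python sketch of this bookkeeping. It assumes money is measured in thousands of dollars per year and borrows the figures from the worked example below. `utility` is a hypothetical helper, not part of the linked calculator. Because $\Delta u$ is linear in $\alpha$, the sketch also finds the impact weighting at which the surplus vanishes.

```python
def utility(wage: float, impact: float, donations: float, alpha: float = 1.0) -> float:
    """Annual utility u = w + alpha * (I + d), in thousands of dollars per year."""
    return wage + alpha * (impact + donations)


# Worked-example figures (illustrative): current job (u0) and target AI safety role (u1).
current = dict(wage=180, impact=0, donations=18)
target = dict(wage=120, impact=100, donations=0)

delta_u = utility(**target) - utility(**current)  # annual surplus Δu at alpha = 1

# Δu(alpha) = (w1 - w0) + alpha * ((I1 + d1) - (I0 + d0)), which is zero at:
alpha_breakeven = (current["wage"] - target["wage"]) / (
    (target["impact"] + target["donations"]) - (current["impact"] + current["donations"])
)

print(delta_u, round(alpha_breakeven, 3))  # 22.0 0.732
```

With $\alpha = 1$ this reproduces $\Delta u = 22$. If Alice weights impact at less than about 0.73 of consumption, the target role lowers her utility before any search costs are counted.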
Modeling the Sabbatical: The Decision Threshold
With these inputs, we can calculate the total expected value (EV) of her sabbatical gamble. The full derivation is in Appendix A, but here’s the result:
$\Delta\mathrm{EV}_\rho(p) = \frac{1 - e^{-(rp+\rho)\ell}}{rp + \rho}\left(\frac{\Delta u\, r p}{\rho} - c\right).$
This formula looks complex, but its logic is simple. The entire decision hinges on the sign of the bracketed term, $\left(\frac{\Delta u\, r p}{\rho} - c\right)$. This is a direct comparison between the expected gain rate and the burn rate ($c$). The expected gain rate is the upside $\Delta u$, multiplied by the success rate $rp$ and capitalized by $1/\rho$ to reflect discounting. The prefactor is always positive, so it cannot change the sign. It only scales the value according to the length of Alice’s runway and her discount rate.
The EV is positive if and only if the gain rate exceeds the burn rate. This means Alice’s decision boils down to a simple question: Is her per-application success probability, p, high enough to make the gamble worthwhile?
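To sanity-check the closed form, here is a small Python sketch that compares it with a Monte Carlo simulation of the same search process. It makes the same assumptions as Part A:

- offers arrive at a constant rate $rp$, so the time of the first offer is exponential;
- Alice burns $c$ per year until she gets an offer or her runway ends;
- success pays $\Delta u$ per year in perpetuity, discounted at $\rho$.

The function names are illustrative and are not taken from the original calculator.

```python
import math
import random


def delta_ev(p, r, rho, ell, c, du):
    """Closed-form ΔEV_ρ(p): a positive prefactor times (du*r*p/rho - c)."""
    lam = r * p  # offer arrival rate per year
    prefactor = (1 - math.exp(-(lam + rho) * ell)) / (lam + rho)
    return prefactor * (du * lam / rho - c)


def delta_ev_monte_carlo(p, r, rho, ell, c, du, n=200_000, seed=0):
    """Simulate the sabbatical n times and average the discounted payoff."""
    rng = random.Random(seed)
    lam = r * p
    total = 0.0
    for _ in range(n):
        tau = rng.expovariate(lam)  # time of the first offer
        stop = min(tau, ell)        # search ends at an offer or when the runway runs out
        burn = c * (1 - math.exp(-rho * stop)) / rho  # PV of burning c per year until stop
        gain = math.exp(-rho * tau) * du / rho if tau < ell else 0.0  # PV of the perpetuity
        total += gain - burn
    return total / n


alice = dict(r=24, rho=1 / 3, ell=0.5, c=50, du=22)  # worked-example values, for illustration
for p in (0.02, 0.05):
    print(p, round(delta_ev(p, **alice), 2), round(delta_ev_monte_carlo(p, **alice), 2))
```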
We can find the exact break-even probability, p∗, by setting the gain rate equal to the burn rate. This gives a much simpler formula for Alice’s decision threshold:
$p^* = \frac{c\,\rho}{r\,\Delta u}.$
If Alice believes her actual p is greater than this p∗, the pivot has a positive expected value. If p<p∗, she should not take the sabbatical, at least on these terms.
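The threshold, and its independence from the runway, can be checked numerically. This is a minimal sketch that reuses Part A’s closed form with Alice’s worked-example parameters, which are assumed here for illustration. The EV is zero at $p^*$ for every runway length $\ell$, and its sign flips as $p$ crosses $p^*$.

```python
import math


def delta_ev(p, r, rho, ell, c, du):
    lam = r * p
    return (1 - math.exp(-(lam + rho) * ell)) / (lam + rho) * (du * lam / rho - c)


def p_star(r, rho, c, du):
    """Break-even per-application success probability, p* = c*rho / (r*du)."""
    return c * rho / (r * du)


params = dict(r=24, rho=1 / 3, c=50, du=22)  # illustrative: Alice's worked-example values
ps = p_star(**params)
for ell in (0.25, 0.5, 1.0, 2.0):
    # The runway changes the size of the EV, but not where it crosses zero.
    below, at, above = (delta_ev(f * ps, ell=ell, **params) for f in (0.9, 1.0, 1.1))
    print(f"ell={ell}: EV(0.9p*)={below:.2f}  EV(p*)={at:.1e}  EV(1.1p*)={above:.2f}")
```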
What This Model Tells Alice
This simple threshold p∗ gives us a clear way to think about her decision:
The bar gets higher: The threshold p∗ increases with higher costs (c), shorter timelines, or greater impatience (ρ). If her sabbatical is expensive or she’s in a hurry, she needs to be more confident of success.
The bar gets lower: The threshold p∗ decreases with more opportunities (r) or a higher upside (Δu). If the job offers a massive impact gain or she can apply to many roles, she can tolerate a lower chance of success on any single one.
Runway doesn’t change the threshold: Notice that the runway length ℓ isn’t in the p∗ formula. A longer runway gives her more expected value (or loss) if she does take the gamble, but it doesn’t change the break-even probability itself.
The results are fragile to uncertainty: This model is highly sensitive to her estimates. If she overestimates her potential impact (a high Δu) or underestimates her time preference (a low ρ), she’ll calculate a p∗ that is artificially low, making the pivot look much safer than it is.[1]
The key unknown: Even with a perfectly calculated p∗, Alice still faces the hardest part: estimating her own actual success probability, p.
That p is, essentially, her chance of getting an offer. It depends not only on the number of jobs available but crucially on the number and quality of the other applicants.
All that said, this is a relatively “optimistic” model. If Alice attaches a high value to getting her hands dirty in AI safety work, she might be willing to accept a remarkably low p; we’ll see that in the worked example. Hold that thought, though, because I’ll argue that this personal decision rule can be pretty bad at maximizing total impact.
Worked example
Let’s plug in some plausible numbers for Alice, measuring money in thousands of dollars per year. She’s a successful software engineer who earns $w_0 = 180$, donates $d_0 = 18$ after tax, and has no on-the-job impact, $I_0 = 0$ (i.e. no net harm, no net good). A target role offers $w_1 = 120$, $d_1 = 0$, and $I_1 = 100$. Set $\alpha = 1$, runway $\ell = 0.5$ years, application rate $r = 24$/year, discount $\rho = 1/3$, and burn $c = 50$. Then $\Delta u = (120 + 0 + 100) - (180 + 18 + 0) = 22$ and $p^* = \frac{c\rho}{r\,\Delta u} = \frac{50 \cdot \frac{1}{3}}{24 \cdot 22} \approx 3.16\%$. Over six months, the chance of at least one success at $p^*$ is $q^* = 1 - e^{-r p^* \ell} \approx 31.5\%$. Alice’s expected actual sabbatical length is $\mathbb{E}[\tau] = \frac{1 - e^{-r p^* \ell}}{r p^*} \approx 0.416$ years (≈5.0 months). Conditional on success, it’s $\mathbb{E}[\tau \mid \text{success}] \approx 0.234$ years (≈2.8 months). At $p = p^*$ the sabbatical breaks even by construction: the expected upside exactly offsets the burn, even though the runway ends without an offer roughly 68.5% of the time.
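The worked-example numbers can be reproduced in a few lines. This is a minimal Python sketch using the same inputs. The conditional-length line uses the standard mean of an exponential time truncated at the runway: $\mathbb{E}[\tau \mid \tau < \ell] = \frac{1}{rp} - \frac{\ell e^{-rp\ell}}{1 - e^{-rp\ell}}$.

```python
import math

# Worked-example inputs: money in thousands of dollars per year, time in years.
w0, d0, i0 = 180, 18, 0      # current job: wage, donations, on-the-job impact
w1, d1, i1 = 120, 0, 100     # target AI safety role
alpha, ell, r, rho, c = 1.0, 0.5, 24, 1 / 3, 50

du = (w1 + alpha * (i1 + d1)) - (w0 + alpha * (i0 + d0))
p_star = c * rho / (r * du)
lam = r * p_star                   # offer arrival rate at p*
q_star = 1 - math.exp(-lam * ell)  # P(at least one offer before the runway ends)
e_tau = q_star / lam               # E[min(tau, ell)]: expected sabbatical length
# Mean of an exponential time conditional on falling inside the runway.
e_tau_success = 1 / lam - ell * math.exp(-lam * ell) / q_star

print(f"du={du:.0f}  p*={p_star:.2%}  q*={q_star:.1%}")
print(f"E[tau]={e_tau:.3f} y  E[tau|success]={e_tau_success:.3f} y")
# du=22  p*=3.16%  q*=31.5%
# E[tau]=0.416 y  E[tau|success]=0.234 y
```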
We plot a few values for Alice to visualize the trade-offs for different upsides Δu.
To play around with the assumptions, try the interactive Pivot EV Calculator (source at danmackinlay/career_pivot_calculator).
Part B—Field-level model
So far, this has been Alice’s private perspective. Let’s zoom out to the field level and consider: What if everyone followed Alice’s decision rule? Is the resulting number of applicants healthy for the field? What’s the optimal number of people who should try to pivot?
From personal gambles to field strategy
Our goal is to move beyond Alice’s private break-even (p∗) and calculate the field’s welfare-maximizing applicant pool size (K∗). This K∗ is how many “Alices” the field can afford to have rolling the dice before the costs of failures outweigh the value of successes.
To analyze this, we must shift our model in three ways:
Switch to a Public Ledger: From a field-level perspective, private wages and consumption are just transfers; they therefore drop out of the analysis. What matters is the net production of public goods (i.e., impact).
Distinguish Public vs. Private Costs: The costs are now different.
Private Cost (Part A): c included Alice’s full opportunity cost (foregone wages, etc.).
Public Cost (Part B): We now use $\gamma$, which captures only the foregone public goods during a sabbatical (e.g., $\gamma = I_0 + d_0 + \varepsilon$: baseline impact plus baseline donations plus externalities).
Move from Dynamic Search to Static Contest: Instead of one person’s dynamic search, we’ll use a static “snapshot” model of the entire field for one year. We assume there are N open roles and K total applicants that year.
The Field-Level Model: Assumptions
Here’s the minimally complicated version of our new model:
There are K total applicants and N open roles per year.
Each applicant k has a true, fixed potential impact J(k) drawn i.i.d. from the talent distribution F.
Employers perfectly observe J(k) and hire the N best candidates. (This is a strong, optimistic assumption about hiring efficiency.)
Applicants do not know their own J(k), only the distribution F.
The intuition is that the field benefits from a larger pool K because it increases the chance of finding high-impact candidates. But the field also pays a price for every failed applicant.
Benefits vs. Costs on the Public Ledger
Let’s define the two sides of the field’s welfare equation.
The Marginal Benefit (MV) of a Larger Pool
The benefit of a larger pool $K$ is finding better candidates. We care about the marginal value of adding one more applicant to the pool, which we define as $\mathrm{MV}_K$. This is the expected annual increase in impact from widening the pool from $K$ to $K+1$. (Formally, $\mathrm{MV}_K := \mathbb{E}[S_{N,K+1}] - \mathbb{E}[S_{N,K}]$, where $S_{N,K}$ is the total impact of the top $N$ hires from a pool of $K$.)
The Marginal Cost (MC) of a Larger Pool
The cost is simpler. When $K > N$, adding one more applicant results (on average) in one more failed pivot. This failed pivot costs the field the foregone public good during the sabbatical. We define the social burn rate per year as $\gamma$. To compare this to the annual benefit $\mathrm{MV}_K$, we need the total present value of this foregone impact, discounted at the field’s rate $\delta$ (the public-ledger counterpart of Alice’s $\rho$). We call this $L_{\text{fail},\delta}$, the PV of one failed attempt. (This cost is derived in Appendix B as $L_{\text{fail},\delta} = \gamma\,\frac{1 - e^{-\delta\ell}}{\delta}$.)
We do not model employer congestion from reviewing many applicants. The rationale is that it’s empirically small, because employers stop looking at candidates when they’re overwhelmed [@Horton2021Jobseekers].[^2] Note, however, that we also assume employers perfectly observe J(k), which means we’re being optimistic about the field’s ability to sort candidates. Maybe we could model a noisy search process?
Field-Level trade-offs
We can now find the optimal pool size K∗. The total public welfare W(K) peaks when the marginal benefit of one more applicant equals the marginal cost.
As derived in Appendix B, the total welfare $W(K)$ is maximized when the present value of the annual benefit stream from the marginal applicant ($\mathrm{MV}_K/\delta$) equals the total present value of their failure cost ($L_{\text{fail},\delta}$): $\frac{\mathrm{MV}_K}{\delta} = L_{\text{fail},\delta}$. Substituting the expression for $L_{\text{fail},\delta}$ and cancelling the discount rate $\delta$, we get a very clean threshold: $\mathrm{MV}_K = \gamma\left(1 - e^{-\delta\ell}\right)$. This equation is the core of the field-level problem. The optimal pool size $K^*$ is the point where the expected annual marginal benefit ($\mathrm{MV}_K$) falls to $\gamma(1 - e^{-\delta\ell})$. That is the annualized equivalent of the public good lost in a single failed sabbatical attempt (roughly $\gamma\delta\ell$ when $\delta\ell$ is small).
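Here is a short sketch that checks this algebra. It assumes, for illustration only, $\gamma = 18$ (thousands of dollars per year of foregone donations, as for Alice later), $\delta = 1/3$ and $\ell = 0.5$ years; these values are assumptions, not figures from Appendix B. It shows two things. First, the closed form for $L_{\text{fail},\delta}$ matches direct integration of the discounted burn. Second, $\mathrm{MV}_K/\delta = L_{\text{fail},\delta}$ is the same condition as $\mathrm{MV}_K = \gamma(1 - e^{-\delta\ell})$.

```python
import math


def l_fail(gamma, delta, ell):
    """PV of one failed attempt: gamma * (1 - exp(-delta*ell)) / delta."""
    return gamma * (1 - math.exp(-delta * ell)) / delta


def l_fail_numeric(gamma, delta, ell, steps=100_000):
    """Midpoint-rule integral of the discounted social burn gamma*exp(-delta*t) over [0, ell]."""
    h = ell / steps
    return h * sum(gamma * math.exp(-delta * (i + 0.5) * h) for i in range(steps))


def mv_threshold(gamma, delta, ell):
    """Annual marginal value at which one more applicant stops paying for themselves."""
    return gamma * (1 - math.exp(-delta * ell))


gamma, delta, ell = 18, 1 / 3, 0.5  # illustrative values only
print(l_fail(gamma, delta, ell), l_fail_numeric(gamma, delta, ell))
print(mv_threshold(gamma, delta, ell), delta * l_fail(gamma, delta, ell))
```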
The Importance of Tail Distributions
How quickly does MVK shrink? Extreme value theory tells us it depends entirely on the tail of the candidate-quality distribution, F. The shape of the tail determines how quickly returns from widening the applicant pool diminish.
We consider two families (the specific formulas are in Appendix B):
Light tails (e.g., Exponential): In this world, candidates vary, but the best isn’t transformatively better than average. Returns diminish quickly: the marginal value MVK shrinks hyperbolically (roughly as 1/K).
Heavy tails (e.g., Fréchet): This captures the “unicorn” intuition. Returns diminish much more slowly. If the tail is heavy enough, MVK decays extremely slowly, justifying a very wide search.
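The light-tailed case can be made exact. For exponential talent with mean $\mu$, the expected $i$-th best of $K$ candidates is $\mu(1/i + \dots + 1/K)$, a standard order-statistics result. This gives $\mathrm{MV}_K = N\mu/(K+1)$ for $K \ge N$. The sketch below checks that formula against a Monte Carlo estimate. The pool sizes and seed are arbitrary choices for illustration.

```python
import heapq
import random


def top_n_sum(values, n_roles):
    """Total impact of the N best candidates (employers observe J(k) perfectly)."""
    return sum(heapq.nlargest(n_roles, values))


def mv_exponential_mc(n_roles, k, mu=1.0, reps=50_000, seed=1):
    """Monte Carlo MV_K = E[S_{N,K+1}] - E[S_{N,K}] for Exponential(mean mu) talent.

    Common random numbers: the pool of K+1 reuses the pool of K plus one new draw,
    which keeps the variance of the difference small.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        pool = [rng.expovariate(1 / mu) for _ in range(k + 1)]
        total += top_n_sum(pool, n_roles) - top_n_sum(pool[:k], n_roles)
    return total / reps


def mv_exponential_exact(n_roles, k, mu=1.0):
    """Exact exponential result, valid for K >= N."""
    return n_roles * mu / (k + 1)


print(mv_exponential_mc(2, 20), mv_exponential_exact(2, 20))  # small N and K keep the demo fast
```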
Implications for Optimal Pool Size
This difference in diminishing returns hugely affects the optimal pool size K∗. The full solutions for K∗ are in Appendix B.
With light tails, there’s a finite pool size beyond which ramping up the hype (increasing K) reduces net welfare. Every extra applicant burns Lfail,δ in foregone public impact while adding an MVK that shrinks rapidly.
With heavy tails, it’s different. As the tail gets heavier, K∗ explodes. In very heavy-tailed worlds, very wide funnels can still be net positive. We may decide it’s worth, as a society, spending a lot of resources to find the few unicorns.
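For the exponential family the optimum has a closed form. Since $\mathrm{MV}_K = N\mu/(K+1)$, welfare rises while $N\mu/(K+1) > \gamma(1 - e^{-\delta\ell})$. So $K^* = \lceil N\mu/(\gamma(1 - e^{-\delta\ell})) - 1 \rceil$, provided that is at least $N$. The sketch below confirms this by brute-force maximization of $W(K) = B(K)/\delta - (K - N)\,L_{\text{fail},\delta}$. It uses hypothetical values ($N = 10$, $\mu = 100$, $\gamma = 18$, $\delta = 1/3$, $\ell = 0.5$), which are not the figures behind the plots.

```python
import math


def welfare_exponential(n_roles, k_max, mu, gamma, delta, ell):
    """W(K) = B(K)/delta - (K - N) * L_fail for K = N..k_max, exponential talent.

    B(K) = mu * (N*H_K - sum_{i=0}^{N-1} H_i) is the expected impact of the top N of K.
    """
    l_fail = gamma * (1 - math.exp(-delta * ell)) / delta
    harmonic = [0.0]  # harmonic[m] = H_m
    for m in range(1, k_max + 1):
        harmonic.append(harmonic[-1] + 1 / m)
    offset = sum(harmonic[:n_roles])  # H_0 + ... + H_{N-1}
    return {
        k: mu * (n_roles * harmonic[k] - offset) / delta - (k - n_roles) * l_fail
        for k in range(n_roles, k_max + 1)
    }


def k_star_exponential(n_roles, mu, gamma, delta, ell):
    threshold = gamma * (1 - math.exp(-delta * ell))  # MV_K at the optimum
    return math.ceil(n_roles * mu / threshold - 1)


args = dict(n_roles=10, mu=100, gamma=18, delta=1 / 3, ell=0.5)  # hypothetical parameters
w = welfare_exponential(k_max=5_000, **args)
print(max(w, key=w.get), k_star_exponential(**args))
```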
We set the expected impact per hire per year to $\mu_{\text{imp}} = 100$ (thousands of impact dollars per year) to match Alice’s hypothetical target role. This is just for exposition.
We can, of course, plot this.
This plot shows total net welfare $W(K)$ and marks the maximum $K^*$ for each family, so we can see where total welfare peaks. The dashed line at $K = N$ shows where failures begin: for $K > N$, the $K - N$ unsuccessful applicants each impose a public cost of $L_{\text{fail},\delta}$. The markers show $K^* = \arg\max_K W(K)$, the pool size beyond which widening further would reduce total impact.
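Units: $B(K)$, the expected annual impact of the $N$ hires (i.e. $\mathbb{E}[S_{N,K}]$), is in impact dollars per year. It is converted to present value by multiplying by $H_\delta = 1/\delta$. The subtraction uses the discounted per-failure cost $L_{\text{fail},\delta} = \gamma\,\frac{1 - e^{-\delta\ell}}{\delta}$.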
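Fréchet curves use the large-$K$ asymptotic $B(K) \approx s K^{1/\alpha} C_N$, with $s = \mu_{\text{imp}}/\Gamma(1 - 1/\alpha)$ and $C_N$ a constant depending only on $N$ and $\alpha$. We could get the exact $B(K)$ for Fréchet, but the asymptotic is good enough to illustrate the qualitative behaviour.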
We treat all future uncertainties about role duration, turnover, or project lifespan as already captured in the overall discount rate δ.
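The Fréchet asymptotic can be checked by simulation. Appendix B’s definition of $C_N$ is not reproduced here, so this sketch assumes the standard extreme-value limit $C_N = \sum_{i=1}^{N} \Gamma(i - 1/\alpha)/\Gamma(i)$ (valid for $\alpha > 1$). That limit follows because the $i$-th largest of $K$ draws behaves like $s\,(K/\Gamma_i)^{1/\alpha}$, with $\Gamma_i$ a Gamma$(i, 1)$ variable. The original may normalize differently. The sketch uses $\alpha = 3$ because the simulation converges slowly when the variance is infinite ($\alpha \le 2$).

```python
import heapq
import math
import random


def c_n(n_roles, alpha):
    """Assumed C_N: large-K limit of E[top-N sum] / (s * K**(1/alpha)), alpha > 1."""
    return sum(math.gamma(i - 1 / alpha) / math.gamma(i) for i in range(1, n_roles + 1))


def frechet_scale(mu_imp, alpha):
    """Scale s such that a Frechet(alpha, s) variable has mean mu_imp."""
    return mu_imp / math.gamma(1 - 1 / alpha)


def b_frechet_asymptotic(n_roles, k, alpha, mu_imp):
    return frechet_scale(mu_imp, alpha) * k ** (1 / alpha) * c_n(n_roles, alpha)


def b_frechet_monte_carlo(n_roles, k, alpha, mu_imp, reps=400, seed=2):
    rng = random.Random(seed)
    s = frechet_scale(mu_imp, alpha)
    total = 0.0
    for _ in range(reps):
        # If E ~ Exponential(1), then s * E**(-1/alpha) has CDF exp(-(x/s)**-alpha).
        pool = (s * rng.expovariate(1.0) ** (-1 / alpha) for _ in range(k))
        total += sum(heapq.nlargest(n_roles, pool))
    return total / reps


args = dict(n_roles=5, k=2_000, alpha=3.0, mu_imp=100)  # illustrative values
print(b_frechet_asymptotic(**args), b_frechet_monte_carlo(**args))
```

For $N = 1$ and $K = 1$ the asymptotic formula returns exactly $\mu_{\text{imp}}$, the Fréchet mean, which is a quick consistency check on the scale $s$.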
We can combine these perspectives to visualize the tension between private incentives and public welfare.
This visualization combines the private and public views by assuming an illustrative mapping from pool size to success probability: $p \approx \beta N/K$, where $\beta$ bundles screening efficiency; in this figure $\beta = 1$. The black curve (left axis) shows a candidate’s private expected value (EV) as a function of success probability $p$. The coloured curves (right axis) show the field’s welfare, $W(K)$. The private break-even point $p^*$ (black dashed line) can fall far to the left of the field-optimal point $p(K^*)$ (coloured vertical lines). The gap between them marks the region where individuals may be rationally incentivized, at the candidate level, to enter even though, at the field level, the pool is already saturated or oversaturated.
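Under this illustrative mapping, the naive private break-even pool size is simply $K = \beta N / p^*$. Below it the Part A EV is positive; above it, negative. This is a minimal sketch assuming Alice’s worked-example parameters and a hypothetical $N = 10$ open roles with $\beta = 1$. None of these values come from the plotted model.

```python
import math


def delta_ev(p, r, rho, ell, c, du):
    lam = r * p
    return (1 - math.exp(-(lam + rho) * ell)) / (lam + rho) * (du * lam / rho - c)


def p_of_k(k, n_roles, beta=1.0):
    """Illustrative mapping from pool size to per-application success probability."""
    return min(1.0, beta * n_roles / k)


alice = dict(r=24, rho=1 / 3, ell=0.5, c=50, du=22)  # worked-example values
n_roles, beta = 10, 1.0                              # hypothetical
p_star = alice["c"] * alice["rho"] / (alice["r"] * alice["du"])
k_breakeven = beta * n_roles / p_star                # naive private break-even pool size

for k in (100, 300, 340, 1_000):
    print(k, round(delta_ev(p_of_k(k, n_roles, beta), **alice), 2))
```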
Part C—Counterfactual Impact and Equilibrium
Part A modeled Alice’s pivot as a private gamble, where the upside was the absolute impact (I1) of the new role. Part B zoomed out, analyzing the field-level optimum (K∗) and showing how the marginal value (MVK) of adding one more applicant shrinks as the pool (K) grows.
Now we connect these two views. What if Alice is a sophisticated applicant? She understands the field-level dynamics from Part B and wants to maximize her true counterfactual impact—not just the absolute impact of the role she fills. She must update her private EV calculation from Part A.
This introduces a feedback loop. Alice’s personal incentive to apply now depends on the crowd size (K) and the talent distribution (F). In turn, her decision to apply, and every other applicant’s, determines the crowd size.
The Counterfactual Impact Model
In Part A, Alice’s upside Δu included the absolute impact I1. But her true impact is counterfactual: it’s the value she adds compared to the next-best person who would have been hired if she hadn’t applied.
How can she estimate this? She doesn’t know her own quality (J(k)) relative to the pool. If we assume she is, from the field’s perspective, just one more “draw” from the talent distribution F (formally: applicants are exchangeable), then her expected counterfactual impact from the decision to apply is exactly the marginal value of adding one more applicant: MVK (from Part B).
This MVK is her ex-ante expected impact before the gamble is resolved. But her EV calculation (from Part A) needs the upside conditional on success.
Let $I_{\text{CF}}$ be this value: the expected counterfactual impact given she succeeds and gets the job. Let $q_K$ be her overall probability of success in a pool of size $K$. (In the static model of Part B, if she joins a pool of $K$ others, there are $K+1$ applicants for $N$ slots, so $q_K \approx N/(K+1)$.) If her attempt fails (with probability $1 - q_K$), her counterfactual impact is zero.
Therefore, the ex-ante expected impact MVK is simply the probability of success multiplied by the value of that success:
$\mathrm{MV}_K = q_K \cdot I_{\text{CF}} + (1 - q_K) \cdot 0 = q_K \cdot I_{\text{CF}}.$
This gives us the key value Alice needs. Alice’s expected counterfactual impact, conditional on success, is:
$I_{\text{CF}} = \frac{\mathrm{MV}_K}{q_K}.$
Alice can now recalibrate her private decision from Part A. She defines a counterfactual private surplus, ΔuCF, by replacing the naive, absolute impact I1 with her sophisticated, counterfactual estimate ICF.
This changes the dynamics entirely. In Part A, the value of the upside (Δu) was fixed. Now, the value of the upside (ΔuCF) itself depends on the pool size K.
As $K$ grows, both $\mathrm{MV}_K$ (the marginal value) and $q_K$ (the success chance) decrease. How $I_{\text{CF}}$ behaves depends on which of the two decreases faster, a property determined by the tail of the impact distribution $F$.
The Dynamics of Counterfactual Impact
The behaviour of ICF leads to radically different incentives depending on the tail shape of the talent pool.
Case 1: Light Tails
In a light-tailed world (e.g., an Exponential distribution), talent is relatively clustered. The math (see the appendix) shows that $\mathrm{MV}_K$ and $q_K$ shrink at essentially the same rate (both scale like $1/(K+1)$), so their ratio stays constant:
$I_{\text{CF}} = \mu$ (light tail)
(where μ is the population’s average impact).
Intuition: As the pool K grows, the quality of the Nth-best hire—the person we displace—rises almost as fast as the quality of the average successful hire (us). The gap between us and the person we displace remains small and roughly constant. Our counterfactual impact is just the average impact, μ.
Implication: If the average impact μ is modest, ICF may not be enough to offset a large pay cut (like Alice’s). In this world, we’d expect pivots with high private costs to be EV-negative for the average applicant, regardless of how large the pool gets.
Case 2: Heavy Tails
In a heavy-tailed world (e.g., Fréchet), “unicorns” with transformative impact exist. Here, $\mathrm{MV}_K$ shrinks much more slowly than $q_K$. As shown in the appendix, for a Fréchet distribution with shape $\alpha$ (the tail index, unrelated to Alice’s impact weight $\alpha$ from Part A), the result is:
$I_{\text{CF}} \propto K^{1/\alpha}$ (heavy tail)
The expected counterfactual impact conditional on success actually increases as the field gets more crowded.
Intuition: Success in a very large, competitive pool (K) is a powerful signal. It suggests we aren’t just “good,” but potentially a “unicorn.” We aren’t just displacing the Nth-best person; we’re potentially displacing someone much further down the tail.
Implication: In this world, the rate of decay of the field value with funnel size can be slow. The potential upside ICF can grow large enough to easily offset significant private costs (like pay cuts and foregone donations). As a corollary, if we believe in unicorns, it can still make sense to risk large private costs with a low chance of success in order to discover whether we are, in fact, a unicorn.
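The two cases can be contrasted numerically. This sketch makes three assumptions: $q_K = N/(K+1)$; the exact exponential result $\mathrm{MV}_K = N\mu/(K+1)$; and, for Fréchet talent, the large-$K$ asymptotic $B(K) \approx s K^{1/\alpha} C_N$. It takes $C_N = \sum_{i=1}^{N} \Gamma(i - 1/\alpha)/\Gamma(i)$, which is an assumption, since the appendix’s exact constant is not reproduced here. $C_N$ only rescales $I_{\text{CF}}$ and does not affect how it grows with $K$.

```python
import math


def i_cf_exponential(n_roles, k, mu):
    """Light tail: MV_K = N*mu/(K+1) and q_K = N/(K+1), so I_CF = mu for every K."""
    mv = n_roles * mu / (k + 1)
    q = n_roles / (k + 1)
    return mv / q


def i_cf_frechet(n_roles, k, alpha, mu_imp):
    """Heavy tail, large-K asymptotics: B(K) ≈ s * K**(1/alpha) * C_N (assumed C_N)."""
    s = mu_imp / math.gamma(1 - 1 / alpha)
    c_n = sum(math.gamma(i - 1 / alpha) / math.gamma(i) for i in range(1, n_roles + 1))

    def b(kk):
        return s * kk ** (1 / alpha) * c_n

    mv = b(k + 1) - b(k)   # marginal value of one more applicant
    q = n_roles / (k + 1)  # success chance when joining K others
    return mv / q


for k in (1_000, 10_000, 100_000):
    print(k, i_cf_exponential(10, k, 100), round(i_cf_frechet(10, k, 2.0, 100), 1))
```

Doubling the pool leaves the exponential $I_{\text{CF}}$ unchanged but multiplies the Fréchet one by roughly $2^{1/\alpha}$.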
Alice Revisited
With light-tailed assumptions, $I_{\text{CF}}$ equals the population mean $\mu$ and is too small to offset Alice’s pay cut and lost donations, so Alice’s counterfactual surplus is negative regardless of $K$. Under heavy-tailed assumptions, $I_{\text{CF}}$ rises with $K$. Across a broad range of conditions, the pivot can then become attractive despite large pay cuts (i.e. if Alice might truly be a unicorn). The sign and size of this effect hinge on the tail parameter and scale, which are currently unmeasured.
Visualizing Private Incentives vs. Public Welfare
We can now visualize the dynamics of private, public, and counterfactual private valuations by assuming an illustrative mapping between pool size and success probability: p≈βN/K. This allows us to see how the incentives change as the field gets more crowded (moving left on the x-axis).
Visualizing the Misalignment
This visualization brings all three models together, using Alice’s parameters to illustrate the tensions between private incentives and public good. The plot is dense, but it reveals three key dynamics:
The Information Gap (A vs. C): The “Naive EV” (A, black solid line) is far more optimistic than the “Counterfactual EV” (C, dashed colored lines). An applicant using the simple model from Part A—ignoring her displacement effect—will drastically overestimate the personal EV of pivoting.
The Cost Barrier (C): In light-tailed worlds (Exponential, purple; Fréchet α=3.0, red), the sophisticated applicant’s EV is always negative. Alice’s financial losses (her 50k burn rate plus 18k in foregone donations) dominate any plausible counterfactual impact. Only in heavy-tailed “unicorn” worlds (Fréchet α=2.0, green; α=1.8, orange) does the pivot become EV-positive.
The Structural Misalignment (B vs. C): This is the core problem. In the heavy-tailed worlds where pivoting is privately rational (green, orange), the socially optimal pool size ($K^*$, the peak of the solid welfare lines) is far smaller than the private equilibrium ($K_{\text{eq}}$, where the dashed EV lines cross zero). The system incentivizes massive over-entry.
Why Does This Happen? The Misalignment Mechanism
The plot shows that a misalignment exists; our model explains why it’s structural. It boils down to a conflict between the private cost of trying and the social cost of failing.
We can identify the two different “stop” signals:
The Social Optimum (K∗): The field’s total welfare (Part B) peaks when the marginal impact of one more applicant (MVK) drops to equal the social cost of the applicant’s (likely) failed attempt. For Alice, this cost is the public good she stops producing during her sabbatical—primarily her donations.
Social Cost of Failure (γ): 18k/year.
The Private Equilibrium ($K_{\text{eq}}$): A sophisticated Alice (Part C) stops applying when her private EV becomes zero. This happens when the marginal impact she can expect to have ($\mathrm{MV}_K$, which she shares with other applicants) drops to equal her effective private cost hurdle per application. As derived in Appendix C, this hurdle is $c\rho/r$.
Private Cost Hurdle ($c\rho/r$): $50\,\text{k} \cdot \frac{1/3}{24} \approx 0.69\,\text{k}$ (about 690 dollars).
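The hurdle arithmetic can be checked in two lines, using Alice’s worked-example values. The formula $c\rho/r$ itself is taken from Appendix C, which is not reproduced here.

```python
c, rho, r = 50, 1 / 3, 24     # burn (thousands of dollars/yr), discount rate (1/yr), applications/yr
private_hurdle = c * rho / r  # thousands of dollars per application
print(f"{private_hurdle:.3f}k, about {private_hurdle * 1000:,.0f} dollars")
# 0.694k, about 694 dollars
```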
This is the misalignment.
The field implicitly “wants” applicants to stop when the marginal impact benefit drops below $18,000. But because the job search is so efficient (high r), Alice’s private cost for one more “shot at the prize” is only $690.
She is rationally incentivized to keep trying long after her entry is creating a net social loss. This is what drives the massive gap in our heavy-tailed model: the social optimum K∗ might be around 11,600, but the private equilibrium Keq balloons to over 178,000.
This leads to a clear, if sobering, takeaway for a mid-career professional like Alice. Given her high opportunity cost (including foregone donations), donating is the robustly higher-EV option unless she has strong, specific evidence that both (a) the talent pool is extremely heavy-tailed and (b) she is likely to be one of the “unicorns” in that tail.
Implications and Solutions
Our takeaway is that the AI safety career funnel, and likely other high-impact career funnels, is miscalibrated; not in the sense that we are definitely producing net harm right this minute, but in the sense that the feedback mechanisms do not exist to make sure that we do not produce net harm. It’s hard to know how bad this is, but Alice’s 18k social cost versus the 690 private hurdle is worrisome if it is typical.
That is to say, all this work on alignment is misaligned. Organizations influencing the funnel size don’t internalize the costs borne by unsuccessful applicants. This incentivizes maximizing application volume (a visible proxy) rather than welfare-maximizing matches—a classic setup for Goodhart’s Law.
That said, it’s not even clear what we should be aligned to; the equilibria for maximizing personal life satisfaction for applicants, or maximizing total field impact, may differ.
If we want to make a credible claim to be impact-driven, it is the latter, the net public good, that needs to be prioritized, because “please donate so that a bunch of monied professionals can self-actualize” is not a great pitch.
For Individuals: Knowing the Game
For mid-career individuals, the decision is high-stakes. (For early-career individuals, costs c are lower, making the gamble more favourable, but the need to estimate p remains.)
Calculate your threshold ($p^*$): Use the model in Part A (and the linked calculator). Without strong evidence that $p > p^*$, a pivot involving significant unpaid time is likely EV-negative.
Seek cheap signals about whether you are in fact a unicorn: Look for personalized evidence of fit—such as applying to a few roles before leaving your current job—before committing significant resources.
Use grants as signals: Organizations like Open Philanthropy offer career transition grants. These serve as information gates. If received, a grant lowers the private cost (c). If denied, it is a valuable calibration signal. If a major funder declines to underwrite the transition, candidates should update p downwards. (If you don’t get that Open Phil transition grant, don’t quit your current job.)
Change the game by doing something other than applying for a job:
Achieving impact by getting AI Safety on the agenda at your current job (if it’s tech-related) is often overlooked. You can have a huge impact by influencing your current employer to take AI safety seriously, without needing to pivot careers.
Founding or joining a new organization can bring multiple roles into existence, reducing the need to compete for existing roles.
For Organizations: Transparency and Feedback
Employers and advice organizations control the information flow. Unless they provide evidence-based estimates of success probabilities, their generic encouragement should be treated with scepticism.
Publish stage-wise acceptance rates (Base Rates). Employers must publish historical data (applicants, interviews, offers) by track and seniority. This is the single most impactful intervention for anchoring p.
Provide informative feedback and rank. Employers should provide standardized feedback or an indication of relative rank (e.g., “top quartile”). This feedback is costly, but this cost must be weighed against the significant systemic waste currently externalized onto applicants and the long-term credibility of the field.
Track advice calibration. Advice organizations should track and publish their forecast calibration (e.g., Brier scores) regarding candidate success. If an advice organization doesn’t track outcomes, its advice cannot be calibrated except by coincidence.
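For readers unfamiliar with the Brier score: it is the mean squared difference between forecast probabilities and 0/1 outcomes. Lower is better, and a forecaster who always predicts the base rate sets a natural benchmark. Here is a minimal sketch with made-up forecasts (hypothetical data, not from any advice organization).

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes (lower is better)."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equally long, non-empty sequences")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical advisor forecasts that a candidate gets an offer, and what happened.
forecasts = [0.6, 0.3, 0.8, 0.1, 0.5]
outcomes = [0, 0, 1, 0, 0]

base_rate = sum(outcomes) / len(outcomes)  # 0.2 here
advisor = brier_score(forecasts, outcomes)
benchmark = brier_score([base_rate] * len(outcomes), outcomes)
print(round(advisor, 3), round(benchmark, 3))  # 0.15 0.16
```

Here the hypothetical advisor beats the base-rate benchmark only slightly. Publishing both numbers is what would let applicants judge whether the advice carries information.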
For the Field: Systemic Calibration
To optimize the funnel size, the field needs to measure costs and impact tails.
Estimate applicant costs (cℓ). Advice organizations or funders should survey applicants (successful and unsuccessful) to estimate typical pivot costs.
Track realized impact proxies. Employers should analyze historical cohorts to determine if widening the funnel is still yielding significantly better hires, or if returns are rapidly diminishing.
Experiment with mechanism design. In capacity-constrained rounds, implementing soft caps—pausing applications after a certain number—can reduce applicant-side waste without significantly harming match quality [@Horton2024Reducing].
Where next?
I’d like feedback from people deeper in the AI safety career ecosystem about what I’ve gotten wrong. Is the model here sophisticated enough to capture the main dynamics? I’d love to chat with people from 80,000 Hours, MATS, FHI, CHAI, Redwood Research, Anthropic, etc., about this. What is your model about the candidate impact distribution, the tail behaviour, and the costs? What have I got wrong? What have I missed? I’m open to the possibility that this is well understood and being actively managed behind the scenes, but I haven’t seen it laid out this way anywhere.
Further reading
Resources that complement the mechanism-design view of the AI safety career ecosystem:
Christopher Clay, AI Safety’s Talent Pipeline is Over-optimised for Researchers
AI Safety Field Growth Analysis 2025
Why experienced professionals fail to land high-impact roles. Context deficits and transition traps that explain why even strong senior hires often bounce out of the AI safety funnel.
Levelling Up in AI Safety Research Engineering —EA Forum. A practical upskilling roadmap; complements the “lower c, raise V, raise p” levers by reducing risk before a pivot.
SPAR AI—Safety Policy and Alignment Research program. An example of a program that provides structured training and, implicitly, some “negative previews” of the grind of AI safety work.
MATS retrospectives —LessWrong. Transparency on acceptance rates, alumni experiences, and obstacles faced in this training program.
Why not just send people to Bluedot on FieldBuilding Substack. A critique of naive funnel-building and the hidden costs of over-sending candidates to “default” programs.
How Stuart Russell’s IASEAI conference failed to live up to its potential (FBB #8) —EA Forum. A cautionary tale about how even well-intentioned field-building efforts can misfire without mechanism design.
80,000 Hours career change guides. Practical content on managing costs, transition grants, and opportunity cost—useful for calibrating c in the pivot-EV model.
Forecasting in personal decisions, 80k. Advice on making and updating stage-wise probability forecasts; relevant to candidate calibration.
AI safety technical research—Career review
Updates to our research about AI risk and careers, 80,000 Hours
The case for taking your technical expertise to the field of AI policy, 80,000 Hours
Center for the Alignment of AI Alignment Centers. A painfully relatable satire that deserves citing here.
AMA: Ask Career Advisors Anything—EA Forum
Appendices
See the original post.
To be consistent we need to take this to be a local linear approximation at your current wage and impact level; so we are implicitly looking at marginal utility.