There are at least four ways of thinking about replaceability:
1. The naive view, where your impact is the value you produced using some amount of resources.
2. The single comparison view, where your impact is the value you produced using some amount of resources, minus the value someone else would've produced using those resources had you not done so.
3. The replacement view, where your impact is the value you produced using some amount of resources, minus the value the "replacement-level person" of the relevant reference group would've produced using those resources. The replacement-level person is the person who's only barely talented enough to enter a field.
4. The God view, where your impact is the total utility in the world where you do the thing, minus the total utility in the world where you don't do the thing. I assume for the purposes of this post that this is the normatively correct view, in order to have a benchmark.
Key takeaways:
The replacement view may avoid some practical failure modes of the naive and single comparison views.
I'm sure the naive and single comparison views are wrong, but I'm less sure whether that makes a large difference for people's decisions in practice. I assign 20% credence to the claim "the naive and single comparison views lead people to make avoidably and substantially wrong career decisions ≥10% of the time".
Maybe we should consider adopting the replacement view.
I'm pretty sure the replacement view is more accurate than the naive and single comparison views, but I'm less sure whether it'd improve people's decisions in practice. I assign 15% credence to the claim "people aiming to do good with their careers would have noticeably more impact were they to use the replacement view".
That said, there are complicated effects to account for that are beyond the scope of the replacement view, e.g. supply and demand elasticities and lag times for changes to percolate through a system.
The evidence in this post comes mainly from, in decreasing order of importance, (i) me reasoning about the problem, (ii) me doing some Monte Carlo simulations and (iii) somewhat analogous methods being used in sports.
The Monte Carlo simulations test how often views (1), (2) and (3) lead to the same career choice as view (4) under four idealised scenarios.
The naive and single comparison views do badly under some scenarios, but the replacement view does well under all scenarios.
The simulations make a lot of assumptions (see the Appendix for more) in order to more easily model the problem, and so should be taken with a grain of salt.
What Replaceability Is
You want to do some good in the world. But to do good you need resources, and some of those will be in limited supply. If you use scarce resources, that means someone else won't. Now they can't do as much good as they could've. So how should you decide which scarce resources to use, or whether to use them at all?
Replaceability is a partial answer to this. Replaceability is usually taken to mean something like "the extent to which someone else would do what you'd do in a job if you don't take the job". I interpret it more generally, as "the extent to which other people would've produced similar value with a scarce resource had you not used it".[1]
The paradigmatic example is choosing a career: if you're thinking of becoming a doctor to save hundreds of lives, you should perhaps keep in mind that someone else might've saved those lives had you chosen a different career. If so, the doctor profession is highly replaceable. But this is just one of many situations where replaceability may be a factor:
| Decision | Scarce resource |
| --- | --- |
| Choosing a job | Salary, opportunities for direct impact and support from employer and colleagues |
| Asking someone to be a mentor | Mentor's time and energy |
| Applying for a grant | Grant, grantmaker's time and grantmaker's connections |
| Marketing a resource to the community | Prestige and community attention |
Four Views on Impact
We can look at impact in at least four different ways:
The naive view. Your impact is the value you produced directly and indirectly using some amount of resources.
This view, which I think is common outside effective altruism, is incorrect because it fails to consider the counterfactual.
The single comparison view. Your impact is the value you produced directly and indirectly using some amount of resources, minus the value someone else would've produced using those same resources had you not done so.
This view, which I think is common within effective altruism, is incorrect because it fails to consider trickle-down effects. That is, the person who replaces you would've been replaced by someone less able, who in turn would've been replaced by someone less able, and so on. Each of these steps may involve an additional loss of impact.
The replacement view. Your impact is the value you produced directly and indirectly using some amount of resources, minus the value the "replacement-level person" of the relevant reference group would've produced using those same resources.
This is an effective altruist variant of what sabermetricians call Wins Above Replacement, a stat that aims to measure how many more wins a baseball player contributes to a team than the replacement-level player. The replacement-level player is the substitute who would've been called up to join the team had the player being measured not participated.
The replacement view seems confused, at least as a parallel to sports, because in sports it's designed to compare people, not actions.
Benjamin Todd proposes something similar: that your impact is the value you produce, minus the value someone else would've produced in your position, plus the value of any additionally freed-up resources. But it seems hard to me to estimate the value of those freed-up resources; it feels like punting.
The God view. Your impact is the value produced by everyone (including you) in the world where you use some amount of resources, minus the value that would've been produced by everyone in the world where you didn't use those resources. (Because God is omniscient.)
This view is perhaps normatively correct but requires perfect information and computing power.
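To make the four definitions concrete, here is a minimal Python sketch for a single toy field. It borrows the Appendix's convention that a person's value in a job is talent times the job's multiplier. Everything else is an assumption made for illustration: the function names, the toy numbers, the reading of "someone else" as the next most talented person stepping into your job, and the reading of the replacement-level person as the least talented person who still holds a job.

```python
def total_effort(talents, multipliers):
    """Pair the most talented people with the best jobs and sum talent * multiplier."""
    people = sorted(talents, reverse=True)
    jobs = sorted(multipliers, reverse=True)
    return sum(t * m for t, m in zip(people, jobs))  # zip drops people without a job


def four_views(talents, multipliers, your_rank):
    """Return your impact under each view; your_rank=0 is the most talented person."""
    people = sorted(talents, reverse=True)
    jobs = sorted(multipliers, reverse=True)
    employed = min(len(people), len(jobs))
    assert your_rank < employed, "you must hold a job"
    your_job = jobs[your_rank]
    your_value = people[your_rank] * your_job

    # Assumption: the next most talented person would step into your job.
    next_talent = people[your_rank + 1] if your_rank + 1 < len(people) else 0
    # Assumption: the replacement-level person is the least talented job holder.
    marginal_talent = people[employed - 1]

    # God view: re-match everyone else to the jobs in the world without you.
    without_you = people[:your_rank] + people[your_rank + 1:]
    return {
        "naive": your_value,
        "single_comparison": your_value - next_talent * your_job,
        "replacement": your_value - marginal_talent * your_job,
        "god": total_effort(people, jobs) - total_effort(without_you, jobs),
    }


# Hypothetical field: five people, three jobs, and you are the most talented.
views = four_views(talents=[10, 8, 6, 4, 2], multipliers=[3, 2, 1], your_rank=0)
print(views)
```

With these toy numbers the single comparison view credits you with 6, while the God view credits you with 12, because everyone below you also moves up one job when you leave. The replacement view happens to land on 12 as well here; that coincidence depends on the numbers chosen and need not hold in general.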
Benjamin Todd, Paul Christiano and others have thought and written about replaceability, but I think it's fair to say no one has reached any definitive conclusion: it's a hard problem. Todd recommends focusing on other, more robustly predictable factors, like personal fit and scale and solvability, when choosing a career. In fact, there's been remarkably little discussion about replaceability in the past few years, I think partly because people have realised that replaceability differs less across career options than those other things (personal fit, etc.), and partly because people have tactically retreated from a difficult problem.
I think this is somewhat unfortunate, as replaceability seems to me to be decision-relevant and somewhat tractable, even if not as important as the problems Todd and Christiano have moved on to.
Why Does Replaceability Matter?
There are ~1M licensed physicians in the US alone, which is ~2 orders of magnitude more than the number of effective altruists globally. I'd expect there to be a difference of ≥1 order of magnitude in the number of people working on different problems within effective altruism too. If the number of people working in a field affects replaceability in a way that matters when estimating counterfactual impact, it seems important to know how it matters.
To imagine a rather extreme scenario, I think it's easy for someone who's choosing between two jobs, one in a field with lots of people and one that basically only they can do, to have heard of replaceability and think, "Though this thing that only I can do seems less important than this thing that many other people are doing, if I don't work on the thing that many people are doing, someone else nearly as good as me would take my place, so instead I should work on this thing that only I can do." But if something like the replacement view is a better representation of the truth than the single comparison view, this person may be making a grievous error!
Sports seems somewhat analogous here. Sports teams are also making decisions about how to allocate scarce resources (e.g. playing time), and they do so by explicitly considering replaceability. (It's easier for them; they have good measures of how teams and players perform.)
For example, suppose a hockey team is deciding whether to sign a star left winger or equally talented star right winger. Suppose currently its top left winger is near star calibre, whereas its top right winger is middling. Suppose its worst left winger is so bad as to be a liability, whereas its worst right winger is pretty good. If the team only compared each potential addition to the player whose spot they'd take, they'd go with the star right winger (who's much better than the currently best right winger on the team, who's middling). But it may be better for the team to sign the star left winger (to get rid of the marginal left winger, who's a liability).[2]
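The hockey example can be put into numbers with a small Python sketch. The player ratings below are invented purely for illustration, the function names are hypothetical, and the sketch ignores ice time entirely: it treats team strength as the plain sum of ratings and assumes the weakest player at the position is the one who leaves.

```python
def gain_vs_top_player(roster, newcomer):
    """Single comparison: how much better is the newcomer than the current top player?"""
    return newcomer - max(roster)


def gain_for_team(roster, newcomer):
    """Whole-roster comparison: the newcomer joins and the weakest player is dropped."""
    new_roster = sorted(roster + [newcomer], reverse=True)[:len(roster)]
    return sum(new_roster) - sum(roster)


# Hypothetical ratings on a 1-10 scale.
left_wingers = [9, 6, 4, 1]   # near-star top player, liability at the bottom
right_wingers = [6, 6, 5, 5]  # middling top player, pretty good bottom player
star = 10                     # either signing is rated the same

for position, roster in (("left", left_wingers), ("right", right_wingers)):
    print(position, gain_vs_top_player(roster, star), gain_for_team(roster, star))
```

Under these made-up ratings, comparing only against the top player favours the right winger (a gain of 4 against 1), while the whole-roster comparison favours the left winger (a gain of 9 against 5).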
I think it's currently unclear how we are looking or should look at replaceability. My impression is that the single comparison view was circulated once long ago and that, ever since its flaws became apparent, there's been something of a vacuum. With what do people fill this vacuum? I have no idea. But we make decisions, so it must be something.
Simulations
I've run Monte Carlo simulations with the aim of seeing how these views perform in four simple, idealised scenarios. The scenarios pit two fields (as in, spheres of activity) against each other. (For more on how the simulations were run, see the Appendix.) The percentages signify how often you'd choose Field A if you took a given view:
| # | Field A | Field B | Naive | Single Comparison | Replacement | God |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 1K ppl (medium talent), 100 jobs | 1K ppl (medium talent), 100 jobs | 49.9% | 50.7% | 50.0% | 50.1% |
| 2 | 1K ppl (high talent), 100 jobs | 1K ppl (medium talent), 100 jobs | 69.6% | 62.3% | 61.8% | 62.4% |
| 3 | 1K ppl (medium talent), 100 jobs | 100 ppl (medium talent), 10 jobs | 50.7% | 26.0% | 53.4% | 49.4% |
| 4 | 1K ppl (medium talent), 100 jobs | 200 ppl (medium talent), 100 jobs | 67.9% | 55.8% | 56.2% | 56.5% |
I interpret the results as follows, scenario by scenario:
1. The fields are identical so the decision is a toss-up. Indeed, all four views choose Field A ~50% of the time.
2. Field A has more talented people than Field B so we should pick it more often. That's because you're drawn randomly from each field's talent pool, so you'll usually end up being more talented in Field A than in Field B.
   - Most views capture this pretty well (using the God view as our ground truth), ending up at 62%.
   - But the naive view is kind of off at 70%. It's too optimistic about higher-talent fields when jobs are scarce. I think that's because it ignores trickle-down effects. This doesn't matter when everyone has a job; in that case, the naive view is nearly identical to the replacement view. But when jobs are scarce there's a difference between these, and the difference grows as the overall talent of the field grows.
3. People are equally talented in both fields, but Field B has 10% as many people and jobs as Field A.
   - This is also a toss-up. The naive (51%) and replacement (53%) views do well here with the God view (49%) as ground truth.
   - But the single comparison view does poorly (26%). It's too optimistic about fields with fewer people since the distance to the next most talented person will tend to be larger there. But this distance doesn't matter much if there's a trickle-down effect.
4. People are equally talented in both fields, but Field B has fewer people relative to its number of jobs (2x) than does Field A (10x).
   - This isn't a toss-up: we should choose Field A 57% of the time. That's because we stipulated that you're always in a position to choose between the two fields, meaning, because there are fewer people for each job in Field B, a randomly chosen person who gets a job there is less talented than a randomly chosen person who gets a job in Field A (which is more competitive). So we'll tend to pick Field A more.
   - The single comparison (56%) and replacement (56%) views do well here.
   - But the naive view does poorly (68%). It's too pessimistic about fields where there's less competition in the labour market, though I'm not sure why.
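One way to read the table at a glance is to compute, for each view, how far its P(choose Field A) sits from the God view's in each scenario. The short Python sketch below does that with the percentages copied from the table; the variable and function names are illustrative. Note the limits of this reading: two views can choose Field A equally often overall while disagreeing on individual runs, so a small gap is suggestive rather than conclusive.

```python
# P(choose Field A) in percent, copied from the results table, scenarios 1 to 4.
results = {
    "naive": [49.9, 69.6, 50.7, 67.9],
    "single comparison": [50.7, 62.3, 26.0, 55.8],
    "replacement": [50.0, 61.8, 53.4, 56.2],
}
god = [50.1, 62.4, 49.4, 56.5]


def gaps_to_god(view_percentages):
    """Absolute gap to the God view per scenario, in percentage points."""
    return [round(abs(v - g), 1) for v, g in zip(view_percentages, god)]


gaps = {view: gaps_to_god(percentages) for view, percentages in results.items()}
worst = {view: max(view_gaps) for view, view_gaps in gaps.items()}

for view in results:
    print(f"{view}: gaps {gaps[view]}, worst {worst[view]}")
```

On these numbers the largest gap is 11.4 points for the naive view (scenario 4), 23.4 points for the single comparison view (scenario 3) and 4.0 points for the replacement view (scenario 3).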
Like any model, this one rests on a number of simplifying assumptions. None of these results are guaranteed to hold outside the model's world. Still ...
The Replacement View Seems Useful, Maybe
Though I don't feel sure enough to actually make recommendations to people faced with career decisions, I'm tentatively bullish on the replacement view. Arguments for:
There may be serious failure modes for both the naive and single comparison views.
If we trust the simulations, the naive view seems too bullish on fields with better candidates, and also too bearish on fields with less competition in the labour market. The single comparison view is too bullish on fields with fewer people and jobs (even if the people-to-jobs ratio is constant).
If we don't trust the simulations, there still seem to be intuitive issues like not considering counterfactuals at all (the naive view) and not considering trickle-down effects (the single comparison view).
I don't know of any practical alternatives to these views. I have come across considerations that seem important and have to do with replaceability, but no action-guiding frameworks or theories.
These same failure modes may not affect the replacement view. The replacement view does well in the simulations, though I wouldn't put too much weight on that. It also just takes into account counterfactuals as well as trickle-down effects, which seems good.
It seems workable. I think people often have a pretty good idea of what the replacement-level person in their field looks like, either from studying with people who wouldn't or who would only barely break into the field, or from working with them, or from seeing the things they produce. If one hasn't worked or studied in a field, one can still get a pretty good impression by talking with people in the field, or reading about it, or, again, seeing the things people in the field produce.
Arguments against:
It seems unworkable. It's unclear which reference group to use when locating the replacement-level person. Say you're a wild animal welfare researcher. Is the correct reference group all wild animal welfare researchers? Is it all animal welfare researchers? Is it all impact-focused animal welfare researchers? Is it impact-focused people period? Or something else?
It seems conceptually confused. I don't have a mathematical proof for the replacement view. It isn't, as far as I know, solidly grounded in economic or moral theory.
In sports, stats like Wins Above Replacement are used to compare the impact of people, not the impact of actions. (Choosing which player to sign is an action, but I'm not sure whether the assumptions hold if you look at it that way.) That makes me suspicious of simply transposing it over to career decisions.
Maybe thinking about replaceability is getting too in the weeds when we still haven't figured out more important considerations. It seems likely to me that factors like how pressing a problem is, career capital and so on are substantially more important than replaceability. Maybe those factors swamp replaceability, such that it's basically not worth thinking about when you could instead be thinking about those other things?
Maybe replaceability is subsumed by personal fit. If personal fit is the distance between a candidate and the average candidate, it's analogous to the replacement view, which is the distance between a candidate and the replacement-level candidate. These measures should correlate. I think the way they differ depends on how talent is distributed.
They both help you predict how much effort you add to a problem.[3] (They don't say anything about scale or solvability.) So maybe we only need one of them.
However, when I ran the same simulations with a "personal fit view", it picked the wrong thing in ≥30% of runs in scenario (2) and in ≥17% of runs in scenario (4). Maybe the right framing is something like "we should start thinking of personal fit as not comparing ourselves to the average, but comparing ourselves to the replacement level"?
My simulation code may be buggy.
Replaceability is also more complicated than this post makes it out to be. For example, in the real world:
It can take time for changes to percolate down, unlike in sports where an empty position must be filled immediately. A company may not immediately find a replacement, if it does at all.
Perhaps the model could include a probability P that some person will take a job had one not taken it, as Benjamin Todd does here. This probability would depend on how many candidates there are, how talented they are, how broadly openings are advertised and so on. These factors surely differ from field to field.
Employers don't always hire the best candidates. However, I expect them to choose the better candidate more often than not, so maybe it's correct to say that they hire the best candidates in expectation.
People don't always know what they're a good fit for. That means one may end up displacing someone into a career where they end up doing much more good. But again, I expect people to have a pretty clear picture most of the time of how good they'd be at a thing.
Talent may not be lognormally distributed. I'm pretty sure it follows a heavy-tailed distribution (that seems to be the case in hockey, baseball, programming and labour in the UK[4]), but it's not clear to me which one, and this may make a difference.
Choosing to use a scarce resource may increase or decrease the supply of that resource (on top of the amount one used), and the degree to which this happens can vary from field to field. For example, if more people try to be doctors, hospitals can pay them lower salaries, meaning they have more money to spend (assuming the labour market for doctors has non-zero supply and demand elasticities).
The problem that replaceability is meant to address is a pretty rare one. There doesn't seem to be much research on it. There aren't that many situations where people (1) pursue the same goal[5], (2) don't usually coordinate, (3) use shared scarce resources with substantial supply and demand and (4) are able to use those resources to varying degrees of efficiency. But effective altruists are in this rare situation.
Appendix: Monte Carlo Simulations
Here is the procedure I used to simulate career choices based on the four replaceability views:
1. For two different fields, Field A and Field B, generate N people with different Talent Levels, and M jobs with different "Effort Multipliers".[6] Assume these are lognormally distributed.[7]
   - Effort Multipliers represent the fact that some jobs allow a person to get more work done towards solving a problem than other jobs, e.g. by providing more opportunities or better support.
   - We can describe the effort added by a person working a job with the formula Effort = Talent Level × Effort Multiplier. So the total effort of people working on a problem (corresponding to the God view) is given by the sum of Talent Level × Effort Multiplier for all person-job pairs in both fields. (We assume people who don't get a job produce zero effort.)
2. For each field, assign the most talented people to the jobs with the highest Effort Multipliers, one at a time, until there are no more available jobs (or people).
3. Select a person at random from each field, excluding those people who didn't get a job.[8] This is you and your talent at each thing. You're now going to decide which field to work in.
4. For each of the four views, calculate the value of choosing Field A, and the value of choosing Field B.
   - We get a person's naive effort in a field by multiplying their Talent Level in that field by the Effort Multiplier of the job they'd get, i.e. Your Effort = Your Talent Level × Your Effort Multiplier.
   - Your effort in each field according to the naive view is simply Your Effort.
   - Your effort in each field according to the single comparison view is Your Effort − The Person Who'd Replace You's Effort.
   - Your effort in each field according to the replacement view is Your Effort − The Replacement-Level Person's Effort. Remember, the replacement-level person is the person who'd barely get a job in the field.
5. Repeat steps (1) to (4) 10,000 times for each scenario.
6. Estimate P(choose Field A) for each of the views.
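The procedure above can be sketched in Python roughly as follows. This is not the code behind the results table, and its output need not match the table. Several details are assumptions: the lognormal parameters (the post doesn't give them), modelling a higher-talent field by raising `talent_mu`, evaluating the replacer and the replacement-level person in your job rather than their own, and all function and parameter names.

```python
import random


def draw_field(n_people, n_jobs, talent_mu, rng, sigma=1.0):
    """Step 1: lognormal talents and effort multipliers, each sorted best first."""
    talents = sorted((rng.lognormvariate(talent_mu, sigma) for _ in range(n_people)), reverse=True)
    mults = sorted((rng.lognormvariate(0.0, sigma) for _ in range(n_jobs)), reverse=True)
    return talents, mults


def total_effort(talents, mults):
    """Step 2: best person gets best job; people without a job add nothing."""
    return sum(t * m for t, m in zip(talents, mults))


def view_values(talents, mults, rng):
    """Steps 3 and 4: pick a random job holder as 'you' and value the field under each view."""
    employed = min(len(talents), len(mults))
    rank = rng.randrange(employed)
    you, mult = talents[rank], mults[rank]
    naive = you * mult
    # Assumption: the next most talented person would take over your job.
    next_talent = talents[rank + 1] if rank + 1 < len(talents) else 0.0
    # Assumption: the replacement-level person is the last one to get a job.
    marginal_talent = talents[employed - 1]
    without_you = talents[:rank] + talents[rank + 1:]
    return {
        "naive": naive,
        "single": naive - next_talent * mult,
        "replacement": naive - marginal_talent * mult,
        "god": total_effort(talents, mults) - total_effort(without_you, mults),
    }


def p_choose_a(field_a, field_b, runs=10_000, seed=0):
    """Steps 5 and 6: repeat and estimate P(choose Field A) per view (ties go to B)."""
    rng = random.Random(seed)
    counts = dict.fromkeys(("naive", "single", "replacement", "god"), 0)
    for _ in range(runs):
        a = view_values(*draw_field(*field_a, rng), rng)
        b = view_values(*draw_field(*field_b, rng), rng)
        for view in counts:
            counts[view] += a[view] > b[view]
    return {view: count / runs for view, count in counts.items()}


# A scenario shaped like (3): fields given as (people, jobs, talent_mu).
print(p_choose_a((1000, 100, 0.0), (100, 10, 0.0), runs=500))
```

Comparing each field's God-view value (total effort with you minus total effort without you) is equivalent to comparing total effort across both fields, since your absence from one field is the same in either comparison.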
References
Page, Scott E. 2018. The Model Thinker: What You Need to Know to Make Data Work for You. Basic Books.
Pearl, Judea, Madelyn Glymour, and Nicholas P Jewell. 2016. Causal Inference in Statistics: A Primer. John Wiley & Sons.
[1] Replaceability is different from counterfactuals. Pearl, Glymour, and Jewell (2016) describe a counterfactual as "an 'if' statement in which the 'if' portion is untrue or unrealized". This involves tallying up all the ways a thing would've gone differently. Replaceability is a special kind of counterfactual reasoning, dealing only with the use (or non-use) of a scarce resource.
[2] True, ice time makes this a more subtle calculation. Signing the star left winger means the near-star-calibre left winger gets pushed down to the second line, meaning their (considerable) impact is reduced. But I think it serves as an example of these kinds of considerations mattering in practice.
[3] I frame it as "how much effort you add", not "how much impact you have", because impact also depends on other things, in particular the problem areas' relative scale (defined, after 80,000 Hours, as Good Done ÷ % of Problem Solved) and solvability (% of Problem Solved ÷ % Increase in Effort). Focusing on effort alone is cleaner as we can bracket those other concepts. As far as this post is concerned, all problems have the same scale and solvability.
NB. "Increase in Effort" is called by 80,000 Hours "Increase in Resources", but since I'm already using the word "resource" to refer to labour, time and money, I'm calling it "Increase in Effort" instead.
[4] Some posts, like this one, point to income and researcher citation count as evidence of this (emphasis mine): "If job performance is like income, or the number of citations people have on academic papers, it is more like a log normal distribution[.] That is, most aspiring academics have few citations, while some have thousands, tens of thousands, or even hundreds of thousands. [...] We're very unsure about this question, and would like to see more research into it. Some evidence we've seen suggests that output is normally distributed even in 'complex' jobs, like being a doctor. However, for the most difficult and creative work, like academic research, we suspect that the variance is high in the tails. Even there, it's hard to be confident since many measures of output (such as citation count) are likely to overstate differences in productivity."
As alluded to in the quoted passage, I think income and citation count aren't good evidence. Even if talent is normally distributed (i.e. follows a bell curve), salary and citations could well have heavy tails due to nonlinear effects later in the causal chain. The Matthew effect, where having an advantage gets you further advantages, applies here too, as well-cited papers are more likely to get further citations independently of quality, and richer people are more likely to get more money regardless of talent.
[6] This model implicitly takes neglectedness and personal fit into account: neglectedness (and importance) is captured by a job's impact level, and personal fit is captured by a person's talent level.
[7] Page (2018) writes: "In some cases, we may know the mean of the distribution and also know that all values must be positive. Given those constraints, the maximal entropy distribution must have a long tail, and as we spread the distribution across more values, we must balance high values with many low-value outcomes."
I don't know the means of these talent distributions, but it does seem likely to me that (a) talent can't be negative and (b) the distance between the average talent and zero talent is smaller than the distance between the average talent and the greatest talent. That seems like a pretty good justification for a heavy-tailed distribution.
[8] Note that this means that, if the number of people exceeds the number of jobs, you'll tend to have an above average talent level. If there are 10x as many people as jobs, for example, you're randomly selected from the top 10% of the talent pool (i.e. from above the 90th percentile).
Impact above Replacement
Link post
Summary
There are at least four ways of thinking about replaceability:
The naive view, where your impact is the value you produced using some amount of resources.
The single comparison view, where your impact is the value you produced using some amount of resources, minus the value someone else wouldâve produced using those resources had you not done so.
The replacement view, where your impact is the value you produced using some amount of resources, minus the value the âreplacement-level personâ of that reference group wouldâve produced using those resources. The replacement-level person is the person whoâs only barely talented enough to enter a field.
The God view, where your impact is the total utility in the world where you do the thing, minus the total utility in the world where you donât do the thing. I assume for the purposes of this post that this is the normatively correct view, in order to have a benchmark.
Key takeaways:
The replacement view may avoid some practical failure modes of the naive and single comparison views.
Iâm sure the naive and single comparison views are wrong, but Iâm less sure whether that makes a large difference for peopleâs decisions in practice. I assign 20% credence to the claim âthe naive and single comparison views lead people to take avoidably and substantially wrong career decisions â„10% of the timeâ.
Maybe we should consider adopting the replacement view.
Iâm pretty sure the replacement view is more accurate than the naive and single comparison views, but Iâm less sure whether itâd improve peopleâs decisions in practice. I assign 15% credence to the claim âpeople aiming to do good with their careers would have noticeably more impact were they to use the replacement viewâ.
That said, there are complicated effects to account for that are beyond the scope of the replacement view, e.g. supply and demand elasticities and lag times for changes to percolate through a system.
The evidence in this post comes mainly from, in decreasing order of importance, (i) me reasoning about the problem, (ii) me doing some Monte Carlo simulations and (iii) somewhat analogous methods being used in sports.
The Monte Carlo simulations test how often views (1), (2) and (3) lead to the same career choice as view (4) under four idealised scenarios.
The naive and single comparison views do badly under some scenarios, but the replacement view does well under all scenarios.
The simulations make a lot of assumptions (see the Appendix for more) in order to more easily model the problem, and so should be taken with a grain of salt.
What Replaceability Is
You want to do some good in the world. But to do good you need resources, and some of those will be in limited supply. If you use scarce resources, that means someone else wonât. Now they canât do as much good as they couldâve. So how should you decide which scarce resources to use, or whether to use them at all?
Replaceability is a partial answer to this. Replaceability is usually taken to mean something like âthe extent to which someone else would do what youâd do in a job if you donât take the jobâ. I interpret it more generally, as âthe extent to which other people wouldâve produced similar value with a scarce resource had you not used itâ.[1]
The paradigmatic example is choosing a career: if youâre thinking of becoming a doctor to save hundreds of lives, you should perhaps keep in mind that someone else mightâve saved those lives had you chosen a different career. If so, the doctor profession is highly replaceable. But this is just one of many situations where replaceability may be a factor:
Four Views on Impact
We can look at impact in at least four different ways:
The naive view. Your impact is the value you produced directly and indirectly using some amount of resources.
This view, which I think is common outside effective altruism, is incorrect because it fails to consider the counterfactual.
The single comparison view. Your impact is the value you produced directly and indirectly using some amount of resources, minus the value someone else wouldâve produced using those same resources had you not done so.
This view, which I think is common within effective altruism, is incorrect because it fails to consider trickle-down effects. That is, the person who replaces you wouldâve been replaced by someone less able, who in turn wouldâve been replaced by someone less able, and so on. Each of these steps may involve an additional loss of impact.
The replacement view. Your impact is the value you produced directly and indirectly using some amount of resources, minus the value the âreplacement-level personâ of that reference group wouldâve produced using those same resources.
This is an effective altruist variant of what sabermetricians call Wins Above Replacement, a stat that aims to measure how many more wins a baseball player contributes to a team than the replacement-level player. The replacement-level player is the substitute who wouldâve been called up to join the team had the player being measured not participated.
The replacement view seems confused, at least as a parallel to sports, because in sports itâs designed to compare people, not actions.
Benjamin Todd proposes something similar: that your impact is the value you produce, minus the value someone else wouldâve produced in your position, plus the value of any additionally freed-up resources. But it seems hard to me to estimate the value of those freed-up resources â it feels like punting.
The God view. Your impact is the value produced by everyone (including you) in the world where you use some amount of resources, minus the value that wouldâve been produced by everyone in the world where you didnât use those resources. (Because God is omniscient.)
This view is perhaps normatively correct but requires perfect information and computing power.
Benjamin Todd, Paul Christiano and others have thought and written about replaceability, but I think itâs fair to say no one has reached any definitive conclusion: itâs a hard problem. Todd recommends focusing on other, more robustly predictable factors, like personal fit and scale and solvability, when choosing a career. In fact, thereâs been remarkably little discussion about replaceability in the past few years, I think partly because people have realised that replaceability differs less across career options than those other things (personal fit, etc.), and partly because people have tactically retreated from a difficult problem.
I think this is somewhat unfortunate, as replaceability seems to me to be decision-relevant and somewhat tractable, even if not as important as the problems Todd and Christiano have moved on to.
Why Does Replaceability Matter?
There are ~1M licensed physicians in the US alone, which is ~2 orders of magnitude more than the number of effective altruists globally. I'd expect there to be a difference of ≥1 order of magnitude in the number of people working on different problems within effective altruism too. If the number of people working in a field affects replaceability in a way that matters when estimating counterfactual impact, it seems important to know how it matters.
To imagine a rather extreme scenario, I think it's easy for someone who's choosing between two jobs, one in a field with lots of people and one that basically only they can do, to have heard of replaceability and think, "Though this thing that only I can do seems less important than this thing that many other people are doing, if I don't work on the thing that many people are doing, someone else nearly as good as me would take my place, so instead I should work on this thing that only I can do." But if something like the replacement view is a better representation of the truth than the single comparison view, this person may be making a grievous error!
Sports seems somewhat analogous here. Sports teams are also making decisions about how to allocate scarce resources (e.g. playing time), and they do so by explicitly considering replaceability. (It's easier for them; they have good measures of how teams and players perform.)
For example, suppose a hockey team is deciding whether to sign a star left winger or equally talented star right winger. Suppose currently its top left winger is near star calibre, whereas its top right winger is middling. Suppose its worst left winger is so bad as to be a liability, whereas its worst right winger is pretty good. If the team only compared each potential addition to the player whose spot they'd take, they'd go with the star right winger (who's much better than the currently best right winger on the team, who's middling). But it may be better for the team to sign the star left winger (to get rid of the marginal left winger, who's a liability).[2]
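The hockey example can be made concrete with a toy calculation. This is only a sketch: the skill ratings below are made-up numbers chosen to match the verbal description, the function names are hypothetical, and ice time is ignored entirely.

```python
# Hypothetical skill ratings (higher is better); not real data.
STAR = 10.0                      # either star winger
left_wing = [9.0, 6.0, 1.0]      # near-star top player, liability at the bottom
right_wing = [6.0, 5.5, 5.0]     # middling top player, decent bottom player


def single_comparison_gain(line, newcomer):
    """Compare the newcomer only to the player whose spot they take (the current best)."""
    return newcomer - line[0]


def roster_gain(line, newcomer):
    """Compare total line skill before and after: the newcomer pushes everyone
    down one slot and the worst player drops off the roster."""
    new_line = sorted(line + [newcomer], reverse=True)[:len(line)]
    return sum(new_line) - sum(line)


for name, line in (("left", left_wing), ("right", right_wing)):
    print(name,
          "single comparison:", single_comparison_gain(line, STAR),
          "whole roster:", roster_gain(line, STAR))
```

With these invented ratings the single comparison favours the right winger, while the whole-roster comparison favours the left winger, because the gain comes from removing the weakest player rather than from outdoing the strongest one.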
I think it's currently unclear how we are looking or should look at replaceability. My impression is that the single comparison view was circulated once long ago and that, ever since its flaws became apparent, there's been something of a vacuum. With what do people fill this vacuum? I have no idea. But we make decisions, so it must be something.
Simulations
I've run Monte Carlo simulations with the aim of seeing how these views perform in four simple, idealised scenarios. The scenarios pit two fields (as in, spheres of activity) against each other. (For more on how the simulations were run, see the Appendix.) The percentages signify how often you'd choose Field A if you took a given view:
I interpret the results as follows:
(1) The fields are identical so the decision is a toss-up. Indeed, all four views choose Field A ~50% of the time.
(2) Field A has more talented people than Field B so we should pick it more often. That's because you're drawn randomly from each field's talent pool, so you'll usually end up being more talented in Field A than in Field B.
Most views capture this pretty well (using the God view as our ground truth), ending up at 62%.
But the naive view is kind of off at 70%. It's too optimistic about higher-talent fields when jobs are scarce. I think that's because it ignores trickle-down effects. This doesn't matter when everyone has a job: in that case, the naive view is nearly identical to the replacement view. But when jobs are scarce there's a difference between these, and the difference grows as the overall talent of the field grows.
(3) People are equally talented in both fields, but Field B has 10% as many people and jobs as Field A.
This is also a toss-up. The naive (51%) and replacement (53%) views do well here with the God view (49%) as ground truth.
But the single comparison view does poorly (26%). It's too optimistic about fields with fewer people since the distance to the next most talented person will tend to be larger there. But this distance doesn't matter much if there's a trickle-down effect.
(4) People are equally talented in both fields, but Field B has fewer people relative to its number of jobs (2x) than does Field A (10x).
This isn't a toss-up: we should choose Field A 57% of the time. That's because we stipulated that you're always in a position to choose between the two fields. Because there are fewer people for each job in Field B, a randomly chosen person who gets a job there is less talented than a randomly chosen person who gets a job in Field A (which is more competitive). So we'll tend to pick Field A more.
The single comparison (56%) and replacement (56%) views do well here.
But the naive view does poorly (68%). It's too pessimistic about fields where there's less competition in the labour market, though I'm not sure why.
Like any model, this one rests on a number of simplifying assumptions. None of these results are guaranteed to hold outside the model's world. Still ...
The Replacement View Seems Useful, Maybe
Though I don't feel sure enough to actually make recommendations to people faced with career decisions, I'm tentatively bullish on the replacement view. Arguments for:
There may be serious failure modes for both the naive and single comparison views.
If we trust the simulations, the naive view seems too bullish on fields with better candidates, and also too bearish on fields with less competition on the labour market. The single comparison view is too bullish on fields with fewer people and jobs (even if the people-to-jobs ratio is constant).
If we don't trust the simulations, there still seem to be intuitive issues like not considering counterfactuals at all (the naive view) and not considering trickle-down effects (the single comparison view).
I don't know of any practical alternatives to these views. I have come across considerations that seem important and have to do with replaceability, but no action-guiding frameworks or theories.
These same failure modes may not affect the replacement view. The replacement view does well in the simulations, though I wouldn't put too much weight on that. It also just takes into account counterfactuals as well as trickle-down effects, which seems good.
It seems workable. I think people often have a pretty good idea of what the replacement-level person in their field looks like, either from studying with people who wouldn't or who would only barely break into the field, or from working with them, or from seeing the things they produce. If one hasn't worked or studied in a field, one can still get a pretty good impression by talking with people in the field, or reading about it, or, again, seeing the things people in the field produce.
Arguments against:
It seems unworkable. It's unclear which reference group to use when locating the replacement-level person. Say you're a wild animal welfare researcher. Is the correct reference group all wild animal welfare researchers? Is it all animal welfare researchers? Is it all impact-focused animal welfare researchers? Is it impact-focused people period? Or something else?
It seems conceptually confused. I don't have a mathematical proof for the replacement view. It isn't, as far as I know, solidly grounded in economic or moral theory.
In sports, stats like Wins Above Replacement are used to compare the impact of people, not the impact of actions. (Choosing which player to sign is an action, but I'm not sure whether the assumptions hold if you look at it that way.) That makes me suspicious of simply transposing it over to career decisions.
Maybe thinking about replaceability is getting too in the weeds when we still haven't figured out more important considerations. It seems likely to me that factors like how pressing a problem is, career capital and so on are substantially more important than replaceability. Maybe those factors swamp replaceability, such that it's basically not worth thinking about when you could instead be thinking about those other things?
Maybe replaceability is subsumed by personal fit. If personal fit is the distance between a candidate and the average candidate, it's analogous to the replacement view, which is the distance between a candidate and the replacement-level candidate. These measures should correlate. I think the way they differ depends on how talent is distributed.
They both help you predict how much effort you add to a problem.[3] (They don't say anything about scale or solvability.) So maybe we only need one of them.
However, when I ran the same simulations with a "personal fit view", it picked the wrong thing in ≥30% of runs in scenario (2) and in ≥17% of runs in scenario (4). Maybe the right framing is something like "we should start thinking of personal fit as not comparing ourselves to the average, but comparing ourselves to the replacement level"?
My simulation code may be buggy.
Replaceability is also more complicated than this post makes it out to be. For example, in the real world:
It can take time for changes to percolate down, unlike in sports where an empty position must be filled immediately. A company may not immediately find a replacement, if it does at all.
Perhaps the model could include a probability P that some other person would have taken a job had one not taken it, as Benjamin Todd does here. This probability would depend on how many candidates there are, how talented they are, how broadly openings are advertised and so on. These factors surely differ from field to field.
Employers don't always hire the best candidates. However, I expect them to choose the better candidate more often than not, so maybe it's correct to say that they hire the best candidates in expectation.
People don't always know what they're a good fit for. That means one may end up displacing someone into a career where they end up doing much more good. But again, I expect people to have a pretty clear picture most of the time of how good they'd be at a thing.
Talent may not be lognormally distributed. I'm pretty sure it follows a heavy-tailed distribution (that seems to be the case in hockey, baseball, programming and labour in the UK[4]), but it's not clear to me which one, and this may make a difference.
Choosing to use a scarce resource may increase or decrease the supply of that resource (on top of the amount one used), and the degree to which this happens can vary from field to field. For example, if more people try to be doctors, hospitals can pay them lower salaries, meaning they have more money to spend (assuming the labour market for doctors has non-zero supply and demand elasticities).
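As a rough illustration of the probability P mentioned in the list above, one could discount the replacer's effort by the chance that the job gets filled at all. This is my own minimal sketch, not Todd's formula: the function name and the linear weighting are assumptions.

```python
def expected_counterfactual_effort(your_effort, replacer_effort, p_replaced):
    """Expected effort you add if someone else would have filled the job
    with probability p_replaced (hypothetical helper, linear weighting assumed)."""
    if not 0.0 <= p_replaced <= 1.0:
        raise ValueError("p_replaced must be a probability between 0 and 1")
    return your_effort - p_replaced * replacer_effort


# p_replaced = 0 reproduces the naive view;
# p_replaced = 1 reproduces the single comparison view.
for p in (0.0, 0.5, 1.0):
    print(p, expected_counterfactual_effort(10.0, 8.0, p))
```

Under this framing the naive and single comparison views are the two endpoints of one scale, and estimating P for a given field decides where between them one should sit.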
The problem that replaceability is meant to address is a pretty rare one. There doesn't seem to be much research on it. There aren't that many situations where people (1) pursue the same goal[5], (2) don't usually coordinate, (3) use shared scarce resources with substantial supply and demand and (4) are able to use those resources to varying degrees of efficiency. But effective altruists are in this rare situation.
Appendix: Monte Carlo Simulations
Here is the procedure I used to simulate career choices based on the four replaceability views:
(1) For two different fields, Field A and Field B, generate N people with different Talent Levels, and M jobs with different "Effort Multipliers".[6] Assume these are lognormally distributed.[7]
Effort Multipliers represent the fact that some jobs allow a person to get more work done towards solving a problem than other jobs, e.g. by providing more opportunities or better support.
We can describe the effort added by a person working a job with the formula Effort = Talent Level × Effort Multiplier. So the total effort of people working on a problem (corresponding to the God view) is given by the sum of Talent Level × Effort Multiplier for all person-job pairs in both fields. (We assume people who don't get a job produce zero effort.)
(2) For each field, assign the most talented people to the jobs with the highest Effort Multipliers, one at a time, until there are no more available jobs (or people).
(3) Select a person at random from each field, excluding those people who didn't get a job.[8] This is you and your talent in each field. You're now going to decide which field to work in.
(4) For each of the four views, calculate the value of choosing Field A, and the value of choosing Field B.
We get a person's naive effort in a field by multiplying their Talent Level in that field by the Effort Multiplier of the job they'd get, i.e. Your Effort = Your Talent Level × Your Effort Multiplier.
Your effort in each field according to the naive view is simply Your Effort.
Your effort in each field according to the single comparison view is Your Effort − The Person Who'd Replace You's Effort.
Your effort in each field according to the replacement view is Your Effort − The Replacement-Level Person's Effort. Remember, the replacement-level person is the person who'd barely get a job in the field.
(5) Repeat steps (1) to (4) 10,000 times for each scenario.
(6) Estimate P(choose Field A) for each of the views.
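The procedure above could be sketched roughly as follows. This is not the code behind the reported percentages. The lognormal parameters, the field sizes and all function names are placeholders, and three modelling choices are my own assumptions: the person who'd replace you is the next most talented person taking over your job, the replacement-level person's effort is that of the last person hired in their own job, and the God view compares total effort with you against total effort after everyone below you moves up one job.

```python
import numpy as np

VIEWS = ("naive", "single", "replacement", "god")


def assign(talents, multipliers):
    """Step 2: sort both descending so the best people get the best jobs."""
    return np.sort(talents)[::-1], np.sort(multipliers)[::-1]


def field_values(t, m, i):
    """Step 4: value of person i (0 = most talented, must hold a job) under each view."""
    k = min(len(t), len(m))                      # number of filled jobs
    effort = t[i] * m[i]                         # naive view
    # Assumption: the next most talented person would take over your job.
    single = effort - (t[i + 1] * m[i] if i + 1 < len(t) else 0.0)
    # Assumption: the replacement-level person is the last one hired.
    replacement = effort - t[k - 1] * m[k - 1]
    # Assumption: without you, everyone below moves up one job (trickle-down).
    without = np.delete(t, i)
    k2 = min(len(without), len(m))
    god = np.sum(t[:k] * m[:k]) - np.sum(without[:k2] * m[:k2])
    return dict(zip(VIEWS, (effort, single, replacement, god)))


def simulate(field_a, field_b, n_runs=10_000, seed=0):
    """Steps 1 to 6. Each field is (n_people, n_jobs, mean of log-talent)."""
    rng = np.random.default_rng(seed)
    wins = {view: 0 for view in VIEWS}
    for _ in range(n_runs):
        values = []
        for n_people, n_jobs, talent_mu in (field_a, field_b):
            t, m = assign(rng.lognormal(talent_mu, 1.0, n_people),
                          rng.lognormal(0.0, 1.0, n_jobs))
            you = rng.integers(min(n_people, n_jobs))   # step 3: a random job holder
            values.append(field_values(t, m, you))
        for view in VIEWS:
            wins[view] += int(values[0][view] > values[1][view])
    return {view: wins[view] / n_runs for view in VIEWS}


# Hypothetical version of a scenario where Field B is a tenth the size of Field A.
print(simulate((1000, 100, 0.0), (100, 10, 0.0), n_runs=2000))
```

Agreement between a view and the `god` entry is what the percentages in the Simulations section are meant to capture, though with different assumptions the numbers from this sketch need not match the ones reported there.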
References
Page, Scott E. 2018. The Model Thinker: What You Need to Know to Make Data Work for You. Basic Books.
Pearl, Judea, Madelyn Glymour, and Nicholas P Jewell. 2016. Causal Inference in Statistics: A Primer. John Wiley & Sons.
[1] Replaceability is different from counterfactuals. Pearl, Glymour, and Jewell (2016) describe a counterfactual as "an 'if' statement in which the 'if' portion is untrue or unrealized". This involves tallying up all the ways a thing would've gone differently. Replaceability is a special kind of counterfactual reasoning, dealing only with the use (or non-use) of a scarce resource.
[2] True, ice time makes this a more subtle calculation. Signing the star left winger means the near-star-calibre left winger gets pushed down to the second line, meaning their (considerable) impact is reduced. But I think it serves as an example of these kinds of considerations mattering in practice.
[3] I frame it as "how much effort you add", not "how much impact you have", because impact also depends on other things, in particular the problem areas' relative scale (defined, after 80,000 Hours, as Good Done ÷ % of Problem Solved) and solvability (% of Problem Solved ÷ % Increase in Effort). Focusing on effort alone is cleaner as we can bracket those other concepts. As far as this post is concerned, all problems have the same scale and solvability.
NB. "Increase in Effort" is called by 80,000 Hours "Increase in Resources", but since I'm already using the word "resource" to refer to labour, time and money, I'm calling it "Increase in Effort" instead.
[4] Some posts, like this one, point to income and researcher citation count as evidence of this (emphasis mine): "If job performance is like income, or the number of citations people have on academic papers, it is more like a log normal distribution[.] That is, most aspiring academics have few citations, while some have thousands, tens of thousands, or even hundreds of thousands. [...] We're very unsure about this question, and would like to see more research into it. Some evidence we've seen suggests that output is normally distributed even in 'complex' jobs, like being a doctor. However, for the most difficult and creative work, like academic research, we suspect that the variance is high in the tails. Even there, it's hard to be confident since many measures of output (such as citation count) are likely to overstate differences in productivity."
As alluded to in the quoted passage, I think income and citation count aren't good evidence. Even if talent is normally distributed (i.e. follows a bell curve), salary and citations could well have heavy tails due to nonlinear effects later in the causal chain. The Matthew effect (where having an advantage gets you further advantages) applies here too, as well-cited papers are more likely to get further citations independently of quality, and richer people are more likely to get more money regardless of talent.
[5] Benjamin Todd calls this a "shared aim community".
[6] This model implicitly takes neglectedness and personal fit into account: neglectedness (and importance) is captured by a job's impact level, and personal fit is captured by a person's talent level.
[7] Page (2018) writes: "In some cases, we may know the mean of the distribution and also know that all values must be positive. Given those constraints, the maximal entropy distribution must have a long tail, and as we spread the distribution across more values, we must balance high values with many low-value outcomes."
I don't know the means of these talent distributions, but it does seem likely to me that (a) talent can't be negative and (b) the distance between the average talent and zero talent is smaller than the distance between the average talent and the greatest talent. That seems like a pretty good justification for a heavy-tailed distribution.
[8] Note that this means that, if the number of people exceeds the number of jobs, you'll tend to have an above-average talent level. If there are 10x as many people as jobs, for example, you're randomly selected from the top 10% of the talent pool (i.e. from above the 90th percentile).