Note: This report was produced with only one week of desktop research, for the purpose of identifying promising causes to evaluate at depth. We only have low confidence in our findings here, and the conclusions should generally be taken by readers as merely suggestive rather than determinative.

Summary

Factoring in the expected benefits of higher economic productivity (i.e. greater economic output and also improved health), as well as the tractability of educational streaming advocacy, I find the marginal expected value of educational streaming advocacy to boost productivity to be 5,582 DALYs per USD 100,000, which is around 9x as cost-effective as giving to a GiveWell top charity (CEA).

Key Points

Importance: This is a strongly important cause, with 2.01 * 10⁸ DALYs potentially accruable if productivity could be permanently raised by 1% in low-income countries. Around 76% of the potential gains are economic in nature, while the remaining 24% is health related.
Neglectedness: Educational levels are rising, but slowly, and whatever governments/charities/businesses are doing to solve this problem in low-income countries is probably insufficient.
Tractability: A moderately tractable solution is available, in the form of advocating for the policy of educational streaming (i.e. tailoring teaching to student learning levels). Empirically, streaming does help improve academic performance, which in turn boosts growth; at the same time, it seems fairly plausible that an NGO could successfully lobby a government to carry out this educational reform (which many rich world countries already implement), and that a persuaded government would be able to successfully test students and then sort them to be taught according to their abilities.

Caveats

This report was produced with only one week of research, and critically, only desktop research was used, without experts consulted due to the lack of time. More research – at the intermediate stage and subsequently deep stage – will be needed before we can have high confidence in these findings.
The headline cost-effectiveness will almost certainly fall if this cause area is subjected to deeper research: (a) this is empirically the case, from past experience; and (b) theoretically, we suffer from the optimizer’s curse (where causes appear better than the mean partly because they are genuinely more cost-effective but also partly because of random error favouring them, and when deeper research fixes the latter, the estimated cost-effectiveness falls). As it happens, CEARCH does not intend to perform deeper research in this area, given that the headline cost-effectiveness does not meet our threshold of 10x that of a GiveWell top charity.

Further Discussion

On the positive side, CEARCH’s findings here converge with Founders Pledge’s research on education; they too find streaming (as well as salt iodisation) to be an effective educational intervention. Note that our findings are genuinely independent, insofar as (a) our approach is very different, and (b) we did not consult FP’s research until after this report was done – a good sign, insofar as it suggests that the findings may be robust.
On the negative side, our estimate of the rate at which streaming will likely be reversed as an educational policy depends on a very limited number of case studies.
Meanwhile, our estimate of how educational gains will be counterfactually accrued even absent intervention – and hence our estimate of the counterfactual income effects of the intervention – relies on a very imprecise projection based on past test score trends.
Similarly, our estimate of the health benefits from increased growth relies on a very general model.
With respect to tractability, one should have low confidence in our finding that education in general and streaming in particular is the most effective intervention to boost productivity – there is evidence for this, but the literature is vast (and riven with disagreement besides), and the research presented here is nowhere close to being comprehensive.
Moreover, our results are sensitive to certain point estimates (e.g. the Duflo, Dupas and Kremer estimate of how streaming improves test scores, or the Hanushek and Kimko estimate of how test scores increase GDP per capita growth).
Perhaps the biggest weakness in our tractability analysis is that it does not take into account the future growth in primary and secondary enrollment, which would increase both the benefits (from an increasing number of students being taught effectively) and costs (from more testing); and with the former outweighing the latter, it is likely that the headline cost-effectiveness number is an underestimate, at least in this respect.
We do not model the numerous potential side-effects from improved education via streaming, which may include reduced child marriage (excellent) and increased student stress (not so much).

Expected Benefit: Increased Economic Output

Naturally, the primary expected benefit of improved productivity is just increased economic output/higher GDP. Overall, if productivity could be boosted by just 1% per annum in the context of low-income countries, around 1.53 * 10⁸ DALYs could potentially be gained, with this benefit modelled in the following way.

Moral Weights: I take the value of doubling consumption for one person for one year to be 0.21 DALYs. This is calculated as a function of (a) the value of consumption relative to life from GiveWell’s IDinsight survey of the community perspective, as adjusted for social desirability bias, and (b) CEARCH’s estimate of the value of a full, healthy life in DALY terms. For more details, refer to CEARCH’s evaluative framework.

Scale: For this analysis, I normalize the degree of income increase per person to a 1% increase in productivity per annum (i.e. treating this as the baseline against which subsequent analysis of any productivity-increasing intervention is done). Correspondingly, this yields a 1% degree of consumption doubling per person. At the same time, the total number of people in low income countries in 2024 is around 739 million. Put together, the total number of consumption doublings achievable from raising productivity by 1% per annum for everyone in low income countries is 7.39 million.

Persistence: The potential economic gains from increased productivity are not available only in a single year, but rather accrue over many years. In terms of how this multi-year benefit is calculated:

Firstly, I discount for the probability of the solution not persisting. In this case, the solution is educational streaming (i.e. sorting students by ability and tailoring teaching to them accordingly); this is potentially one of the most impactful productivity-boosting interventions available, and this point will be discussed at greater length later. For now, note that there is of course a year-on-year chance that such an educational policy is reversed. To calculate the rate of policy reversal on educational streaming, I look at three case studies – Singapore, Malaysia and Finland.

In Singapore, primary streaming (at primary grade three) and secondary streaming (at secondary grade two) began in 1979 and 1980 respectively, with primary schools offering both a normal course (3 years) as well as an extended course (5 years), while secondary schools offered the special/express course (4 years), the normal course (4 years) as well as vocational education.

Using the end dates of the respective primary and secondary streaming policies enacted in 1979 and 1980 respectively, we can calculate the reversal rate for streaming policy implementation by taking the number of years in which reversal occurred and dividing by the total number of years in which reversal could have happened.

That said, streaming (in the fundamental sense of sorting students by ability levels) has never been reversed in Singapore.

Primary streaming was changed in 1992, as the normal vs extended course differentiation (which left some students taking the Primary School Leaving Examinations, or PSLE, after 6 vs 8 years of primary education) was replaced by the EM1/2/3 tracks (where different streams sat for different mother tongue papers and the science paper was not required for EM3 pupils, but all sat for the PSLE in Primary 6) - but this was streaming nonetheless. Similarly, when EM1 & EM2 were merged in 2004, this still left schools being able to determine which of their students could sit for the higher mother tongue paper in the PSLE; and when EM3 was discontinued in 2008, this was replaced by subject-based banding (i.e. subject-specific streaming).

As for secondary streaming, while subject-based banding will replace the old streaming system in 2024, this is just streaming, and indeed streaming at a more granular level. The upshot of all this is that streaming policy, initially implemented in 1979/80, seems fairly ingrained in Singapore, which has seen no reversal over the potential years it could have been abolished (i.e. a 0% reversal rate).
Meanwhile, in Malaysia, streaming into arts vs science streams (students in the former typically being weaker than those in the latter) began in 1979 and still continues as of 2023, despite plans to replace it being seriously considered in recent years—and in any case, it is unclear if the proposed new system of allowing students to choose their own elective subjects on an ad hoc basis would not have ended up as de facto subject-based banding anyway, as students select into their stronger subjects and out of their weaker ones. In short, there have been zero years of reversal out of all the potential reversal years, such that there is a 0% reversal rate, so far.
In Finland, mandatory schooling began in 1921 with students being able to attend either the selective university-track oppikoulu or the non-selective kansakoulu, but this system was eventually phased out starting in 1972 with streaming abolished and all students broadly having the same learning goals. Given 52 years of possible reversal and 1 year in which reversal finally started, this comes out to about a 1.92% reversal rate.

In aggregating these reversal rates, equal weightage is used – yielding a reversal rate of around 0.6% per annum.
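As a rough check on the aggregation above, the following sketch recomputes the equal-weighted reversal rate from the three case studies. It assumes, as the Finland figure implies, that each rate is the number of reversal years divided by the number of possible reversal years; the variable names are illustrative only.

```python
# Annual reversal rate for each case study, as described in the text.
# Singapore and Malaysia: no reversal observed, so the rate is 0.
# Finland: 1 reversal year out of 52 possible years.
finland_rate = 1 / 52

case_study_rates = {
    "Singapore": 0.0,
    "Malaysia": 0.0,
    "Finland": finland_rate,
}

# Equal weighting across the three case studies.
mean_reversal_rate = sum(case_study_rates.values()) / len(case_study_rates)

print(f"Finland: {finland_rate:.2%}")        # Finland: 1.92%
print(f"Average: {mean_reversal_rate:.2%}")  # Average: 0.64%
```

The average of about 0.64% is what the text rounds to 0.6% per annum.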

Secondly, I take into account the proportion of the potential benefit counterfactually remaining.

On the one hand, the potential benefit can diminish because productivity increases anyway due to educational gains that would have happened regardless (call this the per capita effect). Note that we can factor out all productivity gains from other sources, as those are not factored in as a potential benefit in the first place. As for modelling the productivity gains from educational advances that would have happened anyway (e.g. because of schools, whether government or private non-profit or private for-profit, improving pedagogy; or because of economic development bringing with it more resources that both state and private actors can spend on education) – we can roughly do this by modelling the average rate of convergence in harmonised learning outcome score (as calculated by Altinok, Angrist, and Patrinos), from the estimated 2024 global average score to the current top score, on the basis that this exhausts most of the gains available (given that diminishing marginal returns apply to productivity gains from education). From this, I estimate approximately a 0.4% per annum loss in counterfactual benefits, owing to the fact that education is improving (and the world is capturing the associated productivity gains) in any case.

On the other hand, the potential benefit can grow due to population growth increasing the number of people who can potentially benefit from having their consumption doubled (call this the population effect). For data, I pull from UN Population Division estimates up to 2100, and then assume constancy after. Diagram 1 shows global population over time, relative to the 2024 base.

Diagram 1: Global population, 2024-2100

Thirdly, I discount for the probability of the world being destroyed anyway (i.e. general existential risk discount) – around 0.07% per annum. This takes into account the probability of extinction, since the benefits of increased productivity to the people who would enjoy it would be nullified if said people would die in an extinction event anyway. For how this risk is calculated, refer to CEARCH’s shallow research on nuclear war.

Fourthly, I apply a broad uncertainty discount of 0.1% per annum to take into account the fact that there is a non-zero chance that in the future, the benefits or costs do not persist for factors we do not and cannot identify in the present (e.g. actors directing resources to solve the problem when none are currently doing so).

Overall, by taking population growth, and discounting the number of potential beneficiaries each year by the various per annum discounts (i.e. solution reversal, existential risk, uncertainty), the total proportion of the benefit that is counterfactually achievable by a total solution relative to the 2024 baseline is shown in Diagram 2.

Diagram 2: Total proportion of productivity gains that is counterfactually achievable by a total solution

Finally, by summing the discounted per annum relative values for 2024-2100, and then using a perpetual value formula for 2101 to infinity, we see that the benefit of increased economic output will last for the equivalent of 98 baseline years.
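To illustrate the structure of this calculation, here is a minimal sketch that sums the discounted relative values for 2024-2100 and then adds a perpetuity tail. It is not the report's model: it assumes the four per-annum rates quoted above (0.6%, 0.4%, 0.07% and 0.1%) compound multiplicatively, and it uses a hypothetical flat population series in place of the UN projections, so it will not reproduce the 98-year figure.

```python
def baseline_year_equivalents(population_index, annual_discounts):
    """Sum discounted relative benefits, then add a perpetuity tail.

    population_index: population relative to the first year (first entry 1.0),
        one entry per modelled year (2024-2100 in the report).
    annual_discounts: per-annum discount rates, applied multiplicatively.
    """
    # Share of the benefit retained from one year to the next.
    retention = 1.0
    for rate in annual_discounts:
        retention *= 1.0 - rate

    # Explicitly modelled years (year 0 is undiscounted).
    total = 0.0
    for t, pop in enumerate(population_index):
        total += pop * retention ** t

    # Perpetuity tail: population held constant at its final level, with the
    # benefit continuing to decay geometrically every year thereafter.
    n = len(population_index)
    tail = population_index[-1] * retention ** n / (1.0 - retention)
    return total + tail


# Reversal, counterfactual education gains, existential risk, uncertainty.
discounts = [0.006, 0.004, 0.0007, 0.001]
flat_population = [1.0] * 77  # 2024-2100 inclusive; hypothetical flat series
years = baseline_year_equivalents(flat_population, discounts)
print(round(years, 1))
```

Under the flat-population assumption this comes to roughly 86 baseline years; feeding in a growing population index, as the report does, pushes the total higher.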

Value of Outcome: Overall, the raw value of increased economic output is 1.53 * 10⁸ DALYs, in the context of a 1% increase in productivity per annum in low-income countries.

Probability of Occurrence: There are two issues here – the probability of our chosen intervention working, and the probability that (even if the intervention succeeds in raising productivity) the increased productivity translates to increased income. The former issue – on intervention effectiveness – I leave to the final section of this report. On the latter – I raise this for conceptual completeness, but it’s a fairly trivial issue, insofar as claims on any goods or services produced will either be the owner’s (i.e. his income), or else those of the capital owners and labourers he has to pay off (i.e. their income). Hence, the probability of productivity increasing income can be assigned ~1.

Expected Value: Hence, the expected value of increased economic output is 1.53 * 10⁸ DALYs, in the context of a 1% increase in productivity per annum in low-income countries.
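The expected value above is the product of the moral weight, scale, persistence and probability figures given in this section. The sketch below multiplies the rounded inputs quoted in the text; variable names are illustrative, and the small gap from the reported 1.53 * 10⁸ presumably reflects rounding of those inputs.

```python
value_per_doubling = 0.21   # DALYs per person-year of doubled consumption
population = 739e6          # people in low-income countries, 2024
income_gain = 0.01          # 1% productivity gain, treated as 1% of a doubling
baseline_years = 98         # persistence, in baseline-year equivalents
probability = 1.0           # productivity gains translate into income (~1)

# Consumption doublings achievable per year across the whole population.
doublings_per_year = population * income_gain

expected_dalys = (value_per_doubling * doublings_per_year
                  * baseline_years * probability)

print(f"{doublings_per_year:,.0f} doublings per year")
print(f"{expected_dalys:.3g} DALYs")
```

With these rounded inputs the result is about 1.52 * 10⁸ DALYs, within 1% of the reported figure.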

Expected Benefit: Improved Health from Increased Productivity

With economic improvement comes also the expected benefit of improved health. Overall, if productivity is boosted by 1% per annum in low-income countries, around 4.81 * 10⁷ DALYs from health gains are potentially accruable, with this benefit modelled as follows.

Moral Weights: Here, what we’re interested in is the DALYs averted per capita for each percentage point in growth of GDP per capita. To estimate this, I take the difference in DALYs lost per capita between high-income and low-income countries, and divide by the time needed for low-income countries to catch up to current high-income GDP per capita levels from current low-income GDP per capita levels at 1% growth per annum. DALYs lost per capita are age-standardized to factor out composition effects (i.e. richer countries happening to have older and less healthy citizens). In all, this method suggests that around 0.0007 DALYs per individual are gained from a 1% increase in productivity per annum in low income countries.

Scale: In terms of potential beneficiaries, the total number of people in low income countries in the baseline year of 2024 is again 739 million.

Persistence: The same per annum discounts and the same projections of population growth over time, as discussed in the previous section, are used here as well, such that the benefit of improved health from increased productivity will similarly last for the equivalent of 98 baseline years.

Value of Outcome: Overall, the raw perpetual value of improved health from increased productivity is 4.81 * 10⁷ DALYs, in the context of a 1% increase in productivity per annum in low-income countries.

Probability of Occurrence: The probability of productivity improving health is a function of the probability of (a) productivity increasing income (~1, as discussed previously), and (b) the probability of increased income improving health. With respect to the latter, various studies – Baird et al, Filmer & Pritchett, Pritchett & Summers – point to the conclusion that increases in GDP per capita are associated with reductions in mortality in low- and middle-income countries. Hence, the idea that increased economic output leads to improved health (i.e. fewer deaths and less disability as well as pain) has a very high probability, and given the strong theoretical reasons for this (i.e. with increased economic output, countries can spend more on sanitation/nutrition/access to healthcare etc), this probability can be assigned ~1. And overall, therefore, the probability of productivity improving health outcomes can similarly be assigned ~1; there is no material uncertainty over this.

Expected Value: All in all, the expected value of improved health from increased productivity is 4.81 * 10⁷ DALYs, in the context of a 1% increase in productivity per annum in low-income countries.
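The health figure follows the same moral weight x scale x persistence structure. As a sketch, the code below multiplies the rounded inputs from this section and then backs out the per-person weight implied by the reported total. The implied weight is an inference from the published numbers, not a figure stated in the report.

```python
dalys_per_person = 0.0007   # rounded moral weight quoted in the text
population = 739e6          # people in low-income countries, 2024
baseline_years = 98         # persistence, in baseline-year equivalents
reported_total = 4.81e7     # health DALYs reported in the text

# Total using the rounded moral weight.
total_from_rounded = dalys_per_person * population * baseline_years

# Per-person weight that would reproduce the reported total exactly.
implied_weight = reported_total / (population * baseline_years)

print(f"{total_from_rounded:.3g} DALYs from rounded inputs")
print(f"{implied_weight:.6f} implied DALYs per person")
```

The rounded inputs give about 5.07 * 10⁷ DALYs, while the reported 4.81 * 10⁷ corresponds to a weight of roughly 0.00066 per person, which rounds to the quoted 0.0007.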

Tractability

The real question in all this, of course, is whether we can plausibly boost productivity above current trend – and if so, how much of a productivity boost we can achieve (e.g. will we gain less than, equal to, or more than the nominal 1% baseline?)

To summarize our findings on tractability: we can capture 0.1% of the above-mentioned income/health gains from a 1% productivity boost, via a USD 3.43 million investment into advocacy for educational streaming, which means the proportion of the problem solved per additional USD 100,000 spent is around 0.00003.

To begin with, let us discuss what intervention we should even be relying upon, to boost economic productivity:

In terms of potential interventions to boost productivity – this is such a broad area that it will be impossible to be comprehensive, let alone at a shallow research stage. However, generally speaking, we can first try to identify what factors drive an improvement in productivity. Empirically, Dieppe et al run a regression on data from 60 countries observed from 1960 to 2018. Their analysis uses (i) a Bayesian approach that combines information from a wide range of models while favouring simple models with high explanatory power, so as to identify key correlates of productivity growth from a pool of 29 candidate variables; and (ii) initial values of the potential drivers of productivity, rather than averages or changes during the sample period, to address potential concerns over reverse causality. They find that higher productivity growth rates between 1960 and 2018 were associated with the following conditions in 1960: (a) higher investment as a share of GDP; (b) a better-educated workforce (proxied by average years of schooling); (c) stronger institutions (proxied by the rule of law); (d) greater innovation (proxied by a higher number of patents per capita); (e) higher urbanization; (f) lower inflation; and (g) economic complexity.
Note that reverse causality is still a potential problem, with productivity driving the potential driver (since possibly pre-1960 productivity growth rates are positively autocorrelated with productivity growth rates from 1960 to 2018, which would be positively correlated with driver growth rates over that same period, which would in turn be positively autocorrelated with pre-1960 driver growth rates, which would then be positively correlated with 1960 driver levels). That said, this is sufficiently unlikely given the theoretical and empirical case against consistently positive autocorrelation between past and future productivity growth (n.b. respectively, the idea of copying technological/educational/managerial innovations etc and catch-up growth being easier than inventing further new ideas, and the fact that advanced economies underperform the global average of productivity growth while developing economies over-perform). Hence, we can be fairly confident that reverse causality is probably not driving the results and that the candidate drivers really do affect productivity and not the other way around.
With all that said and done, per Dieppe’s analysis we have 6 potential intervention-types for raising total factor productivity (n.b. investment raises labour productivity, but not TFP, and is subject to both diminishing marginal returns and the constraint of depreciation besides, making it unlikely to be an impactful intervention overall): (b) education; (c) institutions; (d) innovation/R&D; (e) urbanization; (f) macroeconomic stability; and (g) economic complexity.
I rule out (c), (e) and (g) on pure tractability grounds - (c) it is extremely difficult to persuade governments to try and change institutions, let alone help them successfully do so; (e) urbanization is driven by deep-set economic factors that aren’t easily altered by the action of agents; and finally (g) it appears to me difficult to radically change an economy’s complexity as a whole, relative to radically reforming specific things like the education system/approach to R&D/macroeconomic policy (which will already be hard enough).
Using Dieppe’s ranking of the relative importance of various drivers for low and middle income countries (LMICs), education appears by far the most impactful intervention; hence, we will focus on that, specifically in the context of low income countries.
To narrow down from intervention-type to specific intervention (in education, to improve student learning), I consult Evans and Popova. This review of systematic reviews suggests that tailoring teaching to student learning levels, as well as individualized repeated teacher training interventions associated with a specific task or tool, are what’s consistently shown to be effective at raising student test scores.
In terms of choosing between these two interventions, my general sense is that training teachers effectively is harder than simply testing and streaming students, such that the tailoring intervention may be more effective in expectation.
A necessary caveat to all this, of course, is that this analysis is necessarily limited and shallow—many possibilities as well as lines of evidence have almost certainly been left out, and all conclusions should be taken with a hefty grain of salt.

Having settled on an intervention, we can now lay out our theory of change:

Step 1: Lobby a government to implement streaming (i.e. tailoring teaching to student learning levels)
Step 2: The persuaded government successfully implements streaming (i.e. they actually manage to test students to identify ability levels and then actually organize them into different classes/schools to be taught at different levels)
Step 3: Streaming increases educational performance
Step 4: Improved educational performance increases income and hence captures the available economic/health gains

Step 1: To estimate the probability of successfully lobbying a government to implement streaming, I consult both the outside and inside view.

For the outside view, I consult three reference classes: the success rate of general lobbying attempts in (a) the US and (b) the EU; and on top of that, I look at the success rate of (c) nonprofit advocacy attempts in China. The US/EU cases are theoretically less representative insofar as lobbying in rich countries is harder than lobbying poor ones, but at the same time China is probably uniquely difficult to influence due to the closed political system. Ultimately, equal weightage is used, producing an aggregate outside view probability of 32%.

For the inside view, I reason as follows. Streaming/tracking seems fairly common, whether in East Asia (e.g. Singapore/China/Japan/South Korea) or Europe (e.g. Austria/Germany/Hungary/Slovakia) – hence, the idea will not seem especially radical to a policymaker, and correspondingly it doesn’t seem at all improbable that a government is convinced by a nonprofit working on this to try out streaming (i.e. not <=10% chance of success). At the same time, advocacy is fundamentally hard, and it seems unlikely that this will have a >=50% chance of success. Given the context of working in low-income countries where governments defer more to NGOs than rich world governments do, I would tend to say that the average chance of success is on the moderately higher end of these stipulated bounds – perhaps 33% or so.

When aggregating the outside and inside views, it’s important to note that it’s unclear how lobbying success rates in rich world countries (or China, which is fairly sui generis in its political system) translates to low-income countries; on the other hand, there are the usual worries about inferential uncertainties for the inside view. Hence, equal weightage is used, which yields a combined probability of 32%.

Step 2: To estimate the probability of a persuaded government successfully implementing streaming, I once more consult both the outside and inside views.

For the outside view, I use as reference classes the same three countries examined in the context of policy reversal – Singapore, Malaysia, and Finland – and examine their execution of official streaming policy. Singapore, on its part, successfully streams secondary school students into express vs normal streams, and Malaysia likewise successfully streams its high school students into arts vs science streams. As for Finland – before it moved to a comprehensive system, it successfully streamed students into grammar schools vs public comprehensives. The upshot of all this is a 100% success rate (instances of success per attempt to stream) for each country and hence in aggregate (n.b. equal weightage is used, not that it matters).

For the inside view, I reason as follows. It’s conceptually not difficult at all to test students by ability and then sort them into the relevant classes/schools to be taught to different extents/at different speeds. Hence, the chances of success do not appear very low (i.e. <=10%); indeed, they appear better than even (i.e. >50%). On the other hand, some of these low income countries are fairly war-torn/beset by political instability, and it’s not out of the question that such countries are unable to carry out basic functions like administering national tests and organizing schools into different educational streams; hence, I conservatively assume that there is a 66% chance of successfully carrying out streaming conditional on a low income government trying.

In combining the outside and inside views, I use equal weightage, producing an 83% success rate for streaming implementation – though inside views are typically beset by inferential uncertainties, the outside view is fairly unrepresentative (i.e. taking reference from high/medium income countries when the median intervention country will be Malawi at best and the Congo at worst).

Step 3: To estimate the extent to which streaming increases educational performance, in test score standard deviations, I use an empirical estimate – per Duflo, Dupas and Kremer’s RCT in Kenya, streaming increases test performance by 0.182 SD in the short run and by 0.235 SD in the long run – I use an average of both (i.e. 0.209).

Step 4: Finally, to estimate the extent to which improved educational performance increases income and hence captures the available economic/health gains, I use another empirical estimate – per Hanushek and Kimko, a one SD increase in test scores translates to a 1.4% increase in GDP per capita growth. In modelling the aggregate effects, I make the simplifying assumption that this applies (eventually) to everyone in a single (low-income) intervention country who is educated to at least the secondary school level. Via this approach, we see that improved educational performance of one SD’s worth of test scores in the likely student population captures around 2% of potential economic/health gains.

Overall, the proportion of income/health gains from productivity captured by educational streaming policy advocacy – as a function of (a) the probability of successfully lobbying a government to implement streaming; (b) the probability of a persuaded government successfully implementing streaming; (c) the extent to which streaming increases educational performance, in test score standard deviations; and (d) the extent to which improved educational performance increases income and hence captures the available economic/health gains – is ultimately 0.001.
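The chain of steps above can be sketched as a simple product. The code uses the rounded figures quoted in the text for Steps 1 to 4 and recomputes the two simple averages (Step 2 and Step 3) along the way; names are illustrative.

```python
# Step 1: combined outside/inside view probability of successful lobbying.
p_lobbying = 0.32

# Step 2: equal-weighted outside view (100%) and inside view (66%).
p_implementation = (1.00 + 0.66) / 2

# Step 3: average of short-run and long-run test score gains, in SD.
test_score_gain_sd = (0.182 + 0.235) / 2

# Step 4: share of potential gains captured per 1 SD of test scores.
gains_captured_per_sd = 0.02

proportion_captured = (p_lobbying * p_implementation
                       * test_score_gain_sd * gains_captured_per_sd)

print(f"{p_implementation:.2f}")
print(f"{test_score_gain_sd:.4f}")
print(f"{proportion_captured:.4f}")
```

The product comes to about 0.0011, which rounds to the 0.001 (i.e. 0.1%) reported above.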

Meanwhile, on the costing side, we have to be concerned with both the cost of advocacy (for a nonprofit working on the matter) and the cost of execution (for the government).

To estimate the cost of advocacy, I consider two reference classes – an existing charity and a hypothetical Charity Entrepreneurship-incubated charity.

For the former, I look at Pratham, which basically helps to carry out educational streaming via its programme of Teaching at the Right Level in India, though it is also scaling into Africa, the rest of South Asia and Latin America. Pratham is also a former GiveWell standout organization, albeit from more than a decade ago when standards were different/not as high. Here, I use its UK costing, converting GBP into USD. I assess that around 1 year of operations is a reasonable timeframe for the charity to do geographic selection, carry out subsequent in-country preparatory activities (e.g. preparing supporting research reports on the economic and health benefits of the policy, conducting public polling to show public support, constructing a coalition of NGOs and advocates, convincing past and present politicians to be legislative champions) and actually lobby the sitting government, and hence either succeed (in which case it can pivot to a different country) or judge that policymakers are just not receptive and that its efforts have failed (in which case it can pivot or else shut down). Overall, this yields a single-year cost of around USD 1,640,000.

For the hypothetical CE incubatee – the typical structure is that of 2 co-founders, with funding of around USD 50,000 per person per annum. I also make the assumption that the charity will mainly be engaged in advocacy, while relying on Pratham or government partners to do actual execution (of testing and teaching). And, as before, I take a year to be a reasonable timeframe for either identifiable success or failure. In all, this translates to a single-year cost of USD 100,000.

In averaging these two perspectives, we should note that while the existing organization’s financial track record generally gives a much better indication of baseline expenditure requirements in the cause area, Pratham’s spending is not representative here given the different operating model; and in any case, the explicitly EA-aligned CE-incubatee will almost certainly be more cost-effective. Hence, I weigh towards the incubatee’s costs, and find that the money required to conduct lobbying will probably be around USD 240,000.
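The text does not state the exact weights used, so the following sketch simply backs out the weighting implied by the three published figures, assuming the USD 240,000 estimate is a linear weighted average of the two reference costs.

```python
pratham_cost = 1_640_000     # single-year cost, existing charity
incubatee_cost = 100_000     # single-year cost, hypothetical CE incubatee
blended_cost = 240_000       # advocacy cost estimate reported in the text

# Solve blended = w * pratham + (1 - w) * incubatee for w.
pratham_weight = (blended_cost - incubatee_cost) / (pratham_cost - incubatee_cost)
incubatee_weight = 1.0 - pratham_weight

print(f"Pratham weight:   {pratham_weight:.1%}")
print(f"Incubatee weight: {incubatee_weight:.1%}")
```

Under that assumption the reported figure corresponds to roughly a 9% weight on Pratham and a 91% weight on the incubatee, consistent with the stated lean towards the incubatee's costs.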

As for the cost of execution – to estimate the cost of ongoing national testing, I do the following things:

Take estimated US costs of $2 per student under conditions meant to minimize costs
Calculate the annual number of students needing testing by taking aggregate low-income country population, dividing by the number of these countries, multiplying by the proportion of 9 and 12 year olds in said low-income countries on the assumption that they will be subject to testing (for streaming at the primary and secondary levels respectively), and then multiplying each age group by primary and secondary enrollment rates respectively.
Factor in all the years in which testing will take place (factoring in both costs declining because of solution reversal/existential risk/uncertainty and also increasing due to population growth)
Discount for the probability that advocacy succeeds (and that the costs are incurred at all)
Discount for the lower counterfactual cost of average poor government spending relative to EA funding going to top GiveWell charities or similar, as a function of the top GiveWell health charity’s cost-effectiveness relative to just giving cash to poor people, correcting for GiveWell’s undervaluation of life vs income.

With all this done, the cost of actually executing streaming as an educational policy is probably around USD 3,190,000.
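The costing steps listed above can be sketched as a single function. This is a structural illustration only: the function name and parameters are hypothetical, the example call uses toy inputs rather than the report's data, and the report's own model may combine the steps differently.

```python
def execution_cost_usd(cost_per_student, population, n_countries,
                       share_age_9, share_age_12,
                       primary_enrollment, secondary_enrollment,
                       testing_year_equivalents, p_advocacy_success,
                       government_spending_discount):
    """Expected, counterfactually adjusted cost of national testing."""
    # Average population of a single low-income country.
    country_population = population / n_countries
    # Students tested each year: enrolled 9 year olds plus enrolled 12 year olds.
    students_tested = country_population * (
        share_age_9 * primary_enrollment + share_age_12 * secondary_enrollment
    )
    annual_cost = students_tested * cost_per_student
    # Scale to all testing years, then apply the two discounts.
    return (annual_cost * testing_year_equivalents
            * p_advocacy_success * government_spending_discount)


# Toy inputs chosen only to make the arithmetic easy to follow by hand.
toy_cost = execution_cost_usd(
    cost_per_student=2, population=1000, n_countries=10,
    share_age_9=0.1, share_age_12=0.1,
    primary_enrollment=0.5, secondary_enrollment=0.5,
    testing_year_equivalents=10, p_advocacy_success=0.5,
    government_spending_discount=0.5,
)
print(round(toy_cost, 2))
```

In the toy case, 10 students are tested per year at USD 2 each over 10 year-equivalents, and the two 50% discounts bring the USD 200 total down to USD 50.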

Put together, the total cost of the intervention will be around USD 3,430,000.

Consequently, the proportion of the problem solved per additional USD 100,000 spent is around 0.00003.
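As a quick check of the arithmetic, the sketch below adds the two cost components and divides the rounded proportion of gains captured (0.001) by the total cost expressed in units of USD 100,000. Variable names are illustrative.

```python
advocacy_cost = 240_000        # USD, cost of lobbying
execution_cost = 3_190_000     # USD, adjusted cost of government execution
proportion_captured = 0.001    # share of potential gains captured (rounded)

total_cost = advocacy_cost + execution_cost

# Proportion of the problem solved per USD 100,000 spent.
solved_per_100k = proportion_captured / (total_cost / 100_000)

print(f"{total_cost:,}")
print(f"{solved_per_100k:.7f}")
```

The total is USD 3,430,000, and the proportion solved per USD 100,000 is about 0.000029, which rounds to the 0.00003 quoted above.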

Marginal Expected Value of Educational Streaming Advocacy to Boost Productivity

All in all, the marginal expected value of educational streaming advocacy to boost productivity is 5,582 DALYs per USD 100,000 spent, making this around 9x as cost-effective as a GiveWell top charity.
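The headline figure is the total expected benefit multiplied by the proportion of the problem solved per USD 100,000. The sketch below uses the rounded numbers from earlier sections and also backs out the unrounded proportion implied by the reported 5,582 DALYs. That implied value is an inference, not a figure stated in the report.

```python
economic_dalys = 1.53e8      # expected value of increased economic output
health_dalys = 4.81e7        # expected value of improved health
reported_mev = 5_582         # DALYs per USD 100,000, as reported

total_dalys = economic_dalys + health_dalys

# Using the rounded proportion quoted in the text.
mev_from_rounded = total_dalys * 0.00003

# Proportion per USD 100,000 that would reproduce the reported figure.
implied_proportion = reported_mev / total_dalys

print(f"{total_dalys:.3g} total DALYs")
print(f"{mev_from_rounded:,.0f} DALYs per USD 100,000 (rounded inputs)")
print(f"{implied_proportion:.7f} implied proportion")
```

The rounded proportion gives roughly 6,000 DALYs per USD 100,000, while the reported 5,582 corresponds to an unrounded proportion of about 0.000028, which is consistent with the "around 0.00003" figure.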
