How Important is Exploration and Prioritization?

This is a research report written in 2022. Three years have passed since then, and I don’t necessarily endorse all of its conclusions now. That said, due to its mostly theoretical nature, much of it seems to have stood my personal test of time. I’m therefore making it public, with the due amount of caveats.

Epistemic status: Highly uncertain. These are more like “observations I find interesting” than confident conclusions.


In this document I demonstrate that under many circumstances, one should spend lots of effort (and likely more than commonly assumed[1]) on exploration (i.e. looking for new projects) and prioritization (i.e. comparing known projects), rather than exploitation (i.e. directly working on a project). Furthermore, I provide heuristics that help one decide how much effort to spend on exploration and prioritization.

I assume that the goal is to maximize expected altruistic impact, where by “expected” I’m referring to the mathematical expectation. The actor here can be an individual or a community, and a project can be anything ranging from a 1-week personal project to an entire cause area.

Section 1 will deal with exploration, section 2 will deal with prioritization, and section 3 will test and apply the theory in the real world. Mathematical details can be found in the appendix.

Key Findings

This is an exploratory study, so the results are generally of low confidence, and everything below should be seen as hypotheses. The main goal is to suggest paths for future research, rather than to provide definitive answers.

Theoretical Results

  • Claim 3 (Importance of exploration): We, as a community and as individuals, should spend more than half of our effort on searching for potential new projects, rather than working on known projects.

    • Confidence: 45% / 20% [2]

  • Claim 6a (Importance of prioritization, uncorrelated case): When faced with k equally-good, uncorrelated longtermist projects (where k isn’t too small), we should act as if each of the k project-evaluation tasks is as important as working on the best project that we finally identify, and allocate our effort evenly across the k+1 tasks, as long as we are able to identify the ex-post best project[3] by working on the evaluation tasks. (A simulation sketch follows this list.)

    • Confidence: 40% / 25%

  • Claim 6b (Importance of prioritization, correlated case): When the k projects are strongly correlated, prioritization becomes much less important than in the no-correlation case. However, one should still spend only a small portion of one’s effort on direct work (something like 1/sqrt(k) or 1/log(k); note that this is already much larger than the 1/(k+1) of Claim 6a), and spend the remaining portion on prioritization.

    • Confidence: 40% / 20%
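To give a feel for the numbers behind Claims 6a and 6b, here is a minimal Monte Carlo sketch (my own illustration, not part of the report’s model). It assumes the k projects’ impacts are i.i.d. draws from a Pareto distribution with an illustrative tail index, and that evaluation perfectly identifies the ex-post best project, i.e. the same idealization as in Claim 6a; it only shows how large the boost from picking the best of k projects can be, without modeling the cost of evaluation itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def pareto_draws(alpha, size):
    """Pareto samples with survival function P(X > x) = x^(-alpha), x >= 1."""
    return (1.0 - rng.random(size)) ** (-1.0 / alpha)

alpha = 1.1        # illustrative tail index; closer to 1 means a heavier tail
k = 20             # number of equally-good candidate projects
n_trials = 200_000

impacts = pareto_draws(alpha, (n_trials, k))

arbitrary = impacts[:, 0].mean()       # work on an arbitrary project
best = impacts.max(axis=1).mean()      # evaluate all k, then work on the ex-post best

print(f"arbitrary project: {arbitrary:.1f}")
print(f"best of {k}:        {best:.1f}")
print(f"boost u:           {best / arbitrary - 1:.1f}")
```

With a tail index this close to 1, the expected boost grows roughly like k^(1/alpha), i.e. almost linearly in k, which is the regime where treating each of the k evaluation tasks as comparable in importance to the direct work itself (Claim 6a) makes sense. Making the k draws strongly positively correlated shrinks the boost sharply, which is the intuition behind Claim 6b.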

Practical Results [Caveat: Written for the 2022 community]

  • Claim 7 (Applicability in the real world): In the real world, the most important factors determining the applicability of our model are the difficulty of reducing uncertainty[4] via exploration and prioritization (E&P), the comparative scalability of E&P vs. exploitation, the heavy-tailedness of opportunities, whether the total budget is fixed, and whether the objective is utility maximization.

    • Confidence: Moderate (hard to quantify)

  • Claim 8 (Applying the framework to different parts of EA): In EA, the top 3 areas where E&P deserves the largest portion of resources (relative to the total resources allocated to the respective area) are

    • identifying promising individuals[5] (relative to the budget of talent cultivation),

    • cause prioritization (relative to the budget of all EA research and direct work),

    • within-cause prioritization (relative to the budget of that cause).

    • Confidence: Low (hard to quantify)

Instrumental Results

Note that the following claims are results (based on mostly qualitative arguments) rather than assumptions.

  • From section 1.2: The distribution of TOC impact (impact stemming from a project’s TOC (theory of change), as opposed to e.g. flow-through effects) across different projects is strictly more heavy-tailed than log-normal.

    • Confidence: 70% / 55% [6]

  • Claim 2 (stronger version of the previous claim): The distribution of TOC impact across different projects resembles the Pareto distribution in terms of heavy-tailedness. (A sketch contrasting Pareto and log-normal tails follows this list.)

    • Confidence: 40% / 30%

  • Claim 4: For any particular project, the distribution of its TOC impact across all potential scenarios is (roughly) at least as heavy-tailed as the Pareto distribution.[7]

    • Confidence: 70% / 55%
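To illustrate how much “strictly more heavy-tailed than log-normal” can matter, here is a minimal sketch (my own illustration; the parameters are arbitrary, not estimates from the report) comparing the share of total impact contributed by the top 1% of projects under a log-normal versus a Pareto distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Arbitrary illustrative parameters; neither distribution is calibrated to real project data.
lognormal = rng.lognormal(mean=0.0, sigma=1.5, size=n)
pareto = (1.0 - rng.random(n)) ** (-1.0 / 1.1)   # Pareto with tail index 1.1, support x >= 1

def top_share(x, q=0.99):
    """Share of the total contributed by draws at or above the q-th quantile."""
    cutoff = np.quantile(x, q)
    return x[x >= cutoff].sum() / x.sum()

print(f"log-normal: top 1% of draws hold {top_share(lognormal):.0%} of the total")
print(f"Pareto:     top 1% of draws hold {top_share(pareto):.0%} of the total")
```

The Pareto sample concentrates far more of the total in its top few draws (the exact numbers vary from run to run, because with a tail index near 1 the total is dominated by a handful of huge draws), which is why the conclusions about E&P are so sensitive to whether we live in the “power law world” or the “log-normal world” (see Important Limitations below).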

Important Limitations

  • Negative impact is ignored.

  • Pascal’s wager emerges in the model of prioritization, but we don’t have a very good way to handle this. (relevant)

  • The “log-normal world” is ignored, and the focus is primarily on the “power law world”. This exaggerates the importance of E&P (exploration and prioritization).

    • “Log-normal world” is the possibility that the distribution of impact is log-normal, and “power law world” is the possibility that the distribution obeys a power law (cf. claim 2, claim 4).

    • I think the power law world is somewhat more likely to be the realistic model for our purposes than the log-normal world is. See sections 1.1, 1.2, and 2.1 on this.

  • Non-TOC impact of projects (e.g. flow-through effects) is ignored.

    • This is justifiable in some cases, but not in all (see the last parts of section 1.1).

  • Individual projects are assumed to consist purely of direct work (rather than a mixture of E&P and direct work, which is often the case in reality), and all parts of an individual project are homogeneous. This is especially problematic when we define projects to be larger in scope, e.g. when projects are entire cause areas.[8] Moreover, it’s sometimes hard to distinguish direct work and E&P, e.g. many kinds of direct work also provide insights on prioritization.

    • Despite this, the findings in this document still suggest optimizing more for information value (E&P) when a project produces both information value and direct impact.

  • Differences in scalability (diminishing returns) are mostly ignored.

  • Externalities (and, more generally, all social interactions) are ignored.

    • For example, if your own E&P also provides information value to others, then your E&P is even more important than this document suggests, and it becomes especially important to make your attempts and conclusions publicly known (e.g. writing about my job). On the other hand, you also benefit from other people’s E&P, which reduces the importance of doing E&P yourself.

Above are the key takeaways from this report. Please see this Google Document for the report itself.

Huge thank-you to Daniel Kokotajlo for mentorship and Nuno Sempere for the helpful feedback. These people don’t necessarily agree with the claims, and the report reflects my personal opinion only.

  1. ^

    In the current situation, the EA community as a whole seems to allocate approximately 9–12% of its resources to cause prioritization. But there’s some nuance to this—see the last parts of section 1.3.

  2. ^

    Meaning I assign 0.45 probability to [the statement being true (in the real world)], and 0.2 probability to [the statement being true (in the real world) and my model being mostly right about the reason].

  3. ^

    Which, by the way, is quite an unrealistic assumption. This assumption is also shared by claim 6b.

  4. ^

    By taking this factor into account, we’ve dealt with the unrealistic assumption (“we are able to identify the ex-post best project”) in claims 6a and 6b.

  5. ^

    Including, for example, providing opportunities for individuals to test fit.

  6. ^

    Conditional on the heavy-tailedness comparison here being meaningful, which isn’t obvious. Same for similar comparisons elsewhere in this document.

  7. ^

    Subject to caveats about Pascal’s wager. See section 2.1 .

  8. ^

    A more realistic model might be a hierarchical one, where projects have sub-projects and sub-sub-projects, etc., and you need to do some amount of prioritization at every level of the hierarchy.

  9. ^

    Meaning I assign ≥0.8 probability to [the statement being true], and ≥0.8 probability to [the statement being true and my model being mostly right about the reason].

  10. ^

    For any technology X, assume that a constant amount of resources is spent each year on developing X. If we observe an exponential increase in X’s efficiency, we can infer that it always takes a constant amount of resources to double X’s efficiency, regardless of its current efficiency—which points to a Pareto distribution. This “amount of work needed to double the efficiency” may have been slowly increasing in the case of integrated circuits, but far more slowly than a log-normal distribution would predict.
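    To spell out why constant doubling cost points to a Pareto rather than a log-normal tail (this formalization is my own, not from the report): a Pareto tail with index α is precisely the case where “doubling” is equally hard at every level, since

    $$\Pr(X > 2x \mid X > x) = \frac{(2x)^{-\alpha}}{x^{-\alpha}} = 2^{-\alpha},$$

    independent of x. For a log-normal distribution, the corresponding conditional probability shrinks as x grows, so each successive doubling of efficiency would require more and more resources.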

  11. ^

    Here the diminishing returns mean “saving 10^8 lives is less than 10^5 times as good as saving 10^3 lives, not because we’re scope-insensitive but because we’re risk averse and 10^8 is usually much more speculative and thus riskier than 10^3.”

  12. ^

    Isoelastic utility functions are good representatives of the broader class of HARA utility functions, which, according to Wikipedia, is “the most general class of utility functions that are usually used in practice”.
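    Concretely, the isoelastic (CRRA) family referred to here is the standard textbook one (the notation below is mine, not the report’s):

    $$u(c) = \begin{cases} \dfrac{c^{1-\eta} - 1}{1 - \eta}, & \eta \neq 1, \\ \ln c, & \eta = 1, \end{cases}$$

    where η is the constant coefficient of relative risk aversion.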

  13. ^

    Under the “fundamental assumptions” or “sense check with intuition” approach, the true distribution has finite mean, but the Pareto distribution used for approximating the true distribution has infinite mean. Under the “heuristics” approach, the true distribution itself has infinite mean.

  14. ^

    u is defined in section 2.2; it stands for the extra gain in “how good is the project that we work on” resulting from prioritization, compared to working only on an arbitrary project. For example, if prioritization increases the project quality from 1 DALY/$ to 2 DALY/$, then u = 100% = 1.0.
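    One way to write this out, consistent with the example above (the notation is mine, not the report’s):

    $$u = \frac{q_{\text{chosen}}}{q_{\text{arbitrary}}} - 1 = \frac{2}{1} - 1 = 1.0,$$

    where q denotes project quality (here in DALY/$) and the subscripts distinguish the project chosen via prioritization from an arbitrary one.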

  15. ^

    See the "α" column of the "Revenue" rows in table 1 of the paper.

  16. ^

    Copulas are used for modeling the dependence structure between multiple random variables. A reversed Clayton copula is a copula that shows stronger correlation when the variables take larger values. Mathematical knowledge about the (reversed) Clayton copula (and about copulas in general) isn’t needed for reading this section.

  17. ^

    This is a very crude guess, and my 90% confidence interval will likely be very, very wide.

  18. ^

    Note that by choosing r=⅓ I’m underestimating (to a rather small extent) the strength of correlation. I’ll briefly revisit this in a later footnote.

  19. ^

    Recall that we underestimated the strength of correlation by choosing r=⅓, so here u = (log k) − 1 is an overestimate of the boost from prioritization, though I think the extent of overestimation is rather small.

  20. ^

    Reversed, because it’s negatively correlated with the importance of E&P.

  21. ^

    For GPT-3, 12% of compute is spent on training models smaller than the final 175B-parameter one, according to table D.1 of the GPT-3 paper, though it’s unclear whether that 12% is used for exploration/comparison or simply for checking for potential problems. Google’s T5 adopted a similar approach of experimenting on smaller models, and they made it clear that those experiments were used to explore and compare model designs, including network architectures. Based on the data in the paper, I estimate that 10–30% of total compute is spent on those experiments, with high uncertainty.

  22. ^

    I’m not counting referral fees as part of E&P, since they’re usually charged on the lawyer’s side, while I’m mainly examining the client’s willingness to pay. Plus, it’s unclear what portion of clients use referral services, and how much referral services help improve the competence of the lawyer that you find.

  23. ^

    This is based on a simple ballpark estimate, and so I don’t provide details here.

  24. ^

    Including, for example, providing opportunities for individuals to test fit.

  25. ^

    The extent to which to prioritize promising individuals is often discussed under the title of “elitism”. Also, here’s some related research.

  26. ^

    Including, for example, providing opportunities for individuals to test fit.

  27. ^

    I know little about macroeconomics, and this claim is of rather low confidence.