Hi Vaidehi—I’m answering here as I was responsible for 80k’s impact evaluation until late last year.
You wrote: “My understanding is that plan changes (previously IASPCs, then DIPYs) were a core metric 80K used in previous years to evaluate impact. It seems that there has been a shift to a new metric—CBPCs (see below).”
This understanding is a little off: in 2019 we decided to switch from IASPCs to both DIPYs and CBPCs.
The best place to read about the transition is the mistakes page here, and I think the best places to read detail on how these metrics work is the 2019 review for DIPYs and the 2020 review for CBPCs. (There’s a 2015 blog post on IASPCs.)
~~~
Some more general comments on how I think about this:
A natural way to think about 80k’s impact is as a funnel culminating in a single metric that we could relate to the way a for-profit relates to revenue.
I haven’t been able to create a metric which is overall strong enough to make me want to rely on it like that.
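To make the funnel picture concrete, here is a toy sketch with entirely made-up stage names and numbers (none of these are 80k’s real figures):

```python
# Purely illustrative funnel with invented numbers — not 80k's actual data.
# Each stage converts some fraction of the previous one, ending in a single
# impact metric, analogous to how a sales funnel ends in revenue.
stages = [
    ("site visitors",                 1_000_000),
    ("newsletter signups",               50_000),
    ("one-on-one advising calls",         1_000),
    ("criteria-based plan changes",         100),
]
for (name, n), (_, prev) in zip(stages[1:], stages):
    print(f"{name}: {n:>9,} ({n / prev:.1%} of previous stage)")
```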
The closest I’ve come is the DIPY, but it’s got major problems:
Lags by years.
Takes hundreds of hours to put together.
Requires a bunch of judgement calls—these are hard for people without context to assess and have fairly low inter-rater reliability (between people, but also the same people over time; see the sketch after this list).
Most (not all) of them come from case studies where people are asked questions directly by 80,000 Hours staff. That introduces some sources of error, including from social-desirability bias.
The case studies it’s based on can’t be shared publicly.
Captures a small fraction of our impact.
Doesn’t capture externalities.
(There’s a bit more discussion on impact eval complexities in the 2019 annual review.)
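On the inter-rater reliability point above: one common way to quantify it is Cohen’s kappa, which measures how much two raters agree beyond what chance alone would produce. A minimal sketch with invented ratings (this is just one standard measure; I’m not claiming it’s the one we used):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a | counts_b) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical example: two analysts independently rate ten case studies
# as "significant plan change" (1) or "not" (0).
analyst_1 = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
analyst_2 = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]
print(round(cohens_kappa(analyst_1, analyst_2), 2))  # 0.4: only moderate agreement
```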
So, rather than thinking in terms of a single metric to optimise, when I think about 80k’s impact and strategy I consider several sources of information and attempt to weigh each of them appropriately given their strengths and weaknesses.
The major ones are listed in the full 2022 annual review, which I’ll copy out here:
Open Philanthropy EA/LT survey.
EA Survey responses.
The 80,000 Hours user survey. A summary of the 2022 user survey is linked in the appendix.
Our in-depth case study analyses, which produce our top plan changes (last analysed in 2020). EDIT: this process produces the DIPYs as well. I’ve made a note of this in the public annual review—apologies, doing this earlier might have prevented you getting the impression that we retired them.
Our own data about how users interact with our services (e.g. our historical metrics linked in the appendix).
Our and others’ impressions of the quality of our visible output.
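On “weigh each of them appropriately”: the textbook formalisation of that idea is inverse-variance weighting, where noisier sources count for less. In practice I do this weighing informally rather than numerically, but here is a toy sketch of the underlying idea, with made-up numbers:

```python
# Toy illustration only — we don't actually reduce these sources to numbers.
# Inverse-variance weighting: each source's weight is 1 / (standard error)^2,
# so less reliable sources get less influence on the combined estimate.
sources = {                      # hypothetical (estimate, standard error)
    "case study analyses": (120, 30),
    "user survey":         (200, 80),
    "historical metrics":  (150, 60),
}
weights = {name: 1 / se**2 for name, (_, se) in sources.items()}
total = sum(weights.values())
combined = sum(w * sources[name][0] for name, w in weights.items()) / total
print(round(combined))  # pulled toward the lowest-uncertainty source (~133)
```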
~~~
On your specific questions:
I understand that we didn’t make predictions about CBPCs in 2021.
Otherwise, I think the above is probably the best general answer to most of these—but lmk if you have follow-ups :)