It would be helpful to have information on lag metrics (actual change in user behavior) in the summary and more prominently in the full report.
My understanding is that plan changes (previously IASPCs, then DIPYs) were a core metric 80K used in previous years to evaluate impact. It seems that there has been a shift to a new metric, CBPCs (see below).
From the 2022 user survey:
Criteria Based Plan Changes
1021 respondents answered that 80,000 Hours had increased their impact. Within those, we identified 266 people as having made a "criteria based plan change" (CBPC), i.e. they answered they'd taken a different job or graduate course of study and there was at least a 30% chance they wouldn't have if not for 80,000 Hours programs.
The CBPCs are across problem areas. They also vary in how impressive they seem, both in terms of how much of a counterfactual impact 80,000 Hours specifically seems to have had on the change, and how promising the new career choice seems to be as a way of having impact. Most often the website was cited as most important for the change. In reading the responses, we found 19 that seemed the most impressive cases of counterfactual impact and 69 that seemed moderately impressive.
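If I'm reading that right, the criterion amounts to a simple filter over survey responses, something like the minimal sketch below (the field names are hypothetical, not taken from the actual survey):

```python
# Hedged sketch of the CBPC criterion as quoted above; field names are made up.

def is_cbpc(response: dict) -> bool:
    """A respondent counts as a criteria-based plan change (CBPC) if they report
    taking a different job or graduate course of study AND give at least a 30%
    chance they wouldn't have done so without 80,000 Hours programs."""
    return (
        response.get("changed_job_or_study", False)
        and response.get("counterfactual_chance", 0.0) >= 0.30
    )

responses = [
    {"changed_job_or_study": True, "counterfactual_chance": 0.5},   # counts as a CBPC
    {"changed_job_or_study": True, "counterfactual_chance": 0.1},   # below the 30% bar
    {"changed_job_or_study": False, "counterfactual_chance": 0.9},  # no plan change
]
print(sum(is_cbpc(r) for r in responses))  # 1 of the 3 qualifies
```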
I'd be curious to know the following:
How does the CBPC differ from the previous metrics you've used?
How important is the CBPC metric to informing strategy decisions compared to other metrics? How do you see the CBPC metric interacting with other metrics like engagement? Are there other lag metrics you think are directly impactful apart from plan changes (e.g. people being better informed about cause areas, helping high-impact orgs recruit, etc.)?
Were you on track with predicted CBPCs or not (and were there any predictions on this, perhaps with the old metrics)? The 2021 predictions doc doesn't mention them (as compared to the 2020 predictions doc).
By what process do you rate different plan changes? Does predicted impact vary by cause, or other factors? Are these ratings reviewed by advisors external to 80,000 Hours?
Hi Vaidehi, I'm answering here as I was responsible for 80k's impact evaluation until late last year.
My understanding is that plan changes (previously IASPCs, then DIPYs) were a core metric 80K used in previous years to evaluate impact. It seems that there has been a shift to a new metric, CBPCs (see below).
This understanding is a little off. Instead, it's that in 2019 we decided to switch from IASPCs to DIPYs and CBPCs.
The best place to read about the transition is the mistakes page here, and I think the best places to read detail on how these metrics work are the 2019 review for DIPYs and the 2020 review for CBPCs. (There's a 2015 blog post on IASPCs.)
~~~
Some more general comments on how I think about this:
A natural way to think about 80k's impact is as a funnel culminating in a single metric, which we could relate to in the way a for-profit relates to revenue.
I haven't been able to create a metric which is overall strong enough to make me want to rely on it like that.
The closest I've come is the DIPY, but it's got major problems:
Lags by years.
Takes hundreds of hours to put together.
Requires a bunch of judgement calls; these are hard for people without context to assess and have fairly low inter-rater reliability (between people, but also for the same people over time; see the sketch after this list).
Most (not all) of them come from case studies where people are asked questions directly by 80,000 Hours staff. That introduces some sources of error, including from social-desirability bias.
The case studies it's based on can't be shared publicly.
Captures a small fraction of our impact.
Doesn't capture externalities.
(There's a bit more discussion on impact eval complexities in the 2019 annual review.)
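On the inter-rater reliability point above: one standard way to quantify how much two raters agree on categorical ratings is Cohen's kappa. The sketch below uses made-up ratings and categories; it isn't our actual rating scale or process, just an illustration of the kind of check involved.

```python
# Illustration only: Cohen's kappa for two raters' (hypothetical) plan-change ratings.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Agreement between two lists of categorical ratings, corrected for chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Made-up ratings of ten plan changes on a low/medium/high scale.
a = ["high", "low", "medium", "low", "high", "medium", "low", "low", "medium", "high"]
b = ["medium", "low", "medium", "low", "high", "low", "low", "medium", "medium", "high"]
print(round(cohens_kappa(a, b), 2))  # ~0.55, i.e. moderate rather than high agreement
```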
So, rather than thinking in terms of a single metric to optimise, when I think about 80k's impact and strategy I consider several sources of information and attempt to weigh each of them appropriately given their strengths and weaknesses.
The major ones are listed in the full 2022 annual review, which I'll copy out here:
Open Philanthropy EA/LT survey.
EA Survey responses.
The 80,000 Hours user survey. A summary of the 2022 user survey is linked in the appendix.
Our in-depth case study analyses, which produce our top plan changes (last analysed in 2020). EDIT: this process produces the DIPYs as well. I've made a note of this in the public annual review; apologies, doing this earlier might have prevented you from getting the impression that we retired them.
Our own data about how users interact with our services (e.g. our historical metrics linked in the appendix).
Our and others' impressions of the quality of our visible output.
~~~
On your specific questions:
I understand that we didnât make predictions about CBPCs in 2021.
Otherwise, I think the above is probably the best general answer to give to most of these, but let me know if you have follow-ups :)