Sella, thanks for the post. I think this is a very interesting idea (and I am guessing that other non-US/UK EA groups may think so as well). I see it as doing relative optimization in a much larger space rather than absolute optimization within a small group (people who actually have a chance of going into 80,000 Hours’s highest-impact paths).
In that sense, Probably Good reminds me of what Elijah explained here about what the ImpactMatters team is trying to do under their new roof at Charity Navigator:
Certainly in typical EA terms, many of the nonprofits that are analyzed are not the most cost-effective. But we also know that standard EA nonprofits are a fraction of the $300 bil nonprofit sector, and there is a portion of that money that has high intra-cause elasticity but low inter-cause elasticity. Impact analysis could be a way of shifting that money, yielding very cost-effective returns [...]
Great point Pablo.
I think the analogy to ImpactMatters is insightful and relevant, and indeed reaching a broader audience/scope (even at the cost of including less impactful career paths) is part of the justification for this work. I think the difference between inter-cause elasticity and intra-cause elasticity may be even larger when discussing careers, because in addition to people’s priorities and values, many people will have education, experience and skills which make it less likely (or even less desirable) that they move to a completely different cause area.
I do however also want to highlight that I think there are justifications for this view beyond just a numbers game. As we discuss in our overview and in our core principles, we think there are disagreements within EA that warrant some agnosticism and uncertainty. One example of this is the more empiricist view, which focuses on measurable interventions and is skeptical of speculative work that cannot be easily evaluated or validated, vs. the more hits-based approach, which focuses on interventions that are less certain but are estimated to have orders of magnitude more impact in expectation. These views are (arguably) at the crux of comparisons between top cause areas that are a core part of the EA community (e.g. global poverty & health vs. existential risk mitigation). For many people working in either of these cause areas, we genuinely believe careers within their field are the most promising thing they could do.
Additionally, we believe broader career advice is not only useful in optimizing the impact of those who would not choose top priority paths, but actually may lead to more people joining top priority paths in the focus areas of existing career orgs in the long run. As we mention in our overview and in our paths to impact, and based on our experience in career guidance so far, we believe that providing people answers to the questions they already care about, while discussing crucial considerations they might not think about often, is a great way to expose people to impact maximization principles. Our hope is that even if we care exclusively about top priority paths already researched by 80K and others, this organization will end up having a net positive effect on the number of people who pursue these paths. Whether this will be the case, of course, remains to be seen, but we intend on measuring and evaluating this question as one of our core questions moving forward.
Yes, thanks for that: I can see the broader strategic implications. I actually think the equivalent to “but actually may lead to more people joining top priority paths in the focus areas of existing career orgs in the long run” may also be true in the effective giving space.
“Whether this will be the case, of course, remains to be seen, but we intend on measuring and evaluating this question as one of our core questions moving forward.”
Could you describe your current thinking on how you’d measure and evaluate this question?
I imagine more clarity on that question would be quite useful for evaluating and informing your organisation. And if you measured it in a way that meant the results could generalise to other organisations/efforts, or if you had a method of measuring it that others could adapt, I imagine this could be useful in a variety of other ways too. E.g., it could inform 80k’s own approach, approaches of university and local groups, topic and attendee selection for EAGs, etc.
I agree this is an important question that would be of value to other organizations as well. We’ve already consulted with 80K, CE and AAC about it, but still feel this is an area we have a lot more work to do on. It isn’t explicitly pointed out in our open questions doc, but when we talk about measuring and evaluating our counterfactual benefits and harms, this question has been top of mind for us.
The short version of our current thinking is separated into short-term measurement and long-term measurement. We expect that longer term this kind of evaluation will be easier, since we’ll at least have career trajectories to evaluate. Counterfactual impact estimation is always challenging without an experimental setup, which is hard to do at scale, but I think 80K and OpenPhil have put out multiple surveys that try to extract estimates of counterfactual impact and do so reasonably well given the challenges, so we’ll probably do something similar. Also, at that point, we could compare our results to theirs, which could be a useful barometer. In the specific context of our effect on people taking existing priority paths, I think it’ll be interesting to compare the chosen career paths of people who have discovered 80K through our website relative to those who discovered 80K from other sources.
Our larger area of focus at the moment is how to evaluate the effect of our work in the short term, when we can’t yet see our long-term effect on people’s careers. We plan on measuring proxies, such as changes to people’s values, beliefs and plans. We expect whatever proxy we use in the short term to be very noisy and based on a small sample size, so we plan on relying heavily on qualitative methods. This is one of the reasons we reached out to a lot of people who are experienced in this space (and we’re incredibly grateful they agreed to help) - we think their intuition is an invaluable proxy for figuring out if we’re heading in the right direction.
This is an area that we believe is important and we still have a lot of uncertainty about, so additional advice from people with significant experience in this domain would be highly appreciated.
Thanks, that all sounds reasonable :)
Here’s a (potentially stupid) idea for a mini RCT-type evaluation of this that came to mind: You could perhaps choose some subset of applicants for advising calls, and then randomly assign half of those to go through your normal process and half to be simply referred to 80k. And 80k could perhaps do the same in the other direction.
You could perhaps arrange for these referred people to definitely be spoken to (rather than not being accepted for advising or waiting for many months). And/or you could choose the subset for this random allocation to ensure the people are fairly good fits for either organisation’s focus (rather than e.g. someone who’ll very clearly focus on longtermism or someone who’ll very clearly focus on global health & poverty).
And then you could see whether the outcomes differ depending on which org the people were randomly assigned to speak to. Including seeing if the people assigned to speak to 80k were substantially more likely to then pursue their priority paths, and if so, whether they stuck with that, whether they liked it, and whether they seem to be doing well at it.
I raise this as food for thought rather than as a worked-out plan. It’s possible that anything remotely like this would be too complicated and time-consuming to be worthwhile. And even if something like this is worth doing, maybe various details would need to be added or changed.
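For concreteness, the random allocation step in the idea above could be sketched roughly as follows. This is only an illustration under assumptions: the function name `assign_arms`, the arm labels, and the applicant IDs are all hypothetical, and it takes for granted that the subset of applicants who are plausible fits for either organisation has already been chosen.

```python
import random


def assign_arms(applicant_ids, seed=0):
    """Shuffle applicants reproducibly and split them into two arms.

    The arm names are hypothetical labels for "goes through the normal
    advising process" and "is referred to 80k instead".
    """
    rng = random.Random(seed)          # fixed seed so the split can be audited later
    shuffled = list(applicant_ids)     # copy, so the caller's data is not mutated
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "own_advising": shuffled[:half],
        "referred_to_80k": shuffled[half:],  # gets the extra person if the count is odd
    }


if __name__ == "__main__":
    # Made-up applicant IDs, purely for illustration.
    arms = assign_arms(["a01", "a02", "a03", "a04", "a05", "a06"], seed=2020)
    for arm, members in arms.items():
        print(arm, members)
```

Recording the seed alongside the applicant list would let someone else reproduce the allocation and check that it was not adjusted after the fact.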
I like this, but have a few concerns. First, you need to pick good outcome metrics, and most are high-variance and not very informative / objective. I also think the hoped-for outcomes are different, since 80k wants a few people to pick high-priority career paths, and Probably Good wants slight marginal improvements along potentially non-ideal career paths. And lastly, you can’t reliably randomize, since many people who might talk to Probably Good will be looking at 80k as well. Given all of that, I worry that even if you pick something useful to measure, the power / sample size needed, given individual variance, would be very large.
Still, I’d be happy to help Sella / Omer work through this and set it up, since I suspect they will get more applicants than they will be able to handle, and randomizing seems like a reasonable choice—and almost any type of otherwise useful follow-up survey can be used in this way once they are willing to randomize.
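To give a rough sense of the power / sample size worry above, here is a minimal sketch of the standard normal-approximation sample size formula for comparing two proportions. It assumes a simple binary outcome (for example, whether someone ends up pursuing a priority path), a two-sided test, and no crossover between arms. The function name `sample_size_per_arm` and the example rates are hypothetical and are not figures from either organisation.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(p_control, p_treatment, alpha=0.05, power=0.8):
    """Approximate number of people needed in EACH arm to detect a difference
    between two proportions with a two-sided z-test (normal approximation)."""
    if p_control == p_treatment:
        raise ValueError("the two proportions must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    p_bar = (p_control + p_treatment) / 2          # pooled proportion under the null
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_power * sqrt(p_control * (1 - p_control)
                         + p_treatment * (1 - p_treatment))
    )
    return ceil(numerator ** 2 / (p_control - p_treatment) ** 2)


if __name__ == "__main__":
    # Made-up rates: 5% of one arm vs. 10% of the other pursue a given path.
    print(sample_size_per_arm(0.05, 0.10))
    # A smaller difference between arms requires many more people per arm.
    print(sample_size_per_arm(0.05, 0.07))
```

Even with these illustrative rates, where one arm doubles the other, the formula asks for several hundred people in each arm, and the requirement grows quickly as the difference between arms shrinks. A noisy or continuous outcome would need a different calculation, so this is a ballpark rather than a design.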