It’s bugged me for a while that EA has ~13 years of community building efforts but (AFAIK) not much by way of “strong” evidence of the impact of various types of community building / outreach, in particular local/student groups. I’d like to see more by way of baking self-evaluation into the design of community building efforts, and think we’d be in a much better epistemic place if this had been at the forefront of efforts to professionalise community building 5+ years ago.
By “strong” I mean a serious attempt at causal evaluation using experimental or quasi-experimental methods—i.e. not necessarily RCTs where these aren’t practical (though it would be great to see some of these where they are!), but some sort of “difference-in-differences” style analysis, or before-after comparisons. For example, how do groups’ key performance stats (e.g. EAs ‘produced’, donors, money moved, people going on to EA jobs) compare in the year(s) before vs after getting a full/part-time salaried group organiser? Possibly some of this already exists either privately or publicly and the relevant people know where to look (I haven’t looked hard, sorry!). E.g. I remember GWWC putting together a fundraising prospectus in 2015 which estimated various counterfactual scenarios. Have there been serious self-evaluations since? (Sincere apologies if I’ve missed them or could have found them easily—this is a genuine question!)
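To make the kind of analysis I have in mind concrete, here is a minimal sketch in Python of a difference-in-differences regression on a hypothetical group-year panel. The file name, column names and outcome measure are all illustrative assumptions rather than references to any real dataset; the point is just how little machinery is needed once the underlying data has been collected consistently.

```python
# Minimal difference-in-differences sketch (hypothetical file and column names).
# Each row is one group-year observation, e.g. assembled from successive survey waves.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("group_year_panel.csv")  # hypothetical panel of group outcomes

# treated: 1 if the group ever received a salaried organiser, else 0
# post:    1 for years after the organiser started (always 0 for comparison groups)
# new_members: example outcome; could equally be donors, money moved, EA hires, etc.
model = smf.ols("new_members ~ treated + post + treated:post", data=df).fit()

# The coefficient on treated:post is the difference-in-differences estimate:
# the change in treated groups' outcomes over and above the change in
# comparison groups over the same period.
print(model.summary())
```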
In terms of what I’d like to see more of with respect to self-evaluation (and what I tentatively think we could have done better on over the last 5+ years):
When new initiatives are launched, serious consideration should be paid to how to get high-quality evidence of the impact of those initiatives and of which aspects of them work best.
E.g. with the recent scale-up of funding for EA groups and hiring of full-time coordinators, it would be great if some sort of small-scale A/B test and/or phased-in introduction could be run. E.g. you could take the top 30-40 universities/groups that we’d ideally have professional outreach at and randomly select half of them to start a (possibly phased-in) programme of professional group leading at the start of 2022-23, and the other half at the start of 2023-24.
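To illustrate the phased-in design, here is a minimal sketch of how the random assignment itself might be done. The list of candidate groups and the cohort labels are placeholders, not a description of any actual programme; in practice you would probably also want to stratify by group size or region before randomising.

```python
# Sketch of a randomised phased rollout: half the shortlisted groups get a
# professional organiser in 2022-23 and the rest in 2023-24, so the later
# cohort serves as a comparison group during the first year. Names are placeholders.
import random

candidate_groups = [f"University {i}" for i in range(1, 41)]  # stand-in for the real shortlist

rng = random.Random(2022)  # fixed seed so the assignment is reproducible and auditable
shuffled = list(candidate_groups)
rng.shuffle(shuffled)

half = len(shuffled) // 2
cohort_2022 = sorted(shuffled[:half])   # start professional group leading in 2022-23
cohort_2023 = sorted(shuffled[half:])   # start in 2023-24

print("2022-23 cohort:", cohort_2022)
print("2023-24 cohort:", cohort_2023)
```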
Possibly this is already happening and I don’t know—apologies if so! (I’ve had one very brief conversation with someone involved which suggested that it isn’t being approached like this)
One objection is that this would delay likely-valuable outreach and is hard to do well. This is true, but it builds knowledge for the future. I wish we’d done more of this 5+ years ago, so that we’d be more confident in the effectiveness of the increased expenditure today and ideally have a better idea of what type of campus support is most effective!
I would love to see 1-4 people with strong quant / social science / impact evaluation skills work for ~6-12 months to do a retrospective evaluation of the evidence from the last ~13 years of movement-building efforts, especially support to local groups. They would need the support of the people and organisations that led these efforts to share data on expenditure and key outcomes. Even if lots of this relied on observational data, my guess is that distilling the information from various groups / efforts would be very valuable in understanding their effectiveness.
I’d personally be pretty excited to see well-run analyses of this type, and would be excited for you or anyone who upvoted this to go for it. I think the reason why it hasn’t happened is simply that it’s always vastly easier to say that other people should do something than to actually do it yourself.
I completely agree that it is far easier to suggest an analysis than to execute one! I personally won’t have the capacity to do this in the next 12-18 months, but would be happy to give feedback on a proposal and/or the research as it develops if someone else is willing and able to take up the mantle.
I do think that this analysis is more likely to be done (and done to a high quality) if it were either done by, commissioned by, or executed with significant buy-in from CEA and other key stakeholders involved in community building and running local groups. This is partly a matter of helping to source data etc., but it also gives someone important incentives to do this research. If I had lots of free time over the next 6 months, I would only take this on if I was fairly confident that the people in charge of making decisions would value this research. One model would be for someone to write up a short proposal for the analysis and take it to the decision-makers; another would be for the decision-makers to commission it (my guess is that this demand-driven approach is more likely to result in a well-funded, high-quality study).
To be clear, I massively appreciate the work that many, many people (at CEA and many other orgs) do and have done on community building and professionalising the running of groups (sorry if the tone of my original comment was implicitly critical). I think such work is very likely very valuable. I also think the hits-based model is the correct one as we ramp up spending and that not all expenditure should be thoroughly evaluated. But in cases where it seems very likely that we’ll keep doing the same type of activity for many years and spend comparatively large resources on it (e.g. support for groups), it makes sense to bake self-evaluation into the design of programmes, to help improve their design in the future.
P.S. I’ve also just seen Joan’s write-up of the Focus University groups in the comments below, which suggests that some decent self-evaluation, experimentation and feedback loops are already happening as part of these programmes’ designs. So it is very possible that there is a good amount of this going on that I (as a very casual observer) am just not aware of!
Agreed! Note, however, that in the case of the FTX grants it will be pretty hard to do this analysis oneself without access to at the very least the list of funded projects, if not the full applications.
I also agree this would be extremely valuable.
I think we would have had the capacity to do difference-in-differences analyses (or even simpler analyses of pre-post differences in groups with or without community building grants, full-time organisers, etc.) if the outcome measures tracked in the EA Groups Survey had not changed across iterations and, especially, if we had run the EA Groups Survey more frequently (data has only been collected three times since 2017, and none was collected before we ran the first such survey that year).
As a positive example, 80,000 Hours does relatively extensive impact evaluations. The most obvious limitation is that they have to guess whether any career changes are actually improvements, but I don’t see how to fix that—determining the EV of even a single person’s career is an extremely hard problem. IIRC they’ve done some quasi-experiments but I couldn’t find them from quickly skimming their impact evaluations.
This would be great. It also closely aligns with what EA expects before and after making large grants in most cause areas.