Re: CEA should publish what it has learned about group support work and invest in structured evaluation
On quasi-experiments: my feeling is that overall these wouldn’t be cruxy for deciding whether this sort of work is worth doing at all (because I think that we have strong enough evidence, e.g. from OP’s survey, that this is likely the case).
I think it’s fair to say OP’s survey indicates that groups are valuable (at least for longtermism, which is where the survey focused). I think it provides very little information as to why some groups are more valuable than others (groups at top universities seem particularly valuable, but we don’t know if that’s because of their prestige, the age of the groups, paid organizers, or other factors) or which programs from CEA (or others) have the biggest (or smallest) impact on group success. So even if we assume that groups are valuable, and that CEA does group support work, I don’t think those assumptions imply that CEA’s group support work is valuable. My best guess is that CEA’s group support is valuable, but that we don’t know much about which work (e.g. paid organizers vs. online resources) has the most impact on the outcomes we care about. I find it quite plausible that some of the work could actually be counterproductive (e.g. this discussion).
Greater (and more rigorous) experimentation would help sort these details out, especially if it were built into new programs at the outset.
For this sort of setup, I think that we’re better off using a looser form of iteration, feedback, case studies, and user interviews. (This is analogous to the pre-product-market-fit stage of a company where you’re not doing tonnes of A/B testing or profit maximization, but are instead trying to get a richer sense of what products would be useful via user interviews etc.) I think that experiments/quasi-experiments are much more useful for situations where there are clear outcome measures and the overall complexity of the environment is somewhat lower.
I feel like this has been going on for many years, without a lot of concrete lessons to show for it. Years ago, and also more recently, CEA has discussed feedback loops being too long to learn much, and capacity being too tight to experiment as much as desired.
I agree that we care about multiple outcomes and that this adds some complexity. But we can still do our best to measure those different outcomes and go from there. Six years (more if you count early GWWC groups or EA Build) into CEA’s group support work, we should be well beyond the point of trying to establish product-market fit.
This comment from Peter Wildeford’s recently published criticisms of EA seems relevant to this topic:
EA movement building needs more measurement. I’m not privy to all the details of how EA movement building works but it comes across to me as more of a “spray and pray” strategy than I’d like. While we have done some work, I think we’ve still really underinvested in market research to test how our movement appeals to the public before running the movement out into the wild big-time. I also think we should do more to track how our current outreach efforts are working, measuring conversion rates, etc. It’s weird that EA has a reputation of being so evidence-based but doesn’t really take much of an evidence-based orientation to its own growth as far as I can tell.
Also worth noting: Peter is a manager of the EAIF, the main funding option for national/city based groups. Max has mentioned that one of the reasons he thinks public and/or (quasi-)experimental evaluation of group work is relatively low priority is that CEA is already sharing information with other funders and key stakeholders (including, I assume, EAIF). Peter’s comment suggests that he doesn’t view whatever information he’s received as constituting a firm base of evidence to guide future decision making.
Max’s comments from our private correspondence (which he’s given me permission to share):
“I think that we’ve shared this [i.e. learnings re: group support] with people who are actively trying to do similar things, and we’re happy to continue to do this. I’m not sure I see doing a full public writeup being competitive with other things we could focus on… it’s not that we have a great writeup that we could share but are hoarding: it would take a lot of staff time to communicate it all publicly, and we also couldn’t say some of the most important things. It’s easier to have conversations with people who are interested (where you can focus on the most relevant bits, say things that are hard to say publicly).”
Hey, thanks for this. I work on CEA’s groups team. When you say “we don’t know much about which work … has the most impact on the outcomes we care about”, I think I would rather say:
a) We have a reasonable, yet incomplete, view of how many people different groups cause to engage in EA, and some measure of the depth of that engagement
b) We are unsure how many of those people would have become engaged in EA anyway
c) We do not have a good mapping from “people engaging with EA” to the things that we actually want in the world
I think we should be sharing more of the data we have on what types of community building have, so far, seemed to generate more engagement. To this end we have a contractor who will be providing a centralized service for some community building tasks, to help spread what is working. I also think groups that seem to be performing well should be running experiments where other groups adopt their model. I have proposed this to several groups, and will continue to do so.
However, trying to predict the mapping from engagement to good things happening in the world is (a) sufficiently difficult that I don’t think anyone can do it reliably (b) deeply unpleasant to a lot of communities. In trying to measure this we could decrease the amount of good that is happening in the world—and also probably wouldn’t succeed in taking the measurement accurately.
Thanks Rob, this is helpful!
I think we should be sharing more of the data we have on what types of community building have, so far, seemed to generate more engagement. To this end we have a contractor who will be providing a centralized service for some community building tasks, to help spread what is working.
I’d love to see more sharing of both the data you have and your assessment of which types of community building seem most effective. But I’m confused as to how you’re making that assessment. To what extent does it incorporate control groups, even if imperfect (e.g. by comparing the number of engaged EAs a group generates before and after getting a paid organizer, or by comparing the trajectory of EAs generated by groups with paid organizers to that of groups without them)?
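To make the second comparison concrete, here is a minimal sketch in Python of a difference-in-differences style calculation: compare the change in engaged EAs for groups that got a paid organizer against the change, over the same period, for groups that did not. The group names and numbers are entirely hypothetical; this illustrates the calculation, not what CEA’s data actually show.

```python
# Hypothetical illustration only: a difference-in-differences style estimate of
# the effect of adding a paid organizer on the number of engaged EAs a group
# produces. All group names and counts are made up for the example.

# Engaged EAs produced per year: (year before organizer, year after organizer)
with_organizer = {
    "Group A": (4, 9),
    "Group B": (2, 6),
    "Group C": (5, 7),
}

# Comparison groups with no paid organizer, over the same two years
without_organizer = {
    "Group D": (3, 4),
    "Group E": (6, 6),
    "Group F": (2, 3),
}

def mean_change(groups):
    """Average change in engaged EAs from the first year to the second."""
    changes = [after - before for before, after in groups.values()]
    return sum(changes) / len(changes)

treated_change = mean_change(with_organizer)      # change among groups that got an organizer
control_change = mean_change(without_organizer)   # change among groups that did not

# Difference-in-differences: how much more did treated groups improve than
# comparison groups over the same period?
did_estimate = treated_change - control_change
print(f"Change with organizer:    {treated_change:+.2f}")
print(f"Change without organizer: {control_change:+.2f}")
print(f"Difference-in-differences estimate: {did_estimate:+.2f} engaged EAs per group")
```

Even this rough comparison, with all its selection problems, would say more about causation than a simple before/after count for the treated groups alone.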
trying to predict the mapping from engagement to good things happening in the world is (a) sufficiently difficult that I don’t think anyone can do it reliably (b) deeply unpleasant to a lot of communities.
Yes, totally agree that trying to map from engagement to final outcomes is overkill. Thanks for clarifying this point. FWIW, the difficulty issue is the key factor for me. I was surprised by your “unpleasant to a lot of communities” comment. By that, are you referring to the dynamic where if you have to place value on outcomes, some people/orgs will be disappointed with the value you place on their work?
I also think groups that seem to be performing well should be running experiments where other groups adopt their model. I have proposed this to several groups, and will continue to do so.
This seems like another area where control groups would be helpful in making the exercise an actual experiment. Seems like a fairly easy place to introduce at least some randomization into: e.g. designate a pool of groups that could potentially benefit from adopting another group’s practices, and randomly select which of those groups actually do so. Presumably there would be some selection bias, since some groups in the “adopt another group’s model” condition may decline to do so, but this would still be a step forward in measuring causality.
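For illustration, here is a minimal sketch in Python of what that randomization and the eventual comparison might look like. Everything in it (group names, pool size, outcome numbers) is hypothetical; the point is only that comparing groups by what they were assigned to, rather than by what they actually did (an intention-to-treat comparison), preserves much of the benefit of randomization even when some invited groups decline.

```python
import random

# Hypothetical illustration only: randomize which groups are invited to adopt
# another group's model, then compare outcomes by assignment (intention-to-treat).
# Analysing by assignment rather than by actual adoption limits the selection
# bias introduced when some invited groups decline.

random.seed(0)  # fixed seed so the assignment is reproducible

candidate_groups = [f"Group {i}" for i in range(1, 21)]  # made-up pool of 20 groups
random.shuffle(candidate_groups)

adopt_arm = candidate_groups[:10]    # invited to adopt the successful group's model
control_arm = candidate_groups[10:]  # continue operating as usual

# A year later, record engaged EAs per group. Placeholder numbers here;
# in practice these would be the observed counts for each group.
engaged = {group: random.randint(0, 8) for group in candidate_groups}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Intention-to-treat estimate: compare arms by assignment, even if some
# groups in the adopt arm declined to take part.
itt_estimate = mean(engaged[g] for g in adopt_arm) - mean(engaged[g] for g in control_arm)
print(f"Intention-to-treat estimate: {itt_estimate:+.2f} engaged EAs per group")
```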
I was surprised by your “unpleasant to a lot of communities” comment. By that, are you referring to the dynamic where if you have to place value on outcomes, some people/orgs will be disappointed with the value you place on their work?
Not really. I was referring more to the fact that any attempt to quantify the likely impact someone will have is (a) inaccurate and (b) likely to create some sort of hierarchy and unhealthy community dynamics.
This seems like another area where control groups would be helpful in making the exercise an actual experiment. Seems like a fairly easy place to introduce at least some randomization into
I agree with this. I like the idea of successful groups joining existing mentorship programs such that there is a natural control group of “the average of all the other mentors.” (There are many ways this experiment would be imperfect, as I’m sure you can imagine.) I think the main implementation challenge here so far has been “getting groups to actually want to do this.”
We are very careful to preserve the groups’ autonomy, and I think this acts as a check on our behaviour. If groups engage in programs with us voluntarily, and we don’t make that engagement a condition of funding, it demonstrates that our programs are at least delivering value in the eyes of the organizers. If we started claiming more authority and designating groups into experiments, we’d lose one of our few feedback mechanisms. On balance I think I would prefer to have the feedback mechanism rather than the experiment.
(The previous paragraph does contain some simplifications; it would certainly be possible to find examples where we haven’t optimised purely for group autonomy.)
Thanks for clarifying these points, Rob. Agree that group autonomy is an important feedback loop, and that this feedback is more important than the experiment I suggested. But to the extent it’s possible to do experimentation on a voluntary basis, I do think that’d be valuable.
I agree with this statement entirely.
Go team!