Thanks once again for conducting this study and taking the time to write up the results so clearly. As Guy Raveh says, it takes courage to share your work publicly at all, let alone work that runs contrary to popular opinion on a forum full of people who make a living out of critiquing documents. The whole CEA events team really appreciates it!
I want to start by conceding that we haven’t done a great job of sharing how we measure and evaluate the impact of our programmes lately. There’s no complicated reason behind this: running large conferences around the world is very time-consuming, and analysing the impact of events is very difficult! That said, we’ve freed up more capacity on the team for this work, and increasing that capacity further is a top priority. As a result, there’s been a lot more impact evaluation going on behind the scenes this year, and we hope to share that work soon.
This null result is useful to us. While a change in behaviours and attitudes isn’t a key outcome we’re aiming for with EAGx events (more on this below), it is something we often allude to, or outright claim, happens as a result of our events. This study is a data point against that, and it might mean we should redirect some effort towards generating other kinds of outcomes. Thanks!
However, I don’t expect we’ll update our theory of change because of this study, which I don’t think will come as a surprise to you.
Firstly, as was mentioned in the comments (and as you acknowledge), this survey is very likely underpowered, and I’d want to see a much larger sample size before reaching any firm conclusions.
Secondly, many of the behaviours and attitudes you ask about, particularly attending further events and engaging more in online EA spaces, aren’t the primary things we aim to influence via EAGx events. We’re typically aiming for concrete plan changes, such as finding new impactful roles, opportunities, and collaborators, and we think much of the value of an event often comes from just a few such cases.
You write:
A large portion of the impact may be concentrated in a small subset of attendees. It is plausible that the majority of conference attendees experience few long-term effects, while a small minority are led to major life or career changes as a result of connections made or motivation obtained at the conference.
This is our current best guess at what’s happening and what we target with our evaluations, but I don’t expect a study like this to pick up on this effect. I like your recommendation to find even more ways to actually identify these impacts!
That said, donating more to EA causes and creating more connections in the community are outcomes that we aim for, so it’s interesting that you didn’t see a long-term effect on these questions (though note the point about the survey being underpowered).
Thanks for writing up the recommendations. As mentioned, we’ve been investing more time in measuring our impact and hope to share some thoughts here soon. This post also updated me a bit towards trying something more longitudinal.

Thanks again!
Firstly, as was mentioned in the comments (and as you acknowledge), this survey is very likely underpowered, and I’d want to see a much larger sample size before reaching any firm conclusions.
To add some more context to this, my colleague @Willem Sleegers put together a simple online tool for running a power analysis of this study. The parameters can be edited to explore different scenarios (e.g. larger sample sizes or different effect sizes).
This suggests that the power to detect mean differences of around 0.2 (roughly what they seemed to observe for the behaviour/attitude questions) with samples of 40 and 38 in the control and treatment groups respectively would be 22.8%. A sample size of around 200 in each group would be needed for 80% power.[1] Fortunately, that seems within the realm of feasibility for a future study of EA events, so I’d be enthusiastic to see something like that happen.
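For anyone who wants to sanity-check these figures without the tool, here’s a minimal simulation sketch in Python. To be clear, this is not Willem’s tool or the study’s actual analysis: the design (comparing pre-to-post change scores between groups), the unit-variance normal outcomes, and the within-subject correlation (r = 0.75, chosen so the output roughly matches the figures above) are all assumptions on my part.

```python
# Minimal power simulation for a pre/post design with a control group.
# Assumptions (not from the study): unit-variance normal outcomes, a true
# post-only effect of 0.2, and a within-subject correlation of r = 0.75.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulate_power(n_control=40, n_treat=38, effect=0.2, r=0.75,
                   alpha=0.05, n_sims=10_000):
    """Estimate power of a t-test comparing pre-to-post change scores."""
    cov = [[1.0, r], [r, 1.0]]  # correlated pre/post scores
    hits = 0
    for _ in range(n_sims):
        control = rng.multivariate_normal([0.0, 0.0], cov, size=n_control)
        treat = rng.multivariate_normal([0.0, effect], cov, size=n_treat)
        # Test the difference between the two groups' change scores
        res = stats.ttest_ind(treat[:, 1] - treat[:, 0],
                              control[:, 1] - control[:, 0])
        hits += res.pvalue < alpha
    return hits / n_sims

print(simulate_power())                            # ~0.23 under these assumptions
print(simulate_power(n_control=200, n_treat=200))  # ~0.80 under these assumptions
```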
It’s also worth considering what these effect sizes mean in context. For example, a mean difference of 0.2 could reflect around 20% of respondents moving up one level on the scales used (e.g. from taking an EA action “between a week and a month” ago to taking one “between a week and a day” ago, which seems like it could represent a practically significant increase in EA engagement, though YMMV as to what is practically significant). As such, we also included a simple tool for people to look at what mean difference is implied by different distributions of responses.
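To make the scale arithmetic concrete, here’s a toy version of that calculation (the 5-point scale and the response shares are made up for illustration, not taken from the study): moving 20% of respondents up one level raises the mean by exactly 0.2.

```python
import numpy as np

scale = np.arange(1, 6)  # a hypothetical 5-point response scale
baseline = np.array([0.10, 0.25, 0.30, 0.25, 0.10])  # assumed response shares
shift = np.array([0.0, -0.20, 0.20, 0.0, 0.0])       # 20% move from level 2 to 3
print(scale @ baseline)            # 3.0
print(scale @ (baseline + shift))  # 3.2, i.e. a mean difference of 0.2
```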
It’s important to note that this isn’t an exact assessment of the power of their analyses, since we don’t know the exact details of their analyses or their data (e.g. the correlation between the within-subjects components). But the provided tool is relatively easily adapted to different scenarios.
[1] Though this does not take into account any attempts to correct for differential attrition or for differing characteristics of the control and intervention groups, which may further reduce power. This is also the power for a single test; power may be further reduced when running multiple tests.
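To put a rough number on that last, multiple-testing caveat, here’s an analytic version of the same power calculation with and without a Bonferroni correction. The change-score effect size of 0.28 isn’t from the study; it’s just the value implied by the assumptions in the simulation sketch above.

```python
# Power of a two-sided two-sample t-test, with and without a Bonferroni-
# adjusted alpha. The effect size is an assumption carried over from above.
import numpy as np
from scipy import stats

def t_test_power(d, n1, n2, alpha=0.05):
    """Analytic power of a two-sided two-sample t-test at effect size d."""
    df = n1 + n2 - 2
    ncp = d * np.sqrt(n1 * n2 / (n1 + n2))  # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)

d = 0.28  # assumed change-score effect (a 0.2 raw effect with r = 0.75)
print(t_test_power(d, 40, 38))                   # ~0.23 for a single test
print(t_test_power(d, 40, 38, alpha=0.05 / 10))  # ~0.06 if Bonferroni-correcting 10 tests
```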