Hi, thanks for putting together a serious attempt to measure the effect of EAGx conferences.
Whilst I am highly supportive of your attempts to do this, I wanted to ask about your methodology. You are attempting a difference in differences, with a control and treatment group, but immediately state: ‘In the data from the initial collection period (“before”), the control group and treatment groups showed several significant differences.’
To my knowledge this effectively means you no longer have suitable control and treatment groups, which makes a difference-in-differences design ill-advised?
Very happy to be wrong here! But whilst you made some attempts to find a control and treatment group, your initial analysis of the two groups suggests you are comparing apples to oranges?
Control and treatment groups can have some differences, but critically we want the parallel trends assumption to hold. From your descriptive analysis of the two groups this is unlikely to be true. Would you agree that your control group and treatment group would have gone on to engage with EA in different ways if the conference had never happened? I’d say so, given that you demonstrate they are rather different groups of people!
If I’m right (I may not be), this still leaves room for analysing the effects of conference attendance on attendees’ behaviour, just without the control group and with a different methodology. I hope I’m wrong, but I think this means we can’t use this part of your results.
Thanks for the feedback, Sam. It’s definitely a limitation, but the diff-in-diff analysis still has significant value. The specific way the treatment and control groups differ constrains the stories we can tell in which the conference did have a big (hopefully positive) effect but appeared not to because of some unobserved factors. If none of these stories seem plausible, then we can still be relatively confident in the results.
The post mentions that the difference in donations appears to be driven by 3 respondents, and the idea that non-attendees’ donations fell by ~50% because they didn’t attend, but would have stayed unchanged had they attended, seems unlikely (and is confounded by high-earning professionals presumably having less time to attend).
Otherwise, the control group seems to have similar beliefs but is much less likely to take EA actions. This isn’t surprising given that attending EAGx is itself an EA action, but it does present a problem. Looking only at people who were planning to attend but didn’t (for various reasons) would have given a very solid subgroup, but there were too few of these to do any statistical analysis. A bigger conference, though, could look specifically at that group, which I’d be really excited to see.
With diff-in-diff we need the parallel trends assumption, as you point out, but we don’t need parallel levels: if the groups would have continued at their previous (different) rates of engagement in the absence of the conference, then we should be fine. Similarly, if there’s some external event affecting EA in general and we can assume it would have affected both groups equivalently (at least as a percentage change in engagement), then the diff-in-diff methodology should account for that.
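To make the "parallel trends, not parallel levels" point concrete, here is a minimal Python sketch with made-up engagement scores. The numbers, group labels and function names are all hypothetical and are not taken from the survey. It shows that different starting levels don't bias the estimate by themselves. It also shows that if an external shock hits both groups proportionally (in % of engagement), the comparison should be done on log engagement rather than raw levels.

```python
import math

# Hypothetical engagement scores, made up for illustration (not survey data).
# Attendees start at a higher level than non-attendees: levels differ.
before = {"treatment": 8.0, "control": 4.0}


def did_levels(before, after):
    """Difference-in-differences on raw levels (assumes additive parallel trends)."""
    return (after["treatment"] - before["treatment"]) - (
        after["control"] - before["control"]
    )


def did_logs(before, after):
    """Difference-in-differences on log levels, i.e. on percentage changes."""
    return (math.log(after["treatment"]) - math.log(before["treatment"])) - (
        math.log(after["control"]) - math.log(before["control"])
    )


# Scenario 1: an EA-wide event lowers everyone's engagement by 1 point and the
# conference has no effect. Trends are parallel in levels, so DiD is zero.
additive = {g: v - 1.0 for g, v in before.items()}
print(did_levels(before, additive))  # 0.0

# Scenario 2: the event instead cuts everyone's engagement by 25%. Trends are
# parallel only in percentage terms, so the level-based DiD reports a spurious
# effect while the log-based DiD is zero up to floating-point rounding.
proportional = {g: v * 0.75 for g, v in before.items()}
print(did_levels(before, proportional))  # -1.0
print(did_logs(before, proportional))
```

Which scale is "right" depends on how we think shocks to EA engagement actually behave, so this is a modelling choice rather than something the data settles on its own.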
So (excluding the donation case) we have a situation where a more engaged group and a less engaged group both didn’t change their behavior.
If the conference had a big positive effect, this would imply that, in the absence of the conference, the attendees (the more engaged group) would have decreased their engagement dramatically, and that the effect of the conference happened to cancel that out. It also implies that whatever factor would have led attendees to become less engaged wouldn’t have affected non-attendees (or at least is strongly correlated with attendance).
You could imagine the response rates being responsible, but I’m struggling to think of a credible story for this. The 41% of attendees who dropped out of the follow-up survey would presumably be those least affected by the conference, which would make the data overestimate the impact of EAGx. Perhaps the 3% of contacted people who volunteered for the control group were much more consistent in their EA engagement than the (more engaged on average) attendees who volunteered, and so were less affected by an EA-wide downturn that conference attendance happened to cancel out? But this seems tenuous and ‘just-so’.
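A quick simulation of why that kind of attrition would bias the estimate upwards, not downwards. It assumes, as above, that the attendees who dropped out were those least affected. Only the 41% drop-out rate comes from the post. The per-person effects are random, hypothetical numbers with a true mean of zero.

```python
import random
import statistics

random.seed(0)

# Hypothetical per-attendee conference effects (arbitrary units), true mean 0.
effects = [random.gauss(0.0, 1.0) for _ in range(1000)]

# Assumption: the 41% who skip the follow-up survey are the least affected,
# i.e. the people with the smallest effects.
n_drop = round(0.41 * len(effects))
responders = sorted(effects)[n_drop:]

print(f"all attendees: {statistics.mean(effects):.2f}")      # near the true mean of 0
print(f"responders only: {statistics.mean(responders):.2f}")  # clearly higher
```

Under this assumption, dropping the bottom of the distribution can only raise the average among those who remain, so attrition alone can't hide a large positive effect.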
To me the most plausible way this could happen is reversion to the mean: EA engagement is highly volatile from year to year, only the most engaged go to EAGx, and attending results in them maintaining their high level of EA engagement for at least the next year (roughly cancelling out the usual decline).
This last point is the biggest issue with the analysis, in my opinion. Following attendees over the long run with multiple surveys per year (to compare results before vs. after a conference) would help a lot, but huge incentives would be needed to maintain a meaningful sample for more than a couple of years.
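The reversion-to-the-mean story can be illustrated with a small simulation. The model (engagement = a stable personal level plus independent yearly noise) and the assumption that only the top 10% in year one attend are both hypothetical choices for illustration, not estimates from the survey. Even with no conference effect at all, the selected group’s engagement falls the following year.

```python
import random
import statistics

random.seed(1)
N = 10_000

# Hypothetical model: engagement = stable personal level + independent yearly noise.
trait = [random.gauss(0.0, 1.0) for _ in range(N)]
year1 = [t + random.gauss(0.0, 1.0) for t in trait]
year2 = [t + random.gauss(0.0, 1.0) for t in trait]  # no conference effect at all

# Assumption: only the most engaged people in year 1 (top 10%) attend EAGx.
cutoff = sorted(year1)[int(0.9 * N)]
attendees = [i for i in range(N) if year1[i] >= cutoff]

before = statistics.mean(year1[i] for i in attendees)
after = statistics.mean(year2[i] for i in attendees)

# Engagement falls even though nothing happened: they were selected on a noisy peak.
print(f"attendees before: {before:.2f}, after: {after:.2f}")
```

If attending offset roughly this decline, attendees would look flat year-on-year, consistent with the pattern described above, even though the conference had a real effect.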