This was mentioned in the comments, and you acknowledge this too, but this survey is very likely underpowered, and I’d probably want to see a much larger sample size before reaching any firm conclusions.
To add some more context to this, my colleague @Willem Sleegers put together a simple online tool for looking at a power analysis for this study. The parameters can be edited to look at different scenarios (e.g. larger sample sizes or different effect sizes).
This suggests that power for mean differences of around 0.2 (which is roughly what they seemed to observe for the behaviour/attitude questions), with samples of 40 and 38 in the control and treatment groups respectively, would be 22.8%. A sample size of around 200 in each group would be needed for 80% power.[1] Fortunately, that seems like it might be within the realm of feasibility for some future study of EA events, so I’d be enthusiastic to see something like that happen.
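To make the power calculation concrete, here is a minimal Monte Carlo sketch of power for a simple independent-samples comparison of means (standardised mean difference of 0.2, SD of 1). This is only an approximation of the scenario above: it ignores the within-subjects components of the actual design, so it will not reproduce the 22.8% figure from the tool exactly. All parameter values below are illustrative.

```python
import numpy as np

def simulated_power(n1, n2, mean_diff, sd=1.0, n_sims=10_000, seed=0):
    """Monte Carlo power for a two-sample comparison of means.

    Simulates n_sims experiments with group sizes n1 and n2, a true mean
    difference of mean_diff (in SD units when sd=1), and counts how often
    a two-sided test at alpha = 0.05 detects the difference.
    """
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, sd, size=(n_sims, n1))
    b = rng.normal(mean_diff, sd, size=(n_sims, n2))
    # Welch-style standard error of the difference in means
    se = np.sqrt(a.var(axis=1, ddof=1) / n1 + b.var(axis=1, ddof=1) / n2)
    z = (b.mean(axis=1) - a.mean(axis=1)) / se
    # Normal approximation to the t critical value
    return float(np.mean(np.abs(z) > 1.96))

small = simulated_power(40, 38, 0.2)    # roughly the study's group sizes
large = simulated_power(200, 200, 0.2)  # the larger sample discussed above
```

Running this shows how sharply power rises with sample size at an effect of this magnitude; a purely between-groups design of this size is, if anything, even more underpowered than the mixed-design figure quoted above.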
It’s also worth considering what these effect sizes mean in context. For example, a mean difference of 0.2 could reflect around 20% of respondents moving up one level on the scales used (e.g. from taking an EA action “between a week and a month” ago to taking one “between a week and a day” ago, which seems like it could represent a practically significant increase in EA engagement, though YMMV as to what is practically significant). As such, we also included a simple tool for people to look at what mean difference is implied by different distributions of responses.
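To illustrate the point about interpreting the effect size in context, here is a small sketch of how a response-distribution shift translates into a mean difference. The 5-point scale and the specific proportions are hypothetical, chosen only to show that moving 20% of respondents up one level produces a mean difference of 0.2.

```python
import numpy as np

# Hypothetical response distributions on a 5-point scale (proportions sum to 1).
levels = np.arange(1, 6)
control = np.array([0.10, 0.25, 0.30, 0.25, 0.10])
# Treatment: 20% of respondents moved up one level (here, from level 3 to level 4).
treatment = np.array([0.10, 0.25, 0.10, 0.45, 0.10])

mean_diff = treatment @ levels - control @ levels
# One level x 20% of respondents = a mean difference of 0.2
```

The same arithmetic applies to other distributions: the mean difference equals the proportion moved times the number of levels moved, which is what makes a 0.2 difference potentially meaningful in practice.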
It’s important to note that this isn’t an exact assessment of the power for their analyses, since we don’t know the exact details of their analyses or their data (e.g. the correlation between the within-subjects components). But the provided tool is relatively easily adapted to different scenarios.
Note also that this does not take into account any attempts to correct for differential attrition or for differing characteristics of the control and intervention groups, which may further reduce power. It is also the power for a single test; power may be reduced further if running multiple tests.