Hey Anon,

I ran this analysis, beheading (?) the data by taking all responses that I scored above 5 and capping them at 5 (18 responses).
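For concreteness, the capping step is just a clip at 5; here's a minimal sketch with hypothetical scores (the raw survey data isn't reproduced in this post):

```python
import pandas as pd

# Hypothetical scores on the original 1/5/10/20/50 scale; not the real survey data.
scores = pd.Series([1, 5, 50, 10, 20, 1, 5, 50])

capped = scores.clip(upper=5)  # anything above 5 becomes exactly 5
print((scores > 5).sum(), "responses capped")
```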
I realised that all of these responses were from EAGx attendees. That's not too surprising, since 90% of survey responses came from EAGx attendees, but it highlights a limitation of my analysis: plausibly, if I had data from CEP retreats, we might find more high-scoring impact stories.
Previously:
This means the cost per counterfactual impact point at EAGx events is ~$1,219, while the cost per counterfactual impact point at CEP retreats is ~$1,711.
Post-beheading:
The cost per counterfactual impact point at EAGx events is ~$1,639, while the cost per counterfactual impact point at CEP retreats is ~$1,724 (due to GBP/USD fluctuation).
So EAGx still performs slightly better, but the gap is much narrower and probably not a significant difference.
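For anyone checking the arithmetic, each headline figure is just total event spend divided by total counterfactual impact points. A sketch with illustrative inputs; the actual spend and point totals aren't given in this thread:

```python
def cost_per_impact_point(total_spend_usd: float, total_points: float) -> float:
    """Cost per counterfactual impact point = total spend / total impact points."""
    return total_spend_usd / total_points

# Illustrative inputs only; not the actual EAGx or CEP totals.
print(f"${cost_per_impact_point(500_000, 305):,.0f} per point")  # ~$1,639
```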
Thanks for proposing this. It does suggest the results are sensitive to my scoring system. I'm not sure where this leaves me; that isn't ideal and I'd like something more robust. On the other hand, these high-scoring results (people securing jobs, teams being formed) are exactly the kind of things we want to happen at our events, so I think it's reasonable to put significant weight on them.
Thanks for running this analysis, Ollie! Interesting findings!
> It does suggest the results are sensitive to my scoring system. I'm not sure where this leaves me; that isn't ideal and I'd like something more robust. On the other hand, these high-scoring results (people securing jobs, teams being formed) are exactly the kind of things we want to happen at our events, so I think it's reasonable to put significant weight on them.
Agree that this exercise doesn't yield an obvious conclusion. Given that you've found the results to be sensitive to the scoring system, I suggest trying to figure out how sensitive. You've crunched the numbers using max scores of 50 and 5; I imagine it'd be quick to do the same with max scores of 20, 10, and 1 (the other scores you used in your original scoring system).
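A sketch of what that loop could look like, with hypothetical scores and spend figures (none of these numbers are from the actual analysis):

```python
import pandas as pd

# Hypothetical responses; the real data set is ~90% EAGx respondents.
df = pd.DataFrame({
    "event": ["EAGx", "EAGx", "EAGx", "CEP", "CEP"],
    "score": [50, 20, 5, 10, 1],
})
spend_usd = {"EAGx": 500_000, "CEP": 120_000}  # hypothetical spend totals

for cap in [1, 5, 10, 20, 50]:
    # Re-cap the scores, total the points per event type, recompute cost per point.
    points = df["score"].clip(upper=cap).groupby(df["event"]).sum()
    for event, total in points.items():
        print(f"cap={cap}: {event} ~${spend_usd[event] / total:,.0f} per point")
```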
The other methodology I'd suggest looking at would be to keep the same relative rankings you used originally, but condense the range of scores (to, say, 1, 2, 3, 4, 5 instead of 1, 5, 10, 20, 50). That would capture the fact that you think starting an EA project is more valuable than meeting a collaborator (which is lost by capping the scores at 5), but would assess it as 2.5x more valuable rather than 10x. (Btw, I think the technical term for "beheading" the data is "Winsorizing", though that's usually done using percentiles of the data set, which is another way you could run a sensitivity analysis.)
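Both variants are cheap to try; a sketch assuming the original 1/5/10/20/50 scale (the percentile limits below are arbitrary placeholders):

```python
import pandas as pd
from scipy.stats.mstats import winsorize

scores = pd.Series([1, 5, 10, 20, 50, 5, 1])  # hypothetical raw scores

# Variant 1: preserve the relative ranking but condense the range.
condensed = scores.map({1: 1, 5: 2, 10: 3, 20: 4, 50: 5})

# Variant 2: Winsorize by percentile. limits=(0, 0.1) leaves the bottom
# untouched and pulls the top 10% of scores down to the 90th percentile.
winsorized = winsorize(scores.to_numpy(), limits=(0, 0.1))
```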
This sort of more comprehensive sensitivity analysis would shed some light on whether your observation about EAGxAustralia is supported by the broader data set:
> For EAGxAustralia, three outcomes really stood out and the rest were good but unremarkable, according to the attendees. It seems likely to me that a good chunk of the value of the event was accrued by just a few people who had more life-changing things happen to them (e.g. getting a grant or job).
If that turns out to be a robust finding, it has pretty big implications for how events should be run. FWIW, I'd consider it a more important finding than EAGx events looking more cost-effective than CEP events, and would suggest noting it in the bottom-line-upfront section.
Longer term, I'd look to refine the metrics you use for events and how you collect the data. I love that you've started looking beyond "number of connections" to "valuable outcomes"; this definitely seems like a move in the right direction. However, it's also not feasible for you to score responses from attendees at scale going forward. So I'd suggest asking respondents to score the event themselves, while providing guidance on how different experiences should be scored (e.g. starting a new project = X) to promote consistency across respondents.
My hunch is that it'd be good to have people score the event along the different dimensions (connections, learning, motivation/positivity, action, other) you listed in the "How do attendees get value from EA community-building events?" post. That might make the survey too onerous, but if you could collect that data you'd have a lot of granularity about which events accrued which type of value, and it's probably easier to do relative scoring within categories than across them. You'd still be able to create a single score as a weighted average of the different dimensions (where you'd presumably give connections and learning the most weight, since that's where people seem to get the most value).
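To make that concrete, here's one way the rubric and the weighted rollup could look; every number below is a placeholder, not a recommendation:

```python
# Hypothetical scoring guide shown to respondents, to promote consistency.
SCORING_GUIDE = {
    "met a potential collaborator": 2,
    "started a new project": 4,
    "secured a job or grant": 5,
}

# Hypothetical weights per value dimension (summing to 1), with connections
# and learning weighted most heavily.
WEIGHTS = {"connections": 0.3, "learning": 0.3,
           "motivation": 0.15, "action": 0.15, "other": 0.1}

def overall_score(dimension_scores: dict[str, float]) -> float:
    """Weighted average of per-dimension scores (all on the same 1-5 scale)."""
    return sum(WEIGHTS[d] * s for d, s in dimension_scores.items())

print(overall_score({"connections": 4, "learning": 3,
                     "motivation": 5, "action": 2, "other": 1}))  # 3.25
```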