Thanks for running the survey, writing it up, and posting the data. I think this is chiefly valuable for giving people an approximate overview of what we know about the movement, so it’s great to have the summary document which does that.
I would have preferred fewer attempts to look for statistical significance, as I'm not sure they ever helped much, and I think they led you to at least one misleading conclusion. In particular:
Reading “The Four Focus Areas of Effective Altruism”, one would expect a roughly even split between (1) poverty, (2) metacharity, (3) far future / xrisk / AI, and (4) nonhuman animals. Above, instead of equal splits, poverty emerges as a clear leader [Footnote: Statistically significant with a t-test, p < 0.0001]
On the contrary, I think the main message from the data is that in the sample collected, they are roughly evenly split. The biggest of the four beats the smallest by less than a factor of two—this is a relatively small difference when there are no mechanisms I can see which should equalise their size (I would not have been shocked if you’d found an order of magnitude difference between some two of them).
Doing a test for statistical significance here basically checks the hypothesis that survey participants were drawn from a distribution with exactly equal proportions supporting the four causes. But that hypothesis is obviously false; we don't need a big survey to tell us that. The test does tell us that poverty is the biggest (where we might not otherwise have been confident which was), but statistical significance is misleading about what that means: the raw ratios are more informative.
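To make the point concrete, here is a sketch of the kind of test being criticised, a chi-square goodness-of-fit test against the null of exactly equal proportions. The counts are hypothetical (the survey's actual numbers are not reproduced here), chosen only so that the largest cause beats the smallest by less than a factor of two, as described above:

```python
# Hypothetical supporter counts for the four causes; NOT the survey's real data.
observed = [600, 450, 400, 350]
expected = sum(observed) / len(observed)  # 450 per cause under the "equal split" null

# Chi-square goodness-of-fit statistic against exactly equal proportions.
chi2 = sum((o - expected) ** 2 / expected for o in observed)

# The critical value for df = 3 at alpha = 0.001 is about 16.27, so this
# comfortably "rejects" exact equality...
print(f"chi-square statistic: {chi2:.1f}")  # ~77.8

# ...yet the raw ratio carries the practically relevant information:
# the split is roughly even.
print(f"largest / smallest: {observed[0] / observed[-1]:.2f}")  # ~1.71
```

With a sample this size, even a modest departure from exact equality produces an overwhelming rejection, which is why the significance result adds little beyond the ratios themselves.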
Thanks for the feedback. I agree that particular test/conclusion was unnecessary/misleading. I think we’ll be more careful to avoid tests like that in future survey analyses :)
It’s hard to say. Others have told me that they greatly preferred backing up these kinds of statements with statistical testing. I guess I can’t make everyone happy. :)
OK, I had inferred the causality as being that you ran the test and then wrote the statement. If you were going to use the same language anyway, I agree that the test doesn't hurt, but I think this statement might have been better left out or weakened.
I agree with the spirit of this criticism, though it seems that the problem is not significance testing as such, but a failure to define the null hypothesis adequately.