[2] I generated scores for 43 different metrics and have not tested for significant correlations between all possible pairings. Additionally, I identified some correlations as significant but chose to exclude them if I believed them to be especially misleading, given known confounding factors or methodological difficulties. Given the high risk of Type II error (with so few data points, the tests have low power), I only report significant correlations in the discussion below, rather than treating nonsignificant correlations as meaningful evidence that there is no relationship between two variables. This statistical analysis was only one input to help me clarify my thinking, rather than the main criterion for deciding the key lessons from social movement history. I used Spearman’s correlation rather than Pearson’s correlation because it makes fewer assumptions (e.g. it does not require continuous data) and is less sensitive to outliers.
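For concreteness, here is a minimal sketch of the kind of pairwise Spearman calculation involved (Python, with purely hypothetical scores; this is not the actual analysis code):

```python
import numpy as np
from scipy import stats

# Purely hypothetical scores for two of the metrics, one value per movement.
# Values are whole or half points, mirroring the coding scheme described above.
metric_a = np.array([3.0, 4.5, 2.0, 5.0, 3.5, 4.0, 1.5, 2.5, 5.0, 3.0, 4.0, 2.0])
metric_b = np.array([2.5, 4.0, 3.0, 4.5, 3.0, 3.5, 2.0, 2.0, 5.0, 2.5, 4.5, 1.5])

# Spearman's rho works on ranks, so ties from the half-point scale are handled
# via average ranks, and outliers have less influence than with Pearson.
rho, p_value = stats.spearmanr(metric_a, metric_b)
print(rho, p_value)
```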
Did you adjust for multiple tests? It looks like you didn’t adjust the significance level (0.05) downward, so did you adjust the p-values upward instead?
I didn’t adjust. This was very much “exploratory analysis”, and it’s not common to adjust in exploratory analysis as far as I’m aware.
I also didn’t discuss a couple of correlations that turned out to be significant because it seemed pretty likely to me that they were spurious / artifacts of the methodology and question wording; and I was worried that either people would misinterpret or place too much weight on the “findings”, or I’d have to make the post excessively lengthy to discuss the various caveats and nuances.
We had a lot of internal discussion about whether using this statistical analysis was appropriate, given concerns such as (1) the very small number of data points, (2) the scores not really being continuous data since they were always either whole numbers or halves, etc. So any thoughts on the methodology used there are welcome.
Hmm, I guess with at least 40 correlations (before excluding some?), making this kind of adjustment will very likely leave you with no statistically significant correlations unless you had some extremely small p-values (it looks like they were > 0.01, so not that small). But you could also take that as a sign that this kind of analysis is unlikely to be informative without retesting the same hypotheses separately on new data. EDIT: Actually, how many correlations did you test?
I think it’s worth noting explicitly in writing the much higher risk that these are chance correlations, due to the number of tests you ran. It may also be worth reporting the original p-values and the adjusted significance level (or adjusted p-values; I assume you can make the inverse adjustment to the p-values instead, but I haven’t checked whether anyone does this).
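If it helps, here’s a minimal sketch of how the adjustment could be reported either way (Python/statsmodels; `raw_p` is made up for illustration):

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical raw p-values from the correlation tests (not the real ones).
raw_p = [0.012, 0.03, 0.04, 0.008, 0.20, 0.55]
alpha = 0.05

# Option 1: adjust the significance level down (Bonferroni).
bonferroni_alpha = alpha / len(raw_p)

# Option 2: adjust the p-values up and keep alpha at 0.05.
# Holm also controls the family-wise error rate and is a bit less conservative.
reject, p_adjusted, _, _ = multipletests(raw_p, alpha=alpha, method="holm")

print(bonferroni_alpha)
print(list(zip(raw_p, p_adjusted.round(3), reject)))
```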
It might also be worth reporting the number and proportion of statistically significant correlations you found (before and/or after the exclusions). If the tests were independent (they aren’t), you’d expect around 5% to come out significant just by chance if the null were true in all cases. Reasoning from my understanding of hypothesis tests, a proportion noticeably higher than 5% would increase your confidence that some of the statistically significant relationships you identified are non-chance relationships (or that the significant ones are dependent), while a similar or lower proportion would suggest they are chance findings (or that your study is underpowered, or that the insignificant ones are dependent).
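To make the “compare with 5%” idea concrete, here’s a rough sketch with made-up counts; note that it treats the tests as independent, which, as said above, they aren’t:

```python
from scipy import stats

# Hypothetical counts: number of tests run and how many came out significant.
n_tests = 40
n_significant = 6

proportion = n_significant / n_tests  # compare against the ~5% expected under the null

# A binomial test of "more significant results than chance would give",
# valid only under the (false here) assumption that the tests are independent.
result = stats.binomtest(n_significant, n_tests, p=0.05, alternative="greater")
print(proportion, result.pvalue)
```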
I was going to suggest ANOVA F-tests with linear models for the dependent variables of interest, to get around the independent-tests assumption, but unless you cut the number of independent variables down to fewer than the number of movements, the model will probably overfit with extreme coefficients and perfectly predict the dependent variable, which wouldn’t be informative. You could constrain the coefficients to try to prevent this, but that gets much messier and there’s still no guarantee it addresses the problem.
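For what it’s worth, a sketch of the kind of overall F-test I had in mind (Python/statsmodels, hypothetical data and made-up variable names, with far fewer predictors than movements so the model isn’t saturated):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical data: one row per movement, a handful of independent-variable
# scores plus one outcome score (all column names are made up).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 11, size=(12, 4)) / 2,
                  columns=["grassroots", "litigation", "lobbying", "success"])

# Keep the number of predictors well below the number of movements, otherwise
# the fit is (near-)saturated and the overall F-test becomes uninformative.
X = sm.add_constant(df[["grassroots", "litigation", "lobbying"]])
model = sm.OLS(df["success"], X).fit()

# One joint test per outcome instead of one test per predictor-outcome pair.
print(model.fvalue, model.f_pvalue)
```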
EDIT: Also, I’m not sure what kinds of tests you used, but with small sample sizes, my understanding is that tests based on resampling (permutation, bootstrapping, jackknife) tend to be more accurate than tests using asymptotic distributions (e.g. a normally distributed test statistic is often not a good approximation for a small sample), but this is a separate concern from adjusting for multiple tests. I’m also not sure how much this actually matters.
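To illustrate what I mean by a resampling-based test, here’s a minimal permutation-test sketch for Spearman’s rho (Python, hypothetical scores; `spearman_perm_test` is just a name I made up):

```python
import numpy as np
from scipy import stats

def spearman_perm_test(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation test for Spearman's rho.

    Instead of relying on an asymptotic distribution for the test statistic,
    shuffle the pairing between x and y and ask how often a correlation at
    least as extreme as the observed one arises by chance.
    """
    rng = np.random.default_rng(seed)
    observed, _ = stats.spearmanr(x, y)
    hits = 0
    for _ in range(n_perm):
        perm_rho, _ = stats.spearmanr(x, rng.permutation(y))
        if abs(perm_rho) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical scores for two metrics across a dozen movements.
x = np.array([3.0, 4.5, 2.0, 5.0, 3.5, 4.0, 1.5, 2.5, 5.0, 3.0, 4.0, 2.0])
y = np.array([2.5, 4.0, 3.0, 4.5, 3.0, 3.5, 2.0, 2.0, 5.0, 2.5, 4.5, 1.5])
print(spearman_perm_test(x, y))
```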