Thanks for your reply, and the tweaks to the post. However:
[I] decided to keep the discussion short because the regression seemed to offer very limited practical significance (as you pointed out). Had I decided to give it more weight in my analysis then it certainly would be appropriate to offer a fuller explanation. Nonetheless, I should have been clearer about the limited usefulness of the regression, and noted it as the reason for the short discussion.
I think the regression having little practical significance makes it the most useful part of the analysis: it illustrates that the variation in the dependent variable is poorly explained by any or all of the variables investigated, that many of the associations found by bivariate assessment vanish when controlling for others, and it gives better estimates of the effect size (and thus relative importance) of those which still exert a statistical effect. Noting, essentially, “But the regression analyses imply that a lot of the associations we previously noted are either confounded or trivial, and even when we take all the variables together we can’t predict welcomeness much better than taking the average” at the end buries the lede.
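To make the “can’t predict welcomeness much better than taking the average” point concrete: R² compares a model’s squared prediction error with the error from simply predicting the sample mean for everyone, so an R² near zero means the covariates add little. This is a minimal Python sketch; the scores and predictions are made up for illustration and are not the survey data.

```python
from statistics import mean


def r_squared(observed, predicted):
    """Share of variance in `observed` explained by `predicted`.

    R^2 = 1 - SS_res / SS_tot, where SS_tot is the squared error you get
    by predicting every respondent's score with the sample mean.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    y_bar = mean(observed)
    ss_tot = sum((y - y_bar) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_res / ss_tot


# Hypothetical 1-5 "welcomeness" scores and model predictions (illustrative only).
scores = [4, 3, 5, 4, 2, 4, 3, 5]
baseline = [mean(scores)] * len(scores)  # always predict the average
model = [3.9, 3.6, 4.1, 3.8, 3.4, 3.9, 3.7, 4.0]

print(r_squared(scores, baseline))  # 0.0 by construction
print(r_squared(scores, model))     # roughly 0.376
```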
A worked example. The summary notes, “EAs in local groups, in particular, view the movement as more welcoming than those not in local groups” (my emphasis). If you look at the t-test between members and nonmembers, there’s a difference of ~0.25 ‘Likert levels’, which is one of the larger effect sizes reported.
Yet we presumably care about how much of this difference can be attributed to local groups themselves. If the story is “EAs in local groups find EA more welcoming because they skew (say) male and young”, it seems better to focus attention on these things instead. Regression isn’t a magic wand to remove confounding (cf.), but it tends to be better than not doing it at all (which is essentially what is being done when you test the association between a single variable and the outcome).
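A toy simulation of the “skew male and young” story may help show how the bivariate gap and the adjusted coefficient can diverge. Everything here is assumed for illustration: the sample size, the effect sizes, a true effect of group membership of exactly zero, and continuous scores rather than Likert responses. The least-squares fit is hand-rolled only to keep the sketch dependency-free; in practice one would use a standard regression routine.

```python
import random
from statistics import mean


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


rng = random.Random(0)
rows, y, group = [], [], []
for _ in range(20_000):
    male = 1 if rng.random() < 0.6 else 0
    young = 1 if rng.random() < 0.5 else 0
    # Assumed confounding: men and younger people are more likely to join groups...
    in_group = 1 if rng.random() < 0.2 + 0.3 * male + 0.3 * young else 0
    # ...and find EA more welcoming, while membership itself has zero effect.
    score = 3.0 + 0.5 * male + 0.5 * young + 0.0 * in_group + rng.gauss(0, 0.8)
    rows.append([1.0, in_group, male, young])  # intercept, group, male, young
    y.append(score)
    group.append(in_group)

bivariate_diff = (mean(s for s, g in zip(y, group) if g)
                  - mean(s for s, g in zip(y, group) if not g))
adjusted = ols(rows, y)[1]  # coefficient on in_group, controlling for male and young
print(f"bivariate difference: {bivariate_diff:.3f}")
print(f"adjusted coefficient: {adjusted:.3f}")
```

With these made-up parameters the raw group difference comes out around 0.3, while the coefficient on membership sits near zero once gender and age are in the model. The converse caveat also applies: regression only adjusts for confounders that are actually measured and included.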
As I noted before, the ‘effect size’ of local group membership when controlling for other variables is still statistically significant, but utterly trivial. Again: it is ~1/1000th of a Likert level; even the upper bound of the 95% confidence interval is only ~2/1000th of a Likert level. By comparison, the effects of gender and year of involvement are two orders of magnitude greater. It seems better for the conclusion to highlight results like these, rather than results the analysis shows have no meaningful effect once other variables are controlled for.
A few more minor things:
(Which I forgot to mention earlier.) If you are willing to use means, you can probably also use standard errors or confidence intervals, which may help with the ‘this group looks different, but the group is small’ points.
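A sketch of reporting a group mean alongside its standard error and an approximate 95% interval, assuming a normal critical value of 1.96 (for very small groups a t critical value would give a somewhat wider interval). The two groups are hypothetical: both have a mean of 3.8, but the five-person group’s interval is far wider, which is exactly the ‘looks different, but small group’ situation.

```python
import math
from statistics import mean, stdev


def mean_ci(values, z=1.96):
    """Return (mean, standard error, (lower, upper)) for one group's scores.

    Uses a normal critical value `z`; this is an approximation that
    understates uncertainty for very small groups.
    """
    n = len(values)
    m = mean(values)
    se = stdev(values) / math.sqrt(n)  # sample SD / sqrt(n)
    return m, se, (m - z * se, m + z * se)


# Hypothetical scores for a small and a larger subgroup (illustrative only).
small_group = [5, 3, 4, 5, 2]
large_group = [4, 3, 4, 4, 3, 4, 5, 3, 4, 4, 3, 4, 4, 5, 3, 4, 4, 3, 4, 4]

for name, group_scores in [("small", small_group), ("large", large_group)]:
    m, se, (lo, hi) = mean_ci(group_scores)
    print(f"{name}: mean={m:.2f}, SE={se:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```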
Bonferroni makes a rod for your back given it is conservative (cf.); an alternative is to control the false discovery rate instead of the family-wise error rate. Although minor, if you are going to use Bonferroni to get your adjusted significance threshold, this should be mentioned early, and the result which ‘doesn’t make the cut’ should simply be reported as non-significant.
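To show the difference in practice, here is a minimal sketch comparing Bonferroni (family-wise error rate control) with the Benjamini–Hochberg procedure (false discovery rate control), applied to ten hypothetical p-values.

```python
def bonferroni(pvals, alpha=0.05):
    """Family-wise error control: reject p_i only if p_i <= alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, alpha=0.05):
    """False discovery rate control (Benjamini-Hochberg step-up procedure).

    Sort p-values ascending, find the largest rank k with p_(k) <= k/m * alpha,
    and reject the k hypotheses with the smallest p-values.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k = rank  # keep the largest rank that passes
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject


# Hypothetical p-values from ten bivariate tests (illustrative only).
pvals = [0.001, 0.004, 0.006, 0.012, 0.020, 0.035, 0.045, 0.200, 0.500, 0.800]
print(sum(bonferroni(pvals)))          # 2 (threshold 0.05 / 10 = 0.005)
print(sum(benjamini_hochberg(pvals)))  # 5
```

Here Bonferroni keeps only the two smallest p-values, while Benjamini–Hochberg keeps five. Benjamini–Hochberg is valid under independence or positive dependence among the tests; statsmodels’ `multipletests` with `method="fdr_bh"` implements the same procedure if you would rather not hand-roll it.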
It is generally a bad idea to lump categories together (e.g. countries, cause areas) for regression, as this loses information (and so statistical power). One of the challenges of regression analysis is garden-of-forking-paths issues (even post hoc: some coefficients ‘pop’ into and out of statistical significance depending on which model is used, and once I’ve seen one, I’m not sure how much to discount subsequent ones). This is where an analysis plan that pre-specifies these choices is very valuable.
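A toy illustration of the information lost by lumping, using made-up countries and scores. For a single categorical predictor, a regression with one dummy per category leaves the within-category sum of squares unexplained, and collapsing categories can only keep that the same or make it larger.

```python
from collections import defaultdict
from statistics import mean


def within_group_ss(labels, y):
    """Residual sum of squares when each category gets its own mean.

    This equals what a regression with an intercept plus one dummy per
    (non-reference) category leaves unexplained for a single categorical predictor.
    """
    groups = defaultdict(list)
    for lab, val in zip(labels, y):
        groups[lab].append(val)
    total = 0.0
    for vals in groups.values():
        m = mean(vals)
        total += sum((v - m) ** 2 for v in vals)
    return total


# Hypothetical respondents: country and a 1-5 score (illustrative only).
countries = ["UK", "UK", "UK", "DE", "DE", "DE", "FR", "FR", "FR"]
scores = [4, 5, 4, 2, 3, 2, 4, 4, 5]
# Lumping: collapse everything outside the UK into "Other".
lumped = [c if c == "UK" else "Other" for c in countries]

print(within_group_ss(countries, scores))  # separate categories: about 2.0
print(within_group_ss(lumped, scores))     # lumped: about 8.0, the DE/FR gap is lost
```

The usual counterargument is that each extra category costs a parameter, which matters when some categories are very sparse.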
I’m appreciating this exchange. I wonder if part of the problem stems from the word welcoming*, especially as selection bias naturally tends to neglect those who didn’t feel welcome. This could be a particular problem for assessing how welcome women feel, if what’s happening is that many quickly feel unwelcome and simply leave.
One way to overcome this would be to set up a contact list for a group of men and women attending an intro event. Even 10 of each (and 5 others) could be useful, not for statistical significance but for an initial assessment at low cost in time and effort. This could be done by email, but phone would be better. You could follow up after the first activity, at the end of the session, a month later and a year later. It could be repeated on a small scale at several intro events, which might give more initial information than a large sample at one event, which might not be representative.
The most powerful tool might be “semi-structured interviews” by telephone, a well-established social science and participatory appraisal method. Again, you wouldn’t be looking for statistical significance but for hypothesis generation, which could then inform a follow-up. E.g. if a lot of women said something like “I just didn’t feel comfortable” or “it was too ….”, that could suggest a more specific follow-up study, or even lead directly to ideas for redesigning intros.
It would help if such a survey weren’t conducted by someone seen as an organiser, and perhaps ideally by a woman?
*An alternative might be “satisfaction”?