Thanks for this; the presentation of results is admirably clear. Yet I have two worries:
1) Statistics: I think the statistical methods are frequently missing the mark. Sometimes this is a minor quibble; other times more substantial:
a) The dependent variable (welcomeness, assessed by a typical Likert scale) is ordinal data (i.e. 'very welcoming' > 'welcoming' > 'neither', etc.). The write-up often treats this statistically either as categorical data (e.g. chi2) or interval data (e.g. t-tests, the use of 'mean welcomeness' throughout). Doing the latter is generally fine (the data looks pretty well-behaved, t-tests are pretty robust, and I recall controversy about when to use non-parametric tests). Doing the former isn't.
chi2 tests against the null that (in essence) the proportion in each 'row' of a table is the same across columns: it treats the ordered scale as a set of 5 categories (e.g. like countries, ethnicities, etc.). Statistical significance here is not specific to 'more or less welcoming': two groups with identical 'mean welcomeness' yet with a different distribution across levels could 'pass statistical significance' by chi2. Tests for 'ranked dependent by categorical independent' data exist (e.g. Kruskal-Wallis) and should be used instead.
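To illustrate, a minimal sketch in Python (the group labels and responses are invented): scipy's Kruskal-Wallis test does the rank-respecting comparison I have in mind.

```python
# Sketch with invented data: compare ordinal Likert responses across groups.
# Codes: 1 = "not at all welcoming" ... 5 = "very welcoming".
from scipy.stats import kruskal

group_a = [5, 4, 4, 3, 5, 4, 2, 4]   # e.g. respondents in local groups (made up)
group_b = [3, 4, 2, 3, 4, 3, 5, 2]   # e.g. respondents not in local groups (made up)
group_c = [4, 4, 3, 5, 4, 3, 3, 4]   # a third category, if present (made up)

# Kruskal-Wallis works on ranks, so it respects the ordering of the scale,
# unlike a chi-squared test on the 5xK contingency table.
h_stat, p_value = kruskal(group_a, group_b, group_c)
print(f"H = {h_stat:.2f}, p = {p_value:.3f}")
```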
Further, chi2 assumes the independent variable is categorical too. Usually it is (e.g. where you heard about EA) but sometimes it isn't (e.g. age, year of joining, ?political views). For similar reasons to the above, a significant chi2 result doesn't demonstrate a (monotonic) relationship between welcomeness and time in EA. There are statistical tests for trend which can be used instead.
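(A rough sketch with invented numbers: a rank correlation between the ordered predictor and the ordinal response is one simple way to check for a monotonic trend.)

```python
# Sketch with invented data: monotonic trend between years in EA and welcomeness.
from scipy.stats import spearmanr

years_in_ea = [1, 1, 2, 2, 3, 4, 5, 6, 7, 8]   # made up
welcomeness = [3, 2, 3, 4, 3, 4, 4, 5, 4, 5]   # 1-5 Likert codes, made up

rho, p_value = spearmanr(years_in_ea, welcomeness)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```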
Still further, chi2 (ditto K-W) is an 'omnibus' test: it tells you your data is surprising given the null, but not what is driving the surprise. Thus statistical significance 'on the test' doesn't indicate whether particular differences (whether highlighted in the write-up or otherwise) are statistically significant.
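(One common follow-up, sketched here with the same invented groups, is pairwise comparisons with a multiplicity correction, to see which specific differences drive the omnibus result.)

```python
# Sketch with invented data: post-hoc pairwise Mann-Whitney U tests after a
# significant omnibus result, with a correction for multiple comparisons.
from itertools import combinations
from scipy.stats import mannwhitneyu
from statsmodels.stats.multitest import multipletests

groups = {
    "local group":    [5, 4, 4, 3, 5, 4, 2, 4],
    "no local group": [3, 4, 2, 3, 4, 3, 5, 2],
    "unsure":         [4, 4, 3, 5, 4, 3, 3, 4],
}

pairs, pvals = [], []
for (name1, x1), (name2, x2) in combinations(groups.items(), 2):
    _, p = mannwhitneyu(x1, x2, alternative="two-sided")
    pairs.append((name1, name2))
    pvals.append(p)

# Holm correction across the three pairwise tests (FDR is another option).
reject, p_adj, _, _ = multipletests(pvals, method="holm")
for pair, p, r in zip(pairs, p_adj, reject):
    print(pair, f"adjusted p = {p:.3f}", "significant" if r else "not significant")
```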
b) The write-up also seems to switch between the descriptive and the inferential in an unclear way. Some remarks on the data are accompanied by statistical tests (implying an inference from the sample to the population), whilst similar remarks are not: compare the section on 'time joining EA' (where there are a couple of tests to support a 'longer in EA, finds it more welcoming' finding) with the section on age (which notes a variety of differences between age groups, but no statistical tests).
My impression is that the former is the better course, and so differences highlighted for the reader's interest should be accompanied by whether those differences are statistically significant. This uniform approach also avoids 'garden of forking paths' worries (e.g. 'Did you not report p values for the age section because you didn't test, or because they weren't significant?').
c) The ordered regression is comfortably the 'highest yield' bit of statistics performed: it is appropriate to the data, often more sensitive (e.g. lumping the data into two groups by time in EA and t-testing is an inferior technique to regression), and it helps answer questions of confounding that are sometimes alluded to in the text ('Welcomeness seems to go up with X, but down with Y, which is weird because X and Y correlate') but are uniformly important ('People in local groups find EA more welcoming, but could that be driven by other variables differing between those within and without local groups?').
It deserves a much fuller explanation (e.g. how did 'country' and 'top priority cause' become single variables with a single regression coefficient each? Is the 'lumping together' implied in the text post hoc? How was variable selection/model choice decided? Model 1 lacks only 'top priority cause', so presumably 'adding in political spectrum didn't improve explanatory power' is a typo?). Where its results differ from the univariable analysis, I would prefer the former over the latter. That Facebook membership, career shifting (in model 2), career type, and politics aren't significant predictors suggests their relationship to welcomingness, even if statistically significant in univariable analysis, is probably confounding rather than a true association.
It is unfortunate that some of these are highlighted in the summary and conclusion, even more so when a crucial negative result from the regression is relatively unsung. The ~3% R^2 and very small coefficients (with the arguable exception of sex) imply very limited practical significance: almost all the variation in whether an EA finds EA welcoming or not is not predicted by the factors investigated. Although EAs in local groups find EA more welcoming, this effect, albeit statistically significant, is (if I interpret the regression right) around 0.1% of a single Likert level.
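(For concreteness, a minimal sketch of the kind of ordered regression I mean, using statsmodels; the file and column names are hypothetical, not those of the survey dataset.)

```python
# Sketch with hypothetical file/column names: ordered logistic regression that
# adjusts each predictor for the others, rather than one-variable-at-a-time tests.
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

df = pd.read_csv("survey.csv")                      # hypothetical file
endog = df["welcomeness"].astype(
    pd.CategoricalDtype(categories=[1, 2, 3, 4, 5], ordered=True)
)
exog = df[["local_group", "gender_male", "years_involved"]]   # hypothetical columns

model = OrderedModel(endog, exog, distr="logit")
res = model.fit(method="bfgs", disp=False)
print(res.summary())        # adjusted coefficients, standard errors, thresholds
print(res.conf_int())       # confidence intervals for the coefficients
```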
2) Selection bias: A perennial challenge for the survey is selection bias. Although happily noted frequently in the discussion, I still feel it is underweighted: I think it is large enough to make the results all but uninterpretable.
Facially, one would expect those who find EA less welcoming to be less likely to join. We probably shouldn't think that how welcoming people already in EA find it tells us much about how good EA is at welcoming people into it (caricatured example: I wouldn't be that surprised if members of something like the KKK found it generally welcoming). As mentioned in the 'politics' section, relative population size seems a far better metric (although the baseline is hard to establish), to which welcomingness adds very little.
Crucially, selection bias imposes some nigh-inscrutable but potentially sign-inverting considerations on any policy 'upshot'. A less welcoming subgroup could be cause for concern, but alternatively cause for celebration: perhaps this subgroup offers other 'pull factors' that mean people who find EA less welcoming nonetheless join and stick around within it (and vice versa: maybe subgroups whose members find EA very welcoming do so because they indirectly filter out everyone who doesn't). Akin to Wald and the bombers in WW2, it is crucial to work out which. But I don't think we can here.
Thank you, Gregory, for the very constructive criticism. It strikes me as one of the most useful types of comments a post can receive, and it is good for me personally as a researcher.
'Misuse of Chi^2'
That is a very fair critique of the use of Chi^2 here. I have replaced the Chi^2 tests with K-W tests where appropriate and made a comment in the 'updates and corrections' section noting this. Replacing the Chi^2 tests with K-W tests did not change any of our results in any of the sections except politics (which became non-significant). Looking into the change in the politics finding would require more work at this stage to drill down into more detail, and the regression results presented later suggest doing this might not be of much added value.
'Descriptive-v-inferential'
My intention in each sub-section was to report whether there was any significant relationship (using the inappropriate Chi^2 test) or to use inferential-style language in the cases where I used t-tests. In cases where I had not found a relationship (e.g., First Heard of EA), I used language to that effect: 'These differences are neither significant nor very substantial'. In the specific case of age that you mention, I mistakenly diverged from this intended style by not using either a reference to a significance test or language to that effect. I have added the K-W test to this section. Certainly, more can be done to ensure the style is more consistent and does not mislead the reader.
'The ordered regression'
You're right that the discussion of the regression was insufficient. I wanted to include the regression in the post because, as you mentioned, regression analysis can do a lot to clarify these relationships. But I decided to keep the discussion short because the regression seemed to offer very limited practical significance (as you pointed out). Had I decided to give it more weight in my analysis then it certainly would be appropriate to offer a fuller explanation. Nonetheless, I should have been clearer about the limited usefulness of the regression, and noted it as the reason for the short discussion.
Regardless, here's a more detailed explanation:
Variables in the model (and the piece in general) were chosen based on cleavages in EA that we have found in previous posts, to explore how they might differ in terms of welcomeness. 'Top Priority' was a separate model because so many respondents either did not give a top priority or gave many, and thus were excluded. It was disappointing that the factors in the survey data explained so little of the variation. Nevertheless, I thought it would still be of interest to see that the major themes we have been discussing in the survey series so far don't seem to be very important on this measure.
The line regarding political spectrum does indeed appear to be a mistake, so I have removed it and stated something to this effect in the 'updates and corrections' section.
For simplicity, Country and Top Priority Cause were each presented as a variable where the most popular response was compared to all the others combined. These were the USA and Global Poverty, though the table and discussion should have been more explicit about this, and have been updated accordingly. Country was categorised into the top countries by number of responses: USA, UK, Germany, Australia, Canada, Switzerland, Sweden, Netherlands, and 'other'. The initial significance we noted in both of these categories was in comparison only to the most popular response; those prioritising AI Risk and Meta Charities appeared significantly more likely to view EA as more welcoming compared to Global Poverty, and EAs from Australia and Canada appeared significantly more likely to view EA as more welcoming compared to American EAs. However, it would have been more appropriate to model each country as a dummy variable as well, which has been done in the regression table linked to here. Because our previous phrasing of this result could be misinterpreted, we have decided to de-emphasise this conclusion.
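(For readers following along, a small sketch of what dummy-variable coding for country looks like; the file and column names are hypothetical, not those of the survey dataset.)

```python
# Sketch with hypothetical file/column names: code country as dummy variables
# relative to a reference level, instead of "USA vs everyone else".
import pandas as pd

df = pd.read_csv("survey.csv")                              # hypothetical file
country_dummies = pd.get_dummies(df["country"], prefix="country",
                                 drop_first=True, dtype=float)
# drop_first=True leaves one reference category; each remaining country enters
# the regression with its own coefficient rather than being lumped together.
X = pd.concat([df[["local_group", "gender_male"]], country_dummies], axis=1)
```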
'Selection bias'
As you point out, measuring EA-related sentiment among potential EAs and/or people who left EA was unfortunately impossible with the main survey, and would require actively reaching out to these highly dispersed groups. There was no intention in this post to argue how good the movement is at welcoming people into EA overall, although some may attempt to do so based on the results presented here, so it is wise to add caveats about the limits of doing so. I think your suggestion of focusing more on population sizes relative to a baseline (where possible to establish) is a great idea as a first step in moving in that direction. If this were the aim of the post, then certainly the results presented here do little to accomplish that goal. Instead, we could only look at how welcoming people already in EA think it is, the results of which I don't think are 'all but uninterpretable'. There do seem to be meaningful differences in welcomeness perceptions within our sample that still seem worth talking about, even if we can't see the differences outside our sample. If the differences in perceived welcomeness are predictors of dropping out of EA, then these findings might hint at factors that influence retention. Again, our data do not allow us to make these inferences about retention, but they could be useful signposts for further analyses exploring how community perceptions of welcomeness may affect EA retention.
In fact, we debated internally whether to publish this piece at all due to concerns about selection bias, as we were unsure what conclusions we could actually draw. We ultimately went ahead with publishing it, though with the decision not to make any specific recommendations. Even so, I can see how we ended up overstating what can be concluded from this data. I certainly share your concern that any 'policy' devised simply by looking at the results presented here would almost certainly miss the mark. It was not the intention here to make policy suggestions on how to make EA more welcoming (though there is a sentence in the Local Groups section that does slide in that direction), as clearly a lot more information is needed from former or potential-but-non-EAs.
Once again, many thanks for your thoughtful comments and suggestions.
Thanks for your reply, and the tweaks to the post. However:
[I] decided to keep the discussion short because the regression seemed to offer very limited practical significance (as you pointed out). Had I decided to give it more weight in my analysis then it certainly would be appropriate to offer a fuller explanation. Nonetheless, I should have been clearer about the limited usefulness of the regression, and noted it as the reason for the short discussion.
I think the regression having little practical significance makes it the most useful part of the analysis: it illustrates that the variation in the dependent variable is poorly explained by all/any of the variables investigated, that many of the associations found by bivariate assessment vanish when controlling for others, and it gives better estimates of the effect size (and thus relative importance) of those which still exert a statistical effect. Noting, essentially, 'But the regression analysis implies a lot of the associations we previously noted are either confounded or trivial, and even when we take all the variables together we can't predict welcomeness much better than taking the average' at the end buries the lede.
A worked example. The summary notes, 'EAs in local groups, in particular, view the movement as more welcoming than those not in local groups' (my emphasis). If you look at the t-test between members and nonmembers, there's a difference of ~0.25 'Likert levels', which is one of the larger effect sizes reported.
Yet we presumably care about how much of this difference can be attributed to local groups. If the story is 'EAs in local groups find EA more welcoming because they skew (say) male and young', it seems better to focus attention on these things instead. Regression isn't a magic wand to remove confounding (cf.), but it tends to be better than not doing it at all (which is essentially what is being done when you test the association between a single variable and the outcome).
As I noted before, the 'effect size' of local group membership when controlling for other variables is still statistically significant, but utterly trivial. Again: it is ~1/1000th of a Likert level; the upper bound of the 95% confidence interval would only be ~2/1000th of a Likert level. By comparison, the effects of gender or year of involvement are two orders of magnitude greater. It seems better in the conclusion to highlight results like these, rather than results the analysis demonstrates have no meaningful effect when other variables are controlled for.
A few more minor things:
(Which I forgot earlier.) If you are willing to use means, you probably can use standard errors/confidence intervals, which may help with the 'this group looks different, but small group size' points.
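(A quick sketch, with invented responses, of the sort of interval I mean:)

```python
# Sketch with invented data: 95% confidence interval around a small group's mean.
import numpy as np
from scipy import stats

responses = np.array([4, 5, 3, 4, 4, 2, 5])    # small hypothetical group
mean = responses.mean()
sem = stats.sem(responses)
ci_low, ci_high = stats.t.interval(0.95, len(responses) - 1, loc=mean, scale=sem)
print(f"mean = {mean:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```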
Bonferroni makes a rod for your back given it is conservative (cf.); an alternative approach is false discovery rate control instead of family-wise error rate control. Although minor, if you are going to use this to get your adjusted significance threshold, this should be mentioned early, and the result which 'doesn't make the cut' should simply be reported as non-significant.
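(A small sketch with made-up p-values, using Benjamini-Hochberg FDR control as the less conservative alternative:)

```python
# Sketch with made-up p-values: Benjamini-Hochberg FDR control instead of Bonferroni.
from statsmodels.stats.multitest import multipletests

pvals = [0.001, 0.012, 0.030, 0.048, 0.20, 0.41]   # hypothetical family of tests
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(list(zip(p_adj.round(3), reject)))
```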
It is generally a bad idea to lump categories together (e.g. countries, cause areas) for regression, as this loses information (and so statistical power). One of the challenges of regression analysis is garden-of-forking-paths issues (even post hoc: some coefficients 'pop' into and out of statistical significance depending on which model is used, and once I've seen one, I'm not sure how much to discount subsequent ones). It is here where an analysis plan which pre-specifies this is very valuable.
I'm appreciating this exchange. I wonder if part of the problem stems from the word 'welcoming', especially as selection bias naturally tends to neglect those who didn't feel welcome. This could especially be a problem for assessing how welcome women feel, if what's happening is that many quickly don't feel welcome and simply leave.
One way to overcome this would be to set up a contact list for a group of male and female people attending an intro event. Even 10 of each (and 5 others) could be useful, not for statistical significance but for an initial assessment at low cost in time and effort. This could be done via email, but phone contact as well would be better. You could follow up after the first activity, at the end of the session, a month later, and a year later. It could be repeated on a small scale at several intro events, which might give more initial information than a large sample at one event, which might not be representative.
The most powerful tool might be telephone 'semi-structured interviews', a well-established social science and participatory appraisal method. Again, you wouldn't be looking for statistical significance but rather for hypothesis generation, which could then be used in a follow-up. E.g. if a lot of women were saying something like 'I just didn't feel comfortable' or 'it was too …', that could suggest a more specific follow-up study, or even lead directly to thoughts about a way to redesign intros.
It would help if such a survey weren't conducted by someone seen as an organiser, and perhaps ideally it would be conducted by a woman?
An alternative might be 'satisfaction'?