Hi Jacob—thanks for giving critical feedback, and for your directness; it's much appreciated. Whilst I agree with some aspects of your comment, I also disagree with other parts (or don't think they're significant enough that readers should avoid updating on the research at all).
In the literature review, strength of evidence was evaluated based on the number of studies supporting a particular conclusion. This metric is entirely flawed as it will find support for any conclusion with a sufficiently high number of studies. For example, suppose you ran 1000 studies of an ineffective intervention. At the standard false positive rate of p-values below 0.05, we would expect 50 studies with significant results. By the standards used in the review, this would be strong evidence the intervention was effective, despite it being entirely ineffective by assumption.
I think this is down to a miscommunication on our part, so I’ll try to clarify. In essence, I think we shouldn’t have said “the number of papers we found” but “the percentage of total papers we found”, as we looked for all papers that would have either supported or gone against the conclusions laid out in the literature review table. For example, for the statement:
“Protest movements can have significant impacts (2-5% shifts) on voting behaviour and electoral outcomes”
We found 5 studies that seemed methodologically robust and studied the outcomes of nonviolent protest on voting behaviour. 4 out of these 5 studies found significant positive impacts. The remaining study found negligible impacts of protest on voting behaviour in some contexts, but significant impacts in others. Obviously there is some publication bias here, as this is clearly not always true, but nonetheless all of the literature that does exist which examines the impact of protest on voting behaviour finds positive results in at least some contexts (5/5 studies).
We’re well aware of the risks of multiple hypothesis testing, and the evidence base for protest interventions is very small, so it’s exceedingly unlikely that there would be a scenario like the one you outline in your comment. The chance of 5/5 studies all finding positive results purely by chance is extremely small. Specifically, it would be roughly 0.00003% (0.05^5), which I think we can both agree is fairly unlikely. It seems this was down to poor communication on our part, so I’ll amend that.
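For concreteness, here is a minimal sketch of the arithmetic on both sides, assuming (as in your hypothetical and in the figure above) that each study of a truly ineffective intervention has an independent 5% chance of returning a significant positive result:

```python
alpha = 0.05  # conventional false positive threshold (p < 0.05)

# Jacob's hypothetical: 1000 studies of an ineffective intervention.
# Expected number of significant results purely by chance:
expected_false_positives = 1000 * alpha   # 50.0

# Our case: 5 independent studies of a truly ineffective intervention
# all returning significant positive results purely by chance:
p_all_five_positive = alpha ** 5          # 3.125e-07, i.e. roughly 0.00003%

print(expected_false_positives, p_all_five_positive)
```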
2. The work surveying the UK public before and after major protests is extremely susceptible to social desirability, agreement and observer biases. The design of the surveys and questions exacerbates these biases and the measured effect might entirely reflect these biases. Comparison to a control group does not mitigate these issues as exposure to the intervention likely results in differential bias between the two groups.
I definitely agree the survey does have some limitations (and we’ve discussed this off the Forum). That’s the main reason we’ve only given a single star of weighting (i.e. what we think is our weakest form of evidence), and haven’t updated our views particularly strongly based on it.
That said, I think the important thing for our survey was that we were only looking at how responses changed over time. This, in my opinion, means the level of bias matters less, as we think it will be roughly constant across all 3 time periods, given we drew three fresh samples of respondents. I’ll also note that we got feedback from researchers/academics I consider experienced prior to running this survey, and they thought the design was sufficiently good.
Additionally, the main conclusion of our survey was that there was no “backfire” effect, i.e. people didn’t support climate policies less as a result of disruptive protest. In other words, we found no significant changes—which seems unlikely if there were significant positive biases at play, as you claim.
Edited to add: I think your social desirability critique is also misguided, as only 18% of people in the UK were supportive of these protests (according to our survey). I find it hard to believe that this could be the case while the protests were simultaneously eliciting high levels of positive social desirability bias in our post-protest surveys, as you claim.
3. The research question is not clearly specified. The authors indicate they’re not interested in the effects of a median protest, but do not clarify what sort of protest exactly they are interested in. By extension, it’s not clear how the inclusion criteria for the review reflects the research question.
I disagree somewhat strongly with this, and I think we clearly say what we’re interested in. Specifically, in our executive summary, we say the following:
We’re focusing primarily on Western democracies, such as those in North America and Western Europe.
Specifically, we mostly focus on the outcomes of large protest movements with at least 1,000 participants in active involvement (and we define active involvement).
This is because we want to understand the impact that protests can have, rather than the impact that most protests will have. Due to this, our research looks at unusually influential protest movements.
From our literature review: Relatively recent protest movements, to provide more generalisability to the current social, political and economic contexts. Whilst we did include several papers on the Civil Rights Movement (1950s United States), most papers we included were focused on movements from the 1990s onwards.
Maybe you disagree with this being a sufficient level of clarity—but I think this fairly clearly sets out the types of protest movements we’re including for the purposes of this research. If you don’t think this is particularly clear, or you think specific bits are ambiguous, I would be interested to hear more about that.
Edit: In hindsight, I think we could have given a specific list of countries we focus on, but it really is overwhelmingly the US, UK and other countries in Western Europe (e.g. Belgium, Spain, Germany) so I thought this was somewhat clear.
4. In this summary, uncertainty is expressed from 0-100%, with 0% indicating very high uncertainty. This does not correspond with the usual interpretation of uncertainty as a probability, where 0% would represent complete confidence the proposition was false, 50% total uncertainty and 100% complete confidence the proposition was true. Similarly, Table 4 neither corresponds to the common statistical meaning of variance nor defines a clear alternative.
This feels fairly semantic, and I think it’s quite a weak criticism. It’s quite common for people to express their confidence in a belief using confidence intervals (see Reasoning Transparency), where ‘confidence’ is used in the way we used ‘uncertainty’. If your specific issue is that we’ve used the word ‘uncertainty’ when we should have used ‘confidence’, I think that’s a very minor point that would hardly affect the rest of our work. We do also write confidence intervals in brackets (e.g. 60-80% confidence), but I’m happy to update it if you think it’ll provide clarity. That being said, this feels quite nitpick-y and I assume most readers understood what we meant.
Regarding variance, I agree that we could have been clearer on the exact definition we wanted, but again I think this is quite nitpick-y as we do outline what we consider to be the variance in this case, namely in section 7:
“…there are large differences between the effectiveness of the most impactful protest movements, and a randomly selected (or median) protest movement. We think this difference is large enough such that if we were able to accurately model the cost-effectiveness of these protest movements, it would be different by a factor of at least 10.”
I don’t think this is a substantive criticism, as it’s largely over a choice in wording, which we do somewhat address (even though I do think we could better explain our use of the term).
Overall, I think most of your issues are semantic or minor, which I’m either happy to amend (as in points 1 and 4) or for which we’ve already made adjustments, such as down-weighting the weak survey evidence (as you point out in point 2). Based on this, I don’t think what you’ve said substantially detracts from the report, such that people should put significantly less weight on it. Due to that, I also disagree with your claim that “Given the severity of these issues, I’d expect many more methodological issues on further inspection”, as in my opinion the issues you raise are largely not severe nor methodological.
That being said, I don’t expect anyone to update from thinking very negatively about protest movements to thinking very positively about protest movements based on this work. If anything, I expect it to be a reasonably minor-moderate update, given the still small evidence base and difficulties with conducting reasonable research in this space.
This feedback has definitely been useful, so if you do have other criticisms, I would be interested in hearing them. I do appreciate you taking the time to leave this comment, and being direct about your thoughts. If you’re at all interested, we would be happy to compensate you, or other willing and able reviewers, for the time taken to look through the work more thoroughly and provide additional criticism/feedback.
Thank you for your responses and engagement. Overall, it seems like we agree 1 and 2 are problems; we still disagree about 3; and I don’t think my point on 4 came across, and your explanation raises more issues in my mind. While I think these 4 issues are themselves substantive, I worry they are the tip of an iceberg, as 1 and 2 are in my opinion relatively basic issues. I appreciate your offer to pay for further critique; I hope someone is able to take you up on it.
Great, I think we agree the approach outlined in the original report should be changed. Did the report actually use the percentage of total papers found? I don’t mean to be pedantic, but it’s germane to my greater point: was this really a miscommunication of the intended analysis, or did the report originally intend to use the number of papers found, as it seems to state and then execute on: “Confidence ratings are based on the number of methodologically robust (according to the two reviewers) studies supporting the claim. Low = 0-2 studies supporting, or mixed evidence; Medium = 3-6 studies supporting; Strong = 7+ studies supporting.”
It seems like we largely agree in not putting much weight on this study. However, I don’t think comparison against a baseline measurement mitigates the bias concerns much. For example, exposure to the protests is a strong signal of social desirability: it’s a chunk of society demonstrating to draw attention to the desirability of action on climate change. This exposure is present in the “after” measurement and absent in the “before” measurement, thus differential and potentially biasing the estimates. Such bias could be hiding a backlash effect.
The issue lies in defining “unusually influential protest movements”. This is crucial because you’re selecting on your outcome measurement, which is generally discouraged. The most cynical interpretation would be that you excluded all studies that didn’t find an effect because, by definition, these weren’t very influential protest movements.
Unfortunately, this is not a semantic critique. Call it what you will but I don’t know what the confidences/uncertainties you are putting forward mean and your readers would be wrong to assume. I didn’t read the entire OpenPhil report, but I didn’t see any examples of using low percentages to indicate high uncertainty. Can you explain concretely what your numbers mean?
My best guess is this is a misinterpretation of the “90%” in a “90% confidence interval”. For example, maybe you’re interpreting a 90% CI from [2,4] to indicate we are highly confident the effect ranges from 2 to 4, while a 10% CI from [2.9, 3.1] would indicate we have very little confidence in the effect? This is incorrect as CIs can be constructed at any level of confidence regardless of the size of effect, from null to very large, or the variance in the effect.
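To illustrate with made-up numbers (nothing here is taken from the report): suppose a hypothetical study estimates a 3 percentage point effect with a standard error of 0.6. Intervals can then be constructed from that same estimate at any confidence level, and the level chosen says nothing about how large the effect is:

```python
from scipy.stats import norm

estimate, se = 3.0, 0.6   # hypothetical effect estimate and standard error

for level in (0.90, 0.50, 0.10):
    low, high = norm.interval(level, loc=estimate, scale=se)
    print(f"{int(level * 100)}% CI: [{low:.2f}, {high:.2f}]")

# 90% CI: [2.01, 3.99]
# 50% CI: [2.60, 3.40]
# 10% CI: [2.92, 3.08]
```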
Thank you for pointing to this additional information re your definition of variance; I hadn’t seen it. Unfortunately, it illustrates my point that these superficial methodological issues are likely just the tip of the iceberg. The definition you provide offers two radically different options for the bound of the range you’re describing: randomly selected or median protest. Which one is it? If it’s randomly selected, what prevents randomly selecting the most effective protests, in which case the range would be zero? Etc.
Lastly, I have to ask in what regard you don’t find these critiques methodological? The selection of outcome measure in a review, survey design, construction of a research question and approach to communicating uncertainty all seem methodological—at least these are topics commonly covered in research methods courses and textbooks.
Thanks for your quick reply Jacob! I think I still largely disagree on how substantive these issues are, and I address the points below. I also feel sad that your comments feel slightly condescending or uncharitable, which makes it difficult for me to have a productive conversation.
Great, I think we agree the approach outlined in the original report should be changed. Did the report actually use the percentage of total papers found? I don’t mean to be pedantic, but it’s germane to my greater point: was this really a miscommunication of the intended analysis, or did the report originally intend to use the number of papers found, as it seems to state and then execute on: “Confidence ratings are based on the number of methodologically robust (according to the two reviewers) studies supporting the claim. Low = 0-2 studies supporting, or mixed evidence; Medium = 3-6 studies supporting; Strong = 7+ studies supporting.”
The first one—Our aim was to examine all the papers (within our other criteria of recency, democratic context, etc) that related to the impacts of protest on public opinion, policy change, voting behaviour, etc. We didn’t exclude any because they found negative or negligible results—as that would obviously be empirically extremely dubious.
2. It seems like we largely agree in not putting much weight on this study. However, I don’t think comparison against a baseline measurement mitigates the bias concerns much. For example, exposure to the protests is a strong signal of social desirability: it’s a chunk of society demonstrating to draw attention to the desirability of action on climate change. This exposure is present in the “after” measurement and absent in the “before” measurement, thus differential and potentially biasing the estimates. Such bias could be hiding a backlash effect.
I didn’t make this clear enough in my first comment (I’ve now edited it), but I think your social desirability critique feels somewhat off. Only 18% of people in the UK were supportive of these protests (according to our survey), and there was a fair bit of negative media attention about the protests. This makes it hard to believe that respondents would genuinely feel any positive social desirability bias when the majority of the public actually disapprove of the protests. If anything, we would be much more likely to see negative social desirability bias. I’m open to suggestions on how we might test this post hoc with the data we have, but I’m not sure that’s possible.
3. The issue lies in defining “unusually influential protest movements”. This is crucial because you’re selecting on your outcome measurement, which is generally discouraged. The most cynical interpretation would be that you excluded all studies that didn’t find an effect because, by definition, these weren’t very influential protest movements.
Just to reiterate what I said above for clarity: Our aim was to examine all the papers that related to the impacts of protest on public opinion, policy change, voting behaviour, etc. We didn’t exclude any because they found negative or negligible results—as that would obviously be empirically extremely dubious. The only reason we specified that our research looks at large and influential protest movements is that this is by default what academics study (as they are interesting and able to get published). There are almost no studies looking at the impact of small protests, which make up the majority of protests, so we can’t claim to have any solid understanding of their impacts. The research was largely aiming to understand the impacts for the largest/most well-studied protest movements, and I think that aim was fulfilled.
4. Unfortunately, this is not a semantic critique. Call it what you will but I don’t know what the confidences/uncertainties you are putting forward mean and your readers would be wrong to assume. I didn’t read the entire OpenPhil report, but I didn’t see any examples of using low percentages to indicate high uncertainty. Can you explain concretely what your numbers mean?
Sure—what we mean is that we’re 80% confident that our indicated answer is the true answer. For example, for our answers on policy change, we’re 40-60% confident that our finding (highlighted in blue) is correct, i.e. there’s a 40-60% chance we’ve got it wrong. One could also assume from where we’ve placed it on our summary table that if it were wrong, the true answer would likely be in the boxes immediately surrounding the one we indicated.
E.g. if you look at the Open Phil report, here is a quote similar to how we’ve used it:
“to indicate that I think the probability of my statement being true is >50%”
I understand that confidence intervals can be constructed for any effect size, but we indicate the effect sizes using the upper row in the summary table (and quantify it where we think it is reasonable to do so).
Lastly, I have to ask in what regard you don’t find these critiques methodological? The selection of outcome measure in a review, survey design, construction of a research question and approach to communicating uncertainty all seem methodological—at least these are topics commonly covered in research methods courses and textbooks.
The reasons I don’t find these critiques to be highlighting significant methodological flaws are:
I don’t think we have selected the wrong outcome measure, but we just didn’t communicate it particularly well, which I totally accept.
The survey design isn’t perfect, which I admit, but we didn’t put a lot of weight on it for our report, so in my view this isn’t pointing out a methodological issue with the report. Additionally, you think there will be high levels of positive social desirability bias, when this is the opposite of what I would expect—given the majority of the public (82% in our survey) don’t support the protests (and report this on the survey, indicating the social desirability bias doesn’t skew positive).
Similar to my first bullet point—I think the research question is well constructed (i.e. it wasn’t selecting for the outcome, as I clarified), but you’ve read it in a fairly uncharitable way (which, through our own fault, was possible because we’ve been vaguer than ideal).
Finally, I think we’ve communicated uncertainty in quite a reasonable way, and other feedback we’ve got indicates that people fully understood what we meant. We’ve received 4+ other pieces of feedback regarding our uncertainty communication from people who found it useful and informative, so I’m currently putting more weight on that than on your view. That said, I do think it can be improved, but I’m not sure it’s as much a methodological issue as a communication issue.
I also feel sad that your comments feel slightly condescending or uncharitable, which makes it difficult for me to have a productive conversation.
I’m really sorry to come off that way, James. Please know it’s not my intention, but duly noted, and I’ll try to do better in the future.
Got it; that’s helpful to know, and thank you for taking the time to explain!
Social desirability bias (SDB) is generally hard to test for post hoc, which is why it’s so important to design studies to avoid it. As the surveys suggest, not supporting protests doesn’t imply people don’t report support for climate action; so, for example, the responses about support for climate action could be biased upwards by the social desirability of climate action, even though those same respondents don’t support protests. Regardless, I don’t claim to know for certain that these estimates are biased upwards (or downwards for that matter, in which case maybe the study is a false negative!). Instead, I’d argue the design itself is susceptible to social desirability and other biases. It’s difficult, if not impossible, to sort out how those biases affected the result, which is why I don’t find this study very informative. I’m curious why, if you think the results weren’t likely biased, you chose to down-weight it?
Understood; thank you for taking the time to clarify here. I agree this would be quite dubious. I don’t mean to be uncharitable in my interpretation: unfortunately, dubious research is the norm, and I’ve seen errors like this in the literature regularly. I’m glad they didn’t occur here!
Great, this makes sense and seems like standard practice. My misunderstanding arose from an error in the labeling of the tables: Uncertainty level 1 is labeled “highly uncertain,” but this is not the case for all values in that range. For example, suppose you were 1% confident that protests led to a large change. Contrary to the label, we would be quite certain protests did not lead to a large change. 20% confidence would make sense to label as highly uncertain, as it reflects a uniform distribution of confidence across the five effect size bins. But confidences below that in fact reflect increasing certainty about the negation of the claim. I’d suggest using traditional confidence intervals here instead, as they’re more familiar and standard, e.g.: We believe the average effect of protests on voting behavior is in the interval of [1, 8] percentage points with 90% confidence, or [3, 6] pp with 80% confidence.
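To make the bin arithmetic explicit (the bin names below are purely illustrative, not the report’s labels):

```python
bins = ["negligible", "small", "moderate", "large", "very large"]  # illustrative bins only

uniform = 1 / len(bins)   # 0.20: maximal uncertainty about which bin is correct
stated = 0.01             # e.g. 1% confidence that protests led to a large change
negation = 1 - stated     # 0.99: near-certainty that it was NOT a large change

print(uniform, stated, negation)
```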
Further adding to my confusion, the usage of “confidence interval” in “which can also be interpreted as 0-100% confidence intervals,” doesn’t reflect the standard usage of the term.
The reasons I don’t find these critiques to be highlighting significant methodological flaws are:
Sorry, I think this was a miscommunication in our comments. I was referring to “Issues you raise are largely not severe nor methodological,” which gave me the impression you didn’t think the issues were related to the research methods. I understand your position here better.
Anyway, I’ll edit my top-level comment to reflect some of this new information; this generally updates me toward thinking this research may be more informative. I appreciate your taking the time to engage so thoroughly, and apologies again for giving an impression of anything less than the kindness and grace we should all aspire to.