An Initial Response to MFA’s Online Ads Study

I think that Mercy For Animals is really leading by example by performing evaluations like this and being so transparent with the results. Much credit also goes to Peter Hurford, .impact and Jason Ketola for an excellent study design. :)

I don’t think that any of the points I make require urgent answers, and please know that I am not trying to be critical of MFA or any individuals involved with this study. All of these comments/questions are strictly my own personal views and are not those of any of my current or future employers. I think it’s really worthwhile for everyone to do their best to understand what these results imply for animal advocacy and what we can learn for future studies. I have also attempted to be polite in my tone, but this can often be a struggle when communicating online. If I come across as abrasive, rude or confrontational at some points, that really wasn’t my intention :).

It’s also probably worth noting that empirical studies are really hard to execute flawlessly, and even the most competent professionals sometimes make mistakes. For instance, this year’s winner of the Nobel Prize in economics fairly recently published a paper that seems to have data analysis errors.

I also think it’s great to see that most people in the Facebook thread are responding in a constructive manner, though it’s disappointing to see some responding with possibly intentionally insulting comments.

My comments/questions ended up becoming pretty long, so I copied and pasted them into a Google Document in case it’s easier to give feedback there. I also posted a summary of this post to the original Facebook thread discussing this study here.

I think that the most important points I make are:

  1. The study is really useful because we now can say that it’s likely that the effect of animal advocacy in this context is less than some threshold.

  2. I don’t think this threshold is a 10% difference between the groups. That seems true for the raw data, where each group had ~1,000 participants, but the analysis appears to include only 684 participants in the control group and 749 in the experimental group, so I think the threshold is greater than a 10% difference between the groups.

  3. I think that the way this study has been reported has some really admirable aspects.

  4. Some of the wording used to report the study may be slightly misleading.

  5. I think there may be some overestimation of how big a sample size is needed in a study like this to get useful results.

  6. I think the results of this study are reason to think that animal advocacy via online advertising in this context is less effective than I previously thought. The study suggests that it’s unlikely the effects of online advertising in this context are above a threshold to which I previously assigned some probability, and I have now lowered the probability I put on an effect of that size in light of the findings. As a result I would direct slightly fewer resources to online advertising in this context, relative to other techniques, than I would have prior to being aware of the results of this study.

With that out of the way my extended questions/​comments on the reporting of this study and what we can learn for future studies will now begin! :)

Would it be better to do a pre-analysis plan for future studies?

Would it be better to do pre-treatment/intervention and post-treatment/intervention data collection rather than just post-treatment/intervention data collection for future studies? By this I mean something like a baseline survey and an endline survey, which seem to be used in a lot of social science RCTs.

Was it worth using Edge Research to analyze the data for this study? Will external bodies like Edge Research do data analysis for future MFA studies?

Why was the study so low-powered? Was it originally thought that online ads were more effective, or was the study’s power constrained by inadequate funding?

“Edge Research then “weighted” the data so the respondents from each group were identical in gender, geography, and age.” I am not totally sure what this means and it seems important. It would be great if someone could please explain more about what the “weighting” process entails.
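
In case a concrete sketch helps the discussion, my guess is that “weighting” here means something like post-stratification: upweighting or downweighting respondents so that each group matches the same gender/geography/age distribution. A toy version of that might look like the following, where the column names, cell shares and data are all made up purely to show the mechanics:

```python
import pandas as pd

# Hypothetical responses from one group (column names are made up; geography
# would simply be one more column in the same grouping).
df = pd.DataFrame({
    "gender":    ["F", "F", "M", "F", "M", "F"],
    "age_band":  ["13-16", "17-20", "17-20", "21-25", "13-16", "21-25"],
    "eggs_2day": [2, 0, 1, 3, 1, 2],
})

# Hypothetical target shares this group should be adjusted to match
# (e.g. the joint gender x age distribution of the other group or of the ad audience).
target = pd.Series(
    [0.15, 0.25, 0.35, 0.10, 0.15],
    index=pd.MultiIndex.from_tuples(
        [("F", "13-16"), ("F", "17-20"), ("F", "21-25"),
         ("M", "13-16"), ("M", "17-20")],
        names=["gender", "age_band"],
    ),
)

# Observed share of each gender x age cell in this group.
observed = df.groupby(["gender", "age_band"]).size() / len(df)

# Each respondent's weight = target share of their cell / observed share of their cell.
cell_weight = target / observed
df["weight"] = [cell_weight[(g, a)] for g, a in zip(df["gender"], df["age_band"])]

# The weighted group mean is then what gets compared across groups.
weighted_mean = (df["eggs_2day"] * df["weight"]).sum() / df["weight"].sum()
print(round(weighted_mean, 3))
```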

“Our study was powered to detect a 10 percent difference between the groups…”

I really like how much emphasis is being put on the power of this study. I think too often not enough attention is paid to this aspect of study methodology. That said, I think it might be a little misleading to say that this study was powered to detect a 10 percent difference between the groups, though I am not totally sure. I think this statement is true for the raw data, which has ~1,000 participants in each group, but if we focus on the 684 participants in the control group and 749 participants in the experimental group, the study probably has less power than is needed to detect a 10 percent difference between the groups.
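
As a rough illustration of why the realized group sizes matter, one can back out the standardized effect size that gives 80% power at ~1,000 per group and then ask what power the 684/749 split provides for that same effect. This is only a sketch with a generic Cohen’s-d-style effect size, not a reconstruction of the study’s actual power calculation:

```python
# Rough illustration: power lost going from ~1,000 per group to the 684 / 749
# split actually analyzed, holding the target effect size fixed.
from statsmodels.stats.power import NormalIndPower

analysis = NormalIndPower()

# Standardized effect size detectable with 80% power at ~1,000 per group
d_planned = analysis.solve_power(effect_size=None, nobs1=1000, alpha=0.05,
                                 power=0.80, ratio=1.0)

# Power for that same effect size with the group sizes used in the analysis
power_actual = analysis.power(effect_size=d_planned, nobs1=684, alpha=0.05,
                              ratio=749 / 684)

print(f"planned detectable effect size (d): {d_planned:.3f}")
print(f"power at 684 vs 749 for that effect: {power_actual:.2f}")
```

On this rough calculation, an effect the study was originally powered to detect 80% of the time would only be detected about two-thirds of the time with the 684/749 split.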

“… a 10 percent difference between the groups, and since the differences between the groups were much smaller than that, we can’t be confident about whether the differences between the groups were due to chance or were the true group means.“

This is a good point and I am glad that it is being made here, but I worry it could be a little misleading. When I calculated the percentage difference in animal product consumption in the experimental group relative to the control group, I found an average difference of 4.415%. It might also be worth noting that the corresponding difference for pork consumption was 9.865% and for chicken consumption was 7.178%. It depends on one’s reference frame, and this might just be semantics, but in this context I probably wouldn’t characterize 4.415%, 7.178% and 9.865% as much smaller than 10%.
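
To be clear about what I mean by “percentage difference relative to the control group”, the arithmetic is simply the following, using the egg-serving means quoted further below as an example:

```python
# Relative difference of the experimental group versus the control group, using the
# egg-serving means from the raw data (quoted later in this post) as an example;
# pork, chicken and the other products use the same formula.
control_mean = 1.389       # mean egg servings, control group
experimental_mean = 1.447  # mean egg servings, experimental group

relative_diff_pct = (experimental_mean - control_mean) / control_mean * 100
print(f"{relative_diff_pct:.2f}%")  # about 4.2% for eggs
```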

“However, since other studies have found that showing footage of farmed animal cruelty decreases meat consumption, and since the difference in our study was not close to statistically significant, we think it’s unlikely.”

It is great practice to link to other studies like this when writing up the results of a study, and I encourage people to continue doing so :) I noticed that “other studies” was hyperlinked to the abridged report of a study titled “U.S. Meat Demand: The Influence of Animal Welfare Media Coverage.” My limited understanding is that that study looks at the effect that newspaper and magazine articles about farm animal welfare have on meat consumption. Based on this, I am not sure what conclusions we can reach about the effect of footage of farmed animal cruelty on meat consumption.

“However, since other studies have found that showing footage of farmed animal cruelty decreases meat consumption, and since the difference in our study was not close to statistically significant, we think it’s unlikely.”

Hmm… I could be wrong, but I feel that emphasising that the increased animal product consumption results were not close to being statistically significant may slightly conflict with this later statement: “Participants 17–20 and 21–25 in the experimental group appeared to be eating slightly to dramatically more animal products than those of the same age in the control group. However, none of these differences was statistically significant. Had we not applied the Bonferroni correction, nearly half of these differences would have been statistically significant at the 85% or 95% level.” This could be a misunderstanding on my part.

“Because of the extremely low power of our study, we don’t actually know whether the two groups’ diets were the same or slightly different.”

“extremely” might be a bit of an overstatement :) I guess it depends, once again, on one’s reference frame and possibly semantics. It might be worth mentioning that, to my understanding, this is the highest-powered study that the animal advocacy movement has completed.

“Based on our study design, it appears we would have needed tens to hundreds of thousands of participants to properly answer this question.”

This may be a little misleading. Take reported egg consumption, which is perhaps the most suffering-dense animal product, so possible impacts on egg consumption are perhaps the most important to us. Doing a quick two-tailed power calculation based on participants’ reported consumption of egg servings in the past two days, using:

Control group mean: 1.389

Experiment group mean: 1.447

Standard deviation, ballparked from the reported consumption of egg servings: ~1.79

This gives a required sample size, to detect statistically significant results 80% of the time at the conventional significance level of p = 0.05, of approximately 15,000 participants per group, or roughly 30,000 in total. That is clearly at the lower end of the “tens to hundreds of thousands” estimate, and it’s probably good that future studies take a calculation like this into account.
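
For anyone who wants to reproduce or correct that figure, the calculation was roughly the following (a sketch using statsmodels; since the standard deviation is only ballparked, the output should be treated as an order-of-magnitude estimate):

```python
# Sample size needed to detect the observed egg-serving difference with 80% power
# at alpha = 0.05 (two-tailed). The standard deviation is a ballpark from the raw data.
from statsmodels.stats.power import NormalIndPower

control_mean = 1.389
experimental_mean = 1.447
sd_ballpark = 1.79

# Standardized (Cohen's d style) effect size
d = (experimental_mean - control_mean) / sd_ballpark

n_per_group = NormalIndPower().solve_power(effect_size=d, alpha=0.05, power=0.80,
                                           ratio=1.0, alternative="two-sided")
print(round(n_per_group))      # roughly 15,000 per group
print(round(2 * n_per_group))  # roughly 30,000 in total
```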

“The bottom line is this: We don’t know whether showing people farmed animal cruelty video in this context will cause them to slightly increase, slightly decrease, or sustain their consumption of animal products a few months later.”

Nice work on clearly communicating the bottom line of the study’s results. This is a great precedent for future studies and helps ensure that the results aren’t inaccurately interpreted by others, which can be a significant problem with studies like this one. I guess I wonder if a more informative bottom line would be something like: “We think that it’s likely that showing people a farmed animal cruelty video in this context will not cause more than a [insert answer from reworked accurate power calculation] overall difference in animal product consumption a few months later compared to someone who is similar in almost all respects but didn’t watch a farmed animal cruelty video in this context.”

“We compared the groups in two ways: (1) with a Bonferroni correction to account for the fact that we were conducting multiple group comparisons, and (2) without the Bonferroni correction.”

Excellent work using the Bonferroni correction! :) Again, I think that is another great precedent for future studies to follow.
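
For readers unfamiliar with it, the correction simply compares each raw p-value to the significance level divided by the number of comparisons (equivalently, it multiplies each p-value by that number). A minimal sketch with made-up p-values:

```python
# Bonferroni correction sketch; the p-values below are made up for illustration only.
from statsmodels.stats.multitest import multipletests

p_values = [0.04, 0.12, 0.008, 0.30, 0.02]  # hypothetical per-comparison p-values

reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="bonferroni")
# Equivalent by hand: compare each raw p-value to 0.05 / len(p_values),
# or multiply each p-value by len(p_values) and compare to 0.05.
print(list(zip(p_values, p_adjusted.round(3), reject)))
```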

“If the “less meat” and “no meat” categories were combined into one general “reducer” category, we would find a statistically significant difference at the 95% level, with the experimental group more likely to intend to eat less meat four months into the future.”

I really like how transparent this is and that it was made clear that categories would have to be combined in order for there to be a statistically significant result. Less transparent reporting of this study could easily have neglected to mention that fact. It might be worth mentioning, though, that reporting this result without also reporting the somewhat analogous result from the previous section, “Self-Reported Dietary Change”, obtained by combining the categories “Decreased in last 4 months” and “Did not eat meat then, do not eat meat now”, feels like slightly inconsistent reporting. This could be a misunderstanding on my part though.
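
To make the combining-categories idea concrete, the re-test would look roughly like the following, where all of the counts are made up purely for illustration and are not the study’s numbers:

```python
# Illustration of combining "less meat" and "no meat" into one "reducer" category
# and re-testing the group difference. All counts below are made up.
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts of intended "less meat" and "no meat" responses
control_less, control_none, control_n = 120, 30, 684
exp_less, exp_none, exp_n = 155, 40, 749

reducers = [control_less + control_none, exp_less + exp_none]
totals = [control_n, exp_n]

z_stat, p_value = proportions_ztest(count=reducers, nobs=totals)
print(f"z = {z_stat:.2f}, p = {p_value:.3f}")
```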

“Finally, we asked three questions about attitudes. Answers to these questions have correlated with rates of vegetarian eating and meat reduction in other studies.”

This may be a silly question but what studies are being referred to here? :)

“Discouragingly, we found no statistically significant differences in reported meat, dairy, and egg consumption. But because we powered the study to detect only a 10% difference, we can’t be confident there was no difference or a modest positive or negative difference. Since even a tiny reduction (for example 0.1%) in meat consumption could make an online advertising program worthwhile, there’s no useful information we can take away from participants’ reported food choices.”

I probably wouldn’t say that there’s no useful information we can take away from participants’ reported food choices. I think you’re being much too hard on yourselves there :). To quote part of a previous comment from Harish Sethu, who made this point very well on the Facebook thread: “All studies, even ones which do not detect something statistically significant, tell us something very useful. If nothing, they tell us something about the size of the effect relative to the size of any methodological biases that may mask those effects. Studies which fail to detect something statistically significant play a crucial role in informing the methodological design of future studies (and not just the sample size).”

“Based on all of the above, we don’t feel the study provides any concrete practical guidance. Therefore, the study won’t cause us to reallocate funding for our online advertising in a positive or negative direction.”

This seems surprising to me. It seems to imply that all of the information gathered cancels out and doesn’t cause one to update positively or negatively at all. I previously had a relatively weak and fairly dispersed prior regarding the impact of animal advocacy via online advertising. I think the results of this study are reason to think that animal advocacy via online advertising in this context is less effective than I previously thought. The study suggests that it’s unlikely the effects of online advertising in this context are above a threshold to which I previously assigned some probability, and I have now lowered the probability I put on those larger effects in light of the findings. As a result I would direct slightly fewer resources to online advertising in this context, relative to other techniques, than I would have prior to being aware of the results of this study.
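
To show the mechanics of the kind of update I mean, here is a deliberately toy model. The effect sizes, prior and likelihoods are all made up and are not my actual beliefs; the point is only that making larger effects less likely shifts the expected effect downward:

```python
# Toy illustration of the update described above. Effect sizes, prior and
# likelihoods are made up; this shows the mechanics, not my actual beliefs.

# Hypothetical effect sizes: % reduction in animal product consumption from the ads
effects = [0.0, 1.0, 5.0, 10.0]

# A weak, dispersed prior over those effects
prior = [0.40, 0.30, 0.20, 0.10]

# Rough probability of seeing a null-ish result like this study's under each effect
# (larger effects would probably have shown up, so they get lower likelihoods)
likelihood = [0.90, 0.85, 0.50, 0.15]

unnormalised = [p * l for p, l in zip(prior, likelihood)]
posterior = [u / sum(unnormalised) for u in unnormalised]

prior_mean = sum(e * p for e, p in zip(effects, prior))
posterior_mean = sum(e * p for e, p in zip(effects, posterior))
print(f"expected effect: prior {prior_mean:.2f}% -> posterior {posterior_mean:.2f}%")
```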

“Large-scale studies such as this one have found that for online advertising campaigns, the majority of impact comes from changing the behaviors of those who view the ad but never click on it. In our study, we looked only at those who actually clicked on the ad. Therefore, our study wasn’t a study of the overall impact of online pro-vegetarian ads; rather it was a study on the impact of viewing online farmed animal cruelty video after clicking on an ad.”

I am not totally sure, but I think the wrong study may have been linked here accidentally. I skimmed the linked study and couldn’t find any evidence supporting the claim being made; it instead seems to support the claim made in the next bullet point of MFA’s report. But maybe I missed something when skimming. Is there a specific page number or section of the linked study that has the relevant information? I would be quite interested in looking at a study that did find the results mentioned here.

“Large-scale studies of online advertising have also found that sample sizes of more than 1 million people are typically needed before statistically significant behavioral changes can be detected. We spoke to numerous data collection companies as well as Facebook and accessing that much data for a study like this is not possible.”

I am a little skeptical of the external validity of those results when applied in the context of the ads used in this MFA study, because there seem to be a couple of key differences between the types of ads used in the two studies. Based on my limited understanding, MFA’s ads in this study were targeted Facebook ads which, when clicked, took participants to a landing page that played a video. In contrast, the other cited study’s ads were display ads on Yahoo’s homepage, and participants were random visitors who were assigned to treatment. Based on this, I am unsure how informative it is to cite that study in support of the conclusion that it is difficult to draw practical conclusions from this MFA study. It might also be worth drawing attention to the power calculation I did above, which suggested we may be able to get quite useful results from a total sample size of roughly 30,000 participants.

Counterfactuals are always really difficult to get at, but I can’t help wondering whether the following would have been mentioned in the initial reporting of the study if it had found positive results: “Numerous studies have found that self-reports on dietary choices are extremely unreliable. On top of that, we’ve also found that diet-change interventions (like the one we did with the experimental group) can change how people self-report their food choices. This combination of unreliability and discrepancy suggests the self-reports of servings of animal products eaten may not be valid.” My intuition is that it wouldn’t have been, and I think that may be a little problematic.

I think it’s great that you have made the raw data from this experiment public. This is another thing that future studies would do well to emulate. I would love to do a quick re-analysis of the data to make sure we reach the correct conclusions from this study. I had a quick look, but it isn’t initially clear to me how to arrive at 684 participants in the control group and 749 in the experimental group. My guess is that participants who were 12 years or younger, who didn’t fully complete the survey, or who were not female were excluded from the final analysis. Is that right? Or were there additional exclusion criteria?
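
In case it helps anyone else poking at the data, the kind of filtering I would try first looks something like this. The file name, column names and exclusion rules are all my guesses rather than documented features of the dataset:

```python
# A guess at reproducing the 684 / 749 analysis samples from the raw data.
# File name, column names and exclusion rules below are guesses, not documented.
import pandas as pd

raw = pd.read_csv("mfa_raw_data.csv")  # hypothetical file name

analysed = raw[
    (raw["age"] >= 13)                   # guess: 12-and-under excluded
    & (raw["gender"] == "Female")        # guess: only female respondents analysed
    & (raw["completed_survey"] == True)  # guess: incomplete surveys excluded
]

print(analysed.groupby("group").size())  # hoping for control = 684, experimental = 749
```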