Independent re-analysis of MFA veg ads RCT data
If you would like to get people to stop eating animals, there are a lot of things you could do: protest meat-serving restaurants, hand out leaflets, show online ads, lobby companies to do "meatless Mondays", etc. To compare these, it would be useful to know how much of an impact they have. A while ago I proposed a simple survey to measure the impact of online ads:
Show your ads, as usual
Randomly divide people into control and experimental groups, 50-50.
Experimental group sees an anti-meat page, control group sees something irrelevant.
Use retargeting cookies to pull people back in for a follow-up.
Ask people whether they eat meat.
Well, some people planned a study along these lines (methodology) and the results are now out. They randomized who saw the anti-meat videos, followed up with retargeting cookies, and asked people questions about their consumption of various animal products. This is the biggest study of its type I know of, and I'm very excited that it's now complete.
The biggest problem I see is that they ended up surveying many fewer people than they set out to. The methodology considered how many people would need to complete the survey to pick up changes of varying sizes, and concluded:
We need to get at minimum 3.2k people to take the survey to have any reasonable hope of finding an effect. Ideally, we'd want, say, 16k people or more.
They only got 2k responses, however, and only 1.8k valid ones. This means the study is "underpowered": even if an effect exists at the size the experimenters expected, there's a large chance the study wouldn't be able to clearly show the effect.
Still, let's work with what we have. To compensate for having minimal data, we should run a single test, the one we think is most applicable. Running multiple tests would mean we'd need to use a Bonferroni correction or something similar, and that dramatically decreases our statistical power.
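To see how steep that cost is, here's a quick illustrative sketch (not from the study): a Bonferroni correction divides the per-test significance threshold by the number of tests, so each additional question makes every test harder to pass.

```python
# Bonferroni correction: to keep the overall false-positive rate at
# alpha across n tests, each individual test must clear alpha / n,
# so the per-test bar rises quickly as tests are added.
alpha = 0.05
for n_tests in (1, 5, 20):
    print(f"{n_tests} tests: per-test threshold {alpha / n_tests}")
```

With 20 questions, a single comparison would need p < 0.0025 to count as significant.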
Before looking at the data or reading the writeup, I committed (via email to David and Allison) to an approach, what I thought of as the simplest, most straightforward way of looking at it. I would categorize each sample as "meat-eating" or "vegetarian" based on whether they reported eating any meat in the past two days, compute an effect size as the difference in vegetarianism between the two groups, and compute a p-value with a standard two-tailed t-test.
So what do we have to work with for questions? The survey asked, among other things:
In the past two days, how many servings have you had of the following foods? Please give your best guess.
Pork (ham, bacon, ribs, etc.)
Beef (hamburgers, meatballs, in tacos, etc.)
Dairy (milk, yogurt, cheese, etc.)
Eggs (omelet, in salad, etc.)
Chicken and Turkey (fried chicken, turkey sandwich, in soup, etc.)
Fish and Seafood (tuna, crab, baked fish, etc.)
This is potentially rich data, except I don't expect people's responses to be very good. If I tried to answer it, I'm sure I'd miss things for silly reasons, like forgetting what I had for dinner yesterday or not being sure what counts as a serving. On the other hand, if I had a policy for myself of not eating meat, it would be very easy to answer those questions! So I categorized people just as "eats meat" vs "doesn't eat meat".
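That categorization rule can be sketched as follows (the field names here are made up for illustration; the four meat questions in the survey are pork, beef, chicken/turkey, and fish/seafood, while dairy and eggs don't count):

```python
# Hypothetical response format: servings reported per food category.
MEAT_QUESTIONS = ["pork", "beef", "chicken_turkey", "fish_seafood"]

def eats_meat(response):
    # Meat-eater if any meat question got a nonzero answer.
    return any(response[q] > 0 for q in MEAT_QUESTIONS)

print(eats_meat({"pork": 0, "beef": 2, "dairy": 3, "eggs": 1,
                 "chicken_turkey": 0, "fish_seafood": 0}))  # → True
print(eats_meat({"pork": 0, "beef": 0, "dairy": 3, "eggs": 1,
                 "chicken_turkey": 0, "fish_seafood": 0}))  # → False
```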
There were 970 control and 1054 experimental responses in the dataset they released. Of these, only 864 (89%) and 934 (89%) fully filled out this set of questions. I counted someone as a meat-eater if they answered anything other than "0 servings" to any of the four meat-related questions, and a vegetarian otherwise. Totaling up responses I see:
| | valid responses | vegetarians | % |
|---|---|---|---|
| control | 864 | 55 | 6.4% |
| experimental | 934 | 78 | 8.4% |
The bottom line is, 2 percentage points more people in the experimental group were vegetarians than in the control group (~~p=0.053~~ p=0.108). Honestly, this is far higher than I expected. We're surveying people who saw a single video four months ago, and we're seeing that about 2% more of them are vegetarian than they would have been otherwise.
Update 2016-02-20: I computed the p-value wrong; 0.053 was from a one-tailed test instead of a two-tailed test. The right p-value is 0.108. (I had used an online calculator intended for evaluating A/B tests that give you conversion numbers. It didn't specify one- or two-tailed, but since two-tailed is what you should use for A/B tests that's what I thought it would be using. After Alexander, Michael, and Dan pointed out that it looked wrong, I computed the p-value computationally. [1])
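For reference, the corrected two-tailed p-value can be reproduced from the counts above with a pooled two-proportion z-test (which is essentially what a two-tailed t-test reduces to at these sample sizes). A sketch using only the standard library:

```python
import math

def two_tailed_p(k1, n1, k2, n2):
    """Pooled two-proportion z-test (normal approximation)."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-tailed p-value: probability of |Z| >= |z| under the null.
    return math.erfc(abs(z) / math.sqrt(2))

# 55/864 vegetarians in control, 78/934 in experimental.
print(round(two_tailed_p(55, 864, 78, 934), 3))  # → 0.108
```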
This is a very different way of interpreting the study results than any of the writeups I've seen. Edge's Report, Mercy for Animals, and Animal Charity Evaluators all conclude that there was basically no effect. I think this mostly comes from their asking questions where I'd expect the data to be noisier, like looking at how much of various things people think they eat or their attitudes toward meat consumption, plus their asking lots of different questions and so needing to correct downward to compensate for the multiple comparisons.
(There's probably something interesting you could do comparing the responses to the attitude questions with whether people reported eating any meat. I started looking at this some, just roughly, but didn't get very far. Maybe there are hints that the ads do their work by reducing recidivism instead of convincing people to give up meat, but I'm too sleepy to figure this out. My work is all in this sheet.)
[1] See footnote on jefftk.com version; e-a.com doesn't preserve indentation in pre-tags.
Thanks very much for doing this Jeff. It's useful to have an independent re-analysis. My credence that these ads work is increased from knowing that the data has been re-analysed by someone who would have expected no effect, and in fact did find one. Even if the effect was reducing recidivism, that still seems pretty useful! Hopefully in the future there will be more studies done that actually get statistically significant results.
From your results it still would be reasonable to conclude that there's "no effect" since the p-value is >0.05, but the p-value is low enough that I would give a follow-up study a reasonably high chance of getting a statistically significant result.
Also: it looks like you're using a one-sided t-test to get your p-value. I don't know much about significance testing but wouldn't it be better to use a two-sided t-test? My understanding is that one-sided tests are sort of cheating by making your p-value half of what it really should be.
I agree that a two-sided test would be the right thing to use here, and p-value calculations aren't something I fully understand. Is this calculation one-sided or two-sided?
It looks like the NORMDIST function on your sheet is taking the integral from 0 to z_score, which is one-sided. A two-sided test would double that tail probability, but I can't tell exactly what's being done in that calculation.
I'm getting a p-value of 0.108 from a Pearson chi-square test (with cell values 55, 809; 78, 856). A chi-square test and a two-tailed t-test should give very similar results with these data, so I agree with Michael that it looks like your p=0.053 comes from a one-tailed test.
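That check can be reproduced without a stats package; here's a minimal sketch of the Pearson chi-square test on that 2x2 table, assuming no continuity correction:

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]],
    without continuity correction."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    chi2 = 0.0
    for i, obs_row in enumerate(((a, b), (c, d))):
        for j, obs in enumerate(obs_row):
            expected = rows[i] * cols[j] / n
            chi2 += (obs - expected) ** 2 / expected
    return chi2

# Vegetarians and meat-eaters in control and experimental groups.
chi2 = chi2_2x2(55, 809, 78, 856)
# With 1 degree of freedom, the chi-square survival function is
# erfc(sqrt(x / 2)); for a 2x2 table the statistic is z^2, so this
# matches the two-tailed normal p-value.
p = math.erfc(math.sqrt(chi2 / 2))
print(round(chi2, 2), round(p, 3))  # → 2.58 0.108
```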
Yes, you're right. Sorry! I redid it computationally and also got 0.108. Post updated.