Thoughts on the Reducetarian Labs MTurk Study

When it comes to helping nonhuman animals in factory farms, there are a lot of things people can do. But when figuring out what is best, we currently have to rely entirely on our intuitions. The hope is that, in the future, empirical research will at least be able to strongly guide our intuitions and improve the effectiveness with which we work, as well as our understanding of how to achieve victories for animals.

Prior work has either had an insufficiently large sample size to find significant effects (e.g., ACE 2013 Humane Education Study, MFA 2016 Online Ads Study), had an insufficient control group size (e.g., ACE 2013 Leafleting Study, 2014 FARM Pay-per-view Study, VO 2015 Leafleting Study; Ferenbach, 2015; Hennessy, 2016), had no control group (e.g., THL 2011 Online Ads Study; THL 2012 Leafleting Study; THL 2014 Leafleting Study; THL 2015 Leafleting Study; VO 2015 Pay-per-view Study; James, 2015; Veganuary 2016 Pledge Study), only measured intent to change diet rather than actual self-reported diet change (e.g., MFA 2016 MTurk Study), or did not attempt to measure differences against a baseline prior to the intervention (e.g., VO 2014 Leafleting Study; CAA 2015 VegFest Study; VO 2016 Leafleting Study; Ardnt, 2016). (This is all the veg research I know of; if you can name any more, I’d be happy to add them to this literature overview.)

Now, however, the Animal Welfare Action Lab and Reducetarian Labs have teamed up, with help from myself and Kieran Greig, to produce the 2016 AWAL Newspaper Study (see announcements from the Reducetarian Foundation and from AWAL), and have found the first statistically significant effect on meat reduction using an actual, and sizable, control group. This is an amazing contribution, and I applaud the whole team for doing it well!

In this doc­u­ment, I aim to re-an­a­lyze the study for my­self. In do­ing so, I use their data but my own statis­ti­cal anal­y­sis to repli­cate the effects I care most about. I also recre­ate all the anal­y­sis in this writeup, rephras­ing it in my own words (so I and oth­ers can un­der­stand from slightly differ­ent per­spec­tives).

What is the ba­sic method­ol­ogy?

Participants were recruited through Amazon Mechanical Turk, a platform that offers a broadly representative sample of the US population that can be recruited for online surveys at low cost. 3076 participants were recruited to take a baseline food frequency questionnaire with a variety of self-reported behavior and attitude questions. However, for this re-analysis, I’m only going to dig into the questions about how many servings of meat they self-report eating.

One week af­ter that, par­ti­ci­pants were re­con­tacted to view a news­pa­per ar­ti­cle. One third of par­ti­ci­pants ran­domly saw an ar­ti­cle ad­vo­cat­ing veg­e­tar­i­anism, an­other third of par­ti­ci­pants ran­domly saw an ar­ti­cle ad­vo­cat­ing re­duc­etar­i­anism (eat­ing less meat), and a fi­nal third of par­ti­ci­pants saw a con­trol ar­ti­cle ad­vo­cat­ing ex­er­cise. Im­me­di­ately af­ter view­ing the ar­ti­cle, par­ti­ci­pants were asked to re­port their diet.

Five weeks af­ter that, par­ti­ci­pants were re­con­tacted and asked to re­port their diet again.

Ex­act copies of the news­pa­per ar­ti­cles, copies of the sur­vey, and full method­ol­ogy are available in the study writeup.

What are the head­line re­sults?

  • Participants in the treatment groups, on average, changed their diet to eat 0.8 fewer servings of turkey, pork, chicken, fish, and beef per week than those in the control group.

  • There was a statis­ti­cally sig­nifi­cant differ­ence be­tween the three groups (ANOVA, p = 0.03) and be­tween treat­ment (pooled) and con­trol (chi-square test, p = 0.001).

  • There is no statis­ti­cally sig­nifi­cant differ­ence be­tween the re­duc­etar­ian mes­sage and the veg­e­tar­ian mes­sage on diet change (chi-square test, p = 0.09).

  • The pres­ence of a con­trol group is im­por­tant. Without look­ing at the con­trol group, there ap­pear to be sig­nifi­cant de­creases in veg­e­tar­ian rates, but when tak­ing the con­trol group into ac­count these de­creases dis­ap­pear.

There were other effects on in­ten­tions too, but I pre­fer to fo­cus on the diet change since it’s a lot more im­por­tant and ex­cit­ing to me. For more, feel free to check out the origi­nal study.

How should we cau­tiously in­ter­pret these re­sults?

The study reports that the treatment groups ate about one fewer serving of meat per week on average, but I think it is clearer to split this into three groups. Of the 1422 people in the treatment groups, 258 people (18.2%) had no change, 538 (37.8%) increased by an average of 6.94 servings of meat per week, and 626 (44.0%) decreased by an average of 7.66 servings per week.

For the 702 people in the control group, 138 people (19.7%) had no change, 286 (40.7%) increased by an average of 6.90 servings per week, and 278 (39.6%) decreased by an average of 6.34 servings per week.

Thus the treatment and control groups differ in both the number of people who ultimately decide to reduce and the magnitude of the average reduction among those who do. Breaking it up like this, the difference between groups in the number of people who reduce (holding the magnitude of reduction constant) is not significant after controlling for multiple hypothesis testing using the Benjamini-Hochberg procedure (t-test, p = 0.0585), while the difference in the magnitude of reduction among those who do reduce (holding the number of people reducing constant) is significant (t-test, p = 0.007).
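As a rough cross-check on the "number of people who reduce" comparison, here is a minimal sketch using only the counts quoted in this writeup (626 of 1422 reducers in treatment, 278 of 702 in control). It runs a pooled two-proportion z-test, which is my own substitution and not the t-test with Benjamini-Hochberg adjustment reported above, so the p-value is not expected to match exactly. The function name is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    # Standard error under the null hypothesis of equal proportions.
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Counts of people who decreased meat consumption, as quoted in the text.
z, p_value = two_proportion_z(626, 1422, 278, 702)
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

With these counts the unadjusted p-value lands a little above 0.05, which is at least in the same neighborhood as the borderline result described above.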

We can then keep in mind that the ac­tual mag­ni­tude of change is, for many peo­ple, a lot more than one serv­ing per week, even if it is one serv­ing per week on av­er­age for the en­tire group.
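To see how the three-way split squares with the "one serving on average" figure, here is a minimal sketch that recombines the counts and average changes quoted above into a net change per group. It uses only the numbers in this writeup, not the raw data, so it is a back-of-the-envelope check and not a replication.

```python
# (count, mean change in servings per week) for each subgroup, as quoted in the text.
groups = {
    "treatment": {"none": (258, 0.0), "increase": (538, 6.94), "decrease": (626, -7.66)},
    "control": {"none": (138, 0.0), "increase": (286, 6.90), "decrease": (278, -6.34)},
}


def mean_change(parts):
    """Weighted average change across the no-change, increase, and decrease subgroups."""
    total_n = sum(n for n, _ in parts.values())
    total_change = sum(n * change for n, change in parts.values())
    return total_change / total_n


net = {name: mean_change(parts) for name, parts in groups.items()}
difference = net["treatment"] - net["control"]

for name, value in net.items():
    print(f"{name}: {value:+.2f} servings per week")
print(f"treatment minus control: {difference:+.2f} servings per week")
```

Under these inputs the treatment group nets out to roughly −0.75 servings per week and the control group to roughly +0.30, a gap of about one serving per week.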

All that be­ing said, it’s still un­clear to what de­gree we can take these num­bers liter­ally, since peo­ple can’t cor­rectly re­call the pre­cise num­bers of serv­ings they have eaten over the past month, and be­cause there is a good deal of fluc­tu­a­tion in diets. How­ever, it does look like this re­duc­tion effect sur­vives a few differ­ent ways of look­ing at it and also sur­vives my in­de­pen­dent data re-anal­y­sis, such as by look­ing at the mag­ni­tude of re­duc­tion and a bi­nary value of whether or not there was any re­duc­tion; by look­ing across ANOVA, chi-square, and t-tests; and by ei­ther pool­ing or not pool­ing the treat­ment groups.

I also did an ex­tra san­ity check—did the treat­ment cause peo­ple to also re­duce on fruits, nuts, veg­eta­bles, beans, and grains? Or maybe to in­crease on veg­eta­bles due to so­cial de­sir­a­bil­ity bias? The an­swer is no on the first (chi-squared, p = 0.7933) and no on the sec­ond (chi-squared, p = 0.1778), both of which are good for the re­sults of this study.

When re­duc­ing, are peo­ple just shift­ing away from beef and to­ward chicken?

In a word, no.

Among those who reduced beef, chicken consumption also changed by −0.7 servings per week in the control group and by −2.2 servings per week in the treatment group.

Similarly, among those who reduced chicken, beef consumption also changed by −1 serving per week in the control group and by −2.3 servings per week in the treatment group.

I’m not mak­ing any claims of statis­ti­cal sig­nifi­cance be­tween treat­ment and con­trol groups for this, but it is pretty clear that peo­ple aren’t shift­ing from beef to chicken, but rather just re­duc­ing across the board.

What did this study not find?

Notably to me, while the magnitude of people eating less meat is significant, this study found no effect on people cutting out meat entirely (even though the vegetarian appeal suggested doing that). Inferring vegetarianism from people who reported no servings consumed of beef, turkey, chicken, fish, and pork, in the treatment group nine people started vegetarianism and twelve people stopped, and in the control group four people started and three people stopped. This difference is not significant among the groups (ANOVA, p = 0.26) or among treatment vs. control (chi-square test, p = 0.55).

What im­pli­ca­tion does this have for our ex­ist­ing strate­gies?

I’m go­ing to spec­u­late pretty wildly here and af­ter­wards I’ll men­tion ap­pro­pri­ate dis­claimers that walk back my spec­u­la­tion.

Ac­cord­ing to ACE’s re­search which re­lies on a re-anal­y­sis of a 2012 THL study, we come up with ex­pec­ta­tions that 3.3% of peo­ple shown a leaflet or on­line ad will stop eat­ing red meat, 1.6% will stop eat­ing chicken, 1.0% will stop eat­ing fish, 0.4% will stop eat­ing eggs, and 0.6% will stop eat­ing dairy.

Ac­cord­ing to this study, 2.7% of peo­ple shown a news­pa­per story about veg­e­tar­i­anism or re­duc­etar­i­anism stop eat­ing red meat. How­ever, no­tably 2% of peo­ple shown the story about ex­er­cise also stop eat­ing red meat. This means that we would es­ti­mate the true effect of the news­pa­per study to be a net change of 0.7 per­centage points. (Pre­sum­ably, the 2% change from the con­trol story comes not from peo­ple chang­ing their diet once con­vinced about the power of ex­er­cise, but from just a gen­eral trend to­ward veg­e­tar­i­anism over time un­re­lated to any news story, and/​or from so­cial de­sir­a­bil­ity bias, and/​or a mix of other fac­tors.)

Going down the list, there is a −0.4 percentage point change for eliminating chicken (once the treatment group is compared to the control group), and −0.1 percentage points for fish, +2.0 percentage points for eggs (meaning people in the treatment group ate more eggs than people in the control group; this could be a substitution effect or it could be just random noise), and −0.6 percentage points for dairy. These stark departures from the other study are a reminder of the importance of including a control group. Also, don’t take these numbers literally because none of them were statistically significantly different from +0 percentage points (no change).


I’m not go­ing to plug those num­bers into the calcu­la­tor and call it a day, though, be­cause the re­al­ity is that there is no statis­ti­cally sig­nifi­cant differ­ence be­tween groups on in­di­vi­d­ual food items, likely be­cause of the lower sam­ple sizes and high within-item vari­abil­ity. The pat­tern only emerges on the larger scale of re­duc­tion across all food.

It’s also worth not­ing that a news­pa­per ar­ti­cle on MTurk is much differ­ent than a leaflet handed out in per­son along with an in-per­son sur­vey. MTurk can be bet­ter in some re­spects, since you’re less likely to have a non­re­sponse bias in who an­swers your sur­vey as it’s eas­ier to fol­low up with ev­ery­one. Also, you have a much bet­ter knowl­edge of who was ac­tu­ally in your treat­ment and con­trol groups. On the other hand, a news­pa­per ar­ti­cle is less per­sua­sive than a leaflet or video, and the fact that peo­ple are paid to read it and may ex­pect com­pre­hen­sion ques­tions cre­ates an un­re­al­is­tic effect that won’t be pre­sent in real life.


Tak­ing this a differ­ent way, we may want to fo­cus on re­duc­tion in­stead of elimi­na­tion, since that’s what this study was about. How­ever, ACE num­bers are about how many an­i­mals are spared, whereas we only know about the num­ber of serv­ings that are not eaten (and even that may be hard to take liter­ally).

To sim­plify things for my­self, I’m go­ing to look just at chicken con­sump­tion, since chick­ens are the vast ma­jor­ity of fac­tory farmed an­i­mals. From this study, we found that par­ti­ci­pants re­ported eat­ing an av­er­age of 4.9 serv­ings of chicken per week at baseline. Given that a serv­ing of chicken is ap­prox­i­mately 3 ounces, 4.9 serv­ings of chicken per week is 0.41kg of chicken per week, or 21.32kg per year. As a san­ity check this matches up well with the USDA re­port­ing (p15) that Amer­i­cans ate 24kg of chicken per year in 2000.

A chicken weighs 1.83kg, so taking this survey data literally would mean that survey respondents are consuming 11.6 chickens per year. Respondents in the treatment group reduce their consumption by 0.26 servings per week. Assuming the treatment effects continue to hold and don’t decline (a strong assumption) and projecting those effects out annually, that would be a reduction of roughly 1.1 chickens per year per respondent.
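The conversion from servings to chickens is easy to get wrong, so here is a minimal sketch of it. It assumes the 3 ounce serving and the 1.83kg chicken stated above, a standard ounce-to-kilogram factor, and 52 weeks per year; the function and constant names are mine.

```python
OUNCE_IN_KG = 0.0283495   # standard conversion factor
SERVING_OUNCES = 3        # serving size assumed in the text
CHICKEN_KG = 1.83         # weight per chicken used in the text
WEEKS_PER_YEAR = 52


def servings_per_week_to_kg_per_year(servings):
    """Convert weekly servings of chicken into kilograms of chicken per year."""
    return servings * SERVING_OUNCES * OUNCE_IN_KG * WEEKS_PER_YEAR


baseline_kg = servings_per_week_to_kg_per_year(4.9)    # baseline consumption
reduction_kg = servings_per_week_to_kg_per_year(0.26)  # treatment-group reduction

baseline_birds = baseline_kg / CHICKEN_KG
reduction_birds = reduction_kg / CHICKEN_KG

print(f"baseline: {baseline_kg:.2f} kg/year, {baseline_birds:.2f} chickens/year")
print(f"reduction: {reduction_kg:.2f} kg/year, {reduction_birds:.2f} chickens/year")
```

Without intermediate rounding, the baseline comes out slightly higher than the figures above (about 21.7kg and 11.8 chickens per year, versus 21.32kg and 11.6 when the weekly figure is first rounded to 0.41kg). For the reduction, these inputs give about 1.15kg per year, or roughly 0.6 chickens at 1.83kg per bird, so the 1.1 figure may rest on inputs or a conversion not spelled out here and is worth double-checking against the raw data.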

As­sum­ing the news­pa­per ad over MTurk is the same as a leaflet or an on­line ad (an­other strong as­sump­tion), we could pro­ject to 1.1 chick­ens saved per $0.35, or 3.1 chick­ens spared per dol­lar, which is quite close to ACE’s es­ti­mate of 3.6 chick­ens spared per dol­lar.

Go­ing off the spec­u­la­tive deep end, we can note that a fac­tory farmed broiler chicken lives for 42 days on av­er­age, so 3.1 chick­ens spared per dol­lar is 130 days of fac­tory farmed suffer­ing averted per dol­lar, which is $2.81 per chicken DALY.
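For anyone who wants to vary the inputs, here is a minimal sketch of the chain from chickens per respondent to cost per chicken-year. It takes the 1.1 chickens, $0.35, and 42 days from the text as given, and assumes that one "chicken DALY" means 365 days of factory farmed life averted; the variable names are mine.

```python
chickens_per_respondent = 1.1   # chickens spared per year per respondent (from the text)
cost_per_respondent = 0.35      # dollars per person reached (from the text)
broiler_lifespan_days = 42      # average broiler lifespan (from the text)
DAYS_PER_YEAR = 365             # assumption: one chicken DALY is one year of life

chickens_per_dollar = chickens_per_respondent / cost_per_respondent
days_averted_per_dollar = chickens_per_dollar * broiler_lifespan_days
dollars_per_chicken_year = DAYS_PER_YEAR / days_averted_per_dollar

print(f"{chickens_per_dollar:.2f} chickens spared per dollar")
print(f"{days_averted_per_dollar:.0f} days averted per dollar")
print(f"${dollars_per_chicken_year:.2f} per chicken-year")
```

Carrying full precision through gives about 132 days and $2.77 per chicken-year; the 130 days and $2.81 above come from rounding to 3.1 chickens per dollar first. Every step inherits the strong assumptions already flagged, so the output is only as good as the 1.1 input.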

How­ever, I want to strongly cau­tion against tak­ing these num­bers very liter­ally, since there still is a lot we don’t know. The serv­ing sizes re­ported by our sam­ple are not close to literal num­bers for a va­ri­ety of rea­sons, and have a lot of noise and fluc­tu­a­tion. Also, there are a lot of differ­ences be­tween MTurk and the real world, and MTurk might not ac­cu­rately re­flect how our ma­te­ri­als do against the real pub­lic.


Lastly, it’s also in­ter­est­ing to note that the re­duc­etar­ian mes­sage and the veg­e­tar­ian mes­sage re­sulted in roughly the same (no statis­ti­cally sig­nifi­cant differ­ence) amount of meat re­duc­tion. How­ever, this claim should not be taken too liter­ally, as it could eas­ily be due to the sam­ple size not be­ing large enough to pick up a small differ­ence be­tween the two mes­sages. This was similar to a find­ing in Feren­bach (2015) where both videos tested pro­duced roughly the same amount of be­hav­ior change, though the sam­ple sizes in that study were even smaller.

Why did this study work when oth­ers haven’t?

Ear­lier I men­tioned that all pre­vi­ous stud­ies have been held back by hav­ing in­ad­e­quate sam­ple sizes or not hav­ing a con­trol group. Through the magic of a con­trol group and an ad­e­quate sam­ple size, this study pre­vailed. Pretty sim­ple!

One way statis­ti­cally sig­nifi­cant effects could be found in a smaller sam­ple was through a ran­dom­ized block de­sign. This was used by AWAL to re­duce the var­i­ance in the out­come mea­sure, which in­creases statis­ti­cal power.

This study had a con­trol group of 742 peo­ple and a treat­ment group of 1495 peo­ple. Com­bined, that’s about 20% larger than the pre­vi­ous largest study, the MFA 2016 On­line Ads Study, with a treat­ment group of 934 peo­ple and a con­trol group of 864 peo­ple.

What unan­swered ques­tions re­main?

I’d like to do a more care­ful power anal­y­sis to see ex­actly how this study sur­veyed enough peo­ple to be effec­tive, es­pe­cially since it’s not that much larger than other stud­ies.

I’d be cu­ri­ous to also repli­cate the study anal­y­sis with­out the ran­dom­ized block de­sign and see if the effects still hold. It could be that statis­ti­cal power was only achieved through the block de­sign.

It’s also pos­si­ble the study may have just got­ten lucky.
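One rough way to frame the power question is to ask how many people a replication would need. The sketch below uses the textbook normal-approximation formula for comparing two independent proportions with equal group sizes, and plugs in the share of reducers quoted earlier (44.0% in treatment, 39.6% in control) as the planning effect. This is an assumption-heavy simplification: it ignores the randomized block design, it is about the binary "reduced or not" outcome and not the servings outcome, and using an observed effect for planning tends to be optimistic. The function name is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per group to detect p1 vs p2 (two-sided test)."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = norm.inv_cdf(power)           # quantile for the desired power
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


needed = n_per_group(0.440, 0.396)
print(f"about {needed} people per group for 80% power")
```

Under these assumptions the answer is on the order of two thousand people per group, which is more than this study had and is consistent with the binary comparison being borderline.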


I’d like to look more into the at­tri­tion data. Since some peo­ple who took the first wave didn’t show up for the sec­ond wave and some peo­ple who took the sec­ond wave didn’t show up for the third wave (de­spite a much higher com­pen­sa­tion), there’s a worry about a non­re­sponse bias where peo­ple pre­dis­posed to not like veg­e­tar­i­anism drop out in­stead of filling out their sur­vey show­ing their lack of change, which leads us to over­state the amount of veg­e­tar­i­anism.


I’d also like to look more deeply at the way the food fre­quency ques­tion­naire was used. As my friend and ACE in­tern Kieran Greig pointed out to me, the FFQ asks the re­spon­dent about their meat in­take in dis­crete, or­di­nal buck­ets (zero times per week, 0-1 times per week, 1-6 times per week, 1-3 times per day, 4 or more times per day) and these buck­ets are then trans­formed into con­tin­u­ous, nu­mer­i­cal data (e.g., “zero times per week” be­came 0 times per week, “1-6 times per week” be­came 3.5 times per week, “4 or more times per day” be­came 28 times per week).

However, there are multiple other ways these buckets could have been transformed into continuous data (for example, treating “4 or more times per day” as exactly 4 times per day seems to really underestimate the “or more” part), and it would be quite problematic if the effect replicated under some methods but not others. I have not yet tested this, but the fact that the study effects do hold under a binary variable (any meat reduction at all versus no or negative meat reduction) is encouraging.
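To make the recoding concern concrete, here is a minimal sketch of two possible codings of the FFQ buckets. Only three of the values (0, 3.5, and 28) are quoted above; the values for “0-1 times per week” and “1-3 times per day” are midpoints I am assuming, and the second coding (5 times per day for the top bucket) is purely hypothetical. The respondent is made up as well.

```python
# Coding A: three values quoted in the text, two assumed midpoints.
CODING_A = {
    "zero times per week": 0.0,
    "0-1 times per week": 0.5,       # assumption: midpoint of the bucket
    "1-6 times per week": 3.5,
    "1-3 times per day": 14.0,       # assumption: 2 per day * 7 days
    "4 or more times per day": 28.0,
}

# Coding B: hypothetical alternative that treats the top bucket as 5 per day.
CODING_B = {**CODING_A, "4 or more times per day": 35.0}


def change(coding, before, after):
    """Change in weekly servings for one food item under a given coding."""
    return coding[after] - coding[before]


def reduced(coding, before, after):
    """Binary outcome: did the respondent report eating less of this item?"""
    return change(coding, before, after) < 0


# Hypothetical respondent who moves down one bucket between waves.
respondent = ("4 or more times per day", "1-3 times per day")

for name, coding in (("A", CODING_A), ("B", CODING_B)):
    print(name, change(coding, *respondent), reduced(coding, *respondent))
```

The size of the change depends on the coding (−14 versus −21 servings per week here), but for a single food item the reduced-or-not flag is the same under any coding that preserves the order of the buckets. That invariance does not automatically carry over once several items are summed into total meat, since different codings weight the items differently.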

There are also other meth­ods that can be used to an­a­lyze the differ­ences in or­di­nal val­ues be­tween the treat­ment and con­trol groups and be­tween the baseline and endline waves, such as or­di­nal lo­gis­tic re­gres­sion, that would be able to an­a­lyze the data and find statis­ti­cally sig­nifi­cant effects with­out the need to pick a par­tic­u­lar method of trans­form­ing the or­di­nal data into con­tin­u­ous, nu­meric data. While the out­put from this model would be very difficult to in­ter­pret in terms of amount of meat re­duced, we would definitely ex­pect a statis­ti­cally sig­nifi­cant effect on this kind of model if the effects of the treat­ment are real and not just an ar­ti­fact of the method of trans­for­ma­tion used.

While the trans­for­ma­tion does in­tro­duce com­pli­ca­tions, I’d gen­er­ally note that us­ing an or­di­nal FFQ sounds like a good idea. The or­di­nal buck­ets may be less ac­cu­rate, but I’d ex­pect re­spon­dents would find it much eas­ier to fill out rather than try­ing to re­call the pre­cise amount of serv­ings that they ate. Since I wouldn’t re­ally trust the differ­ence be­tween some­one self-re­port­ing 4 serv­ings in­stead of 3, it makes sense to cre­ate a siz­able bucket where we would ex­pect differ­ences to be mean­ingful.


I’m some­what cu­ri­ous how much a news­pa­per ad ap­prox­i­mates a leaflet and I’m some­what cu­ri­ous to repli­cate the study again, but with ac­tual leaflets (and in­clud­ing a con­trol leaflet), though oth­ers I’ve talked to have been less in­ter­ested in more MTurk stud­ies.


I and many oth­ers would also very much like to see a study on a plat­form other than MTurk, such as a repli­ca­tion of the MFA 2016 On­line Ads Study but with an even larger sam­ple size. We’d re­ally like to see if effects on MTurk hold up in other ar­eas.


The bot­tom line is that while these re­sults are en­courag­ing, we know that most stud­ies that peo­ple try to repli­cate end up failing to repli­cate. There are still many unan­swered ques­tions here that we’ll only re­ally know as the field of em­piri­cal an­i­mal rights work con­tinues to evolve.


Dis­claimer: I funded 75% of the costs of the study, pro­vided con­sult­ing on the study method­ol­ogy, and con­tinue to be in­volved in Re­duc­etar­ian Labs’s em­piri­cal work.

Thanks to Krystal Caldwell, Kieran Greig, Brian Kateman, Bobbie Macdonald, Justis Mills, Joey Savoie, and Allison Smith for reviewing an advance copy of this work.