I am a Research Scientist at the Humane and Sustainable Food Lab at Stanford.
Seth Ariel Green
Hi David,
> To be honest I'm having trouble pinning down what the central claim of the meta-analysis is.
To paraphrase Diddy's character in Get Him to the Greek, "What are you talking about, the name of the [paper] is called '[Meaningfully reducing consumption of meat and animal products is an unsolved problem]'!" (🙂) That is our central claim. We're not saying nothing works; we're saying that meaningful reductions either have not been discovered yet or do not have substantial evidence in support.
> However the authors hedge this in places
That's author, singular. I said at the top of my initial response that I speak only for myself.
When pushed, I say I am "approximately vegan" or "mostly vegan," which is just typically "vegan" for short, and most people don't push. If a vegan gives me a hard time about the particulars, which essentially never happens, I stop talking to them 🙂
IMHO we would benefit from a clear label for folks who aren't quite vegan but who only seek out high-welfare animal products; I think pasturism/pasturist is a possible candidate.
Love talking the nitty-gritty of meta-analysis 🙂
IMHO, the "math hard" parts of meta-analysis are figuring out what questions you want to ask, what sensible inclusion criteria are, and what statistical models are appropriate. Asking how much time this takes is the same as asking, where do ideas come from?
The "bodybuilding hard" part of meta-analysis is finding literature. The evaluators didn't care for our search strategy, which you could charitably call "bespoke" and uncharitably call "ad hoc and fundamentally unreplicable." But either way, I read about 1,000 papers closely enough to see if they qualified for inclusion, and then, partly to make sure I didn't duplicate my own efforts, I recorded notes on every study that looked appropriate but wasn't. I also read, or at least read the bibliographies of, about 160 previous reviews. Maybe you're a faster reader than I am, but ballpark, this was 500+ hours of work.
Regarding the computational aspects, the git history tells the story, but specifically making everything computationally reproducible, e.g. writing the functions, checking my own work, setting things up to be generalizable: a week of work in total? I'm not sure.
The paper went through many internal revisions and changed shape a lot from its initial draft when we pivoted in how we treated red and processed meat. That's hundreds of hours. Peer review was probably another 40-hour workweek.
As I reread reviewer 2's comments today, it occurred to me that some of their ideas might be interesting test cases for what Claude Code is and is not capable of doing. I'm thinking particularly of trying to formally incorporate my subjective notes about uncertainty (e.g. the many places where I admit that the effect size estimates involved a lot of guesswork) into some kind of... supplementary regression term for how much weight an estimate should get in meta-analysis? Like maybe I'd use Wasserstein-2 distance, as my advisor Don recently proposed? Or Bayesian meta-analysis? This is an important problem, and I don't consider it solved by RoB2 or whatever, which means that fixing it might be, IDK, a whole new paper, which takes however long that does? As my co-authors Don and Betsy & co. comment in a separate paper on which I was an RA:
> Too often, research syntheses focus solely on estimating effect sizes, regardless of whether the treatments are realistic, the outcomes are assessed unobtrusively, and the key features of the experiment are presented in a transparent manner. Here we focus on what we term landmark studies, which are studies that are exceptionally well-designed and executed (regardless of what they discover). These studies provide a glimpse of what a meta-analysis would reveal if we could weight studies by quality as well as quantity.

[The point being, meta-analysis is not well-suited for weighting by quality.]

It's possible that some of the proposed changes would take less time than that. Maybe risk of bias assessment could be knocked out in a week? But it's been about a year since the relevant studies were in my working memory, which means I'd probably have to re-read them all, and across our main and supplementary datasets, that's dozens of papers. How long does it take you to read dozens of papers? I'd say I can read about 3-4 papers a day closely if I'm really, really cranking. So in all likelihood, yes, weeks of work, and that's weeks where I wouldn't be working on a project about building empathy for chickens. Which admittedly I'm procrastinating on by writing this 500+ word comment 🙂
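For concreteness, here is a minimal sketch (in Python, with entirely made-up numbers) of the naive version of that idea: scale each study's inverse-variance weight by a subjective quality score before pooling. This is an illustration of the general concept only, not the paper's method or anything a real analysis should stop at; an actual implementation would need random effects, standard errors, and a principled mapping from coder notes to scores.

```python
# Hypothetical sketch: downweighting effect size estimates by a subjective
# "guesswork" score in inverse-variance pooling. All numbers are invented.

def pooled_effect(effects, variances, quality):
    """Inverse-variance pooled estimate, with each study's weight scaled
    by a subjective quality score in (0, 1]; 1 = fully trusted coding."""
    weights = [q / v for q, v in zip(quality, variances)]
    numerator = sum(w * e for w, e in zip(weights, effects))
    return numerator / sum(weights)

effects = [0.10, 0.30, 0.05]    # standardized mean differences (made up)
variances = [0.02, 0.05, 0.01]  # sampling variances (made up)
quality = [1.0, 0.4, 0.9]       # subjective confidence in the coding

# The noisy-but-large second estimate gets downweighted further by its
# quality score, pulling the pooled estimate toward the trusted studies.
print(pooled_effect(effects, variances, quality))
```

The design question this dodges, of course, is where the quality scores come from; a Bayesian version would instead inflate each study's variance by its coding uncertainty rather than multiply weights.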
David, there are two separate questions here: whether these analyses should be done, and whether I should have done them in response to the evaluations. If you think these analyses are worth doing, by all means, go ahead!
A final reflective note: David, I want to encourage you to think about the optics/politics of this exchange from the point of view of prospective Unjournal participants/authors. There are no incentives to participate. I did it because I thought it would be fun and I was wondering if anyone would have ideas or extensions that improved the paper. Instead, I got some rather harsh criticisms implying we should have written a totally different paper. Then I got this essay, which was unexpected/unannounced and used, again, rather harsh language, to which I objected. Do you think this exchange looks like an appealing experience to others? I'd say the answer is probably not.
A potential alternative: I took a grad school seminar where we replicated and extended other people's papers. Typically the assignment was to do the robustness checks in R or whatever, and then the author would come in and we'd discuss. It was a great setup. It worked because the grad students actually did the work, which provided an incentive for authors to participate. The co-teachers also pre-selected papers that they thought were reasonably high-quality, and I bet that if they got a student response like Matthew's, they would have counseled them to be much more conciliatory, to remember that participation is voluntary, to think through the risks of making enemies (as I counseled in my original response), etc. I wonder if something like that would work here too. Like, the expectation is that reviewers will computationally reproduce the paper, conduct extensions and robustness checks, ask questions if they have them, work collaboratively with authors, and then publish a review summarizing the exchange. That would be enticing! Instead, what I got here was like a second set of peer reviewers, and unusually harsh ones at that, and nobody likes peer review.
It might be the case that meta-analyses aren't good candidates for this kind of work, because the extensions/robustness checks would probably also have taken Matthew and the other responder weeks, e.g. a fine end-of-semester project for class credit but not a very enticing hobby.
Just a thought.
For what it's worth, I thought David's characterization of the evaluations was totally fair, even a bit toned down. E.g. this is the headline finding of one of them:
> major methodological issues undermine the study's validity. These include improper missing data handling, unnecessary exclusion of small studies, extensive guessing in effect size coding, lacking a serious risk-of-bias assessment, and excluding all-but-one outcome per study.
David characterizes these as "constructive and actionable insights and suggestions". I would say they are tantamount to asking for a new paper, especially the exclusion of small studies, which was core to our design and would require a whole new search, which would take months. To me, it was obvious that I was not going to do that (the paper had already been accepted for publication at that point). The remaining suggestions also implied dozens (hundreds?) of hours of work. Spending weeks satisfying two critics didn't pass a cost-benefit test.[1] It wasn't a close call.
[1] ^ Really need to follow my own advice now and go actually do other projects 🙂
@geoffrey We'd love to run a megastudy! My lab put in a grant proposal with collaborators at a different Stanford lab to do just that, but we ultimately went a different direction. Today, however, I generally believe that we don't even know what the right question to be asking is, though if I had to choose one it would be: what ballot initiative does the most for animal welfare while also getting the highest levels of public support? E.g. is there some other low-hanging fruit equivalent to "cage free," like "no mutilation," that would be equally popular? But in general I think we're back to the drawing board in terms of figuring out what study we want to run and getting a version of it off the ground, before we start thinking about scaling up to tens of thousands of people.
@david_reinstein, I suppose any press is good press, so I should be happy that you are continuing to mull on the lessons of our paper 🙂 but I am disappointed to see that the core point of my responses is not getting through. I'll frame it explicitly here: when we did one check and not another, or one search protocol and not another, the reason, every single time, is opportunity costs. When I say "we thought it made more sense to focus on the risks of bias that seemed most specific to this literature," I am using the word "focus" deliberately, in the sense of "focus means saying no," i.e. "we are always triaging." At every juncture, navigating the explore/exploit dilemma requires judgment calls. You don't have to like that I said no to you, but it's not a false dichotomy, and I do not care for that characterization.
To the second question of whether anyone will do this kind of extension work, I personally see it as a great exercise for grad students. I did all kinds of replication and extension work in grad school. A deep dive into a subset of the contact hypothesis literature that I did in a political psychology class in 2014, which started with a replication attempt, eventually morphed into The Contact Hypothesis Re-evaluated. If you, a grad student, want to do this kind of project, please be in touch, I'd love to hear from you. (I'd recommend starting by downloading the repo and asking Claude Code about robustness checks that do and do not require gathering additional data.)
That's interesting, but not what I'm suggesting. I'm suggesting something that would, e.g., explain why you tell people to "ignore the signs of my estimates for the total welfare" when you share posts with them. That is a particular style, and it says something about whether one should take your work in a literal spirit or not, which falls under the meta category of why you write the way you write; and to my earlier point, you're sharing this suggestion here with me in a comment rather than in the post itself 🙂 Finally, the fact that there's a lot of uncertainty about whether wild animals have positive or negative lives is exactly the point I raised about why I have trouble engaging with your work. The meta post I am suggesting would, by contrast, motivate and justify this style of reasoning as a whole, rather than providing a particular example of it. The post you've shared is a link in a broader chain. I'm suggesting you zoom out and explain what you like about this chain and why you're building it.
By all means, show us the way by doing it better 🙂 I'd be happy to read more about where you are coming from; I think your work is interesting, and if you are right, it has huge implications for all of us.
Echoing Richard's comment, EA is a community with communal norms, and a different forum might be a better fit for your style. Substack, for instance, is more likely to reward a confrontational approach. There is no moral valence to this observation, and likewise there is no moral valence to the EA community implicitly shunning you for not following its norms. We're talking about fit.
Pointing out "the irony of debating 'AI rights' when basic human rights are still contested" is contrary to EA communal norms in several ways, e.g. it's not intended to persuade but rather to end/substantially redirect a conversation, its philosophical underpinnings have extremely broad and (I think to us) self-evidently absurd implications (should we bombard the Game of Thrones subreddit with messages about how people shouldn't be debating fiction when people are starving?), its tone was probably out of step with how we talk, etc. Downvoting a comment like that amounts to "this is not to my tastes and I want to talk about something else."
"I started to notice a pattern: sameness in tone, sameness in structure, even sameness in thought. Ideas endlessly repackaged, reframed, and recycled. A sort of intellectual monoculture." This is a fairly standard EA criticism. Being an EA critic is a popular position. But I think you can trust that we've heard it before, responded to it before, etc. I am sympathetic to folks not wanting to do it again.
(Vasco asked me to take a look at this post and I am responding here.)
Hi Vasco,
I've been taking a minute to reflect on what I want to say about this kind of project. A few different thoughts, at a few different levels of abstraction.
In the realm of politics, I'm glad the ACLU and FIRE exist, even if I don't agree with them on everything, because I think they're useful poles in the ecosystem. I feel similarly about your work. I think this kind of detailed cost-benefit work on non-standard issues, or on standard issues that leads to non-standard conclusions, is a healthy contribution to EA, separately from whether I agree with or even understand it.
The main barrier to my engaging deeply with your work is that your analyses hinge on strong assumptions that I have no idea how to verify even in theory. The claim that nematodes live net-negative lives, for instance, which you believe with 55% confidence: I have no clue if this is true. I'm not even sure how many hours I would need to devote to form any belief on this whatsoever. (Hundreds?) In general, I have about 2-3 hours of good thinking per day.
IMO, the top comment on this post expresses the "EA consensus" about your style of analysis; I notice that it has gotten more upvotes and such than the post itself. One implication of this is that there is some persuasion work to be done to get folks on board with some of your assumptions, stylistic choices, and modes of analysis. Perhaps a post along the lines of "Why I write the way I write" (Nietzsche did this) or "The moral philosophical assumptions underpinning my style of analysis" would go some of the way to bridging that gap.
I get the sense that you are building an elaborate intellectual edifice whose many moving parts are distributed across many posts, comments, and external philosophical texts. That's well and good; I also have a "headcanon" about my work and ideas that I haven't fully systematized, e.g. I write almost exclusively about the results of randomized controlled trials without explaining the intellectual foundations of why I do that. But I think your intellectual foundations are more abstruse and counterintuitive. Readers might enjoy a meta post about those foundations: a "start here to understand Vasco Grilo's writing" primer.
I am generally on board with using the EA Forum as an extended job interview, e.g. establishing a reputation as someone who can reason and write clearly about an arbitrary subject. I think you're doing a fine job of that. On the other hand, the interaction with Kevin Xia about whether this work is appropriate for Hive, the downvotes that post received, and the fact that you are the only contributor to the soil animals topic here are face-value evidence that writing about this topic as much as you do is not career-optimal. Perhaps it deserves its own forum: soilanimalsmatter.substack.com or something like that? And then you can actually build up the whole intellectual edifice from the foundations upwards. I do this (https://regressiontothemeat.substack.com/) and it is working for me. Just a thought.
I am amenable to this argument and generally skeptical of longtermism on practical grounds. (I have a lot of trouble thinking of someone 300-500 years ago plausibly doing anything with my interests in mind that actually made a difference. Possible exceptions include folks associated with the Glorious Revolution.)
I think the best counterargument is that it's easier to set things on a good course than to course-correct. Analogy: it was easier to found Google, capitalizing on advertisers' complacency, than to fix advertising from within; easier to create Zoom than to get Microsoft to make Skype good.
I'm not saying this is right, but I think that is how I would try to motivate working on longtermism if I did (work on longtermism).
I bought the 3M mask on your recc 🙂
Hi Ben, I agree that there are a lot of intermediate weird outcomes that I don't consider, in large part because I see them as less likely than (I think) you do. I basically think society is going to keep chugging along as it is, in the same way that life with the internet is certainly different than life without it, but we basically all still get up, go to work, seek love and community, etc.
However, I don't think I'm underestimating how transformative AI would be in the section on why my work continues to make sense to me if we assume AI is going to kill us all or usher in utopia, which I think could be fairly described as transformative scenarios ;)
If McDonald's becomes human-labor-free, I am not sure what effect that would have on advocating for cage-free campaigns. I could see it going many ways, or no ways. I still think persuading people that animals matter, and that they should give cruelty-free options a chance, is going to matter under basically every scenario I can think of, including that one.
I'd like to see a serious re-examination of the evidence underpinning GiveWell's core recommendations, focusing on:
- How recent is the evidence?
- What are the core results on the primary outcomes of interest?
- How much is GiveWell doing add-on analysis/theorizing to boost those results into something amenable, or do the results speak for themselves?
- How reproducible/open-science-y/pre-registered/etc. are the papers under discussion?
- Are there any working papers/in-progress things worth adding to the evidence base?
I did this for one intervention in GiveWell should fund an SMC replication, and @Holden Karnofsky did a version of it in Minimal-trust investigations, but I think these investigations are worth doing multiple times over the years, from multiple parties. It's a lot of work though, so I see why it doesn't get done too often.
Linkpost: AI does not absolve us of helping animals
something along these lines, though not exactly
I wonder what the optimal protein intake is for trying to increase power-to-mass ratio, which is the core thing the sports I do (running, climbing, and hiking) ask for. I do not think that gaining mass is the average health/fitness goal, nor obviously the right thing for most people. I'd bet that most Americans would rank losing weight and aerobic capacity a fair bit higher.
Hi James, neat visualizations, and very validating that you were able to extend our work like this! We worked hard to make our materials legible, but you don't really know how well that went until someone actually tries to use them 🙂 So this is great to see.
Yes, a switch away from chicken meat towards beef could be good under some circumstances/assumptions. But the goal of our experiment was to come up with an effect size large enough to take to Chipotle, and we don't think we found one. My guess is that the interspecies tradeoffs also would not be very persuasive to a fast-casual chain relative to beef's larger climate impact.
I'm not sure. Sofritas are more or less an analogue to ground beef, but I'm not sure people make that connection. Our thinking for this experiment was that chicken typically has the fewest analogues widely available, so we should try to focus on that. But I am no longer sure that I have a good sense of how introducing PMAs would impact meat consumption. Yes, we find some evidence that chickn'itas absorb demand from chicken specifically, but it's not a slam dunk by any means. Maybe another PMA or two would have larger effects. I doubt it.
I agree that proto-vegetarians might be more actively exploring alternatives... but how many people are in this category? I'd venture less than 1% of people are seriously considering it. Probably a much larger category are looking to "cut back" in some sense, but that might mean many things to them.
I think our experiment has high ecological validity for the thing we are testing, which is the introduction of PMAs to an online, Chipotle-like menu. That's a real environment in which people encounter PMAs, and because it's online, IMHO it may lack promotion, buzz, etc. Perhaps a more elaborate test of a more fleshed-out, multi-component theory would find different effects. On the other hand, our intervention is easily scaled up.
For tests of "hearsay about how X or Y tastes really good, has to be tried, etc.", see Sparkman et al. (2020, e.g. figure 2) and Piester et al. (2020). We review some of those studies here. I think broadly speaking you are talking about norms-based approaches; see here for a general review and here for a review specific to eating meat.
It's an interesting question.
From the POV of our core contention (that we don't currently have a validated, reliable intervention to deploy at scale), whether this is because of absence of evidence (AoE) or evidence of absence (EoA) is hard to say. I don't have an overall answer, and ultimately both roads lead to "unsolved problem."
We can cite good arguments for EoA (these studies are stronger than the norm in the field but show weaker effects, and that relationship should be troubling for advocates) or AoE (we're not talking about very many studies at all), and ultimately I think the line between the two is in the eye of the beholder.
Going approach by approach, my personal answers are:
- Choice architecture is probably AoE; it might work better than expected, but we just don't learn very much from 2 studies (I am working on something about this separately).
- The animal welfare appeals are more AoE, esp. those from animal advocacy orgs.
- Social psych approaches I'm skeptical of, but there weren't a lot of high-quality papers, so I'm not so sure (see here for a subsequent meta-analysis of dynamic norms approaches).
- I would recommend health appeals for older folks and environmental appeals for Gen Z. So there I'd say we have evidence of efficacy, but expect effects on the order of a few percentage points.
Were I discussing this specifically with a funder, I would say: if you're going to do one of the meta-analyzed approaches (psych, nudge, environment, health, or animal welfare, or some hybrid thereof), you should expect small effect sizes unless you have some strong reason to believe that your intervention is meaningfully better than the category average. For instance, animal welfare appeals might not work in general, but maybe watching Dominion is unusually effective. However, as we say in our paper, there are a lot of cool ideas that haven't been tested rigorously yet, and from the point of view of knowledge, I'd like to see those get funded first.