I am a Research Scientist at the Humane and Sustainable Food Lab at Stanford.
Seth Ariel Green
Earlier this year, I sent a Works in Progress piece about far UV to a friend who is a materials science engineer and happens to work in UV applications (he once described his job as "literally watching paint dry," e.g. checking how long it takes for a UV lamp to dry a coat of paint on your car). I asked:
I'm interested in the comment that there's no market for these things because there's no public health authority to confer official status on them. That doesn't really make sense to me. If you wanted to market your airplane as the most germ-free or whatever, you could just point to clinical trials and not mention the FDA. Does the FDA or whoever really have independent status to bestow that's so powerful?
My friend replied:
Certain UV disinfection applications are quite popular, but indoor air treatment is one of the more challenging ones. One issue is that these products can only offer "the most germ-free" environment; the amount of protection is not really quantifiable. If a sick person coughed in my direction, would an overhead UV lamp stop the germs from reaching me? Probably not...
...far-UVC technology has some major limitations. Relatively high cost per unit, most average only 1 year lifetime or less with decreasing efficacy over time, you have to replace the entire system when it's spent, it adds to electricity cost when in use, and you need to install a lot of them for it to be effective because when run at too-high power levels they produce large amounts of hazardous ozone...
NIST has been working on establishing some standards for measuring UV output and the effectiveness of these systems for the last few years. It doesn't seem to be helping too much with convincing the general public. Covid was the big chance for these technologies to spread, and they did in some places, like airports, just not everywhere.
You can/should take this with a grain of salt. On the other hand, I generally believe that some EAs tend to be very credulous towards solutions that promise a high amount of efficacy on core cause areas, and operate with a mental model that the missing ingredient for scale is more often money than culture/mechanics: that everything is like bed nets. By contrast, I believe that if some solution looks very promising to outsiders but remains in limited use (e.g. kangaroo care for preemies in Nigeria or removing lead from turmeric in Bangladesh), there is likely a deep reason for that, one that's only legible to insiders, i.e. people with a lot of local context. Here, I suspect that many of us don't have a strong understanding of the costs, limitations, true efficacy, and logistical difficulties of UV light.
That's my epistemology. But if someone wants to fund and run an RCT testing the effects of, say, a cluster of aerolamps on covid cases at a big public event, I'd be happy to consult on design, measurement strategy, IRB approval, etc. (Gotta put that university affiliation to use on something!)
It's an interesting question.
From the POV of our core contention (that we don't currently have a validated, reliable intervention to deploy at scale), whether this is because of absence of evidence (AoE) or evidence of absence (EoA) is hard to say. I don't have an overall answer, and ultimately both roads lead to "unsolved problem."
We can cite good arguments for EoA (these studies are stronger than the norm in the field but show weaker effects, and that relationship should be troubling for advocates) or AoE (we're not talking about very many studies at all), and ultimately I think the line between the two is in the eye of the beholder.
Going approach by approach, my personal answers are:
- Choice architecture is probably AoE; it might work better than expected, but we just don't learn very much from 2 studies (I am working on something about this separately).
- The animal welfare appeals are more AoE, esp. those from animal advocacy orgs.
- Social psych approaches I'm skeptical of, but there weren't a lot of high-quality papers, so I'm not so sure (see here for a subsequent meta-analysis of dynamic norms approaches).
- I would recommend health appeals for older folks, environmental appeals for Gen Z. So there I'd say we have evidence of efficacy, but expect effects to be on the order of a few percentage points.
Were I discussing this specifically with a funder, I would say: if you're going to do one of the meta-analyzed approaches (psych, nudge, environment, health, or animal welfare, or some hybrid thereof), you should expect small effect sizes unless you have some strong reason to believe that your intervention is meaningfully better than the category average. For instance, animal welfare appeals might not work in general, but maybe watching Dominion is unusually effective. However, as we say in our paper, there are a lot of cool ideas that haven't been tested rigorously yet, and from the point of view of knowledge, I'd like to see those get funded first.
Hi David,
> To be honest I'm having trouble pinning down what the central claim of the meta-analysis is.
To paraphrase Diddy's character in Get Him to the Greek, "What are you talking about? The name of the [paper] is called '[Meaningfully reducing consumption of meat and animal products is an unsolved problem]'!" That is our central claim. We're not saying nothing works; we're saying that meaningful reductions either have not been discovered yet or do not have substantial evidence in support.
> However the authors hedge this in places
That's author, singular. I said at the top of my initial response that I speak only for myself.
When pushed, I say I am "approximately vegan" or "mostly vegan," which I typically shorten to "vegan," and most people don't push. If a vegan gives me a hard time about the particulars, which essentially never happens, I stop talking to them.
IMHO we would benefit from a clear label for folks who aren't quite vegan but who only seek out high-welfare animal products; I think pasturism/pasturist is a possible candidate.
Love talking the nitty-gritty of meta-analysis!
IMHO, the "math hard" parts of meta-analysis are figuring out what questions you want to ask, what sensible inclusion criteria are, and what statistical models are appropriate. Asking how much time this takes is like asking: where do ideas come from?
The "bodybuilding hard" part of meta-analysis is finding literature. The evaluators didn't care for our search strategy, which you could charitably call "bespoke" and uncharitably call "ad hoc and fundamentally unreplicable." But either way, I read about 1,000 papers closely enough to see if they qualified for inclusion, and then, partly to make sure I didn't duplicate my own efforts, I recorded notes on every study that looked appropriate but wasn't. I also read, or at least read the bibliographies of, about 160 previous reviews. Maybe you're a faster reader than I am, but ballpark, this was 500+ hours of work.
Regarding the computational aspects, the git history tells the story, but specifically making everything computationally reproducible, e.g. writing the functions, checking my own work, setting things up to be generalizable: a week of work in total? I'm not sure.
The paper went through many internal revisions and changed shape a lot from its initial draft when we pivoted in how we treated red and processed meat. That's hundreds of hours. Peer review was probably another 40-hour workweek.
As I reread Reviewer 2's comments today, it occurred to me that some of their ideas might be interesting test cases for what Claude Code is and is not capable of doing. I'm thinking particularly of trying to formally incorporate my subjective notes about uncertainty (e.g. the many places where I admit that the effect size estimates involved a lot of guesswork) into some kind of... supplementary regression term about how much weight an estimate should get in meta-analysis? Like maybe I'd use Wasserstein-2 distance, as my advisor Don recently proposed? Or Bayesian meta-analysis? This is an important problem, and I don't consider it solved by RoB2 or whatever, which means that fixing it might be, IDK, a whole new paper which takes however long that does? As my co-authors Don and Betsy & co. comment in a separate paper on which I was an RA:
> Too often, research syntheses focus solely on estimating effect sizes, regardless of whether the treatments are realistic, the outcomes are assessed unobtrusively, and the key features of the experiment are presented in a transparent manner. Here we focus on what we term landmark studies, which are studies that are exceptionally well-designed and executed (regardless of what they discover). These studies provide a glimpse of what a meta-analysis would reveal if we could weight studies by quality as well as quantity.

[The point being, meta-analysis is not well-suited for weighting by quality.]

It's possible that some of the proposed changes would take less time than that. Maybe risk of bias assessment could be knocked out in a week? But it's been about a year since the relevant studies were in my working memory, which means I'd probably have to re-read them all, and across our main and supplementary datasets, that's dozens of papers. How long does it take you to read dozens of papers? I'd say I can read about 3-4 papers a day closely if I'm really, really cranking. So in all likelihood, yes, weeks of work, and that's weeks where I wouldn't be working on a project about building empathy for chickens. Which admittedly I'm procrastinating on by writing this 500+ word comment.
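To make the quality-weighting idea concrete, here is a minimal sketch of one naive way to operationalize it: multiply each study's inverse-variance weight by a subjective quality score in (0, 1]. Everything here is hypothetical; the effect sizes, variances, and quality scores are invented for illustration, and this is not the method from the paper, just one simple way the idea could look.

```python
import math

# Hypothetical studies: standardized effect size d, variance v, and a
# subjective quality score q in (0, 1] reflecting how much guesswork
# went into coding the estimate. All numbers are invented.
studies = [
    {"d": 0.30, "v": 0.02, "q": 1.0},  # cleanly reported outcome
    {"d": 0.45, "v": 0.05, "q": 0.5},  # effect size partly reconstructed
    {"d": 0.10, "v": 0.01, "q": 0.8},
]

def pooled(studies, use_quality):
    # Fixed-effect inverse-variance pooling; optionally scale each
    # study's weight by its subjective quality score.
    weights = [(s["q"] if use_quality else 1.0) / s["v"] for s in studies]
    est = sum(w * s["d"] for w, s in zip(weights, studies)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))  # naive SE; ignores uncertainty in q
    return est, se

print(pooled(studies, use_quality=False))  # unweighted-by-quality estimate
print(pooled(studies, use_quality=True))   # quality-down-weighted estimate
```

Down-weighting shrinks the influence of guesswork-heavy estimates and, because total weight falls, widens the standard error, which is arguably the honest direction for it to move. A Bayesian version would instead treat q as informing each study's prior variance.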
David, there are two separate questions here: whether these analyses should be done, and whether I should have done them in response to the evaluations. If you think these analyses are worth doing, by all means, go ahead!
A final reflective note: David, I want to encourage you to think about the optics/politics of this exchange from the point of view of prospective Unjournal participants/authors. There are no incentives to participate. I did it because I thought it would be fun and I was wondering if anyone would have ideas or extensions that improved the paper. Instead, I got some rather harsh criticisms implying we should have written a totally different paper. Then I got this essay, which was unexpected/unannounced and used, again, rather harsh language to which I objected. Do you think this exchange looks like an appealing experience to others? I'd say the answer is probably not.
A potential alternative: I took a grad school seminar where we replicated and extended other people's papers. Typically the assignment was to do the robustness checks in R or whatever, and then the author would come in and we'd discuss. It was a great setup. It worked because the grad students actually did the work, which provided an incentive for authors to participate. The co-teachers also pre-selected papers that they thought were reasonably high-quality, and I bet that if they got a student response like Matthew's, they would have counseled them to be much more conciliatory, to remember that participation is voluntary, to think through the risks of making enemies (as I counseled in my original response), etc. I wonder if something like that would work here too. Like, the expectation is that reviewers will computationally reproduce the paper, conduct extensions and robustness checks, ask questions if they have them, work collaboratively with authors, and then publish a review summarizing the exchange. That would be enticing! Instead what I got here was like a second set of peer reviewers, and unusually harsh ones at that, and nobody likes peer review.
It might be the case that meta-analyses aren't good candidates for this kind of work, because the extensions/robustness checks would probably also have taken Matthew and the other responder weeks, e.g. a fine end-of-semester project for class credit but not a very enticing hobby.
Just a thought.
For what it's worth, I thought David's characterization of the evaluations was totally fair, even a bit toned down. E.g. this is the headline finding of one of them:
> major methodological issues undermine the study's validity. These include improper missing data handling, unnecessary exclusion of small studies, extensive guessing in effect size coding, lacking a serious risk-of-bias assessment, and excluding all-but-one outcome per study.
David characterizes these as "constructive and actionable insights and suggestions." I would say they are tantamount to asking for a new paper, especially the exclusion of small studies, which was core to our design; reversing it would require a whole new search, which would take months. To me, it was obvious that I was not going to do that (the paper had already been accepted for publication at that point). The remaining suggestions also implied dozens (hundreds?) of hours of work. Spending weeks satisfying two critics didn't pass a cost-benefit test.[1] It wasn't a close call.
- ^ Really need to follow my own advice now and go actually do other projects
@geoffrey We'd love to run a megastudy! My lab put in a grant proposal with collaborators at a different Stanford lab to do just that, but we ultimately went a different direction. Today, however, I generally believe that we don't even know what the right question to be asking is, though if I had to choose one it would be: what ballot initiative does the most for animal welfare while also getting the highest level of public support? E.g., is there some other low-hanging fruit equivalent to "cage free," like "no mutilation," that would be equally popular? But in general I think we're back to the drawing board in terms of figuring out what study we want to run and getting a version of it off the ground, before we start thinking about scaling up to tens of thousands of people.
@david_reinstein, I suppose any press is good press, so I should be happy that you are continuing to mull on the lessons of our paper, but I am disappointed to see that the core point of my responses is not getting through. I'll frame it explicitly here: when we did one check and not another, or one search protocol and not another, the reason, every single time, is opportunity costs. When I say "we thought it made more sense to focus on the risks of bias that seemed most specific to this literature," I am using the word "focus" deliberately, in the sense of "focus means saying no," i.e. "we are always triaging." At every juncture, navigating the explore/exploit dilemma requires judgment calls. You don't have to like that I said no to you, but it's not a false dichotomy, and I do not care for that characterization.
To the second question of whether anyone will do this kind of extension work: I personally see this as a great exercise for grad students. I did all kinds of replication and extension work in grad school. A deep dive into a subset of the contact hypothesis literature that I did in a political psychology class in 2014, which started with a replication attempt, eventually morphed into The Contact Hypothesis Re-evaluated. If you, a grad student, want to do this kind of project, please be in touch; I'd love to hear from you. (I'd recommend starting by downloading the repo and asking Claude Code about robustness checks that do and do not require gathering additional data.)
That's interesting, but not what I'm suggesting. I'm suggesting something that would, e.g., explain why you tell people to "ignore the signs of my estimates for the total welfare" when you share posts with them. That is a particular style, and it says something about whether one should take your work in a literal spirit or not, which falls under the meta category of why you write the way you write; and to my earlier point, you're sharing this suggestion here with me in a comment rather than in the post itself. Finally, the fact that there's a lot of uncertainty about whether wild animals have positive or negative lives is exactly the point I raised about why I have trouble engaging with your work. The meta post I am suggesting would, by contrast, motivate and justify this style of reasoning as a whole, rather than providing a particular example of it. The post you've shared is a link in a broader chain. I'm suggesting you zoom out and explain what you like about this chain and why you're building it.
By all means, show us the way by doing it better! I'd be happy to read more about where you are coming from; I think your work is interesting, and if you are right, it has huge implications for all of us.
Echoing Richard's comment, EA is a community with communal norms, and a different forum might be a better fit for your style. Substack, for instance, is more likely to reward a confrontational approach. There is no moral valence to this observation, and likewise there is no moral valence to the EA community implicitly shunning you for not following its norms. We're talking about fit.
Pointing out "the irony of debating 'AI rights' when basic human rights are still contested" is contrary to EA communal norms in several ways: e.g., it's not intended to persuade but rather to end or substantially redirect a conversation; its philosophical underpinnings have extremely broad and (I think, to us) self-evidently absurd implications (should we bombard the Game of Thrones subreddit with messages about how people shouldn't be debating fiction when people are starving?); its tone was probably out of step with how we talk; etc. Downvoting a comment like that amounts to "this is not to my tastes and I want to talk about something else."
"I started to notice a pattern: sameness in tone, sameness in structure, even sameness in thought. Ideas endlessly repackaged, reframed, and recycled. A sort of intellectual monoculture." This is a fairly standard EA criticism. Being an EA critic is a popular position. But I think you can trust that we've heard it before, responded before, etc. I am sympathetic to folks not wanting to do it again.
(Vasco asked me to take a look at this post and I am responding here.)
Hi Vasco,
I've been taking a minute to reflect on what I want to say about this kind of project. A few different thoughts, at a few different levels of abstraction.
In the realm of politics, I'm glad the ACLU and FIRE exist, even if I don't agree with them on everything, because I think they're useful poles in the ecosystem. I feel similarly about your work. I think this kind of detailed cost-benefit work on non-standard issues, or on standard issues that leads to non-standard conclusions, is a healthy contribution to EA, separately from whether I agree with or even understand it.
The main barrier to my engaging deeply with your work is that your analyses hinge on strong assumptions that I have no idea how to verify even in theory. The claim that nematodes live net-negative lives, for instance, which you believe with 55% confidence: I have no clue if this is true. I'm not even sure how many hours I would need to devote to form any belief on this whatsoever. (Hundreds?) In general, I have about 2-3 hours of good thinking per day.
IMO, the top comment on this post expresses the "EA consensus" about your style of analysis; I notice that it has gotten more upvotes and such than the post itself. One implication of this is that there is some persuasion work to be done to get folks on board with some of your assumptions, stylistic choices, and modes of analysis. Perhaps a post along the lines of "Why I write the way I write" (Nietzsche did this) or "The moral philosophical assumptions underpinning my style of analysis" would go some of the way to bridging that gap.
I get the sense that you are building an elaborate intellectual edifice whose many moving parts are distributed across many posts, comments, and external philosophical texts. That's well and good; I also have a "headcanon" about my work and ideas that I haven't fully systematized, e.g. I write almost exclusively about the results of randomized controlled trials without explaining the intellectual foundations of why I do that. But I think your intellectual foundations are more abstruse and counterintuitive. Readers might enjoy a meta post about those foundations: a "start here to understand Vasco Grilo's writing" primer.
I am generally on board with using the EA Forum as an extended job interview, e.g. establishing a reputation as someone who can reason and write clearly about an arbitrary subject. I think you're doing a fine job of that. On the other hand, the interaction with Kevin Xia about whether this work is appropriate for Hive, the downvotes that post received, and the fact that you are the only contributor to the soil animals topic here are face-value evidence that writing about this topic as much as you do is not career-optimal. Perhaps it deserves its own forum: soilanimalsmatter.substack.com or something like that? And then you can actually build up the whole intellectual edifice from the foundations upwards. I do this (https://regressiontothemeat.substack.com/) and it is working for me. Just a thought.
I am amenable to this argument and generally skeptical of longtermism on practical grounds. (I have a lot of trouble thinking of someone 300-500 years ago plausibly doing anything with my interests in mind that actually makes a difference. Possible exceptions include folks associated with the Glorious Revolution.)
I think the best counterargument is that it's easier to set things on a good course than to course-correct. Analogy: easier to found Google, capitalizing on advertisers' complacency, than to fix advertising from within; easier to create Zoom than to get Microsoft to make Skype good.
I'm not saying this is right, but I think that's how I would try to motivate working on longtermism if I did (work on longtermism).
I bought the 3M mask on your rec!
Hi Ben, I agree that there are a lot of intermediate weird outcomes that I don't consider, in large part because I see them as less likely than (I think) you do. I basically think society is going to keep chugging along as it is, in the same way that life with the internet is certainly different from life without it, but we basically all still get up, go to work, seek love and community, etc.
However, I don't think I'm underestimating how transformative AI would be in the section on why my work continues to make sense to me if we assume AI is going to kill us all or usher in utopia, which I think could be fairly described as transformative scenarios ;)
If McDonald's becomes human-labor-free, I am not sure what effect that would have on advocating for cage-free campaigns. I could see it going many ways, or no way. I still think persuading people that animals matter, and that they should give cruelty-free options a chance, is going to matter under basically every scenario I can think of, including that one.
I'd like to see a serious re-examination of the evidence underpinning GiveWell's core recommendations, focusing on:
- How recent is the evidence?
- What are the core results on the primary outcomes of interest?
- How much is GiveWell doing add-on analysis/theorizing to boost those results into something amenable, or do the results speak for themselves?
- How reproducible/open-science-y/pre-registered/etc. are the papers under discussion?
- Are there any working papers/in-progress things worth adding to the evidence base?
I did this for one intervention in GiveWell should fund an SMC replication, and @Holden Karnofsky did a version of it in Minimal-trust investigations, but I think these investigations are worth doing multiple times over the years by multiple parties. It's a lot of work, though, so I see why it doesn't get done too often.
Something along these lines, though not exactly.
I wonder what the optimal protein intake is for trying to increase one's power-to-mass ratio, which is the core thing the sports I do (running, climbing, and hiking) ask for. I do not think that gaining mass is the average health/fitness goal, nor obviously the right thing for most people. I'd bet that most Americans would rank losing weight and aerobic capacity a fair bit higher.
Hi Jeff, I think we're talking about the same lifespan; my friend was talking about 1 year of continuous use (he works in industrial applications).