I am a Research Scientist at the Humane and Sustainable Food Lab at Stanford and a nonresident fellow at the Kahneman-Treisman Center at Princeton. By trade, I am a meta-analyst.
Here is my date-me doc.
Sure, there are more reasonable ways to express the argument, all of which boil down to “experts have a comparative advantage at producing welfare gains.” And the people don’t need to be poor for this theory to be in vogue; e.g., American public health officials gave very nuanced guidance about masks in March-April 2020 because they were aiming at a second-order effect (proper distribution). I think my broader point about the necessity of such a theory holds, however, regardless of how it is expressed. I went with a cynical version in part because I was trying to make the point that a theory can be a ‘folk theory’ and therefore impolite, but elucidating that was out of scope for the text.
Three use cases come to mind for the forum:
establishing a reputation in writing as a person who can follow good argumentative norms (perhaps as a kind of extended courtship of EA jobs/orgs)
disseminating findings that are mainly meant for other forums, e.g. research reports
keeping track of what the community at large is thinking about/working on, which is mostly facilitated by organizations like RP & GiveWell using the forum to share their work.
I don’t think I would use the forum for hashing out anything I was really thinking hard about; I’d probably have in-person conversations or email particular persons.
Thanks for clarifying. That inevitably rests on a strong assumption about the relative importance of chicken welfare to human welfare, and it looks like your work builds on Bob Fischer’s estimates for conversion. That’s a fine starting point but for my tastes, this is a truly hard problem where the right answer is probably not knowable even in theory. When I’m discussing this, I’ll probably stick to purely empirical claims, e.g., “we can make X chickens’ lives better in Y ways” or “we can reduce meat consumption by Z pounds” and be hand-wavy about the comparison between species. YMMV.
Thank you for the additional context!
re: Pure Earth: GiveWell notes that its Pure Earth estimates are “substantially less rigorous than both our top charity cost-effectiveness estimates,” so I don’t want to read too much into it. However, a claim that an intervention is merely 18X better at helping poor people than they are at helping themselves still strikes me as extraordinary, albeit in a way that we become acclimated to over time.
As to what good social theory would look like here, there is some nice work in sociology on the causes and consequences of lead exposure in America (see Muller, Sampson, and Winter 2018 for a review). I don’t expect EA orgs to produce this level of granularity when justifying their work, but some theory about why an opportunity exists would be very much appreciated, at least by me.
I’ve followed your work a bit w.r.t. animal welfare. That’s 15 chicken DALYs right? That seems plausible to me. The theory I would construct for this would start with the fact that there are probably more chickens living on factory farms at this moment than there are humans alive. Costco alone facilitates the slaughter of ~100M chickens/year. If you improve the welfare of just the Costco chickens by just 1% of a DALY per chicken, that’s 1M DALYs. I could very much believe that a corporate campaign of that magnitude might cost about $66K (approximately 1M/15). So I find this claim much less extraordinary.
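To make that arithmetic explicit, here is the back-of-envelope version in R. The inputs are just the illustrative numbers above (including my reading of the “15 chicken DALYs” figure as DALYs per dollar), not outputs of any actual cost-effectiveness model:

```r
# Back-of-envelope only: illustrative numbers from the comment above,
# not estimates from any real cost-effectiveness model.
chickens_per_year  <- 100e6  # ~ chickens slaughtered for Costco annually
daly_gain_per_bird <- 0.01   # assume a welfare gain of 1% of a DALY per chicken
dalys_per_dollar   <- 15     # reading "15 chicken DALYs" as DALYs per dollar

total_dalys  <- chickens_per_year * daly_gain_per_bird  # = 1e6 DALYs
implied_cost <- total_dalys / dalys_per_dollar          # ~ $66,667, i.e. ~$66K
implied_cost
```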
As a potential title, maybe “Disability among farmed animals is neglected relative to human disability”? or something like that
Great to see this — I am a past FSRF recipient and it was a very positive experience!
In an investigative reporting setting, it is common to see the reason why anonymity was requested and granted, e.g. someone “requested anonymity to avoid potential retribution from their employer.” There is also a general norm around trusting quoted sources that can be verified over anonymous comments. I think these norms have evolved because they are useful for credibility.
Animal welfare is an area where climate concerns and a canonical EA cause converge because factory farming is a major contributor to both problems. By that light, EAs are actually doing quite a lot to mitigate climate change, just under a different banner.
Glad you liked it! I also got a lot out of Jia Tolentino’s “We Come From Old Virginia” in her book Trick Mirror
There have been a few “EA” responses to this issue, but TBF they can be a bit hard to find:
https://www.cold-takes.com/minimal-trust-investigations/
As an aside, I’m pretty underwhelmed by concerns about using LLINs as fishing nets. These concerns are very media-worthy, but I’m more worried about things like “People just never bother to hang up their LLIN,” which I’d guess is a more common issue. The LLIN usage data we use would (if accurate) account for both.
Besides the harm caused by some people contracting malaria because they don’t sleep under their nets, which we already account for in our cost-effectiveness analysis, the article warns that fishing with insecticide treated nets may deplete fish stocks. In making this case, the article cites only one study, which reports that about 90% of households in villages along Lake Tanganyika used bed nets to fish. It doesn’t cite any studies examining the connection between bed nets and depleted fish stocks more directly. The article states that “Recent hydroacoustic surveys show that Zambia’s fish populations are dwindling” and “recent surveys show that Madagascar’s industrial shrimp catch plummeted to 3,143 tons in 2010 from 8,652 tons in 2002,” but declines in fish populations and shrimp catch may have causes other than mosquito net-fishing.
Yeah, I was curious about this too, and we try to get at something theoretically similar by putting out all the “zeitgeist” studies in an attempt to define the dominant approaches of a given era. Like, in the mid-2010s, everyone was thinking about bystander stuff. But if memory serves, once I saw the above graph, I basically just dropped this whole line of inquiry because we were basically seeing no relationship between effect size and publication date. Having said that, behavioral outcomes get more common over time (see graph in original post), and that is probably also having a depressing effect on the relationship. There could be some interesting further analyses here—we try to facilitate them by open sourcing our materials.
By the way, apologies for saying above that your “intuition is moot,” I meant “your intuition about mootness is correct” 😃 (I just changed it)
Hi Akhil,
Thanks for engaging.
I do not think we have missed a significant portion of primary prevention papers in our time period. Looking at that page, I see some things that had midpoint evaluations in 2018. Looking at this group’s final report (https://www.whatworks.co.za/documents/publications/390-what-works-to-prevent-vawg-final-performance-evaluation-report-mar-2020/file), I do not see anything that qualifies as primary prevention of sexual violence. We did a painstaking systematic search, and I’m reasonably confident we got just about everything that meets our criteria. As to whether we might have chosen different criteria: too late now, but for my own curiosity, what would you suggest?
We have many subgroup analyses in the paper, though for my tastes, I wish we could have done more in terms of grouping studies together by theoretical approach and then analyzing them. This turned out to be really hard in practice because there was so much overlapping content but also so many bespoke delivery mechanisms. This heterogeneity is one reason my next meta-analysis (https://forum.effectivealtruism.org/posts/k9qqGZtmWz3x4yaaA/environmental-and-health-appeals-are-the-most-effective) sets strict quality-related inclusion criteria and then compares theoretical approaches head-to-head.
Definitely, and this is arguably the main limitation of this paper: we’re a few years out of date. Basically what happened was that we started this paper pre-pandemic (the first conversation I was part of was in summer 2017!), did our search, and then lost a few years to the pandemic; a lot of the team also worked on a different meta-analysis in the interim (https://www.annualreviews.org/content/journals/10.1146/annurev-psych-071620-030619, on which I was an RA). I still think we’ve got some interesting and general lessons from reviewing 30+ years’ worth of papers.
👋 our search extends to 1985, but the first paper was from 1986. We started our search by replicating and extending a previous review, which says “The start date of 1985 was chosen to capture the 25-year period prior to the initial intended end date of 2010. The review was later extended through May 2012 to capture the most recent evaluation studies at that time.” I’m not too worried about missing stuff from before that, though, because the first legit evaluation we could find was from 1986. There’s actually a side story to tell here about how the people doing this work back then were not getting supported by their fields or their departments, but persisted anyway.
But I think your concern is, why include studies from that far back at all vs. just the “modern era” however we define that (post MeToo? post Dear Colleague Letter?). That’s a fair question, but your intuition about mootness is right, there’s essentially zero relationship between effect size and time.
Here’s a figure that plots average effect size over time, from our 4-exploratory-analyses.html script:
And the overall slope is really tiny:
```r
dat |> sum_lm(d, year)
##               Estimate Std. Error  t value Pr(>|t|)
## (Intercept)   -5.67062    4.85829 -1.16720  0.24370
## year           0.00297    0.00242  1.22631  0.22067
```
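For anyone who wants to run this check without our helper functions, here is a minimal base-R sketch. It assumes sum_lm() is essentially a wrapper around the coefficient table from summary(lm()), and it uses simulated data in place of our dat:

```r
# Minimal sketch of the same slope check in base R.
# `toy` is simulated data standing in for our `dat`;
# this assumes sum_lm() just prints the coefficient table from summary(lm()).
set.seed(123)
toy <- data.frame(
  year = sample(1986:2018, 300, replace = TRUE),  # publication year
  d    = rnorm(300, mean = 0.2, sd = 0.4)         # study-level effect size (SMD)
)
coef(summary(lm(d ~ year, data = toy)))  # slope on year should be near zero here
```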
Some research evaluations do hold up over time! But Munger’s ‘temporal validity’ argument really stuck with me: the social world changes, so things that work in one place and time can fail in another for reasons that have nothing to do with rigor and everything to do with changing context.
In general, null results should be our default expectation in behavioral research: https://www.bu.edu/bulawreview/files/2023/12/STEVENSON.pdf
However, per https://eiko-fried.com/antidotes-to-cynicism-creep/#6_Antidotes_to_cynicism_creep:

> More broadly, for me personally, the way forward is to incentivize, champion, and promote better and more robust scientific work. I find this motivating and encouraging, and an efficient antidote against cynicism creep. I find it intellectually rewarding because it is an effort that spans many areas including teaching science, doing science, and communicating science. And I find it socially rewarding because it is a teamwork effort embedded in a large group of (largely early career) scientists trying to improve our fields and build a more robust, cumulative science.
TLDR: I write meta-analyses on a contract basis, e.g. here, here, and here. If you want to commission a meta-analysis, and get a co-authored paper to boot, I’d love to hear from you.
Skills & background: I am a nonresident fellow at the Kahneman-Treisman Center at Princeton and an affiliate at the Humane and Sustainable Food Lab at Stanford. Previously I worked at Glo Foundation, Riskified, and Code Ocean.
Location/remote: Brooklyn.
Resume/CV/LinkedIn: see here.
Email/contact: setgree at gmail dot com
Other notes: I’m reasonably subject-agnostic, though my expertise is in behavioral science research.
This happens to be trending on Hacker News right now: https://www.ycombinator.com/blog/startup-school-east-boston/
they also have a cofounder matching program https://www.ycombinator.com/cofounder-matching
Probably nothing like this exists for EA-specific matching though IDK
As I argue in the SMC piece, not just any RCT will suffice, and today we know a lot more about what good research looks like. IMO, we should (collectively) be revisiting things we think we know with modern research methods. So yes, I think we can know things. But we are talking about hundreds of millions of dollars. Our evidentiary standards should be high.
Related: Kevin Munger on temporal validity: https://journals.sagepub.com/doi/10.1177/20531680231187271
I can see why this piece’s examples and tone will rankle folks here. But speaking for myself, I think its core contention is directionally correct: EA’s leading orgs’ and thinkers’ predictions and numeric estimates have an “all fur coat and no knickers” problem—putative precision but weak foundations. My entry to GiveWell’s Change Our Mind contest made basically the same point (albeit more politely).
Another way to frame this critique is to say it’s an instance of the Shirky principle: institutions will try to preserve the problem to which they are the solution. If GiveWell (or whoever) tried to clear up the ambiguous evidence underpinning its recommendations by funding more research (on the condition that the research would provide clear cost-benefit analyses in terms of lives saved per dollar), then what further purpose would the evaluator have once that estimate came back?
There are very reasonable counterpoints to this. I just think the critique is worth engaging with.
I amended the text to be less inflammatory