I’ll frame it explicitly here: when we did one check and not another, or one search protocol and not another, the reason, every single time, is opportunity costs. When I write “we thought it made more sense to focus on the risks of bias that seemed most specific to this literature,” notice the word ‘focus’, which means saying no.
That is clearly the case, and I accept there are tradeoffs. Still, ideally I would have liked to see a more direct response to the substance of the points made by the evaluators, though I understand that there are tradeoffs there as well.
In other words, because of opportunity costs, we are always triaging. At every juncture, navigating the explore/exploit dilemma requires judgment calls. You don’t have to like that I said no to you, but it’s not a false dichotomy, and I do not care for that characterization.
Perhaps ‘false dichotomy’ was too strong, given the opportunity costs (not an excuse: I got that from RoastMyPost’s take on this). But as I understand it, there are clear rubrics and guidelines for meta-analyses like this. In cases where you choose to depart from standard practice, it seems reasonable to give a more detailed and grounded explanation of why you did so. And the evaluators did present very specific arguments for different practices you could have followed, and could still follow in future work. Judgment calls based on experience get you somewhere, but it would be better to explicitly defend why you made a particular judgment call, and to respond to and consider the analytical points made by the evaluators. Ideally you would also follow up with the checks they suggest, although I understand that this is hard given how busy you are and the nature of academic incentives.
I hope I am being fair here; I’m trying to be even-handed and sympathetic to both sides. Of course, for this exercise to be useful, we have to allow for constructive expert criticism, which I think these evaluations do indeed embody. I appreciate your having responded to them at all. I’d be happy to get others’ opinions on whether we’ve been fair here.
To the second question of whether anyone will do this kind of extension work, I personally see it as a great exercise for grad students. I did all kinds of replication and extension exercises in grad school. A deep dive into a subset of the contact hypothesis literature that I did in a political psychology class in 2014, which started with a replication attempt, eventually morphed into The Contact Hypothesis Re-evaluated.
If a grad student wanted to do this kind of project, please be in touch, I’d love to hear from you.
I had previously responded that “casting this as ‘for graduate students’ makes it seem less valuable and prestigious,” which I still stand by. But I appreciate that you adjusted your response to note “If a grad student wanted to do this kind of project, please be in touch, I’d love to hear from you,” which I think helps a lot.
The point I was making—perhaps preaching to the choir here:
These extensions, replications, and follow-up steps may be needed to make a large project deeply credible and useful, and to capture a large part of its value. Why not give equal esteem and career rewards for that work? The current journal system tends not to (at least not in economics, the field I’m most familiar with). This is one of the things we hope credible evaluation, separated from journal publication, can improve upon.
Chiming in here with my outsider impressions on how fair the process seems
@david_reinstein If I were to rank the evaluator reports, the evaluation summary, and the EA Forum post by which seemed the most fair, I would rank the Forum post last. It wasn’t until I clicked through to the evaluation reports that I felt the process wasn’t so cutting.
Let me focus on one very specific framing in the Forum post, since it feels representative. One heading includes the phrase “this meta-analysis is not rigorous enough”. This has a few connotations that you probably didn’t mean. One, this meta-analysis is much worse than others. Two, the claims are questionable. Three, there’s a universally correct level of quality that meta-analyses should reach and anything that falls short of that is inadmissible as evidence.
In reality, it seems this meta-analysis is par for the course in terms of quality. And it was probably more difficult to conduct, given the heterogeneity in the literature. And the central claim of the meta-analysis doesn’t seem like something either evaluator disputed (though one evaluator was hesitant).
Again, I know that’s not what you meant and there are many caveats throughout the post. But it’s one of a few editorial choices that make the Forum post seem much more critical than the evaluation reports, which is a bit unusual since the Evaluators are the ones who are actually critiquing the paper.
Finally, one piece of context that felt odd not to mention was the fundamental difficulty of finding an expert in both food consumption and meta-analysis. That limits the ability of any reviewer to make a fair evaluation. This is acknowledged at the bottom of the Evaluation Summary. Elsewhere, I’m not sure where it’s said. Without that mentioned, I think it’s easy for a casual reader to leave thinking the two Evaluators are the “most correct”.
Thanks for the detailed feedback, this seems mostly reasonable. I’ll take a look again at some of the framings, and try to adjust. (Below and hopefully later in more detail).
the phrase “this meta-analysis is not rigorous enough”.
it seems this meta-analysis is par for the course in terms of quality.
This was my take on how to succinctly depict the evaluators’ reports (not my own take), in a way the casual reader would be able to digest. Maybe this was rounding down too much, but not by a lot, I think. Some quotes from Jané’s evaluation that I think are representative:
Overall, aside from its commendable transparency, the meta-analysis is not of particularly high quality
Overall, the transparency is strong, but the underlying analytic quality is limited.
This doesn’t seem to reflect ‘par for the course’ to me, but it depends on what the course is, i.e., what the comparison group is. My own sense/guess is that this is more rigorous and careful than most work in this area of meat-consumption interventions (and adjacent areas), but less rigorous than the meta-analyses the evaluators are used to seeing in their academic contexts and the practices they espouse. But academic meta-analysts tend to focus on areas where they can find a proliferation of high-quality, more homogeneous research, not necessarily the highest-impact areas.
Note that the evaluators rated this at the 40th and 25th percentiles for methods, and at the 75th and 39th percentiles overall.
And the central claim of the meta-analysis doesn’t seem like something either evaluator disputed (though one evaluator was hesitant).
To be honest, I’m having trouble pinning down what the central claim of the meta-analysis is. Is it a claim that “the main approaches being used to motivate reduced meat consumption don’t seem to work,” i.e., that we can bound the effects as very small, at best? That’s how I’d interpret the reporting of the pooled effect’s 95% CI as a standardized mean effect between 0.02 and 0.12. I would say that both evaluators are sort of disputing that claim.
However, the authors hedge this in places, and sometimes it sounds more like they’re saying that ~“even the best meta-analysis possible leaves a lot of uncertainty” … an absence of evidence more than evidence of absence, and this is something the evaluators seem to agree with.
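For concreteness, here is a minimal sketch of the kind of inverse-variance pooling that yields an interval like that 0.02–0.12 range. The per-study numbers below are made up purely for illustration (they are not the studies in the meta-analysis):

```python
import numpy as np

# Hypothetical per-study standardized mean differences (SMDs) and standard errors;
# illustrative values only, not the studies in the meta-analysis.
smd = np.array([0.05, 0.15, -0.02, 0.10, 0.08])
se = np.array([0.06, 0.09, 0.07, 0.05, 0.08])

# Fixed-effect inverse-variance pooling: weight each study by 1 / SE^2.
w = 1.0 / se**2
pooled = np.sum(w * smd) / np.sum(w)
pooled_se = np.sqrt(1.0 / np.sum(w))

lo, hi = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled SMD = {pooled:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
# If the whole interval sits between roughly 0 and ~0.1 SMD, the natural reading is
# "the average effect of these interventions is at best very small."
```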
Finally, one piece of context that felt odd not to mention was the fundamental difficulty of finding an expert in both food consumption and meta-analysis.
That is/was indeed challenging. Let me try to adjust this post to note that.
a few editorial choices … make the Forum post seem much more critical than the evaluation reports, which is a bit unusual since the Evaluators are the ones who are actually critiquing the paper.
My goal for this post was to fairly represent the evaluators’ take, to provide insights to people who might want to use this for decision-making and future research, and to raise the question of standards for meta-analysis in EA-related areas. I will keep thinking about whether I missed the mark here. One possible clarification, though: we don’t frame the evaluators’ role as (only) looking to criticize or find errors in the paper. We ask them to give a fair assessment of it, evaluating its strengths, weaknesses, credibility, and usefulness. These evaluations can also be useful if they give people more confidence in the paper and its conclusions, and thus reason to update more on it in their own decision-making.
To be honest I’m having trouble pinning down what the central claim of the meta-analysis is.
To paraphrase Diddy’s character in Get Him to the Greek, “What are you talking about, the name of the [paper] is called ‘[Meaningfully reducing consumption of meat and animal products is an unsolved problem]’!” (😃) That is our central claim. We’re not saying nothing works; we’re saying that meaningful reductions either have not been discovered yet or do not have substantial evidence in support.
However the authors hedge this in places
That’s author, singular. I said at the top of my initial response that I speak only for myself.
I think “an unsolved problem” could indicate several things. It could be that:
We have evidence that all of the commonly tried approaches are ineffective, i.e., we have measured all of their effects and they are tightly bounded as being very small.
We have a lack of evidence, and thus very wide credible intervals over the impact of each of the common approaches.
To me, the distinction is important. Do you agree?
You say above
meaningful reductions either have not been discovered yet or do not have substantial evidence in support
But even “do not have substantial evidence in support” could mean either of the above … a lack of evidence, or strong evidence that the effects are close to zero. At least to my ears.
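To make the distinction concrete, here is a hypothetical sketch (illustrative numbers, not the paper’s data) contrasting the two readings: a handful of noisy studies can give roughly the same point estimate but a wide interval (a lack of evidence), whereas many precise studies centered near zero pin the effect down (strong evidence that effects are close to zero):

```python
import numpy as np

def pooled_ci(smd, se):
    """Fixed-effect inverse-variance pooled SMD with a 95% CI."""
    w = 1.0 / np.asarray(se) ** 2
    est = np.sum(w * np.asarray(smd)) / np.sum(w)
    half = 1.96 * np.sqrt(1.0 / np.sum(w))
    return est, est - half, est + half

# Scenario A: a few noisy studies -> wide interval (a lack of evidence).
smd_a = np.array([0.10, -0.05, 0.20])
se_a = np.array([0.25, 0.30, 0.28])

# Scenario B: many precise studies centered near zero -> tight interval near zero
# (evidence that effects are close to zero). Simulated, illustrative values.
rng = np.random.default_rng(0)
smd_b = rng.normal(0.03, 0.02, size=40)
se_b = np.full(40, 0.04)

for name, (s, e) in {"A (few noisy studies)": (smd_a, se_a),
                     "B (many precise studies)": (smd_b, se_b)}.items():
    est, lo, hi = pooled_ci(s, e)
    print(f"{name}: pooled = {est:+.3f}, 95% CI = [{lo:+.3f}, {hi:+.3f}]")
# Scenario A's interval still spans effects large enough to matter;
# Scenario B's interval rules them out.
```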
As for ‘hedge this’, I was referring to the paper, not to the response, but I can check this again.
For what it’s worth, I read that abstract as saying something like, “within the class of interventions studied so far, the literature has yet to settle onto any intervention that can reliably reduce animal product consumption by a meaningful amount, where meaningful amount might be a 1% reduction at Costco scale or long-term 10% reduction at a single cafeteria. The class of interventions being studied tends to be informational and nudge-style interventions like advertising, menu design, and media pamphlets. When effect sizes differ for a given type of intervention, the literature has not offered a convincing reason why a menu-design choice works in one setting versus another.”
Okay, now that I’ve typed that up, I can see why “unsolved problem” is unclear.
And I’m probably taking a lot of leaps of faith in interpretation here
It’s an interesting question. From the POV of our core contention—that we don’t currently have a validated, reliable intervention to deploy at scale—whether this is because of absence of evidence (AoE) or evidence of absence (EoA) is hard to say. I don’t have an overall answer, and ultimately both roads lead to “unsolved problem.”
We can cite good arguments for EoA (these studies are stronger than the norm in the field but show weaker effects, and that relationship should be troubling for advocates) or AoE (we’re not talking about very many studies at all), and ultimately I think the line between the two is in the eye of the beholder.
Going approach by approach, my personal answers are:
Choice architecture is probably AoE: it might work better than expected, but we just don’t learn very much from 2 studies (I am working on something about this separately).
The animal welfare appeals are more EoA, especially those from animal advocacy orgs.
Social psych approaches I’m skeptical of, but there weren’t a lot of high-quality papers, so I’m not so sure (see here for a subsequent meta-analysis of dynamic norms approaches).
I would recommend health appeals for older folks and environmental appeals for Gen Z. So there I’d say we have evidence of efficacy, but I’d expect effects to be on the order of a few percentage points.
Were I discussing this specifically with a funder, I would say, if you’re going to do one of the meta-analyzed approaches—psych, nudge, environment, health, or animal welfare, or some hybrid thereof—you should expect small effect sizes unless you have some strong reason to believe that your intervention is meaningfully better than the category average. For instance, animal welfare appeals might not work in general, but maybe watching Dominion is unusually effective. However, as we say in our paper, there are a lot of cool ideas that haven’t been tested rigorously yet, and from the point of view of knowledge, I’d like to see those get funded first.