Chiming in here with my outsider impressions on how fair the process seems
@david_reinstein If I were to rank the evaluator reports, the evaluation summary, and the EA Forum post by which seemed the most fair, I would rank the Forum post last. It wasn’t until I clicked through to the evaluation reports that I felt the process wasn’t so cutting.
Let me focus on one very specific framing in the Forum post, since it feels representative. One heading includes the phrase “this meta-analysis is not rigorous enough”. This has a few connotations that you probably didn’t mean. One, this meta-analysis is much worse than others. Two, the claims are questionable. Three, there’s a universally correct level of quality that meta-analyses should reach and anything that falls short of that is inadmissible as evidence.
In reality, it seems this meta-analysis is par for the course in terms of quality. And it was probably more difficult to carry out, given the heterogeneity in the literature. And the central claim of the meta-analysis doesn’t seem like something either evaluator disputed (though one evaluator was hesitant).
Again, I know that’s not what you meant and there are many caveats throughout the post. But it’s one of a few editorial choices that make the Forum post seem much more critical than the evaluation reports, which is a bit unusual since the Evaluators are the ones who are actually critiquing the paper.
Finally, one piece of context that felt odd not to mention was the fundamental difficulty of finding an expert in both food consumption and meta-analysis. That limits the ability of any reviewer to make a fair evaluation. This is acknowledged at the bottom of the Evaluation Summary. Elsewhere, I’m not sure where it’s said. Without that mentioned, I think it’s easy for a casual reader to leave thinking the two Evaluators are the “most correct”.
Thanks for the detailed feedback, this seems mostly reasonable. I’ll take a look again at some of the framings, and try to adjust. (Below and hopefully later in more detail).
the phrase “this meta-analysis is not rigorous enough”.
it seems this meta-analysis is par for the course in terms of quality.
This was my take on how to succinctly depict the evaluators’ reports (not my own take), in a way the casual reader would be able to digest. Maybe this was rounding down too much, but not by a lot, I think. Some quotes from Jané’s evaluation that I think are representative:
Overall, aside from its commendable transparency, the meta-analysis is not of particularly high quality
Overall, the transparency is strong, but the underlying analytic quality is limited.
This doesn’t seem to reflect ‘par for the course’ to me, but it depends on what the course is, i.e., what the comparison group is. My own sense/guess is that this is more rigorous and careful than most work in this area of meat consumption interventions (and adjacent), but less rigorous than the meta-analyses the evaluators are used to seeing in their academic contexts and the practices they espouse. But academic meta-analysts will tend to focus on areas where they can find a proliferation of high-quality, more homogeneous research, not necessarily the highest-impact areas.
Note that the evaluators rated this 40th and 25th percentile for methods and 75th and 39th percentile overall.
And the central claim of the meta-analysis doesn’t seem like something either evaluator disputed (though one evaluator was hesitant).
To be honest I’m having trouble pinning down what the central claim of the meta-analysis is. Is it a claim that “the main approaches being used to motivate reduced meat consumption don’t seem to work”, i.e., that we can bound the effects as very small, at best? That’s how I’d interpret the reporting of the pooled effect’s 95% CI as a standardized mean effect between 0.02 and 0.12. I would say that both evaluators are sort of disputing that claim.
However the authors hedge this in places and sometimes it sounds more like they’re saying that ~“even the best meta-analysis possible leaves a lot of uncertainty” … an absence of evidence more than evidence of absence, and this is something the evaluators seem to agree with.
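In case it helps other readers follow that: the pooled CI comes from a random-effects meta-analysis that combines the per-study standardized effects into one estimate and interval. Here is a minimal sketch of that kind of pooling (DerSimonian-Laird), using made-up study effects rather than the actual data from this meta-analysis:

```python
import numpy as np

# Hypothetical per-study standardized mean differences and standard errors
# (illustrative only; not the actual studies in the meta-analysis).
d = np.array([0.05, 0.10, 0.02, 0.15, 0.08])
se = np.array([0.04, 0.06, 0.05, 0.07, 0.05])

# DerSimonian-Laird random-effects pooling.
w_fe = 1.0 / se**2                          # inverse-variance (fixed-effect) weights
d_fe = np.sum(w_fe * d) / np.sum(w_fe)
Q = np.sum(w_fe * (d - d_fe) ** 2)          # heterogeneity statistic
k = len(d)
c = np.sum(w_fe) - np.sum(w_fe**2) / np.sum(w_fe)
tau2 = max(0.0, (Q - (k - 1)) / c)          # estimated between-study variance

w_re = 1.0 / (se**2 + tau2)                 # random-effects weights
d_pooled = np.sum(w_re * d) / np.sum(w_re)
se_pooled = np.sqrt(1.0 / np.sum(w_re))
lo, hi = d_pooled - 1.96 * se_pooled, d_pooled + 1.96 * se_pooled

print(f"pooled SMD = {d_pooled:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

The pooled interval is what does the work in the “we can bound the effects as very small” reading; the hedged reading instead emphasizes how much uncertainty survives the pooling.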
Finally, one piece of context that felt odd not to mention was the fundamental difficulty of finding an expert in both food consumption and meta-analysis.
That is/was indeed challenging. Let me try to adjust this post to note that.
a few editorial choices … make the Forum post seem much more critical than the evaluation reports, which is a bit unusual since the Evaluators are the ones who are actually critiquing the paper.
My goal for this post was to fairly represent the evaluators’ take, to provide insights to people who might want to use this for decision-making and future research, and to raise the question of standards for meta-analysis in EA-related areas. I will keep thinking about whether I missed the mark here. One possible clarification though: we don’t frame the evaluators’ role as (only) looking to criticize or find errors in the paper. We ask them to give a fair assessment of it, evaluating its strengths, weaknesses, credibility, and usefulness. These evaluations can also be useful if they give people more confidence in the paper and its conclusions, and thus reason to update more on this for their own decision-making.
To be honest I’m having trouble pinning down what the central claim of the meta-analysis is.
To paraphrase Diddy’s character in Get Him to the Greek, “What are you talking about, the name of the [paper] is called “[Meaningfully reducing consumption of meat and animal products is an unsolved problem]!” (😃) That is our central claim. We’re not saying nothing works; we’re saying that meaningful reductions either have not been discovered yet or do not have substantial evidence in support.
However the authors hedge this in places
That’s author, singular. I said at the top of my initial response that I speak only for myself.
I think “an unsolved problem” could indicate several things. It could be:
1. We have evidence that all of the commonly tried approaches are ineffective, i.e., we have measured all of their effects and they are tightly bounded as being very small.
2. We have a lack of evidence, thus very wide credible intervals over the impact of each of the common approaches.
To me, the distinction is important. Do you agree?
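As a toy numerical illustration of that distinction (invented numbers, nothing to do with the actual paper), the same point estimate can sit inside a tight interval or a wide one, and only the first supports the “we measured it and it is small” reading:

```python
# Two hypothetical pooled estimates with the same point estimate but very
# different uncertainty (invented numbers, not from the paper).
cases = {
    "tight CI (closer to evidence of absence)": (0.07, 0.03),  # (pooled SMD, SE)
    "wide CI (closer to absence of evidence)":  (0.07, 0.25),
}

for label, (d, se) in cases.items():
    lo, hi = d - 1.96 * se, d + 1.96 * se
    print(f"{label}: SMD = {d:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")

# The first interval rules out anything but small effects; the second is
# compatible with effects ranging from mildly negative to fairly large.
```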
You say above
meaningful reductions either have not been discovered yet or do not have substantial evidence in support
But even “do not have substantial evidence in support” could mean either of the above … a lack of evidence, or strong evidence that the effects are close to zero. At least to my ears.
As for ‘hedge this’, I was referring to the paper, not to the response, but I can check this again.
For what it’s worth, I read that abstract as saying something like, “within the class of interventions studied so far, the literature has yet to settle on any intervention that can reliably reduce animal product consumption by a meaningful amount, where a meaningful amount might be a 1% reduction at Costco scale or a long-term 10% reduction at a single cafeteria. The class of interventions being studied tends to be informational and nudge-style interventions like advertising, menu design, and media pamphlets. When effect sizes differ for a given type of intervention, the literature has not offered a convincing reason why a menu-design choice works in one setting versus another.”
Okay, now that I’ve typed that up, I can see why “unsolved problem” is unclear.
And I’m probably taking a lot of leaps of faith in interpretation here
From the POV of our core contention (that we don’t currently have a validated, reliable intervention to deploy at scale), whether this is because of absence of evidence (AoE) or evidence of absence (EoA) is hard to say. I don’t have an overall answer, and ultimately both roads lead to “unsolved problem.”
We can cite good arguments for EoA (these studies are stronger than the norm in the field but show weaker effects, and that relationship should be troubling for advocates) or AoE (we’re not talking about very many studies at all), and ultimately I think the line between the two is in the eye of the beholder.
Going approach by approach, my personal answers are:
choice architecture is probably AoE; it might work better than expected, but we just don’t learn very much from 2 studies (I am working on something about this separately)
the animal welfare appeals are more EoA, esp. those from animal advocacy orgs
social psych approaches I’m skeptical of, but there weren’t a lot of high-quality papers, so I’m not so sure (see here for a subsequent meta-analysis of dynamic norms approaches).
I would recommend health appeals for older folks and environmental appeals for Gen Z. So there I’d say we have evidence of efficacy, but we should expect effects to be on the order of a few percentage points.
Were I discussing this specifically with a funder, I would say, if you’re going to do one of the meta-analyzed approaches—psych, nudge, environment, health, or animal welfare, or some hybrid thereof—you should expect small effect sizes unless you have some strong reason to believe that your intervention is meaningfully better than the category average. For instance, animal welfare appeals might not work in general, but maybe watching Dominion is unusually effective. However, as we say in our paper, there are a lot of cool ideas that haven’t been tested rigorously yet, and from the point of view of knowledge, I’d like to see those get funded first.
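To give a rough sense of scale for “small effect sizes” (using entirely hypothetical baseline figures, not anything estimated in the paper), here is a back-of-the-envelope conversion from a standardized mean difference to raw consumption:

```python
# Back-of-the-envelope: what a standardized mean difference (SMD) could mean
# in raw terms, under assumed (hypothetical) baseline consumption figures.
baseline_servings_per_week = 10.0   # assumption for illustration, not from the paper
sd_servings_per_week = 5.0          # assumption for illustration, not from the paper

for smd in (0.02, 0.12):            # endpoints of the pooled 95% CI discussed above
    reduction = smd * sd_servings_per_week
    pct = 100.0 * reduction / baseline_servings_per_week
    print(f"SMD {smd:.2f} -> about {reduction:.1f} fewer servings/week (~{pct:.0f}%)")
```

Under those assumed numbers, the ends of the pooled CI correspond to very roughly a 1% to 6% reduction, which is the same ballpark as the “few percentage points” mentioned above.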