I meant “constructive and actionable” in that he explained why the practices used in the paper had potentially important limitations (see here on “assigning an effect size of .01 for n.s. results where effects are incalculable”)...
And he suggested a practical response, including a specific statistical package that could be applied to the existing data:
“An option to mitigate this is through multiple imputation, which can be done through the metansue (i.e., meta-analysis of non-significant and unreported effects) package”

In terms of the cost-benefit test, it depends on which benefit we are considering here. Addressing these concerns might indeed take months and cost hundreds of hours. It’s hard to justify this in terms of current academic/career incentives alone, as the paper had already been accepted for publication. If this were directly tied to grants there might be a case, but as it stands I understand that it could be very difficult for you to take this further.
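To make the imputation idea concrete, here is a minimal sketch in Python with made-up numbers. The actual metansue R package uses a more sophisticated maximum-likelihood procedure; the uniform draw below is purely an illustrative assumption. The point is the mechanism: instead of pinning every non-significant effect at .01, each unknown effect is drawn repeatedly from the range of values consistent with non-significance, the meta-analysis is run on each completed dataset, and the estimates are averaged across imputations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: effect sizes (Cohen's d) and standard errors.
# `None` marks studies that reported only "not significant" (n.s.).
effects = [0.35, 0.12, None, 0.28, None]
ses     = [0.10, 0.08, 0.15, 0.11, 0.20]

def pooled(d, se):
    """Fixed-effect inverse-variance pooled estimate."""
    w = 1.0 / np.asarray(se) ** 2
    return float(np.sum(w * np.asarray(d)) / np.sum(w))

def impute_ns(se, rng):
    """Draw an effect consistent with non-significance: |d| < 1.96 * se.
    (Simplifying assumption; metansue models this more carefully.)"""
    return rng.uniform(-1.96 * se, 1.96 * se)

# Multiple imputation: complete the dataset many times, pool each time,
# then average the pooled point estimates across imputations.
n_imputations = 1000
estimates = []
for _ in range(n_imputations):
    d = [impute_ns(se, rng) if eff is None else eff
         for eff, se in zip(effects, ses)]
    estimates.append(pooled(d, ses))

print(round(float(np.mean(estimates)), 3))
```

Compared with hard-coding .01, the imputed studies pull the pooled estimate toward zero only as much as their standard errors warrant, and the spread across imputations gives a handle on the extra uncertainty they contribute.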
But I wouldn’t characterize doing this as simply “satisfying two critics”. The critiques themselves might be sound and relevant, and could affect the conclusion (at least in differentiating between “we have evidence, the effects are small” and “the evidence is indeterminate”, which I think is an important difference). And the value of the underlying policy question (~‘Should animal welfare advocates be funding existing approaches to reducing meat consumption?’) seems high to me. So I would suggest that the benefit exceeds the cost on net, even if we might not have a formula for making it worth your while to make these adjustments right now.
I also think there might be value in setting an example standard that, particularly for high-value questions like this, we strive for a high level of robustness, following up on a range of potential concerns and critiques, etc. I’d like to see these things as long-run living projects that can be continuously improved, updated, and re-evaluated. The current research reward system doesn’t encourage this, which is a gap we are trying to help fill.
David, there are two separate questions here: whether these analyses should be done, and whether I should have done them in response to the evaluations. If you think these analyses are worth doing, by all means, go ahead!
Seth, for what it’s worth, I found your hourly estimates (provided in these forum comments, but not something I saw in the evaluator response) of how long the extensions would take to be illuminating. Very rough numbers, like this meta-analysis taking 1,000 hours for you or a robustness check taking dozens or hundreds of hours more to do properly, help contextualize how reasonable the critiques are.
It’s easy for me (even now while pursuing research, but especially before when I was merely consuming it) to think these changes would take a few days.
It also gives me insight into the research production process. How long does it take to do a meta-analysis? How much does rigor cost? How much insight does rigor buy? What insight is possible given current studies? Questions like that help me figure out whether a project is worth pursuing and whether it’s compatible with career incentives or is more of a non-promotable task.
Love talking nitty gritty of meta-analysis 😃
IMHO, the “math hard” parts of meta-analysis are figuring out what questions you want to ask, what are sensible inclusion criteria, and what statistical models are appropriate. Asking how much time this takes is the same as asking, where do ideas come from?
The “bodybuilding hard” part of meta-analysis is finding literature. The evaluators didn’t care for our search strategy, which you could charitably call “bespoke” and uncharitably call “ad hoc and fundamentally unreplicable.” But either way, I read about 1000 papers closely enough to see if they qualified for inclusion, and then, partly to make sure I didn’t duplicate my own efforts, I recorded notes on every study that looked appropriate but wasn’t. I also read, or at least read the bibliographies of, about 160 previous reviews. Maybe you’re a faster reader than I am, but ballpark, this was 500+ hours of work.
Regarding the computational aspects, the git history tells the story, but specifically making everything computationally reproducible, e.g. writing the functions, checking my own work, setting things up to be generalizable—a week of work in total? I’m not sure.
The paper went through many internal revisions and changed shape a lot from its initial draft, when we pivoted in how we treated red and processed meat. That’s hundreds of hours. Peer review was probably another 40-hour workweek.
As I reread reviewer 2’s comments today, it occurred to me that some of their ideas might be interesting test cases for what Claude Code is and is not capable of doing. I’m thinking particularly of trying to formally incorporate my subjective notes about uncertainty (e.g. the many places where I admit that the effect size estimates involved a lot of guesswork) into some kind of... supplementary regression term for how much weight an estimate should get in meta-analysis? Like maybe I’d use Wasserstein-2 distance, as my advisor Don recently proposed? Or Bayesian meta-analysis? This is an important problem, and I don’t consider it solved by RoB2 or whatever, which means that fixing it might be, IDK, a whole new paper which takes however long that does? As my co-authors Don and Betsy & co. comment in a separate paper on which I was an RA:
> Too often, research syntheses focus solely on estimating effect sizes, regardless of whether the treatments are realistic, the outcomes are assessed unobtrusively, and the key features of the experiment are presented in a transparent manner. Here we focus on what we term landmark studies, which are studies that are exceptionally well-designed and executed (regardless of what they discover). These studies provide a glimpse of what a meta-analysis would reveal if we could weight studies by quality as well as quantity. [the point being, meta-analysis is not well-suited for weighing by quality.]
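For what it’s worth, one crude way that down-weighting idea could be operationalized is sketched below (a hypothetical Python illustration with made-up numbers, not anything from our paper, RoB2, or Don’s Wasserstein-2 proposal): inflate each study’s variance by a subjective-uncertainty multiplier before the usual inverse-variance pooling, so guessed-at effect sizes simply count for less.

```python
import numpy as np

# Hypothetical studies: effect size, standard error, and a subjective
# uncertainty score u in [0, 1] (0 = effect size taken directly from the
# paper, 1 = heavy guesswork in reconstructing it).
d  = np.array([0.40, 0.15, 0.25])
se = np.array([0.10, 0.12, 0.09])
u  = np.array([0.0, 0.8, 0.3])

def pooled(d, se, u=None, penalty=2.0):
    """Fixed-effect inverse-variance pooled estimate. If uncertainty
    scores are given, inflate each study's variance by (1 + penalty * u)
    so guesswork-heavy estimates receive less weight."""
    var = se ** 2
    if u is not None:
        var = var * (1.0 + penalty * u)
    w = 1.0 / var
    return float(np.sum(w * d) / np.sum(w))

print(round(pooled(d, se), 3))     # pooled estimate, quality ignored
print(round(pooled(d, se, u), 3))  # guesswork-heavy studies down-weighted
```

The `penalty` knob is entirely arbitrary here, which is exactly the problem: turning subjective notes into defensible weights (rather than an ad hoc multiplier) is the part that might be a whole new paper.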
It’s possible that some of the proposed changes would take less time than that. Maybe risk of bias assessment could be knocked out in a week? But it’s been about a year since the relevant studies were in my working memory, which means I’d probably have to re-read them all, and across our main and supplementary datasets, that’s dozens of papers. How long does it take you to read dozens of papers? I’d say I can read about 3-4 papers a day closely if I’m really, really cranking. So in all likelihood, yes, weeks of work, and that’s weeks where I wouldn’t be working on a project about building empathy for chickens. Which admittedly I’m procrastinating on by writing this 500+ word comment 😃