For what it's worth, I thought David's characterization of the evaluations was totally fair, even a bit toned down. E.g., this is the headline finding of one of them:

> major methodological issues undermine the study's validity. These include improper missing data handling, unnecessary exclusion of small studies, extensive guessing in effect size coding, lacking a serious risk-of-bias assessment, and excluding all-but-one outcome per study.
David characterizes these as "constructive and actionable insights and suggestions". I would say they are tantamount to asking for a new paper, especially the exclusion of small studies, which was core to our design and would require a whole new search, which would take months. To me, it was obvious that I was not going to do that (the paper had already been accepted for publication at that point). The remaining suggestions also implied dozens (hundreds?) of hours of work. Spending weeks satisfying two critics didn't pass a cost-benefit test.[1] It wasn't a close call.

Really need to follow my own advice now and go actually do other projects 😄
I meant "constructive and actionable" in that he explained why the practices used in the paper had potentially important limitations (see here on "assigning an effect size of .01 for n.s. results where effects are incalculable")...
And suggested a practical response, including a specific statistical package that could be applied to the existing data:

> "An option to mitigate this is through multiple imputation, which can be done through the metansue (i.e., meta-analysis of non-significant and unreported effects) package"
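To make the suggestion concrete: metansue is an R package, but the underlying multiple-imputation idea can be sketched in a few lines of Python. This is a toy illustration with invented numbers, not the package's actual algorithm: for each study reported only as "not significant," draw plausible effect sizes consistent with non-significance, pool each completed dataset, and average across imputations.

```python
import random
import statistics

def pooled_estimate(effects, variances):
    # Fixed-effect, inverse-variance-weighted pooled estimate.
    weights = [1.0 / v for v in variances]
    return sum(w * d for w, d in zip(weights, effects)) / sum(weights)

def impute_ns_effect(se, rng, z_crit=1.96):
    # Draw a plausible effect for a study reported only as "n.s.":
    # sample from N(0, se) and keep draws inside the non-significant
    # region |d| < z_crit * se (simple rejection sampling).
    while True:
        d = rng.gauss(0.0, se)
        if abs(d) < z_crit * se:
            return d

# (effect, variance); None marks "n.s., effect size unreported".
studies = [(0.30, 0.04), (0.10, 0.02), (None, 0.05), (None, 0.09)]

rng = random.Random(42)
pooled_draws = []
for _ in range(200):  # 200 imputed datasets
    effects = [impute_ns_effect(v ** 0.5, rng) if d is None else d
               for d, v in studies]
    pooled_draws.append(pooled_estimate(effects, [v for _, v in studies]))

# Average pooled estimate across imputations (Rubin-style point estimate),
# instead of coding every n.s. result as a fixed d = .01.
print(round(statistics.mean(pooled_draws), 3))
```

The imputed values center on zero rather than a fixed .01, and their spread across imputations propagates the uncertainty the evaluator was worried about.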
In terms of the cost-benefit test, it depends on which benefit we are considering here. Addressing these concerns might indeed take months and cost hundreds of hours, and it's hard to justify that in terms of current academic/career incentives alone, as the paper had already been accepted for publication. If this were directly tied to grants there might be a case, but as it stands I understand that it could be very difficult for you to take this further.
But I wouldn't characterize doing this as simply "satisfying two critics". The critiques themselves might be sound and relevant, and could affect the conclusion (at least in differentiating between "we have evidence the effects are small" and "the evidence is indeterminate", which I think is an important difference). And the value of the underlying policy question (~"Should animal welfare advocates be funding existing approaches to reducing meat consumption?") seems high to me. So I would suggest that the benefit exceeds the cost on net, even if we might not have a formula for making it worth your while to make these adjustments right now.
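That evidential distinction can be made concrete with confidence intervals. A toy sketch, with illustrative numbers only (the 0.2 cutoff for a "small" effect is my assumption, not anything from the paper):

```python
def interpret(est, se, z=1.96, small=0.2):
    # Classify a pooled estimate by where its 95% CI falls.
    lo, hi = est - z * se, est + z * se
    if lo > 0 and hi < small:
        return "evidence the effect is small (but real)"
    if lo <= 0 <= hi:
        return "indeterminate: consistent with zero and with small effects"
    return "evidence of a larger effect"

print(interpret(0.05, 0.02))  # tight CI, excludes zero
print(interpret(0.05, 0.08))  # wide CI, straddles zero
```

The same point estimate can support either conclusion; the methodological choices the evaluators flagged matter because they change the width and location of that interval.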
I also think there might be value in setting an example: a standard that, particularly for high-value questions like this, we strive for a high level of robustness, following up on a range of potential concerns and critiques, etc. I'd like to see these things as long-run living projects that can be continuously improved, updated, and re-evaluated. The current research reward system doesn't encourage this, which is a gap we are trying to help fill.
David, there are two separate questions here: whether these analyses should be done, and whether I should have done them in response to the evaluations. If you think these analyses are worth doing, by all means, go ahead!
Seth, for what it's worth, I found your hourly estimates (provided in these forum comments, not something I saw in the evaluator response) of how long the extensions would take to be illuminating. Very rough numbers, like this meta-analysis taking 1,000 hours of your time or a robustness check taking dozens/hundreds of hours more to do properly, help contextualize how reasonable the critiques are.

It's easy for me (even now, while pursuing research, but especially before, when I was merely consuming it) to think these changes would take a few days.
It also gives me insight into the research production process. How long does it take to do a meta-analysis? How much does rigor cost? How much insight does rigor buy? What insight is possible given current studies? Questions like that help me figure out whether a project is worth pursuing and whether it's compatible with career incentives or is more of a non-promotable task.

Love talking nitty gritty of meta-analysis 😃
IMHO, the "math hard" parts of meta-analysis are figuring out what questions you want to ask, what sensible inclusion criteria are, and what statistical models are appropriate. Asking how much time this takes is like asking where ideas come from.

The "bodybuilding hard" part of meta-analysis is finding literature. The evaluators didn't care for our search strategy, which you could charitably call "bespoke" and uncharitably call "ad hoc and fundamentally unreplicable." But either way, I read about 1,000 papers closely enough to see whether they qualified for inclusion, and then, partly to make sure I didn't duplicate my own efforts, I recorded notes on every study that looked appropriate but wasn't. I also read, or at least read the bibliographies of, about 160 previous reviews. Maybe you're a faster reader than I am, but ballpark, this was 500+ hours of work.
Regarding the computational aspects, the git history tells the story, but specifically making everything computationally reproducible (e.g., writing the functions, checking my own work, setting things up to be generalizable): a week of work in total? I'm not sure.

The paper went through many internal revisions and changed shape a lot from its initial draft when we pivoted in how we treated red and processed meat. That's hundreds of hours. Peer review was probably another 40-hour workweek.
As I reread Reviewer 2's comments today, it occurred to me that some of their ideas might be interesting test cases for what Claude Code is and is not capable of doing. I'm thinking particularly of trying to formally incorporate my subjective notes about uncertainty (e.g., the many places where I admit that the effect size estimates involved a lot of guesswork) into some kind of...supplementary regression term about how much weight an estimate should get in meta-analysis? Like maybe I'd use Wasserstein-2 distance, as my advisor Don recently proposed? Or Bayesian meta-analysis? This is an important problem, and I don't consider it solved by RoB2 or whatever, which means that fixing it might be, IDK, a whole new paper which takes however long that does? As my co-authors Don and Betsy & co. comment in a separate paper on which I was an RA:

> Too often, research syntheses focus solely on estimating effect sizes, regardless of whether the treatments are realistic, the outcomes are assessed unobtrusively, and the key features of the experiment are presented in a transparent manner. Here we focus on what we term landmark studies, which are studies that are exceptionally well-designed and executed (regardless of what they discover). These studies provide a glimpse of what a meta-analysis would reveal if we could weight studies by quality as well as quantity.

[The point being, meta-analysis is not well-suited to weighting by quality.]
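For what it's worth, the simplest version of "weight by quality as well as quantity" can be written down directly: multiply each study's inverse-variance weight by a subjective quality score in (0, 1]. This is a hypothetical sketch with invented numbers, not anything from the paper or from RoB2:

```python
def pooled(effects, variances, quality=None):
    # Inverse-variance pooling, optionally downweighted by a
    # subjective quality score q in (0, 1] per study.
    quality = quality or [1.0] * len(effects)
    weights = [q / v for q, v in zip(quality, variances)]
    return sum(w * d for w, d in zip(weights, effects)) / sum(weights)

effects = [0.40, 0.05, 0.10]    # invented effect sizes
variances = [0.09, 0.01, 0.02]  # invented sampling variances
quality = [0.3, 1.0, 0.9]       # guesswork-heavy first study downweighted

print(round(pooled(effects, variances), 3))           # ignores quality
print(round(pooled(effects, variances, quality), 3))  # uses quality scores
```

Here the quality weighting pulls the pooled estimate down, because the least-trusted study happened to have the largest effect. The hard part, of course, is defending the quality scores themselves, which is exactly the unsolved problem above.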
It's possible that some of the proposed changes would take less time than that. Maybe the risk-of-bias assessment could be knocked out in a week? But it's been about a year since the relevant studies were in my working memory, which means I'd probably have to re-read them all, and across our main and supplementary datasets, that's dozens of papers. How long does it take you to read dozens of papers? I'd say I can read about 3-4 papers a day closely if I'm really, really cranking. So in all likelihood, yes, weeks of work, and that's weeks where I wouldn't be working on a project about building empathy for chickens. Which admittedly I'm procrastinating on by writing this 500+ word comment 😃
A final reflective note: David, I want to encourage you to think about the optics/politics of this exchange from the point of view of prospective Unjournal participants/authors. There are no incentives to participate. I did it because I thought it would be fun and I was wondering if anyone would have ideas or extensions that improved the paper. Instead, I got some rather harsh criticisms implying we should have written a totally different paper. Then I got this essay, which was unexpected/unannounced and used, again, rather harsh language to which I objected. Do you think this exchange looks like an appealing experience to others? I'd say the answer is probably not.
A potential alternative: I took a grad school seminar where we replicated and extended other people's papers. Typically the assignment was to do the robustness checks in R or whatever, and then the author would come in and we'd discuss. It was a great setup. It worked because the grad students actually did the work, which provided an incentive for authors to participate. The co-teachers also pre-selected papers that they thought were reasonably high-quality, and I bet that if they got a student response like Matthew's, they would have counseled them to be much more conciliatory, to remember that participation is voluntary, to think through the risks of making enemies (as I counseled in my original response), etc. I wonder if something like that would work here too. Like, the expectation is that reviewers will computationally reproduce the paper, conduct extensions and robustness checks, ask questions if they have them, work collaboratively with authors, and then publish a review summarizing the exchange. That would be enticing! Instead what I got here was like a second set of peer reviewers, and unusually harsh ones at that, and nobody likes peer review.

It might be that meta-analyses aren't good candidates for this kind of work, because the extensions/robustness checks would probably also have taken Matthew and the other responder weeks, i.e., a fine end-of-semester project for class credit but not a very enticing hobby.

Just a thought.
> A final reflective note: David, I want to encourage you to think about the optics/politics of this exchange from the point of view of prospective Unjournal participants/authors.
I appreciate the feedback. I'm definitely aware that we want to make this attractive to authors and others, both to submit their work and to engage with our evaluations. Note that in addition to asking for author submissions, our team nominates and prioritizes high-profile and potentially high-impact work, and contacts authors to get their updates, suggestions, and (later) responses. (We generally only require author permission for these evaluations from early-career authors at a sensitive point in their career.) We are grateful to you for having responded to these evaluations.
> There are no incentives to participate.
I would disagree with this. We previously had author prizes (financial and reputational) focusing on authors who submitted work for our evaluation, although these prizes are not currently active. I'm keen to revive these prizes when the situation permits (funding and partners).
But there are a range of other incentives (not directly financial) for authors to submit their work, respond to evaluations, and engage in other ways. I provide a detailed author FAQ here. These include getting constructive feedback, signaling your confidence in your paper and openness to criticism, the potential for highly positive evaluations to boost your paper's reputation and visibility, unlocking impact and grants, and more. (Our goal is that these evaluations will ultimately become the object of value in and of themselves, replacing "publication in a journal" for research credibility and career rewards. But I admit that's a long path.)
> I did it because I thought it would be fun and I was wondering if anyone would have ideas or extensions that improved the paper. Instead, I got some rather harsh criticisms implying we should have written a totally different paper.

I would not characterize the evaluators' reports in this way. Yes, there was some negative-leaning language, which, as you know, we encourage the evaluators to tone down. But there were a range of suggestions (especially from Jané) which I see as constructive, detailed, and useful, both for this paper and for your future work. And I don't see this as them suggesting "a totally different paper." To a large extent they agreed with the importance of this project, with the data collected, and with many of your approaches. They praised your transparency. They suggested some different methods for transforming and analyzing the data and interpreting the results.

> Then I got this essay, which was unexpected/unannounced and used, again, rather harsh language to which I objected. Do you think this exchange looks like an appealing experience to others? I'd say the answer is probably not.
I think it's important to communicate the results of our evaluations to wider audiences, and not only on our own platform. As I mentioned, I tried to fairly characterize your paper, the nature of the evaluations, and your response. I've adjusted my post above in response to some of your points where there was a case to be made that I was using loaded language, etc.

Would you recommend that I share any such posts with both the authors and the evaluators before making them? It's a genuine question (to you and to anyone else reading these comments); I'm not sure of the correct answer.

As to your suggestion at the bottom, I will read and consider it more carefully; it sounds good.
Aside: I'm still concerned with the connotation that replication, extension, and robustness checking are something to be relegated to graduate students. This seems to diminish the value and prestige of work that I believe to be of the highest-order practical value for important decisions in the animal welfare space and beyond.
In the replication/robustness-checking domain, I think what i4replication.org is doing is excellent. They're working with everyone from graduate students to senior professors to do this work, and treating it as a high-value output meriting direct career rewards. I believe they encourage the replicators to be fair (neither excessively conciliatory nor harsh) and to focus on the methodology. We are in contact with i4replication.org and hoping to work with them more closely, with our evaluations and "evaluation games" offering grounded suggestions for robustness/replication checks.
> Would you recommend that I share any such posts with both the authors and the evaluators before making them?
Yes. But zooming back out, I don't know if these EA Forum posts are necessary.

A practice I saw at i4replication (or some other replication lab) is that the editors didn't provide any "value-added" commentary on any given paper. At least, I didn't see any in their tweets. They link to the evaluation reports plus a response from the author, and leave it at that.

Once in a while, there will be a retrospective on how the replications are going as a whole. But I think they refrain from commenting on any individual paper.

If I had to rationalize why they do that, my guess is that replications are already an opt-in thing with lots of downside. And psychologically, editor commentary has a lot more potential for unpleasantness. Peer review tends to be anonymous, so it doesn't feel as personal, because the critics are kept secret. But editor commentary isn't secret; it actually feels personal, and editors tend to have more clout.

Basically, I think the bar for an editor commentary post like this should be even higher than the usual process. And the usual evaluation process already allows for author review and response. So I think a "value-added" post like this should pass a higher bar of diplomacy and insight.
Thanks for the thoughts. Note that I'm trying to engage/report here because we're working hard to make our evaluations visible and impactful, and this forum seems like one of the most promising interested audiences. But I'm also eager to hear about other opportunities to promote and get engagement with this evaluation work, particularly in non-EA academic and policy circles.

I generally aimed to just summarize and synthesize what the evaluators had written and the authors' response, bringing in some specific relevant examples and using quotes or paraphrases where possible. I generally didn't present these as my opinions but rather as the author's and the evaluators', although I did specifically give "my take" in a few parts. If I recall, my motivation was to make this a little bit less dry, to get a bit more engagement within this forum. But maybe that was a mistake.

And to this I added an opportunity to discuss the potential value of doing and supporting rigorous, ambitious, and "living/updated" meta-analyses here and in EA-adjacent areas. I think your response was helpful there, as was the authors'. I'd like to see others' takes.
Some clarifications:
The i4replication group does put out replication papers/reports in each case and submits them to journals, and reports on the outcomes on social media. But IIRC they only "weigh in" centrally when they find a strong case suggesting systematic issues/retractions.

Note that their replications are not "opt-in": they aim to replicate every paper coming out in a set of "top journals". (And now they are moving toward research focusing on a set of global issues like deforestation, but still not opt-in.)

I'm not sure that what works for them would work for us, though. It's a different exercise. I don't see an easy route toward our evaluations getting attention through "submitting them to journals" (which, naturally, would also be a bit counter to our core mission of moving research output and rewards away from "journal publication as a static output").

Also: I wouldn't characterize this post as "editor commentary", and I don't think I have a lot of clout here. Also note that typical peer review is both anonymous and never made public. We're making all our evaluations public, but the evaluators have the option to remain anonymous.

But your point about a higher bar is well taken. I'll keep it under consideration.