I was quite surprised (and excited) to see this on the forum, as I had literally been thinking about the same thing five minutes before opening it!
From what I’ve skimmed so far, I think it makes some good points and is a good start (although I’m not familiar enough with the area to judge); I definitely look forward to seeing more work on this.
However, I was hoping to see more discussion of an idea/problem that has been bugging me for a while, which builds on your second point: what happens when later studies criticize or fail to replicate earlier studies’ findings? Is there any kind of repository or system where people can check whether a given experiment or finding has received some form of published criticism? (i.e., something more efficient than scanning every article that cites the older study in the hope of finding one that critically analyzes it).
I have been searching for such a system (in the social sciences, not medicine/STEM) but have thus far been unsuccessful—although I recognize that I may simply not know what to look for or may have overlooked it.
However, especially if such a system does not exist or has not been tried before, I would be really interested to get people’s feedback on the idea. I was particularly motivated to look into this because, in the field I’m currently researching, I came across one experiment/study with very strange results—and when I looked a bit deeper, it appeared that either the experiment was just very poorly set up (i.e., it had loopholes for gaming the system) or the researcher accidentally switched the treatment-group labels (based on internal labeling inconsistencies in an early version of the paper). As I came to see how the results could have been produced simply by undergrad participants gaming the real-money incentive system, I have put less weight on the label-switching explanation, but if that explanation were true it would be shocking… and perhaps has gone unnoticed.[1] Regardless of the actual cause, this paper has been cited by over 70 articles; a handful explicitly say that its findings led them to use one experimental method instead of another.
In light of this example (and others I’ve encountered), I’ve thought it would be very beneficial to have some kind of system that tracks criticisms as well as dependency on/usage of earlier findings—going beyond “paper X is cited by paper Y,” which says nothing about whether paper Y cited paper X positively or negatively. With such a system: 1) there could be an academic standard that papers which emphasize or rely on (not just reference) earlier studies at least report whether those studies have received any criticism;[2] 2) if some study X reports that it relies on study/experiment Y, and Y is later found to be flawed (e.g., methodological errors, a flawed dataset, failure to reproduce), the repository could automatically flag study X’s findings as something appropriate like “needs review.”
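To make point 2 a bit more concrete, here is a minimal sketch of the flagging logic I have in mind (in Python, with a toy data model and entirely made-up study IDs; it is meant to illustrate the propagation idea, not to be an actual implementation):

```python
# Rough sketch of the "needs review" propagation idea; all names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Study:
    study_id: str
    relies_on: set = field(default_factory=set)      # IDs of studies/datasets this study declares it depends on
    criticisms: list = field(default_factory=list)   # registered published criticisms
    status: str = "ok"                               # "ok", "flawed", or "needs review"


class Repository:
    def __init__(self):
        self.studies = {}

    def register(self, study):
        self.studies[study.study_id] = study

    def add_criticism(self, study_id, note, mark_flawed=False):
        """Attach a published criticism; optionally mark the study itself as flawed."""
        study = self.studies[study_id]
        study.criticisms.append(note)
        if mark_flawed:
            study.status = "flawed"
            self._flag_dependents(study_id)

    def _flag_dependents(self, flawed_id):
        """Flag every study that declared reliance on the flawed one, and their dependents in turn."""
        for study in self.studies.values():
            if flawed_id in study.relies_on and study.status == "ok":
                study.status = "needs review"
                self._flag_dependents(study.study_id)


# Example: Y is later found to be flawed, so X (which declared reliance on Y) gets flagged.
repo = Repository()
repo.register(Study("Y"))
repo.register(Study("X", relies_on={"Y"}))
repo.add_criticism("Y", "Hypothetical later study: failed to replicate.", mark_flawed=True)
print(repo.studies["X"].status)  # -> needs review
```

In a real system, the “relies on” declarations would presumably come from the authors themselves (per point 1), and a “needs review” flag would just be a prompt for human re-assessment rather than an automatic verdict on study X.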
But I’m curious what other people think! (In particular: are there already alternatives that try to deal with this problem? Is the underlying problem actually that significant/widespread? Is such a system feasible? Would it have enough buy-in, and would it help that much even if it did?)
(More broadly, I’ve long had a pipe dream of some kind of system that allows detailed, collaborative literature mapping—an “Epistemap”, if you will—but I think the system I describe above would be much less ambitious/more practical).
[1] The full story is a moderately long digression and not really that important, but I can provide details if anyone is curious.
[2] Of course, especially in its early stages with limited adoption, such a system would be prone to false negatives (e.g., a criticism exists but is not listed in the repository), but aside from the “false sense of confidence” argument, I don’t see how this could make things worse than the status quo.
A system somewhat similar to what you are talking about exists. PubPeer, for example, is a place where post-publication peer reviews of papers are posted publicly (https://pubpeer.com/static/about). I’m not sure how much it is used at this stage, but in principle it allows you to see criticism of any article.
Scite.ai is also relevant—it uses AI to try to determine whether citations of an article are positive or negative. I don’t know about its accuracy.
Neither of these addresses the problem of what happens when a study fails to replicate—often the original study continues to be cited more than the replication effort.
Thanks for sharing those sources! I think a system like PubPeer could partially address some of the issues/functions I mentioned, although it doesn’t go quite as far as I was hoping (in part because it doesn’t seem to have the “relies upon” aspect, but I also couldn’t find many criticisms/analyses in the fields I’m more familiar with, so it’s hard to tell what kinds of analysis take place there). The Scite.ai system seems more interesting—in part because I have specifically wondered whether machine learning could assist with this kind of semantically richer bibliometrics.
Also, I wouldn’t judge based solely on this, but the Nature article you linked has this quote regarding Scite’s accuracy: “According to Nicholson, eight out of every ten papers flagged by the tool as supporting or contradicting a study are correctly categorized.”
A couple of other new publication models that might be worth looking at are discussed here (Octopus and Hypergraph, both of which are modular). Also, this recent article about ‘publomics’ might have interesting ideas. Happy to talk about any of this if you are thinking about doing something in this space.
Those both seem interesting! I’ll definitely try to remember to reach out if I start doing more work in this field/on this project. Right now it’s just a couple of ideas that keep nagging at me; I’m not exactly sure what to do with them, and they aren’t currently the focus of my research. But if I could see options for progress (or even just some literature/discussion on the epistemap/repository concept, which I really haven’t found yet), I’d probably be interested.
I think it’s a really interesting, but also very difficult, idea. Perhaps one could identify a limited field of research where this would be especially valuable (or especially feasible, or ideally both), and try it out within that field as an experiment?
I would be very interested to know more if you have specific ideas of how to go about it.
Yeah, I have thought it would probably be good to find a field where the system would be especially valuable (roughly, how much the field is struggling with these issues multiplied by the importance of its research), but I’ve also wondered whether it might be best to first look for a field with a receptive ethos—i.e., a field where a lot of researchers are open to trying the idea. (Of course, that would raise questions about whether it could see similar buy-in in other fields, but the purpose of such an early test would be to answer “how useful is this when there is buy-in?”)
At the same time, I recognize that it would probably be difficult… although I do wonder just how difficult it would be—or at least, why exactly it would be difficult. Especially if the problem is mainly about buy-in, I think it would be helpful to look at similar movements, such as the shift towards peer review and the push for open data and data transparency: how did they convince journals and researchers to become more transparent and collaborative? If this system actually proved useful and feasible, I feel it would have a decent chance of eventually gaining traction (even if progress were slow).
The main concern I’ve had with the broader pipe dream I hinted at is “who does the mapping and manages the system?” Are the maps run by centralized authorities like journals or scientific associations (e.g., the APA), or is it mostly decentralized, in the sense that objects in the literature (individual studies, datasets, regressions, findings) have centrally-defined IDs but all of the connections (e.g., “finding X depends on dataset Y,” “finding X conflicts with finding Z”) are defined by packages/layers that researchers can contribute to and download from, like a library or buffet? (The latter option could still allow “curation” by journals, scientific associations, or anyone else; I sketch what I mean below.) However, I think the narrower system I initially described would not suffer from this problem to the same extent—at least, the problems would be no more significant than those incurred with peer review, since it mainly just asks: 1) does your research criticize another study? 2) what studies and datasets does your research rely on?
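For what it’s worth, here is roughly the data model I picture for the decentralized option (again just a rough sketch in Python, with invented IDs, relation names, and package names):

```python
# Sketch of the "centrally-defined IDs, decentralized connection layers" idea.
# Every identifier and package name below is invented for illustration.
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source_id: str   # a centrally-assigned object ID (e.g., a DOI)
    relation: str    # "depends_on", "conflicts_with", "criticizes", ...
    target_id: str


@dataclass
class LinkPackage:
    """A contributed 'layer' of connections that others can download and combine."""
    name: str
    maintainer: str
    links: tuple


def merge(packages):
    """Combine whichever curated layers a reader chooses to trust."""
    return {link for pkg in packages for link in pkg.links}


# Example: one layer curated by a journal, one by an individual researcher.
journal_layer = LinkPackage("journal-curated", "Journal of X", (
    Link("10.1234/finding-A", "depends_on", "10.5678/dataset-B"),
))
replication_layer = LinkPackage("replication-notes", "some-researcher", (
    Link("10.1234/finding-A", "conflicts_with", "10.9012/finding-C"),
))
combined = merge([journal_layer, replication_layer])
print(len(combined))  # -> 2
```

The point of the “package” layer is that no single authority has to own the map: a journal, a scientific association, or an individual researcher could each publish their own layer of links, and readers combine whichever layers they trust.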
But I would definitely be interested to hear you elaborate on potential problems you see with the system. I have been interested in a project of this sort for years: I even did a small project last year to try out the literature mapping, which had mixed-but-positive results (it seemed potentially feasible and useful, but I couldn’t find a good existing software platform that does both visually appealing mapping and rudimentary logic operations). I just can’t shake the desire to keep trying this and looking for research or commentary on the idea, but so far I really haven’t found all that much… which in some ways makes me more interested in pursuing it (since that could suggest it’s a neglected idea, although it could also suggest it has been deemed impractical).