Language models for detecting bad scholarship
Epistemic institutions
Anyone who has done desk research carefully knows that many citations don’t support the claim they’re cited for, usually in a subtle way but sometimes as a total non sequitur. Here’s a fun list of 13 failure modes we need to protect ourselves against.
This seems to be a side effect of academia scaling so much in recent decades—it’s not that scientists are more dishonest than other groups, it’s that they don’t have time to carefully read everything in their sub-sub-field (… while maintaining their current arms-race publication tempo).
Take some claim P which is non-obvious enough to warrant a citation.
It seems relatively easy, given current tech, to answer: (1) “Does the cited article say P?” This question is closely related to document summarisation—not a solved task, but the state of the art is workable. Having a reliable estimate of even this weak kind of citation quality would make reading research much easier—but under the above assumption of unread sources, it would also stop many bad citations from being written in the first place.
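As a rough illustration of how (1) might be attempted with current tech, here is a minimal sketch that casts the question as textual entailment, using an off-the-shelf natural-language-inference model from the Hugging Face `transformers` library. The model choice, the passage chunking, and the 0.9 threshold are illustrative assumptions, not a tested pipeline.

```python
# Minimal sketch of question (1): does any passage of the cited article
# entail claim P? Assumes the Hugging Face `transformers` library and an
# off-the-shelf MNLI model; model choice and threshold are illustrative.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

def article_says_p(passages: list[str], claim: str, threshold: float = 0.9) -> bool:
    """Return True if some passage of the cited article entails the claim."""
    for passage in passages:
        # Score the (premise, hypothesis) pair; top_k=None returns scores
        # for all three MNLI labels: ENTAILMENT / NEUTRAL / CONTRADICTION.
        scores = {
            r["label"]: r["score"]
            for r in nli({"text": passage, "text_pair": claim}, top_k=None)
        }
        if scores.get("ENTAILMENT", 0.0) >= threshold:
            return True
    return False

# Toy usage: one passage that supports the claim, one that does not.
passages = [
    "We observed no significant effect of the intervention on mortality.",
    "The intervention reduced all-cause mortality by 20% (p < 0.01).",
]
print(article_says_p(passages, "The intervention reduced mortality."))
```

Treating (1) as entailment rather than summarisation is itself a judgement call, and a real tool would also have to locate, parse, and chunk the full text of each cited article before any of this runs.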
It is very hard to answer (2) “Is the cited article strong evidence for P?”, mostly because of the lack of a ground-truth dataset.
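To make the missing ground truth concrete, a record in such a dataset would need something like the fields below, with the evidence-strength label supplied by expert annotators. Every field name here is invented for illustration; no such dataset exists.

```python
# Hypothetical schema for a ground-truth dataset for question (2).
# All names are illustrative assumptions, not an existing resource.
from dataclasses import dataclass

@dataclass
class CitationEvidenceRecord:
    claim: str              # the claim P as stated in the citing paper
    cited_article_id: str   # e.g. a DOI
    says_p: bool            # weak label for question (1): does the article state P?
    evidence_strength: int  # expert rating for question (2), e.g. 0 (none) to 4 (strong)
    rater_rationale: str    # free-text justification from the annotator
```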
We elaborate on this here.
(Thanks to Jungwon Byun and Gwern Branwen for comments.)