Credence Weighted Citation Metrics
Epistemic Institutions
Citation metrics (total citations, h-index, g-index, etc.) are intended to estimate a researcher’s contribution to a field. However, if false claims get cited more than true claims (Serra-Garcia and Gneezy 2021), these citation metrics are clearly not fit for purpose.
I suggest modifying these citation metrics by weighting each paper by the probability that it will replicate. If each paper i has cᵢ citations and probability pᵢ of replicating, we can modify each formula as follows: instead of measuring total citations TC = ∑ᵢ cᵢ, we consider credence-weighted total citations CWTC = ∑ᵢ pᵢcᵢ. Instead of using the h-index, where we pick ‘the largest number h such that h articles have cᵢ ≥ h’, we could use the credence-weighted h-index, where we pick the largest number h such that h articles have pᵢcᵢ ≥ h. We can use this idea to modify citation metrics that evaluate researchers (as above), journals (impact factor and CiteScore), and universities (rankings).
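To make the two definitions concrete, here is a minimal sketch of both metrics in Python. The function names and the example numbers are my own illustrative choices, not part of any existing bibliometric library:

```python
def cw_total_citations(citations, probs):
    """Credence-weighted total citations: CWTC = sum over papers of p_i * c_i."""
    return sum(p * c for c, p in zip(citations, probs))

def cw_h_index(citations, probs):
    """Credence-weighted h-index: the largest h such that at least
    h papers have weighted citation count p_i * c_i >= h."""
    # Sort weighted counts in descending order, then find the last rank
    # at which the weighted count still meets or exceeds the rank.
    weighted = sorted((p * c for c, p in zip(citations, probs)), reverse=True)
    h = 0
    for rank, w in enumerate(weighted, start=1):
        if w >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical researcher: four papers with citation counts and
# elicited replication probabilities.
citations = [100, 40, 10, 3]
probs = [0.9, 0.5, 0.2, 0.9]

print(cw_total_citations(citations, probs))  # 90 + 20 + 2 + 2.7 = 114.7
print(cw_h_index(citations, probs))          # weighted counts 90, 20, 2.7, 2 -> h = 2
```

Note that the unweighted h-index of this researcher would be 4 (all four papers have at least 3 citations), while weighting by replication probability drops it to 2: the flimsy papers stop counting.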
We can use prediction markets to elicit these probabilities, with questions resolved using a combination of large-scale replication studies and surrogate scoring. DARPA SCORE is a proof of concept that this can be done at scale.
Prioritising credence-weighted citation metrics over unweighted citation metrics would improve the incentives researchers face. No longer will they have to compete with people who write 70 flimsy papers a year that no one actually thinks will replicate; instead, researchers who are right will be rewarded.