Relatedly, you might be interested in these two footnotes discussing how impressive it is that Sinatra et al. (2016), the main paper we discuss in the doc, can predict the evolution of the Hirsch index (a citation measure) over a full career based on the Hirsch index after the first 20 or 50 papers:
Note that the evolution of the Hirsch index depends on two things: (i) citations to future papers and (ii) the evolution of citations to past papers. It seems easier to predict (ii) than (i), but we care more about (i). This raises the worry that predictions of the Hirsch index are a poor proxy for what we care about (predicting citations to future work), because successful predictions of the Hirsch index may work largely by predicting (ii) but not (i). This does make Sinatra and colleagues’ ability to predict the Hirsch index less impressive and useful, but the worry is attenuated by two observations: first, the internal validity of their model for predicting successful scientific careers is independently supported by its ability to predict Nobel prizes and other awards; second, they can predict the Hirsch index over a very long period, when it is increasingly dominated by future work rather than accumulating citations to past work.
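For concreteness, the Hirsch index is the largest h such that the author has at least h papers with at least h citations each. A quick sketch (with made-up citation counts) shows how the index can rise through point (ii) alone, i.e., purely through old papers accumulating citations, with no new work:

```python
def h_index(citations):
    """Return the Hirsch index: the largest h such that the author
    has at least h papers with at least h citations each."""
    h = 0
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

# A scientist with five papers (synthetic counts):
print(h_index([10, 8, 5, 4, 3]))  # 4

# Route (ii): the same five papers gain a few citations -> h rises to 5.
print(h_index([10, 8, 6, 5, 5]))  # 5

# Route (i): instead, one new paper with 5 citations -> h stays at 4.
print(h_index([10, 8, 5, 5, 4, 3]))  # 4
```

This is why a model could score well on h-index prediction while saying little about the quality of future work: in the short run, route (ii) does much of the lifting.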
Acuna, Allesina, & Kording (2012) had previously proposed a simple linear model for predicting scientists’ Hirsch index. However, the validity of their model for the purpose of predicting the quality of future work is undermined more strongly by the worry explained in the previous footnote; in addition, the reported validity of their model is inflated by their heterogeneous sample that, unlike the sample analyzed by Sinatra et al. (2016), contains both early- and late-career scientists. (Both points were observed by Penner et al. 2013.)
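A model of this shape can be sketched in a few lines. This is not Acuna et al.'s actual fit; the feature names merely echo the kind of predictors they used (current h-index, years active, paper count), and the data and coefficients below are entirely synthetic, chosen only to illustrate the method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic scientists (all quantities made up for illustration).
n_sci = 200
h_now = rng.integers(1, 15, n_sci).astype(float)   # current h-index
years = rng.integers(1, 10, n_sci).astype(float)   # years since first paper
papers = 2 * h_now + rng.integers(0, 20, n_sci)    # total paper count

# Hypothetical "true" future h-index: mostly driven by the current
# h-index plus noise -- an assumption, not an empirical finding.
h_future = 1.5 * h_now + 0.3 * years + rng.normal(0.0, 1.0, n_sci)

# Ordinary least squares via the normal equations (lstsq).
X = np.column_stack([np.ones(n_sci), h_now, years, papers])
coef, *_ = np.linalg.lstsq(X, h_future, rcond=None)
pred = X @ coef

r2 = 1 - np.sum((h_future - pred) ** 2) / np.sum((h_future - h_future.mean()) ** 2)
print(f"in-sample R^2: {r2:.2f}")
```

Note that the in-sample R² here is flattering for exactly the reason in the paragraph above: on a pooled sample where the current h-index varies widely (here, early- and late-career mixed together), predicting the future from the present is easy, and restricting to a single career stage or evaluating out of sample would make the fit look much worse.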
Neat. I’d be curious whether anyone has tried blinding the predictive algorithm to prestige, i.e., no past citation information or journal impact factors, instead strictly using paper content (sounds like a project for GPT-6).
It might also be interesting to think about how talent- vs. prestige-based models explain the cases of scientists whose work was groundbreaking but did not garner attention at the time. I’m thinking, e.g., of someone like Kjell Kleppe, who essentially described PCR, the foundational molecular-biology method, a decade early.
If you look at natural experiments in which two groups publish roughly the same thing, but only one makes the news, the fully talent-based model (I think?) predicts that there should not be a significant difference in citations and other markers of academic success (unless your model of talent includes something about marketing, which seems like a stretch to me).
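That prediction can be made concrete with a toy simulation. Everything here is an assumption for illustration: pairs of near-identical papers share a "quality" draw, media coverage multiplies citations by a boost factor, and a within-pair comparison asks how often the covered paper wins. A pure talent model corresponds to a boost of 1, i.e., the covered paper should win about half the time:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_citations(n_pairs, media_boost):
    """Hypothetical pairs of near-identical papers; one of each pair
    gets media coverage. Citation counts are lognormal around a
    shared per-pair quality level (all parameters made up)."""
    quality = rng.lognormal(mean=3.0, sigma=1.0, size=n_pairs)
    covered = quality * media_boost * rng.lognormal(0.0, 0.3, n_pairs)
    uncovered = quality * rng.lognormal(0.0, 0.3, n_pairs)
    return covered, uncovered

# Within-pair sign test: how often does the covered paper out-cite
# its twin? Talent-only (boost = 1) predicts roughly 50%.
cov0, unc0 = simulate_citations(500, media_boost=1.0)
win_null = float(np.mean(cov0 > unc0))

# A prestige effect (here, a hypothetical 2x boost) pushes it well above 50%.
cov1, unc1 = simulate_citations(500, media_boost=2.0)
win_boost = float(np.mean(cov1 > unc1))

print(f"covered paper wins: {win_null:.2f} (no boost) vs {win_boost:.2f} (2x boost)")
```

So a natural experiment of this kind discriminates between the two models simply by checking whether the within-pair win rate departs from 50%, without needing to measure "quality" directly.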