I think the case of citations / scientific success is a bit subtle:
My guess is that the preferential attachment story applies most straightforwardly at the level of papers rather than scientists. E.g. I would expect that scientists who want to cite something on topic X will cite the most-cited paper on X rather than first looking for papers on X and then looking up the total citations of their authors.
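To make the paper-level mechanism concrete, here is a toy simulation (my own sketch, not anything taken from Sinatra et al.). Papers arrive one at a time and each cites a few earlier papers, chosen either uniformly at random or with probability proportional to citations so far plus one. The function names, the "+1" baseline and all parameter values are assumptions for illustration only.

```python
import random


def simulate_citations(n_papers=2000, refs_per_paper=5, preferential=True, seed=0):
    """Return citation counts per paper under a toy growth model.

    Papers arrive one at a time; each new paper cites `refs_per_paper`
    earlier papers (repeats allowed, which is a simplification).
    """
    rng = random.Random(seed)
    counts = [0]  # paper 0 exists at the start with no citations
    urn = [0]     # each paper appears (citations + 1) times in the urn
    for new_id in range(1, n_papers):
        for _ in range(refs_per_paper):
            if preferential:
                # probability of being cited is proportional to citations + 1
                cited = rng.choice(urn)
            else:
                # every earlier paper is equally likely to be cited
                cited = rng.randrange(new_id)
            counts[cited] += 1
            urn.append(cited)
        counts.append(0)
        urn.append(new_id)
    return counts


def top_share(counts, fraction=0.1):
    """Share of all citations held by the top `fraction` of papers."""
    k = max(1, int(len(counts) * fraction))
    ranked = sorted(counts, reverse=True)
    return sum(ranked[:k]) / sum(ranked)


if __name__ == "__main__":
    for mode in (True, False):
        share = top_share(simulate_citations(preferential=mode))
        print(f"preferential={mode}: top 10% of papers hold {share:.0%} of citations")
```

Note that nothing in this sketch refers to authors at all: the concentration arises purely from papers attracting citations because they already have citations.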
I think the Sinatra et al. (2016) findings, which we discuss in our relevant section, push at least slightly against a story that says it’s all just about “who was first in some niche”. If preferential attachment at the level of scientists were a key driver, then I would expect authors who get lucky early in their career (i.e. publish a much-cited paper early) to get more total citations. More specifically, citations to future papers by a fixed scientist should depend on citations to past papers by the same scientist. But that is not what Sinatra et al. find: they instead find that within the career of a fixed scientist the per-paper citations seem entirely random.
Instead their model uses citations to estimate an ‘intrinsic characteristic’ that differs between scientists—what they call Q.
(I don’t think this is very strong evidence that such an intrinsic quality ‘exists’ because this is just how they choose the class of models they fit. Their model fits the data reasonably well, but we don’t know if a different model with different bells and whistles wouldn’t fit the data just as well or better. But note that, at least in my view, the idea that there are ability differences between scientists that correlate with citations looks likely on priors anyway, e.g. because of what we know about GMA/the ‘positive manifold’ of cognitive tasks or garden-variety impressions that some scientists just seem smarter than others.)
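For concreteness, here is a minimal sketch of the kind of model I have in mind, assuming (as I understand Sinatra et al.) that a paper’s citations are the product of a scientist-specific factor Q and a per-paper luck term drawn from the same distribution for everyone. The log-normal luck term with zero log-mean, the function names and the numbers are my assumptions, not their fitted values.

```python
import math
import random


def simulate_career(q, n_papers, rng, luck_sigma=1.0):
    """Per-paper citations = Q * luck, with luck independent of paper order."""
    return [q * rng.lognormvariate(0.0, luck_sigma) for _ in range(n_papers)]


def estimate_q(citations):
    """Geometric mean of citations; recovers Q when log-luck has mean zero."""
    return math.exp(sum(math.log(c) for c in citations) / len(citations))


if __name__ == "__main__":
    rng = random.Random(0)
    for true_q in (1.0, 5.0):
        career = simulate_career(true_q, n_papers=200, rng=rng)
        early, late = career[:100], career[100:]
        print(f"Q={true_q}: early estimate {estimate_q(early):.2f}, "
              f"late estimate {estimate_q(late):.2f}")
```

By construction, a lucky early paper in this sketch tells you nothing about later papers beyond what it reveals about Q. That also illustrates the caveat above: the model assumes an intrinsic Q rather than demonstrating that one exists.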
The International Maths Olympiad (IMO) paper seems like a clear example of our ability to measure an ‘intrinsic characteristic’ before we’ve seen the analog of citation counts. IMO participants are high school students, and the paper finds that even among people who participated in the IMO in the same year and got their PhD from the same department, IMO scores correlate with citations, awards, etc. Now, we might think that maths is extreme in that success there depends unusually much on fluid intelligence or something like that, and I’m somewhat sympathetic to that point / think it’s partly correct. But on priors I would find it very surprising if this phenomenon were completely idiosyncratic to maths. Like, I’d be willing to bet that scores at the International Physics Olympiad, International Biology Olympiad, etc., as well as simply GMA or high school grades or whatever, correlate with future citations in the respective fields.
The IMO example is particularly remarkable because it’s in the extreme tail of performance. If we’re not particularly interested in the tail, then I think some of the studies on more garden-variety predictors such as GMA or personality that we cite in the relevant section give similar examples.
Interesting! Many great threads here. I definitely agree that some component of scientific achievement is predictable, and the IMO example is excellent evidence for this. Didn’t mean to imply any sort of disagreement with the premise that talent matters; I was instead pointing at a component of the variance in outcomes which follows different rules.
Fwiw, my actual bet is that to become a top-of-field academic you need both talent AND to get very lucky with early career buzz. The latter is an instantiation of preferential attachment. I’d guess for each top-of-field academic there are at least 10 similarly talented people who got unlucky in the paper lottery and didn’t have enough prestige to make it to the next stage in the process.
It sounds like I should probably just read Sinatra, but it’s quite surprising to me that publishing a highly cited paper early in one’s career isn’t correlated with a larger total number of citations, at the high-performing tail (did I understand that right? Were they considering the right tail?). Anecdotally I notice that the top profs I know tend to have had a big paper/discovery early. E.g. Ed Boyden, who I have been thinking of because he has interesting takes on metascience, ~invented optogenetics in his PhD in 2005 (at least I think this was the story?) and it remains his most cited paper to this day by a factor of ~3.
On the scientist vs paper preferential attachment story, I could buy that. I was pondering while writing my comment how much is person-prestige driven vs. paper driven. I think for the most part you’re right that it’s paper driven, but I decided this cashes out as effectively the same thing. My reasoning was that if the number of citations per paper is power law-ish, then because citations per scientist is just the sum of these, it will be dominated by the top few papers. Therefore preferential attachment on the level of papers will produce “rich get richer” on the level of scientists, and this is still an example of the thing I was pointing at, because it’s not an intrinsic characteristic.
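Here is a quick sketch of that reasoning, assuming per-paper citations follow a Pareto distribution. The tail exponent of 1.2, the 50 papers per scientist and the exponential comparison case are arbitrary choices of mine, not estimates from data.

```python
import random


def top_k_share(citations, k=3):
    """Fraction of a scientist's total citations that comes from their top k papers."""
    ranked = sorted(citations, reverse=True)
    return sum(ranked[:k]) / sum(ranked)


def mean_top_k_share(draw, n_scientists=500, n_papers=50, k=3, seed=0):
    """Average top-k share over simulated scientists; `draw(rng)` samples one paper."""
    rng = random.Random(seed)
    shares = []
    for _ in range(n_scientists):
        papers = [draw(rng) for _ in range(n_papers)]
        shares.append(top_k_share(papers, k))
    return sum(shares) / len(shares)


def heavy_tailed(rng):
    return rng.paretovariate(1.2)  # power-law-ish citations per paper


def light_tailed(rng):
    return rng.expovariate(1.0)    # thin-tailed comparison case


if __name__ == "__main__":
    print("power law:  ", round(mean_top_k_share(heavy_tailed), 2))
    print("exponential:", round(mean_top_k_share(light_tailed), 2))
```

In the heavy-tailed case the top three papers should account for a noticeably larger share of each scientist's total than in the thin-tailed case, which is the sense in which paper-level luck carries over to the scientist level.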
That said, my highly anecdotal experience is that there is actually a per-person effect at the very top. I’ve been lucky to work with George Church, one of the top profs in synthetic biology. Folks in the lab literally talk about “the George Effect” when submitting papers to top journals: the paper is more attractive simply because George’s name is on it.
But my sense is that I should look into some of the refs you provided! (thanks :)
“it’s quite surprising to me that publishing a highly cited paper early in one’s career isn’t correlated with a larger total number of citations, at the high-performing tail (did I understand that right? Were they considering the right tail?)”
No, they considered the full distribution of scientists with long careers and sustained publication activity (which themselves form the tail of the larger population of everyone with a PhD).
That is, their analysis includes the right tail but wasn’t exclusively focused on it. Since by its very nature the right tail contains only a few data points, those points won’t carry much weight when fitting their model. So it could in principle be the case that looking only at the right tail would suggest a different model.
It is certainly possible that early successes may play a larger causal role in the extreme right tail—we often find distributions that are mostly log-normal, but with a power-law tail, suggesting that the extreme tail may follow different dynamics.
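As a rough illustration of why a handful of tail points barely move a fit to the whole distribution, here is a sketch that fits a log-normal (via the mean and standard deviation of the log-values) with and without a small power-law tail attached. The sample sizes, tail threshold and exponent are made-up numbers, and this is not the fitting procedure Sinatra et al. used.

```python
import math
import random
import statistics


def fit_lognormal(values):
    """Maximum-likelihood log-normal fit: mean and std dev of the logs."""
    logs = [math.log(v) for v in values]
    return statistics.fmean(logs), statistics.pstdev(logs)


def make_sample(n_body=10_000, n_tail=50, tail_start=20.0, tail_alpha=1.5, seed=0):
    """Log-normal body plus a few points from a Pareto tail above `tail_start`."""
    rng = random.Random(seed)
    body = [rng.lognormvariate(0.0, 1.0) for _ in range(n_body)]
    tail = [tail_start * rng.paretovariate(tail_alpha) for _ in range(n_tail)]
    return body, tail


if __name__ == "__main__":
    body, tail = make_sample()
    print("body only:   mu=%.3f sigma=%.3f" % fit_lognormal(body))
    print("body + tail: mu=%.3f sigma=%.3f" % fit_lognormal(body + tail))
```

This only shows that the overall fit is insensitive to the tail; it says nothing about which dynamics actually hold there.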
Thanks! I agree with a lot of this.
Sorry meant to write “component of scientific achievement is predictable from intrinsic characteristics” in that first line