Key Takeaways

This post summarises a literature review I did as part of a project with Ben Snodin looking at how academic fields emerge out of obscurity, grow, and become self-sustaining. The main forum post on the entire project is here.
The topics / areas of the literature that seemed to be the most relevant to answering this question were: scientometrics (the quantitative study of science), technology forecasting in general, and innovation studies. From my brief look at these, scientometrics appeared to be the most relevant.
It is currently possible to make somewhat accurate predictions about which scientific areas are likely to experience rapid growth within the next 3 years using fairly simple measures, such as changes in the average age of references within the topic. However, using similar methods to forecast over timescales longer than 5 years looks quite difficult.
It is likely that pressures on scientists to maximise citation counts and publications can discourage exploration of novel areas, instead incentivising the production of low-risk papers in familiar, established areas. There is also some evidence that grants which are tolerant of early failure and reward long-term success tend to generate more high-impact publications^[1].
Analysis of coauthorship networks^[2] suggests that very young fields often start out with quite fragmented research networks, with many independent efforts being conducted alongside each other. For fields that go on to be successful, these networks tend to grow and then merge together, creating a much more connected cluster with shared concepts, themes and knowledge.
I didn’t come across much research on factors that affect the emergence of new scientific topics beyond factors related to paper metadata. It is possible that the effect of factors outside of paper metadata on emerging scientific topics is underrepresented in the literature^[3].

Context

Earlier this year I worked on a project with Ben Snodin looking at how academic fields emerge out of obscurity and grow into self-sustaining fields (the forum post on the entire project is here).

As a part of this project, I reviewed the existing literature on the early-stage development of scientific fields, on factors that influence innovation, and on techniques that aim to predict which scientific areas will soon experience rapid growth. In this post I will provide some highlights from the review.

I hope this post will be useful for anyone interested in an overview of the literature surrounding technology forecasting, how scientific fields form, and the study of the scientific process itself.

Confidence and Transparency

I spent ~1.5 weeks at the start of the project looking at the literature. I wanted to catch all the main areas that contained relevant literature, as well as any especially relevant papers, so my review was quite broad and shallow. I suspect that if I were to spend another week looking at the literature now, the general research landscape that I describe would remain pretty similar^[3:1], but a reasonable portion (maybe ~50%) of the takeaway section could be replaced with better content.

I didn’t spend much / any time vetting the methodologies and conclusions of the papers I include in this post. Also, many of the points I make are based on a small number of papers (often just one).

The Research Landscape for Work on Early Field Development

The question of how fields grow out of obscurity and become self-sustaining is quite broad, and seems to be addressed by parts of quite a few different areas of literature. The main areas that I found to be useful for answering this question (to differing extents) were:

Scientometrics
Technology Development and Forecasting
Innovation Studies

Scientometrics

Scientometrics is the quantitative study of science and the scientific research process. Scientometrics is strongly related to the field of ‘Science of Science’, also known as SciSci. Scientometric work often involves analysing networks of scientific work generated from (usually large) datasets of published material. Scientometric work on how to define clusters in these networks (since clusters ~ research communities / topics / areas) and on the dynamics of these clusters was particularly relevant to questions about emerging fields. Scientometrics also addresses many other topics, including career dynamics, citation dynamics, properties associated with impact, and predictors of individual scientist achievement.

For those interested in more information, this paper is a really great (and short) summary of interesting scientometric research. The most widely used databases for scientometric work are Web of Science and Scopus. The main journals for scientometric work appear to be: Scientometrics, Quantitative Science Studies, Research Policy and the Journal of Informetrics.

Technology Development and Forecasting

In this rough grouping I have included both general models of technological development and more targeted forecasting of things like unit price for specific products, often suited to businesses. Regardless of generality, these two approaches both seem to focus on modelling / predicting technological progress in already well-established technologies (usually predicting unit price or price per capability, e.g. $/kWh for electricity generation). My impression is that these techniques tend to need a fair bit of historical data on a particular technology, which makes them less relevant to emerging areas.

Though I didn’t end up looking very deeply into this area, I think that this paper describing a generalisation of experience curves is worth mentioning. It shows a nice method to construct distributional forecasts of technological capabilities^[4].
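
For readers unfamiliar with experience curves, here is a minimal sketch of the basic single-factor version (often called Wright's law), in which unit cost falls by a fixed fraction with each doubling of cumulative production. This is not the generalised method from the paper mentioned above, and it gives a point estimate rather than a distributional forecast. The function name, parameter names and numbers are illustrative assumptions.

```python
import math


def experience_curve_cost(initial_cost, initial_cumulative, cumulative, learning_rate):
    """Projected unit cost under a basic experience curve (illustrative only).

    learning_rate is the fractional cost drop per doubling of cumulative
    production, e.g. 0.2 means cost falls 20% with each doubling.
    """
    if not 0 < learning_rate < 1:
        raise ValueError("learning_rate must be between 0 and 1")
    if initial_cumulative <= 0 or cumulative <= 0:
        raise ValueError("cumulative production must be positive")
    # Convert the per-doubling learning rate into a power-law exponent.
    exponent = -math.log2(1 - learning_rate)
    return initial_cost * (cumulative / initial_cumulative) ** (-exponent)


# Hypothetical technology: $100 per unit after 1,000 units, 20% learning rate.
cost_after_one_doubling = experience_curve_cost(100.0, 1000, 2000, 0.2)
cost_after_two_doublings = experience_curve_cost(100.0, 1000, 4000, 0.2)
```

The sketch also shows why this family of methods needs historical data: both the learning rate and the starting point have to be estimated from past production and cost figures.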

Innovation Studies

Innovation Studies is a branch of social science concerned with the process of innovation. It is usually business focused and tends to concentrate on how management or organisational factors affect the outcome of innovative projects. I spent the least amount of time looking at this area, since it seemed like it would be difficult to apply these insights to whole academic fields.

Takeaways for Emerging Fields

Here are the highlights from the review. Basically all of the points here are from Scientometric work.

Prediction of Emerging Scientific Topics

There seems to be a lot of work aimed at predicting which new and small topics will soon grow rapidly using scientometric data (such as growth in paper counts, change in reference ages etc.). The US government somewhat recently funded research into this area through IARPA’s Foresight and Understanding from Scientific Exposition (FUSE) research program. This FUSE paper by researchers from SciTech Strategies seems to me to be a good example of a recently developed (and possibly unusually successful) working method of predicting which research areas will experience rapid growth. With this method, the authors were able to predict (with moderate accuracy) which research communities would have an annual compound growth rate of 10% or more over the forecast period, for forecasts of 3 years.^[5]^[6] For those interested in a wider overview of common emerging topic prediction methods, this paper detailing the results of a prediction competition might be a good place to look.
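
To make the two quantities above concrete, here is a minimal sketch of the compound annual growth rate and of the critical success index as defined in the footnote (correct warnings divided by all warnings plus unwarned growths). The function names, community labels and paper counts are made up for illustration and are not taken from the FUSE paper.

```python
def compound_annual_growth_rate(start_count, end_count, years):
    """Annualised growth rate between two paper counts."""
    if start_count <= 0 or years <= 0:
        raise ValueError("start_count and years must be positive")
    return (end_count / start_count) ** (1 / years) - 1


def critical_success_index(predicted, actual):
    """CSI = hits / (hits + false alarms + misses)."""
    predicted, actual = set(predicted), set(actual)
    hits = len(predicted & actual)            # correct warnings of growth
    false_alarms = len(predicted - actual)    # warnings with no growth
    misses = len(actual - predicted)          # unwarned growths
    denominator = hits + false_alarms + misses
    return hits / denominator if denominator else 0.0


# Hypothetical communities: (papers in base year, papers 3 years later).
paper_counts = {"A": (100, 140), "B": (100, 120), "C": (50, 80),
                "D": (200, 210), "E": (30, 45)}

# Communities that actually grew by at least 10% per year over 3 years.
grew = {name for name, (start, end) in paper_counts.items()
        if compound_annual_growth_rate(start, end, 3) >= 0.10}

# Hypothetical forecast made in the base year.
forecast = {"A", "B", "C"}
csi = critical_success_index(forecast, grew)
```

In this toy case the forecast gets two communities right, raises one false alarm and misses one growing community, so the CSI is 2 / 4 = 0.5. Because true negatives never enter the formula, correctly ignoring the many communities that do not grow does nothing to raise the score.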

My vague impression is that many scientometric prediction techniques are likely to struggle with making accurate forecasts beyond five years without major changes, but may get significantly better at forecasting in the 3-4 year range within the next few (~5) years. This impression is based on some statements made by the authors of the technique I mentioned above, in combination with some of my own interpretation (which may be inaccurate)^[7].

Scientometric prediction work does have limits on the information it can provide us about early field growth. As far as I know, basically all scientometric prediction techniques are reliant on the existence of already published papers, so my impression is that scientometrics can’t really tell us much about scientific areas which have no papers, or only a very small number (~ <20) of papers. Scientometric research may also be unsuited for providing information on correlates / causes of topic emergence when these correlates / causes are unrelated to paper metadata (for example, the effects of funding changes or new demands).

How Scientific Networks Grow

Scientific networks are not just complex, they are also continuously evolving. The study of how networks change over time is called network dynamics, and investigations into the network dynamics of emerging topics are particularly relevant to understanding how young fields grow.

According to one paper, as a very young field emerges its co-authorship network^[2:1] tends to become more connected and unified^[8]. In the very early days of a field (when it is barely a field at all), the research network is often fragmented, with many independent efforts being conducted alongside each other. For fields that go on to be successful, these networks tend to grow and then merge together, creating a unified and much more connected cluster with shared concepts, themes, and knowledge.
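
One simple way to picture this shift is to track the share of authors that sit in the largest connected component of the co-authorship network (nodes are authors, edges join co-authors). The sketch below does this for a toy dataset. The choice of measure, the function name and the data are my own illustrative assumptions and not necessarily the metric used in the paper.

```python
from collections import defaultdict
from itertools import combinations


def largest_component_fraction(papers):
    """Share of authors in the largest connected component.

    papers is a list of author lists, one per paper.
    """
    neighbours = defaultdict(set)
    for authors in papers:
        for author in authors:
            neighbours[author]  # make sure solo authors appear as nodes
        for a, b in combinations(set(authors), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    if not neighbours:
        return 0.0

    seen = set()
    largest = 0
    for start in list(neighbours):
        if start in seen:
            continue
        # Depth-first search to measure the component containing `start`.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        largest = max(largest, size)
    return largest / len(neighbours)


# Early days: three independent pairs of collaborators.
early_papers = [["ann", "bob"], ["cat", "dan"], ["eve", "fay"]]
# Later: new collaborations bridge the separate groups.
later_papers = early_papers + [["bob", "cat"], ["dan", "eve"]]

early_share = largest_component_fraction(early_papers)
later_share = largest_component_fraction(later_papers)
```

In the toy data the largest component holds a third of the authors early on and all of them later, which is the fragmented-to-unified pattern described above.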

The introduction of new journals may cause certain co-authorship networks to become much more “clustered” (again, according to one paper)^[9]. A possible explanation for this is that new journals can consolidate research communities that previously published their work in multiple, only somewhat relevant, journals. By consolidating the community under one journal, researchers could be more likely to encounter relevant work that they might previously have missed, which may result in a more connected and “clustered” network.

If journal creation does cause a large increase in clustered-ness, then perhaps it is possible to construct other interventions that also increase clustered-ness. And if it is also true that increasing clustered-ness is generally good for a field, then maybe this suggests that interventions that increase clustered-ness (~“scientific community building”) are a possible option for accelerating a field’s development.

Potential Triggers for Topic Emergence

I found one paper that looked at “triggers” for topic emergence^[10]. The authors sorted emergence triggers into three overarching categories, with a trigger able to belong to more than one category:

scientific discovery (new findings),
technological innovation (new capabilities),
or exogenous factors (e.g. government sponsorship).

They found that “discovery” and “exogenous” triggers were involved in the emergence of ~60% of emergent topics, with the most prevalent kind of exogenous factor being government actions.

The authors also note that while the influence of exogenous factors on topic emergence appears to be relatively significant, the presence of exogenous factors is probably difficult to detect when looking at the topic literature alone, and that exogenous factors are not often mentioned in bibliometric studies of topic emergence. This gives me a slight impression that the influence of exogenous factors on topic emergence is not very well studied.

Pressures on Researchers—Balancing Novelty and Security

The exploration of novel areas is necessary for the production of new topics and fields. Unfortunately, it seems that career pressures can discourage scientists from pursuing novel topics because a good strategy for maximising citations (and probably career security) seems to be to produce a steady stream of low-risk papers in a familiar, established area^[1:1].

Although scientists are strongly incentivised to pursue low-risk topics, it is worth noting that, at least according to one study, scientists do take more risks than would be expected if they were purely trying to maximise citations^[11]^[1:2]. The authors suggest that scientists are also motivated by the desire to produce especially impactful science (e.g. for recognition, or because they find producing impactful science intrinsically rewarding) and that this drives higher levels of risk taking.

It is common for papers to suggest potential ways that working on riskier, more scientifically valuable topics could be incentivised. Suggestions include making funding more risk-tolerant, adding innovation or novelty scoring when evaluating grant applications, and funding people rather than projects^[12]^[1:3]. Some of these suggestions do seem to actually help in practice too. For example, grants which are more tolerant of early failure and better reward long-term success are reported to be more likely to generate high-impact publications^[1:4].

[1] Fortunato, S., Bergstrom, C. T., Börner, K., Evans, J. A., Helbing, D., Milojević, S., … & Barabási, A. L. (2018). Science of science. Science, 359(6379).
[2] Nodes are authors, edges are between co-authors.
[3] With the possible exception of science policy work that (at a glance) seems very relevant but which I wasn’t aware of during the review. Science policy work could be especially relevant to understanding ‘real world’ factors that influence the scientific landscape.
[4] For those interested, JD Farmer is quite a prominent researcher in this area and has a list of relevant papers on his website.
[5] They achieved critical success index (CSI) scores of around 0.25 overall for 3-year forecasts, with slightly higher CSI scores within certain technical disciplines. CSI score = (correct warnings of growth) / (all warnings + unwarned growths).
[6] Klavans, R., Boyack, K. W., & Murdick, D. A. (2020). A novel approach to predicting exceptional growth in research. PLOS ONE, 15(9), e0239177.
[7] The authors state that predictions of 3-4 years into the future are thought to be the ‘sweet spot’ for the method they use, and the reasons they give actually seem to apply to quite a large proportion of the prediction techniques currently in use. Any forecast of less than 2 years will struggle because of how long it takes new research to be conducted and published: for shorter forecasts, a higher proportion of the papers published within the forecast period will be unaffected by the state of the field when the prediction was made. For example, many would already be going through the publication process at the time of prediction. This minimum constraint applies to all methods which use data on published papers, though the exact minimum will probably depend on the speed of a particular field’s “publishing cycle”. Forecasts of more than 5 years are reported to be difficult due to changes in the “underlying structure of research”. The authors use a clustering algorithm on a large network of papers to partition the network into “research communities”, and then predict which communities will have grown strongly by the forecast year. These “research communities” are basically analogous to small “areas of science” or “topics”. My impression is that if the overall network changes enough, then the true, current “research communities” also change, such that the original “research communities” are no longer present to the extent needed for good predictions. If I am correct about this impression, then the ~5 year limit applies to any method which uses a clustering algorithm to partition the network and doesn’t account for how those clusters change over time as the network evolves. Unfortunately, this problem applies to many of the current most promising prediction methods. As a result, a much better understanding of scientific network dynamics seems likely to be required to produce any decent forecasts beyond 5 years with this type of method.
[8] Bettencourt, L. M., Kaiser, D. I., & Kaur, J. (2009). Scientific discovery and topological transitions in collaboration networks. Journal of Informetrics, 3(3), 210-221.
[9] Sun, X., Kaur, J., Milojević, S., Flammini, A., & Menczer, F. (2013). Social dynamics of science. Scientific Reports, 3(1), 1-6.
[10] Small, H., Boyack, K. W., & Klavans, R. (2014). Identifying emerging topics in science and technology. Research Policy, 43(8), 1450-1467.
[11] Foster, J. G., Rzhetsky, A., & Evans, J. A. (2015). Tradition and innovation in scientists’ research strategies. American Sociological Review, 80(5), 875-908.
[12] Casadevall, A., & Fang, F. C. (2016). Revolutionary science.