Abstract
To what extent is the citation rate of new papers influenced by the past social relations of their authors? To answer this question, we present a data-driven analysis of nine different physics journals. Our analysis is based on a two-layer network representation constructed from two large-scale data sets, INSPIREHEP and APS. The social layer contains authors as nodes and coauthorship relations as links. This allows us to quantify the social relations of each author, prior to the publication of a new paper. The publication layer contains papers as nodes and citations between papers as links. This layer allows us to quantify scientific attention as measured by the change of the citation rate over time. We particularly study how this change correlates with the social relations of their authors, prior to publication. We find that on average the maximum value of the citation rate is reached sooner for authors who have either published more papers or who have had more coauthors in previous papers. We also find that for these authors the decay in the citation rate is faster, meaning that their papers are forgotten sooner.
1. INTRODUCTION
The availability of large-scale data sets about journals and scientific publications therein, their authors, institutions, cited references, and citations obtained in other papers has boosted scientometric research in recent years. They allow us to address new research questions that go beyond the calculation of mere bibliographic indicators. These particularly concern the role of social influences on the success of papers, for example coauthorship relations (Sarigol, Pfitzner, et al., 2014) or the relations between authors and handling editors (Sarigol, Garcia, et al., 2017). Such investigations have contributed to a new scientific discipline, the science of success (Jadidi, Karimi, et al., 2018; Sinatra & Lambiotte, 2018).
But such data also allow us to redo traditional scientometric analyses on a much larger scale. In Parolo, Pan, et al. (2015), the dynamics of the citation rate (i.e., the change in the number of citations during a fixed time interval) is analyzed. The authors find that the change of the average citation rate follows two characteristic phases: first a growth phase and then a decay phase. Interestingly, the duration of the first and the speed of the second phase have changed over the years. This allows us to draw conclusions about how the collective attention of scientists towards a given paper has evolved between early and recent times.
In general, the dynamics of citations are extensively studied in the bibliometric literature. For example, the relation between the current number of citations and the citation rate was studied in Jeong, Néda, and Barabási (2003). Citations were found to occur in bursts, with large bursts within a few years after publication (Eom & Fortunato, 2011). Concerning the scientific field of a paper, citations from papers in the same field tend to be obtained earlier than citations from papers in other fields (Rinia, Van Leeuwen, et al., 2001). Citation rates have also been used to classify papers (Avramescu, 1979; Li & Ye, 2014). Such classes often identify papers that receive citations earlier or later than the majority of papers (Ciotti, Bonaventura, et al., 2016; Colavizza & Franceschet, 2016; Costas, van Leeuwen, & van Raan, 2010). Papers in the second class (i.e., which receive their citations only a long time after publication) are often called sleeping beauties or delayed (Burrell, 2005; van Raan, 2004). Their citation rate and how it differs from other papers was studied extensively in Lachance and Larivière (2014). This class has also been thoroughly studied outside paper classification settings. It was found that “sleeping beauties” are extremely rare, and only 0.04% of papers published in 1988 were identified as such (van Raan, 2004). They were also found to occur especially often in multidisciplinary data sets (Ke, Ferrara, et al., 2015).
Recent progress in the study of scientometric systems has very much relied on representing them as networks. A first example is citation networks, representing papers as nodes and citations as their (directed) links. Such networks can be seen as a knowledge map of science (Leydesdorff, Carley, & Rafols, 2013). They can be also used to predict scientific success (Mazloumian, 2012). A second example is coauthorship networks, representing scientists as nodes and their coauthorships as links. While sociological studies (Cetina, 2009) just report that communication between coauthors can be very intricate, formal models of how such collaborations form on the structural level have also been developed (Guimera, 2005; Tomasello, Vaccario, & Schweitzer, 2017). To study collaboration patterns in a university faculty (Claudel, Massaro, et al., 2017), such coauthorship networks have been combined with a network encoding the physical distance between the faculty members. It was also analyzed how communities detected on a coauthorship network overlap with different research topics (Battiston, Iacovacci, et al., 2016).
These investigations have the drawback that they study citation networks and coauthorship networks separately from each other. As already emphasized (Clauset, Larremore, & Sinatra, 2017; Schweitzer, 2014), this becomes a problem if one wants to study social influence on citation dynamics. For example, based on a data set of Physical Review, it was shown that scientists cite former coauthors more often (Martin, Ball, et al., 2013). Therefore, a better approach is to combine both the citation and the coauthorship network in a multilayer network. Links between the citation and the coauthorship layer express the authorship of papers. Using such a representation, a method to detect citation cartels was proposed (Fister, Fister, & Perc, 2016). Further, the dependence of the citation rate on the authors’ total number of citations was studied (Petersen, Fortunato, et al., 2014). However, it has not yet been investigated how the position of authors in the coauthorship network influences when their papers are cited. In this paper we study exactly this question.
Our analysis extends recent studies that focus on the success of papers as measured by their total number of citations. In Sarigol et al. (2014), this success was related to the position of the authors in a coauthorship network. It was shown that authors of successful papers are considerably more central (as quantified by various centrality measures) in the coauthorship network. We extend this by an analysis of the dynamics of the citation rate over time (i.e., when their papers are cited). To parametrize the citation dynamics, we resort to the phases identified in Parolo et al. (2015). We extend this work by relating these phases to the social relations of the authors.
Our paper is structured as follows. In section 2.1 we explain how citation dynamics can be measured by means of citation histories, which represent the collective attention given to a paper. In section 2.2 we describe the data sets used for our analysis. In section 3.1 we introduce the multilayer network to combine social information about authors with citation data. We then turn to our research question and study in sections 3.2 and 3.3 how the social relations of authors in the coauthorship network influence the collective attention. Lastly, in section 4 we summarize our findings and draw conclusions.
2. METHODS AND DATA
2.1. Dynamics of Citation Rates
2.1.1. Measuring attention
Citations are often used as a measure of the success of a paper, accumulated over time. They have the advantage that they are objective in the sense that they are recorded in the reference lists of citing papers. But the sheer number of citations does not utilize the temporal information (i.e., how many of these citations arrive at a given time). This is captured in the citation rate, which better estimates the attention a paper receives in a given time interval. Individual attention (i.e., who cites a given paper at a given time) is not of interest for our study. We focus on collective attention (i.e., the aggregate over all authors who cite this paper during a given time interval). Obviously, the citation rate is only a proxy for this collective attention. One could additionally consider other attention measures like the altmetric score. But such information is only available for very recent publications and further is strongly biased against the use of social media. Therefore, we decide to restrict our study to using only the citation rate as a proxy for collective attention. Most papers are still cited because they have caught in some way the attention of the authors of the citing papers. Furthermore, citation counts were found to be a good approximation of scientific impact as perceived by scientists from the same field as the paper (Radicchi, Weissman, & Bollen, 2017).
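As an illustration of the citation rate as a count of citations per time interval, the following minimal sketch bins the dates of citing papers into consecutive intervals after publication. The function name `citation_history`, the one-year interval of 365 days, and the toy dates are assumptions made for this example and are not taken from the data sets.

```python
from collections import Counter
from datetime import date


def citation_history(published: date, citing_dates: list[date],
                     interval_days: int = 365) -> list[int]:
    """Count citations in each consecutive interval after publication.

    Entry k of the result is the number of citations received in the
    k-th interval (of length interval_days) after the publication date.
    """
    bins = Counter()
    for d in citing_dates:
        offset = (d - published).days
        if offset >= 0:  # ignore citations dated before publication
            bins[offset // interval_days] += 1
    if not bins:
        return []
    return [bins.get(k, 0) for k in range(max(bins) + 1)]


# Toy example (hypothetical dates): one paper and four citing papers.
history = citation_history(
    date(2000, 1, 1),
    [date(2000, 6, 1), date(2001, 3, 1), date(2001, 9, 1), date(2003, 2, 1)],
)
print(history)
```

The resulting list is one possible discrete representation of how the collective attention towards a paper changes over time.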
2.1.2. Citation histories
2.1.3. Two phases in citation histories
2.2. Bibliographic Databases
As we argued in section 2.1, citations are particularly suitable to quantify the collective attention by scientists from the same field as a given paper. Therefore, in our analysis we study different journals separately, because each describes a topic-related community of authors and their papers. To obtain the data for our study we resort to large bibliographic databases which index papers across journals. They collect information such as a paper’s title, the list of authors, the date of publication, and also the list of references that a paper cites. We extracted this set of information for nine journals from two such databases in the same way as in Nanumyan, Gote, and Schweitzer (2020) and as explained below.
2.2.1. APS database
This database indexes papers published in journals by the American Physical Society (APS). Access to the database can be requested for research purposes at https://journals.aps.org/datasets. We extracted the journals Physical Review (PR), Physical Review A (PRA), Physical Review C (PRC), Physical Review E (PRE), and Reviews of Modern Physics (RMP) to cover a wide range of physics sub-fields.
The APS database has the known issue of name disambiguation, because it indexes authors by their name and not by a unique identifier. This means that different authors with the same name are indexed as one author. Such a “multiauthor” then owns all papers and coauthorships that were actually accumulated by multiple authors. In contrast, one author whose name can be spelled in different ways may be indexed as different authors in the database. The consequence for our study is that such undisambiguated authors bias measures involving (co)authorships. This problem has already been discussed in the scientific literature, and a disambiguation algorithm specifically for authors in the APS database was proposed (Sinatra, Wang, et al., 2016). We applied this algorithm to the APS database to lower the bias from undisambiguated authors.
2.2.2. INSPIREHEP database
The second database, called INSPIREHEP, indexes papers relevant for the field of high-energy physics. This database can be downloaded at http://inspirehep.net/dumps/inspire-dump.html. In this database authors are disambiguated, because each author is indexed by a unique identifier. We extracted the journals Journal of High Energy Physics (JHEP), Physics Letters (Phys. Lett.), Nuclear Physics (Nuc. Phys.), and high energy physics literature in Physical Review journals (PR-HEP) from this database. These were the four largest journals in terms of number of citations from papers in the same journal (i.e., the citations which we will use to compute citation rates in the later sections).
In INSPIREHEP some indexed papers have exceptionally large lists of authors, which sometimes even exceed 1,000 authors. Such large-scale coauthorships were termed hyperauthorships in Cronin (2001). Concerns were raised that it is unclear which authors actually made substantial contributions to such papers (Cronin, 2001), and that the coauthorship network is not an accurate representation of the social network of authors (Newman, 2004). Indeed, every author in such a hyperauthorship gets possibly thousands of collaborators from just a single paper, despite likely not having collaborated with all of them personally. This introduces a bias for measures involving coauthorships, and thus for our study. It was found that hyperauthorships usually occur in papers from large experiments (Newman, 2001), such as the ATLAS experiment at CERN. To avoid this bias we removed experimental papers from the database. To identify experimental papers we used meta-tags that INSPIREHEP provides, so-called XML-tags. These are essentially labels for papers that provide additional information, such as arXiv identifiers, author affiliations, or sometimes estimates of whether a paper is experimental or theoretical. We removed all papers from the database that are explicitly tagged as experimental. But because this tag might be unavailable for a paper, we further removed all papers that are not explicitly tagged as theoretical work or work in general physics.
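The two-step removal described above can be sketched as a simple predicate. The tag strings used here ("experimental", "theoretical", "general physics") are hypothetical placeholders; the actual labels in the INSPIREHEP XML-tags may be named differently.

```python
# Hypothetical tag labels; the real INSPIREHEP tags may differ.
EXCLUDED_TAG = "experimental"
REQUIRED_TAGS = {"theoretical", "general physics"}


def keep_paper(tags: set[str]) -> bool:
    """Return True if a paper survives both removal steps."""
    # Step 1: drop papers explicitly tagged as experimental.
    if EXCLUDED_TAG in tags:
        return False
    # Step 2: drop papers lacking an explicit theoretical or
    # general-physics tag, since the experimental tag may be missing.
    return bool(tags & REQUIRED_TAGS)


# Toy records (hypothetical identifiers and tags).
tagged_papers = {
    "p1": {"theoretical"},
    "p2": {"experimental", "theoretical"},
    "p3": set(),
    "p4": {"general physics"},
}
kept = sorted(pid for pid, tags in tagged_papers.items() if keep_paper(tags))
print(kept)
```

The second step is deliberately conservative: an untagged paper is removed even though it may not be experimental.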
Table 1 provides summary statistics of the nine journals and shows how large these journals are. For example, only one journal, RMP, contains fewer than 10,000 authors, and there are more than 400,000 citations between papers in PRA.
Database | Journal | |Vp| | |Va| | |Epc| | |Ea| |
---|---|---|---|---|---|
APS | PR | 46728 | 24307 | 253312 | 87386 |
APS | PRA | 69147 | 41428 | 416639 | 144806 |
APS | PRC | 36039 | 22672 | 253948 | 108844 |
APS | PRE | 49118 | 36382 | 182701 | 95796 |
APS | RMP | 3006 | 3788 | 5282 | 5044 |
IH | JHEP | 15739 | 7994 | 191990 | 39056 |
IH | PR-HEP | 44829 | 33908 | 213625 | 115237 |
IH | Phys. Lett. | 22786 | 18078 | 56332 | 53089 |
IH | Nuc. Phys. | 24014 | 18733 | 125252 | 60018 |
3. SOCIAL INFLUENCE ON CITATION RATE
3.1. Multilayer Network Representation
3.1.1. Combining information about papers and authors
Our aim is to combine the information about collective attention, as proxied by the citation rate, with information about the social relations between authors. For the latter, we specifically focus on coauthorship, because this is the most objective and best documented relation. Again, this is a proxy because it neglects other forms of social relationships, such as friendship, personal encounters (e.g., during conferences), electronic communication, or relations in social media. But we do not have this type of information available for all authors over long periods. Therefore we restrict our analysis to the coauthorship network that can be constructed from the available data, as described below.
To relate information about authorship and about papers in a tractable manner, multilayer networks come into play, because they allow us to represent such separate information in different layers. The nodes on the first layer correspond to papers and the (directed) links to their citations. Different from this, the nodes in the second layer correspond to the authors and the links to their coauthorships (i.e., there is a link between two authors if they wrote at least one paper together). Then, there are links that connect nodes on the first layer with nodes on the second layer. These links correspond to the authorship relations (i.e., for every author, there is exactly one such link to each of her papers). We construct such a two-layer network for each of the nine journals in our data set to represent the information about citations between papers as well as about the authorships.
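The construction described above can be sketched with plain Python sets. The record layout (a dictionary with "authors" and "cites" fields), the function name `build_two_layer_network`, and the toy identifiers are assumptions made for this illustration, not the format of the APS or INSPIREHEP data.

```python
from itertools import combinations

# Toy records for one journal (hypothetical identifiers).
papers = {
    "p1": {"authors": ["a", "b"], "cites": []},
    "p2": {"authors": ["b", "c"], "cites": ["p1"]},
    "p3": {"authors": ["a"], "cites": ["p1", "p2"]},
}


def build_two_layer_network(papers):
    """Return the three link sets of the two-layer network."""
    citation_links = set()    # publication layer: directed (citing, cited)
    coauthor_links = set()    # social layer: undirected pairs of authors
    authorship_links = set()  # inter-layer: (author, paper)
    for pid, record in papers.items():
        for cited in record["cites"]:
            if cited in papers:  # keep only citations within the journal
                citation_links.add((pid, cited))
        authors = sorted(set(record["authors"]))
        for author in authors:
            authorship_links.add((author, pid))
        # One coauthorship link per pair, however many joint papers exist.
        for a, b in combinations(authors, 2):
            coauthor_links.add(frozenset((a, b)))
    return citation_links, coauthor_links, authorship_links


citation_links, coauthor_links, authorship_links = build_two_layer_network(papers)
print(len(citation_links), len(coauthor_links), len(authorship_links))
```

Storing coauthorships as unordered pairs reflects that the social layer is undirected, while the citation pairs keep their direction from citing to cited paper.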
To summarize the above, Figure 2 illustrates the two layers of citation and coauthorship networks and their coupling. It further displays the temporal dimension: The multilayer network evolves over time because new papers are published, and hence new coauthors appear. As the timeline indicates, paper i is published at time δi and then accumulates citations in the future, at times δ > δi. The publication layer allows us to define the degree of a paper i as the number of papers that cite i until time δ (see Eq. 1 and Figure 2). Specifically, it is the in-degree, because the publication network is directed. The question is now how the citation rate of this paper evolves over time, conditional on the social information about its authors at time δi, which is the publication time of paper i. In other words, we analyze the impact of information from before this publication.
3.1.2. Quantifying authors’ social relations
The coauthorship layer allows us to define the degree of an author n as the total number of distinct coauthors kn(δi) that the author had before time δi. Degree is the simplest centrality measure for networks and reflects the local information about the embedding of an author in the social network. We use it here because it was shown recently (Nanumyan et al., 2020) that this measure is a particularly good predictor for the future citation rate.
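A minimal sketch of this degree computation is given below, together with the analogous count of previous publications. The record layout (pairs of publication time and author list) and the function names are hypothetical. How the values of several authors of one paper are combined into a single predictor is not specified in this section, so the sketch stops at the level of a single author.

```python
def previous_coauthors(author, before, records):
    """Number of distinct coauthors of `author` on papers published before `before`."""
    coauthors = set()
    for time, authors in records:
        if time < before and author in authors:
            coauthors.update(a for a in authors if a != author)
    return len(coauthors)


def previous_publications(author, before, records):
    """Number of papers by `author` published before `before`."""
    return sum(1 for time, authors in records if time < before and author in authors)


# Toy records (hypothetical): (publication year, list of authors).
records = [
    (2001, ["a", "b"]),
    (2003, ["a", "b", "c"]),
    (2005, ["a", "d"]),
]
print(previous_coauthors("a", 2005, records))
print(previous_publications("a", 2005, records))
```

The strict inequality ensures that only information from before the publication time of the paper in question enters the count.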
3.1.3. Parametrizing citation rates
The number of previous coauthors and the number of previous publications are based on the information about the authors of paper i, evaluated at the publication time δi. Our goal is to determine how they influence the citation dynamics of paper i, which requires an analytically tractable parametrization of the citation rates. To parametrize the citation dynamics we resort to the two characteristic phases of citation histories mentioned in section 2.1. The first phase corresponds to increasing citation rates, and we parametrize it by its duration, the peak delay, because we have no more precise knowledge about a general functional form of this phase. The second phase corresponds to an exponential decay, and we parametrize it by the parameter τi in Eq. 3 (i.e., the so-called lifetime). Both parameters, the peak delay and τi, are illustrated in Figure 1.
We now have four parameters to summarize the information about paper i. The first two parameters are the number of previous coauthors and the number of previous publications at time δi, which characterize the authors of paper i. The other two parameters are the peak delay and the lifetime τi, which characterize the citation history.
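To make the two citation-history parameters concrete, the sketch below extracts them from a list of yearly citation counts. It assumes that the decay after the peak has the form c(t) ∝ exp(−t/τ) and estimates τ by a least-squares line through the logarithm of the counts. This estimator, the function names, and the toy history are assumptions for illustration and are not necessarily the fitting procedure behind Eq. 3.

```python
import math


def peak_delay(history):
    """Index of the interval with the highest citation rate (first one if tied)."""
    return max(range(len(history)), key=lambda t: history[t])


def decay_lifetime(history):
    """Estimate the lifetime tau of an assumed exponential decay after the peak.

    Fits log(c) = a - t / tau by ordinary least squares to the nonzero
    counts from the peak onwards. Returns None if no decay can be fitted.
    """
    peak = peak_delay(history)
    points = [(t - peak, math.log(c))
              for t, c in enumerate(history) if t >= peak and c > 0]
    if len(points) < 2:
        return None
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return -1.0 / slope if slope < 0 else None


# Toy history (hypothetical): growth for two years, then halving each year.
toy_history = [2, 8, 16, 8, 4, 2, 1]
print(peak_delay(toy_history), decay_lifetime(toy_history))
```

For a history that halves every year after the peak, the estimated lifetime equals 1/ln 2 years, which is a convenient sanity check of the fit.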
3.1.4. Excluding incomplete citation histories
Obviously, our data sets only contain papers published before the release date of the respective database. Hence, the time-span on which we can compute a given paper’s citation history is also limited by this date. This introduces an issue, especially for recent papers: The observable period of the citation history can be so short that the decay phase has not yet started at all. To account for this, we omitted all papers that were published within the last 5 years before the release of the respective database. Hence, for all papers in our study the citation histories are covered over at least 5 years. In addition, we also removed those papers whose citation rate is nondecreasing in the latest year, as this is a sign that the respective paper has not yet reached its decay phase.
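The two exclusion criteria can be written as a single filter. The sketch works on yearly citation counts. Treating a paper published exactly 5 years before the release as retained is an assumption about the boundary case, and the function name is hypothetical.

```python
def has_complete_history(publication_year, release_year, yearly_citations,
                         min_years=5):
    """Return True if a paper passes both exclusion criteria."""
    # Criterion 1: the citation history must cover at least min_years.
    if release_year - publication_year < min_years:
        return False
    # Criterion 2: the citation rate must be decreasing in the latest year,
    # otherwise the decay phase may not have started yet.
    if len(yearly_citations) >= 2 and yearly_citations[-1] >= yearly_citations[-2]:
        return False
    return True


# Toy examples (hypothetical years and counts).
print(has_complete_history(2010, 2012, [1, 3, 2]))
print(has_complete_history(2000, 2012, [1, 5, 3, 2]))
print(has_complete_history(2000, 2012, [1, 2, 3, 3]))
```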
3.2. Time to the Peak Citation Rate
3.2.1. Regressions
3.2.2. Fitted parameters
In Table 2 we show the fitted parameters for all journals. Except for one coefficient, all parameters β are negative, which means that peak delays get smaller for increasing numbers of previous coauthors or publications. The exception is JHEP, which has a positive β for the number of previous publications. However, this coefficient is not significant, meaning that it is likely not different from zero, and therefore does not contradict the discovered trend. To conclude, we find that the larger the number of previous coauthors or publications is, the sooner the peak citation rate is reached.
Journal | si | Time | α | β |
---|---|---|---|---|
PR | NC | years | 0.768 | −0.022*** |
PR | NC | pubs | 0.903 | −0.003* |
PR | NP | years | 0.721 | −0.010*** |
PR | NP | pubs | 0.922 | −0.006*** |
PRA | NC | years | 1.055 | −0.001*** |
PRA | NC | pubs | 0.637 | 0.000 |
PRA | NP | years | 1.095 | −0.005*** |
PRA | NP | pubs | 0.625 | 0.001 |
PRC | NC | years | 1.141 | −0.000* |
PRC | NC | pubs | 1.132 | 0.000 |
PRC | NP | years | 1.130 | −0.000 |
PRC | NP | pubs | 1.141 | −0.000 |
PRE | NC | years | 0.907 | −0.001*** |
PRE | NC | pubs | 0.971 | −0.000 |
PRE | NP | years | 0.941 | −0.004*** |
PRE | NP | pubs | 0.992 | −0.002** |
RMP | NC | years | 1.987 | −0.004* |
RMP | NC | pubs | 1.482 | −0.003 |
RMP | NP | years | 2.022 | −0.008* |
RMP | NP | pubs | 1.534 | −0.008* |
JHEP | NC | years | 0.050 | −0.002 |
JHEP | NC | pubs | −0.251 | −0.001 |
JHEP | NP | years | 0.035 | 0.000 |
JHEP | NP | pubs | −0.286 | −0.000 |
PR-HEP | NC | years | 1.292 | −0.005*** |
PR-HEP | NC | pubs | 1.870 | −0.000* |
PR-HEP | NP | years | 1.321 | −0.004*** |
PR-HEP | NP | pubs | 1.907 | −0.001*** |
Phys. Lett. | NC | years | 0.813 | −0.007*** |
Phys. Lett. | NC | pubs | 1.088 | −0.009*** |
Phys. Lett. | NP | years | 0.824 | −0.004*** |
Phys. Lett. | NP | pubs | 1.094 | −0.005*** |
Nuc. Phys. | NC | years | 1.156 | −0.007*** |
Nuc. Phys. | NC | pubs | 1.199 | −0.005*** |
Nuc. Phys. | NP | years | 1.211 | −0.005*** |
Nuc. Phys. | NP | pubs | 1.239 | −0.004*** |
3.2.3. Size of the effect
We also study the size of the dependence between a paper’s peak delay and the number of previous coauthors or publications. To this end, we use our fitted models to predict the average peak delay for given numbers of previous coauthors or publications and for each journal. Figure 3 shows these predictions. Let us first focus on the number of previous coauthors in Figure 3 (left). We see that for all journals except RMP the predicted average is always less than 4 years, irrespective of the number of previous coauthors. For RMP, papers whose authors have no previous coauthors take around 7.5 years on average to reach the peak, but this number then also decreases to 4 years at roughly 150 previous coauthors.
We further point out the differences in speed across journals at which the peak delays decrease for increasing numbers of previous coauthors. For example, papers in the journal PR-HEP reach the peak citation rate on average after 3.75 years for zero previous coauthors. This duration changes to roughly 2.5 years for papers with 100 previous coauthors. This is different from the journal PRE. There, a paper reaches the peak citation rate on average after 2.5 years for zero previous coauthors, which stays almost the same even at 100 previous coauthors. This means that journals have a large impact on the time when citations occur, especially with respect to the prospective decrease as the number of coauthors grows. Figure 3 also shows confidence bands for the predicted average peak delay. These are narrow for all journals except one, because of the large numbers of papers used in the model fits. For the exception, RMP, only 214 papers were used, which is why its confidence bands are wider.
Figure 3 (right) shows the average peak delay predicted by the number of previous publications. The main difference from Figure 3 (left) is that now also the peak delays for the journals PRA and PRE decrease noticeably for increasing numbers of previous publications. For example, the peak delay is on average equal to 3 years for zero previous publications, but this number drops to 1 year for 200 previous publications. This means that, to receive citations earlier in these journals, increasing the number of publications appears to be a more successful strategy than increasing the number of coauthors.
To summarize, the negative binomial regression models show that for increasing numbers of previous coauthors or publications the highest citation rate is reached sooner. They also identify differences in the benefit of high numbers of coauthors or publications across journals: For journals such as PRC there is almost no decrease in peak delay, even with 200 previous coauthors. But for journals such as PR, papers that already have 50 previous coauthors reach their peak on average in less than half the time of papers with zero previous coauthors.
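The predictions discussed above can be approximated directly from the coefficients in Table 2. The sketch below assumes that the negative binomial models use a log link, so that the predicted average peak delay is exp(α + β·s) for s previous coauthors. The log link is an assumption made here because the model specification is not repeated in this section. The coefficients are the values from the NC rows of Table 2 with time measured in years.

```python
import math

# (alpha, beta) from Table 2, rows NC / years.
coefficients = {
    "PR": (0.768, -0.022),
    "PRE": (0.907, -0.001),
    "RMP": (1.987, -0.004),
}


def predicted_peak_delay(journal, n_coauthors):
    """Predicted average peak delay in years, assuming a log link."""
    alpha, beta = coefficients[journal]
    return math.exp(alpha + beta * n_coauthors)


for journal, n in [("RMP", 0), ("RMP", 150), ("PRE", 0), ("PRE", 100)]:
    print(journal, n, round(predicted_peak_delay(journal, n), 2))

# Ratio of the predicted peak delay at 50 versus 0 previous coauthors in PR.
ratio_pr = predicted_peak_delay("PR", 50) / predicted_peak_delay("PR", 0)
print(round(ratio_pr, 2))
```

Under this assumption the intercepts give about 7.3 years for RMP and about 2.5 years for PRE at zero previous coauthors, and the PR ratio is about one third, in line with the values described in the text.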
3.3. Characteristic Decay Time
3.3.1. Regressions
3.3.2. Fitted parameters
Journal | si | Time | ατ | βτ |
---|---|---|---|---|
PR | NC | years | 0.714 | −0.082*** |
PR | NC | pubs | 0.859 | −0.013 |
PR | NP | years | 0.687 | −0.032*** |
PR | NP | pubs | 0.866 | −0.020* |
PRA | NC | years | 0.978 | −0.138*** |
PRA | NC | pubs | 0.833 | −0.056*** |
PRA | NP | years | 0.964 | −0.142*** |
PRA | NP | pubs | 0.827 | −0.058*** |
PRC | NC | years | 0.996 | −0.071*** |
PRC | NC | pubs | 1.017 | −0.052*** |
PRC | NP | years | 0.986 | −0.083*** |
PRC | NP | pubs | 1.013 | −0.063*** |
PRE | NC | years | 0.779 | −0.041*** |
PRE | NC | pubs | 0.808 | −0.049*** |
PRE | NP | years | 0.768 | −0.036*** |
PRE | NP | pubs | 0.789 | −0.038*** |
RMP | NC | years | 1.328 | −0.272*** |
RMP | NC | pubs | 1.281 | −0.337*** |
RMP | NP | years | 1.305 | −0.247*** |
RMP | NP | pubs | 1.224 | −0.281*** |
JHEP | NC | years | 0.573 | −0.061*** |
JHEP | NC | pubs | 0.512 | −0.002 |
JHEP | NP | years | 0.524 | −0.026*** |
JHEP | NP | pubs | 0.490 | 0.011 |
PR-HEP | NC | years | 0.917 | −0.159*** |
PR-HEP | NC | pubs | 1.284 | −0.054*** |
PR-HEP | NP | years | 0.902 | −0.135*** |
PR-HEP | NP | pubs | 1.269 | −0.039*** |
Phys. Lett. | NC | years | 0.846 | −0.080*** |
Phys. Lett. | NC | pubs | 0.918 | −0.123*** |
Phys. Lett. | NP | years | 0.849 | −0.070*** |
Phys. Lett. | NP | pubs | 0.916 | −0.102*** |
Nuc. Phys. | NC | years | 0.971 | −0.117*** |
Nuc. Phys. | NC | pubs | 0.997 | −0.116*** |
Nuc. Phys. | NP | years | 0.997 | −0.119*** |
Nuc. Phys. | NP | pubs | 1.012 | −0.111*** |
3.3.3. Size of the effect
We also study the size of the dependence between the decay exponents τi and the number of previous coauthors or publications. To this end, we visualize the estimated average decay parameters for the different journals in Figure 4. We focus on the description of the number of previous coauthors, Figure 4 (left), because overall both plots convey a similar message. We see that for papers with zero previous coauthors, the decay exponents are below 10 for all journals, except for RMP, which attains a decay exponent below 30. We further point out that papers in the journal JHEP have the smallest decay exponents even for up to 1,000 previous coauthors. This in turn means that decays in this journal tend to be particularly fast compared to the other journals.
3.4. Rescaling Time by Counting Publications
3.4.1. Effect of the growing scientific output
It is known that the number of papers published every year grows exponentially over time (Price, 1951). This means that in recent years there are more papers published in a given time interval than was the case longer ago. All of these new publications can potentially cite a given paper. This time dependence likely affects our regression results by confounding the respective response variable (the peak delay or τi) and predictor variable (the number of previous coauthors or publications). In the past it was suggested that the dependence of the citation rate on the publication year of a paper can be weakened by counting time in terms of the number of published papers instead of absolute time (days, weeks, years, etc.; Parolo et al., 2015). Therefore we repeat our regressions from sections 3.2 and 3.3 while measuring time on this alternative timescale. Thereby we assess whether such a bias from the publication year of a paper is present in the relations that we found.
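One way to express elapsed time on this alternative timescale is to count how many papers were published between two points in time. The sketch below assumes a sorted list of publication times for one journal. The function name and the toy years are hypothetical, and the counting convention (papers published after the start and up to the end) is an assumption.

```python
from bisect import bisect_right


def elapsed_publications(publication_times, start, end):
    """Number of papers published in the half-open interval (start, end].

    publication_times must be sorted in ascending order.
    """
    return bisect_right(publication_times, end) - bisect_right(publication_times, start)


# Toy publication years of all papers in one journal (hypothetical).
times = [1990, 1995, 1998, 2000, 2000, 2001, 2003]

# A paper published in 1995 and cited in 2000: three papers appeared in between.
print(elapsed_publications(times, 1995, 2000))
```

Because later years contain more papers, the same number of calendar years corresponds to a larger elapsed time on this scale for recent papers, which is what weakens the dependence on the publication year.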
3.4.2. Results for the alternative timescale
The fitted parameters are listed in the pubs rows in Table 2 for the peak-delay models and in Table 3 for the decay models. They remain smaller than 0, except for three journals: PRA, PRC, and JHEP. For PRA and PRC the fitted parameters β are positive for the peak-delay models with the number of previous coauthors as predictor. However, neither of these parameters is significantly different from 0. For JHEP the fitted parameter βτ is positive for the decay model with the number of previous publications as predictor. However, this parameter is also not significantly different from 0. Only one significantly positive parameter occurs in the whole study, namely for PRA with the number of previous publications as predictor. The fitted parameters for all other journals are either negative or insignificantly different from zero, as was the case when measuring time in years. This means that, also according to the alternative timescale, for most journals the citation rate peak is reached faster for papers by authors with more previous coauthors or publications. Accordingly, the decay becomes steeper for papers by such authors.
4. CONCLUSIONS
In this paper, we address the question of how the attention towards an academic publication is accumulated over time, depending on the social relations of its authors, as expressed in the coauthorship network. For example, does the attention mostly occur in an early phase right after publication? Or is it rather spread uniformly over time? Or might it even happen only after a long time has passed since publication? To obtain a tractable, objective characterization of attention, we proxy attention by the citation rate of a paper (i.e., the number of new citations obtained in a particular time interval). We argue that, in order for a citation to occur, the authors of the citing paper have to be aware of the cited paper.
To study the time when this attention occurs, we compute the change in the number of citations over a time interval (i.e., the citation rate). It is known that the citation rates of most papers have two characteristic phases over time, namely an increasing phase followed by a decay phase. We found that the first phase tends to get shorter and the decay in the second phase tends to get faster for papers written by authors who have many previous coauthors. We also found that for some journals the time to the peak citation rate is almost halved within the first 100 previous coauthors, while for other journals it stays almost unchanged. Such a difference is also present in the decay exponents for different journals.
In terms of attention, our findings mean that papers written by authors with more previous coauthors attract attention faster, but are then also forgotten sooner. We also found this effect when measuring the number of previous papers of the authors instead of the number of previous coauthors. Furthermore, this effect also persisted when we controlled for the time when a paper was published. But most importantly, we found this effect in nine journals, based on hundreds of thousands of authors and papers and far more than a million citations. A study on such a large scale is a strong sign that we have uncovered a general trend that is not limited to the analyzed data sets.
4.1. A Speculative Explanation
Which mechanisms could be responsible for this? One way authors learn about the papers they cite is through communication with other scientists. Hence, authors can use their (few or many) social contacts, proxied by coauthors, to “advertise” a paper. When a new paper is published, the authors “advertise” it to the scientific community by presenting it at conferences and seminars, by sharing it on social media, etc. This behavior is limited to a finite time period, after which the authors stop actively promoting the publication. Our findings indicate that authors with many previous coauthors or papers tend to do this promotion within a short period of time after publication. However, this explanation is merely speculative at this point.
4.2. Regressions Not Suitable for Predictions
The regressions we performed have low predictive power, as indicated by extremely small coefficients of determination, R². For instance, for some regressions R² is as low as 0.001, meaning that only 0.1% of the variance in the dependent variable is explained. However, while our regression models are not useful for prediction, the inferred relations are statistically significant. In particular, our regressions show that the time to the peak citation rate and the subsequent decay are not independent of the authors' previous coauthors and publications.
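That a relation can be statistically significant while explaining almost none of the variance can be illustrated with a small simulation. The sketch below is not one of the regression models used in this study: it fits an ordinary least-squares line to purely synthetic data, and the sample size, slope, and noise level are arbitrary assumptions chosen so that the explained share of variance is, by construction, on the order of 0.2%.

```python
import math
import random


def ols_summary(x, y):
    """Fit y = a + b*x by least squares; return (slope, R^2, t-statistic)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    r2 = 1.0 - ss_res / syy
    # Standard error of the slope under the usual OLS assumptions.
    se_slope = math.sqrt(ss_res / (n - 2) / sxx)
    return slope, r2, slope / se_slope


# Synthetic data (assumption): a weak negative trend buried in strong noise.
random.seed(0)
n = 100_000
x = [random.uniform(0, 100) for _ in range(n)]
y = [5.0 - 0.005 * xi + random.gauss(0, 3) for xi in x]

slope, r2, t_stat = ols_summary(x, y)
print(f"slope={slope:.4f}  R2={r2:.4f}  t={t_stat:.1f}")
```

With this many observations the negative slope is estimated precisely enough to be clearly distinguishable from zero, even though knowing x helps very little in predicting an individual y. This is the same distinction we draw between inference and prediction.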
4.3. No Causal Relations Studied
In our study, we focus on the detection of the dependence between citation rate and social relations of the authors. However, we do not (yet) aim to understand the actual mechanisms behind it. In other words, we study associations between measures of social relations and citation histories, but we do not aim to detect causal relationships between them. For example, our study does not guarantee that a paper gets scientific attention faster simply by replacing its authors by scientists with larger publication or coauthor counts. Instead, we observe such faster attention among papers whose authors were not actively chosen based on their past social relations.
4.4. Future Work
In the future, we also intend to study causal relationships. Such a study will allow us to determine why authors with many previous publications or coauthors tend to write papers that receive scientific attention faster. To this end, we can use generative modeling to learn more about these underlying mechanisms. For instance, hypotheses can be formulated and tested using the framework of coupled growth models presented in Nanumyan et al. (2020).
To conclude, we find that a paper receives attention from the scientific community faster, the more coauthors its authors had prior to its publication. But we also find that such a paper is forgotten sooner. Our findings highlight that the citations of a paper can have substantially different dynamics depending on the social relations of the authors. Furthermore, our approach illustrates how such coupled dynamics can be studied by representing scientific collaborations in a multilayer network.
AUTHOR CONTRIBUTIONS
Christian Zingg: Conceptualization, Data curation, Formal analysis, Software, Validation, Visualization, Writing—original draft, Writing—review & editing. Vahan Nanumyan: Conceptualization, Data curation, Formal analysis, Software, Validation, Visualization, Writing—original draft, Writing—review & editing. Frank Schweitzer: Conceptualization, Formal analysis, Project administration, Supervision, Visualization, Writing—original draft, Writing—review & editing.
COMPETING INTERESTS
The authors have no competing interests.
FUNDING INFORMATION
No funding has been received for this research.
DATA AVAILABILITY
We use two large bibliographic databases, APS and INSPIREHEP. Access to the APS database can be requested for research purposes at https://journals.aps.org/datasets. Access to the INSPIREHEP database is possible either as a download or through an API as explained on its website https://inspirehep.net/. For this paper, we downloaded the INSPIREHEP database.
ACKNOWLEDGMENTS
The authors would like to thank all reviewers for their comments and Luca Verginer and Giacomo Vaccario for discussions concerning the negative binomial regression models.
REFERENCES
Author notes
Handling Editor: Ludo Waltman