## Abstract

Diversity in human capital is widely seen as critical to creating holistic and high-quality research, especially in areas that engage with diverse cultures, environments, and challenges. Quantification of diverse academic collaborations and their effect on research quality is lacking, especially at international scale and across different domains. Here, we present the first effort to measure the impact of geographic diversity in coauthorships on the citation of their papers across different academic domains. Our results unequivocally show that geographic coauthor diversity improves paper citation, but very long distance collaborations have variable impact. We also discover “well-trodden” collaboration circles that yield much less impact than similar travel distances. These relationships are observed to exist across different subject areas, but with varying strengths. These findings can help academics identify new opportunities from a diversity perspective, as well as inform funders on areas that require additional mobility support.

## 1. INTRODUCTION

International collaboration is a key part of scientific research, with the exchange of ideas from diverse sources leading to numerous breakthroughs. A recent paper by Sugimoto, Robinson-Garcia et al. (2017) showed that researchers with affiliations to more than one country during their career, so-called “mobile” researchers, had a significant boost in citations over their non-mobile colleagues. Indeed, several well-established international initiatives (Marie Curie Staff Exchange, German DAAD, Royal Society International Exchange) fund researcher mobility between countries and across disciplines. An important facilitator in long-distance collaboration is the ease of air transportation between locations.

### 1.1. Relevant Research

Collaboration in science is not new. Despite being often seen as a contemporary practice, research collaboration has always existed—although many collaborators were invisible from the authors’ lists (Shapin, 1989). Already in the early 20th century, a scientist like Einstein—who is wrongly seen as a “lone genius”—was collaborating with colleagues on many aspects of his research (Janssen & Renn, 2015; Pyenson, 1985). The first discipline to exhibit collaboration in the form of coauthorship was chemistry: 34% of papers in the field had more than one author, compared with 10% in physics and less than 1% in mathematics (Gingras, 2010).

After the Second World War, the large influx of research funding and the era of “big science” led to an important rise in collaboration activities and, as a consequence, in multi-authored papers (Wuchty, Jones, & Uzzi, 2007). Since the beginning of the 1950s, most papers in the natural and medical sciences have more than one author (Cronin, Shaw, & La Barre, 2003; Franceschet & Costantini, 2010; Galison, 2003; Persson, Glänzel, & Danell, 2004; Wuchty et al., 2007), while single authorship remained the norm in social sciences and humanities until the early 2000s (Larivière, Gingras et al., 2015). In the latter group of disciplines, social sciences and arts and humanities have distinct practices: While the majority of papers in social sciences are the result of collaboration, single authorship remains the norm in arts and humanities (Larivière, Gingras, & Archambault, 2006). At the other end of the spectrum, fields such as high-energy physics have author lists that have gone beyond 5,000 names, a phenomenon named *hyperauthorship* (Cronin, 2005). Such a decline in single authorship had long been predicted (Price, 1986), and shown empirically in the work of Harriet Zuckerman (1967). Indeed, focusing on Nobel Laureates between 1900 and 1959, she shows that after 1920, most of the laureates’ papers are the result of collaboration. The rise in collaborative activities can also be linked with an increase in international collaboration (Sonnenwald, 2007; Wagner & Leydesdorff, 2005), which is also observed in all fields but the arts and humanities (Larivière et al., 2006). Such growth is observed both in terms of the share of papers that are in international collaboration and the number of countries involved (Larivière et al., 2015).

#### 1.1.1. Multifaceted nature of collaboration

Several factors can be associated with this rise in researchers’ collaborative activities. The first factor is the ease with which technology has allowed researchers to communicate and conduct research (Katz & Martin, 1997). Since the advent of the digital age, technologies, such as the Internet, email, and online communication platforms, such as Skype, Zoom, and Teams, have allowed researchers to exchange data, meet, and write papers at a distance with much more ease than what was previously possible. Despite these technologies, previous research shows that there remains an effect of distance, where researchers are more likely to collaborate with colleagues that are physically closer (Abramo, D’Angelo, & Di Costa, 2009; Catalini, 2018; Gieryn, 2002; Hoekman, Frenken, & Tijssen, 2010). Another factor is its epistemic effect—that is, its effect on scientific impact (Wray, 2002). Science is increasingly complex, and larger teams are therefore necessary to tackle contemporary scientific problems. This has been shown empirically, as collaborative research is associated with higher citation rates (Franceschet & Costantini, 2010; Narin, Stevens, & Whitlow, 1991; Wuchty et al., 2007). This is specifically true for international collaboration (Glänzel, 2001). This can also be associated with infrastructure: Big science infrastructures have become so expensive that they have to be shared, often internationally. This is particularly true for smaller countries (Luukkonen, Persson, & Sivertsen, 1992). This positive relationship has been observed already in the early 20th century (Larivière et al., 2015). A third factor is policies from funders and universities. Indeed, some countries have made policies that emphasized collaboration, especially international (Abramo et al., 2009) or interdisciplinary (National Academy of Sciences, National Academy of Engineering, and Institute of Medicine, 2005). 
Such policies are based on the fact that countries’ resources are limited, and that collaboration is considered to lead to more important scientific results. A fourth factor is specialization: In a context where researchers are increasingly specialized, collaboration allows for researchers with complementary expertise to work together on a research problem (Franceschet & Costantini, 2010).

#### 1.1.2. Importance of distance and diversity

Despite the importance of digital technology in making long-distance collaboration possible, in-person collaborations are still conducted. In this context, the possibility of traveling between two cities can be hypothesized to have an effect on the likelihood of collaboration, and reduce the effect of physical distance. Ploszaj, Yan, and Börner (2020) analyzed data on flight capacity and frequency alongside data on collaboration. Using a sample of four universities in the United States, they have shown that more flights between cities and the proximity of airports to universities are linked with higher numbers of collaborations. Unsurprisingly, collaboration was higher in cases where direct flights can be obtained between the cities. Catalini, Fons-Rosen, and Gaulé (2020) also show that travel cost constitutes a friction to collaboration, and that reducing this friction leads to an increase in higher-quality projects. However, air travel is not necessarily associated with academic success. Research by Wynes, Donner et al. (2019) has shown, using a sample of researchers from the University of British Columbia (Canada), that, after controlling for age and discipline, air travel emissions were not associated with higher impact measures, although traveling was associated with higher salaries. Recent work at university level by Guo, Del Vecchio, and Pogrebna (2017) showed that the connectivity of universities via the air transport network is an important indicator of ranking growth for the universities, even after accounting for economic development.

### 1.2. Contribution

Building on these ideas, we use the air transport network to quantify the geographical diversity in paper coauthorships. The air transport network is a network of connections between cities (*nodes*) where the *edges* are flights. We use it to define measures of diversity between the researchers based in these cities, with full details provided on how we do this in Section 5. We focus on establishing a link between the geographical diversity of coauthors on a given paper and the number of citations that paper receives. As shown in Figure 1, a novelty is to develop distance and entropy measures for diversity on the coauthorship network and evaluate the variation of the Average Relative Citation (ARC) score against these.

The rest of the paper is structured as follows. In Section 2 we present the key results. In Section 3, we present the robustness of our results to potential confounding variables, such as the effect of university rankings. In Section 4 we examine the results by subject area and location, in order to examine subject and geographic specific differences. We provide details of the data and methods we use for this analysis in Section 5. We discuss implications for individual academics, universities, funders, and government policy in Section 6. In the Supplementary material, we include some additional results.

## 2. RESULTS

### 2.1. Main Discoveries

#### 2.1.1. Diverse collaborations lead to higher citations

Our first main discovery is that, for a relatively simple notion of diversity measured by the entropy of the probability of forming a collaboration, the ARC score is highly correlated with the entropy, as seen in Figure 2(a). We are aware of certain confounding variables, chiefly the potential effect that university rankings have on citations (Clauset, Arbesman, & Larremore, 2015). We show that this correlation persists even when accounting for this effect. We also reveal some popular “well-trodden” two-, three-, and four-way collaboration paths in Figures 2(b)–(c).

#### 2.1.2. Well-trodden paths and extreme distances lead to relatively lower citations

Our second main discovery is that the aforementioned “well-trodden” paths yield relatively lower citations than other collaborations over similar distances, and that extremely long-distance collaborations have variable or reduced citations. Using the air transport network distance metric, we show in Figure 3(a) how diversity initially benefits collaboration until distance takes its toll and impedes frequent exchange of ideas. Local spikes in the number of collaborations exist in the general data set, specific academic domains, and specific countries. These spikes correspond to well-trodden collaboration paths (see Figures 2(b)–(c)) and, as highlighted by a black box in Figure 3, also correspond to local “dips” in ARC scores. That is to say, well-trodden collaboration paths do not yield as many citations as collaborations over similar distances between other locations. We observe this pattern across all domains and countries, but note exaggerated effects in certain cases (e.g., long-distance collaboration is more detrimental in clinical medicine, possibly due to the practical and timely nature of its practice).

#### 2.1.3. A north-south divide exists in collaborative research

Finally, our third main discovery is that a divide exists in the composition of collaborative research, with most collaborations occurring between researchers located in the Global North. When looking at pairs of collaborations (where a collaboration between more than two authors contains multiple pairs), we see from Figure 1(b) that 94% of collaboration pairs are between researchers in the Global North.
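The pair counting described above can be sketched as follows. This is a minimal illustration only: the city names, the region set, and the function name `share_of_pairs_within` are hypothetical and not taken from the data set. A paper with $n$ coauthor locations contributes $n(n-1)/2$ pairs.

```python
from itertools import combinations


def share_of_pairs_within(papers, region_cities):
    """Fraction of coauthor pairs whose two members are both in `region_cities`.

    `papers` is a list of coauthor-location lists, one list per paper.
    Each paper with n locations contributes n * (n - 1) / 2 pairs.
    """
    pairs = [pair for cities in papers for pair in combinations(cities, 2)]
    inside = sum(1 for a, b in pairs if a in region_cities and b in region_cities)
    return inside / len(pairs)


# Hypothetical toy input: one three-way and one two-way collaboration.
toy_papers = [["London", "Boston", "Nairobi"], ["London", "Boston"]]
toy_region = {"London", "Boston"}
toy_share = share_of_pairs_within(toy_papers, toy_region)
```

In this toy input, the three-way collaboration yields three pairs and the two-way collaboration yields one, so two of the four pairs fall entirely within the region.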

### 2.2. Detailed Analysis of Effect of Distance, Diversity, and University Rank on ARC Scores

In Figure 1(e), we briefly introduce four important measures whose relationship with ARC scores we are interested in investigating. We give a more detailed explanation of these here, with the full derivation of the measures presented in Section 5. We also identify some key patterns we see in the relationships with ARC score, which can be seen in Figure 4.

- Collaboration distance: average weighted airport network distance. This is a measure of the average distance between collaborators on a given paper. The distance is the weighted network distance on the flight network. Based on the work of Gastner and Newman (2006), an edge on the network is assigned a weight

  $$\text{effective length of edge } (i, j) = \lambda d_{ij} + 1 - \lambda, \tag{1}$$

  where $d_{ij}$ is the Euclidean distance between nodes $i$ and $j$, and $\lambda$ is a parameter that controls the importance of physical distance against graph distance. From Figure 4(a), we see a positive correlation between citations and this measure of distance. However, past a certain point, we see that the number of citations decreases. We can conjecture that the large average distance could mean that these coauthors are in remote areas, geographically and in terms of transport links.
- Collaboration diversity: weighted airport network distance entropy. This measure also looks at the weighted network distance between coauthors. It uses a more direct measure of diversity: the entropy of these distances. In Figure 4(b) we see that as this measure of diversity increases, the number of citations also increases consistently, showing a clear trend between diversity and citations.
- Alternative collaboration diversity: weighted entropy of coauthor location. In this alternative measure of diversity, we consider the entropy of the geographic locations of the coauthors. In this case a weighted entropy measure is used (not to be confused with the weighted distances introduced previously). The “weight” in this case incorporates the centrality of nodes on the flight network, as well as university rankings. Again we see in Figure 4(c) that as this measure of diversity increases, the number of citations also increases consistently, showing a clear trend between diversity and citations.
- Important confounding factor: average university rank weight. This measure weights cities by the average world ranking of the universities located within a certain radius. This is important to consider, as the reputation of a university can have a significant effect on the number of citations received by papers produced by its researchers (Clauset et al., 2015). In Figure 4(d) we see a strong correlation between the university rank weights and number of citations. This effect seems to flatten out somewhat as the average weight increases. This could be indicating that the effect of university rankings is less important for the top universities. However, it could also come from our specific choice of the construction of the weights. The exact nature of this relationship is outside the scope of this work.
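To make the collaboration distance concrete, the sketch below applies the effective edge length of Eq. 1 to a toy network and takes shortest paths under those lengths. It is an illustration under stated assumptions, not the implementation used for the results: the toy edges, the use of kilometres for $d_{ij}$, the default value of `lam`, the simple averaging over coauthor pairs, and all function names are assumptions introduced here.

```python
import heapq
import math
from itertools import combinations


def effective_length(d_ij, lam):
    """Eq. 1: effective length of a flight link, lam * d_ij + 1 - lam."""
    return lam * d_ij + 1 - lam


def weighted_network_distance(edges, source, target, lam=1 / 10_000):
    """Shortest-path length between two cities under the effective edge lengths.

    `edges` maps an undirected pair (city_a, city_b) to its distance d_ij
    (assumed here to be in kilometres). Returns math.inf if no route exists.
    """
    graph = {}
    for (a, b), d in edges.items():
        w = effective_length(d, lam)
        graph.setdefault(a, []).append((b, w))
        graph.setdefault(b, []).append((a, w))

    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:  # Dijkstra's algorithm with a binary heap
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, math.inf):
            continue  # stale queue entry
        for nxt, w in graph.get(node, []):
            cand = dist + w
            if cand < best.get(nxt, math.inf):
                best[nxt] = cand
                heapq.heappush(queue, (cand, nxt))
    return math.inf


def average_collaboration_distance(edges, cities, lam=1 / 10_000):
    """Mean weighted network distance over all pairs of coauthor cities (assumed averaging)."""
    pairs = list(combinations(cities, 2))
    return sum(weighted_network_distance(edges, a, b, lam) for a, b in pairs) / len(pairs)


# Hypothetical toy network; the distances are illustrative, not real data.
toy_edges = {("A", "B"): 1000.0, ("B", "C"): 2000.0, ("A", "C"): 8000.0}
```

With a small `lam`, each flight costs roughly one unit plus a small distance term, so the direct link from A to C is preferred over the two-hop route. With `lam=1` only physical distance counts, and the route through B becomes shorter.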

In each of the plots comprising Figure 4 the data are binned. In each case, we also plot the number of papers that are in each bin. In addition to the main results already presented, we see that the variability of the ARC score increases for large values of each of these measures. We can see that these cases correspond to a very small number of papers, so this is not unexpected.
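The binning step can be sketched as below. This is a minimal example assuming equal-width bins; the bin width and the function name `bin_mean_arc` are choices made for illustration, as the binning scheme is not specified in this section. Returning the paper count alongside each mean makes it easy to see which bins rest on very few papers.

```python
import math
from statistics import mean


def bin_mean_arc(values, arc_scores, bin_width):
    """Group papers into equal-width bins of a measure.

    Returns a list of (bin_index, mean ARC score, number of papers),
    where bin_index k covers the interval [k * bin_width, (k + 1) * bin_width).
    """
    bins = {}
    for value, arc in zip(values, arc_scores):
        index = math.floor(value / bin_width)
        bins.setdefault(index, []).append(arc)
    return [(index, mean(scores), len(scores)) for index, scores in sorted(bins.items())]


# Hypothetical toy input: four papers with a distance measure and an ARC score.
toy_bins = bin_mean_arc([0.1, 0.2, 0.7, 1.9], [1.0, 3.0, 2.0, 4.0], bin_width=0.5)
```

Bins holding a single paper, as in the last two bins of this toy input, report that paper's score as the bin mean, which is one way the increased variability at extreme values can arise.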

### 2.3. Robustness of Results to Parameter Choices and Confounding Variables

There are two key situations in which we check the robustness of the results obtained. The first of these concerns the key configuration parameter *λ*, which controls the balance between Euclidean distance and flight hop distance in Eq. 1. In our case, we choose a value of *λ* = $\frac{1}{10{,}000}$, as this gives some interpretability, which we lose for larger choices, as detailed in Section 5. However, the results we observe can also be seen for different choices of *λ*. One exception to this is that for much larger choices, such as *λ* = $\frac{1}{5}$, the weighted distances are completely dominated by the Euclidean distances. In this case we lose the interpretation of “well-trodden paths.” Further discussion is presented in the Supplementary material.
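A short worked example may help to show why a large *λ* lets the Euclidean term dominate. It assumes, purely for illustration, that $d_{ij}$ is measured in kilometres and considers a single direct flight of $d_{ij} = 5{,}000$; neither the units nor this distance are taken from the data.

```latex
\begin{align*}
\lambda = \tfrac{1}{10{,}000}: \quad \lambda d_{ij} + 1 - \lambda &= 0.5 + 0.9999 = 1.4999, \\
\lambda = \tfrac{1}{5}: \quad \lambda d_{ij} + 1 - \lambda &= 1000 + 0.8 = 1000.8 .
\end{align*}
```

Under this assumption, the small value keeps the per-flight term ($1 - \lambda \approx 1$) and the distance term on a comparable scale, whereas for the larger value the per-flight term contributes less than 0.1% of the effective length.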

Second, as noted, it is well known that there is a strong link between university rankings and paper citations (Clauset et al., 2015). The relationship of interest in our case is therefore the effect that our distance and diversity measures have on ARC score, specifically not occurring via university rankings (as this is a relationship that is already well understood). To disentangle these effects, we explicitly account for the confounding effects of university rankings. We see that the patterns already observed still persist after doing so. In Section 3 we present the full analysis controlling for this effect. In particular, the results displayed in Tables 1 and 2 give evidence to support our claims.

**Table 1.**

| Method | $\hat{x}^*$ | $\hat{b}_1$ | p-value | $\hat{b}_2$ | p-value |
|---|---|---|---|---|---|
| Before adjusting | 1.60 | 0.25 | 0.00 | −0.08 | 0.00 |
| After adjusting | 1.65 | 0.24 | 0.00 | −0.04 | 0.00 |

## 3. STATISTICAL ANALYSIS OF RESULTS

So far we have presented results that have been largely qualitative in nature. We have observed two distinct trends in the ARC score with increasing average distance and entropy of distance between coauthors. However, we now wish to quantify these results. Motivated by the patterns of the points in Figure 4(a), we first define a model to check for the existence, location, and significance of the “peak” we observe in the relationship between average weighted network distance and ARC score.

### 3.1. Average Weighted Airport Network Distance

We fit a continuous piecewise linear model of the form

$$f(x) = \begin{cases} a_1 + b_1 x, & x \le x^*, \\ a_2 + b_2 x, & x > x^*, \end{cases} \tag{2}$$

where $a_1$, $b_1$, $a_2$, $b_2$ are such that $f(x)$ is continuous at $x^*$. The model is fitted for a range of values $x^*$, and is optimized to find the value of $x^*$ for which the residual sum of squares is lowest. The optimal value $\hat{x}^*$ gives the estimated location of the peak. We can test whether a statistically significant peak exists by checking that the corresponding gradients $\hat{b}_1$, $\hat{b}_2$ are significantly positive and negative, respectively$^{1}$. In Figure 5 we see an example of what this fit looks like. Our analysis confirms what we intuitively saw in Figure 4(a), with a statistically significant increase and decrease in ARC before and after the peak$^{2}$. We emphasize that our goal here is not to accurately model the relationship that we observe, but merely to confirm the existence of this peaked shape that we see in the data. For this purpose, a simple piecewise linear model works well. More complicated models may capture the relationship better, but that is outside the scope of this work.
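The breakpoint search described above can be sketched in a few lines. This is a minimal illustration, not the fitting code used for the results: it assumes a grid of candidate breakpoints lying strictly inside the range of the data, enforces continuity by writing the model as a level at the kink plus one slope on each side, and omits the significance tests on the gradients. All function names and the toy data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_at_breakpoint(xs, ys, x_star):
    """Least-squares fit of a continuous two-segment line with its kink at x_star.

    Returns (residual sum of squares, slope below x_star, slope above x_star).
    """
    # Basis: level at the kink, distance below the kink, distance above the kink.
    rows = [(1.0, min(x - x_star, 0.0), max(x - x_star, 0.0)) for x in xs]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    level, b1, b2 = solve_linear_system(xtx, xty)
    rss = sum((y - (level + b1 * r[1] + b2 * r[2])) ** 2 for r, y in zip(rows, ys))
    return rss, b1, b2


def find_peak(xs, ys, candidates):
    """Return (x_star_hat, b1_hat, b2_hat) for the candidate with the lowest RSS."""
    fits = [fit_at_breakpoint(xs, ys, x_star) + (x_star,) for x_star in candidates]
    rss, b1, b2, x_star = min(fits, key=lambda fit: fit[0])
    return x_star, b1, b2


# Hypothetical toy data: slope 0.5 up to x = 2, then slope -0.25.
toy_x = [i * 0.5 for i in range(9)]
toy_y = [1 + 0.5 * x if x <= 2 else 2 - 0.25 * (x - 2) for x in toy_x]
toy_peak = find_peak(toy_x, toy_y, candidates=[1.0, 1.5, 2.0, 2.5, 3.0])
```

Candidates at or beyond the smallest or largest observed value leave one segment without data and make the system singular, so the grid should be restricted to interior points.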

This does not yet tell the full story. As before, we can test for the pattern detailed above after removing the effect of university rankings, as mentioned in Section 2.3. The effect that they have on citations received by papers is already well studied (Clauset et al., 2015). We can see this clearly if we plot the (binned) university rank weights (as defined in Eq. 6) against the ARC scores. We do this in Figure 6 and see an almost linear relationship.

Disentangling how much of the relationship between average weighted distance and ARC score occurs via university ranks is a potentially difficult task, and we do not focus on that in our work. Instead, we take a conservative approach, removing as much of the effect of university ranks as possible by directly fitting ARC score against average university rank weights, and removing that effect before fitting the piecewise linear model of ARC score against average weighted distance. Specifically, letting $y_{\mathrm{ARC}}$ be the ARC score for each paper, $d_{\mathrm{AV}}$ be the average weighted airport network distance between the coauthors, and $w_{\mathrm{AV}}$ the average university rank weights of the coauthor locations, we first estimate $\hat{y}_{\mathrm{ARC}}$ from $y_{\mathrm{ARC}} \sim w_{\mathrm{AV}}$. Then we fit our piecewise model $y_{\mathrm{ARC}} - \hat{y}_{\mathrm{ARC}} \sim f(d_{\mathrm{AV}})$, where $f(x)$ is defined as in Eq. 2.

We compare the unadjusted fit (as seen in Figure 5) with the corresponding fit having adjusted for the effect of the university ranks in this way, with the results given in Table 1. We see that the observed increase stays almost constant, as does the peak location. However, the decrease that we observe seems to be at least partly tied to the university ranks.

Further analysis is presented in the Supplementary material, where we use stratification to support the results presented here.

### 3.2. Weighted Airport Network Distance Entropy

We now investigate the relationship between weighted airport network distance entropy and ARC score. In Figure 4(b) we see that the ARC score increases as the entropy increases. To test whether this increase is significant, the first step is to fit a linear model of ARC score against weighted distance entropy, having accounted for university rankings. Specifically, letting $y_{\mathrm{ARC}}$ be the ARC score for each paper, $d_{\mathrm{ENT}}$ be the weighted airport network distance entropy of the coauthors, and $w_{\mathrm{AV}}$ the average university rank weights of the coauthor locations, we first estimate $\hat{y}_{\mathrm{ARC}}$ from $y_{\mathrm{ARC}} \sim w_{\mathrm{AV}}$. Then we fit the simple model $y_{\mathrm{ARC}} - \hat{y}_{\mathrm{ARC}} \sim d_{\mathrm{ENT}}$. Again, we emphasize that our goal here is not to accurately model the relationship that we observe, and that other models may provide a better fit than the linear model that we use. However, our goal is simply to confirm the existence of a statistically significant trend.
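The two-step adjustment can be sketched as follows. This is a minimal example assuming ordinary least squares with an intercept at both stages; the function names and toy numbers are hypothetical, and the significance test on the final slope is omitted.

```python
def ols_line(xs, ys):
    """Intercept and slope of the least-squares line y ~ x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def remove_rank_effect(arc, rank_weight):
    """Stage 1: residuals of the ARC score after fitting it against rank weight."""
    intercept, slope = ols_line(rank_weight, arc)
    return [y - (intercept + slope * w) for y, w in zip(arc, rank_weight)]


def adjusted_slope(arc, rank_weight, measure):
    """Stage 2: slope of the stage 1 residuals against the diversity measure."""
    residuals = remove_rank_effect(arc, rank_weight)
    return ols_line(measure, residuals)[1]


# Hypothetical toy data in which ARC = 2 * rank weight + 0.5 * entropy.
toy_w = [1.0, 1.0, 2.0, 2.0]
toy_entropy = [0.0, 1.0, 0.0, 1.0]
toy_arc = [2 * w + 0.5 * e for w, e in zip(toy_w, toy_entropy)]
toy_slope = adjusted_slope(toy_arc, toy_w, toy_entropy)
```

In the toy data the rank weight and the entropy are uncorrelated, so the second stage recovers the entropy coefficient exactly. When the two are correlated, fitting the rank weight first attributes their shared variation to the rank weight, which is what makes the approach conservative.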

In Table 2 we see the estimated parameters from fitting the above model, and from fitting the model without adjusting for university rankings. In each case, we see a significant increase in ARC score as distance entropy increases. In Figure 7 we see the fit of the model, having accounted for university rankings. A linear model does not capture the behavior of the data as well as the piecewise linear model fit for the average weighted distance metric. In fact, it looks as though the ARC scores initially decrease as the entropy increases. The reason for this is that we fit the model with the full data, but plot the binned data. As we can see from the numbers of papers in each bin, most of the bins have very few values, and the model fit is dominated by the two large spikes. Thus, in Figure 7, the higher ARC scores for very small values of the distance entropy are somewhat misleading, as are the corresponding results for very large values of the distance entropy.

## 4. COMPARISONS

Having defined methods to analyze our results quantitatively, and to control for the effect of university rankings, we now break the overall results down by academic field and coauthor location, in order to gain a better insight into the trends that are occurring.

### 4.1. Results by Academic Field

#### 4.1.1. Average weighted airport network distance

First, we compare different fields based on the location of the peak in the relationship between average weighted network distance and ARC score. We also compare the gradients before and after, to see how prominent the peak is. In Table 3 we see the results. There are several interesting features we notice here. Firstly, we see that for all the fields but one, there is a significant positive relationship until a point. Secondly, we notice that we can broadly split the different fields into three different categories, based on the patterns exhibited:

- Fields such as Social Sciences, Clinical Medicine, and Biomedical Research, which exhibit the peaked form described earlier, with significant increases and decreases.
- Fields such as Physics, Engineering and Technology, and Psychology, which exhibit a significant initial positive relationship, but subsequently plateau, with no significant positive or negative relationship.
- Mathematics, which does not seem to exhibit any significant relationship.

**Table 3.**

| Field | $\hat{x}^*$ | $\hat{b}_1$ | p-value | $\hat{b}_2$ | p-value |
|---|---|---|---|---|---|
| Social Sciences | 1.37 | 0.38 | 0.00 | −0.10 | 0.01 |
| Engineering and Technology | 1.43 | 0.26 | 0.00 | 0.01 | 0.64 |
| Professional Fields | 1.46 | 0.46 | 0.00 | −0.15 | 0.00 |
| Clinical Medicine | 1.65 | 0.34 | 0.00 | −0.10 | 0.00 |
| Physics | 1.65 | 0.21 | 0.00 | −0.01 | 0.79 |
| Health | 1.67 | 0.27 | 0.00 | −0.07 | 0.33 |
| Biomedical Research | 1.69 | 0.25 | 0.00 | −0.06 | 0.00 |
| Chemistry | 1.76 | 0.11 | 0.00 | 0.04 | 0.13 |
| Earth and Space | 1.86 | 0.25 | 0.00 | −0.09 | 0.00 |
| Psychology | 1.90 | 0.22 | 0.00 | −0.01 | 0.75 |
| Biology | 2.72 | 0.07 | 0.00 | −0.01 | 0.54 |
| Mathematics | 3.96 | 0.01 | 0.65 | 0.17 | 0.19 |

Last, if we examine the point at which there is no longer a positive relationship (either the peak or the start of the plateau), then we see differences between the fields. In Table 3 we have sorted the fields by the estimate of $\hat{x}^*$, and we see that for fields such as Biology and Psychology, increasing the average weighted network distance has a positive effect on ARC scores for much longer than for fields such as Social Sciences and Engineering and Technology.

#### 4.1.2. Weighted airport network distance entropy

We can perform the same comparison for the weighted distance entropy measure. In this case, we rank the subjects based on their estimated coefficients. We see from Table 4 that while the positive relationship between entropy and ARC score exists for every subject considered, the strength of that relationship varies greatly. Mathematics and Chemistry exhibit a much weaker relationship than the other subjects, while Social Sciences and Clinical Medicine exhibit the strongest relationship. An important factor to consider here is the number of coauthors that papers in each field generally have. This measure of diversity only makes sense for papers with more than two coauthors, but we know that medical papers can sometimes have very large numbers of authors, while mathematics papers often have only a handful. It may be valuable to examine further how this factor impacts the differing relationships we see here.

**Table 4.**

| Field | $\hat{b}$ | p-value |
|---|---|---|
| Mathematics | 0.15 | 0.00 |
| Chemistry | 0.18 | 0.00 |
| Psychology | 0.26 | 0.00 |
| Professional Fields | 0.28 | 0.00 |
| Biology | 0.29 | 0.00 |
| Physics | 0.29 | 0.00 |
| Engineering and Technology | 0.30 | 0.00 |
| Health | 0.30 | 0.00 |
| Earth and Space | 0.35 | 0.00 |
| Biomedical Research | 0.38 | 0.00 |
| Social Sciences | 0.43 | 0.00 |
| Clinical Medicine | 0.56 | 0.00 |

### 4.2. Results by City

Second, we compare the collaborations involving certain cities to investigate differences in the collaboration patterns of their researchers. In Figure 8(a) we see the plot of average weighted network distance against ARC score for Beijing, with Figures 8(b) and 8(c) showing the results for Boston and London respectively. The three patterns we can see are noticeably different. For Beijing and London, there are clear peaks, but the peak for London occurs at less than half that of Beijing. Meanwhile, for Boston, it appears that there is no peak at all. A closer examination reveals that while there does still appear to be a peaked relationship, some collaborations only a small distance away from Boston but with very high ARC scores are distorting this result.

This is certainly interesting in terms of understanding how these cities collaborate with others. However, a slight complication arises when comparing cities in this way. Although we can see three distinct patterns here, it is not yet clear how much of these differences arises from fundamentally different behaviors of the researchers in these cities, and how much is simply due to the geographies of the cities. For example, we might expect that the most productive collaborations for researchers from Beijing are those with large American centers of research, which would generally be a weighted network distance of 2–3 away. Similarly, for researchers from London, the weighted network distances to major European and American centers of research will be roughly between 1.2 and 1.9. Finally, the highly productive collaborations that researchers from Boston have are often with nearby Cambridge (home to Harvard and MIT), or other East Coast cities with large research institutions.

To try to reduce these geographical effects, we can compare cities where we imagine that the geographical effects would be similar. We see some of these comparisons in Table 5. From this, we can see that even between cities with similar geographical effects, there can be a significant difference in the observed patterns, especially with regard to the magnitude of the initial positive effect that increasing diversity has.

**Table 5.**

| City | $\hat{x}^*$ | $\hat{b}_1$ | p-value | $\hat{b}_2$ | p-value |
|---|---|---|---|---|---|
| Boston | 3.32 | −0.13 | 0.02 | −0.50 | 0.11 |
| Cambridge (USA) | 0.84 | 0.43 | 0.20 | −0.23 | 0.03 |
| New York | 0.90 | 0.74 | 0.00 | −0.41 | 0.00 |
| Berkeley | 1.30 | 0.68 | 0.00 | −0.20 | 0.10 |
| London | 1.40 | 0.58 | 0.00 | −0.28 | 0.00 |
| Oxford | 1.62 | 0.31 | 0.02 | −0.20 | 0.15 |
| Edinburgh | 1.98 | 0.62 | 0.00 | −0.52 | 0.00 |
| Dublin | 1.43 | 0.82 | 0.02 | −0.19 | 0.20 |
| Beijing | 2.96 | 0.21 | 0.00 | −0.18 | 0.57 |
| Hong Kong | 2.42 | 0.27 | 0.02 | −0.24 | 0.33 |

### 4.3. Further Work

In this work, we focus on testing whether there is a significant increase in the ARC score as the entropy measures increase, rather than measuring this effect. Similarly, for the average weighted airport network distance, we look for the existence and location of a peak using a piecewise linear model, without considering how well this model fits the data. While in each case, these models are suitable for our purposes, further work would be needed to more accurately model the relationships we observe.

Thus far, we have also been using fairly simple models to control for the effect of university rankings. To better understand the results, we may want to fit more complicated models by accounting for possible nonlinear effects of the variables involved. We may also want to investigate other factors that may affect ARC scores apart from university ranks, such as economic development.

Finally, our work has been looking at a specific year of data. An interesting extension would be to investigate if the relationships we have found differ for different years, and if so try to measure how the changing pattern of airline travel corresponds to the change in collaboration patterns.

## 5. METHODS

Here we detail the data and methods that we use in our analysis. In particular, in Section 5.1 we describe the data and in Section 5.2 we detail how the measures of diversity that we use are constructed.

### 5.1. Data

#### 5.1.1. Coauthorship network

This network consists of collaborations between different coauthors, where for each collaboration we have the location of each coauthor, an identifier for the paper, and a citation score for the paper. The citation score relates to the number of citations the paper received, normalized based on the subject area. This is the ARC score. The data consist of 352,057 papers published in 2005, with coauthors from 21,131 different locations. The locations of the coauthors are given as cities rather than universities. This means that we need to construct a mapping from universities to cities in order to incorporate university rankings into our analysis, as we shall describe.

#### 5.1.2. Air transport network

We take a snapshot of the air transport network in 2005 as a representative network showing major intercity connections. While we could have used a year-by-year analysis, we felt this would be overanalyzing the problem, as collaborations are built up over a long time period and synchronicity with a particular year is unnecessary. The data consist of flight volumes between airports, with 9,192 airports and 33,075 flight links between them for the year that we focus on.

#### 5.1.3. Comparisons

In Figure 1 we see some simple comparisons between the networks of interest. We explore some of these in more detail here. In Figure 9 we see a random sample of the collaboration routes (the total number of routes is too large to plot clearly), while in Figure 10 we see the air transport routes. Comparing these, we see a number of differences. First, we see that although there is a strong connection between the United States and Europe in the air transport network, this is far more pronounced in the collaboration network. The same pattern holds true for the connections between Europe and Asia and Asia and the United States. Indeed, if we restrict ourselves to collaborations with coauthors from two or three different cities, we can see from Table 6 that the top collaboration routes (by ARC score) follow these patterns.

**Table 6.**

Two-way collaborations (countries) | No. of collaborations | Three-way collaborations (countries) | No. of collaborations |
---|---|---|---|
Canada-USA | 3,447 | Germany-UK-USA | 128 |
Germany-USA | 3,043 | France-Germany-USA | 108 |
UK-USA | 2,965 | Germany-Switzerland-USA | 106 |
China-USA | 2,578 | Canada-UK-USA | 93 |
Japan-USA | 2,252 | France-UK-USA | 93 |

As noted in Figure 1, we see a north-south divide in the data, with disproportionately many collaborations occurring between cities in the Global North. In particular, the percentages given in Figure 1(b) are calculated by considering every pairwise collaboration and noting the location of the two relevant collaborators.

From this preliminary analysis, we also notice that there are a lot of long-distance collaborations present, in many cases between cities that do not have direct flights between them. This raises the interesting question of how journeys with multiple flights act as a barrier to collaboration, and what role is played by the distance on the air transport network compared with Euclidean distance. This provides further motivation for our work.

When performing our full analysis, our focus is on linking the number of citations that each paper receives with the relationship between the coauthors on the air transport network. More specifically, we want to see if there is a link between some measure of geographical diversity of the coauthors via the air transport network, and the ARC score for the paper. Thus, in what follows, we split our data by paper rather than considering summaries over all papers collaborated on by pairs of cities. For each paper, we then have access to a list of the coauthors on it, their location, and the ARC score. This is what we use for our analysis.

#### 5.1.4. University rankings

One more data set that we will make use of is the world university rankings, which comprises the rankings of the top 500 universities each year from 2005 onwards. As before, we focus on data from the year 2005. These data are necessary for our analysis because, as shown by Clauset et al. (2015), there is a relationship between the reputation and ranking of a university and the number of citations that a paper written by one of its researchers receives. When we look for a relationship between the number of citations that a paper receives and our various measures of diversity of the coauthors, we want to make sure that we take this effect into account.

### 5.2. Analysis

We now present the methods we use to investigate the link between geographical diversity of coauthors on a paper and the number of citations it receives. A key part in this will be defining our measures of geographical diversity. The first step towards these definitions is to connect our coauthorship data with our air transport data.

#### 5.2.1. Connecting cities with airports

There are a number of different ways to connect the coauthorship data with the air transport data. First, we want to find a distance measure between the cities in the coauthorship data set, where this distance is linked to the air transport network. We do this in an effort to replicate how two collaborating authors from potentially different countries could travel to meet each other. An initial measure of the distance between two cities is the number of flights it takes to travel between the two. We can calculate this by mapping each city to an airport and then finding the graph distance between the two airports on the air transport network.

A more refined measure also accounts for the length of each flight. Following Gastner and Newman (2006), we assign the edge between nodes $i$ and $j$ of the air transport network the effective length

$$\tilde{d}_{ij} = \lambda d_{ij} + (1 - \lambda), \tag{3}$$

where $d_{ij}$ is the Euclidean distance between nodes $i$ and $j$, and $\lambda$ is a parameter that controls the relative importance of physical distance against graph distance. The weighted network distance between two nodes is then given by the sum of the effective lengths on the shortest effective path between them. Incorporating Euclidean distance into our model makes sense intuitively because our distance measure is attempting to capture the geographical diversity of coauthors. We believe an important part of this is the difficulty of two potential collaborators traveling to meet each other. With this in mind, a long-haul flight presents more of a barrier than a shorter one.

It can be shown that, for the global air transport network, the value of *λ* that leads to the best replication of the observed network is 0 or close to it (Gastner & Newman, 2006). In our model, we choose *λ* = 1/10,000. This choice fits with the conclusions of Gastner and Newman (2006), but is also useful from a practical perspective. We measure the Euclidean distances in kilometers, and because the longest Euclidean distance between two nodes on the air transport network is ∼9,000 km, a journey that involves multiple flights will always be assigned a greater weighted network distance than one involving only a single flight. Again, this fits with our intuition about the difficulty of two potential collaborators meeting, and gives some interpretability to the weighted network distances.
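The practical consequence of this choice of *λ* can be checked with a short calculation. The sketch below assumes that the effective length of a flight takes the Gastner and Newman (2006) form *λ* × distance + (1 − *λ*), with distance in kilometers; the function name and the constant `MAX_EDGE_KM` are illustrative.

```python
LAMBDA = 1 / 10_000  # value of lambda chosen in the text
MAX_EDGE_KM = 9_000  # approximate longest Euclidean distance quoted in the text


def effective_length(distance_km, lam=LAMBDA):
    """Effective length of one flight: lam * distance + (1 - lam)."""
    return lam * distance_km + (1 - lam)


# Worst case for a single flight: the longest possible edge.
longest_single_flight = effective_length(MAX_EDGE_KM)  # 0.9 + 0.9999

# Best case for a two-flight journey: two edges of (almost) zero length.
shortest_two_flights = 2 * effective_length(0.0)  # 2 * 0.9999

single_always_shorter = longest_single_flight < shortest_two_flights
```

With these values the longest single flight has effective length 1.8999, while any journey of two flights has effective length of at least 1.9998, so a direct connection is always ranked as closer than a connection requiring a change.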

We calculate the weighted network distance between two cities $A$ and $B$ using the air transport network as follows:

1. Mapping cities to airports: First, each city is mapped to one or more airports, chosen as follows. We calculate the weighted degrees, on the air transport network, of all the airports within 100 km of the city. The city is then mapped to the five airports with the highest weighted degrees. If there is no airport within 100 km of the city, then it is mapped to the nearest airport. We denote the sets of airports associated with cities $A$ and $B$ as $\mathcal{A}$ and $\mathcal{B}$ respectively.
2. Calculating weighted network distances: For each pair of airports $(a, b)$ with $a \in \mathcal{A}$ and $b \in \mathcal{B}$, we then calculate the weighted graph distance on the air transport network using the edge weighting given by Eq. 3.
3. Calculating the shortest route: We set the weighted network distance between $A$ and $B$, which we denote as $d_{AB}$, to be the minimum of these weighted network distances.
4. Correcting zero distances: Sometimes, due to the geographical proximity of two cities, the same airport might appear in both $\mathcal{A}$ and $\mathcal{B}$. In this case, the minimum calculated in Step 3 will be 0, even though the cities may be up to 200 km apart. To correct for this, the distance between the two cities is set to be proportional to the Euclidean distance between them, normalized so that the maximum value it can take is 1.

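The weighted network distance between cities $A$ and $B$ is thus defined as

$$d_{AB} = \min_{a \in \mathcal{A},\, b \in \mathcal{B}} \sum_{k=1}^{N-1} \tilde{d}_{i_k i_{k+1}},$$

where $\tilde{d}_{ij}$ is the effective length (Eq. 3) of the edge between nodes $i$ and $j$, and $a = i_1 \to i_2 \to \dots \to i_N = b$ is the shortest weighted path from $a$ to $b$ on the air transport network.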

We choose to map each city to potentially multiple airports in another attempt to recreate real-world travel situations, as the nearest airport to a city may not be the one with the best connections to certain other cities. The 100 km limit is intended to reflect the distance that a person might be willing to travel to reach an airport. Following similar intuition to our choice of *λ*, we set the maximum distance to be 1 in the case that two cities share an airport to ensure that any journey that contains a flight is considered “longer” than one that does not.
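The procedure for computing the distance between two cities can be sketched in a few lines of code. This is an illustrative sketch only: the air network is represented as a dictionary of Euclidean edge lengths in kilometers, the effective edge length is assumed to be *λ* × distance + (1 − *λ*), and the shared-airport correction divides the city-to-city distance by an assumed maximum of 200 km. The function names, the toy airport codes, and the toy distances are all hypothetical.

```python
import heapq

LAMBDA = 1 / 10_000  # edge-weighting parameter from the text


def airports_for_city(dist_to_airport_km, weighted_degree, radius_km=100.0, k=5):
    """Step 1: the k best-connected airports within radius_km, else the nearest."""
    nearby = [a for a, d in dist_to_airport_km.items() if d <= radius_km]
    if not nearby:
        return {min(dist_to_airport_km, key=dist_to_airport_km.get)}
    ranked = sorted(nearby, key=lambda a: weighted_degree[a], reverse=True)
    return set(ranked[:k])


def shortest_effective_distances(graph, source, lam=LAMBDA):
    """Dijkstra's algorithm; each edge costs lam * km + (1 - lam)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, km in graph.get(node, {}).items():
            nd = d + lam * km + (1 - lam)
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def city_distance(airports_a, airports_b, graph, city_km,
                  lam=LAMBDA, max_shared_km=200.0):
    """Steps 2 to 4: minimum over airport pairs, with the shared-airport fix."""
    if airports_a & airports_b:
        # Step 4 (assumed normalization): rescale the distance into [0, 1].
        return min(city_km / max_shared_km, 1.0)
    best = float("inf")
    for a in airports_a:
        dist = shortest_effective_distances(graph, a, lam)
        for b in airports_b:
            best = min(best, dist.get(b, float("inf")))
    return best


# Toy network: AAA and BBB are only connected through HUB.
graph = {
    "AAA": {"HUB": 1000.0},
    "HUB": {"AAA": 1000.0, "BBB": 2000.0},
    "BBB": {"HUB": 2000.0},
}
two_flights = city_distance({"AAA"}, {"BBB"}, graph, city_km=2900.0)
shared = city_distance({"AAA", "HUB"}, {"HUB"}, graph, city_km=50.0)
chosen = airports_for_city(
    {"AAA": 30.0, "HUB": 80.0, "BBB": 500.0},
    {"AAA": 10.0, "HUB": 50.0, "BBB": 5.0},
    k=1,
)
```

In the toy network the two-flight journey has distance (0.1 + 0.9999) + (0.2 + 0.9999) = 2.2998, while two cities sharing an airport and lying 50 km apart are assigned 0.25.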

In Table 7, we can see that the weighted airport network distance is quite highly correlated with the Euclidean distance. When comparing ARC scores with average distance for different values of *λ*, we will see similar patterns for varying *λ*. This is perhaps unsurprising given these high correlation values.

**Table 7.**

 | Airport network | Weighted airport network | Euclidean |
---|---|---|---|
Airport network | 1 | 0.96 | 0.62 |
Weighted airport network | 0.96 | 1 | 0.80 |
Euclidean | 0.62 | 0.80 | 1 |

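For a centrality measure $i$, such as eigenvector centrality or betweenness, the weighted centrality of a city $A$ is obtained by combining the centralities $C_i(a)$ of the airports $a \in \mathcal{A}$, with the contribution of each airport decaying with its distance from the city. Here $\mathcal{A}$ is the set of airports associated with city $A$, as before, $C_i(a)$ is the centrality of airport $a$, $d^{e}_{aA}$ is the Euclidean distance between the city $A$ and airport $a$, and $\alpha$ is a decay parameter that we set to be equal to 2 as in Guo et al. (2017).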

#### 5.2.2. Connecting cities with universities

As noted previously, the reputation of a university can have a large effect on the number of citations a paper written by one of its researchers receives (Clauset et al., 2015). Thus, we may want to control for university rankings in our analysis. We can use the university rankings data set to do this, but as the nodes in the coauthorship network are cities rather than universities, we have to associate the ranked universities with the cities using a method similar to the one used for the centrality measures.

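We calculate the university rank weight of a city $A$ as follows. First, we find all the universities within 20 km of the city and call this set $\mathcal{U}_A$. Then we calculate the weight $w_A$ from the ranks of the universities in this set, where $r_u$ is the rank of the university $u \in \mathcal{U}_A$.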

There are a number of things to note about this construction. First, we do not use a decay factor. This is because we are trying to replicate how the coauthorship data is aggregated into cities. Here, the collaborations from a city are the collection of the collaborations from each university associated with that city, with no dependence on how far the universities are from the city. Because we do not know exactly which universities are associated with each city, we use 20 km as an estimate. Empirically, this seems to include the relevant ranked universities for the largest cities of interest. The downside of this method is that many small towns very close to much larger cities are also given high university rank weights. This is hard to avoid with the current method, as all we have to match cities with universities are the respective location coordinates. Moreover, this will not affect our results significantly because these smaller towns have relatively few edges in the coauthorship network, except in the case when they are home to a large university. In this case, the large university ranking weight will have been assigned to them correctly.

The exact form of the weight with respect to the rankings is calculated so that the better a ranking is, the more weight it adds, with the square root term ensuring that this effect is not too dominant. We only have the rankings for 500 universities, so for most cities the university set 𝒰_{A} will be empty. The +1 means that the baseline weight is 1 rather than 0, because for a specific paper, we may want to look at the product of the university rank weights for its coauthors. For example, a city that did not have any top 500 universities within its radius would have a weight of 1. Boston has the highest weight of 2.84, which is unsurprising given its proximity to Harvard and MIT.

#### 5.2.3. Measures of diversity

We now present the three measures that we will use to investigate the relationship between coauthor diversity and paper citations.

##### 5.2.3.1. Average weighted network distance

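Using the weighted network distance $d_{AB}$ defined in Section 5.2.1, for a paper $P_i$ with $N_i$ coauthors from cities $c_{i1}, \dots, c_{iN_i} \in \mathcal{C}_i$ we can then calculate the average weighted network distance as

$$\bar{d}_i = \binom{N_i}{2}^{-1} \sum_{1 \le j < k \le N_i} d_{c_{ij} c_{ik}},$$

that is, the mean of the weighted network distances between all pairs of coauthors.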
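As a small worked illustration, the sketch below computes this average for a single paper, assuming the average is taken over all unordered pairs of coauthors and that two coauthors in the same city are at distance zero. The function names and the toy distance values are hypothetical.

```python
from itertools import combinations


def average_weighted_distance(cities, distance):
    """Mean of distance(a, b) over all unordered pairs of coauthor cities."""
    pairs = list(combinations(cities, 2))
    if not pairs:
        raise ValueError("need at least two coauthors")
    return sum(distance(a, b) for a, b in pairs) / len(pairs)


# Toy weighted network distances between three cities (illustrative values).
toy = {
    frozenset({"London", "Oxford"}): 0.3,
    frozenset({"London", "Boston"}): 1.5,
    frozenset({"Oxford", "Boston"}): 2.5,
}


def toy_distance(a, b):
    """Look up a toy distance; coauthors in the same city are at distance 0."""
    return 0.0 if a == b else toy[frozenset({a, b})]


avg = average_weighted_distance(["London", "Oxford", "Boston"], toy_distance)
```

Here the three pairwise distances are 0.3, 1.5, and 2.5, giving an average of 4.3 / 3 ≈ 1.43.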

##### 5.2.3.2. Entropy of weighted network distance

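Our second measure is the entropy of the weighted network distances between the coauthors of a paper, defined as

$$H = -\sum_{i} p_i \log p_i,$$

where the $p_i$ in this case are the probabilities of a certain weighted network distance appearing given the distribution of distances in our data. We can estimate these probabilities by sorting the observed distances into bins and then using the bin counts as an empirical distribution estimator.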

This measure, also known as *Shannon’s diversity index*, quantifies the diversity of weighted network distances between coauthors on a paper. It may be more difficult to see how this measure captures diversity in a similar sense to our previous measure. In this case, a larger value indicates that the distances between coauthors are more varied. From the viewpoint of one specific coauthor, this would indicate that they collaborate with coauthors that are varying distances away from them—perhaps one international coauthor and one from a nearby university. Conversely a smaller value would indicate several coauthors that are the same distance from each other, such as several coauthors from local universities. It is worth noting that this measure is only meaningful for papers with more than two coauthors. With only two coauthors this entropy measure will always be zero, as the entropy of a single number is zero.
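The binning estimate can be illustrated with a short sketch. It assumes natural logarithms and a fixed list of bin edges shared by all papers; both choices, along with the function name and the example values, are assumptions made for illustration.

```python
import bisect
import math
from collections import Counter


def distance_entropy(distances, bin_edges):
    """Shannon entropy of the empirical distribution of distances over bins."""
    # bisect_right gives the index of the bin that each distance falls into.
    counts = Counter(bisect.bisect_right(bin_edges, d) for d in distances)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


edges = [1.0, 2.0, 3.0]

# Three coauthors give three pairwise distances: one short and two long.
h_three = distance_entropy([0.2, 2.1, 2.3], edges)

# Two coauthors give a single distance, so the entropy is zero.
h_two = distance_entropy([1.7], edges)
```

In the first case the bin probabilities are 1/3 and 2/3, so the entropy is approximately 0.64; in the second case there is only one occupied bin and the entropy vanishes, as noted above.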

##### 5.2.3.3. Weighted entropy of coauthor location

An entropy-based measure that may seem more intuitive can be found by directly calculating the entropy of the geographical locations of the coauthors of a paper. We can calculate this as before by discretizing the locations into “bins,” which are two-dimensional in this case. The entropy of the locations then gives a direct measure of geographical diversity, as a higher value means that the coauthors are more spread out throughout the world, with fewer located close together in the same “bin.” This entropy measure is different to the one used previously in that it does not concern the actual (weighted network) distances between the coauthors, just whether or not they are clustered together.

To incorporate university rankings and airport connectivity, we use a weighted version of this entropy,

$$H_w = -\sum_{i} w_i p_i \log p_i,$$

where the $p_i$ are the probabilities of a certain geographic location bin. The $w_i$ are weights that in our case decrease with both $U_i$ and $C_{\mathrm{eig},i}$, where the $U_i$ are the averages of the university rank weights of the coauthor locations in the 2D bin used to calculate $p_i$, and the $C_{\mathrm{eig},i}$ are averages of the eigenvector centralities over the bins. We “power down” $C_{\mathrm{eig}}$ by raising it to a small power because the range is huge (over 10 orders of magnitude) and we do not want it to dominate the entropy values or university rank weights.

This form for the weights associates more weight with lower ranked universities and less connected cities. Thus, our measure of diversity rewards papers where the coauthors are not only spread out geographically but also not well connected on the air transport network. This means that papers with a higher weighted diversity indicate a greater difficulty for their coauthors to travel to each other, which is in line with our previous measures. The diversity measure also rewards papers with coauthors from less highly ranked universities, which helps to counteract the effect reported by Clauset et al. (2015) on the effects of university rankings and reputation on citations.
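A minimal sketch of a weighted location entropy is given below. It assumes the weighted entropy has the standard form of a sum of −*w* × *p* × log *p* over occupied bins, uses a regular latitude and longitude grid with 10 degree cells, and takes the bin weights from a caller-supplied function. The grid size, the function names, and the example weights are hypothetical and are not the values used in our analysis.

```python
import math
from collections import Counter


def location_bin(lat, lon, cell_deg=10.0):
    """Assign a coordinate to a 2D grid cell of cell_deg by cell_deg degrees."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def weighted_location_entropy(coords, weight_for_bin, cell_deg=10.0):
    """Weighted entropy: minus the sum of w * p * log(p) over occupied bins."""
    counts = Counter(location_bin(lat, lon, cell_deg) for lat, lon in coords)
    total = sum(counts.values())
    return -sum(
        weight_for_bin(b) * (c / total) * math.log(c / total)
        for b, c in counts.items()
    )


# Approximate coordinates: two coauthors near London/Oxford, one in Boston.
coords = [(51.5, -0.1), (51.75, -1.26), (42.36, -71.06)]

# With unit weights this reduces to the ordinary entropy of the locations.
unweighted = weighted_location_entropy(coords, lambda b: 1.0)

# Hypothetical weights: halve the contribution of one well-connected bin.
toy_weights = {(5, -1): 0.5}
weighted = weighted_location_entropy(coords, lambda b: toy_weights.get(b, 1.0))
```

With unit weights the two occupied bins have probabilities 2/3 and 1/3; down-weighting the bin containing the two nearby coauthors lowers the resulting diversity score.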

## 6. DISCUSSION ON LIMITATIONS AND IMPACT

In terms of limitations, we first acknowledge that funding is a strong confounding variable for the citation metrics of papers (Zhou, Cai, & Lyu, 2020), which skews our results towards funded research. In particular, a heavy bias towards nation-specific funding might favor highly cited papers from shorter-distance collaborations over international ones. So, while there is supporting evidence from the literature (de Moya-Anegon, Guerrero-Bote et al., 2018) that international collaboration does improve citation, any analysis of diminishing returns might need to consider the impact of funding and the open challenge of disentangling causal mechanisms between funding and citation. Another consideration is the relative cost of a flight as a proportion of salary for underfunded researchers, which as a proportion of funding might be lower in the Global South, and certainly long-haul flights to the north make the problem more severe.

We also believe that university ranking is probably the most obvious confounding variable to check for, and that it indirectly includes aspects such as GDP. For example, if you are collaborating with someone overseas, GDP may affect the flight cost and frequency, but the fundamental motivation might be more related to academic aspects or the sheer practical distance of the flight. Certainly, there are secondary factors, such as the desirability of the travel location (Knight, 2014) and the dominance of conference locations in instigating collaborations (Fraz, 2015). GDP may also discourage early career researchers in low-income countries from making collaboration trips out of their own pocket, and those without family or care responsibilities may be more likely to form collaborations (Hu, Chen, & Liu, 2014). However, we cannot distinguish this level of granularity within one paper, as there are inherent privilege issues in research for many countries.

Another limitation is that some researchers may use ground or maritime travel. In general, however, we believe that air travel dominates international and long-distance national travel, or at least provides a reasonable approximation to the distance cost irrespective of modality. Small discrepancies due to personal choice are therefore unlikely to change the overall statistics much.

In terms of impact on the academic knowledge transfer and international collaboration, there are two distinct areas to which these results can contribute. The first is exchange and mobility: Many bilateral schemes (e.g., Royal Society International Exchange, German DAAD) dictate which countries are priority countries based on largely bilateral funding agreements and a common scientific priority agenda. Often this overlooks diversity and especially the Global North-South divide highlighted in this paper (94% of collaborations are between northern hemisphere universities). Beyond travel grants, domain-specific researchers can also benefit from this work (e.g., which countries have the greatest diversity potential for similar distance). Secondly, this paper may inform research funding policy: Current best practice recognizes the need to improve diversity, but lacks quantitative frameworks. While this work only provides a single dimension of geographic diversity (though one can argue geography is closely associated with many aspects of culture, ethnolinguistics, and practices), it provides domain-specific data on diversity gaps. This in turn can inform university policy as well as adding an extra diversity dimension for international partnerships (e.g., current GCRF funding is only based on income).

## 7. CONCLUSIONS

In this paper we have investigated connections between the citations that papers receive and how the coauthors are connected via the air transport network. In particular, we have looked at how different measures of geographical diversity of the coauthors on a paper are related to its ARC score. We have defined three different measures of diversity, relating to the average weighted (air transport) network distance between coauthors, the entropy of these weighted network distances, and the weighted entropy of the coauthors’ geographical locations. We have seen interesting relationships in each case. For the two types of entropy, the ARC score for a paper increases as the entropy, and thus the diversity, increases. As the average weighted distance increases, the ARC scores increase up to a point, but then start to decrease. In all cases there appears to be a link between diversity and citations.

To ensure that there were no obvious global confounding variables that could offer an alternative explanation for these results, we have also investigated the effects that the university rankings have on this relationship. We have seen that the relationship between the diversity measures and the average university rank weights is similar to the relationship between the diversity measures and the ARC scores. However, we have shown that the effects discussed above persist having controlled for the effects of university rankings. Furthermore, we have seen that different subject areas exhibit different relationships between diversity and ARC scores. This is also true when we look at collaborations made by researchers from specific cities.

## AUTHOR CONTRIBUTIONS

Cian Naik: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Validation, Visualization, Writing—original draft; Writing—review & editing. Cassidy Sugimoto: Conceptualization, Methodology, Project administration, Writing—review & editing. Vincent Larivière: Conceptualization, Data curation, Methodology, Project administration, Writing—review & editing. Chenlei Leng: Conceptualization, Funding acquisition, Methodology, Project administration, Supervision, Writing—review & editing. Weisi Guo: Conceptualization, Data curation, Methodology, Project administration, Supervision, Visualization, Writing—review & editing.

## FUNDING INFORMATION

Weisi Guo was supported by H2020 Marie-Curie [778305] and EPSRC [EP/L016400/1]. Cian Naik was supported by the EPSRC and MRC [1930478].

## DATA AVAILABILITY

The data sets used in our analysis are available from: https://www.kaggle.com/datasets/ciannaik/impact-of-geographic-diversity-on-citations-data.

## COMPETING INTERESTS

The authors have no competing interests.

## Notes

^{1} In this case we define significance at the 5% level by checking that the *p*-values are ≤0.05.

^{2} Throughout our analysis, we fit the piecewise linear model on the raw (rather than binned) data, but for ease of understanding we show the fit on the binned plot. However, in practice we find that the results are very similar if we perform a weighted fit to the binned data using the number of data points in each bin.

## REFERENCES

## Author notes

Handling Editor: Ludo Waltman