Measuring and interpreting the differences of the nations’ scientific specialization indexes by output and by input

Abstract This paper compares the national scientific profiles of 199 countries in 254 fields, tracked by two indices of scientific specialization based respectively on indicators of input and output. For each country, the indicator of inputs considers the number of researchers in each field. The output indicator, named Total Fractional Impact, based on the citations of publications indexed in the Web of Science, measures the scholarly impact of knowledge produced in each field. For each country, the approach allows us to measure the deviations between the two profiles, thereby revealing potential differences in research efficiency and/or capital allocation across fields, compared to benchmark countries.


INTRODUCTION
Policy-makers who have knowledge of the scientific specializations of their country can better formulate research policies and funding priorities, including by specific field, and can better assess the effectiveness of their initiatives in relation to strategic priorities. Whether public or private, however, stakeholders face major challenges in identifying scientific priorities and then parceling their investments (King, 2004; May, 1997). What is necessary is not only knowledge of the home nation's scientific profile but also of its relation to those of other countries, at regional and global levels.
The measurement of research activity and the construction of a national scientific profile can be carried out by considering either the input employed (resources and capital investment, research personnel, etc.) or the output produced (know-how, scientific publications, patents, etc.); that is, the knowledge developed and its scholarly impact (Sugimoto & Larivière, 2018).
In a previous work, for purposes of tracing the scientific profiles of countries, we proposed an index of scientific specialization based on scholarly impact of 2010-2019 Web of Science (WoS) publications in each subject category (SC) (Abramo, D'Angelo, & Di Costa, 2022a). By producing a specialization profile for each country in relation to all SCs (254), we were able to identify the distinctive characteristics of individual countries and country clusters.
However, if we consider the whole process of scientific research production as a black box, the calculation of specialization indices can also be carried out by considering input indicators alongside the output indicators. The former approach traces the profile of a country through the sectoral distribution of research investments; the latter through the relative distribution of its scientific production.
From an operational point of view, tracing the research profile of a country on the basis of input indicators is a challenging task: at the global level, gathering input data disaggregated by field is a formidable undertaking, and classifying those fields univocally is harder still. Input data, or production factors according to the microeconomic theory of production, are labor (L) and capital (K); the latter comprises all resources other than labor used to conduct research activities. While K data are not available, in this paper we go some way to overcoming the obstacle concerning L data. The bibliometric approach allows measurement and classification not only of output, through observation of scientific publications, but indirectly also of input, limited to the research staff. Once authors' identities and their country affiliations have been disambiguated, it becomes possible to measure the size of the research staff of a country and to classify it by SC, based on the prevalent SC in which each author's publications fall. It is then possible to measure the scientific specialization of countries with input data (limited to L), in a similar way as with output data.
It is then interesting to check whether and to what extent the resulting scientific profiles are different. The share of research fields showing deviations between the two indices would reveal differences in research efficiency and/or allocation of K across fields, compared to benchmark countries. In fact, because research output is a function of L and K, if a field specialization index is higher by input than by output, a possible explanation is that the country has historically invested less K in that field than in others and/or that the productivity of the researchers, compared to other countries, is lower in that field. When the share of such fields surpasses one half, the inference would be that the country is entering the area of imbalance across fields, in the efficiency of their research and/or capital allocations. Were K data available and accounted for, those differences would reveal directly field-level comparative advantages across countries.
Essentially, to move the national research profile towards alignment with strategic objectives, governments can act on two levers: differentiated allocation of public funds across fields, and/or differentiation of productivity incentives by scientific field, although the latter would not be easy in practice. In any case, the effects of these interventions on field outputs of research, and on shifting the scientific profile, are in part dependent on the status of productivity across these very fields.
The objectives of the present work are therefore, for each country, to:
• produce two specialization profiles, respectively based on input and output indicators, corresponding to each of the 254 SCs of the WoS classification scheme;
• analyze the two specialization profiles of countries by input and output indicators; and
• assess the deviations between the two profiles for individual countries and country clusters;
all this in a manner supportive of policy-makers intending to formulate research policies and priorities for funding by field.
The next section of this paper reviews the relevant literature. Section 3 describes the data and indicators used for analysis, and the methodology adopted for construction of the specialization profiles. Section 4 presents the results of the analysis, and Section 5 comments on the main findings and discusses the policy implications.

LITERATURE REVIEW
Scholars have generally applied frameworks from business or economics in studying specialization levels in scientific research. The most common approach is by "revealed comparative advantage" (Aksnes, Sivertsen et al., 2017; Allik, Realo, & Lauk, 2020; Bongioanni, Daraio et al., 2015; Cimini, Zaccaria, & Gabrielli, 2016; Horta & Veloso, 2007; Leydesdorff & Wagner, 2009; Li, 2017; Patelli, Cimini et al., 2017; Sandström & Van den Besselaar, 2018). Examining a field at international level, this approach "reveals" the comparative advantage of a country in proportions of labor factor, or output produced, compared globally or to a selection of countries. All comparative advantage indices used in international economics originate from the Balassa or "RCA" index (Balassa, 1965). The first to transfer RCA to investigation of specialization in scientific research was Frame (1977), who introduced the so-called "activity index." This indicator is typically based on one of the easily measured macroscopic bibliometric variables: total publications from a country; total citations received by the country's publications (Aksnes, van Leeuwen, & Sivertsen, 2014; Harzing & Giroud, 2014); and in some cases more sophisticated combinations of output and impact (Abramo, D'Angelo, & Di Costa, 2014; Abramo et al., 2022a).
The value of the activity index is given by the ratio of two ratios. The first one measures the share of research effort (or output) of a country in a given field with respect to the national total, and the second one measures the same share but at a global level. The indicator is expressed as an absolute value or transformed on a scale [−100; +100] for easier understanding and comparison.
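As an illustration of the "ratio of two ratios", the index can be written out with a generic quantity $x_{jk}$ (publications, citations, or researchers of country $k$ in field $j$). The symbol $x_{jk}$ and the numbers below are hypothetical. The rescaling shown is only one commonly used transformation onto [−100; +100]; the works cited above do not all use the same one, so it should be read as an assumption rather than as the definition.

```latex
% Activity index as a ratio of two shares (x_{jk} is a generic, hypothetical quantity)
AI_{jk} = \frac{x_{jk} \big/ \sum_{j} x_{jk}}{\sum_{k} x_{jk} \big/ \sum_{j}\sum_{k} x_{jk}}

% Worked example with invented numbers:
% the country has 50 of its 1000 units in field j; the world has 2000 of 100000.
AI_{jk} = \frac{50/1000}{2000/100000} = \frac{0.05}{0.02} = 2.5

% One possible rescaling onto [-100, +100] (assumed here for illustration only):
AI^{*}_{jk} = 100 \cdot \frac{AI_{jk} - 1}{AI_{jk} + 1}
            = 100 \cdot \frac{1.5}{3.5} \approx 42.9
```

Under this rescaling, an index of 1 (no specialization) maps to 0, values below 1 become negative, and values above 1 become positive.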
Following detailed analysis of its technicalities, Glänzel (2000) and Schubert and Braun (1986) have provided interpretations of this indicator. Other authors have explored theoretical problems in the construction of the activity index and related indicators (Aksnes et al., 2014; Rousseau, 2018, 2019; Rousseau & Yang, 2012).
The bibliometric indicators generally used are based on output data extracted from bibliographic repertories (WoS, Scopus) which, despite coverage problems (by discipline, language, country, etc.), have become the de facto standard for measuring research, and more generally, for studies in the field of the so-called "science of science" (Archambault, Vignola-Gagné et al., 2006; Hicks, 1999; Waltman, 2016). Compared to other approaches of measuring research, bibliometrics clearly has the advantage of access to data, gathered by repository publishers according to globally standardized procedures.
In contrast, input data are generally collected through local and international surveys, under the auspices of national research councils or international organizations, such as OECD and UNESCO. Although such entities collect and regularly update their data, none have the mandate or capacities to apply standard classification systems, so none can provide data sufficient for reliable study of specialization. Given the inaccessibility of data on inputs, scholars interested in the investigation of specialization at macro (i.e., country) level have thus far engaged solely with data on outputs.
On the other hand, there is no shortage of analyses on input and output data at meso level (i.e., surveys of data on a small set of local institutions, enabling evaluation of their specialization). Heinze, Tunger et al. (2019), for example, described research and teaching profiles for 68 public universities in Germany (from 1992 to 2015) and produced specialization maps for each of them. Fuchs and Heinze (2021) then revised the analysis on an updated data set (1992 to 2018). Teixeira, Rocha et al. (2012) adapted one output and three input measures from the RCA index of Balassa (1965) in the study of field-by-field diversity (specialization and/or diversification) of Portuguese higher education institutions.
Thus far, however, for the reasons explained above, there remain no works using input data to measure specialization at macro/country level. In this paper we try to fill this gap, using the bibliometric approach.

DATA AND METHODS
Observing the authorship of scientific publications, then taking on the task of disambiguating the author identities, and tagging by country affiliation and field of specialization, we are ultimately able to measure the size of a country's research staff in a given field. This input measure can then be used to construct the country's sectoral specialization profile in terms of inputs, in the manner of traditional approaches dealing only with outputs. In the following, we explain the methodological details.
The data set for the analysis is the same as previously used by Abramo et al. (2022a), which applied the rule-based scoring and clustering algorithm of Caron and van Eck (2014) to data extracted from the in-house WoS database of the Centre for Science and Technology Studies (CWTS) at Leiden University (updated to the 13th week of 2021). For this algorithm, bibliometric metadata on authors and their publications are taken as input, and clusters of publications likely to be written by the same author are taken as output. The algorithm considers four categories of bibliographic elements:
• author name (first and last name, affiliation, email);
• article (shared coauthors, grant numbers, address not linked to authors);
• source (SC, journal); and
• citation (self-citations, cocitations, bibliographic coupling).
The higher the number of shared bibliographic elements (source, topic, coauthors, emails, affiliations, references, etc.) between two publications, the stronger is the evidence that these are written by the same author.
Based on scoring values and thresholds, defined on a verified seed set, the algorithm develops clusters of publications and assigns them to an individual.
Of course, the algorithm is far from being error free, especially for authors with popular names, or production of highly diversified and heterogeneous bibliographic elements, a circumstance that could lead to splitting their portfolio in two or more clusters.
However, at the aggregate country level, this latter error, as extensively explained in the theory and methodology of the previous work, will have only marginal effects on analytical results. Referring to Abramo et al. (2022a), an important note is that to increase robustness of the analysis, the data set excludes those clusters that fail to comply with one or more of the following conditions:
• contain at least 10 publications (excludes "occasional" researchers, for whom clustering has lower confidence levels);
• of which at least one publication is after 2018 (designed to exclude researchers no longer active); and
• with a "research age" 2 of minimum 5 years (designed to include only "established" researchers).
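The three inclusion conditions can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: the `Cluster` record and the function name are hypothetical, and "research age" is taken, as in the footnote, to be the difference between the last and first publication year assigned to the cluster.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    """Hypothetical record for one disambiguated author cluster."""
    cluster_id: str
    pub_years: List[int]  # publication year of each paper assigned to the cluster


def is_established_active(cluster: Cluster,
                          min_pubs: int = 10,
                          active_after: int = 2018,
                          min_research_age: int = 5) -> bool:
    """Return True if the cluster satisfies all three inclusion conditions."""
    years = cluster.pub_years
    # Condition 1: at least `min_pubs` publications.
    if len(years) < min_pubs:
        return False
    # Condition 2: at least one publication after `active_after`.
    if max(years) <= active_after:
        return False
    # Condition 3: research age (last year minus first year) of at least 5 years.
    return (max(years) - min(years)) >= min_research_age


# Invented examples, one per outcome.
kept = Cluster("kept", [2010] * 5 + [2019] * 6)           # 11 papers, active, age 9
too_few = Cluster("too_few", [2012, 2019] * 4)            # only 8 papers
inactive = Cluster("inactive", list(range(2005, 2018)))   # last paper in 2017
too_young = Cluster("too_young", [2017] * 5 + [2020] * 5)  # research age 3

retained = [c.cluster_id
            for c in (kept, too_few, inactive, too_young)
            if is_established_active(c)]
```

A cluster is retained only if all three checks pass, which mirrors the wording that clusters failing "one or more" conditions are excluded.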
Through such "cuts" we effectively exclude small clusters, related to very young or occasional researchers but also those related to researchers no longer active (e.g., who are now retired). We also exclude part of those clusters deriving from the splitting of authors with popular names and/or with highly diversified scientific production, caused by the Caron and van Eck algorithm. All this allows us to have a higher confidence that the resulting data set actually represents the research staff of a given country, at present.
The final data set consists of over 2 million clusters, accounting for over 120 million authorships, related to almost 17 million unique publications. On average each cluster contains 58 publications, and each unique publication is coauthored by eight distinct clusters.
For field classification purposes, we use the WoS scheme, including 254 SCs 3. Each cluster in the data set is provided with the 2010-2019 related WoS indexed publications 4 and is associated with a field, given by the "prevalent" SC of its publications (i.e., the one hosting most of his or her scientific production) 5. In the input-based approach, the specialization index $(IB)SI_{jk}$ of country k in SC j is

$$(IB)SI_{jk} = \frac{RS_{jk} \big/ \sum_{j} RS_{jk}}{\sum_{k} RS_{jk} \big/ \sum_{j}\sum_{k} RS_{jk}} \qquad (1)$$

where $RS_{jk}$ = research staff, operationalized as number of clusters of country k in SC j.
The higher the value of $(IB)SI_{jk}$ compared to 1, the more specialized country k is in SC j, as the share of its research staff is higher than the expected value observed at world level. If $(IB)SI_{jk}$ is less than 1, it means that no specialization is involved in SC j for country k.
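The input-based index can be sketched in a few lines of plain Python. The function name and the toy counts are hypothetical; the sketch assumes that the "world" totals include the country itself and that no country or field has a zero total.

```python
def specialization_index(values):
    """Ratio-of-ratios specialization index.

    values[k][j] is the amount (e.g. number of author clusters) of
    country k in field j. Returns a matrix of the same shape.
    Assumes every row total and every column total is non-zero.
    """
    n_fields = len(values[0])
    country_totals = [sum(row) for row in values]
    field_totals = [sum(row[j] for row in values) for j in range(n_fields)]
    world_total = sum(country_totals)

    si = []
    for k, row in enumerate(values):
        si_row = []
        for j, amount in enumerate(row):
            national_share = amount / country_totals[k]   # share of field j within country k
            world_share = field_totals[j] / world_total   # share of field j worldwide
            si_row.append(national_share / world_share)
        si.append(si_row)
    return si


# Toy example: two countries, two fields (invented counts).
research_staff = [
    [30, 70],   # country A
    [70, 30],   # country B
]
ib_si = specialization_index(research_staff)
# Country A: 30% of its staff in field 0 against 50% worldwide gives 0.6.
```

The same function applies unchanged to any non-negative country-by-field matrix, so it could also serve for an output-based index once the output quantity has been computed.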
In the output-based approach, instead, we use the composite indicator proposed in Abramo et al. (2022a), called Total Fractional Impact (TFI), which is a combination of publication volume and field-normalized citation impact. The TFI of a country k in SC j is defined as

$$TFI_{jk} = \sum_{i=1}^{N_{jk}} f_{ik} \, \frac{c_i}{\bar{c}_j}$$

where
$N_{jk}$ = number of publications of country k in SC j;
$f_{ik}$ = fractional contribution of coauthors of country k to publication i (for a publication with n coauthors, m of which are affiliated to country k, $f_{ik}$ is equal to m/n) 6;
$c_i$ = citations received by publication i (counted at the 13th week of 2021);
$\bar{c}_j$ = average citations received by all cited publications of the same year and SC j of publication i 7.

2 Given by the difference between the first and the last publication year assigned to the cluster.
3 In WoS each publication inherits the SC of the hosting journal.
4 Only articles, reviews, letters, and proceedings papers.
5 Clusters with more than one prevalent SC are around 2% and are counted multiple times.
6 Note that according to the CvE algorithm, each cluster (and thus each author) is associated with one and only one country.
7 Abramo, Cicero, and D'Angelo (2012) demonstrated that the average of the distribution of citations received for all cited publications of the same year and SC is the best-performing scaling factor.
Applying the Total Fractional Impact, we can measure the output-based index of specialization $(OB)SI_{jk}$ of country k in SC j as

$$(OB)SI_{jk} = \frac{TFI_{jk} \big/ \sum_{j} TFI_{jk}}{\sum_{k} TFI_{jk} \big/ \sum_{j}\sum_{k} TFI_{jk}}$$

In this case a value higher than 1 implies that country k is specialized in SC j, as the share of TFI in such SC is higher than the expected value observed at world level, and vice versa.
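The accumulation of TFI by country and SC can be sketched as follows. The record layout (dictionary keys), the function name, and the numbers are hypothetical, and for simplicity each publication is assumed to belong to a single SC; how publications in journals assigned to several SCs are handled is not covered by this sketch.

```python
from collections import defaultdict


def total_fractional_impact(publications, mean_citations):
    """Accumulate TFI per (country, SC).

    publications: list of dicts with hypothetical keys
        'sc', 'year', 'citations', 'authors_by_country' (country -> author count).
    mean_citations: dict mapping (sc, year) to the average citations of
        cited publications of that SC and year (the scaling factor).
    """
    tfi = defaultdict(float)
    for pub in publications:
        n_authors = sum(pub["authors_by_country"].values())
        # Field- and year-normalized citation impact of the publication.
        normalized = pub["citations"] / mean_citations[(pub["sc"], pub["year"])]
        for country, m in pub["authors_by_country"].items():
            # Fractional contribution m/n of the country to this publication.
            tfi[(country, pub["sc"])] += (m / n_authors) * normalized
    return dict(tfi)


# Invented example data.
pubs = [
    {"sc": "Oncology", "year": 2015, "citations": 20,
     "authors_by_country": {"IT": 1, "US": 3}},
    {"sc": "Oncology", "year": 2016, "citations": 5,
     "authors_by_country": {"IT": 2}},
]
scaling = {("Oncology", 2015): 10.0, ("Oncology", 2016): 5.0}
tfi = total_fractional_impact(pubs, scaling)
# IT: 0.25 * 2.0 + 1.0 * 1.0 = 1.5 ; US: 0.75 * 2.0 = 1.5
```

Arranging the resulting values in a country-by-SC matrix and taking the ratio of national to world shares then yields the output-based index.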
Countries can be more or less concentrated (diversified) in terms of scope (number of SCs) of research. We will assess that by the Gini index, or Gini coefficient, which measures variable distribution across a population (Gini, 1921). A higher Gini coefficient indicates greater inequality in the distribution of input (output) across SCs, with high-input SCs receiving much larger shares of the total input for research. The Gini coefficient ranges from 0 to 1, with 1 representing perfect inequality (concentration) and 0 representing perfect equality (diversification).
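A minimal sketch of the Gini coefficient for a list of non-negative values (for example, one country's values across SCs) is given below. The text does not state which estimator was used, so the rank-based sample formula here is an assumption; with this formula, the maximum for n values is (n − 1)/n rather than exactly 1.

```python
def gini(values):
    """Sample Gini coefficient of non-negative values (rank-based formula)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("gini() needs at least one value and a non-zero total")
    # Weighted sum of values by ascending rank (1-based).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


even = gini([1, 1, 1, 1])          # perfectly equal distribution
concentrated = gini([0, 0, 0, 4])  # everything in a single category
```

For the even profile the coefficient is 0; for the fully concentrated profile with four categories it is 0.75, the upper bound (n − 1)/n for n = 4.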

RESULTS
The analyses of the current paper, as follows, are aimed at comparing the distributions of SI jk calculated from input and output data. For this, we construct 199 × 254 matrices containing the SI values, by input and output, for a set of 199 countries in each of the 254 WoS SCs. For reasons of space, we present only a few examples of possible data elaborations. The complete data on all 199 countries in 254 SCs are found in Abramo, D'Angelo, and Di Costa (2022b).
As a first example, Figure 1 shows, for China, the distribution of SIs detected for the SCs of Biomedical Research (14 in all). The SI values measured through output are never greater than unity; instead, when measured through input, five fields out of the 14 reach levels greater than unity. The (OB)SI values are higher than the (IB)SI values in only four cases: Among these, the highest absolute values are in Toxicology (0.759 by output data, 0.639 by input data). In absolute value, the greatest gap is in Medical Laboratory Technology (0.882 vs. 2.859), followed by Virology (1.136 vs. 0.592) and Oncology (1.175 vs. 0.678). It therefore emerges that for China, in general, there is a significant lack of specialization in this set of SCs, and above all a gap in capital investment and/or productivity, compared to other countries.
Figure 2 shows the comparison for the United States, looking at the SI values for input and output in the 20 SCs that are greatest by world output. In 15 out of 20, the (OB)SI value is higher than the (IB)SI value, with a maximum deviation in Medicine, General & Internal; in this field, for the United States, the (OB)SI is 1.368, compared to an (IB)SI of 0.831. At the opposite extreme for these 20 SCs is Chemistry, Multidisciplinary, which shows an (IB)SI of 1.267 versus an (OB)SI of 0.743, or in other words, 41% less. Also for the United States, whether for specialization index by input or output, there are nine SCs with values greater than unity, and of these, eight SCs represent the particular case where both SI values are greater than unity (Astronomy & Astrophysics; Biochemistry & Molecular Biology; Cardiac & Cardiovascular Systems; Clinical Neurology; Neurosciences; Oncology; Public, Environmental & Occupational Health; Surgery). Across the 20 SCs, the percentage variation between the two SI values was within ±10% in 10 cases.
Table 1 provides an examination of the specialization profiles for the major European countries in terms of research output, specifically their top five SCs by specialization index based on input ((IB)SI) and output data ((OB)SI). All six of these European countries show a strong presence of "top" SCs (about 1/3 of the total, for both input and output) in the humanities and social sciences. Also interesting is that the intersection between the two sets of categories is rather limited: For France, Germany, and the Netherlands, two SCs appear in both columns; Italy and Spain have only one with a double appearance, and the United Kingdom has none. Finally, in this table, the top values of (IB)SI are greater than the corresponding top values of (OB)SI in 24 of the 30 total cases.
In Table 2, for the top seven countries by share of output, we look into the two SCs characterized by maximal difference between (IB)SI and (OB)SI, both negative and positive. In other words, for each country, columns 2-3 report the SCs with evident gaps in either or both of capital investment and productivity, given that the specialization indexes by output data do not align with what emerges concerning inputs. For China, for example, the maximal negative case ((OB)SI − (IB)SI) is found in Medicine, Research & Experimental, and in Mathematics, Interdisciplinary Applications; for Russia, this is found in Chemistry, Applied and in Mining & Mineral Processing.
Columns 4-5 report the opposite situation (i.e., SCs with maximal difference of SI by output data over input data), evidently due to higher capital allocation and/or productive efficiency.
Table 3 reports, for each of the top 20 countries by share of output, the shares of SCs with (IB)SI greater than unity; (OB)SI greater than unity; and (OB)SI greater than (IB)SI. Within this group of 20 we quickly note some G7 countries, such as the United States, United Kingdom, Germany, and Canada, at the bottom of the table, but also another G7 country, Italy, near the top of the list. The first four countries in the list have about 70% of SCs with (OB)SI greater than (IB)SI, the last four about 50%. It should be noted, however, that the latter case describes capital allocation and efficiency of research that are more balanced across fields.
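The per-country summary shares of the kind reported in Table 3, and the percentage deviation between the two indices, can be sketched as below. The function names are hypothetical; the two value pairs are the United States figures quoted earlier for Medicine, General & Internal and Chemistry, Multidisciplinary, used here only as a two-field toy profile rather than a full 254-SC one.

```python
def profile_summary(ib_si, ob_si):
    """Shares of fields with IB > 1, OB > 1, and OB > IB for one country."""
    n = len(ib_si)
    return {
        "share_ib_gt_1": sum(v > 1 for v in ib_si) / n,
        "share_ob_gt_1": sum(v > 1 for v in ob_si) / n,
        "share_ob_gt_ib": sum(ob > ib for ib, ob in zip(ib_si, ob_si)) / n,
    }


def relative_deviation(ib, ob):
    """Deviation of the output-based index relative to the input-based one."""
    return (ob - ib) / ib


# Medicine, General & Internal; Chemistry, Multidisciplinary (values from the text).
ib_values = [0.831, 1.267]
ob_values = [1.368, 0.743]

summary = profile_summary(ib_values, ob_values)
chemistry_gap = relative_deviation(1.267, 0.743)   # about -0.41, i.e. 41% less
```

A share of SCs with (OB)SI greater than (IB)SI close to one half corresponds to the "balanced" case described above.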

Concentration/Diversification in Country Disciplinary Profiles
The disciplinary profile of a country can be more or less specialized in a few SCs or distributed across many (diversified or "balanced"). In this regard, there are interesting differences between countries when considering SIs based on input or output data. Table 4 shows, for the top 20 countries by share of output, the value of the GINI coefficient (output data) and the relative coefficients of variation of the distributions of SI values for the 254 SCs (input and output data). For all 20 countries except Iran, the GINI value for their (IB)SI distribution is greater than the value for (OB)SI. Russia, Iran, and India, in view of the high values of GINI coefficients calculated in both modes, are the countries with highest level of concentration of sectoral
Figures 4 and 5 compare the national disciplinary profiles of the United States and Russia, the two countries already noted at opposite extremes in specialization/differentiation of scientific profiles in terms of (IB)SI and (OB)SI. A first observation is that for both indices, the values for the United States never exceed 4.5. In contrast, the trends for Russia show pronounced oscillations: (IB)SI, while in the range 0-4 for 237 of the 254 SCs, presents a number of sharp peaks, two of which are close to the value 16; (OB)SI shows even more oscillations, although with peaks not surpassing 8.
Finally, we investigated the relationship between the dispersion of the national profiles of the top 20 countries by share of output and the balance of efficiency of research and/or capital allocation across fields. The correlation analyses showed that countries with high dispersion are those more balanced (for (IB)SI, Pearson correlation coefficient: 0.543; Spearman correlation coefficient: 0.583; for (OB)SI, 0.420 and 0.514, respectively). For all 199 countries examined, Figures 6 and 7 show, on input and output sides, the world quantile maps of the GINI coefficient of the SI specialization index. Both maps show the presence of balanced vs. unbalanced research profiles, the former being typical of developed countries, the latter of developing countries. However, not only the "top" countries seen earlier, but almost all (189/199) nations show a higher value of input-based than output-based GINI coefficient (i.e., profiles that are more concentrated on the input side). The largest differences are found for Latvia (

Clusters of Countries by Research-System Disciplinary Profile
In the previous sections we used specialization indices based on input and output data to reveal the scientific profile of countries, and especially to compare their disciplinary characterization with respect to all other countries. Such indices can also be used to group countries by similarity of respective profiles. We do this by grouping according to Ward's dissimilarity (Ward, 1963), after principal component analysis (PCA) for reduction of the 254 SC specialization indices to seven principal components 8, beginning from both input and output data. The results are shown in Tables 5 and 6, for input and output. There is partial overlapping in the composition of the identified groups but also an evident partial reconfiguration of the clusters when considering one or the other side of the data.
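The two-step procedure (PCA reduction to seven components, then Ward clustering into seven groups) can be sketched with NumPy and SciPy. This is an illustration on synthetic random data, not a reproduction of the analysis: the function name is hypothetical, and whether the SI values were standardized before PCA is not stated, so the sketch only centers each column.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_profiles(si_matrix, n_components=7, n_clusters=7):
    """PCA via SVD on column-centered data, then Ward hierarchical clustering.

    si_matrix: array of shape (countries, fields) with specialization indices.
    Returns the component scores, the share of variance they explain,
    and one cluster label per country.
    """
    x = np.asarray(si_matrix, dtype=float)
    centered = x - x.mean(axis=0)          # assumption: centering only, no scaling
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = float((s[:n_components] ** 2).sum() / (s ** 2).sum())
    tree = linkage(scores, method="ward")  # Ward's minimum-variance criterion
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return scores, explained, labels


# Synthetic stand-in for a countries-by-SCs matrix (40 countries, 254 fields).
rng = np.random.default_rng(0)
toy_si = rng.lognormal(mean=0.0, sigma=0.5, size=(40, 254))
scores, explained, labels = cluster_profiles(toy_si)
```

On real data, `explained` would correspond to the share of variability retained by the first seven components, and `labels` to the cluster membership of each country.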
Taking either approach, the first cluster lacks the top countries by share of output seen earlier, including only East African countries, with Ghana also in the output approach.
China, India, and Iran gather in a cluster in both approaches, but the other associated countries change: Taking the input approach, the cluster includes a concentration of Middle Eastern, Asian, and North African countries, united (apart from a few) by linguistic-cultural factors, among which are some "tigers of the East" (Indonesia, Malaysia, Thailand).
Russia occupies a cluster as the sole top country, along with three post-Soviet countries (Belarus, Kazakhstan, Ukraine). Note that many of the other post-Soviet countries appear in cluster 7 in the input approach, without any top country by share of output; and in cluster 3 in the output approach (along with Poland as a top country).
Clusters 5 (input data) and 6 (output data) are quite similar, with the top countries all English-speaking plus the Netherlands in the input approach, and Netherlands plus Sweden in the output approach.
8 "Principal components" are new variables constructed as linear combinations of initial variables. The initial variables are the SIs on 254 SCs, combined so that the new variables are uncorrelated and most information within the initial variables is stored in the first components. Here, 254-dimensional data yields 254 principal components, but PCA maximizes information in the first ones, achieving a reduced data set focused on the first few components but without important loss of information. Specifically, the first seven components explain about 50% of the variability of the original information, both with input and with output data. Hence, we limit our analysis to these seven components and to as many clusters of countries.
France, Germany, Italy, and Switzerland are all present in clusters 6 (input data) and 7 (output data). Spain groups with these only for the input approach, while considering the output side, it appears as the sole top country of a cluster together with a number of Latin American countries. The situation of Japan is also singular, being associated with Brazil and Poland in the input approach and with France, Germany, Italy, and Switzerland in the output approach. At the same time, with the input data, these four countries correspond to a profile that resembles that of South Korea and Turkey, countries that instead associate with Brazil and Poland in an output cluster.
Figures 8 and 9 show the ranking of the countries determined by input and output data respectively, but now limiting the analysis solely to principal components 1 and 2: a representation that is still more partial, given the even greater restriction of the overall information contained in the data 9. Comparing the two graphs, we see that the rightmost cluster, containing technically and scientifically advanced countries (Australia, Canada, Netherlands, United Kingdom, United States), remains substantially unchanged in composition (with the exception of Sweden, present only for output data), while the other clusters present different recombinations of countries, the only other constant being the outlier character of Russia, isolated in both graphs.

DISCUSSION AND CONCLUSIONS
National research systems can be analyzed in terms of their scientific profiles, and their capital allocation and productive efficiency, through the application of scientific specialization indices (SIs), in this way supporting policy-makers as they work to define and pursue the research priorities of their countries. In this paper, we have constructed indices of scientific specialization, calculated from both input and output data, for a set of 199 countries, operating in 254 WoS SCs. One of the aims was to conduct a comparative analysis drawing on the results of the different SIs, more specifically: to produce, for each country, a dual specialization profile for each SC; for each country and field, to measure the deviations between the values of the two indices; and to observe how distinctive or common features of individual countries or clusters of countries, in terms of their SIs for different fields, may vary depending on the point of view of the index used.
For the calculation of the output-based specialization indices, we used the Total Fractional Impact (TFI) (i.e., the sum of the impact of the individual publications produced by the country in each SC). Given that the rate of international collaboration (and therefore coauthorship) in research varies from country to country, we adopted fractional counting to take into account the contribution to each publication by researchers from each country. For calculation of the input-based indices, we used the number of authors from the country in the SC, accepting that, due to lack of information, we could not account for invested capital.
9 Note that in Figure 8, PC1 is not centered on zero. The distribution of PC1 is indeed centered on zero for the total 198 countries, but for the 20 largest in our analysis, in the input approach the values are all positive with an average of 6.7.
A value above one for SI in a given SC indicates a specialization of the country in that SC, evidently because it presents some particular interest. However, based on the construction of the SI as a ratio of ratios, values higher than one are also naturally observed in all those SCs where the share of either TFI or of researchers, although low in value at national level, is nevertheless higher than the corresponding value at world level. This phenomenon is observed for some nationally specific SCs of Art & Humanities, such as "Literature, German, Dutch, Scandinavian" and "Literature, British Isles," for example, where Germany and the United Kingdom are at the top for the relative specialization indices.
Looking at the top 20 countries by share of output, the analysis of their share of SCs presenting differences in indices on output and input sides revealed that most of the G7 countries are characterized by very balanced capital allocation and efficiency of research across fields. Exceptions would be Japan and especially Italy, which falls in a group of opposite character, along with Turkey, Brazil, Poland, and Russia.
On the other hand, the presence of SCs with large shares of the country's total fractional impact or researchers, and with SIs much higher than one, is clearly informative of the research system structure, and reflects policy choices that have enhanced the concentration on certain SCs over others.
Depending on the distribution of SI values among SCs, a country can therefore have a more or less specialized or diversified disciplinary profile. In this regard, we observed that for all countries but one (Iran), the GINI coefficient for distribution of (IB)SI is higher than for (OB)SI. Russia, with the highest values of GINI coefficient on both input and output sides (0.750, 0.706), is the country with the strongest profile of specialization. Russia, along with Iran and India, is also one of the countries with the smallest difference between the two concentration indices: countries that have concentrated most of their resources on only a few sectors, following a historic industrialization model that has accumulated expertise in specific sectors. The contrary profiles of the greatest differences between the (OB)SI and (IB)SI are instead seen in Sweden, Switzerland, Canada, and the United States: countries that have diversified their researchers across fields, and which have even more nuanced profiles of specialization when measured through their output.
After PCA, reducing the 254 SC specialization indices to seven principal components, we were able to identify seven clusters of countries by similarities in their profiles. There is partial overlapping in the composition of the identified groups, but also an evident partial reconfiguration of the clusters when considering one or the other side of the data. China, India, and Iran on the one hand, and four of the English-speaking countries (Australia, Canada, United Kingdom, United States) on the other, compose the nuclei of two groups that maintain similar specialization profiles regardless of the approach.
In concluding, we note that the proposed analysis is not free of the intrinsic limits of the bibliometric approach, inevitably with effects on analytical results. In particular, scientific publications in international scientific journals indexed in WoS represent only part of the total output from research activity. This emerges as a criticality especially where the repertoires provide very low coverage, for example in the fields of Art & Humanities (Aksnes & Sivertsen, 2019), which are fields also suffering from uneven coverage. The choice of field classification scheme also remains critical. In this work, we implemented the one available in WoS, which covers 254 SCs. Choosing a high number of fields allows good detail in profiling the specializations of countries, but on the other hand reduces confidence in the analyses, especially for smaller countries, where the counts per field are small.
Other limitations concern citations as a proxy of scholarly impact, as not all citations are positive or indicate real use by citing authors; and citations are not representative of all uses (Abramo, 2018;Bornmann & Daniel, 2008;Tahamtan & Bornmann, 2018;Tahamtan, Safipour Afshar, & Ahamdzadeh, 2016).
Finally, on the input side, the author name disambiguation algorithm is not free of errors, which have an effect also on the accuracy of the output produced by each country. Most importantly, when extracting research staff from publications' metadata, we are not able to account for unproductive researchers or researchers who do not publish in journals indexed in WoS. Furthermore, due to a lack of data on capital investment by country (and even more so by relative fields), the methodological approach to measurement of inputs considers only the numbers of researchers. But research obviously depends on instrumental resources, not only human, and ignoring investment differentials between countries certainly leads to analytical bias. The difference in specialization of a country across fields, from the input and output sides, can in fact have two explanations: higher/lower productivity of the country's researchers but also their higher/lower access to instrumental resources, compared to their colleagues in other countries. For now, the distinction between the two determinants remains difficult to investigate given the lack of data and of a collection framework that is both comprehensive and detailed. On the other hand, however, we are addressing the question of higher/lower differentials in the productivity of researchers by at least examining the feasibility of measurement with respect to an international benchmark, country by country.