How can citation impact in bibliometrics be normalized? A new approach combining citing-side normalization and citation percentiles

Since the 1980s, many different methods have been proposed to field-normalize citations. In this study, an approach is introduced that combines two previously introduced methods: citing-side normalization and citation percentiles. The advantage of combining two methods is that their advantages can be integrated in one solution. Based on citing-side normalization, each citation is field weighted and, therefore, contextualized in its field. The most important advantage of citing-side normalization is that it is not necessary to work with a specific field categorization scheme for the normalization procedure. The disadvantages of citing-side normalization—the calculation is complex and the numbers are elusive—can be compensated for by calculating percentiles based on weighted citations that result from citing-side normalization. On the one hand, percentiles are easy to understand: They are the percentage of papers published in the same year with a lower citation impact. On the other hand, weighted citation distributions are skewed distributions with outliers. Percentiles are well suited to assigning the position of a focal paper in such distributions of comparable papers. The new approach of calculating percentiles based on weighted citations is demonstrated in this study on the basis of a citation impact comparison between several countries.


INTRODUCTION
Research systematically investigates what is (still) not known. In order to demonstrate the gap in current knowledge and the shoulders on which the exploration of the gap by new studies stands, authors of papers (ideally) cite all relevant previous publications (Kostoff, Murday, et al., 2006). On the basis of this norm in science to cite the relevant past literature, citations have been established as a proxy for scientific quality, measuring science "impact" as an important component of quality (Aksnes, Langfeldt, & Wouters, 2019). Narin (1976) proposed the term evaluative bibliometrics for methods using citation-based metrics for measuring cognitive influence (Moed, 2017; van Raan, 2019). Bornmann and Marewski (2019) introduced the bibliometrics-based heuristics (BBHs) concept concretizing the evaluative use of bibliometrics: "BBHs characterize decision strategies in research evaluations based on bibliometrics data (publications and citations). Other data (indicators) besides bibliometrics are not considered" (Bornmann, 2020).
According to Moed and Halevi (2015), research assessment (based on bibliometrics) is an integral part of any scientific activity these days: "it is an ongoing process aimed at improving …"
Since the 1980s, many approaches have been developed in the scientometrics field to field-normalize citations. Although some approaches (e.g., the number of publications published by an institution that belong to the 10% most frequently cited publications in the corresponding fields) could reach the status of quasi-standards, each approach has its specific disadvantages. In this paper, an approach is introduced combining the advantages of two published approaches and smoothing their specific disadvantages. The first approach is citing-side normalization, whereby each single citation of a paper is field-normalized. The second approach is the citation percentile, which is the percentage of papers in a given set of papers with lower citation impact than the focal paper.

LITERATURE OVERVIEW: A SHORT HISTORY OF FIELD NORMALIZATION
Field normalization has a long tradition in bibliometrics. Literature overviews on the developments in the field can be found in Mingers and Leydesdorff (2015), Bornmann and Marx (2015), and Waltman (2016). Field normalizations start from the basic premise that "not all citations are equal. Therefore, normalization can be seen as a process of benchmarking that is needed to enhance comparability across diverse scientists, fields, papers, time periods, and so forth" (Ioannidis, Boyack, & Wouters, 2016). Many studies on field normalization either deal with technical issues (e.g., the development of improved indicator variants) or with the way fields should be defined for use in normalization (e.g., by using journal sets or human-based assignments; see Wilsdon, Allen, et al., 2015). One of the earliest attempts in bibliometrics to field-normalize citations was made by Schubert and Braun (1986) and Vinkler (1986). They proposed to calculate the average citation rate for a journal or field and to use this reference score to field-normalize (single) papers published in the journal or field (by dividing the citation count of every single paper by the reference score). The resulting metric was named the relative citation rate (RCR) by Schubert and Braun (1986).
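The division described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the citation counts are hypothetical, and the reference score is taken to be the arithmetic mean of the citation counts in one journal or field.

```python
from statistics import mean


def relative_citation_rates(citations):
    """Divide each paper's citation count by the reference score.

    Assumption: the reference score is the arithmetic mean of the
    citation counts of all papers in the same journal or field.
    """
    reference_score = mean(citations)
    if reference_score == 0:
        raise ValueError("reference score is zero; RCR is undefined")
    return [count / reference_score for count in citations]


# Hypothetical citation counts for four papers in one field.
field_citations = [0, 2, 4, 10]
print(relative_citation_rates(field_citations))
```

A score above 1 indicates that a paper was cited more often than the average paper in its reference set, and a score below 1 indicates the opposite.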
Since its introduction, the RCR has been criticized for its use of the arithmetic average in the normalization. The arithmetic average should not be used as a measure of central tendency for skewed citation data. According to Glänzel and Moed (2013), "the mean should certainly not be used if the underlying distribution is very skewed, and has a long tail" (p. 383). The fact that arithmetic averages of citation data and, thus, field normalized citation scores are sensitive to outliers has been named by van Raan (2019) as the Göttingen effect: "In 2008, a paper published by a researcher of the University of Göttingen became extremely highly cited, many thousands of times a year, within a very short time … As a result, for several years after this publication, Göttingen achieved a very high position in … [university] rankings" (p. 260).
To deal with the problem of skewed distributions in field normalization, McAllister, Narin, and Corrigan (1983, p. 207) proposed as early as the 1980s that percentiles should be used for citation data: "the pth percentile of a distribution is defined as the number of citations X_p such that the percent of papers receiving X_p or fewer citations is equal to p. Since citation distributions are discrete, the pth percentile is defined only for certain p that occur in the particular distribution of interest. Thus we would say that a 1974 physiology paper receiving one citation falls in the 18th percentile of the distribution. This means that 82 percent (100 − 18) of all 1974 U.S. physiology papers received more than one citation. For any paper in the 18th percentile of any subject area citation distribution, 18 percent of the papers performed at a level less than or equal to the particular paper, and 82 percent of the papers in the subject area outperformed the particular paper." For Schreiber (2013), "percentiles … have become a standard instrument in bibliometrics" (p. 822). Percentiles are recommended in the Leiden manifesto, which includes 10 principles to guide research evaluation (Hicks, Wouters, et al., 2015). The most recent field-normalizing percentile approach has been published by Bornmann and Williams (2020).
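The percentile definition by McAllister, Narin, and Corrigan (1983) can be illustrated with a short Python sketch. The function name and the citation distribution are hypothetical; the counts are chosen only so that 18 of 100 papers receive one citation or fewer, mirroring the 18th percentile in the example above. They are not the 1974 physiology data.

```python
def percentile_inclusive(citations, x):
    """Percent of papers receiving x or fewer citations."""
    at_or_below = sum(1 for count in citations if count <= x)
    return 100.0 * at_or_below / len(citations)


# Hypothetical distribution of 100 papers: 10 uncited papers,
# 8 papers with one citation, and 82 papers with five citations.
citations = [0] * 10 + [1] * 8 + [5] * 82

share = percentile_inclusive(citations, 1)
print(share)          # percent of papers with one citation or fewer
print(100.0 - share)  # percent of papers with more than one citation
```

Note that this definition counts papers with the same citation count as the focal paper ("or fewer"), whereas a percentile defined as the share of papers with a strictly lower impact excludes them.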
One of the biggest challenges in field normalizing citations is the selection of the system categorizing papers to fields. The overview by Sugimoto and Weingart (2015) shows that existing systems emphasize cognitive, social, or institutional orientations of fields to a different extent. Various field categorization schemes are in use to normalize citations and there exists no standard use in bibliometrics. The most frequently used schemes are multidisciplinary schemes that span all fields (Sugimoto & Weingart, 2015; Wang & Waltman, 2016). These schemes are typically based on journal sets: the Web of Science (WoS) subject categories of Clarivate Analytics and the Scopus subject areas of Elsevier. The use of journal sets can be justified quite well: according to Milojevic (2020, p. 184), "journals often serve as anchors for individual research communities, and new journals may signify the formations of disciplines." Each journal is a well-crafted folder sustained by editors, reviewers, and authors who usually know and use that outlet. Authors typically direct their manuscripts in an informed way to reach the appropriate audience for the content and argument.
There are two problems with these schemes, however, which is why Waltman and van Eck (2012) proposed a new method for algorithmically constructing classification systems (see also Boyack & Klavans, 2010): (a) Because journals publish many different papers, journals are usually assigned to more than one category; and (b) journal sets represent broad fields which is why papers from specific fields might be misclassified (see Strotmann & Zhao, 2010). The results by Shu, Julien, et al. (2019) reveal that about half of the papers published in a journal are not from the field to which the journal has been assigned.
The system proposed by Waltman and van Eck (2012) is based on citation relations between single publications. The advantages of the system are that (a) it assigns single publications (and not journals) to fields and (b) it provides a fine-grained categorization scheme of publications. Ruiz-Castillo and Waltman (2015) demonstrate the use of the system for field normalization. The system, however, has not remained without criticism: "because 'fields' are algorithmic artifacts, they cannot easily be named (as against numbered), and therefore cannot be validated. Furthermore, a paper has to be cited or contain references in order to be classified, since the approach is based on direct citation relations … However, algorithmically generated classifications of journals have characteristics very different from content-based (that is, semantically meaningful) classifications … The new Leiden system is not only difficult to validate, it also cannot be accessed or replicated from outside its context of production in Leiden" (Leydesdorff & Milojevic, 2015, p. 201).
As the recent results by Sjögårde, Ahlgren, and Waltman (2020) show, at least the labeling problem of the fields can be solved.
Another critical point is that the field assignments based on citation relations change with new citations. The approach does not lead to stable results, and it is unclear why the field assignment of a paper should change. Further critical remarks can be found in Haunschild, Schier, et al. (2018). Based on the critique of the system proposed by Waltman and van Eck (2012), Colliander and Ahlgren (2019) introduced an item-oriented approach that avoids clustering, but uses publication-level features to estimate subject similarities. The empirical comparison of this approach with standard approaches in bibliometrics by the authors revealed promising results. Future independent studies will demonstrate whether these first positive results can be confirmed.
As an alternative to multidisciplinary schemes, monodisciplinary schemes have been proposed for field normalization. The advantages of these schemes are that papers are usually assigned to a single research field and human indexers (field experts or authors of papers) assign the relevant field to a paper intellectually (Bornmann, Marx, & Barth, 2013). In recent years, studies have used different monodisciplinary schemes to field-normalize citations in certain fields: Bornmann and Wohlrabe (2019) used Journal of Economic Literature classification ( JEL) codes in economics, Bornmann, Schier, et al. (2011) and Bornmann and Daniel (2008) used Chemical Abstracts (CA) sections in chemistry and related areas, Radicchi and Castellano (2011) used Physics and Astronomy Classification Scheme (PACS) codes in physics and related areas, and Smolinsky and Lercher (2012) used the MathSciNet's Mathematics Subject Classification (MSC) system in mathematics. The disadvantages of monodisciplinary schemes are that they are restricted to single fields and the assignments by the indexers may be affected by subjective biases.
One problem that affects many field classification systems (mono- and multidisciplinary) is that they exhibit different aggregation levels, and it is not clear which level should be used to field-normalize citations (Waltman & van Eck, 2019; Wouters, Thelwall, et al., 2015). In bibliometrics, different results and opinions have been published as to whether an aggregation level change has any (significant) influence on the field-normalized scores: Zitt, Ramanana-Rahary, and Bassecoulard (2005) report a lack of stability of these scores; Colliander and Ahlgren (2011) arrive at another conclusion. Wang (2013) holds the opinion that "normalization at finer level is still unable to achieve its goal of improving homogeneity for a fairer comparison" (p. 867).

CITING-SIDE NORMALIZATION
The literature overview in section 2 has shown that there are many problems with field normalization in bibliometrics and that it has not yet been possible to establish a standard. One can expect that some problems will remain unsolved without finding a perfect solution. For example, it will remain a normative decision as to which field categorization scheme is used (and on what level). Independently of the system that is used, fields are not isolated and research based on between-field collaborations is common (Ioannidis et al., 2016). "With the population of researchers, scientific literature and knowledge ever growing, the scientific endeavour increasingly integrates across boundaries" (Gates, Ke, et al., 2019, p. 34). According to Waltman and van Eck (2013a), "the idea of science being subdivided into a number of clearly delineated fields is artificial. In reality, boundaries between fields tend to be rather fuzzy" (p. 700).
A possible solution to these problems might be to avoid the use of field categorization schemes (Bornmann, Marx, et al., 2013), clustering (Waltman & van Eck, 2012), and similarity approaches (Colliander & Ahlgren, 2019), and instead to manually search, for each focal paper (that is assessed), for thematically similar papers for comparison (Kostoff, 2002; Waltman, 2016). This solution corresponds to the judgement by Hou, Pan, et al. (2019) that field normalization cannot be solved by statistical techniques. The manual collection of papers for the comparison with a focal paper might be possible in the evaluation of small sets of papers; however, it is not practicable for large sets (e.g., all papers published by a university over several years). Furthermore, one needs experts from the fields to find the papers for comparison.
Another solution that can be applied to large sets of papers is not to normalize citation impact based on expected citations from the reference sets, but to normalize single citations directly. So-called citing-side field normalizing approaches have been proposed in recent years that normalize each single citation of a focal paper. van Raan (2014) sees these "field-independent normalization procedures" (p. 22) as an important and topical issue in bibliometrics. The simplest procedure is to divide each citation by the number of cited references of the citing paper. The use of the number of cited references is intended to reflect the disciplinary context of the citing paper and to standardize the citation in a field-specific way. It is a decisive advantage of citing-side normalization that it "does not require a field classification system" (Waltman & van Eck, 2013a, p. 700). Citing-side normalization, thus, solves the problem with the selection of a field-categorization scheme by refraining from it.
Citing-side normalization might be a reasonable approach for citation analysis, as the goal of field normalization is the normalization of citation impact (see Waltman, van Eck, et al., 2013). Given the different directions of the two basic field normalization approaches, citing-side approaches are more focused on the aim of field normalization than approaches that are based on reference sets on the cited side: Citing-side approaches normalize each single citation of a focal paper. Bornmann and Marx (2015) demonstrated the problem of field normalization based on cited-side normalization by using the well-known paper by Hirsch (2005) on the h-index as an example. This paper is a typical bibliometrics paper (it introduces a new indicator based on publication and citation data), but receives citations from many fields (not only from the bibliometrics field). If a focal paper is attractive for authors publishing in other fields with high citation density, it has an advantage over another focal paper that is not as attractive for these fields. Although both focal papers might belong to the same field (viewed from the cited-side perspective), they have different chances of being cited.
The paper by Hirsch (2005) is concerned with another "problem" (for field normalization): It was published in the Proceedings of the National Academy of Sciences of the United States of America. This is a multidisciplinary journal and is assigned to another journal set than most of the papers published in bibliometrics (which are assigned to library and information science). Thus, by using journal sets as a field categorization scheme, the paper would not be compared with its "true" reference papers, but with various papers from many different fields, which are usually published in multidisciplinary journals. An appropriate reference set for this paper would be all papers published in journals in the library and information science set. If one decides to manually collect the reference papers for comparison (see above), the ideal reference set for the paper by Hirsch (2005) would consist of all papers publishing a variant of the h-index or all papers having introduced an indicator combining the number of publications and the number of citations in a single number.
The idea of citing-side normalization was introduced by Zitt and Small (2008). They proposed a modification of the Journal Impact Factor (JIF) by fractional citation weighting. The JIF is a popular journal metric that is published in the Journal Citation Reports by Clarivate Analytics. The indicator measures the average citation rate of papers published in a journal within 1 year. Citing-side normalization is also known as source normalization, fractional counting of citations, or a priori normalization (Waltman, 2016; Waltman & van Eck, 2013a). The method focuses on the citation environment of single citations and weights each citation depending on its citation environment: A citation from a field with high citation density (on average, authors in these fields include many cited references in their papers) receives a lower weight than a citation from a field with low citation density (on average, authors in these fields include only a few cited references in their papers). The basic idea of the method is as follows: Each citation is adjusted for the number of references in the citing publication or in the citing journal (as a representative for the entire field). In the past decade, some variants of citing-side indicators have been published (Waltman, 2016; Waltman & van Eck, 2013a). These variants are presented in the following based on the explanations by Bornmann and Marx (2015).
The first variant has been named SNCS1 (Source Normalized Citation Score 1):

$$\mathrm{SNCS1} = \sum_{i=1}^{c} \frac{1}{a_i}$$

In the formula, c is the number of citations of the focal paper and a_i is the average number of linked references in those publications that appeared in the same journal and in the same publication year as the citing publication i. Linked references are the part of cited references that refers to papers from journals covered by the citation index (e.g., WoS or Scopus). The limitation to linked references (instead of all references) is intended to prevent a situation in which fields that frequently cite publications not indexed in WoS are disadvantaged. The calculation of the average number of linked references in SNCS1 is restricted to certain referenced publication years. Imagine a focal paper published in 2008 with a citation window covering a period of 4 years (2008 to 2011). In this case, every citation of the focal paper is divided by the average number of linked references to the four previous years. In other words, a citation from 2010 is divided by the linked cited references from the period 2007 to 2010. This restriction to recent publication years is designed to prevent fields that cite rather older literature from being disadvantaged in the normalization (Waltman & van Eck, 2013b).
SNCS2 is the second variant of citing-side indicators:

$$\mathrm{SNCS2} = \sum_{i=1}^{c} \frac{1}{r_i}$$

Here, each citation is divided by r_i, the number of linked cited references in the citing publication i. Therefore, the journal perspective is not considered in this variant. The selection of the referenced publication years is analogous to SNCS1.
SNCS3 is a combination of SNCS1 and SNCS2:

$$\mathrm{SNCS3} = \sum_{i=1}^{c} \frac{1}{p_i r_i}$$

r_i is defined as in SNCS2. p_i is the share of papers that contain at least one linked cited reference among the papers from the same journal and publication year as the citing paper i. The selection of the referenced publication years is analogous to SNCS1 and SNCS2.
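The three variants can be sketched in Python as sums of per-citation weights. The sketch assumes, following Waltman and van Eck (2013b), that the weights are 1/a_i for SNCS1, 1/r_i for SNCS2, and 1/(p_i r_i) for SNCS3. The class name, field names, and numbers are hypothetical, and the restriction to linked references from the relevant referenced publication years is assumed to have been applied before the values are passed in.

```python
from dataclasses import dataclass


@dataclass
class CitingPaper:
    """Reference statistics for one paper citing the focal paper (hypothetical layout)."""
    a: float  # average number of linked references in the citing journal and year
    r: int    # number of linked references in the citing paper itself
    p: float  # share of papers in the citing journal and year with >= 1 linked reference


def sncs1(citing):
    # Each citation is weighted by the journal-level average a_i.
    return sum(1.0 / paper.a for paper in citing)


def sncs2(citing):
    # Each citation is weighted by the paper-level count r_i.
    return sum(1.0 / paper.r for paper in citing)


def sncs3(citing):
    # Paper-level count r_i, corrected by the journal-level share p_i.
    return sum(1.0 / (paper.p * paper.r) for paper in citing)


# Two hypothetical citations: one from a field with short reference
# lists, one from a field with long reference lists.
citing = [CitingPaper(a=20.0, r=10, p=0.8), CitingPaper(a=40.0, r=50, p=1.0)]
print(sncs1(citing), sncs2(citing), sncs3(citing))
```

In this example, the citation from the paper with 10 linked references contributes five times as much to SNCS2 as the citation from the paper with 50 linked references, which is the intended effect of weighting by citation density.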
Whereas Leydesdorff, Radicchi, et al. (2013) concluded that cited-side normalization outperforms citing-side normalization, the empirical results of Waltman and van Eck (2013a) and Bornmann and Marx (2015) demonstrated that citing-side normalization is more successful in field-normalizing citation impact than cited-side normalization. Therefore, it seems reasonable for reaching the goal of field normalization to weight each citation "based on the referencing behavior of the citing publication or the citing journal" (Waltman & van Eck, 2013a, p. 703). The comparison of the three citing-side approaches by Waltman and van Eck (2013b, p. 842) revealed that "SNCS(2) should not be used. Furthermore, the SNCS(3) approach appears to be preferable over the SNCS(1) approach. The excellent performance of the SNCS(3) approach in the case of classification system C … suggests that this approach may be especially well suited for fine-grained analyses aimed for instance at comparing researchers or research groups active in closely related areas of research."
The results by Bornmann and Marx (2015), however, did not reveal these large differences between the three indicator variants.
Cited-side normalization is frequently confronted with the problem that the field categorization scheme used assigns papers to more than one field. Thus, it is necessary to consider these multiple assignments in the calculation of field-normalized indicators (see Waltman, van Eck, et al., 2011). As multiple assignments are not possible with citing-side normalization, this problem no longer exists, which is a further decisive advantage of the approach.

PURPOSE OF THE STUDY: THE COMBINATION OF CITING-SIDE NORMALIZATION AND CITATION PERCENTILES
In section 3, the advantages of field normalization using citing-side approaches have been demonstrated based on the previous literature. Although these advantages have been reported in several papers over many years, these approaches have not been established as standard indicators in (applied) bibliometrics. For example, the Leiden Ranking (see https://www.leidenranking.com) does not consider citing-side indicators, but percentile-based cited-side indicators. One important reason for the avoidance of citing-side indicators might be that these indicators are more complicated to understand (and explain) than many cited-side indicators and indicators that are not based on field normalization. The results by Hammarfelt and Rushforth (2017) show that "simple and well-established indicators, like the JIF and the h-index, are preferred" (pp. 177-178) when indicators are used in practice. Jappe, Pithan, and Heinze (2018) similarly wrote that "the journal impact factor (JIF) … and the Hirsch Index (h-index or HI) … have spread widely among research administrators and funding agencies over the last decade." According to the University of Waterloo Working Group on Bibliometrics (2016), "there is often a demand for simple measures because they are easier to use and can facilitate comparisons" (p. 2).
This study is intended to propose a field normalization approach that combines citing-side normalization and citation percentiles. The advantage of the combination lies in the abandonment of a field classification system (by using citing-side normalization) and the realization of field-normalized scores (percentiles) that are relatively simple to understand and to apply in research evaluation. In the first step of the approach, weighted citation counts are calculated based on the formula (see above) presented by Waltman and van Eck (2013a). In this study, the SNCS3 is used, as Waltman and van Eck (2013b) recommended its use (based on their empirical results). However, the approach is not bound to this SNCS variant. In the second step, the percentile approach proposed by Bornmann and Williams (2020) is used to calculate citation percentiles based on SNCS3. In this step, too, it is possible to use another percentile approach such as those proposed by Bornmann, Leydesdorff, and Mutz (2013) or Bornmann and Mutz (2014). This study prefers the approach by Bornmann and Williams (2020), because the authors point out the advantages of their approach over previous approaches.
Bornmann and Williams (2020) calculated cumulative frequencies in percentages (CPs) as demonstrated in Table 1 based on the size-frequency distribution (Egghe, 2005) to receive citation percentiles. The table shows the citation counts and SNCS3 for 24 fictitious papers. For example, there are five papers in the set with 12 citations and a weighted citation impact of 0.45 each. Note that not all papers with five citations have an SNCS3 score of 0.45 and vice versa. For the indicator CP-EX WC (the subscript WC stands for weighted citations), the first percentage (for papers with 1 citation) is set at 0. The calculation of the cumulative percentage starts in the second row with the percentage of the lowest citation count (16.67%). By setting the first row to zero, CP-EX WC measures exactly the percentage of papers with lower citation impact in the set of papers. For example, CP-EX WC = 95.83 means that exactly 95.83% of the papers in the set of 24 papers received a citation impact (measured by SNCS3) that is below the weighted citation impact of 4.51. Likewise, 16.67% of the papers received less impact than the weighted citations of 0.20.
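The calculation of CP-EX WC from a size-frequency distribution can be sketched as follows. The function name and the 24 scores are hypothetical and are not the values of Table 1; they are chosen only so that the percentages mentioned above (16.67% below a score of 0.20 and 95.83% below a score of 4.51) are reproduced.

```python
from collections import Counter


def cp_ex(scores):
    """Map each distinct score to the percentage of papers with a strictly lower score."""
    total = len(scores)
    frequencies = Counter(scores)
    percentiles = {}
    papers_below = 0
    for value in sorted(frequencies):
        # The lowest score receives 0 because no paper lies below it.
        percentiles[value] = 100.0 * papers_below / total
        papers_below += frequencies[value]
    return percentiles


# Hypothetical weighted citation scores for 24 papers.
scores = ([0.10] * 4 + [0.20] * 3 + [0.45] * 5 + [0.90] * 6
          + [1.60] * 3 + [2.80] * 2 + [4.51] * 1)

for value, percentile in cp_ex(scores).items():
    print(f"{value:.2f}  {percentile:.2f}")
```

Papers with tied scores share the same percentile, and the paper with the highest score cannot reach 100, because the percentile counts only papers with a strictly lower impact.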
CP-EX WC can be calculated for all papers in a database (e.g., all WoS papers) with SNCS3 scores included (or the scores based on another variant). Because (weighted) citation impact depends on the length of the citation window, CP-EX WC should be calculated based on all papers in 1 year (i.e., separated by publication years). With CP-EX WC calculated using SNCS3, one receives a field-normalized indicator that is simple to understand (because the scores are cumulative percentages) and that is based on an advantageous method of field normalization (see above). The definition of CP-EX WC for a focal paper is that x% of papers published in the same year received a lower weighted citation impact than the focal paper. Weighted citation impact means that each citation of the focal paper is weighted by the citation behavior in its field. This definition is simple to understand, not only by bibliometric experts but also by laypersons.
As citation impact is dependent not only on the publication year but also on the document type of the cited publication (see, e.g., Lundberg, 2007), the CP-EX WC calculation should not only be separated by publication year, but also by document type. In this study, it was not necessary to consider the document type in the calculation, because only articles were included.

METHODS
The bibliometric data used in this paper are from an in-house version of the WoS used at the Max Planck Society (Munich, Germany). In this study, all papers are included from this database with the document type "article" and published between 2011 and 2015. The data set contains n = 7,908,584 papers; for n = 914,472 papers no SNCS3 values are available in the in-house database. Thus, the study is based on n = 6,994,112 papers. The SNCS3 scores and CP-EX WC values have been calculated as explained in the sections above. In the calculation of the SNCS3 indicator, we followed the procedure as explained by Waltman and van Eck (2013b). Whereas Waltman and van Eck (2013b), however, only included selected core journals from the WoS database in the SNCS3 calculation, the SNCS3 scores for the present study were calculated based on all journals in the WoS database.

RESULTS
Figure 1 shows the distribution of SNCS3 scores for 6 years using frequency distributions. It is clearly visible that the SNCS3 distributions are very skewed and characterized by outliers (articles with very high weighted citation impact). Against the backdrop of these skewed distributions (despite citation weighting by citing-side normalization), it seems reasonable (more than ever) to calculate percentiles based on SNCS3 scores. According to Seglen (1992), skewed citation distributions "should probably be regarded as the basic probability distribution of citations, reflecting both the wide range of citedness values potentially attainable and the low probability of achieving a high citation rate" (p. 632). This basic probability distribution appears to be valid not only for citation distributions, but also for weighted citation distributions (based on SNCS3). Similar to citations, the SNCS3 indicator appears to follow the so-called "bibliometric laws" (de Bellis, 2009, p. xxiv). This is a set of regularities working behind citation processes according to which a certain number of citations is related to the authors generating them (in their papers). The common feature of these processes (and similar processes based on the number of publications or text words) is an "amazingly steady tendency to the concentration of items on a relatively small stratum of sources" (de Bellis, 2009, p. xxiv).
One of these regularities leading to skewed citation distributions might be (larger) quality differences between the research published in the papers (Aksnes et al., 2019). A second regularity might be the type of contribution made by the paper: For example, one can expect many more citations for methods papers than for papers contributing empirical results (Bornmann, 2015; van Noorden, Maher, & Nuzzo, 2014). A third regularity might be a cumulative advantage effect by which "already frequently cited [publications] have a higher probability of receiving even more citations" (van Raan, 2019, p. 239). According to Ruocco, Daraio, et al. (2017), "Price's [Derek J. de Solla Price] assumption was that the papers to be cited are chosen at random with a probability that is proportional to the number of citations those same papers already have. Thus, highly cited papers are likely to gain additional citations, giving rise to the rich get richer cumulative effect."
Papers with low citation impact (i.e., low CP-EX WC scores) are prevalent, but the distributions approximate a uniform distribution.
In this study, the proposed indicator CP-EX WC has been applied, by way of example, to publication and citation data of some countries: Switzerland, United Kingdom, United States, Germany, China, and Japan. The results are shown in Figure 3. The upper graph in the figure is based on full counting of the countries' papers. Thus, each paper contributes to the citation impact of a country with a weight of 1, independent of the number of additional countries involved. The score for a country shown in Figure 3 is its CP-EX WC median value. The dotted line in the graph marks the worldwide average. The score for Switzerland in the upper graph is above that line and means, for example, that on average, 60.85% of the papers worldwide have a weighted citation impact that is below the weighted citation impact of papers with a Swiss address.
The results in the upper graph correspond to results based on other (field-normalized) citation-based indicators (e.g., Bornmann, Wagner, & Leydesdorff, 2018; Stephen, Stahlschmidt, & Hinze, 2020). When citation impact is measured size independently, certain small countries such as Switzerland show an excellent performance (the Netherlands is another example, although it is not considered here). Switzerland is followed in the upper graph of Figure 3 by the United Kingdom, which has exceeded the United States in citation impact in recent years. China and Japan are at the bottom of the country list. Although these results come as no real surprise, differences from previous results are also observable. One difference concerns the performance differences between the countries, which do not appear to be very large. For example, the differences between Switzerland, the United Kingdom, and the United States amount to no more than four percentage points. Another difference from previous studies concerns the performance level. In previous studies, countries such as Switzerland show an excellent performance far above midlevel performance. If we assume that the dotted line in Figure 3 represents a midlevel performance (50% of the papers worldwide exhibit a lower performance), the best countries (and also the worst) are not far away from 50%. On average, for example, papers from Switzerland are (only) around 10 percentage points above the midlevel performance.
The lower graph in Figure 3 is based on fractional counting. Thus, it has been considered that many papers were published by more than one country. In this study (which is based on the SNCS3 impact indicator), the CP-EX WC score for a paper has been weighted by the number of countries given on a paper (Bornmann & Williams, 2020).
The following formula leads to a fractionally counted mean CP-EX WC score for a country:
$$\text{mean CP-EX}_{\text{WC}}^{\text{(fractional)}} = \frac{\sum_{i=1}^{y} \mathrm{CPEX}_i \cdot \mathrm{FR}_i}{\sum_{i=1}^{y} \mathrm{FR}_i}$$
where CPEX_1 to CPEX_y are the CP-EX WC scores of paper 1 to paper y published by the unit, each weighted by the number of countries given on the paper. The fractional assignment (weighting) is included by the notation FR_i for paper i = 1 to paper y. For example, if a paper was published by authors from four countries, the paper is weighted by 0.25. The sum of the weighted CP-EX WC scores for paper 1 to paper y is divided by the sum of the weightings for paper 1 to paper y.
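The fractionally counted mean described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this study: the function name `fractional_mean` and the three example papers are hypothetical, and the weighting FR_i is assumed to be the reciprocal of the number of countries on a paper, in line with the four-country example (weight 0.25).

```python
from typing import Sequence


def fractional_mean(scores: Sequence[float], n_countries: Sequence[int]) -> float:
    """Weighted mean of percentile scores with weights FR_i = 1 / number of countries."""
    if len(scores) != len(n_countries) or len(scores) == 0:
        raise ValueError("scores and n_countries must be non-empty and of equal length")
    weights = [1.0 / n for n in n_countries]  # FR_1 ... FR_y
    weighted_sum = sum(s * w for s, w in zip(scores, weights))
    return weighted_sum / sum(weights)


# Hypothetical papers of one country: CP-EX WC scores and countries per paper.
scores = [80.0, 40.0, 60.0]
n_countries = [4, 1, 2]

full_mean = sum(scores) / len(scores)          # full counting: every paper has weight 1
frac_mean = fractional_mean(scores, n_countries)
print(f"full counting: {full_mean:.2f}, fractional counting: {frac_mean:.2f}")
```

In this made-up example, the paper with the highest score was coauthored with three other countries and therefore counts only with a weight of 0.25, so the fractionally counted mean (about 51.43) is lower than the fully counted mean (60.00).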
By applying fractional counting, citation impact benefits arising from collaborations are adjusted. As the results in the lower graph in Figure 3 show, fractional impact counting changes the national results differently: Whereas larger differences are visible for Switzerland, the United Kingdom, and Germany, the differences are smaller for Japan and China. Compared with the upper graph in Figure 3, the scores of China and Japan hardly change when international collaboration is controlled for in the lower graph: The CP-EX WC scores only change from 46.80% to 46.49% (China) and 46.62% to 46.07% (Japan). In contrast to China, Switzerland appears to profit significantly in terms of citation impact from international collaboration: Its CP-EX WC decreases from 60.85% (upper graph) to 55.5% (lower graph). The other two countries that also appear to profit from international collaboration are the United Kingdom and Germany (around four percentage points).

DISCUSSION
Because only experts from the same field can properly assess the research of their colleagues, the peer review process is the dominant research evaluation method. Since around the 1980s, the use of indicators in research evaluation has become increasingly popular. One reason might be that "direct assessment of research activity needs expert judgment, which is costly and onerous, so proxy indicators based on metadata around research inputs and outputs are widely used" (Adams, Loach, & Szomszor, 2016, p. 2). For Lamont (2012), another reason is that "governments have turned to new public management tools to ensure greater efficacy, with the result that quantitative measures of performance and benchmarking are diffusing rapidly" (p. 202). However, peer review and the use of indicators do not have to be incompatible approaches; it is seen as the "ideal way" in research evaluation to combine both methods in the so-called informed peer review process. According to Waltman and van Eck (2016, p. 542), "scientometric indicators can … be used by the peer review committee to complement the results of in-depth peer review with quantitative information, especially for scientific outputs that have not been evaluated in detail by the committee." In the confrontation of peer review and bibliometrics, one should consider that both methods are related: "citations provide a built-in form of peer review" (McAllister et al., 1983, p. 205).
Citation analysis is one of the most important methods in bibliometrics, as the method appears to measure quality issues: "at high frequency, citations are good indicators of utility, significance, even the notion of impact. The late sociologist of science, Robert Merton likened citations to repayments of intellectual debts. The normative process in science requires authors to acknowledge relevant previous contributions" (Panchal & Pendlebury, 2012, p. 1144). One of the major challenges of citation analyses is the field dependency of citations. If larger units in science are evaluated that are working in many fields, it is necessary to consider these differences in the statistical analyses (Bornmann, 2020). According to Kostoff (2002), "citation counts depend strongly on the specific technical discipline, or sub-discipline, being examined … The documentation and citation culture can vary strongly by sub-discipline. Since citation counts can vary sharply across sub-disciplines, absolute counts have little meaning, especially in the absence of absolute citation count performance standards" (p. 53; see also Fok & Franses, 2007).
One solution to the problem of field-specific differences in citation counts is to contextualize the results of citation analyses "case by case, considering all the relevant information" (D'Agostino, Dardanoni, & Ricci, 2017, p. 826). According to Waltman and van Eck (2019), one can "use straightforward non-normalized indicators and to contextualize these indicators with additional information that enables evaluators to take into account the effect of field differences" (p. 295). This might be the best solution if smaller research groups or institutions working in clearly definable fields are evaluated. For this solution, however, it is necessary to involve not only a bibliometric expert in the evaluation but also an expert from the evaluated field to contextualize these indicators. For example, for the identification of research groups working in the same field as the focal group, it is necessary for an expert to identify these groups that can be used for comparison of the focal group. This solution of contextualizing the number of times when research is cited is stretched to its limits when large units such as organizations or countries are addressed in evaluations. These units are multidisciplinary by nature.
Since the 1980s, many different methods have been proposed to field-normalize citations. It has not been possible to establish a standard method until now. In this study, an approach is proposed that combines two previously introduced methods: citing-side normalization and percentiles. The advantage of combining two methods is that their advantages can be integrated into a single solution. Based on citing-side normalization, each citation is field weighted and, therefore, contextualized in its field. The most important advantage of citing-side normalization is that it is not necessary to work with a specific field categorization scheme. The disadvantages of citing-side normalization (the calculation is complex and the values are elusive) can be compensated for by calculating percentiles based on the field-weighted citations. On the one hand, percentiles are easy to understand: They are the percentage of papers published in the same year with a lower citation impact. On the other hand, weighted citation distributions are skewed distributions including outliers. Percentiles are well suited to assigning the position of a focal paper in such skewed distributions of comparable papers.
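The reading of a percentile as "the percentage of papers published in the same year with a lower citation impact" can be illustrated with a short sketch. It is a simplified illustration only and does not reproduce the exact CP-EX WC procedure of Bornmann and Williams (2020): the function name `percent_with_lower_score` and the example scores are hypothetical, and it is assumed here that the focal paper itself is part of the reference set used as the denominator.

```python
from bisect import bisect_left
from typing import List, Sequence


def percent_with_lower_score(scores: Sequence[float]) -> List[float]:
    """For each paper, the percentage of papers in the set with a strictly lower score.

    Assumption: all scores belong to papers from the same publication year,
    and tied papers receive the same percentile.
    """
    ordered = sorted(scores)
    n = len(ordered)
    # bisect_left returns the number of values strictly smaller than s
    return [100.0 * bisect_left(ordered, s) / n for s in scores]


# Hypothetical weighted citation scores of five papers, including one outlier.
weighted_scores = [0.0, 0.5, 0.5, 2.0, 40.0]
percentiles = percent_with_lower_score(weighted_scores)
print(percentiles)
```

The outlier with a weighted score of 40.0 receives a percentile of 80 in this example: its position in the ranking matters, not the size of its lead, which is why percentiles are robust against the skewness of weighted citation distributions.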
Many different approaches to percentile calculation exist. According to Schreiber (2013, p. 829), "all the discussed methods have advantages and disadvantages. Further investigations are needed to clarify what the optimal solution to the problem of calculating percentiles and assigning papers to PRCs [percentile rank classes] might be, especially for large numbers of tied papers." Bornmann and Williams (2020) appear to have found a percentile solution with comparably good properties. In this study, their percentile approach based on weighted citations (CP-EX WC) has been applied to the analysis of several countries. The country results are similar to many other published results. This correspondence in the results can be interpreted as a good sign for the new approach: It appears to measure field-normalized citation impact in a similar way to other indicators. However, the approach also reveals the importance of measuring citation impact based on fractional counting. Several countries are strongly internationally oriented, which has a larger influence on the results.
Further studies are necessary to investigate the new approach introduced here. These studies could also focus on other units than those considered in this study (e.g., institutions and research groups). Furthermore, it would be interesting to know how the new approach can be understood by people who are not bibliometric experts: Is it as easy to understand as expected, or are there difficulties in understanding it?

ACKNOWLEDGMENTS
The bibliometric data used in this paper are from an in-house database developed and maintained by the Max Planck Digital Library (MPDL, Munich) and derived from the Science Citation Index Expanded (SCI-E), Social Sciences Citation Index (SSCI), and Arts and Humanities Citation Index (AHCI) prepared by Clarivate Analytics, formerly the IP & Science business of Thomson Reuters (Philadelphia, Pennsylvania, USA).