## Abstract

This review investigates (a) the number of unique counting methods, (b) to what extent counting methods can be categorized according to selected characteristics, (c) methods and elements to assess the internal validity of counting methods, and (d) to what extent and with which characteristics counting methods are used in research evaluations. The review identifies 32 counting methods introduced from 1981 to 2018. Two frameworks categorize these counting methods. Framework 1 describes selected mathematical properties and Framework 2 describes arguments for choosing a counting method. Twenty of the 32 counting methods are rank dependent, fractionalized, and introduced to measure contribution, participation, etc. of an object of study. Next, three criteria for internal validity are used to identify five methods that test the adequacy, two elements that test the sensitivity, and three elements that test the homogeneity of counting methods. Finally, a literature search finds that only three of the 32 counting methods are used by four research evaluations or more. Two counting methods are used with the same characteristics as defined in the studies that introduced the counting methods. The review provides a detailed foundation for working with counting methods, and many of the findings provide bases for future investigations of counting methods.

## PEER REVIEW

## 1. INTRODUCTION

The topic of the present review is counting methods in bibliometrics. The bibliometric research literature has discussed counting methods for at least 40 years. However, the topic remains relevant, as the findings in the review show. Section 1.1 provides the background for the review, and is followed by the study aims (Section 1.2) and research questions (Section 1.3).

### 1.1. Background

The use of counting methods in the bibliometric research literature is often reduced to the choice between full and fractional counting. Full counting gives the authors of a publication 1 credit each, whereas fractional counting shares 1 credit between the authors of a publication. However, several studies document that there are not just two but many counting methods in the bibliometric research literature, and that the distinction between full and fractional counting is too simple to cover these many counting methods (for examples, see Todeschini & Baccini, 2016, pp. 54–74; Waltman, 2016, pp. 378–380; and Xu, Ding et al., 2016).

A counting method functions as one of the core elements of a bibliometric indicator, such as the *h*-index fractionally counted (Egghe, 2008) or by first-author counting (Hu, Rousseau, & Chen, 2010). For some indicators, the counting method is the only element, such as the number of publications (full counting) on a researcher’s publication list. Counting methods are not only relevant for how to construct and understand indicators but also for bibliometric network analyses (Perianes-Rodriguez, Waltman, & van Eck, 2016), field-normalization of indicators (Waltman & van Eck, 2015), and rankings (Centre for Science and Technology Studies, n.d.).

There are bibliometric analyses where the choice of counting method makes no difference. For sole-authored publications, the result is the same whether the authors are credited by full counting (1 credit) or fractional counting (1/1 credit). However, coauthorship is the norm in most research fields, and the average number of coauthors per publication has been increasing (Henriksen, 2016; Lindsey, 1980, p. 152; Price, 1986, pp. 78–79; Wuchty, Jones, & Uzzi, 2007). Other objects of study reflect this trend; for example, the number of countries per publication has also been increasing (Gauffriau, Larsen et al., 2008, p. 152; Henriksen, 2016).

Choosing a counting method is essential for the majority of bibliometric analyses that evaluate authors, institutions, countries, or other objects of study. A study of a sample of 99 bibliometric studies finds that, for two-thirds of the studies, the choice of counting method can affect the results (Gauffriau, 2017, p. 678). The effect of shifting between counting methods can be seen in the Leiden Ranking, where it is possible to choose between full and fractional counting for the indicators on scientific impact (Centre for Science and Technology Studies, 2019). A change from one counting method to the other alters the scores, and the order of the institutions in the ranking may change.

Nonetheless, many bibliometric studies do not explicitly justify the choice of counting method. More broadly, in bibliometric practice, there are no common standards for how to describe counting methods in the methods section of studies (Gauffriau, 2017, p. 678; Gauffriau et al., 2008, pp. 166–169). This implicit status of counting methods is also reflected in bibliometric textbooks and handbooks from the most recent decade. Many do not have a chapter or larger section dedicated to counting methods (for examples, see Ball, 2018; Cronin & Sugimoto, 2014; Gingras, 2016; Glänzel, Moed et al., 2019). Others have a chapter or larger section dedicated to counting methods but include only common counting methods and/or lack well-defined frameworks to describe the counting methods (for examples, see Sugimoto & Larivière, 2018, pp. 54–56; Todeschini & Baccini, 2016, pp. 54–74; Vinkler, 2010, Chapter 10; Vitanov, 2016, pp. 27–29).

The present review demonstrates that a consistent analysis of the majority of bibliometric counting methods can reveal new knowledge about those counting methods. Given this, I argue for the explicit use, analysis, and discussion of counting methods in bibliometric practice and research as well as in bibliometric textbooks and handbooks.

### 1.2. Aims and Relevance

The topic of this review is bibliometric counting methods. The aims are to investigate counting methods in the bibliometric research literature and to provide insights into their common characteristics, the assessment of their internal validity, and how they are used. The review presents different categorizations of counting methods and discusses the counting methods based on these categorizations. Hence, the review does not focus on counting methods individually but rather on the general characteristics found across counting methods. The general characteristics provide insights into the counting methods and how they overlap or differ from each other.

Three previous reviews of counting methods (Todeschini & Baccini, 2016, pp. 54–74; Waltman, 2016, pp. 378–380; Xu et al., 2016) are fairly comprehensive; however, the present review includes still more counting methods. In their handbook, Todeschini and Baccini present a list of counting methods and provide a definition for each counting method, but they do not in a consistent manner analyze characteristics across counting methods. Waltman uses a division into full, fractional, and other counting methods. The present review develops Waltman’s approach further, resulting in a more well-defined categorization of counting methods. Xu et al.’s categorization of counting methods analyzes data distributions. Although the review does not use this categorization, it discusses the categorization as one route for future research.

Counting methods are often categorized with regard to their mathematical properties (for examples, see Rousseau, Egghe, & Guns, 2018, sec. 5.6.3; Waltman, 2016, pp. 378–380; Xu et al., 2016). In addition to a categorization based on selected mathematical properties (Gauffriau, Larsen et al., 2007), the review applies another approach (Gauffriau, 2017), which builds on qualitative text analysis. I adopt this approach to describe why counting methods are introduced into the bibliometric research literature.

Previous studies either have documented that many different counting methods exist in the bibliometric research literature or have analyzed selected counting methods using well-defined frameworks. The present review does both by covering more counting methods than previous reviews and by providing detailed insight into the general characteristics of these counting methods. Furthermore, the review considers three criteria for assessing the internal validity of bibliometric indicators (Gingras, 2014), applying these to identify methods and elements that can be used to assess the internal validity of the counting methods. Finally, the review investigates the use of the counting methods in research evaluations. The results of the review are a unique resource for informing the use of counting methods and inspiring further investigations of counting methods.

### 1.3. Research Questions: RQs 1–4

The aims presented in Section 1.2 lead to four interconnected research questions (RQs):

RQ 1: How many unique counting methods are there and when were they introduced into the bibliometric research literature?

RQ 1 is useful to understand the magnitude and timeliness of this review’s aims. As discussed in Section 1.1, counting methods often remain implicit in bibliometric analyses, even though there are many counting methods to choose from and a change from one counting method to another may alter the scores for the objects of study. The review provides an overview of how many counting methods there are in the bibliometric research literature. Making this information available is the first step in facilitating the explicit and informed choice of counting methods in bibliometric analyses.

RQ 2: To what extent can the counting methods identified by RQ 1 be categorized according to selected frameworks that focus on characteristics of the counting methods?

RQ 2 explores whether the counting methods identified by RQ 1 share characteristics. As Section 1.1 mentions, the simplified dichotomy of full or fractional counting is often seen in the bibliometric research literature. The analysis for RQ 2 uses two fine-grained frameworks to provide both a more detailed categorization of the counting methods’ mathematical properties and a categorization of why the counting methods were introduced into the bibliometric research literature. These categorizations do not focus on only a few counting methods, but rather, provide knowledge about a large number of counting methods.

RQ 3: Which methods and elements from the studies that introduce the counting methods identified by RQ 1 can be used to assess the internal validity of those counting methods?

Where RQ 2 focuses on the shared characteristics of the counting methods identified by RQ 1, RQ 3 supplements this with information about the assessment of the internal validity of counting methods (i.e., as drawn from the studies that introduce the counting methods). As discussed in Section 1.1, the counting method is a core element in the construction of a bibliometric indicator. Therefore, not only the characteristics of the counting methods but also the internal validity of the counting methods are important.

RQ 4: To what extent are the counting methods identified by RQ 1 used in research evaluations and to what extent is this use compliant with the definitions in the studies that introduce the counting methods?

As mentioned previously, the use of counting methods is often reduced to a choice between full and fractional counting. RQ 4 investigates the use of counting methods in more detail. The use of the counting methods should comply with the design of the counting methods. If one or more of the characteristics identified under RQ 2 changes between the point of introduction and the point of use of a counting method then the internal validity of the counting method may be compromised.

## 2. METHODS

Section 2 presents the methods used to address the RQs presented in Section 1.3. Organized by RQ, Table 1 summarizes the methods, as well as the related data, tools, and results. Sections 2.1–2.4 provide detailed presentations of the methods, discuss the rationale for applying the methods, and show how the results from each RQ inform the subsequent RQs.

**Table 1.**

RQ
. | Method
. | Data
. | Tools
. | Results
. |
---|---|---|---|---|

RQ 1 | Literature search based on citing and cited studies. | Peer-reviewed studies in English published 1970–2018. | Google Scholar. | Thirty-two counting methods introduced into the bibliometric research literature over the period 1981–2018. No unique counting methods are introduced in the period 1970–1980. |

RQ 2 | Categorizations of counting methods. | The 32 counting methods identified by RQ 1. | Two frameworks: The first describes selected mathematical properties of counting methods, and the second describes arguments for choosing a counting method. | Thirty counting methods are categorized according to the first framework, and all 32 counting methods are categorized according to the second framework. |

RQ 3 | Identification of methods and elements useful for assessing internal validity of counting methods. | The 32 counting methods identified by RQ 1. | Three internal validity criteria for bibliometric indicators. The results from RQ 2. | Five methods and five elements related to the assessment of the internal validity of counting methods are identified. |

RQ 4 | Literature search based on citing studies. | The 32 counting methods identified by RQ 1. A sample of research evaluations that use counting methods identified by RQ 1. | Google Scholar. The results from RQ 2. | Three of the 32 counting methods are each used in a minimum of four research evaluations. For one counting method, the use does not comply with the characteristics of the counting method as defined by the study that introduced the counting method. |

RQ
. | Method
. | Data
. | Tools
. | Results
. |
---|---|---|---|---|

RQ 1 | Literature search based on citing and cited studies. | Peer-reviewed studies in English published 1970–2018. | Google Scholar. | Thirty-two counting methods introduced into the bibliometric research literature over the period 1981–2018. No unique counting methods are introduced in the period 1970–1980. |

RQ 2 | Categorizations of counting methods. | The 32 counting methods identified by RQ 1. | Two frameworks: The first describes selected mathematical properties of counting methods, and the second describes arguments for choosing a counting method. | Thirty counting methods are categorized according to the first framework, and all 32 counting methods are categorized according to the second framework. |

RQ 3 | Identification of methods and elements useful for assessing internal validity of counting methods. | The 32 counting methods identified by RQ 1. | Three internal validity criteria for bibliometric indicators. The results from RQ 2. | Five methods and five elements related to the assessment of the internal validity of counting methods are identified. |

RQ 4 | Literature search based on citing studies. | The 32 counting methods identified by RQ 1. A sample of research evaluations that use counting methods identified by RQ 1. | Google Scholar. The results from RQ 2. | Three of the 32 counting methods are each used in a minimum of four research evaluations. For one counting method, the use does not comply with the characteristics of the counting method as defined by the study that introduced the counting method. |

### 2.1. RQ 1: Literature Search for Counting Methods

RQ 1 serves to illustrate the magnitude and timeliness of the present review. The results of RQ 1 go on to form the basis for RQs 2–4.

RQ 1: How many unique counting methods are there and when were they introduced into the bibliometric research literature?

To answer RQ 1, a literature search is employed to identify counting methods in the bibliometric research literature. The literature search concentrates on studies that introduce counting methods rather than studies that use counting methods.

To be included in the review, studies must introduce counting methods defined by an equation or similar to guide calculation. Where a study presents only minor variations to the equation of an existing counting method, that variant approach is not included as a separate counting method. Section 3.2 gives a few examples of such variations. Variations of existing counting methods are also not included in cases where the variations add weights to publication counts in the form of citations, Journal Impact Factors, publication type weights, etc. Furthermore, the counting methods must be applicable to publications with any number of authors. For counting methods introduced with different objects of study in different studies—for example, institutions in one study and countries in another study—only the first study is included.

In addition to the counting methods included in this review, “hybrids” also exist in the literature. Hybrids are counting methods that sit somewhere between two known counting methods. For example, Combined Credit Allocation sits between complete-fractionalized and Harmonic counting (Liu & Fang, 2012, p. 41) and the First and Others credit-assignment schema sits between straight and Harmonic counting (Weigang, 2017, p. 187). Hybrids are not included in the review.

The literature search is restricted to peer-reviewed studies in English from the period 1970–2018. Prior to 1970, discussions about counting methods in bibliometrics seem to be in the context of specific studies; however, from approximately 1970 onwards, some of the discussions about counting methods start to offer general recommendations in relation to choosing a counting method for a bibliometric analysis. These general recommendations derive primarily from changing norms for coauthorship and the launch of the *Science Citation Index* (Cole & Cole, 1973, pp. 32–33; Lindsey, 1980; Narin, 1976), but other factors such as institutional and national research policies and programs may have had an effect as well. Counting methods introduced after 2018 are not included in the review as the use of these is difficult to assess (RQ 4) at the present time (i.e., less than two years after their introduction into the bibliometric research literature).

Studies that introduce counting methods into the bibliometric research literature are found via a search of cited and citing studies (Harter, 1986, pp. 58–60; 186). The search begins with studies cited by or citing Gauffriau et al.’s work on counting methods (Gauffriau, 2017; Gauffriau et al., 2007, 2008; Gauffriau & Larsen, 2005). From these cited and citing studies, studies are selected for the review. The cited and citing studies of these selected studies are then searched to add yet more studies to the review, and so on. This search approach is chosen because the terminology for counting methods is not well defined. Many terms are general, such as “count” or “number of,” making adequate keywords difficult to identify.

Citations to 10 selected studies are searched using Google Scholar, Scopus, and Web of Science. Google Scholar proves to have the best coverage. This finding is supported by large scale studies of Google Scholar (Delgado López-Cózar, Orduña-Malea, & Martín-Martín, 2019, sec. 4.3).

Two sources are used to find cited and citing studies, respectively. Cited studies are found via the reference lists in the selected studies. Citing studies are found via Google Scholar. The titles and the first lines of the abstracts as presented in Google Scholar’s list of results are skimmed to find relevant studies among the citing studies. In addition, a search in the citing studies for the terms “count,” “counting,” “fraction,” “fractionalized,” and “fractionalised” is conducted via Google Scholar’s “Search within citing articles.” The results are skimmed to find relevant studies. The final search for citing studies was completed in December 2018.

The result of the literature search cannot claim a 100% coverage of all counting methods that exist in the bibliometric research literature; however, when citations to and from the studies included in this review are checked, a significant redundancy is encountered (i.e., studies already included in the review). Thus, it is assumed that the review covers the majority of counting methods discussed in the bibliometric research literature during the period 1970–2018.

### 2.2. RQ 2: Categorizations of Counting Methods According to Their Characteristics

RQ 1 identifies counting methods in the bibliometric research literature. RQ 2 explores whether the counting methods identified under RQ 1 share characteristics. Shared characteristics can facilitate general knowledge about counting methods rather than specific knowledge of only a few counting methods. The results of RQ 2 are used in the analyses related to RQs 3 and 4.

RQ 2: To what extent can the counting methods identified by RQ 1 be categorized according to selected frameworks that focus on characteristics of the counting methods?

The list of 32 counting methods created under RQ 1 does not provide information about the types of counting methods. Therefore, RQ 2 applies two frameworks to categorize the counting methods identified by RQ 1. “Framework” is a unifying term for generalizable approaches that describe and facilitate the use, analysis, and discussion of counting methods. These approaches compile elements such as consistent terminology, definitions, and categorizations of counting methods.

The first framework (Framework 1) describes selected mathematical properties of the counting methods (Gauffriau et al., 2007). The second framework (Framework 2) is based on a qualitative text analysis and describes four groups of arguments for the choice of counting method in a bibliometric analysis (Gauffriau, 2017). As such, the two frameworks have different foundations and they address different characteristics of the counting methods. The frameworks are described in detail in Sections 2.2.1 and 2.2.2.

#### 2.2.1. Framework 1: Selected mathematical properties of counting methods

“Framework 1: Selected mathematical properties of counting methods” builds on measure theory (Halmos, 1950) and provides well-defined definitions and a detailed terminology for counting methods (Gauffriau et al., 2007). Thus, the framework offers a more precise terminology compared to the dichotomy of full and fractional counting. Framework 1 was developed in 2007, and the definitions and terminology were applied to common counting methods from the bibliometric research literature (Gauffriau et al., 2007). The name “Framework 1: Selected mathematical properties of counting methods” is introduced for the context of the present review.

A foundation for measure theory is sets and subsets (Halmos, 1950, pp. 9–15). In relation to counting methods, sets are, for example, authors, institutions, or countries. These sets can be subsets of each other, such as institutions in countries. And the sets are subsets of the set “the world” as represented by a database (Gauffriau et al., 2007, pp. 180–183). Thus, measure theory facilitates an explicit analysis of aggregation levels of counting methods. Furthermore, a counting method is additive if the score calculated as a sum of two or more sets is equal to the score for the union of the same sets. The sets must be disjoint (Gauffriau et al., 2007, pp. 185–186; Halmos, 1950, p. 30). And counting methods can be normalized (i.e., fractionalized), meaning that a publication is equal to 1 (Gauffriau et al., 2007, pp. 187–188; Halmos, 1950, p. 171).

The review uses the framework to categorize counting methods according to two mathematical properties: whether a counting method is rank dependent or rank independent, and whether it is fractionalized or nonfractionalized. The mathematical properties are discussed in more detail later in this section.

Below, firstly, the framework’s terminology for counting methods is introduced (Gauffriau et al., 2007). Secondly, the mathematical properties of the counting methods are presented (Gauffriau et al., 2007). And lastly, assumptions made about the mathematical properties are discussed to enable the use of the framework in the present review.

##### Terminology

###### Counting method

A “counting method is defined by the choice of basic unit [of analysis], object [of study] and score function” (Gauffriau et al., 2007, p. 178; square brackets added).

“Objects of study,” “basic units of analysis,” and “score function” are described below. Table 2 illustrates how five score functions work in counting methods, where countries are both units of analysis and objects of study. The basic units of analysis are credited, and the objects of study are scored by collecting the credits from the basic units of analysis assigned to the object of study. For the illustration, the table presents a publication with three addresses: one from Country X and two from Country Y.

**Table 2.**

Score functions
. | Basic units of analysis in the publication
. | Objects of study
. | |||
---|---|---|---|---|---|

Country X . | Country Y . | Country Y . | Country X . | Country Y . | |

Complete | 1 credit | 1 credit | 1 credit | Score: 1 | Score: 2 |

Complete-fractionalized | 1/3 credit | 1/3 credit | 1/3 credit | Score: 1/3 | Score: 2/3 |

Straight | 1 credit | 0 credit | 0 credit | Score: 1 | Score: 0 |

Whole | 1 credit | 1/2 credit | 1/2 credit | Score: 1 | Score: 1 |

Whole-fractionalized | 1/2 credit | 1/4 credit | 1/4 credit | Score: 1/2 | Score: 1/2 |

Score functions
. | Basic units of analysis in the publication
. | Objects of study
. | |||
---|---|---|---|---|---|

Country X . | Country Y . | Country Y . | Country X . | Country Y . | |

Complete | 1 credit | 1 credit | 1 credit | Score: 1 | Score: 2 |

Complete-fractionalized | 1/3 credit | 1/3 credit | 1/3 credit | Score: 1/3 | Score: 2/3 |

Straight | 1 credit | 0 credit | 0 credit | Score: 1 | Score: 0 |

Whole | 1 credit | 1/2 credit | 1/2 credit | Score: 1 | Score: 1 |

Whole-fractionalized | 1/2 credit | 1/4 credit | 1/4 credit | Score: 1/2 | Score: 1/2 |

I restrict the literature search to unique score functions rather than unique counting methods introduced into the bibliometric research literature. One score function can be applied in several counting methods. That is, score functions can be used with different combinations of basic units of analysis and objects of study. This means that a score function can describe a class of counting methods. However, from the literature search, it is clear that score functions are introduced as counting methods with specific basic units of analysis and objects of study, often authors (i.e., at the microlevel). Therefore, the term *counting method* is used for the score functions identified in the literature search.

###### Objects of study and basic units of analysis

Objects of study are the objects presented in the results of bibliometric analyses, such as researchers, institutions, or countries. Objects of study can be found in publications, but they may also be objects not directly visible in the publication, such as unions of countries (e.g., the European Union (EU) and the United Kingdom).

###### Basic units of analysis are found in publications

Objects of study are scored by collecting credits from the basic units of analysis. It is common to find bibliometric analyses in which the basic units of analysis and the objects of study are at the same aggregation level, such as authors (microlevel). However, this is not the case in the calculation of the score for the EU as an object of study. Credits are given to countries belonging to the EU; thus, countries are the basic units of analysis. Thus, the objects of study and the basic units of analysis are at different aggregation levels: unions of countries (supra-level) and countries (macrolevel), respectively.

###### Score function

A score function describes how the objects of study are scored. The basic units of analysis are credited individually before the objects of study collect the credits. Five common score functions are presented below:

Complete

A credit of 1 is given to each basic unit of analysis in a publication. An object of study collects the credits from the basic units of analysis assigned to the object of study.

Complete-fractionalized

A credit of 1/

*n*is given to each basic unit of analysis where*n*is the number of basic units of analysis in a publication. An object of study collects the credits from the basic units of analysis assigned to the object of study.Straight

A credit of 1 is given to the basic unit of analysis ranked first in a publication. All other basic units of analysis in the publication are credited 0. An object of study collects the credits from the basic units of analysis assigned to the object of study.

Instead of first authors (i.e., the basic unit of analysis ranked first in the publication), last authors or reprint authors can also be credited (for examples, see Gauffriau et al., 2007, p. 676). The review does not discuss these alternatives further.

Whole

A credit of 1 is given to each basic unit of analysis, assigned one-to-one to a unique object of study, in a publication. If a unique object of study is represented by more basic units of analysis in a publication, these basic units of analysis share 1 credit in whatever way. An object of study collects the credits from the basic units of analysis assigned to the object of study.

Whole-fractionalized

A credit of 1/

*m*is given to each basic unit of analysis, assigned one-to-one to a unique object of study, where*m*is the number of unique objects of study related to a publication. If a unique object of study is represented by more basic units of analysis in a publication, these basic units of analysis share 1/*m*credit in whatever way. An object of study collects the credits from the basic units of analysis assigned to the object of study.

When the terminology is reduced to full and fractional counting, the differences between complete and whole score functions are not immediately visible. Both are called full counting. Neither are the differences between complete-fractionalized, straight, and whole-fractionalized score functions. All three are variations of fractional counting.

###### A note on terminology

In measure theory, which is the theoretical basis for Framework 1, the term *normalized* is used (Halmos, 1950, p. 171) for the property where the credit of 1 is shared (i.e., divided among the basic units of analysis of a publication). The present review uses the alternative term *fractionalized* because this has become the norm in the bibliometric research literature discussing counting methods. The term *normalized* typically refers to field-normalization (Waltman, 2016, secs. 6 and 7).

##### Mathematical properties

The five score functions above have definitions based on five mathematical properties introduced below. Table 3 shows how the mathematical properties form score functions, and thus, classes of counting methods (Gauffriau et al., 2007, p. 198). Detailed explanations follow below the table. The five mathematical properties are used to form assumptions, which are necessary for the analysis undertaken in relation to RQ 2:

Defined for all objects/not defined for all objects

All classes of counting methods in Table 3 except whole-fractionalized counting are defined for all objects of study. To test whether a counting method is defined for all objects of study, some of the objects of study can be merged to form a union. If this does not change the score for the objects of study not included in the union, then the score function is defined for all objects of study. The United Kingdom as an object of study can be used as illustration. To find all publications affiliated with the United Kingdom in Web of Science, it is necessary to search for publications from England, Scotland, Wales, and Northern Ireland. Take a publication with 10 unique country affiliations in which the UK is represented by only one of England, Scotland, Wales, or Northern Ireland. Following whole-fractionalized counting, the score for each country affiliated with the publication is 1/10. Now take another publication, again with 10 unique country affiliations. In this publication, the UK is represented by three countries, such as England, Scotland, and Wales. In this case, the three countries are merged, and the score for each country affiliated with the publication becomes 1/8.

Based on a fixed crediting scheme/not based on a fixed crediting scheme

All classes of counting methods in Table 3 except whole and whole-fractionalized counting have fixed crediting schemes. Whole and whole-fractionalized counting are not based on fixed crediting schemes, as a change of objects of study may also change the credits given to basic units of analysis. If the objects of study and the basic units of analysis are institutions, then unique institutions in the affiliation section of a publication will be credited. If the basic units of analysis are kept and the objects of studies changed to countries, then unique countries in a publication will be credited via their institutions. If more than one institution from a country contributes to the publication, then the institutions share the credit for that country. Thus, the basic units of analysis cannot be credited independently of the objects of study. If a counting method is based on a fixed crediting scheme, then the counting method is additive (see next item).

Additive/nonadditive

Complete, complete-fractionalized, and straight counting are additive. The score for the objects of study can be calculated via credits to basic units of analysis at the same aggregation level (for example, macrolevel) or to basic units of analysis at lower aggregation levels (for example, meso- or microlevel) and the score will remain the same given that there is a one-to-one relation between the aggregation levels. If countries are objects of study, then it makes no difference whether the basic units of analysis are institutions or countries, providing that each address in the affiliation section of a publication has only one institution and one country. If a counting method is additive, then the counting method is defined for all objects (see first item).

Rank-independent/rank-dependent

Complete and complete-fractionalized counting are rank independent. The order of the basic units of analysis—for example, the order of countries in the affiliation section of a publication—does not influence how the basic units of analysis are credited. All basic units of analysis get the same credit. Straight counting is rank dependent because only the first basic unit in the affiliation section of a publication is credited. All other basic units of analysis get 0 credit. This property of rank-independency/rank-dependency is not applicable to whole and whole-fractionalized counting, as these counting methods are not based on fixed crediting schemes. For example, if countries are the objects of study, then for a publication with 10 country affiliations, where affiliation numbers two, six and seven are Denmark, the credit can be attributed to the affiliation ranked second, sixth, or seventh in whatever way. Thus, rank-dependency cannot be applied. Neither can rank-independency be applied (i.e., where all basic units of analysis receive the same credit).

Fractionalized/nonfractionalized

Complete-fractionalized, straight, and whole-fractionalized counting are fractionalized because, with these methods, the basic units of analysis in a publication share a total credit of 1. The rationale is that a publication equals 1 credit. Complete and whole counting are not fractionalized (i.e., the credits for the basic units of analysis of a publication can sum to more than 1). Note that fractionalized and additive are two different properties. For example, whole-fractionalized counting is fractionalized and nonadditive, whereas complete counting is nonfractionalized and additive.

**Table 3.**

Defined for all objects
. | Based on a fixed crediting scheme
. | Additive
. | Rank-independent
. | Fractionalized
. | Classes of counting methods described in the literature
. |
---|---|---|---|---|---|

Yes | Yes | Yes | No | Complete | |

Yes | Complete-fractionalized | ||||

No | No | ||||

Yes | Straight | ||||

No | Yes | Not applicable | No | ||

Yes | |||||

No | No | Whole | |||

Yes | |||||

No | No | No | Not applicable | No | |

Yes | Whole-fractionalized |

Defined for all objects
. | Based on a fixed crediting scheme
. | Additive
. | Rank-independent
. | Fractionalized
. | Classes of counting methods described in the literature
. |
---|---|---|---|---|---|

Yes | Yes | Yes | No | Complete | |

Yes | Complete-fractionalized | ||||

No | No | ||||

Yes | Straight | ||||

No | Yes | Not applicable | No | ||

Yes | |||||

No | No | Whole | |||

Yes | |||||

No | No | No | Not applicable | No | |

Yes | Whole-fractionalized |

In the review, the use of the framework with these five mathematical properties to categorize counting methods incorporate the assumptions below.

##### Assumptions about mathematical properties for counting methods

The analyses presented in the review focus on two of the five properties: rank independent/rank dependent and fractionalized/nonfractionalized. The following assumptions explain why the review focuses on these two properties.

As already mentioned, score functions are introduced into the bibliometric research literature as counting methods, often with authors as basic units of analysis and objects of study (i.e., at the microlevel). Without information about how the score functions work at, for example, the meso- or macrolevel, it is difficult to decide the score functions’ status for the first three mathematical properties: defined for all objects/not defined for all objects, based on a fixed crediting scheme/not based on a fixed crediting scheme, and additive/nonadditive. For example, complete-fractionalized and whole-fractionalized counting differ regarding the three properties (see Table 3), but at the microlevel, the calculations of scores are identical. At other aggregation levels, the calculations differ for the two counting methods.

Using Framework 1, however, the first three mathematical properties can help when making assumptions about the counting methods at the microlevel that use rank to determine credits for the basic units of analysis. As mentioned in the introduction to the five mathematical properties, counting methods with rank-dependent score functions have a fixed crediting scheme. If based on a fixed crediting scheme, the score functions are additive. If additive, the score functions are defined for all objects.

For score functions introduced as counting methods at the microlevel that are not rank-dependent, it is difficult to decide if the score function is rank independent (for example, complete and complete-fractionalized counting) or, rather, if the rank independent/rank dependent property is not applicable (for example, whole and whole-fractionalized counting). In the review, such counting methods are assumed to be rank independent, and thus, based on a fixed crediting scheme, additive, and defined for all objects.

For all counting methods included in the review the status for the property fractionalized/nonfractionalized is explicitly evident in the studies that introduce the counting methods.

Based on the above assumptions, the results of the present review focus on the properties rank dependent/rank independent and fractionalized/nonfractionalized. Thus, the categorization of counting methods is rank dependent and fractionalized (see Section 3.2.1), rank dependent and nonfractionalized (see Section 3.2.2), rank independent and nonfractionalized (see Section 3.2.3), and rank independent and fractionalized (see Section 3.2.4).

#### 2.2.2. Framework 2: Four groups of arguments for choosing a counting method for a study

“Framework 2: Four groups of arguments for choosing a counting method for a study” proposes a categorization of arguments for choosing a counting method for a study. The categorization is developed from the arguments for counting methods in a sample of 32 studies published in 2016 in peer-reviewed journals and supplemented with arguments for counting methods from three older studies (Gauffriau, 2017). The name “Framework 2: Four groups of arguments for choosing a counting method for a study” is introduced for the context of the present review.

I use Framework 2 to categorize counting methods according to the arguments for why a counting method is introduced into the bibliometric research literature. The studies found in relation to RQ 1 that introduce counting methods argue for why the new counting methods are needed. These arguments are assigned to the four groups of arguments in Framework 2. This use is a slight modification compared to the original intention of Framework 2, in which the arguments relate to choosing a counting method for a study—not introducing a new counting method. However, I assume that a counting method is introduced with the aim of being used in other studies. Thus, the arguments for the introduction and for the use of a counting method are seen as compatible.

Limited resources made it impossible to engage two people to assign arguments to Groups 1–4, which would allow a calculation of intercoder reliability. Instead, the arguments and assignment to groups are reported in the Supplementary Material, Section 3, to make the categorization as transparent as possible.

Table 4 presents the categorization with Groups 1–4 (Gauffriau, 2017, p. 679). Descriptions of the four groups follow Table 4.

**Table 4.**

Category
. | Counting method(s)
. |
---|---|

Group 1: The indicator measures the (impact of)… | |

… participation of an object of study | Whole |

… production of an object of study | Whole, complete-fractionalized |

… contribution of an object of study | Whole, complete-fractionalized (rank independent and rank dependent) |

… output/volume/creditable to/performance of an object of study | Whole, complete-fractionalized |

… the role of authors affiliated with an object of study | Straight, last author, reprint author |

Group 2: Additivity of counting method | |

Additivity of counting method | Whole, complete-fractionalized |

Group 3: Pragmatic reasons | |

Availability of data | Whole, straight, reprint author |

Prevalence of counting method | Whole |

Simplification of indicator | Whole |

Insensitive to change of counting method | Whole |

Group 4: Influence on/from the research community | |

Incentive against collaboration | Complete-fractionalized |

Comply with researchers’ perceptions of how their publications and/or citations are counted | Whole |

Category
. | Counting method(s)
. |
---|---|

Group 1: The indicator measures the (impact of)… | |

… participation of an object of study | Whole |

… production of an object of study | Whole, complete-fractionalized |

… contribution of an object of study | Whole, complete-fractionalized (rank independent and rank dependent) |

… output/volume/creditable to/performance of an object of study | Whole, complete-fractionalized |

… the role of authors affiliated with an object of study | Straight, last author, reprint author |

Group 2: Additivity of counting method | |

Additivity of counting method | Whole, complete-fractionalized |

Group 3: Pragmatic reasons | |

Availability of data | Whole, straight, reprint author |

Prevalence of counting method | Whole |

Simplification of indicator | Whole |

Insensitive to change of counting method | Whole |

Group 4: Influence on/from the research community | |

Incentive against collaboration | Complete-fractionalized |

Comply with researchers’ perceptions of how their publications and/or citations are counted | Whole |

The four groups of arguments for choosing a counting method:

Group 1: The indicator measures the (impact of) contribution/participation/… of an object of study

The arguments for counting methods relate to the concept that the study attempts to measure by using the counting method to design an indicator. For example, some studies in the sample argue that whole counting is suitable for indicators measuring the object of study’s participation in a research endeavor.

Group 2: Additivity of counting method

The arguments for counting methods relate to mathematical properties of the counting method itself: namely, to ensure that the counting method is additive and to avoid double counting of publications.

Group 3: Pragmatic reasons

The conceptual/methodological arguments included in Groups 1 and 2 are not taken into account but instead are pragmatic reasons for the choice of a counting method. Whole counting is quite common in Group 3. This may be explained by this counting method being the readily available approach in the databases often used to calculate bibliometric indicators (i.e., Web of Science and Scopus). In these databases, a search for publications from Denmark returns the number of publications in which Denmark appears at least once in the list of affiliations. This corresponds to whole counting.

Group 4: Influence on/from the research community

The arguments in Group 4 are not related to what an indicator measures (i.e., as in Group 1), but rather, to the impact of the indicator on the research community under evaluation (and vice versa). For example, one of the studies analyzed to create the framework argued for whole counting when the objects of study are researchers because a researcher should have 1 credit for each publication in his or her publication list (Waltman & van Eck, 2015, p. 891). The argument is that this is how a researcher intuitively counts his or her publications, and that this intuitive counting approach should be reflected in the evaluation.

In the review, the categorization of counting methods uses all four groups of arguments. The analysis focuses on arguments for why the counting methods are introduced into the bibliometric research literature.

### 2.3. RQ 3: Internal Validity of Counting Methods

RQ 2 focuses on shared characteristics of the counting methods identified by RQ 1. RQ 3 adds information about the assessment of internal validity of the counting methods by the studies that introduce the counting methods.

RQ 3: Which methods and elements from the studies that introduce the counting methods identified by RQ 1 can be used to assess the internal validity of those counting methods?

To answer RQ 3, methods for and elements of the assessment of internal validity of the counting methods in the studies that introduce the counting methods (RQ 1) are identified. There are no standards commonly applied for such assessments of internal validity, and only a few of the studies that introduce counting methods explicitly include assessments of the internal validity of the counting methods. However, all the studies include analyses of the counting methods. These analyses set out methods that may be used to assess the internal validity of the counting methods. As well, the counting methods may have elements that themselves can indicate weak internal validity of the counting methods.

It is not possible to evaluate in a consistent and manageable manner how well these methods and elements work as assessments of internal validity of counting methods. Instead, RQ 3 evaluates how well each of the methods and elements corresponds to three internal validity criteria for well-constructed bibliometric indicators: adequacy, sensitivity, and homogeneity (Gingras, 2014, pp. 112–116).

Internal validity is defined as follows: “… I concentrate on criteria directly related to the internal validity of the indicator evaluated through its adequacy to the reality behind the concept it is supposed to measure” (Gingras, 2014, p. 112). Internal validity is one facet of the concept “validity,” which in this review focuses on the validity of the counting method itself and not on external conditions when applying the counting method. Other facets of validity take external conditions into account, such as sampling error, operationalization, and population properties (Fidler & Wilcox, 2018, sec. 1.2). I analyze that introduce counting methods and, therefore, do not take validity related to external conditions into account.

In the review, the three internal validity criteria are used on counting methods instead of indicators. Counting methods, however, function as core elements or the only element in bibliometric indicators. Thus, if a counting method does not comply with the criteria for internal validity, the same conclusion could be reached for bibliometric indicators using that counting method.

Guidance is provided for how to apply the criteria at an overarching level (Gingras, 2014, pp. 112–116). However, implementation in a specific case, such as the present review, requires several choices, as described below. Apart from the study introducing the three criteria, two studies have applied the three criteria to evaluate bibliometric indicators (Wildgaard, 2015, sec. 6.3, 2019, sec. 14.4.1). The present review is the first to use three criteria to evaluate counting methods.

Three validity criteria for well-constructed bibliometric indicators:

Adequacy

According to the adequacy criterion, an indicator should be an adequate proxy for the object that the indicator is designed to measure. The indicator and the object should have the same characteristics, such as order of magnitude. The relationship between object and indicator is tested via an independent and accepted measure for the object (Gingras, 2014, pp. 112–115).

The counting methods identified by RQ 1 are assigned, using Framework 2, to arguments for the introduction of the counting methods. In the implementation of the adequacy criterion, the methods below may be used to document that the counting methods are adequate proxies for their aims, that is, the arguments for the counting methods:

- ○
Compare to other counting methods or bibliometric indicators: The scores obtained by the counting method are compared to scores obtained by existing counting methods or bibliometric indicators when applied to empirical publication sets or publication sets constructed for exemplification. Some publication sets are as small as one publication.

- ○
Principles to guide the definitions of the counting methods: A list of principles are stated explicitly and used in the definition of the counting method.

- ○
Quantitative models for distributions of scores: A quantitative model is used to test whether the counting method gives scores, fitting the model, to the objects of study.

- ○
Surveys or other empirical evidence: Surveys or other empirical evidence about coauthorship practice are used to define target values for the credits for basic units of analysis.

- ○
Compare groups of objects of study: The scores obtained by the counting method are compared for groups of objects of study with different characteristics.

There are more methods for the assessment of the adequacy of a counting method, but each of these methods was found in only one study and, therefore, not included in the list above. One example is the comparison of scores obtained by the counting method where the order of the authors in a publication is kept versus where the order is shuffled (Trueba & Guerrero, 2004, Fig. 4).

As mentioned, the present review’s analysis does not assess how well these methods work in the studies that introduce counting methods. Instead, the analysis assesses whether the methods are appropriate to test the counting methods as adequate proxies for their aims.

- ○
Sensitivity

According to the sensitivity criterion, an indicator should reflect changes over time in the object that the indicator is designed to measure (Gingras, 2014, pp. 115–116).

Section 1 explains that the increasing average number of coauthors per publication is a driver behind the discussion about counting methods. In relation to the sensitivity criterion, two elements are defined. Where present, these elements highlight counting methods that are less flexible to an increasing number of authors per publication:

- ○
Time specific evidence: Surveys or other empirical evidence about coauthorship practice that are not updated over time and, therefore, do not reflect changes over time in the average number of coauthors per publication.

- ○
Fixed credits for selected basic units of analysis: A fixed share of the credit for the selected basic unit of analysis, such as the first author. As the number of authors per publication increases, such fixed credits leave less credit for the nonselected authors of the publications. This is only true for fractionalized counting methods where 1 credit is shared among the authors of a publication. Therefore, the analysis considers this element in relation to fractionalized counting methods only.

Counting methods with one or both of the above elements are less flexible to an increasing number of authors per publication and, therefore, they do not comply with the sensitivity criterion.

- ○
Homogeneity

According to the homogeneity criterion, an indicator should measure only one dimension and avoid heterogeneous indicators, such as the

*h*-index, which combines publication and citation counts in one indicator. When a heterogeneous indicator increases/decreases, it is not immediately clear whether one or more elements cause the change. Thus, the indicator becomes difficult to interpret (Gingras, 2014, p. 116).Some of the counting methods are homogeneous, whereas others are complex, mixing many elements. A mix of elements can make it difficult to instantly understand how the scores of the counting method are obtained and which elements account for how much of the score. The implementation of the homogeneity criterion investigates elements that work against the criterion:

- ○
Parameter values selected by bibliometrician: The equation for the counting method has parameter(s) where the bibliometrician selects the values of the parameter(s) for each analysis individually.

- ○
External elements: The equation for the counting method is dependent on elements external to the publications that are included in an analysis (for example, an author’s position as principal investigator, an author’s

*h*-index, an author’s number of publications). - ○
Conditional equations: To calculate credits for all basic units of analysis, a conditional equation is needed. One part of the equation is dedicated to specific basic units of analysis or specific publications (for example, first authors or publications with local authors) and another part of the equation is dedicated to the remaining basic units of analysis or publications.

Counting methods with one or more of the above elements are heterogeneous, as the elements lead to several dimensions being present in the same counting method.

Of the three validity criteria, the homogeneity criterion is the most difficult to apply to counting methods. A mix of different elements that have the same measure unit does not count as heterogeneous but as composite (Gingras, 2014, p. 122). This said, the difference between heterogeneous and composite is described by Gingras using an example, which makes an exact interpretation difficult. It is a matter for debate whether some of the conditional equations included in the review’s analysis use the same measure unit—for example, author contributions for first authors and other authors, respectively—and, therefore, whether these equations identify true heterogeneous counting methods.

- ○

### 2.4. RQ 4: Use of the Counting Methods in Research Evaluations

RQ 4 investigates to what extent the counting methods identified by RQ 1 are used in research evaluations. The research evaluations should comply with the design of the counting methods in the studies that introduce the counting methods. If one or more of the characteristics identified under RQ 2 change from the introduction to the use of the counting methods, then the introducing study’s guidance about how to use the counting method may be compromised.

RQ 4: To what extent are the counting methods identified by RQ 1 used in research evaluations and to what extent is this use compliant with the definitions in the studies that introduce the counting methods?

RQ 4 is addressed through a literature search aimed at identifying research evaluations that use the counting methods identified by RQ 1. The literature search is restricted to peer-reviewed studies in English. The peer-review criterion ensures some level of quality check and increases the likelihood that researchers have authored the studies. Thus, reports from university management, PowerPoint presentations, sales materials, etc. are not included. The literature search does not distinguish between studies where the research evaluations are the primary result and studies where the research evaluations are part of the results.

Counting methods can be used in many contexts, such as in the development of new counting methods or investigations of the mathematical properties of the counting methods. The focus for the present review is research evaluations covering a minimum of 30 researchers where researchers are the objects of study. If institutions or countries are the objects of study, the institutions or countries cannot be represented by fewer than 30 researchers. Counting methods that are difficult to apply on larger publication sets are probably less well-suited for research evaluations. In other words, the emphasis is on scalable counting methods.

To find studies that use the counting methods identified by RQ 1, citations in Google Scholar to the counting methods are searched. As discussed in Section 2.1, Google Scholar covers more publications relevant to the review than either Web of Science or Scopus. Furthermore, to avoid research evaluations with almost identical implementations of a counting method, the same author cannot represent several research evaluations for the same counting method. Some counting methods are used in several studies by the same author (for example, see Abramo, D’Angelo, & Rosati, 2013, p. 201; Abramo, Aksnes, & D’Angelo, 2020, p. 7). Including all of these research evaluations would give this implementation more weight than implementations represented by one research evaluation.

The number of research evaluations that use each counting method is reported using the following intervals: Zero research evaluations use the counting method, one to three research evaluations use the counting method, and four or more research evaluations use the counting method.

For counting methods with four or more research evaluations, samples of five research evaluations, if available, are selected randomly for inclusion in an analysis of the use of the counting methods. With five research evaluations per counting method, it is possible to get an indication of whether or not the characteristics from the introduction of the counting methods, as identified under RQ 2, are kept in the research evaluations. With one to three research evaluations per counting method the results of the analysis would not be sufficiently robust.

## 3. RESULTS

Section 3 reports the results for RQs 1–4 based on the methods presented in Section 2. Sections 3.1–3.4 present the results for each of the RQs 1–4. The Supplementary Material, Section 1, offers a schematic overview of results under all RQs.

### 3.1. RQ 1: Thirty-Two Unique Counting Methods in the Bibliometric Research Literature

RQ 1: How many unique counting methods are there and when were they introduced into the bibliometric research literature?

Four score functions are introduced prior to 1970 and fall outside the time frame covered by the literature search. Recall that score functions are counting methods where different basic units of analysis and objects of study can be applied. The four score functions are complete, complete-fractionalized, straight, and whole counting (see definitions in Section 2.2.1). The review uses these pre-1970 score functions as reference points in some of the following analyses.

Beyond the four pre-1970 score functions, another 32 unique score functions are identified. These were introduced into the bibliometric research literature during the period 1981–2018^{3}. There were no unique score functions introduced during the period 1970–1980. The majority, or 17, of the score functions were introduced in the most recent decade (2010–2018), as illustrated in Figure 1.

All the score functions are introduced as counting methods, which are score functions with specific units of analysis and objects of study. Thus, the term *counting methods* is used for the score functions identified in the literature search.

### 3.2. RQ 2: Categorizations of Counting Methods According to Frameworks 1 and 2

RQ 2: To what extent can the counting methods identified by RQ 1 be categorized according to selected frameworks that focus on characteristics of the counting methods?

The RQ 2 categorizations build on two frameworks. The frameworks are independent of each other. In the presentation of the results, Framework 1 with selected mathematical properties takes priority. This framework can be seen as a further development of the binary division into full and fractional counting—a division often seen in discussions about counting methods. Framework 2 describes arguments for choosing a counting method. With Framework 2 as secondary, the framework adds extra information to the categories created via Framework 1. However, it is possible for either of the frameworks to be given priority or for the categorizations of the counting methods to be presented separately for each framework. To support different categorizations, the Supplementary Material, Section 1, simply lists all 32 counting methods chronologically.

In the presentation below, beginning with the largest category, the counting methods are divided into four categories based on Framework 1: rank dependent and fractionalized (Section 3.2.1), rank dependent and nonfractionalized (Section 3.2.2), rank independent and nonfractionalized (Section 3.2.3), and rank independent and fractionalized (Section 3.2.4). Two counting methods do not fit these properties (Section 3.2.5) and two arguments for introducing counting methods do not currently comply with Framework 2 (Section 3.2.6).

Most of the counting methods have a name. Counting methods without a name are in the review named after the author(s) of the study introducing the counting methods (i.e., [author, publication year]).

#### 3.2.1. Rank-dependent and fractionalized counting methods

Twenty-one of the 32 counting methods identified by RQ 1 are rank dependent and fractionalized, meaning that the basic units of analysis in a publication share 1 credit but do not receive equal shares. Among the pre-1970 counting methods, straight counting has these properties.

In addition to the rank-dependent counting methods, the results include counting methods where the credits for basic units of analysis are shared unevenly based on characteristics other than rank, such as an author’s position as principal investigator, an author’s *h*-index, or an author’s number of publications.

Figure 2 and List 1 present the 21 counting methods. In Figure 2, a 10-author publication example illustrates the 14 counting methods^{4} where rank determines the credits for the basic units of analysis. Figure 2 is followed by List 1, with seven counting methods where characteristics other than rank determine the credits. For the counting methods in List 1, more information than the number and rank of authors in a publication is needed to calculate the credits. This extra information is, for example, an author’s position as principal investigator, an author’s *h*-index, or an author’s number of publications. Thus, it is not possible to do a generic calculation for a publication with 10 authors and show the seven counting methods in Figure 1. Instead, List 1 describes these counting methods.

Eighteen of the 21 counting methods are defined with authors as both basic units of analysis and objects of study. One counting method is defined with authors as basic units of analysis and institutions as objects of study (Howard, Cole, & Maxwell, 1987, p. 976). One study, which introduces two counting methods, has authors and countries as basic units of analysis and objects of study (Egghe, Rousseau, & Van Hooydonk, 2000, p. 146).

The arguments for 18 of the 21 counting methods can be linked to Group 1 in Framework 2: “The indicator measures the (impact of) contribution/participation/… of an object of study.” In addition, two of the 21 counting methods (Assimakis & Adam, 2010; Howard et al., 1987, p. 976) aim to measure productivity^{5}, an approach that is not included but can be added to Group 1 in Framework 2. The final study of the 21 studies argues “Credit is allocated among scientists based on their perceived contribution rather than their actual contribution” (Shen & Barabasi, 2014, p. 12,329). This argument is assigned to Group 4 in Framework 2: “Comply with researchers’ perceptions of how their publications and/or citations are counted.”

As mentioned above, in addition to the 14 counting methods comprising Figure 2, there are seven counting methods where the credits for basic units of analysis are shared unevenly based on characteristics other than rank. List 1 describes these counting methods.

All but one of the counting methods in List 1 were introduced after Framework 1 was published. In the framework, rank is determined based on the information in a publication (Gauffriau et al., 2007, pp. 179; 188). Thus, the definition of rank in the framework does not cover the counting methods where the credits are distributed based on characteristics other than rank. This review assumes credits distributed based on characteristics other than rank to be a variation of the property rank dependent. However, the counting methods in List 1 are defined with authors as both basic units of analysis and objects of study. It would require information about how the counting methods are defined at other aggregation levels to find out whether or not credits distributed based on characteristics other than rank can be seen as a variation of the property rank dependent for these counting methods.

In List 1, there are some examples of studies that add small changes to existing counting methods. As discussed in Section 2.1, the review does not consider these studies as presenting distinct representations of counting methods.

**List 1:** Fractionalized counting methods. The credits to the basic units of analysis are distributed based on characteristics other than rank.

The credit of 1 for a publication is divided between the authors in such a way that the senior author receives twice the credit of nonsenior authors (Boxenbaum, Pivinski, & Ruberg, 1987, pp. 566–568).

Pareto weights

The credit of 1 for a publication is divided between the authors. An author receives the greater credit if the number of actual citations is more in line with the author’s average number of citations per publication (i.e., neither higher nor lower; Tol, 2011, pp. 292–293; 296–297).

Persson suggests a modification of Tol’s counting method where the weight assigned to an author can change from one publication to the next (Persson, 2017). The review does not discuss these alternatives further.

Shapley value approach

The credit of 1 for a publication is divided between the authors according to their Shapley value, a concept from game theory. An author’s weight is calculated by averaging the marginal contribution of the author in all possible coauthor combinations. The marginal contribution is based on the author’s number of citations or on other impact scores for the author (Papapetrou, Gionis, & Mannila, 2011).

Contribution to an article’s visibility, first approach

The credit of 1 for a publication is divided between the authors, with weights dependent on an indicator (for example, the

*h*-index) calculated for each author (Egghe, Guns, & Rousseau, 2013, pp. 57–59).The credit of 1 for a publication is divided between the authors, with weights dependent on the author’s share of authorships in the cocitation network of the publication and also on the number of cocitations. The more publications and citations an author has in the research field, the more credit will be assigned to her/him (Shen & Barabasi, 2014).

Other studies suggest modifications to Shen and Barabasi’s counting method. In one study, author ranks in the publications are taken into account (Wang, Guo et al., 2017); in another, publication years and whether or not publications are highly cited are taken into account (Bao & Zhai, 2017). The review does not discuss these alternatives further.

Relative intellectual contribution

In publications where the authors state their contributions guided by the CRediT

^{6}taxonomy, the types of contributions can be weighted and these weights credited to the contributing authors. In total, all author contributions to a publication sum to 1 (Rahman, Regenstein et al., 2017).The credit of 1 for a publication is divided equally between those authors who are principal investigators. All other authors of the publication are credited 0 (Steinbrüchel, 2019, pp. 307–308).

#### 3.2.2. Rank-dependent and nonfractionalized counting methods

A much smaller category, with six counting methods, has the properties rank dependent and nonfractionalized, meaning that the sum of credits for basic units of analysis in a publication can sum to more than 1 credit and the basic units of analysis do not receive equal shares. The pre-1970 counting methods are not represented in this category.

In addition to the rank-dependent counting methods, the results include counting methods where the credits for basic units of analysis are shared unevenly based on characteristics other than rank, such as an author’s position as principal investigator, an author’s *h*-index, or an author’s number of publications.

The six counting methods are defined with authors as basic units of analysis and objects of study.

The arguments for the introductions of four of the six counting methods are from Group 1 in Framework 2: “The indicator measures the (impact of) contribution/participation/… of an object of study.” The two remaining counting methods (Ellwein, Khachab, & Waldman, 1989, p. 320) aim to measure productivity^{7}, an approach that is not included in, but can be added to, Group 1 in Framework 2.

Figure 3 and List 2 present the six counting methods. Figure 3 shows five of the counting methods. An example with a 10-author publication provides a visual representation of the counting methods. For one of the counting methods, the credits are distributed based on characteristics other than rank, as discussed in relation to List 1. List 2, with only one item, describes this counting method.

**List 2:** Nonfractionalized counting method. Credits to the basic units of analysis are distributed based on characteristics other than rank.

Contribution to an article’s visibility, second approach

The

*h*-index (or another indicator) is calculated for each author of a publication and for the union of the authors’ publications. An author receives a share of the credit for the publication equal to her or his*h*-index divided by the*h*-index for the union. The sum of credits to the authors of a publication may exceed 1 (Egghe et al., 2013, pp. 57–59).

#### 3.2.3. Rank-independent and nonfractionalized counting methods

The next category includes three counting methods, which are rank independent and nonfractionalized. All basic units of analysis in a publication receive equally sized credits and the total credit for a publication can sum to more than 1. Among the pre-1970 counting methods, complete counting has these properties.

The three counting methods are defined with authors as basic units of analysis and objects of study.

The first counting method aims to give a balanced representation of productivity across research disciplines (Kyvik, 1989, pp. 206–209). This type of argument is not yet included in Framework 2. See the Supplementary Material, Section 2.3, for a further analysis. For the two remaining counting methods (de Mesnard, 2017; Tscharntke, Hochberg, et al., 2007), the argument for the introduction of the counting method is assigned to Group 1 in Framework 2: “The indicator measures the (impact of) contribution/participation/… of an object of study.”

Figure 4 provides a visual representation of the three counting methods. A 10-author publication is used as an example for the illustration. In Figure 4, the Equal Contribution method results in scores identical to scores obtained by complete-fractionalized counting. However, complete-fractionalized counting has no limit for how small a fraction of the credit, from a publication to a basic unit of analysis, can be. The Equal Contribution method gives a minimum 5% of the credit from a publication to a basic unit of analysis; thus, unlike scores obtained by complete-fractionalized counting, the total credit for a publication can sum to more than 1^{8}.

#### 3.2.4. Rank-independent and fractionalized counting methods

For completeness, the category of rank independent and fractionalized counting methods is included. However, none of the 32 counting methods identified by RQ 1 belong to this category. Among the pre-1970 counting methods, complete-fractionalized counting has the properties of being rank independent and fractionalized. The basic units of analysis in a publication share 1 credit evenly.

#### 3.2.5. Two counting methods do not comply with Framework 1

Two of the counting methods identified by RQ 1 do not comply with the selected properties from Framework 1: rank dependent/rank independent and fractionalized/nonfractionalized. These are the Online fractionation approach (Nederhof & Moed, 1993) and the Norwegian Publication Indicator (NPI) (Sivertsen, 2016, p. 912). As documented below, there are different reasons for why the two counting methods do not fit Framework 1.

The two counting methods are analyzed under RQs 3 and 4. These analyses are not affected by the counting methods not fitting Framework 1.

##### On-line fractionation approach

The description of Framework 1 in Section 2.2.1 mentions that whole counting and whole-fractionalized counting do not comply with the property of being rank dependent or rank independent. Whole-fractionalized counting is introduced in 1993, under the name On-line fractionation approach (Nederhof & Moed, 1993). As such, it is well documented that the On-line fractionation approach does not comply with the Framework 1 property of being either rank dependent or rank independent (Gauffriau et al., 2007, p. 188). Nonetheless, the basic units of analysis and objects of study can be determined. The On-line fractionation approach is defined with countries as basic units of analysis and objects of study. The argument for the introduction of the counting method is assigned to Group 3 in Framework 2: “Pragmatic reasons.” The counting method is easier to use on larger publication sets compared to complete-fractionalized counting (Nederhof & Moed, 1993, p. 41).

##### The counting method used in the Norwegian Publication Indicator (NPI)

The other counting method that does not comply with Framework 1 is the Norwegian Publication Indicator (NPI) (Sivertsen, 2016, p. 912). In the NPI, authors are the basic units of analysis and institutions are the objects of study. An institution’s score for a publication is calculated by first adding up the complete-fractionalized credits of the authors from the institution to a sum for the institution. Next, the square root of the sum is calculated. Applying the square root to a sum for basic units of analysis as done in the NPI does not comply with measure theory, which is the theoretical foundation for the mathematical properties of Framework 1 (see Section 2.2.1)^{9}.

Furthermore, neither does the NPI comply with Framework 2. Similar to Kyvik’s counting method (Kyvik, 1989, pp. 206–209), the argument for the NPI is to give a balanced representation of productivity across research disciplines (Sivertsen, 2016, p. 912). This argument has not yet been covered by Framework 2.

The NPI does not fit with either of the Frameworks 1 and 2. Thus, the Supplementary Material, Section 2, is a case study that uses the two frameworks to analyze the NPI, and through this analysis, identify potential for developing the frameworks further.

#### 3.2.6. Counting methods that do not currently comply with Framework 2

Two arguments for introducing counting methods do not currently comply with Framework 2. Both arguments are reported in the sections above.

The first argument is to give a balanced representation of productivity across research disciplines used in two studies (Kyvik, 1989, pp. 206–209; Sivertsen, 2016, p. 912). This argument has not yet been covered by Framework 2; however, in the Supplementary Material, Section 2.3, a case study discusses how Framework 2 can be developed to include the argument.

The second argument is to measure productivity, an approach that is not included but can be added to Group 1 in Framework 2 as mentioned in Sections 3.2.1 and 3.2.2. This argument is used by three studies (Assimakis & Adam, 2010, p. 422; Ellwein et al., 1989, p. 320; Howard et al., 1987, p. 976).

The counting methods are analyzed in relation to RQs 3 and 4. These analyses are not affected by the counting methods not currently fitting Framework 2.

### 3.3. RQ 3: Methods and Elements to Assess Internal Validity of Counting Methods

RQ 3: Which methods and elements from the studies that introduce the counting methods identified by RQ 1 can be used to assess the internal validity of those counting methods?

RQ 3 applies three criteria for well-constructed bibliometric indicators: Adequacy, sensitivity, and homogeneity. The adequacy criterion identifies methods that may be used to assess the adequacy of counting methods in the studies that introduce counting methods. The sensitivity and homogeneity criteria identify elements that indicate weak sensitivity and define heterogeneity in the equations of the counting methods and, as such, work against the two criteria.

RQ 3 does not evaluate how well these methods and elements work as assessments of internal validity of the counting methods in the studies that introduce counting methods. Thus, RQ 3 does not answer whether the 32 counting methods identified by RQ 1 are internally valid or not. Instead, RQ 3 evaluates how well each of the methods and elements corresponds to the relevant criteria for internal validity: adequacy, sensitivity, and homogeneity.

Table 5 presents the schematic overview from the Supplementary Material, Section 1, in relation to RQ 3. The table shows which counting methods apply to the methods and elements to assess adequacy, sensitivity, and homogeneity. Sections 3.3.1–3.3.3 report the results for each of the three criteria.

**Table 5.**

Counting method
. | Methods to support adequacy
. | Elements that work against sensitivity
. | Elements that work against homogeneity
. | |||||||
---|---|---|---|---|---|---|---|---|---|---|

Compare to other counting methods or bibliometric indicators . | Principles to guide the definitions of the counting methods . | Quantitative models for distributions of scores . | Surveys or other empirical evidence . | Compare groups of objects of study . | Time-specific evidence . | Fixed credits for selected basic units of analysis . | Parameter values selected by bibliometrician . | External elements . | Conditional equations . | |

Harmonic counting (Hodge & Greenberg, 1981; Hagen, 2008) | No | Yes | No | No | No | No | No | No | No | No |

Proportional counting (Hodge & Greenberg, 1981; Van Hooydonk, 1997) | No | Yes | No | No | No | No | No | No | No | No |

[Howard et al., 1987] | Yes | No | No | No | Yes | No | No | No | No | No |

[Boxenbaum et al., 1987] | No | No | No | Yes | No | Yes | No | No | Yes | Yes |

[Kyvik, 1989] | Yes | No | Yes | No | Yes | No | NA | No | No | Yes |

Exponential function (Ellwein et al., 1989) | Yes | No | No | No | Yes | No | NA | Yes | No | No |

Second-and-last function (Ellwein et al., 1989) | Yes | No | No | No | Yes | No | NA | Yes | No | No |

On-line fractionation approach (Nederhof & Moed, 1993) | Yes | No | No | No | Yes | No | No | No | No | No |

Correct credit distribution (Lukovits & Vinkler, 1995) | No | Yes | No | Yes | No | Yes | No | Yes | No | Yes |

Pure geometric counting (Egghe et al., 2000) | Yes | No | No | No | No | No | No | No | No | No |

Noblesse Oblige (Egghe et al., 2000; Zuckerman, 1968) | Yes | No | No | No | No | No | Yes | Yes | No | Yes |

Refined weights (Trueba & Guerrero, 2004) | Yes | Yes | No | No | Yes | No | N | Yes | No | Yes |

Sequence determines credit (Tscharntke et al., 2007) | Yes | No | No | No | No | No | NA | No | No | Yes |

Equal contribution (Tscharntke et al., 2007) | Yes | No | No | No | No | No | NA | No | No | Yes |

Weight coefficients (Zhang, 2009) | Yes | No | No | No | No | No | NA | No | No | Yes |

Golden productivity index (Assimakis & Adam, 2010) | Yes | No | No | No | No | No | Yes | No | No | Yes |

Tailor based allocations (Galam, 2011) | Yes | No | No | No | No | No | No | Yes | No | Yes |

Pareto weights (Tol, 2011) | Yes | No | No | No | No | No | No | No | Yes | No |

Shapley value approach (Papapetrou et al., 2011) | Yes | No | No | No | Yes | No | No | No | Yes | No |

A-index (Stallings et al., 2013) | Yes | Yes | No | No | Yes | No | No | No | No | No |

Weighted fractional output, intramural or extramural (Abramo et al., 2013) | Yes | No | No | No | Yes | No | Yes | No | No | Yes |

Absolute weighing factor (Aziz & Rozing, 2013) | Yes | No | No | No | Yes | No | No | No | No | Yes |

Contribution to an article’s visibility, first approach (Egghe et al., 2013) | No | No | Yes | No | Yes | No | No | No | Yes | No |

Contribution to an article’s visibility, second approach (Egghe et al., 2013) | No | No | Yes | No | Yes | No | NA | No | Yes | No |

[Shen & Barabasi, 2014] | No | No | No | No | Yes | No | No | No | Yes | No |

Network-based model (Kim & Diesner, 2014) | Yes | No | No | Yes | Yes | Yes | No | Yes | No | Yes |

Norwegian Publication Indicator (Sivertsen, 2016) | Yes | No | Yes | No | Yes | No | NA | No | No | No |

[Zou & Peterson, 2016] | Yes | No | No | Yes | Yes | Yes | NA | No | No | No |

Relative intellectual contribution (Rahman et al., 2017) | Yes | No | No | No | No | No | No | No | Yes | No |

Parallelization bonus (de Mesnard, 2017) | Yes | Yes | No | No | No | No | NA | No | No | No |

[Steinbrüchel, 2019] | Yes | No | No | No | No | No | No | No | Yes | No |

[Bihari & Tripathi, 2019] | Yes | No | No | No | No | No | No | No | No | No |

Counting method
. | Methods to support adequacy
. | Elements that work against sensitivity
. | Elements that work against homogeneity
. | |||||||
---|---|---|---|---|---|---|---|---|---|---|

Compare to other counting methods or bibliometric indicators . | Principles to guide the definitions of the counting methods . | Quantitative models for distributions of scores . | Surveys or other empirical evidence . | Compare groups of objects of study . | Time-specific evidence . | Fixed credits for selected basic units of analysis . | Parameter values selected by bibliometrician . | External elements . | Conditional equations . | |

Harmonic counting (Hodge & Greenberg, 1981; Hagen, 2008) | No | Yes | No | No | No | No | No | No | No | No |

Proportional counting (Hodge & Greenberg, 1981; Van Hooydonk, 1997) | No | Yes | No | No | No | No | No | No | No | No |

[Howard et al., 1987] | Yes | No | No | No | Yes | No | No | No | No | No |

[Boxenbaum et al., 1987] | No | No | No | Yes | No | Yes | No | No | Yes | Yes |

[Kyvik, 1989] | Yes | No | Yes | No | Yes | No | NA | No | No | Yes |

Exponential function (Ellwein et al., 1989) | Yes | No | No | No | Yes | No | NA | Yes | No | No |

Second-and-last function (Ellwein et al., 1989) | Yes | No | No | No | Yes | No | NA | Yes | No | No |

On-line fractionation approach (Nederhof & Moed, 1993) | Yes | No | No | No | Yes | No | No | No | No | No |

Correct credit distribution (Lukovits & Vinkler, 1995) | No | Yes | No | Yes | No | Yes | No | Yes | No | Yes |

Pure geometric counting (Egghe et al., 2000) | Yes | No | No | No | No | No | No | No | No | No |

Noblesse Oblige (Egghe et al., 2000; Zuckerman, 1968) | Yes | No | No | No | No | No | Yes | Yes | No | Yes |

Refined weights (Trueba & Guerrero, 2004) | Yes | Yes | No | No | Yes | No | N | Yes | No | Yes |

Sequence determines credit (Tscharntke et al., 2007) | Yes | No | No | No | No | No | NA | No | No | Yes |

Equal contribution (Tscharntke et al., 2007) | Yes | No | No | No | No | No | NA | No | No | Yes |

Weight coefficients (Zhang, 2009) | Yes | No | No | No | No | No | NA | No | No | Yes |

Golden productivity index (Assimakis & Adam, 2010) | Yes | No | No | No | No | No | Yes | No | No | Yes |

Tailor based allocations (Galam, 2011) | Yes | No | No | No | No | No | No | Yes | No | Yes |

Pareto weights (Tol, 2011) | Yes | No | No | No | No | No | No | No | Yes | No |

Shapley value approach (Papapetrou et al., 2011) | Yes | No | No | No | Yes | No | No | No | Yes | No |

A-index (Stallings et al., 2013) | Yes | Yes | No | No | Yes | No | No | No | No | No |

Weighted fractional output, intramural or extramural (Abramo et al., 2013) | Yes | No | No | No | Yes | No | Yes | No | No | Yes |

Absolute weighing factor (Aziz & Rozing, 2013) | Yes | No | No | No | Yes | No | No | No | No | Yes |

Contribution to an article’s visibility, first approach (Egghe et al., 2013) | No | No | Yes | No | Yes | No | No | No | Yes | No |

Contribution to an article’s visibility, second approach (Egghe et al., 2013) | No | No | Yes | No | Yes | No | NA | No | Yes | No |

[Shen & Barabasi, 2014] | No | No | No | No | Yes | No | No | No | Yes | No |

Network-based model (Kim & Diesner, 2014) | Yes | No | No | Yes | Yes | Yes | No | Yes | No | Yes |

Norwegian Publication Indicator (Sivertsen, 2016) | Yes | No | Yes | No | Yes | No | NA | No | No | No |

[Zou & Peterson, 2016] | Yes | No | No | Yes | Yes | Yes | NA | No | No | No |

Relative intellectual contribution (Rahman et al., 2017) | Yes | No | No | No | No | No | No | No | Yes | No |

Parallelization bonus (de Mesnard, 2017) | Yes | Yes | No | No | No | No | NA | No | No | No |

[Steinbrüchel, 2019] | Yes | No | No | No | No | No | No | No | Yes | No |

[Bihari & Tripathi, 2019] | Yes | No | No | No | No | No | No | No | No | No |

#### 3.3.1. Adequacy—five methods

According to the adequacy criterion, an indicator should be an adequate proxy for the object the indicator is designed to measure. Adequacy is tested through an independent and accepted measure for the object. The analysis identifies five methods in the studies that introduce the counting methods identified by RQ 1 that may be used to assess the adequacy of the counting methods:

Compare to other counting methods or bibliometric indicators

Principles to guide the definitions of the counting methods

Quantitative models for distributions of scores

Surveys or other empirical evidence

Compare groups of objects of study

Below, the results report how well each of the methods assesses adequacy. There are examples of studies that explicitly use the methods to assess the adequacy of the counting methods. These include Sivertsen’s use of a quantitative model (Sivertsen, 2016, p. 911) and Shen and Barabási’s use of groups of objects of study comprised of Nobel laureates versus their coauthors (Shen & Barabasi, 2014, p. 12,326). However, all the studies include analyses of the counting methods. These analyses include methods that, in the review, are interpreted as assessments of the internal validity of the counting methods.

##### Compare to other counting methods or bibliometric indicators

When the adequacy of counting methods is assessed by comparisons to other counting methods or bibliometric indicators, these other counting methods or bibliometric indicators should constitute independent and accepted measures of the aims of the counting methods. The aims of the counting methods are analyzed in relation to RQ 2 via Framework 2.

In the studies that introduce the counting methods, 25 of the 32 counting methods are analyzed by making comparisons with other counting methods or bibliometric indicators. An example are studies that introduce those counting methods for which the aim is that first or last authors provide the largest contribution/participation/… to a publication (Group 1 in Framework 2). Complete-fractionalized counting does not reflect this aim because all coauthors are credited equally. Therefore, comparisons involving the complete-fractionalized counting method that result in weak correlations may be regarded as evidence of the adequacy of the counting methods emphasizing first- or last-authors contributions (for examples, see Abramo et al., 2013, p. 207; Assimakis & Adam, 2010, pp. 424–425).

To successfully use comparisons with other counting methods or bibliometric indicators to assess adequacy, the bibliometrician should first evaluate the relevance of the other counting methods or bibliometric indicators in relation to aim of the counting method under assessment.

Furthermore, according to the adequacy criterion, not only should the other counting methods or bibliometric indicators used in comparisons be accepted measures of the aims of the counting methods, they should also be independent of the counting method under assessment. However, it can be debated whether or not other counting methods or bibliometric indicators are independent from the counting methods under assessment, as both build on publications and/or citations. Biases in the other counting methods or bibliometric indicators may also very well be present in the counting methods under assessment. This potential bias should be taken into account if adequacy of counting methods is assessed by comparisons to other counting methods or bibliometric indicators.

##### Principles to guide the definitions of the counting methods

When principles to guide the definitions of the counting methods are used to support the adequacy of the counting methods, these principles constitute an ideal description of the counting methods and, as such, represent independent and accepted measures of the aims of the counting methods. The aims of the counting methods are analyzed in relation to RQ 2 via Framework 2.

Six of the 32 counting methods have principles to guide their definitions. The six counting methods aim to measure (the impact of) the contribution/participation/… of an object of study (Group 1 in Framework 2). The principles are used to design the counting methods but not necessarily to assess the counting methods. Five out of the six studies emphasize the principles rank-dependency and/or fractionalization (Hodge & Greenberg, 1981; Lukovits & Vinkler, 1995, pp. 92–93; Stallings et al., 2013, pp. 9681–9682; Trueba & Guerrero, 2004, pp. 182–183). The remaining study’s principles focus on division of tasks (de Mesnard, 2017).

Decisions about whether the principles to guide definitions of the counting methods are independent and accepted measures of the aims of the counting methods are in some cases based on thorough analyses (de Mesnard, 2017) and in other cases on personal experiences (Hodge & Greenberg, 1981). In the latter case, it is difficult to assess if the principles are appropriate for assessing the adequacy of the counting methods.

##### Quantitative models for distributions of scores

When quantitative models for distributions of scores are used to assess the adequacy of the counting methods, the scores for the objects of study are tested against distributions, which should constitute independent and accepted measures of the aims of the counting methods. The aims of the counting methods are analyzed in relation to RQ 2 via Framework 2.

Four of the 32 counting methods have quantitative models for distributions of scores. These distributions are the Gini-coefficients close to 0.5 (Kyvik, 1989, pp. 209–210), Lotka’s law (Egghe et al., 2013, pp. 59–62; Kyvik, 1989, p. 211), and equal scores across objects of study (Kyvik, 1989, pp. 207–208; Sivertsen, 2016, p. 911). For two of the counting methods, the aim is to give a balanced representation of productivity across research disciplines. This aim is reflected in the quantitative model (Kyvik, 1989, pp. 207–208; Sivertsen, 2016, p. 911). Whether the other quantitative models can work as independent and accepted measures of the aims of the counting methods depends on the validity of the models. Many studies in the bibliometric literature analyze Lotka’s law, and the Gini-coefficient is investigated in economics, bibliometrics, and other research fields. As such, we have general knowledge about these quantitative models. This said, in the studies that use the models to assess the adequacy of counting methods, the relation between the aim of the counting methods and the model must be made clear.

##### Surveys or other empirical evidence

When surveys or other empirical evidence are used in the assessment of adequacy of counting methods, the credits to the basic units of analysis are evaluated against empirical data about how coauthors in a publication share credits. The idea is to define target values for the credits for the basic units of analysis. These target values should constitute independent and accepted measures of the aims of the counting methods. The aims of the counting methods are analyzed in relation to RQ 2 via Framework 2.

Four of the 32 counting methods use results from surveys (Kim & Diesner, 2014, pp. 593–595; Lukovits & Vinkler, 1995, pp. 93–94; Zou & Peterson, 2016, pp. 904–906), and one study uses a calculation of the ratio of senior researchers to nonsenior researchers to determine credits for basic units of analysis (Boxenbaum et al., 1987, pp. 566–568). All four counting methods aim to measure (the impact of) the contribution/participation/… of an object of study (Group 1 in Framework 2).

Surveys and other empirical evidence can be well suited to create independent and accepted measures of the aims of counting methods. However, the study designs that inform the surveys and evidence are important to take into account. Certainly, surveys or other empirical evidence are created at a specific point in time, and this can impact the sensitivity of the counting methods (see Section 3.3.2).

##### Compare groups of objects of study

When comparisons between groups of objects of study are used to assess adequacy, scores for groups of objects of study are compared, such as early career versus senior researchers. These comparisons between groups of objects of study should rely on independent and accepted measures of the aims of the counting methods. The aims of the counting methods are analyzed in relation to RQ 2 via Framework 2.

Sixteen of the 32 counting methods use the compare groups of objects of study approach to assess the adequacy of the counting methods. The comparisons are between institutions or countries (for example, Howard et al., 1987), research fields or disciplines (for example, Sivertsen, 2016, p. 911), publication sets from different databases (for example, Shen & Barabasi, 2014, p. 12,326), high-impact and other researchers (for example, Abramo et al., 2013, pp. 204–206), principal investigator and student (for example, Egghe et al., 2013, p. 64), or award-winners and other researchers (for example, Aziz & Rozing, 2013, pp. 4–6).

The studies that make comparisons between groups of objects of study to assess adequacy have different aims for introducing the counting methods. The studies represent all four groups from Framework 1. For some of the counting methods, the comparisons can be related to evaluations of the adequacy of the counting methods, such as in comparisons of principal investigator versus student, where the expectation would be that the principal investigator would have the higher score (Egghe et al., 2013, p. 64). In other cases, the relation is less clear, and assessment of validity of adequacy may not be the intention of the studies making the comparisons.

#### 3.3.2. Sensitivity—two elements

According to the sensitivity criterion, an indicator should reflect changes over time in the object that the indicator is designed to measure. For counting methods, it is important that they are able to adapt to the increasing average number of coauthors per publication. The analysis identifies two elements in the counting methods identified by RQ 1 that make counting methods less adaptable to increasing numbers of authors per publication:

Time-specific evidence

Fixed credits for selected basic units of analysis

Below, the results report for each of these two elements the effect resulting from an increasing number of authors per publication. The studies that introduce the counting methods do not analyze the issue of increasing numbers of authors per publication.

##### Time-specific evidence

This element overlaps with the method: surveys or other empirical evidence. As discussed in Section 3.3.1, counting methods where the adequacy is tested against the results of surveys or other empirical evidence about how coauthors of a publication share credits may eventually become obsolete due to changes in coauthor practices not being reflected in the empirical evidence. One study creates new evidence (Zou & Peterson, 2016, pp. 904–906). However, some of the evidence from the four studies using surveys or other empirical data dates back to the 1980s (Boxenbaum et al., 1987, p. 567; Kim & Diesner, 2014, p. 594; Lukovits & Vinkler, 1995, p. 94; Vinkler, 1993, pp. 217–223). A further limitation of the evidence is that it relates to investigations of smaller numbers of authors per publication. For example, two studies include publications with up to five authors (Kim & Diesner, 2014, p. 594; Lukovits & Vinkler, 1995, p. 94). The four counting methods using empirical evidence aim to measure (the impact of) the contribution/participation/… of an object of study (Group 1 in Framework 2). To support this aim, the empirical evidence must be updated regularly to reflect the current average number of coauthors per publication.

##### Fixed credits for selected basic units of analysis

Three counting methods use fixed credits for selected authors only, independent of the number of coauthors. As the average number of coauthors increases, the credits for each of the other authors will decrease. The differences in credits assigned to the selected versus other authors may become extreme and, therefore, may not comply with the sensitivity criterion. The three counting methods that apply fixed credits for selected basic units of analysis (Abramo et al., 2013, p. 201; Assimakis & Adam, 2010, pp. 422–423; Egghe et al., 2000, p. 146) all aim to measure (the impact of) the contribution/participation/… of an object of study (Group 1 in Framework 2). The use of fixed credits may not reflect this aim if the average number of coauthors increases.

#### 3.3.3. Homogeneity—three elements

According to the homogeneity criterion, an indicator should measure only one dimension and avoid heterogeneous indicators. The analysis investigates different elements that contribute to the equations for counting methods. Where there are several different elements in the equations, it is not immediately clear how these different elements affect the scores obtained by the counting methods. Such elements do not support homogeneity.

The analysis identifies three elements in the counting methods identified by RQ 1 that contribute to heterogeneity and, therefore, work against the homogeneity of the counting methods:

Parameter values selected by bibliometrician

External elements

Conditional equations

Below are the results for each of these three elements. None of the studies that introduce the counting methods analyze homogeneity.

##### Parameter values selected by bibliometrician

Seven of the 32 counting methods have one or more parameter values for the bibliometrician to select individually in each analysis. A change of parameter values will change the distribution of credits among the basic units of analysis of a publication. This ensures that the counting methods can be adapted to accommodate credit distribution traditions in various research fields. For an example, see the illustration of “Network-based model (Kim & Diesner, 2014)” in Figure 2 in Section 3.2.1.

The most common situation seen in the five counting methods is that the bibliometrician selects the value of a parameter, and that this value can vary between 0 and 1 (Egghe et al., 2000, p. 146; Ellwein et al., 1989, p. 321; Kim & Diesner, 2014, p. 591; Lukovits & Vinkler, 1995, pp. 92–95). Two counting methods include several parameter values to be selected by the bibliometrician (Galam, 2011, p. 371; Trueba & Guerrero, 2004, pp. 184–185). The effect of selecting a given parameter value as opposed to another parameter value is not immediately clear in the score obtained by the counting method; therefore, the counting method is heterogeneous.

##### External elements

The counting methods defined in Sections 3.2.1 and 3.2.2 as rank-dependent counting methods, in which the credits for basic units of analysis are shared unevenly based on characteristics other than rank, use external elements, such as an author’s position as principal investigator, an author’s *h*-index, or an author’s number of publications.

Eight of the 32 counting methods include external elements. In five counting methods, these external elements are author-level bibliometric indicators, such as the *h*-index. Sometimes the bibliometrician can chose between several indicators (Egghe et al., 2013, pp. 58–59; Papapetrou et al., 2011, pp. 554–555) and in other cases, specific indicators are used in the definitions of the counting methods (Shen & Barabasi, 2014, pp. 12,325–12,327; Tol, 2011, pp. 292–293). Other external elements are whether or not an author is a principal investigator (Boxenbaum et al., 1987; Steinbrüchel, 2019, pp. 307–308) and the type and extent of author contributions to a publication cf. the CRediT taxonomy^{10} (Rahman et al., 2017, p. 278). At present, author contributions are not often an element made explicit by the publications included in an analysis. However, it should be noted that, increasingly, journal publications include author contribution statements. External elements require background information about the author, and the effect of this background information on the score obtained by the counting methods is not immediately clear. Therefore, the counting methods are heterogeneous.

##### Conditional equations

Most counting methods have one equation, which is applied to all basic units of analysis and all publications. But some counting methods divide basic units of analysis or publications into groups according to specific characteristics and then use conditional equations on each group. Thirteen of the 32 counting methods apply conditional equations. Nine of the 13 counting methods use author rank, such as first author, to divide the basic units of analysis into groups and apply conditional equations to the groups (for examples, see Assimakis & Adam, 2010, p. 422; Galam, 2011, p. 371; Trueba & Guerrero, 2004, pp. 184–185). Three counting methods use the number of authors per publication to create groups (Aziz & Rozing, 2013, p. 2; Kyvik, 1989, p. 206; Zhang, 2009, p. 416). In one counting method, one group has publications with first and last authors from the same institution, and, in the other group, first and last authors are from different institutions (Abramo et al., 2013, p. 201). These groupings of basic units of analysis or publications mean that it is not immediately clear how the counting methods’ scores are obtained. Therefore, the counting methods are heterogeneous.

### 3.4. RQ 4: Three Counting Methods Are Used in Four or More Research Evaluations

RQ 4: To what extent are the counting methods identified by RQ 1 used in research evaluations and to what extent is this use compliant with the definitions in the studies that introduce the counting methods?

RQ 4 employs a literature search to identify research evaluations that use the counting methods identified by RQ 1. The focus is on research evaluations covering a minimum of 30 researchers. Some counting methods are used by the same author in several research evaluations. In such cases, only one of the research evaluations is counted. The Supplementary Material, Section 1, provides a detailed schematic overview of the results related to RQ 4.

Fifteen of the counting methods are not used in research evaluations covering a minimum of 30 researchers, and 14 counting methods are used in one to three research evaluations. Only three counting methods are used in four or more research evaluations: harmonic counting, Hodge & Greenberg’s counting method, and Sequence determines credit (Hodge & Greenberg, 1981; Howard et al., 1987; Tscharntke et al., 2007). For each of these three counting methods, a random sample of five research evaluations using the counting methods is selected for a further analysis of how the counting methods are used. The Supplementary Material, Section 4, lists the research evaluations included in the analysis.

The research evaluations that use the three counting methods should draw on the same characteristics for the counting methods as presented at the introduction of the counting methods (see Section 3.2). If one or more of these characteristics change between the introduction and use of the counting method, then the research evaluation’s use of the counting method may compromised.

In the samples of research evaluations using Harmonic counting and Sequence determines credit, the counting methods are used with the same characteristics as in the studies that introduce the counting methods. As in the introduction of the counting methods, the objects of study and the basic units of analysis are authors and the arguments for the use of the counting methods are from Group 1: “The indicator measures the (impact of) contribution/participation/… of an object of study.”

The research evaluations’ arguments for using Howard et al.’s counting method are also from Group 1, a situation that is in agreement with the study introducing the counting method. However, in the introduction of the counting method, authors are the basic units of analysis and institutions are the objects of study. In the research evaluations, the basic units of analysis and the objects of study are authors, institutions, or countries. The research evaluations that have basic units of analysis and objects of study other than those present at the introduction of the counting method do validate the counting methods by comparisons with other counting methods (see Section 3.3.1 for more about this method for assessment of adequacy). However, one study that has countries as objects of study does not validate the counting method at all (Tsai & Lydia Wen, 2005). In this research evaluation, the use of Howard et al.’s counting method may be compromised, as the use is not validated by either the study introducing the counting method or the research evaluation using the counting method.

## 4. DISCUSSION

Section 4 has two parts. Section 4.1 discusses the methods used in the present review and the limitations resulting from the methodological choices made. Section 4.2 gives interpretations of the results presented in the review.

### 4.1. Discussion of the Methods—Limitations

Section 4.1 discusses the review’s methods and their limitations. Results related to an RQ inform subsequent RQs (see Table 1, Section 2); therefore, limitations related to the methods used in relation to RQ 1 have consequences for all the following RQs, and the limitations for RQ 2 affect RQs 3 and 4.

#### 4.1.1. RQ 1: Literature search covers counting methods in the bibliometric research literature

The literature search aimed at identifying counting methods and undertaken in relation to RQ 1 forms the basis for the review. The literature search covers peer-reviewed studies in English from the period 1970–2018. Including more publication types and more languages could lead to the identification of additional counting methods, such as counting methods used in local university reports. To include the period 2019–2020 would most likely result in more counting methods; however, counting methods introduced after 2018 are excluded from the review because the use of these will be difficult to assess in relation to RQ 4 (i.e., less than 2 years after their introduction into the bibliometric research literature).

The literature search identifies 32 counting methods, which, in relation to RQ 2, are then assigned to categories. Including the period 2019–2020 could add counting methods to the analysis and, thus, more detail to the results. However, the proportion of counting methods in each of the categories is consistent over time, in that none of the categories include counting methods from one decade alone (see the Supplementary Material, Section 1). This suggests that adding a few new counting methods would be unlikely to change the overall results.

Even though the result of the literature search may be supplemented with more counting methods, the review scrutinizes more counting methods than previous reviews. The present review demonstrates that the majority of the counting methods were introduced in the most recent decade. If this trend continues, future counting methods may change the proportion of counting methods in each of the categories used to structure this review.

#### 4.1.2. RQ 2: Two frameworks selected among other possible frameworks

RQ 2 uses two selected frameworks to categorize the counting methods identified by RQ 1 according to the characteristics of those counting methods. Framework 1 describes selected mathematical properties of counting methods and Framework 2 describes arguments for choosing a counting method for a bibliometric analysis.

The literature search did uncover other frameworks that may be suitable for the analysis of many different counting methods. Indeed, drawing on theories, methods, and concepts from other research fields, the number of potentially relevant frameworks is very large. However, using only one or a few frameworks in an analysis serves to prevent overly complex results. The present review uses two frameworks. To illustrate this, the potential of a framework not used in the review is discussed below.

Xu et al.’s framework divides counting methods into linear, curve, and “other” counting methods (Xu et al., 2016, pp. 1974–1977). A closer look at the counting methods in the present review with regard to what Xu et al. define as curved counting methods reveals that Zou and Peterson’s counting method (Zou & Peterson, 2016, p. 906) is the nonfractionalized version of pure geometric counting (Egghe et al., 2000, p. 146). In both counting methods, the second author gets half the credit given to the first author, the third author gets half the credit given to the second author, and so on. Furthermore, Howard et al.’s counting method (Howard et al., 1987, p. 976) has similar characteristics. The credits are reduced by one third going from the first author to the second author, and so on. A further analysis utilizing Xu et al.’s framework may reveal other similarities that do not emerge from applying Frameworks 1 and 2. However, Xu et al.’s framework does not fit with as many counting methods as Frameworks 1 and 2, as “citation-based credit assignment methods”—for example, Pareto weights—are not included in Xu et al.’s framework (Xu et al., 2016, p. 1974).

The two frameworks selected for the review represent a further development of the well-known dichotomy of full versus fractional counting (Framework 1) and focus on the argument for the introduction of the counting methods (Framework 2). Thus, the frameworks illustrate that different approaches can be used to describe counting methods. Framework 1, Xu et al.’s framework, and other frameworks used in relation to counting methods draw on mathematical properties, which are highly relevant for the analysis of counting methods. However, applying Framework 2 in the review shows that approaches other than mathematical can add useful nuance to our knowledge about counting methods.

#### 4.1.3. RQ 3: Homogeneity criterion may be developed further

RQ 3 applies three validity criteria (adequacy, sensitivity, and homogeneity) that are developed for and tested on bibliometric indicators (Gingras, 2014, pp. 116–119; Wildgaard, 2015, sec. 6.3, 2019, sec. 14.4.1). Although guidance for how to apply the criteria exists at the overall level, implementation in a specific case, such as the present review, requires several choices.

The homogeneity criterion is difficult to use in relation to counting methods. The criterion guidance explains that a mix of different elements with the same measure unit does not present as heterogeneous but as composite (Gingras, 2014, p. 122). However, the difference between heterogeneous and composite is described with a less than ideal example that makes accurate interpretation of the guidance difficult.

The homogeneity criterion may be interpreted more strictly than is done in the review (see example in Section 2.3), leading to the definition of fewer elements for indicating heterogeneous counting methods. Or the difference between heterogeneous and composite may be ignored, leading to the definition of more elements for indicating heterogeneous counting methods.

#### 4.1.4. RQ 4: Selective focus on peer-reviewed research evaluations

RQ 4 conducts a literature search to identify research evaluations that use the counting methods identified by RQ 1. This means that research evaluations that do not cite studies identified by RQ 1 are not found in the RQ 4 literature search. For some of the well-known counting methods, this could result in underrepresentation in the RQ 4 search results: Research evaluations may mention the name of the counting method without a reference at all, or they may cite later studies describing the counting method rather than the original study that introduced the counting method.

Three counting methods identified by RQ 1 have several names and/or studies that introduce the counting methods. This is sometimes seen in bibliometric studies and can lead to misinterpretations (Gauffriau et al., 2008, pp. 166–169). The three counting methods are: Harmonic counting, Proportional counting (also known as Arithmetic counting), and Noblesse Oblige (Egghe et al., 2000, p. 146; Hagen, 2008; Hodge & Greenberg, 1981; Van Hooydonk, 1997; Zuckerman, 1968). A literature search of the alternative names and/or studies that introduce the counting methods is conducted. The search leads to a few additional research evaluations that use the counting methods identified by RQ 1. In the results related to RQ 4, Proportional counting would change from the interval “zero research evaluations use the counting method” to “one to three research evaluations use the counting method.” This change would have no impact on the results of the review (see the Supplementary Material, Section 1).

Furthermore, the review excludes research evaluations in reports and gray literature and only includes peer-reviewed studies in English. These limitations mean that results regarding the use of the counting methods may be underreported. A broader literature search could be conducted for selected languages or by introducing limitations other than the ones chosen in the present review to manage the literature search. It is worth noting, however, that including all publication types and all languages would be impractical, resulting in a huge search. The result set of such a search would pose considerable challenges for analysis, especially for qualitative approaches such as those employed in relation to RQ 4.

A final point is that the focus of the review is on the applied use of counting methods in larger research evaluations, and thereby, the scalability of the counting methods. This focus can be adjusted to accommodate other types of use cases, such as investigations of how existing counting methods inform the development of new counting methods or tests of the mathematical properties of the counting methods. As with the choice of frameworks for categorizations of counting methods, there are many possibilities for analyses exploring the use of counting methods.

### 4.2. Discussion of the Results

Section 4.1 discussed the review’s methods and their limitations. Section 4.2 discusses interpretations of the results related to RQs 1–4.

#### 4.2.1. RQ 1: New introductions of counting methods underline the relevance of analyses of counting methods

The results related to RQ 1 show that counting methods in the bibliometric research literature should neither be reduced to a simplified choice between two counting methods—full and fractional counting—nor be implicit in bibliometric analyses. The literature search identifies 32 counting methods, and the majority (17 counting methods) have been introduced in the most recent decade, a situation that underlines the relevance of the review.

#### 4.2.2. RQ 2: Consistent analyses of counting methods reveal categories of counting methods

The results related to RQ 2 demonstrate that consistent analyses of counting methods provide new knowledge and allow a more nuanced understanding of counting methods.

Below, three main observations based on the results of applying Frameworks 1 and 2 are discussed. The observations do not relate to counting methods introduced in a specific decade; rather, they are valid for counting methods from the 1980s as well as for counting methods from the 2010s. Following the three observations, counting methods not fitting the frameworks are discussed.

##### Observation 1

The first observation is that all counting methods are introduced with specific basic units of analysis and objects of study, often authors. Recall that complete counting and whole counting methods are identical at the microlevel but often result in different scores at the meso- or macrolevels. In other words, this difference is not visible if two counting methods are only defined at the microlevel. Thus, not all definitions of the counting methods necessarily hold if the counting methods are applied at aggregation levels other than the aggregation levels for which they are specifically defined (often the microlevel). The use of the counting methods would be facilitated if they were to be introduced as score functions, which could be combined with different basic units of analysis and objects of study.

##### Observation 2

The second observation is about rank-dependent counting methods, excluding rank-dependent counting methods based on other characteristics than rank. The majority (19 of the 32 counting methods) are rank dependent. Again, most of these counting methods are defined at the microlevel.

Among the pre-1970 counting methods, straight counting is rank dependent. Older studies have shown that the difference in scores obtained by straight counting and other pre-1970 counting methods levels out at the meso- and macrolevels. Obviously, when straight counting is used at the microlevel, it is important to be the first author of a publication to receive credit for that publication (Lindsey, 1980, pp. 146–150). However, at the meso- or macrolevel, straight counting scores for institutions or countries are fair approximations of scores resulting from whole or fractional counting (Braun, Glänzel, & Schubert, 1989, p. 168; Cole & Cole, 1973, pp. 32–33).

As discussed in Section 1.1, today, there are more institutions and countries per publication. Therefore, in analyses with recent publications, scores resulting from whole versus straight counting are more likely to differ (for examples, see Gauffriau et al., 2008, pp. 156–157; Lin, Huang, & Chen, 2013). However, straight and complete-fractionalized counting still yield similar scores at the meso- or macrolevels.

Given the situation described above, it is likely that the results regarding straight counting would hold for other rank-dependent and fractionalized counting methods. In theory, this means that the 14 rank-dependent and fractionalized counting methods could be substituted by complete-fractionalized counting for analyses at the meso- and macrolevels. Complete-fractionalized counting is rank independent and, therefore, easier to apply compared to rank-dependent counting methods with their more complex equations. In future research, it would be interesting to investigate comparisons between complete-fractionalized counting and the 14 rank dependent and fractionalized counting methods using empirical data at the meso- and macrolevels.

##### Observation 3

The third observation is that almost all of the counting methods (28 of 32) are introduced with an argument from Group 1 in Framework 2: “The indicator measures the (impact of) contribution/participation/… of an object of study.” This result suggests that the common understanding of counting methods is that they relate to the concept that the study aims to measure, for example, the object of study’s participation in a research endeavor as measured by whole counting. However, four of the counting methods in the review show that there are alternative arguments for introducing counting methods (see the Supplementary Material, Section 1). An interpretation that assumes all counting methods aim to measure the contribution/participation/… of an object of study would be a mistake.

##### Counting methods that do not fit the study frameworks

In addition to the observations above, as shown in Section 3.2.5, two counting methods do not fit the selected properties from Framework 1. Also, as Section 3.2.6 illustrates, two arguments for introducing counting methods do not currently comply with Framework 2. In the context of the present review, it is no surprise that not all counting methods fit Frameworks 1 and 2. As discussed in Section 4.1.2, neither does Xu et al.’s framework cover all counting methods.

Indeed, the counting methods outside the frameworks offer a potential opportunity to investigate the further development of the frameworks. Also, an analysis of why the counting methods do not fit the frameworks could give new perspectives on the counting methods. To this end, the Supplementary Material, Section 2, presents an example case study using Frameworks 1 and 2 to analyze the counting method used in the NPI, and through this analysis, to identify potential for developing the frameworks further and achieving deeper understanding of the NPI.

#### 4.2.3. RQ 3: Assessment of internal validity of counting methods can be developed further

The results related to RQ 3 present the application of three criteria to evaluate the internal validity of counting methods. First, methods to assess the adequacy of counting methods, and thus, to support the internal validity of the counting methods are presented. Next, the analysis considers elements that define weak sensitivity of the counting methods. Finally, elements in counting methods that make those counting methods heterogeneous are examined. Elements connected with weak sensitivity and heterogeneous elements do not support the internal validity of counting methods.

The use of the adequacy criterion in relation to counting methods suggests that adequacy can be analyzed in relation to the aims of the counting methods (i.e., the arguments for introducing the counting methods from Framework 2). The use of the sensitivity criterion on counting methods shows that seven counting methods have elements that indicate weak sensitivity (see the Supplementary Material, Section 1). On the other hand, the remaining counting methods do not accommodate explicit measures to support sensitivity (i.e., reflecting the increasing number of authors per publication over time). Likewise, the use of the homogeneity criterion in relation to counting methods indicates that a large number (22 of the counting methods or 15 counting methods if a strict interpretation of the criterion is used; see Section 4.1.3) have heterogeneous elements that work against homogeneity. At least for these counting methods, no specific measures are taken to support homogeneity. Thus, the results related to RQ 4 suggest potential for the consistent use of validity criteria in relation to counting methods.

The bibliometrician can decide not to use counting methods with elements that work against sensitivity and homogeneity (see the Supplementary Material, Section 1). However, there may be more elements than those identified in the review, potentially leading to the exclusion of more counting methods. Alternatively, the heterogeneous counting methods may have high adequacy and thus prove useful in research evaluations anyway. As such, the present review’s application of the three criteria for validity should be used as attention and guiding points for selecting counting methods—but not as a selection key. Wildgaard reaches a similar conclusion after her implementation of the three criteria (Wildgaard, 2015, p. 95).

#### 4.2.4. RQ 4: The context in which the counting method are used should be assessed

The results related to RQ 4 investigate to what extent the counting methods identified by RQ 1 are used in research evaluations, and whether research evaluations use the counting methods in agreement with how the counting methods are described initially in the studies that introduce them. The analysis finds that a large majority of the counting methods (29 of 32) are either used in a maximum of three research evaluations or not used at all in research evaluations. The paradox of this moderate use and new counting methods continuously being introduced into the bibliometric research literature remains unsolved.

Three counting methods are used in at least four research evaluations. In one instance, a counting method is used in a research evaluation with other basic units of analysis and objects of study than those defined in the introduction of the counting method. It is important to be aware of the contexts in which the counting methods are used and whether these contexts differ from the definition of the counting methods. In a previous study, the results show that the use of pre-1970 score functions is not consistent across studies. Many of the score functions are used with several arguments from Framework 2 in the bibliometric research literature (Gauffriau, 2017).

## 5. CONCLUSION

The aims of the present review are to investigate counting methods in the bibliometric research literature and to provide insights into their common characteristics, the assessment of their internal validity, and how they are used.

The review shows that the topic of counting methods in bibliometrics is complex but the review also demonstrates that consistent analysis of counting methods is possible. The analysis of counting methods lead to several new findings. Below are some of the main findings and possible implications of the findings.

One important finding is that 27 of the 32 counting methods covered by the review are defined at the microlevel (authors). This makes it difficult to use these counting methods at other aggregation levels in a consistent manner.

Another important finding suggests that the common understanding of counting methods is that they relate to the concept that the study using the counting methods aims to measure, for example, the concept “author contribution.” Often, however, these concepts are not well defined in studies that introduce counting methods. Research on counting methods can benefit from better integration with studies on the concepts to be measured via the counting methods.

Furthermore, the review applies three internal validity criteria for well-constructed bibliometric indicators (adequacy, sensitivity, and homogeneity) to counting methods for the first time. The criteria help identify methods and elements useful for assessing the internal validity of counting methods. Some of these methods and elements (for example, comparisons of counting methods) are often used in analyses of counting methods. However, as the results show, many other methods and elements can be used to assess the internal validity of counting methods, such as to define concepts measured by the counting methods (see item above).

Finally, the review documents the paradox between the many counting methods introduced into the bibliometric research literature and the finding that only a few of these counting methods are used in research evaluations. This finding may indicate a gap between theoretical and applied approaches to counting methods. For example, many university rankings do not provide detailed and peer-reviewed documentation about the applied counting method, with the Leiden Ranking as an exception.

The review provides practitioners in research evaluation and researchers in bibliometrics with a detailed foundation for working with counting methods. At the same time, many of the findings in the review provide bases for future investigations of counting methods. Well-defined frameworks other than those used in the review could be applied to investigate counting methods. The categories of counting methods identified in the review could also be analyzed further, such as through a study of how to use the counting methods at different aggregation levels. A further evaluation could be carried out of the methods and elements deemed useful for assessing internal validity of counting methods. And, finally, the use of counting methods in contexts other than research evaluations could be examined. The schematic overview of the results of the review presented in the Supplementary Material (see Section 1) may be a useful starting point for inspiring further investigations of counting methods.

## ACKNOWLEDGMENTS

The author would like to thank Dr Lorna Wildgaard for critically reading earlier versions of the manuscript, Dr Dorte Drongstrup for fruitful discussions in relation to the manuscript, and Dr Abigail McBirnie for proof-editing. Furthermore, the author wishes to thank the two reviewers for their comments, which helped improve the review.

## COMPETING INTERESTS

The author has no competing interests.

## FUNDING INFORMATION

The Royal Danish Library has covered costs for proof-editing.

## DATA AVAILABILITY

A list of arguments for introductions of counting methods analyzed under RQ 2, and references to research evaluations analyzed under RQ 4 are available in the Supplementary Material.

## Notes

^{1}

(Gauffriau et al., 2007, p. 189). The term “normalized” is changed to “fractionalized” and the column “In Section” is removed. Reprinted by permission from Springer Nature Customer Service Centre GmbH: Springer Nature, Scientometrics, Publication, cooperation and productivity measures in scientific research, Marianne Gauffriau et al., 2007.

^{2}

(Gauffriau, 2017, p. 679)—postprint version: https://arxiv.org/abs/1610.02547v2.

^{3}

Two counting methods were published online first in 2018 and, thus, included in the period covered by the review. The two studies introducing the counting methods were assigned to journal issues in 2019 (Bihari & Tripathi, 2019; Steinbrüchel, 2019).

^{4}

The counting method Weighted fractional output is in Figure 2 with its two versions: Intramural is used for publications where first and last author are from the same institution. Otherwise, extramural is used (Abramo et al., 2013, p. 201). In the counting method Network-based model, the bibliometrician must select a distribution factor (Kim & Diesner, 2014). Two distribution factors (*d* = 0.25 and *d* = 0.59) are selected for the illustration in Figure 2.

^{5}

The term “productivity” is debated. Often, it is used as a simple concept, as is the case in the two referenced studies. This simple interpretation of productivity can be added to Group 1 in Framework 2. However, Abramo and D’Angelo (2014) argue for productivity as a complex concept requiring input and output indicators to calculate the productivity of the objects of study. They introduce a counting method relating to output (see Figure 2). The argument for their counting method is to measure author contributions (Abramo et al., 2013, p. 200). This argument is assigned to Group 1 in Framework 2: “The indicator measures the (impact of) contribution/participation/… of an object of study.” Thus, in Abramo and D’Angelo’s interpretation of productivity the counting method is one of the steps in calculating productivity (Abramo & D’Angelo, 2014, pp. 1135–1136).

^{6}

CRediT is a taxonomy for describing the contributions made by authors to research publications: https://casrai.org/credit/.

^{7}

Ellwein et al. (1989) use the simple interpretation of the term *productivity*. See ^{Footnote 5}.

^{8}

In the bibliometric research literature, the Danish Bibliometric Research Indicator is described with different calculations (Nielsen, 2017, p. 3; Schneider, 2009, p. 372; Wien, Dorch, & Larsen, 2017, pp. 905–907). Applying authors as basic units of analysis and objects of study, the calculation of the Danish Bibliometric Research Indicator overlaps with the Equal Contribution method with the modification that 10% of the credit is the minimum credit from a publication to a basic unit of analysis. However, the Danish Bibliometric Research Indicator has authors as basic units of analysis and institutions as objects of study. Thus, the calculation of the Danish Bibliometric Research Indicator differs from the Equal Contribution method (see a further discussion in Section 3.2.5).

^{9}

The Danish Bibliometric Research Indicator follows similar steps in the calculation. An institution’s score from a publication is calculated by first adding up the complete-fractionalized credits of the authors from the institution to a sum for the institution. Next, institutions with less than 10% of the credit from a publication each have their credit raised to 10% of the credit from the publication (Agency for Science and Higher Education, 2019, sec. Fraktionering (in Danish)). This practice is discussed further for the NPI in the Supplementary Material, Section 2.

^{10}

See ^{Footnote 6}.

## REFERENCES

*n*rule, parallelization, and team bonuses

*h*- and

*g*-index in case of fractional counting of authorship

*gh*-index

*h*-index should be supplemented by role-based

*h*-indices

*h*-index and

*g*-index

*zp*-index

## Author notes

Handling Editor: Ludo Waltman