Abstract
We study the array of partial sums, PX, of a given array X in terms of its h-type indices. Concretely, we show that h(PX) can be described in terms of the Lorenz curve of the array X and obtain a relation between the sum of the components of PX and the Gini index of X. Moreover, we obtain sharp lower and upper bounds for h-type indices of PX.
1. INTRODUCTION
h-type indices such as the h-index itself and the g-index have interesting mathematical properties as shown, for example, in Egghe and Rousseau (2019a), although they are only probably approximately correct (PAC) in research evaluation exercises (Bouyssou & Marchant, 2011; Waltman & van Eck, 2012; Rousseau, 2016). In this article we continue our theoretical investigation of the mechanism leading to h-type indices. Concretely, we study properties related to h-type indices of the array of partial sums of a given array X. We recall that these partial sums form the basis of the Lorenz curve and the related Gini index. As a consequence, we also obtain relations with the Gini index and the Lorenz curve of the original array X. We further derive sharp lower and upper bounds for h-type indices of PX.
In the following section we recall the definitions we will use in this investigation.
2. DEFINITIONS
Let (R+)N be the set of all arrays of length N with nonnegative real values. An array X = (xr)r=1,2,…,N in (R+)N is said to be decreasing if, for all r = 1, 2, …, N − 1, xr ≥ xr+1. The set (R+)N has a natural partial order defined by X ≤ Y if, for all r = 1, 2, …, N, xr ≤ yr. Equality between X and Y only occurs if xr = yr for all r. We denote the set of all decreasing arrays in (R+)N with at least one component larger than or equal to 1 by ΦN.
Next we recall the definition of some h-type indices for arrays in (R+)N.
2.1. The h-index (Hirsch, 2005)
Let X = (xr)r=1,2,…,N ∈ ΦN. The h-index of X, denoted h(X), is the largest natural number such that the first h coordinates have each at least a value h. If all components of a decreasing array X are strictly smaller than 1, then h(X) = 0. Such arrays are not considered further on because we will work with arrays in ΦN. If xN, the last element in X, is larger than or equal to N, then h(X) = N.
Recall that an h-index (and similarly for the other h-type indices defined further on) can only be defined for decreasing arrays. Moreover, for r ≤ h(X), xr ≥ r; conversely, if for all r ≤ n, xr ≥ n, then n ≤ h(X). Further, if r > h(X), then xr < h(X) + 1.
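As an illustration, the definition above can be sketched in Python. The function name `h_index` is ours and not part of the original text, and the input is assumed to be already ranked in decreasing order.

```python
def h_index(xs):
    """Largest n such that the first n components of xs are each at least n.

    Assumes xs is sorted in decreasing order, as required in the text.
    """
    h = 0
    for rank, value in enumerate(xs, start=1):
        if value < rank:
            break  # for a decreasing array no later rank can qualify
        h = rank
    return h


if __name__ == "__main__":
    print(h_index([6, 6, 5, 3]))
    print(h_index([0.5, 0.5]))
```

The early `break` relies on the array being decreasing: once a component falls below its rank, all later components do too.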
2.2. The g-index (Egghe, 2006a, b)
Let X = (xr)r=1,2,…,N ∈ ΦN. The g-index of X, denoted g(X), is defined as the highest natural number g such that the sum of the first g coordinates is at least equal to g². If the sum of all coordinates of X is strictly larger than N², then we extend the array X with coordinates equal to zero, making it into an array in ΦM, M > N, until it is possible to apply the definition.
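A minimal sketch of this definition, including the extension with zeros, could look as follows. The name `g_index` is our own choice; the array is assumed to be decreasing.

```python
from itertools import accumulate


def g_index(xs):
    """Highest g such that the sum of the first g components is at least g**2.

    If the total is large enough, the array is implicitly extended with zeros,
    so the cumulative sum stays equal to the total beyond rank len(xs).
    """
    g = 0
    for rank, partial in enumerate(accumulate(xs), start=1):
        if partial >= rank * rank:
            g = rank
    total = sum(xs)
    rank = len(xs) + 1
    while total >= rank * rank:  # ranks that exist only after zero-padding
        g = rank
        rank += 1
    return g


if __name__ == "__main__":
    print(g_index([5, 2, 2]))
    print(g_index([10, 9, 7, 4]))
```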
2.3. The R-index (Jin et al., 2007)
Let X = (xr)r=1,2,…,N ∈ ΦN. The R-index of X is defined as the square root of the sum of all coordinates up to and including the one with index h(X). Omitting the square root yields the R2-index. Because R2 is easier to work with than R, and both have the same properties for our purposes, all concrete examples will be given for R2.
2.4. Kosmulski’s h(2) Index (Kosmulski, 2006)
Let X = (xr)r=1,2,…,N ∈ ΦN. The h(2) or Kosmulski index of X, denoted h(2)(X), is the largest natural number h(2) such that the first h(2) coordinates have each at least a value (h(2))².
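The R2-index and Kosmulski's index can be sketched in the same way. The function names below are ours, and decreasing input is again assumed.

```python
def h_index(xs):
    """h-index of a decreasing array."""
    h = 0
    for rank, value in enumerate(xs, start=1):
        if value < rank:
            break
        h = rank
    return h


def r2_index(xs):
    """Sum of the components up to and including rank h(X)."""
    return sum(xs[:h_index(xs)])


def kosmulski_index(xs):
    """Largest k such that the first k components are each at least k**2."""
    k = 0
    for rank, value in enumerate(xs, start=1):
        if value < rank * rank:
            break
        k = rank
    return k


if __name__ == "__main__":
    print(r2_index([5, 2, 2]), kosmulski_index([10, 9, 7, 4]))
```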
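2.5. The Majorization Order (Hardy et al., 1934)
Let X = (xr)r=1,2,…,N and Y = (yr)r=1,2,…,N be decreasing arrays in (R+)N. Then X is majorized by Y, denoted X −< Y, if x1 + … + xj ≤ y1 + … + yj for all j = 1, 2, …, N − 1 and x1 + … + xN = y1 + … + yN.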
3. AN INEQUALITY RELATED TO THE g-INDEX AND THE MAJORIZATION ORDER
Theorem 1. For all N ∈ ℕ and all X, Y ∈ ΦN: X −< Y ⇒ g(X) ≤ g(Y) (*)
Proof. Although this theorem is implied in Egghe (2009, p. 487) we present here two short proofs.
First proof: For each i ≤ g(X), $\sum_{j=1}^{i} y_j \geq \sum_{j=1}^{i} x_j \geq i^2$. This implies that g(Y) ≥ g(X).
Second proof: For each i > g(Y), $\sum_{j=1}^{i} x_j \leq \sum_{j=1}^{i} y_j < i^2$. This inequality also implies that g(X) ≤ g(Y).
Comments
- A. This theorem proves that g is an order-preserving mapping from (R+)N with the majorization order to the positive real numbers with their natural order.
- B. The converse of inequality (*) does not hold. Consider for instance X = (5, 2, 2) and Y = (4, 4, 1). Then g(X) = g(Y) = 3 but neither X −< Y nor Y −< X holds.
- C. The inequality (*) can be strict. Indeed, take X = (2, 1, 1) and Y = (2, 2, 0). Then X −< Y, but g(X) = 1 and g(Y) = 2.
- D. Yet, inequality (*) cannot be strict for N = 2. Indeed, consider X = (x1, x2) and Y = (y1, y2), with X −< Y. Then 1 ≤ x1 ≤ y1 (hence g(X), g(Y) ≥ 1) and x1 + x2 = y1 + y2. As N = 2, this sum completely determines the value of the g-index. Hence this value must be equal for X and Y. We note that even here there is no upper bound to the value of g(X) = g(Y).
- E. If x1 < 1 were allowed, then Comment D would no longer be valid. Indeed, take X = (½, ½) and Y = (1, 0). Then X −< Y, but g(X) = 0 and g(Y) = 1.
- F. Inequality (*) does not hold for the h- or the R2-index. Consider X = (3, 3, 3) and Y = (5, 2, 2). Then X −< Y but h(X) = 3 > h(Y) = 2. Moreover, R2(X) = 9 > R2(Y) = 7.
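The numerical claims in Comments B, C and E can be checked with a short script. This is only a sketch: the majorization test assumes the usual definition (partial sums of X never exceed those of Y, and the totals are equal), and the names `is_majorized_by` and `g_index` are ours.

```python
from itertools import accumulate


def is_majorized_by(xs, ys):
    """True if xs is majorized by ys (both decreasing, same length)."""
    if len(xs) != len(ys):
        raise ValueError("arrays must have the same length")
    px, py = list(accumulate(xs)), list(accumulate(ys))
    return px[-1] == py[-1] and all(a <= b for a, b in zip(px, py))


def g_index(xs):
    """g-index of a decreasing array, with implicit zero-padding."""
    g = 0
    for rank, partial in enumerate(accumulate(xs), start=1):
        if partial >= rank * rank:
            g = rank
    total, rank = sum(xs), len(xs) + 1
    while total >= rank * rank:
        g, rank = rank, rank + 1
    return g


if __name__ == "__main__":
    pairs = {
        "B": ((5, 2, 2), (4, 4, 1)),
        "C": ((2, 1, 1), (2, 2, 0)),
        "E": ((0.5, 0.5), (1, 0)),
    }
    for label, (x, y) in pairs.items():
        print(label, is_majorized_by(x, y), g_index(x), g_index(y))
```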
Proposition 1.
- (a) For X, Y ∈ Φ2 with X −< Y, h(X) ≥ h(Y).
- (b) For X, Y ∈ Φ3 with X −< Y, if the components of X and Y are strictly positive natural numbers, then h(X) ≥ h(Y).
Proof.
- (a) N = 2. If X −< Y, then x1 ≤ y1 and x1 + x2 = y1 + y2. Hence x2 ≥ y2. If now h(X) = 1, then 1 ≤ y1 and 2 > x2 ≥ y2. This implies that h(Y) = 1 = h(X). The case h(X) = 2 is trivial: As N = 2, it follows that h(Y) ≤ 2 = h(X).
- (b) N = 3. If X −< Y, then x1 ≤ y1, x1 + x2 ≤ y1 + y2 and x1 + x2 + x3 = y1 + y2 + y3. This already implies that x3 ≥ y3. We now consider three cases: h(Y) = 3, h(Y) = 2 and h(Y) = 1.
Assume first that h(Y) = 3. Then y3 ≥ 3. Hence we see that x1 ≥ x2 ≥ x3 ≥ y3 ≥ 3, from which we derive that h(X) = 3 = h(Y).
Assume next that h(Y) = 2. Then y3 < 3, hence y3 = 2 or y3 = 1, and y1 ≥ y2 ≥ 2. We first consider the case y3 = 1. We know already that x3 is at least equal to 1. So, if x3 is equal to 1, then x1 + x2 = y1 + y2. As x1 ≤ y1 the previous equality implies that y2 ≤ x2. Now h(Y) = 2 leads to 2 ≤ y2 ≤ x2, or h(X) ≥ 2 = h(Y). Still with y3 = 1 we now consider the case that x3 > 1. Then x1 ≥ x2 ≥ x3 ≥ 2, leading to h(X) ≥ 2 = h(Y).
Next we consider the case y3 = 2. Then x1 ≥ x2 ≥ x3 ≥ y3 = 2, which implies that h(X) ≥ 2 = h(Y).
Finally, as components are assumed to be strictly positive natural numbers, h(X) and h(Y) are at least equal to 1. Hence h(Y) = 1 implies h(X) ≥ h(Y).
Comments
- A. Proposition 1(b) is not valid if some of the components are zero. This is illustrated as follows. Let X = (2, 1, 1) and Y = (2, 2, 0). Then X −< Y, but h(X) = 1 and h(Y) = 2.
- B. Proposition 1(b) is also not valid if some of the components are not natural numbers. Indeed, let X = (2, 1.5, 1.5) and Y = (2, 2, 1). Then X −< Y, but h(X) = 1 and h(Y) = 2.
- C. Propositions 1(a) and 1(b) are not valid for R2.
  - (a) N = 2. Consider X = (1, 1) and Y = (2, 0). Then X −< Y, h(X) = h(Y) = 1, and R2(X) = 1 < R2(Y) = 2.
  - (b) N = 3. Consider X = (2, 2, 2) and Y = (3, 2, 1). Then X −< Y, h(X) = 2 = h(Y) and R2(X) = 4 < R2(Y) = 5.
- D. Proposition 1 is not valid for the h-index when N ≥ 4.
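Proposition 1 and Comment D can be explored by exhaustive search over small arrays of strictly positive natural numbers. The sketch below is ours; the helper names and the search limits are arbitrary assumptions, and any pair it reports for N = 4 is produced by the search, not taken from the article.

```python
from itertools import accumulate, combinations_with_replacement


def h_index(xs):
    h = 0
    for rank, value in enumerate(xs, start=1):
        if value < rank:
            break
        h = rank
    return h


def is_majorized_by(xs, ys):
    px, py = list(accumulate(xs)), list(accumulate(ys))
    return px[-1] == py[-1] and all(a <= b for a, b in zip(px, py))


def decreasing_arrays(length, max_value):
    """All decreasing arrays of the given length with entries in 1..max_value."""
    return list(combinations_with_replacement(range(max_value, 0, -1), length))


def counterexamples(length, max_value):
    """Pairs (X, Y) with X majorized by Y but h(X) < h(Y)."""
    arrays = decreasing_arrays(length, max_value)
    return [(x, y) for x in arrays for y in arrays
            if is_majorized_by(x, y) and h_index(x) < h_index(y)]


if __name__ == "__main__":
    print(len(counterexamples(2, 6)), len(counterexamples(3, 6)))
    print(counterexamples(4, 3)[:1])
```

Within these limits the search finds no violating pair for N = 2 or N = 3, while for N = 4 it returns pairs such as X = (3, 3, 2, 2) and Y = (3, 3, 3, 1).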
4. INTRODUCING THE ARRAY OF PARTIAL SUMS
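Let X = (xr)r=1,2,…,N ∈ ΦN. The array of partial sums of X is PX = (yi)i=1,2,…,N with $y_i = \sum_{j=1}^{N-i+1} x_j$, so that PX is again decreasing. The total sum of the components of X is denoted A.
Remarks
- 1. (PX)1 = A = $\sum_{j=1}^{N} x_j$; (PX)N = x1.
- 2. If X ends with p zeros, then PX starts with p + 1 components equal to A.
- 3. Clearly X ≤ PX, as (PX)N = x1. Hence h(X) ≤ h(PX), g(X) ≤ g(PX) and R(X) ≤ R(PX) (Egghe & Rousseau, 2019a, Proposition 2).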
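A small sketch of the construction of PX may help. It assumes, in line with the examples given later in the text such as X = (3, 2, 1, 0) and PX = (6, 6, 5, 3), that component i of PX is the sum of the first N − i + 1 components of X. The name `partial_sums` is ours.

```python
from itertools import accumulate


def partial_sums(xs):
    """PX: cumulative sums of xs listed from the longest sum to the shortest."""
    return list(accumulate(xs))[::-1]


if __name__ == "__main__":
    x = [3, 2, 1, 0]
    px = partial_sums(x)
    print(px)
    # Remark 1: first component is the total A, last component is x1
    print(px[0] == sum(x), px[-1] == x[0])
    # Remark 3: X <= PX componentwise
    print(all(a <= b for a, b in zip(x, px)))
```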
5. A RELATION WITH THE GINI INDEX
6. A GEOMETRIC INTERPRETATION OF h(PX) IN TERMS OF THE LORENZ CURVE LX
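For the decreasing array X = (xj)j=1,…,N of nonnegative real numbers and for $a_j = x_j / \sum_{k=1}^{N} x_k = x_j / A$, the Lorenz curve of X, denoted as LX, connects the points with coordinates $\left(\frac{i}{N}, \sum_{j=1}^{i} a_j\right)$, i = 0, 1, …, N. The average of array X is denoted as $\bar{x} = A/N$.
An illustration: If X = (3, 2, 1, 0), then N = 4, $\bar{x}$ = 3/2, PX = (6, 6, 5, 3) and h(PX) = 3. Now LX(1/4) = 3/6 < 4/6 = (N(1 − 1/4) + 1)/A; but LX(2/4) = 5/6 ≥ 3/6 = (N(1 − 2/4) + 1)/A. Hence, the smallest s is equal to 2/4 and h(PX) = 4(1 − 2/4) + 1 = 2 + 1 = 3.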
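The illustration can be reproduced computationally. The sketch below rests on an assumption of ours: since component N − k + 1 of PX equals A·LX(k/N), we take s = k/N to be the smallest abscissa for which LX(s) ≥ (N(1 − s) + 1)/A and return N(1 − s) + 1. The function names are hypothetical, and the result is compared with a direct computation of h(PX).

```python
from fractions import Fraction
from itertools import accumulate


def h_index(xs):
    h = 0
    for rank, value in enumerate(xs, start=1):
        if value < rank:
            break
        h = rank
    return h


def h_of_partial_sums_direct(xs):
    """h(PX), with PX the cumulative sums from longest to shortest."""
    return h_index(list(accumulate(xs))[::-1])


def h_of_partial_sums_via_lorenz(xs):
    """h(PX) read off the Lorenz curve of xs (total sum assumed positive)."""
    n = len(xs)
    total = Fraction(sum(xs))  # A
    cumulative = list(accumulate(xs))
    for k in range(1, n + 1):  # s = k / N, scanned from small to large
        lorenz_value = Fraction(cumulative[k - 1]) / total  # L_X(k/N)
        threshold = Fraction(n - k + 1) / total  # (N(1 - s) + 1) / A
        if lorenz_value >= threshold:
            return n - k + 1  # N(1 - s) + 1
    return 0


if __name__ == "__main__":
    x = [3, 2, 1, 0]
    print(h_of_partial_sums_via_lorenz(x), h_of_partial_sums_direct(x))
```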
In Egghe and Rousseau (2019b) we studied h(PX) and its relation with the Lorenz curve in a continuous context. This led to a new geometric interpretation of the h-index.
7. BOUNDS ON h-TYPE INDICES
In the next sections we derive bounds for h-type indices of PX. This is of importance for the following reason. A function relates an input to a unique output. In this way the standard h-index is a function which maps an array to a natural number. Yet it is not an explicit function, such as the function that maps the real number x to x² + 4x + 7 or the function which maps a finite array to its sum. Finding an h-index needs a procedure and hence it is not possible to study properties in an analytical way (e.g., using integrals). The bounds obtained in this article are explicit functions which can be studied using analytical methods.
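We denote by ⌊a⌋ the floor function of a (i.e., the largest integer smaller than or equal to a). We note that a ≥ ⌊a⌋ > a − 1. Using the notation just introduced we come to the following theorem.
Theorem 2. For X ∈ ΦN with average $\bar{x}$ and total sum A,
$\min(N, A) \geq h(PX) \geq \left\lfloor \frac{(N+1)\,\bar{x}}{1+\bar{x}} \right\rfloor$ (4)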
Before proving Theorem 2 we make three remarks:
- 1. The first inequality, namely min(N, A) ≥ h(PX), is easy to see because, on the one hand, an h-index can never be larger than the length of the array and, on the other, 1 ≤ h(PX) ≤ $(PX)_{h(PX)}$ ≤ A.
- 2. h(PX) = N if and only if x1 ≥ N.
- 3. h(PX) = 0 can never occur in our context. Indeed, this may only occur if all components are strictly smaller than 1, which is excluded. Yet, in Egghe and Rousseau (2019c) we showed that formula 4 is also correct in cases for which h(PX) = 0.
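Proof of Theorem 2. We only have to show the second inequality. By definition we know that h(PX) is equal to the largest index i such that $y_i = \sum_{j=1}^{N-i+1} x_j \geq i$. We know that $y_i = \sum_{j=1}^{N-i+1} x_j$ = (N − i + 1)·(the average of (x1, x2, …, xN−i+1)) ≥ (N − i + 1)·$\bar{x}$ (as the array X is ranked in decreasing order). Hence yi ≥ i holds for every i with (N − i + 1)$\bar{x}$ ≥ i, that is, for every i ≤ $\frac{(N+1)\bar{x}}{1+\bar{x}}$. The largest such natural number is $\left\lfloor \frac{(N+1)\bar{x}}{1+\bar{x}} \right\rfloor$, which proves the second inequality.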
In order to make these bounds more concrete we provide a table (Table 1) for some values of N and $\bar{x}$ (or A), showing how sharp these bounds often are. Largest differences occur when the average number of items is one.
N | $\bar{x}$ = 0.1 | $\bar{x}$ = 0.5 | $\bar{x}$ = 1 | $\bar{x}$ = 2 | $\bar{x}$ = 5 | $\bar{x}$ = 10 |
---|---|---|---|---|---|---|
10 | 1 ≥ h(PX) ≥ 1 | 5 ≥ h(PX) ≥ 3 | 10 ≥ h(PX) ≥ 5 | 10 ≥ h(PX) ≥ 7 | 10 ≥ h(PX) ≥ 9 | 10 ≥ h(PX) ≥ 10 |
30 | 3 ≥ h(PX) ≥ 2 | 15 ≥ h(PX) ≥ 10 | 30 ≥ h(PX) ≥ 15 | 30 ≥ h(PX) ≥ 20 | 30 ≥ h(PX) ≥ 25 | 30 ≥ h(PX) ≥ 28 |
100 | 10 ≥ h(PX) ≥ 9 | 50 ≥ h(PX) ≥ 33 | 100 ≥ h(PX) ≥ 50 | 100 ≥ h(PX) ≥ 67 | 100 ≥ h(PX) ≥ 84 | 100 ≥ h(PX) ≥ 91 |
200 | 20 ≥ h(PX) ≥ 18 | 100 ≥ h(PX) ≥ 67 | 200 ≥ h(PX) ≥ 100 | 200 ≥ h(PX) ≥ 134 | 200 ≥ h(PX) ≥ 167 | 200 ≥ h(PX) ≥ 182 |
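The entries of Table 1 can be recomputed with a few lines of code. The sketch assumes that the upper bound is min(N, A) with A = N·x̄ and that the lower bound is ⌊(N + 1)x̄/(1 + x̄)⌋, which is the form that reproduces the tabulated values. The function name `h_bounds` is ours, and exact rational arithmetic is used to avoid rounding issues with values such as 0.1.

```python
from fractions import Fraction
from math import floor


def h_bounds(n, mean):
    """Return (upper, lower) bounds for h(PX) given length n and average mean."""
    mean = Fraction(str(mean))  # exact value of decimals such as 0.1
    total = n * mean  # A
    upper = min(n, floor(total))
    lower = floor((n + 1) * mean / (1 + mean))
    return upper, lower


if __name__ == "__main__":
    means = [0.1, 0.5, 1, 2, 5, 10]
    for n in (10, 30, 100, 200):
        cells = ["{} >= h(PX) >= {}".format(*h_bounds(n, m)) for m in means]
        print(n, "|", " | ".join(cells))
```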
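The next theorem shows that the second inequality in Theorem 2 becomes an equality for the array $\bar{X} = (\bar{x}, \bar{x}, \ldots, \bar{x})$ ∈ (R+)N.
Theorem 3. $\min(N, A) \geq h(P\bar{X}) = \left\lfloor \frac{(N+1)\,\bar{x}}{1+\bar{x}} \right\rfloor$
Proof. We see that $P\bar{X} = (N\bar{x}, (N-1)\bar{x}, \ldots, 2\bar{x}, \bar{x})$. Then $h(P\bar{X})$ is the largest natural number i such that (N − i + 1)$\bar{x}$ ≥ i. We observe that then $h(P\bar{X})$ is equal to the largest natural number i such that i ≤ $\frac{\bar{x}}{1+\bar{x}}$(N + 1) and hence $h(P\bar{X}) = \left\lfloor \frac{(N+1)\bar{x}}{1+\bar{x}} \right\rfloor$. This proves Theorem 3.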
We next present some examples, illustrating different aspects of the previous results.
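Example 1. Returning to the example introduced before, we have X = (4, 3, 2, 1), with $\bar{x}$ = 2.5 and PX = (10, 9, 7, 4). Now N = h(PX) = 4 > $\left\lfloor \frac{(N+1)\bar{x}}{1+\bar{x}} \right\rfloor$ = ⌊12.5/3.5⌋ = ⌊3.571⌋ = 3. This illustrates that the second inequality in Theorem 2 can be strict. Continuing now with $\bar{X}$ we see that N = 4 > $h(P\bar{X})$ = h(10, 7.5, 5, 2.5) = 3 = $\left\lfloor \frac{(N+1)\bar{x}}{1+\bar{x}} \right\rfloor$.
Example 2. Consider X = (4, 2, 1, 1) with $\bar{x}$ = 2 and PX = (8, 7, 6, 4). Now N = h(PX) = 4 > $\left\lfloor \frac{(N+1)\bar{x}}{1+\bar{x}} \right\rfloor$ = ⌊10/3⌋ = ⌊3.333⌋ = 3. Continuing with $\bar{X}$ we see that N = 4 > $h(P\bar{X})$ = h(8, 6, 4, 2) = 3 = $\left\lfloor \frac{(N+1)\bar{x}}{1+\bar{x}} \right\rfloor$ = ⌊10/3⌋. This example illustrates that the floor function is really needed, because 3 < 10/3.
Example 3. Let X = (4, 0, 0, 0) with $\bar{x}$ = 1 and PX = (4, 4, 4, 4). Now N = h(PX) = 4 > $\left\lfloor \frac{(N+1)\bar{x}}{1+\bar{x}} \right\rfloor$ = ⌊5/2⌋ = ⌊2.5⌋ = 2. This is another example showing that the second inequality in Theorem 2 can be strict. Continuing with $\bar{X}$ we see that N = A = 4 > $h(P\bar{X})$ = h(4, 3, 2, 1) = 2 = $\left\lfloor \frac{(N+1)\bar{x}}{1+\bar{x}} \right\rfloor$ = ⌊2.5⌋. This is not only another example showing that the floor function is really needed, but it also illustrates that the first inequality in Theorem 3, and hence also in Theorem 2, can be strict.
Example 4. In the previous examples h(PX) = N. Next we present an example where h(PX) < N. Let X = (3, 2, 1, 0). Then $\bar{x}$ = 3/2 and PX = (6, 6, 5, 3). Now N = 4 > h(PX) = 3 ≥ $\left\lfloor \frac{(N+1)\bar{x}}{1+\bar{x}} \right\rfloor$ = ⌊7.5/2.5⌋ = ⌊3⌋ = 3. Continuing with $\bar{X}$ we see that N = 4 > $h(P\bar{X})$ = h(6, 4.5, 3, 1.5) = 3 = $\left\lfloor \frac{(N+1)\bar{x}}{1+\bar{x}} \right\rfloor$ = 3.
Example 5. Finally, we present an example where min(N, A) = A < N. Let X = (2, 0, 0, 0). Then $\bar{x}$ = ½ and PX = (2, 2, 2, 2). Now A = 2 = min(N, A) = h(PX) = 2 ≥ $\left\lfloor \frac{(N+1)\bar{x}}{1+\bar{x}} \right\rfloor$ = ⌊5/3⌋ = 1. Continuing with $\bar{X}$ we see that A = 2 = min(N, A) > $h(P\bar{X})$ = h(2, 3/2, 1, 1/2) = 1 = $\left\lfloor \frac{(N+1)\bar{x}}{1+\bar{x}} \right\rfloor$ = 1.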
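The five examples can be re-derived with the following sketch. It assumes integer components, the lower bound ⌊(N + 1)x̄/(1 + x̄)⌋, and the constant array (x̄, …, x̄) as comparison array; the helper names, in particular `summarize`, are ours.

```python
from fractions import Fraction
from itertools import accumulate
from math import floor


def h_index(xs):
    h = 0
    for rank, value in enumerate(xs, start=1):
        if value < rank:
            break
        h = rank
    return h


def partial_sums(xs):
    return list(accumulate(xs))[::-1]


def summarize(xs):
    """Return (h(PX), lower bound, h of PX for the constant array of averages)."""
    n = len(xs)
    mean = Fraction(sum(xs), n)  # exact average; integer components assumed
    lower = floor((n + 1) * mean / (1 + mean))
    h_px = h_index(partial_sums(xs))
    h_px_bar = h_index(partial_sums([mean] * n))
    return h_px, lower, h_px_bar


if __name__ == "__main__":
    examples = [(4, 3, 2, 1), (4, 2, 1, 1), (4, 0, 0, 0), (3, 2, 1, 0), (2, 0, 0, 0)]
    for x in examples:
        print(x, summarize(x))
```

In every case the second and third values coincide, in line with the equality claimed for the constant array.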
Corollaries
- A. If $\bar{x}$ ≥ N, then h(PX) = N.
- B. $\lim_{\bar{x} \to \infty} h(PX) = N$
Remark
When applied to publications, corollaries A and B show that for large $\bar{x}$ we only need those publications in X with the highest citations to determine h(PX). This is in accordance with the principle and meaning of an h-index.
8. PARTIAL SUMS AND THE G-INDEX
Using the same notations as before, we next prove the analogue of Theorem 2 for the g-index. We recall that the g-index has no upper limit.
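If $\sum_{j=1}^{N} y_j \geq N^2$ then we have to study $\sum_{j=1}^{N} y_j \geq i^2$. In the same way as above we find that $\sum_{j=1}^{N} y_j \geq \frac{N(N+1)}{2} \cdot \bar{x} \geq i^2$ (is all we need). Hence i ≤ $\sqrt{\frac{N(N+1)\,\bar{x}}{2}}$ or imax = $\left\lfloor \sqrt{\frac{N(N+1)\,\bar{x}}{2}} \right\rfloor$, where imax denotes the maximal value the index i can take here.
Consequently, if $\sum_{j=1}^{N} y_j \geq N^2$ then g(PX) ≥ $\left\lfloor \sqrt{\frac{N(N+1)\,\bar{x}}{2}} \right\rfloor$.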
Similar to the theory for the h-index, the next theorem shows that the inequality in Theorem 4 becomes an equality for the array $\bar{X}$.
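Proof. Now: $\sum_{i=1}^{N} (P\bar{X})_i = N\bar{x} + (N-1)\bar{x} + \ldots + 1 \cdot \bar{x} = \frac{N(N+1)}{2} \cdot \bar{x}$.
Hence, if this sum is at least N², $g(P\bar{X}) = \left\lfloor \sqrt{\frac{N(N+1)\,\bar{x}}{2}} \right\rfloor$.
Similarly, if this sum is smaller than N², $g(P\bar{X})$ is the largest natural number i such that $\bar{x}\,\frac{i(2N-i+1)}{2} \geq i^2$, that is, $g(P\bar{X}) = \left\lfloor \frac{(2N+1)\,\bar{x}}{2+\bar{x}} \right\rfloor$.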
Comment. Here, too, the lower bounds for g(PX) and g($P\bar{X}$) depend only on N and $\bar{x}$.
Examples
Example 1. Take X = (4, 4, 4, 4), $\bar{x}$ = 4 and PX = (16, 12, 8, 4). Then g(PX) = 6 (as 40 > 6² and 40 < 7²). As this is a case where 40 > 4² we have to check formula 6b. This formula states that 6 = g(PX) ≥ $\left\lfloor \sqrt{\frac{N(N+1)\,\bar{x}}{2}} \right\rfloor$ = ⌊√40⌋ = 6. Thanks to the use of the floor function we obtain an equality.
Example 2. Take X = (4, 3, 2, 1), $\bar{x}$ = 2.5 and PX = (10, 9, 7, 4). Then g(PX) = 5 (as 30 > 5² and 30 < 6²). Also here we have to check formula 6b. We see that 5 = g(PX) ≥ $\left\lfloor \sqrt{\frac{N(N+1)\,\bar{x}}{2}} \right\rfloor$ = ⌊√25⌋ = 5. This is an example where the floor function is not necessary.
Example 3. For X = (4, 0, 0, 0), $\bar{x}$ = 1 and PX = (4, 4, 4, 4). Here the sum, namely 16, is larger than or equal to N² = 4²; hence we have to check formula 6b. This leads to 4 = g(PX) ≥ $\left\lfloor \sqrt{\frac{N(N+1)\,\bar{x}}{2}} \right\rfloor$ = ⌊√10⌋ = 3. This is another case where we have strict inequality.
Example 4. For X = (2, 0, 0, 0), $\bar{x}$ = 0.5 and PX = (2, 2, 2, 2). Here the sum, namely 8, is smaller than 4²; hence we have to check formula 6a. This leads to 2 = g(PX) ≥ ⌊1.8⌋ = 1. Also here we have strict inequality.
Example 5. Finally we consider a case for which N ≠ 4. Let X = (5, 4, 3, 2, 1), $\bar{x}$ = 3 and PX = (15, 14, 12, 9, 5). Here the sum, namely 55, is larger than 5²; hence we check formula 6b. We first note that g(PX) = 7 (55 > 7² and 55 < 8²). Now 7 = g(PX) ≥ $\left\lfloor \sqrt{\frac{N(N+1)\,\bar{x}}{2}} \right\rfloor$ = ⌊√45⌋ = 6. This is again a case with a strict inequality.
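The g-index values in these examples can be checked with the sketch below. The lower bounds used in `g_lower_bound` are our own reconstruction and should be read as assumptions: ⌊√(N(N + 1)x̄/2)⌋ when the sum of PX is at least N², and ⌊(2N + 1)x̄/(2 + x̄)⌋ otherwise. Integer components are assumed, and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import accumulate
from math import floor, isqrt


def partial_sums(xs):
    return list(accumulate(xs))[::-1]


def g_index(xs):
    g = 0
    for rank, partial in enumerate(accumulate(xs), start=1):
        if partial >= rank * rank:
            g = rank
    total, rank = sum(xs), len(xs) + 1
    while total >= rank * rank:  # implicit extension with zeros
        g, rank = rank, rank + 1
    return g


def g_lower_bound(xs):
    """Assumed lower bound for g(PX) in terms of N and the average only."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    if sum(partial_sums(xs)) >= n * n:
        # floor(sqrt(v)) equals isqrt(floor(v)) for nonnegative v
        return isqrt(floor(mean * n * (n + 1) / 2))
    return floor((2 * n + 1) * mean / (2 + mean))


if __name__ == "__main__":
    for x in [(4, 4, 4, 4), (4, 3, 2, 1), (4, 0, 0, 0), (2, 0, 0, 0), (5, 4, 3, 2, 1)]:
        g = g_index(partial_sums(x))
        print(x, g, g_lower_bound(x) <= g)
```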
Proof. If $\bar{x}$ is large, then we have to consider formula 6b. The right-hand side of formula 6b then grows without bound, and hence so does g(PX). This result confirms the fact that the g-index has no upper limit.
9. PARTIAL SUMS, THE R(R2)-INDEX AND KOSMULSKI’S INDEX H(2)(X)
In the previous sections we studied the h-index and the g-index. As a final case we mention the R2-index and Kosmulski’s h(2)-index. For proofs of the results we refer the reader to Egghe and Rousseau (2019c).
Similarly to Theorem 3 we have
10. DISCUSSION AND CONCLUSION
In this article we studied arrays of partial sums, PX, of a given array X in terms of their h-type indices. We showed that h(PX) can be described in terms of the Lorenz curve of the array X. Moreover, we obtained sharp lower and upper bounds for these h-type indices. We found bounds that only depend on N, the length of the array, and the average of array X, or equivalently, on the length of the array and the total sum of all items in the array.
As h(PX) is an h-index it is not surprising that it is not strictly independent in the sense of Bouyssou and Marchant (2011). This means that if h(PX) < h(PY) and if one adds to X and Y the same items (X becomes X′, and Y becomes Y′) then it is possible that h(PX′) > h(PY′). An example: Let X = (2, 0, 0, 0, 0) and Y = (1, 1, 1, 1, 1). Then PX = (2, 2, 2, 2, 2) with h(PX) = 2, and PY = (5, 4, 3, 2, 1) with h(PY) = 3, hence h(PX) < h(PY). Adding 5 times 1 to each of them yields X′ = (2, 1, 1, 1, 1, 1, 0, 0, 0, 0), PX′ = (7, 7, 7, 7, 7, 6, 5, 4, 3, 2) with h(PX′) = 6, and Y′ = (1, 1, 1, 1, 1, 1, 1, 1, 1, 1), PY′ = (10, 9, 8, 7, 6, 5, 4, 3, 2, 1) with h(PY′) = 5, hence h(PX′) > h(PY′).
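The counterexample above is easy to verify numerically. In the sketch below the helper names are ours; adding items is modeled by merging the new values into the array and re-ranking it in decreasing order.

```python
from itertools import accumulate


def h_index(xs):
    h = 0
    for rank, value in enumerate(xs, start=1):
        if value < rank:
            break
        h = rank
    return h


def h_of_partial_sums(xs):
    """h(PX), with PX the cumulative sums from longest to shortest."""
    return h_index(list(accumulate(xs))[::-1])


def add_items(xs, extra):
    """Add items to an array and re-rank it in decreasing order."""
    return sorted(list(xs) + list(extra), reverse=True)


if __name__ == "__main__":
    x, y = [2, 0, 0, 0, 0], [1, 1, 1, 1, 1]
    x_new, y_new = add_items(x, [1] * 5), add_items(y, [1] * 5)
    print(h_of_partial_sums(x), h_of_partial_sums(y))
    print(h_of_partial_sums(x_new), h_of_partial_sums(y_new))
```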
A reviewer asked if h(PX) can be described in terms of Vannucci’s (2010) dominance dimension. It can: Using Vannucci’s notation we see that h(PX) = , where X = (xn)n=1,…,N is an array of length N.
Our investigation illustrated the rich mathematical structure hidden in the mechanism leading to h-type indices (see also Egghe & Rousseau, 2019d). In this article we considered the discrete case, requiring the floor function in order to get the correct results. In further research we intend to study the continuous case, where by definition no floor function will be needed. Then, bounds will be differentiable and integrable functions.
AUTHOR CONTRIBUTIONS
Leo Egghe: conceptualization; formal analysis; investigation; methodology; writing—original draft; writing—review and editing. Ronald Rousseau: validation; writing—review and editing.
COMPETING INTERESTS
The authors have no competing interests.
FUNDING INFORMATION
No funding has been received.
DATA AVAILABILITY
Not applicable.
ACKNOWLEDGMENTS
The authors thank anonymous reviewers for useful suggestions to improve the presentation of this article.
REFERENCES
Author notes
Handling Editor: Vincent Larivière