## Abstract

While most proposed methods for solving classification problems focus on minimization of the classification error rate, we are interested in the receiver operating characteristic (ROC) curve, which provides more information about classification performance than the error rate does. The area under the ROC curve (AUC) is a natural measure for overall assessment of a classifier based on the ROC curve. We discuss a class of concave functions for AUC maximization in which a boosting-type algorithm including RankBoost is considered, and the Bayesian risk consistency and the lower bound of the optimum function are discussed. A procedure derived by maximizing a specific optimum function has high robustness, based on gross error sensitivity. Additionally, we focus on the partial AUC, which is the partial area under the ROC curve. For example, in medical screening, a high true-positive rate to the fixed lower false-positive rate is preferable and thus the partial AUC corresponding to lower false-positive rates is much more important than the remaining AUC. We extend the class of concave optimum functions for partial AUC optimality with the boosting algorithm. We investigated the validity of the proposed method through several experiments with data sets in the UCI repository.

## 1. Introduction

The receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) are widely employed performance measures of classifiers (Pepe & Thompson, 2000), especially in medical diagnosis and psychological studies. An important property of the AUC is that it can be used when the prior probability of each class is biased, whereas the error rate counts only the number of wrong classifications regardless of the bias of samples of each class. Also, when the goal of a problem is to find a discriminant function with a high AUC value, it is natural to use an algorithm that directly maximizes the AUC; however, in general, direct construction of a discriminant function that maximizes the AUC value is difficult because of the nondifferentiability of the definition of the AUC. Brefeld and Scheffer (2005) proposed a support vector machine (SVM) type of algorithm (Vapnik, 1998) maximizing the AUC. The optimization problem for this algorithm can be solved by standard QP solvers; however, its computational cost grows quadratically with the number of examples, and optimization is hard for large-scale problems. Boosting is a learning method that builds up a stronger classification machine from a set of base learning machines that can be easily learned, and AdaBoost (Freund & Schapire, 1997) is a typical implementation of boosting. Murata, Takenouchi, Kanamori, and Eguchi (2004) extended the algorithm to general *U*-Boost and discussed its statistical properties such as consistency and robustness. In this letter, we extend the concept of the AUC and propose a new performance measure along the lines of Murata et al. (2004) and a boosting-type of algorithm that maximizes the measure. We also propose an algorithm maximizing the partial AUC (pAUC), a measure of performance whose false positive rate is controlled, and investigate the statistical properties and robustness of the algorithms. 
In other words, the algorithm is designed to attain a high true positive rate within “a specific range of the false positive rate.” This idea is common in medical screening where the false-positive rate (the probability that healthy subjects are incorrectly judged to be positive) should be kept as low as possible or in preference-ranking problems, where only the top of the ranking list is the user's interest. Recently Rudin and Schapire (2009) proposed an algorithm that focuses on the left-most portion of the ROC curve based on RankBoost (Freund, Iyer, Schapire, & Singer, 2003). The method controls the weights on the exponential loss so that it puts more importance on low values of a false-positive rate, in contrast with the direct restriction of the range of the false positive in our approach.

This letter is organized as follows. In section 2, we describe the basic settings and briefly review the conventional AUC. We formulate a new measure, *U*-AUC, and propose a boosting-type algorithm, *U*-AUCBoost. In section 3, we propose an algorithm maximizing pAUC and investigate some of its statistical properties. The performances of the proposed methods are evaluated with some experiments in section 4.

## 2. *U*-AUC

In this section, we define a new performance measure, *U*-AUC, which is a lower bound of the AUC, and propose an algorithm, *U*-AUCBoost, that can maximize the AUC. Adopting the framework of boosting makes it possible to deal with large-scale data sets. We also investigate a relationship with a statistical model and statistical properties of the algorithm such as consistency or robustness.

### 2.1. Settings of the Problem.

Given a discriminant function *F*(**x**) of an input **x** with a binary class label *y* ∈ {+1, −1}, the conventional true-positive rate (TPR) and false-positive rate (FPR) at a threshold value *c* are defined as follows:

$$\mathrm{TPR}(c)=\int H(F(\boldsymbol{x})-c)\,p_{+1}(\boldsymbol{x})\,d\boldsymbol{x},\tag{2.1}$$

$$\mathrm{FPR}(c)=\int H(F(\boldsymbol{x})-c)\,p_{-1}(\boldsymbol{x})\,d\boldsymbol{x},\tag{2.2}$$

where H(*z*) is the Heaviside step function and *p _{y}*(**x**) is the conditional probability *p*(**x**|*y*) of **x** given the label *y*. The AUC is a performance measure of the discriminant function *F*(**x**) over all threshold values *c* and is defined by

$$\mathrm{AUC}(F)=\int_0^1 \mathrm{TPR}(\mathrm{FPR}^{-1}(u))\,du=\iint H(F(\boldsymbol{x})-F(\boldsymbol{x}'))\,p_{+1}(\boldsymbol{x})\,p_{-1}(\boldsymbol{x}')\,d\boldsymbol{x}\,d\boldsymbol{x}'.\tag{2.3}$$

Its empirical version is written as

$$\widehat{\mathrm{AUC}}(F)=\frac{1}{n_{+1}n_{-1}}\sum_{i:\,y_i=+1}\ \sum_{j:\,y_j=-1}H(F(\boldsymbol{x}_i)-F(\boldsymbol{x}_j)),\tag{2.4}$$

where *n _{y}* is the number of samples having label *y* in the data set.

A direct construction of *F* that maximizes equation 2.4 is difficult because this equation is not continuously differentiable. Replacing the Heaviside function with a sigmoid function (Herschtal & Raskutti, 2004; Ghosh & Chaudhuri, 2005) or with the normal distribution function (Komori & Eguchi, 2010) removes the nondifferentiability; however, because such surrogates are nonconvex, it may not be possible to find the global maximum. To overcome the difficulty, in this letter we consider a true positive utility (TPU) rather than the conventional TPR and propose a new performance measure, *U*-AUC, that approximately maximizes equation 2.4. A similar approach, RankBoost, was investigated by Freund et al. (2003) and Rudin and Schapire (2009), who considered combining preferences to solve ranking problems, and it was pointed out that RankBoost is equivalent to maximization of the AUC value (Cortes & Mohri, 2004). Our proposed method includes RankBoost as a special case and has statistical properties such as consistency and robustness.
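To make the contrast concrete, the following sketch (our own illustration with NumPy; the helper names are not from the letter) computes the empirical AUC of equation 2.4, a step function of pairwise score differences, alongside a concave surrogate built from the exponential-type function *U*(*z*) = 1 − exp(−*z*), which satisfies *U* ⩽ H and therefore lower-bounds the empirical AUC:

```python
import numpy as np

def empirical_auc(scores_pos, scores_neg):
    """Empirical AUC (equation 2.4): the fraction of positive-negative
    score pairs ordered correctly; ties contribute 1/2."""
    diff = scores_pos[:, None] - scores_neg[None, :]
    return np.mean(np.where(diff > 0, 1.0, np.where(diff == 0, 0.5, 0.0)))

def surrogate_auc(scores_pos, scores_neg):
    """Concave surrogate: the Heaviside step H is replaced by the concave,
    increasing U(z) = 1 - exp(-z), which satisfies U(z) <= H(z), so the
    surrogate value lower-bounds the empirical AUC."""
    diff = scores_pos[:, None] - scores_neg[None, :]
    return np.mean(1.0 - np.exp(-diff))

rng = np.random.default_rng(0)
pos = rng.normal(1.0, 1.0, 300)  # scores F(x) for examples with y = +1
neg = rng.normal(0.0, 1.0, 300)  # scores F(x) for examples with y = -1
assert surrogate_auc(pos, neg) <= empirical_auc(pos, neg) <= 1.0
```

Unlike the step-function objective, the surrogate is smooth and concave in the scores, which is what makes gradient-style and boosting-style optimization tractable.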

### 2.2. *U*-AUC.

In this section, we formulate the *U*-AUC and derive an algorithm for maximizing it in the framework of boosting. First, we define the quantity TPU as

$$\mathrm{TPU}(c)=\int U(F(\boldsymbol{x})-c)\,p_{+1}(\boldsymbol{x})\,d\boldsymbol{x},\tag{2.5}$$

where *U* is a concave and monotonically increasing function satisfying

$$U(z)\leq H(z).\tag{2.6}$$

Based on this quantity, we propose a new performance measure, the *U*-AUC,

$$\mathrm{A}_U(F)=\iint U(F(\boldsymbol{x})-F(\boldsymbol{x}'))\,p_{+1}(\boldsymbol{x})\,p_{-1}(\boldsymbol{x}')\,d\boldsymbol{x}\,d\boldsymbol{x}',\tag{2.7}$$

and denote its empirical version as

$$\widehat{\mathrm{A}}_U(F)=\frac{1}{n_{+1}n_{-1}}\sum_{i:\,y_i=+1}\ \sum_{j:\,y_j=-1}U(F(\boldsymbol{x}_i)-F(\boldsymbol{x}_j)).\tag{2.8}$$

Note that A_{U}(*F*) is a lower bound of the AUC because of equation 2.6. Then maximization of A_{U}(*F*) implies maximization of a lower bound of the AUC, which is our purpose. Also note that A_{U}(*F*) is concave with respect to *F* because of the concavity of *U*; the equality in the concavity inequality holds if and only if *F*_{1}(**x**) = *F*_{2}(**x**) + *c*, where *c* is an arbitrary constant. This property also gives the uniqueness (except for *c*) of the optimal function *F**.

*U*-AUC is threshold free because A_{U}(*F* + κ) = A_{U}(*F*) holds for an arbitrary threshold value κ. In a situation where sample numbers of classes are biased, this threshold-free property is a virtue.
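The threshold-free property is easy to verify numerically; a small check (our own illustration, not from the letter) using the exponential-type *U*:

```python
import numpy as np

def u_auc(pos, neg):
    """Empirical U-AUC with U(z) = 1 - exp(-z): a mean over all
    positive-negative pairwise score differences."""
    return np.mean(1.0 - np.exp(-(pos[:, None] - neg[None, :])))

rng = np.random.default_rng(1)
pos, neg = rng.normal(1, 1, 50), rng.normal(0, 1, 80)
# A_U(F + kappa) = A_U(F): shifting every score by the same constant
# leaves each pairwise difference, hence the U-AUC, unchanged.
for kappa in (-3.0, 0.7, 10.0):
    assert np.isclose(u_auc(pos + kappa, neg + kappa), u_auc(pos, neg))
```

Because only differences *F*(**x**_{i}) − *F*(**x**_{j}) enter the objective, no threshold needs to be tuned, which is exactly the virtue claimed above for class-imbalanced settings.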

The maximization of A_{U}(*F*) is a concave optimization problem that can be efficiently solved when the problem is not too large (Boyd & Vandenberghe, 2004). However, there is now a great need for analyzing large-scale data sets with numerous examples of high dimension, and the direct optimization of A_{U}(*F*) is difficult for such problems. To deal with them, boosting methods have been proposed: a boosting method builds up a stronger classification machine from a set of base learning machines that can be easily learned. Many boosting-type algorithms have been proposed, and theoretical properties such as consistency and robustness have been intensively investigated. In this letter, to maximize the empirical *U*-AUC with respect to *F* for large-scale data sets, we derive a boosting-type algorithm, *U*-AUCBoost, as follows. Let ℱ be a set of base classifiers. We construct *F* as ∑^{T}_{t=1}α_{t}*f _{t}*, where *f _{t}* is a component of ℱ and α_{t} is a coefficient of *f _{t}*:

1. Initialize a weight for each pair of examples and the discriminant function as *F*_{0}(**x**) = 0.
2. For *t* = 1, …, *T*, select a base classifier *f _{t}*, determine its coefficient α_{t}, and update the pair weights.
3. Output the discriminant function *F _{T}*(**x**).

In this algorithm, a weight is defined for each pair of examples, whereas in a usual boosting algorithm such as AdaBoost, a weight is defined for each single example.

Suppose *F*_{t−1} is given, and consider an update from *F*_{t−1} to *F*_{t−1} + ε*f*, where ε is a small constant. The resulting change in the empirical *U*-AUC (equation 2.10) is maximized to select the base classifier. For the selected base classifier, we determine the coefficient as in step b. In the maximization in equation 2.10, the solution often diverges; in our experiments, for the purpose of regularization, we therefore threshold α at 1 and optimize in the range [0, 1]. This kind of implementation for numerical problems is also found in LogitBoost (Friedman, Hastie, & Tibshirani, 2000).
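As a deliberately naive illustration of this loop, the sketch below greedily adds decision stumps, choosing each stump and its coefficient α ∈ [0, 1] by a grid search on the empirical *U*-AUC with the exponential-type *U*. The letter's actual selection rule (equation 2.10) and coefficient step are not reproduced in this excerpt, so this is one workable instantiation under our own assumptions, not the paper's implementation:

```python
import numpy as np

def u_auc(Fp, Fn):
    """Empirical U-AUC with U(z) = 1 - exp(-z) over all pos-neg pairs."""
    return np.mean(1.0 - np.exp(-(Fp[:, None] - Fn[None, :])))

def u_aucboost(Xp, Xn, T=5, grid=np.linspace(0.0, 1.0, 21)):
    """Naive U-AUCBoost sketch. Xp, Xn: feature matrices of positive and
    negative examples. Each round adds alpha_t * f_t, where f_t is a
    decision stump sign(x_k - theta) and alpha_t is restricted to [0, 1]
    for regularization, as in the text."""
    Fp, Fn = np.zeros(len(Xp)), np.zeros(len(Xn))   # F_0(x) = 0
    model, X = [], np.vstack([Xp, Xn])
    for _ in range(T):
        best_val, best_stump = u_auc(Fp, Fn), None  # alpha = 0 keeps F unchanged
        for k in range(X.shape[1]):
            for theta in np.quantile(X[:, k], np.linspace(0.05, 0.95, 10)):
                fp = np.sign(Xp[:, k] - theta)
                fn = np.sign(Xn[:, k] - theta)
                for a in grid[1:]:
                    val = u_auc(Fp + a * fp, Fn + a * fn)
                    if val > best_val:
                        best_val, best_stump = val, (a, k, theta)
        if best_stump is None:                      # no stump improves the objective
            break
        a, k, theta = best_stump
        Fp += a * np.sign(Xp[:, k] - theta)
        Fn += a * np.sign(Xn[:, k] - theta)
        model.append((a, k, theta))
    return model, Fp, Fn
```

Because α = 0 (leaving *F* unchanged) is always an option, the empirical *U*-AUC is nondecreasing over rounds; a practical implementation would maintain the pair weights incrementally instead of recomputing all pairwise differences each round.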

In the following sections, we discuss statistical properties of *U*-AUCBoost: the empirical AUC attained by the algorithm (section 2.3), a related statistical model and consistency (section 2.4), robustness (section 2.5), and the relation to conventional methods such as RankBoost and AdaBoost (section 2.6).

### 2.3. Evaluation of Empirical AUC.

Using the boosting-type algorithm just described, we can construct the discriminant function *F _{T}* as a linear combination of base classifiers. We investigate how the algorithm behaves by evaluating a lower bound of the empirical *U*-AUC for *F _{T}* (theorem 1). See appendix A for the proof.

If acc_{t}(*f _{t}*) − err_{t}(*f _{t}*) > 0 and γ_{t} > 0 hold for all *t*, the lower bound of the empirical *U*-AUC monotonically increases with *T*.

### 2.4. Consistency of *U*-AUCBoost.

We consider a population maximizer of equation 2.7 and investigate its properties. Let η(**x**) be an arbitrary function of **x** and ε be a small constant, and consider the functional derivative of A_{U}(*F*) in the direction of η (equation 2.12). We observe that the maximizer *F** of the concave function A_{U}(*F*) satisfies the stationarity condition of equation 2.13, which is rewritten as equation 2.14. The bracketed term in equation 2.14 becomes 0 for any **x** because η(**x**) is arbitrary.

Note that *F** + *c*, where *c* is an arbitrary constant, also satisfies equation 2.14 because of the threshold-free property of *U*-AUC. In the following, we ignore the arbitrariness of the constant.

The maximizer is connected with a statistical model through the choice of *U*. In the context of *U*-Boost (Murata et al., 2004), for example, a boosting algorithm with the function *U*(*z*) = (1 − η)exp(*z*) + η*z*, where η is a tuning parameter, is closely related to a probabilistic model of mislabeling (equation 2.16), where δ(**x**) is a probability of mislabeling, which can depend on the input **x**. Takenouchi, Eguchi, Murata, and Kanamori (2008) investigated a model of δ(**x**) in which an input has a higher mislabeling probability when the decision for predicting its label is more difficult.

*If the function U satisfies equation 2.17, the maximizer F\* of A_{U}(F) is equal to the function determined by the true conditional distribution (see equation 2.15), except for the arbitrariness of the constant.*

We next focus on a special type of function *U* associated with the logistic model.

#### 2.4.1. Class of Function Associated with the Logistic Model.

Consider a class {*F*(**x**; **α**)} of discriminant functions parameterized by a vector **α** of parameters and the logistic model defined with *F*(**x**; **α**) and a constant *c* (equation 2.18). We assume that the true conditional distribution is represented by the logistic model with parameter **α**_{0}. Under these conditions, we focus on estimating the parameter **α**. Let the estimator be the maximizer of equation 2.7, where the true joint distribution of (**X**, *Y*) is π_{y}*p _{y}*(**x**).

*Fisher consistency. The estimator of the parameter vector maximizing equation 2.7 associated with this distribution is consistent with the true parameter **α**_{0}; that is, the estimator is Fisher consistent.*

These properties are derived from the concavity of the function *U* and do not hold for the sigmoid function or the normal distribution function.

We consider some examples of the function *U* that satisfy equation 2.17 with λ = 2.

- Exponential type: *U _{rank}*(*z*) = 1 − exp(−*z*). We discuss some of its properties in section 2.6.
- Logistic type: *U _{logit}*(*z*).
- MadaBoost type: *U _{mada}*(*z*).

The statistical properties of a boosting algorithm with this third function, called MadaBoost, are discussed in Kanamori, Takenouchi, Eguchi, and Murata (2007). MadaBoost is the most B-robust boosting algorithm for outliers in the sense of gross error sensitivity (Hampel, Rousseeuw, Ronchetti, & Stahel, 1986) when the true distribution is modeled by the logistic model. We next investigate the robustness of the loss function in the framework of *U*-AUC maximization. These functions are plotted in Figure 1.

### 2.5. Robustness of MadaBoost Type Loss.

Consider a function *U* satisfying equation 2.17, and assume that the true distribution is written in terms of a discriminant function *F*† (equation 2.21). Let us consider an update of the discriminant function with a base classifier *f* ∈ {+1, −1} (equation 2.22) and a statistical model of the conditional distribution *p*(*y*|**x**) (equation 2.23). Note that *p*(*y*|**x**) = *p*_{0}(*y*|**x**), and from theorem 4, the estimator maximizing equation 2.7 is Fisher consistent.

See appendix B for the proof of the lemma.

Then we observe the following theorem:

*We assume that the base classifier f satisfies conditions 2.27.^{2} Then the estimator associated with the MadaBoost-type function U_{mada} is the most B-robust estimator; that is, it minimizes the gross error sensitivity (see equation 2.26).*

Proof. If *V*′ is not upper-bounded, the numerator of equation 2.26 diverges; therefore we focus on the case where *V*′ is upper-bounded. Without loss of generality, we can normalize *V*′, because multiplication of *V*′ by a positive constant does not change the estimator. From assumptions 2.27, we obtain the bounds used in appendix B. The gross error sensitivity is minimized when the denominator of equation 2.26 is maximized. The relevant term is calculated over the region {**x** | *F*†(**x**′) ⩾ *F*†(**x**)}, and then we find that *V*′(*z*) = 2 should hold for *z* ⩾ 0. From equation B.8, we observe that the MadaBoost-type function minimizes the gross error sensitivity.

### 2.6. Special Example of *U*: Exponential Type.

The *U*-AUC with the special function *U _{rank}*(*z*) = 1 − exp(−*z*) is equivalent to RankBoost, whose properties were investigated in Freund et al. (2003). A relationship between the algorithm and AUC maximization was discussed in Cortes and Mohri (2004). While the selection of the base classifier and the calculation of α_{t} cannot be explicitly solved in the algorithm with a general function *U*, the maximization can be explicitly solved with *U _{rank}*, and the corresponding steps of the algorithm are modified accordingly.

Also, as with theorem 1, a theoretical evaluation of the empirical AUC value attained by the algorithm with the function *U _{rank}* was given in Freund et al. (2003).

We next discuss a relationship between the algorithm with *U _{rank}* and AdaBoost. AdaBoost is the most popular boosting algorithm for classification problems and is derived from sequential minimization of the exponential loss function ∑_{i}exp(−*y _{i}F*(**x**_{i})). We observe the following:

*RankBoost is equivalent to AdaBoost with an optimized threshold value*.

In this sense, AdaBoost with an optimized threshold coincides with the algorithm with *U _{rank}*, or RankBoost.
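For a binary base classifier *f* ∈ {+1, −1}, the explicit coefficient can be worked out from the pairwise exponential loss. The sketch below is our reconstruction (the modified steps are elided in this excerpt) and mirrors the closed-form coefficient of RankBoost (Freund et al., 2003):

```python
import numpy as np

def rankboost_alpha(fp, fn, D):
    """Closed-form coefficient for a binary base classifier.
    fp, fn: values of f (each +1 or -1) on positive and negative examples;
    D: nonnegative pair weights, shape (len(fp), len(fn)).
    The pairwise exponential loss
        Z(a) = sum_ij D[i, j] * exp(-a * (fp[i] - fn[j]))
    depends on a only through the pair groups with difference +2 or -2,
    and is minimized at a = (1/4) * log(W_plus / W_minus)."""
    diff = fp[:, None] - fn[None, :]   # each entry is -2, 0, or +2
    w_plus = D[diff == 2].sum()        # pairs the classifier orders correctly
    w_minus = D[diff == -2].sum()      # pairs it orders incorrectly
    return 0.25 * np.log(w_plus / w_minus)

def pair_loss(a, fp, fn, D):
    """Pairwise exponential loss Z(a), for checking the minimizer."""
    return float((D * np.exp(-a * (fp[:, None] - fn[None, :]))).sum())
```

Setting dZ/da = 0 gives exp(4a) = W_plus/W_minus; pairs with zero difference contribute only a constant, which is why they drop out of the coefficient.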

### 2.7. Related Work.

In this section, we briefly review related work to clarify our own contributions in this letter.

Murata et al. (2004) extended AdaBoost to the *U*-Boost algorithm. Statistical properties of *U*-Boost such as consistency and robustness were investigated, and the information-geometric structure (Amari & Nagaoka, 2000) of the *U*-Boost algorithm was discussed, which gives an intuitive interpretation of how *U*-Boost works. Kanamori et al. (2007) considered the robustness of boosting methods in the class of *U*-Boost and revealed that the MadaBoost-type loss function is the most robust loss function for coefficient estimation.

RankBoost is a boosting method that learns an accurate rank or preference by combining multiple preferences, such as the results of different search engines (Freund et al., 2003), and it was pointed out that minimization of the loss function of RankBoost is equivalent to maximization of a surrogate of the empirical AUC (Cortes & Mohri, 2004). While RankBoost optimizes a surrogate of the empirical AUC, a measure of classifier performance, another important performance measure is the margin of the classifier. However, RankBoost does not always converge to a maximum margin solution because its objective function is the empirical AUC rather than the margin. Rudin and Schapire (2009) proposed the smoothed margin ranking algorithm, a modified version of RankBoost that makes it possible to obtain a maximum ranking-margin solution. While RankBoost and its variants, including the proposed method, are based on maximization of concave objective functions that are surrogates of the empirical AUC value, there are also methods that approximate the Heaviside function by the sigmoid function (Herschtal & Raskutti, 2004; Ghosh & Chaudhuri, 2005) or by the cumulative distribution function of the normal distribution (Komori & Eguchi, 2010). Such algorithms are derived with the usual coordinate ascent method; however, the objective function is not concave, implying that the algorithm can be trapped in local maxima, and this kind of algorithm requires regularization to avoid that problem.

Our proposed method is a generalization of RankBoost to an arbitrary concave function for maximizing a surrogate of the empirical AUC, motivated by the extension of AdaBoost to *U*-Boost (Murata et al., 2004). We also investigated the relationship between the class of concave functions and the associated statistical model and proposed a robust algorithm for AUC maximization (section 2.5) based on the framework of Kanamori et al. (2007).

## 3. Partial *U*-AUC

The AUC provides an overall assessment of classification based on FPR and TPR as described in equation 2.3, each of which expresses a different accuracy for discrimination performance. In some situations, especially in disease screening, a high TPR in a range of a low FPR is important (Baker, 2003; Qi, Joseph, & Seetharaman, 2006), and hence the partial area under the ROC curve (pAUC) has been gaining in popularity. Moreover, a specific value of TPR given a fixed FPR is also used as a summary measure of discrimination in medical screening (Pepe, Longton, Anderson, & Schummer, 2003; Pepe, 2003). Pepe and Thompson (2000) proposed a statistical method that maximizes the pAUC by linearly combining two predictors, and Komori and Eguchi (2010) made it possible to nonlinearly combine more than two predictors using the approximate pAUC. In this context, we consider the partial *U*-AUC as an extension of the *U*-AUC.

### 3.1. Definition of Partial *U*-AUC.

Consider a range [τ_{1}, τ_{2}] of the FPR that corresponds to the threshold values *c*_{1,F} and *c*_{2,F} (equation 3.1). Note that *c*_{2,F} < *c*_{1,F} if τ_{1} < τ_{2}. For this range of FPR, the partial AUC pA(*F*) is defined by restricting the integral of equation 2.3 to the interval between the two thresholds (equation 3.2). To approximate the partial AUC, we define the partial *U*-AUC pA_{U}(*F*) by replacing the Heaviside function with *U* (equation 3.3). The empirical version of pA_{U} is given in equation 3.4, with corresponding empirical threshold values. In the empirical version, a threshold value *c* associated with a specific value τ is chosen from the middle points of the sorted values *F*(**x**_{j}) having class label *y _{j}* = −1. Note that from the definition of the function *U*, we observe relationship 3.5 and the following theorem:

pA_{U}(*F*, τ_{1}, τ_{2}) *is concave with respect to F*.

See appendix C.

Note that the concavity of the empirical version is easily shown by the same argument as that in the proof of theorem 6.
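To fix ideas, the following sketch (our own; the normalization and the exact threshold convention are stated assumptions, not the letter's equations 3.1 to 3.4) computes an empirical partial AUC over the FPR band [τ_{1}, τ_{2}] by keeping only the negatives whose scores fall inside that band of the sorted negative scores:

```python
import numpy as np

def empirical_pauc(pos, neg, tau1=0.0, tau2=0.1):
    """Empirical partial AUC over the FPR range [tau1, tau2]. Only
    positive-negative pairs whose negative example sits between the two
    FPR thresholds contribute; normalizing by n_pos * n_neg makes the
    value a part of the full AUC, so it is at most tau2 - tau1."""
    n_pos, n_neg = len(pos), len(neg)
    order = np.argsort(neg)[::-1]                 # highest-scoring negatives first
    j1, j2 = int(np.floor(tau1 * n_neg)), int(np.ceil(tau2 * n_neg))
    band = neg[order[j1:j2]]                      # negatives inside the FPR band
    diff = pos[:, None] - band[None, :]
    correct = np.where(diff > 0, 1.0, np.where(diff == 0, 0.5, 0.0))
    return correct.sum() / (n_pos * n_neg)

rng = np.random.default_rng(2)
pos = rng.normal(5.0, 1.0, 50)                    # well separated from negatives
neg = rng.normal(0.0, 1.0, 100)
p = empirical_pauc(pos, neg, 0.0, 0.1)
assert 0.0 <= p <= 0.1 + 1e-12                    # bounded by tau2 - tau1
```

With τ_{1} = 0 and τ_{2} = 1, every negative falls in the band and the value reduces to the full empirical AUC, which is the sense in which the partial measure is a restriction of equation 2.4.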

### 3.2. Algorithm for Maximizing Partial *U*-AUC.

The threshold values *c*_{1,F} and *c*_{2,F} in definition 3.3 depend on the function *F*. If we update the discriminant function *F*_{t−1} to *F*_{t−1} + α*f* with a fixed *f*, the threshold values also change depending on the value of α, and then we cannot use a derivation like equation 2.11. This implies that *f* and α cannot be determined independently, which is not computationally tractable. To overcome the difficulty, we assume that the coefficient α is in the range [0, 1] and consider a lower bound of the difference of the partial *U*-AUC with fixed threshold values (equations 3.7 and 3.8), where the inequality of equation 3.8 is derived from concavity. The approximate lower bound is defined with thresholds for the function *F*_{t−1} + *f* and does not depend on α, which enables us to select *f* independently of α_{t} and leads to a drastic reduction in computational cost. The algorithm maximizing the partial *U*-AUC, p*U*-AUCBoost, is written as follows:

1. Initialize the discriminant function *F*_{0}(**x**).
2. For *t* = 1, …, *T*, select the base classifier *f _{t}* maximizing the lower bound, and determine the coefficient α_{t} (see equation 3.15).
3. Output the discriminant function *F _{T}*(**x**) = *F*_{0}(**x**) + ∑^{T}_{t=1}α_{t}*f _{t}*(**x**).

Note that in the optimization of equation 3.15, the threshold values depend on the value of α; however, we can easily detect when the threshold values change, so the optimization is not difficult. During the optimization procedure, the value of *Z*′_{t−1}(*f*) in the denominator of equation 3.13 often becomes exactly 0 because of the restriction of the number of samples by the estimated thresholds; as a result, one base classifier is chosen repeatedly, and the algorithm does not move ahead. To avoid this problem, a base classifier that has been chosen is discarded in later steps.

### 3.3. Consistency of the Algorithm for pAUC.

Similar to the consistency of A* _{U}*, we have the following remark:

The maximizer *F** of equation 3.3 is connected with the ratio *p*_{+1}/*p*_{−1} in a one-to-one correspondence, where π_{y} is the marginal probability of *y*.

See appendix D.

### 3.4. Evaluation of the Empirical pA_{U}.


Regarding the bounds of the empirical pA* _{U}*, we have the following theorem:

See appendix E.

From relationship 3.5, we immediately observe the following corollary:

## 4. Experiments

In this section, we demonstrate the validity of the proposed method by comparing it with existing binary classification methods using benchmark data sets from the UCI repository (Blake & Merz, 1998). Through experiments, we investigated the performance of the proposed methods (*U*-AUCBoost and p*U*-AUCBoost) with the three functions (*U _{rank}*, *U _{logit}*, *U _{mada}*) and employed stumps (Friedman et al., 2000) as the base classifiers of the boosting-type algorithms. The proposed methods were compared with several boosting-type algorithms (AdaBoost, LogitBoost, and MadaBoost, defined by the functions *U _{rank}*, *U _{logit}*, and *U _{mada}*, respectively) and with the SVM.

The coefficient α of the base classifiers was optimized in the range [0, 1] for the purpose of regularization. The initial function of p*U*-AUCBoost was set to a function constructed by RankBoost (*U*-AUCBoost with *U _{rank}*) whose step number *T* was determined by the AUC or pAUC value for the validation data set, because p*U*-AUCBoost can use only the part of the examples associated with the focused range of FPR and so had difficulty working smoothly on these data sets. (See Komori & Eguchi, 2010, for preprocessing of the data by a pAUC-based method in high dimensions.) We employed the SVM in the package *e1071*, a LIBSVM (Chang & Lin, 2011) implementation in the R language (R Development Core Team, 2011), and used a first-order polynomial (i.e., linear) kernel and a radial basis function (RBF) kernel (Schölkopf & Smola, 2001).

We divided the original data set into a training data set and a test data set with the ratios of 80% and 20%, respectively, and evaluated the generalization performance by using the test data set. We repeated this procedure 100 times by changing the data set division and observed the average performance. For the tuning of hyperparameters of each method (i.e., the step number *T* of the boosting-type algorithms or regularization parameter *C* controlling the number of support vectors and parameter of kernel γ of the SVM),^{3} we subdivided the training data set into a training subset and a validation subset with the ratios of 80% and 20%, respectively, and set hyperparameters so as to maximize the AUC value or pAUC value for the validation subset. Details of the data set are summarized in Table 1. We first removed examples with unobserved variables.
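The evaluation protocol can be summarized in code. The sketch below uses a hypothetical interface `fit(Xtr, ytr, T)` that returns a scoring function (the letter's actual methods and hyperparameter grids are not reproduced); it shows the nested 80/20 splits and the validation-based choice of the step number *T*:

```python
import numpy as np

def empirical_auc(scores, y):
    """Empirical AUC of a score vector against labels y in {+1, -1}."""
    d = scores[y == 1][:, None] - scores[y == -1][None, :]
    return np.mean(np.where(d > 0, 1.0, np.where(d == 0, 0.5, 0.0)))

def evaluate(fit, X, y, T_grid=(10, 50, 100), n_repeats=100, seed=0):
    """Protocol of section 4: repeat an 80/20 train/test split; inside the
    training part, an 80/20 train/validation split picks the step number T
    by validation AUC; report the mean test AUC over the repetitions."""
    rng = np.random.default_rng(seed)
    aucs = []
    for _ in range(n_repeats):
        idx = rng.permutation(len(y))
        test, train = idx[:len(y) // 5], idx[len(y) // 5:]
        val, sub = train[:len(train) // 5], train[len(train) // 5:]
        best_T = max(
            T_grid,
            key=lambda T: empirical_auc(fit(X[sub], y[sub], T)(X[val]), y[val]),
        )
        F = fit(X[train], y[train], best_T)       # refit on the full training part
        aucs.append(empirical_auc(F(X[test]), y[test]))
    return float(np.mean(aucs))
```

The same skeleton applies when the selection criterion is the validation pAUC instead of the AUC; only the scoring function inside the `key` changes.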

| Data Set | Total Number | Number of Attributes | |
|---|---|---|---|
| Australian | 690 | 14 | −0.221 |
| Breast Cancer | 683 | 9 | −0.619 |
| Diabetes | 768 | 8 | 0.624 |
| German | 1000 | 24 | −0.847 |
| Heart | 270 | 13 | −0.223 |
| Ionosphere | 351 | 34 | 0.580 |
| Liver Disorders | 345 | 6 | 0.322 |
| Pima | 768 | 8 | −0.624 |
| Sonar | 208 | 60 | −0.135 |


Figures 2 and 3 show box-whisker plots of the AUC and the partial AUC (τ_{1} = 0, τ_{2} = 0.1) of each method over 100 trials, respectively. We observe that the proposed methods (especially *U*-AUCBoost) worked well and performed better than the existing methods except on a few data sets; even on those, the existing methods did not significantly outperform *U*-AUCBoost. Although the improvement of the AUC value by the proposed method seems small, it is not negligible, because the AUC criterion is insensitive to improvements of a classifier (Pencina, D'Agostino, D'Agostino, & Vasan, 2008). On the other hand, p*U*-AUCBoost sometimes performed worse than *U*-AUCBoost. This may be because p*U*-AUCBoost focuses on a specific range of FPR, so its target function (equation 3.4) is defined with fewer negative examples than that of *U*-AUCBoost; this restriction of examples made its optimization less stable. In summary, the algorithm p*U*-AUCBoost for maximizing the partial AUC can be formally defined, but we could not find a special situation or data set where it consistently works well. We recommend employing *U*-AUCBoost to maximize the partial AUC rather than p*U*-AUCBoost, because *U*-AUCBoost performed well in the sense of the partial AUC.

## 5. Conclusion

In this letter, we formulated a new performance measure, *U*-AUC, for AUC-optimal classification and proposed a boosting-type algorithm maximizing it, *U*-AUCBoost, which was formulated as sequential maximization of a concave function. The statistical properties of the maximizer of *U*-AUC were discussed, and we evaluated the robustness of the proposed algorithm. We also extended *U*-AUCBoost into p*U*-AUCBoost for maximizing *U*-AUC over a specific range of FPR. We experimentally investigated the performance of the proposed algorithms with real data sets and demonstrated the validity of *U*-AUCBoost. p*U*-AUCBoost did not work as well because of the shortage of examples caused by the restriction of FPR. Nevertheless, numerical experiments showed that *U*-AUCBoost can maximize the partial AUC.

## Appendix A: Proof of Theorem 1

Consider the function *V*(*z*) that satisfies *U*(*z*) = 1 − *V*(−*z*). The function *V* is nonnegative, convex, and monotonically increasing. With this function, maximization of equation 2.8 is equivalent to minimization of the corresponding pairwise loss. From the definition of γ_{t}, we obtain a bound on the loss after each update. By recursively applying this inequality, we obtain a product-form bound, because *F*_{0}(**x**) = 0 and *V*(0) = 1. From the inequality 1 + *z* ⩽ exp(*z*), we obtain an exponential bound, and from the relation between the pairwise loss and the empirical *U*-AUC, we conclude the theorem.

## Appendix B: Proof of Lemma 1

Because the base classifier takes values in {+1, −1}, the numerator of equation B.5 can be rewritten accordingly, and the denominator can be rewritten in the same way. From the definition of *V*, we obtain the identity used above, and the equation is then rewritten as equation B.8.

## Appendix C: Proof of Theorem 6

For an arbitrary function η(**x**) and a scalar ε, we compute the first derivative of pA_{U}(*F* + εη) with respect to ε; the terms arising from the thresholds vanish because of equations 3.1. In the same way, the second derivative is nonpositive because of the concavity of *U*(*z*). This indicates that pA_{U}(*F*, τ_{1}, τ_{2}) is concave with respect to *F*.

## Appendix D: Proof of Remark 5

Consider an arbitrary function η(**x**). From the same calculation as in the proof of theorem 6, we obtain equation D.2. Note that equation D.2 can attain 0 for any η only if **x** is in the focused region, and hence *F** satisfies the corresponding stationarity condition, where Ψ(*z*) is defined in equation 2.13. From equation 2.15, *F**(**x**) has a one-to-one correspondence with *p*_{+1}(**x**)/*p*_{−1}(**x**), which directly leads to remark 5.

## Appendix E: Proof of Theorem 7

## Acknowledgments

We appreciate the valuable comments that the anonymous reviewers made.

## References

Murata, N., Takenouchi, T., Kanamori, T., & Eguchi, S. (2004). Information geometry of *U*-boost and Bregman divergence. *Neural Computation*, 16(7), 1437–1481.

## Notes

^{1}

From the concavity of the objective with respect to α, we observe that 0 ⩽ γ_{t} ⩽ 1.

^{2}

The function *F*† is associated with the true distribution rather than with the function constructed by the boosting algorithm, and the condition is not too strong; *F*† can converge even when *p*(+1|**x**) (or *p*(−1|**x**)) is nearly equal to 0.

^{3}

We optimized *C* and γ in the ranges {2^{−2}, …, 2^{5}} and {2^{−3}, …, 2^{3}}, respectively.