## Abstract

We present a new method for fusing scores corresponding to different detectors (two-hypothesis case). It is based on alpha integration, which we have adapted to the detection context. Three optimization methods are presented: least mean square error, maximization of the area under the ROC curve, and minimization of the probability of error. Gradient algorithms are proposed for the three methods. Experiments with simulated and real data are included. The simulations consider the two-detector case to illustrate the factors influencing alpha integration and to demonstrate the improvements obtained by score fusion with respect to individual detector performance. Two real data cases are considered. In the first, multimodal biometric data are processed; this case is representative of scenarios in which the probability of detection is to be maximized for a given probability of false alarm. The second case is the automatic analysis of electroencephalogram and electrocardiogram records, with the aim of reproducing the medical expert's detections of arousals during sleep; this case is representative of scenarios in which the probability of error is to be minimized. The generally superior performance of alpha integration confirms the interest of optimizing the fusion parameters.

## 1 Introduction

There are many scenarios where multiple detectors are fused to improve on their individual performance (Khaleghi, Khamis, Karray, & Razavi, 2013; Atrey, Hossain, El Saddik, & Kankanhalli, 2010; Yuksel, Wilson, & Gader, 2012; Kittler, Hatef, Duin, & Matas, 1998). In general, the input to a single detector is a vector of measures (observation or feature vector). This vector is processed to obtain a scalar statistic, and the statistic is compared with a threshold to produce a binary decision. The fusion of detectors can therefore be made at three different levels: measures, statistics, or decisions. Finding optimum fusion functions becomes simpler as we go from the measure level to the decision level, but the price paid is a loss of information. Fusion at the (intermediate) statistic level is therefore a reasonable compromise. On the one hand, the number of variables to be fused is reduced to the number of available detectors. On the other hand, it avoids the loss of information caused by thresholding. The statistic is usually called the "score." Depending on the application, the score may or may not be normalized to a given range. Different normalization techniques exist (Jain, Nandakumar, & Ross, 2005), and they are especially useful when heterogeneous detectors are to be fused. If properly calibrated, scores normalized between 0 and 1 may be thought of as estimates of the a posteriori probability assigned by the detector to one of the two hypotheses (Zadrozny & Elkan, 2002).

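Alpha integration (Amari, 2007) combines $d$ positive numbers $m_i$, $i = 1, \ldots, d$, in the form

$$m_\alpha = f_\alpha^{-1}\left(\sum_{i=1}^{d} w_i\, f_\alpha(m_i)\right), \qquad f_\alpha(x) = \begin{cases} x^{\frac{1-\alpha}{2}}, & \alpha \neq 1, \\ \log x, & \alpha = 1, \end{cases} \tag{1.1}$$

where the weights satisfy $w_i \geq 0$ and $\sum_{i=1}^{d} w_i = 1$. Suppose $m_i$ and $m_\alpha$ are associated, respectively, with probability density functions $m_i(x)$ and $m_\alpha(x)$ of some random variable $x$. Then

$$m_\alpha(x) = f_\alpha^{-1}\left(\sum_{i=1}^{d} w_i\, f_\alpha\big(m_i(x)\big)\right) \tag{1.2}$$

is the probability density minimizing the cost function $\sum_{i=1}^{d} w_i\, D_\alpha[m_i : m_\alpha]$. Here $D_\alpha$ is the α-divergence between the two probability densities (Amari, 2007; Wu, 2009). Particularly simple fusion rules are obtained for particular selections of the parameter α. Assuming that $w_i = 1/d$, α = −1, α = 1, and α = 3 render, respectively, the arithmetic mean, the geometric mean, and the harmonic mean. Similarly, α = ∞ and α = −∞ are equivalent to computing the minimum and the maximum, respectively. Notice that equation 1.2 can be applied to the approximation of any positive function from a set of $d$ positive functions $m_i(x)$.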
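Equation 1.1 can be checked numerically. The following Python sketch evaluates the weighted α mean of two numbers for the special values of α listed above. The function names `f_alpha`, `f_alpha_inv`, and `alpha_mean` are ours, and the inputs are assumed strictly positive. Large finite values of |α| stand in for α = ±∞.

```python
import numpy as np


def f_alpha(x, alpha):
    """Alpha representation: x**((1 - alpha) / 2), or log(x) when alpha == 1."""
    x = np.asarray(x, dtype=float)
    if alpha == 1:
        return np.log(x)
    return x ** ((1.0 - alpha) / 2.0)


def f_alpha_inv(u, alpha):
    """Inverse of f_alpha: u**(2 / (1 - alpha)), or exp(u) when alpha == 1."""
    u = np.asarray(u, dtype=float)
    if alpha == 1:
        return np.exp(u)
    return u ** (2.0 / (1.0 - alpha))


def alpha_mean(m, w, alpha):
    """Weighted alpha mean of positive values m with weights w (w >= 0, sum 1)."""
    m = np.asarray(m, dtype=float)
    w = np.asarray(w, dtype=float)
    return float(f_alpha_inv(np.sum(w * f_alpha(m, alpha)), alpha))


m, w = [0.2, 0.8], [0.5, 0.5]
for alpha in (-1, 1, 3, 101, -99):
    print(alpha, round(alpha_mean(m, w, alpha), 4))
```

For $m = (0.2, 0.8)$ and equal weights, α = −1, 1, and 3 give 0.5, 0.4, and 0.32. The values α = 101 and α = −99 give about 0.20 and 0.79, approaching the minimum and the maximum. The approach to the extremes is slow, so in floating point an explicit min or max is preferable to a very large |α|, whose powers can overflow.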

Expressions for the gradients are obtained, and convergence is tested experimentally on simulated data.

α-integration can be readily adapted to the fusion of scores in a detection context. Several detectors produce several scores, which can be fused using α-integration to obtain a single (fused) score. In this letter, we propose three methods for estimating the fusion parameters (α and $w_i$) given a set of labeled training data. The first is appropriate for normalized scores and is a direct adaptation of the least mean square error (LMSE) criterion in equation 1.3 to the detection problem. Simple modifications of the cost function account for two effects: a possibly unbalanced number of labeled data between the two hypotheses, and the different costs incurred by each type of erroneous decision (detection miss or false alarm). The second method is based on maximization of the area under the ROC curve (AUCmax). This cost function is well suited to the detection framework and allows both normalized and nonnormalized scores. These two methods are appropriate in applications where the probability of detection is to be maximized for a given probability of false alarm. In some scenarios, however, minimizing the probability of error is more convenient. Hence, we propose a third method (MPE), in which the α-integration parameters are estimated so that the probability of making wrong decisions is minimized. This method requires normalized scores. Gradient algorithms are devised for the three methods.

The next section is devoted to the LMSE approach. AUCmax is considered in section 3. Section 4 presents simulation experiments with the LMSE and AUCmax criteria, to illustrate the concept and the interest of the new α-integration methods. Section 5 applies α-integration, with the LMSE and AUCmax criteria, to biometric data. Finally, section 6 considers the MPE method and applies it to a medical diagnosis problem: automatic detection of arousals during sleep. Minimizing wrong detections (relative to a medical expert) is the essential objective in this application. Section 7 concludes the letter.

## 2 Estimating the α-Integration Parameters by LMSE Criterion

In a detection scenario, we must decide between two hypotheses, $H_0$ and $H_1$. Let us assume that we have $d$ different detectors working on the same hypotheses. Each one contributes a score $s_i$, $i = 1, \ldots, d$, in such a way that higher values of the score favor selecting $H_1$ and vice versa. Let us also assume that the scores are normalized so that $0 \leq s_i \leq 1$. Apart from this, there are no other constraints. Thus, the specific way in which each detector computes its score is of no concern here. Similarly, the detectors may share the same input observations or have totally different inputs, they may or may not be statistically independent, and so on.

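The fused score is obtained by α-integration of the $d$ scores:

$$s_\alpha = f_\alpha^{-1}\left(\sum_{i=1}^{d} w_i\, f_\alpha(s_i)\right). \tag{2.1}$$

Let us assume that a labeled training set $\{(\mathbf{s}^j, y^j),\ j = 1, \ldots, N\}$ is available. Here $\mathbf{s}^j = [s_1^j, \ldots, s_d^j]^T$ is a vector of scores and $y^j$ is the corresponding known binary decision ($y^j = 1$ if $H_1$ is true and $y^j = 0$ if $H_0$ is true). We can use this set to learn the parameters by minimizing a cost function as indicated in equation 1.3, which now becomes

$$J(\alpha, \mathbf{w}) = \frac{1}{N}\sum_{j=1}^{N}\left(y^j - s_\alpha^j\right)^2, \tag{2.2}$$

where $s_\alpha^j$ is the fused score of equation 2.1 computed from $\mathbf{s}^j$.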

By minimizing the cost function of equation 2.2, we try to bring the fused score close to 1 when the true hypothesis is $H_1$ and close to 0 when the true hypothesis is $H_0$.

A difficulty arises when the numbers of training samples under $H_1$ and $H_0$ are very different. This is the case in novelty detection (Pimentel, Clifton, Clifton, & Tarassenko, 2014) or detection of signals in a noise background (Soriano, Vergara, Moragues, & Miralles, 2014). In those cases, minimization of equation 2.2 will be "blind" to $H_1$. To account for this problem, we propose a modification of the cost function of equation 2.2. Let $N_1$ and $N_0$ be the sizes of the subsets corresponding, respectively, to $H_1$ and $H_0$; hence $N = N_0 + N_1$. Instead of minimizing the overall mean square error, we compute separately the mean square errors corresponding to $H_1$ and $H_0$, and then minimize the mean of both values. Taking advantage of the binary value of $y^j$, the new cost function can be expressed in the form

$$J(\alpha, \mathbf{w}) = \frac{1}{2}\left[\frac{1}{N_1}\sum_{j=1}^{N} y^j\left(1 - s_\alpha^j\right)^2 + \frac{1}{N_0}\sum_{j=1}^{N}\left(1 - y^j\right)\left(s_\alpha^j\right)^2\right]. \tag{2.3}$$

In this manner, the contributions to the error are normalized with respect to the size of the training subsets.

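Moreover, the two types of error may have different costs: deciding $H_0$ when the true hypothesis is $H_1$ (detection miss) or deciding $H_1$ when the true hypothesis is $H_0$ (false alarm). A simple modification of equation 2.3 can take this into account:

$$J(\alpha, \mathbf{w}) = \frac{\beta}{N_1}\sum_{j=1}^{N} y^j\left(1 - s_\alpha^j\right)^2 + \frac{1-\beta}{N_0}\sum_{j=1}^{N}\left(1 - y^j\right)\left(s_\alpha^j\right)^2, \qquad 0 \leq \beta \leq 1. \tag{2.4}$$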
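A toy example shows the effect of the per-class normalization. The Python sketch below implements the plain cost of equation 2.2 and a per-class cost. In the per-class cost, β and 1 − β weight the $H_1$ and $H_0$ errors (our reading of equation 2.4), and β = 0.5 gives equation 2.3. The function names are ours, and the fused scores are assumed already computed. When $H_1$ samples are scarce, a detector that always outputs 0 looks almost perfect under equation 2.2 but not under the balanced cost.

```python
import numpy as np


def lmse_cost(s, y):
    """Plain mean square error between fused scores s and targets y (1 for H1, 0 for H0)."""
    s, y = np.asarray(s, float), np.asarray(y, float)
    return float(np.mean((y - s) ** 2))


def weighted_lmse_cost(s, y, beta=0.5):
    """Per-class mean square errors weighted by beta (H1) and 1 - beta (H0)."""
    s, y = np.asarray(s, float), np.asarray(y, float)
    n1 = y.sum()                                   # N_1: number of H1 samples
    n0 = y.size - n1                               # N_0: number of H0 samples
    mse1 = np.sum(y * (1.0 - s) ** 2) / n1         # error on H1 samples (misses)
    mse0 = np.sum((1.0 - y) * s ** 2) / n0         # error on H0 samples (false alarms)
    return float(beta * mse1 + (1.0 - beta) * mse0)


# Unbalanced toy set: 990 H0 samples, 10 H1 samples, and a useless
# detector that always outputs 0.
y = np.r_[np.zeros(990), np.ones(10)]
s = np.zeros_like(y)
print(lmse_cost(s, y))             # 0.01: the H1 samples barely count
print(weighted_lmse_cost(s, y))    # 0.5: misses weigh as much as H0 errors
```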

Notice that the LMSE cost function of equation 2.4 weights each training sample's contribution to the MSE differently from equation 2.2. Also notice that $N_0$ and $N_1$ are the numbers of available training samples of each class, and β is set by the user depending on the importance given to each type of error. In contrast, α and $w_i$ are estimated by minimizing equation 2.4. In the following, we present gradient algorithms to estimate the optimum values of α and $w_i$ (equations 2.8 and 2.9, respectively), with $w_i$ corrected where necessary. The learning-rate constants of these algorithms control the speed of convergence. In all the experiments in this letter, they were set to values similar to those recommended by Choi et al. (2010, 2013). Small variations around those values influenced the convergence speed, but the final estimates remained the same.
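Equations 2.8 and 2.9 are not reproduced here. As a stand-in, the following Python sketch minimizes a cost of the form of equation 2.4 by gradient descent. It uses central finite-difference gradients (a substitute for the analytic expressions, not the letter's algorithm). After each step, it projects the weights back onto $w_i > 0$, $\sum_i w_i = 1$, and it keeps the best iterate found. The learning rates, the number of iterations, the initial value α = −1, and the simulated data are illustrative assumptions.

```python
import numpy as np


def fuse(S, w, alpha):
    """Alpha-integrated score for every row of S (N x d) with weights w (d,)."""
    S = np.clip(S, 1e-6, 1.0)                  # alpha integration needs positive inputs
    if abs(alpha - 1.0) < 1e-8:                 # alpha = 1: weighted geometric mean
        return np.exp(np.log(S) @ w)
    p = (1.0 - alpha) / 2.0                     # f_alpha(x) = x**p
    return ((S ** p) @ w) ** (1.0 / p)          # f_alpha^{-1}(u) = u**(1/p)


def cost(S, y, w, alpha, beta=0.5):
    """Per-class weighted LMSE cost; beta = 0.5 is the balanced case."""
    s = fuse(S, w, alpha)
    return beta * np.mean((1.0 - s[y == 1]) ** 2) + (1.0 - beta) * np.mean(s[y == 0] ** 2)


def fit_lmse(S, y, alpha0=-1.0, beta=0.5, lr_alpha=2.0, lr_w=0.5, n_iter=100, h=1e-4):
    """Projected gradient descent on (alpha, w) with finite-difference gradients."""
    d = S.shape[1]
    alpha, w = alpha0, np.full(d, 1.0 / d)
    best = (cost(S, y, w, alpha, beta), alpha, w.copy())
    for _ in range(n_iter):
        g_alpha = (cost(S, y, w, alpha + h, beta) - cost(S, y, w, alpha - h, beta)) / (2 * h)
        g_w = np.empty(d)
        for i in range(d):
            e = np.zeros(d)
            e[i] = h
            g_w[i] = (cost(S, y, w + e, alpha, beta) - cost(S, y, w - e, alpha, beta)) / (2 * h)
        alpha -= lr_alpha * g_alpha
        # A floor of 1e-3 (> h) keeps w - e positive in the next finite difference.
        w = np.clip(w - lr_w * g_w, 1e-3, None)
        w /= w.sum()                            # back onto sum(w) = 1
        c = cost(S, y, w, alpha, beta)
        if c < best[0]:                         # keep the best iterate seen so far
            best = (c, alpha, w.copy())
    return best[1], best[2]


# Hypothetical data: two independent detectors, scores uniform on [0, 0.6]
# under H0 and on [0.4, 1] under H1.
rng = np.random.default_rng(0)
S = np.vstack([rng.uniform(0.0, 0.6, (500, 2)), rng.uniform(0.4, 1.0, (500, 2))])
y = np.r_[np.zeros(500, int), np.ones(500, int)]
alpha_hat, w_hat = fit_lmse(S, y)
print(alpha_hat, w_hat)
```

Finite differences keep the sketch short. The analytic gradients of the letter are cheaper, needing one pass over the data per iteration instead of $2 + 2d$.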

## 3 Estimating the α-Integration Parameters by AUCmax

The LMSE criterion minimizes the MSE, where the error is defined as the (weighted) difference between the final (integrated) score and the target value (1 for $H_1$, 0 for $H_0$). This seems a priori a reasonable criterion for obtaining a good detector, but it by no means guarantees that the probability of detection is maximized for a given probability of false alarm. Ultimately, the detector performance depends on the statistical distribution of the integrated scores under each hypothesis. This suggests a new criterion that directly incorporates the detector performance.

Different figures of merit have been proposed to evaluate the detector performance (Parker, 2013). Among them, the AUC is the most popular. Moreover, AUC has two advantages in comparison with MSE:

- We can optimize the fusion in specific intervals of the probability of false alarm, depending on the application requirements.
- Scores of the labeled training set are not required to be normalized between 0 and 1.

The ROC curve represents the probability of detection $P_d$ as a function of the probability of false alarm $P_f$; let us represent this curve by the function $P_d = R(P_f)$. Integrating $R(P_f)$ over a given interval of the independent variable $P_f$ gives the AUC corresponding to that interval. Let us define a normalized AUC in a given interval:

$$\mathrm{nAUC}(P_{f1}, P_{f2}) = \frac{1}{P_{f2} - P_{f1}}\int_{P_{f1}}^{P_{f2}} R(P_f)\, dP_f,$$

where $P_{f1}$ and $P_{f2}$ limit the interval of interest in which the normalized AUC is computed.

We can evaluate the normalized area by numerical integration: uniformly sample the ROC curve, add all the sample values, and divide by the number of samples. To define the sampling points, recall that the test compares the score $s_\alpha$ with a threshold $t$, so every threshold establishes one point of the ROC curve. Let $S_0$ and $S_1$ be the sets of fused training scores under $H_0$ and $H_1$, with $N_0$ and $N_1$ elements, respectively. We select consecutive values of $S_0$ in a given interval as thresholds. For every threshold $t_k$, we count the number of values in $S_1$ that are above the threshold. This number divided by $N_1$ is an empirical estimate of $P_d$ for that threshold. Averaging all the $P_d$ values so obtained gives an empirical estimate of the normalized AUC in the given interval.

The selected interval of thresholds in $S_0$ must agree with the $P_f$ interval $[P_{f1}, P_{f2}]$. Because the elements in $S_0$ are sorted in descending order, the thresholds correspond to empirical $P_f$ values $k/N_0$. The selected interval in $S_0$ must therefore be $[k_1, k_2]$, where $k_1 = \lceil N_0 P_{f1} \rceil$ is the next higher whole number of $N_0 P_{f1}$ and $k_2 = \lfloor N_0 P_{f2} \rfloor$ is the next lower whole number of $N_0 P_{f2}$. This leads to the empirical normalized AUC estimator of equations 3.6a and 3.6b. Equation 3.6a treats separately the case in which the limits of the integration interval are so close that, after truncation, their order is inverted (i.e., $k_1 > k_2$). In that case, only one sample of $P_d$ is obtained for estimating the normalized AUC. All the other cases, $k_1 \leq k_2$, are included in equation 3.6b. Both equations include a term that compensates for the truncation effects in $k_1$ and $k_2$.

Counting the $S_1$ values above each threshold amounts to summing unit step functions, which are not differentiable. In equations 3.6a and 3.6b, the unit step is therefore replaced by a sigmoid function with parameter $a_1$.

As Figure 1 shows, the sigmoid function can approximate the unit step function with arbitrarily small error if $a_1$ is large enough.
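The Python sketch below illustrates the estimator described above under stated assumptions; it does not transcribe equations 3.6a and 3.6b. It omits their truncation-compensation term, uses the rule "decide $H_1$ if $s \geq t$," and keeps the single threshold $t_{k_1}$ when $k_1 > k_2$. The name `empirical_nauc` is ours. When `a1` is given, the unit step is replaced by the logistic sigmoid with that slope.

```python
import numpy as np


def empirical_nauc(s0, s1, pf1=0.0, pf2=1.0, a1=None):
    """Empirical normalized AUC of the ROC curve over [pf1, pf2].

    s0, s1: fused scores of the H0 and H1 training samples.
    a1: if given, the unit step is replaced by a sigmoid of slope a1.
    """
    s0 = np.sort(np.asarray(s0, float))[::-1]        # H0 scores, descending
    s1 = np.asarray(s1, float)
    n0 = s0.size
    # With "decide H1 if s >= t", the k-th largest H0 score used as the
    # threshold gives an empirical P_f of k / N0 (k = 1, ..., N0).
    k1 = min(max(int(np.ceil(n0 * pf1)), 1), n0)
    k2 = min(int(np.floor(n0 * pf2)), n0)
    if k2 < k1:                                       # interval narrower than 1 / N0:
        k2 = k1                                       # keep a single threshold
    t = s0[k1 - 1:k2]                                 # thresholds t_k
    diff = s1[None, :] - t[:, None]                   # one row per threshold
    if a1 is None:
        step = (diff >= 0).astype(float)              # unit step
    else:
        step = 0.5 * (1.0 + np.tanh(0.5 * a1 * diff))  # equals 1 / (1 + exp(-a1 * diff))
    pd = step.mean(axis=1)                            # empirical P_d for each threshold
    return float(pd.mean())                           # average of the ROC samples


rng = np.random.default_rng(1)
s0, s1 = rng.normal(0.0, 1.0, 400), rng.normal(1.0, 1.0, 300)
print(empirical_nauc(s0, s1))              # whole ROC curve
print(empirical_nauc(s0, s1, 0.0, 0.1))    # low-P_f region only
print(empirical_nauc(s0, s1, a1=1e4))      # smoothed, close to the first value
```

Over the whole range [0, 1], the estimate coincides with the Mann-Whitney statistic, that is, the usual empirical AUC, with ties counted as detections. The `tanh` form of the sigmoid avoids overflow in `exp` for large `a1`.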

To solve the resulting optimization problem (maximizing the estimated nAUC with respect to α and the weights $w_i$), an interior point algorithm can be used (Byrd, Hribar, & Nocedal, 1999; Waltz, Morales, Nocedal, & Orban, 2006).

Because the sigmoid function used in expressions 3.6a and 3.6b is differentiable, the gradient of the objective function can be obtained and supplied to the interior point algorithm to improve its performance. The gradient follows from differentiating the estimator with respect to a generic parameter θ (α or any of the weights $w_i$).
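The chain rule gives the derivative of each smoothed term; this is an illustration, not the letter's own gradient expressions. It assumes the following symbols, introduced here: σ is the logistic sigmoid with slope $a_1$, $s^{(1)}_j$ a fused $H_1$ score, $t_k$ the fused $H_0$ score used as the $k$-th threshold, θ stands for α or any $w_i$, and α ≠ 1. The ordering of the thresholds is treated as fixed. This holds wherever a small change of θ does not reorder the $H_0$ scores.

$$\sigma(x) = \frac{1}{1 + e^{-a_1 x}}, \qquad \sigma'(x) = a_1\,\sigma(x)\,\bigl[1 - \sigma(x)\bigr],$$

$$\frac{\partial}{\partial\theta}\,\sigma\bigl(s^{(1)}_j - t_k\bigr) = a_1\,\sigma\bigl(s^{(1)}_j - t_k\bigr)\bigl[1 - \sigma\bigl(s^{(1)}_j - t_k\bigr)\bigr]\left(\frac{\partial s^{(1)}_j}{\partial\theta} - \frac{\partial t_k}{\partial\theta}\right).$$

With $p = (1-\alpha)/2$ and $u = \sum_{i} w_i s_i^{p}$, so that $s_\alpha = u^{1/p}$, the derivatives of any fused score are

$$\frac{\partial s_\alpha}{\partial w_i} = \frac{s_i^{p}}{p\, s_\alpha^{\,p-1}}, \qquad \frac{\partial s_\alpha}{\partial\alpha} = -\frac{s_\alpha}{2}\left[\frac{1}{p\,u}\sum_{i=1}^{d} w_i\, s_i^{p}\ln s_i - \frac{\ln u}{p^{2}}\right].$$

The first expression is $f_\alpha(s_i)/f_\alpha'(s_\alpha)$ with $f_\alpha(x) = x^{p}$. The second follows from $\ln s_\alpha = (\ln u)/p$ and $dp/d\alpha = -1/2$.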

## 4 Experiments with Simulated Data

We performed a number of simulations to illustrate the different factors influencing α-integration for the fusion of detectors, as well as the specific interest of the proposed modifications. We considered the fusion of two detectors ($d = 2$). Each detector provides one score $s_i$, modeled as a random variable uniformly distributed in an interval that depends on the true hypothesis $H_k$. Let $L_i^k$ and $U_i^k$ denote the lower and upper limits of the interval of the uniform distribution of the scores provided by detector $i$ under hypothesis $H_k$.
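A simulation in the spirit of experiment 1 takes only a few lines. The interval limits, sample sizes, and fixed weights below are hypothetical and are not the values used for the figures: both detectors are uniform on [0, 0.6] under $H_0$ and on [0.4, 1] under $H_1$. The sketch compares the empirical AUC of one detector with that of the fused score for a few fixed values of α. One of these values is the 0.6 reported for experiment 1, used here merely as an example.

```python
import numpy as np


def fuse(S, w, alpha):
    """Alpha-integrated score for every row of S (N x d) with weights w (d,)."""
    S = np.clip(S, 1e-6, 1.0)
    if abs(alpha - 1.0) < 1e-8:                 # weighted geometric mean
        return np.exp(np.log(S) @ w)
    p = (1.0 - alpha) / 2.0
    return ((S ** p) @ w) ** (1.0 / p)


def auc(s0, s1):
    """Empirical AUC: fraction of (H1, H0) pairs in which the H1 score is larger."""
    return float(np.mean(s1[:, None] > s0[None, :]))


def simulate(n, lo, hi, rng):
    """n draws of d independent scores; score i is uniform on [lo[i], hi[i]]."""
    return rng.uniform(lo, hi, size=(n, len(lo)))


rng = np.random.default_rng(0)
# Hypothetical limits: identical detectors overlapping on [0.4, 0.6].
S0 = simulate(2000, [0.0, 0.0], [0.6, 0.6], rng)     # scores under H0
S1 = simulate(2000, [0.4, 0.4], [1.0, 1.0], rng)     # scores under H1
w = np.array([0.5, 0.5])

auc_single = auc(S0[:, 0], S1[:, 0])
auc_fused = {a: auc(fuse(S0, w, a), fuse(S1, w, a)) for a in (-1.0, 0.6, 1.0, 3.0)}
print(round(auc_single, 3), {a: round(v, 3) for a, v in auc_fused.items()})
```

With these settings, the individual AUC is close to its theoretical value of 17/18 ≈ 0.944. The arithmetic (α = −1) and geometric (α = 1) means both give clearly larger fused AUCs.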

Figures 2 to 10 show the results of nine experiments. In experiments 1 to 6, the LMSE gradient algorithm was used to estimate the optimum value of α and/or $w_i$; in experiments 7 to 9, AUCmax was used instead.

Every figure is formed by six subfigures showing (from left to right and from top to bottom):

- The 2D distribution of the training set of scores
- The convergence curves of the parameter α and/or the coefficients $w_1$ and $w_2$, corresponding to the gradient algorithm of equations 2.8 and 2.9
- The ROC curves of the three detectors (the two individual detectors and the fused one), representing the probability of detection $P_d$ in terms of the probability of false alarm $P_f$
- The 2D contour curves defining the decision regions of the α-integrated detector
- The uniform distributions of the scores $s_1$ and $s_2$ corresponding to each individual detector
- The final distributions of the score $s_\alpha$ obtained after α-integration

In all the experiments, the training (estimation of the optimum values of α and/or $w_i$) used a set of labeled scores. The performance evaluation (ROC curves and fused score distributions) was obtained from a set of 10,000 scores. Other experiments with different training and evaluation sizes led to the same general conclusions.

Figure 2 corresponds to experiment 1. The parameter α is learned by means of the gradient algorithm of equation 2.8 and converges to a final value of 0.6 after only some 15 iterations. The sizes of the training sets are the same for both hypotheses ($N_0 = N_1$). The weighting coefficients are not estimated but are set to the same value ($w_1 = w_2 = 0.5$). The parameter β is set so that no preference is given a priori to either hypothesis. The uniform distributions of the individual scores have the same limits for both detectors, with a large overlap between the hypotheses when each detector works separately. However, the distributions of the integrated score are no longer uniform and show the better separation between hypotheses achieved after α-integration. The ROC curves show the same improvement.

Experiment 2 (see Figure 3) illustrates the interest of the modification included in equation 2.3, which accounts for different sizes of the training sets under each hypothesis. In Figure 3, the sizes of the training sets under the two hypotheses are very different. The rest of the parameters are the same as in experiment 1. The parameter α converges to the same value as in experiment 1; hence, the performance of the fused detector should be the same. Indeed, the ROC curves, the 2D contours, and the distribution of the integrated score are practically the same in both experiments.

The next two experiments illustrate how the parameter β may be used to bias the α-integrated detector toward one of the two hypotheses. Figure 4 shows the same case as Figure 2, except that β is now set to make one type of error dominate. Deciding $H_0$ when the true hypothesis is $H_1$ now contributes much more to the global error than deciding $H_1$ when the true hypothesis is $H_0$. In Figure 4, α converges to a large negative value; that is, the α-integrated detector tends toward computing the maximum of the two individual scores, which clearly biases the decisions in favor of $H_1$. This bias can also be observed in the shape of the 2D contour curves defining the decision regions and in the resulting distributions of $s_\alpha$. Finally, the ROC curves show that above a certain probability of false alarm, the individual detectors have a greater probability of detection than the α-integrated detector.

Experiment 4 is similar to experiment 3, but now β is set so that the fusion of detectors is biased in favor of $H_0$. In Figure 5, α converges to a large positive value; that is, the fusion tends toward computing the minimum of the two individual scores. The 2D contour curves and the resulting distributions of $s_\alpha$ are modified accordingly. Finally, the ROC curves show that below a certain probability of false alarm, the individual detectors have a greater probability of detection than the fused detector.

The next experiment illustrates that an optimum linear combiner (weighted arithmetic mean) of the individual scores is a particular, constrained case of α-integration. Notice in equation 2.1 that if α = −1, then $s_\alpha = \sum_{i=1}^{d} w_i s_i$.
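Substituting α = −1 into equation 2.1 makes this explicit. For $d = 2$, it also shows the form of the decision boundaries; $t$ denotes a decision threshold, a symbol introduced here.

$$f_{-1}(x) = x^{\frac{1-(-1)}{2}} = x \;\Longrightarrow\; s_{-1} = f_{-1}^{-1}\Bigl(\sum_{i=1}^{d} w_i s_i\Bigr) = \sum_{i=1}^{d} w_i s_i,$$

$$s_{-1} = t \iff s_2 = \frac{t - w_1 s_1}{w_2} \qquad (d = 2).$$

Every contour of the fused score is therefore a straight line in the $(s_1, s_2)$ plane with slope $-w_1/w_2$, and changing the weights rotates these lines.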

Figure 6 shows the results of experiment 5. This is the same case as experiment 1, except that α is fixed at −1 and the weighting coefficients are learned by the gradient algorithm of equation 2.9. Because both individual detectors produce the same score distribution (both detectors perform identically), the gradient algorithm converges to $w_1 = w_2 = 0.5$. Notice that the contour curves are now straight lines, as expected for a linear combination of the scores. This implies some suboptimality with respect to experiment 1, where the optimum α was learned by the gradient algorithm and differed from −1. The suboptimality can also be seen by comparing the ROC curve of the α-integrated detector in Figure 6 with the corresponding curve in Figure 2.

Experiment 6 illustrates the combination of two detectors with different performances. We modified experiment 5 so that the uniform distribution of the scores of detector 1 under $H_1$ is narrowed (between 0.4 and 1). This implies that detector 1 performs better than detector 2 under $H_1$. Figure 7 shows that the gradient algorithm converges to weights such that $w_1 > w_2$, so that detector 1 has more influence in the final α-integrated detector. This rotates the 2D contours, biasing the fused detector toward the decisions of detector 1.

In the next three experiments, we use the parameter estimation method based on AUCmax. This method lets us select the interval of probabilities of false alarm in which we want to obtain the best results. These experiments are like experiment 1, in which two equal detectors are fused by means of α-integration, but the parameters are now trained with AUCmax. First, in experiment 7, we estimated all the parameters to maximize the nAUC over the whole range of $P_f$, that is, nAUC(0, 1). The results are presented in Figure 8. The weighting parameters obtained are the same for each detector, and α converges to a value that maximizes the whole AUC of the ROC curve obtained after fusion. Notice that the ROC curves are quite similar to those in Figure 2.

In the two final experiments, we changed the $P_f$ interval of the ROC curves in which the AUC is maximized: the nAUC is maximized over one interval in experiment 8 (see Figure 9) and over a different interval in experiment 9 (see Figure 10).

In these two cases, because both detectors behave identically, the estimated weighting parameters are equal. The parameter α, which controls the shape of the separation frontiers, converges to a value that gives a better probability of detection after fusion in the specified false alarm intervals of the ROC curves.

## 5 Application of α-Integration in Biometrics Score Fusion

Biometrics refers to the automatic identification of an individual based on his or her physiological traits (Jain et al., 2004). The performance of a biometric system can be measured by reporting its false accept rate (FAR) and its false reject rate (FRR). The FAR is equivalent to the probability of false alarm considered so far, and the FRR is equivalent to $1 - P_d$. Equivalently, the genuine accept rate, GAR = 1 − FRR, plays the role of $P_d$. These systems are typically required to operate at a low FAR (usually less than 0.1%).

Biometric systems based on a single source of information (unimodal systems) suffer from limitations such as lack of uniqueness, nonuniversality, and noisy data (Jain & Ross, 2004), and hence may not achieve the performance required by real-world applications. In contrast, multimodal biometric systems combine information from their component modalities to arrive at a decision (Ross & Jain, 2003). Multimodal biometric authentication requires fusing information from different modalities (e.g., fingerprint, face, iris, retina, voice). Several studies (Toh, Jiang, & Yau, 2004; Wang, Tan, & Jain, 2003) have demonstrated that consolidating information from multiple sources achieves better performance than the individual unimodal systems.

In a multimodal biometric system, integration can be done at the feature level, the matching score level, or the decision level. Matching-score-level fusion is commonly preferred because matching scores are easily available and contain sufficient information to distinguish between genuine and impostor cases. Given a number of biometric systems, one can generate matching scores for a prespecified number of users even without knowing the underlying feature extraction and matching algorithms of each biometric system. Thus, combining the information contained in the matching scores seems both feasible and practical (Dass, Nandakumar, & Jain, 2004).

In this letter, we tested α-integration for fusing the matching scores of a multimodal biometric system. In particular, we used the public database Biometric Scores Set—Release 1 (BSSR1) (U.S. Department of Commerce, 2013). BSSR1 is a set of raw output similarity scores from two face recognition systems and one fingerprint system, operating, respectively, on frontal faces and on left and right index live-scan fingerprints. The data are intended to let interested parties investigate a range of outstanding statistical problems related to biometrics. BSSR1 contains three partitions (see Table 1).

| Partition | Number of Individuals | Number of Detectors | Scores Available by Detector |
|---|---|---|---|
| 1 | | 4: 2 measures of 2 face matchers | Total: Genuine: |
| 2 | | 2: 1 measure of right and 1 measure of left index fingerprint of 1 fingerprint matcher | Total: Genuine: |
| 3 | 517 | 4: 1 measure of 2 face matchers, 1 measure of right and 1 measure of left index fingerprint of 1 fingerprint matcher | Total: 517 Genuine: 517 |


Many experiments may be devised from these three partitions. We selected four, whose results are shown in Tables 2 to 5, respectively. In all the experiments, we obtained the GARs corresponding to three different FARs for several score fusion methods. The reported GAR values are averages over 30 iterations. In each iteration, the available score sets of the corresponding BSSR1 partition were randomly divided into two halves: the first half was used for training and the second for evaluation.
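The GAR at a fixed FAR can be estimated from genuine and impostor score sets, with the threshold set from the impostor scores. The Python sketch below chooses the threshold so that the empirical FAR does not exceed the target. The function name and the toy scores are hypothetical. A FAR of 0.001% can only be resolved with at least 100,000 impostor scores; with fewer, the threshold falls back to the largest impostor score.

```python
import numpy as np


def gar_at_far(genuine, impostor, far):
    """Genuine accept rate at a target false accept rate (both as fractions).

    The threshold is the (k+1)-th largest impostor score, k = floor(far * N_imp),
    so at most k impostor scores lie strictly above it (empirical FAR <= far).
    """
    imp = np.sort(np.asarray(impostor, float))[::-1]
    k = min(int(np.floor(far * imp.size + 1e-9)), imp.size - 1)
    t = imp[k]
    return float(np.mean(np.asarray(genuine, float) > t))


impostor = np.arange(1000) / 1000.0            # toy impostor scores 0.000 ... 0.999
genuine = np.array([0.995, 0.98, 0.999, 0.5])  # toy genuine scores
for far in (0.001, 0.01, 0.1):
    print(f"FAR {100 * far:g}%: GAR {100 * gar_at_far(genuine, impostor, far):.1f}%")
```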

| Method | FAR 0.001% | FAR 0.01% | FAR 0.1% |
|---|---|---|---|
| Arithmetic mean | 97.859 | 98.823 | 99.510 |
| Geometric mean | 96.229 | 98.691 | 96.609 |
| Min | 72.305 | 79.816 | 85.724 |
| Max | 97.424 | 98.622 | 99.426 |
| α-integration (LMSE) | 83.767 | 97.019 | 98.693 |
| α-integration (AUCmax, nAUC) | **98.851** | **99.135** | **99.601** |


Notes: Scores were normalized by using equation 5.1. Numbers in bold indicate the best result.

In experiment 1 (Table 2), the scores were normalized as

$$s_{norm} = \frac{p(s \mid H_1)\, P(H_1)}{p(s \mid H_0)\, P(H_0) + p(s \mid H_1)\, P(H_1)}, \tag{5.1}$$

where $s_{norm}$ and $s$ are, respectively, the scores after and before normalization, $p(s \mid H_k)$ is the probability density of $s$ conditioned on hypothesis $H_k$, and $P(H_k)$ is the a priori probability of hypothesis $H_k$. These probabilities were estimated from the percentages of instances of $H_k$ in the training set of scores. Moreover, $p(s \mid H_k)$ was estimated using nonparametric gaussian kernel methods. Other normalization methods are possible (Jain et al., 2005), but their influence on the results is beyond the scope of this work.
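The normalization above can be implemented with a kernel density estimate of each class-conditional density. The Python sketch below follows our reading of equation 5.1, $s_{norm} = p(s \mid H_1)P(H_1) / [p(s \mid H_0)P(H_0) + p(s \mid H_1)P(H_1)]$. It uses `scipy.stats.gaussian_kde` with its default bandwidth, an assumption since the letter does not state the bandwidth, and estimates the priors from the class proportions of the training set. The raw scores and the function name are hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde


def fit_posterior_normalizer(s_train, y_train):
    """Return a function mapping raw scores s to s_norm (posterior of H1)."""
    s_train = np.asarray(s_train, float)
    y_train = np.asarray(y_train, int)
    kde0 = gaussian_kde(s_train[y_train == 0])     # estimate of p(s | H0)
    kde1 = gaussian_kde(s_train[y_train == 1])     # estimate of p(s | H1)
    p1 = y_train.mean()                            # estimate of P(H1)
    p0 = 1.0 - p1                                  # estimate of P(H0)

    def normalize(s):
        s = np.atleast_1d(np.asarray(s, float))
        num = kde1(s) * p1
        den = num + kde0(s) * p0
        out = np.full_like(num, 0.5)               # both densities vanish: no evidence
        np.divide(num, den, out=out, where=den > 0)
        return out

    return normalize


rng = np.random.default_rng(2)
s_raw = np.r_[rng.normal(0.0, 1.0, 500), rng.normal(3.0, 1.0, 500)]  # hypothetical raw scores
labels = np.r_[np.zeros(500, int), np.ones(500, int)]
normalize = fit_posterior_normalizer(s_raw, labels)
print(normalize([-3.0, 1.5, 6.0]))
```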

As Table 2 shows, the best results are obtained with α-integration (AUCmax). The comparison with α-integration (LMSE) shows that focusing the AUC maximization on an interval of the ROC curve is important in this experiment. In fact, at all three FARs in Table 2, α-integration (LMSE) performs worse than simple rules such as the arithmetic mean and the maximum. This is because α-integration (LMSE) does not directly maximize the GAR, which reinforces the interest of the newly proposed AUCmax criterion.

In experiments 2, 3, and 4, we used the original scores without normalization; hence, α-integration (LMSE) was not applied. Each experiment corresponds to a different partition: Tables 3, 4, and 5 show the results obtained with partitions 1, 2, and 3, respectively. In all cases, fusion based on α-integration (AUCmax) performs best, which shows the interest of optimizing the fusion parameters.

| Method | FAR 0.001% | FAR 0.01% | FAR 0.1% |
|---|---|---|---|
| Arithmetic mean | 92.990 | 93.901 | 96.172 |
| Geometric mean | 90.799 | 92.864 | 95.404 |
| Min | 57.969 | 73.896 | 84.135 |
| Max | 87.161 | 90.223 | 93.436 |
| α-integration (AUCmax, nAUC) | **98.093** | **99.417** | **99.611** |


Notes: Scores are not normalized. Numbers in bold indicate the best result.

| Method | FAR 0.001% | FAR 0.01% | FAR 0.1% |
|---|---|---|---|
| Arithmetic mean | 88.393 | 91.170 | 93.895 |
| Geometric mean | 85.410 | 89.007 | 92.304 |
| Min | 75.546 | 79.740 | 84.425 |
| Max | 86.570 | 90.298 | 93.311 |
| α-integration (AUCmax, nAUC) | **88.542** | **91.409** | **94.011** |


Notes: Scores are not normalized. Numbers in bold indicate the best results.

| Method | FAR 0.001% | FAR 0.01% | FAR 0.1% |
|---|---|---|---|
| Arithmetic mean | 50.752 | 65.018 | 77.320 |
| Geometric mean | 65.135 | 74.904 | 83.998 |
| Min | 59.807 | 71.365 | 81.538 |
| Max | 49.176 | 63.914 | 76.416 |
| α-integration (AUCmax, nAUC) | **66.799** | **75.971** | **84.995** |


Notes: Scores are not normalized. Numbers in bold indicate the best result.

## 6 Estimating the α-Integration Parameters by MPE: An Application in Medical Diagnosis

So far, we have treated the ROC curve of the integrated detector as the essential element to be optimized by α-integration. The LMSE criterion does this implicitly, by trying to obtain integrated scores as close as possible to 1 when the true hypothesis is $H_1$ and to 0 when $H_0$ is in force. AUCmax, on the other hand, optimizes the ROC curve explicitly. This approach is appropriate in detection problems where control of the probability of false alarm $P_f$ is crucial. In some applications, however, it is better to minimize the probability of error $P_e$ (i.e., the probability of selecting the wrong hypothesis). This is a typical criterion in digital transmission, where an error happens whenever a symbol "1" is decided in reception but the emitted symbol was "0," or vice versa. Thus, $P_e$ becomes the essential figure of merit of a digital communication system. There are other areas where minimizing the $P_e$ of a detector is the appropriate optimization goal. One of them is automatic medical diagnosis. Long biosignal records, such as electrocardiogram (ECG) and electroencephalogram (EEG) recordings, must often be visually analyzed by a medical expert to detect the possible presence of predefined events in the signals. The number and sequence of these events may help in the diagnosis of pathologies. Replacing the expert with an automatic detector can ease and dramatically accelerate this task. In this type of problem, the goal is to reproduce as closely as possible the detections of the expert, which are taken as correct. Hence, minimizing $P_e$ is the best option.

In this section, we show the results obtained by α-integration in an automatic detector that integrates two scores corresponding to different modalities (EEG and ECG). Before that, we propose a new method for estimating the α-integration parameters that is more appropriate for this kind of scenario: the minimum probability of error (MPE) criterion.

Let us again consider a labeled training set $\{(\mathbf{s}^j, y^j),\ j = 1, \ldots, N\}$, where $\mathbf{s}^j$ is a vector of $d$ scores and $y^j$ is the corresponding known binary decision ($y^j = 1$ if $H_1$ is true and $y^j = 0$ if $H_0$ is true). Minimization of the $P_e$ corresponding to this set is equivalent to maximization of the probability of making correct decisions across the whole set of couples $(\mathbf{s}^j, y^j)$. Let $P_c^j$ be the probability of making the correct decision $y^j$ from the fused score $s_\alpha^j$; it can be expressed as

$$P_c^j = P(H_1 \mid s_\alpha^j)^{y^j}\, P(H_0 \mid s_\alpha^j)^{1 - y^j}. \tag{6.1}$$

We assume that the scores to be fused are normalized and calibrated (Jain et al., 2005; Zadrozny & Elkan, 2002), so that we can consider that $s_i^j = P(H_1 \mid s_i^j)$. Therefore, after α-integration, we have that $s_\alpha^j = P(H_1 \mid s_\alpha^j)$. Then, substituting in equation 6.1,

$$P_c^j = \left(s_\alpha^j\right)^{y^j}\left(1 - s_\alpha^j\right)^{1 - y^j}.$$

Let $P_c$ be the probability of making correct decisions across the whole set of couples $(\mathbf{s}^j, y^j)$. If the measurements are independent for different values of the index $j$, we can write

$$P_c = \prod_{j=1}^{N}\left(s_\alpha^j\right)^{y^j}\left(1 - s_\alpha^j\right)^{1 - y^j},$$

and the α-integration parameters are estimated by maximizing $P_c$, which minimizes $P_e$. It is well known in detection theory (Hippenstiel, 2002) that the optimum detector that minimizes $P_e$ from (in this case) the observation $s_\alpha$ is obtained by the test

$$P(H_1 \mid s_\alpha) \underset{H_0}{\overset{H_1}{\gtrless}} P(H_0 \mid s_\alpha). \tag{6.6}$$

But $P(H_0 \mid s_\alpha) = 1 - P(H_1 \mid s_\alpha)$, so equation 6.6 is equivalent to

$$P(H_1 \mid s_\alpha) \underset{H_0}{\overset{H_1}{\gtrless}} \frac{1}{2}. \tag{6.7}$$

But we have assumed that $s_\alpha = P(H_1 \mid s_\alpha)$, so the MPE test is simply

$$s_\alpha \underset{H_0}{\overset{H_1}{\gtrless}} \frac{1}{2}. \tag{6.8}$$

We applied the MPE criterion to the estimation of the α-integration parameters in a medical diagnosis application. The problem belongs to the area of computer-assisted sleep staging (Agarwal & Gotman, 2001). In particular, we want to build an automatic detector of arousals during sleep, since their frequency of appearance is related to the presence of apnea and epilepsy. Arousals are normally detected by a medical expert through visual inspection of the so-called polysomnograms (PSG), a set of EEGs recorded from the patient while sleeping. This manual task is tedious and becomes error prone after a long period of analysis. Salazar, Vergara, and Miralles (2010) proposed an automatic technique that extracts four features from the PSG signals and generates automatic detections of arousals in every 30-second epoch. The method consists of a Bayesian classifier that assumes a hidden Markov model for the evolution of the sleep stages and a nongaussian mixture model for the multivariate probability density in the feature space (see Salazar et al., 2010, for details).

Here we want to verify the possible improvement in the detection of arousals by the combined use of EEG and ECG information. From the ECG records, and after some standard signal processing (Kaufmann, Sütterlin, Schulz, & Vögele, 2011), the heartbeats (R-peaks) are extracted. Then the sequence of RR intervals between consecutive R-peaks is formed. This is termed the heart rate variability (HRV) signal, which has been extensively used for health monitoring (see Bouziane, Yagoubi, Vergara, & Salazar, 2015). Three features are extracted in every 30-second epoch. Two of them are time-domain features: the mean and the standard deviation of the RR intervals. The third feature is the ratio between the low-frequency (LF, 0.04–0.15 Hz) and high-frequency (HF, 0.15–0.4 Hz) powers, obtained from the power spectral density (PSD) of the RR sequence.
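The three HRV features of one epoch can be sketched as follows. The PSD estimator is not specified in the text; this sketch assumes a Lomb-Scargle periodogram, a common choice because RR intervals are unevenly spaced in time, and it assumes the sample standard deviation and a 0.001 Hz frequency grid. The function names `lomb_scargle` and `hrv_features` are hypothetical.

```python
import math


def lomb_scargle(t, y, freqs):
    """Classic (unnormalized) Lomb-Scargle periodogram of samples y taken at times t."""
    mean_y = sum(y) / len(y)
    yc = [v - mean_y for v in y]  # remove the mean before the spectral estimate
    power = []
    for f in freqs:
        w = 2.0 * math.pi * f
        # Time shift tau that makes the sine and cosine terms orthogonal
        tau = math.atan2(sum(math.sin(2.0 * w * ti) for ti in t),
                         sum(math.cos(2.0 * w * ti) for ti in t)) / (2.0 * w)
        c = [math.cos(w * (ti - tau)) for ti in t]
        s = [math.sin(w * (ti - tau)) for ti in t]
        cc = sum(v * v for v in c)
        ss = sum(v * v for v in s)
        pc = sum(a * b for a, b in zip(yc, c)) ** 2 / cc if cc > 0 else 0.0
        ps = sum(a * b for a, b in zip(yc, s)) ** 2 / ss if ss > 0 else 0.0
        power.append(0.5 * (pc + ps))
    return power


def hrv_features(rr, lf_band=(0.04, 0.15), hf_band=(0.15, 0.4), df=0.001):
    """Return (mean RR, SD of RR, LF/HF power ratio) for one epoch of RR intervals in s."""
    n = len(rr)
    mean_rr = sum(rr) / n
    sd_rr = math.sqrt(sum((x - mean_rr) ** 2 for x in rr) / (n - 1))
    # Time-stamp each RR interval at the R-peak that closes it
    t, elapsed = [], 0.0
    for x in rr:
        elapsed += x
        t.append(elapsed)
    n_freqs = int(round((hf_band[1] - lf_band[0]) / df)) + 1
    freqs = [lf_band[0] + k * df for k in range(n_freqs)]
    psd = lomb_scargle(t, rr, freqs)
    # Approximate band powers by summing the periodogram over each band
    lf = sum(p for f, p in zip(freqs, psd) if lf_band[0] <= f < lf_band[1]) * df
    hf = sum(p for f, p in zip(freqs, psd) if hf_band[0] <= f < hf_band[1]) * df
    return mean_rr, sd_rr, (lf / hf if hf > 0 else math.inf)
```

Note that a 30-second epoch gives a frequency resolution of only about 1/30 Hz, so the LF/HF ratio of a single epoch is a coarse estimate.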

For the experiment, we had four subjects. EEG and ECG signals were synchronously recorded for every subject during sleep. Every recording session lasted about 7.5 hours (about 900 epochs of 30 seconds). A medical expert generated a binary decision for every epoch (presence or absence of arousal), which constitutes the target decision or ground truth. For every subject, we used the first half of the recording session for training and the second half for testing the performance of the detectors. Using the methods described in Salazar et al. (2010), a score is generated from the EEG information for every epoch. Similarly, a score is obtained from the ECG features described in the previous paragraph, using a support vector machine (SVM) classifier. Finally, both scores are α-integrated.

The goal is to reproduce the manual detections of the expert as closely as possible; every discrepancy with the expert is therefore considered an error, and the probability of error is to be minimized. Hence, the decisions corresponding to the EEG and ECG modalities are obtained by introducing the EEG score and the ECG score, respectively, in the test of equation 6.8. In addition, we have used the MPE criterion for estimating the α-integration parameters, and the α-integrated score is also introduced in the test of equation 6.8 to generate decisions.
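The procedure in the previous paragraph (estimate the α-integration parameters with the MPE criterion, then threshold the fused score at 1/2) can be sketched as below. This is a minimal sketch under stated assumptions, not the gradient algorithm of this work: it assumes Amari's weighted α-mean, $s_\alpha = f_\alpha^{-1}\left(\sum_i w_i f_\alpha(s_i)\right)$ with $f_\alpha(s) = s^{(1-\alpha)/2}$ ($\log s$ for $\alpha = 1$); it keeps the weights positive and summing to one through a softmax; and it replaces the analytic gradients with central finite differences plus backtracking. All function names and the toy scores are hypothetical.

```python
import math


def alpha_mean(scores, weights, alpha, eps=1e-9):
    """Weighted alpha-mean of scores in (0, 1), computed in the log domain."""
    logs = [math.log(min(max(s, eps), 1.0 - eps)) for s in scores]
    if abs(alpha - 1.0) < 1e-6:
        # alpha = 1 is the limiting case: weighted geometric mean
        return math.exp(sum(w * l for w, l in zip(weights, logs)))
    p = (1.0 - alpha) / 2.0
    # log(sum_i w_i * s_i**p) via log-sum-exp, so large |alpha| does not overflow
    a = [math.log(w) + p * l for w, l in zip(weights, logs) if w > 0]
    m = max(a)
    return math.exp((m + math.log(sum(math.exp(v - m) for v in a))) / p)


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    return [v / sum(e) for v in e]


def mpe_cost(params, score_rows, labels):
    """-log P_c: negative log probability of correct decisions over the training set."""
    alpha, weights = params[0], softmax(params[1:])
    cost = 0.0
    for scores, y in zip(score_rows, labels):
        s = alpha_mean(scores, weights, alpha)
        cost -= math.log(s) if y == 1 else math.log(1.0 - s)
    return cost


def fit_mpe(score_rows, labels, alpha0=-1.0, n_iter=300, step=0.5, h=1e-5):
    """Gradient descent on -log P_c (finite differences, backtracking line search)."""
    params = [alpha0] + [0.0] * len(score_rows[0])  # start from equal weights
    cost = mpe_cost(params, score_rows, labels)
    for _ in range(n_iter):
        grad = []
        for k in range(len(params)):
            up, down = list(params), list(params)
            up[k] += h
            down[k] -= h
            grad.append((mpe_cost(up, score_rows, labels)
                         - mpe_cost(down, score_rows, labels)) / (2 * h))
        t = step
        while t > 1e-8:
            cand = [p - t * g for p, g in zip(params, grad)]
            cand_cost = mpe_cost(cand, score_rows, labels)
            if cand_cost < cost:  # accept only steps that lower the cost
                params, cost = cand, cand_cost
                break
            t /= 2
        else:
            break  # no decreasing step found: stop
    return params[0], softmax(params[1:]), cost


def mpe_decision(score):
    """Test of equation 6.8: decide H1 if the calibrated score exceeds 1/2."""
    return 1 if score > 0.5 else 0


def agreement(decisions, labels):
    """Percentage of decisions that coincide with the expert labels."""
    return 100.0 * sum(d == y for d, y in zip(decisions, labels)) / len(labels)


# Toy example: columns are hypothetical EEG and ECG scores, labels play the expert role
rows = [[0.9, 0.6], [0.8, 0.3], [0.7, 0.55], [0.2, 0.5], [0.3, 0.6], [0.1, 0.45]]
labels = [1, 1, 1, 0, 0, 0]
alpha, weights, cost = fit_mpe(rows, labels)
fused = [alpha_mean(r, weights, alpha) for r in rows]
print(alpha, weights, agreement([mpe_decision(s) for s in fused], labels))
```

With this convention, $\alpha = -1$ gives the weighted arithmetic mean and $\alpha = 1$ the geometric mean; a very large $\alpha$ drives `alpha_mean` toward the minimum of the scores, and a very negative $\alpha$ toward the maximum.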

The left side of Table 6 shows the results in terms of the percentage of decisions that coincide with the expert decisions for the three automatic cases: isolated scores obtained from the EEG signals, isolated scores obtained from the ECG signals, and scores derived from α-integration of both. The corresponding α-integration parameters estimated with the MPE criterion are indicated on the right side of Table 6. Improvements after α-integration appear in subjects 1, 2, and 3; for subject 4, the percentage is the same as that obtained with isolated ECG scores. The very large value of α estimated for subject 4 confirms that α-integration essentially selects the minimum score, which seems to correspond to the ECG score in this case. Moreover, the weights of subject 4 are clearly unbalanced in favor of the ECG score. In any case, notice that α-integration yields a performance that is at least as good as the best individual performance. Thus, even when there is no improvement, α-integration is able to “select” the better of the two available automatic detectors.

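Table 6: Percentage of decisions coinciding with the expert decisions (left) and α-integration parameters estimated with the MPE criterion (right).

| | EEG (%) | ECG (%) | α-int (%) | $\alpha$ | $w_1$ (EEG) | $w_2$ (ECG) |
|---|---|---|---|---|---|---|
| Subject 1 | 78.60 | 80.55 | **84.70** | 10.95 | 0.5053 | 0.4947 |
| Subject 2 | 77.39 | 74.37 | **77.51** | 17.02 | 0.5552 | 0.4448 |
| Subject 3 | 89.13 | 90.48 | **91.72** | 10.15 | 0.4306 | 0.5786 |
| Subject 4 | 80.45 | **93.93** | **93.93** | 96.02 | 0.2009 | 0.7991 |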

Note: Numbers in bold indicate the best results.

## 7 Conclusion

We have presented a new method, based on α-integration, for the fusion of scores obtained from different detectors. It generalizes simpler combination rules, allows optimal fitting of its parameters, and finds its rationale in the optimal integration of stochastic models. Three optimality criteria have been considered: LMSE, AUCmax, and MPE. While the first two are related, implicitly or explicitly, to optimizing the ROC curve (i.e., maximizing the probability of detection for a given probability of false alarm), the last one focuses on minimizing the probability of error.

We have proposed new gradient algorithms for the three criteria. In the LMSE case, we have adapted to the detection context a gradient algorithm previously proposed in the general framework of α-integration, with some variations to account for unbalanced training set sizes and for the relative significance of each type of error in the global MSE. Regarding AUCmax, a new algorithm has been proposed based on transforming an empirical nonparametric measure of the AUC into a differentiable function. A key advantage of AUCmax with respect to LMSE is that it allows tuned optimization in selected intervals of the ROC curve. In MPE, a new cost function is defined as the negative log probability of correct answers.

We have included different experiments with simulated data to illustrate the factors influencing α-integration with both LMSE and AUCmax. They have shown that fusing the scores of two detectors leads to significant improvements in the ROC curves.

Finally, two real data cases have been considered. The first corresponds to the fusion of scores in multimodal biometric data. In this application, the goal is to achieve the maximum genuine acceptance rate (equivalent to the probability of detection) for a given (rather small) false alarm rate; hence, both LMSE and AUCmax have been considered. Experiments with different data sets have shown the superior performance of α-integration with respect to simpler rules, which do not allow optimization of the fusing parameters. We have also demonstrated the usefulness of tuning AUCmax to a selected range of probabilities of false alarm.

The second real data case is in the area of automatic analysis of medical records, where the aim is to reproduce the manual decisions taken by the medical expert, so the most suitable criterion is MPE. We have presented the theoretical analysis of α-integration based on MPE, including the gradient computations. The method has been applied to the fusion of two scores, obtained from EEG and ECG records, respectively. The problem was the automatic detection of arousals during sleep, a task that medical experts currently perform manually. Experiments with four subjects have illustrated the potential interest of MPE α-integration in this kind of problem.

## Acknowledgments

This work has been supported by Generalitat Valenciana under grants PROMETEOII 2014-032 and ISIC2012-006, and by the Spanish administration under grant TEC2014-58438-R.