Abstract
A cross-validation method based on $m$ replications of two-fold cross-validation is called an $m\times 2$ cross-validation. An $m\times 2$ cross-validation is used to estimate the generalization error and to compare the performance of algorithms in machine learning. However, the variance of the estimator of the generalization error in $m\times 2$ cross-validation is easily affected by random partitions. Poor data partitioning may cause a large fluctuation in the number of overlapping samples between any two training (test) sets, and this fluctuation results in a large variance in the $m\times 2$ cross-validated estimator. The influence of the random partitions on the variance becomes more serious as $m$ increases. Thus, in this study, partitions with a restricted number of overlapping samples between any two training (test) sets are defined as a block-regularized partition set, and the corresponding cross-validation is called block-regularized $m\times 2$ cross-validation ($m\times 2$ BCV). It can effectively reduce the influence of random partitions. We prove that the variance of the $m\times 2$ BCV estimator of the generalization error is smaller than that of the $m\times 2$ cross-validated estimator and attains its minimum in a special situation, for which an analytical expression of the variance can also be derived. These conclusions are validated through simulation experiments. Furthermore, a practical method of constructing an $m\times 2$ BCV from a two-level orthogonal array is provided. Finally, a conservative estimator is proposed for the variance of the estimator of the generalization error.
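To make the orthogonal-array construction concrete, the following is a minimal Python sketch, not the paper's reference implementation, of a $3\times 2$ BCV built from the two-level orthogonal array $L_4(2^3)$: the data are split into four equal blocks, and each column of the array assigns two blocks to each fold, so any two training sets overlap in exactly one block ($n/4$ samples). The function name `m2bcv_splits` and the toy nearest-mean classifier are illustrative assumptions.

```python
import numpy as np

# Two-level orthogonal array L4(2^3): rows index the four sample blocks,
# columns index the m = 3 replications. Each column splits the four blocks
# into two folds of two blocks, so any two training sets overlap in exactly
# one block (n/4 samples) -- the block-regularized property.
L4 = np.array([[0, 0, 0],
               [0, 1, 1],
               [1, 0, 1],
               [1, 1, 0]])

def m2bcv_splits(n, seed=None):
    """Yield the 2m = 6 (train, test) index pairs of a 3x2 BCV."""
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(n), 4)   # four equal-size blocks
    for col in L4.T:                                 # one 2-fold partition per column
        a = np.concatenate([blk for blk, bit in zip(blocks, col) if bit == 0])
        b = np.concatenate([blk for blk, bit in zip(blocks, col) if bit == 1])
        yield a, b                                   # train on fold a, test on fold b,
        yield b, a                                   # then swap the two folds

# Toy usage: estimate the generalization error of a nearest-mean classifier
# on synthetic two-class Gaussian data by averaging the 2m hold-out errors.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(1.5, 1.0, (100, 2))])
y = np.repeat([0, 1], 100)

errors = []
for tr, te in m2bcv_splits(len(X), seed=1):
    mu0 = X[tr][y[tr] == 0].mean(axis=0)
    mu1 = X[tr][y[tr] == 1].mean(axis=0)
    pred = (np.linalg.norm(X[te] - mu1, axis=1)
            < np.linalg.norm(X[te] - mu0, axis=1)).astype(int)
    errors.append(np.mean(pred != y[te]))

print(f"3x2 BCV estimate of the generalization error: {np.mean(errors):.3f}")
```

Under this construction, the pairwise overlap between training sets is fixed at $n/4$ rather than fluctuating with the random partition, which is the mechanism by which the BCV estimator's variance is reduced.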