Abstract

We present a novel approach to dealing with overfitting in black-box models. It is based on the leverages of the samples, that is, on the influence that each observation has on the parameters of the model. Because overfitting results from the model specializing on specific data points during training, we propose a selection method for nonlinear models based on the estimation of leverages and confidence intervals. The method allows selection both among models of equivalent complexity that correspond to different minima of the cost function (e.g., neural nets with the same number of hidden units) and among models of different complexity (e.g., neural nets with different numbers of hidden units). A complete model selection methodology is derived.
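
As an illustration of what a leverage measures, the sketch below uses the standard textbook definition: the diagonal of the hat matrix Z (Z^T Z)^{-1} Z^T, where Z holds the derivatives of the model output with respect to each parameter at each sample. The abstract does not state the exact estimator, so this definition, the quadratic toy model, and the names `leverages`, `Z`, and `h` are assumptions made for the example only.

```python
import numpy as np

def leverages(Z):
    """Diagonal of the hat matrix Z (Z^T Z)^{-1} Z^T.

    Z has one row per sample and one column per parameter,
    and is assumed to have full column rank.
    """
    # Reduced QR gives Z = Q R with orthonormal columns in Q,
    # so the hat matrix equals Q Q^T and its diagonal is the
    # row-wise sum of squares of Q.
    Q, _ = np.linalg.qr(Z)
    return np.sum(Q**2, axis=1)

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=20)
x[0] = 3.0  # one isolated observation, far from the others

# Hypothetical model f(x; theta) = theta0 + theta1*x + theta2*x^2.
# Its derivatives with respect to theta are [1, x, x^2].
Z = np.column_stack([np.ones_like(x), x, x**2])

h = leverages(Z)
print("leverage of the isolated point:", h[0])
print("sum of leverages:", h.sum())
```

Under this definition each leverage lies between 0 and 1, and the leverages sum to the number of parameters. A value close to 1 marks an observation that the model has nearly devoted a degree of freedom to fitting, which is the sense in which leverages can signal specialization on individual data points.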
