In the presence of a heavy-tail noise distribution, regression becomes much more difficult. Traditional robust regression methods assume that the noise distribution is symmetric, and they downweight the influence of so-called outliers. When the noise distribution is asymmetric, these methods yield biased regression estimators. Motivated by data-mining problems for the insurance industry, we propose a new approach to robust regression tailored to deal with asymmetric noise distribution. The main idea is to learn most of the parameters of the model using conditional quantile estimators (which are biased but robust estimators of the regression) and to learn a few remaining parameters to combine and correct these estimators, to minimize the average squared error in an unbiased way. Theoretical analysis and experiments show the clear advantages of the approach. Results are on artificial data as well as insurance data, using both linear and neural network predictors.

This content is only available as a PDF.