Although the outputs of neural network classifiers are often considered to be estimates of posterior class probabilities, the literature that assesses the calibration accuracy of these estimates illustrates that practical networks often fall far short of being ideal estimators. The theorems used to justify treating network outputs as good posterior estimates are based on several assumptions: that the network is sufficiently complex to model the posterior distribution accurately, that there are sufficient training data to specify the network, and that the optimization routine is capable of finding the global minimum of the cost function. Any or all of these assumptions may be violated in practice.
This article does three things. First, we apply a simple, previously used histogram technique to assess graphically the accuracy of posterior estimates with respect to individual classes. Second, we introduce a simple and fast remapping procedure that transforms network outputs to provide better estimates of posteriors. Third, we use the remapping in a real-world telephone speech recognition system. The remapping results in a 10% reduction of both word-level error rates (from 4.53% to 4.06%) and sentence-level error rates (from 16.38% to 14.69%) on one corpus, and a 29% reduction at sentence-level error (from 6.3% to 4.5%) on another. The remapping required negligible additional overhead (in terms of both parameters and calculations). McNemar's test shows that these levels of improvement are statistically significant.