Over the past 20 years there has been a great deal of interest in developing metrics and frameworks for evaluating and comparing the performance of spoken-language dialogue systems. One result of this interest is a potential general methodology known as the PARADISE framework. This squib highlights some important issues concerning the application of PARADISE that have, up to now, been insufficiently emphasized or even neglected by the dialogue-system community. These include the selection of appropriate regression parameters, the effects of normalization on the accuracy of the prediction, the influence of speech-recognition errors on the performance function, and the selection of an appropriate user-satisfaction measure. In addition, the squib reports the results of an evaluation of data from two Wizard-of-Oz experiments. This evaluation considers different dependent variables and examines individual user-satisfaction measures.

