Abstract
In recent years, the kappa coefficient of agreement has become the de facto standard for evaluating intercoder agreement for tagging tasks. In this squib, we highlight issues that affect κ and that the community has largely neglected. First, we discuss the assumptions underlying different computations of the expected agreement component of κ. Second, we discuss how prevalence and bias affect the κ measure.
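For reference, a brief sketch of the quantities the abstract refers to, using the standard definitions rather than wording taken from the squib itself: κ corrects observed agreement P(A) by expected (chance) agreement P(E), and the two common computations of P(E) differ in whether each coder's own marginal distribution or a pooled marginal distribution over the categories is used.

  \kappa = \frac{P(A) - P(E)}{1 - P(E)}

  % Cohen-style chance agreement: product of each coder's own marginals over categories k
  P(E)_{\mathrm{Cohen}} = \sum_{k} P_{1}(k)\, P_{2}(k)

  % Siegel & Castellan-style chance agreement: squared pooled marginals
  P(E)_{\mathrm{S\&C}} = \sum_{k} \bar{P}(k)^{2}, \qquad \bar{P}(k) = \tfrac{1}{2}\bigl(P_{1}(k) + P_{2}(k)\bigr)

The notation (P_1, P_2 for the two coders' marginals) is illustrative and assumed here; the squib's own discussion of these assumptions is in the full text.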
Issue Section: Squibs and Discussions
© 2004 Association for Computational Linguistics