Abstract
We present a lexicon-based approach to extracting sentiment from text. The Semantic Orientation CALculator (SO-CAL) uses dictionaries of words annotated with their semantic orientation (polarity and strength), and incorporates intensification and negation. SO-CAL is applied to the polarity classification task, the process of assigning to a text a positive or negative label that captures the text's opinion towards its main subject matter. We show that SO-CAL's performance is consistent across domains and on completely unseen data. Additionally, we describe the process of dictionary creation, and our use of Mechanical Turk to check dictionaries for consistency and reliability.
Author notes
Corresponding author. Department of Linguistics, Simon Fraser University, 8888 University Dr., Burnaby, B.C. V5A 1S6 Canada. E-mail: [email protected].
Department of Computer Science, University of Toronto, 10 King's College Road, Room 3302, Toronto, Ontario M5S 3G4 Canada. E-mail: [email protected].
School of Computing Science, Simon Fraser University, 8888 University Dr., Burnaby, B.C. V5A 1S6 Canada. E-mail: [email protected].
Department of Computer Science, University of British Columbia, 201-2366 Main Mall, Vancouver, B.C. V6T 1Z4 Canada. E-mail: [email protected].
Department of Linguistics, University of Potsdam, Karl-Liebknecht-Str. 24-25, D-14476 Golm, Germany. E-mail: [email protected].