Abstract
This article presents our work on constructing a corpus of news articles in which events are annotated with estimated bounds on their duration, and on automatically learning from this corpus. We describe the annotation guidelines, the event classes we defined to reduce gross discrepancies in inter-annotator judgments, and our use of normal distributions both to model vague and implicit temporal information and to measure inter-annotator agreement for these event duration distributions. We then show that machine learning techniques applied to this data can produce coarse-grained event duration information automatically, considerably outperforming a baseline and approaching human performance. The methods described here should be applicable to other kinds of vague but substantive information in texts.
Author notes
Microsoft Corporation, 475 Brannan St., San Francisco, CA 94107, USA. E-mail: [email protected].
4676 Admiralty Way, Marina del Rey, CA 90292, USA. E-mail: [email protected].
4676 Admiralty Way, Marina del Rey, CA 90292, USA. E-mail: [email protected].