Abstract

This paper presents an algorithm for identifying noun phrase antecedents of third person personal pronouns, demonstrative pronouns, reflexive pronouns, and omitted pronouns (zero pronouns) in unrestricted Spanish texts. We define a list of constraints and preferences for different types of pronominal expressions, and we document in detail the importance of each kind of knowledge (lexical, morphological, syntactic, and statistical) in anaphora resolution for Spanish. The paper also provides a definition for syntactic conditions on Spanish NP-pronoun noncoreference using partial parsing. The algorithm has been evaluated on a corpus of 1,677 pronouns and achieved a success rate of 76.8%. We have also implemented four competitive algorithms and tested their performance in a blind evaluation on the same test corpus. This new approach could easily be extended to other languages such as English, Portuguese, Italian, or Japanese.

This content is only available as a PDF.