This article describes an unsupervised strategy for acquiring the syntactico-semantic requirements of nouns, verbs, and adjectives from partially parsed text corpora. The linguistic notion of requirement underlying this strategy rests on two assumptions. First, two words in a dependency are assumed to be mutually required, a phenomenon called here corequirement. Second, the set of words occurring in similar positions is claimed to define extensionally the requirements associated with those positions. The main aim of the learning strategy is to identify clusters of similar positions by identifying the words that define their requirements extensionally. This allows us to learn the syntactic and semantic requirements of words in different positions. The acquired information is then used to resolve attachment ambiguities, and the results on this task are evaluated at the end of the article. Extensive experimentation was performed on Portuguese text corpora.
Departamento de Língua Espanhola, Faculdade de Filologia, Universidade de Santiago de Compostela, Campus Universitario Norte, 15782 Santiago de Compostela, Spain. firstname.lastname@example.org
Faculdade de Informática, Pontifícia Universidade Católica do Rio Grande do Sul, Av. Ipiranga 6681 prédio 30 bloco 4, CEP 90619-900 Porto Alegre (RS), Brazil. email@example.com
Department of Computer Science, Faculty of Science and Technology, Universidade Nova de Lisboa, Quinta da Torre, 2829-516, Caparica, Portugal. firstname.lastname@example.org