Features for determining whether c fills iargn of predicate p. For each mention f (denoting a filler) in the coreference chain c′, pf and argf are the predicate and argument position of f, respectively. Unless otherwise noted, all argument positions (e.g., argn and iargn) should be interpreted as the integer label n rather than the underlying word content of the argument. The & symbol denotes concatenation; for example, a feature value of “p&iargn” for the iarg0 position of sale would be “sale-0.” Features marked with an asterisk (*) are explained in Section 4.2. Features marked with a dagger (†) require external text corpora that have been automatically processed by existing NLP components (e.g., SRL systems). The final column gives a heuristic ranking score for the features across all evaluation folds (see Section 6.2 for discussion).
# | Feature value description | Importance score |
---|---|---|
1* | For every f, pf&argf&p&iargn. | 8.2 |
2* | Sentence distance from c to p. | 4.0 |
3* | For every f, the head word of f& the verbal form of p&iargn. | 3.6 |
4* | Same as 1 except generalizing pf and p to their WordNet synsets. | 3.3 |
5* | Same as 3 except generalizing f to its WordNet synset. | 1.0 |
6 | Whether or not c and p are themselves arguments to the same predicate. | 1.0 |
7 | p& the semantic head word of p's right sibling. | 0.7 |
8 | Whether or not any argf and iargn have the same integer argument position. | 0.7 |
9* | Frame element path between argf of pf and iargn of p in FrameNet (Baker, Fillmore, and Lowe 1998). | 0.6 |
10 | Percentage of elements in c′ that are subjects of a copula for which p is the object. | 0.6 |
11 | Whether or not the verb forms of pf and p are in the same VerbNet class and argf and iargn have the same thematic role. | 0.6 |
12 | p& the last word of p's right sibling. | 0.6 |
13*† | Maximum targeted PMI between argf of pf and iargn of p. | 0.6 |
14 | p& the number of p's right siblings. | 0.5 |
15 | Percentage of elements in c′ that are objects of a copula for which p is the subject. | 0.5 |
16 | Frequency of the verbal form of p within the document. | 0.5 |
17 | p& the stemmed content words in a one-word window around p. | 0.5 |
18 | Whether or not p's left sibling is a quantifier (many, most, all, etc.). Quantified predicates tend not to take implicit arguments. | 0.4 |
19 | Percentage of elements in c′ that are copular objects. | 0.4 |
20 | TF cosine similarity between words from arguments of all pf and words from arguments of p. | 0.4 |
21 | Whether the path defined in 9 exists. | 0.4 |
22 | Percentage of elements in c′ that are copular subjects. | 0.4 |
23* | For every f, the VerbNet class/role of pf/argf& the class/role of p/iargn. | 0.4 |
24 | Percentage of elements in c′ that are indefinite noun phrases. | 0.4 |
25* | p& the syntactic head word of p's right sibling. | 0.3 |
26 | p& the stemmed content words in a two-word window around p. | 0.3 |
27*† | Minimum selectional preference between any f and iargn of p. Uses the method described by Resnik (1996), computed over an SRL-parsed version of the Penn TreeBank and Gigaword (Graff 2003) corpora. | 0.3 |
28 | p&p's synset in WordNet. | 0.3 |
29† | Same as 27 except using the maximum. | 0.3 |
30 | Average per-sentence frequency of the verbal form of p within the document. | 0.3 |
31 | p itself. | 0.3 |
32 | p& whether p is the head of its parent. | 0.3 |
33*† | Minimum coreference probability between argf of pf and iargn of p. | 0.3 |
34 | p& whether p is before a passive verb. | 0.3 |
35 | Percentage of elements in c′ that are definite noun phrases. | 0.3 |
36 | Percentage of elements in c′ that are arguments to other predicates. | 0.3 |
37 | Maximum absolute sentence distance from any f to p. | 0.3 |
38 | p&p's syntactic category. | 0.2 |
39 | TF cosine similarity between the role description of iargn and the concatenated role descriptions of all argf. | 0.2 |
40 | Average TF cosine similarity between each argn of each pf and the corresponding argn of p, where ns are equal. | 0.2 |
41 | Same as 40 except using the maximum. | 0.2 |
42 | Same as 40 except using the minimum. | 0.2 |
43 | p& the head of the following prepositional phrase's object. | 0.2 |
44 | Whether any f is located between p and any of the arguments annotated by NomBank for p. When true, this feature rules out false positives because it implies that the NomBank annotators considered and ignored f as a local argument to p. | 0.2 |
45 | Number of elements in c′. | 0.2 |
46 | p& the first word of p's right sibling. | 0.2 |
47 | p& the grammar rule that expands p's parent. | 0.2 |
48 | Number of elements in c′ that are arguments to other predicates. | 0.2 |
49 | Nominal form of p&iargn. | 0.2 |
50 | p& the syntactic parse tree path from p to the nearest passive verb. | 0.2 |
51 | Same as 37 except using the minimum. | 0.2 |
52† | Same as 33 except using the average. | 0.2 |
53 | Verbal form of p&iargn. | 0.2 |
54 | p& the first word of p's left sibling. | 0.2 |
55 | Average per-sentence frequency of the nominal form of p within the document. | 0.2 |
56 | p& the part of speech of p's parent's head word. | 0.2 |
57† | Same as 33 except using the maximum. | 0.2 |
58 | Same as 37 except using the average. | 0.1 |
59* | Minimum path length between argf of pf and iargn of p within VerbNet (Kipper 2005). | 0.1 |
60 | Frequency of the nominal form of p within the document. | 0.1 |
61 | p& the number of p's left siblings. | 0.1 |
62 | p&p's parent’s head word. | 0.1 |
63 | p& the syntactic category of p's right sibling. | 0.1 |
64 | p&p's morphological suffix. | 0.1 |
65 | TF cosine similarity between words from all f and words from the role description of iargn. | 0.1 |
66 | Percentage of elements in c′ that are quantified noun phrases. | 0.1 |
67* | Discourse relation whose two discourse units cover c (the primary filler) and p. | 0.1 |
68 | For any f, the minimum semantic similarity between pf and p using the method described by Wu and Palmer (1994) over WordNet (Fellbaum 1998). | 0.1 |
69 | p& whether or not p is followed by a prepositional phrase. | 0.1 |
70 | p& the syntactic head word of p’s left sibling. | 0.1 |
71 | p& the stemmed content words in a three-word window around p. | 0.1 |
72 | Syntactic category of c&iargn& the verbal form of p. | 0.1 |
73 | Nominal form of p& the sorted integer argument indexes (the ns) from all argn of p. | 0.1 |
74 | Percentage of elements in c′ that are sentential subjects. | 0.1 |
75 | Whether or not the integer position of any argf equals that of iargn. | 0.1 |
76† | Same as 13 except using the average. | 0.1 |
77† | Same as 27 except using the average. | 0.1 |
78 | p&p's parent’s syntactic category. | 0.1 |
79 | p& the part of speech of the head word of p’s right sibling. | 0.1 |
80 | p& the semantic head word of p’s left sibling. | 0.1 |
81† | Maximum targeted coreference probability between argf of pf and iargn of p. This is a hybrid feature that calculates the coreference probability of Feature 33 using the corpus tuning method of Feature 13. | 0.1 |
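
To make the notation in the table concrete, the following is a minimal sketch of how a few of the simpler features might be computed. It is an illustration under stated assumptions, not the implementation used in the experiments: the `Mention` container and all function names are hypothetical, the "-" joiner for the & operator is inferred from the “sale-0” example in the caption, and the TF vectors in the cosine similarity (used by features such as 20 and 65) are assumed to be raw token counts.

```python
from collections import Counter
from math import sqrt
from typing import Iterable, NamedTuple, Set


class Mention(NamedTuple):
    """One filler mention f in the chain c' (hypothetical container)."""
    predicate: str      # p_f: the predicate that takes f as an argument
    arg_position: int   # the integer n of arg_f


def concat(*parts: object) -> str:
    """The & operator; the '-' joiner is an assumption based on "sale-0"."""
    return "-".join(str(part) for part in parts)


def feature_1(chain: Iterable[Mention], p: str, iarg_n: int) -> Set[str]:
    """Feature 1: for every f, p_f & arg_f & p & iarg_n."""
    return {concat(f.predicate, f.arg_position, p, iarg_n) for f in chain}


def feature_8(chain: Iterable[Mention], iarg_n: int) -> bool:
    """Feature 8: whether any arg_f has the same integer position as iarg_n."""
    return any(f.arg_position == iarg_n for f in chain)


def tf_cosine(words_a: Iterable[str], words_b: Iterable[str]) -> float:
    """Cosine similarity between raw term-frequency vectors (assumed weighting)."""
    tf_a, tf_b = Counter(words_a), Counter(words_b)
    dot = sum(count * tf_b[word] for word, count in tf_a.items())
    norm_a = sqrt(sum(count * count for count in tf_a.values()))
    norm_b = sqrt(sum(count * count for count in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty bag of words shares nothing with any other bag
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical chain: two mentions of the same entity, each filling
    # position 1 of a different predicate.
    chain = [Mention("ship", 1), Mention("sell", 1)]
    print(sorted(feature_1(chain, "sale", 0)))
    print(feature_8(chain, 1))
    print(tf_cosine(["the", "company"], ["a", "company"]))
```

Feature 1 yields one string per mention in the chain, so a single candidate can contribute several values; how such multi-valued features are encoded for the classifier is not specified here and is left open in the sketch.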