Table 8 shows the results of running De Facto against both versions of the gold standard. Performance is evaluated in terms of precision and recall (P&R) and their harmonic mean, the F1 score. We considered only those categories with more than 10 instances in the gold standard, namely ct+, ct−, pr+, ps+, and uu. P&R for the whole corpus is obtained by macro- and micro-averaging (last two columns in the table). Macro-averaging takes the unweighted mean of the per-class results, whereas micro-averaging computes the measure over the pooled set of instances, regardless of class distribution. The former gives equal weight to each class and hence over-emphasizes the performance of the less populated ones; the latter gives equal weight to each instance and hence over-emphasizes the performance of the largest classes. Given the uneven class distribution in our gold standard, we take the two measures together as indicative of the lower and upper bounds of the result.
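The difference between the two averaging schemes can be illustrated with a minimal Python sketch. It is not the evaluation code used for De Facto: the `Counts` type, the helper names, and the `toy` counts are hypothetical and chosen only to show how a small class pulls the macro-average down while barely affecting the micro-average. The per-class precision and recall values in `table_p` and `table_r` are the ones reported for the original parses in Table 8.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Per-class confusion counts (hypothetical container, for illustration)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives


def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if tp + fn else 0.0


def mean(values: list[float]) -> float:
    return sum(values) / len(values)


def macro_average(per_class: dict[str, Counts]) -> tuple[float, float]:
    """Unweighted mean of per-class precision and recall."""
    p = mean([precision(c.tp, c.fp) for c in per_class.values()])
    r = mean([recall(c.tp, c.fn) for c in per_class.values()])
    return p, r


def micro_average(per_class: dict[str, Counts]) -> tuple[float, float]:
    """Precision and recall over the pooled counts of all classes."""
    tp = sum(c.tp for c in per_class.values())
    fp = sum(c.fp for c in per_class.values())
    fn = sum(c.fn for c in per_class.values())
    return precision(tp, fp), recall(tp, fn)


# Invented counts, NOT taken from the gold standard: one large, well-handled
# class and one small, poorly handled class.
toy = {
    "large": Counts(tp=90, fp=10, fn=10),
    "small": Counts(tp=1, fp=1, fn=3),
}

macro_p, macro_r = macro_average(toy)  # dominated by the small class
micro_p, micro_r = micro_average(toy)  # dominated by the large class

# Per-class values reported in Table 8 (original parses): CT+, CT-, PR+, PS+, Uu.
table_p = [0.81, 0.89, 0.80, 0.54, 0.86]
table_r = [0.89, 0.65, 0.32, 0.67, 0.66]
table_macro_p = round(mean(table_p), 2)
table_macro_r = round(mean(table_r), 2)
```

On the toy counts, the macro-averaged recall falls well below the micro-averaged one because the small class counts as much as the large one. The unweighted means of the per-class columns in Table 8 agree with the reported Macro-A precision and recall; the Micro-A column cannot be recomputed this way because it needs the underlying instance counts.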
P&R for each relevant category and for the whole corpus (macro- and micro-average).
Measure | CT+ | CT− | PR+ | PS+ | Uu | Macro-A | Micro-A |
---|---|---|---|---|---|---|---|
Original parses | | | | | | | |
Precision | 0.81 | 0.89 | 0.80 | 0.54 | 0.86 | 0.78 | 0.82 |
Recall | 0.89 | 0.65 | 0.32 | 0.67 | 0.66 | 0.64 | 0.79 |
F1 | 0.85 | 0.75 | 0.46 | 0.59 | 0.75 | 0.70 | 0.80 |
Corrected parses | | | | | | | |
Precision | 0.86 | 0.90 | 0.73 | 0.56 | 0.86 | 0.78 | 0.85 |
Recall | 0.92 | 0.75 | 0.44 | 0.67 | 0.77 | 0.71 | 0.85 |
F1 | 0.89 | 0.82 | 0.55 | 0.61 | 0.81 | 0.74 | 0.85 |