Most disagreements between De Facto's output and the gold standard (84.4%) are due to limitations in our system. These fall mainly into two classes: insufficient coverage of factuality markers, either lexical or syntactic, and structural and lexical ambiguity. The remaining disagreements stem either from inaccuracies in the gold-standard annotation (7.5%) or from incorrect analyses by the dependency parser that escaped our manual correction (8.1%). Table 11 shows the distribution of error types, distinguishing between lexical and syntactic errors where relevant.
Table 11. Error classification (all percentages are shares of the total number of disagreements).

| | Error source | % | % Lexical | % Syntactic |
|---|---|---|---|---|
| De Facto limitations | Insufficient coverage | 34.4 | 1.9 | 32.5 |
| | Ambiguity | 46.2 | 18.1 | 28.1 |
| | Other | 3.8 | – | – |
| | Subtotal | 84.4 | 20 | 60.6 |
| Other error sources | Gold standard | 7.5 | – | – |
| | Wrong dependency trees | 8.1 | – | – |
| | Subtotal | 15.6 | – | – |
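Because the lexical and syntactic columns split each error type's share of all disagreements, the table's rows and subtotals must add up consistently. The following is a minimal Python sketch, not part of De Facto itself, that copies the Table 11 figures into plain dictionaries (the names `DE_FACTO_LIMITATIONS`, `OTHER_SOURCES`, `subtotal`, and `check_table` are hypothetical, chosen for illustration) and checks those sums within the one-decimal rounding of the published values.

```python
import math

# Percentages of all disagreements, copied from Table 11.
# Each entry: (total %, lexical %, syntactic %); None where the table gives no split.
DE_FACTO_LIMITATIONS = {
    "insufficient coverage": (34.4, 1.9, 32.5),
    "ambiguity": (46.2, 18.1, 28.1),
    "other": (3.8, None, None),
}
OTHER_SOURCES = {
    "gold standard": (7.5, None, None),
    "wrong dependency trees": (8.1, None, None),
}


def close(a, b, tol=0.05):
    """Compare percentages that were reported to one decimal place."""
    return math.isclose(a, b, abs_tol=tol)


def subtotal(group):
    """Sum the total, lexical and syntactic shares of one group (None counts as 0)."""
    totals = [0.0, 0.0, 0.0]
    for row in group.values():
        for i, value in enumerate(row):
            totals[i] += value or 0.0
    return tuple(totals)


def check_table():
    """Check the internal consistency of Table 11 and return both group subtotals."""
    # For every error type that is split, lexical + syntactic must equal its total.
    for name, (total, lex, syn) in DE_FACTO_LIMITATIONS.items():
        if lex is not None:
            assert close(lex + syn, total), name
    lim = subtotal(DE_FACTO_LIMITATIONS)
    oth = subtotal(OTHER_SOURCES)
    # Reported subtotals: 84.4 (20 lexical, 60.6 syntactic) and 15.6.
    assert close(lim[0], 84.4) and close(lim[1], 20.0) and close(lim[2], 60.6)
    assert close(oth[0], 15.6)
    # Together, the two groups account for every disagreement.
    assert close(lim[0] + oth[0], 100.0)
    return lim, oth


if __name__ == "__main__":
    lim, oth = check_table()
    # Share of De Facto's own errors that are syntactic rather than lexical.
    print(f"syntactic share of system errors: {lim[2] / lim[0]:.1%}")
```

Running the script prints 71.8%, which makes explicit that about seven in ten of the errors attributable to De Facto itself are syntactic.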