Table 4: Human evaluation of system outputs under several filtering methods, reporting manually judged precision (P) for the subset of outputs remaining after applying the given filter, recall (R) of sentences manually judged to be acceptable, and the size multiple (in number of sentences) of the resulting dataset relative to the original seed corpus. Filtering methods consider the iteration number and the scores from the paraphrase and aligner models for a given system output. The "lax" row applies the conjunction of the criteria from rows 3, 5, and 7 (relatively lenient conditions), whereas the "strict" row conjoins the criteria from rows 2, 4, and 6 (stricter conditions that yield higher precision but fewer lexical units).

Filtering                  P       R        Multiple
Unfiltered                 68.25   100.00   11x

Iter = 1                   90.06    13.20   2x
Iter ≤ 3                   81.29    35.73   4x
Paraphrase score ≤ 0.6     90.14     5.48   1.42x
Paraphrase score ≤ 0.8     74.86    34.45   4.14x
Aligner score ≥ 0.99       85.01    32.56   3.61x
Aligner score ≥ 0.95       76.72    85.00   8.56x
Lax conjunction            87.73    20.82   2.62x
Strict conjunction         92.54     5.31   1.39x

P-Classifier               95.00    15.61   2.28x
R-Classifier               81.19    96.99   10.27x
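The lax and strict rows combine three per-output criteria each, as described in the caption. A minimal sketch of these conjunction filters, assuming each system output carries an iteration number, a paraphrase-model score, and an aligner-model score (the field names here are hypothetical; the thresholds are taken from the table rows):

```python
def keep_lax(example):
    """Lenient conjunction (rows 3, 5, 7): Iter <= 3, paraphrase score <= 0.8, aligner score >= 0.95."""
    return (example["iteration"] <= 3
            and example["paraphrase_score"] <= 0.8
            and example["aligner_score"] >= 0.95)

def keep_strict(example):
    """Strict conjunction (rows 2, 4, 6): Iter = 1, paraphrase score <= 0.6, aligner score >= 0.99."""
    return (example["iteration"] == 1
            and example["paraphrase_score"] <= 0.6
            and example["aligner_score"] >= 0.99)

# Illustrative outputs (made-up scores), filtered both ways.
outputs = [
    {"iteration": 1, "paraphrase_score": 0.55, "aligner_score": 0.995},
    {"iteration": 2, "paraphrase_score": 0.75, "aligner_score": 0.97},
    {"iteration": 5, "paraphrase_score": 0.90, "aligner_score": 0.80},
]
lax_subset = [ex for ex in outputs if keep_lax(ex)]
strict_subset = [ex for ex in outputs if keep_strict(ex)]
```

Every output that survives the strict filter also survives the lax one, since each strict criterion implies the corresponding lax criterion; this is why the strict row trades recall (and dataset size) for precision.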