Data statistics. par. len and sent. len refer to paragraph length and sentence length, respectively; all lengths are in bytes. The development and test sets are five-way annotated, and the expert data is four-way annotated.
Split | # | par. len | sent. len | feasible, w/ edit (%) | feasible, as is (%) | infeasible (%) |
---|---|---|---|---|---|---|
Train | 11290 | 695 | 156 | 60 | 31 | 9 |
Dev | 1945 | 695 | 162 | 67 | 21 | 12 |
Test | 1945 | 711 | 160 | 68 | 20 | 12 |
Expert | 100 | 658 | 163 | 63 | 26 | 12 |