Corpus statistics at three alignment levels: sentence-aligned, paragraph-aligned, and doc-aligned. Average error rate was computed on the concatenation of development and test data at all three alignment levels.
| Corpus | Sentences (train) | Sentences (dev) | Sentences (test) | Paragraphs (train) | Paragraphs (dev) | Paragraphs (test) | Docs (train) | Docs (dev) | Docs (test) | Error Rate |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Native Formal | 4 060 | 1 952 | 1 684 | 1 618 | 859 | 669 | 227 | 87 | 76 | 5.81% |
| Native Web Informal | 6 977 | 2 465 | 2 166 | 3 622 | 1 294 | 1 256 | 3 619 | 1 291 | 1 256 | 15.61% |
| Romani | 24 824 | 1 254 | 1 260 | 9 723 | 574 | 561 | 3 247 | 173 | 169 | 26.21% |
| Second Learners | 30 812 | 2 807 | 2 797 | 8 781 | 865 | 756 | 2 050 | 167 | 170 | 25.16% |
| Total | 66 673 | 8 478 | 7 907 | 23 744 | 3 592 | 3 242 | 9 143 | 1 718 | 1 671 | 18.19% |
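As a quick consistency check, the Total row can be recomputed from the four per-corpus rows. The sketch below transcribes the counts from the table and checks that each Total cell equals the sum of its column. The names `COUNTS`, `TOTAL`, and `column_sums` are illustrative only and are not part of the original work. The check covers only the sentence, paragraph, and document counts. It does not cover the error rate, because that rate was computed on the concatenated data and the underlying token and error counts are not given here.

```python
# Counts transcribed from the table: (train, dev, test) for each alignment level.
COUNTS = {
    "Native Formal": {"sentences": (4060, 1952, 1684), "paragraphs": (1618, 859, 669), "docs": (227, 87, 76)},
    "Native Web Informal": {"sentences": (6977, 2465, 2166), "paragraphs": (3622, 1294, 1256), "docs": (3619, 1291, 1256)},
    "Romani": {"sentences": (24824, 1254, 1260), "paragraphs": (9723, 574, 561), "docs": (3247, 173, 169)},
    "Second Learners": {"sentences": (30812, 2807, 2797), "paragraphs": (8781, 865, 756), "docs": (2050, 167, 170)},
}

# The reported Total row, using the same (train, dev, test) layout.
TOTAL = {
    "sentences": (66673, 8478, 7907),
    "paragraphs": (23744, 3592, 3242),
    "docs": (9143, 1718, 1671),
}


def column_sums(counts):
    """Sum the train/dev/test counts over all corpora at each alignment level."""
    sums = {}
    for level in ("sentences", "paragraphs", "docs"):
        sums[level] = tuple(
            sum(row[level][split] for row in counts.values()) for split in range(3)
        )
    return sums


if __name__ == "__main__":
    sums = column_sums(COUNTS)
    for level, expected in TOTAL.items():
        status = "ok" if sums[level] == expected else "MISMATCH"
        print(f"{level}: {sums[level]} {status}")
```

Run as a script, it prints one line per alignment level. With the values above, every level is marked `ok`.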