Corpus statistics for the Wikinews unseen languages we use as an evaluation set.
Lang. | Docs | Mentions | Entities (distinct) | Entities ∉ EnWiki |
---|---|---|---|---|
ru | 1,625 | 20,698 | 8,832 | 1,838 |
it | 907 | 8,931 | 4,857 | 911 |
pl | 1,162 | 5,957 | 3,727 | 547 |
fr | 978 | 7,000 | 4,093 | 349 |
cs | 454 | 2,902 | 1,974 | 200 |
pt | 666 | 2,653 | 1,313 | 113 |
zh | 395 | 2,057 | 1,274 | 110 |
Total | 6,187 | 50,198 | 26,070 | 4,068 |
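The Total row is the column-wise sum of the seven per-language rows. The Python sketch below recomputes it from the table's figures as a consistency check. It also computes a derived figure: for each language, the share of distinct entities marked ∉ EnWiki. We assume ∉ EnWiki means the entity has no English Wikipedia entry. The helper names `column_totals` and `share_not_in_enwiki` are hypothetical and used only for illustration.

```python
# Per-language statistics copied from the table:
# (docs, mentions, distinct entities, entities not in EnWiki)
STATS = {
    "ru": (1625, 20698, 8832, 1838),
    "it": (907, 8931, 4857, 911),
    "pl": (1162, 5957, 3727, 547),
    "fr": (978, 7000, 4093, 349),
    "cs": (454, 2902, 1974, 200),
    "pt": (666, 2653, 1313, 113),
    "zh": (395, 2057, 1274, 110),
}


def column_totals(stats):
    """Sum each column across languages; the result should equal the Total row."""
    return tuple(sum(col) for col in zip(*stats.values()))


def share_not_in_enwiki(stats):
    """Fraction of each language's distinct entities marked as not in EnWiki.

    Assumption: "not in EnWiki" is read as "no English Wikipedia entry".
    """
    return {
        lang: not_en / distinct
        for lang, (_docs, _mentions, distinct, not_en) in stats.items()
    }


if __name__ == "__main__":
    print(column_totals(STATS))  # (6187, 50198, 26070, 4068)
    for lang, share in share_not_in_enwiki(STATS).items():
        print(f"{lang}: {share:.1%}")
```

Because the recomputed sums match the Total row, the Distinct column total counts entities per language. An entity that appears in several languages is therefore counted once for each language.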