Training languages from UD 2.3 (Nivre et al. 2018), with treebank name, language family, word order, and the sizes of the training and test sets.
Language | Code | Treebank | Family | Word Order | Train | Test |
---|---|---|---|---|---|---|
Arabic | ar | PADT | Afro-Asiatic, Semitic | VSO | 6.1k | 680 |
Basque | eu | BDT | Basque | SOV | 5.4k | 1,799 |
Chinese | zh | GSD | Sino-Tibetan | SVO | 4.0k | 500 |
English | en | EWT | IE, Germanic | SVO | 12.5k | 2,077 |
Finnish | fi | TDT | Uralic, Finnic | SVO | 12.2k | 1,555 |
Hebrew | he | HTB | Afro-Asiatic, Semitic | SVO | 5.2k | 491 |
Hindi | hi | HDTB | IE, Indic | SOV | 13.3k | 1,684 |
Italian | it | ISDT | IE, Romance | SVO | 13.1k | 482 |
Japanese | ja | GSD | Japanese | SOV | 7.1k | 551 |
Korean | ko | GSD | Korean | SOV | 4.4k | 989 |
Russian | ru | SynTagRus | IE, Slavic | SVO | 15k* | 6,491 |
Swedish | sv | Talbanken | IE, Germanic | SVO | 4.3k | 1,219 |
Turkish | tr | IMST | Turkic, Southwestern | SOV | 3.7k | 975 |