Basic corpus statistics for the standard HTB splits.
Statistic | train | dev | test |
---|---|---|---|
Sentences | 4,937 | 500 | 706 |
Tokens | 93,504 | 8,531 | 12,619 |
Morphemes | 127,031 | 11,301 | 16,828 |
All mentions | 6,282 | 499 | 932 |
Type: Person (Per) | 2,128 | 193 | 267 |
Type: Organization (Org) | 2,043 | 119 | 408 |
Type: Geo-Political (Gpe) | 1,377 | 121 | 195 |
Type: Location (Loc) | 331 | 28 | 41 |
Type: Facility (Fac) | 163 | 12 | 11 |
Type: Work-of-Art (Woa) | 114 | 9 | 6 |
Type: Event (Eve) | 57 | 12 | 0 |
Type: Product (Duc) | 36 | 2 | 3 |
Type: Language (Ang) | 33 | 3 | 1 |
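
The per-type rows should add up to the "All mentions" row in each split, and the morpheme and token rows give a rough measure of how many morphemes each token contains. The following is a minimal sketch of both checks, using only the counts from the table; the names `CORPUS`, `MENTIONS_BY_TYPE`, `type_totals`, and `morphemes_per_token` are hypothetical and chosen for illustration.

```python
# Counts copied from the table above; each tuple is ordered (train, dev, test).
SPLITS = ("train", "dev", "test")

CORPUS = {
    "sentences": (4937, 500, 706),
    "tokens": (93504, 8531, 12619),
    "morphemes": (127031, 11301, 16828),
    "all_mentions": (6282, 499, 932),
}

MENTIONS_BY_TYPE = {
    "Per": (2128, 193, 267),
    "Org": (2043, 119, 408),
    "Gpe": (1377, 121, 195),
    "Loc": (331, 28, 41),
    "Fac": (163, 12, 11),
    "Woa": (114, 9, 6),
    "Eve": (57, 12, 0),
    "Duc": (36, 2, 3),
    "Ang": (33, 3, 1),
}


def type_totals():
    """Sum the per-type mention counts column by column (one sum per split)."""
    return tuple(sum(column) for column in zip(*MENTIONS_BY_TYPE.values()))


def morphemes_per_token():
    """Return the morpheme-to-token ratio for each split."""
    return {
        split: morphemes / tokens
        for split, morphemes, tokens in zip(
            SPLITS, CORPUS["morphemes"], CORPUS["tokens"]
        )
    }


if __name__ == "__main__":
    # The per-type counts should match the "All mentions" row exactly.
    assert type_totals() == CORPUS["all_mentions"]
    for split, ratio in morphemes_per_token().items():
        print(f"{split}: {ratio:.2f} morphemes per token")
```

With the counts as given, the per-type sums match the "All mentions" row in all three splits. Note that the test split contains no Event mentions, so any per-type score for that category on test is undefined.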