Table 2: Basic corpus statistics. Standard Hebrew Treebank (HTB) splits.

                             train     dev    test
Sentences                    4,937     500     706
Tokens                      93,504   8,531  12,619
Morphemes                  127,031  11,301  16,828

All mentions                 6,282     499     932

Type: Person (Per)           2,128     193     267
Type: Organization (Org)     2,043     119     408
Type: Geo-Political (Gpe)    1,377     121     195
Type: Location (Loc)           331      28      41
Type: Facility (Fac)           163      12      11
Type: Work-of-Art (Woa)        114
Type: Event (Eve)               57      12
Type: Product (Duc)             36
Type: Language (Ang)            33
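The train column can be checked for internal consistency. Its nine per-type counts add up to the All mentions row, and dividing the Morphemes row by the Tokens row gives the average number of morphemes per token in each split. The following is a minimal Python sketch. The numbers are transcribed by hand from the table, and the variable names are illustrative. It assumes that "Tokens" counts space-delimited tokens and "Morphemes" counts their morphological segments.

```python
# Counts transcribed by hand from Table 2.
# Per-type mention counts for the train split (the only column with a value in every type row).
train_type_counts = {
    "Per": 2128, "Org": 2043, "Gpe": 1377, "Loc": 331, "Fac": 163,
    "Woa": 114, "Eve": 57, "Duc": 36, "Ang": 33,
}
TRAIN_ALL_MENTIONS = 6282

# (tokens, morphemes) per split, from the Tokens and Morphemes rows.
split_sizes = {
    "train": (93504, 127031),
    "dev": (8531, 11301),
    "test": (12619, 16828),
}

# Consistency check: the per-type rows should sum to the "All mentions" row.
type_total = sum(train_type_counts.values())
assert type_total == TRAIN_ALL_MENTIONS, type_total

# Average number of morphemes per token in each split.
ratios = {
    split: morphemes / tokens
    for split, (tokens, morphemes) in split_sizes.items()
}

for split, ratio in ratios.items():
    print(f"{split}: {ratio:.2f} morphemes per token")
```

Only the train column is summed here, because several dev and test cells in the lower type rows have no value.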