Table 1

Statistics on extracted word classes for English (Sections 2–21 of the Penn WSJ treebank) and Chinese (articles 1–270 and 400–1151 of the Penn Chinese treebank).


           Corpus totals        Begin class          End class            Unary class
           Strings   Words      B         not B      E         not E      U         not U
English
  Count    39,832    950,028    430,841   439,558    223,544   646,855    105,973   844,055
  Percent                       49.5      50.5       25.7      74.3       11.2      88.8
Chinese
  Count    18,086    493,708    188,612   269,000    165,591   292,021    196,732   296,976
  Percent                       41.2      58.8       36.2      63.8       39.9      60.1
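
The Percent rows can be related to the Count rows with a few lines of arithmetic. The sketch below is illustrative only: it assumes that the second column of each class pair is the complement of the class (words not in B, E, or U), and that each percentage is a count's share of its own pair rather than of the corpus word total. The names `COUNTS`, `class_percentages`, and `PERCENTS` are hypothetical and do not come from the source.

```python
# Counts copied from Table 1 as (in class, not in class).
# Treating the second number as the complement is an assumption.
COUNTS = {
    "English": {
        "B": (430_841, 439_558),
        "E": (223_544, 646_855),
        "U": (105_973, 844_055),
    },
    "Chinese": {
        "B": (188_612, 269_000),
        "E": (165_591, 292_021),
        "U": (196_732, 296_976),
    },
}


def class_percentages(in_class, out_of_class):
    """Return each count's share of the pair total, rounded to one decimal."""
    total = in_class + out_of_class
    return (
        round(100 * in_class / total, 1),
        round(100 * out_of_class / total, 1),
    )


# Recompute the Percent rows for every corpus and word class.
PERCENTS = {
    corpus: {cls: class_percentages(*pair) for cls, pair in classes.items()}
    for corpus, classes in COUNTS.items()
}

for corpus, classes in PERCENTS.items():
    for cls, (pct_in, pct_out) in classes.items():
        print(f"{corpus} {cls}: {pct_in} / {pct_out}")
```

Under these assumptions the recomputed figures agree with the printed Percent rows to within 0.1 percentage point. In both corpora the Unary pair sums to the Words total, while the Begin and End pairs sum to a smaller, shared total.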
