Impact of global term selection (GTS) criteria on the different text types in the training set (80% of the corpus).
Text type | Total # of terms | # of terms selected in GTS | % of terms removed in GTS |
---|---|---|---|
unigram | 142,396 | 58,423 | 58.97 |
bigram | 3,119,422 | 1,115,170 | 64.25 |
stanford | 7,430,397 | 1,618,478 | 78.22 |
AEGIR | 5,096,918 | 1,312,715 | 74.24 |
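
The last column can be rechecked from the two count columns, assuming that "% of terms removed" means the share of the total terms that GTS did not select. The sketch below is only an illustration of that arithmetic; the names `counts` and `percent_removed` are hypothetical and do not come from the original work.

```python
# Counts copied from the table: (total terms, terms selected in GTS).
counts = {
    "unigram": (142_396, 58_423),
    "bigram": (3_119_422, 1_115_170),
    "stanford": (7_430_397, 1_618_478),
    "AEGIR": (5_096_918, 1_312_715),
}


def percent_removed(total: int, selected: int) -> float:
    """Share of terms dropped by GTS, as a percentage of the total.

    Assumes removed = total - selected, which is an interpretation of
    the column header rather than a definition given in the source.
    """
    return 100.0 * (total - selected) / total


for text_type, (total, selected) in counts.items():
    # Print each text type with its removal rate to two decimals.
    print(f"{text_type:<9}{percent_removed(total, selected):6.2f}")
```

Under this reading, the computed values agree with the reported percentages to two decimal places.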