Table 1

Sizes of the data sets used for the experiments in this article: the training set, the development set (dev), the incremental test set (devtest), and the blind test set (test). The dev and devtest sets are a split of the NIST08 Urdu–English test set; the blind test set is NIST09.



                  Urdu            English
set       lines   tokens  types   tokens  types
training  202k    1.7M    56k     1.7M    51k
dev       981     21k     4k      19k     4k
devtest   883     22k     4k      19–20k  4k
test      1,792   42k     6k      38–41k  5k

