Table 6: Comparison of the L2R order with SAO in running time, where b/s denotes batches per second and ms/s denotes milliseconds per sentence. All experiments are conducted on 8 NVIDIA V100 GPUs with 2,000 tokens per GPU. For SAO, we also compare beam sizes of 1 and 8 when searching for the best orders during training. Decoding speed for all three models is reported with greedy decoding.
Model        Training (b/s)   Decoding (ms/s)
L2R          4.21             12.3
SAO (b = 1)  1.12             12.5
SAO (b = 8)  0.58             12.8
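The two units in Table 6 can be computed directly from wall-clock measurements. Below is a minimal sketch of how such throughput and latency figures are derived; the function names are illustrative, not from the paper's codebase.

```python
def batches_per_second(num_batches, elapsed_seconds):
    # Training throughput (b/s): batches processed per second of wall-clock time.
    return num_batches / elapsed_seconds

def ms_per_sentence(total_decode_seconds, num_sentences):
    # Decoding latency (ms/s): average milliseconds spent decoding one sentence.
    return total_decode_seconds * 1000.0 / num_sentences

# For example, the L2R row (4.21 b/s) corresponds to roughly 0.24 s per batch,
# and 12.3 ms/s means each sentence takes about 12.3 ms to decode greedily.
```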