Running-time comparison of the L2R order and SAO, where b/s denotes batches per second and ms/s denotes milliseconds per sentence. All experiments are conducted on 8 NVIDIA V100 GPUs with 2000 tokens per GPU. For SAO, we also compare beam sizes of 1 and 8 when searching for the best orders during training. Decoding speed is reported for all three models using greedy decoding.