Left: Histogram of BLEU scores showing wide variance in performance for a base NMT system (Transformer) trained with different hyperparameters (e.g., number of BPE operations, number of layers, initial learning rate). Right: Scatterplot of BLEU against decoding time for the same hyperparameter configurations. Gold stars mark the Pareto-optimal systems.
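
To make "Pareto-optimal" concrete: a system is on the Pareto front if no other system is at least as good on both objectives (higher BLEU, lower decoding time) and strictly better on one. The following is a minimal sketch of that check, not the authors' procedure; the function name `pareto_optimal` and the (BLEU, decoding time) values are hypothetical and invented purely for illustration.

```python
from typing import List, Tuple

# Each system is a (bleu, decoding_time) pair: higher BLEU is better,
# lower decoding time is better.
System = Tuple[float, float]


def pareto_optimal(systems: List[System]) -> List[System]:
    """Return the systems that no other system dominates.

    System j dominates system i if j has BLEU >= i's and decoding time
    <= i's, with at least one of the two comparisons strict.
    """
    front = []
    for i, (bleu_i, time_i) in enumerate(systems):
        dominated = any(
            bleu_j >= bleu_i
            and time_j <= time_i
            and (bleu_j > bleu_i or time_j < time_i)
            for j, (bleu_j, time_j) in enumerate(systems)
            if j != i
        )
        if not dominated:
            front.append((bleu_i, time_i))
    return front


# Hypothetical (BLEU, decoding time in seconds) values, NOT taken from the figure.
systems = [
    (20.0, 100.0),
    (25.0, 200.0),
    (22.0, 250.0),  # dominated by (25.0, 200.0)
    (25.0, 300.0),  # same BLEU as (25.0, 200.0) but slower
    (18.0, 90.0),
]
print(pareto_optimal(systems))
```

This pairwise check is quadratic in the number of systems, which is fine for a hyperparameter sweep of a few hundred configurations; sorting by one objective first would be the usual choice for larger sets.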