Results for each combination of recurrent unit and attention type. All numbers are medians over 100 initializations. The three attention conditions are no attention, location-based attention, and content-based attention. A grayed-out cell indicates that the architecture scored below 50% on the test set. In (b), the SRN produced the first auxiliary 45% of the time; for all other models, the proportion of first-auxiliary outputs is almost exactly one minus the first-word accuracy, where first-word accuracy is the proportion of main-auxiliary outputs.