MIT Press

Figure 4: Results for each combination of recurrent unit and attention type. All numbers are medians over 100 initializations. The three attention types are no attention, location-based attention, and content-based attention. A grayed-out cell indicates that the architecture scored below 50% on the test set. In (b), the SRN produced the first auxiliary 45% of the time; for all other models, the proportion of first-auxiliary outputs is almost exactly one minus the first-word accuracy (first-word accuracy being the proportion of main-auxiliary outputs).
