The new Transformer architecture with the proposed synchronous bidirectional multi-head attention network (SBAtt). The input to the decoder is the concatenation of the forward (L2R) sequence and the backward (R2L) sequence. Note that the two directional information flows in the decoder run in parallel and interact only in the synchronous bidirectional attention layer.
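The sketch below is not the authors' implementation; it is a minimal illustration, under assumed names (SBAttSketch, fuse_weight), of the flow the caption describes: the L2R and R2L target sequences are concatenated at the decoder input, processed in parallel, and the two streams exchange information only inside the bidirectional attention layer.

```python
# Minimal sketch of synchronous bidirectional attention (illustrative only;
# module names and the additive fusion rule are assumptions, not the paper's code).
import torch
import torch.nn as nn


class SBAttSketch(nn.Module):
    """Each direction attends to its own history and to the other direction's
    history; the two results are fused so the streams interact only here."""

    def __init__(self, d_model: int, n_heads: int, fuse_weight: float = 0.5):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_dir_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.fuse_weight = fuse_weight  # assumed interpolation weight between the two flows

    def forward(self, own: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
        history, _ = self.self_attn(own, own, own)          # intra-direction attention
        future, _ = self.cross_dir_attn(own, other, other)  # inter-direction attention
        return history + self.fuse_weight * future          # simple additive fusion


batch, tgt_len, d_model = 2, 5, 16
l2r = torch.randn(batch, tgt_len, d_model)   # forward (L2R) target embeddings
r2l = torch.randn(batch, tgt_len, d_model)   # backward (R2L) target embeddings

# Decoder input: concatenate both directions so they are processed in parallel,
# then split the batch back into the two streams for the attention layer.
decoder_input = torch.cat([l2r, r2l], dim=0)
l2r_in, r2l_in = decoder_input.chunk(2, dim=0)

sb_att = SBAttSketch(d_model=d_model, n_heads=4)
l2r_out = sb_att(l2r_in, r2l_in)   # L2R stream, informed by the R2L stream
r2l_out = sb_att(r2l_in, l2r_in)   # R2L stream, informed by the L2R stream
print(l2r_out.shape, r2l_out.shape)  # torch.Size([2, 5, 16]) for each stream
```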