The new Transformer architecture with the proposed synchronous bidirectional multi-head attention network (SBAtt). The input to the decoder is the concatenation of the forward (L2R) sequence and the backward (R2L) sequence. Note that the two directional information flows in the decoder run in parallel and interact only in the synchronous bidirectional attention layer.
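The sketch below is not the authors' implementation; it is a minimal illustration, under assumed names (SBAttSketch, fuse_weight), of the flow the caption describes: the L2R and R2L target sequences are concatenated at the decoder input, processed in parallel, and the two streams exchange information only inside the bidirectional attention layer.

```python
# Minimal sketch of synchronous bidirectional attention (illustrative only;
# module names and the additive fusion rule are assumptions, not the paper's code).
import torch
import torch.nn as nn


class SBAttSketch(nn.Module):
    """Each direction attends to its own history and to the other direction's
    history; the two results are fused so the streams interact only here."""

    def __init__(self, d_model: int, n_heads: int, fuse_weight: float = 0.5):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_dir_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.fuse_weight = fuse_weight  # assumed interpolation weight between the two flows

    def forward(self, own: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
        history, _ = self.self_attn(own, own, own)          # intra-direction attention
        future, _ = self.cross_dir_attn(own, other, other)  # inter-direction attention
        return history + self.fuse_weight * future          # simple additive fusion


batch, tgt_len, d_model = 2, 5, 16
l2r = torch.randn(batch, tgt_len, d_model)   # forward (L2R) target embeddings
r2l = torch.randn(batch, tgt_len, d_model)   # backward (R2L) target embeddings

# Decoder input: concatenate both directions so they are processed in parallel,
# then split the batch back into the two streams for the attention layer.
decoder_input = torch.cat([l2r, r2l], dim=0)
l2r_in, r2l_in = decoder_input.chunk(2, dim=0)

sb_att = SBAttSketch(d_model=d_model, n_heads=4)
l2r_out = sb_att(l2r_in, r2l_in)   # L2R stream, informed by the R2L stream
r2l_out = sb_att(r2l_in, l2r_in)   # R2L stream, informed by the L2R stream
print(l2r_out.shape, r2l_out.shape)  # torch.Size([2, 5, 16]) for each stream
```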