MIT Press

Figure 2: The self-attentive encoder of the Transformer (Vaswani et al., 2017), which stacks m identical layers.
