MIT Press

Figure 1:

Illustrations of the three PERL steps. PRD and PLR stand for the BERT prediction head and pooler head, respectively, FC is a fully connected layer, and msk stands for masked tokens embeddings (embeddings of tokens that were masked). NSP and MLM are the next sentence prediction and masked language model objectives. For the definitions of the PRD and PRL layers as well as the NSP objective, see Devlin et al. (2019). We mark frozen layers (layers whose parameters are kept fixed) and non-frozen layers with snow-flake and fire symbols, respectively. The token embedding and BERT layers values at the end of each step initialize the corresponding layers of the next step model. The BERT box of the fine tuning step is described in more details in Figure 2.

This Feature Is Available To Subscribers Only