The contextual hypernym prediction model is based on BERT (Devlin et al., 2019). The input sentences c_t and c_w are tokenized, prepended with a [CLS] token, and separated by a [SEP] token. The target word t in the first sentence, c_t, and the related word w in the second sentence, c_w, are surrounded by < and > tokens. The class label (hypernym or not) is predicted by feeding the output representation of the [CLS] token through a fully-connected layer followed by a softmax layer.
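
To make the architecture concrete, the following is a minimal sketch of the described setup, assuming the Hugging Face transformers library and a bert-base-uncased checkpoint (the paper does not specify an implementation); the example sentences, the `mark` helper, and the head dimensions are illustrative only.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")

def mark(sentence: str, word: str) -> str:
    """Surround the first occurrence of `word` with < and > markers."""
    return sentence.replace(word, f"< {word} >", 1)

# Illustrative contexts c_t and c_w for target word t and candidate hypernym w.
c_t = mark("The spaniel chased the ball across the yard.", "spaniel")
c_w = mark("Every dog needs regular exercise.", "dog")

# The tokenizer prepends [CLS] and joins the sentence pair with [SEP].
inputs = tokenizer(c_t, c_w, return_tensors="pt")

class HypernymClassifier(nn.Module):
    def __init__(self, encoder, hidden_size=768, num_classes=2):
        super().__init__()
        self.encoder = encoder
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, **inputs):
        # Output representation of the [CLS] token (position 0).
        cls_repr = self.encoder(**inputs).last_hidden_state[:, 0]
        # Fully-connected layer followed by softmax over the two classes.
        return torch.softmax(self.fc(cls_repr), dim=-1)

model = HypernymClassifier(encoder)
probs = model(**inputs)  # [P(not hypernym), P(hypernym)]
```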