Abstract

Word embeddings are vectorial semantic representations built with either counting or predicting techniques aimed at capturing shades of meaning from word co-occurrences. Since their introduction, these representations have been criticised for lacking interpretable dimensions. This property of word embeddings limits our understanding of the semantic features they actually encode. Moreover, it contributes to the “black box” nature of the tasks in which they are used, since the reasons for the performance of word embeddings often remain opaque to humans. In this contribution, we explore the semantic properties encoded in word embeddings by mapping them onto interpretable vectors consisting of explicit and neurobiologically motivated semantic features (Binder et al. 2016). Our exploration takes into account different types of embeddings, including factorized count vectors and predict models (e.g., Skip-Gram and GloVe), as well as the most recent contextualized representations (i.e., ELMo and BERT).

In our analysis, we first evaluate the quality of the mapping in a retrieval task, then we shed light on the semantic features that are better encoded in each embedding type. Finally, we set up a large number of probing tasks to assess how the original and the mapped embeddings perform in discriminating semantic categories. For each probing task, we identify the most relevant semantic features and show that the performance of the embeddings correlates with how they encode those features. This study is a step forward in understanding which aspects of meaning are captured by vector spaces, proposing a new and simple method to carve human-interpretable semantic representations from distributional vectors.
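As a rough illustration of the mapping-plus-retrieval setup described above, the sketch below fits a map from embeddings to feature vectors and scores it by nearest-neighbour retrieval. It rests on several assumptions that are not taken from the abstract: the map is a ridge-regularised linear regression (the abstract does not name the mapping method), the data are synthetic, the sizes are toy placeholders, and the names `fit_linear_map` and `retrieval_accuracy` are hypothetical.

```python
import numpy as np


def fit_linear_map(X, Y, ridge=1e-3):
    """Closed-form ridge regression: W minimises ||XW - Y||^2 + ridge * ||W||^2.

    Assumption: a linear map stands in for whatever mapping the paper uses.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ Y)


def retrieval_accuracy(Y_pred, Y_gold, k=1):
    """Fraction of items whose own gold vector is among the k gold vectors
    most similar (by cosine) to the predicted vector."""
    P = Y_pred / np.linalg.norm(Y_pred, axis=1, keepdims=True)
    G = Y_gold / np.linalg.norm(Y_gold, axis=1, keepdims=True)
    sims = P @ G.T                                # (n_items, n_items) cosine matrix
    topk = np.argsort(-sims, axis=1)[:, :k]       # indices of the k best candidates
    hits = (topk == np.arange(len(G))[:, None]).any(axis=1)
    return float(hits.mean())


# Synthetic stand-ins (NOT real data): X plays the role of word embeddings,
# Y the role of interpretable feature vectors for the same words.
rng = np.random.default_rng(0)
n_words, emb_dim, n_feats = 200, 50, 65           # toy sizes chosen for the demo
X = rng.normal(size=(n_words, emb_dim))
true_W = rng.normal(size=(emb_dim, n_feats))
Y = X @ true_W + 0.01 * rng.normal(size=(n_words, n_feats))

# Fit on the first 150 words, evaluate retrieval on the held-out 50.
train, test = slice(0, 150), slice(150, 200)
W = fit_linear_map(X[train], Y[train])
acc = retrieval_accuracy(X[test] @ W, Y[test], k=1)
print(f"top-1 retrieval accuracy on held-out words: {acc:.2f}")
```

Because the synthetic features are generated as a near-exact linear function of the embeddings, the held-out retrieval score here only shows that the pipeline works; it says nothing about how real embeddings behave.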


Author notes

* Department of Chinese and Bilingual Studies, 11 Yuk Choi Road, Hung Hom, Kowloon, Hong Kong. E-mail: emmanuele.chersoni@polyu.edu.hk

** MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar Street, Cambridge, MA 02139, United States. E-mail: esantus@mit.edu

Department of Chinese and Bilingual Studies, 11 Yuk Choi Road, Hung Hom, Kowloon, Hong Kong. E-mail: churen.huang@polyu.edu.hk

Department of Philology, Literature and Linguistics, Via Santa Maria 36, 56126 Pisa, Italy. E-mail: alessandro.lenci@unipi.it

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits you to copy and redistribute in any medium or format, for non-commercial use only, provided that the original work is not remixed, transformed, or built upon, and the appropriate credit to the original source is given. For a full description of the license, please visit https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode.
