Topic analysis studies topic evolution and trends to help researchers understand how knowledge evolves and is created. This paper develops a novel topic evolution analysis framework, which we use to demonstrate, forecast, and explain topic evolution from the perspective of the geometrical motion of topic embeddings generated by pretrained language models. Our data set comprises approximately 15 million papers in the computer science field, with 7,000 “fields of study” representing the topics. First, we demonstrate that over 80% of topics have undergone obvious motion in the semantic vector space, based on the hyperplane and its normal vector generated by a support vector machine. Second, we verify that this motion is predictable by using three vector regression models to predict topic embeddings. Finally, we employ a decoder to explain the predicted motion; the forecast embeddings capture about 50% of unseen topics. Our research framework shows that topic evolution can be analyzed via the geometrical motion of topic embeddings, and that the semantic motion of old topics nurtures new topics. The study opens new research pathways in topic analysis and sheds light on the topic evolution mechanism from a novel geometric perspective.
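To make the first step concrete, the following is a minimal sketch of how a separating hyperplane and its normal vector can indicate motion of a topic between two time periods. It is an illustration under stated assumptions, not the paper's implementation: the embeddings are synthetic, the drift vector `shift` is hypothetical, and a small linear SVM trained by subgradient descent (`fit_linear_svm`, a name introduced here) stands in for whatever SVM configuration the authors used. If the early and late embeddings of a topic are linearly separable, the topic has moved, and the hyperplane normal approximates the direction of that motion.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM (hinge loss + L2 penalty) via full-batch subgradient descent.

    X is a list of vectors, y holds labels in {-1, +1}. Returns (w, b), where
    w is the normal vector of the separating hyperplane.
    """
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1.0:  # point violates the margin
                for j in range(dim):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def sample_embeddings(rng, center, n, noise=0.3):
    """Draw n synthetic embeddings scattered around a center (assumption)."""
    return [[c + rng.gauss(0.0, noise) for c in center] for _ in range(n)]


rng = random.Random(0)
dim = 8
shift = [2.0, 1.0] + [0.0] * (dim - 2)  # hypothetical drift of the topic

early = sample_embeddings(rng, [0.0] * dim, 100)  # papers from the early period
late = sample_embeddings(rng, shift, 100)         # papers from the late period

X = early + late
y = [-1] * len(early) + [1] * len(late)
w, b = fit_linear_svm(X, y)

# Fraction of papers on the correct side of the hyperplane: high values mean
# the two periods occupy distinguishable regions, i.e. the topic has moved.
separability = sum(1 for xi, yi in zip(X, y) if yi * (dot(w, xi) + b) > 0) / len(X)

# Cosine similarity between the hyperplane normal and the true drift.
alignment = dot(w, shift) / (math.sqrt(dot(w, w)) * math.sqrt(dot(shift, shift)))

print(f"separability={separability:.2f}, alignment={alignment:.2f}")
```

In practice one would measure separability on held-out papers rather than on the training set, and the threshold for calling a motion "obvious" is a choice that this sketch does not take from the paper.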

Author notes

Handling Editor: Li Tang

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. For a full description of the license, please visit https://creativecommons.org/licenses/by/4.0/legalcode.