Tim C. E. Engels
Journal Articles
Publisher: Journals Gateway
Quantitative Science Studies (2021) 2 (1): 65–88.
Published: 08 April 2021
Despite the centrality of disciplinary classifications in bibliometric analyses, it is not well known how the choice of disciplinary classification influences bibliometric representations of research in the social sciences and humanities (SSH). This is especially crucial when using data from national databases. Therefore, we examine the differences in the disciplinary profile of an article along with the absolute and relative number of articles across disciplines using five disciplinary classifications for journals. We use data on journal articles (2006–2015) from the national bibliographic databases VABB-SHW in Flanders (Belgium) and Cristin in Norway. Our study is based on pairwise comparisons of the local classifications used in these databases, the Web of Science subject categories, the Science-Metrix, and the ERIH PLUS journal classifications. For comparability, all classifications are mapped to the OECD Fields of Research and Development classification. The findings show that the choice of disciplinary classification can lead to over- or underestimation of the absolute number of publications per discipline. In contrast, if the focus is on the relative numbers, the choice of classification has practically no influence. These findings facilitate an informed choice of a disciplinary classification for journals in SSH when using data from national databases.
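The contrast between absolute and relative numbers of articles per discipline can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the journal names, the two classifications, the field labels, and the use of whole counting (an article counts once in every field assigned to its journal) are hypothetical and are not taken from the study's data or method.

```python
from collections import Counter

# Hypothetical journal-to-field assignments under two classifications,
# assumed to be already mapped to OECD FORD-style fields (illustrative only).
CLASSIFICATION_A = {
    "Journal X": ["Sociology"],
    "Journal Y": ["Sociology", "Law"],
    "Journal Z": ["History"],
}
CLASSIFICATION_B = {
    "Journal X": ["Sociology"],
    "Journal Y": ["Law"],
    "Journal Z": ["History"],
}

# Hypothetical article records: each entry is the journal of one article.
ARTICLES = ["Journal X", "Journal X", "Journal Y", "Journal Z"]


def absolute_counts(articles, classification):
    """Whole counting (an assumption): one count per field of the journal."""
    counts = Counter()
    for journal in articles:
        for field in classification.get(journal, []):
            counts[field] += 1
    return counts


def relative_shares(counts):
    """Share of each field in the total number of field assignments."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {field: n / total for field, n in counts.items()}


counts_a = absolute_counts(ARTICLES, CLASSIFICATION_A)
counts_b = absolute_counts(ARTICLES, CLASSIFICATION_B)
shares_a = relative_shares(counts_a)
shares_b = relative_shares(counts_b)

for field in sorted(set(counts_a) | set(counts_b)):
    print(field, counts_a[field], counts_b[field],
          round(shares_a.get(field, 0.0), 2), round(shares_b.get(field, 0.0), 2))
```

A pairwise comparison of this kind, repeated over each pair of classifications, is one way to see where absolute counts diverge while shares stay close; the toy numbers above are not meant to reproduce the reported findings.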
Includes: Supplementary data
Journal Articles
Publisher: Journals Gateway
Quantitative Science Studies (2021) 2 (1): 89–110.
Published: 08 April 2021
We compare two supervised machine learning algorithms, Multinomial Naïve Bayes and Gradient Boosting, to classify social science articles using textual data. The high level of granularity of the classification scheme used and the possibility that multiple categories are assigned to a document make this task challenging. To collect the training data, we query three discipline-specific thesauri to retrieve articles corresponding to specialties in the classification. The resulting data set consists of 113,909 records and covers 245 specialties, aggregated into 31 subdisciplines from three disciplines. Experts were consulted to validate the thesauri-based classification. The resulting multilabel data set is used to train the machine learning algorithms in different configurations. We deploy a multilabel classifier chaining model, allowing for an arbitrary number of categories to be assigned to each document. The best results are obtained with Gradient Boosting. The approach does not rely on citation data, so it can be applied in settings where such information is not available. We conclude that fine-grained text-based classification of social sciences publications at a subdisciplinary level is a hard task, for humans and machines alike. A combination of human expertise and machine learning is suggested as a way forward to improve the classification of social sciences documents.
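The idea of a multilabel classifier chain can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the class names `TinyMultinomialNB` and `ClassifierChain`, the two labels, and the four training documents are all hypothetical, and only a Laplace-smoothed multinomial Naive Bayes base learner is shown (Gradient Boosting is omitted). Each link in the chain sees the document tokens plus the decisions made for the earlier labels, so any number of labels can be switched on for one document.

```python
import math
from collections import Counter


class TinyMultinomialNB:
    """Binary multinomial Naive Bayes over token lists (Laplace smoothing)."""

    def fit(self, docs, labels):
        self.vocab = {tok for doc in docs for tok in doc}
        self.token_counts = {0: Counter(), 1: Counter()}
        self.class_counts = Counter(labels)
        for doc, y in zip(docs, labels):
            self.token_counts[y].update(doc)
        return self

    def predict_one(self, doc):
        n_docs = sum(self.class_counts.values())
        scores = {}
        for y in (0, 1):
            # Smoothed log prior stays finite even if a class was never seen.
            score = math.log((self.class_counts[y] + 1) / (n_docs + 2))
            total = sum(self.token_counts[y].values()) + len(self.vocab)
            for tok in doc:
                if tok in self.vocab:  # tokens unseen in training are ignored
                    score += math.log((self.token_counts[y][tok] + 1) / total)
            scores[y] = score
        return 1 if scores[1] > scores[0] else 0


class ClassifierChain:
    """One binary model per label; later models also see earlier labels."""

    def __init__(self, n_labels):
        self.models = [TinyMultinomialNB() for _ in range(n_labels)]

    @staticmethod
    def _augment(doc, previous):
        # Earlier label decisions are appended as extra pseudo-tokens.
        return list(doc) + [f"__label{k}={v}" for k, v in enumerate(previous)]

    def fit(self, docs, label_rows):
        for j, model in enumerate(self.models):
            # Training uses the true values of the earlier labels.
            augmented = [self._augment(d, row[:j])
                         for d, row in zip(docs, label_rows)]
            model.fit(augmented, [row[j] for row in label_rows])
        return self

    def predict_one(self, doc):
        predicted = []
        for model in self.models:
            # Prediction feeds each model the chain's own earlier outputs.
            predicted.append(model.predict_one(self._augment(doc, predicted)))
        return predicted


# Hypothetical toy data: label 0 = "politics", label 1 = "sociology".
DOCS = [
    ["voting", "election", "party"],
    ["family", "inequality", "class"],
    ["election", "inequality", "party"],
    ["family", "class", "mobility"],
]
LABELS = [[1, 0], [0, 1], [1, 1], [0, 1]]

chain = ClassifierChain(n_labels=2).fit(DOCS, LABELS)
print(chain.predict_one(["election", "party"]))
print(chain.predict_one(["family", "class"]))
```

The order of labels in the chain is a design choice: a different order changes which label dependencies the later models can exploit, which is one reason such models are often trained in several configurations.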