The main objective of this study is to compare the amount of metadata and the completeness degree of research publications in new academic databases. Using a quantitative approach, we selected a random Crossref sample of more than 115k records, which was then searched in seven databases (Dimensions, Google Scholar, Microsoft Academic, OpenAlex, Scilit, Semantic Scholar, and The Lens). Seven characteristics were analyzed (abstract, access, bibliographic info, document type, publication date, language, and identifiers), to observe fields that describe this information, the completeness rate of these fields, and the agreement among databases. The results show that academic search engines (Google Scholar, Microsoft Academic, and Semantic Scholar) gather less information and have a low degree of completeness. Conversely, third-party databases (Dimensions, OpenAlex, Scilit, and The Lens) have more metadata quality and a higher completeness rate. We conclude that academic search engines lack the ability to retrieve reliable descriptive data by crawling the Web, while the main problem of third-party databases is the loss of information derived from integrating different sources.

Peer Review

https://www.webofscience.com/api/gateway/wos/peer-review/10.1162/qss_a_00286

This content is only available as a PDF.

Author notes

Handling Editor: Vincent Larivière

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. For a full description of the license, please visit https://creativecommons.org/licenses/by/4.0/legalcode.

Article PDF first page preview

Article PDF first page preview