Abstract
Author self-citations are a somewhat controversial phenomenon. Some scholars maintain they are a normal, even indispensable, part of scientific referencing practice, while others claim they are frequently an expression of vanity and self-promotion. Citations are the basic data for citation network clustering, an important approach to creating bottom-up, data-driven, global taxonomic systems of research publications. The topical information content of self-citations is therefore of particular interest in this context. Since it is not yet known how author self-citations affect such systems, we study their influence on the topic quality of cluster solutions in a citation network by experimentally re-weighting self-citation edges, increasing or decreasing their link weights according to self-citation status and strength. As a case study, we investigate data on the field of astronomy and astrophysics. We assess the effects of these self-citation manipulations by evaluating the quality of the resulting clustering solutions against diverse external data containing meaningful topic-structural information, namely topical journal special issues, co-usage data from scientific literature database search logs, grant funding data, and intellectual paper-level classification assignments. We find that we can reliably improve clustering solution quality by emphasizing self-citation link weights.
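To make the re-weighting idea concrete, the following is a minimal sketch rather than the procedure used in the study. It assumes that a citation counts as a self-citation when the citing and cited papers share at least one author, and that self-citation strength is the Jaccard overlap of the two author sets. The function name `reweight_self_citations`, the `factor` parameter, and the linear interpolation between the baseline weight 1 and `factor` are all hypothetical choices made for illustration.

```python
def reweight_self_citations(edges, authors, factor):
    """Return {(citing, cited): weight} with self-citation edges rescaled.

    edges   : iterable of (citing_id, cited_id) pairs
    authors : dict mapping paper id -> set of author identifiers
    factor  : weight given to a full-overlap self-citation
              (> 1 emphasizes self-citations, < 1 de-emphasizes them)
    """
    weights = {}
    for citing, cited in edges:
        a, b = authors[citing], authors[cited]
        shared = a & b
        if shared:
            # Assumed strength measure: Jaccard overlap of the author sets.
            strength = len(shared) / len(a | b)
            # Interpolate linearly between 1 (no overlap) and factor (full overlap).
            weights[(citing, cited)] = 1.0 + (factor - 1.0) * strength
        else:
            # Ordinary citations keep the baseline weight.
            weights[(citing, cited)] = 1.0
    return weights


# Toy data: p1 -> p2 is a self-citation (shared author "A"), p1 -> p3 is not.
toy_authors = {"p1": {"A", "B"}, "p2": {"A"}, "p3": {"C"}}
toy_edges = [("p1", "p2"), ("p1", "p3")]
emphasized = reweight_self_citations(toy_edges, toy_authors, factor=2.0)
dampened = reweight_self_citations(toy_edges, toy_authors, factor=0.5)
```

The resulting weights could then be passed to any clustering routine that accepts weighted edges, so that the same network can be clustered under different self-citation settings and the solutions compared against external topic data.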
Author notes
Handling Editor: Vincent Larivière