In this article, we show and discuss the results of a quantitative and qualitative analysis of open citations of retracted publications in the humanities domain. Our study was conducted by selecting retracted papers in the humanities domain and marking their main characteristics (e.g., retraction reason). Then, we gathered the citing entities and annotated their basic metadata (e.g., title, venue, subject) and the characteristics of their in-text citations (e.g., intent, sentiment). Using these data, we performed a quantitative and qualitative study of retractions in the humanities, presenting descriptive statistics and a topic modeling analysis of the citing entities’ abstracts and the in-text citation contexts. Among our main findings, we noticed that there was no drop in the overall number of citations after the year of retraction, and only a few citing entities either mentioned the retraction or expressed a negative sentiment toward the cited publication. In addition, on several occasions, we noticed a higher concern/awareness about citing a retracted publication among citing entities belonging to the health sciences domain, compared with the humanities and social science domains. Philosophy, arts, and history are the humanities areas that showed higher concern toward the retraction.

Retraction is a way to correct the scholarly record and alert readers to erroneous materials in the published literature. A retraction should be formally accompanied by a retraction notice—a document that justifies such a retraction. Reasons for retraction include plagiarism, peer review manipulation, and unethical research (Barbour, Kleinert et al., 2009).

Several works in the past have studied and uncovered important aspects regarding this phenomenon, such as the reasons for retraction (Casadevall, Steen, & Fang, 2014; Corbyn, 2012), the temporal characteristics of the retracted articles (Bar-Ilan & Halevi, 2018), their authors’ countries of origin (Ataie-Ashtiani, 2018), and the impact factor of the journals publishing them (Campos-Varela, Villaverde-Castañeda, & Ruano-Raviña, 2020; Fang & Casadevall, 2011). Other works have analyzed authors with a higher number of retractions (Brainard, 2018), and the scientific impact, technological impact, funding impact, and Altmetric impact in retractions (Feng, Yuan, & Yang, 2020). Other studies focused on retraction in the medical and biomedical domains (Campos-Varela, Villaverde-Castañeda, & Ruano-Raviña, 2020; Gaudino, Robinson et al., 2021; Gasparyan, Ayvazyan et al., 2014).

Scientometricians have also proposed several works on retraction based on quantitative data. For instance, several works (Azoulay, Bonatti, & Krieger, 2017; Lu, Jin et al., 2013; Mongeon & Larivière, 2016; Shuai, Rollins et al., 2017) focused on showing how a single retraction could trigger citation losses through an author’s prior body of work. Bordignon (2020) investigated the different impacts that negative citations in articles and comments posted on postpublication peer review platforms have on the correction of science, while Dinh, Sarol et al. (2019) applied descriptive statistics and ego-network methods to examine 4,871 retracted articles and their citations before and after retraction. Other authors focused on the analysis of the citations made before the retraction (Bolland, Grey, & Avenell, 2021) and on a specific reason for retraction, such as misconduct (Candal-Pedreira, Ruano-Ravina et al., 2020). The studies that considered only one retraction case usually also observed the in-text citations and the related citation contexts in the articles citing retracted publications (Bornemann-Cimenti, Szilagyi, & Sandner-Kiesling, 2016; Luwel, van Eck, & van Leeuwen, 2019; Schneider, Ye et al., 2020; van der Vet & Nijveen, 2016).

Although citation analysis concerning retraction has been done several times in Science, Technology, Engineering, and Mathematics (STEM) disciplines, less attention has been given to the humanities domain. One of the rare analyses done in the humanities domain was recently presented by Halevi (2020), who considered two examples of retracted articles and showed their continuous postretraction citations.

Our study seeks to expand the work concerning the analysis of citations of retracted publications in the humanities domain. By combining quantitative analysis (the quantification of citations and their related characteristics/metadata) with qualitative analysis (a subjective examination of aspects related to the quality of the citations, e.g., the reason for a citation based on the examination and interpretation of its in-text citation context), we aim to understand this phenomenon in the humanities, which has gained little attention in the past literature. In particular, the research questions (RQ1–RQ3) we aim to address are:

  • RQ1: How did scholarly research cite retracted humanities publications before and after their retraction?

  • RQ2: Did all the humanities areas behave similarly concerning the retraction phenomenon?

  • RQ3: What were the main differences in citing retracted publications between STEM disciplines and the humanities?

In this paper, we use a methodology developed to gather, characterize, and analyze incoming citations of retracted publications (Heibi & Peroni, 2022), adapted for the case of the humanities1. The citation analysis is based on collections of open citations (i.e., data are structured, separate, open, identifiable, and available) (Peroni & Shotton, 2018, 2020).

The workflow followed to gather and analyze the data in this study is based on the methodology introduced in Heibi and Peroni (2022), briefly summarized in Figure 1. The first two phases of the methodology are dedicated to the collection and characterization of the entities that have cited the retracted publications. The third phase is focused on analyzing the information annotated in the first two phases to summarize quantitatively the data collected. The fourth and final phase applies a topic modeling analysis (Barde & Bainwad, 2017) on the textual information (extracted from the full text of the citing entities) and builds a set of dynamic visualizations to enable an overview and investigation of the generated topics. The data gathering of our study is detailed in the following sections.

Figure 1. A summarizing schema representing the methodology in its four phases: identifying, retrieving, and characterizing the citing entities; extracting and labeling additional features based on the citing entities’ contents; building a descriptive statistical summary; and running a topic modeling analysis.

2.1. Retraction in the Humanities

First, we wanted to have a descriptive statistical overview of the retractions in the humanities as a function of crucial features (e.g., reasons for retraction) to help us define the set of retractions to use as input in the next phases. Thus, we queried the Retraction Watch database (https://retractiondatabase.org; Collier, 2011), searching for all the retracted publications labeled as humanities (marked with “HUM” in the database). The humanities domain considered in this work is therefore based on the subject classification used by Retraction Watch (i.e., the subjects under the macro category “(HUM) Humanities”). Then we classified the results as a function of three parameters: the year of the retraction, the subject area of the retracted publications (architecture, arts, etc.), and the reason(s) for the retraction. We collected an overall number of 474 publications; the earliest retraction occurred in 2002, and the latest year of retraction we obtained was 2020.
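To illustrate how such a descriptive overview could be produced programmatically, the following minimal Python sketch counts retractions by year, subject, and reason from a hypothetical CSV export of the Retraction Watch results; the file name, the column names ("RetractionDate", "Subject", "Reason"), the date format, and the value separators are assumptions of ours, since the study queried the database through its web interface.

import csv
from collections import Counter

by_year, by_subject, by_reason = Counter(), Counter(), Counter()
with open("retraction_watch_hum_export.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        by_year[row["RetractionDate"][-4:]] += 1        # assumes dd/mm/yyyy dates
        for subject in row["Subject"].split(";"):       # e.g., "(HUM) Humanities - History"
            by_subject[subject.strip()] += 1
        for reason in row["Reason"].split(";"):
            by_reason[reason.strip()] += 1

print(by_year.most_common(5))
print(by_subject.most_common(5))
print(by_reason.most_common(5))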

As shown in Figure 2, we noticed an increasing trend throughout the years, with some exceptions. In particular, we observed that the highest number of retractions per year was 119 in 2010, probably due to an investigation and a massive retraction of several articles belonging to one author, Joachim Boldt (Brainard, 2018). When looking at the subject areas, we noticed that most of the retractions are related to arts and history, and plagiarism motives2 were by far the most representative ones, confirming the observation in Halevi (2020). Most of the retracted publications (88%) are of article type (i.e., labeled in Retraction Watch as either “Conference Abstract/Paper,” “Research Article,” or “Review Articles”). Book chapters/References represent 8% of the total, and the rest are “Commentary/Editorials” (1%), and other residual types (3%, e.g., letters, case reports, articles in press).

Figure 2. Retractions in the humanities domain with respect to four different features: the year of retraction (line chart), the subject areas of the retracted publications (ring chart), the type of the retracted publication (large horizontal bar), and the reasons for retraction (horizontal bar chart). Based on the data retrieved from the Retraction Watch database in June 2021.

2.2. Retracted Publications Set and their Citations

As the focus of our study is on the analysis of citations of fully retracted publications, we excluded all the retracted publications collected in the previous step that did not receive at least one citation according to two open citation databases: Microsoft Academic Graph (MAG, https://www.microsoft.com/en-us/research/project/microsoft-academic-graph/) (Wang, Shen et al., 2020) and OpenCitations’ (2020) COCI (https://opencitations.net/index/coci) (Heibi, Peroni, & Shotton, 2019). MAG is a knowledge graph that contains scientific publication records, citations, authors, institutions, journals, conferences, and fields of study. It also provides a free REST API service to search, filter, and retrieve its data. COCI is a citation index that contains details of all the DOI-to-DOI citation links retrieved by processing the open bibliographic references available in Crossref (Hendricks, Tkaczyk et al., 2020), and it can be queried using open and free REST APIs. We decided not to use proprietary and nonopen databases because we aimed to make our workflow and results as reproducible as possible.
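As an illustration of this step, the following minimal Python sketch (not the exact code used in the study) shows how the entities citing a given retracted DOI could be retrieved from COCI's public REST API; the input DOI is a placeholder, and only basic error handling is included.

import requests

COCI_CITATIONS = "https://opencitations.net/index/coci/api/v1/citations/{doi}"

def get_citing_entities(cited_doi):
    """Return (citing DOI, citation creation date) pairs for a cited DOI via COCI."""
    response = requests.get(COCI_CITATIONS.format(doi=cited_doi))
    response.raise_for_status()
    return [(c["citing"], c["creation"]) for c in response.json()]

# Placeholder DOI: replace with the DOI of an actual retracted publication.
for citing_doi, created in get_citing_entities("10.1234/placeholder-doi"):
    print(citing_doi, created)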

After querying COCI and MAG3, we found that 85 retracted items (out of 474) had at least one citation (2,054 citations). We manually checked the data set for possible mistakes introduced by the collections. Indeed, some of the citing entities identified in MAG did not include a bibliographic reference to any of the retracted publications, the retracted publication under consideration was not cited in the body of the citing entity (although present in its reference list), or the citing entity’s type did not refer to a scholarly publication (e.g., bibliography, retraction notice, presentation, data repository). There was also one article retracted for duplication, “The Nature of Creativity” by Sternberg (2006), that received 1,050 citations. This retracted article contains a substantial amount of content published by the same author in several of his previous works, and it was his fourth retracted article; the author used to cite himself at a high rate while not doing enough to encourage diversity in psychology research. We decided to exclude it from our study to reduce bias in the results. Following these considerations, the final number of retracted publications considered was 84, involving a total number of 935 unique citing entities. As shown in the bubble chart in Figure 3, most of the citing entities (i.e., 891) were included in MAG; 388 were included in COCI, and they shared 344 entities.

Figure 3. A Venn diagram (bubble chart) to plot the number of entities gathered from MAG (Microsoft Academic Graph) and COCI (OpenCitations Index of Crossref open DOI-to-DOI citations) which have cited the retracted publications, along with their distribution according to the hum_affinity score of the retracted publication they cite (pie chart).

Although the retracted items identified so far were all in the humanities domain according to the categories specified in Retraction Watch, an item might have other nonhumanities subjects associated with it. Sometimes, these nonhumanities subjects might be more representative of the content of the retracted document and, thus, they might generate an unwanted bias for the rest of the analysis. For instance, consider the retracted article “The good, the bad, and the ugly: Should we completely banish human albumin from our intensive care units?” (Boldt, 2000). In Retraction Watch, the subjects associated with it were medicine and journalism. Yet, when we checked the full text of the article, we noticed that the argumentations close to journalism were very few and, as such, the article should not be considered as belonging to humanities research.

To avoid considering these peculiar publications in our analysis, we devised a mechanism to help us evaluate the affinity of each retracted item to the humanities domain. We assigned to each retracted item in the list (84) an initial score of 1, named hum_affinity; this value ranges from 0 (i.e., very low) to 5 (i.e., very high). The final value of hum_affinity for each retracted item is calculated as follows (a minimal sketch of the computation is shown after the list):

  1. We assigned to each retracted item additional subject categories obtained by searching the venue where it was published in external databases—we used Scimago classification (https://www.scimagojr.com/) for journals and the Library of Congress Classification (LCC, https://www.loc.gov/catdir/cpso/lcco/) for books/book chapters.

  2. If both the Retraction Watch subjects and those gathered in step (1) included at least one subject identifying a discipline in the humanities, we added 1 to hum_affinity of that item.

  3. If all the Retraction Watch subjects were part of the humanities domain, we added another 1 to hum_affinity of that item.

  4. If the title of the retracted item has a clear affinity to the humanities (e.g., “The origins of probabilism in late scholastic moral thought”), we added another 1 to hum_affinity of that item.

  5. Finally, we provided a subjective score of −1, 0, or 1 based on the abstract of the item. For instance, we assigned 1 to the abstract of the retracted article of Mößner (2011): “… This paper aims at a more thorough comparison between Ludwik Fleck’s concept of thought style and Thomas Kuhn’s concept of paradigm. Although some philosophers suggest that these two concepts ….”
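The following minimal Python sketch summarizes the scoring rules above; the function and variable names are ours (not part of the published methodology), and each argument corresponds to one of the steps listed.

def hum_affinity(hum_subject_in_both,       # step 2: at least one humanities subject in both classifications
                 all_rw_subjects_hum,       # step 3: all Retraction Watch subjects are humanities
                 title_clearly_humanities,  # step 4: the title has a clear humanities affinity
                 abstract_judgment):        # step 5: subjective score of -1, 0, or +1
    score = 1                               # initial score assigned to every retracted item
    if hum_subject_in_both:
        score += 1
    if all_rw_subjects_hum:
        score += 1
    if title_clearly_humanities:
        score += 1
    score += abstract_judgment
    return max(0, min(5, score))            # the final value ranges from 0 (very low) to 5 (very high)

# Example: both subject checks pass, the title is ambiguous, the abstract reads as humanities.
print(hum_affinity(True, True, False, 1))   # -> 4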

The pie chart in Figure 3 shows how we classified the retracted publications and those citing them according to their hum_affinity score. To narrow our analysis and reduce bias, we decided to consider only the retracted publications (and their corresponding citing entities) having a medium or high hum_affinity score (i.e., ≥ 2). Twelve retracted publications have been excluded from the analysis (i.e., hum_affinity < 2) along with their 257 citations. A list of the excluded retracted publications is available at the Zenodo repository (Heibi & Peroni, 2021b). At the end of this phase, the final number of retracted items we considered was 72, with 678 citing entities.

2.3. Annotating the Citation Characteristics

Once the 72 retracted items and their related 678 citing entities were collected, we wanted to characterize such citing entities with respect to their basic metadata and full-text content.

2.3.1. Gathering citing entities metadata

For each citing entity, we retrieved basic metadata (i.e., DOI (if any), year of publication, title, venue ID (ISSN/ISBN), and venue title) via the REST APIs of either COCI or MAG. Then, using the Retraction Watch database, we annotated whether the citing entity was itself fully retracted.
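For entities with a DOI, such metadata can be obtained, for instance, from COCI's metadata endpoint, as in the hedged sketch below (not the study's actual code); the field names ("source_title", "source_id", etc.) follow the COCI API documentation and should be double-checked against it.

import requests

def get_basic_metadata(doi):
    """Retrieve basic metadata for a citing entity from COCI's metadata endpoint."""
    url = f"https://opencitations.net/index/coci/api/v1/metadata/{doi}"
    record = requests.get(url).json()[0]       # one record per requested DOI
    return {
        "doi": record.get("doi"),
        "year": record.get("year"),
        "title": record.get("title"),
        "venue_title": record.get("source_title"),
        "venue_id": record.get("source_id"),   # ISSN/ISBN identifiers, when available
    }

print(get_basic_metadata("10.1234/placeholder-doi"))  # placeholder DOI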

We also classified the citing entities into areas of study and specific subjects, following the Scimago Journal Classification (https://www.scimagojr.com/), which uses 27 main subject areas (medicine, social sciences, etc.) and 313 subject categories (psychiatry, anatomy, etc.). We searched for the titles and IDs (ISSN/ISBN) of the venues of publication of all the citing entities and classified them into specific subject areas and subject categories. For books/book chapters, we used the ISBNDB service (https://isbndb.com/) to look up the related Library of Congress Classification (LCC, https://www.loc.gov/catdir/cpso/lcco/), and then we mapped the LCC categories into a corresponding Scimago subject area using an established set of rules detailed in Heibi and Peroni (2022).
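The book/book-chapter branch of this classification can be thought of as a lookup table from LCC classes to Scimago subject areas. The sketch below is purely illustrative: the mapping entries are hypothetical examples of ours, while the actual rule set is detailed in Heibi and Peroni (2022).

# Hypothetical excerpt of an LCC-class-to-Scimago-area mapping (illustrative only).
LCC_TO_SCIMAGO = {
    "B": "Arts and Humanities",   # Philosophy, Religion (Psychology subclasses would need finer rules)
    "D": "Arts and Humanities",   # World History
    "H": "Social Sciences",
    "R": "Medicine",
}

def scimago_area_from_lcc(lcc_call_number):
    """Map an LCC call number (e.g., 'D16.8') to a Scimago subject area."""
    return LCC_TO_SCIMAGO.get(lcc_call_number[0].upper(), "Others")

print(scimago_area_from_lcc("D16.8"))  # -> Arts and Humanities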

2.3.2. Extracting textual content features

We extracted the abstract of each citing entity and all its in-text citations of the retracted publications in our set, marking the reference pointers to them (i.e., the in-line textual devices, e.g., “[3]”, used to refer to bibliographic references), the section where they appear, and their citation context4. The citation context is based on the sentence that contains the in-text reference pointer (i.e., the anchor sentence), plus the preceding and following sentences5. The definition of this citation context is based on the study of Ritchie, Robertson, and Teufel (2008). We annotated the first-level sections containing the in-text citation with their type using the categories “introduction,” “method,” “abstract,” “results,” “conclusions,” “background,” and “discussion” listed in Suppe (1998) if the rhetorical role of the section was clear from its title; otherwise, we used three other residual categories: “first section,” “middle section,” and “final section,” depending on their position in the citing entity.
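A minimal sketch of how such a citation context could be extracted, assuming a naive regex-based sentence splitter (the actual extraction, including the exceptions mentioned in footnote 5, is described in Heibi and Peroni (2022)):

import re

def citation_context(paragraph, pointer):
    """Return the anchor sentence containing the reference pointer plus its neighbors."""
    sentences = re.split(r"(?<=[.!?])\s+", paragraph)
    for i, sentence in enumerate(sentences):
        if pointer in sentence:                        # anchor sentence
            return " ".join(sentences[max(0, i - 1):i + 2])
    return None

text = ("Earlier studies reached similar conclusions. "
        "This claim was first advanced in [3]. "
        "Later work questioned its validity.")
print(citation_context(text, "[3]"))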

Then, we manually annotated each in-text citation with three main features: the citation sentiment conveyed by the citation context, whether the citation context mentioned the retraction of the cited entity, and the citation intent. The annotation of the citation sentiment is inspired by the classification proposed in Bar-Ilan and Halevi (2017), and we marked each in-text citation with one of the following values:

  • positive, when the retracted publication was cited as sharing valid conclusions, and its findings could also have been used in the citing entity;

  • negative, if the citing entity cited the retracted publication and addressed its findings as inappropriate and/or invalid; and

  • neutral, when the author of the citing entity referred to the retracted publication without including any judgment or opinion regarding its validity.

Then, we annotated each citing entity with yes/no depending on whether any of the in-text citation contexts we gathered from it explicitly mentioned that the cited entity was retracted. Finally, we annotated the intent of each in-text citation. The citation intent (or citation function) is defined as the authors’ reason for citing a specific publication (e.g., the citing entity uses a method defined in the cited entity). To label such citation functions, we used those specified in the Citation Typing Ontology (CiTO, https://purl.org/spar/cito) (Peroni & Shotton, 2012), an ontology for the characterization of factual and rhetorical bibliographic citations. We used the decision model developed and adopted in Heibi and Peroni (2021a) to decide which citation function to select when labeling an in-text citation. Figure 4 shows part of the decision model; it presents the case when the intent of the citation is “Reviewing and eventually giving an opinion on the cited entity” and the citation function is part of one of the following groups: “Consistent with,” “Inconsistent with,” or “Talking about.”

Figure 4. Part of the decision model for the selection of a CiTO (Citation Typing Ontology) citation function for annotating the citation intent of an examined in-text citation based on its citation context. The first large row contains one of the three macro categories (“Reviewing …”); each macro category has a set of subcategories such that each subcategory refers to a set of citation functions. The first row defines what citation functions are suitable for it through the help of a guiding sentence that needs to be completed according to the chosen subcategory and citation function.

We do not introduce the full details of the labeling process due to space constraints; the complete diagram of the decision model is available in Heibi (2022), and an extensive introduction and explanation can be found in Heibi and Peroni (2022).

We have produced an annotated data set containing 678 citing entities and 1,020 in-text citations of 72 retracted publications. We have published a dedicated web page (https://ivanhb.github.io/ret-analysis-hum-results/) embedding visualizations that enable the readers to view and interact with the results, also available in Heibi and Peroni (2021b).

In the following sections, we introduce some important concepts adopted in the description and organization of our results. Then we show the results of quantitative and qualitative analyses of all the data we collected.

3.1. Data Organization

We defined three periods to distribute the citations of retracted publications:

  • Period P-Pre—from the year of publication of the retracted work to the year before its full retraction (the year of the retraction is not part of this period).

  • Period P-Ret—the year of the full retraction.

  • Period P-Post—from the year after the full retraction to the year of the last citation received by the retracted publication, according to the citation data we gathered.

Each citing entity falls under one of the above three periods. The two periods P-Pre and P-Post were split into fifths, labeled “[−1.00, −0.61],” “[−0.60, −0.21],” “[−0.20, 0.20],” “[0.21, 0.60],” and “[0.61, 1.00].” When the citing entity is part of either P-Pre or P-Post, then it is also part of a specific fifth, which identifies how close or far that entity is to or from the events defining the period.

The division into fifths helped us define a uniform time span to locate the citing entities independently of the year of retraction of the work they cite and the publication years of the citing and cited entities6. For instance, if an entity A published in 2011 had cited a retracted publication R published in 2002 and fully retracted in 2012, then A is part of the last fifth (i.e., “[0.61, 1.00]”) of P-Pre. This means that A cited R in the last fifth, immediately before the formal retraction of R.
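The sketch below shows one way such a normalized position and its fifth could be computed, assuming a simple linear scaling of the citing year onto [−1, 1]; the exact calculation used in the study is described in Heibi and Peroni (2022).

FIFTHS = [(-1.00, -0.61), (-0.60, -0.21), (-0.20, 0.20), (0.21, 0.60), (0.61, 1.00)]

def normalized_position(citing_year, period_start, period_end):
    """Linearly map a citing year onto [-1, 1] within its period (an assumption of this sketch)."""
    if period_end == period_start:
        return 1.0                      # one-year period: treated here as a single (last) fifth
    return -1 + 2 * (citing_year - period_start) / (period_end - period_start)

def fifth_label(position):
    for low, high in FIFTHS:
        if low <= round(position, 2) <= high:
            return f"[{low:.2f}, {high:.2f}]"
    return None

# Example from the text: R published in 2002 and retracted in 2012 (P-Pre = 2002-2011),
# cited by A in 2011 -> last fifth of P-Pre.
print(fifth_label(normalized_position(2011, 2002, 2011)))  # [0.61, 1.00]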

3.2. Descriptive Statistics

We have classified the distribution of the citing entities in the three periods (i.e., P-Pre, P-Ret, and P-Post) as a function of the humanities disciplines used in Retraction Watch, as shown in Figure 5. Religion was the discipline that received the highest number of citations (375), and history had the highest number of retracted items (20).

Figure 5. The number of citing entities in P-Pre (before the year of retraction), P-Ret (in the year of retraction), and P-Post (after the year of retraction) for each humanities discipline assigned to the retracted publications, as gathered from Retraction Watch.

In Figure 6 we have classified the entities citing a retracted publication in each discipline according to their subject areas. Arts and humanities and Social sciences (AH&SS) were highly represented in both the P-Pre and P-Post periods of almost all the retracted publications’ disciplines. However, we noticed some exceptions to this rule in P-Pre in Journalism (10% of citing entities were AH&SS publications), P-Post in Arts (13% AH&SS publications), and P-Pre and P-Post of Architecture (no AH&SS publications in either period).

Figure 6. The subject area distribution of the citing entities of the retracted publications in P-Pre (before the year of retraction) and P-Post (after the year of retraction) for each humanities discipline as specified in Retraction Watch. The number of citing entities is given in brackets.

Because we expected, as also highlighted in previous studies (e.g., Ngah & Goi, 1997), that a good part of the citations of humanities publications come from AH&SS publications, we decided to look more deeply into the obtained results before moving on to the next stage. As shown in Figure 5, we noticed that Journalism had a completely different behavior compared to the other disciplines. Indeed, the citing entities in Journalism cited three retracted publications: two with a hum_affinity of 3 and one with a hum_affinity of 2. The latter article was “Personality, stress and disease: Description and validation of a new inventory” (Grossarth-Maticek & Eysenck, 1990). This article has 130 citations (almost 95% of all the citations in Journalism). Retraction Watch has labeled this article with two additional subject areas, Public Health and Safety and Sociology; therefore, Journalism is its only humanities subject. A further investigation of the full text of the paper revealed that this article is highly related to the health sciences, and Journalism has a marginal (almost absent) relevance in it. Considering these facts, we felt that this article could introduce a significant bias in our analysis; therefore, to limit its impact on the results, we decided to exclude it.

As a further check, we investigated all the retracted publications of the humanities disciplines in Figure 6 for which citations from Arts and humanities publications were less than 20% in either P-Pre or P-Post. Arts and Architecture are the two disciplines falling in this category. After a manual check, we detected the article “A systematic review on postimplementation evaluation models of enterprise architecture artefacts” (Nikpay, Ahmad et al., 2020), classified under Architecture; however, while reading its full text, we found little evidence supporting this labeling, as it is a computer science study. Therefore, we decided to also exclude this article from our analysis.

After this data refinement, our final data were reduced to 546 citing entities and 786 in-text citations of 70 retracted publications. Considering the final data and the classification of the retracted publications based on their humanities discipline, we investigated another aspect: In Figure 7 we have plotted the total number of citations gained by each humanities discipline as a function of the number of years passed after the date of retraction. This trend is compared to the average time of retraction for each humanities discipline. From Figure 7, we noticed that, on average, disciplines such as religion and philosophy reached their citation peak in the year before retraction, whereas the trend is the opposite for history, arts, and architecture.

Figure 7. The total number of citations gained by the retracted publications, grouped according to their humanities discipline (represented by different colors), as a function of the number of years passed after their date of retraction. The vertical dotted lines represent the average time of retraction of each humanities discipline. The gray line sums up all the humanities disciplines together.

To infer other interesting statistics regarding the obtained results, we treated the citing entities and the in-text citations they contain as two different classes, and we present descriptive statistics of these two classes in the following subsections.

3.2.1. Citing entities

We examined the distribution of the citing entities to retracted publications as a function of two features: the periods (i.e., P-Pre, P-Ret, and P-Post), further classified into those that mentioned the retraction or for which we could not access their full text; and their subject areas. The results are shown in Figure 8.

Figure 8. A descriptive statistical summary of the distribution of the citing entities to retracted publications in the three periods (P-Pre, P-Ret, and P-Post; i.e., before/during/after the year of retraction), also considering their subject areas. The bar charts on top highlight the citing entities that either did or did not mention the retraction and those for which we could not retrieve the full text.

The number of citing entities before the retraction (192, period P-Pre) was lower than the number of citing entities after the retraction (260, period P-Post). Across P-Pre and P-Ret, we noticed a continuous increase in the overall number of citing entities, which started decreasing after the first fifth of P-Post, although the numbers remained in line with those observed in the third and fourth fifths of P-Pre. The last fifth of P-Post is an exception to the declining trend, with an unexpectedly high peak. This result was due to the fact that 27 retracted items received only one citation in P-Post and, in these cases, that citation always represented the last citation received, which defines the final border of P-Post.

The full text of 8.42% of the citing entities was not accessible. For those for which we successfully retrieved the full text, our results showed that a relatively low percentage mentioned the retraction of the cited entity—2.25% of the total number of citing entities in P-Ret and P-Post.

Looking at their subject areas, we noticed that the citing entities started to spread into a higher number of subject areas (i.e., an additional nine) in P-Post compared to P-Pre, with the residual category Others containing 16% of the citing entities. The Arts and humanities subject area had a similar percentage throughout all three periods (22.94%, 18.42%, and 18.14%) and, together with Social sciences, it was one of the two most representative subject areas in P-Ret and P-Post. We also noticed an important drop in Psychology, from 15.41% in P-Pre to 4.42% in P-Post.

3.2.2. In-text citations

We focused on the distribution of the in-text citations as a function of three features: the periods (i.e., P-Pre, P-Ret, and P-Post); the citation intent; and the section containing the in-text citation. The results of the three distributions have been further classified according to the in-text citation sentiment (i.e., negative/neutral/positive), as shown in Figure 9.

Figure 9. A descriptive statistical summary for the distribution of the in-text citations contained in the citing entities to the retracted publications in the three periods (P-Pre, P-Ret, and P-Post, i.e., before/during/after the year of retraction), according to their intent and section. The sentiment of the in-text citations is also highlighted.

The overall trend in the number of in-text citations during the three periods was close to the one we observed for the citing entities (shown in the previous section), although the differences between P-Pre and P-Post were even more marked. As introduced in the previous section, the peak in the last fifth of P-Post was due to the retracted items receiving only one citation in P-Post. Even though the overall percentage of negative citations was low, they had a higher presence in P-Pre (4.5%). Generally, most in-text citations were tagged as neutral, and very few were positive (0.75%).

The citation intents “obtains background from” and “cites for information” were the two most dominant ones in the three periods, and they represented 31.29% and 22.64% of the total number of in-text citations, respectively. The citation intent “cites for information” increased its presence moving from 17.8% in P-Pre to 27.20% in P-Post.

Considering the citation sections, the in-text citations were mostly located in the “Introduction” section in all three periods. However, in-text citations in the “Introduction” section decreased considerably after P-Ret, moving from 30.15% in P-Pre to 22.13% in P-Post. Instead, the in-text citations contained in the “Discussion” section showed an increasing trend, from 6.87% in P-Pre to 15.20% in P-Post.

3.3. Topic Models of Citing Entities’ Abstracts and their Citation Contexts

A topic modeling analysis is a statistical modeling approach for automatically discovering the topics (represented as sets of words) that occur in a collection of documents. We applied it to our data to understand how the topics evolved over time and whether this evolution depended, in some way, on the retraction of the publications considered.

A standard workflow for building a topic model is based on three main steps: tokenization, vectorization, and topic model creation. The topic model we have built is based on the Latent Dirichlet Allocation (LDA) model (Jelodar, Wang et al., 2019). In the tokenization process, we converted the text into a list of words by removing punctuation, unnecessary characters, and stop words, and we also decided to lemmatize and stem the extracted tokens. In the second step, we created vectors for each of the generated tokens using a Bag-of-Words (BoW) model (Brownlee, 2019), which we considered appropriate for our study based on our direct experience in previous work (Heibi & Peroni, 2021a) and the suggestions by Bengfort, Bilbro, and Ojeda (2018) on the same issue. Finally, to build the LDA topic model, we determined in advance the number of topics to retrieve for the examined corpus using a popular method based on the topic coherence score, as suggested in Schmiedel, Müller, and vom Brocke (2019), which measures the degree of semantic similarity between the high-scoring words in a topic.
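As an illustration of this three-step workflow, the following sketch uses gensim rather than the MITAO pipeline actually employed in the study; the toy corpus, the candidate range of topic numbers, and the c_v coherence measure are assumptions made for the example.

from gensim.corpora import Dictionary
from gensim.models import LdaModel, CoherenceModel
from gensim.parsing.preprocessing import preprocess_string

# Toy corpus standing in for the abstracts or citation contexts used in the study.
documents = [
    "retraction of a publication in the history of philosophy",
    "citation analysis of retracted articles in the arts and humanities",
    "topic modeling of abstracts citing retracted humanities papers",
]

tokenized = [preprocess_string(doc) for doc in documents]        # tokenize, remove stop words, stem
dictionary = Dictionary(tokenized)
bow_corpus = [dictionary.doc2bow(doc) for doc in tokenized]      # Bag-of-Words vectors

best_model, best_coherence = None, float("-inf")
for k in range(2, 6):                                            # candidate numbers of topics
    lda = LdaModel(bow_corpus, num_topics=k, id2word=dictionary, random_state=0)
    coherence = CoherenceModel(model=lda, texts=tokenized, dictionary=dictionary,
                               coherence="c_v").get_coherence()
    if coherence > best_coherence:
        best_model, best_coherence = lda, coherence

print(best_model.num_topics, best_coherence)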

We built and executed two LDA topic models, one using the abstracts of the entities citing the retracted publications (with 16 topics), named TM-Abs, and another using the citation contexts where the in-text reference pointers to retracted publications were contained (with 20 topics), named TM-Cits. To create the topic models, we used MITAO (Ferri, Heibi et al., 2020) (https://github.com/catarsi/mitao), a visual interface to create a customizable visual workflow for text analysis. With MITAO, we have generated two visualizations: Latent Dirichlet Allocation Visualization (LDAvis) (Sievert & Shirley, 2014) for an overview of the topic modeling results, and Metadata-Based Topic Modeling Visualization (MTMvis) for a dynamic and interactive visualization of the topics based on customizable metadata.
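For readers who do not use MITAO, an LDAvis-style overview can also be produced directly from a gensim model with the pyLDAvis library, as in this hedged sketch; it assumes the best_model, bow_corpus, and dictionary names from the previous snippet, and the module and function names follow pyLDAvis 3.x and should be checked against the installed version.

import pyLDAvis
import pyLDAvis.gensim_models

vis = pyLDAvis.gensim_models.prepare(best_model, bow_corpus, dictionary)
pyLDAvis.save_html(vis, "ldavis_topics.html")   # interactive two-dimensional topic overview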

3.3.1. Citing entities abstracts

The total number of available abstracts in our data set was 509. We extended the list of MITAO’s default English stop words (“the”, “is”, etc.) with ad hoc stop words devised for our study, such as “method,” “results,” and “conclusions,” which represent the typical words that might be part of a structured abstract.

Figure 10 shows the topic distribution represented in the two-dimensional space of LDAvis. Using the LDAvis interface, we set the parameter λ to 0.3 to determine the weight given to the probability of a term under a specific topic relative to its lift (Sievert & Shirley, 2014), and retrieved the 30 most relevant terms of each topic. We gave an interpretation and a title to each topic by analyzing its related terms; we do not introduce them here due to space constraints, but they are available in Heibi and Peroni (2021b). Topic 6 (“Leadership, organization, and management”) was the dominant topic. The topics were distributed in four main clusters, as shown in Figure 10:

  • one composed of topics 2 (“Sociopolitical issues related to leadership”) and 6, concerning issues related to leadership, work organization, and management from a sociopolitical point of view;

  • a large one composed of topics 1 (“Sociopolitical issues possibly related to Vietnam”), 4 (“History of the Jewish culture”), 5 (“Music and psychological diseases”), 11 (“Family and religion”), etc., which treats several subjects from different domains close to the social sciences, political sciences, and psychology; and

  • another two clusters composed of one topic each: topic 16 (“Geography and climatic issues”) and topic 3 (“Colonial history”).

Figure 10. The 16 topics of TM-Abs (LDA topic modeling on the abstracts of the citing entities). The visualization is taken from LDAvis, and it shows the topic distribution in a two-dimensional space.

Figure 11 shows the chart generated using MTMvis. We plotted the topic distribution as a function of the three periods. At a first analysis, we noticed that topics 6 and 16 increased their share across the three periods. On the other hand, topics 1 and 11 decreased their percentage throughout the three periods.

Figure 11. The MTMvis chart created over the 16 topics of TM-Abs (LDA topic modeling on the abstracts of the citing entities). The topics are plotted as a function of the three periods (represented on the x-axis).

3.3.2. In-text citation contexts

The total number of in-text citation contexts in our data set, used as input to produce the second topic model, was 786. As we did with the abstracts, we have defined and used a list of ad hoc stop words, which included all the given and family names of the authors of the cited publications.

Figure 12 shows the topics represented in the two-dimensional space of LDAvis. As we did for the abstracts’ topic modeling, we set λ to 0.3 and interpreted each topic by analyzing its 30 most relevant terms (Heibi & Peroni, 2021a, 2021b). In this case, we noticed that the topics are less overlapping and more distributed along the whole axis of the visualization. Topic 12 (“Leadership, organization, and management”) is the most representative (11.7%) and was very distant from the other topics. The bottom right part of the graphic—with topics 2 (“Countries in conflict”), 15 (“War and terrorism”), 17 (“War and history”), 18 (“History of Europe”), and 20 (“War and army conflicts”)—is mostly close to history studies, especially the discussion of army conflicts. The top part of the graphic contains several single-topic clusters, such as topics 5 (“Gender social issues”) and 9 (“Geography and climatic issues”).

Figure 12. The 20 topics of TM-Cits (LDA topic modeling on the in-text citation contexts). The visualization is taken from LDAvis and shows the topic distribution in a two-dimensional space.

Figure 13 shows the chart generated using MTMvis, where we plotted the topic distribution as a function of the three periods. We noticed a continuous decrease in topics 7 (“Family and religion”) and 18 across the three periods. Topic 3 (“Drugs/alcohol and psychological diseases”) decreased sharply immediately after P-Ret. On the other hand, we noticed an increase in topics 5, 9, and 11 (“Music and psychological diseases”), although the latter topic had a higher percentage in P-Ret than in P-Post.

Figure 13. The MTMvis chart created over the 20 topics of TM-Cits (LDA topic modeling on the in-text citation contexts). The topics are plotted as a function of the three periods (represented on the x-axis).

In this section, we address separately each of our research questions RQ1–RQ3 presented in Section 1. We conclude the section by discussing the limits of our work and by sketching out some future work that might help us overcome these issues.

4.1. Answering RQ1: Citing Retracted Publications in the Humanities

It seems that, on average, retracted publications in the humanities did not have a drop in citations after their retraction (Figure 8), and only 2.25% of the citing entities (five Arts and humanities publications and three related to health sciences subject areas, e.g., medicine, psychology, and nursing) mentioned the retraction in the citation context. In addition, we noticed that the negative perception of a retracted work, although limited in the data we have, happened before its retraction if the cited entity had a low affinity to the humanities domain. The fact that we reported few negative citations in P-Post is consistent with other studies (Bordignon, 2020; Luwel et al., 2019; Schneider et al., 2020).

Citing entities talking about retraction usually discussed the cited entity rather than obtaining background material or generic informative claims from it (Figure 14). Most of the in-text citations marked as discusses occurred in the Discussion section (as shown in Figure 15), and from TM-Cits we noticed the emergence of topic 6 (“The retraction phenomenon”) in Discussion sections only in P-Post. In other words, the retraction was not mentioned in the Discussion section before the retraction, and the retraction event might have triggered more discussion in the citing entities.

Figure 14. The distribution of topic 6 (“The retraction phenomenon”) of TM-Cits (LDA topic modeling on the in-text citation contexts) over the three periods for the four citation intents that have been used the most.

Figure 15. The distribution of the main in-text citation sections (positional sections, e.g., first section, are not included) over the three periods. The percentages of in-text citations having a corresponding annotated main section in each period (i.e., P-Pre, P-Ret, and P-Post) are 50.76%, 56.68%, and 61.86%, respectively.

From the distribution of the subject areas of the citing entities over the three periods (Figure 8), we noticed that Social sciences and Arts and humanities had almost the same percentages in the P-Ret and P-Post periods, which are lower than their percentages in P-Pre, suggesting that the retraction event did have an impact on these subject areas. However, other subject areas such as psychology decreased in P-Ret and even more in P-Post, which may be an indicator of higher concern in these subject areas toward the citation of retracted publications. This is evidenced by the TM-Abs topic distribution for the citing entities assigned to psychology (Figure 16), with a clear decrease in the topics related to health sciences, such as topics 10 and 11, whereas others, such as topics 6 and 9 (close to sociohistorical discussions with no relation to health sciences), increased their presence in P-Ret and P-Post. In other words, not only did the overall number of citing entities from the health sciences domain decrease after the retraction, but their subject areas moved from the health sciences domain to subjects that are closer to the Social sciences and Arts and humanities domains.

Figure 16. A filtered MTMvis showing the distribution of the topics of TM-Abs (LDA topic modeling on the abstracts of the citing entities) as a function of the three periods. The visualization is built considering only the documents (i.e., abstracts) that have Psychology as subject area.

4.2. Answering RQ2: Citation Behaviors in the Humanities

As shown in Figure 6, Religion and History had a very similar distribution pattern. In both, the citing entities belonging to Social sciences had an important decrease in P-Post, and for that period the TM-Cits topics of these entities do not include topic 3 (“Drugs/alcohol and psychological diseases”) for Religion and topic 7 (“Family and religion”) for History. We can speculate that Social sciences studies significantly reduced their percentage due to a higher concern toward sensitive social subjects such as healthcare, family, and religion.

Arts had the highest number of citations in P-Post, although we reported an important drop in the Arts and humanities citing entities, in favor of subject areas such as Medicine, Nursing, and Engineering (Figure 6). On the other hand, for Philosophy we had a completely different situation: Citing entities labeled as Arts and humanities increased considerably in P-Post at the expense of citing entities from Psychology. For the Arts discipline, topic 11 (“Music and psychological diseases”) of TM-Cits explains the positive trend of P-Post. In other words, the arts (and especially music) had been discussed in relation to psychological and medical diseases.

In Figure 17, we show the distribution of topic 6 (“The retraction phenomenon”) as a function of the three periods, considering the four humanities disciplines with the highest number of citing entities. Topic 6 increased considerably in P-Post in Philosophy, and in Religion it had a steady trend, whereas History and Arts had a peak in P-Ret and a lower, yet relatively high, percentage in P-Post. These results might suggest that the entities citing retracted publications in Philosophy, Arts, and History (which, following the results of the topic modeling analysis, produced topics close to STEM disciplines) were those showing the greatest concern toward the retraction, in the case of History and Arts starting from the year of the retraction.

Figure 17. The distribution of topic 6 (“The retraction phenomenon”) of TM-Cits (LDA topic modeling on the in-text citation contexts) over the three periods for the humanities disciplines Religion, History, Arts, and Philosophy.

Considering these hypotheses, we can interpret the fact that History and Arts reached their peak of citations after the year of retraction (Figure 7) as a sign of awareness/acknowledgment regarding the retraction rather than unconscious use of the retracted publications, at least for part of these citations.

4.3. Answering RQ3: Comparing STEM and the Humanities

Our findings showed that the retraction of humanities publications did not have a negative impact on the citation trend (Figure 8). The opposite trend was observed in other disciplines, according to prior studies, such as biomedicine (Dinh et al., 2019) and psychology (Yang & Qi, 2020). However, studies such as Heibi and Peroni (2021a) and Schneider et al. (2020) also observed that, in the health sciences domain, there were cases where either a single or a few popular cases of retraction were characterized by an increase in citations after the retraction. This might suggest that the discipline related to the retracted publication is not the only central factor to consider for predicting the citation trend after the retraction. Other factors might play a crucial role, such as the popularity of and media attention to the retraction case, as discussed in the studies by Mott, Fairhurst, and Torgerson (2019) and Bar-Ilan and Halevi (2017).

The work by Bar-Ilan and Halevi (2018) analyzed the citations of 995 retracted publications and found the same growing trend in the citations in the postretraction period. However, they did not analyze the retraction according to different and separate disciplines. As such, we might consider such results as representing a general trend of retracted publications, which confirms the general observations we derived from our data. In addition, considering the results we have obtained for the specific humanities disciplines, it seems as though the potential threats and damage from retracted materials have been perceived more seriously by others (i.e., citing entities) when the retracted publications have been linked to a sensitive area of study and to the STEM domain. This final observation highlights the different behaviors that might occur when a retracted publication has a closer relation to STEM.

4.4. Limitations and Future Developments

There are some limitations in our study that may have introduced some biases. First, compared to other fields of study, bibliographic metadata in the humanities have limited coverage in well-known citation databases (Hammarfelt, 2016). This fact leads to some limitations when applying a citation analysis in the humanities domain (Archambault & Larivière, 2010). In this regard, a coverage analysis and comparison of the citations in the humanities domain in COCI and MAG might be highly valuable. Other data sources, such as OpenAlex (Priem, Piwowar, & Orr, 2022), a free and open catalog of the world’s scholarly papers, researchers, journals, and institutions, could be considered. Pragmatically, as far as our study is concerned, we undoubtedly collected fewer citing entities than those that had in fact cited the retracted publications. In addition, we have considered only open citation data; therefore, the citation coverage could significantly improve with the addition of nonopen citation data. The availability of a larger amount of data could have strengthened and improved the quality of our results.

The selection of the retracted publications was another crucial issue, because we faced two major problems: some inconsistencies in the data provided by Retraction Watch and the presence of retracted publications labeled as humanities that, on close analysis, actually belonged to a different discipline. The first descriptive statistical results, our manual check, and the definition of the humanities affinity score helped us limit the biases of these two issues. However, we could improve the approach adopted by using additional services such as Elsevier’s ScienceDirect—as done in Bar-Ilan and Halevi (2018)—and increasing the threshold of the humanities affinity level to exclude border cases.

A citation analysis concerning retraction in the humanities domain is something that has rarely been discussed in the past, and therefore the discussion of our results included a comparison with similar works that considered different domains or retraction cases. Such works have not addressed the humanities domain or were based either on a single retraction case or on a limited set of them. Works that considered other domains did not include most of the features that we have analyzed in this work (e.g., the citation intent), which made the comparison with them difficult. We hope that this study, and others to be done in this field, can lead to better comparisons and an improved understanding of the retraction phenomenon in the humanities domain.

We would like to thank the editor and the reviewers for taking the time and effort necessary to review the paper. We sincerely appreciate all the valuable suggestions, which helped us to improve the quality of the paper.

Ivan Heibi: Data curation, Formal analysis, Investigation, Methodology, Software, Visualization, Writing—Original draft, Writing—Review & editing. Silvio Peroni: Conceptualization, Project administration, Supervision, Validation, Writing—Review & editing.

The authors have no competing interests.

This work has been partially funded by the European Union’s Horizon 2020 research and innovation program under grant agreement No 101017452 (OpenAIRE-Nexus).

The data produced in this work (i.e., inputs, annotations, and results) are published and available on Zenodo (Heibi & Peroni, 2021b).

1. We have not described the methodology adopted in full here due to space constraints.

2. A complete list of reasons accompanied by a description is provided by Retraction Watch at https://retractionwatch.com/retraction-watch-database-user-guide/retraction-watch-database-user-guide-appendix-b-reasons/.

3. We used their REST APIs in June 2021 to retrieve citation information.

4. If we could not access the full text of a citing entity (e.g., due to paywall restrictions), the corresponding entity was still considered in our data set. However, we did not use it for the qualitative postanalysis described in Sections 3.2.2 and 3.3. Details about the number of entities for which we could not retrieve the full text are introduced in Section 3.2.1.

5. Exceptions to this rule (e.g., when the anchor sentence is the last one of a paragraph) are discussed in Heibi and Peroni (2022).

6. A detailed explanation regarding the calculation of the periods is discussed in Heibi and Peroni (2022).

Archambault, É., & Larivière, V. (2010). The limits of bibliometrics for the analysis of the social sciences and humanities literature. https://ost.openum.ca/files/sites/132/2017/06/WSSR_ArchambaultLariviere.pdf
Ataie-Ashtiani, B. (2018). World map of scientific misconduct. Science and Engineering Ethics, 24(5), 1653–1656.
Azoulay, P., Bonatti, A., & Krieger, J. L. (2017). The career effects of scandal: Evidence from scientific retractions. Research Policy, 46(9), 1552–1569.
Barbour, V., Kleinert, S., Wager, E., & Yentis, S. (2009). Guidelines for retracting articles. Committee on Publication Ethics.
Barde, B. V., & Bainwad, A. M. (2017). An overview of topic modeling methods and tools. In 2017 International Conference on Intelligent Computing and Control Systems (ICICCS) (pp. 745–750). IEEE.
Bar-Ilan, J., & Halevi, G. (2017). Post retraction citations in context: A case study. Scientometrics, 113(1), 547–565.
Bar-Ilan, J., & Halevi, G. (2018). Temporal characteristics of retracted articles. Scientometrics, 116(3), 1771–1783.
Bengfort, B., Bilbro, R., & Ojeda, T. (2018). Applied text analysis with Python: Enabling language-aware data products with machine learning. O’Reilly Media, Inc.
Boldt, J. (2000). The good, the bad, and the ugly: Should we completely banish human albumin from our intensive care units? Anesthesia & Analgesia, 91(4), 887–895.
Bolland, M. J., Grey, A., & Avenell, A. (2021). Citation of retracted publications: A challenging problem. Accountability in Research, 29(1), 18–25.
Bordignon, F. (2020). Self-correction of science: A comparative study of negative citations and post-publication peer review. Scientometrics, 124(2), 1225–1239.
Bornemann-Cimenti, H., Szilagyi, I. S., & Sandner-Kiesling, A. (2016). Perpetuation of retracted publications using the example of the Scott S. Reuben case: Incidences, reasons and possible improvements. Science and Engineering Ethics, 22(4), 1063–1072.
Brainard, J. (2018). What a massive database of retracted papers reveals about science publishing’s “death penalty.” Science, 25 October.
Brownlee, J. (2019). A gentle introduction to the Bag-of-Words model. https://machinelearningmastery.com/gentle-introduction-bag-words-model/
Campos-Varela, I., Villaverde-Castañeda, R., & Ruano-Raviña, A. (2020). Retraction of publications: A study of biomedical journals retracting publications based on impact factor and journal category. Gaceta Sanitaria, 34(5), 430–434.
Candal-Pedreira, C., Ruano-Ravina, A., Fernández, E., Ramos, J., Campos-Varela, I., & Pérez-Ríos, M. (2020). Does retraction after misconduct have an impact on citations? A pre–post study. BMJ Global Health, 5(11), e003719.
Casadevall, A., Steen, R. G., & Fang, F. C. (2014). Sources of error in the retracted scientific literature. The FASEB Journal, 28(9), 3847–3855.
Chuang, J., Manning, C. D., & Heer, J. (2012). Termite: Visualization techniques for assessing textual topic models. In Proceedings of the International Working Conference on Advanced Visual Interfaces (pp. 74–77).
Collier, R. (2011). Shedding light on retractions. Canadian Medical Association Journal, 183(7), E385–E386.
Corbyn, Z. (2012). Misconduct is the main cause of life-sciences retractions. Nature, 490, 21.
Dinh, L., Sarol, J., Cheng, Y., Hsiao, T., Parulian, N., & Schneider, J. (2019). Systematic examination of pre- and post-retraction citations. Proceedings of the Association for Information Science and Technology, 56(1), 390–394.
Fang, F. C., & Casadevall, A. (2011). Retracted science and the retraction index. Infection and Immunity, 79(10), 3855–3859.
Feng, L., Yuan, J., & Yang, L. (2020). An observation framework for retracted publications in multiple dimensions. Scientometrics, 125(2), 1445–1457.
Ferri, P., Heibi, I., Pareschi, L., & Peroni, S. (2020). MITAO: A user friendly and modular software for topic modelling. PuntOorg International Journal, 5(2), 135–149.
Gasparyan, A. Y., Ayvazyan, L., Akazhanov, N. A., & Kitas, G. D. (2014). Self-correction in biomedical publications and the scientific impact. Croatian Medical Journal, 55(1), 61–72.
Gaudino, M., Robinson, N. B., Audisio, K., Rahouma, M., Benedetto, U., … Fremes, S. E. (2021). Trends and characteristics of retracted articles in the biomedical literature, 1971 to 2020. JAMA Internal Medicine, 181(8), 1118–1121.
Grossarth-Maticek, R., & Eysenck, H. J. (1990). Personality, stress and disease: Description and validation of a new inventory. Psychological Reports, 66(2), 355–373.
Halevi, G. (2020). Why articles in arts and humanities are being retracted? Publishing Research Quarterly, 36(1), 55–62.
Hammarfelt, B. (2016). Beyond coverage: Toward a bibliometrics for the humanities. In M. Ochsner, S. E. Hug, & H.-D. Daniel (Eds.), Research assessment in the humanities (pp. 115–131). Springer International Publishing.
Heibi, I. (2022). A guiding diagram for the selection of a CiTO citation function for a given in-text citation. Zenodo.
Heibi, I., & Peroni, S. (2021a). A qualitative and quantitative analysis of open citations to retracted articles: The Wakefield 1998 et al.’s case. Scientometrics, 126(10), 8433–8470.
Heibi, I., & Peroni, S. (2021b). Inputs and results of “A quantitative and qualitative citation analysis to retracted articles in the humanities domain” [Data set]. Zenodo.
Heibi, I., & Peroni, S. (2022). A protocol to gather, characterize and analyze incoming citations of retracted articles. PLOS ONE, 17(7), e0270872.
Heibi, I., Peroni, S., & Shotton, D. (2019). Software review: COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations. Scientometrics, 121(2), 1213–1228.
Hendricks, G., Tkaczyk, D., Lin, J., & Feeney, P. (2020). Crossref: The sustainable source of community-owned scholarly metadata. Quantitative Science Studies, 1(1), 414–427.
Jelodar, H., Wang, Y., Yuan, C., Feng, X., Jiang, X., … Zhao, L. (2019). Latent Dirichlet allocation (LDA) and topic modeling: Models, applications, a survey. Multimedia Tools and Applications, 78(11), 15169–15211.
Lu, S. F., Jin, G. Z., Uzzi, B., & Jones, B. (2013). The retraction penalty: Evidence from the Web of Science. Scientific Reports, 3(1), 3146.
Luwel, M., van Eck, N. J., & van Leeuwen, T. N. (2019). The Schön case: Analyzing in-text citations to papers before and after retraction [Preprint]. SocArXiv.
Mongeon, P., & Larivière, V. (2016). Costly collaborations: The impact of scientific fraud on co-authors’ careers. Journal of the Association for Information Science and Technology, 67(3), 535–542.
Mott, A., Fairhurst, C., & Torgerson, D. (2019). Assessing the impact of retraction on the citation of randomized controlled trial reports: An interrupted time-series analysis. Journal of Health Services Research & Policy, 24(1), 44–51.
Mößner, N. (2011). RETRACTED: Thought styles and paradigms: A comparative study of Ludwik Fleck and Thomas S. Kuhn. Studies in History and Philosophy of Science Part A, 42(3), 416–425.
Ngah, Z. A., & Goi, S. S. (1997). Characteristics of citations used by humanities researchers. Malaysian Journal of Library & Information Science, 2(2), 19–36.
Nikpay, F., Ahmad, R., Rouhani, B. D., & Shamshirband, S. (2020). RETRACTED ARTICLE: A systematic review on post-implementation evaluation models of enterprise architecture artefacts. Information Systems Frontiers, 22(3), 789. (Retraction published 2016, https://doi.org/10.1007/s10796-016-9716-0)
OpenCitations. (2020). COCI CSV dataset of all the citation data [Data set]. figshare.
Peroni, S., & Shotton, D. (2012). FaBiO and CiTO: Ontologies for describing bibliographic resources and citations. Journal of Web Semantics, 17, 33–43.
Peroni, S., & Shotton, D. (2018). Open Citation: Definition.
Peroni, S., & Shotton, D. (2020). OpenCitations, an infrastructure organization for open scholarship. Quantitative Science Studies, 1(1), 428–444.
Priem, J., Piwowar, H., & Orr, R. (2022). OpenAlex: A fully-open index of scholarly works, authors, venues, institutions, and concepts (arXiv:2205.01833). arXiv.
Ritchie, A., Robertson, S., & Teufel, S. (2008). Comparing citation contexts for information retrieval. In Proceedings of the 17th ACM Conference on Information and Knowledge Management (pp. 213–222).
Schmiedel, T., Müller, O., & vom Brocke, J. (2019). Topic modeling as a strategy of inquiry in organizational research: A tutorial with an application example on organizational culture. Organizational Research Methods, 22(4), 941–968.
Schneider, J., Ye, D., Hill, A. M., & Whitehorn, A. S. (2020). Continued post-retraction citation of a fraudulent clinical trial report, 11 years after it was retracted for falsifying data. Scientometrics, 125(3), 2877–2913.
Shuai, X., Rollins, J., Moulinier, I., Custis, T., Edmunds, M., & Schilder, F. (2017). A multidimensional investigation of the effects of publication retraction on scholarly impact. Journal of the Association for Information Science and Technology, 68(9), 2225–2236.
Sievert, C., & Shirley, K. E. (2014). LDAvis: A method for visualizing and interpreting topics.
Sternberg, R. J. (2006). RETRACTED ARTICLE: The nature of creativity. Creativity Research Journal, 18(1), 87–98.
Suppe, F. (1998). The structure of a scientific paper. Philosophy of Science, 65(3), 381–405.
van der Vet, P. E., & Nijveen, H. (2016). Propagation of errors in citation networks: A study involving the entire citation network of a widely cited paper published in, and later retracted from, the journal Nature. Research Integrity and Peer Review, 1, 3.
Wang, K., Shen, Z., Huang, C., Wu, C.-H., Dong, Y., & Kanakia, A. (2020). Microsoft Academic Graph: When experts are not enough. Quantitative Science Studies, 1(1), 396–413.
Yang, S., & Qi, F. (2020). How do retractions influence the citations of retracted articles? In E. Ishita, N. L. S. Pang, & L. Zhou (Eds.), Digital libraries at times of massive societal transition (pp. 139–148). Springer International Publishing.

Author notes

Handling Editor: Ludo Waltman

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. For a full description of the license, please visit https://creativecommons.org/licenses/by/4.0/legalcode.