Retracted articles use less free and open-source software and cite it worse

Abstract As an essential mechanism of scientific self-correction, articles are retracted for many reasons, including errors in processing data and computation of results. In today’s data-driven science, the validity of research data and results significantly depends on the software employed. We investigate the relationship between software usage and research validity, eventually leading to article retraction, by analyzing software mentioned across 1,924 retraction notices and 3,271 retracted articles. We systematically compare software mentions and related information with control articles sampled by coarsened exact matching on publication year, scientific domain, and journal rank. We identify article retractions caused by software errors or misuse and find that retracted articles use less free and open-source software, hampering reproducible research and quality control. Moreover, such differences are also present concerning software citation, where retracted articles less frequently follow software citation guidelines regarding free and open-source software.


INTRODUCTION
As science across all disciplines becomes increasingly data driven, software plays an increasingly important role in scientific investigations (Goble, 2014; Hannay, MacLeod et al., 2009). In surveys, 91% to 95% of scientists state that they are using software (Goble, 2014; Nangia & Katz, 2017) and 63% are convinced that they could not perform their research without it (Nangia & Katz, 2017). Furthermore, scientists increasingly need to develop software themselves, with 84% reporting that developing software is essential for their research (Goble, 2014). Not surprisingly, an analysis of the 100 most cited articles unveiled multiple articles describing software that significantly influenced the field, for instance, BLAST and CLUSTALW in the area of bioinformatics or SHELX, HKL, and PROCHECK in crystallography (Van Noorden, Maher, & Nuzzo, 2014). However, in recent years, there have also been multiple reports about research data and results compromised by unintended behavior or errors of the employed software (Eklund, Nichols, & Knutsson, 2016; Zeeberg, Riss et al., 2004; Ziemann, Eren, & El-Osta, 2016). It has, for instance, been shown that about 31% of research data from human gene analyses processed with Microsoft Excel were altered by the software's auto-conversion features (Abeysooriya, Soria et al., 2021).
Software used in scientific investigations should be adequately indicated (Soito & Hwang, 2017) to ensure reproducibility and provide provenance information about the research results (Katz, Hong et al., 2021). Moreover, it is important to provide proper attribution to developers of software, who often spend significant parts of their academic careers on developing and maintaining research software (Chawla, 2016). Although much effort has been spent on establishing and describing software citation principles (Alliez, Cosmo et al., 2020; Cosmo, Gruenpeter, & Zacchiroli, 2020; Druskat, 2020; Katz et al., 2021; Smith, Katz, & Niemeyer, 2016), it has been shown that these principles have not yet become common in scholarly practice (Du, Cohoon et al., 2022; Howison & Bullard, 2016; Schindler, Bensmann et al., 2022). Currently, software is either indicated in a similar way to scientific instruments, by providing its name accompanied by the version, developer, and a link to an online repository such as GitHub, or formally, with a bibliographic reference for a corresponding software article or a direct software citation following software citation principles. Although both styles allow unique identification of the used software and are, therefore, valid regarding reproducibility, recent work has shown that information is often missing for in-text citations (Du et al., 2022; Howison & Bullard, 2016; Schindler et al., 2022). Moreover, formal citations can be systematically extracted and analyzed, whereas in-text mentions are significantly more challenging to study.
Access to the specific software used or created in investigations can be a prerequisite for repeatability and reproducibility of the results (Krishnamurthi & Vitek, 2015). Both commercial software, typically developed by corporations, and free software, often implemented as part of publicly funded research projects, are used in science. Commercial software requires a financial commitment, either through a purchase of the software or through appropriate licenses. The source code of free software is usually published under open-source licenses, allowing inspection and comprehension of the implementation details. In contrast, the source code of commercial software is commonly closed source and, therefore, often used without being thoroughly validated (Russo & Voigt, 2016). As the incentives for creating free and commercial software differ strongly, the citation and attribution also vary, as shown by Du et al. (2022). Proprietary software does not depend on attribution, as it is commercially distributed and financially compensated (e.g., by licensing fees). However, research software (Sochat, May et al., 2022), for instance, resulting from research projects, depends on proper attribution as a feedback mechanism for the creators, publishers, and funding agencies.
Retraction is a self-correction mechanism established in the scientific community (Ajiferuke & Adekannbi, 2018). The number of article retractions has increased throughout recent decades in absolute and relative numbers (Oransky, 2022; Shuai, Rollins et al., 2017; Steen, 2011b; Van Noorden, 2011). Different reasons for this increase have been described (Steen, Casadevall, & Fang, 2013), one of which is lower barriers to publishing flawed articles. For retraction itself, there are also several possible reasons. The two most prominently investigated reasons are Misconduct and Error, where Misconduct refers to scientific fraud and has been suggested to be performed deliberately (Steen, 2011a). Early work indicates that Error is the most common reason (Steen, 2011b), but later results suggest that Misconduct is more likely, as its number was previously underestimated (Fang, Steen, & Casadevall, 2012). Retraction occurs after articles have passed their peer review and have been published. Therefore, it has been suggested that retractions are more likely to occur in high-impact journals, as their articles receive more attention and more rigorous screening after publication (Cokol, Iossifov et al., 2007). Recent work suggests that more articles might need to be retracted (Oransky, 2022) when viewed based on the criteria from the Committee on Publication Ethics, a nonprofit collective in Eastleigh, United Kingdom.
Manually gathered examples from the Retraction Watch (RW) database provide some clues about how software and article retraction can be related. One publication, for instance, had to be retracted because Excel re-sorted single data columns upon import (Klingbeil, Brandt et al., 2021; Marcus, 2021). Similarly, data rows shifting between observation groups were reported when data were exported from Excel to be further analyzed with SPSS (Wallensteen, Zimmermann et al., 2018). Another case describes that while working with SPSS, the authors accidentally used the wrong input data for analyses (Jafari, 2019). All cases were caused by errors in software usage and unintended software behavior, which significantly impacted the study results.
Although anecdotal evidence for the relation between software and retraction exists and software has previously been associated with errors in investigations, there has been no systematic analysis of the differences in the scientific software landscape of retracted articles. This study investigates the relationship between software and retraction by analyzing retraction notices and the software landscape of retracted articles. For this purpose, we analyzed a set of 1,924 retraction notices as well as 3,271 retracted articles in comparison to 32,710 nonretracted control articles. As a result, we identified multiple instances where software has been the primary reason for article retraction. Furthermore, our results provide evidence that the software landscape in retracted articles is less diverse and that less free and open-source software is employed. Moreover, they show that citation practices for free and open-source software in retracted articles are negligent, with a substantially lower proportion of formal citations.

METHODS AND MATERIALS
This section describes our approach and the data resources used to systematically analyze the relation between software and article retraction.

Retraction Notices
Initially, we analyzed retraction notices for software mentions to determine whether there is a directly observable relationship between article retraction and software. The corresponding data collection process is illustrated in Figure 1 (A and B). Retraction notices are released by scientific publishers to announce the retraction of scientific articles and the corresponding reasons. Explicit mentions of software within these notices could mean that software had a direct influence on article retractions and give valuable insights into the impact of software on science.

Quantitative Science Studies 822

The information and content of retraction notices vary depending on publishers and editors. They have been described as ranging from informative and transparent to deeply obscure (Van Noorden, 2011). As they are publisher specific, they cannot be easily mined and accessed automatically, which is why the manually curated RW database, described in Section 2.2, is a valuable resource for investigating retractions. However, some notices are available from PubMed Central and can be automatically identified by the corresponding publication type.
We obtained information on software mentions in these notices from SoftwareKG (Schindler et al., 2022), a knowledge graph containing information on software mentions for all PubMed Central (PMC) open access publications, covering 11.8 million software mentions in 3.2 million publications. It includes the software entities themselves, associated information (such as version and developer), and information closely identifying the context of software mentions. Furthermore, it provides disambiguation of software mentions based on unique identifiers for scientific software. We aggregated information on software in all publications indexed in SoftwareKG under publication type "retraction." Overall, we identified 1,924 retraction notices in PubMed Central. We manually verified all instances where software was identified within these retraction notices and analyzed all corresponding contexts to describe the reason why the software was stated. Additionally, we downloaded the corresponding retraction notice full-text documents from PMC and performed a keyword search for the term "software" to identify generic references to software, because SoftwareKG only contains mentions of named software; however, this did not identify any further cases relevant to our analyses. The process of gathering the information is also illustrated in Figure 1.

Software in Retracted Articles
Next, we analyzed software used within retracted articles, because even if software is not directly stated as a reason for retraction, differences in software usage practices can still be related to retractions. To this end, software mentions were identified within a set of retracted articles and compared to a control set of nonretracted articles selected by coarsened exact matching. The results were analyzed with respect to the software used, its availability, and software citation habits by analyzing the available information associated with software. The data collection pipeline is illustrated in Figure 1.

Data sources
Initially, we gathered a list of retracted articles to analyze. The utilized data sources are illustrated in Figure 1(A). As described in Section 2.1, it is not directly possible to automatically identify article retractions. Therefore, we utilized RW as a knowledge base on article retractions. It contains manually gathered information on retractions of scientific articles, including a fine-grained summary of the corresponding reasons leading to the retraction. Altogether, the database includes 32,127 entries ranging from 1756 to 2022. We obtained the current database from RW on January 6, 2022, and considered all articles published between 2000 and 2019 in our analyses. We excluded earlier publications because too few data samples are available per year, and later publications because full-text information is only available up to 2019 from S2ORC (see below). We use RW's information on the original article DOI and publishing journal for identification and utilize the retraction reason for analyses.
For the analyses, it is also necessary to obtain the full-text documents of the retracted articles. This is also a challenging task, as scientific publications are often paywalled or only available in PDF format, and we require the plain text of publications. We solve these issues by working on the S2ORC corpus and automatically matching it against the list of retracted articles. The S2ORC corpus (Lo, Wang et al., 2020) contains numerous English academic papers with metadata and plain, full-text documents for a subset of papers. When it was published, it contained 8.1 million full-text documents, but it has since been extended to cover more than 12.7 million articles. We obtained the latest version of the S2ORC corpus (20200705v1), which contains full-text documents for articles published before April 2020. We use the provided article DOIs and the publishing journal for identification, and information on the research domain and publication year to select control articles, as described below. Articles for which no full-text document was available could not be considered for further analysis.
Additionally, we gathered data on the Scimago Journal Rank (SJR) (SCImago, n.d.) for all journals extracted from S2ORC. The SJR provides a publicly available ranking of scientific journals with broad overall coverage. The corresponding data are available on a yearly basis and were manually downloaded from the SCImago website on June 16, 2022. We used the SJR to sort the journals and to divide them into percentiles. We matched the available journal names from SJR to the journal names provided in RW and S2ORC, as this information is necessary for the control sample selection described below. Since this information is required, we excluded all articles for which it could not be identified.

Control article selection
Overall, a set of 3,374 retracted articles with available full-text documents and journal information was identified. To perform a valid statistical analysis with respect to software usage and software citation habits, we compared them against a control set of nonretracted articles (see Figure 1(C)). We selected suitable nonretracted control papers for our analysis by coarsened exact matching (Iacus, King, & Porro, 2012). The idea is to generate a set of control articles for the given retracted articles that is equally distributed in all variables considered to influence the dependent variable. The literature (Schindler et al., 2022) has identified three variables influencing software usage and citation habits: Publication Date, Scientific Domain, and Journal Rank. We control these variables on an article basis as follows: Publication Date is coarsened to the year of publication; Scientific Domain is matched exactly, keeping the order of domains intact, as we consider the first named domain the most influential for multidisciplinary work; and Journal Rank is coarsened to percentiles determined yearly. We do not control for specific journals, as we argue that the influence of journals on software citation is due to the Scientific Domain of the journal and its Journal Rank. Especially for interdisciplinary journals, matching journals directly instead of matching the Scientific Domain of specific articles could add a bias.
Based on the controlled variables, we randomly selected 10 controls from the available S2ORC data for each retracted article without replacement. This was not possible for all articles, as rare combinations of the controlled variables occurred in the RW data. Overall, 96.9% (3,271 out of 3,374) of articles were matched, and the remaining unmatched articles were excluded from further analyses, resulting in a control set of 32,710 articles. As 10 control articles correspond to each specific retracted article, we can analyze arbitrary subsets of the retracted articles independent of their distribution, which is necessary because, for instance, certain retraction reasons might be more likely in specific scientific fields.
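The selection procedure described above can be sketched as follows. This is a minimal illustration, not the study's actual implementation: the field names (`year`, `domain`, `rank_pct`) and the dictionary-based records are assumptions for the sake of the example.

```python
import random

def select_controls(retracted, pool, n_controls=10, seed=0):
    """For each retracted article, draw n_controls pool articles matching
    exactly on the coarsened variables; retracted articles whose stratum
    has too few remaining candidates stay unmatched and are excluded."""
    rng = random.Random(seed)
    keys = ("year", "domain", "rank_pct")
    available = list(pool)
    matched = {}
    for art in retracted:
        stratum = tuple(art[k] for k in keys)
        candidates = [c for c in available
                      if tuple(c[k] for k in keys) == stratum]
        if len(candidates) < n_controls:
            continue  # rare combination of controlled variables
        sample = rng.sample(candidates, n_controls)
        for c in sample:
            available.remove(c)  # sampling without replacement across strata
        matched[art["id"]] = [c["id"] for c in sample]
    return matched
```

Because each retracted article carries its own 10 controls, any subset of retracted articles (e.g., those sharing one retraction reason) can be compared against an equally distributed control subset by simply collecting the matched control lists.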

Information extraction
We automatically extract information on mentioned software, associated information, mention context, and software identity using the state-of-the-art SoftwareKG information extraction pipeline (Schindler et al., 2022), which recognizes software and its related information with an F-score of 88.5%. Plain-text S2ORC articles are initially preprocessed, and named entities, associated information, and mention context are extracted by the pretrained SoftwareKG classifier, which is based on a SciBERT model and solves all tasks in a hierarchical, multitask classification. Software names are further disambiguated with the SoftwareKG pipeline to establish unique software identities across both article sets, which is a requirement for many of the described analyses. The information extraction step is illustrated in Figure 1(D), and a detailed description of the SoftwareKG information extraction pipeline is available in Schindler et al. (2022). Note that the CZI Software Mention data set (Istrate, Li et al., 2022) contains software mentions in scientific articles at a similar scale to the readily available S2ORC. However, it does not contain all information necessary for analyses of citation completeness, which is why we performed a custom analysis here.
Although high performance rates have been reported for the software information extraction, we took into account errors in all steps (for instance, through PDF conversion artifacts). To limit the potential influence of large-scale errors, we performed quality control by manually analyzing results during Information Enrichment, described in Section 2.2.4. Moreover, we argue that errors, especially on a small scale, do not bias the results of comparisons between the retracted and control sets because errors are equally likely to be present in both sets.
Software citation habits are analyzed by considering citation completeness. Software can be mentioned without further specifications, which we consider as No Info. It can also be mentioned with a corresponding developer and version number, which we consider as Complete Info. This style of mentioning software is also typical for mentioning scientific instruments and has been described in the literature on software citation (Howison & Bullard, 2016). If only partial information on the software is provided (version or developer), it is considered Partial Info. In contrast, software can also be formally cited with a bibliographic reference, by citing a software article (an article published to describe specific scientific software) or by a direct software citation following software citation standards (Katz et al., 2021; Smith et al., 2016). We consider this case a Formal Citation. In this investigation, a Formal Citation is regarded as the gold standard for software citation; therefore, even if other information is present, the mention is still considered a Formal Citation, which makes all classes mutually exclusive. Providing developer and version is also an acceptable way to mention software, as it allows unique identification of the used software. Nevertheless, it has the drawback of requiring a full-text scan of articles to identify the software mention and does not allow for bibliometric analyses.
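The mutually exclusive completeness classes above can be expressed as a small decision function. This is a sketch assuming simple string-valued fields for a single mention; it is not the study's actual classification code.

```python
def citation_class(version=None, developer=None, reference=None):
    """Assign one of the four mutually exclusive completeness classes.
    A bibliographic reference dominates any other information present."""
    if reference:
        return "Formal Citation"
    if version and developer:
        return "Complete Info"
    if version or developer:
        return "Partial Info"
    return "No Info"
```

The ordering of the checks encodes the rule that a Formal Citation outranks instrument-style information, so a mention with both a reference and a version still counts as a Formal Citation.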

Information enrichment
We further enriched the information extracted with the SoftwareKG pipeline by adding manually curated information on whether software is free or commercial and whether it is open-source or proprietary. Moreover, we noted whether software is statistical software, as we are interested in analyzing the most common statistics tools. Because adding this information is a considerable manual effort, we limited it to the most common software under the condition of covering at least 50% of overall software mentions (both absolute and unique software per article). This resulted in manual annotation of the 243 most common pieces of software. The corresponding data collection step is illustrated in Figure 1(E).
During information enrichment, we manually checked the validity of the results for the most common software by checking whether extracted names do refer to software and whether there are disambiguation errors for software identities. We identified two large-scale extraction errors where methods were systematically confused with software. Furthermore, we identified errors in the disambiguation: 12 false negative errors, where two software tools were not disambiguated; 10 large-scale false positive errors, where two software tools were wrongly disambiguated; and 14 small-scale false positives, where single occurrences were incorrectly attributed to a software identity. All described errors were manually corrected before executing the statistical analyses. We assume that potential further errors will be contained in the long tail of software names, as they are likely to result from extraction errors that are unlikely to be disambiguated to other software. On the other hand, the long tail may contain further mentions that should be disambiguated to other software, but as we covered more than 50% of the mentions, these can only have a limited impact on the results. Therefore, we consider the reported results for the most common software and software availability valid.
We also extended the information on retraction reasons described in Section 2.2, which closely identify the circumstances of an article retraction. Overall, there are a large number of different reasons, which can, for instance, vary due to differing communications about the retraction by publishers. To analyze the relationship between the retraction reason, software usage, and citation habits, we summarize the reasons as done in prior work (Ribeiro & Vasconcelos, 2018). We manually determined semantic top-level categories by analyzing the reason descriptions provided by RW. For instance, we summarized all reasons indicating plagiarized work, including text, images, and data. Reasons that could not be semantically assigned to any top-level reason were summarized as other, which includes reasons such as Ethical Violations, Rogue Editor, or Copyright Claims. A detailed overview of the summarized reasons can be found in the Supplementary material.

Implementation
Data processing was implemented using Python version 3.9.10 (Van Rossum & Drake, 2009) and R version 4.2.1 (R Core Team, 2022). In particular, full-text retrieval was done using Python (Van Rossum & Drake, 2009) in combination with pandas version 1.4.0 (McKinney, 2010), and software information extraction with SoMeNLP as of February 2022 (Schindler, 2022). Data analysis was implemented in R using tidyverse version 1.3.1 (Wickham, Averick et al., 2019) and magrittr version 2.0.2 (Bache & Wickham, 2022) for data wrangling, effectsize version 0.8.3 (Ben-Shachar, Lüdecke, & Makowski, 2020) for statistical tests, and ggplot2 version 3.3.5 (Wickham, 2016) and patchwork version 1.1.1 (Pedersen, 2020) for visualization. Detailed documentation of the computational environment for data analysis is provided at the end of the supporting information.

RESULTS
Here, we summarize the results of our analyses regarding the relation between software and article retraction.First, we describe the results of manually examining the context of software mentions in retraction notices.Then, we present results regarding the differences in the landscape of scientific software in retracted articles, with respect to the used software, its availability, and software citation habits based on an evaluation of the available associated information.


Software in Retraction Notices
We identified 16 (0.8%) retraction notices out of 1,924 that contain software mentions where the software is related to the articles' retraction. The corresponding reasons why the software was mentioned were manually examined and summarized, and are outlined in detail below.

Software errors
The most common reason why software is mentioned within these notices is biased study results caused by errors in software usage. This affects nine notices, with six describing a mistake using point-and-click software, concerning SPSS (Altunbas, Unsalver, & Yasar, 2019), Excel (Hunter & Prüss-Ustün, 2017), Photoshop (Li, Wang et al., 2016), uVariants and uDesign (Al-Koofee, Ismael, & Mubarak, 2019), and Gene Expression Omnibus (Liu, Zhang et al., 2020). Another three notices are related to programming, where source code errors led to data processing errors, twice for Matlab (Sun, Jiang et al., 2018; Mann, Perna et al., 2012) and once for Perl (Hall & Salipante, 2007). One item of software was not specified (Gupta, Nagilla et al., 2012). Eight of the nine retraction notices described above clearly state how the error was discovered. In four cases, the software error was identified by the authors after publication.
In three cases, abnormalities in the research papers were found by others, and the error itself was recognized by the authors afterward.In one case, inaccuracies were identified by others without further elaboration.

Study design
There are four instances in which the study design, including the use of specific software, led to unreliable study results. One retraction notice described the study design, including the use of the software DaysyView, as unreliable (Koch, Lermann et al., 2019). Furthermore, we found two cases of inconsistent published data, concerning Excel (Nguyen, Dayan et al., 2018) and TreeBASE (Khodami, McArthur et al., 2020). Another notice suggested the application of BLAST for validation, which would have revealed a data error (Mu, Yang et al., 2017).

Journal policy
Two notices reported cases in which the software publication violated the journal policy, leading to the retraction of the corresponding software articles. The software GNARE required user registration even though the journal demanded unrestricted access (Glass, Rodriguez et al., 2012). The second notice concerns TREEFINDER, where the lead author changed the license after publication, no longer allowing usage of the software in the United States and multiple European countries, while the journal policy requires the software to be available to all scientists (Jobb, Von Haeseler, & Strimmer, 2015). Notably, the TREEFINDER publication had a high impact, with 833 citations indexed in Web of Science from its publication in 2004 to its retraction in 2015, and an additional 154 citations after retraction.

Reporting error
One notice named an incorrectly reported SPSS version as a contributing reason for retraction (Zhang, Chen et al., 2012).
With respect to the retraction reasons introduced in Section 3.2, 13 out of the 16 cases are classified as honest errors, but the instance regarding Photoshop is labeled as misconduct and intentional altering of study results. The remaining instances correspond to the category other and concern the cases where authors violated journal policies.
We further extend this analysis to cover specific retraction reasons. As described in Section 2.2, we analyze the characteristics of different retraction reasons by comparing the articles retracted for a particular reason to the corresponding control set created by selecting the 10 control articles for each retracted article. This results in a separate set of retracted and control articles for each retraction reason and ensures that both sets are equally distributed for every comparison. The number of articles containing software per retraction reason is illustrated in Figure 2 and reveals further differences in software usage. For most reasons, the overall trend can be observed, with a higher relative number of articles containing software in retracted articles. An exceptionally high proportion can be observed for PaperMill articles, with more than 99% of the articles containing software. At the same time, large proportions are also present for Error, Investigation, and SelfPlagiarism. In contrast, articles retracted due to Plagiarism are less likely to contain software.
Looking at articles that mention at least one item of software, we observed that retracted articles contain less software, with an average of 2.92 (95% CI: [2.79, 3.05]) pieces of software compared to 3.32 (95% CI: [3.27, 3.38]) in control articles, a difference found to be significant, t(2765) = 5.6, p < .001, with a negligible effect of Cohen's d = 0.11. In both sets, the frequencies increased similarly over the analyzed years. For further details see Figure S4 in the Supplementary material. Similar to the overall result, we found retracted articles to contain fewer software mentions across all individual retraction reasons except for Plagiarism and other, where we found no difference, as illustrated in Figure 2.
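The effect size reported above is a standardized mean difference. As a minimal sketch of how such a value is obtained (the analysis itself was run in R with the effectsize package; the toy samples below are illustrative, not the study's data):

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(a, b):
    """Standardized mean difference using the pooled sample SD.
    By the common rule of thumb, |d| below 0.2 is a negligible effect."""
    na, nb = len(a), len(b)
    pooled = sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2)
                  / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled
```

A difference can thus be statistically significant (large samples drive the t-statistic) while remaining negligible in practical terms, which is exactly the pattern reported for the per-article software counts.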
Besides the plain number of software items per article, the frequencies of the mentioned software also differ in retracted articles, as illustrated in Figure 3. SPSS is the most frequently used software in both sets, but there is a large difference in the number of articles mentioning it: 35.8% of retracted and 20.3% of control articles, respectively. Prism and ImageJ are also central software in both sets and are mentioned more in retracted articles, with 13.3% and 12.7% compared to 9.8% and 8.4% in the control set. A large difference can be observed for R, used in only 4.6% of the retracted articles but the third most common software in the control set (9.0%). TargetScan is a notable case, as it is mentioned in 7.9% of retracted articles but only in 1.0% of control articles. A closer analysis concerning retraction reasons (see Supplementary Figure S8) revealed that TargetScan is mostly mentioned within articles retracted due to Error, Investigation, SelfPlagiarism, other, and especially PaperMill, where 39.7% (95% CI: [33.2, 46.3]) of retracted and 3.7% (95% CI: [2.8, 4.7]) of control articles mention the software.
A closer look at the most common statistical software, illustrated in Supplementary Figure S7, showed that the distribution is strongly skewed towards SPSS and Prism in the retracted set. In contrast, a more diverse set of tools is employed in control articles, indicated by higher usage of R, SAS, Excel, Matlab, and Stata. This led us to further investigate the overall distribution of software by analyzing how many articles mention any of the top n software tools. By analyzing the cumulative distribution of the n most frequently used software, we found that fewer pieces of software cover a higher number of articles in the retracted set (see Supplementary Figure S9 for details). In detail, for retracted articles the single most common software already covers more than 25% of articles, whereas the top 2 software are required to cover 25% of the control set. This difference increases to the top 3 compared to the top 7 software at 50% of articles, and the top 21 compared to the top 74 at 75%.

Software is often not mentioned by its original name; instead, authors use spelling variations, abbreviations, or full forms (Schindler et al., 2022). We analyzed how individual software is indicated in retracted articles but did not observe any systematic trend. Within retracted articles, for instance, SPSS is mentioned by a less diverse set of names in comparison to control articles and is mainly indicated by its standard name, whereas an inverse trend exists for ImageJ. Mentions of Quantity One, on the other hand, are similarly distributed. The results are also illustrated in Supplementary Figure S10.
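The cumulative top-n coverage analysis can be sketched as follows; the per-article lists of software names are an assumed input format for illustration, not the study's actual data structure.

```python
from collections import Counter

def top_n_coverage(articles, n):
    """Fraction of articles mentioning at least one of the n most
    frequently used software tools (each tool counted once per article)."""
    counts = Counter(s for mentions in articles for s in set(mentions))
    top = {s for s, _ in counts.most_common(n)}
    covered = sum(1 for mentions in articles if top & set(mentions))
    return covered / len(articles)
```

Sweeping n from 1 upward yields the cumulative curve: the smaller the n needed to reach a given coverage level, the more concentrated the software landscape, which is the pattern observed for the retracted set.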
Because citation practices differ between free and commercial software (Du et al., 2022; Howison & Bullard, 2016), we further analyzed this difference with respect to retracted articles. All trends described in the following for free and commercial software are also present for open-source and proprietary software (details are given in the Supplementary material, particularly Figures S16 and S18). The results are provided in Figure 5 and show that commercial software is cited similarly in retracted and control articles. At the same time, there is a notable difference in the citation style for free scientific software. Commercial software is most frequently mentioned similarly to scientific instruments, by providing version and developer. It is also common practice to provide only incomplete information, that is, either version or developer. Formal citations, on the other hand, are almost never used: only 1.5% and 3.1% of instances in retracted and control articles, respectively.
Free software is most commonly mentioned without any information in both sets, with a higher proportion of 49.3% in retracted articles compared to 36.9% in control articles. It is only rarely mentioned similarly to scientific instruments, with 8.1% and 5.5% in retracted and control articles, respectively, but partial information is more commonly provided, in 26.5% and 26.9% of cases. A large difference can be observed for formal citation: only 16.1% of retracted articles provide formal citations, whereas 30.7% of control articles formally cite free software. To estimate the difference in formal software citation in retracted articles, we employed a logistic regression with the following covariates: retraction, number of software items per article, availability of the software in terms of free and open-source software, and all pairwise interactions; see Table 2. We found that control articles are significantly (p < .001, OR = 2.89, 95% CI: [1.94, 4.30]) more likely to formally cite software than retracted articles. Furthermore, we found that with an increasing number of software items per article, the rate of formal citation increases (p < .001, OR = 1.28, 95% CI: [1.20, 1.37] per software item), independently of other covariates. Last, the model also showed that, independently of other covariates, free and open-source software are far more likely (p < .001, free OR = 8.02, 95% CI: [4.81, 13.36]; open-source OR = 13.35, 95% CI: [6.90, 25.85]) to receive formal citations than commercial and closed-source software, which confirms previous studies (Du et al., 2022; Howison & Bullard, 2016).
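The odds ratios and Wald intervals above follow directly from the fitted log-odds coefficients by exponentiation. A minimal sketch, where the standard error is a hypothetical value chosen only so that the resulting interval roughly matches the reported [1.94, 4.30] for the retraction covariate:

```python
import math

def odds_ratio_ci(beta: float, se: float, z: float = 1.96):
    """Odds ratio with a 95% Wald confidence interval, computed from a
    logistic-regression coefficient `beta` and its standard error `se`."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical inputs: beta = ln(2.89); se = 0.203 is an assumed value
# that approximately reproduces the reported interval [1.94, 4.30].
or_, lo, hi = odds_ratio_ci(math.log(2.89), 0.203)
print(f"OR = {or_:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

Because the interval is constructed on the log-odds scale and then exponentiated, it is asymmetric around the odds ratio, as visible in the reported CIs.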
Based on these initial findings, we further analyzed citation habits for free and commercial software with respect to specific retraction reasons, summarized in Figure 6. We found commercial software to be mentioned similarly in retracted and control articles, consistent with the overall trend, except for articles retracted due to PaperMill, where software is notably more likely to be mentioned similarly to instruments, with 69.0% compared to 54.7%.
Regarding the difference in formal citation of free software, we found the largest divergence for articles retracted due to PaperMill, with 0% and 28.5% for retracted and control articles, respectively. Large differences are also present for Investigation (8.7%, 30.3%), Misconduct (11.0%, 31.4%), and SelfPlagiarism (6.1%, 33.6%), whereas Error (16.7%, 31.9%) is close to the overall trend. A smaller difference is present for other (19.6%, 29.6%), and no difference can be observed for Plagiarism.

LIMITATIONS
Our findings are consistent with other studies on software mentions in scholarly publications, indicating that the automatic extraction of software and associated details and the subsequent automatic disambiguation of software name spelling variations produced reasonable results. Schindler et al. (2022), for instance, report that 59.5% of 3.2 million articles mention at least one item of software, comparable to the 58.1% observed for control articles in our study. They further report an average of 3.67 software mentions per article; the average of 3.32 different software items per control article observed in our study is in line with this value. Regarding spelling variations, we observed that 79.4% of all SPSS-related mentions use the name SPSS, which is similar to the 78.4% reported by Schindler et al. (2022).
Overall, we only matched full-text and journal information for ≈10% of articles contained in the RW database (see Section 2.2), because a high number of the publications are hidden behind paywalls. The full-text documents we did obtain were extracted from the S2ORC corpus. The article selection is, therefore, likely biased by the article composition in S2ORC, with a dominant bias towards open access publications. For this reason, the overall sample size of retracted articles is also too small to support certain specific analyses, such as investigating disciplinary differences.
Furthermore, the number of analyzed retraction notices is low, and this part of the analysis is only exemplary in character. The current analysis uses only retraction notices from PMC, as it allows the systematic identification of retraction notice full-text documents. One caveat of using PMC is that this repository is skewed towards publications in biomedical domains, and thus the results presented in this article may not generalize to other domains. We did not include further notices because there is currently no corpus of retraction notice full-text documents available. However, we strongly encourage further exploration in this area.

DISCUSSION
Software is an integral part of modern science and thus influences certain aspects of the data analysis and results reported in scholarly articles. Retraction, on the other hand, is an essential mechanism for implementing scientific self-correction (Ajiferuke & Adekannbi, 2018). Our study found evidence that software was involved in the retraction of multiple articles and revealed differences in the scientific software landscape and software citation habits in retracted articles.
Our analysis shows that software has been explicitly named in retraction notices and has caused the retraction of articles, with a majority of errors resulting from usage errors and unintended software behavior of which authors were unaware. Although these cases contribute only a small number of overall retractions, they highlight the strong influence of software on science, as a single action by the software caused most reported errors. We further found that authors themselves catch most errors, highlighting the challenges for reproducibility but also suggesting that there might be other unnoticed cases. A potential solution would be to include analysis scripts as part of the scholarly publication, for instance interwoven into a single literate data analysis document, so that peer review can reveal such errors.
Retracted articles mention less software as well as a less diverse set of software, as observed by the increased usage of the most frequently used software. Moreover, retracted articles more frequently employ commercial and closed-source software in place of free and open-source software. Furthermore, citation completeness is lacking in retracted articles for free and open-source software, whereas completeness for commercial and proprietary software is almost equal to the control set. Retracted articles are more likely to mention free and open-source software without any information, such as version or developer, and, in turn, are less likely to provide proper formal citations.
We found that articles automatically generated by PaperMill techniques systematically differ from other articles and exaggerate software mentions. The generation mechanisms have learned that software is typically present in scientific publications, which is why almost every PaperMill publication contains software. Furthermore, they have learned to mimic proper citation for commonly applied, well-established tools. This is supported by our findings that complete information for proprietary software is provided at a very high rate, while information on free and open-source software is provided at a particularly low rate. Looking at specific software, it becomes clear that PaperMills have learned that SPSS is a commonly used tool, as they include it in over 70% of publications.
As our results provide the first evidence that software is related to retraction, we encourage future work to systematically analyze the content of retraction notices at a larger scale. The main effort in this regard is to establish a corpus of retraction notice full-text documents. Although unique identifiers for retraction notices are available from RW, it is challenging to obtain the documents, as they are publisher specific and require significant effort to gather. Further research directions, such as analyzing domain-specific differences, require increasing the sample size of full-text documents of retracted articles, for instance by reaching out to publishers. Finally, additional investigation is needed to determine the underlying reasons for the differences in software usage and the lack of rigor in software citation within retracted articles. A potential approach would be to reach out to authors with a questionnaire regarding their choices in software usage.

DATA AVAILABILITY
RW data are available from The Center For Scientific Integrity, the parent nonprofit organization of RW, subject to a standard data use agreement (details at https://retractionwatch.com/retraction-watch-database-user-guide/). Additionally, we publish anonymized aggregated information on software in retracted articles with permission of RW to allow direct repeatability of the described statistical analyses, together with the source code, at https://github.com/dave-s477/Software-and-Article-Retraction. S2ORC data are available for noncommercial use upon requesting access (details at https://github.com/allenai/s2orc). SCImago Journal Rank is publicly available from https://www.scimagojr.com/. The SoftwareKG information extraction pipeline is available from GitHub at https://github.com/dave-s477/SoMeNLP, and SoftwareKG data are available from Zenodo (https://doi.org/10.5281/zenodo.5780121). The PubMed Central Open Access subset is available from https://www.ncbi.nlm.nih.gov/pmc/tools/ftp/. The analysis code, including results and figures, is available as a literate data analysis document in the supplementary material.

Figure 1. Flowchart illustrating the data collection pipeline for retraction notices and full-text documents of retracted and control articles. Left: Pipeline for retraction notices. Right: Pipeline for article selection and information extraction (IE) from articles.

Figure 2. Software mentions in scholarly articles per retraction reason, separated by retracted and corresponding control articles. The sets of control articles are constructed by selecting the 10 corresponding articles for each retracted article. Top: Proportion of articles that contain at least one software mention. Bottom: Average number of software mentions per article with at least one software mention. Error bars indicate 95% CIs.

Figure 3. Proportion of retracted and control articles mentioning software out of the top 20 most used software. Error bars indicate 95% CIs.

Figure 4. Proportion of free or open-source software across retracted and control articles. Error bars indicate 95% CIs.

Figure 5. Proportion of software across different levels of citation completeness, separated by retracted and control articles. No Info: Neither the version nor the developer of the software is provided; Incomplete Info: Either version or developer is provided; Informal Citation: Version and developer are provided; Formal Citation: The software mention is accompanied by a bibliographic citation. Error bars indicate 95% CIs.

Figure 6. Proportion of software mentions across different levels of citation completeness per retraction reason, separated by retracted and control articles. No Info: Neither the version nor the developer of a software is provided; Incomplete Info: Either version or developer is provided; Informal Citation: Version and developer are provided; Formal Citation: The software mention is accompanied by a bibliographic citation (independent of any associated information). Error bars indicate 95% CIs.

Table 1. Result of the logistic regression predicting software availability for retracted papers, considering the number of software items per article

Table 2. Result of the logistic regression predicting formal citation for retracted papers, considering the number of software items per article and the software availability