The overall scope of this study is an attempt at a methodological framework for matching publication lists at the national level against a combined set of blacklists for questionable publishing. Using the total verified set of publications from Swedish Higher Education Institutions (HEI) as a case, we examined the number, distribution, and proportion of publishing in questionable journals at the national level. Journal publication data was extracted from the national SwePub database and matched against three curated blacklists of questionable publishing. For the period 2012–2017, we identified 1,743 published papers in blacklisted journals, equal to an average of 0.5–0.9% of the total publications from Swedish HEIs. There was high variability between different HEI categories, with more established universities at the lower end of the spectrum, while university colleges and new universities had a much higher proportion (∼2%). There was a general decreasing trend during the study period (ρ = −0.83) for all categories of HEIs. The study presents a methodology to identify questionable publishing in academia that could be applied to other countries with similar infrastructure. Thus, it could serve as a starting point for the development of a general framework for cross-national quantitative estimation of questionable publishing.
There is an increasing focus on publications in scholarly research. Whereas publications were previously regarded as the outcome of a successful research project, researchers are nowadays often expected to specify how their results will be published already when planning research projects or applying for funding. Publishing also plays a role in the evaluation of individual researchers’ performance and the allocation of resources to and within academia. This is often popularized by the slogan “publish or perish.” Over the last decade, there has also been a rise in open access (OA) journals financed through article processing charges (APCs), paid mainly by the author, instead of the older system of subscription fees paid by the reader.
These two parallel trends have opened up a market for unscrupulous actors preying on unwary or desperate researchers. Dubious editorial activities include claiming to give peer review, although clearly not doing so, as well as accepting any submitted manuscript as long as the APC is paid. These journals also operate under false pretenses (e.g., using high-status entity names in titles without having real connections, inventing fake journal impact metrics, or listing prominent researchers as editors without their knowledge). Some studies have also raised the issue of hijacked journals (Dadkhah & Maliszewski, 2015; Jalalian & Dadkhah, 2015; Shahri, Jazi, et al., 2018) where the titles and ISSNs of both existing and discontinued journals have been picked up by shady actors. The problem is perceived as so widespread that tools for distinguishing real from hijacked journals have been developed (Asadi, Rahbar, et al., 2017).
The phenomenon is sometimes referred to as “predatory” publishing, but due to the intricate activities of several actors as described above, a better term might be “questionable” or “unethical” publishing practices, thus not blaming any single actor, but viewing it as a phenomenon worthy of empirical study in itself. The reasons researchers choose to publish in such journals are a lack of awareness, the speed and ease of the publication process, and the chance to publish work that has been rejected elsewhere (Shaghaei et al., 2018).
The ethical aspects of questionable publishing have been conceptualized and discussed in several publications and in a variety of research fields, such as nursing, biomedicine, economics, and agriculture (da Silva, 2017; Eriksson & Helgesson, 2017; Ferris & Winker, 2017; Gasparyan, Yessirkepov, et al., 2016; Josephson & Michler, 2018; McLeod, Savage, & Simkin, 2018).
Questionable publishing practices negatively affect legitimate journals, policymakers, and research funders and are seen by most stakeholders as an imminent threat to institutional integrity, reputation, and public support. The issue is of great importance and merits an investigation, especially in countries like Sweden, where almost all universities are public and most research is financed by taxpayers. However, academia is struggling to find ways to tackle this issue.
Firstly, there is no consensus on how to identify and classify journals as predatory or not. Secondly, as many predatory journals are not indexed in established databases, the identification of questionable published papers is difficult (Shen & Björk, 2015). Even when agreed criteria are developed, the authors found that it was difficult to evaluate single publishers and journals based on these, since stated information did not always seem to match practical considerations about how publishing actually was done, in which country they operated, or how the peer review process actually worked. Consequently, it has not been possible to estimate the full scope of the problem.
Early attempts to categorize and collect blacklists of predatory journals have been met with complaints and alleged hostility toward the creators and curators of such lists. A list curated by an individual librarian at the University of Colorado, Jeffrey Beall, was publicly available but discontinued in 2017, although archives exist online. Despite the merits and impact of Beall’s blacklist, justified criticism has also been raised against it, especially regarding how some criteria for inclusion on the list were employed. Also, due to difficulties in maintaining the list, its accuracy over time was debated. However, most attempts to assess or discuss the issue of predatory publishing have frequently used Beall’s list as their only source, and thus it has been highly influential.
Several actors have made efforts to fill the void after the discontinuation of Beall’s list. The US-based publisher of journal directories, Cabell Inc., recently launched a new blacklist of journals, where they score journals against 66 publicly available quality criteria (Das & Chatterjee, 2018; Hoffecker, 2018). The Ministry of Health and Medical Education in Iran also curates a publicly available blacklist and the Directory of Open Access Journals (DOAJ) provides a list that contains journals that are no longer indexed in the general DOAJ directory due to the stated ground of “suspected editorial misconduct by publisher.” Here, we propose that using a combined set of blacklists overcomes some of the issues of lists being biased toward specific research areas or interests. While a certain overlap is expected, we argue that the addition of more sources is needed, since this is a new area where no established practices for defining and identifying sources have been developed.
In a wider context, there is a discussion regarding the role of standards and how institutional knowledge is constructed in a specific context within science and technology studies. It should also be noted that there is an ongoing debate as to whether blacklisting of journals fulfills any valuable purpose at all (Editorial, 2018; Strinzel, Severin, et al., 2019). Some argue that this is an issue that cannot be weeded out as long as there is a demand from authors and that the solution is to rank all journals. Indexing of legitimate journals, using so-called whitelists, is another approach, where large journal directories such as Web of Science (WoS), Scopus, PubMed, and the DOAJ act as gatekeepers. However, populating a whitelist with acceptable journals is arguably as difficult as identifying questionable journals, and it is well known that suspected questionable journals manage to get listed in these directories (Bohannon, 2013; Haider & Åström, 2017; Manca, Moher, et al., 2018). Additionally, noninclusion in a whitelist is not a clear sign that a journal is questionable, in the same manner that noninclusion in a blacklist does not signal that the journal is of acceptable quality. This is because there could be many different reasons for not being included, such as the journal being newly instigated or not known to the indexers. In a study that compares two blacklists (Cabell’s and Beall’s) and two whitelists (Cabell’s and DOAJ), it was found that there are overlaps in the contents of both blacklists and whitelists, and that inclusion criteria are vague and biased (Strinzel et al., 2019). In this study we handled the latter by evaluating the criteria used in the inclusion process for the blacklists, only using criteria that we found to be adequate for the purposes of the study.
The closest studies to the present one are Eykens, Guns, et al. (2018) and Eykens, Guns, et al. (2019). For some time, this group has used Beall’s lists in conjunction with WoS and the DOAJ whitelist in order to screen registered publications in the performance-based research funding system in Flanders, Belgium. Lately, they have added Cabell’s list to their efforts. In their study, they reported 556 “potentially predatory open access” articles during the years 2004 and 2015 in the Flemish VABB-SHW publication system. Despite the shortcomings of blacklists, we believe that they are currently the best available method to identify questionable research. Taken together they provide a breadth of inclusion criteria, yet focus on the issue at hand.
This paper provides a methodological framework for evaluating the scope of national publications in questionable journals during 2012–2017. Sweden is used as a case for testing the methodology since the authors are familiar with its HEI system. The aim can be articulated in the following specific research questions:
What proportion of serial academic publishing can be matched to alleged questionable publishing and are there any trends over time?
Are there differences in questionable publishing patterns between different types of HEIs?
To what degree do different journal blacklists cover the same journal titles?
2. MATERIALS AND METHODS
Here, we will describe the methodological framework developed for matching suspected questionable publishing at the national level. The choice of using Sweden as a case was deliberate, since the authors are familiar with its HEI landscape and have good contacts within the National Library of Sweden, which hosts the SwePub publication database that was used as base data. Interestingly, it was found that third-party data such as CrossRef DOIs could not be used, since many questionable publishers’ publications are not indexed in any way. To produce solid research on these data, we developed a quite labor-intensive semiautomatic approach. Therefore, instead of making a comparative analysis at a shallow level, which might contribute to fuelling preconceived notions when comparing countries, we opted for an in-depth study to look at different kinds of higher education institutions. The Swedish HEIs are divided between broader (comprehensive) established universities, specialized universities, new universities, and university colleges (Hansson, Barriere, et al., 2019). This made it possible to try to understand if there are differences at this level instead. It is our intention to scale up the methodology for other countries, provided that comparable publication data are available.
2.1. Sources of Blacklisted Journals
2.1.1. Cabell’s blacklist
We received the blacklist curated by Cabell’s International Inc in August 2018. It contained 9,503 journals from 446 unique publishers. It included information on journal name, publisher, and (for some journals) ISSNs. As noted above, Cabell’s uses 66 different criteria to qualify the inclusion of journals. During the process, we decided to benchmark the Cabell criteria against subjective judgments of the “seriousness” of each criterion. The two authors ranked each criterion based on a three-level scale, ranging from 1, “not serious,” to 3, “highly serious” criteria. The rule for inclusion of a Cabell’s blacklisted journal was the following:
One or more 3s,
Two or more 2s, or
One 2 and at least two 1s.
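The inclusion rule above can be expressed as a small predicate. The following is a minimal sketch, assuming that each blacklisted journal comes with the list of seriousness scores (1 to 3) that the authors assigned to the Cabell's criteria flagged for it; the function name `include_journal` and the input format are hypothetical and not taken from the study.

```python
from collections import Counter


def include_journal(scores):
    """Decide whether a Cabell's-listed journal is kept, given seriousness scores.

    `scores` is a hypothetical list with one integer (1, 2, or 3) per flagged
    criterion, where 3 means "highly serious" and 1 means "not serious".
    """
    counts = Counter(scores)
    if counts[3] >= 1:  # one or more 3s
        return True
    if counts[2] >= 2:  # two or more 2s
        return True
    if counts[2] >= 1 and counts[1] >= 2:  # one 2 and at least two 1s
        return True
    return False


if __name__ == "__main__":
    print(include_journal([2, 1, 1]))  # kept under the third rule
    print(include_journal([1, 1, 1]))  # only "not serious" criteria
```

Under this reading, a journal flagged only on criteria scored 1 is never included, regardless of how many such criteria apply.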
In all, 49 entries had been published in one of 14 journals that were matched only in Cabell’s list and on criteria that were subjectively judged not serious; these entries were omitted. Six more such journals were identified, but since these were also matched in either the Iran or the DOAJ list, their articles were left in the data set. The results are found in the supplementary material, Table XI.
2.1.2. The Iran MHME list
This is a curated blacklist maintained by the Ministry of Health and Medical Education in Iran (http://blacklist.research.ac.ir/). It includes information on journal name, publisher, ISSN, journal website URL, and when the journal was added to the list. There is a field in the database, called “Status,” with two different inclusion criteria stated: جعلی and نامعتبر, meaning “fake” and “invalid,” respectively. Journals matching both criteria were included in the matching process. The first set of criteria consisted of 12 journals that are known from the literature to be hijacked (Jalalian & Dadkhah, 2015). In matching, publications from these journals were manually inspected to identify if it was a publication in the hijacked version that was matched. The complete list was downloaded on September 5, 2018 and contained 2,180 titles from 88 different publishers.
2.1.3. DOAJ list
Since 2014, the DOAJ has provided a Google spreadsheet workbook of journals “Added,” “Removed,” and “Failed to submit a reapplication” of changes to the database. We chose the most conservative approach, selecting only journals that have been removed from their directory list because of “Suspected editorial misconduct by publisher” (https://blog.doaj.org/2014/05/22/doaj-publishes-lists-of-journals-removed-and-added/), leaving journals out of the matching exercise regarded as “not adhering to best practice,” (1,349 entries) and “Ceased publishing” and “Inactive” (259 and 238 entries, respectively). The complete list was downloaded from their website (see URL above) on October 6, 2018 and contained 642 titles with associated ISSNs that were removed due to “Suspected editorial misconduct.”
2.2. Publication Databases
2.2.1. SwePub
SwePub is the centralized Swedish collection of published literature from all Swedish universities and a selection of public research organizations (Sīle, Pölönen, et al., 2018). It was developed by the National Library of Sweden. It collects data from the local publication databases of its participants. Most Swedish research organizations, including HEIs, regularly upload their publication lists to SwePub and maintain the database, including correcting errors and managing duplicates.
2.2.2. Clarivate Web of Science list
The Swedish Research Council calculates the research output of Swedish HEIs annually based on Clarivate WoS data. The data on full and fractionalized publications per HEI and year for the time period were obtained with kind permission (Act # 3.1-2018-6882).
2.3. Search Strategy
Publication data for all Swedish Higher Education Institutions during the years 2012 through 2017 were extracted from SwePub on October 4, 2018, by using a SPARQL query for extracting journal data information. A total of 651,002 entries were returned by the query.
In SwePub we included all “journal publishing” categories. All but 19 entries were labeled with “ref,” meaning that they had supposedly undergone peer review before being published. After the full matching procedure described below, the search results included entries labeled as editorials (n = 2), letters (n = 2), journal articles (n = 1,716) and review articles (n = 25). In order to capture the scientific literature, we did not include “magazine publishing,” a category which includes other types of periodical publishing, such as leisure magazines, newspapers, and popular science magazines.
2.3.1. Data management
In order to reduce the risk of false positives in the blacklists due to journals with titles very similar to already established journals (homonyms), we conducted manual searches in the ISSN portal online (https://portal.issn.org). By establishing information such as country of publication and the organization responsible for the journal, and by browsing the journal’s webpages, we were able to identify the relationship between an article and the specific journal. For instance, by identifying the article at the journal’s webpage, or by matching the DOI address structure with other articles in the same journal, we were able to determine the relationship to a high degree of certainty. We also matched journal titles and publishers with the SwePub data by using DOIs to identify ISSNs where these were missing in the source data.
Publication databases such as SwePub are prone to errors, as they are built on data originally submitted by researchers, which is then evaluated and corrected by librarians prior to uploading the data to the national database (e.g., titles could have spelling errors or be shortened, and ISSN IDs could be written in different forms). Because of the size of the original data set (almost 1 million entries), we generally did not correct the data, other than harmonizing the ISSNs that were sometimes misrepresented by missing or different hyphen types.
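The ISSN harmonization mentioned above could look roughly like the following. This is a minimal sketch and not the authors' actual cleaning script: the function name `normalize_issn` and the set of hyphen-like characters handled are assumptions, and the ISSN check digit is deliberately not validated.

```python
import re

# Characters that are sometimes typed instead of the ASCII hyphen (assumed set):
# hyphen, non-breaking hyphen, figure dash, en dash, em dash, minus sign.
HYPHEN_LIKE = "\u2010\u2011\u2012\u2013\u2014\u2212"


def normalize_issn(raw):
    """Return an ISSN as 'NNNN-NNNC', or None if the input does not look like one."""
    if raw is None:
        return None
    text = raw.strip().upper()
    for char in HYPHEN_LIKE:
        text = text.replace(char, "-")
    compact = re.sub(r"[\s-]", "", text)  # drop hyphens and whitespace
    if not re.fullmatch(r"[0-9]{7}[0-9X]", compact):
        return None  # not eight characters in ISSN form
    return compact[:4] + "-" + compact[4:]


if __name__ == "__main__":
    for value in ["1234\u20135679", "12345679", " 2049-363x ", "n/a"]:
        print(repr(value), "->", normalize_issn(value))
```

Normalizing both the SwePub records and the blacklists with the same function before comparison avoids missing a match only because of a different dash character or a missing hyphen.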
Data was deduplicated at record ID level (_recordID).
Due to a known error in the SwePub database at the time of data extraction, some data for Malmö University College was duplicated. These publications were manually removed from the data set during analysis. Additionally, the Swedish Agricultural University had not reported publications correctly in 2017, and consequently had no data to match in that year.
Journal titles (_channel) and publishers (_publishers), as well as ISSNs (as found in SwePub), were matched against the blacklists in October, using the latest snapshots from the respective blacklists (Cabell’s, latest entry 09 11 2018; IRAN MHME, dated 08 05 2018; and DOAJ, latest entry 10 01 2018).
At the suggestion of one of the reviewers, we conducted two additional tests to evaluate that matched entries would not be found to be false positives: We first matched the set of matched entries to the so-called Norwegian list (NSD; October 29, 2019 NSDs register over vitenskapelige publiseringskanaler – tidskrifter og serier.xlsx). Additionally, we matched entries containing a WoS ID, Document Object Identifier (DOI), or PubMed (PMID) ID. The results of each of these steps are reported below.
Matching publication records with blacklisted journals was carried out in a hierarchical procedure:
1. Identical ISSNs in SwePub and blacklist(s).
2. Identical journal title AND publisher name in SwePub and blacklist(s).
3. Manual matching of entries with matching journal titles using the DOI string with the same structure as a matched publication, and/or using other metadata information, such as article title, names, and affiliations, to strengthen the link between the SwePub entry and the journal at hand.
After this procedure, 1,799 published works in blacklisted journals were identified.
Thereafter, the entries were manually screened: journals that did not meet our evaluative criteria for Cabell’s were removed (49 entries omitted), and the remaining entries were evaluated against the NSD (0 entries omitted) and WoS (7 entries omitted) whitelists.
A total of 1,743 published works in blacklisted journals were identified. 1,548 were identified in step 1, 34 in step 2, and 161 in step 3. The final data set of matched publications is available in the supplementary material.
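The three-step hierarchical matching described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the record keys `channel`, `publisher`, and `issns`, the blacklist entry keys, and the helper names are all hypothetical, and step 3 only flags a record as a candidate for the manual inspection that the study performed by hand.

```python
def norm(text):
    """Lower-case and collapse whitespace so trivially different strings compare equal."""
    return " ".join((text or "").lower().split())


def build_index(blacklist):
    """Index blacklist entries by ISSN, by (title, publisher), and by title alone."""
    issns, title_publisher_pairs, titles = set(), set(), set()
    for entry in blacklist:
        issns.update(entry.get("issns", []))
        title = norm(entry.get("title"))
        publisher = norm(entry.get("publisher"))
        if title:
            titles.add(title)
        if title and publisher:
            title_publisher_pairs.add((title, publisher))
    return issns, title_publisher_pairs, titles


def match_record(record, index):
    """Return the first step at which a publication record matches, or None."""
    issns, title_publisher_pairs, titles = index
    # Step 1: identical ISSN (ISSNs are assumed to be normalized beforehand).
    if any(issn in issns for issn in record.get("issns", [])):
        return "step 1: ISSN"
    title = norm(record.get("channel"))
    publisher = norm(record.get("publisher"))
    # Step 2: identical journal title AND publisher name.
    if title and publisher and (title, publisher) in title_publisher_pairs:
        return "step 2: title and publisher"
    # Step 3: title-only hit, to be confirmed manually (DOI structure, metadata).
    if title and title in titles:
        return "step 3: manual review"
    return None


if __name__ == "__main__":
    blacklist = [{"title": "Journal of Examples", "publisher": "Example Press",
                  "issns": ["1234-5679"]}]
    index = build_index(blacklist)
    record = {"channel": "Journal of Examples", "publisher": "Unknown", "issns": []}
    print(match_record(record, index))
```

Returning the step label rather than a plain boolean makes it possible to report how many publications were matched at each step, as is done in the text.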
2.4. University Type Aggregation
Sweden has four distinct types of HEIs and results were aggregated according to this classification (Hansson et al., 2019). The University of Gothenburg, Linköping University, Lund University, Stockholm University, Umeå University, and Uppsala University are part of the broader (comprehensive) established universities.
Chalmers University of Technology, Karolinska Institutet, KTH Royal Institute of Technology, Luleå University of Technology, and SLU—Swedish University of Agricultural Sciences—are specialized universities with a high proportion of research on specific topics such as technology, medicine, or agriculture.
A third group, consisting of Karlstad University, Linnaeus University, Mid Sweden University, and Örebro University, are previous university colleges that were upgraded to full university status about 15 years ago and hence designated new universities.
Blekinge Institute of Technology, Dalarna University, the University of Borås, the University of Gävle, Halmstad University, Jönköping University, University of Skövde, Kristianstad University, University West, Malmö University, Mälardalen University, and Södertörn University were all university colleges during the study period. Malmö University was upgraded to full university status in 2018. In Sweden, university colleges were initially created with the aim of providing both academic and professional training at the tertiary level, much like the former binary divide in the UK. This form roughly correlates with polytechnics in that they do not have a general right to issue degrees at the postgraduate level, but must apply for each right, and that there is a lower share of government funding issued for research.
2.5. Statistical Methods
Full counts were generally used for the presentation of data, since reporting of the number of local authors and total numbers did not seem to be fully consistent. The difference was quite consistent, though, with fractional numbers being 43–44% higher for all HEI types except for comprehensive universities, where the difference was 35%. Fractional counts were used, though, for comparing with WoS identified publications by the Swedish Research Council (supplementary material, Tables III and VI). In order to calculate fractionalized counts for each HEI, we divided the count given in the fields for affiliations by that for authors (_numLocalCreator/_creatorCount); that is, we calculated “affiliation_shares,” which is roughly comparable to fractional publications for an HEI.
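As a small illustration of the fractionalization described above, the affiliation share of one record is the number of local creators divided by the total number of creators, and an HEI's fractional count is the sum of these shares. The sketch below assumes records are available as dictionaries with keys `numLocalCreator` and `creatorCount` (named after the SwePub fields mentioned in the text); the function names are hypothetical.

```python
def affiliation_share(num_local_creators, creator_count):
    """Share of a publication attributed to one HEI (local creators / all creators)."""
    if creator_count <= 0:
        raise ValueError("creator_count must be positive")
    return num_local_creators / creator_count


def fractional_count(records):
    """Sum the affiliation shares over all records reported by one HEI."""
    return sum(
        affiliation_share(r["numLocalCreator"], r["creatorCount"]) for r in records
    )


if __name__ == "__main__":
    records = [
        {"numLocalCreator": 2, "creatorCount": 4},  # contributes 0.5
        {"numLocalCreator": 1, "creatorCount": 1},  # contributes 1.0
    ]
    print(len(records), fractional_count(records))  # full count vs. fractional count
```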
Proportions were calculated using standard descriptive statistics and the linear correlation between the share of SQP and time was calculated using Pearson correlation coefficients. Coword analysis and visualization were performed using VOSviewer v 1.6.8 (van Eck & Waltman, 2009).
3.1. Suspected Questionable Publications (SQP) in Relation to Total Publication 2012–2017
In our analysis of the 6-year period 2012–2017, we identified 1,743 journal publications that matched with the blacklists. The size of the institution was correlated with the total number of publications and SQP (supplementary material, Table I). In order to put the numbers into context, we calculated the proportion of fractionalized SQP in relation to the total number of journal publications as found in SwePub. Figure 1 shows the proportion of SQP in relation to total publishing during the years 2012–2017. Spearman’s correlation was run to determine the relationship between SQP and time over six years. There was a strong, negative monotonic correlation between SQP share and year (ρ = −0.83, n = 6, p < 0.05). The overall proportion over the period was 0.73% of all journal publications entered in SwePub.
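To make the trend statistic concrete, the following sketch computes Spearman's rank correlation as the Pearson correlation of the ranks, using only the standard library. The yearly shares in the example are invented for illustration and are not the study's data (the real values are in Figure 1 and the supplementary material); the function names are likewise hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # positions i..j share one average rank
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    years = [2012, 2013, 2014, 2015, 2016, 2017]
    shares = [0.9, 1.0, 0.8, 0.7, 0.5, 0.6]  # made-up SQP shares in percent
    print(round(spearman_rho(years, shares), 3))
```

A negative value close to −1 indicates that the share falls in most years even if the decline is not strictly monotonic, which is the pattern reported above.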
The proportion of SQP journal publishing was not evenly distributed between the Swedish universities. A difference can be discerned between comprehensive universities and specialized universities (median shares = 0.55% and 0.43%, respectively) and new universities and university colleges (median shares = 1.96% and 2.13%, respectively) (Table 1; supplementary material, Tables II and IV). However, there were also considerable differences within these four categories. For example, within the group of comprehensive universities, the rate of SQP was almost three times higher at the University of Gothenburg (0.97%) compared to Uppsala University (0.35%) (supplementary material, Table II). Most of the university colleges had a total rate above 1% SQP during the full period, and as many as four HEIs had an SQP ratio of 3% or higher for the period (Figure 2). An unflattering national record was set in 2013 by Kristianstad University when 7.5% of their total reported journal publishing was in blacklisted journals (supplementary material, Table II).
| | Total n | Total SQP n | Median | Mean | Range (min, yearly) | Range (max, yearly) | Range (min) | Range (max) |
As was mentioned above, the trend over time is declining. When different HEI types are distinguished, there is a clear difference in the share of SQP between new universities and university colleges, on the one hand, and comprehensive universities and specialized universities, on the other (Figure 3). Still, the declining trend is similar within each HEI type, at roughly half the share in the last year of the analysis as opposed to the year with the highest share.
3.2. Overlap with External Databases
3.2.1. The Norwegian NSD database
An analysis based on whitelists was performed after the main analysis was done.
The cleaned ISSNs from the 1,743 papers were matched against the Norwegian NSD database. Table 2 shows the number of matched entries in any of the three blacklists used in the analysis. Seventy-three articles in 28 journals were matched as being at level 1 for NSD 2019. Of these, two were raised from level 0 in the 2019 edition of NSD, and one was newly introduced in 2019. Additionally, two journals were demoted to level 0 for the coming year (2020). Four of the journals were identified as SQP in two blacklists, while 24 were identified in only one.
| Blacklist | Matched | Not matched |
Six journals were indexed in the regular DOAJ index. Of these, two were each listed in the IRAN MHME list and the Cabell’s list, while three were also listed as “suspected editorial misconduct by publisher.” The inclusion in both the regular DOAJ and the list of removed journals is explained by the publishing year of the matched entries. Three of these journals were entered into DOAJ in 2018 and 2019 after being suspended in 2017, which means that they have been rehabilitated in the eyes of DOAJ. It should be noted, however, that one of these was also included in Cabell’s, albeit for a debatable reason for blacklisting, namely that “authors are published several times in the same journal and/or issue.”
There are clear discrepancies between the lists. One journal found at level 1 in NSD was also listed in the DOAJ general index since 2017. At the same time, it was also listed in Cabell’s blacklist with no fewer than seven criteria of malpractice, including “no physical or fake addresses to editorial offices or publisher,” “similar title as a legitimate journal,” and “prominent promises of unusual quick peer review.” Here, the decision about which criteria should prevail is difficult, but since there was no error in the matching process (i.e., no false positives due to entry error in SwePub), it was retained in the analysis. Ultimately, no entries were omitted after this analysis, but it shed light on some of the issues in combining blacklists and whitelists, that the same entry could actually appear in both. The full matching exercise may be found in the supplementary material, Table VII.
3.2.2. Clarivate WoS
Of the 1,743 publications matched in the set, 1,313 had either a WoS ID (n = 60, 55 matched), DOI (n = 1,275, 65 matched), or PubMed ID (n = 112, 31 matched). A total of 91 WoS indexed entries were matched in the set. These represent 38 different journals. Five were included in two of the blacklists, while 33 were only matched in one. Table 3 shows the number of matched entries in any of the three blacklists used in the analysis. Additionally, each journal was manually searched for in the so-called Norwegian list (NSD) (https://dbh.nsd.uib.no/publiseringskanaler). If the title was either not listed at all, or marked with a zero, indicating that it was not judged as a valid peer-reviewed journal by the Norwegian authorities, it is marked as “matched” in the table. The number of matched journals was quite evenly distributed between the blacklists, with DOAJ and the NSD standing out with larger numbers. The calculation of Cohen’s Kappa coefficient for interrater reliability between the blacklists (using NSD as a fourth list) also showed the largest overlap between NSD and DOAJ for the set of journals that were identified in WoS (Table 4).
| Blacklist | Matched | Not matched |
Table 5 shows that 22 out of the 38 journals that were identified were found in the Emerging Sources Citation Index (ESCI), while 16 were found in one or two of the original WoS databases (SCI-e, SSCI). No matches were found in A&HCI. The ESCI database lists journals that have not (yet) been selected for inclusion in Clarivate’s so-called Flagship Citation indexes, and are not included in the JCR, or assigned a JIF (Testa, n.d.). The full set of data is available in the supplementary material, Tables VIII and IX, which also give additional information, such as whether the journal was demoted in 2019 in the NSD list, the blacklist it was matched in, and the corresponding WoS database the journal was identified in. It should also be noted that four of these journals were additionally identified in the list of journals removed from the Elsevier Scopus list between the years 2013 and 2016, based on “Publication concerns” (https://www.elsevier.com/solutions/scopus/how-scopus-works/content).
| WoS database | n |
| SCI-e + SSCI | 1 |
3.3. SQP in Relation to WoS Indexed Publishing 2012–2017
We compared SQP in relation to the number of fractionalized publications as calculated annually by the Swedish Research Council based on WoS data. Fractionalization was calculated by dividing the number of local authors by the total number of authors (_numLocalCreator/_creatorCount) for the entry in the SwePub database. The percentages refer to the comparison between the number of SQP (fractionalized as calculated within SwePub data) and the fractionalized volume indicator calculated by the Swedish Research Council for each year. The comparison then reads: If the proportion of SQP is 1%, then there is 1 SQP publication for every 100 WoS-indexed outputs (fractionalized) from the organization. What is noteworthy is that the percentage of SQP/WoS-indexed from comprehensive universities varies from 1% to up to 3% (Table 6; supplementary material, Tables III and V), and specialized universities have a lower median (0.8%), but higher extremes (11.3%), while the percentage of SQP/WoS-indexed from new universities and university colleges is about 6–7%, with quite a few individual years noted above 10% for SQPs. Publishing in WoS-indexed journals was quite rare by researchers at university colleges, which could explain the extreme interyear variability.
| | Median | Range (min, yearly) | Range (max, yearly) |
3.4. SQP in Different Research Topics
A total of 638 different journal titles were identified in the 1,743 matched articles. From these, it was possible to create a set of 700 noun phrases. After removing the two most often occurring terms (“journal” and “international journal”), the largest segment the software was able to identify was a set of 219 terms that were connected to each other in a coword analysis. The result is shown in Figure 4. The most frequent topics were Nursing, Education, and Business. This does not reflect the relative research output of these areas in the Swedish context, which is dominated heavily by medicine and technology both in terms of number of publications and financial resources (Hansson et al., 2019).
3.5. Blacklist Coverage and Overlap
Table 7 and the Venn diagram in Figure 5 show the overlap of matched entries in the three blacklists that were used in this project. The number of matched SQP entries, after combining automatic publisher/name/ISSN matching and manual additions, differed considerably between the lists. The actual overlap of matched entries (with Cohen’s Kappa coefficient in parentheses) was Cabell’s–Iran MHME: 146 (−0.58), Cabell’s–DOAJ: 292 (0.05), Iran MHME–DOAJ: 23 (−0.50), Cabell’s–Iran MHME–DOAJ: 9. Cabell’s was by far the largest database and it also had the highest capture rate (904 of 1,743). However, only 3% of its journals had published work submitted by Swedish researchers. The Iran MHME and DOAJ lists were both smaller (777/1,743 and 518/1,743 respectively), but a larger proportion of their journal titles were matched (13% and 23% respectively).
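For readers unfamiliar with the agreement measure, the following sketch computes Cohen's Kappa for two blacklists treated as binary raters over the same set of matched entries. How the membership vectors were constructed in the study is not described in this section, so the setup here (one boolean per entry and per list) is an assumption, and the example vectors are invented.

```python
def cohens_kappa(in_list_a, in_list_b):
    """Cohen's Kappa for two equally long sequences of booleans.

    Each position is one matched entry; the value says whether the
    entry's journal appears in the respective blacklist.
    """
    n = len(in_list_a)
    if n == 0 or n != len(in_list_b):
        raise ValueError("inputs must be non-empty and of equal length")
    # Observed agreement: both lists include or both exclude the entry.
    p_observed = sum(a == b for a, b in zip(in_list_a, in_list_b)) / n
    # Agreement expected by chance from each list's inclusion rate.
    p_a = sum(in_list_a) / n
    p_b = sum(in_list_b) / n
    p_expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Invented membership vectors for four entries.
identical = cohens_kappa([True, True, False, False], [True, True, False, False])
disjoint = cohens_kappa([True, True, False, False], [False, False, True, True])
chance = cohens_kappa([True, True, False, False], [True, False, True, False])
```

A negative value, as reported for the pairs involving the Iran MHME list, means that two lists agree less often than their individual inclusion rates would produce by chance, which is consistent with lists that largely cover different journals.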
4.1. Suspected Questionable Publishing Within Swedish Academia
This study found that less than one in a hundred scientific publications from Swedish higher education institutions were in questionable journals. However, there were marked differences between broad-based established and technical universities on the one hand, and newer higher education institutions with a lower total research output on the other. University colleges in particular, but also new universities, had a substantial proportion of their output published in questionable journals.
This can be explained by several factors, such as lack of resources allocated to newer HEIs, which makes it difficult to recruit and retain talent. Since research is also a smaller part of these HEIs’ mission (the primary one being education at bachelor level), it is possible that both adherence to scholarly norms and practice, as well as structures for formal and informal quality control are less well-developed at these institutions due to a lack of critical mass of active researchers in each subject area.
Another reason for the differences might be differences in research profiles in different categories of HEIs. Our results indicated that publishing in questionable journals is more likely to be found in applied research areas, as well as in newly professionalized and academicized areas. Both of these are predominantly found in new universities and university colleges, where the share of teaching-intensive areas is more prominent. Nursing, Education, and Business stand out, probably reflecting the “demand” among researchers in these research areas. These areas have undergone an “academization” during recent decades, and it is possible that their publishing cultures are less mature. Natural and medical science, which usually require more resources, are concentrated in larger HEIs.
The matched share of SQP declined overall, from just below 1% at its highest to almost half that share in 2017. The same level of decline was seen in all HEI types, although their respective rates were quite different. We can only speculate about the causes, but it might be due to heightened awareness and a general maturity in the publishing system. On the other hand, blacklists are created in retrospect, and some SQP might not have been identified and registered in blacklists yet. There is also a possibility that questionable publishers have adapted to increased awareness and stated criteria, making it harder to distinguish between clear offenders and just “poor publishing practices.”
4.2. Methodological Considerations for Matching Against Blacklists
Using the proposed methodological framework for evaluating total publication output at a national level provides the opportunity to compare the proportion of SQP between countries and across regions. Furthermore, using the national publication database based on researchers’ reported publishing means that a larger set of publications is reported than if a third-party publication list, such as CrossRef, WoS, or Scopus, were used, since the curation would be done in a more standardized way instead of being based on the coverage of the specific service chosen. At the same time, different practices of maintaining the local databases could have systematic effects on possibly reported questionable publishing. The general trends within each category suggest that differences do not seem to be attributable to different practices within the institutions that maintain the local databases.
This study also showed that none of the blacklists used in this study were comprehensive and that very few publications could be found in all three lists. At the same time, a very high agreement between the lists would still not ensure that the study covers all aspects of SQP. An evaluation of the coverage of the list is inherently problematic if no ground truth is available. It is not the place here to delve into the Science & Technology Studies (STS) issue of the inherent uncertainty of facts and of standards being developed in context (Bowker & Star, 1999). But since it cannot be expected that SQP is distributed similarly to the full publishing activities at the national level, these evaluations are tentative. Apart from the statistical analyses of coverage and overlap that have been done here, it can be argued that, at least up to a point, the inclusion of more sources of blacklists would mean that a wider selection of views could be heard, akin to the argument within qualitative studies that a good selection process becomes “saturated.” By combining a commercial provider of blacklists with a list based on evaluation by an independent nonprofit organization, as well as with a blacklist originating from a government outside of the USA and Europe, there is at least the opportunity for a diverse and broad coverage of SQP journals. Additionally, by subjecting the results to an evaluation using the Norwegian NSD and Clarivate WoS as further resources for possible whitelisting of publications, we argue that we have provided as good quality control as is possible at this point. This helped us to remove seven matches that were made due to entry errors in SwePub, but it was less clear-cut whether whitelisting with these sources would help prune out errors.
In discussing the results of the whitelisting exercises made using the Norwegian NSD list and Clarivate WoS we would like to add the following: While 4% and 5% of the matched results were found in each of the respective databases, upon detailed analysis, none of the “whitelisted” entries were actually found to be without suspicion. This has to do with entries being whitelisted at a specific point in time or that there was other evidence that rendered the whitelisting itself questionable.
All but two of the journals that had matched articles that were found in the Norwegian NSD list were, at the same time, not included in the general DOAJ directory (“not whitelisted by DOAJ”) during the coverage of the study, together with being matched to at least one of the blacklists. The two journals listed despite being on the DOAJ directory are noted as being found questionable in additional sources and were retained for that reason. Additionally, one of these was recently dropped by its publisher.
While WoS is generally used as a whitelist for credible publication outlets, when journals were compared to the coverage of the Norwegian list as well as DOAJ (whitelist), clear discrepancies were found. The fact that most of the WoS-included journals that were blacklisted were found in the ESCI, which uses less rigid criteria for inclusion, calls for caution when using the full set of WoS databases to whitelist journals. All but five journals were either matched with more than one blacklist or not included in NSD during the coverage of the study. Among the five journals only matched in one blacklist, four were additionally not included in DOAJ coverage, while the fifth was the same journal that was dropped by its publisher as described above.
Since all journals found in WoS were also matched in at least one blacklist, as well as having additional criteria found during cross-examination of the results (such as not included in DOAJ, or listed as not scholarly “0” in NSD), none of these journals were omitted from the final results.
With this in mind, we do not want to underplay the fact that using blacklists is error-prone and based on the judgments of evaluators following protocols for inclusion. Therefore, this highlights the problem with blacklisting as a method to identify questionable journals.
4.3. Results in Relation to Other Studies
In a literature review consisting of all the 178 articles identified in WoS on the topics of fake, questionable, or predatory publishers or journals, we found that a large proportion of the studies focused on the issue of questionable publishing from a normative perspective, trying to detect, identify, and distinguish it from proper scientific publishing, sometimes by distinguishing high-quality from low-quality publishing.
One study, which aimed to quantify the extent of predatory publishing using Beall’s list, found that predatory journals had rapidly increased their publication volumes, reaching an estimated 420,000 articles in 2014, published by around 8,000 active journals (Shen & Björk, 2015).
It has also been reported that researchers belonging to less developed areas are found to cite research in questionable journals at a higher level (Frandsen, 2017). This seems to be in line with the results obtained here, which seem to show that less developed research areas that have recently undergone professionalization and academization are more prone to be published in SQP journals.
A number of studies focused on the issue at the national level in countries such as Brazil (Perlin, Imasato, & Borenstein, 2018), the Czech Republic (Mercier, Tardif, et al., 2018; Strielkowski, Gryshova, & Shcherbata, 2017; Vershinina, Tarasova, & Strielkowski, 2017), Iran (Erfanmanesh & Pourhossein, 2017), India (Mukherjee, 2018; Samal & Dehury, 2017; Seethapathy, Kumar, & Hareesha, 2016), Turkey (Demir, 2018a, b; Onder & Erdil, 2017), and Kazakhstan (Yessirkepov, Nurmashev, & Anartayeva, 2015).
However, these studies tackle questionable publishing at policy levels, which means that they focus on single countries or a single research area, or sometimes both. Additionally, almost all studies rely exclusively on a single blacklist, Beall’s, as the sole source of identification, although it is dated and lacks the means to match with accuracy (using ISSN). However, one recent study used fuzzy text matching (Strinzel et al., 2019). Although this is quite an elaborate technique, it might actually defeat its own purpose, as many questionable publishers choose titles that are so akin to already existing journals that false positives could occur from such matching.
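The false-positive risk of fuzzy title matching can be shown with a small sketch that uses Python's standard difflib module. The two journal titles, their ISSNs, and the 0.9 threshold are invented for illustration; this is not the matching procedure of Strinzel et al. (2019) or of the present study.

```python
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase a title and collapse runs of whitespace."""
    return " ".join(title.lower().split())


def title_similarity(title_a, title_b):
    """Similarity ratio between 0 and 1 for two normalized titles."""
    return SequenceMatcher(None, normalize(title_a), normalize(title_b)).ratio()


def fuzzy_match(title_a, title_b, threshold=0.9):
    """Treat two titles as the same journal if they are similar enough."""
    return title_similarity(title_a, title_b) >= threshold


def issn_match(issn_a, issn_b):
    """Exact comparison of ISSNs, ignoring hyphens and letter case."""
    return issn_a.replace("-", "").upper() == issn_b.replace("-", "").upper()


# Hypothetical pair: an established journal and a look-alike title.
established = ("Journal of Advanced Nursing Research", "1234-5678")
lookalike = ("Journal of Advanced Nursing Researches", "8765-4321")

same_by_title = fuzzy_match(established[0], lookalike[0])
same_by_issn = issn_match(established[1], lookalike[1])
```

Here the titles differ by only two characters, so a title-based fuzzy match would merge the two journals, whereas the identifier-based comparison keeps them apart.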
Finally, it is worth discussing if a share of SQP at 0.77% is low or high. Unfortunately, there are few studies that evaluate SQP at the national level. As noted above, the ECOOM group, which maintains the Flanders database in Belgium, reports lower numbers (Eykens, Guns, et al., 2019). In their recent PLOS study, covering five yearly reports on the issue, they identified 556 publications in potentially predatory blacklisted journals during the period 2003–2016 in the regional Flanders database. After evaluation by a panel of independent scholars that oversee the selection of publication outlets in the Flemish VABB-SHW database, 210 out of 73,694 published articles were identified as predatory OA publications. This would indicate a share of 0.28%, which would be significantly lower than our results. Furthermore, it was found that the numbers dropped significantly (by a factor of 10) after the year 2014, which coincides in time with the change in methodology when the independent panels started to evaluate the journal lists. We have also conducted tentative analyses of publishing data obtained from the Danish Bibliometric Research Indicator (BFI), Finland’s VIRTA Publication Information Service (VIRTA), and the Current Research Information System in Norway (CRIStin). Here, the share of SQP seems to be comparable between the SwePub data set used here and VIRTA and CRIStin, while it is significantly lower in BFI, at the same level as in the VABB-SHW. Upon manual inspection, the BFI has purged all publications that are below level 1 in the Danish database (level 0 and publications in publication channels that have not been evaluated), meaning that a large share of possible SQP has already been removed. This is in line with the VABB-SHW and seems to verify that the introduction of a screening process for “accepted publishing channels” is a viable way to significantly filter out questionable publishing from the evaluation in a performance-based funding system.
In Norway and Finland, the same screening is in effect, since only publications at level 1 and above are used in the evaluation. It should also be noted that the Swedish Research Council is producing such a “Swedish list,” while there is also ongoing work on a collaborative “Nordic list.” Still, it does not alter the practices of researchers publishing in these journals, potentially padding publication lists with questionable publishing for use in individual evaluation for grant applications or applications for positions, although it might have a deterrent effect on bad publishing practices.
The handling of a number of blacklists and whitelists, as well as the original handling of publication lists, is error prone. Specifically, the handling of deduplication, fractionalization, and incorrectly added information in the registration process, as well as systematic errors, such as one university failing to report one year and another university’s publications having been added twice in one year, meant that every step in the process had to be handled with care, and often reiterated after small adaptations to the matching process. It was also found, late in the analysis, when comparing the resulting set of 1,750 SQP with the WoS database, that seven entries seemed to have been mislabeled upon registration. This could generally be attributed to a false ISSN being attached to a title or the DOI not matching the stated journal title and ISSN. Upon manual inspection, these entries were removed from the resulting set of 1,743 SQP entries in our analysis.
All the blacklists used in this study identified current publications, which means that changes in questionable status might have occurred (e.g., it has been alleged that some questionable publishers have taken over respectable journals and turned them into problematic outlets). However, we believe that the number of hijacked journals and journals transforming from legitimate to questionable over such a short time period could only marginally impact the results of this study.
Studies have shown that authors who select journals based on spam emails have been unable to halt the publication of manuscripts once they have been submitted (Oermann, Conklin, et al., 2016). Therefore, there could be a risk that authors are published in questionable journals against their will. This paper, however, focuses on publications that have not only been published, but that also have been submitted to local publication databases, indicating that the researchers acknowledge these publications as legitimate. Although the results only apply to Swedish research, the methods are transferable to other countries.
We have provided a methodology for identifying SQP by matching a national publication list to a combination of blacklists from different providers, omitting the outdated Beall’s list, and at the same time not just substituting it with one single list. We noted that no blacklist has near-complete coverage of questionable publication outlets (we manually cross-checked all blacklist matches). Combining freely available and subscription-based lists makes our study less reliant on a single blacklist provider.
Less than one in a hundred research papers reported by Swedish HEIs are published in questionable journals, and the problem does not seem to have increased in recent years. However, in relative terms, such publications are several times more common at newer HEIs, highlighting their vulnerability. We do not propose that our method should be used to publicly identify individual researchers, as our results indicate that blacklists are still in their infancy, to a certain degree are error prone, and probably only identify a certain fraction of all questionable publications, depending on the definition chosen. At the individual level, if made public, an SQP-matched publication list could have a stigmatizing effect, and we find that the possibility of raising awareness at the individual level does not justify this, given that the risk of false positives is still quite high. When scaled to an aggregate level, errors, if found, could be expected to level out. However, it is an easy-to-implement surveillance system that could be used by different stakeholders to identify problematic research environments or whole HEIs in need of support for tackling the issue of quality and ethics in publication. It should also be noted that the use of text-based analysis for identifying the subject structure of matched publications could help to make comparisons between countries without having to align national subject classifications. Measures should be taken to raise awareness and enforce compliance with good publication practices. Information activities directed toward researchers, as well as a general “mild” introduction of journal lists, such as the upcoming Nordic list and DOAJ for whitelisting, and preferably free lists for blacklisting possible SQP journals, need to be developed and maintained. There might also be a need to raise the issue in a larger context regarding research evaluation.
We argue that there is a need to view the role of publishing as part of the full research cycle and not only as a means of presenting results at the end, and that there is a need to critically question the role of publications in the merit system in light of the San Francisco Declaration on Research Assessment (DORA, https://sfdora.org/read/).
Future work would be to develop a comprehensive methodology, using a similar framework and in collaboration with other research groups, for comparing the incidence of questionable publishing in other European countries where a full set of HEI-reported publications is available. Upon initial tests, comparable relationships between different types of HEI seem to be found in data from other countries that we have examined (e.g., Danish, Norwegian, and Finnish data), but for obvious reasons, national data from Estonia and the Czech Republic that we have collected would need expert collaboration for translation at the national level to be interpreted. There is a need to implement the methodology with some care, since every national publication database has its specifics. For instance, as noted in section 4.3, the BFI only includes entries that are found in publication channels at level 1 and above. A comparison might, therefore, show very different results if done without local knowledge of their construction. Furthermore, HEIs only stand for a certain part of the total academic publishing at the national level. A useful extension would be to include publishing from institutes, the private sector, and independent researchers. That would entail using other sources, such as Google Scholar and Microsoft Academic, or browsing known SQP publishers’ web sites (Sīle, 2019).
Gustaf Nelhans: Conceptualization, Data curation, Methodology, Formal analysis, Investigation, Visualization, Validation, Writing – original draft, Writing – review & editing. Theo Bodin: Conceptualization, Methodology, Formal analysis, Validation, Writing – original draft, Writing – review & editing.
The authors report no competing interests. Cabell’s Inc. had the right to read the manuscript before submission, but had no role in the design of the study, analysis or interpretation of data, or the writing of the manuscript.
No funding has been received for this research.
Supplementary data, as well as the raw matched data so that tables and figures in this article can be reproduced, are available at https://doi.org/10.5878/6dn9-yt13.
The authors would like to thank two anonymous reviewers for constructive and useful feedback on the submitted manuscript. We would also like to thank Peter Allebeck and Jonas Nordin for connecting the two authors with each other. We thank Henrik Aldberg, previously at the Swedish Research Council, for supplying information on publications in WoS journals, Camilla Lindelöw and Tuija Drake at the National Library of Sweden for guiding us in the peculiarities of SwePub, and Johan Eklund, SSLS, University of Borås, for help with implementing Cohen’s Kappa. Lastly, Cabell’s provided us with a copy of their blacklist, which made it possible for us to perform a large part of the matching.
Handling Editor: Ludo Waltman