Open Access (OA) facilitates access to research articles. However, authors or funders often must pay the publishing costs. This prevents authors who do not receive financial support from participating in OA publishing and from gaining the citation advantage associated with OA articles. OA may therefore exacerbate existing inequalities in the publication system rather than overcome them. To investigate this, we studied 522,411 articles published by Springer Nature. Using correlation and regression analyses, we describe the relationship between authors' affiliation with countries of different income levels, their choice of publishing model, and the citation impact of their papers. A machine learning classification method helped us explore the importance of different features in predicting the publishing model. The results show that authors eligible for article processing charge (APC) waivers publish more in gold OA journals than others. In contrast, authors eligible for an APC discount have the lowest ratio of OA publications. This suggests that the discount is insufficient to motivate authors to publish in gold OA journals. We found a strong correlation between journal rank and the publishing model in gold OA journals, whereas the OA option is mostly avoided in hybrid journals. The results also show that the countries' income level, seniority, and experience with OA publications are the most predictive factors for OA publishing in hybrid journals.

The unrestricted availability of Open Access (OA) publications is linked to two goals: granting all interested parties free access to scientific knowledge and ensuring greater equality of access (Munafò, Nosek et al., 2017). This view focuses on the consumers of scholarly knowledge, who would then not have to pay for access. Authors, however, are affected by OA in two different ways. The first is when they choose a publication model for an article. The second is when they receive citations (and hence reputation) for articles published under a certain model, which is usually described as the citation advantage (see, for example, Langham-Putrow, Bakker, and Riegelman (2021)). These two aspects of OA may introduce significant biases and inequity into the scholarly publication and reputation system, because they may restrict participation in OA in particular ways (Bahlai, Bartlett et al., 2019).

First, the OA publishing model generally shifts the publishing costs from readers to authors or their institutions and funders by introducing article processing charges (APCs). This can be a severe constraint for authors who cannot afford these costs or do not receive any financial support. To overcome this issue, most publishers have implemented an APC waiver/discount policy for authors from, for example, low-income countries (Lawson, 2015). However, it remains an open question how researchers with various characteristics consider and adopt the different options for OA publishing and waivers/discounts. Relevant characteristics include their countries' income level, their seniority, and their gender, factors that are also often associated with the decision to publish OA (Iyandemye & Thomas, 2019; Olejniczak & Wilson, 2020; Simard, Ghiasi et al., 2021; Smith, Merz et al., 2021; Zhu, 2017). Rouhi, Beard, and Brundy (2022) discussed waiver issues from the perspectives of publishers, institutions, and developing countries. They pointed to the potential unfairness that APC-based models can create for authors and argued that waiver programs have yet to address this problem successfully. They suggested that meeting the equity standard requires a cross-functional approach involving publishers, funders, research institutions, individual researchers, libraries, and service providers.

Three funding options for covering OA publishing costs have emerged over time. First, diamond OA journals are funded by public institutions such as libraries, which enables free reading and publishing for all researchers. Second, transformative agreements between public institutions and publishers have been introduced. These include reading and publishing contracts and are also funded by the institutions. In this case, authors pay no direct fees, because their institutions pay the APCs as part of a consortium. Access to publishing and access to publications are limited to participating organizations only. Third, APCs can also be paid by the authors or their institutions themselves. The first option leads to gold OA at the journal level. Transformative agreements allow authors to publish in either gold OA or hybrid journals (which, for a fee, allow individual articles to be published as OA). The third option is often associated with hybrid journals. All other publishing models for journals usually rely on subscription funding. This results in closed-access (CA) articles that can only be read after paying for the article or a journal subscription.

The publishing model is also strongly associated with the visibility of authors and articles. For many researchers, it matters in which journals they publish (e.g., considering discipline-specific journal rankings). If they want to be noticed by others and/or seek promotion, publishing in reputable journals can be crucial, especially for early-career researchers. To achieve this, researchers must overcome financial hurdles such as APCs, but that is not enough. They also need, for example, English language and technical skills, as well as institutions that can provide legal advice or infrastructure support. Against this background, researchers have to decide which publishing model to choose and whether OA is not only an altruistic option but a feasible one at all.

The second possible source of bias and inequity relates to readers having to pay for access. It has already been shown that articles published as OA variants are more visible, which leads to higher citation counts and altmetric scores (Evans & Reimer, 2009; Fraser, Momeni et al., 2020; Lewis, 2018; McKiernan, Bourne et al., 2016; Ottaviani, 2016). Moreover, the Matthew effect describes how researchers who are already well known and widely cited receive even more citations (Farys & Wolbring, 2021). This directly affects the rewards associated with publishing in prestigious journals, such as prominence and citations. For researchers, publications play a central role in their daily practice and in the reputation system in which they operate. Publications allow researchers to build on the body of knowledge and to refer to earlier findings by citing them, which is how cited authors accumulate reputation. Hence, access to publications is crucial for both the progress of science and the building of reputation. Both can be impeded by a lack of access to OA publishing options and by the risk that CA articles are cited less frequently than OA articles.

From that, we hypothesize that researchers with better access to financial resources have better access to publications—both in terms of access to read openly and in terms of access to publish openly. Associated with that may be an even stronger citation advantage for those researchers (usually WEIRD: Western, educated, industrialized, rich, and democratic (Henrich, Heine, & Norenzayan, 2010)) with extensive OA-publishing options. As such, OA may carry the risk of perpetuating already existing inequalities rather than resolving such marginalization in the scholarly communication system (Fox, Pearce et al., 2021).

Related work also indicates a strong association between economic factors, OA, and citation advantages. The scientific output of countries is associated with their economic development, because scientific progress requires financial support from governments. Samimi (2011) used a Granger causality test to examine the causal relationship between scientific output and GDP in 176 countries and found a positive relationship in both directions. King (2004) compared published papers and their citation impact across countries. He found that only 31 countries contributed 98% of the world's highly cited papers, whereas the remaining 161 countries contributed less than 2%.

OA publishing is also strongly influenced by the authors' country of affiliation. The country determines eligibility for APC waivers/discounts and the availability of transformative agreements with publishers. Some publishers offer general waivers or a discount policy for eligible authors across all of their journals, and eligibility is mainly determined by the country's income level. Lawson (2015) studied the waiver policies of the 32 most prominent publishers and found that 68% of them grant APC waivers. Simard et al. (2021) found that low-income countries publish and cite OA more than upper-middle and high-income countries. They also found that the positive correlation between OA citing and OA publishing is 1.3 times weaker for high-income countries than for other countries. Similarly, Iyandemye and Thomas (2019) showed that biomedical researchers from low-income countries have the highest share of OA publications. Smith et al. (2021) reported that authors from low-income countries published proportionately fewer OA articles in Elsevier's journals, despite their eligibility for APC waivers.

Olejniczak and Wilson (2020) studied the articles published by faculty members at research universities in the United States and found that male and senior authors are more likely to publish OA. Zhu (2017) surveyed more than 1,800 researchers at 12 Russell Group universities1 to identify differences in OA publishing by discipline, seniority, and gender. The results revealed disciplinary differences in OA publishing, with researchers in the medical and life sciences most likely to publish in gold OA journals. They also showed a greater tendency toward OA publishing among senior authors and among men.

Besides a journal's business model, its rank is a decisive factor when authors choose where to submit an article. Schroter, Tite, and Smith (2005) conducted a survey study with 28 international authors who had submitted to the British Medical Journal. They found that these authors considered the journal's ranking more important than the availability of OA.

Many studies have investigated the citation impact of OA, and most found a citation advantage for OA articles (Evans & Reimer, 2009; Fraser et al., 2020; Lewis, 2018; McKiernan et al., 2016; Ottaviani, 2016). However, these studies face potential biases (e.g., quality bias, self-selection, mandates, self-archiving) and differ in their sampling and control data. This makes it difficult to conclude that the additional citations are solely an effect of OA. Momeni, Mayr et al. (2021) studied the citation impact of flipping journals from CA to OA. They generally found slightly higher citation growth than for journals in the same discipline and impact factor range, although they did not observe this trend in all scientific fields. Momeni, Mayr, and Dietze (2022) examined the correlation between different factors and authors' future h-index and found positive but weak correlations.

One issue often discussed together with OA publishing and APCs is predatory publishing. Predatory publishers take advantage of the OA movement but work against good scientific practice. Ross-Hellauer, Reichmann et al. (2021) conducted a systematic review of the threats to equity in science that arise from implementing open science. They concluded that less well-resourced researchers, researchers from non-English-speaking countries, and early-career researchers are particularly affected by predatory publishing.

We structure our study around three major research questions. They address the association between publishing models, the economic background of researchers, and other author-specific and structural factors:

RQ1: What is the relationship between the income level of researchers’ affiliation countries and their publication behavior (do they prefer OA or CA)?

RQ2: How are the income level of researchers' affiliation countries and their publication behavior (OA or CA) related to their citation impact?

To answer these questions, we categorize corresponding authors by the income level of their affiliation country. We then compare the access status and citation impact of the articles they have published. The first two RQs are largely descriptive. They aim to quantify how strongly the economic background of authors is related to access to publishing openly and to reading openly (which makes articles easier to find and thus more likely to be cited). The third RQ takes into account a variety of factors that have been shown to be strongly associated with the tendency to publish OA (Iyandemye & Thomas, 2019; Olejniczak & Wilson, 2020; Simard et al., 2021; Smith et al., 2021; Zhu, 2017).

RQ3: What factors (e.g., journals, articles, authors, or their countries) are associated with selecting the business model of publications (OA against CA)?

Here we aim to give a detailed view of the factors associated with OA publishing, using correlation, regression, and machine learning analyses. To this end, we consider structural features, such as APC waivers, alongside author-specific properties, such as gender or years of publishing activity (see Table 2). We also look closely at the different access forms of publications: gold OA, hybrid, and CA. At the journal level, we examine the relationships between journal rankings, APCs, and research fields (Health Sciences, Life Sciences, Physical Sciences, Social Sciences, and multiple fields). We also investigate possible country-related factors. These include countries' income level, the existence of transformative agreements, and opportunities for researchers to obtain APC discounts or waivers. At the article level, we examine the ratio of OA to CA references cited in an article and the number of authors involved. The remaining author-specific factors are gender, career age, the ratio of OA to CA publications in the past, and the proportion of international coauthors.

Our study needs information on the business model, author characteristics, and article impact, and obtaining a complete data set requires linking several approaches and databases.

4.1. Data Selection

Information on the business model of journals (OA, hybrid, CA) can be crawled from the journal's or publisher's website. It can also be looked up in sources such as the Directory of Open Access Journals (DOAJ) and Unpaywall, which both include OA information. However, information about the history of journals' business models is rarely available. In recent years, many journals have converted (flipped) from CA to OA and vice versa, but the exact date on which the new access model started is often not documented. The Open Access Directory (OAD) is a wiki hosted by the School of Library and Information Science at Simmons University2. It is the only resource that lists flipped journals (only a few) together with their flipping dates. The OA start date of journals was available in the DOAJ dataset until 2020. Bautista-Puig, Lopez-Illescas et al. (2020) and Momeni et al. (2021) used the OAD and DOAJ for their studies of flipped journals. Unfortunately, the DOAJ has now stopped collecting that information: “As time progressed, open access models became more complicated … It has become harder to find the right answer to that seemingly simple question: when did open access start for this journal?”3 Matthias, Jahn, and Laakso (2019) combined different snapshots of data sets containing OA status (Scopus, DOAJ, Ulrichsweb, publishers' websites, etc.) with other resources. They used these to identify reverse flips (conversions from OA back to CA), which they verified manually. Bibliometric analyses related to OA require the access status of journals for the period in which the effect of OA is studied. Obtaining this information coherently requires examining the business models of different journals and harmonizing them to make them comparable. In addition, every publisher has its own rules for APC exemptions to foster OA publishing. For example, eligibility for APC waivers in Elsevier's journals is based on the “Research4Life program”4, whereas for Springer Nature it is based on the “World Bank classification.” The various transformative agreements with publishers, and the periods their contracts cover, are further influential factors. Studies of publishing behavior should consider them separately for each publisher.

Because APC-related rules vary across publishers, we focused on one major publisher. To analyze papers across disciplines and countries, we chose Springer Nature, the largest publisher of academic journals (more than 2,900 journals5), whose authors come from many countries and disciplines. This provides a large and diverse data set and thus more accurate results. Also, compared with Elsevier, the second most prominent publisher of scholarly journals (over 2,700 journals6), Springer Nature has a higher OA uptake (Sotudeh, Ghasempour, & Yaghtin, 2015; Sullo, 2016), which results in less skewed data.

We downloaded the list of journals and their access status from the 2019 snapshot, which is available on the publisher's website7. Springer Nature (SN) journals follow three publishing models: gold OA, hybrid (with the open access option Open Choice), and CA. Figure 1 displays the distribution of journals and their publishing models.

Figure 1.

Distribution of Springer Nature’s journals by (a) publishing model and (b) field and publishing model.

For the bibliometric analyses, we employed Scopus8. We matched the list of SN journals with journals in Scopus via title and ISSN. From 3,138 SN journals, we could match 2,757 journals, which we used for further analyses. Because of the problems regarding journals’ flipping mentioned above, we limited our data to two years, 2017 and 2018, to reduce the errors related to detecting the journals’ and articles’ business models. This resulted in 522,411 articles.
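The journal matching step can be sketched as follows. This is a minimal illustration under assumptions. The text only states that journals were matched "via title and ISSN", so the matching order used here is hypothetical: print or electronic ISSN first, normalized title as a fallback. The record keys (`id`, `issn`, `eissn`, `title`) and function names are also hypothetical.

```python
import re

def normalize_issn(issn):
    """'1234-567x' -> '1234567X'; returns None for missing values."""
    if not issn:
        return None
    return issn.replace("-", "").strip().upper()

def normalize_title(title):
    """Lower-case, replace punctuation with spaces, collapse whitespace."""
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())

def match_journals(sn_journals, scopus_sources):
    """Map SN journal ids to Scopus source ids.

    Assumed (hypothetical) order: ISSN/eISSN first, normalized title as fallback.
    """
    by_issn, by_title = {}, {}
    for src in scopus_sources:
        for issn in (src.get("issn"), src.get("eissn")):
            key = normalize_issn(issn)
            if key:
                by_issn[key] = src["id"]
        by_title[normalize_title(src["title"])] = src["id"]

    matches = {}
    for journal in sn_journals:
        keys = [normalize_issn(journal.get(f)) for f in ("issn", "eissn")]
        hit = next((by_issn[k] for k in keys if k in by_issn), None)
        if hit is None:  # no ISSN match: fall back to the normalized title
            hit = by_title.get(normalize_title(journal["title"]))
        if hit is not None:
            matches[journal["id"]] = hit
    return matches
```

Title-only matches are more error-prone than ISSN matches (for example, for renamed journals), so they are the natural candidates for manual checking.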

To detect the publishing model of articles in hybrid journals, we used Unpaywall9 (the 2019 snapshot), a service that locates openly available versions of articles. We obtained the publishing model of articles in hybrid journals from the metadata in this data set.
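The following minimal sketch shows how the article-level model in hybrid journals could be read from an Unpaywall snapshot. It assumes the snapshot is read line by line as JSON Lines and that the record's `oa_status` field is used. Treating only the value `hybrid` as OA (and `bronze`, `green`, and `closed` as CA) is an illustrative assumption, not necessarily the rule applied in this study. The function name `hybrid_models` and the input set `hybrid_dois` are hypothetical.

```python
import json

def hybrid_models(jsonl_lines, hybrid_dois):
    """Return {doi: 'OA' | 'CA'} for articles published in hybrid journals.

    jsonl_lines: iterable of JSON strings, one Unpaywall record per line.
    hybrid_dois: set of lower-cased DOIs of articles in hybrid SN journals.
    """
    models = {}
    for line in jsonl_lines:
        record = json.loads(line)
        doi = (record.get("doi") or "").lower()
        if doi in hybrid_dois:
            # Assumption: only 'hybrid' counts as paid OA in a hybrid journal;
            # 'bronze', 'green' and 'closed' are treated as CA here.
            models[doi] = "OA" if record.get("oa_status") == "hybrid" else "CA"
    return models
```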

We obtained the APC amount in U.S. dollars for 1,741 hybrid journals and 297 gold OA journals from the website of Springer Nature10. For 147 gold OA journals, there was no fixed APC, and the exact amount could only be obtained from each journal's own website. Only 5% of the investigated articles belong to these journals. We therefore set the APC amount for these journals to null (empty) and excluded them from the data for the classification task.

To detect the gender of authors, we used a combined name- and image-based approach introduced by Karimi, Wagner et al. (2016), which categorizes gender as male or female. Following this method, we first tried to detect gender using the Genderize.io11 API. For names whose gender the API could not identify, we searched for the names on the web and detected gender using image-based recognition algorithms. This increases recall and accuracy compared with Genderize.io alone (Karimi et al., 2016). We acknowledge that gender is not a binary variable. Further gender identities, including their social dimensions, cannot be identified with this approach and are therefore left out of the analysis. Using Scopus author IDs, we found 381,074 unique corresponding authors for the investigated articles. Of these, 10,614 (about 3%) had only initials or no first name, so we could not detect their gender.

Overall, we identified the gender of 49% of authors. We therefore excluded from the regression analysis and classification task the 254,044 articles (about 49%) whose corresponding author's gender we could not detect. One possible reason for the low identification rate is the large share of authors affiliated with Asian countries (136,591; above 35%)12, who are probably also originally from these countries. Previous studies tested gender detection tools on names of different nationalities and found them less effective for Asian names (Karimi et al., 2016; Santamaría & Mihaljević, 2018). Table 1 shows, for each scientific field, the number and percentage of OA and CA publications whose corresponding author has a detected gender. The share of detected genders is 4 percentage points higher for OA publications than for CA publications.

Table 1.

Number and proportion of articles among scientific fields and publishing model for which we detected the gender status of their corresponding author

| Field | CA model, n (%) | OA model, n (%) |
|---|---|---|
| Health Sciences | 31,642 (53) | 20,534 (49) |
| Life Sciences | 23,011 (54) | 10,032 (57) |
| Physical Sciences | 74,742 (48) | 9,927 (50) |
| Social Sciences | 9,210 (40) | 2,020 (41) |
| Multiple fields | 38,507 (52) | 48,742 (58) |
| Total | 177,112 (50) | 91,255 (54) |

4.2. Features and Definitions

To investigate the factors associated with higher rates of OA publishing, we defined the features presented in Table 2. Figure 2 gives an overview of the data collection and preparation steps. The final analyzed data set is available in a Git repository13.

Table 2.

Features used to study the associated factors with OA publishing

| Feature type | Feature | Description |
|---|---|---|
| Journal | journal_ranking | h-index ranking of the journal in the related discipline (for multidisciplinary journals, the average ranking among disciplines). |
| | journal_APC | The cost of APC to publish OA in the journal (US dollars). |
| | field | Field of the journal: Health Sciences, Life Sciences, Physical Sciences, Social Sciences, or multiple fields (if the journal has more than one field, the value is ‘multiple fields’). |
| Country | country_income | Income level (GDP per capita) of the country in which the corresponding author is affiliated. |
| | OA_agreement | If the corresponding author's country of affiliation has an OA agreement with the publisher, it equals 1, otherwise 0. |
| | discount_eligible | If the corresponding author's country of affiliation belongs to the lower-middle income group, it equals 1, otherwise 0. |
| | waiver_eligible | If the corresponding author's country of affiliation belongs to the low-income group, it equals 1, otherwise 0. |
| Paper | OA_cite | Ratio of citing OA against CA in this paper. |
| | authors_count | Number of authors. |
| Author* | gender | For females equals 0 and for males 1. |
| | age | Years since first publication. |
| | OA_publish | Ratio of OA publications against CA in the past (number of previous OA publications divided by the number of CA publications). |
| | international_coauthors | Proportion of international coauthors** to all coauthors in this paper. |

* Corresponding author.

** An international coauthor is a coauthor who has a different affiliation country than the corresponding author.

Figure 2.

Flow chart of data collection and preparation process.

To compare publishing and citation behavior across countries, we classified countries into four income groups based on the World Bank classification14: low, lower-middle, upper-middle, and high-income economies. The World Bank re-evaluates the income level of each country every year, and the classification history is available15. Of the 218 countries listed by the World Bank, we excluded 20 whose income group changed between 2015 and 2018. Springer Nature offers APC waivers to articles whose corresponding author is from a low-income country and APC discounts to those whose corresponding author is from a lower-middle-income country (as classified by the World Bank)16.
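The country filtering and the two eligibility flags of Table 2 can be sketched as follows. In this minimal illustration, the yearly classification history is assumed to be a dictionary keyed by country code. The codes (`AAA`, `BBB`, ...) and group labels (`"low"`, `"lower middle"`) are hypothetical placeholders, not World Bank data.

```python
def stable_income_groups(history):
    """Keep countries whose World Bank group is the same in every year listed.

    history: {country_code: [group_2015, group_2016, group_2017, group_2018]}
    """
    return {c: groups[0] for c, groups in history.items() if len(set(groups)) == 1}

def eligibility_flags(group):
    """Table 2 flags: waiver for low income, discount for lower middle income."""
    return {
        "waiver_eligible": int(group == "low"),
        "discount_eligible": int(group == "lower middle"),
    }
```

For example, a country listed as `"low"` in 2015 and 2016 but `"lower middle"` in 2017 and 2018 is dropped by `stable_income_groups`, mirroring the exclusion of the 20 reclassified countries.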

Using the Transformative Agreement Registry provided by ESAC17, we identified three organizations with an open access agreement with Springer Nature during both investigated years, 2017 and 2018: KEMOE/FWF in Austria, the Max Planck Society in Germany, and the Bibsam consortium in Sweden. Two further organizations had an agreement in 2018: VSNU-UKB in the Netherlands and the FinELib consortium in Finland. We obtained the lists of participating institutions by contacting KEMOE/FWF, Bibsam, and FinELib. The list of institutions participating via VSNU-UKB was available on the SN website18. We assumed that publications whose corresponding author is affiliated with an institution included in a transformative agreement were free of APCs. To find Max Planck institutions, we used the disambiguated institutional addresses for German institutions (Rimmert, Schwechheimer, & Winterhager, 2017) available in Scopus-KB. For the other four countries, we looked up the participating institutions manually. In total, we found 12,323 such articles and used them to set the value of the feature OA_agreement.

Figure 3 shows the number of articles published by Springer Nature, grouped by the income group of the corresponding author's country of affiliation. Sixty-seven articles had a corresponding author with affiliations in multiple countries, and we excluded them from the analyses. The distribution of publications by country and income level is available on GitHub19.

Figure 3.

Number of papers published by Springer Nature grouped by income level of countries.

To compute the ratio of authors' previous OA publications, we needed to identify authors and their publications. The Scopus Author ID allowed us to retrieve each author's list of published articles. For the variable country_income, we used the average GDP per capita in 2017 and 2018 obtained from the World Bank Group20. We used the year of each author's first Scopus-indexed publication to calculate their career age as a measure of seniority.
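The two author-level features age and OA_publish can be sketched as follows, given one author's publication list. This minimal sketch makes three assumptions, all hypothetical. Career age is counted up to the year of the analyzed article. "Previous" means published in an earlier calendar year. The undefined ratio for authors without prior CA papers is marked as `None`, because the text does not say how that case was handled.

```python
def author_features(pubs, article_year):
    """pubs: list of (year, is_oa) for all Scopus-indexed papers of one author."""
    first_year = min(year for year, _ in pubs)
    age = article_year - first_year  # career age in years (assumed reference year)
    prior = [is_oa for year, is_oa in pubs if year < article_year]
    n_oa = sum(prior)  # True counts as 1
    n_ca = len(prior) - n_oa
    # OA_publish = previous OA / previous CA; undefined without prior CA papers,
    # which this sketch marks as None (an assumption).
    oa_publish = n_oa / n_ca if n_ca else None
    return {"age": age, "OA_publish": oa_publish}
```

For an author with CA papers in 2010 and 2015, an OA paper in 2012, and an analyzed OA article in 2018, this gives a career age of 8 and an OA_publish value of 1/2 = 0.5.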

To evaluate and rank the quality of journals, we employed the journal h-index. Hodge and Lacasse (2011) suggested the h-index as a better measure for ranking social science journals than the five-year impact factor, and it has been used in previous studies (Barner, Holosko, & Thyer, 2014; Xia, 2012). We calculated the h-index for the period 2011 to 2016 for all Scopus journals, which are classified into 27 subject categories21.

4.3. Methodology

4.3.1. Normalizing the citation impact

To evaluate and compare citation impact at the article and journal level across subject areas, we must normalize it, because citation patterns vary across scientific disciplines and fields. To normalize the journal h-index across categories, we computed the Percentile Rank (PR) of each journal within its category (inspired by Bornmann and Mutz [2014]). This method assigns each journal in a category a rank from 0 (lowest h-index) to 100 (highest h-index), and journals with the same h-index receive the same rank. Because percentile ranks depend only on the ordering of values, this normalization is advantageous for skewed distributions. If a journal belongs to more than one category, we used the weighted PR (wPR; Bornmann & Williams, 2020), calculated with the formula:
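$$\mathrm{wPR} = \frac{\sum_{i=1}^{k} n_{sc_i}\,\mathrm{PR}_{sc_i}}{\sum_{i=1}^{k} n_{sc_i}} \qquad (1)$$

where $sc_i$ is the $i$th of the $k$ subject categories that the journal belongs to, $n_{sc_i}$ is the number of journals in this subject category, and $\mathrm{PR}_{sc_i}$ is the PR of the journal in it.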
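The normalization can be sketched in a few lines. This minimal sketch makes two assumptions. First, Eq. (1) is treated as a mean of the category PRs weighted by the category sizes $n_{sc_i}$. Second, PR uses one simple tie-aware convention: the share of other journals in the category with a strictly lower h-index, scaled to 0 to 100. Under this convention, a unique maximum receives 100, but tied top values receive a lower PR. Other percentile definitions exist, and the function names are hypothetical.

```python
import bisect

def percentile_ranks(values):
    """PR in [0, 100]: lowest value -> 0; tied values share the same PR."""
    n = len(values)
    if n == 1:
        return [100.0]  # single-journal category: arbitrary convention
    ordered = sorted(values)
    # bisect_left counts how many values lie strictly below v
    return [100.0 * bisect.bisect_left(ordered, v) / (n - 1) for v in values]

def weighted_pr(memberships):
    """wPR of Eq. (1): memberships is a list of (PR_sc_i, n_sc_i) pairs."""
    total = sum(n for _, n in memberships)
    return sum(pr * n for pr, n in memberships) / total
```

For instance, h-indices of 10, 20, 20, and 40 in one category map to PRs of 0, 33.3, 33.3, and 100. Suppose a journal has PR 80 in a category of 100 journals and PR 40 in a category of 300 journals. Its wPR is (80·100 + 40·300) / 400 = 50.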

We used a similar normalization approach for the citation impact of articles. Because citation counts are confounded by the time since publication, we consider citations within a 2-year window after publication, as in previous studies (Jannot, Agoritsas et al., 2013; Piwowar, Priem et al., 2018). Next, we grouped the articles by subject category and publication year and ranked them within each group from 0 to 100 based on received citations. We use a PR of 50 (the median citation count) as the threshold for highly cited articles. An article is highly cited if its PR in its group is above 50, meaning that it has received more citations than half of the articles in the same subject category and publication year. For articles belonging to multiple subject categories, we used the wPR of Eq. (1). Here, $sc_i$ is the $i$th subject category of the article, $n_{sc_i}$ is the number of articles in this subject category, and $\mathrm{PR}_{sc_i}$ is the PR of the article in it.
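The grouping and thresholding step for articles can be sketched as follows. This minimal, self-contained illustration assumes the same tie-aware PR convention as above, that is, the share of other articles in the same group with strictly fewer citations. It also assumes hypothetical dictionary keys (`category`, `year`, `citations_2y`) and handles single-category articles only.

```python
import bisect
from collections import defaultdict

def highly_cited_flags(articles):
    """articles: dicts with 'category', 'year', 'citations_2y' (2-year window).

    Returns a parallel list of (PR, is_highly_cited) tuples.
    """
    groups = defaultdict(list)  # (category, year) -> article indices
    for idx, art in enumerate(articles):
        groups[(art["category"], art["year"])].append(idx)

    result = [None] * len(articles)
    for members in groups.values():
        ordered = sorted(articles[i]["citations_2y"] for i in members)
        n = len(ordered)
        for i in members:
            below = bisect.bisect_left(ordered, articles[i]["citations_2y"])
            pr = 100.0 * below / (n - 1) if n > 1 else 100.0
            result[i] = (pr, pr > 50)  # threshold: PR 50, i.e., the group median
    return result
```

Citation counts of 0, 1, 5, and 10 within one category-year group give PRs of 0, 33.3, 66.7, and 100. Only the last two articles are flagged as highly cited.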

4.3.2. Correlation analysis

To find associations between OA publishing and each feature defined in Table 2, we conducted a correlation analysis. The first variable in each correlation is OA publishing, a dichotomous variable (a special case of a categorical variable). To assess its association with field, which is a categorical variable, we selected Cramér's V coefficient. Cramér's V is based on the chi-squared statistic and measures the strength of association between two variables. Its value ranges from 0 (no association) to 1 (complete association). The association with binary variables (OA_agreement, discount_eligible, waiver_eligible, gender) was examined with the phi coefficient (Ekström, 2011). This coefficient ranges from −1 to +1 and indicates the strength of a positive or negative correlation between two dichotomous variables. To measure the association with the remaining numerical or continuous variables, we applied the point-biserial correlation coefficient. It is used instead of the Pearson correlation when one variable is dichotomous (LeBlanc & Cox, 2017) and also ranges from −1 to +1.
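The three coefficients can be sketched with the standard library alone. This sketch relies on a standard identity: when the dichotomous variables are coded 0/1, Pearson's r is exactly the phi coefficient (both variables binary) and the point-biserial coefficient (one binary, one continuous). Cramér's V is computed from the chi-squared statistic of the contingency table. The function names are illustrative, and `scipy.stats.pointbiserialr` offers a library alternative for the point-biserial case.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson's r for two equally long numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# With 0/1 coding, Pearson's r equals the phi coefficient (both variables binary)
# and the point-biserial coefficient (one binary, one continuous variable).
phi = pearson
point_biserial = pearson

def cramers_v(x, y):
    """Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    n = len(x)
    observed = Counter(zip(x, y))
    rows, cols = Counter(x), Counter(y)
    chi2 = sum(
        (observed[(r, c)] - rows[r] * cols[c] / n) ** 2 / (rows[r] * cols[c] / n)
        for r in rows for c in cols
    )
    k = min(len(rows), len(cols)) - 1
    return math.sqrt(chi2 / (n * k))
```

For example, `cramers_v(oa_flags, fields)` relates the 0/1 OA indicator to the field labels. `point_biserial(oa_flags, journal_ranks)` relates it to a continuous feature such as journal_ranking.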

4.3.3. Regression analysis

We used multivariate logistic regression to find the relationship between the variables defined in Table 2 and OA publishing. This is a common method for modeling the relationship between a dichotomous dependent variable and multiple independent variables. It allows us to estimate the association between the dependent variable and each independent variable while accounting for the other independent variables in the data.
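The core of a logistic regression fit can be sketched from scratch, as below. This sketch fits an intercept and one coefficient per feature by batch gradient ascent on the log-likelihood. It is meant only to show what the fitted coefficients mean. A statistics package, such as statsmodels' `Logit`, would normally be used to obtain standard errors and p-values. The learning rate and epoch count are arbitrary choices for the toy data below.

```python
import math

def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Fit P(OA=1|x) = sigmoid(w0 + w1*x1 + ... + wk*xk) by gradient ascent."""
    w = [0.0] * (len(X[0]) + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            p = 1.0 / (1.0 + math.exp(-z))
            err = yi - p  # derivative of the log-likelihood with respect to z
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        # average gradient over all samples, then take one ascent step
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w
```

In the toy data below, OA rates are 1/4, 1/2, 1/2, and 3/4 for the four combinations of two binary features (for instance, hypothetical waiver_eligible and OA_agreement flags). The fit recovers an intercept of −ln 3 and coefficients of ln 3. `math.exp` of a coefficient gives the odds ratio: here, each feature triples the odds of OA publishing while the other is held fixed.

```python
X = [[0, 0]] * 4 + [[1, 0]] * 4 + [[0, 1]] * 4 + [[1, 1]] * 4
y = [1, 0, 0, 0] + [1, 1, 0, 0] + [1, 1, 0, 0] + [1, 1, 1, 0]
w = fit_logistic(X, y)
odds_ratios = [math.exp(c) for c in w[1:]]  # approximately [3.0, 3.0]
```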

4.3.4. Classification method

We employed a machine learning method to estimate the likelihood of choosing a publishing model. To this end, we grouped the publishing models of articles into two classes, OA and CA, and used the values of the features defined in Table 2 to predict the publishing model. In machine learning terms, this is a classification task.

To estimate the publishing model of articles, we used a supervised machine learning method, random forest (RF), a common tool for classification tasks (Behr, Giese et al., 2020; Kumar, Mukhopadhyay et al., 2019; Roy, Chopra et al., 2020; Yamak, Saunier, & Vercouter, 2016). We used it for binary classification (OA = 1, CA = 0), with the features introduced in Table 2 as independent variables. We applied the algorithm only to hybrid journals, in which authors can choose their paper's business model, and used 10-fold cross-validation to train and test the model.
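The 10-fold cross-validation loop can be sketched as below. The classifier is deliberately left pluggable: `fit_majority` is a hypothetical placeholder that only predicts the majority class and stands in for the random forest (with scikit-learn one would pass a `RandomForestClassifier` instead). The folds here are plain random splits; scikit-learn's `StratifiedKFold` would additionally preserve the 91%/9% class ratio in every fold. All function names are assumptions.

```python
import random


def kfold_indices(n, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each index lands in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def cross_val_accuracy(X, y, fit, k=10, seed=0):
    """fit(X_train, y_train) must return a predict(row) callable."""
    scores = []
    for train, test in kfold_indices(len(y), k, seed):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return scores


def fit_majority(X_train, y_train):
    """Placeholder classifier (always predicts the training majority class)."""
    majority = max(set(y_train), key=y_train.count)
    return lambda row: majority


X = [[i] for i in range(30)]
y = [1 if i % 10 == 0 else 0 for i in range(30)]  # roughly 10% minority class
scores = cross_val_accuracy(X, y, fit_majority)
print(sum(scores) / len(scores))
```

On this toy data the mean accuracy is about 0.9 even though the placeholder never predicts the minority class, which illustrates why the class imbalance needs handling before training.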

Because the target variable is highly imbalanced (91% CA and 9% OA publishing), we balanced the classes by resampling the data with SMOTE (synthetic minority oversampling technique), a widely used method for handling class imbalance (Spelmen & Porkodi, 2018).
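Below is a minimal SMOTE sketch, written from the technique's general description (each synthetic example is interpolated between a minority example and one of its k nearest minority neighbours) rather than from the authors' code; the names `smote` and `balance_with_smote` and the default k = 5 are assumptions. Two practical caveats apply: interpolation produces fractional values for binary or categorical features (imbalanced-learn's `SMOTENC` handles mixed feature types), and oversampling is commonly applied only to the training folds of a cross-validation so that synthetic points do not leak into the test folds; whether this was done here is not specified.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by SMOTE-style interpolation.

    Each synthetic row lies on the segment between a randomly drawn minority
    row and one of its k nearest minority neighbours (squared Euclidean
    distance). Requires at least two minority rows.
    """
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: sq_dist(base, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, nb)])
    return synthetic


def balance_with_smote(X, y, seed=0):
    """Oversample class 1 (assumed to be the minority, e.g. OA) until classes are equal."""
    minority = [row for row, label in zip(X, y) if label == 1]
    n_new = (len(y) - len(minority)) - len(minority)
    new_rows = smote(minority, n_new, seed=seed)
    return X + new_rows, y + [1] * len(new_rows)


X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]] + [[5.0, 5.0]] * 7
y = [1, 1, 1] + [0] * 7
X_bal, y_bal = balance_with_smote(X, y)
print(y_bal.count(1), y_bal.count(0))  # 7 7
```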

In this section, we first present descriptive statistics on the publishing model of articles across four country groups to address RQ1. Next, we compare the citation impact of the different publishing models across these groups to answer RQ2. We then turn to RQ3 and present the correlation coefficients between the publishing model and the features defined in Table 2, together with a multivariate logistic regression of the relationships between these variables. Finally, we report how well the publishing model of articles in hybrid journals can be predicted and the importance of each feature in this prediction task, to reveal the factors influencing the choice of the OA model.

5.1. Countries’ Income Level of Corresponding Authors and Their Publishing Model

Figure 4 shows the distribution of articles by publishing model and by the income level of the corresponding authors' countries. Authors affiliated with countries of the lowest income level, who are eligible for APC waivers, have the highest proportion of gold OA publications. In contrast, authors from lower middle income countries, who are eligible for APC discounts, have the lowest proportion of gold OA publications.

Figure 4.

Distribution of articles published in journals with three publishing models across four groups of countries. The access status of hybrid articles has been identified from Unpaywall (cases 2 and 3). For case 4 (hybrid, no access status), the articles in hybrid journals could not be found in Unpaywall.

5.2. Countries’ Income Level of Corresponding Authors and Their Citation Impact

Figure 5 shows the percentage of highly cited articles for each publishing model across the country groups. Overall, corresponding authors from countries with higher income levels have a higher percentage of highly cited papers.

Figure 5.

Percentage of highly cited papers published in different models. Hybrid Open Access/Closed Access refers to articles published as OA/CA in hybrid journals.

For all country groups, the share of highly cited articles is higher for the gold and hybrid OA models than for the other models. Of these two, the share is higher for gold OA articles, suggesting a stronger citation impact for articles published in gold OA journals; the only exception is the low-income group, which has more highly cited papers in the hybrid OA model. Compared with articles in CA journals, CA articles in hybrid journals are more often highly cited, except for countries with a high income level.

5.3. Influential Factors on the Publishing Model

First, we conducted a correlation analysis to find the associations between OA publishing and the features. Table 3 shows the correlation coefficients between the publishing model (1 for OA, 0 otherwise) and the features in Table 2. We split the data into two sets: set 1 for articles published in OA or CA journals (nonhybrid journals) and set 2 for articles in hybrid journals. Set 1 reveals the association of the discount and waiver policies with OA publishing, whereas set 2, in which OA publishing is optional, highlights more author-specific factors related to OA publishing. The weak negative correlation with gender indicates that women tend slightly more toward gold OA publishing than men, which disagrees with previous findings (Olejniczak & Wilson, 2020; Zhu, 2017). Given the lowest proportion of OA publishing for countries with a lower middle income level in Figure 4, the negative correlation for discount_eligible in Table 3 (in contrast to the positive value for waiver_eligible) suggests that the discount policies are insufficient to motivate authors from these countries to publish in gold OA journals.

Table 4 shows the relationship between the publishing model and the features in Table 3 when all features are considered jointly in a multivariate logistic regression. The results confirm the signs of the correlations from the correlation analysis, with one exception: discount_eligible is positively associated with the publishing model in the regression, whereas its correlation coefficient is negative. Among the fields, Social Sciences has the highest odds ratios in Table 4, indicating that, relative to Health Sciences, this field has the highest odds of OA publishing; it has experienced a dramatic growth of OA journals since 2009 (Liu & Li, 2018). The strong positive correlation between journal_ranking and the publishing model in the first set suggests that the journal's rank is the dominant factor in choosing a gold OA journal for publication.

Therefore, we predict the publishing model only for articles in set 2 (hybrid journals), to discover factors other than journal-specific ones that influence the authors' decision to choose the OA option. Moreover, because the OA model is optional in hybrid journals, this set better reveals the characteristics associated with choosing it.

Table 3.

Correlation coefficients between the independent variables and the target variable. A target value of 1 (0) means that the paper was published in the OA (CA) model.

| Feature | Correlation test | Correlation coefficient, set 1 (nonhybrid) | Correlation coefficient, set 2 (hybrid) |
|---|---|---|---|
| journal_ranking | Point-biserial | 0.70 | 0.07 |
| journal_APC | Point-biserial | – | 0.10 |
| field | Cramér's V | 0.69 | 0.09 |
| country_income | Point-biserial | 0.28 | 0.16 |
| OA_agreement | Phi | 0.08 | 0.30 |
| discount_eligible | Phi | −0.08 | – |
| waiver_eligible | Phi | 0.06 | – |
| OA_cite | Point-biserial | 0.42 | 0.13 |
| authors_count | Point-biserial | 0.09 | 0.07 |
| gender | Phi | −0.08 | −0.01 |
| age | Point-biserial | −0.08 | 0.02 |
| OA_publish | Point-biserial | 0.46 | 0.41 |
| international_coauthors | Point-biserial | 0.17 | 0.11 |
| Sample size | | 192,498 | 329,913 |

Table 4.

Results of the logistic regression. The target variable is the publishing model and equals 1 for OA and 0 for CA publishing. The outputs are odds ratios, exp(β); (exp(β) − 1) × 100 gives the percentage change in the odds of OA publishing per unit increase in an independent variable. Hence, an odds ratio greater (less) than 1 indicates a positive (negative) association between the variable and OA publishing.

| Variable | Set 1: odds ratio (z) | Set 1: 95% CI | Set 2: odds ratio (z) | Set 2: 95% CI |
|---|---|---|---|---|
| Intercept | 0.002*** (−72.4) | 0.001 to 0.002 | 0.00*** (−87.7) | 0.00 to 0.00 |
| journal_ranking | 1.98*** (10.38) | 1.74 to 2.25 | 110.7*** (86.5) | 99.5 to 100.23 |
| journal_APC | 1.00*** (8.05) | 1.0001 to 1.0002 | – | – |
| field: Health Sciences | reference | reference | reference | reference |
| field: Life Sciences | 1.01 (0.31) | 0.94 to 1.08 | 0.67*** (−9.55) | 0.62 to 0.73 |
| field: Physical Sciences | 0.97 (−0.91) | 0.91 to 1.07 | 0.20*** (−44.29) | 0.18 to 0.21 |
| field: Social Sciences | 1.90*** (13.81) | 1.73 to 2.08 | 3.49*** (12.2) | 2.86 to 4.27 |
| field: multiple fields | 1.25*** (8.5) | 1.19 to 1.32 | 3.4*** (30.87) | 3.17 to 3.71 |
| country_income | 1.00*** (33.88) | 1.000 to 1.000 | 1.000*** (16.18) | 1.00 to 1.00 |
| OA_agreement | 14.9*** (65.07) | 13.78 to 16.22 | 0.93 (−0.78) | 0.78 to 1.11 |
| discount_eligible | – | – | 1.7*** (9.17) | 1.52 to 1.90 |
| waiver_eligible | – | – | 20.19*** (5.53) | 8.29 to 77.5 |
| OA_cite | 0.55*** (−12.97) | 0.500 to 0.600 | 1.55*** (8.4) | 1.39 to 1.71 |
| authors_count | 1.003 (0.80) | 0.99 to 1.01 | 1.17*** (33.15) | 1.16 to 1.18 |
| gender | 0.94** (−2.8) | 0.90 to 0.98 | 0.93* (−2.5) | 0.88 to 0.98 |
| age | 1.05*** (29.63) | 1.05 to 1.054 | 0.97*** (−15.36) | 0.96 to 0.98 |
| OA_publish | 196.79*** (105.65) | 178.46 to 217.09 | 23.86*** (50.58) | 21.1 to 26.99 |
| international_coauthors | 1.17*** (18.21) | 1.15 to 1.19 | 1.03 (1.34) | 0.99 to 1.06 |
| McFadden's pseudo R² | 0.25 | | 0.60 | |
| Sample size | 96,674 | | 162,773 | |

Significance: *p < 0.05, **p < 0.01, ***p < 0.001. z-values of coefficients in parentheses. CI: Confidence interval.
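The quantities in Table 4 follow from each logit coefficient β and its standard error SE: the odds ratio is exp(β), z = β / SE, a 95% confidence interval is commonly the Wald interval exp(β ± 1.96 · SE), and (exp(β) − 1) × 100 is the percentage change in the odds of OA publishing per unit increase. The sketch below assumes Wald intervals (how the table's intervals were computed is not stated) and uses a hypothetical SE, since Table 4 reports none; the name `odds_ratio_summary` is also hypothetical. For authors_count in set 2 (odds ratio 1.17), it recovers a 17% increase in the odds per additional author.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard-normal critical value


def odds_ratio_summary(beta, se):
    """Turn a logit coefficient and its standard error into Table 4-style quantities."""
    odds_ratio = math.exp(beta)
    return {
        "odds_ratio": odds_ratio,
        "ci95": (math.exp(beta - Z_95 * se), math.exp(beta + Z_95 * se)),  # Wald interval
        "z": beta / se,
        "pct_change_in_odds": (odds_ratio - 1.0) * 100.0,
    }


# Hypothetical standard error; Table 4 does not report SEs.
summary = odds_ratio_summary(math.log(1.17), 0.005)
print(round(summary["pct_change_in_odds"], 1))  # 17.0
```

By the same formula, an odds ratio below 1 gives a negative change: the set 2 value of 0.93 for gender corresponds to odds about 7% lower per unit increase.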

Table 5 shows the performance of the RF classifier for the second set (hybrid journals). Figure 6 displays the permutation importance of the features used to predict the publishing model for this set. The permutation importance of a feature is the decrease in model performance when that feature's values are randomly shuffled while the values of the other predictors remain unchanged; a higher value indicates more predictive power in the model. The highest importance values for country_income and age in Figure 6 indicate that the income level of countries and seniority are the most important predictors of choosing the OA model. The lowest value, for gender, indicates that gender has less influence on the authors' choice of the OA model than the other factors. OA_agreement is one of the weakest features in predicting the publishing model, and the correlation analysis also shows a weak correlation between them. One possible reason for this weak effect is that only 2.3% of papers were covered by transformative agreements. In addition, country income level is the most important feature and is positively correlated with OA publishing, so authors from high-income countries are more likely to publish in the OA model even without a transformative agreement. This may also weaken the observed association between the agreements and OA publishing.
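Permutation importance as described above can be sketched as follows: shuffle one feature column at a time, re-score the already trained model, and average the drop in performance over a few repeats. This is a minimal illustration using accuracy as the score and a generic `predict(row)` callable, not the authors' code; the function names and `n_repeats = 5` are assumptions. scikit-learn provides the same procedure as `sklearn.inspection.permutation_importance`.

```python
import random


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn.

    X is a list of list rows; the model behind predict(row) is not retrained.
    """
    rng = random.Random(seed)
    baseline = accuracy(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


X = [[i % 2, 7] for i in range(20)]
y = [i % 2 for i in range(20)]
imp = permutation_importance(lambda row: row[0], X, y)
print(imp[1])  # 0.0: shuffling a feature the model ignores changes nothing
```

Because the model is not refitted, a feature can score low either because it carries little information or because correlated features carry the same information, which is one caveat when reading Figure 6.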

Table 5.

Performance of the random forest method in predicting the publishing model of papers

| Metric | OA | CA |
|---|---|---|
| Precision | 0.85 | 0.94 |
| Recall | 0.95 | 0.83 |
| F1 score | 0.89 | 0.88 |

Overall accuracy: 0.92

Figure 6.

Permutation importance of features employed to predict the publishing model of papers with the Random Forest method for the articles published in hybrid journals.

This work presents a detailed study of the relationship between author-specific and structural factors (e.g., the income level of authors' affiliation countries), OA publishing, and the OA citation advantage. First, we investigated the relationship between the income level of countries and OA publishing for articles published by Springer Nature in 2017 and 2018. We found that authors from lower middle income countries, who are eligible for APC discounts, have a lower proportion of gold OA publications among all their papers with this publisher than authors from other countries. This suggests that discounted APCs are still too high for these authors to choose a gold OA model, and it agrees with Rouhi et al. (2022), who pointed out that waiver and discount programs could not bring author equity in reading and publishing. In contrast, authors from low-income countries, who are eligible for APC waivers, have a higher proportion of gold OA publications than authors from other countries. This result conflicts with Smith et al. (2021), who found lower proportions of OA papers from these countries than from others in Elsevier journals. One possible reason is the stricter conditions this publisher sets for waiver eligibility.

We examined the citation impact of these articles and compared the percentage of highly cited papers across publishing models and the income levels of the corresponding authors' countries. For all country groups, the OA model (in gold OA or hybrid journals) has the highest percentage of highly cited papers. The results also show a higher proportion of highly cited articles for countries with higher income levels. Although this points to a greater citation impact for OA models, the difference may result from confounding factors such as self-selection and quality biases (Gargouri, Hajjem et al., 2010). Moreover, examining the effect of preprints and green OA publishing (where the article is published in the CA model, but a free version is available in a repository outside the publisher's website) would allow more accurate analyses (Fraser et al., 2020; Wang, Glänzel, & Chen, 2020).

We conducted correlation, regression, and machine learning analyses to find further characteristics (of authors, journals, and papers) related to OA publishing. The correlation analysis showed the strength and direction of the correlation between the publishing model and each feature defined in Table 2. Using regression analysis, we examined the association of each factor while accounting for the other factors, and these results reinforced the correlation outcomes. The only conflict between the two methods concerned discount_eligible, which was negatively correlated with OA publishing in the correlation analysis but positively associated with it in the regression. In addition, we predicted the publishing model of articles (OA or CA) with an RF-based machine learning approach and examined the contribution of each feature to the prediction. The results show that the country's income level and greater experience with OA rather than CA publishing are the most influential factors in predicting the publishing model. We found that the tendency toward OA publishing was slightly higher for women, but gender was a less important feature than the others in predicting the OA model.

One obvious limitation of this study is that we included articles from only one publisher, Springer Nature. Authors' publishing behavior may differ for other publishers, which limits the generalizability of our results.

We obtained the access status of journals in 2019 from the list published on Springer Nature's website (and likewise the article-level access status from Unpaywall). Some journals may have flipped from CA to OA (Momeni et al., 2021) or vice versa; we did not detect such changes, which may introduce errors into the results. Furthermore, we did not verify the correctness of the external data (Springer Nature and Unpaywall), whose accuracy affects the precision of our results. We identified the gender of 49% of the authors and, in the regression and machine learning analyses, removed 49% of the articles (those whose corresponding authors had no gender status). In addition, 2% of the data were removed because of null values in other features (e.g., journals' APC). Because the gender detection approach does not work well for Asian names, especially Chinese ones, these authors are underrepresented among those with a gender status in the data set, which also biases our analyses.

In future work, other publishers could be considered to examine how their different APC policies affect OA publishing. Controlling for the language of articles is another direction for future studies. Springer Nature is an international publisher that publishes mostly articles in English (note 22), so articles in other languages are underrepresented in this study. Considering other publishers with non-English content, and the articles' language in the analyses, may reveal the role of language in international OA publishing and citation advantages.

Fakhri Momeni: Conceptualization, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing—original draft, Writing—review & editing. Kristin Biesenbender: Conceptualization, Resources, Writing—review & editing. Philipp Mayr: Funding acquisition, Project administration, Writing—review & editing. Stefan Dietze: Methodology, Supervision, Writing—review & editing. Isabella Peters: Funding acquisition, Project administration, Supervision, Writing—review & editing.

The authors have no competing interests.

The data set analyzed during the current study and code are available at https://github.com/momenifi/open_access_springer_nature.git.

This work is financially supported by BMBF project OASE, grant number 01PU17005A. We acknowledge the support of the German Competence Center for Bibliometrics (grant: 01PQ17001) for maintaining the used data set for the analyses.

8

The in-house Scopus database maintained by the German Competence Centre for Bibliometrics (Scopus-KB), 2021 version.

12

Authors from Armenia, Azerbaijan, Georgia, Kazakhstan, Russia, and Turkey, which belong to both Asia and Europe, are not included in this list.

Bahlai, C., Bartlett, L. J., Burgio, K. R., Fournier, A. M., Keiser, C. N., … Whitney, K. S. (2019). Open science isn't always open to all scientists. American Scientist, 107(2), 78–82.

Barner, J. R., Holosko, M. J., & Thyer, B. A. (2014). American social work and psychology faculty members' scholarly productivity: A controlled comparison of citation impact using the h-index. British Journal of Social Work, 44(8), 2448–2458.

Bautista-Puig, N., Lopez-Illescas, C., de Moya-Anegon, F., Guerrero-Bote, V., & Moed, H. F. (2020). Do journals flipping to gold open access show an OA citation or publication advantage? Scientometrics, 124(3), 2551–2575.

Behr, A., Giese, M., Teguim K., H. D., & Theune, K. (2020). Early prediction of university dropouts—A random forest approach. Jahrbücher für Nationalökonomie und Statistik, 240(6), 743–789.

Bornmann, L., & Mutz, R. (2014). From P100 to P100′: A new citation-rank approach. Journal of the Association for Information Science and Technology, 65(9), 1939–1943.

Bornmann, L., & Williams, R. (2020). An evaluation of percentile measures of citation impact, and a proposal for making them better. Scientometrics, 124(2), 1457–1478.

Ekström, J. (2011). The phi-coefficient, the tetrachoric correlation coefficient, and the Pearson-Yule debate. Journal of the Korean Statistical Society, 42(3), 323–328.

Evans, J. A., & Reimer, J. (2009). Open access and global participation in science. Science, 323(5917), 1025.

Farys, R., & Wolbring, T. (2021). Matthew effects in science and the serial diffusion of ideas: Testing old ideas with new methods. Quantitative Science Studies, 2(2), 505–526.

Fox, J., Pearce, K. E., Massanari, A. L., Riles, J. M., Szulc, Ł., Gonzales, A. L. (2021). Open science, closed doors? Countering marginalization through an agenda for ethical, inclusive research in communication. Journal of Communication, 71(5), 764–784.

Fraser, N., Momeni, F., Mayr, P., & Peters, I. (2020). The relationship between bioRxiv preprints, citations and altmetrics. Quantitative Science Studies, 1(2), 618–638.

Gargouri, Y., Hajjem, C., Larivière, V., Gingras, Y., Carr, L., … Harnad, S. (2010). Self-selected or mandated, open access increases citation impact for higher quality research. PLOS ONE, 5(10), e13636.

Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33(2–3), 61–83.

Hodge, D. R., & Lacasse, J. R. (2011). Evaluating journal quality: Is the h-index a better measure than impact factors? Research on Social Work Practice, 21(2), 222–230.

Iyandemye, J., & Thomas, M. P. (2019). Low income countries have the highest percentages of open access publication: A systematic computational analysis of the biomedical literature. PLOS ONE, 14(7), e0220229.

Jannot, A.-S., Agoritsas, T., Gayet-Ageron, A., & Perneger, T. V. (2013). Citation bias favoring statistically significant studies was present in medical research. Journal of Clinical Epidemiology, 66(3), 296–301.

Karimi, F., Wagner, C., Lemmerich, F., Jadidi, M., & Strohmaier, M. (2016). Inferring gender from names on the web: A comparative evaluation of gender detection methods. In Proceedings of the 25th International Conference Companion on World Wide Web (pp. 53–54).

King, D. A. (2004). The scientific impact of nations. Nature, 430(6997), 311–316.

Kumar, N., Mukhopadhyay, S., Gupta, M., Handa, A., & Shukla, S. K. (2019). Malware classification using early stage behavioral analysis. In 2019 14th Asia Joint Conference on Information Security (AsiaJCIS) (pp. 16–23).

Langham-Putrow, A., Bakker, C., & Riegelman, A. (2021). Is the open access citation advantage real? A systematic review of the citation of open access and subscription-based articles. PLOS ONE, 16(6), e0253129.

Lawson, S. (2015). Fee waivers for open access journals. Publications, 3(3), 155–167.

LeBlanc, V., & Cox, M. (2017). Interpretation of the point-biserial correlation coefficient in the context of a school examination. The Quantitative Methods for Psychology, 13, 46–56.

Lewis, C. L. (2018). The open access citation advantage: Does it exist and what does it mean for libraries? Information Technology and Libraries, 37(3), 50–65.

Liu, W., & Li, Y. (2018). Open access publications in sciences and social sciences: A comparative analysis. Learned Publishing, 31(2), 107–119.

Matthias, L., Jahn, N., & Laakso, M. (2019). The two-way street of open access journal publishing: Flip it and reverse it. Publications, 7(2), 23.

McKiernan, E. C., Bourne, P. E., Brown, C. T., Buck, S., Kenall, A., … Yarkoni, T. (2016). Point of view: How open science helps researchers succeed. eLife, 5, e16800.

Momeni, F., Mayr, P., & Dietze, S. (2022). Investigating the contribution of author- and publication-specific features to scholars' h-index prediction. arXiv:2207.09655.

Momeni, F., Mayr, P., Fraser, N., & Peters, I. (2021). What happens when a journal converts to open access? A bibliometric analysis. Scientometrics, 126, 9811–9827.

Munafò, M. R., Nosek, B. A., Bishop, D. V. M., Button, K. S., Chambers, C. D., … Ioannidis, J. P. A. (2017). A manifesto for reproducible science. Nature Human Behaviour, 1, 0021.

Olejniczak, A. J., & Wilson, M. J. (2020). Who's writing open access (OA) articles? Characteristics of OA authors at Ph.D.-granting institutions in the United States. Quantitative Science Studies, 1(4), 1429–1450.

Ottaviani, J. (2016). The post-embargo open access citation advantage: It exists (probably), it's modest (usually), and the rich get richer (of course). PLOS ONE, 11(8), e0159614.

Piwowar, H., Priem, J., Larivière, V., Alperin, J. P., Matthias, L., … Haustein, S. (2018). The state of OA: A large-scale analysis of the prevalence and impact of open access articles. PeerJ, 6, e4375.

Rimmert, C., Schwechheimer, H., & Winterhager, M. (2017). Disambiguation of author addresses in bibliometric databases. Technical Report. Bielefeld: Universität Bielefeld, Institute for Interdisciplinary Studies of Science.

Ross-Hellauer, T., Reichmann, S., Cole, N. L., Fessl, A., Klebel, T., & Pontika, N. (2021). Dynamics of cumulative advantage and threats to equity in open science: A scoping review. Royal Society Open Science, 9(1), 211032.

Rouhi, S., Beard, R., & Brundy, C. (2022). Left in the cold: The failure of APC waiver programs to provide author equity. Science Editor, 45(1), 5–13.

Roy, S. S., Chopra, R., Lee, K. C., Spampinato, C., & Mohammadi-Ivatlood, B. (2020). Random forest, gradient boosted machines and deep neural network for stock price forecasting: A comparative analysis on South Korean companies. International Journal of Ad Hoc and Ubiquitous Computing, 33(1), 62–71.

Samimi, A. J. (2011). Scientific output and GDP: Evidence from countries around the world. Journal of Education and Vocational Research, 2(2), 38–41.

Santamaría, L., & Mihaljević, H. (2018). Comparison and benchmark of name-to-gender inference services. PeerJ Computer Science, 4, e156.

Schroter, S., Tite, L., & Smith, R. (2005). Perceptions of open access publishing: Interviews with journal authors. British Medical Journal, 330(7494), 756.

Simard, M.-A., Ghiasi, G., Mongeon, P., & Larivière, V. (2021). Geographic differences in the uptake of open access. In 18th International Conference on Scientometrics and Informetrics (pp. 1033–1038).

Smith, A. C., Merz, L., Borden, J. B., Gulick, C. K., Kshirsagar, A. R., & Bruna, E. M. (2021). Assessing the effect of article processing charges on the geographic diversity of authors using Elsevier's "Mirror Journal" system. Quantitative Science Studies, 2(4), 1123–1143.

Sotudeh, H., Ghasempour, Z., & Yaghtin, M. (2015). The citation advantage of author-pays model: The case of Springer and Elsevier OA journals. Scientometrics, 104(2), 581–608.

Spelmen, V. S., & Porkodi, R. (2018). A review on handling imbalanced data. In 2018 International Conference on Current Trends Towards Converging Technologies (pp. 1–11).

Sullo, E. (2016). Open access papers have a greater citation advantage in the author-pays model compared to toll access papers in Springer and Elsevier open access journals. Evidence Based Library and Information Practice, 11(1), 60–62.

Wang, Z., Glänzel, W., & Chen, Y. (2020). The impact of preprints in library and information science: An analysis of citations, usage and social attention indicators. Scientometrics, 125(2), 1403–1423.

Xia, J. (2012). Positioning open access journals in a LIS journal ranking. College & Research Libraries, 73(2), 134–145.

Yamak, Z., Saunier, J., & Vercouter, L. (2016). Detection of multiple identity manipulation in collaborative projects. In Proceedings of the 25th International Conference Companion on World Wide Web (pp. 955–960).

Zhu, Y. (2017). Who support open access publishing? Gender, discipline, seniority and other factors associated with academics' OA practice. Scientometrics, 111(2), 557–579.

Author notes

Handling Editor: Ludo Waltman

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. For a full description of the license, please visit https://creativecommons.org/licenses/by/4.0/legalcode.