An analysis of form and function of a research article between and within publishers and journals

Abstract The identification and subsequent analysis of research articles for machine learning and natural language processing is a complicated task given the lack of consistent article organization principles and heading naming conventions across publishers and journals. Given this, an understanding of how research articles organizationally follow a common function and their use of various heading terms, or forms, is a critical step in applying machine learning techniques for data and information mining across a corpus of articles. To address this need, the authors developed and implemented an article heading form and function analysis across 12 publishers including both research articles and nonresearch articles. Our aim was to (a) identify each of the labeled sections used by research articles, define these sections based on their rhetorical function, and determine frequency of use; (b) within the given data set, determine all of the alternative labels used to identify these sections; and (c) determine whether these sections can be used to consistently determine (1) whether an article is a true research article, or (2) whether an article is not a research article. The results indicated wide variability in the organization of research articles with 24 common sections, known by 186 different names both within and across publishing houses.


INTRODUCTION
From biology to architecture to writing, a simple principle holds true: Form must align with function. In scientific writing, the empirical research article (RA) is the form used to communicate new, systematically tested ideas in a way that allows those ideas to be evaluated by other experts (Tanti, 2014). As a mechanism for scholarly communication, the form and structure of the RA has been the subject of much research and analysis, specifically in the areas of genre classification, or successfully differentiating texts that belong to different genres, and communicative moves, or the mapping of rhetorical structure within an article section (Swales, 1981).
There is also a growing body of research on the structure of an RA, specifically around article headings. Thelwall (2019) leveraged a large corpus of full-text articles from PubMed Central to compare the structure of RA headings across domains. Leveraging the high-quality metadata within PubMed Central, Thelwall concluded that there was very little consistency in the structure of a research article both within and across the many scientific disciplines. Thelwall's work built on the research of Teufel (1999), who determined that scientific texts could be arranged in key argumentative zones based on section function and expected moves. Additional research has focused on mapping the research article structure and form of specific disciplines and subdisciplines. For example, Kanoksilapatham (2015) found variations and unique characteristics for each engineering subdiscipline RA structure. Similarly, Tessuto (2015) analyzed the form and moves of empirical law RAs, finding that the typical Introduction, Methods, Results, and Discussion (IMRD) framework for organizing an RA was not found in contemporary outputs published by the law discipline.
What hasn't been widely studied within this area of critical analysis is the form and function of the research article between and within journals and publishers. While some publishers, journals, and scholarly societies have expectations and guidelines for reporting research or a framework for RA structure (ICMJE, 2004; Journal of Environmental Quality, 2021), no known analysis has completed a comparison across journals within a group of publishers. To conduct this research, the aims are to:
1. Identify each of the labeled sections used by RAs, define these sections based on their rhetorical purpose, and determine frequency of use.
2. Within the given data set, determine all of the alternative labels used to identify these sections.
3. Determine whether these sections can be used to consistently determine (a) whether an article is an RA, or (b) whether an article is not an RA.
An article "section" is defined as any segment of text that has been set apart from the main text with a label. A "subsection" is any segment of text that falls within a section and has been set apart from that section with its own label. A "label" is a word or phrase used to identify and describe a section or subsection.
Given that this research is bound by publisher/journal rather than by academic discipline, no large-scale, openly licensed (e.g., CC-BY) full-text corpus such as PubMed Central is available from which to identify and parse RAs and their structured metadata. The aims therefore require a process for doing this first. The first aim seeks to catalogue and define the current RA structure, identifying rhetorical elements and cataloguing how frequently they are used to determine which components are obligatory in an RA and which are optional. By defining an RA and its specific elements, it is possible to determine how to differentiate RAs from other similar genres.
The second aim seeks to document variation in how sections are labeled, thus identifying the particular problems that machine programmers face when identifying RAs and locating information within RAs.
Finally, our third aim seeks to compare sections and labels used in RAs across articles within specific publishers and journals to those used by review articles, meta-analyses, and case studies to determine if there are already straightforward ways to differentiate these genres based on article structure.

Literature Review
In the last two decades, a considerable body of work has emerged to document the rhetorical components of RAs. This research has documented the macrosections of the RA, the various orders in which those sections can appear, and the specific rhetorical functions of those sections. Together, these existing studies have built a rich framework for understanding the structure and definition of RAs. Yet the dichotomous focus on either macrostructure or the rhetorical components of a particular section has left midlevel structure relatively overlooked. Given that RAs and non-RAs often contain the same macrosections, RA subsections within publishers and specific journals deserve focus.
The current literature builds generally from Swales' (1981) seminal work on RA structure, in which he established move-step analysis. In the Swalesian tradition, a move can be defined as "a text segment that performs a communicative function, contributing to the global function of a whole text. Moves can vary in length … and can be recognized by a set of linguistic features" (Kanoksilapatham, 2015). In turn, moves are broken down into steps, essentially rhetorical subunits that work together to accomplish a move. For example, Dobakhti (2013) identifies one RA move, "Commenting on Findings," as composed of three steps: explaining, interpreting, and evaluating.
Authors use these moves to identify and define the macrosections of RAs. For instance, Lin and Evans (2012) use move-step analysis to question the breakdown of RAs into only the four main sections of IMRD. They argue that the Literature Review and Conclusion sections contain moves not accounted for in prior move-step analyses of I, M, R, and D and should, therefore, be their own sections. Lin and Evans (2012) also posit that, when the Results and Discussion sections are combined into a single section, that section contains a different set of moves than those in separate R and D sections. Thus, according to Lin and Evans (2012), separate Results and Discussion sections are definitionally distinct from a combined Results and Discussion section. Authors continue to disagree on how these sections should be categorized, and many authors continue to lean on the IMRD breakdown despite Lin and Evans' (2012) analysis. For instance, Li and Ge (2009) use Nwogu's (1997) set of moves and IMRD framework to conduct their analyses of articles published in 1985 and 2004. Kanoksilapatham (2015) also focuses on the I, M, R, and D sections when analyzing how moves vary across engineering papers, and Hsieh, Tsai, Lin, Luoi, and Kuo (2006) treat the Conclusion as a variation of the Discussion section.
At an even more fundamental level, researchers have used linguistic patterns to identify and define the moves themselves. For example, de Waard and Pander Maat (2012) and Dahl (2009) use verb tense patterns to define the rhetorical purpose of particular sentences or paragraphs. By contrast, Kashiha (2015) identifies phrases or "lexical bundles" that are used to introduce particular rhetorical components. These lexical bundles contain clues about the function of the subsequent text, allowing Kashiha (2015) to define that text's purpose and categorize it as a move.

Data Collection
The goal of the data collection process was to record all of the labeled sections and subsections within RAs, meta-analyses, case studies, and review articles, as well as all of the different labels used to identify those sections. Before reviewing actual articles, one author (LH) developed a preliminary list of sections and labels based on previous work annotating thousands of RAs both by hand and with the use of Prodigy (2017). Prodigy is an annotation tool from the creators of spaCy (2017) that produces training and evaluation data to develop machine learning models.
Using this list of preliminary sections, JM conducted the initial textual analysis, manually scanning each article and recording when one of the anticipated sections was present. When she encountered a section with a different rhetorical purpose from all previously recorded sections, JM created a new column in the spreadsheet and began recording that section for all subsequent papers. When the found section label was a verbatim match for the label in the spreadsheet, JM simply tallied the result. If JM encountered a section that she suspected was an alternate label for one of the pre-established sections, she recorded the verbatim wording in the spreadsheet under that section's column. After scanning and recording all sections in a given article, JM then re-examined the article to annotate which sections were main sections and which were subsections. Sections that were demarcated with an individualized label (i.e., the label was specific to the topic of the paper) were not included in the tally. To review the full data collection workflow, see Figure 1.
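JM's manual tallying rules can be sketched as a small program. The following is a hypothetical Python model of the spreadsheet workflow described above, not the tooling actually used; the section names and data structures are illustrative assumptions.

```python
# Hypothetical sketch of the manual tallying workflow: each known section
# acts as a spreadsheet column holding a tally and any alternate labels seen.
known_sections = {
    "Abstract": {"tally": 0, "alternates": []},
    "Methods": {"tally": 0, "alternates": []},
}

def record_label(found_label, matched_section):
    """Tally a found section label, mimicking the spreadsheet rules.

    matched_section is the annotator's judgment: the pre-established
    section this label serves, or None if its rhetorical purpose is new.
    """
    if matched_section is None:
        # New rhetorical purpose: start a new "column" for subsequent papers.
        known_sections[found_label] = {"tally": 1, "alternates": []}
    elif found_label == matched_section:
        # Verbatim match for the label in the spreadsheet: simply tally.
        known_sections[matched_section]["tally"] += 1
    else:
        # Suspected alternate label: tally and record the verbatim wording.
        known_sections[matched_section]["tally"] += 1
        known_sections[matched_section]["alternates"].append(found_label)

record_label("Abstract", "Abstract")               # verbatim match
record_label("Summary", "Abstract")                # alternate label
record_label("Data Availability Statement", None)  # new section type
```

Individualized labels (those specific to a paper's topic) would simply be skipped rather than passed to `record_label`, matching the exclusion rule described above.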

Qualitative Data Analysis: Delineating Sections and Crafting Definitions
After the initial data collection, a multistep process was developed to delineate our final list of sections and to craft definitions for these sections (see Figure 2). First, the research team reviewed all the section labels and noted any suspected overlap in rhetorical purpose among sections with different names. For each overlapped grouping, it was first determined whether more than one of the sections ever appeared in a single article. If so, these sections were kept separate because they served different purposes. If not, the article texts were reviewed to determine the overall purpose of each section; any notable differences in specific language among these sections; and words or phrases held in common by these sections. If both the purpose and the language aligned, these sections were provisionally combined. In the final step, journal author guidelines, RA writing guides, and previous move-step analyses that discussed these specific sections were reviewed. These texts were used to refine the purpose assigned to each section and to identify any stated differences in purpose for grouped sections.
An inverse process was used to determine when the same label was used to refer to more than one type of section (see Figure 2). Annotators recorded when they suspected that one verbatim label was used to refer to sections with different purposes. They additionally identified the pre-established section with which each verbatim label most aligned. The articles' texts were then reviewed to verify that sections with the same verbatim label did, in fact, have divergent purposes, and determine if these labels could be added to an overlapping group, as described in the previous paragraph.
With the finalized list of sections, each section was assigned a title based on the labels most commonly used by authors. SN then crafted definitions in a three-step process (see Figure 3). She began by writing provisional definitions based on the purposes that were assigned to each section. She then reviewed author guidelines, RA writing guides, and move-step analyses to create a list of common features and purposes for each of the sections. Finally, she used our provisional definitions and the list of common features and purposes to create definitions that captured the scientific community's collective understanding for each of the sections.
The sections identified typically appear in a standard order. Based on the collected data and observations, section definitions, American Association for the Advancement of Science (AAAS) (n.d.) guidelines, and the generally accepted IMRD format for academic articles, a model for an RA was developed that follows established practices. This model is presented in Figure 4.

Data Validation
Once the list of section labels and their definitions was finalized, the research team conducted a set of validation checks. SN performed these checks, scanning each of the RAs, case studies, meta-analyses, and review articles. The checks consisted of two parts: ensuring that no sections had been missed or had been mistakenly tallied when actually absent, and ensuring that suspected alternate labels were correctly categorized. The second part required that SN skim all sections with alternate labels to ascertain their purpose and check that the purpose matched the definitions. In cases when the appropriate categorization was not clear, the researcher presented the found text to the larger group for final deliberation.

Statistical Analysis
To analyze differences in section frequency between RAs and nonresearch articles, a two-proportion z test with a 95% confidence interval (CI) was conducted. This statistical proportion test allowed us to test our hypothesis and is appropriate because the data are approximately normally distributed, of suitable size, and independent (McCullagh & Nelder, 1989). Tests were run with R statistical software version 3.5.1 (2018-07-02; R Core Team, 2014) and the Mosaic package, version 1.5.0 (Pruim, Kaplan, & Horton, 2017).
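For readers working outside R, the same test is straightforward to reproduce. The sketch below implements the standard pooled two-proportion z test from first principles in Python; the counts are made-up placeholders for illustration, not values from this study.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided two-proportion z test using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    # Pool the successes across both samples under the null hypothesis.
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Placeholder counts: a section seen in 32 of 250 RAs vs. 3 of 30 non-RAs.
z, p = two_proportion_z_test(32, 250, 3, 30)
```

The normality assumption cited above corresponds to the usual rule of thumb that each cell count (successes and failures in both groups) is large enough for the normal approximation to hold.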

RESULTS
Objectives 1 and 2: Section Labels, Definitions, and Article Structure

The first objective of this study was to identify each of the labeled sections used by RAs, define these sections based on their rhetorical purpose, and determine the general structure of an RA based on the frequency with which these sections are used. Our second objective was to identify all of the alternative labels used to identify these sections.
To achieve these objectives, 250 RAs and 30 non-RAs (ten case studies, ten review articles, and ten meta-analyses) were analyzed. Within the RAs, 31 different section types, known collectively by 302 different labels, were identified (see Tables 1 and 2). Twenty-four sections that are theoretically applicable to every RA, known by 186 different names, were found (see Table 1). The additional seven sections, all subsections of the Methods section, are relevant only to certain types of research and were known by 116 different labels (see Table 2).

Some of the sections in Table 1 can appear as either a main section or as a subsection of another section. For example, Statistical Analysis almost always appears as a subsection of Methods or Results.
The research team also found that some specific labels were used to refer to more than one type of section. For example, "Summary" may refer to either the Abstract or Conclusion. Additionally, "Replication of Results," "Availability of Data and Materials," and "Open Practices Statement" may refer to Code Availability Statement, Data Availability Statement, or both combined. "Instruments" could also be used to refer either to the Instruments section or to the Materials section.
Many sections tend to be combined under one label, the most obvious examples being "Materials and Methods" and "Code Availability Statement and Data Availability Statement." Combination is especially frequent among the subsections in the Methods section. For instance, Study Subjects and Selection Criteria are often combined, as are Study Design and Procedures. For the purposes of this study, the project team counted combined labels toward each of the individual sections. For example, if an article had a "Study Design and Procedures" section, a tally for both Study Design and Procedures was made, but it also was recorded that it was a combined label. Common examples of combined labels are listed in Table 3.
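The counting rule for combined labels can be expressed compactly in code. The sketch below is a simplified, assumed implementation (splitting only on "and", "&", and "/", which would miss other conjunctions), not the team's actual procedure.

```python
import re

def tally_combined_label(label, tallies, combined_log):
    """Credit each individual section named in a possibly combined label."""
    # Split on "and", "&", or "/" between section names, e.g.
    # "Study Design and Procedures" -> ["Study Design", "Procedures"].
    parts = [p.strip()
             for p in re.split(r"\s+and\s+|\s*&\s*|\s*/\s*", label)
             if p.strip()]
    for part in parts:
        # One tally for each individual section named in the label.
        tallies[part] = tallies.get(part, 0) + 1
    if len(parts) > 1:
        # Also record that this occurrence used a combined label.
        combined_log.append(label)

tallies, combined = {}, []
tally_combined_label("Study Design and Procedures", tallies, combined)
tally_combined_label("Results", tallies, combined)
```

Note that a naive split like this would also break apart labels where "and" is part of a single section's name, which is one reason the tallying in this study was done by hand.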
A second type of combination, which is much more difficult to detect, occurs when authors combine sections but only list one of the labels rather than both. Interestingly, this type of combination is most prevalent with the Conclusion, Discussion, and Results sections. For example, regarding the Results, Discussion, and Conclusion sections, PLOS ONE (n.d.) states, "These sections may all be separate, or may be combined to create a mixed Results/Discussion section (commonly labeled 'Results and Discussion') or a mixed Discussion/Conclusions section (commonly labeled 'Discussion'). These sections may be further divided into subsections, each with a concise subheading, as appropriate." In these instances, the combined section includes the rhetorical components of each of the individual sections but is listed under only one label.

Objectives
The objectives state the goal of the research or the question that the research will answer, often accompanied by a brief description of how the researcher will achieve those aims. The objectives should clearly relate to the gap in current research that the author establishes in the introduction.

Methods
The methods section clearly describes the specific design of the study and provides a description of the procedures that were performed, giving enough detail that the reader can assess the credibility of the results. A methods section typically contains:
• The population and equipment used in the study;
• How the population and equipment were prepared and what was done during the study;
• The protocol used;
• The outcomes and how they were measured;
• The methods used for data analysis;
• Inclusion and exclusion criteria; and
• An ethical approval statement.

Results
The results section presents a study's findings and observations, ideally as neutrally as possible, without bias or interpretation.

Discussion Discussion of Results; General Discussion; Concluding Discussion
The discussion section puts the results into broader context and establishes their significance in relation to the stated objective, discussing new insights that resulted from the study and explaining any differences or similarities to other published evidence about the topic.

Conclusion* Summary; Summary and Conclusions; Concluding Remarks
The conclusion is a section that discusses the main takeaways from the study and sometimes implications for changes in research practice or future research opportunities.

Limitations* Strengths and Limitations; Limitations and Directions for Future Research; Assumptions and Limitations
The limitations section discusses important weaknesses in the design or scope of the study in a way that highlights their consequences for the interpretation of the results. This section places the study in context and addresses the generalizability, applications to practice, and utility of the study's findings. Often, the limitations section discusses opportunities for future research based on the identified weaknesses.

Acknowledgments Acknowledgment
The acknowledgment section primarily serves to recognize important individuals who made the work possible. This section can also include information about funding, affiliated institutions, associated fellowships, and other miscellaneous information.
Funding Statement* Funding Sources/Information; Sources of Funding/Support; Financial Disclosure Statement; Financial Support; Research Funding
The funding statement indicates whether or not the authors received funding for their research, describes the role of each funder, and often provides specific grant information and grant numbers.

Author Contribution Statement
An author contribution statement details each author's role in developing and publishing the manuscript, often following the CRediT format (Brand, Allen et al., 2015), which provides standardized phrases to describe different ways that an author may have assisted with the project.

Quantitative Science Studies
Conflict(s) of Interest* Competing Interests (Statement); Competing Interests Declaration; Disclosure (Statement); Declaration of (Competing) Interest(s); Disclosure of Relationships & Activities
A conflict of interest statement acknowledges any financial, legal, commercial, or professional relationships that the researcher or the researcher's employer has with another organization or person that could influence the author's research.

Corresponding Author Correspondence; Author Contact Information
The corresponding author section provides contact information for one or more of the authors.

Publication Details Article Details
The publication details section lists publication information about the article, including the dates when it was received, accepted, and published, the number of pages, the ISBN or ISSN, and the number of tables and figures.

References Literature Cited
The references section provides, in a standardized format, publication information about all outside sources of information used to inform the research, giving enough detail that readers can ascertain the genuineness and reliability of the sources.
* This section can be used either as a main section or a subsection.

Study Subjects
The study subjects section is a description of the people or animals who were studied, including the number of subjects, relevant demographic and health information, and relevant differences among groups of subjects. This section is frequently combined with the "Selection Criteria" section (see below), but they are sometimes presented separately.

Selection Criteria
The selection criteria section describes:
• The eligibility conditions that study subjects or study units had to meet to be included in the study;
• Any specific exclusion criteria; and
• How the sample size was achieved.
In studies in which participants were recruited, this section also describes recruitment and selection methods.

Measures
The measures section describes the framework and specific methods used to collect and assess data, including a description of the reliability of those methods. In qualitative studies, the measures section often describes a survey or interview instrument. For quantitative studies, the measures section may describe specific equations, models, scales, tests, or surveys used to assess data.

Procedure
The procedure section explains how the measures were applied to collect data and describes any processes to promote data quality.

Another notable phenomenon was the tendency to use individualized labels, or labels that were specific to the article's topic. These individualized labels were particularly common for the Background section. For instance, one article (Vlachantoni, 2019) labeled its Background, "Conceptualising need for social care," and another (Brooks, Tejedo, & O'Neill, 2019) labeled its Background, "General characteristics of Antarctic soils." Although the project team encountered significant variation among RAs, particularly in the specific labels used to demarcate sections, it was able to use the patterns in the data to create a model of a sample RA, shown in Figure 4. This model represents the typical order and organization of the various RA sections. Importantly, some articles strayed notably from the model. In particular, RAs that focused on a proof of concept tended to have few labeled sections, instead presenting models, equations, and results in a thematic order. Physics and mathematics articles tended to follow this thematic pattern.

Objective 3: Identifying Research Articles
Our third objective was to determine whether the sections and labels can be used to consistently determine whether an article is an RA or whether an article is not an RA. Achieving this objective required determining the frequency of use of each of the sections in RAs and comparing those results to the frequencies in nonresearch articles. It also required the identification of any sections that were unique to either RAs or nonresearch articles.
The research team found that certain sections were nearly universal across RAs, such as Abstract (99.6%), Introduction (89.2%), Methods (97.2%), Results (98%), and Discussion (92%). The only truly universal section across all RAs was the References section (100%). Other sections were unique to a particular publisher or were infrequently used, such as Publication Details (9.16%), which was only encountered with one publisher, and Background (12.8%). For a full list of results on section frequency, see Table 4.
Although significant differences in section frequency between RAs and meta-analyses, case studies, and review articles were found, each major section in RAs could also be found in nonresearch articles. As a result, the presence of one of these sections cannot easily be used to distinguish RAs from other journal article types.
There were specific sections that were found only in nonresearch articles, specifically "Case Study," and "Meta-analysis." Though these sections were not in every case study or meta-analysis, every paper that included either of these could quickly be identified as either a case study or meta-analysis. Some nonresearch articles use different labels to refer to a given section. For instance, many review articles referred to the Selection Criteria section as "Search Strategy," "Search Methods," or "Study Selection." These specific terms were not used in RAs and could therefore be used to determine if a journal article is not an RA.
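These marker labels lend themselves to a simple rule-based filter. The sketch below is a hypothetical illustration using only the labels reported above; a production classifier would need the full list of variants observed in practice.

```python
# Labels found only in nonresearch articles, mapped to the genre they signal.
# Drawn from the findings above; the mapping itself is illustrative.
NON_RA_MARKERS = {
    "case study": "case study",
    "meta-analysis": "meta-analysis",
    "search strategy": "review article",
    "search methods": "review article",
    "study selection": "review article",
}

def flag_non_ra(section_labels):
    """Return the suspected nonresearch genre, or None if no marker is found."""
    for label in section_labels:
        genre = NON_RA_MARKERS.get(label.strip().lower())
        if genre:
            return genre
    return None

flag_non_ra(["Abstract", "Search Strategy", "Results"])  # -> "review article"
flag_non_ra(["Abstract", "Methods", "Results"])          # -> None
```

As noted above, the absence of a marker does not imply that an article is an RA; these labels did not appear in every case study or meta-analysis, so the filter can only rule articles out, not rule them in.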

DISCUSSION
The objectives of this study were to create a normalized set of journal sections and labels, determine structural differences between RAs and nonresearch articles, and determine whether sections can be used to identify an article as a traditional RA within a constrained corpus of articles from a set of specific publishers and journals. The results of our inquiry have significant implications for the initial issues that we set out to solve: difficulty identifying journal articles as RAs and difficulty querying RAs to locate particular types of information across a corpus of journals and publishers.
The different forms for RAs between publishers and among journals add a further dimension to the work begun by Thelwall (2019). Not only did these results uphold that previous work, but they also showed the differentiation among journals and publishers. One might expect a specific publisher to apply similar standards and requirements for how RAs are formatted, yet this was not found to be true. Taking this a step further, one might expect consistency in how RAs are structured within a specific journal, yet this too was found to be inconsistent.
With regard to identifying RAs, it was found that RAs and similar genres, such as meta-analyses, review articles, and case studies, cannot be easily distinguished based on the major RA sections alone: A, I, B, M, R, D, or C. Furthermore, these article types tend to include similar rhetorical components or moves. For instance, Kanoksilapatham (2015) identifies the Results moves of an RA as "summarizing procedures," "reporting results," and "commenting on results," moves that are common to all article types. Instead, differentiation may occur at the step level. For instance, one Introduction move is to "[establish] a territory to provide background information of the research topic," which may be present in all article types, but a review article may not include the typical step of "claiming centrality." There is thus more promise in differentiating journal article types by their subsections rather than by their main sections.
Another potential way to differentiate journal article types is by the specific labels used to identify a given section. For instance, both review articles and RAs frequently include Selection Criteria sections, defined as a section that "describes: 1) The eligibility conditions that study subjects or study units had to meet in order to be included in the study; 2) Any specific exclusion criteria; and 3) How the sample size was achieved." In studies in which participants were recruited, this section also describes recruitment and selection methods. Currently, there are far too many variations on these labels for them to be a practical way to identify RAs or non-RAs, but if these labels were standardized by genre, it would be much simpler for machines to accurately sort journal articles by genre, or for an algorithm to identify genre from a combination of these labels.
In terms of querying articles, the extensive variation in section labels is a significant barrier to comprehension for both human and nonhuman readers. In particular, the subsections of the Methods section could be very convoluted, especially because different authors used the same labels to refer to different types of information. For example, "Research Approach" was used to refer to a study's data collection methods in one article but to its study design in a different article. Similarly, "Conflict of Interest" and "Ethics Statement" were often used interchangeably, but "Ethics Statement" also often referred to approval from an Institutional Review Board. This type of inconsistency makes comprehension a challenge for human readers as well as machine-based readers. And, for machine readers in particular, the sheer number of possible labels and authors' tendency to use individualized labels makes it almost impossible to identify sections based on their labels. This reality is a problem particularly for researchers who wish to conduct section-based analyses.
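One practical mitigation for section-based analyses is a normalization table that maps verbatim headings to canonical section names before querying. The mapping below is a small hypothetical excerpt, with variants assumed from those reported in this study.

```python
# Hypothetical excerpt of a label-normalization table for section-based mining.
CANONICAL = {
    "literature cited": "References",
    "summary and conclusions": "Conclusion",
    "concluding remarks": "Conclusion",
    "competing interests": "Conflict of Interest",
    "materials and methods": "Methods",
}

def normalize(label):
    """Map a verbatim heading to its canonical section name when known."""
    key = label.strip().lower().rstrip(".:")  # tolerate trailing punctuation
    return CANONICAL.get(key, label)  # fall back to the original label

normalize("Literature Cited")    # -> "References"
normalize("Concluding Remarks")  # -> "Conclusion"
```

Genuinely ambiguous labels such as "Summary," which this study found referring to either the Abstract or the Conclusion, cannot be resolved by a lookup table alone and would still require inspecting the section's content.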
It is important to note again the tension between clarity and accuracy. By nature, RAs present new and often complicated information, and standardized section labels could be an important way for authors to signal what type of information they will be presenting. Moreover, much of the variation did not serve any significant rhetorical purpose and would not, therefore, decrease accuracy. For instance, the difference between "Materials and Methods" and "Methods and Apparatus" is trivial. Unnecessary variations such as these, which can significantly impede machine-based analysis, should be minimized through a normalized set of labels and definitions agreed upon by the scientific community. Such an intervention would not only facilitate machine-based analysis but would also ease other researchers' ability to replicate and understand a study's findings and processes. Researchers could begin the process of standardization by reviewing the author guidelines provided through the Equator Network (2006), which outline best practices for RA sections but fall short of suggesting label names or precise definitions.

LIMITATIONS
There are a few notable limitations of our study. First, although we looked across the journals of 10 major publishers, an expanded list of publishers could have significantly altered our results. In our analyses, we found distinct patterns within both journals and publishers, so a different set of publications could have yielded very different results. Another limitation of this study is that we only harvested open access articles. A different set of researchers may be drawn to open access publications, which could affect how those articles are structured. Perhaps the most significant limitation of this study was that we had a limited sample of meta-analyses, review articles, and case studies and that we grouped these types of articles together despite their differences. A more nuanced analysis could compare the sections in RAs to just one of these other article types to see if there are more consistent ways to differentiate RAs from specific subtypes. In this article, we grouped these subtypes together because we hoped to find an easy way to differentiate RAs from all other journal article types, but we did not find a way to do so. Individual analyses may therefore have been more appropriate. Finally, there was unavoidable subjectivity built into our study. Particularly in the methods section, where subsections often had convoluted purposes, the research team had to make decisions about the overall rhetorical purpose of each section and make the appropriate classification. We also had to decide which labels to choose as the official labels for the various sections.