A typology of scientific breakthroughs

Scientific breakthroughs are commonly understood as discoveries that transform the knowledge frontier and have a major impact on science, technology, and society. Prior literature studying breakthroughs generally treats them as a homogeneous group in attempts to identify supportive conditions for their occurrence. In this paper, we argue that there are different types of scientific breakthroughs, which differ in their disciplinary occurrence and are associated with different considerations of use and citation impact patterns. We develop a typology of scientific breakthroughs based on three binary dimensions of scientific discoveries and use this typology to qualitatively analyze the content of 335 scientific articles that report on breakthroughs. For each dimension, we test associations with scientific disciplines, reported use considerations, and scientific impact. We find that most scientific breakthroughs are driven by a question and are in line with the literature, and that paradigm-shifting discoveries are rare. Regarding the scientific impact of breakthroughs as measured by citations, we find that articles that answer an unanswered question receive more citations than articles that were not motivated by an unanswered question. We conclude that earlier research in which breakthroughs were operationalized as highly cited scientific articles may thus be biased against the latter.


INTRODUCTION
The evolution of scientific knowledge is commonly understood as an alternating process between short periods of breakthrough discoveries and long periods in which these breakthroughs are further refined and elaborated (Kuhn, 1962; Toulmin, 1967). Periods of breakthrough discoveries transform the knowledge frontier, while periods of refinement and elaboration allow for the major scientific, technological, and societal contributions of those breakthroughs to materialize (Evans, 2016; Hilgard & Jamieson, 2017; Winnink & Tijssen, 2014). However, while the periods of refinement and elaboration are well understood (e.g., Boyd & Richerson, 1985; Cavalli-Sforza & Feldman, 1981; Nelson & Winter, 1982), the characteristics of discoveries that transform the knowledge frontier remain unclear (Marx & Bornmann, 2013).
In recent years, several authors have suggested supportive conditions for scientific breakthroughs, such as cognitively diverse teams (Hage & Mote, 2010; Hinrichs, Seager, et al., 2017; Wu, Wang, & Evans, 2019), combinations of highly conventional and highly novel knowledge (Mukherjee, Romero, et al., 2017; Schilling & Green, 2011) and psychological characteristics of scientists, such as stubbornness and tenacity (Grumet, 2008). Generally, these studies understand scientific breakthroughs as being codified in highly cited publications, typically operationalized as the top 5%, 1%, or 0.1% highly cited articles in a field (Ponomarev, Williams, et al., 2014; Uzzi, Mukherjee, et al., 2013; Zeng et al., 2017). These studies thus treat scientific breakthroughs as a group characterized by high impact and implicitly assume that all scientific breakthroughs are highly cited articles, and conversely that all highly cited articles are scientific breakthroughs. A systematic effort to identify different types of breakthroughs has so far not been made. Such an effort would be useful, however, as breakthrough types may occur under different circumstances and differ in their citation impact.
Below, we develop a typology of scientific breakthroughs, and examine differences between kinds of breakthroughs in terms of their disciplinary occurrence, considerations of use and citation impact. We make use of the Charge-Chance-Challenge (Cha-Cha-Cha) theory of scientific discovery as described by Koshland (2007) to develop a typology of scientific breakthroughs. Rather than understanding scientific breakthroughs as either Charge, Chance or Challenge type, as Koshland does, we propose three discovery dimensions that underlie those three types to provide a better understanding of the varieties of scientific breakthroughs. Using the three dimensions, we test to what extent configurations of these three dimensions are observable in the scientific literature by qualitatively coding the full text of 335 articles that, according to experts, report on scientific breakthroughs. We then use these coded articles to explore how the different characteristics are distributed over scientific disciplines and vary in their considerations of use and citation impact. We are particularly interested in those aspects, because Koshland, in his paper, provides examples of the different types of discoveries that come primarily from physical sciences and life sciences. Furthermore, as it is known that scientific breakthroughs have transformational potential both within and beyond science (Winnink & Tijssen, 2014), we explore how the different typological dimensions relate to furthering fundamental understanding and considerations of use (Stokes, 1997). Finally, we model the effect of breakthrough characteristics on cumulative citation impact over 10 years by means of a set of regression models. We find that breakthroughs vary widely in their citation impact, and that there are telling differences in impact between different types of breakthroughs.

THE CHA-CHA-CHA THEORY OF SCIENTIFIC DISCOVERIES
Koshland's Cha-Cha-Cha theory of scientific discoveries was developed to aid in understanding the heterogeneous nature of major scientific advances, and to improve our understanding of the conditions under which scientific breakthroughs occur (Koshland, 2007). The theory is developed from the perspective that different field conditions lead up to different types of scientific discoveries. These field conditions relate to the perceived state of knowledge in a scientific field, which offers opportunities for scientists to make relevant scientific contributions (discoveries). For example, a scientific discovery may provide an answer to a long-standing question in a field. Alternatively, a breakthrough may be a serendipitous encounter with an important new piece of evidence, which may fit or question the existing theory or observations in a field. Scientists may also recognize a set of inconsistencies in the state-of-the-art literature in a field, which they aim to resolve. Koshland's Cha's summarize these different kinds of scientific discoveries in three types: Charge, Chance, and Challenge.

Charge, Chance and Challenge Type Discoveries
Koshland defines Charge type discoveries as discoveries that "solve problems that are quite obvious … but in which the way to solve the problem is not so clear" (p. 761). In other words, Charge type discoveries resolve "known unknowns" (Logan, 2009). Koshland uses Isaac Newton's discovery of gravity as an example of a Charge type discovery, because "the movement of stars in the sky and the fall of an apple from a tree were apparent to everyone, but Isaac Newton came up with the concept of gravity to explain it all" (p. 761). A recent example of a Charge type scientific breakthrough is "cloaking technology" (Leonhardt, 2006), an invisibility device that has been a longstanding dream of many scientists. While it had been proven that perfect invisibility is impossible due to the wave nature of light, there was reason to believe that "perfect invisibility within the accuracy of geometrical optics" was achievable (p. 1777). Leonhardt (2006) reports the formulation of a "general recipe" for the design of media that can achieve such invisibility with possibilities for practical demonstrations. This breakthrough has thus, at least in theory, solved a well-known problem in a way that had not been thought of before.
Koshland defines Chance type discoveries as "instances of a chance event that the ready mind recognizes as important and then explains to other scientists" (p. 761). For a Chance type discovery, the original contribution lies in recognizing the importance of an unexpected encounter or explaining the importance to other members of the scientific community. These encounters typically involve some kind of serendipity (Copeland, 2019; Koshland, 2007; Yaqub, 2018). Encounters may take the shape of accidental discoveries of fossils, ancient remains, and other natural phenomena, but also of unexpected outcomes of planned experiments, such as Alexander Fleming's discovery of penicillin (Koshland, 2007). A more recent example of a Chance type discovery is reported in an article by Palmer, Barthelmy, et al. (2005), who report on neutron star SGR 1806-20 emitting a giant gamma-ray flare on 27 December 2004. Recognizing the importance of this event was crucial: The authors note that this flare was about a hundred times higher than the two giant flares observed from this neutron star earlier, whereas the energy of giant flares is usually a thousand times higher than that of a typical burst. Because of that difference, the authors further note that under different circumstances, this burst could have been interpreted as another type of burst. Instead, they suggest that the observed flare is of a newly discovered subclass.
As a third type of scientific breakthrough, Koshland defines Challenge type discoveries as "a response to an accumulation of facts or concepts that are unexplained by or incongruous with scientific theories of the time" (p. 761). Koshland provides Einstein's theory of special relativity as an example of a Challenge type discovery, as it provided a theory that explained anomalies in contemporary theories. Another example presented itself with the report on a draft sequence of the Neandertal genome (Green, Krause, et al., 2010). The authors emphasize in the introduction of the article that "substantial controversy surrounds the question of whether Neandertals interbred with anatomically modern humans" (p. 710). The challenged model was based on the idea that modern humans, after leaving Africa, completely replaced Neandertals without interbreeding. This theory was supported by evidence on morphological features and DNA of modern humans, although the evidence was considered to be inconclusive. The draft sequence of the Neandertal genome suggests that Europeans and Asians, but not Africans, have inherited genes from Neandertals, a finding that does not fit with the model. Instead, the authors put forward an alternative theory: Neandertals interbred with modern humans after they left Africa, but before they spread into Europe and Asia.
While Koshland's categorization is intuitive, it has thus far not been used to systematically map scientific discoveries. One obstacle to using Koshland's theory to classify scientific breakthroughs is that Koshland does not specify whether we should understand the discovery types as mutually exclusive (i.e., Chance, Charge, or Challenge alone), or as combinations of types. For example, a discovery can fit the description of both Chance and Challenge. Consider an article published in 2000, reporting on the discovery of two early hominid skulls and tools at a site in the Republic of Georgia, which the authors interpret as evidence that "the initial hominid dispersal from Africa was driven not by technological innovation but more likely by biological and ecological parameters" (Gabunia, Vekua, et al., 2000, p. 1025). This discovery fits the definition of a Chance type discovery because it involves a chance encounter that scientists recognized as important, but it also fits the definition of a Challenge type discovery because the authors interpret evidence that is incongruent with scientific theories of the time, and propose an alternative theory. Moreover, Koshland also does not specify whether the three types are meant to be exhaustive. It might be that some discoveries do not fit with the definition of any of Chance, Charge, or Challenge.

From Discovery Types to Discovery Dimensions
As we aim to characterize and compare scientific breakthroughs, we allow for the possibility that Koshland's discovery types are neither exhaustive nor mutually exclusive. Rather, we assume that breakthroughs can be characterized on three binary discovery dimensions. For each of Koshland's discovery types, the state of two of the three binary dimensions is fixed, while the state of the third dimension may vary (see also Table 1). For both states of each dimension, we provide examples of relevant scientific articles in Table A1. We summarize the dimensions as follows.

The discovery is driven by a question, or by a research object
First, we distinguish between discoveries that are question driven and discoveries that are research object driven. Whereas in the case of question-driven discoveries the area of ignorance and the line of enquiry in the field is well established and widely shared ("we know what we do not know"), discoveries driven by a research object are inverse question driven: The discovery precedes the formulation of the question ("we do not know what we do not know") (Meyers, 2011). For example, archaeologists might discover ancient hominid remains in an unexpected location, which then raises questions about the distribution and social relation of hominids (Brunet, Guy, et al., 2002). The discovery of ancient remains thus drives the formulation of a question that was not asked before. Note that it may be the case that discoveries driven by a research object do actually provide answers to questions, but these questions were not the driver of the discovery.
Koshland's Charge type can be characterized as question driven, referring to discoveries that solve long-standing problems. In our earlier example, the discovery team was able to design a theoretical cloaking device in response to the ambition of engineering invisibility. Chance and Challenge type discoveries are both research object driven: Chance type discoveries start from an encounter with a research object that awaits interpretation, and Challenge type discoveries start from the recognition of a research object that does not fit with existing theories. In our example of a Chance type discovery, this was the observation of a giant gamma-ray burst, and in our Challenge example, this was genetic evidence that did not fit the existing model.

The discovery introduces a new question/research object, or contributes to a known question/research object
As a second dimension, we distinguish between new and known questions and research objects. We understand questions and research objects that are known as questions and research objects that are documented in the scientific literature, and of which the scientists who made the discovery were aware. Conversely, new questions and research objects are those that are introduced by the scientists who made the discovery and are, therefore, themselves part of the discovery. With regard to the scientific impact of a discovery, this is a relevant distinction, as it indicates whether the discovery team can be credited for introducing the new question or research object, or only for resolving or contextualizing it. Koshland described this in terms of "uncoverers," or scientific teams for whom uncovering the question or research object is (part of) their original contribution, and "discoverers," or scientific teams that contribute to a question or research object uncovered by others (p. 761).
Koshland's Chance type can clearly be characterized along this dimension. For Chance type discoveries, it is the "uncovering" that is critical, along with recognizing and interpreting the relevance of the uncovered research object. Without observing the giant gamma-ray burst, its discovery team would not have been able to report any discovery.
In the case of Charge or Challenge discoveries, the question or research object can be either new or known. For Charge type discoveries, which answer "obvious" questions, the question may be a long-standing one that many others have tried and failed to solve, such as the ambition of invisibility or the puzzle of gravity, but it may also be a question that the discovery team raised themselves as an extension of existing literature. For example, the authors of an article that reports on the derivation of germ cells from stem cells argued that "because embryoid bodies sustain blood development, we reasoned that they might also support primordial germ cell formation," thus raising the question of whether germ cells can indeed be made from such embryoid bodies (Geijsen, Horoschak, et al., 2004, p. 148). Similarly, Challenge type discoveries can be a response to an accumulation of facts that the discovery team uncovers themselves, or that were already known in the literature. Our example of a Challenge type discovery based on the draft sequence of the Neandertal genome includes both: It reports original evidence that counters the existing model, and describes pieces of evidence that were uncovered by others. Here, we are in agreement with Koshland (2007), who also argued that Challenge type discoveries may or may not be accompanied by uncovery. Following his wording, it is the discovery of a new explanation of facts that is critical for Challenge type discoveries, not the uncovering of the facts as such.

The question or research object is against or in line with state-of-the-art literature
Third, we distinguish between discoveries that go against state-of-the-art literature and discoveries that fit with or follow logically from existing literature. In other words, the discovery may have the potential to cause a paradigm shift, or it may fit within the current paradigm (Koshland, 2007; Kuhn, 1962). Koshland's Challenge type discoveries are driven by research objects that are incongruent with the current paradigm, and their interpretation thus calls for a paradigm shift. Challenge type discoveries can thus be characterized as "against state-of-the-art literature." The article on the Neandertal genome, for example, reported existing evidence incongruent with the current paradigm, uncovered additional evidence, and offered an alternative model. Charge type discoveries, on the other hand, answer questions that have been part of the existing literature or follow logically from it and therefore cannot go against state-of-the-art literature. The discovery of a theoretical cloaking device was, indeed, in line with earlier ideas on the feasibility of such a device. Chance type discoveries may or may not be in line with state-of-the-art literature, depending on the interpretation of their discoverers. The article by Palmer et al. (2005) on an observed gamma-ray flare is an example of the former: the flare was interpreted as an additional category of flares. This interpretation offered an extension of the current model and did not require a paradigm shift. The discovery of hominid skulls and tools in the Republic of Georgia (Gabunia et al., 2000), by contrast, is an example of both a Chance type discovery and a Challenge type discovery, as the evidence is seen as incongruent with scientific theories of the time.
In summary, we can define Koshland's types as configurations of three binary dimensions, as summarized in Table 1. Following the table, Charge type discoveries are driven by a question, be it a new or known question, and are in line with existing literature. Chance type discoveries are driven by a new research object and may be in line with or against existing literature. Challenge type discoveries are driven by a new or existing research object, and go against existing literature.
It follows from Table 1 that Koshland's three discovery types are not exhaustive of the possible types that are analytically conceivable. Indeed, there is no reason to assume that discoveries will only meet the particular configurations of the dimensions that are consistent with Koshland's three types. Using the framework, we are able to characterize scientific breakthroughs in three binary dimensions, so that each scientific breakthrough is classified as one out of eight (2³) possible types, rather than Koshland's three. The question, then, of which of the eight possible discovery types is most prevalent, is an empirical one.
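To make the eight configurations concrete, the mapping between dimension states and Koshland's types can be sketched in Python. This is a minimal illustration based on the verbal summary above, not code used in the study; the function name `koshland_types` and the boolean argument names are hypothetical.

```python
from itertools import product
from typing import List


def koshland_types(question_driven: bool, new: bool, against_literature: bool) -> List[str]:
    """Return the Koshland types whose fixed dimension states match a configuration.

    Assumption: the rules follow the verbal summary in the text, not Table 1 itself.
    """
    types = []
    # Charge: question driven and in line with literature (new or known may vary).
    if question_driven and not against_literature:
        types.append("Charge")
    # Chance: driven by a new research object (in line or against may vary).
    if not question_driven and new:
        types.append("Chance")
    # Challenge: research object driven and against literature (new or known may vary).
    if not question_driven and against_literature:
        types.append("Challenge")
    return types


# Enumerate all 2**3 = 8 configurations of the three binary dimensions.
configurations = list(product([True, False], repeat=3))
for q, n, a in configurations:
    label = " + ".join(koshland_types(q, n, a)) or "none"
    print(f"question_driven={q!s:5} new={n!s:5} against_literature={a!s:5} -> {label}")
```

Under these rules, one configuration matches both Chance and Challenge, and three configurations match none of Koshland's types, which is why the eight-way typology is broader than the original three.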

Data Collection
To characterize different types of scientific breakthroughs, we make use of Science's annual announcement of the Breakthrough of the Year (BotY) (AAAS, 2018) between 1999 and 2012. Each year the magazine's scientific editors select "the most significant scientific discovery of the year" (AAAS, 2018) and its nine runners-up [1]. The selected breakthroughs are described by the journal's reporters in the final issue of the year. These descriptions may refer to a single scientific breakthrough or to a multitude of breakthroughs that center on a common theme [2], and include a list of references to the original research described and other supportive material.
For this paper, we use the reference list of each BotY description to select research articles that report on the scientific breakthrough. We will refer to these articles as breakthrough articles. We use the following requirements in our selection of breakthrough articles from the BotY (and runners-up) reference lists: (a) articles should be written in English [3]; (b) articles should be published in the same year as the year in which they were announced BotY or runner-up, with the exception of articles published in December the year before (as these were published after the BotY announcement of the previous year); (c) articles should be published in peer-reviewed academic journals [4]; (d) articles should report original results described in the BotY description: review articles or articles that were included as further reading are omitted; (e) articles should have a DOI and be available on Web of Science (WoS); and (f) articles should not have been retracted afterwards [5].
Although the announcement of BotYs began in 1996 (replacing the annual announcement of Molecule of the Year), data are collected from 1999 onwards, as no runners-up were announced in 1998. BotYs are collected until 2012, to allow the articles at least 6 years to receive citations after publication. This resulted in 335 scientific breakthrough articles, derived from 140 BotYs (14 years: one breakthrough and nine runners-up per year). Table 2 shows a summary of this selection process.
We used the DOI of each article to collect data from WoS: (a) publication date; (b) publication source; (c) citation report consisting of the number of citations per year for 10 years, or as many years as possible for articles published after 2008; (d) number of authors; and (e) a PDF of the article's full text (including abstract). These data were extracted from WoS in April 2018. The articles' full texts were used to code the breakthrough article in terms of the three discovery dimensions and its reported considerations of use.
We also collect data on the scientific discipline of each of our breakthrough articles based on the indexation of Nature [6] (Springer Nature, n.d.). We distinguish between disciplines as listed by Nature: biological sciences [7], business and commerce, environmental sciences, health sciences [8], humanities, physical sciences, scientific community and society, and social sciences [9]. Because many of the examples of Chance type discoveries supplied by Koshland are specifically from paleontological sciences and astronomical sciences, whereas examples of Charge and Challenge type discoveries are not (Koshland, 2007), we will further distinguish between paleontological sciences and other biological sciences and between astronomical sciences and other physical sciences.

Coding Discovery Dimensions and Reported Considerations of Use
We use directed content analysis (Hsieh & Shannon, 2005; Saldaña, 2015) to code each breakthrough article on each of the three binary discovery dimensions and on the considerations of use of the scientific breakthrough article. The result of this process will be used to assess differences in discipline, citation impact, and reported considerations of use between breakthrough articles by dimension, descriptively, statistically, and visually (see also Section 2.2).
To code articles on the three discovery dimensions, we use the text of the articles. For each article we search for key phrases indicative of each dimension. Examples of key phrases used can be found in Table A1, and were developed in three steps. First, two coders, K. S. and M. L. W., coded 14 breakthrough articles from 2014, which were not part of the data set of this paper, by highlighting phrases that signal the state of the three dimensions as defined in Section 2.2. During this stage it was found that while relevant phrases could be found throughout the whole text of the article, the articles' abstracts, first sentences, introductions, and conclusions are the most informative with regard to the state of the discovery dimensions. Coding thus focused on these sections, or on the whole text if abstract, introduction, and conclusion were inconclusive. Second, coding differences between K. S. and M. L. W. were discussed until a consensus was reached on common coding practices. Third, the highlighted phrases of these 14 breakthrough articles were summarized into stylized phrases. Note that some key phrases serve as signals for more than one dimension. For example, the phrase "On [date] we have observed […]" signals that the reported scientific breakthrough is research object driven rather than question driven, but also that this research object is new rather than known, because uncovering this research object is part of the breakthrough. Using these key phrases, the 335 breakthrough articles in our data set were then independently coded by both coders. For our analyses, the dimension states question-driven, new, and against literature are coded as 1, and research object-driven, known, and in line with literature are coded as 0.
[4] Note that, although BotYs are announced by Science, reference lists include articles published in other peer-reviewed journals.
[5] As retraction of articles is essentially right-censored, because any article may be retracted in the future, there may be a bias against older articles. However, there are few retractions in the data.
[6] For breakthrough articles that were not published by Nature, we identify the most relevant scientific discipline by determining the discipline of referenced articles published in Science.
[7] Including anatomy, physiology, cell biology, biochemistry, biophysics, and paleontology (Springer Nature, n.d.).
[8] Including aspects of health, disease and healthcare aiming to develop knowledge, interventions, and technology for use in healthcare (Springer Nature, n.d.).
[9] However, in our data set, we only find articles from "biological sciences," "physical sciences," "environmental sciences," and "health sciences."
For the identification of reported considerations of use we follow the four quadrants proposed by Stokes (1997) when cross-tabulating two questions: (a) Does the article report applied considerations of use of the scientific breakthrough, or not?; and (b) Does the article report that the scientific breakthrough is part of a quest for fundamental understanding, or not? Articles that do not report applied considerations of use but do report contributions to fundamental understanding are considered basic research. Articles that only report applied considerations of use without contributing to fundamental understanding are applied research. Articles that report both applied and fundamental considerations of use are considered as useinspired basic research, also known as "Pasteur's quadrant" (Stokes, 1997). Finally, articles may report neither applied nor fundamental considerations of use. For the development of key phrases that signal considerations of use, we followed the same procedure as for the development of key phrases for discovery dimensions, described above. Such phrases were found to be typically reported in the final paragraph(s) of the breakthrough articles. Key phrases, as well as examples of reported considerations of use, can be found in Table A2.
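The cross-tabulation of the two coding questions can be sketched as a small Python function. This is only an illustration of the quadrant logic described above; the function name `stokes_quadrant` and the label strings are hypothetical, and the actual coding in the study was done manually using key phrases.

```python
def stokes_quadrant(applied: bool, fundamental: bool) -> str:
    """Map the two yes/no coding questions onto a Stokes (1997) quadrant label.

    applied:     the article reports applied considerations of use
    fundamental: the article reports a quest for fundamental understanding
    """
    if applied and fundamental:
        return "use-inspired basic research"  # Pasteur's quadrant
    if fundamental:
        return "basic research"
    if applied:
        return "applied research"
    return "neither"


# Print the label for each of the four combinations of answers.
for applied in (True, False):
    for fundamental in (True, False):
        print(applied, fundamental, "->", stokes_quadrant(applied, fundamental))
```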
We present intercoder reliability for each of our coded dimensions in Table 3, where we report Cohen's kappa, which takes intercoder agreement by chance into account. We find that kappa values are sufficiently high (Cohen, 1960). In cases of disagreement, final codes are based on consensus between the two coders. Consensus was found for all articles in the data set, which implies that all 335 publications originally selected serve as empirical observations.
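For readers unfamiliar with the statistic, Cohen's kappa compares the observed share of agreement between two coders with the agreement expected by chance from each coder's marginal frequencies. The following is a minimal Python sketch; the function name and the two short lists of codes are made up for illustration and are not taken from the data set.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders who each assigned one code per item."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both coders must code the same non-empty set of items.")
    n = len(codes_a)
    # Observed agreement: share of items on which the coders agree.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement: product of the coders' marginal proportions per category.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1:
        return 1.0  # both coders used one and the same category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes for one binary dimension (1 = question driven, 0 = research object driven).
coder_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
coder_2 = [1, 1, 0, 0, 1, 0, 0, 1, 0, 1]
kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 2))
```

In this toy example the coders agree on 8 of 10 items while chance agreement is 0.5, so kappa is about 0.6.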

Analysis
We run chi-square (χ²) tests and Tukey's HSD post hoc tests (Tukey, 1949) to test whether discovery dimension states are associated with scientific disciplines and reported considerations of use. We also present bar charts to assess differences visually.
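The test statistic for such an association can be sketched in plain Python. The contingency table below is entirely hypothetical (two dimension states by six discipline groups); it only serves to show how the statistic and its degrees of freedom are obtained, and the function name is an assumption.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical counts: rows = question driven / research object driven,
# columns = six discipline groups. These are NOT the study's data.
hypothetical_table = [
    [40, 5, 60, 10, 25, 30],
    [2, 8, 5, 9, 1, 3],
]
statistic, df = chi_square_independence(hypothetical_table)
print(f"chi-square = {statistic:.1f}, df = {df}")
```

A two-by-six table yields five degrees of freedom. Turning the statistic into a p-value requires the chi-square distribution, for which a library routine such as `scipy.stats.chi2_contingency` could be used instead.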
To test whether discovery dimensions affect cumulative citation patterns, we run a set of regressions with the number of cumulative citations as the dependent variable, and three dummies that represent the three binary discovery dimensions as the main independent variables. Ten regression models estimate cumulative citations from 1 to 10 years after publication. For this, negative binomial regression is appropriate, as our dependent variables reflect overdispersed count data (Cameron & Trivedi, 1998): if we were to use a Poisson regression rather than a negative binomial regression to model cumulative citations 10 years after publication using all our independent variables, the residual variance of 184,073 would exceed the 220 degrees of freedom. As each BotY description can contain references to multiple breakthrough articles, we cluster standard errors at the level of the BotY description. Because cumulative counts make it difficult to identify differences in the number of citations received in each individual year, we rerun our models using the number of citations per year, for 1 to 10 years after publication, as dependent variables; these results are presented in Figure A1 and Table A3.
Individual authors may each boost the cumulative citations to their own articles by bringing their work to the attention of others (Aksnes, 2003). Therefore, we include number of authors as a control variable. As this variable is heavily skewed, we use a log transformation for number of authors. In a second set of models, we further include dummy variables for discipline, as it is known that citation rates vary between disciplines, and we find that configurations of discovery dimensions are not randomly distributed over disciplines.
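The two data-preparation ideas in this paragraph, cumulative citation counts and a check for overdispersion, can be illustrated with a short Python sketch. The yearly counts are invented, the function names are hypothetical, and the variance-to-mean ratio used here is a simpler, unconditional stand-in for the residual-based comparison described in the text.

```python
from itertools import accumulate
from statistics import mean, variance


def cumulative_citations(yearly_counts):
    """Turn citations per year into cumulative citations after 1, 2, ... years."""
    return list(accumulate(yearly_counts))


def dispersion_ratio(counts):
    """Sample variance divided by the mean; about 1 for Poisson-like counts."""
    return variance(counts) / mean(counts)


# Hypothetical citations per year for four articles over 10 years (not study data).
yearly = {
    "article_1": [5, 20, 35, 40, 38, 30, 28, 25, 20, 18],
    "article_2": [1, 3, 4, 6, 5, 5, 4, 3, 3, 2],
    "article_3": [30, 120, 200, 260, 300, 310, 290, 280, 260, 250],
    "article_4": [0, 2, 2, 3, 1, 2, 1, 1, 0, 1],
}
# Cumulative citations 10 years after publication for each article.
totals_year_10 = [cumulative_citations(counts)[-1] for counts in yearly.values()]
print(totals_year_10)
# A ratio far above 1 suggests overdispersion relative to a Poisson model.
print(dispersion_ratio(totals_year_10) > 1)
```

When the ratio is far above 1, a Poisson model understates the variability of the counts, which is the usual motivation for a negative binomial specification.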

Configurations of Discovery Dimensions and Associated Disciplines and Considerations of Use
In Table 4, we present the distribution of articles in our data set over the eight different configurations of discovery dimensions. We also compare our typology to Koshland's. We see that some combinations of characteristics are more common than others. Notably, the majority of our articles (77%) are what Koshland would describe as Charge type discoveries: driven by a question that is in line with theory, irrespective of whether the question is new or known. We also find that most articles can indeed be classified according to Koshland's typology: Only 43 articles (13%) do not fit with that typology, either because they have properties of more than one type (11%), or because they have properties of none (2%). In this sense, the original typology of Koshland can be regarded as useful. Figure 1 presents bar charts of the discipline and reported considerations of use for each configuration of discovery dimensions, discussed in detail below. Most articles in our data set report considerations of use for fundamental understanding only (74%). Just a few are only applied (7%), while another 17% are classified as both fundamental and applied; 2% report neither.
A large share (47%) of the articles report on research on biological sciences (excluding paleontology), while 26% report on research on physical sciences (excluding astronomy). Research in the field of health sciences and environmental sciences is less common.

Dimension A
The majority of articles in this study (86%) are question driven. We find that question driven discoveries are not randomly distributed across disciplines (χ² = 100, df = 5, p < .001). Based on our post hoc test, we find that being question driven is associated with health sciences more than with other disciplines (p < .05). Indeed, almost all (96%) health sciences articles in our data set are question-driven. Conversely, being research object driven is associated with astronomy and paleontology more than with other disciplines (p < .05). This may be because disciplines such as paleontology and astronomy more often encounter unexpected physical research objects, for example from fossil records and satellite observations, respectively. Question-driven articles are also not randomly distributed across the four Stokes quadrants (χ² = 20, df = 3, p < .01). Our analysis suggests that being question driven is associated with reporting only applied considerations of use and with reporting both applied and fundamental considerations of use, while research object driven breakthroughs are associated with reporting on neither (p < .05).
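The test of independence used for this dimension can be sketched as follows. The counts are hypothetical and are not the data from this study; they only reproduce the table shape (two drivers by six disciplines, hence df = 5). The post hoc procedure is not specified in the text, so the adjusted standardized residuals below are an assumed, commonly used choice.

```python
from math import sqrt


def chi_square_independence(table):
    """Return (chi2, df, expected) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


def adjusted_residuals(table):
    """Adjusted standardized residual per cell; |value| > 1.96 flags a cell at about p < .05."""
    _, _, expected = chi_square_independence(table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [
        [
            (obs - exp) / sqrt(exp * (1 - r / n) * (1 - c / n))
            for obs, exp, c in zip(obs_row, exp_row, col_totals)
        ]
        for obs_row, exp_row, r in zip(table, expected, row_totals)
    ]


# Hypothetical counts. Columns: biological, paleontology, physical, astronomy,
# health, environmental sciences.
counts = [
    [140, 10, 80, 12, 25, 20],  # question driven
    [15, 10, 8, 12, 1, 2],      # research object driven
]
chi2, df, _ = chi_square_independence(counts)
print(f"chi2 = {chi2:.1f}, df = {df}")
for row in adjusted_residuals(counts):
    print([round(value, 2) for value in row])
```

A positive residual in a cell means that the combination occurs more often than independence would predict, which is how an association between a discipline and a driver can be read off.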

Dimension B
Articles with a new question or research object are slightly less common than articles with a known question or research object (43% versus 57% of articles). Most of these are articles with a new question rather than a new research object (74%). Of our total set of articles, only 10% report on a new research object that is in line with theory.
Our results indicate that this dimension and discipline are not independent (χ² = 20, df = 5, p < .001). Specifically, we find that having a new question or research object is associated more with biological sciences (but not paleontology) and astronomy, while having a known question or research object is associated with physical sciences (but not astronomy) (p < .05). It is further worth noting that breakthroughs that are specifically driven by a new research object are primarily found in astronomy (see also Figure 1). We do not find a strong association between this discovery dimension and Stokes' four quadrants regarding considerations of use (χ² = 7, df = 3, p > .05).

Dimension C
Breakthroughs that go against the state-of-the-art literature are uncommon (11%), and among them the large majority are question driven. We do not find a strong association between this discovery dimension and discipline (χ² = 9, df = 5, p > .05). Our post hoc tests suggest that breakthroughs going against the literature are somewhat common in paleontological articles, while being in line with the literature is associated more with health sciences (p < .1) and physical sciences except astronomy (p < .05). In terms of reported considerations of use, we do not find significant evidence that being against state-of-the-art literature is associated with reported considerations of use (χ² = 5, df = 3, p > .1).

Table 5 shows descriptives of the cumulative number of citations within 10 years per discovery dimension state. On average, the articles in our data set collect 799 citations within the first 10 years after publication. However, with a median of 489 and an interquartile range of 625, this varies broadly: While the lowest decile of the articles in our data set have fewer than 108 citations, the highest decile has more than 1,571. Interestingly, one article did not receive any citations within 10 years.¹⁰

¹⁰ This article reports the first results from the Sudbury Neutrino Observatory (Helmer & SNO Collaboration, 2002). It may be that this article did not receive any citations because two other articles report results from the same observatory (also included here). Although all three articles were originally submitted in April 2002, the Helmer et al. paper was published in November, whereas the other two were published in June.

Citation Impact and Discovery Dimensions
To test whether the discovery dimensions can explain some of the variation in citation counts, we present the incidence rates of negative binomial regression models including control variables in Figure 2, with one regression for each of the 10 years. Incidence rates for the effect of being question-driven, driven by a new question or research object, or being against literature are presented relative to being research object-driven, driven by a known question or research object, and being in line with literature, respectively. Table 6 presents the regression coefficients of our models, where cumulative citations 1, 2, 5, and 10 years after publication are used as dependent variables. Models based on cumulative citations after 3, 4, 6, 7, 8, and 9 years were omitted from this table for readability reasons. The models include dummies for the three binary dimensions (with question-driven, new, and against literature coded as 1, and research object-driven, known, and in line with literature coded as 0) and control variables for the log of the number of authors and for discipline, with biological sciences as reference category.
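The relation between the regression coefficients and the incidence rates can be made concrete with a short sketch. A negative binomial model uses a log link, so exponentiating a coefficient gives the incidence rate relative to the reference state. The coefficient and standard error below are hypothetical and are not taken from Table 6; the function name and the normal-approximation confidence interval are likewise illustrative assumptions.

```python
import math


def incidence_rate_ratio(coef, std_err, z=1.96):
    """Convert a log-link count-model coefficient into (IRR, lower, upper).

    The interval is an approximate 95% confidence interval obtained by
    exponentiating coef +/- z * std_err (normal approximation, an assumption).
    """
    return (
        math.exp(coef),
        math.exp(coef - z * std_err),
        math.exp(coef + z * std_err),
    )


# Hypothetical values: a coefficient of ln(2) for a dummy coded 1 corresponds
# to twice the expected citation count of the state coded 0.
irr, lower, upper = incidence_rate_ratio(coef=math.log(2), std_err=0.2)
print(f"IRR = {irr:.2f}, approximate 95% CI = [{lower:.2f}, {upper:.2f}]")
```

An incidence rate above 1 indicates more expected citations than the reference state, and an interval that includes 1 corresponds to an effect that is not significant at the 5% level.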

Dimension A
We find that being question driven has a positive effect on cumulative citations of scientific breakthrough articles. After 10 years, articles that are question driven are estimated to receive twice as many citations as articles that are research object driven. Controlling for discipline in Models 5-8 slightly reduces the effect of being question driven, suggesting that part of the effect seen in Models 1-4 is, in fact, due to high citation rates of disciplines that are associated with being question driven. However, this does not alter our conclusion that being question driven has a positive effect on cumulative citations of scientific breakthrough articles.

Dimension B
We do not observe a significant association between this dimension and cumulative citations.
Our coefficients suggest that there may be a small positive effect of being driven by a new question or research object on cumulative citations shortly after publication, which decreases in later years. This may be caused by the unexpectedness and novelty of the new question or new physical evidence introduced in the breakthrough article, and the sudden interest that this may spark. However, this is not a significant finding.

Dimension C
We do not find a significant association between articles going against the literature and cumulative citations. Upon visual inspection, there is some indication that breakthrough articles driven by a question or research object that is against state-of-the-art literature receive more citations in later years (years 9 and 10). The observed trend is consistent with the idea that paradigm-shifting discoveries require more time to have an impact before they can be integrated in future knowledge development. However, the results are not statistically significant.
The results of Models 9-16, where we use citations per year rather than cumulative citations per year as dependent variable, are presented in Figure A1 and Table A3. These results are in line with our earlier results. Again, we find that only the question-driven dimension significantly affects the number of citations received. We find that the difference in the number of citations per year between question-driven and research object-driven articles is biggest after 4-5 years.

DISCUSSION
In this paper, we have developed a typology of scientific breakthroughs and applied this typology to characterize a set of articles reporting on scientific breakthroughs. Using Koshland's Charge-Chance-Challenge theory of scientific discovery as a starting point, we propose that scientific breakthroughs can be characterized along three dimensions: (a) whether the discovery is question driven or research object driven; (b) whether the discovery contributes to a known question or research object or introduces a new one; and (c) whether the discovery is in line with, or against, state-of-the-art literature. We subsequently used the typology to characterize 335 breakthrough articles along the three dimensions and analyzed how breakthrough characteristics relate to scientific disciplines, citation impact, and considerations of use for fundamental understanding and application.
One of our main findings holds that the large majority of breakthrough discoveries can be classified as one of Koshland's discovery types within his Cha-Cha-Cha framework. However, we also observed that a small proportion of breakthroughs could not be characterized as any of Koshland's types, and some other articles fell into multiple Koshland types. Based on this finding we conclude that, rather than distinguishing between Charge, Chance, and Challenge types, breakthroughs can better be understood as being question driven or research object driven, introducing a new question/research object or a known question/research object, and having a contribution that is against or in line with state-of-the-art literature. We believe that our framework marks an improvement over the original Cha-Cha-Cha theory, as we have made the underlying dimensions explicit and orthogonal to one another, expanding the typology from 3 to 2³ = 8 types. Our framework, then, can be used in future research to further probe the antecedents and effects of scientific breakthroughs. It can equally be used to analyze differences between characteristics of breakthrough and nonbreakthrough discoveries. A logical extension of this paper is also to study whether the configurations of discovery dimensions discussed here are distributed differently over breakthroughs than over nonbreakthroughs, and to test whether the citation patterns we found are also observed for nonbreakthroughs.
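The eight configurations that follow from three binary dimensions can be enumerated with a short sketch. The dimension labels follow the wording of this section; the variable names are illustrative, and the order of the states within each dimension is arbitrary.

```python
from itertools import product

# The three binary discovery dimensions, labeled as in the text.
DIMENSIONS = {
    "A": ("question driven", "research object driven"),
    "B": ("new question/research object", "known question/research object"),
    "C": ("against literature", "in line with literature"),
}

# The Cartesian product of the three two-state dimensions gives 2 * 2 * 2 = 8 types.
configurations = [
    dict(zip(DIMENSIONS, states)) for states in product(*DIMENSIONS.values())
]

for number, configuration in enumerate(configurations, start=1):
    print(number, configuration)
```

Because the dimensions are orthogonal, every article falls into exactly one of these configurations, which is not guaranteed when classifying by the three original types.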
Our main empirical finding holds that most scientific breakthroughs are driven by an already existing question and in line with the state-of-the-art literature. This finding broadens our view of science in that it questions the popular view of scientific breakthroughs as radical, paradigm-shifting discoveries (e.g., Evans, 2016; Ventegodt & Merrick, 2004). Rather, it suggests that the majority of scientific discoveries that are recognized as breakthroughs are better described as "normal science" (Kuhn, 1962).
Our analysis also shows that articles reporting on scientific breakthroughs vary considerably in their citation impact. In particular, breakthrough articles that were driven by a research object rather than a question receive far fewer citations. This finding has implications for the interpretation of earlier research on scientific breakthroughs. Previous research has mainly analyzed scientific breakthroughs based on citation impact, thereby considering breakthroughs as a homogeneous group of discoveries (Ponomarev et al., 2014;Uzzi et al., 2013;Zeng et al., 2017). In contrast, our findings suggest that earlier research aimed at identifying supportive conditions for scientific breakthroughs did not recognize the variety of breakthroughs and may have been biased against a minority of breakthroughs driven by research objects. Therefore, their findings may not be generalizable to all scientific breakthroughs. For literature on scientific breakthroughs, a next step is to identify how conditions such as team composition and sponsoring affect the occurrence of a variety of scientific breakthroughs, and in particular those breakthroughs that are research object-driven, as we have shown that these have been underrepresented in the literature thus far.
In this research, we relied on discoveries that were marked as scientific breakthroughs by Science. This operationalization of scientific breakthroughs has several implications for the generalizability of our findings. In the first place, our research only includes discoveries that are recognized as scientific breakthroughs within a year after publication. Discoveries that are recognized as such in a later phase may not have the same characteristics. For example, their citation impact may differ over time. An interesting avenue for future research would be to distinguish between discoveries that are received as breakthroughs shortly after publication and those that are recognized as breakthroughs later on. An interesting question is then whether the relative prominence of the three dimensions introduced here differs between early and delayed recognition. In the second place, it is not unlikely that the nomination for BotY in itself affects the way a discovery is received. The increased visibility of the discovery may inspire others to refine the discovery in other research projects, and can lead to an increase in citations or even an increase in the likelihood of receiving a significant prize, such as a Nobel Prize or a Fields Medal. For future research, we encourage alternative approaches to identifying scientific breakthroughs that are more sensitive to delayed recognition and are not based on external assessments. One such approach has been developed by Small, Tseng, and Patek (2017), who identify and characterize biomedical discoveries based on automated text analysis of citing sentences and cocitation analysis.
Our analysis of breakthrough discoveries is further limited by what has been reported in the scientific articles. As such, we must limit ourselves to an analysis of the reported drive of the scientific discoveries observed, which may not be the same as the actual drive of the discovery. Indeed, authors may present the process of discovery as more linear and rational than it actually has been (Myers, 1985). Similarly, the authors' motivation to write and publish the article may be different from their motivation to start the reported research project. For example, their original line of enquiry may have resulted in a serendipitous finding that solves an unexpected problem in another line of enquiry (Yaqub, 2018), which might lead the authors to change their narrative as well.
Furthermore, our analysis is constrained by the small number of articles considered. With more observations, we could test differences in citation patterns of combinations of dimensions, rather than of single dimension states. This may help us understand, for example, whether breakthrough articles that go against existing theory are accepted by the scientific community faster if they provide an answer to a long-standing question. This is likely, as the question-driven approach of such articles may provide more legitimacy to the anomalous finding than if it were driven by new evidence. We therefore encourage others to extend our analysis to a larger set of breakthrough articles, potentially also including a broader range of scientific disciplines.

Figure A1. Incidence rate for noncumulative citations after N years, after controlling for the log of the number of authors and discipline.