How research programs come apart: The example of supersymmetry and the disunity of physics

Abstract According to Peter Galison, the coordination of different “subcultures” within a scientific field happens through local exchanges within “trading zones.” In his view, the workability of such trading zones is not guaranteed, and science is not necessarily driven towards further integration. In this paper, we develop and apply quantitative methods (using semantic, authorship, and citation data from scientific literature), inspired by Galison’s framework, to the case of the disunity of high-energy physics. We give prominence to supersymmetry, a concept that has given rise to several major but distinct research programs in the field, such as the formulation of a consistent theory of quantum gravity or the search for new particles. We show that “theory” and “phenomenology” in high-energy physics should be regarded as distinct theoretical subcultures, between which supersymmetry has helped sustain scientific “trades.” However, as we demonstrate using a topic model, the phenomenological component of supersymmetry research has lost traction and the ability of supersymmetry to tie these subcultures together is now compromised. Our work supports the view that even fields with an initially strong sentiment of unity may eventually generate diverging research programs, and it demonstrates the fruitfulness of the notion of trading zones for informing quantitative approaches to scientific pluralism.


Introduction
This paper focuses on High-Energy Physics (HEP), the field of physics concerned with the fundamental entities of nature, and on "supersymmetry", a symmetry between the two basic types of particles in nature. The idea of supersymmetry has brought together many of the most significant developments in the field over the past 50 years, all the way from the highly abstract world of string theorists down to the machinery of underground particle colliders. However, none of the discoveries that supersymmetry promised have materialized as expected; as much as supersymmetry may be necessary to theorists seeking to unify the forces of nature into a coherent picture, it is increasingly plausible that it will not be of much use to the experimentalists looking to find new particles. Through this case study, therefore, our work exhibits the disunity of science, by demonstrating that even scientific fields that have been strongly committed to unity, such as HEP, can eventually fail to coordinate various research efforts. Our paper is guided by the idea that empirical case studies, although seemingly narrow in scope, do enrich our understanding of the nature of the scientific enterprise (in this case, the nature of the coordination of diverse scientific cultures), and that quantitative studies of science should provide conceptually informed tools for carrying out such case studies, preferably in ways that can be generalized to a variety of contexts.
We start by presenting Galison's notions of subcultures and trading zones, which form the framework for studying the plurality of science and the dynamics of interactions between scientific fields that underlies our investigation (2.1). We then provide the background knowledge necessary for understanding the context of our case study before laying out our hypotheses: i) that theory and phenomenology, over the historical period considered, are to be regarded as two distinct theoretical subcultures within high-energy physics; ii) that supersymmetry generated diverse research programs, some being phenomenological and some being more theoretical; and iii) that supersymmetry significantly contributed to sustaining successful trades between theory and phenomenology until it was put in doubt by experimental data (2.2). We then elaborate on our motivation for addressing these hypotheses through quantitative methods (2.3). Section 3 details the quantitative methods that were deployed in order to address each of the three claims put forward in the introduction. It starts with a description of the data on which our analysis rests and how it was collected (3.1). Subsection 3.2 elaborates quantitative methods for assessing the level of semantic and social autonomy of certain categories (subcultures), and applies these methods to the two theoretical subcultures in HEP. The next subsection (3.3) elaborates a methodology based on topic models in order to address the "plasticity" and "plurality" of supersymmetry, which can in principle be applied to all "boundary objects", i.e. those objects that can be traded between distinct subcultures while preserving and sustaining their distinctness. Subsection 3.4 then provides a quantitative model for locating "trading zones", or more broadly concepts that enhance trades between subcultures (or scientific disciplines in general), and applies the model to the exchanges between the theoretical subcultures of HEP. Section 4 reveals and interprets the results of these analyses. Finally, Section 5 explores the consequences of this work, both for our case study (supersymmetry within HEP) and for the more general question of the plurality of science from a quantitative perspective.

Background
2.1 Subcultures and trading zones: Galison's approach to the plurality of science
If science is a unified enterprise, what is the nature of the relationship between fields as diverse as physics, biology, psychology, or economics? Can we translate all the concepts of these disciplines into a basic (say, physical) scientific language, as Carnap proposed? Or are all these fields so incommensurable and autonomous that it is impossible to translate their respective entities, laws, and explanations from one's language to another's, as proponents of a pluralistic view defend (e.g. Suppes (1978), Dupré (1983), and Cartwright (1999))? Disciplines themselves can be so diverse, too, that the nature of what makes their own unity is not necessarily obvious. For instance, the nature of the unity of physics has been the matter of much debate, with sometimes serious political implications: reductionist views (which imply that high-energy physics is the most fundamental, since it supposedly entails any higher-level theory) were mobilized to justify the funding of large particle physics facilities (Cat, 1998), potentially to the detriment of more "useful" projects, as certain condensed matter physicists argued (Martin, 2018, Ch. 9). Instead, the latter argued that macroscopic systems have emergent properties that cannot be derived from "fundamental" laws. They were most often proponents of a "methodological" form of unity (Ibid., p. 233), according to which the field is bound together by shared norms and conceptual tools (Cat, 1998, p. 267), rather than by relations of logical deduction from the most fundamental to the least fundamental theories. This view provided an intellectual and philosophical basis for elevating the prestige of condensed matter physics (Martin, 2018, pp. 148-149), thus putting condensed matter and high-energy physicists on a more equal footing.
Even within the subfield of particle physics, there is a strong contrast between theorists and experimentalists. In fact, the nature of the relationship between the objects manipulated by, say, experimentalists (for instance, tracks within a cloud chamber, or electric signals from a sensor) and the more abstract entities manipulated by theorists (e.g. "quarks", "gluons", "strings", etc.) has been the subject of much philosophical debate. Inheriting a positivist view, some would grant experiment a more fundamental status, by defending its ability to provide robust empirical statements that could dictate theoretical change. Others, such as Kuhn, argued that empirical statements cannot be isolated from a theoretical paradigm and emphasized the "primacy" of theory (Galison, 1988). It is in order to overcome this debate about the relationship between experiment and theory within the context of physics that Galison originally developed his concepts of subcultures and trading zones (Galison, 1987, 1997). However, these notions may apply more generally whenever distinct scientific communities attempt to overcome difficulties in communicating and achieving coordination (Collins et al., 2010, p. 8). Consequently, they are useful in a much broader range of contexts than the narrow case of physics; for instance, they are generally useful for studying the dynamics of interactions between disciplines in science. Below, we propose a brief summary of the concepts of subcultures and trading zones and the rationale for their introduction.
The notion of subcultures was introduced by Galison (1987, 1988) in order to account for two characteristics of high-energy physics: first, that it is subject to a strong division of labor, such that "theory", "experiment" and "instrumentation" are carried out by different groups of people (Galison, 1987, p. 138), with their own skill sets and bodies of knowledge; and second, that each of these "subcultures" is partially autonomous, i.e., none of them is completely subordinate to the others. We can highlight two tangible components of such subcultures: a social component (the community of practitioners) and a linguistic component (the language specific to each community).
For Galison, then, the question is what makes these subcultures part of a "larger culture" (physics), while retaining that their successful coordination is a "contingent matter" (Galison, 1997, p. 18); and his answer is "trading zones".Trading zones allow knowledge to be exchanged across different subcultures, inasmuch as the practitioners of distinct communities can locally agree on the usefulness of certain constructs despite the distinctiveness of their respective languages, commitments, aims, and methodologies.That trading occurs within "zones" captures the fact that the exchange procedure is "local" rather than "global", such that subcultures working out trades with each other can retain much of their autonomy in the process.
What kinds of goods may be subject to these "trades"? Examples of trade-able goods are "boundary objects", i.e. "objects that are both plastic enough to adapt to local needs and constraints of the several parties employing them, yet robust enough to maintain a common identity across sites" (Star & Griesemer, 1989, p. 393). Trading zones may give rise to a purposefully crafted inter-language that allows for further communication and coordination (a "pidgin"). If the inter-language grows, it may turn into a full-blown language (a "creole"); this signals the emergence and stabilization of a new scientific discipline of its own.
Arguably, this is the process through which "phenomenology", a subfield of HEP at the boundary between theory and experiment, has developed (Galison, 1997, p. 837). However, we may wonder whether phenomenology is still merely dedicated to bridging the gap between the theoretical and experimental cultures, or whether it has acquired enough autonomy to depart from the supremacy of abstract theory, e.g., by relying on independent sources of inspiration for its own enterprise rather than by seeking to establish connections between high theory and experiment. In the following subsection we will suggest treating "theory" and "phenomenology" in high-energy physics as two distinct subcultures, such that they may both enjoy considerable autonomy and eventually fail to coordinate their developments, thus extending the distinction made by Galison between theory, experiment and instrumentation.

Theory and phenomenology as distinct subcultures within high-energy physics
High-energy physics involves a complex web of mathematical and technical knowledge, whether it concerns the details of the often abstract underlying theories, the behavior of the instruments that are assembled within sophisticated experiments, statistical notions for the analysis of the data derived from these experiments, etc. As a result of this complexity, there is a strong division of labor within high-energy physics, and we can even distinguish two different groups within the theorists themselves. While "pure" theorists (we will call them "theorists", in accordance with the terminology within the field) are driven by "the abstract elaboration of respectable theories", phenomenologists (the second kind of theorists) are often more concerned with "the application of less dignified models to the analysis of data and as a guide to further experiment" (Pickering, 1984), or at least more concerned with experimental consequences than with high theory. This division is itself strong enough that these two kinds of physicists generally receive different training and diverge early in their careers, although some physicists (usually prominent ones) have expertise in both domains and are able to sustain exchanges between the two. Therefore, in the present paper, we will make the following claim:
Claim 1: Over the historical range considered, the categories "theory" and "phenomenology" in high-energy physics should be regarded as distinct subcultures with their own bodies of knowledge, ontologies and methodologies, and which are carried out by different people.
It is not controversial in itself that "theory" and "phenomenology" are different matters in HEP; these are now distinct categories within the HEP literature and it is not uncommon for physicists to label themselves as "theorists" or "phenomenologists" depending on their specialization. However, our claim goes further by stating that the nature of their work is so distinct that it should not be assumed a priori that they can sustain fruitful connections; per Galison, we should not expect a priori that subcultures are bound to cooperate flawlessly under any circumstance; we should instead remain open to the possibility that they may fail to produce constructs of shared value within the contexts of their respective enterprises. There may not even be one single overarching goal that is equally shared and sought after by HEP theorists and phenomenologists, and it is even less certain that their respective methods should equally contribute to achieving their goals at any time. In the following subsection, we will propose that supersymmetry exemplifies the contingent ability of high-energy physicists to coordinate their respective methods and goals in a successful way. It does so because the story of supersymmetry is that of a partial failure, rather than that of a total success. Although successful cases of cross-fertilization across fields are valuable to illustrate the notion of trading zones, the claim that science is disunified (and even physics itself, as Galison claims, against a symbiotic view of theory and experiment) is better exemplified by those cases where scientific cultures attempt and fail to establish coordination. The dramatic story of supersymmetry provides such an example.

Supersymmetry as a trade-able good between theory and phenomenology
Supersymmetry is a symmetry that relates the two fundamental kinds of particles that arise in nature: fermions and bosons. It was postulated simultaneously and independently by several physicists in the early 1970s, who were each motivated by very different goals. Supersymmetry rapidly gathered substantial attention from the theoretical community. The reasons were manifold, but they were clearly theoretical rather than empirical, as early reviews of the topic show. First, symmetry principles play a fundamental role in high-energy physics, and supersymmetry was an especially attractive symmetry because of its peculiar properties. Second, supersymmetry can naturally give rise to gravity, as was observed by Volkov and Akulov (1973), suggesting that it could lead to a consistent theory of quantum gravity. This feature of supersymmetry gave birth to an entire research program, "supergravity", which then spanned several decades. Third, while quantum field theory is prone to mathematical difficulties due to divergences appearing in the perturbative calculations of certain quantities, in many instances such infinities were suppressed in supersymmetric theories.
However, as appealing as it was to theorists, supersymmetry posed a number of empirical difficulties. First, supersymmetry establishes a symmetry between bosons and fermions; and yet, at first it was not at all clear which of the bosons and fermions should be related to each other by this symmetry. Moreover, if supersymmetry were perfectly realized in nature, the particles it relates should have identical masses, which was also in contradiction with the data. This contradictory situation was well summarised by Witten (1982) in his Introduction to supersymmetry:
[Supersymmetry] is a fascinating mathematical structure, and a reasonable extension of current ideas, but plagued with phenomenological difficulties. [...] Supersymmetry is a very beautiful idea, but I think it is fair to say that no one knows what mysteries of nature (if any) it should explain.
Still, efforts to incorporate supersymmetry into a theory consistent with the data were undertaken over several years, and they culminated in what is now called the Minimal Supersymmetric Standard Model (MSSM) (Fayet & Ferrara, 1977; Dimopoulos & Georgi, 1981). The MSSM is the result of reconciling the achievements of the Standard Model of Particle Physics (SM) (the best theoretical account available at the time and still today) with the requirement of supersymmetry. This, however, has very undesirable consequences. Compared to the SM, the MSSM introduces 105 additional unspecified parameters, so that supersymmetry can accommodate a large range of observations and has little predictive power in general (Parker, 1999, p. 1). In particular, although supersymmetry predicts the existence of many new particles (the "superpartners"), there is a priori little chance that these particles will have just the right properties to be discoverable in experiments. If they do not, supersymmetry may be of high value to theorists (because of its mathematical properties, and its promise to achieve a coherent account of quantum gravity), while being of low value to phenomenologists who are interested in building predictive models that can lead to the discovery of new particles or phenomena.
Yet, in 2011, supersymmetry was perceived across the field as the theory beyond the SM that was most likely to manifest itself in experiments (Mättig & Stöltzner, 2019, 2020). Arguably, the reason why it became highly credible and valuable to phenomenologists as well was that it could solve the so-called "naturalness" problem of the Standard Model on the condition that it was discoverable. In parallel to these developments around supersymmetry, there was indeed increasing recognition that an explanation was required as to why the mass of the Higgs boson (an important piece of the Standard Model) could be many orders of magnitude below the mass scale at which the unification of forces is assumed to take place. It was also realized that supersymmetry could provide an answer to this "naturalness" problem (Weinberg 1979; Veltman 1981; Witten 1982), but only as long as the masses of the superpartners (the particles predicted by supersymmetry) are not too high, so that they should be discoverable in future experiments. In light of this, supersymmetry became of very high value to phenomenologists and experimentalists as well, rather than just a mathematical toy for the theorists to play with. This situation is summarised in Figure 1. As theorists work out a path towards their goals (e.g., the unification of forces, or the formulation of a consistent theory of quantum gravity), they rely on theoretical heuristics such as renormalizability, symmetry principles, consistency requirements, etc. (Galison, 1995). In that context, supersymmetry emerges as a very valuable concept. Phenomenologists, on the other hand, try to work out a path towards the discovery of "new physics" (evidence for new phenomena unaccounted for by the SM) by relying instead on more generic models and constraints derived from experimental data (e.g. from particle colliders or astrophysical observations). It is the naturalness requirement that makes supersymmetry valuable to phenomenologists as well, by strengthening the belief that supersymmetric particles should have masses that are low enough to be discoverable. In this way, supersymmetry effectively enhances the "trading zone" between theorists and phenomenologists: both communities can acknowledge its value in spite of the vast differences in their aims, methods, and objects of inquiry.

[Figure 1. Diagram labels: Theory (Unification? Quantum Gravity?; Renormalizability, Symmetries, Consistency...), Phenomenology ("New physics"?; Collider/astrophysical data, generic models...), Supersymmetry, Naturalness.]
Fig. 1 Supersymmetry in the trading zone between theory and phenomenology. Theorists and phenomenologists have different aims and methodologies, and whether they can both positively appraise a particular construct is not guaranteed. In the case of supersymmetry, it is the naturalness requirement that ensures that the MSSM is so valuable to both subcultures. As a result, supersymmetry enhances a trading zone between these two cultures.
It is now time to introduce the last (but not the least) player in our drama: the Large Hadron Collider (LHC). Operating since 2010, the LHC is the largest physics experiment ever built. By performing particle collisions at the highest energies ever achieved, it promised to discover supersymmetric particles, provided that they had the properties prescribed by the naturalness problem that supersymmetry should solve. However, no such discovery has been made, which suggests that the "naturalness problem" was unwarranted (Giudice, 2018). If there is no naturalness problem, then supersymmetry is left unconstrained again; there is no guarantee that supersymmetric particles will ever be discovered; and its phenomenological value plunges back to the depths from which it surfaced. Therefore we will put forward the following claim, which will also be evaluated in the present paper:
Claim 2: Supersymmetry occurs in a variety of partially independent contexts within high-energy physics, some of which belong to "theory" and some of which belong to "phenomenology", and these applications of supersymmetry have responded differently to the LHC's failure to find supersymmetric particles.
Furthermore, we hypothesize that supersymmetry should be losing its ability to sustain trades between theory and phenomenology. Therefore, we will evaluate the following claim:
Claim 3: Supersymmetry sustained trades between theory and phenomenology in high-energy physics, until it was challenged by the LHC's failure to observe the particles predicted by supersymmetry.
If theorists and phenomenologists fail to share a similar appraisal of supersymmetry, then this may pose a serious problem for the field: it would imply that theorists' research programs can persist despite their low value to phenomenologists, and conversely that experimental input has little to offer to theorists; if that is the case, then the unity of high-energy physics would indeed be weakened. Therefore, addressing claims 1-3 ((1) that theory and phenomenology are partially autonomous subcultures of high-energy physics; (2) that supersymmetry arises in distinct, autonomous contexts, which responded differently to the absence of supersymmetric particles at the LHC; and (3) that the value of supersymmetry for bridging subcultures of physics has decreased as a result of the failure of phenomenological supersymmetry) should contribute to answering the question of what makes and unmakes unity in HEP.

Towards a quantitative assessment of subcultures and trades
In the following, we propose an array of quantitative methods implementing several dimensions of Galison's framework for addressing the plurality of science, which evaluate the claims put forward above. To this end, we will rely on authorship data (for investigating the social entrenchment of theory and phenomenology as distinct subcultures), semantic analyses (for investigating the linguistic divide between these subcultures as well as the plurality of supersymmetry research), and citation data (in order to locate "trading zones" within the field). To our knowledge, this is the first attempt to implement Galison's framework in a quantitative analysis of scientific literature. Of course, the plurality of science and the coordination between scientific fields have already been addressed quantitatively in numerous publications. In the context of physics research, for instance, Battiston et al. (2019) have evaluated the ability of physicists to publish in various subfields. In particular, they demonstrate that HEP physicists are among the most specialized physicists (i.e., they have a high probability of publishing only in their primary subfield), although their work does not distinguish between the various kinds of high-energy physicists, as we do in the present paper. It remains to address the linguistic component of the divide between these subcultures, in particular theory and phenomenology, and to this end we will propose a novel strategy based on semantic data (titles and abstracts of the literature).
As for the analysis of the plurality of supersymmetry-related research in HEP, we will develop a topic model approach in order to identify clusters of concepts that are most likely to be associated with supersymmetry in the literature, and we will explore the dynamics of supersymmetry research throughout time.
Finally, we will assess the intensity of trades between theoretical subcultures and locate the concepts that facilitate these trades. Yan et al. (2013) proposed a quantitative assessment of dependency relations between scientific disciplines built around a metaphor with international trade, by measuring quantities such as "exports", "imports", or "self-dependence" of various fields throughout time based on citation data. However, this work does not investigate what exactly allows these trades to happen, e.g., which concepts sustain them. This requires combining citation data with semantic information about papers' concepts, as achieved by Raimbault (2019), who proposed measures of interdisciplinarity built upon such data. Similarly to Yan et al. (2013), we will assess the self-dependence of experiment, phenomenology, and theory in HEP based on the citation network. However, we will also evaluate the ability of different concepts (such as supersymmetry) to sustain trades across subcultures throughout time, by combining semantic and citation data.
More broadly, this work will add to the literature on quantitative studies of science, by helping to fill a gap that has come to the attention of the community. As stressed by Leydesdorff et al. (2020), Kang and Evans (2020), and Bowker (2020), quantitative and qualitative studies of science have mostly diverged in their goals and "world views", hence the need to "bridge the gap" between them. We propose, therefore, a bridge connecting these two forms of scientific study. First, we demonstrate that quantitative methods can address questions raised by the philosophy, history, and sociology of physics. Moreover, we show that concepts from qualitative science studies can give structure to quantitative methods, in line with the call by Heinze and Jappe (2020) to inform quantitative analyses with "middle-range theories" (of which Galison's trading zones are an example). As a result, our methods are in principle meaningful in any context where such a theory is valid (that is, whenever scientific cultures attempt to achieve coordination), well beyond the case study proposed in this paper.
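As a rough illustration of the trade metaphor, the sketch below computes a self-dependence score for each category from a toy citation list. The paper identifiers, the toy data, and the definition used here (the share of a category's outgoing citations that stay within that category) are assumptions made for illustration; they are not necessarily the exact measure of Yan et al. (2013) or the one used later in this paper.

```python
from collections import Counter

# Hypothetical toy data: paper id -> category, and (citing, cited) pairs.
CATEGORY = {
    "t1": "Theory", "t2": "Theory",
    "p1": "Phenomenology", "p2": "Phenomenology",
    "e1": "Experiment",
}
CITATIONS = [
    ("t1", "t2"), ("t2", "t1"),   # theory citing theory
    ("p1", "t1"), ("p1", "p2"),   # phenomenology citing theory and itself
    ("p2", "e1"), ("e1", "p1"),   # exchanges with experiment
]

def trade_flows(category, citations):
    """Count citations from each citing category to each cited category."""
    flows = Counter()
    for citing, cited in citations:
        flows[(category[citing], category[cited])] += 1
    return flows

def self_dependence(flows, field):
    """Share of a field's outgoing citations that stay within the field
    (an assumed, simplified definition)."""
    total = sum(n for (source, _), n in flows.items() if source == field)
    return flows[(field, field)] / total if total else 0.0

flows = trade_flows(CATEGORY, CITATIONS)
for field in ("Theory", "Phenomenology", "Experiment"):
    print(field, round(self_dependence(flows, field), 2))
```

Restricting the citation list to papers that mention a given concept (supersymmetry, for instance) would be one simple way to ask which concepts carry the cross-category flows, which is the kind of question a pure citation count cannot answer.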

Data
Our data consist of the scientific literature on high-energy physics, together with the semantic, authorship, and citation information it contains that is relevant to our questions.
The data were retrieved using the Inspire HEP database (Moskovic, 2021). Inspire HEP is a platform dedicated to the HEP community and is maintained by members of CERN, DESY, Fermilab and SLAC. It aggregates publications from the HEP literature, and maintains a list of institutions and collaborations involved in the community, while also publishing job offers. It replaced Spires in 2012.
The database is fed by an automatic aggregator that retrieves articles from multiple sources, including a number of databases (the Astrophysics Data System, arXiv, etc.) and research institutions (CERN, [...]). Inspire then aggregates data from these sources with automated crawlers, and it performs manual curation for completion or error correction, including author name disambiguation. This database has a strong yet untapped potential for quantitative analyses. However, only contents related to HEP are subject to a systematic effort of collection and curation, and the data should preferably be used in analyses whose scope is limited to HEP; this makes it unsuitable for, e.g., studying interactions between HEP and other fields of physics (e.g. condensed matter physics).
The database includes data about the contents of the literature (title, summary, sometimes keywords), the authors (name, unique identifier, institutional affiliations), dates corresponding to different events related to each paper, associated experiments, and the references of the articles. The only data pertaining to the contents of the articles that are consistently available are titles and abstracts. Articles are categorized according to a classification scheme compatible with that of the arXiv pre-print platform. This scheme includes categories such as Theory-HEP, Experiment-HEP, Phenomenology-HEP, Astrophysics, etc. Categories of papers published on arXiv.org are extracted directly from the platform (where they are defined by the authors, while being subject to moderation and controls). Categories of papers not published on arXiv.org are now assigned manually by curators. Categories of papers inherited from the ancestor of Inspire HEP (Spires) and absent from arXiv.org were derived according to a mapping between Spires' classification and the current arXiv-based classification. In this paper, we rely mostly on three categories which cover most of the high-energy physics literature: Theory-HEP, Phenomenology-HEP, and Experiment-HEP, which typically contains papers that report empirical results such as statistical analyses of experimental data. A portion of the articles from the years 1990 to 1995 were not categorized, which led to some issues with the data collection process, as described in Appendix A.1. For this reason, our longitudinal analyses will focus on later years, which does not prevent us from addressing our research questions. The analysis of subcultures spans the years 1980 to 2020. The years prior to 1980 could also have been of interest for this analysis, but the corresponding data were of lower quality.

Social and semantic analysis of subcultures of high-energy physics
The first claim that we seek to establish is that "theory" and "phenomenology" should both be regarded as distinct subcultures within physics. There are two components to subcultures: a linguistic one (they should have vocabularies that are distinct enough to signal complementary bodies of knowledge) and a social one (they should correspond to distinct groups of people). Therefore we will proceed in two steps. First, we will demonstrate that theory and phenomenology manipulate vocabularies that are so distinct that we can predict with reasonable accuracy whether a paper belongs to one of these categories based on the words present in its abstract; our predictive model will then be used to unveil the ontological differences between these subcultures. Second, we will show that these categories from the literature are associated with different communities.

The semantic divide between Theory and Phenomenology
If it is possible to tell whether a paper is theoretical or phenomenological based on the words it contains, then this implies that these categories use partially distinct vocabularies (i.e., each of these two categories has its own "language") in a way that allows papers from one category to be distinguished from those of another. If that is the case, we can then examine the nature of the linguistic divide between theory and phenomenology in order to better understand their differences. In what follows, we apply this strategy using statistical methods, based on the classification of HEP literature provided by Inspire HEP14. Although we are more interested in the divide between "theory" and "phenomenology", we also include "experiment" (which Galison himself labeled as a subculture of its own) in our analysis in order to emphasize its differences with phenomenology.
In order to establish whether we can predict which articles d belong to any of the categories c ∈ {Experiment, Phenomenology, Theory}, we will build a simple linear logistic regression using a bag-of-words as the predictive features. In this approach, the corpus is represented by a matrix B = (b_{d,i}) ∈ R^{D×V}, where D is the number of documents, V is the size of the vocabulary, and b_{d,i} is the number of occurrences of the word (or expression15) i in the document d. This representation excludes a lot of semantic information that results from the knowledge of the ordering of the words and the structure of sentences within the documents; it is in line with our goal to find out whether the vocabularies of each category are so distinct that the mere presence or absence of certain words can be used to infer the category of a document. We perform a normalization of the bag-of-words prior to the regression, by applying the tf-idf transformation16 to (b_{d,i}), resulting in a normalized bag-of-words which we will name (b'_{d,i}). More specifically, our predictive model is a logistic regression with one coefficient β_{ci} per category c and expression i, applied to the normalized counts (b'_{d,i}). This model is then trained on N = 100,000 articles of our database from 1980 to 2020 that belong to any of the following categories: Experiment-HEP, Phenomenology-HEP, and Theory-HEP17. The vocabulary used in the regression is the V expressions (n-grams, up to four words long) among those that belong to predefined syntactic patterns18 that have the highest "unithood" as measured in Omodei 201419. The size of the vocabulary V is chosen to be a round number that is just high enough to reach about the maximum accuracy of the model, as evaluated on the test set (which consists of 10,000 articles not present in the training set). Then, the accuracy of the predictions of the model is evaluated using the same test set. The coefficients β_{ci} are then analyzed in order to extract the words that are the most discriminatory between "theory" and "phenomenology", thus revealing the most salient differences. For that, we retrieve those expressions i that maximize β_{th,i} − β_{ph,i} and β_{ph,i} − β_{th,i}. Because of the inverse document frequency transformation applied prior to the regression, expressions that are more common are favored by this selection process.
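The normalization and selection steps can be illustrated with a minimal sketch. It assumes that the regression has already been fitted and that its coefficients are available as plain dictionaries. The function names (tfidf, most_discriminative), the toy counts and the coefficient values are hypothetical, and the tf-idf variant shown (raw count times log(D/df)) is only one common choice, not necessarily the one used in this study.

```python
import math

def tfidf(counts):
    """Apply a basic tf-idf weighting to a list of {term: count} documents.

    Assumed variant: weight = count * log(D / df), where D is the number of
    documents and df the number of documents containing the term.
    """
    n_docs = len(counts)
    df = {}
    for doc in counts:
        for term in doc:
            df[term] = df.get(term, 0) + 1
    return [
        {term: c * math.log(n_docs / df[term]) for term, c in doc.items()}
        for doc in counts
    ]

def most_discriminative(beta, cat_a, cat_b, k=3):
    """Return the k terms that maximize beta[cat_a][term] - beta[cat_b][term]."""
    terms = set(beta[cat_a]) & set(beta[cat_b])
    ranked = sorted(
        terms, key=lambda t: beta[cat_a][t] - beta[cat_b][t], reverse=True
    )
    return ranked[:k]

# Hypothetical coefficients, for illustration only (not results of the paper).
beta = {
    "th": {"black hole": 2.1, "quark": -1.5, "manifold": 1.8, "mssm": -2.0},
    "ph": {"black hole": -1.2, "quark": 1.9, "manifold": -0.9, "mssm": 2.2},
}
top_theory = most_discriminative(beta, "th", "ph", k=2)
top_pheno = most_discriminative(beta, "ph", "th", k=2)
```

With these toy values, top_theory collects the expressions with the largest β_{th,i} − β_{ph,i}, and top_pheno those with the largest β_{ph,i} − β_{th,i}.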

The social divide between Theory and Phenomenology
What does it mean to say that theory and phenomenology have a "demographic component", as Galison (1987, p. 138) puts it regarding theory and experiment in HEP? It means that these categories of the literature are supplied by distinct groups of people, "theorists" and "phenomenologists". Therefore, we will investigate whether experimental, phenomenological and theoretical papers are published by three distinct groups of physicists, such that these physicists usually contribute mostly to just one of these categories. Again, "experiment", which is a paradigmatic example of a subculture in Galison's view, is also included in our analysis. It will be useful to assess whether the distinction between phenomenology and theory is comparable to the distinction between theory and experiment (the one initially stressed by Galison).
Let N_{ij} be the number of articles co-authored by a physicist i that belong to the category j ∈ {theory, phenomenology, experiment}, and N_i the total number of articles co-authored by i. Let us assume N_{ij} ∼ Binomial(N_i, p_{ij}), where p_{ij} is the latent probability that a paper from physicist i belongs to the category j 20. Since the researchers co-authored widely varying numbers of publications (ranging from a few papers to hundreds), we assumed that the latent probabilities p_{ij} were described by a hierarchical model: each p_{ij} is drawn from a beta prior with parameters α and β, which are themselves given exponential priors. The binomial process assumes that each physicist can be imputed a constant latent fraction of papers in each category. The beta prior is a flexible distribution over probabilities, which can be either unimodal or bimodal. The exponential prior over α and β is agnostic regarding these two possibilities, and its exact shape does not significantly matter considering the amount of available data. Most crucially for us, this model allows us to combine information from researchers with many papers and researchers with very few papers; for those with few papers, the estimation of the latent probabilities is more influenced by the shape of the beta distribution. The model was fitted to 2500 researchers randomly sampled among those with more than 3 publications in HEP for 1980-2020. In order to evaluate the social entrenchment of these categories, we verify that most physicists contribute mostly to just one of these categories.
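The way the beta prior pools information across researchers can be illustrated with a short sketch. It relies on the standard conjugacy result: for fixed α and β, the posterior of p_{ij} is Beta(α + N_{ij}, β + N_i − N_{ij}). In the model described above α and β are inferred rather than fixed, so the function name posterior_mean and the values α = β = 0.5 are assumptions made purely for illustration.

```python
def posterior_mean(n_cat, n_total, alpha, beta):
    """Posterior mean of the latent fraction p under a Beta(alpha, beta) prior.

    n_cat:   number of papers of the physicist in the category (N_ij)
    n_total: total number of papers of the physicist (N_i)
    """
    if not 0 <= n_cat <= n_total:
        raise ValueError("n_cat must lie between 0 and n_total")
    return (alpha + n_cat) / (alpha + beta + n_total)

# Two hypothetical physicists with the same raw fraction (75%) in one category.
# alpha = beta = 0.5 is an assumed, U-shaped (bimodal) prior with mean 0.5.
few = posterior_mean(3, 4, 0.5, 0.5)       # only 4 papers
many = posterior_mean(150, 200, 0.5, 0.5)  # 200 papers
```

The estimate for the physicist with few papers is pulled further away from the raw fraction and towards the prior mean than the estimate for the physicist with many papers, which is the behavior described above.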

Assessing the plurality of supersymmetry research with topic models
Our second claim pertains to the plurality of supersymmetry research. In this section, we present our methodology for assessing the plurality of supersymmetry-related research, by recovering the contexts, i.e., the topics in which supersymmetry occurs, and by evaluating the extent of their independence and how they responded to the results of the LHC. More broadly, we provide a methodology for investigating scientific "objects" akin to "boundary objects" in that they are "plastic enough to adapt to local needs and constraints of the several parties employing them" (Star & Griesemer, 1989, p. 393), by unveiling the plurality and autonomy of the contexts in which such objects may arise.

Model
In order to evaluate in which contexts supersymmetry arises within the high-energy physics literature, we have chosen to subdivide the literature into sub-topics using an unsupervised probabilistic topic model, namely the Correlated Topic Model (CTM; Blei & Lafferty, 2007). We do not use conventional classifications such as the Physics and Astronomy Classification Scheme® (PACS) codes from the American Institute of Physics (AIP), because they were not available for the whole dataset: PACS codes were only available starting from 1995, and only for a subset of the papers, which may not be representative of the whole. Besides, PACS codes are too numerous (more than 5000 categories)21 for our purposes. Therefore, we opted to extract the topics in the literature using unsupervised topic models instead.
Probabilistic topic models generally assume that each document of a corpus is a mixture, in variable proportions, of a certain number of topics, each of these topics having its own vocabulary distribution. When trained on a corpus, such models simultaneously learn the "topics" in the corpus (and their vocabulary), as well as the relative contribution of each topic to each document of the corpus. These models have demonstrated their ability to capture the semantic information contained within the scientific and academic literature, as shown in previous work22, even from abstracts alone (Syed & Spruit, 2017); as a result, this technique has seemingly taken precedence over network-based semantic maps (Leydesdorff & Nerghes, 2016, Figure 1). Although co-occurrence networks may have more conceptual bearing in the STS tradition, we have preferred topic models for their intrinsic ability to capture the polysemy of certain words (e.g., "supersymmetry"), in terms of the probabilities that such words can arise in different contexts (i.e. topics).
In particular, we have chosen the Correlated Topic Model (CTM) for its ability to capture correlations between topics. In this model, the contribution of a topic z to a document d, P(z|d), is assumed to be drawn from a hierarchical model involving a correlated multivariate distribution (Blei & Lafferty, 2007): a vector η_d is drawn from a multivariate normal distribution, η_d ∼ N(μ, Σ), and mapped to topic proportions through P(z|d) = exp(η_{d,z}) / Σ_{z'} exp(η_{d,z'}) (2). Through the covariance matrix Σ, the CTM model is able to learn correlations between topics, and therefore to account for the fact that some topics are more likely to occur together within one document. Moreover, our intuition is that using CTM allows the derivation of more realistic topic distributions for short texts such as abstracts, for which the small number of words only moderately informs the prior topic distribution. Most importantly, this model allows us to directly assess the level of independence between the topics derived by the model, which is important for assessing the autonomy of the contexts in which supersymmetry arises.
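To make the role of Σ concrete, the following sketch draws topic proportions for one document from a logistic-normal distribution, which is the prior over topic proportions in the CTM of Blei and Lafferty (2007). It only illustrates the generative assumption, not the inference procedure used to train the model. The helper names and the values of μ and Σ are hypothetical.

```python
import math
import random

def cholesky(sigma):
    """Lower-triangular L such that L L^T = sigma (symmetric positive definite)."""
    n = len(sigma)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                low[i][j] = (sigma[i][j] - s) / low[j][j]
    return low

def sample_topic_proportions(mu, sigma, rng):
    """Draw eta ~ N(mu, sigma) and map it to proportions with a softmax."""
    low = cholesky(sigma)
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    eta = [m + sum(low[i][k] * z[k] for k in range(i + 1)) for i, m in enumerate(mu)]
    peak = max(eta)  # subtracted for numerical stability
    weights = [math.exp(e - peak) for e in eta]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical parameters: topics 0 and 1 are positively correlated,
# topic 2 is independent of both.
mu = [0.0, 0.0, 0.0]
sigma = [[1.0, 0.8, 0.0],
         [0.8, 1.0, 0.0],
         [0.0, 0.0, 1.0]]
theta = sample_topic_proportions(mu, sigma, random.Random(0))
```

Here theta plays the role of P(z|d) for a single simulated document; repeated draws would show topics 0 and 1 tending to rise and fall together.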
The model is trained on N = 120,000 articles randomly sampled from those between 1980 and 2020 that belong to any of the categories Theory-HEP, Phenomenology-HEP, Experiment-HEP, and also Lattice (a theoretical approach to HEP, with ties to both theory and phenomenology, and in which we expected supersymmetry to potentially arise as well). The procedures for extracting the input vocabulary and for choosing the hyper-parameters are described in detail in appendices A.3.1 and A.3.2 respectively. Two methodological contributions can be highlighted. First, we included informative n-grams matching pre-defined syntactic patterns in the vocabulary in order to preserve more semantic information. Second, we made a prudent and balanced use of perplexity and topic coherence measures in order to recognize the advantages and limitations of both these kinds of measures for assessing the quality of topic models and choosing the best hyper-parameters. The procedure resulted in the extraction of 75 topics.

Interpretation and validation
Once the model was trained, we manually assigned a label to each topic, by inspecting and interpreting their top-words and the categories from the PACS classification of the physics literature that were most correlated with each topic23. Informing our interpretation of each topic with these correlations, rather than with the top-words alone, helps overcome issues associated with the interpretation of fat-tailed topic-word distributions based on a handful of top-words (Chang et al., 2009; Allen & Murdock, 2022). We failed to provide a meaningful label for some topics, but this had little impact on the rest of the analysis. Finally, in order to assess the meaningfulness of the metrics produced by the model (the document-topic distributions and the topic-word distributions), we performed an additional validation procedure using the PACS classification of the literature and the input of independent experts (see appendix A.3.3).
In section 4.2, the model is applied to a number of tasks: the evaluation of the contexts (i.e. topics) in which supersymmetry occurs in the literature, the extent of the correlation between these contexts, and finally the trends in research involving supersymmetry since the start of the LHC.

Locating trades across scientific cultures
In this section, we elaborate a longitudinal methodology for locating trades between scientific cultures, which we use to assess the ability of supersymmetry to enhance trades between the theoretical and phenomenological cultures of HEP throughout time. Trading zones can manifest themselves in a myriad of ways, some of which readily lend themselves to quantitative analysis. For instance, citing the example of quantum chromodynamics, a theory of the strong interaction, Galison notes that "the contact between the experimenters and the phenomenological theorists had grown to the point where Andersson [a theorist] and Hofmann [an experimentalist] could coauthor a Physics Letter" (Galison, 1997, p. 655). In that sense, a paper co-authored by scientists from different cultures is indicative of a trading zone, such that co-authorship data can in principle be used to probe trades across scientific cultures. Another manifestation of trading zones can be found in the citation network, which encodes exchanges of knowledge across publications, and sometimes across subcultures. Indeed, that a phenomenological publication, for instance, cites a theoretical paper indicates that phenomenologists can acknowledge the value and significance of certain theoretical constructs (those present in this specific paper) in their enterprise. Although in principle both the citation networks and the collaboration networks could be used for our purpose, the present analysis will rely on the former. Indeed, the citation graph preserves more information about the directionality of the exchanges involved, which supports the trade metaphor used in Yan et al. 2013. Intuitively, it is also less vulnerable to non-epistemic factors than authorship is (e.g. physicists authoring papers they did not contribute to, as is frequent in large collaborations in the field). In addition, for validation purposes, we show in Appendix A.4 (Fig. 10) that the citation network can indeed reveal the relative autonomy (self-reliance) of HEP subcultures, but also the special role of the phenomenological subculture in sustaining the unity of HEP by channeling trades across theory and experiment (which hardly communicate directly otherwise). This further supports the use of the citation graph as a means of locating trades.
In order to assess the ability of supersymmetry to facilitate trades between theorists and phenomenologists, we develop a method that combines two important aspects of Galison's trading zones: their locality and their linguistic component (the "inter-language"). In particular, we look for scientific concepts that are most likely to be involved in trades between these subcultures throughout time. To this end, we perform the analysis on a subset of the citation graph, such that the nodes are limited to theoretical and phenomenological papers, excluding cross-listed papers (those that belong to both these categories). For each of these two theoretical cultures, we derive a list of informative keywords from the abstracts of the papers by extracting n-grams (n ≥ 2) matching certain syntactic patterns. We retain the top N keywords (sorted by decreasing unithood) such that at least 95% of the abstracts contain at least one of the N keywords; this yields N = 1370 keywords specific to the phenomenological culture and N = 1770 keywords specific to the theoretical culture. From this we derive a bag of words b_{ik} for each publication such that b_{ik} = 1 if keyword k is present in abstract i, and b_{ik} = 0 otherwise. We then evaluate the probability that the keyword occurs in an abstract given that the paper is involved in a trade between a theoretical and a phenomenological paper at a time t, which we write P(b_k = 1 | trade i→j, t). We consider trades in both directions: first phenomenological papers citing theoretical papers (th→ph), and then theoretical papers citing phenomenological papers (ph→th). The extent to which supersymmetry helps sustain the trading zone between these theoretical cultures is roughly measured by P(b_k = 1 | trade, t) for those keywords k related to supersymmetry. In this analysis, we explore citations appearing in papers published between t = 2001 and t = 2019 (covering similar ranges before and after the start of the LHC), and we include all cited papers published from 1980 onwards; it is unlikely that recent papers would cite publications from before 1980. However, since cross-listed papers, which we excluded, have become much more common in the database starting from 2010 for spurious reasons (a change in the classification procedure), we ran a separate analysis in order to assess the robustness of our results. In this second analysis, we included cross-listed papers and assigned them only one category based on their authors' primary subfield (the subfield to which they contribute the most). We found both analyses to produce similar results. In the following, we report the results obtained by excluding cross-listed papers.
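The quantity P(b_k = 1 | trade, t) can be estimated as a simple frequency over citations, as in the sketch below. The data layout (a dictionary of papers and a list of citing/cited pairs), the function name and the toy records are hypothetical. The sketch assumes that t is the publication year of the citing paper and that the keyword is looked up in the abstract of the cited paper.

```python
def keyword_trade_probability(citations, papers, keyword, year,
                              source="th", target="ph"):
    """Estimate P(b_k = 1 | trade source->target, t = year).

    A trade source->target is a citation in which a paper of category
    `target`, published in `year`, cites a paper of category `source`.
    Returns None if no such trade exists for that year.
    """
    hits = total = 0
    for citing_id, cited_id in citations:
        citing, cited = papers[citing_id], papers[cited_id]
        if citing["year"] != year:
            continue
        if citing["category"] != target or cited["category"] != source:
            continue
        total += 1
        hits += keyword in cited["keywords"]
    return hits / total if total else None

# Toy data, for illustration only.
papers = {
    "T1": {"category": "th", "year": 1998, "keywords": {"extra dimensions"}},
    "T2": {"category": "th", "year": 1999, "keywords": {"supersymmetric model"}},
    "P1": {"category": "ph", "year": 2000, "keywords": {"dark matter"}},
    "P2": {"category": "ph", "year": 2001, "keywords": {"dark matter"}},
    "P3": {"category": "ph", "year": 2001, "keywords": {"low energy"}},
    "P4": {"category": "ph", "year": 2005, "keywords": {"higgs boson"}},
}
citations = [("P2", "T1"), ("P2", "T2"), ("P3", "T1"), ("P4", "T2"), ("P3", "P1")]

p_extra_2001 = keyword_trade_probability(citations, papers, "extra dimensions", 2001)
p_susy_2005 = keyword_trade_probability(citations, papers, "supersymmetric model", 2005)
```

The citation from P3 to P1 links two phenomenological papers and is therefore not counted as a trade. Swapping source and target gives the ph→th direction.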

Theory and phenomenology as distinct subcultures
Let us now examine our first claim that "theory" and "phenomenology" should be regarded as distinct subcultures within high-energy physics. The claim requires that these categories mobilize distinct bodies of knowledge which manifest themselves through distinct vocabularies. As shown in Table 1, it is indeed possible to predict with reasonable accuracy whether a paper belongs to either one of these categories based on the vocabulary in its abstract. The accuracy is higher than 90% for "theory" and reaches 86% for "phenomenology", far above what one would obtain by assigning the most probable class irrespective of the contents, purely based on class frequency. This conclusion holds throughout the whole historical period considered (see Appendix A.2). This supports the existence of a linguistic divide between these two theoretical cultures over the years 1980 to 2020.
Our model also unveils the expressions that are most capable of discriminating between theory and phenomenology, as shown in Table 2. One striking difference between theory and phenomenology appears to be the importance of space-time related concepts in theory ("space-time", "geometry", "manifold", "dimension", "coordinate", etc.). The objects (entities) of interest also differ, which signals an ontological divergence: on the pure theory side, "black hole[s]" and "strings" are prominent entities, while particles ("quark", "neutrino", "gluon", "hadron", "nucleon", etc.) belong to the realm of phenomenology. Among those terms most specific to phenomenology but absent in pure theory, we also find the notions of model ("mssm", "standard model") and effective field theories ("effective theory", "chiral perturbation theory"), which are approximate theories emerging from more fundamental theories. Moreover, the mention of "experimental data" is a distinctive feature of phenomenology: theory is not directly committed to establishing a connection with empirical results. Interestingly, one aspect of supersymmetry (the MSSM) appears as markedly phenomenological, while "supergravity" is specifically theoretical. Similarly, keywords that discriminate the most between experiment and phenomenology are shown in Table 3. They confirm the theoretical ("model", "scenario", "effective theory", "implication") and computational ("estimate", "approximation", "contribution", "numerical result", "correction") nature of phenomenology, as opposed to the empirical, "fact-based" dimension of experiment ("measurement", "search", "experiment", "event", "result", "evidence", "data").

                 Theory   Phenomenology   Experiment
Model accuracy   91%      86%             92%
Baseline         55%      51%             84%

Table 1 Accuracy of the model for predicting which categories HEP papers belong to. The precision of the model for each category is estimated based on the test corpus. For reference, the accuracy of a naive model that assigns the most likely class irrespective of any information about the papers is given (baseline). The size of the vocabulary used for the predictions is set to 500 words and expressions.

Fig. 2 Relative fraction of articles from any of the categories "Experiment", "Phenomenology" and "Theory", for 2,500 HEP physicists. Each physicist among those sampled is represented by a red dot on the diagram, positioned according to the estimate of (p_{i,exp}, p_{i,ph}, p_{i,th}), the probability that any of their articles belongs to each of those three categories. The dashed lines, along the direction of the arrows, form a grid along which one can read the relative importance of each category for every physicist. Physicists near the vertices of the triangle contribute almost exclusively to one category; those near an edge contribute quasi-exclusively to two categories. Most physicists are located near a vertex, thus contributing to mostly one category.
What about the "demographic component" of the divide between theory and phenomenology? Do these categories have social counterparts? The results of our social analysis are shown in Figure 2. Figure 2 is a ternary diagram in which each red dot represents a physicist and is positioned according to the relative prevalence of each category (among experiment, phenomenology and theory) among the papers they authored or co-authored. The majority of the dots are clustered near vertices, which means that most physicists dedicate themselves to mostly one of these categories. In particular, the inner part of the ternary diagram, which corresponds to physicists with balanced contributions to each category, is almost empty. We do find that some authors are scattered along the experiment-phenomenology edge and the phenomenology-theory edge; still, our results suggest that the category of phenomenology does feature a "demographic" counterpart as well, although it is more porous than experiment or pure theory. Therefore, phenomenologists do, to some extent, constitute a social group distinct from that of theorists (and experimentalists); however, phenomenology seems to play a special role in sustaining some form of cooperation between experimentalists and theorists. Overall, we find that 81% of high-energy physicists publish more than 80% of their papers in just one of these categories, which is clear evidence of specialization.
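The specialization figure quoted above is a simple share computed over the per-physicist estimates. A minimal sketch, assuming the estimated fractions are available as (experiment, phenomenology, theory) triples; the function name and the four sample physicists are hypothetical:

```python
def specialist_share(fractions, threshold=0.8):
    """Share of physicists whose largest category fraction exceeds `threshold`.

    fractions: list of (p_exp, p_ph, p_th) triples, one per physicist.
    """
    specialists = sum(1 for f in fractions if max(f) > threshold)
    return specialists / len(fractions)

# Hypothetical estimates for four physicists, for illustration only.
sample = [
    (0.95, 0.05, 0.00),  # experimentalist
    (0.10, 0.85, 0.05),  # phenomenologist
    (0.00, 0.45, 0.55),  # sits on the phenomenology-theory edge
    (0.02, 0.08, 0.90),  # theorist
]
share = specialist_share(sample)
```

In this toy sample, three of the four physicists exceed the 80% threshold in a single category.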
Our quantitative analysis supports our claim that, at least over the years 1980 to 2020, theory and phenomenology should be regarded as distinct subcultures with partially distinct languages. Consequently, strategies ought to be devised for them to properly communicate and coordinate their efforts, as long as physicists believe it to be necessary or worthwhile. It follows that their unity should not be assumed; instead, why a trading zone may be successfully worked out remains to be explained. Before we turn to the ability of supersymmetry to sustain the coordination between these subcultures, we will address the plurality of supersymmetry research itself.

The plurality of supersymmetry
In this section we apply our methods to address our second claim regarding the plurality of supersymmetry research and the recent decline in phenomenological supersymmetry research as a response to LHC results.
Topic models are able to link one word to several topics, thus allowing us to unveil different aspects of supersymmetry, i.e. different contexts24 in which this concept may occur. For three words w that explicitly refer to supersymmetry ("supersymmetry", "supersymmetric", "susy"25), we evaluated the probability P(z|w) that these words occur in the context of a topic z according to: P(z|w) = P(w|z) P(z) / P(w), where P(w|z) is the frequency of the term w within the topic z, P(z) is the marginal probability of topic z, and P(w) is the overall term-frequency of w. The five most probable topics for each of the words "supersymmetry", "supersymmetric", and "susy" are shown in Figure 3 (for the other topics, the probability P(z|w) for w ∈ {"supersymmetry", "supersymmetric", "susy"} is residual). We can see that each of these terms may indeed occur in relation to a variety of topics: "supersymmetric theories" (which comprise supersymmetry in string theory, or supersymmetric gauge theories in general), "sigma models (?)", "Higgs sector beyond the SM", "supergravity", "Higgs boson", "supersymmetric particles", "flavor physics". The meaning of the "sigma models" context is unclear, although it comprises most occurrences of terms relating to superspaces and superfields. These concepts are directly tied to supersymmetry. They arise from the abstract extension of space by introducing extra anti-commuting coordinates. That supersymmetry spans distinct topics constitutes evidence for the diversity of its uses. It is also notable that several of these topics are in fact dominated by supersymmetry ("supersymmetric theories", "supergravity" and "supersymmetric particles"). This stresses the importance of supersymmetry in the high-energy physics literature.
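The inversion from P(w|z) to P(z|w) is an application of Bayes' rule and can be sketched as follows. The function name and all numerical values are hypothetical. P(w) is obtained here by summing P(w|z) P(z) over topics, which assumes that the topics listed exhaust the corpus.

```python
def topic_given_word(p_word_given_topic, p_topic):
    """Return {topic: P(z|w)} from {topic: P(w|z)} and {topic: P(z)}."""
    joint = {z: p_word_given_topic[z] * p_topic[z] for z in p_topic}
    p_word = sum(joint.values())  # P(w) = sum over z of P(w|z) P(z)
    return {z: joint[z] / p_word for z in joint}

# Hypothetical values for a single word and three topics (illustration only).
p_w_z = {"supergravity": 0.02, "supersymmetric particles": 0.06, "flavor physics": 0.001}
p_z = {"supergravity": 0.3, "supersymmetric particles": 0.1, "flavor physics": 0.6}
posterior = topic_given_word(p_w_z, p_z)
```

In this toy case, the word is three times more frequent within "supersymmetric particles" than within "supergravity", but the latter topic is three times more prevalent, so both contexts end up equally probable.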
Moreover, although all these words ("supersymmetry", "supersymmetric" and "susy") should refer to the same concept, we find that they are in fact related to different topics: "supersymmetry" seems to encompass more theoretical aspects of supersymmetry (e.g. supergravity) while "susy" is more likely to occur in relation to supersymmetric particles (phenomenological supersymmetry). In fact, we find that 60% of papers mentioning "supersymmetry" belong to theory (versus ∼40% to phenomenology) while only 30% of papers mentioning "susy" in their abstract belong to "theory" (versus 70% to phenomenology).
That these topics are at least partially independent can be assessed by inspecting the covariance matrix Σ of the Correlated Topic Model from which they were derived. We therefore compute the correlation matrix26 between the seven topics most commonly associated with supersymmetry; the results can be found in Figure 4. Overall, the correlations are close to 0, which suggests that these topics are rather independent, with a few exceptions. In particular, pairs of topics that belong to the same kind (theoretical or phenomenological) are moderately correlated; pairs of topics that are directly tied to supersymmetry (e.g., supergravity and phenomenological supersymmetry) but of a different nature (in this case, theoretical and phenomenological, respectively) are less correlated. Further visual evidence is provided in Figure 9 (Appendix A.3.5).
From these results, one can see that supersymmetry is itself a diverse concept. It arises in a variety of partially independent contexts. In particular, theoretical and phenomenological aspects of supersymmetry are quite independent. How have these different aspects of supersymmetry evolved after the negative results of the searches for supersymmetric particles at the LHC?
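Going from a covariance matrix to Pearson correlations is a standard normalization, R_{ij} = Σ_{ij} / sqrt(Σ_{ii} Σ_{jj}). A minimal sketch follows; the function name and the 2×2 covariance matrix are hypothetical and do not come from the fitted model.

```python
import math

def correlation_matrix(sigma):
    """Convert a covariance matrix (list of lists) into a correlation matrix."""
    n = len(sigma)
    return [
        [sigma[i][j] / math.sqrt(sigma[i][i] * sigma[j][j]) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical covariance between two topics, for illustration only.
sigma = [[4.0, 1.0],
         [1.0, 1.0]]
corr = correlation_matrix(sigma)
```

The diagonal of corr equals 1 by construction, and the off-diagonal entries lie between -1 and 1.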
24 Like Allen and Murdock (2022), we caution that these "topics" may not be as coherent as the common understanding of the word may suggest and that they should really be understood as different "contexts", although we use both terms interchangeably below.
25 Short for "supersymmetry".
26 The Pearson correlation coefficients R_{ij} can be deduced directly from the covariance matrix Σ of the CTM model -cf.

Fig. 3 The many uses of supersymmetry. For three terms w referring to supersymmetry ("supersymmetric", "supersymmetry", and "susy"), the five topics z that are most likely to have led to their occurrence and their respective conditional probability P(z|w) are shown. "Supersymmetry" and "supersymmetric" have similar distributions, and mostly occur within theoretical topics. The topic distribution of "susy" is much more peaked, and this term most often occurs within phenomenological topics.
In order to address this question, we evaluate the evolution of supersymmetry research in HEP since the first results of the LHC (2011) until today. For that, similarly to Hall et al. (2008), we assess the relative importance θ_{z,y} of each topic z for every year y from 2011 to 2019: θ_{z,y} = P(z|y) = (1/D_y) Σ_d P(z|d), where the sum runs over the articles d first submitted in year y and D_y is the number of such articles. We then selected the three topics with the highest increase (rising topics) and decrease (declining topics) in magnitude over this period. For that, P(z|y) was fitted to a linear time trend (P(z|y) = a_z y + b_z), discarding topics for which the correlation was not significant (i.e. we retained only those for which R = 0 is excluded from the 99% CI). Then, the topics were sorted according to the best fit value of a_z, the rate of increase of their magnitude per year (similarly to what was done in Griffiths and Steyvers 2004). We apply the procedure to all papers mentioning at least one of the words "supersymmetric", "supersymmetry" or "susy" in their title or abstract in the years following the start of the LHC. The results are shown in Figure 5.
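The trend estimation can be sketched in two steps: averaging the document-topic probabilities per year, then fitting a least-squares line to the yearly values. The function names, data layout and toy numbers are hypothetical, and the significance filter on the correlation is not implemented in this sketch.

```python
def yearly_prevalence(docs, topic):
    """Mean of P(z|d) per year. docs: list of (year, {topic: P(z|d)}) pairs."""
    sums, counts = {}, {}
    for year, dist in docs:
        sums[year] = sums.get(year, 0.0) + dist.get(topic, 0.0)
        counts[year] = counts.get(year, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def linear_trend(series):
    """Ordinary least-squares slope a and intercept b of value = a * year + b."""
    years = sorted(series)
    n = len(years)
    mean_y = sum(years) / n
    mean_v = sum(series[y] for y in years) / n
    sxx = sum((y - mean_y) ** 2 for y in years)
    sxy = sum((y - mean_y) * (series[y] - mean_v) for y in years)
    slope = sxy / sxx
    return slope, mean_v - slope * mean_y

# Toy corpus of four abstracts, for illustration only.
docs = [
    (2011, {"supersymmetric particles": 0.5}),
    (2011, {"supersymmetric particles": 0.3}),
    (2012, {"supersymmetric particles": 0.3}),
    (2013, {"supersymmetric particles": 0.2}),
]
series = yearly_prevalence(docs, "supersymmetric particles")
slope, intercept = linear_trend(series)
```

A negative slope marks a declining topic; sorting the topics by slope yields the rising and declining topics.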
According to these results, the most rapidly declining topics among articles that mention supersymmetry are Higgs-sector related topics and phenomenological supersymmetry, i.e. phenomenological aspects of supersymmetry. By contrast, two of the (relatively) increasingly active topics are very theoretical (in particular, Supergravity and Conformal Field Theory). In order to understand these dynamics, it is therefore necessary to distinguish theoretical supersymmetry from phenomenological supersymmetry. Shifman's assessment strikingly converges with the patterns that emerge from our analysis. Topic models reveal the plurality of supersymmetry in high-energy physics. They support the view that supersymmetry arises in different contexts, some theoretical and others phenomenological. They allowed us to demonstrate that these "faces" of supersymmetry have responded differently to the absence of evidence for supersymmetric particles at the LHC. Indeed, although phenomenologists find supersymmetry to be much less valuable in the light of the most recent experimental findings, theorists may still rely on it for their own endeavor. This supports the view that cultures can "trade" certain concepts (according to Galison's terminology) while retaining much of their autonomy, including in their own appraisal of the usefulness of these concepts27. In the next section, we investigate the contribution of supersymmetry to sustaining the trading zone between these theoretical traditions throughout time.

Fig. 4 Correlation between the topics most associated with supersymmetry. The Pearson correlation lies between -1 (perfect anti-correlation) and 1 (perfect correlation). A correlation close to 0 means that a pair of topics is partially independent, i.e. that they can arise or not in variable proportions in a paper.

Fig. 5 Topic trends among papers mentioning supersymmetry (2011-2019). On the left, the three topics that are declining the fastest: "Supersymmetric particles", "Higgs sector beyond the SM" and "Higgs boson". On the right, the three fastest rising topics: "Supergravity", "Amplitudes and Feynman Diagrams" and "Conformal Field Theory".

Supersymmetry in the trading zone between theory and phenomenology
Which concepts sustain trades within HEP? As proposed in Section 3.4, we measure the ability of certain concepts (keywords) to sustain trades through time in terms of the probability that each of these concepts occurs in citations across theory and phenomenology. The results are shown in Figures 6 and 7.
Both figures show the probability of occurrence of the five most common keywords (left side) and the five most common supersymmetry-related keywords (right side) involved in trades across these subcultures (excluding redundant keywords). Figure 6 shows those probabilities for trades where phenomenological papers draw from theoretical papers. Three main trends are revealed: the fall of trades involving extra dimensions (hypothesized spatial dimensions beyond the 4 space-time dimensions for which we have direct evidence); the increase in trades involving black holes; and, directly relevant to our third claim, the decline of trades involving supersymmetry, despite a short-lived increase after the start of the LHC in 2010. Interestingly, in the early 2000s, "supersymmetric model[s]" had a trade-ability on par with that of the keywords most involved in these trades. Moreover, "low energy" appears to be one of the most frequent keywords in phenomenological imports of theoretical papers, which makes sense since the low-energy limit of theories of, say, strings and quantum gravity is what matters most from a phenomenological standpoint (it is what can be observed).

Fig. 6 Inside the trading zone: probability that certain keywords appear in the abstract of a theoretical paper involved in a trade (a phenomenological paper citing a theoretical paper). Left: the five keywords with the highest peak probability of occurrence. Right: the five keywords with the highest probability of occurrence among supersymmetry-related keywords. Redundant keywords (those whose normalized pointwise mutual information with a more frequent keyword exceeds 0.9) are excluded.
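The redundancy filter mentioned in the caption of Fig. 6 can be illustrated with a minimal sketch. It assumes that co-occurrence is counted at the level of abstracts and that, within a pair of keywords whose normalized pointwise mutual information (NPMI) exceeds the threshold, the less frequent one is dropped (ties are broken alphabetically, which is our own convention). The function names and the toy corpus are hypothetical and are not taken from the authors' code.

```python
import math
from itertools import combinations


def npmi(n_x, n_y, n_xy, n_docs):
    """Normalized pointwise mutual information computed from document counts."""
    if n_xy == 0:
        return -1.0  # the two keywords never co-occur
    if n_xy == n_docs:
        return 1.0   # both keywords occur in every document
    p_x, p_y, p_xy = n_x / n_docs, n_y / n_docs, n_xy / n_docs
    return math.log(p_xy / (p_x * p_y)) / -math.log(p_xy)


def redundant_keywords(docs, threshold=0.9):
    """Return keywords that are redundant with a more frequent keyword.

    docs: list of sets of keywords, one set per abstract.
    """
    n_docs = len(docs)
    freq, cooc = {}, {}
    for kws in docs:
        for k in kws:
            freq[k] = freq.get(k, 0) + 1
        for a, b in combinations(sorted(kws), 2):
            cooc[(a, b)] = cooc.get((a, b), 0) + 1
    redundant = set()
    for (a, b), n_ab in cooc.items():
        if npmi(freq[a], freq[b], n_ab, n_docs) > threshold:
            # Drop the less frequent keyword; on a tie, drop the later one.
            redundant.add(a if freq[a] < freq[b] else b)
    return redundant


# Toy corpus (hypothetical): "supersymmetric model" almost always
# co-occurs with the slightly more frequent "supersymmetry".
docs = (
    [{"supersymmetric model", "supersymmetry"}] * 99
    + [{"supersymmetry"}]
    + [{"dark matter"}] * 100
    + [{"black hole"}] * 800
)
print(redundant_keywords(docs))  # {'supersymmetric model'}
```

Note that NPMI only approaches 1 when two keywords nearly always appear together, so a threshold of 0.9 removes near-duplicates rather than merely related terms.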
Turning to Figure 7 (trades in which theoretical papers cite phenomenological papers), we get an even more striking picture of the demise of "extra dimensions", which were involved in about 30% of the trades in 2001 and have since fallen to only 5%. Similarly, "weak-scale", which refers to the domain of phenomena targeted by the LHC, has become much less frequent in the "trading zone" (from ∼10% of trades to ∼2%). This suggests that phenomenological models dedicated to this domain of phenomena have become much less useful to the "theoretical" subculture over time. On the other hand, "dark matter"28 is increasingly common in the phenomenological papers theorists draw from, which suggests that dark matter is deemed valuable for the theoretical enterprise as well. This figure also confirms the overall decline of supersymmetry in the trading zone, thus providing further support for our third claim: supersymmetry does not connect developments from the theoretical programs to progress in the phenomenological program as much as it did prior to the LHC.
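The kind of measurement reported in Figures 6 and 7 can be sketched in a few lines. The sketch assumes that each trade, for one direction of citation, is recorded as a pair (year, keywords found in the abstract of the cited paper), and it estimates the probability of a keyword as the fraction of that year's trades whose cited abstract contains it. The data layout, the function name and the toy records are hypothetical; the estimator actually used is the one defined in Section 3.4, which may differ in its details.

```python
from collections import Counter, defaultdict


def keyword_trade_probability(trades):
    """Estimate P(keyword appears in the cited abstract | trade in a given year).

    trades: iterable of (year, keywords) pairs, where `keywords` lists the
    keywords found in the abstract of the cited paper.
    Returns {year: {keyword: probability}}.
    """
    n_trades = Counter()
    hits = defaultdict(Counter)
    for year, keywords in trades:
        n_trades[year] += 1
        for kw in set(keywords):  # count each keyword once per trade
            hits[year][kw] += 1
    return {
        year: {kw: n / n_trades[year] for kw, n in counts.items()}
        for year, counts in hits.items()
    }


# Toy example (hypothetical data): phenomenological papers citing theory.
trades = [
    (2001, ["extra dimension", "low energy"]),
    (2001, ["extra dimension"]),
    (2001, ["supersymmetric model"]),
    (2001, ["black hole"]),
    (2015, ["black hole", "dark matter"]),
    (2015, ["black hole"]),
]
probs = keyword_trade_probability(trades)
print(probs[2001]["extra dimension"])  # 0.5
print(probs[2015]["black hole"])       # 1.0
```

Running the same function separately on each direction of citation would give the two sets of curves, one per figure.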

Fig. 7 Inside the trading zone: probability that certain keywords appear in the abstract of a phenomenological paper involved in a trade (a theoretical paper citing a phenomenological paper). Left: the five keywords with the highest peak probability of occurrence. Right: the five keywords with the highest probability of occurrence among supersymmetry-related keywords. Redundant keywords (those whose normalized pointwise mutual information with a more frequent keyword exceeds 0.9) are excluded.

Conceptually informed methods in quantitative science studies
Before exploring the implications of this case study, we want to emphasize that Galison's conceptual framework has been a fruitful guide for our quantitative approach. The linguistic component of his notion of subculture led us to build a bag-of-words model for measuring the extent of the divide between two theoretical cultures, and for unveiling the concepts that are specific to these cultures as well as their methodological and ontological differences. The social autonomy of these subcultures, too, can be readily quantified from authorship data. Furthermore, the notion of trading zone invited us to explore citations quantitatively (as a proxy for scientific "trades") while devising ways to determine their "location" in the semantic space. We also found that topic models can reveal the plurality of contexts in which a concept may arise, and how the dynamics of these contexts compare throughout time. Although we have applied our topic model approach to supersymmetry, in principle it can be applied to any kind of "boundary object", understood in the broad sense of a shared notion that allows some coordination to be achieved while preserving the distinctness of the scientific cultures at play. In the end, these methods illuminated our study of supersymmetry in high-energy physics, and provided further grounds for Galison's claim that unity is a contingent matter.

Unity challenged?
The two theoretical subcultures we have distinguished, pure "theory" and phenomenology, no longer seem to value supersymmetry equally. Supersymmetry fails to provide equally satisfying solutions to the heterogeneous commitments of HEP physicists, which poses a challenge to the unity of the field. Indeed, the example of supersymmetry shows that what drives theoretical progress may not drive phenomenological progress (in contrast with the expectations of the community regarding supersymmetry prior to the LHC, as surveyed by Mättig and Stöltzner 2019, 2020), and developments in these subcultures may become quite orthogonal.
Of course, supersymmetry is not the only channel of coordination between the theoretical and phenomenological cultures in their search for "new physics". Another channel, for instance, has been the notion of extra dimensions (see Figures 6 and 7), which dominated trades in the early 2000s to an extent we did not expect before conducting this analysis. Extra dimensions are required by string theory, but they are also subject to trades with phenomenologists interested in their observable consequences. However, no evidence for extra dimensions was found at the LHC. This further supports the view that the goals that drive theoretical research programs such as string theory (like the search for a quantum description of gravity) may not serve the phenomenologists' agenda so well.
Eventually, the LHC provided "a test of the unity of physics",29 and its verdict was ruthless. In the future, will the field strive to regain unity (possibly to the detriment of certain research programs), or will the socially entrenched divergences between these "cultures" of high-energy physics prevail? We may assume that the challenge is merely transitory, and that theorists will eventually move to other theories which will be more successful from an empirical or phenomenological standpoint. However, the divergence between these theoretical cultures has become axiological (Camilleri & Ritson, 2015; Laudan, 1984), in the sense that they prioritize different epistemic goals,30 and this divergence may persist as long as their differences in aims persist; as Galison puts it, "there is no teleological drive towards ever-greater cohesion", and "fields previously bound [may] fall apart" (Galison, 1997, p. 805). As illustrated in Figure 1, the aim of the theorists is to achieve the unification of the fundamental forces and a coherent theory of quantum gravity. By contrast, the aim of phenomenologists is to guide experiments towards promising directions where evidence of "new physics" may be found. Both these aims may seem well-founded; however, there is no reason to expect that a simultaneous solution can be worked out. The apparent failure of supersymmetry to provide such a simultaneous solution does not by itself undermine the relevance of the aims of "theorists", nor does it undermine the methodology they deploy for addressing their goals (e.g. their trust in certain theoretical constraints, cf. Galison 1995). It does, however, challenge the belief that such methods can provide grounds for progress to the field as a whole; indeed, unification and quantum gravity might ultimately not provide much reliable guidance to the experimental side. Conversely, it can very well be that the details of the theory "at high energy", where quantum effects matter to gravity, cannot be extrapolated from our knowledge of the low-energy theory, i.e. the one that we can probe in our experiments. As a result, Dawid (2013) argues for recourse to meta-empirical assessment of theories in theoretical physics, given that empirical input underdetermines the directions of potential progress in quantum gravity. Disagreements about the aims of a scientific enterprise may not always be resolved on purely epistemic grounds, and a resolution, provided it occurs, may involve some sort of negotiation instead. As long as theorists believe in the feasibility of their aims, they may pursue these aims even if it further isolates them from other cultures.31 Alternatively, they could decide that the schism should be resolved; as Galison puts it, distinct scientific cultures "can [. . .] understand that the continuation of exchange is a prerequisite to the survival of the larger culture of which they are part" (Galison, 1997, p. 803).

Trading zones as a means to sustain diversity
More generally, the example of HEP and supersymmetry demonstrates how disunity can be endogenously produced in the fabric of science. Even initially tightly bound scientific cultures can diverge into quite distinct and autonomous programs, with different ontologies, methodologies and aims, as new domains of inquiry open up (e.g. quantum gravity) and warrant new modes of knowing. The extent of the coordination between disciplines will in general depend on epistemic factors (depending on how fruitful certain "trades" turn out to be), but also on non-epistemic factors: for instance, it may depend on the institutional setting, or whether such exchanges are incentivized or "coerced" (Collins et al., 2010).
Paradoxically, trading zones can stabilize the heterogeneity of cultures within a field, by sustaining the practitioners' beliefs that, in spite of the large differences in what they are doing, their respective efforts somehow support each other. If that is the case, there is no perceived need for a profound re-alignment of their respective practices. Trading zones can contribute, therefore, to a mutual process of legitimization of heterogeneous scientific practices, which is not necessarily tantamount to further ontological unity. To emphasize this point, it is useful to come back to the example of HEP, and most particularly that of string theory, a highly theoretical research program driven by the pursuit of a consistent theory of quantum gravity. String theorists such as Matt Strassler have argued that even if string theory did not directly provide testable predictions to phenomenologists and experimentalists, it generated mathematical tools that could be useful to their practice, e.g. for predicting the behavior of quark-gluon plasma (Ritson, 2021). Consequently, phenomenologists may have a low appraisal of string theory in terms of its ability to generate models for testing its assumptions about nature, while still recognizing the usefulness of what string theorists do for them, since some of their work is effectively "applicable". As Ritson and Camilleri (2015) put it, "if string theory has proved so useful for branches of physics whose scientific status is not in question, it can be argued it forms a legitimate part of physics". Supersymmetry itself may be experiencing the same fate, considering that "supersymmetry as a tool for exploring gauge dynamics at strong coupling [. . .] is taking precedence over phenomenology" (Shifman 2020, p. 7-8). Such trades do support the usefulness of the theoretical program to other endeavors, without necessarily implying further integration of the subcultures of HEP (ontological unity), just as successful interdisciplinary work does not necessarily amount to further integration of disciplines (Grüne-Yanoff, 2016).

29 Wilson (1986, p. 29) (cited in Cat 1998, p. 292) used this expression in reference to the now aborted Superconducting Super Collider.
30 Laudan (1984) refers to disagreements about the goals of scientific inquiry as axiological. Camilleri and Ritson (2015), for instance, have argued that certain controversies around string theory could be understood as an instance of axiological disagreement.
31 More drastically, Cao and Schweber (1993) expressed the view that the theories at different energy scales (i.e. corresponding to different ranges of phenomena) are irreducible, and they argued for a "pluralist view of possible theoretical ontologies" while challenging the possibility of achieving an "ultimate stable theory of everything" (pp. 69-71). According to this view, the plurality of ontologies in physics is not an accident but the result of partially disconnected "phenomenological domains" whose knowledge cannot be deduced from one another. For a criticism of this view, see Rivat and Grinbaum 2020.

Limitations and future work
Before concluding, we would like to hint at several directions for future work that could overcome certain limitations of the present methodology and further inform the question of the disunity of science.
First, none of our semantic methods distinguished between different kinds of words, i.e. between words that refer to, say, methods (such as computation techniques) and words that refer to entities (e.g. strings, particles, etc.). It would be interesting to evaluate to what extent the coordination between theoretical cultures involves ontological or merely methodological trades, depending on whether the constructs of high theory are referred to as the proper description of nature or as mere mathematical tools, and how this may have changed throughout time. This might uncover evidence for a shift from an ontological to a more methodological coordination between the subcultures of high-energy physics, as the arguments for supersymmetry and string theory as "tools" rather than accurate accounts of the natural world suggest.
Another direction for future work involves the topic model approach. Although the topic model used in this work yielded seemingly acceptable results overall, some topics were difficult to interpret. In that respect, we made several improvements compared to previous works, by training the model not just on single words but also on n-grams matching specific and presumably semantically informative syntactic patterns, and by informing our interpretation of topics using correlations with a standard classification (rather than the top words only). Yet, further improvements could be made. First, vocabulary selection could be enhanced by a better handling of mathematical expressions, for instance by parsing LaTeX formulas. The NLTK library picked up some of these expressions, and since they captured some information about the documents, we did not exclude them from the vocabulary; however, this way of proceeding does not preserve the underlying mathematical structure, although it may be valuable to distinguish references to, say, specific particles, or certain symmetry groups, based on their mathematical notations. We may also want the model to learn to discard uninformative words such as "result", "parameter", "model", etc. In our case, we found such vague words to be clustered in three topics that we labeled as "jargon", which correlated very poorly with the standard classification (see Tables 4 and 5), but they should ideally not emerge as distinct topics on par with more meaningful topics. To this end, we may want to build on Griffiths et al. (2004), who provide a model that is able to distinguish between "semantic" and purely "syntactic" clusters of words without prior knowledge of the language. A more critical limitation of topic models pertains to the challenge of hyperparameter tuning, considering that it is unclear which performance metric should be maximized in the process. Although we proposed a procedure for choosing these parameters that accounts for known limitations to the reliability of perplexity or topic coherence metrics, non-parametric methods may provide a better answer to this fundamental issue (Gerlach et al., 2018).
Finally, the historical scope of our analysis was limited by our database. In particular, we were only able to analyze the theory/phenomenology divide over a restricted time range, and we could not reveal how such a divide has historically emerged. By contrast, Galison has proposed a number of explanations for the earlier decoupling of theory and experiment, such as increased specialization and the increased time-scales of experiments (Galison, 1987, p. 138).
- All single nouns and adjectives are retrieved from these tokens.
- Words and expressions that occur fewer than 20 times are removed.
First, these steps allow us to reduce noise by removing words that convey little to no information about the topics of the articles (such as stop words). Second, extracting n-grams that match certain syntactic patterns allows us to preserve some information about the relative position of words within the abstracts (which CTMs do not otherwise do) while taking advantage of our prior knowledge of the documents' language. For instance, the word "dark" may convey different meanings depending on whether it occurs immediately before the word "matter", or, alternatively, "energy"; similarly, the occurrence of the expression "dark matter" in a text conveys more information than the simultaneous occurrence of "dark" and "matter" without more knowledge about their relative position.
As a result of this procedure, the vocabulary contains V = 18,658 "words", with 58 words per article on average.
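The vocabulary construction described above can be sketched as follows. This is a minimal illustration under several assumptions: abstracts are taken to be already tokenized and tagged with Penn Treebank part-of-speech tags by an upstream tagger; the n-gram pattern (a run of adjectives and nouns ending in a noun, up to three tokens) is a stand-in for the syntactic patterns actually used, which are not reproduced here; and the threshold of 20 is applied to total occurrences in the corpus. All function names are hypothetical.

```python
from collections import Counter

NOUN_TAGS = {"NN", "NNS"}
ADJ_TAGS = {"JJ"}
CONTENT_TAGS = NOUN_TAGS | ADJ_TAGS


def extract_terms(tagged_tokens, max_n=3):
    """Collect single nouns/adjectives and adjective/noun n-grams ending in a noun.

    tagged_tokens: list of (token, tag) pairs for one abstract.
    """
    terms = []
    for i, (tok, tag) in enumerate(tagged_tokens):
        if tag in CONTENT_TAGS:
            terms.append(tok.lower())
        # Candidate n-grams of length 2..max_n starting at position i.
        for n in range(2, max_n + 1):
            window = tagged_tokens[i:i + n]
            if len(window) < n:
                break
            tags = [t for _, t in window]
            if all(t in CONTENT_TAGS for t in tags) and tags[-1] in NOUN_TAGS:
                terms.append(" ".join(w.lower() for w, _ in window))
    return terms


def build_vocabulary(tagged_docs, min_count=20):
    """Keep the terms that occur at least `min_count` times in the corpus."""
    counts = Counter(t for doc in tagged_docs for t in extract_terms(doc))
    return {term for term, c in counts.items() if c >= min_count}


# Toy example (hypothetical data).
doc = [("Dark", "JJ"), ("matter", "NN"), ("is", "VBZ"), ("cold", "JJ")]
print(extract_terms(doc))  # ['dark', 'dark matter', 'matter', 'cold']
```

Keeping "dark matter" as a single vocabulary entry alongside "dark" and "matter" is what lets a bag-of-words model retain the positional information discussed above.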

A.3.2 Hyper-parameters
The implementation of the CTM by Tomotopy (Bab2min et al., 2021) has three hyper-parameters: the number of topics k, an α parameter that controls the sparsity of the document-topic distribution (θ_{d,i}), and an η parameter that controls the sparsity of the topic-word distribution (the vocabulary associated with each topic). For choosing the number of topics k, we considered three values that seemed acceptable in terms of interpretability and compliance with the values from the literature: 50, 75 and [...]

[Residue of Table 4: top terms of an unnamed topic (mass, effect, large, parameter, energy, value, analysis, small, order, region, current, due, contribution, present) and of the topic "Experiments on light" (photon, electron, particle, experiment, mi, laser, compton, optical, mo, beam, light, atom, year, math).]
[Residue of a figure: topic labels "[...]ersymmetric particles", "Higgs sector beyond the SM", "Sigma models (?)", "Flavour physics", "Supersymmetric theories".]

Fig. 8 Accuracy of the text classifier from Section 3.2 as a function of the papers' years of publication. Error bars represent the 95% confidence interval. Dashed lines show the accuracy of the baseline model (which varies only because of variations in the frequency of each category, since the baseline model always predicts the most common class). The accuracy is roughly constant across time for each of the three categories, despite significant variations in the frequency of each class.
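The quantities plotted in Fig. 8 can be illustrated with a short sketch. It assumes a normal-approximation (Wald) interval for the 95% confidence interval and a baseline that always predicts the class that is most common over the whole sample; neither choice is confirmed by the text, and the function name and record layout are hypothetical.

```python
import math
from collections import Counter, defaultdict


def accuracy_by_year(records, z=1.96):
    """Per-year accuracy, confidence interval and majority-class baseline.

    records: iterable of (year, true_label, predicted_label).
    Returns {year: (accuracy, ci_low, ci_high, baseline_accuracy)}.
    """
    records = list(records)
    # The baseline always predicts the overall most common class.
    majority = Counter(true for _, true, _ in records).most_common(1)[0][0]
    by_year = defaultdict(list)
    for year, true, pred in records:
        by_year[year].append((true, pred))
    out = {}
    for year, pairs in sorted(by_year.items()):
        n = len(pairs)
        acc = sum(t == p for t, p in pairs) / n
        half = z * math.sqrt(acc * (1 - acc) / n)  # Wald half-width
        base = sum(t == majority for t, _ in pairs) / n
        out[year] = (acc, max(0.0, acc - half), min(1.0, acc + half), base)
    return out
```

With this definition the baseline accuracy in a given year equals the share of the majority class in that year, which is why it can drift over time even though the baseline itself never changes.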

Table 2: Vocabulary specific to phenomenology (left column) versus theory (right column).

Table 3: Vocabulary specific to phenomenology (left column) versus experiment (right column).

Table 4: Most frequent terms for each topic.

Table 5: PACS categories most correlated with the topics derived with the unsupervised model. Correlation is measured as the pointwise mutual information (PMI).
