Abstract

Recent studies of scholarly data work argue that whatever researchers handle as data and, in particular, what researchers consider as potential evidence for supporting claims, counts as data. In this article I extend the relational approach towards data to the various ways of dealing with data in the course of a single research project. Relying on an example from ecology, I argue that data gain presence for and occupy researchers in manifold ways: for example, as a promise, desire or pressure, or as a problem of trust and strength. In this way we notice that data, in the eyes of researchers, have more than one mode of existence, each of which is linked with its own expectations, challenges and actions. And we notice that these modes of existence not only coexist in the framework of a single project, but can also compete with each other.

1. Introduction

Data, once rarely a central focus of science, is enjoying remarkable attention in the scholarly public today.1 Data, in combination with advanced computational methods, is being pushed as a key resource for new discoveries. Collecting, preparing, and sharing data have become scientific tasks of their own. The reutilization of data generated elsewhere is becoming ever more commonplace. Researchers and science policy often consider data as a commodity, to be conceptualized in categories of ownership and access. Scholars in science studies and in the history and philosophy of science quickly took up the new topics surrounding data. They questioned whether data-driven research is really such a new phenomenon (Müller-Wille 2017; Sepkoski 2013; Strasser 2012, 2019), and what real changes research practices are experiencing with the emergence of a new set of “methods, infrastructures, technologies and skills developed to handle (format, disseminate, retrieve, model and interpret) data” (Leonelli 2014, p. 2; see also Chadarevian 2018; Hilgartner 2017; Leonelli 2016; Stevens 2013, 2017; Strasser 2019). Databases take shape as an obligatory passage point, channeling what can become data, what, by categorization, data represents, and how data can be retrieved (Bowker 2005, ch. 3; Bowker 2000; Decker 2018; Hine 2006; Leonelli 2016; Leonelli and Ankeny 2012). Metadata issues turn out to be central (Mayernik 2019), and yet the information attached to data regularly proves to be less useful than informal communications between researchers (Edwards et al. 2011; Hoeppe 2014; Shen 2017; Zimmerman 2008). We have learned so much about “the costs in time, energy, and attention required simply to collect, check, store, move, receive, and access data” (Edwards 2010, p. 84; see also Beltrame and Jungen 2013; Denis 2018; Heaton and Millerand 2013; Nadim 2016; Pontille 2010). In fact, “data friction” seems to be the rule (Edwards 2010, ch. 5; see also Bowker 2005, ch. 4).

As a result, we have a quite nuanced view of data work in today’s sciences. Yet the object of reference in all of these studies is hardly ever the focus. It almost seems as if data, following the Latin roots of the word “things given” (Rheinberger 2011, p. 337; on the shifts in the meaning of the term “data,” see Rosenberg 2018), is something about which, in and of itself, there is not much to be said. Indeed, there is a consensus in science studies and in history and philosophy of science that any global definition of data must fail. Data in general, without reference to a specific context, is just a collective term for manifold things. As the editors of the OSIRIS volume on Data Histories conclude, “data is what its makers and users have considered it to be” (Aronova et al. 2017, p. 13). The reason for this is not merely the diversity of data types that come into play in scholarly work (Carlson and Anderson 2007). As Sabina Leonelli has argued, “data” must be framed “as a relational category” (Leonelli 2016, p. 78); “What counts as data depends on who uses them, how, and for which purposes” (2016, p. 78).

In this article I want to go one step further. Not only what is considered to be data, but also how data gains presence, is related to what researchers think and do. Even when scientists use the same words to talk about data, they may have quite different things in mind. For example, despite the sheer abundance of data today, researchers frequently claim that they have too little data. Asked for the reasons, some indicate that they don’t have the right data; others answer that they need additional data. Both groups complain of being short on data, but on closer consideration, they have different things in mind. When researchers point out that they don’t have the right data, this is probably because they learned about an aspect of their research object that was not considered in advance. When researchers ask for additional data, they very often want to reinforce a certain claim—for example, in reaction to a negative referee report. In the first situation, researchers frame data as a potential resource for further exploring the research object. In the second situation, researchers frame data as potential evidence that might increase the power of their claims.

Discussing and handling data as a potential resource is fundamentally different from treating data as potential evidence: with respect to the sort of data to be acquired (something else vs. more of the same), with respect to the expectations placed on the data to be acquired (new insights vs. enforcing a claim), and with respect to the necessary activities (retrieving or producing new sorts of data vs. multiplying the production of the same data as before). In the following I argue that the definition of something as data is relational not merely in the sense that it varies with respect to research cultures and research questions. It is also relational in the sense that researchers, depending on their focus of attention and the related challenges during a certain stage of a research project, realize data – in the strong sense of making something real – in several distinct ways of concern.

Investigating the many ways in which data can be realized resembles what Annemarie Mol has labeled the study of “ontology-in-practice” (Mol 2002, p. 150). In The Body Multiple Mol argues that patients and doctors “enact,” i.e., experience and treat, a disease (arteriosclerosis) in numerous clearly discernible ways, which, taken together, all represent modes of the coming into being of one disease. As she concludes, “this ethnography-of-a-disease became a study into the coexistence of multiple entities that go by the same name” (Mol 2002, p. 151). Data, too, is a number of things defined in different ways, depending on the situation in which researchers think of and handle data. However, two differences remain. First, in Mol’s case we learn how a number of clearly distinguishable actors (patients, relatives, doctors in the ward, surgeons, radiologists, pathologists, technicians, etc.) enact multiple versions of arteriosclerosis. From the sample to which I will refer in this essay, I learned how a single team of researchers with largely the same background considers and treats something that always goes by the same name, namely data, in a number of clearly distinctive ways.2 Second, Mol’s study is much more concerned with the question of how unity is achieved despite multiplicity (see Mol 2002, ch. 3–5), whereas I want to show how multiplicity resides in apparent unity.

For my argument I will draw on a particular sample. For a few years now, a research group in marine ecology affiliated with Germany’s Alfred Wegener Institute for Polar and Marine Research has been carrying out long-term monitoring of an ecosystem in the Arctic Ocean (AWI Centre, CSD Research Topic 3). With respect to manpower, expenses and amount of data, this is relatively “small science.” The group relies mostly on its own data, the team consists of a few coworkers, and most of the equipment employs ordinary consumer technology. Furthermore, the theories, models and tools of marine ecology are well established. In one respect, however, the project promises something quite new indeed. Besides surprising insights into an Arctic marine ecosystem, the data themselves represent one crucial outcome of the project, as long-term monitoring of an ecosystem in the Arctic Ocean has, to the best knowledge of the researchers, never before been undertaken in marine biology (Fischer et al. 2017). In this regard, the project may be characterized as “data centric,” a notion developed by Leonelli to label the Big Data approach in genomics and subsequent omics-branches (Leonelli 2016).

It is not obvious which way scientists realize data in a given moment. A “naked” account (hardly conceivable at all) of the data work done by the group won’t provide much information on the various realities of data in scholarly activities. Researchers, absorbed in their work, rarely elaborate on their momentary activities. Direct questions are not necessarily constructive, either. Asked what they do, researchers may answer that they collect, manage, process, analyze, or arrange data. But this alone provides us with no clue about the way in which data occupy researchers at any given moment. This only becomes clear through queries, which, for their part, have the disadvantage that they steer the actors’ attention in certain directions. In this sense, observation inevitably intermingles with interpretation in the presentation of my sample.

Descriptions and conclusions follow from short-term observations at the research group’s base on the island of Heligoland (German North Sea) and informal conversations with the group leader. Notes were taken directly afterward, supported by keywords and photographs taken in the presence of the researchers. My argument is further informed by discussions on digital data work with biologists (one of them the leader of the Heligoland group), philosophers and historians of science, and artists (see Fischer et al. 2020). In the following it is not my intention to portray the monitoring project of the Heligoland group in terms of its scientific details and in its larger science policy context. Nor do I intend to reconstruct a “chain of transformations” in Bruno Latour’s sense (Latour 1999, p. 71), although I do follow the project’s workflow, from the acquisition of research material to the presentation of results.

I focus first on the various realities (modes of existence) which data acquire over the course of a research project. Then I will show that we can better understand scholarly data work by taking into account that data has more than one mode of existence. Two points are central. First: As we saw above, scientists, even when they use similar phrases to speak about data, by no means necessarily have the same thing in mind. If we do not realize that scientists complain about not having enough data for different reasons and with different intentions, not only will misunderstandings result. We will also fail to understand the logic that determines their thinking and actions in any given situation. Second: An essential characteristic of this logic is that the different ways in which scientists realize data in a research project can come into conflict with each other and have to be reconciled.

2. Organisms in the Sea, Data on the Desktop

The main element of the monitoring technology is the remote-controlled underwater observatory REMOS1, located in the Kongsfjord near Ny-Ålesund on the island of Svalbard (Spitsbergen). The device is anchored close to the shore and can be moved up and down in a range between a depth of eleven meters and the surface. A fully automated digital stereo-imaging system takes a pair of photographs of the space in front of the observatory every half hour. In addition, sensors permanently register the temperature, conductivity/salinity, oxygen and turbidity of the water. Epistemologically speaking, each pair of photographs represents a sample. In earlier times researchers simply lowered a bucket into the water or used a fishing net. Because of the time and manpower necessary, this probably would have taken place just a few times a week and only for as long as the formation of ice allowed. With REMOS1 the group obtains 48 samples every day, mostly independent of weather conditions and of permanent local support.

The photographs taken by REMOS1 are automatically transferred to a data collection center on the German North Sea coast, thousands of kilometers south of Svalbard. The group then downloads the photographs from the server to their local computer system at Heligoland and stores them in monthly directories. When the group leader speaks of data in this context, he has in mind the many thousands of JPEG files accumulating over time. In this moment, data are realized as a pile of work. To use the technical term: the team is busy with data management. Possible questions include: is the observatory working without error; is the remote transfer taking place smoothly; is the set of photographs complete; are all files readable; is the directory named correctly; do we need additional storage space; do we have all of the necessary sensor data?
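One of the management questions listed above, whether the set of photographs is complete, can be made concrete with a minimal sketch. It assumes, purely for illustration, that every photograph pair carries a timestamp and that the observatory fires exactly on the hour and half hour; the function name and the example timestamps are hypothetical and are not taken from the group's software.

```python
from datetime import datetime, timedelta

def missing_slots(timestamps, start, end, step=timedelta(minutes=30)):
    """Return the expected half-hourly slots in [start, end) that have no photograph pair."""
    seen = set(timestamps)
    missing = []
    slot = start
    while slot < end:
        if slot not in seen:
            missing.append(slot)
        slot += step
    return missing

# Hypothetical example: one day of recording in which two pairs were lost.
day = datetime(2014, 1, 15)
recorded = [day + timedelta(minutes=30 * i) for i in range(48) if i not in (10, 11)]
gaps = missing_slots(recorded, day, day + timedelta(days=1))
```

A check of this kind only answers whether something is absent; it says nothing about why, which is where the questions about the observatory, the transfer, and the scripts begin.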

Usually, some weeks elapse between the storage and the processing of the photographs. For analysis, the JPEG files are imported into a computer program, and one photograph from each pair is opened. In a first step, the researcher assesses the visual conditions in the photograph, clicks a button on the screen to comment on the result, for example “high turbidity,” and, if necessary, adapts the contrast and brightness of the photograph for a better view. Afterward, s/he inspects the photograph very rapidly and counts any macroorganism recognizable on the screen. Each time a fish, crustacean, jellyfish, or anything else is identified, the researcher clicks the button corresponding to the organism observed. In the next step, each identified organism is measured. For this, a new feature of the program is started. Now the two photographs, left and right, are arranged vertically—one above the other—on the screen. Via the computer mouse the organism is marked lengthwise with a line, and then the program calculates the actual length of the organism by comparing both photographs. Finally, the exact species of the observed organism must be determined; the process often stops here because this is work usually done by outside experts.
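How a length can be derived from two photographs may be illustrated with a textbook sketch. The group's program is not documented here, so the following is not its method: it assumes an idealized, rectified pinhole stereo pair with a known focal length (in pixels) and camera baseline (in meters), ignores refraction in water, and uses hypothetical function and parameter names throughout.

```python
from math import dist

def triangulate(x_left, x_right, y, f_px, baseline_m, cx=0.0, cy=0.0):
    """Recover a 3D point (in meters) from matching pixel positions in a rectified stereo pair."""
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = f_px * baseline_m / disparity  # depth follows from the shift between the two images
    return ((x_left - cx) * z / f_px, (y - cy) * z / f_px, z)

def organism_length(head, tail, f_px, baseline_m):
    """head and tail are (x_left, x_right, y) pixel triples for the two ends of the marked line."""
    p_head = triangulate(*head, f_px, baseline_m)
    p_tail = triangulate(*tail, f_px, baseline_m)
    return dist(p_head, p_tail)  # Euclidean distance between the two reconstructed end points

# Assumed values: focal length 1000 px, baseline 0.1 m, both ends at the same depth.
length_m = organism_length((100, 0, 0), (-100, -200, 0), f_px=1000, baseline_m=0.1)
```

The point of the sketch is only that a single photograph cannot fix the scale of what it shows, whereas the offset between the two photographs can.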

Although I have gone into great detail, the most important step of the process has so far escaped mention. At the very moment the researcher begins examining the photographs, a simple csv table file is generated in the background by the computer program. Here, the basic information regarding the photograph and the various steps of the analysis are recorded on a separate line for each organism, starting with the name of the JPEG file and ending with the length of the organism. This table—usually rendering the results of one month—serves two functions. It allows subsequent researchers to check what has been done up to that moment, because the photographs are frequently processed in more than one session. And secondly, the table provides the basis for any further research action. After the processing of the photographs has been completed, the JPEG files are archived. I will return to this point later.
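The structure of such a table, one line per organism, beginning with the file name and ending with the length, can be sketched as follows. The column names and the example values are hypothetical; the actual layout of the group's csv files is not documented here.

```python
import csv
import io

# Hypothetical column names: file name first, organism length last.
FIELDS = ["jpeg_file", "visibility", "taxon", "length_cm"]

def append_observation(buffer, jpeg_file, visibility, taxon, length_cm):
    """Append one line per identified organism, writing the header on first use."""
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    if buffer.tell() == 0:
        writer.writeheader()
    writer.writerow({
        "jpeg_file": jpeg_file,
        "visibility": visibility,
        "taxon": taxon,
        "length_cm": length_cm,
    })

# An in-memory buffer stands in for the monthly file; all values are invented.
table = io.StringIO()
append_observation(table, "2014-01-15_0500_L.jpg", "good", "fish", 12.5)
append_observation(table, "2014-01-15_0500_L.jpg", "good", "jellyfish", 4.0)
append_observation(table, "2014-01-15_0530_L.jpg", "high turbidity", "crustacean", 1.5)

rows = list(csv.DictReader(io.StringIO(table.getvalue())))
```

Because each line is appended independently, a later session or a second researcher can pick up where the previous one stopped, which corresponds to the first function of the table described above.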

What happened while the researcher was studying the photographs on the screen? Observation alone does not provide insight into this. I can see how the researcher inspects one photograph after the next. I follow the movements of the cursor on the screen and register which buttons are clicked. I admire the rapidity of processing; even the length measurement takes just a few seconds. When the researcher switches to the csv file I can even view the results of the work performed. But the main point is less obvious. In the course of the analysis, a substantial transformation has taken place. First of all, every photograph has been reduced to a line in a table. Going along with this, the content of the photograph is condensed into two numbers and a name: how much, how large, what species. Yet, simultaneously, a considerable gain takes place: What attracted the attention of the researcher—marine macroorganisms, fish, crustaceans, jellyfish, and so on—acquires the character of research data. By this I mean that the data do not swim in the sea. On the contrary: at the very moment the researcher clicks the button, the organisms captured by the photograph gain a new status; they become fixed as matters of interest instead of floating by as something of no further importance.

I emphasize this point because research data gain presence only against a certain conceptual background. They are never simply collected, but always generated. As Bruno Latour once remarked: “One should never speak of ‘data’—what is given—but rather of sublata, that is, of ‘achievements’” (Latour 1999, p. 42). Research data have to be extracted from collected material (here the JPEG files), and to do so the researchers need a conceptual framework (in our case reified in a computer program), however provisional it may be, to allow them to isolate certain aspects in the material as their data. Without that framework, the photographs show an abundance of details but do not provide any information; they remain, in Hans-Jörg Rheinberger’s sense, “traces” not yet related and condensed “into an epistemic thing” (Rheinberger 2011, p. 345). Accordingly, what takes shape as data depends on the related research question. Note that the processing of the photographs taken by REMOS1 can also result in a very different set of research data: for example, a set of data on the growth of macroalgae (which is actually done by another group).

When, after examining the photographs, the group leader speaks of our data, he has something promising in mind: The lines of the csv table may provide answers to a research question. At this instant, data are realized as a basis for further exploitation. The status of a promise can persist for a considerable time span. In a long-term monitoring project, the minimum period that must be covered by a data set is twelve months, according to the group leader. Such a data set may provide first indications, but solid conclusions demand comparison with observation periods of two or three years. The first data generated by the Heligoland group therefore remained, to some extent, in a state of latency. Analysis could not become meaningful until a certain number of data had been accumulated.

When sufficient data are available, the exploration of the data starts. A very simple statistical analysis might relate, for instance, water temperature to taxa observed over a certain time period. The criteria guiding the analysis are clearly not selected at random; they follow from background assumptions about the monitored ecosystem. It is key, however, that the researchers not know what kind of results the analysis will yield. As the group leader repeatedly emphasized, because knowledge on Arctic ecosystems is scant and no study like this has been performed before, they had no explicit hypothesis when running the first analyses. Of course, they had certain expectations, but initially they obtained nothing more than patterns, some easily understandable, others very surprising with respect to the established body of knowledge.
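A very simple analysis of the kind mentioned above might look like the following sketch, which pools organism counts and water temperature per month and then computes a correlation coefficient. The record layout, the function names, and all numbers are invented for illustration; they demonstrate the mechanics only and are not the group's data or procedure.

```python
from collections import defaultdict
from math import sqrt

def pool_by_month(records):
    """records: iterable of (month, water_temp_c, organism_count) tuples.

    Returns {month: (mean temperature, total count)} in chronological order.
    """
    temps = defaultdict(list)
    counts = defaultdict(int)
    for month, temp, count in records:
        temps[month].append(temp)
        counts[month] += count
    return {m: (sum(temps[m]) / len(temps[m]), counts[m]) for m in sorted(temps)}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Invented records, one per analyzed photograph.
records = [
    ("2014-01", -1.0, 40), ("2014-01", -1.0, 50),
    ("2014-04", 1.0, 30),
    ("2014-07", 5.0, 10), ("2014-07", 5.0, 6),
]
pooled = pool_by_month(records)
mean_temps = [temp for temp, _ in pooled.values()]
totals = [total for _, total in pooled.values()]
r = pearson(mean_temps, totals)
```

Note that the choice of what to pool and what to relate already encodes background assumptions about the ecosystem, even though no explicit hypothesis is being tested.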

In this part of the process, the researchers work with data. They do not explore an ecosystem (that is, a concept), nor a region of the Arctic Ocean (that is, a part of the world); they explore a csv table. I call this table a tractandum, that is, something in need of treatment. As just described, one treatment (and perhaps the most obvious approach) consists in studying the research data with the help of programmed statistical tools. Yet, simultaneously, the research data must be evaluated with respect to their quality: are there bugs or artifacts in the table that might, for example, explain a strange distribution pattern? And the group must once again be careful with the data. Every analysis of the data must be documented: which portion of the data and which program settings were used? In a word, various metadata must be captured by the research group. In the technical language of the sciences, all this is covered by the term data analysis.

The outcome of data exploration typically consists of graphs. Arranged and merged in this way, research data become part of an argument (see Law and Whittaker 1988, pp. 176–78). A figure from the first publication of the project, published in April 2017, shows the “seasonal cycle in total species abundance (upper panel) and species composition (lower panel) pooled per month of the year,” for the time span from October 2013 to November 2014, measured in CPUE (Fischer et al. 2017, p. 265, fig. 5). Based on this graph and several other figures, the authors come to the following conclusion: “The data reveal a distinct winter community in the fjord’s shallow water ecosystem, which by far exceeds the summer community in both abundance and species diversity” (Fischer et al. 2017, p. 269). Sentences like the one just quoted and similar phrases like “our data clearly show” or “our data suggest” are typical for the discussion section of a scientific paper. This time, data are not conceptualized as objects of concern and evaluation, and they are not realized as a tractandum still open to interpretation. In these phrases, data represent the evidence by which an, indeed, rather unexpected finding—the rich winter community—is substantiated. Now the data are no longer something to be understood. For a moment, the data have acquired the character of a final reference, backing up a scientific statement. But perhaps just for a moment, because after reading the paper competing researchers might doubt the authors’ interpretation of data.

But what happened to the photographs recorded by REMOS1? As we already know, they quickly disappear from the scene. When team members in a meeting discuss the latest data, they do not go back to the photographs, but refer to slips of paper covered with graphs in front of them at the table. Using the photographs as a starting point would be useless: they don’t reveal anything about what the analysis of the csv tables has revealed. Instead, after counting, measuring and naming is over, the photographs are saved in storage devices. Yet, although they have been set aside, they, too, acquire a new mode of existence in the course of the research activities. Initially approached as something to be transformed into research data, upon publication of the results they gain the status of a proof. Anyone who raises doubt about the feasibility of the whole project or the quality of the obtained research data can be confronted with the archived files.

3. Realizing Data

What data is in researchers’ minds and hands is rarely questioned. Even Sabina Leonelli, who provides the most refined philosophical access to recent data work, reduces the manifold ways in which data gain presence in the course of research activities to just one. In her book-length study of data-centric biology, she proposes “to define data as any product of research activities, ranging from artifacts such as photographs to symbols such as letters or numbers, that is collected, stored, and disseminated in order to be used as evidence for knowledge claims” (Leonelli 2016, p. 77; emphasis in original). The group at Heligoland would not disagree with this definition; nevertheless, to think of data merely in terms of “prospective evidence” (Leonelli 2016, p. 77; emphasis in original) ignores that the members of the group realize data in many more ways in their daily work.

In the data work done at Heligoland, seven modes of existence can be distinguished.3 (1) data as something missing: no long-term data series for a single ecosystem in the Arctic; (2) data as something to be generated: hundreds of JPEG files to be processed; (3) data as something to be managed: keeping pace with the data flow; (4) data as something to be evaluated: looking out for bugs, etc.; (5) data as tractandum: analyzing data sets; (6) data as evidence: backing up claims; (7) data as proof: there is no doubt, here, in the archive, are the data. In each case data is realized in a different way. In data as something missing resides a desire, which can be motivated by knowledge gaps, new funding programs, or more abstract political and economic aspects, like, in our case, the circumstance that climate change impacts and growing commercial shipping are beginning to intervene in Arctic ecosystems. In contrast, data as something to be generated is coupled with a twofold kind of pressure: on the one hand, so many photographs still waiting in the directories, and so little time and manpower to do the job right at the moment; on the other hand, if time and manpower are available, this task is framed by the demand for acuity and thoroughness. Data as something to be managed puts no less pressure on researchers, yet in this case, the question of trust is what produces permanent concern. Data as a tractandum is surrounded by an atmosphere of promise. Every csv file is coupled with the expectation of future insights into the monitored ecosystem. When data analysis starts, the promise of data takes shape as openness: it was not at all clear what the group at Heligoland might learn about the Kongsfjord ecosystem. In turn, data as evidence is considered by researchers in terms of strength, including a sense for where they are weak. Lastly, with data as proof, the sheer existence of data becomes dominant. Researchers’ attention narrows onto questions of archiving, accessibility, and documentation.

We will understand scientific data work sufficiently only when we pay close attention to what researchers “see before themselves” when they talk about data and handle data. Studies to date have generally positioned themselves as counter-narratives against the grand data promises of the last decade (see, for example, Edwards et al. 2011, p. 668; Leonelli 2016, p. 1; Strasser 2019, p. 3). They emphasize how complicated data work really is and show the way in which the new digital tools actually transform research cultures. What has attracted less interest, however, is the internal logic of data work in a research project. How do researchers realize data in a particular situation? What types of concerns structure the activities? Is there always just one mode of existence in a given situation? Might different modes of existence interact with each other? One last example from the Heligoland group will help to illustrate this point.

Occasionally the group leader receives an automated mail message notifying him that the server is unable to pick up data from REMOS1. Such an incident can be framed as a classic case of data friction. The data no longer flow, and the “cost in time, energy, and human attention” accumulates (Edwards et al. 2011, p. 669). If we follow the group’s attempts to find out what happened, we become aware of the many interfaces that connect REMOS1 in the Kongsfjord with the desktop computer at a lab bench thousands of kilometers away. We learn about power sockets in the sea, underwater nodes, handshakes, server systems, transfer speeds, and many more issues, any of which can be a source of error. Overall, the data friction approach shifts attention towards the manifold causes that might have led to the breakdown. If we analyze the same incident within the framework of data-centric research, we become aware of an aspect that is almost absent from Leonelli’s account. Instead of ‘soft’ measures for data handling, labels, standards, and so on, which may smooth the exchange of data between various places, we learn about hardware like cable networks, computer centers, power supply, technicians, and whatever else is needed to make data physically travel. When we ask what the breakdown means with respect to data-centric research, we will immediately focus on these massive infrastructures, notwithstanding the fact that even tiny details like the corrosion of a plug in salt water are sufficient to bring an entire project to a halt.

The Heligoland group knows very well how much the monitoring project depends on a stable infrastructure. And, clearly, they are very much interested in minimizing data friction. In the first instance, however, the interruption of data transfer is a data management problem. Is part of REMOS1 defective? Is data transmission disrupted somewhere along the stretch between Ny-Ålesund and Heligoland? Or, as is most commonly the case, is one of the scripts not working properly? When researchers speak of data in this context and search for lost data, they realize “missing” data above all as a question of trust. Concerns about the stability of data transfer arise once again, and overall confidence in the system is partly undermined. Yet researchers also realize these lost data in a second way. While the group members try to find out what is going wrong, they are aware that the interruption is also a problem of strength. Two more concerns occupy their minds: How many data points are we going to lose, and how much will the gap in the time series impact the validity of a potential argument? Fortunately, a real loss of data is a rare event, and two weeks of missing data in the summertime would not be a disaster, according to the group leader. But two months of uncaptured activity in the main season from January to March would truly cause difficulties for the project. In this case data take shape as potential evidence, and the concerns about the reliability of data management intermingle with fears regarding the robustness of prospective results.
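The first of these two concerns, how many data points are lost, is a matter of simple arithmetic once the sampling interval of one photograph pair per half hour is taken as given. The following sketch uses hypothetical gap dates; the second concern, what the gap means for a potential argument, cannot be computed in this way.

```python
from datetime import datetime, timedelta

SAMPLE_INTERVAL = timedelta(minutes=30)  # one photograph pair every half hour

def lost_samples(gap_start, gap_end):
    """Number of photograph pairs that would have been taken during an outage."""
    return (gap_end - gap_start) // SAMPLE_INTERVAL

# Hypothetical outages: two weeks in summer, two months in the main season.
two_weeks = lost_samples(datetime(2014, 7, 1), datetime(2014, 7, 15))
two_months = lost_samples(datetime(2014, 1, 1), datetime(2014, 3, 1))
```

With these assumed dates the summer outage costs 672 samples and the winter outage 2832, but the difference that matters to the group lies less in the count than in the season affected.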

As this example illustrates, in a single situation it is possible for data to become the object of concern in more than one way simultaneously. In the case above it is correct to assume that the concern about the validity of future research results substantially drives efforts to “retrieve” the lost data. However, this is true only in part. For the Heligoland group, proving the possibility of reliably operating a remote monitoring system at a climatically extreme location is itself a goal of their work. The authors close their first publication by vigorously emphasizing that “the study demonstrates the advantages of permanently operated cabled observatory technology” (Fischer et al. 2017, p. 271). Accordingly, it would be wrong to trace the concern about possible data losses back exclusively to the understanding of data as potential evidence.

By asking about the way in which researchers realize data in a certain situation, we can learn how the manifold activities of data work are interrelated. In most situations researchers probably do realize data in more than one way, and in some cases these ways compete with each other. The monitoring project of the Heligoland group is clearly governed by the vision of collecting long time series of data from a place where this has never been done before. Although the project is based on the automatization and remote control of data collection, it cannot be carried out without local infrastructure. There is a simple reason why REMOS1 is inserted into the water directly next to the research station at Kongsfjord: Here the observatory can be connected to the power grid and the internet, and the technicians and divers who maintain REMOS1 can find accommodations in the station.

In contrast to what would be expected from data understood as potential evidence, in the Heligoland group the scientific question is not the only factor that determines which data are generated, and not even the primary motivation. Rather, the infrastructure required to operate REMOS1 is decisive in determining from which location in the fjord the observatory can supply photographs. In other words, the relationship between data-as-evidence and data as something missing is anything but trivial in this case. In order to satisfy the desire for data, the Heligoland group must (at least for the moment) make concessions regarding the strength of their data. While this dependency is somewhat concealed, it is unmistakable nevertheless, and the resulting restrictions even found mention in the publication of the first results: “The results of this study are by far incomplete and only represent a 1-year study at a specific site in the Kongsfjorden ecosystem, which may or may not be representative of the shallow water community of this area” (Fischer et al. 2017, p. 270; my emphasis).

4. Conclusion

The question of what “data” is has been largely settled in the fields of Science Studies and History and Philosophy of Science. Data does not comprise a specific kind of thing that is essentially different from other kinds of things. Rather, data refers to an actors’ category that, in principle, can include any kind of thing. This relational approach exists in a broad version, according to which scientific data is simply whatever any scientist designates as such (see Aronova et al. 2017); and in a narrow version, according to which data are whatever a scientist treats as potential evidence for scientific statements (see Leonelli 2016). The latter view appears to have prevailed. By now it is considered common sense that “the notion of ‘data’ in the context of scholarly work is intimately tied to the evidentiary value of whatever entities are being marshaled in support of an argument or claim” (Mayernik 2019, p. 734).

The relational approach, in whichever version, makes clear how any random thing can become something special called data. This transformation must not be underestimated, for it fundamentally changes the way the thing exists. To elaborate on my example: Fish and other organisms in a photograph become data in a table, the analysis of which, in turn, promises insights into an ecosystem. I already discussed above how the research question determines what kind of traces in the recorded photographs become data. Similarly, it is obvious that, in the long term, the entire process serves to generate potential evidence for a statement about the investigated ecosystem. Yet it is worth asking whether this covers all of the ways in which data becomes present in a research undertaking. When researchers talk about data, do they have only the robustness of future results in mind, and is the way they deal with data determined by this in all conceivable situations?

As we have seen, the collected monitoring data are marked and characterized not only by the research questions, but just as much by the infrastructures required. Furthermore, filling a blind spot in data collection is as important for the research group as producing new knowledge on the monitored ecosystem. Even from a purely quantitative perspective, the group spends nearly as much time controlling and maintaining the observatory and the data transfer as it does evaluating the data and publishing the results. This may sound exaggerated, but when the photographs from REMOS1 reach Heligoland, the researchers are first of all pleased that the entire system is working properly—before they think of the fact that potential evidence has arrived.

The various modes in which data comes into being in one and the same project demand compromises and adjustments with respect to aims, strategies and work standards. There is no way to predict how much and in which way a particular aspect of data work matters to researchers. Forty-eight photographs per day is an amount of material that a small research group like the one at Heligoland is capable of managing and analyzing. More photographs, for example one every ten minutes, might well reinforce the ecological argument, but simultaneously increase the pressure on the group’s data management. Relevancies thus differ from project to project, and from one situation to another. In this respect, directing attention to the various modes in which data exists for researchers functions as a kind of door to access a more refined picture of the internal logics that mark data work in each particular instance. In fact, seeking a single logic of data work is just as senseless as agreeing on a single definition of what data is. Rather, I would argue: logic is local.

If we understand that data encompasses much more than merely potential evidence, we can also avoid drawing premature conclusions: When scientists complain that they have too few data, they are not necessarily saying, as mentioned at the start, that the amount of data at their disposal is too small. Further, we can avoid premature generalizations: When researchers worry about lost data, they are not thinking only of lost evidence. Learning about the multiple ways researchers realize data in the course of their activities is a necessary complement to the numerous accounts of the frictions of data work and the fortunes of data travel. Then we will better understand how the different ways in which researchers occupy themselves with data interact and yield a local logic of data work. In short, we will get a better sense of all the issues bound up in the ubiquitous talk of data.

Notes

1. In this essay I use the term data in the singular when I speak of data in general, and in the plural when I speak of specific data.

2. At one point Mol addresses this aspect in a side remark, when she insists that one actor, here the surgeon, can of course enact arteriosclerosis twofold—in words, or with the knife in the hand, depending on the situation: “While the first is a talker, the second is a cutter” (Mol 2002, p. 143).

3. Of course, this list is not complete. One major aspect missing from my sample is the issue of data sharing. Up to now the Heligoland group has shared its photographs only on an informal basis, with colleagues interested in the growth of macroalgae (as already mentioned in passing above). From conversations I conclude that the researchers see data to be shared, on the one hand, as a source of anxiety (losing control, meeting quality standards); on the other hand, they conceptualize shared data as a kind of currency in scientific exchange.

References

Aronova, Elena, Christine von Oertzen, and David Sepkoski. 2017. “Introduction. Historicizing Big Data.” Osiris 32 (Data Histories): 1–17.

AWI Centre. “CSD Research Topic 3: The Shallow Water Fish and Macroinvertebrate Community in the Arctic. A Long Term Monitoring Approach.” https://www.awi.de/en/science/special-groups/scientific-diving/scientific-projects.html#c36727

Beltrame, Tiziana Nicoletta, and Christine Jungen. 2013. “Cataloguing, Indexing and Encoding. How Data Come to Life.” Revue d’Anthropologie des Connaissances 7 (4): 747–759.

Bowker, Geoffrey C. 2000. “Biodiversity Datadiversity.” Social Studies of Science 30: 643–683.

Bowker, Geoffrey C. 2005. Memory Practices in the Sciences. Cambridge, Mass./London: MIT Press.

Carlson, Samuelle, and Ben Anderson. 2007. “What Are Data? The Many Kinds of Data and their Implications for Data Re-use.” Journal of Computer-Mediated Communication 12: 635–651.

Chadarevian, Soraya de. 2018. “Things and Data in Recent Biology.” Historical Studies in the Natural Sciences 48: 648–658.

Decker, Kris. 2018. “Data Struggles. The Life and Times of a Database in Historical Climatology.” Social Science Information 57 (1): 6–30.

Denis, Jérôme. 2018. Le Travail Invisible des Données. Éléments pour une Sociologie des Infrastructures Scripturales. Paris: Presses des Mines.

Edwards, Paul N. 2010. A Vast Machine. Computer Models, Climate Data, and the Politics of Global Warming. Cambridge, Mass./London: MIT Press.

Edwards, Paul N., Matthew S. Mayernik, Archer L. Batcheller, Geoffrey C. Bowker, and Christine L. Borgman. 2011. “Science Friction. Data, Metadata, and Collaboration.” Social Studies of Science 41: 667–690.

Fischer, Philipp, Gabriele Gramelsberger, Christoph Hoffmann, Hans Hofmann, Hans-Jörg Rheinberger, and Hannes Rickli. 2020. Natures of Data. A Discussion between Biology, History and Philosophy of Science and Art. Zürich/Berlin: Diaphanes.

Fischer, Philipp, Max Schwanitz, Reiner Loth, Uwe Posner, Markus Brand, and Friedhelm Schröder. 2017. “First Year of Practical Experiences of the New AWIPEV-COSYNA Cabled Underwater Observatory in Kongsfjorden, Spitsbergen.” Ocean Science 13: 259–272.

Heaton, Lorna, and Florence Millerand. 2013. “La Mise en Base de Données de Matériaux de Recherche en Botanique et en Écologie. Spécimens, Données et Métadonnées.” Revue d’Anthropologie des Connaissances 7 (4): 885–913.

Hilgartner, Stephen. 2017. Reordering Life. Knowledge and Control in the Genomics Revolution. Cambridge, Mass./London: MIT Press.

Hine, Christine. 2006. “Databases as Scientific Instruments and their Role in the Ordering of Scientific Work.” Social Studies of Science 36: 269–298.

Hoeppe, Götz. 2014. “Working Data Together. The Accountability and Reflexivity of Digital Astronomical Practices.” Social Studies of Science 44: 243–270.

Latour, Bruno. 1999. Pandora’s Hope. Essays on the Reality of Science Studies. Cambridge, Mass./London: Harvard University Press.

Law, John, and John Whittaker. 1988. “On the Art of Representation. Notes on the Politics of Visualisation.” Pp. 160–183 in Picturing Power. Visual Depiction and Social Relations. Edited by Gordon Fyfe and John Law. London/New York: Routledge.

Leonelli, Sabina. 2014. “What Difference does Quantity Make? On the Epistemology of Big Data in Biology.” Big Data & Society 1: 1–11.

Leonelli, Sabina. 2016. Data-Centric Biology. A Philosophical Study. Chicago/London: University of Chicago Press.

Leonelli, Sabina, and Rachel A. Ankeny. 2012. “Re-thinking Organisms. The Impact of Databases on Model Organism Biology.” Studies in History and Philosophy of Biological and Biomedical Sciences 43: 29–36.

Mayernik, Matthew S. 2019. “Metadata Accounts: Achieving Data and Evidence in Scientific Research.” Social Studies of Science 49: 732–757.

Mol, Annemarie. 2002. The Body Multiple. Ontology in Medical Practice. Durham, NC/London: Duke University Press.

Müller-Wille, Staffan. 2017. “Names and Numbers. ‘Data’ in Classical Natural History, 1758–1859.” Osiris 32 (Data Histories): 109–128.

Nadim, Tahani. 2016. “Data Labours. How the Sequence Databases GenBank and EMBL-Bank Make Data.” Science as Culture 25: 496–519.

Pontille, David. 2010. “Updating a Biomedical Database. Writing, Reading and Invisible Contribution.” Pp. 47–66 in The Anthropology of Writing. Understanding Textually Mediated Worlds. Edited by David Barton and Uta Papen. London/New York: Continuum.

Rheinberger, Hans-Jörg. 2011. “Infra-Experimentality. From Traces to Data, from Data to Patterning Facts.” History of Science 49: 337–348.

Rosenberg, Daniel. 2018. “Data as Word.” Historical Studies in the Natural Sciences 48: 557–567.

Sepkoski, David. 2013. “Towards ‘A Natural History of Data’. Evolving Practices and Epistemologies of Data in Paleontology, 1800–2000.” Journal of the History of Biology 46: 401–444.

Shen, Yi. 2017. “Data Sharing Practices, Information Exchange Behaviors, and Knowledge Discovery Dynamics. A Study of Natural Resources and Environmental Scientists.” Environmental Systems Research 6 (9): 1–14.

Stevens, Hallam. 2013. Life out of Sequence. A Data-Driven History of Bioinformatics. Chicago/London: University of Chicago Press.

Stevens, Hallam. 2017. “A Feeling for the Algorithm. Working Knowledge and Big Data in Biology.” Osiris 32 (Data Histories): 151–174.

Strasser, Bruno J. 2012. “Data-driven Sciences. From Wonder Cabinets to Electronic Databases.” Studies in History and Philosophy of Biological and Biomedical Sciences 43: 85–87.

Strasser, Bruno J. 2019. Collecting Experiments. Making Big Data Biology. Chicago/London: University of Chicago Press.

Zimmerman, Ann S. 2008. “New Knowledge from Old Data. The Role of Standards in the Sharing and Reuse of Ecological Data.” Science, Technology, & Human Values 33: 631–652.