Abstract
Scribe (2022) is a choral work for three voices. It is a multidisciplinary project that encompasses paleography, machine learning, transcription, and performance. Furthermore, Scribe is a work of parafictional art where fact and fiction overlap, conventional practices of paleography and edition-making are playfully reconfigured, and supposed historical authenticity is employed as a compositional material. This paper describes the creative processes behind Scribe before evaluating aspects of the uncanny and of material agency. It draws upon autoethnographic analysis, which it then situates within the psychoanalytic criticism of philosopher Slavoj Žižek.
Project Outline
Scribe brings together two supposedly disparate entities: notated sacred music of the Late Middle Ages and machine learning image processing. Each represents, in its own era, a technological means to manipulate graphical information and convey meaning [1,2]. Both might also be seen as “black boxes” by the layperson—the enigmatic ambiguities of antique music notation on the one hand and the obscurity of computer code on the other. How might these distant technologies intertwine and convey new meanings?
Simply put, I employed a neural network (a form of machine learning) to sample the “Old Hall Manuscript,” British Library Add. MS 57950 (henceforth, Old Hall MS) [3], and generate new images based on its learning. I then compiled these images to form the pages of a fabricated manuscript before transcribing these into modern music notation. I worked with the EXAUDI vocal ensemble to interpret the music. The result is music of the Middle Ages but somehow warped, artificial, and dreamlike.
Motivations and Context
Even though I am an enthusiastic amateur medievalist, I am foremost a composer of experimental music; my motivations and methods are playful, mischievous, and at times purposefully provocative. An immediate point of comparison is A Late Anthology of Early Music Vol. 1: Ancient to Renaissance (2020) [4], in which composer Jennifer Walshe trained a neural network on her own voice and used the audio outputs to synthesize an array of early music repertoire. However, my motivations and methods in Scribe differ significantly from Walshe’s.
Manuscripts of the medieval era are striking visual-material documents that evidence highly sophisticated and imaginative cultures. As such, these valuable and often fragile objects are preserved within libraries and museums, occasionally exhibited publicly but routinely secured in material-sustaining conditions. However, these manuscripts were often composed through chaotic histories of addition, amendment, and erasure by numerous parties across time, as was the case with the Old Hall MS [5]. They are, by nature, palimpsests and collages, and inherently unstable. I equate the glass-casing of manuscripts by cultural institutions with geographer Caitlin DeSilvey’s conception of “arrested decay” [6], in which material histories and processes are frozen in time. By using machine learning to manipulate and destabilize the digitized Old Hall MS, I hope to circumvent the glass-casing of the manuscript and continue a process of vibrant transformation, one that engages with the manuscript’s contemporary materiality—a digital, open-content artifact. The images generated might be viewed as naive statistical extensions of, or responses to, the original. I hold my transformations in equal regard to those contrived by the manuscript’s multiple scribes, antiquarian revisionists, vandals, or incidental processes of natural decay. Whether intentional or accidental, these transformations tell material narratives.
Owing to its place in the evolution of music notation, the Old Hall MS presented me with a well-balanced sample with which to explore such transformations. Despite the ambiguity of music notation in the medieval era [7,8], the notation of the Old Hall MS is developed enough to allow productive identification and interpretation of the symbols generated by the machine. Yet the notation remains sufficiently ambiguous (e.g. underdetermined rhythmic meters, beat divisions, and voice pairings) to permit creative decisions when transcribing the new music. The notation of the Old Hall MS is a liminal space within which I could both borrow from and lean upon convention and simultaneously lose myself in an enigmatic world.
Training a Machine Scribe
If my expertise in medieval music is limited, my abilities as a computer scientist are nonexistent. To train my machine scribe, I collaborated with software engineer Christopher Melen at the Royal Northern College of Music Centre for Practice & Research in Science & Music (PRiSM). The description below of the neural network is necessarily simplified; readers need not understand the algorithms in depth but should gain a broad understanding of how the neural network samples the manuscript and generates new images, and how these processes might affect the images’ content.
Christopher and I employed a generative adversarial network (GAN), a deep-learning generative modeling technique commonly known for generating realistic-looking human faces from image data. A GAN sets two neural networks, a generative model and a discriminative model, against each other in a competitive game (hence the term “adversarial”). Our training set comprised the 224 scanned digital pages of the Old Hall MS, provided by the British Library, each divided into 16 segments. Starting from random noise, the generative model produces new images; the discriminative model, shown both genuine segments from the training set and the generator’s output, evaluates the legitimacy of each image. Over many rounds, the discriminator improves at spotting forgeries while the generator learns to approximate the “probability distribution” of the training data (i.e. a prediction of a likely representation) [9]. The two models compete in this zero-sum game to generate progressively more authentic adaptations of the training data [10].
The use of GANs in a creative context raises ethical considerations. With sufficient training data, such networks can generate novel compositions and could potentially render some musical professionals obsolete. While GANs open innovative avenues for creative practice, any approach should acknowledge such concerns and explore possibilities for reinventing and extending, rather than replacing, human labor.
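For readers who want the formal version, the zero-sum game between the two models is usually written, in the standard formulation introduced by Ian Goodfellow and colleagues in 2014, as a single value function that the discriminator D tries to maximize and the generator G tries to minimize. Here x stands for a genuine training segment drawn from the data distribution, z for the random noise fed to the generator, and D(x) for the discriminator's estimated probability that its input is genuine. This paper does not state which GAN variant or loss was used for Scribe, so the expression should be read as the textbook objective rather than the exact configuration.

```latex
\min_{G}\max_{D} V(D,G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}}\left[\log\left(1 - D(G(z))\right)\right]
```

The first term rewards the discriminator for accepting genuine segments; the second rewards it for rejecting the generator's forgeries, which is exactly the term the generator works against.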
The GAN made a fitting scribe for our new fabricated manuscript. The notation of European sacred music in the medieval era contrasts significantly with that of today. Alongside recognizable clefs and staves denoting pitch, it uses various symbols, alone and in combination, to give a nonabsolute representation of rhythm and meter that admits various interpretations. The network, whose processes rely on statistical probability devoid of cultural and aesthetic bias [11] and which was trained solely on the notation of the Old Hall MS, identifies and samples these symbols as an alien interpreter would; like a scribe (perhaps drunk, incompetent, or ignorant), it learns which notation to replicate and how to present it on the page, but it does not reproduce the musical syntax.
The network completed 5,000 “epochs” (i.e. training cycles through the Old Hall data) and generated ten 512×512 pixel images at every fifth epoch, producing 10,000 new images in total. Each epoch set comprised variations on two distinct images. In its early learning, the network produced repetitive, grid-like structures reminiscent of the manuscript’s rigid staves. Gradually, the model added abstract black figures to these grids, which began to taper into red stave lines. At epoch 130, we could tentatively identify a recognizable note of medieval mensural notation, the minima. Other symbols appeared inconsistently. It was not until epoch 780 that we found a convincing ligature, though again these were erratic and took hundreds of epochs to become established. By epoch 1,500, the rigid grid began to dissipate and the images varied from system to system, with a wider range of notational figures. By epoch 4,000, we found instances of sophisticated notation that might pass as fragments of an authentic manuscript, although these still ranged wildly between notational variety and incessant repetition.
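The arithmetic behind the 10,000 figure can be checked with a short Python sketch. It assumes images were saved at every fifth epoch (5, 10, ..., 5,000), which is consistent with labels such as 4,005 and 4,010 used later, and that the ten images per checkpoint are indexed 0 to 9, as labels like 4,005(0) and 4,005(1) suggest. The constant and function names are hypothetical, chosen for illustration only.

```python
# Hypothetical reconstruction of the sampling schedule described in the text:
# 5,000 training epochs, with ten 512x512 images saved at every fifth epoch.
TOTAL_EPOCHS = 5_000
SAVE_INTERVAL = 5          # assumption: a checkpoint at epochs 5, 10, 15, ...
IMAGES_PER_CHECKPOINT = 10
IMAGE_SIZE = (512, 512)    # width, height in pixels


def checkpoint_epochs(total=TOTAL_EPOCHS, interval=SAVE_INTERVAL):
    """Return the list of epochs at which images were saved."""
    return list(range(interval, total + 1, interval))


def image_labels(epochs, per_checkpoint=IMAGES_PER_CHECKPOINT):
    """Label images in the style used in this paper, e.g. '4,005(1)'.

    Assumption: images within a checkpoint are numbered from 0.
    """
    return [f"{epoch:,}({i})" for epoch in epochs for i in range(per_checkpoint)]


epochs = checkpoint_epochs()
labels = image_labels(epochs)
print(len(epochs))   # 1000
print(len(labels))   # 10000
print(labels[-1])    # 5,000(9)
```

A thousand checkpoints of ten images each gives the 10,000 images mentioned above.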
It is worth briefly reflecting on why our machine scribe prioritized some notational aspects and overlooked others. In its counterfeit outputs, the GAN prioritized stave lines and the notes of the mensural notational system. This was due to their frequency and calligraphic consistency in the training set; the uniformity of the original scribal work is extraordinary and provided an ideal dataset for the network to infer and sample. However, there are aspects of the Old Hall MS that the network either sampled inconsistently or did not reproduce at all. Musical clefs and mensuration symbols appear infrequently; coloration (where varying colors and shading indicate temporary changes in duration or meter) appears only sporadically; the text of the Old Hall MS (chiefly settings of the ordinary of the mass) is represented by the incessant reproduction of minims; the gorgeous illuminated initial letters of the Old Hall MS were unfortunately omitted entirely from the network’s outputs. Considering these scribal priorities, I was conscious of music medievalist Kate Maxwell’s insistence that “the disparate elements of an artefact come together to make meaning” [12]. For these new images to hold meaning beyond graphical symbols on a “page,” I would have to account for and problematize these inconsistencies and omissions, the ghosts of which present puzzles to solve. My responses to these puzzles, detailed below, required choosing and interpreting aspects of the images generated. Human choice therefore plays a significant role in Scribe. My use of machine learning foregrounds the human mediator and the choices, biases, and intuition they must bring to bear.
Compiling a Fabricated Manuscript
Presented with the network’s outputs, I needed a framing device for these new images—glitchy counterfeits of a historical artifact. I have long been attracted to works of experimental music that tread the hazy line between historiography, archiving, and fiction. Works such as Jennifer Walshe’s Historical Documents of the Irish Avant-Garde (2015–ongoing), Peter Falconer’s What Happened to Seaton Snook? (2017–ongoing), and Joanna Bailie’s Roll Call (2018) use historically compelling, but partially or wholly fabricated, media and narratives to explore alternative pasts and question cultural hegemony. Art historian Carrie Lambert-Beatty uses the term “parafictional” to describe artworks in which “real and/or imaginary personages and stories intersect with the world as it is being lived” [13]. As a creative methodology, I began to work under the fantasy that the neural network—nothing more than algorithmic architecture—was in fact a medieval scribe, a diligent tradesperson seated in the scriptorium copying out “new” compositions, perhaps perched alongside the original Old Hall MS scribes. Under this pretense, each epoch was now a stage of this copying process and each image a fragment of a historical manuscript.
I took on the role of paleographer—one who assembles, deciphers, and edits ancient texts. I then began pairing outputs from consecutive epochs to form the pages of a fabricated facsimile manuscript (henceforth, Scribe MS). Using a free-form digital notepad, I arranged groups of images according to various criteria, including stave alignment and page layout; some outputs suggested the edge of a page or a viable join to another. As an example, I detail my handling of epochs 4,005 and 4,010 (henceforth, f4005–4010v). I chose two images from each epoch—4,005(0); 4,005(1); 4,010(1); and 4,010(2)—based on their varied content, optimum image resolution, and intact notation. Images 4,005(1) and 4,010(1) feature dark imprints on their upper margins, hinting at the underlying binding of the Old Hall MS scans. These two images form the top half of my new page. Next, image 4,010(2) contains a C-clef in its top-left corner, suggesting the left-hand side of the page. I then aligned these four images, loosely following the red stave lines left to right. The text at the top of 4,010(2) conveniently underpins the music at the bottom of 4,010(1). As seen in Fig. 1, my criteria for arranging outputs are fallible and the joins between images are abrupt. This process was about me as paleographer making sense of the images in a semiregulated fashion. Incidentally, I enjoy the visual disparity between the supposedly high-art object and its lo-res imperfections.
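The resulting layout of f4005–4010v can be expressed as a simple 2×2 grid of 512×512 tiles. The Python sketch below, which assumes NumPy is available, stitches four tiles into a single 1024×1024 page. The function name, the flat grey placeholder tiles, and the exact quadrant placement are assumptions inferred from the description above: 4,010(1) and 4,005(1) across the top, 4,010(2) with its C-clef at bottom left, and 4,005(0) by elimination at bottom right. The actual compilation was done by hand in a digital notepad, loosely following the stave lines rather than a strict grid.

```python
import numpy as np

TILE = 512  # each GAN output is 512x512 pixels


def assemble_page(layout, tiles):
    """Stitch square RGB tiles into one page image.

    layout: rows of image labels, listed top to bottom and left to right.
    tiles:  mapping from label to an (H, W, 3) uint8 array.
    Tiles are simply abutted with no blending, so the seams stay abrupt.
    """
    rows = [np.concatenate([tiles[label] for label in row], axis=1) for row in layout]
    return np.concatenate(rows, axis=0)


# Assumed placement for f4005-4010v, inferred from the prose description.
LAYOUT_F4005_4010V = [
    ["4,010(1)", "4,005(1)"],
    ["4,010(2)", "4,005(0)"],
]

if __name__ == "__main__":
    # Placeholder tiles with flat grey values; real use would load the GAN outputs.
    labels = ["4,010(1)", "4,005(1)", "4,010(2)", "4,005(0)"]
    tiles = {label: np.full((TILE, TILE, 3), k * 60, dtype=np.uint8)
             for k, label in enumerate(labels)}
    page = assemble_page(LAYOUT_F4005_4010V, tiles)
    print(page.shape)  # (1024, 1024, 3)
```

Concatenating along axis 1 builds each row of the page, and concatenating those rows along axis 0 stacks them into the full folio.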
Transcribing Scribe MS
To transcribe my paired images, I had to interpret each invented page. This included determining the number of musical voices and how they are laid out on the page, and how to fit text to the music. In manuscripts such as the Old Hall MS, individual compositions are presented in various ways across one or more pages [14]. For instance, voices may be presented in “score notation,” where all voices appear together in consecutive systems with a single text below, or in “part notation,” where individual voices and their text(s) are presented in their entirety one after another. The former is typically used to notate homophonic textures while the latter is used for more intricate polyphony [15]. I interpreted my fabricated page structures in a multimodal fashion, balancing immediate visual suggestions with closer examination of various notational elements.
As an example, in the instance of f4005–4010v in Fig. 1, I instantly distinguished the single line of music running across the bottom of 4,010(1) and 4,005(1) from the repetitive figures in 4,010(2) and 4,005(0) on account of the former’s notational variety and underpinning text. The latter immediately struck me as two paired voices. This page, I reasoned, included three voices. The first—the cantus—is written in florid “part notation” over one system with its own text, and the remaining voices—the countertenor and tenor—in “score notation” across two systems, sharing a single text. The interpretation of pages again allowed me to play paleographer, to make naive yet telling creative decisions following a semiconsistent logic in the knowledge that the fictitious manuscript might be interpreted in countless ways. As in the parafictional works mentioned above, these decisions allowed me to challenge supposed cultural authorities, in this case, the disciplines of paleography and musicology. My intention is not to undermine the methods and practices of these disciplines but to suggest that the interpretation of a historical document (musical or otherwise) is a highly creative and experimental act imbued with personal, cultural, and sensory biases. This should be acknowledged and celebrated.
Having established the structure of my pages, I began to transcribe the individual voices—both their music and texts. This similarly involved a balancing act, here between rigorous transcription and editorial license. On the one hand, I identified and interpreted the mensural notation following Willi Apel’s The Notation of Polyphonic Music 900–1600 (2010). On the other, my interpretations were highly flexible, based on immediate visual suggestions, a subsequent squinting of the eye, and efforts to jigsaw voice durations and harmonies together. Furthermore, where the machine scribe had failed to include certain notational information or voice endings, I added clefs, mensurations, and cadential endings. These editorial decisions, made under the pretense that I was working with a genuine historical manuscript with corresponding genre constraints, were a way of making sense of the fabricated manuscript; the resulting transcription should function, at least partially, and sound like medieval music.
This pretense and these genre constraints began to guide my editorial and curatorial decisions. As an example, I detail my handling of the first three bars of f4005–4010v (Fig. 2). I identified the clef and beginning pitch of the countertenor. I then added clefs to the other two voices to create an opening harmony suited to the genre. Next, I determined the mensuration of the voices. This involved a back-and-forth process, navigating the ambiguities of mensural notation and the low-resolution images. For instance, the opening ligature and ensuing rest in the tenor voice might be read in numerous ways. Having settled on a mensuration, I transcribed the rhythm shown in Fig. 2. This reading informed my interpretation of ambiguous note durations in the countertenor: reading a skewed notehead as a square breve would suggest one mensuration, resulting in interval x with the tenor, whereas reading it as a diamond semibreve would suggest another mensuration and interval y. This specific example might stand for my entire process of transcription. What I hope to convey is that each graphic reading opens some interpretive possibilities and closes others, which in turn open and close a myriad of further eventualities. I view this process—this piecemeal patching, meshing, and bastardizing—as a playful, iterative dialogue with the machine-generated notation, itself an affective agent in the creative process of Scribe.
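The branching described above can be illustrated with simplified mensural arithmetic. In mensural notation, the tempus determines whether a breve equals three semibreves (perfect) or two (imperfect), and the prolation whether a semibreve equals three minims (major) or two (minor). The Python sketch below enumerates two possible readings of an ambiguous notehead against all four mensurations and reports where the following note would fall, counted in minims. It deliberately ignores imperfection, alteration, coloration, and ligature rules, and the passage itself is hypothetical rather than taken from Fig. 2; the function names are likewise illustrative.

```python
from itertools import product

# Simplified mensural arithmetic: only the tempus (breve -> semibreves) and
# prolation (semibreve -> minims) levels are modelled. Imperfection,
# alteration, coloration, and ligature rules are deliberately ignored.


def minims_per(value, tempus_perfect, prolatio_major):
    """Duration of a note value in minims under a given mensuration."""
    semibreve = 3 if prolatio_major else 2
    breve = (3 if tempus_perfect else 2) * semibreve
    return {"minim": 1, "semibreve": semibreve, "breve": breve}[value]


MENSURATIONS = {
    (True, True): "perfect tempus, major prolation",
    (True, False): "perfect tempus, minor prolation",
    (False, True): "imperfect tempus, major prolation",
    (False, False): "imperfect tempus, minor prolation",
}


def readings(ambiguous_note, context):
    """For every (note reading, mensuration) pair, return the onset in minims
    of the note that follows `context` plus the ambiguous note."""
    results = []
    for value, (tempus, prolatio) in product(ambiguous_note, MENSURATIONS):
        total = sum(minims_per(v, tempus, prolatio) for v in context + [value])
        results.append((value, MENSURATIONS[(tempus, prolatio)], total))
    return results


# Hypothetical passage: two semibreves followed by a skewed notehead that
# could be read either as a breve or as a semibreve.
for value, mensuration, onset in readings(["breve", "semibreve"], ["semibreve", "semibreve"]):
    print(f"{value:<9} | {mensuration:<34} | next note at minim {onset}")
```

Even this two-way ambiguity yields eight different onsets for the next note, and in a real transcription each of these would shift the vertical intervals against the other voices.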
A brief note regarding the Scribe MS text: As mentioned, the network rendered the text of the Old Hall MS solely as single-stroke minims. This extreme limitation led me to a thirteenth-century poem found in MS IV 524 Bl. 3r in which an entire satirical sentence is written almost entirely using minims [16]. I therefore transcribed the groups of minims generated in Scribe MS into corresponding words from this poem, creating a nonsense, pseudo-Latin text to which to set the music.
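The paper does not specify how groups of minims were matched to words. One possible rule, sketched here in Python purely as an illustration, counts the strokes each word needs in a Gothic book hand (assuming i = 1 minim, n and u = 2, m = 3) and assigns each generated group the word with the closest count. The letter table, the tie-breaking rule, the function names, and the vocabulary are all assumptions; the words are ordinary minim-heavy Latin words, not necessarily those of the poem in MS IV 524.

```python
# Assumed minim counts per letter in a Gothic textualis hand:
# i = 1 stroke, n and u = 2 strokes, m = 3 strokes. Other letters count as 0.
MINIMS_PER_LETTER = {"i": 1, "n": 2, "u": 2, "m": 3}


def minim_count(word):
    """Number of minim strokes needed to write `word` under the table above."""
    return sum(MINIMS_PER_LETTER.get(ch, 0) for ch in word.lower())


def words_for_groups(groups, vocabulary):
    """For each generated group of minims, pick the vocabulary word whose
    minim count is closest to the group's length (ties go to the earlier word)."""
    return [min(vocabulary, key=lambda w: abs(minim_count(w) - g)) for g in groups]


# Illustrative minim-heavy Latin words, not necessarily those of the cited poem.
VOCABULARY = ["mimi", "minimi", "nimium", "numinum"]

print(words_for_groups([8, 11, 12, 16], VOCABULARY))
# ['mimi', 'minimi', 'nimium', 'numinum']
```

Python's built-in `min` returns the first item with the smallest key, which gives the deterministic tie-breaking noted in the docstring.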
Performance Workshop
Following the above process, I produced several short movements for three voices. I then worked with the EXAUDI vocal ensemble, directed by James Weeks, in the interpretation of these movements. In the first workshop, I hoped to explore how the machine-generated notation might inform performance practice. Specifically, could the glitchy, low-resolution images influence the vocal timbre, articulation, or dynamics? Might the repeated ligatures hint toward a mechanical phrasing and articulation? How should the ensemble deliver the nonsense text?
However, while exploring these questions in the workshop, I began to confront the role of the transcriptions I’d produced. At face value, my fabricated music has no historical context. I would therefore need to make any emerging performance practice explicit in the score. For instance, if I sought a “mechanical” performance, I would indicate this in the score using accents at the beginnings of repeated phrases and include instructions such as “detached” and sans vibrato. I felt this stylized approach to the score and performance would conflict with my pretense of historical authenticity; the more I shaped the score with instructions for performance, the more I eroded my own fantasy. As Weeks suggested in the workshop, regardless of the music’s origins and without any instruction, “we look for patterns … we create coherence through patterns and relationships. … It sheds a light on what we are listening for” [17]. Weeks also compared my pared-down score to an urtext edition, in which the editor aims to reproduce the original composer’s intentions with minimal amendment. This comparison motivated me to double down on my pretense: I presented the transcriptions without interpretive information, in a format reminiscent of early music editions, and replaced the nonsense text derived from the images (whose limited phonemes produced a lackluster, instrumental-like texture when sung) with texts of the Mass and Marian hymns. In doing so, I hope to invite a performance appropriate to these genres.
I began to question not only the role of the score and its appearance but also my presence as a composer in the public presentation of the music—an opportunity to extend my pretense further. Rather than present the music under the title Scribe, written by Mark Dyer, I name individual movements according to their text (as is typical of the genre) and credit them to a fabricated composer working in the late fourteenth century. The composer’s name and ambiguous life span, M[arcel] le Gan, b. [13] 57–95 to 00, contain several Easter eggs hinting at the music’s origins.
Parafictional Art and the Perversion of Practice
Scribe features three imaginary elements—a fabricated manuscript, a contrived edition, and a phantom composer—situated within the context of medieval music and editions. These situated fabrications create a tension between reality and fantasy. To scrutinize this tension, I analyze my processes and reflections using Slavoj Žižek’s psychoanalytical critique of twentieth-century film in The Pervert’s Guide to Cinema, directed and produced by Sophie Fiennes. I am interested in the parallels between the illusory experience of film outlined by Žižek and my own self-deception in creating a parafictional artwork using machine learning. I do not speak for a listener’s reception of Scribe.
My creative decision-making in Scribe reflects a genuine attempt to configure material (margins and stave lines) and musical (rhythms and harmonies) aspects as though they belonged to an authentic historical work. Nevertheless, fully aware of my own pretense, I became increasingly astonished to find instances where these aspects “fit” together. I felt this more keenly in the transcription process, where two or three separately transcribed voices combined to produce fleeting artifacts of medieval music. These artifacts might be harmonic, including voices descending in thirds or cadencing at consonant intervals, or rhythmic, with separately transcribed voices of comparable duration. At such moments, arising from my own haphazard patching process outlined above, I experienced increasing feelings of uncanniness.
This uncanniness is, according to Žižek, similarly critical to our reception of film. Despite knowing the film is “a fiction, it still fascinates us. … Illusion persists. There is something real in the illusion, more real than in the reality behind it” [18]. Žižek suggests that our consumption of film requires a self-deception, in which actors, narratives, and sets assume a state of un-realness that leaves us simultaneously cognizant and beguiled. Our belief in the created world is conditional on emotional affectation. I relate this conflicting experience to my own self-deception and fascination during transcription. Inwardly, I knew the manuscript was counterfeit—the product of algorithmic statistical sampling, as well as my own design. Still, I transcribed the manuscript as if it were genuine. And yet, I would do a double take when I could fit harmonies or durations together, when I could convince myself the manuscript was made with authorial intent. We might understand this double take as a cognitive dissonance, what Žižek describes as “the paradox of belief” [19], required by the artist in their various roles (in this instance, composer, historicist, paleographer, and trickster) in the creation of parafictional art. Despite the fiction we create, we let ourselves be emotionally affected; we get caught in our own game and the illusion of reality persists.
But what are the limits of this illusion, this emotional affectation, for me as parafictional artist? In his analysis of erotica in cinema, Žižek suggests that the tension between reality and fantasy occurs most distinctly in the pornography genre, but at a cost. He explains that, in return for close-up shots of sexual activity, “you are not allowed … to be emotionally seriously engaged. … [Pornography] tries to be as realistic as possible, but it has to maintain the minimum of phantasmatic support” [20] with ludicrous narrative justification. While the Scribe project is in no way erotic, I find Žižek’s analysis a playful lens through which to scrutinize my creative experience. I relate the outputs of the GAN to Žižek’s “close-up”; while by no means sexual, these images—in their uncanny realism and strangeness—are revealing, alluring, and, if I may, seductive. But they similarly incur a cost. I compare my contrived pretense of authenticity—including pixelated manuscript pages, the liberal interpretation of notation, fanciful editorial additions, and comical invention of a historical composer—to the delusive narrative support of pornography. Furthermore, the charade relies on significant intervention—the pairing and interpreting of images according to the conservative genre of mensural music—to retain any sense of caricature realism for me, the artist. My pretense does not hold up to scrutiny (nor should it) but it provides me with the minimal narrative scaffolding to make sense of the strange and seductive images and justify my methods of compilation and transcription. As a result, I can only convince myself and be emotionally engaged up to a point. My feelings of uncanniness and any notions of authenticity are self-aware, tongue-in-cheek, and limited. The illusion of reality for me as the parafictional artist is restricted.
The self-awareness of my own role in the creation of a fictional reality leads me to take a brief account of artistic and displaced agencies. I am less interested in whether we can say the machine-generated manuscript is authentic; there is a vibrant body of scholarship dedicated to problematizing authorial intent, both within the context of machine learning and without. Rather, I am interested in the neural network’s capacity to inspire the uncanny and contribute to my confusion between reality and fantasy. Media theorist Marshall McLuhan suggests, “It was not the machine but what one did with the machine that was its meaning or message” [21]. McLuhan here refers specifically to the medium of presentation. I appropriate this notion in relation to artistic process to contextualize my methods of transcription. The machine-generated images, seductive as they are, are not of great interest in their own right; it is only through their compiling, interpretation, and performance that they might acquire meaning. In other words, the emotional affect I feel in the creative process of Scribe is the result of suddenly, serendipitously finding meaning—my own meaning, contrived or not—amid the machine-made forgeries. Following sociologist Andrew Pickering, I might describe my interplay with the nonhuman—the neural network and its outputs, but also the Old Hall MS itself and the complexities of mensural notation—as a “dance of human and material agency … structured as a dialectic of resistance and accommodation” [22]. This push and pull, working with or against something not fully within my control, heightens the emotional response of this meaning; despite this resistance or because of this accommodation, I’ve made something that makes sense to me.
Moreover, owing to the interdisciplinary nature of Scribe and the conflicting intentions that arise from my various roles, I might extend Pickering’s conception to a “dance” with multiple versions of myself—composer, paleographer, editor, etc.—entangled between and refracted through the various nonhuman material agents present. In this uncanny dance, I propose that the parafictional artist—me—is displaced from the self in a game of guessing, trickery, and compromise.
The Scribe project began as a thought experiment, as me asking what would happen if I trained a neural network on a music manuscript. What emerged was a confrontation with my role as an artist and a questioning of the reality my practice inhabits. I believe that such displacement from the self—a perversion of practice—is crucial for the parafictional artist to find meaning in their created reality. Once the artist finds this meaning, whether their illusion holds is irrelevant. The artist is emotionally affected, and a world is created.
Acknowledgments
This research was conducted as part of Cyborg Soloists, supported by a UKRI Future Leaders Fellowship and Royal Holloway, University of London (Grant number MR/T043059/1). For the purposes of open access, the author has applied a Creative Commons Attribution (CC BY) license to any Author Accepted Manuscript version arising.
References and Notes
Glossary
- ligature
specific symbols in mensural notation indicating groups of connected notes.
- mensuration
the metric pattern, equivalent to a modern-day time signature.
- minims
a single stroke used in varying combinations to signify various letters.
- ordinary of the mass
sections that are common to all celebrations of the mass.