Abstract
This article presents a custom system combining hardware and software that senses physiological signals of the performer's body resulting from muscle contraction and translates them to computer-synthesized sound. Our goal was to build upon the history of research in the field to develop a complete, integrated system that could be used by nonspecialist musicians. We describe the Embodied AudioVisual Interaction Electromyogram (EAVI-EMG), an end-to-end system spanning wearable sensing on the musician's body, custom microcontroller-based biosignal acquisition hardware, machine learning–based gesture-to-sound mapping middleware, and software-based granular synthesis sound output. A novel hardware design digitizes the electromyogram signals from the muscle with minimal analog preprocessing, treats them in an audio signal-processing chain, and presents the device to the host as a class-compliant audio and wireless MIDI interface. The mapping layer implements an interactive machine learning workflow in a reinforcement learning configuration and can map gesture features to auditory metadata in a multidimensional information space. The system adapts existing machine learning and synthesis modules to work with the hardware, resulting in an integrated, end-to-end system. We explore its potential as a digital musical instrument through a series of public presentations and concert performances by a range of musical practitioners.
Introduction
Bioelectrical signals from the human body have been used in electronic and computer music for over 50 years. The late Alvin Lucier's “Music for Solo Performer” (1966; cf. Straebel and Thoben 2014) is the most famous early work using brain signals in concert performance. In the early digital era, Knapp and Lusted (1990; see also Lusted and Knapp 1996) created the BioMuse, a device that digitized brain electroencephalogram (EEG) and muscle electromyogram (EMG) data for MIDI synthesizer control. This foreshadowed broader developments in human–computer interaction (HCI) by means of brain–computer interfaces (Tan and Nijholt 2010) and physiological computing (da Silva et al. 2014b). The advent of microelectronics has made low-cost biosignal interfaces available to wider artistic and musical communities who repurpose generic human interface devices (HIDs) for creative applications.
This article is organized as follows: We first retrace the history of research and musical practice using biosignals. We then introduce the hardware and discuss the design decisions that make the device specific to musical applications. We next describe characteristics of the muscle EMG signal and its signal features, and we present various sonification and mapping techniques to transform user input into sound output. We outline the system architecture, present several interaction paradigms, and then propose a multidimensional gesture–sound mapping technique based on regression modeling feeding content-based concatenative sound synthesis. We describe a series of public presentations in workshops and concerts, and finish by providing perspectives for future work.
Physiological Interfaces for Music
Although music using brain electroencephalography began in the 1960s with Alvin Lucier and continued in the 1970s with composers like David Rosenboom, music made with the muscle EMG is more recent. Performance artists Laurie Anderson and Pamela Z used the BodySynth system in the 1990s to integrate EMG interaction into their multimedia stage performances (Kalvos and Damian 2005; Mason 2016). The BioMuse Trio was a chamber music ensemble formed by Ben Knapp, Eric Lyon, and Gascia Ouzounian using multiple modes of physiological interaction in musical performance (Lyon, Knapp, and Ouzounian 2014). Yoichi Nagashima (2016) created the Mini BioMuse to extend the performance of Japanese traditional music and interactive media art. The performance artist Marco Donnarumma (2011) created a system, Xth Sense, to measure gross muscle deformation acoustically via the mechanomyogram for musical performance.
The Myo Gesture Control Armband was a general-purpose consumer EMG interface marketed between 2015 and 2018 that was widely adopted by the music research community (Visconti et al. 2018). Nymoen, Haugen, and Jensenius (2015) carried out an early evaluation of the device, and Erdem, Lan, and Jensenius (2020) conducted an analysis of effort qualities in performance. We used the Myo for gesture design (Ward et al. 2016) and for multimodal gesture–sound interaction design approaches (Visi et al. 2017). The interest in the Myo was supported by the development of middleware facilitating musical use (Di Donato, Bullock, and Tanaka 2018; Caramiaux et al. 2022; see also the software repositories at https://github.com/cpmpercussion/myo-to-osc and https://github.com/benkuper/MyOSC). The Myo was discontinued in 2018, and Thalmic Labs's EMG-related patents were sold to the startup CTRL-Labs, which went on to be acquired by Meta.
There are several platforms in the do-it-yourself (DIY) electronics space that allow creative hackers to work with physiological signals, including EMG. The SpikerShield by Backyard Brains is a “shield,” or daughterboard, for a standard Arduino microcontroller board. It consists of an analog preamplifier that brings the EMG into the 0- to 5-V range for digitization by the Arduino's analog-to-digital converters (ADCs). The company publishes a series of tutorials (called “experiments”), including one on emulating a piano keyboard with muscle tension. The MyoWare is a suite of components for low-cost EMG acquisition that includes electrode triplet supports, analog circuitry to rectify the signal and carry out envelope following, a power supply, a display, and an Arduino shield. The Plux Bitalino (da Silva et al. 2014a) is a complete microcontroller system for the DIY community that offers a novel modular circuit that can be snapped apart and recombined. Modules include EEG, electrocardiogram (ECG), and EMG amplifiers.
EMG in HCI
In HCI research, the use of muscle-based interfaces has been motivated by the need to interact with nonphysical interfaces. Applications have been proposed for users with disabilities (Barreto, Scargle, and Adjouadi 2000), or in wearable contexts where devices are too small to embed traditional physical interfaces (Wheeler and Jorgensen 2003). Costanza et al. (2007) set EMG in a mobile interaction context in which discretion allowed subtle interaction. Saponas et al. (2009, 2010) looked at the practical benefits of EMG interaction when the hands are busy with other tasks. Chen et al. (2007) recognized 24 hand gestures, consisting of wrist motions and finger extensions.
Multimodal interaction strategies are common in gesture recognition, where kinematic sensors provide information complementary to physiological data (Georgi, Amma, and Schultz 2015). We explored the musical potential of a bimodal combination of EMG and relative position sensing, introducing the concept of “bidirectional complementarity,” where similar gestures in one mode may take on different musical meaning depending on information from a second sensing mode (Tanaka and Knapp 2017).
Electromyography in HCI is not limited to hand gestures. Manabe explored unvoiced speech recognition using EMG from facial muscles (cf. He et al. 2020). Richard Hazlett (2003) measured facial EMG to detect frowning and smiling as ways to detect user frustration during interactions with website interfaces.
Outside of European and North American research, interaction research using the EMG has looked at the relation to muscle force (Kuriki et al. 2012), arm strength training (Ho et al. 2015), use of support vector machines for gesture classification (Naik, Kumar, and Jayadeva 2010), finger movement discrimination (Gupta and Ryait 2012), and sustainable aid applications (Majid, Al-Sharify, and Al-Sharify 2020).
Research Context
The EAVI-EMG system is the outcome of a series of research projects in the period 2012–2019. In the Meta Gesture Music project (Tanaka 2018), we identified EMG energy as a signal feature that represented the user sensation of gesture power (Caramiaux, Donnarumma, and Tanaka 2015). In the BioMusical Instruments project, we took research insights from the previous projects to create an end-to-end musical instrument prototype as described here. Our objective was to integrate the considerable scientific and musical knowledge in the field to exploit EMG for music and to combine it with interactive machine learning and sound synthesis to produce a complete, combined hardware-and-software system for musical instruments. We sought to exploit recent advances and lowered costs in electronic design, notably in the burgeoning modular synthesizer sector, to propose new hardware that was conceived not as a general-purpose interface but from the ground up as an audio and MIDI device. We use high-precision components to minimize analog processing, resulting in a design that is reprogrammable and integrates the EMG in the audio signal-processing pipeline. By proposing a complete, affordable system and publishing the hardware designs and software implementations as open source, we hoped to fill the gap left by the discontinuation of the Thalmic Labs Myo and to support the music research community and nonspecialist musicians left orphaned by its disappearance.
Rather than make a general-purpose HID, we sought to craft an instrument-like system specifically designed for musical performance. This meant arriving at a hardware design that unified the treatment of biosignal and audio in a single digital signal processing (DSP) chain. It also meant creating gesture analysis and sound synthesis functionality as part of the hardware-plus-software package. We sought to exploit recent advances in the use of EMG in HCI for musical applications. Unlike general-purpose devices such as the Myo, our system was conceived from the ground up specifically for music.
Digital Musical Instruments
We draw upon the literature on digital musical instruments (DMIs) to inform the design and development of the EAVI-EMG. After analyzing the work of Michael Waisvisz (1949–2008) with the Hands, Torre, Andersen, and Baldé (2016) conclude that experimentation and refinement with a system in cycles of artistic research can be a model to evolve a controller into an instrument. The definition of a musical instrument as an extended system has been proposed by Tanaka (2011), using the electric guitar as an example to include effects pedals and amplifier within the scope of the instrument system. As a further example, Tanaka cites the DJ setup with two turntables and mixer as an extended system instrument requiring external content in the form of the vinyl record.
Borrowing from the HCI literature (Gaver, Dunne, and Pacenti 1999), DMIs can be considered as “cultural probes” (Tahıroğlu et al. 2020) where our experience and dialogue on musical performance are vital to defining an instrument. Performance becomes an act of questioning the device or instrument and the musical practice developed around it. Establishing a clear relationship with the instrument and building an instrumental practice is fundamental to recognizing a device as a DMI. Exposing the instrument hardware and software to the performer, as well as the live “redesign” of the mapping from action to audio feedback, can become part of instrumental design practice. In the context of a musical performance, enabling instrument modification opens up sonic and performative possibilities (McPherson and Zappi 2015). This fosters inventiveness and scope for creativity in playing a new instrument (Arfib, Couturier, and Kessous 2005).
3DMIN was a project (2013–2016) that looked at the “design, development, and dissemination” of new musical instruments (cf. Bovermann et al. 2017, p. 2). In her contribution to Bovermann's book, Sarah Hardjowirogo (2017) discusses the concept of instrumentality in establishing the musical instrument potential of any sound-producing object or system. Forms of instrumentality may arise from use, intention, cultural negotiation, and distribution. In the case of technological musical instruments, techno-cultural processes of electrification, digitization, and virtualization mean that instrumental qualities are transitional and in constant flux. Hardjowirogo proposes an inventory of criteria that includes sound production, intention, learnability, playability, expressivity, cultural embeddedness, and audience perception. Instrumentality was a working concept for us in the development of the EAVI-EMG.
The Electromyogram
Signal Features
The EMG is not a continuous signal, but the sum of discrete neuron impulses. This results in an aperiodic signal that poses challenges in information processing. For interactive applications to track a participant's state or gesture, some form of signal analysis and feature extraction needs to take place. Features of the EMG signal can be grouped into time-domain (TD) and frequency-domain (FD) features. Given the aperiodic nature of the EMG, there is no harmonic content for FD algorithms to extract, making such algorithms less useful for EMG analysis (Phinyomark et al. 2013).
The most straightforward TD feature is amplitude estimation, which reflects the level of muscle tension. Simple amplitude estimation can be achieved as envelope following by smoothing, or low-pass filtering, the raw EMG data—for example, by taking the median of the signal over a time window. This, however, introduces a lag on the order of the length of the median window. Root mean square (RMS, the square root of the mean of the squares of samples in a time series) is a calculation of electrical power, and it relates to constant force and nonfatiguing muscle contraction. Recursive Bayesian estimation, or a Bayes filter, is a probabilistic approach for estimating an unknown function over time using incoming measurements. The algorithm is a nonlinear estimator of amplitude that achieves a better signal-to-noise ratio than RMS and stabilizes the signal, all while remaining reactive to transients (Hofmann et al. 2016).
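As an illustration of the two simplest estimators, the following C++ sketch (our own, not code from the EAVI-EMG firmware; function names are invented for clarity) shows a one-pole envelope follower applied to the rectified signal and a windowed RMS calculation.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One-pole low-pass envelope follower on a rectified EMG sample.
// A smoothing coefficient close to 1.0 gives heavier smoothing and more lag.
float envelopeFollower(float rectifiedSample, float previousEnvelope, float coeff)
{
    return coeff * previousEnvelope + (1.0f - coeff) * rectifiedSample;
}

// Windowed RMS: the square root of the mean of the squared samples,
// an estimate of electrical power over the analysis window.
float windowedRms(const std::vector<float>& window)
{
    if (window.empty()) return 0.0f;
    float sumOfSquares = 0.0f;
    for (float x : window)
        sumOfSquares += x * x;
    return std::sqrt(sumOfSquares / static_cast<float>(window.size()));
}
```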
Other TD features that show good performance (Phinyomark et al. 2010; Phinyomark and Scheme 2018) are: mean absolute value, providing energy information of the signal; waveform length, the cumulative length of the waveform over time, which is related to the signal complexity; and Willison amplitude (cf. Scheme and Englehart 2014) as a measure of frequency information of the signal, similar in nature to the number of zero crossings.
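For illustration, these three TD features can be computed over an analysis window as in the following sketch; the Willison threshold is an arbitrary parameter here, not a value used in the system.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Mean absolute value: average of |x[n]| over the window (signal energy information).
float meanAbsoluteValue(const std::vector<float>& w)
{
    if (w.empty()) return 0.0f;
    float sum = 0.0f;
    for (float x : w) sum += std::fabs(x);
    return sum / static_cast<float>(w.size());
}

// Waveform length: cumulative sum of absolute differences between
// consecutive samples, related to signal complexity.
float waveformLength(const std::vector<float>& w)
{
    float length = 0.0f;
    for (std::size_t n = 1; n < w.size(); ++n)
        length += std::fabs(w[n] - w[n - 1]);
    return length;
}

// Willison amplitude: count of consecutive-sample differences exceeding a
// threshold, similar in nature to counting zero crossings.
int willisonAmplitude(const std::vector<float>& w, float threshold)
{
    int count = 0;
    for (std::size_t n = 1; n < w.size(); ++n)
        if (std::fabs(w[n] - w[n - 1]) > threshold) ++count;
    return count;
}
```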
Frequency-domain features are based on statistical properties of the EMG signal's power spectrum density. These features are used to detect neural abnormalities and muscle fatigue. The most common FD features are: median frequency, the frequency at which the spectrum is divided into two regions of equal amplitude; peak frequency, the frequency at which the maximum power occurs; and spectral centroid, the center-of-gravity line of the spectrum (Phinyomark et al. 2017).
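Assuming a power spectral density has already been computed (for example by an FFT), the three FD features can be derived as in the sketch below; this is our own illustration rather than the system's implementation.

```cpp
#include <cstddef>
#include <vector>

struct SpectralFeatures { float medianFreq; float peakFreq; float centroid; };

// 'psd[k]' holds the power in spectral bin k; 'binHz' is the bin width in Hz.
SpectralFeatures spectralFeatures(const std::vector<float>& psd, float binHz)
{
    float total = 0.0f, weighted = 0.0f, peak = 0.0f;
    std::size_t peakBin = 0;
    for (std::size_t k = 0; k < psd.size(); ++k) {
        total += psd[k];
        weighted += psd[k] * static_cast<float>(k) * binHz;   // for the centroid
        if (psd[k] > peak) { peak = psd[k]; peakBin = k; }    // peak frequency
    }
    // Median frequency: bin at which cumulative power reaches half the total.
    float cumulative = 0.0f;
    std::size_t medianBin = 0;
    for (std::size_t k = 0; k < psd.size(); ++k) {
        cumulative += psd[k];
        if (cumulative >= 0.5f * total) { medianBin = k; break; }
    }
    SpectralFeatures f;
    f.medianFreq = static_cast<float>(medianBin) * binHz;
    f.peakFreq = static_cast<float>(peakBin) * binHz;
    f.centroid = (total > 0.0f) ? weighted / total : 0.0f;
    return f;
}
```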
We have developed a new compound TD feature called vector sum that reports amplitude and direction (Zbyszyński et al. 2021; see also Figure 3). Vector sum exploits limb physiology where muscles can oppose or reinforce the action of other muscles isometrically. To calculate the vector sum with multiple EMG sensors, we model each sensor as representing a vector pointing away from the center of a circle. The direction for each vector is constant and the magnitude is proportional to the amplitude. The vectors are summed, giving the overall direction of force. When compared to the sum of all electrodes, the vector sum can distinguish gestures where muscles oppose one another isometrically. This is useful in cases where joint movement might be minimal, but the subjective perception of effort is quite high. In the EAVI-EMG system, we use amplitude estimation by RMS and Bayes filter as well as vector sum.
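A minimal sketch of the vector sum calculation, assuming N electrode channels modeled as vectors spaced evenly around a circle (an illustration of the principle, not the published implementation):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct ForceVector { float magnitude; float angleRadians; };

// Each channel i is a vector at a fixed angle with magnitude proportional to
// its EMG amplitude; the sum gives the overall direction and magnitude of force.
ForceVector vectorSum(const std::vector<float>& amplitudes)
{
    const float twoPi = 6.2831853f;
    const std::size_t n = amplitudes.size();
    float x = 0.0f, y = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        const float theta = twoPi * static_cast<float>(i) / static_cast<float>(n);
        x += amplitudes[i] * std::cos(theta);
        y += amplitudes[i] * std::sin(theta);
    }
    return { std::sqrt(x * x + y * y), std::atan2(y, x) };
}
```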
EMG Signal Acquisition
The EMG signal is measured by electrodes making electrical contact with muscle tissue. This can be done in an invasive or noninvasive manner. Invasive electrodes consist of needles inserted directly into the muscle. This provides a direct reading of the electrical potential of the muscle and can focus on a single muscle cell. It requires near-clinical conditions, however, and brings discomfort that would not be tenable for physical musical performance.
Noninvasive EMG is captured by surface-mounted electrodes. Surface EMG (sEMG) makes electrical contact with the surface of the skin, using the conductivity of human tissue to transmit muscle-cell electrical potential through the skin to the electrode. The best contact can be made with wet gel electrodes made from silver and silver chloride. These require some preparation of the skin, and such electrodes have a limited lifespan. Dry electrodes have the advantage of ease of application and reuse, although their conductivity is not equal to that of gel electrodes. In all cases, sEMG is a coarser measure of EMG activity than invasive techniques, as it reports on the tension of a number of muscles under the skin at the location of the electrode. This allows monitoring the activity of a muscle or a muscle group.
We can measure motor unit action potentials (MUAPs) by taking the difference between two electrodes placed at different sites on the same muscle. The signals are subtracted, and the result is amplified; any signal common to both electrodes is thereby eliminated. This differential circuitry is familiar to musicians from the balanced XLR audio cables used for microphones. A third electrode is placed on an unrelated area, often a joint or bony area, to act as a reference ground that filters out ambient electrical noise.
Electrode Placement
System Architecture
Recent advances in electronics hardware, signal processing, and information analysis have made physiological computing applications practical and feasible, taking them out of the biomedical domain and into applications in HCI. There is a gap in the market between two extremes, however. Although high-end, medical-grade hardware offers excellent signal quality, it remains expensive and difficult to use outside of clinical settings. On the other hand, DIY alternatives are often built upon low-grade, general-purpose amplifiers and ADCs not specifically tuned to the noisy and delicate nature of physiological signals.
Signal Acquisition
The characteristics of EMG signals present several challenges in signal acquisition. The combination of low signal level, large baseline drift, and relative proximity of noise frequencies to the signals of interest requires compromises to be made, notably between noise removal and signal fidelity. At one end of the design spectrum, the acquired signal is too noisy to be useful, while at the other, filtering may significantly alter or remove salient features.
Traditionally, these compromises must be carefully designed into analog filters that precede digitization. Getting these filters right is critical to successful signal acquisition, particularly since they are fixed and cannot be adjusted to different use cases. Modern ADCs can support a very wide dynamic range, however, which makes it feasible to digitize the signal directly with minimal analog filtering, taking care primarily to prevent aliasing of higher-frequency noise components. This means that the acquired signal includes both baseline drift and noise. These must then be removed in the digital domain, through signal processing. The advantage of this approach is that digital filters are not designed into the hardware and can be adjusted and adapted to specific use cases. Furthermore, specialized filtering techniques may be applied in the preprocessing stage to optimize certain signal features.

The high-resolution ADC allows for use of less overall gain, meaning that the signal can be accurately tracked across relatively wide changes in electrode offset (baseline drift). Previously the drift had to be compensated for in the analog domain to ensure that the ADC received a less volatile, highly amplified signal. With 24-bit resolution, the dynamic range is large enough to capture even extremely low-level signals in detail. A programmable gain stage allows the device to be tuned to different signal ranges. We therefore chose to exploit the high-resolution ADC to digitize raw EMG with no preamplification and no analog filtering. Any noise reduction and filtering are done after acquisition in the digital domain. Although this was a gamble, it had the benefit of simplifying the hardware design, leading to lower production costs, and it left us free to change noise reduction and filtering strategies. This brought signal conditioning closer in the DSP chain to the feature extraction stage; it also leaves open the possibility in the future to explore new techniques, such as noise reduction based on machine learning.
The Texas Instruments (TI) ADS129x, with its 24-bit sigma-delta ADCs, is capable of operating with an extremely low noise floor, down to less than 5 μV RMS. The decimation filters downsample the input signal from over 200 kHz and provide efficient antialiasing. The cutoff frequency, and hence signal bandwidth, is programmable from 5 Hz to 1,280 Hz. The output signal of the ADS129x is low-pass filtered and antialiased, but it includes baseline drift and some amount of power-line noise. In addition, the TI chip has flexible signal routing and can improve common-mode rejection (to reduce noise) with its integrated Right Leg Drive feature.
Preprocessing
The sensor front end is combined with a microcontroller unit (MCU) capable of signal processing at the output signal rate (four channels at up to 6 kHz) to remove baseline drift and noise. The MCU is responsible for configuring and communicating with the front end and connecting over a standard serial interface with a data transceiver. Preprocessing on the MCU consists of sample rate conversion, a high-pass or DC filter to remove baseline drift, data conversion, a notch filter to remove powerline hum, and a low-pass filter to remove noise from electromagnetic interference (EMI).
Sample rate conversion is used to align the accelerometer data with the sampled EMG signal and to downsample before wireless transmission. High-pass filtering is required to remove or drastically reduce the baseline drift. After this stage, data conversion can be applied to reduce the bit depth, and thereby the dynamic range, without saturation or clipping. All signal preprocessing is reprogrammable in the MCU firmware.
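The sketch below shows one plausible per-sample version of such a chain: a DC-blocking high-pass filter, a biquad notch at the power-line frequency, and a one-pole low-pass against high-frequency interference. The filter topologies and coefficient values are illustrative assumptions, not the EAVI-EMG firmware defaults.

```cpp
#include <cmath>

// Illustrative per-sample EMG preprocessing chain.
struct Preprocessor {
    float xPrev = 0.0f, dcPrev = 0.0f;                  // DC-blocker state
    float b0, b1, b2, a1, a2;                           // notch coefficients (RBJ biquad)
    float x1 = 0.0f, x2 = 0.0f, y1 = 0.0f, y2 = 0.0f;   // notch state
    float lpPrev = 0.0f;                                // one-pole low-pass state

    Preprocessor(float sampleRate, float notchHz = 50.0f, float q = 30.0f)
    {
        const float w0 = 2.0f * 3.14159265f * notchHz / sampleRate;
        const float alpha = std::sin(w0) / (2.0f * q);
        const float a0 = 1.0f + alpha;
        b0 = 1.0f / a0;
        b1 = -2.0f * std::cos(w0) / a0;
        b2 = 1.0f / a0;
        a1 = -2.0f * std::cos(w0) / a0;
        a2 = (1.0f - alpha) / a0;
    }

    float process(float x)
    {
        // 1. DC blocker removes baseline drift: y[n] = x[n] - x[n-1] + R * y[n-1]
        const float dc = x - xPrev + 0.995f * dcPrev;
        xPrev = x; dcPrev = dc;
        // 2. Notch filter removes power-line hum.
        const float notch = b0 * dc + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
        x2 = x1; x1 = dc; y2 = y1; y1 = notch;
        // 3. One-pole low-pass reduces high-frequency EMI noise.
        lpPrev = 0.9f * lpPrev + 0.1f * notch;
        return lpPrev;
    }
};
```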
Transmission
The preprocessed data can be transmitted to a host computer in one of two ways: USB or Bluetooth Low Energy (BLE). When transmitting over USB, the device implements a class-compliant USB audio interface, which allows high-rate, high-resolution, low-jitter, multichannel data transmission without requiring device drivers.
With wireless BLE transmission, the bandwidth is restricted, which requires sample rate and bit-depth reduction in the preprocessing. To allow real-time data transfers to a host device, we establish a connection using the BLE MIDI standard profile. This allows interoperability with a wide range of devices. Data is transmitted as MIDI pitch bend messages using one MIDI channel per electrode. This supports up to 16 channels of data, either EMG or from the inertial measurement unit (IMU), each with 14-bit resolution.
Our default configuration was based on four EMG channels plus three channels of accelerometer data at a sample rate of 8 kHz. When connected by USB, all seven channels could be transmitted at 8 kHz with 24-bit resolution. Using BLE, the data was downsampled 64-fold to 125 Hz and truncated to 14 bits (after preprocessing).
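As an illustration of the BLE MIDI encoding described above (a sketch of the standard pitch bend message layout, not the device firmware), a 14-bit sample can be packed onto a given channel as follows.

```cpp
#include <array>
#include <cstdint>

// Packs one 14-bit sensor value (0..16383) into a MIDI pitch bend message
// on the given channel (0..15), i.e., one channel per electrode or IMU axis.
std::array<std::uint8_t, 3> pitchBendMessage(std::uint16_t value14, std::uint8_t channel)
{
    std::array<std::uint8_t, 3> msg = {
        static_cast<std::uint8_t>(0xE0 | (channel & 0x0F)),  // status byte
        static_cast<std::uint8_t>(value14 & 0x7F),           // 7 least significant bits
        static_cast<std::uint8_t>((value14 >> 7) & 0x7F)     // 7 most significant bits
    };
    return msg;
}
```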
OWL Microcontroller Framework
The onboard signal processing is implemented using the OWL (Open Ware Laboratory) framework (Webster, LeNost, and Klang 2014). We take advantage of the audio DSP capabilities of this framework, which has been used in a range of modular synthesizers and guitar effects pedals, as the core signal-processing engine of the EAVI-EMG. The digitized EMG signal is injected into this DSP chain, combining biosignal and audio into a single signal-processing pipeline. The OWL platform offers tools to develop patches in any of several languages: C++, Faust, Pure Data, Max gen~, SOUL, and Maximilian. Patches can be compiled offline, and the patch binary is packaged as MIDI system exclusive messages, sent by USB to the device, and dynamically loaded. This allows for fast prototyping, development, and test cycles, with no dependency on specific hardware features.
Host System
Sound Synthesis
Sound synthesis is implemented in a modular manner in Max. We implemented two paradigms for sound production from EMG—sonification and parametric synthesis control. In the former, the body was considered a sound source to be subsequently processed; in the latter, performer gesture was used as a controller, articulating synthesis unit generators. We used three different libraries in the Max environment to implement these paradigms at increasing levels of complexity: (1) a “plain vanilla” use of standard objects in the Max package; (2) SCP for Max by Manuel Poletti; and (3) IRCAM's MuBu.
Sonification
Sonification of muscle activity allows us to hear the neural impulse data of muscle exertion as sound. A direct audification of MUAPs yields a stochastic pulse train reflecting performer limb activity: the raw EMG signal, upsampled to audio rate, is heard directly as an inharmonic series of spike pulses. Muscle exertion recruits groups of muscle cells, increasing the density of the spikes heard.
This signal then feeds a series of audio processing units, including modulators, resonators, and filters. The raw EMG pulse train becomes the excitation stimulus for the subsequent audio processing. Resonant filters allow the high-frequency content to excite tones at the resonance frequency of the filter. This is then duplicated into a resonant filter bank of multiple tunable frequencies that begins to simulate acoustic resonance by a noisy source. Ring modulators also enable the spiking content of the raw EMG data to excite sonic material at various musical tunings relative to a carrier signal.
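A minimal sketch of such a resonant filter bank excited by the raw EMG signal follows; the two-pole resonator structure, frequencies, and decay value are illustrative, not the Max patch used in the system.

```cpp
#include <cmath>
#include <vector>

// Bank of two-pole resonators excited by the raw EMG pulse train.
struct ResonatorBank {
    struct Resonator { float a1, a2, gain, y1 = 0.0f, y2 = 0.0f; };
    std::vector<Resonator> resonators;

    ResonatorBank(const std::vector<float>& freqsHz, float sampleRate, float decay = 0.999f)
    {
        for (float f : freqsHz) {
            const float w0 = 2.0f * 3.14159265f * f / sampleRate;
            Resonator r;
            r.a1 = 2.0f * decay * std::cos(w0);   // y[n] = a1*y[n-1] + a2*y[n-2] + g*x[n]
            r.a2 = -decay * decay;
            r.gain = 1.0f - decay;                // rough amplitude normalization
            resonators.push_back(r);
        }
    }

    float process(float emgSample)
    {
        float out = 0.0f;
        for (auto& r : resonators) {
            const float y = r.a1 * r.y1 + r.a2 * r.y2 + r.gain * emgSample;
            r.y2 = r.y1; r.y1 = y;
            out += y;
        }
        return out;
    }
};
```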
Parametric Synthesis
In the controller paradigm, the goal was to enable a range of different strategies that map performer gesture to musical output. This goal includes accommodating classical mapping strategies from the literature in the field of new interfaces for musical expression, one-to-many and many-to-one (Hunt and Kirk 2000), as well as more recent approaches using neural networks to create regression models as a means for automatic mapping (Visi and Tanaka 2021). We present a system for creating such mappings associating extracted gesture features to auditory metadata. Sample- and wavetable-buffer playback arranged in various granular-synthesis architectures allowed the broadest support for different sound-synthesis and audio-processing approaches, from oscillator-style synthesis, to time stretching, to corpus-based concatenative synthesis. The granular audio generator is subsequently processed in a user-definable set of filters and modulators.
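As a sketch of the grain mechanism underlying these architectures (our own illustration, not the Max implementation), a single Hann-windowed grain can be rendered from a sample buffer with its position, length, and playback rate exposed as mappable parameters.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Renders one grain: a short excerpt of a sample buffer shaped by a Hann window.
std::vector<float> renderGrain(const std::vector<float>& buffer,
                               std::size_t startSample,
                               std::size_t grainLength,
                               float playbackRate)
{
    std::vector<float> grain(grainLength, 0.0f);
    if (grainLength < 2) return grain;
    for (std::size_t n = 0; n < grainLength; ++n) {
        const float phase = static_cast<float>(n) / static_cast<float>(grainLength - 1);
        const float window = 0.5f * (1.0f - std::cos(2.0f * 3.14159265f * phase));
        const std::size_t idx = startSample + static_cast<std::size_t>(n * playbackRate);
        if (idx < buffer.size())
            grain[n] = window * buffer[idx];
    }
    return grain;
}
```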
Feature Mapping
To associate features extracted from gesture directly with auditory features, we used CataRT, a concatenative synthesis system driven by music information retrieval. Content-based concatenative synthesis (CBCS) is an extension of granular synthesis in which grains, or units, are automatically generated and are catalogued by auditory features through the use of music information retrieval and the timbral descriptors it generates (Schwarz 2007). Longer sounds are created with CBCS by combining shorter sounds, where units can be recalled by a query that uses a vector of those features. The actual grain to be played is specified by a target and the features associated with that target. The target may be of the same modality as the corpus or of a different modality. In our case, the target is sensor data or some representation from the EMG of performer gesture, and it may have the same feature dimensionality as the corpus, or a different dimensionality.
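The unit-selection step can be sketched as a weighted nearest-neighbor query over the corpus descriptors, as below; the data structures and weighting scheme are our own illustration rather than CataRT's implementation.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// A corpus unit: a grain of audio described by a vector of timbral descriptors
// (e.g., loudness, spectral centroid) plus its position in the source buffer.
struct Unit {
    std::vector<float> descriptors;
    std::size_t bufferIndex;
};

// Returns the index of the unit whose descriptors are closest (weighted
// Euclidean distance) to the target vector derived from gesture features.
std::size_t selectUnit(const std::vector<Unit>& corpus,
                       const std::vector<float>& target,
                       const std::vector<float>& weights)
{
    std::size_t best = 0;
    float bestDist = std::numeric_limits<float>::max();
    for (std::size_t i = 0; i < corpus.size(); ++i) {
        float d = 0.0f;
        for (std::size_t k = 0; k < target.size(); ++k) {
            const float diff = corpus[i].descriptors[k] - target[k];
            d += weights[k] * diff * diff;
        }
        if (d < bestDist) { bestDist = d; best = i; }
    }
    return best;
}
```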
Gesture-to-Sound Translation
In this section we describe the mapping strategies that we adopted to couple bodily gestures with sound dynamics. Van Nort (2009) proposed moving away from looking at mapping as an isolated process—he aimed to go from a “connective tissue between control and sound parameters” towards a process of structuring an interplay between human actions and musical dynamics. This calls for design principles that take into account criteria of instrumentality beyond sound production (Hardjowirogo 2017) and that view the interaction design workflow as an “affordance” (Altavilla, Caramiaux, and Tanaka 2013) of the instrument itself.
From Direct Mapping to Feature Mapping
Hunt, Wanderley, and Kirk (2000) identify explicit mapping strategies to define the relationship between performer actions and sound-synthesis parameters. They distinguish this category of mapping strategies from those that involve generative mechanisms and training procedures, such as artificial neural networks. Arfib et al. (2002) consider explicit mappings to be those in which one can clearly describe how input and output are related. They describe such approaches as being more mathematically transparent than implicit mapping. In the context of our work, we see explicit mapping as a way of designing direct relationships between signal features derived from the performer's action and features of the synthesized sound. We have used an EMG feature we have previously reported as gesture power (Caramiaux, Donnarumma, and Tanaka 2015; Zbyszyński et al. 2021). Caramiaux and colleagues noted that the muscular activation associated with the gestures of the performer has a direct, explicit relationship with this feature; Cadoz et al. (1984) also found that gestures transfer energy into the instrument.
As the dimensionality of input signals and sound synthesis increases, it can become an increasingly complex task to associate parameters across the two domains in ways that are intuitive and consistent and that allow for a wide range of musical expression. To address this increasing complexity, we have abstracted both gesture input and sound output into higher-level features, or metadata. We refer to the process of explicitly mapping gesture features to sound features as feature mapping. This approach to the design of gesture–sound relationships allows one to make use of the semantic and perceptual qualities of the features extracted from raw signals. The method has some limitations, however, as not all signal features carry higher-level meanings or have obvious perceptual qualities. Moreover, just like any other explicit mapping approach, the method becomes impractical as the number of features and parameters increases and their correlations become more complex.
Implicit Mappings and Machine Learning
As the number of features grows, explicit feature mapping becomes difficult. Here, regression modeling by means of machine learning enables forms of implicit mapping through a paradigm called “mapping by demonstration” (Françoise 2013). Complex mappings between sets of parameters can be defined concurrently, in a holistic design procedure, with the mappings between individual parameters arising implicitly in the process rather than being specified explicitly as mappings between features and sound parameters. We adopt an interactive machine learning (IML) workflow (Fiebrink and Cook 2010) for creating implicit mappings by way of linear regression (Visi and Tanaka 2021).
Regression is the task of estimating the relationship between an independent variable (or a feature) and a dependent variable, or outcome. This is done by building a statistical model that explains how the variables are related. Regression is a typical supervised learning task, meaning that the model describing the continuous function is trained using a set of examples describing the relationship between input data and output data in a few specific cases. Regression is a powerful method in the context of gesture–sound interaction, since it allows one to easily define complex, continuous mapping functions between gesture features and sound-synthesis parameters. This can be done by providing examples consisting of sample input data—such as EMG signal features—paired with sound-synthesis parameters. In this way, designing interactions between gesture and sound for interactive music performance becomes an interactive procedure mediated by machine learning, in which mappings are created implicitly by providing example gesture–sound pairs. In a typical IML workflow, examples in the gesture domain are provided as static poses.
We refer to this approach as static regression (Tanaka et al. 2019). Within the project, we extended this paradigm and proposed an automated technique for training a neural network with a windowed set of anchor points captured on the fly from a dynamic gesture made in response to a sound-tracing auditory stimulus; we called this technique “windowed regression.”
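A minimal sketch of the kind of regression mapping described above, trained from example pairs of gesture feature vectors and synthesis parameter vectors, follows; it illustrates the principle with plain linear regression and gradient descent and is not the machine learning module used in the system.

```cpp
#include <cstddef>
#include <vector>

// Multivariate linear regression from gesture features to synthesis parameters,
// trained from example pairs in the spirit of mapping by demonstration.
struct RegressionMapping {
    std::size_t inDim, outDim;
    std::vector<float> weights;  // outDim rows of (inDim weights + 1 bias)

    RegressionMapping(std::size_t in, std::size_t out)
        : inDim(in), outDim(out), weights(out * (in + 1), 0.0f) {}

    std::vector<float> predict(const std::vector<float>& x) const
    {
        std::vector<float> y(outDim);
        for (std::size_t o = 0; o < outDim; ++o) {
            float sum = weights[o * (inDim + 1) + inDim];   // bias term
            for (std::size_t i = 0; i < inDim; ++i)
                sum += weights[o * (inDim + 1) + i] * x[i];
            y[o] = sum;
        }
        return y;
    }

    // Stochastic gradient descent on the squared error of each example pair.
    void train(const std::vector<std::vector<float>>& inputs,
               const std::vector<std::vector<float>>& targets,
               float learningRate, int epochs)
    {
        for (int e = 0; e < epochs; ++e)
            for (std::size_t n = 0; n < inputs.size(); ++n) {
                const std::vector<float> y = predict(inputs[n]);
                for (std::size_t o = 0; o < outDim; ++o) {
                    const float err = targets[n][o] - y[o];
                    for (std::size_t i = 0; i < inDim; ++i)
                        weights[o * (inDim + 1) + i] += learningRate * err * inputs[n][i];
                    weights[o * (inDim + 1) + inDim] += learningRate * err;
                }
            }
    }
};
```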
Dissemination
The EAVI-EMG system has been used by a range of musicians in three distinct public presentation settings. Although these presentations do not constitute a formal study, they deployed the system in a range of performance situations “in the wild.” We describe three settings: (1) repertory, where the EAVI-EMG hardware was used to perform musical works composed for other EMG systems; (2) dance, where the system was used in interactive dance performance; and (3) new work, where the system was used in the conception of new work.
Repertory
The first check of the hardware was to substitute it for the BioControl Systems BioMuse and Thalmic Labs Myo in the performance of two works originally composed and performed on older hardware. First, we tried the EAVI-EMG device alongside a Myo in two simple compositions by Tanaka for EMG: “Lifting” and “Le Loup” (https://youtu.be/p8CKjmE7zys). Both used four channels of EMG: two on opposing muscles of the left forearm, and the other two on the corresponding muscles of the right forearm.
Both “Lifting” and “Le Loup” track oppositional flexing of the wrist. They were written originally for the BioMuse and were later performed on the Myo. (They did not make use of all eight channels of each Myo's EMG, nor its motion sensing.) In this sense, the two pieces are musical works for EMG that are device-independent. “Lifting” is a simple oscillator piece, inspired by the Theremin, in which a basic mapping allows one arm to control oscillator pitch and the other the overall amplitude. The opposing muscles on each arm were used in sum and difference to extract a single glissando value based on wrist flexion. “Le Loup” is performed with a single sound sample (of a wolf growling), on a four-voice granular synthesizer. Two EMG channels are used on each arm, with the left arm tracking the sum of EMG amplitude against a Schmitt trigger to articulate the sample. Once triggered, the sound is looped and granulated by sustained muscle exertion, with amplitude modulation. The right arm controls a resonant low-pass filter, with one EMG channel controlling the cutoff frequency, and the other, resonance.
In both “Lifting” and “Le Loup,” the EAVI-EMG was first used to replace the Myo on one arm, and the Myo was retained on the other. The arms were then swapped, to enable checking the EAVI-EMG with the musical interaction of the other arm. Finally, the four channels of the EAVI-EMG were used to replace both Myos on both arms. The pieces were thus performed in four variations: (1) two Myos, (2) EAVI-EMG left and Myo right, (3) Myo left and EAVI-EMG right, and (4) EAVI-EMG only. The performer (Tanaka) reported confidence when performing the works on the EAVI-EMG, with the compositions retaining their musical identity and “feel.” This demonstration was made publicly in a research workshop setting.
Dance
A second example of device substitution was carried out with artists not involved in the development of the system. We worked with the composer Anne Sèdes and the researcher David Fierro, both of whom had contributed to the interactive dance work Écoute/Expansion for the French dance company Kitsou Dubois. The original performance of that piece involved two dancers, each wearing one Myo on a forearm. The choreography involved movements of one dancer working on the floor and the other on a pole. Continuous muscle exertion is picked up by the EMG, whereas gross arm gesture is picked up by the IMU in the Myo. The two modes of interaction work in conjunction to create a form of multimodal interaction with which the dancers explore a timbre space in surround sound.
We worked with Sèdes and Fierro to extract several isolated moments from the choreography as excerpts we could consider “interaction modules.” We then asked Fierro to demonstrate each for us using the Myo, then try them with the EAVI-EMG, using its EMG and accelerometer. This was presented in the research workshop mentioned above.
In a second workshop, Sèdes and Fierro presented an adaptation of excerpts from the piece using the EAVI-EMG. Whereas “Lifting” and “Le Loup” were compositions for generic EMG, Écoute/Expansion was composed specifically for the Myo, with its eight channels of EMG and three-dimensional movement sensing. The goal in the new version, according to Sèdes, was not to reproduce the original verbatim, but instead to realize the original musical intent of the project using the feature set of the proposed system. Fierro used a one-to-many mapping approach from the two EMG channels of the EAVI-EMG, as well as a sum of the two channels, to control a series of synthesis and spatialization parameters. Sèdes ultimately noted that with fewer EMG channels she found that her original musical intent, of exertion mapped to timbre and space, was realized in a way that was more direct than with the multiple modalities and multiple channels of the Myo.
New Work
The feminist music and fine art ensemble Chicks on Speed premiered the performance piece “Noise Bodies” at Muzeum Susch, Switzerland, in December 2019. In their performance, they used four prototypes of the EAVI-EMG in an experimental multimedia performance. “Noise Bodies” was inspired by the 1965 work of the same name by artist Carolee Schneemann, in which vernacular objects such as pots, metal cans, and car license plates were mounted on the bodies of Schneemann and her partner, the American computer music composer James Tenney. “Noise Bodies” by Chicks on Speed retained the theme of wearing sound objects, pushing it in the direction of electrified and electronic sound. The piece featured five performers: Alexandra Murray-Leslie on E-Shoe (Murray-Leslie and Johnston 2017) augmented with the EAVI-EMG; Melissa Logan, who interfaced the EAVI-EMG with a wearable analog synthesizer and laptop computer; Krõõt Juurak on the EAVI-EMG; and Visi on electronic stethoscope and Myo armbands.
This was the first public performance involving multiple EAVI-EMG devices. Reflecting on this experience, we appreciate how the EAVI-EMG afforded experimentation in ways that more normative devices like the Myo would not have. On the other hand, using the board without an enclosure near the performers’ skin increased the risk of short circuits when some components of the board came in contact with skin surface moisture, and wet electrodes are more difficult to clean and reuse than the dry electrodes of the Myo. These technical and design aspects are the subject of current development work and will be addressed in a future iteration of the device.
Having the EAVI-EMG recognized as a class-compliant Bluetooth MIDI device made it easier to use the device in a sound design and music production software environment. This allowed us to quickly design and test interactions during rehearsals, showing how a dedicated device can make such experimental practices more accessible to practitioners working outside of niche research circles.
Discussion
We have presented an extended system combining both hardware and software for physiological sensing, gesture–sound mapping, and sound synthesis. We have described several preliminary examples where the system has been used in musical performance, sometimes alongside existing systems. The EAVI-EMG is not the first system to enable use of muscle EMG in musical applications. Here we discuss in what ways the proposed system may advance the state of the field, and we explore its potential as a digital musical instrument.
System Design
As a class-compliant audio and MIDI device, the EAVI-EMG hardware is designed from the ground up to be used for music. Of the systems described in the Physiological Interfaces for Music section, it is the older systems—the BioMuse and BodySynth—that, by being MIDI devices, share this musical specificity. Products like the Thalmic Labs Myo were general-purpose HIDs that needed to be adapted for musical use. The musical research community responded to this need by producing utilities like Françoise's Myo for Max object and Di Donato's Myo Mapper, but the core Myo software development kit and drivers are no longer maintained and do not run on new CPU architectures. Being class-compliant, the EAVI-EMG requires no device drivers to be used as an audio or MIDI device.
Do-it-yourself systems like the MyoWare offer flexibility, allowing the artist or musician to choose the interface board to which the EMG sensor should be connected, but they require artists to build their own systems. Artists like Hollie Miller and Craig Scott have used the MyoWare coupled with a Teensy microcontroller as input to digital audio workstation software (in their case, Ableton Live). This combination was specific to the project. The EMG and MyoWare were used in just one piece in Miller and Scott's repertoire, have not been adopted by other artists, and have not been used by them in their subsequent projects. The dissemination examples we described in the previous section are a first indication that the EAVI-EMG system has potential both to perform repertoire and for musical uptake across a range of musical styles.
Incorporating the mapping and sound synthesis contributes to making an integrated system that can then be adopted by musicians as a complete system without requiring them to build or invent their own. The software components of the EAVI-EMG system have been developed through a series of user-centered design actions (Tanaka et al. 2019; Zbyszyński et al. 2021). The integrated system allows musicians with no prior experience with biomedical technologies to get started using the EMG in musical applications. We have presented examples of use of the EAVI-EMG system by artists who were not part of the core research team. As we describe in the following section, we continue to work on simplification of the end user software experience to make its use more accessible. The production run of 20 EAVI-EMG boards is being rolled out to a range of students and independent musicians.
Conceiving of the hardware component as a combined MIDI-and-audio device, and designing it in conjunction with the host software, leaves open opportunities for improvement and optimization in the future. The OWL firmware on the microcontroller treats the EMG signal and audio together. To our knowledge, this is the first time that physiological signal and audio are treated in a single-signal processing chain. The increasing computational power of future microcontrollers leaves open the possibility to take signal-processing tasks currently running in the host software and shift them to run directly in the microcontroller. We are currently experimenting with implementing more of the EMG feature extraction in the microcontroller, and we have an experimental system where the granular synthesizer has been implemented in Pure Data compiled to run in the OWL framework. We are studying the possibility of digital-to-audio conversion and analog audio output in the next revision of the hardware, pointing to the possibility that a future version of the EAVI-EMG could one day run as a standalone musical device without a host computer.
Our device hardware does not match the industrial design elegance of commercial products like the Myo. This brings with it disadvantages as well as newfound freedom. The snap electrode system we have adopted is standard in biomedical practice and allows use of both wet and dry electrodes, but it requires three electrodes for one channel of EMG. For the connector between the electrode cable and the main circuit board, we have chosen to use the same multipin (UC-E6) connector as the Plux Bitalino. This provides a certain degree of interoperability with DIY solutions like the Bitalino. The cables can be cumbersome, however, and the straps we have made for use with dry electrodes (as shown in Figure 1) do not provide the same stability and electrical contact as the Myo's steel-plate dry electrode system. Although the Myo provides an excellent fit on the lower arm, it is largely limited to that limb, although some artists have used the Myo on the leg (Candau et al. 2017). While the snap electrode system is more ungainly, it enabled Juurak in “Noise Bodies” to use the EAVI-EMG to detect thigh muscle activity (Figure 10). In current work with guitarists and violinists, we are exploring electrodes placed behind the shoulder and on the neck. This gives us a freedom we would not have with the Myo and opens up possibilities for use of the EAVI-EMG with nonnormative bodies. The freer use of the EAVI-EMG on different body parts and with different body types brings with it new design challenges we will need to confront in the future. Murray-Leslie and Logan reported mild electric shocks from the 5-V battery, resulting from perspiration in performance and the use of a raw circuit board without a case. This points to health and safety considerations that need to be addressed before a generalized rollout of the system. Meanwhile, the wireless data transmission means that the user of the wearable system is protected from ground problems or electrical currents from the rest of the computer, audio, or mains electrical systems.
Conclusion and Future Work
We have presented a digital musical instrument system that is based on the electromyogram muscle signal as input. It is an integrated system combining both hardware and software that represents the convergence of several strands of research and builds upon prior work by others in the field. It benefits from advances in the technology landscape, including the democratization of hardware development and advances in signal processing and machine learning.
The notion of instrument as extended system and the concept of instrumentality led us to present not just the development of a hardware interface but the associated software around it that facilitates the composition and performance of a diverse body of musical work. It has been used by the authors in concert, and also by a range of different composers and performers in public performance settings. By making the instrument design and code available as open source to the musical community, we hope that it will be of use to other musicians and researchers who had been orphaned by discontinued commercial devices. Beyond avoiding transience, we feel that the design of the EAVI-EMG from the ground up as a class-compliant system, working with both audio and MIDI, will imbue it with musical qualities and lead to richer musical use than would the hacking of otherwise general-purpose devices.
Development of the system continues with several perspectives for future work. The research continues in a new project, Brain Body Digital Musical Instrument (BBDMI, see https://bbdmi.nakala.fr), where we use the EAVI-EMG to explore multimodal musical interaction combining EMG and brain EEG. Feature extraction and sound synthesis are being implemented in Faust (https://faust.grame.fr) on the OWL framework as a way to embed those components in microcontroller firmware to create a truly standalone instrument. We continue our work with users, and we are preparing a separate publication on user studies conducted with the EAVI-EMG. The BBDMI project will expand the user base for the instrument, including conservatory instrument students, and it will explore neurodiversity in trials with autistic musicians. Finally, we are working with industrial designers on electrode harnesses, and we will produce a simplified version of the hardware that can interface with modular synthesizers.
Acknowledgments
The research leading to these results has received funding from the European Research Council under the European Union's Seventh Framework Programme (FP/2007-2013) / ERC Grant FP7-283771. It has also received funding from the Horizon 2020 research and innovation programme, grant agreement no. 789825. Continuing work is supported by the French Agence Nationale de la Recherche ANR-21-CE38-0018. We would like to thank the participants in our user trials.