Coordination and Consonance Between Interacting, Improvising Musicians

Joint action (JA) is ubiquitous in our cognitive lives. From basketball teams to teams of surgeons, humans often coordinate with one another to achieve some common goal. Idealized laboratory studies of group behavior have begun to elucidate basic JA mechanisms, but little is understood about how these mechanisms scale up in more sophisticated and open-ended JA that occurs in the wild. We address this gap by examining coordination in a paragon domain for creative joint expression: improvising jazz musicians. Coordination in jazz music subserves an aesthetic goal: the generation of a collective musical expression comprising coherent, highly nuanced musical structure (e.g., rhythm, harmony). In our study, dyads of professional jazz pianists improvised in a “coupled,” mutually adaptive condition, and an “overdubbed” condition that precluded mutual adaptation, as occurs in common studio recording practices. Using a model of musical tonality, we quantify the flow of rhythmic and harmonic information between musicians as a function of interaction condition. Our analyses show that mutually adapting dyads achieve greater temporal alignment and produce more consonant harmonies. These musical signatures of coordination were preferred by independent improvisers and naive listeners, who gave higher quality ratings to coupled interactions despite being blind to condition. We present these results and discuss their implications for music technology and JA research more generally.

Consonance time series were computed from music streams using 2, 5, and 10 second sliding windows with a 2 second hop size, as illustrated in Figure 3B. Three measures of consonance were considered: Individual Consonance (consonance of individual music streams), Combined Consonance (consonance of the merged music streams from both players in a dyad), and Emergent Consonance (Combined Consonance minus the average Individual Consonance of the two musicians in a dyad). Emergent Consonance is essentially a measure of tonal coordination, as it captures the consonance arising from the interaction of pitches played by the two different musicians. A situation in which each pianist plays consonant notes that clash with one another would result in low Emergent Consonance (e.g., {C,E,G} and {F#,A#,C#} are consonant on their own, but {C,E,G,F#,A#,C#} is highly dissonant), whereas a situation in which each pianist plays dissonant pitch sets that stabilize one another when sounded together would result in high Emergent Consonance (e.g., {C,B} and {E,G} have low average consonance, but {C,E,G,B} has high consonance because together the pitches form a Cmaj7 chord).
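The relationship among the three measures can be sketched with a toy stand-in for the consonance function itself. The interval-class lookup table below is an invented illustration, not the paper's actual measure (which derives from a model of musical tonality); it serves only to show how Emergent Consonance is computed from Individual and Combined Consonance.

```python
from itertools import combinations

# Toy interval-class consonance values (an assumption for illustration only;
# the paper's measure comes from a model of musical tonality, not this table).
IC_CONSONANCE = {0: 1.0, 1: 0.0, 2: 0.2, 3: 0.8, 4: 0.8, 5: 0.9, 6: 0.1}

def consonance(pitch_classes):
    """Mean pairwise consonance of a set of pitch classes (0-11)."""
    pairs = list(combinations(sorted(set(pitch_classes)), 2))
    if not pairs:
        return float("nan")
    ics = [min((a - b) % 12, (b - a) % 12) for a, b in pairs]
    return sum(IC_CONSONANCE[ic] for ic in ics) / len(pairs)

def emergent_consonance(notes_a, notes_b):
    """Combined Consonance minus the average Individual Consonance."""
    combined = consonance(set(notes_a) | set(notes_b))
    individual = (consonance(notes_a) + consonance(notes_b)) / 2
    return combined - individual

# Clashing triads (C major vs. F# major): consonant alone, dissonant together.
ec_clash = emergent_consonance({0, 4, 7}, {6, 10, 1})    # negative EC
# Stabilizing sets: {C,B} and {E,G} combine into a Cmaj7 sonority.
ec_stable = emergent_consonance({0, 11}, {4, 7})         # positive EC
```

Even with this crude lookup, the two worked examples from the text come out with the expected signs: the clashing triads yield negative Emergent Consonance and the mutually stabilizing sets yield positive Emergent Consonance.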

Results
Performer Subjective Responses. At the conclusion of every trial, participants filled out a questionnaire indicating their subjective experience of (1) the quality of the improvised piece, (2) how well coordinated they were with their partner, (3) how easy it was to coordinate with their partner, and (4) the degree to which they felt they played a leader or a supporter role. Responses to questions 1-3 were given on a 5-point Likert scale, and responses to question 4 were given on a 5-point scale in which 1 corresponded to a strong leader role and 5 corresponded to a strong supporter role. Responses are depicted in Figure 5.
Tempogram Analysis. A tempogram analysis was conducted to determine the extent to which duets were rhythmically pulsed with a steady beat (i.e., "in time" as opposed to "out of time"). The librosa library in Python was used to obtain tempograms from audio recordings of duets in both conditions (6,7). Tempograms were then summed to give an overall "pulsedness" score per duet. To test for effects of interaction condition, a mixed-effects model was fit to predict pulsedness as a function of condition, with random intercepts for yoked groupings at the duo and duet levels. This analysis did not indicate a significant effect of condition (estimate of condition slope: M = -425.07, SE = 13412.70, t(86.88) = -0.023, p > 0.1), suggesting that coupled and one-way duets were comparably pulsed. This model was fit using the lme4 package in R, and the p-value was obtained using lmerTest, which uses Satterthwaite's method to estimate degrees of freedom (8,9).
Cross-correlation of co-performer onset density. Cross-correlation of onset density decreases for large-magnitude lags, as depicted in Figure 6.
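A windowed cross-correlation of this kind can be sketched as follows. The function name `xcorr` and the lag convention (positive lag meaning the first series leads) are our choices for illustration.

```python
import numpy as np

def xcorr(a, b, max_lag):
    """Normalized cross-correlation of two onset-density time series for
    lags in [-max_lag, max_lag]; positive lag means `a` leads `b`."""
    a = (np.asarray(a, float) - np.mean(a)) / np.std(a)
    b = (np.asarray(b, float) - np.mean(b)) / np.std(b)
    n = len(a)
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            out[lag] = float(np.mean(a[: n - lag] * b[lag:]))
        else:
            out[lag] = float(np.mean(a[-lag:] * b[: n + lag]))
    return out

# With identical series, correlation peaks at lag 0 and decays at
# larger-magnitude lags, the qualitative pattern reported in Figure 6.
density = np.random.default_rng(0).random(500)
cc = xcorr(density, density, max_lag=20)
```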
Relationship between onset density and tonal consonance. Although our measure of tonal consonance normalized for the combined duration of sustained notes, it did not explicitly normalize for onset density. Accordingly, we examined the correlation between consonance and onset density in individual music streams. The relationship between these measures is depicted in Figure 7. Overall, these plots indicate a negative correlation between consonance and onset density (correlation = -.377, 95% CI = [-.38, -.37], t(299466) = 222.73, p < 0.01), although there is considerable spread of consonance, especially at the low levels of onset density that are more common in our dataset. Such a negative correlation is unsurprising and a desired property: in time windows with more notes played (i.e., higher onset density) there are more opportunities for dissonant intervals to occur, resulting in more dissonance probabilistically. Nonetheless, this correlation made it necessary to deconfound effects of tonal consonance from those of onset density, as reported in subsequent supplementary analyses.
Granger Causality Between Co-Performers' Onset Density and Tonal Consonance. The Multivariate Granger Causality (MVGC) toolbox in MATLAB was used to compute Granger Causality (GC) amongst co-performers' onset density and tonal consonance time series (10). This toolbox allowed us to deconfound the correlation between onset density and tonal consonance by separately computing pairwise GC between onset density (conditioned on tonal consonance) and consonance (conditioned on onset density).
Data pre-processing and Granger causality computation. Onset Density and Consonance time series were obtained from each individual MIDI stream with 2 second sliding windows and 0.6 second step sizes. These time series were detrended with first differencing to remove non-stationarity. We also had to account for missing values in the Consonance time series, because Consonance is undefined in time windows with no playing. Accordingly, we subtracted the temporal mean of each Consonance time series from every value and set missing values to 0.
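A minimal sketch of these pre-processing steps is given below. The ordering of mean-subtraction and zero-filling relative to differencing is one plausible reading of the description above; the exact pipeline lives in the analysis code.

```python
import numpy as np

def preprocess(series):
    """Detrend with first differencing, subtract the temporal mean, and
    zero-fill missing values (NaNs mark windows with no playing, where
    Consonance is undefined). One plausible ordering of the steps above."""
    x = np.asarray(series, dtype=float)
    x = np.diff(x)                      # first differencing removes trend
    x = x - np.nanmean(x)               # subtract temporal mean (ignoring NaNs)
    return np.nan_to_num(x, nan=0.0)    # undefined windows -> 0

cleaned = preprocess([1.0, 2.0, float("nan"), 4.0, 5.0])
```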
Once the data were pre-processed, pairwise GC between co-players' Onset Density and Consonance time series was computed separately for each trial, following the procedures of the MVGC toolbox demo. This included a test to ensure that the time series satisfied the Granger causality stationarity assumption, namely a spectral radius of less than one. This condition was met by the time series in all trials except two, which were discarded from further analysis.
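The stationarity check can be sketched in a few lines: the spectral radius of a VAR model is the largest eigenvalue magnitude of its companion matrix. The function name and coefficient layout below are our illustrative choices (MVGC performs this check internally).

```python
import numpy as np

def var_spectral_radius(lag_coefs):
    """Spectral radius of a VAR model's companion matrix. The stationarity
    assumption checked per trial requires a radius below one.
    `lag_coefs` is a list of (k x k) coefficient matrices [A1, ..., Ap]."""
    p, k = len(lag_coefs), lag_coefs[0].shape[0]
    companion = np.zeros((k * p, k * p))
    companion[:k, :] = np.hstack(lag_coefs)     # lag coefficients in top block row
    if p > 1:
        companion[k:, :-k] = np.eye(k * (p - 1))  # shift identity below
    return float(np.max(np.abs(np.linalg.eigvals(companion))))

# A stable univariate AR(2): radius < 1, so the series passes the check.
radius = var_spectral_radius([np.array([[0.5]]), np.array([[0.3]])])
```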
GC computation involves a model comparison between a restricted Vector Autoregressive (VAR) model (e.g., predict A from A's past) and a full VAR model (e.g., predict A from A's past and B's past). As suggested in the literature (10), these should each have the same model order. GC was computed repeatedly for each trial over a range of hand-chosen model orders. Since our time series came from sliding windows, VAR models with small orders would be fit to data in which there is substantial temporal overlap between predictors and predicted values. We avoided this issue by hand-choosing model orders sufficiently high to avoid such temporal overlap. Moreover, computing GC for each trial over a range of model orders let us assess how robust the GC results were to the choice of model order.
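The restricted-versus-full model comparison can be sketched directly. The code below is a simplified pairwise, unconditional version of the computation (MVGC additionally conditions on the other variable, as described earlier); function names are ours.

```python
import numpy as np

def _resid_var(cols, y):
    """Mean squared residual of an OLS fit with intercept."""
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.mean((y - X @ beta) ** 2))

def granger_ll(target, source, p):
    """GC from source to target as ln(var_restricted / var_full): the
    log-likelihood comparison of a restricted model (target's own p past
    values) against a full model that adds the source's p past values."""
    target, source = np.asarray(target, float), np.asarray(source, float)
    n = len(target)
    y = target[p:]
    own_past = [target[p - l : n - l] for l in range(1, p + 1)]
    src_past = [source[p - l : n - l] for l in range(1, p + 1)]
    return np.log(_resid_var(own_past, y) / _resid_var(own_past + src_past, y))

# Demo: y is driven by x's one-step past, so GC(x -> y) should dominate.
rng = np.random.default_rng(1)
x = rng.standard_normal(2000)
y = np.zeros(2000)
for t in range(1, 2000):
    y[t] = 0.8 * x[t - 1] + 0.1 * rng.standard_normal()
gc_xy = granger_ll(y, x, p=2)   # x's past explains y: large
gc_yx = granger_ll(x, y, p=2)   # y's past adds nothing for x: near zero
```

The asymmetry in the demo mirrors the logic of the one-way condition: the driven series carries information from the driver's past, but not vice versa.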
Results and Analysis. GC was compared across three conditions: ghost-to-live (GC from ghost recordings to live musicians in one-way trials), live-to-ghost (vice versa), and live-to-live (GC between mutually interacting musicians in coupled trials). The main GC output was a log-likelihood model comparison of the full and restricted models, which we used as a continuous dependent measure of causality, as has been done in previous applications (11). Results are depicted in Figure 8.
Overall, GC was higher for onset density than for tonal consonance. GC of onset density reflects the underlying patterns of influence enforced by each condition. Within one-way trials, GC of ghost-to-live was higher than live-to-ghost (paired t(84) = 3.724, p < 0.01; comparing GC values in each direction within each one-way trial, using GC values obtained with a 5.4 second model order). Thus, the onset density of live musicians in one-way trials was responsive to that of the ghost recordings they were playing with, but not vice versa. There was no significant difference in GC between the ghost-to-live and live-to-live (i.e., coupled) conditions (t(283) = .101; p = .92; 5.4 second model order).
Granger causality has been applied in related work to show that leader-follower relations in music ensembles playing composed music are reflected in the postural sway of performers (11,12). Here we find a related result for collectively improvised musical structure: experimentally manipulated conditions of interaction constrain the directed flow of musical information (such as onset density) within dyads of improvising pianists, and these different patterns of causal influence can be detected with Granger causality.
Lagged Consonance. In the main body of this paper we report that, within one-way trials, lagged Emergent Consonance is higher at ghost-to-live lags than at live-to-ghost lags. This result was robust over the full range of window sizes, and also held for lagged Combined Consonance, as depicted in Figure 11. Table 2 reports posterior estimates for Bayesian mixed-effects models predicting either Emergent or Combined Consonance in one-way duets as a function of lag sign. Separate models were fit for each window size. Table 3 reports posterior estimates for Bayesian mixed-effects models predicting simultaneous Emergent Consonance (EC) and Combined Consonance (CC) as a function of condition. EC is significantly higher in coupled duets across all window sizes, but this effect is not significant across all window sizes for CC. Given the correlation between tonal consonance and onset density, we performed a supplementary analysis to confirm that these results are not artifacts of condition-wise differences in onset density. The same lagged analysis was performed with respect to combined onset density instead of combined/emergent consonance. If onset density were driving the asymmetry in one-way trials, we would expect to see a similar asymmetry with onset density (though in the opposite direction, because of the negative correlation). There was no such asymmetry (paired t(85) = 1.197, p = 0.235), indicating that the above results genuinely reflect tonal coordination.

Disentangling Alignment and Complementary Tonal Coordination.
It was demonstrated in the main body of the text that mutual coupling promotes tonal coordination, in that coupled duos exhibited greater Emergent Consonance compared to one-way duos. But is this effect due to greater tonal alignment, in the sense that coupled musicians are more likely to play the same pitches as one another, or is it due to complementary tonal coordination, in the sense that musicians play different notes that consonantly harmonize together? To disentangle pitch matching from complementary tonal coordination, we analyzed the Tonal Entropy of the combined notes produced by duos as a function of interaction condition. Low tonal entropy of a duo's combined notes indicates that musicians were aligning on a restricted subset of the 12 pitch classes (indicative of pitch matching), whereas high tonal entropy indicates that pitch content was more evenly distributed across the 12 pitch classes. Tonal Entropy was computed over the distribution of all 12 pitch classes in the chromatic scale using Shannon's information-theoretic definition of entropy (13). For a given time window, a probability distribution was obtained by incrementing the bin of each pitch class by the number of note onsets musicians played from that pitch class.* This was done repeatedly with 2, 5, and 10 second sliding windows and a hop size of 200 milliseconds, yielding time series of Tonal Entropy for each duet. A Python implementation of this measure can be found in the OSF directory linked above. Bayesian mixed-effects models were fit to predict Tonal Entropy (average Tonal Entropy throughout each duet) as a function of interaction condition, with random effects for yoked groupings at the pair and trial levels. Results are depicted in Table 4.
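A minimal sketch of the Tonal Entropy measure follows (the authoritative implementation is in the OSF directory; the function name here is ours). Bins start at 0.5, per the uniform-prior footnote, before being incremented by onset counts.

```python
import numpy as np

def tonal_entropy(onset_pitch_classes, prior=0.5):
    """Shannon entropy (bits) over the 12 pitch classes in a time window.
    Bins are initialized to `prior` (0.5, per the uniform-prior footnote)
    so sparse windows are not assigned disproportionately low entropy."""
    bins = np.full(12, prior, dtype=float)
    for pc in onset_pitch_classes:
        bins[pc % 12] += 1
    p = bins / bins.sum()
    return float(-(p * np.log2(p)).sum())

# Evenly distributed pitch content -> maximal entropy, log2(12) ~ 3.58 bits.
uniform = tonal_entropy(list(range(12)) * 50)
# Alignment on a single pitch class -> entropy near zero.
peaked = tonal_entropy([0] * 1000)
```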
While not entirely robust over all window sizes, these posterior estimates show that Tonal Entropy is generally lower in coupled duets (though this effect was not significant at the 2 second window size), suggesting that mutual coupling promotes tonal alignment. This being said, we were still interested in whether mutual coupling also promotes greater complementary tonal coordination, especially given that the effect of interaction condition on Emergent Consonance was more robust than its effect on Tonal Entropy.

* A uniform prior distribution was assumed by initializing the probability bin for each pitch class to 0.5. This was done to ensure that passages with just one or two instances of a particular note onset were not assigned disproportionately low entropy.

Two Bayesian multi-level models were fit and analyzed to disentangle the relative contributions of matching versus complementary tonal coordination with respect to the observed Emergent Consonance results. A restricted model predicted Emergent Consonance as a function of Tonal Entropy (with random effects at the pair and trial levels), and a full model predicted Emergent Consonance as a function of both Tonal Entropy and interaction condition (with the same random effects). To the degree that the full model outperforms the restricted model, we can infer that mutual coupling promotes complementary tonal coordination resulting in higher EC. This is because Tonal Entropy is a measure of pitch matching, and if EC were fully explained by matching alone, condition effects would already be encapsulated in the entropy predictor.
Posterior estimates, displayed in Table 10b, reveal that Tonal Entropy did not significantly predict EC in the restricted or full models for 2 and 5 second windows (although it did for 10 second windows), whereas interaction condition significantly predicted EC in full models across all window sizes. The loo_compare method implemented in the brms R package was used to perform model comparison between restricted and full models, using the Watanabe-Akaike information criterion (WAIC) (REF?). This model comparison revealed that full models outperformed restricted models at each time window, as summarized in Table 6. This analysis thus reveals that the previously observed effect of interaction condition on Emergent Consonance is not merely a consequence of greater tonal alignment between mutually coupled musicians, but also reflects greater complementary tonal coordination between mutually coupled improvisers, who play different pitches that interact to produce more consonant harmonies.

Tests for Artifacts of Onset Density. Two supplementary analyses were performed to verify that these results were not artifacts of onset density. First, we looked for a correlation between onset density (combined within a dyad) and Emergent Consonance. As displayed in Figure 9, there is no strong correlation between these features, except at extremely high values of onset density, which were outliers. Second, we contrasted the overall level of Individual Consonance between musicians playing in coupled and one-way trials. If the EC results were an artifact of greater onset density in coupled trials, we would expect lower Individual Consonance in coupled trials, which would in turn result in higher EC. As depicted in Figure 10, this is not the case.

Figure 11. Lagged consonance analysis. Live musicians harmonize with past notes of the ghost recording significantly more than the other way around. This effect is robust across a range of consonance window sizes and was found for both Emergent (a) and Combined (c) consonance measures.