Peripheral interaction is a new approach to conveying information at the periphery of human attention in which sound is so far largely underrepresented. We report on two experiments that explore the concept of sonifying information by adding virtual reverberation to real-world room acoustics. First, to establish proof of concept, we used the consumption of electricity in a kitchen to control its reverberation in real time. The results of a second, in-home experiment showed that at least three levels of information can be conveyed to the listeners with this technique without disturbing a main task being performed simultaneously. This number may be increased for sonifications that are less critical.
Scenario. While I am at home, a few sounds, random as well as intentional, appear in the background of my attention: traffic and the twittering of birds through an open window, a radio program, the rattling of dishes while I cook. All of this is part of a soundscape with specific room acoustics with which I am implicitly familiar. Suddenly, I become aware of a change in the room's reverberation: The room acoustics are virtual and are actually a sonification of the household's electrical power consumption. The boiler has started heating!
Our hearing can be exploited at the periphery of our attention to convey information about anything that is relevant to us. We can realize this in an unobtrusive way by manipulating sounds that exist in the soundscape. This idea is explored and discussed in this article.
Following Weiser and Brown's (1996) well-known call for calm computing, new modes of interaction are needed in times of ubiquitous computing and ever-growing amounts of data. Using the recent taxonomy of the interaction–attention continuum introduced by Bakker and Niemantsverdriet (2016), peripheral displays (Matthews, Hsieh, and Mankoff 2009) bridge the gap between systems of focused interaction (e.g., those in most of our usual computing devices and in many sonifications), on the one hand, and systems of implicit interaction (e.g., automatic lighting or doors, or an alarm clock that wakes one up at the most appropriate moment in the sleep cycle), on the other. Many other related concepts might be mentioned here: ambient information systems (Pousman and Stasko 2006), ambient intelligence and persuasive technology (Verbeek 2009), auditory augmentation (Bovermann, Tünnermann, and Hermann 2012), embedded sonification (Barrass and Barrass 2013), and ambient sonification systems (Ferguson 2013). Ideally, a new display type should be developed that facilitates alternation between the required levels of attention, from peripheral to focused interaction and back again. Ambient systems have been shown to provide such an alternative. One example is the Student Feedback Orb (Hazlewood, Stolterman, and Connelly 2011), which uses ambient light. In a case study, an ambient orb was used to show lecturers the results of their course evaluation surveys in their offices. The orb alternated between red and green depending on the weekly results of student evaluations.
Light is often used in prototypes of peripheral interaction, whereas audio is generally avoided, possibly due to acoustic and psychoacoustic issues (e.g., interplay with the soundscape or room response, overall noise level, or interference with other signals). These problems may be partly overcome by existing methods of digital signal processing and expert tuning, however. State-of-the-art technology in 3-D audio includes, among other techniques, virtual room acoustics, interactive audio scenes (e.g., orientation-tracking headphones with binaural rendering), 3-D loudspeaker arrays, or adaptive digital signal processing systems (e.g., Pulkki and Karjalainen 2015; Zotter and Frank 2019). Although active noise cancellation is standard, similar algorithms may augment certain audio features or even perceptual entities in the auditory scene.
Current research on auditory ambient displays shows us how to cope with the strengths and weaknesses of the audio modality. The need for audio to be unobtrusive for nonusers may be met by using personalized sounds. Butz and Jung (2005), for example, experimented with a prototype of encoded Muzak for a mall or museum. By adding specific instruments or motifs to the arrangement, employees can be summoned without annoying the customers. In an interesting experiment with ringtones, John Brown (2016) attempted to show that using personalized sounds can “encalm” ringtone interaction. The evaluation of this EEG-based experiment produced no significant results, however. The examples of Butz and Jung and of Brown display only binary information (call / no call). Richer data streams have been sonified using ambient systems, but these were proofs of concept with little or no evaluation. These sonifications have usually focused on everyday interactions to exploit hearing as an additional information channel—for example to induce more-economical driving attitudes (Hammerschmidt, Tünnermann, and Hermann 2014), when the sound produced by knocking on a door is used to reveal whether or not someone is inside (Tünnermann, Hammerschmidt, and Hermann 2013), or when opening a wardrobe door is used to trigger an auditory weather report (Ferguson 2013).
Auditory augmentation provides an ingenious method for displaying richer information in an unobtrusive way. Bovermann, Tünnermann, and Hermann (2012) introduced auditory augmentation with a focus on structure-borne sounds. In their WetterReim project, for example, the sounds of keys during typing change depending on weather parameters. We have generalized the concept of auditory augmentation. In a recent study, summarized in the PilotKitchen Experiment section, we used artificial reverberation in a kitchen to convey its consumption of electrical power in real time (for further details, cf. Groß-Vogt et al. 2018). In this case, we found that only extraordinary changes come to the foreground of attention—for example, when the small kitchen reverberates as if it were a church. In our broader definition, auditory augmentation makes use of either object-specific sounds caused by interaction, or adds environmental sounds that blend into the sound scene.
Time tagging, which is a similar concept in ubiquitous music (“ubimus”) research, uses local acoustic cues for aesthetic decision making. Time tagging is based on local resources, provides synchronous feedback, and demands unobtrusive audio-processing techniques (Keller et al. 2010; Farias et al. 2015; Keller 2018; Keller, Schiavoni, and Lazzarini 2019).
Hypothesis and Related Research
We hypothesized that auditory augmentation can be used to convey user-relevant information with embedded sounds that are easily learned and have little distraction potential. In this way we can exploit the auditory modality at the periphery of attention. We conducted two experiments to explore our hypothesis. In the PilotKitchen experiment, we set up an adaptive system for room acoustics in the kitchen of the Institute of Electronic Music and Acoustics at the University of Graz that was controlled by the electrical power consumption of kitchen appliances. In the RadioReverb experiment, we explored how many levels of information can be conveyed using this method.
Parthy, Jin, and van Schaik (2004) used reverberation for ambient data communication. Their study found that reverberant decay time must increase by approximately 60 percent or decrease by approximately 30 percent to be clearly perceived. We followed a different argument for tuning our reverberation levels, discussed below. An older concept, the Weakly Intrusive Ambient Soundscape (Kilander and Lönnqvist 2002), conveyed individual notifications in a ubiquitous service environment, with a sound associated with each user. Playback volume and reverberation were used to convey three levels of notification intensity. In a more recent experiment, Lockton et al. (2014) used sonification to communicate in-home electricity use. Lockton and coworkers first experimented with abstract sounds; when they found that these were not sufficiently unobtrusive, however, they turned to birdsong. Our experiments pursued a similar design scenario.
For evaluating peripheral displays, Matthews, Hsieh, and Mankoff (2009) developed design dimensions and criteria based on the framework of activity theory. As evaluation criteria they listed awareness, the detection of the system's breakdowns, distraction, and appeal. To these standard factors they added learnability, which is necessary for automated operation (and consequently, for peripheral display). For peripheral sonification, we needed to add effectiveness of information transfer. This factor was explored in detail in the experiment described in the RadioReverb Experiment section.
This section gives an overview of the PilotKitchen experiment that we reported in a previous paper (Groß-Vogt et al. 2018). The PilotKitchen was the prototype of a system for auditory augmentation that conveyed information on the electric power consumption of the kitchen of the Institute of Electronic Music and Acoustics (IEM; https://iem.kug.ac.at). The purpose of the study was to raise awareness of power consumption. Data from the kitchen appliances were collected in real time. Pictures of the kitchen and the wall plugs measuring the electric power consumption are shown in Figure 1. A demo video with binaural audio recording of the PilotKitchen can be found online at https://dx.doi.org/10.1162/comj_a_00553.
System Design of the PilotKitchen
Data on the electric power consumption of five kitchen appliances were collected from wall plugs using the Fibaro intelligent-home system (www.fibaro.com/en/the-fibaro-system/wall-plug) that transmitted the data over Z-Wave (Yassein, Mardini, and Khalil 2016). The data from a dishwasher, a coffee maker, a water kettle, a microwave oven, and a refrigerator showed two patterns—one stemming from the technical cycle of each appliance, and the other from the interaction of the kitchen users. We implemented a self-adapting system that conveyed direct feedback from the actual electrical power consumption and also related this information to the typical weekly user pattern. The algorithm of data preprocessing consisted of an initialization and three iterative steps: (1) smoothed real-time electrical power consumption (EPC), (2) baseline week EPC, and (3) the difference relation.
In the initialization we gathered three weeks of total EPC data and averaged the data to obtain one typical week. This was our initial baseline week EPC.
For the baseline week EPC, the smoothed real-time value obtained above was used to update the baseline week of typical EPC. We applied another leaky integrator, weighting the EPC of the present moment by two-thirds in the updated baseline, while the average over all previous weeks contributed with a weight of one-third.
For the difference relation, we compared the real-time value to the value of the baseline week that corresponded to the present moment. For instance, if it was Tuesday at 13:05:10, we subtracted the corresponding value of the baseline week from the present value. The result could be positive, when the actual consumption was higher than usual, or negative, when it was lower. Outliers were clipped, and the resulting number was mapped to a range of [−1, +1]. This number was further used in real time to control the reverberation level.
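In outline, this preprocessing could be implemented as follows. This is a sketch, not the authors' code: the smoothing coefficient for the real-time step and the normalization constant `max_deviation` are illustrative assumptions; the article specifies only the two-thirds/one-third weighting of the baseline update and the clipped mapping to a range consistent with the presets described below.

```python
def smooth(prev, x, alpha):
    """One step of a leaky integrator: y[n] = alpha * y[n-1] + (1 - alpha) * x[n]."""
    return alpha * prev + (1 - alpha) * x

def difference_relation(current_epc, baseline_epc, max_deviation):
    """Compare the smoothed real-time EPC to the baseline value for the
    same time of week, clip outliers, and map the result to [-1, +1].
    max_deviation is a hypothetical normalization constant."""
    d = (current_epc - baseline_epc) / max_deviation
    return max(-1.0, min(1.0, d))

# Baseline update: the present value enters with a weight of two-thirds,
# the previous baseline with one-third, as stated in the text.
baseline = smooth(prev=100.0, x=130.0, alpha=1.0 / 3.0)
```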
The sonification design was based on virtual acoustics, adding reverberation to the room. In our setup, we installed sound absorbers in the kitchen to lower its real reverberation. Then we recorded the environmental sound with a microphone, applied filters and a reverberation algorithm, and played the signal back over loudspeakers in real time. Preset 0 of the virtual room acoustics was then tuned to approximately restore the original reverberation. The difference relation controlled the level of reverberation within three sound presets: 0 corresponded to a "typical" kitchen use, that is, a plausible but virtually added kitchen reverberation; whereas +1, that is, a particularly high consumption, led to reverberation resembling that of a church. Atypically low consumption, that is, a value of −1, turned the virtual reverberation off. The reverberation levels were interpolated between these extremes.
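The interpolation between the three presets can be sketched as a piecewise-linear mapping. The gain values below are illustrative placeholders, not the levels used in the experiment:

```python
def reverb_level(diff, typical=0.2, church=1.0):
    """Interpolate the virtual reverberation level from the difference
    relation diff in [-1, +1]: -1 turns the added reverberation off,
    0 restores the plausible 'typical' kitchen preset, and +1 reaches
    the church-like preset. The gain values are illustrative."""
    if diff <= 0.0:
        return typical * (1.0 + diff)            # fade out towards silence
    return typical + (church - typical) * diff   # fade towards 'church'
```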
Evaluation and Discussion of the PilotKitchen
The evaluation of the PilotKitchen experiment aimed at assessing users' perception and evaluation of the system. In the first round of the evaluation, the system was installed for ten days. Kitchen users were notified about the experiment but given no additional information about it. After the experiment was finished, the users were asked to answer open questions in a questionnaire. Their comments included assessments of their affective reactions to our prototype system and subjective interpretations of how to control the system (for details, see Groß-Vogt et al. 2018).
The resulting data from 60 diary entries by 14 participants did not show a correlation between the perception of reverberation and the difference relation driving the sonification. This might be due to insufficient statistical power, to interpersonal differences in perceiving the virtual acoustics, or to insufficiently tested tuning of the reverberation levels. Qualitative results of the self-assessment manikin (SAM) ratings revealed a heterogeneous attitude towards the system (see Figure 2b).
The results of the PilotKitchen experiment showed that the chosen evaluation techniques were not sufficient. In general (following Hazlewood, Stolterman, and Connelly 2011, p. 877), the challenge for prototypes of peripheral interaction is to explore
[h]ow to provide in-depth evaluations on something that is defined as blending with the surrounding world, and meant to be (in some respects) ignored.
Hazlewood and colleagues argued strongly in favor of long-term and in situ measurements. They reported an interesting case study ("clouds and lights"), in which three ambient displays were installed in a publicly shared atrium of a large building in an effort to persuade passers-by to use the stairs instead of the elevators. Interestingly, quantitative data showed that staircase use increased significantly. When interviewed, however, users denied that they had changed their behavior. This example shows that, for systems at the periphery of attention, qualitative evaluation alone is not sufficient. On the other hand, if indirect quantitative measurement is not possible, qualitative data are needed to resolve issues of test and evaluation design. This was also discussed by Hazlewood and coworkers, who reported that indirect measurement in the above-mentioned Student Feedback Orb study failed for reasons that did not become evident until the post hoc interviews.
For our PilotKitchen experiment, we concluded that a simplified experimental design is needed to conduct a thorough evaluation of the method. The RadioReverb experiment was designed to meet this criterion.
The RadioReverb experiment, designed and carried out in April 2020 in Austria, was originally planned as a listening experiment in our sonic interaction design lab at the IEM. Due to the COVID-19 pandemic, we redesigned the experiment as an in-home test using an app on participants' smartphones and headphones.
On the basis of the results of the PilotKitchen experiment, we posed the following research question:
When the environmental sounds of a room are augmented by changes in the room acoustics, how much information can be conveyed at the periphery of attention (without disturbing the main task the user wants to perform)?
Design of the RadioReverb Experiment
To investigate our research question, we set up the scenario of a user who was casually listening to a half-hour radio feature at home as the main task. The reverberation of the room changed randomly, and the participants of the experimental group (EG) were asked to estimate the changes they perceived on a simple GUI on their smartphones. Participants in a control group (CG) listened only to the radio feature at constant reverberation. Then, both groups were requested to answer questions on the content of the feature. This ensured that the EG participants completed the task by listening mainly to the radio and only peripherally to the reverberation, as opposed to altering the main task by tracking changes in the reverberation. The radio signal was played back via a virtual loudspeaker, and the room response was carefully simulated to correspond to that of a standard living room and rendered binaurally.
Audio Material and Participants
To meet the prerequisite that the information source be relevant to the participants (Hazlewood, Stolterman, and Connelly 2011), we chose two episodes from a current radio feature, "Ö1-Radiokolleg," produced by ORF (the Austrian Broadcasting Corporation). These episodes on the topic of food plants (fig and millet) were chosen with the expectation that they would be of general interest. The episodes, each around 15 minutes long, were played back consecutively as one file. Approximately half of each episode contained speech recorded in a dry studio; the other half consisted of interviews recorded in a variety of settings. This material would not be ideal for a strict listening test, but it supported our in-home, environmental test design.
We recruited 33 participants, although some did not finish the experiment owing to technical issues. Ultimately, 12 participants (six men, six women) were included in the CG and 17 (13 men, four women) in the EG. All members of the EG were staff or students of our institute, with the exception of two musicians. All were advanced listeners and were assumed to have normal hearing. The participants of the CG were recruited from among the acquaintances of the authors and, with the exception of two sound engineers, had no special audio or music background. The experiment was conducted in German, the first language of all participants.
The experiment was conducted on each user's smartphone using headphones (not earbuds) with the MobMuPlat app, described below. The users were asked to first install the environment and load their personalized preset, then to find a comfortable sitting place in a normal room with little distraction. The app guided the participant through the experiment and provided a slider on the GUI as the experimental interface. The experiment consisted of four phases: Training I, Training II, the experiment proper, and a post hoc survey.
In Training I, a 46-sec excerpt from a radio feature was played. During playback, the reverberation was stepped through twelve discrete levels, from the minimum up to the maximum and back down to the minimum. The slider showed the correct setting simultaneously. This procedure could be repeated on demand.
In Training II, a 3.5-sec speech sample was played repeatedly at different levels of reverberation. For each trial, the participants were asked to estimate the corresponding slider position. The correct answer was revealed through an additional slider in the background as soon as the sample ended. After a pause of 1.5 sec, the next trial was presented. The level of reverberation was randomized by presenting consecutive rows of the twelve levels in random order. The procedure was repeated for at least two such rows (24 random jumps) and could be continued as long as the participant wanted. After completion of Training II, the experiment proper was carried out, in which the radio feature was played for half an hour. The participant was instructed to estimate each change on the slider as rapidly as possible. For the comfort of the participants, the interface could be released between changes because the slider retained the last value.
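The randomization scheme described for Training II, consecutive "rows" of all twelve levels in random order, could be generated as follows. This is a sketch of the stated procedure, not the authors' implementation:

```python
import random

def training_rows(levels=12, rows=2, seed=None):
    """Randomize Training II trials as consecutive 'rows': each row is a
    random permutation of all reverberation levels, so every level occurs
    once per row while the order of jumps remains unpredictable."""
    rng = random.Random(seed)
    trials = []
    for _ in range(rows):
        row = list(range(levels))
        rng.shuffle(row)
        trials.extend(row)
    return trials
```

Two rows yield the minimum of 24 random jumps mentioned above; further rows can be appended on demand.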
Finally, in the post hoc survey, the participants were asked to return the gathered data and fill out an online survey about the content of the broadcast and about the experiment.
We chose the reverberation levels based on perceptual considerations, taking into account just-noticeable differences (JNDs) from the literature and the physically plausible range as given by the dimensions of the simulated room. Previous work by members of our group (Weger, Hermann, and Höldrich 2018) has shown that limiting the auditory augmentation to a plausible range leads to a calm sonification that fits naturally and seamlessly into the everyday acoustic environment. According to Connell and Keane (2006, p. 95):
A highly plausible scenario is one that fits prior knowledge well: with many different sources of corroboration, without complexity of explanation, and with minimal conjecture.
In our case, the lower threshold was tuned to a dry reverberation of a standard living room with discrete reflections only, while the upper threshold simulated conditions in a room of the same dimensions that was tiled as a bathroom, resulting in 1.2 sec of diffuse reverberation. We argue that this is a plausible range because it can be achieved physically for the given room dimensions.
During the whole radio episode, all possible transitions between the seven levels (i.e., at least 42 transitions) were presented in random order for each participant, with each setting lasting for at least 7 sec.
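A sequence that contains every ordered transition between n levels exactly once is an Eulerian circuit in the complete directed graph on n nodes. The article does not state how the ordering was produced; one possible randomized construction, using Hierholzer's algorithm, is sketched below:

```python
import random

def transition_sequence(levels=7, seed=None):
    """Return a sequence of reverberation levels containing every ordered
    pair of distinct levels exactly once as a consecutive transition: an
    Eulerian circuit in the complete digraph, found with Hierholzer's
    algorithm. This is one possible construction, not necessarily the
    one used in the experiment."""
    rng = random.Random(seed)
    # Unused outgoing edges per level, shuffled for randomness.
    out = {i: rng.sample([j for j in range(levels) if j != i], levels - 1)
           for i in range(levels)}
    stack, circuit = [0], []
    while stack:
        v = stack[-1]
        if out[v]:
            stack.append(out[v].pop())  # follow an unused edge
        else:
            circuit.append(stack.pop())  # backtrack, emit node
    return circuit[::-1]
```

For seven levels the resulting sequence has 43 entries, realizing all 42 transitions.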
Sound Rendering for Virtual Room Acoustics
The sound was rendered with the Reaper digital audio workstation (www.reaper.fm) using Ambisonics technology (Zotter and Frank 2019) and the IEM Plug-in suite (https://plugins.iem.at). The individual processing steps are shown in Figure 4.
Monophonic audio was fed into a DirectivityShaper plug-in that created 16 channels to represent the typical frequency-dependent directivity of a loudspeaker in third-order spherical harmonics. The directivity pattern was rotated to face the listener using the SceneRotator plug-in. The rotated directivity pattern was sent into the RoomEncoder, which used an image-source model of a shoebox-shaped room with 236 reflections to create basic, dry living room acoustics. The virtual loudspeaker was placed at (1.7, 0.3, 0.5) m and the listener at (−0.3, 0.3, 0.5) m relative to the center of the room, resulting in a typical listening distance of 2 m. The walls of the shoebox room had frequency-dependent reflection coefficients, with lower values at higher frequencies. To simulate the effect of a carpet on the floor, the floor reflections were attenuated by 2 dB. The output of the RoomEncoder comprised 64 channels, so that seventh-order Ambisonics could make use of the highest possible spatial resolution. The output of a feedback delay network (FDN; cf. Stautner and Puckette 1982) was added to these 64 channels according to the different levels of reverberation using the FDNReverb plug-in. The parameters of the plug-in were chosen to take into account the mean free path of the room and a 0.1-sec fade-in to increase diffusivity (Blochberger, Zotter, and Frank 2019) for a smooth blending into the early reflections of the image-source model. A frequency-dependent reverberation time, with about half the reverberation time at higher frequencies, was achieved by high-shelf attenuation at 8 kHz. The resulting envelopes of the dry room and the additional diffuse reverberation levels are depicted in Figure 3. The BinauralDecoder created the headphone signals from the resulting 64-channel Ambisonics stream with state-of-the-art decoding technology (Schörkhuber, Zaunschirm, and Höldrich 2018; Zaunschirm, Schörkhuber, and Höldrich 2018).
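Two quantities in this pipeline follow directly from standard formulas: an Ambisonics stream of order N carries (N + 1)² channels, and the mean free path of a rectangular room is 4V/S, which guides nominal FDN delay-line lengths. The room dimensions below are illustrative placeholders only, not the dimensions used in the experiment:

```python
def ambisonics_channels(order):
    """Number of channels of an Ambisonics stream of the given order."""
    return (order + 1) ** 2

def mean_free_path(lx, ly, lz):
    """Mean free path 4V/S of a rectangular room (dimensions in meters;
    the values used in the tests are illustrative, not the experiment's)."""
    volume = lx * ly * lz
    surface = 2.0 * (lx * ly + lx * lz + ly * lz)
    return 4.0 * volume / surface

def nominal_fdn_delay(lx, ly, lz, c=343.0):
    """Delay in seconds corresponding to the mean free path."""
    return mean_free_path(lx, ly, lz) / c
```

Seventh order thus yields the 64 channels mentioned above; for a living-room-sized space, the mean free path corresponds to a delay on the order of a few milliseconds.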
The detailed parameter settings of each plug-in can be found as supplementary material at https://dx.doi.org/10.1162/comj_a_00553.
Collecting In-Home Data via MobMuPlat
The participants returned their recorded data in the form of a text file containing the slider values sampled at a rate of 100 Hz. Of the 17 participants, three accidentally performed the experiment with the wrong audio sampling rate: Participant 16 listened to the 44.1-kHz version at a 48-kHz sampling rate, whereas for Participants 42 and 44 it was the other way around. The resulting change in playback speed (8 or 9 percent) was not considered relevant for our experiment, as JNDs for diffusivity and reverberation time are defined relatively, not as absolute values. For unknown reasons, the data from two other participants (Participants 6 and 40) were 1 sec too long. All collected data were therefore resampled to the same length of 163,662 samples (about 27 minutes 17 seconds).
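Resampling recordings of slightly different lengths to a common sample count can be done by linear interpolation; a minimal sketch (the article does not specify the resampling method used):

```python
def resample_linear(values, target_len):
    """Linearly resample a recorded slider track to a fixed number of
    samples so that recordings of slightly different lengths become
    comparable."""
    n = len(values)
    if n == target_len:
        return list(values)
    out = []
    for i in range(target_len):
        pos = i * (n - 1) / (target_len - 1)  # fractional input index
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(values[lo] * (1.0 - frac) + values[hi] * frac)
    return out
```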
Evaluation in the Post Hoc Online Survey
After the experiment, the participants were asked to fill out an online survey. This was anonymized for the CG participants but not for the EG participants, whose answers needed to be linked to the quantitative data collected via the MobMuPlat app. The questionnaire contained a multiple-choice test on the content of the radio feature, followed by questions on the circumstances of the experiment and on the subjective experience of the sound; the CG participants were asked only the first three items:
Multiple-choice, multiple-response statements on Episode 1 (fig).
Multiple-choice, multiple-response statements on Episode 2 (millet).
Description of the room in which the experiment was conducted, and of possible distractions or technical issues during the experiment (formulated as an open question).
“How well do you think you were able to estimate the reverberation level?” (Likert item, 1–7).
“How much were you challenged by simultaneously listening to the radio and estimating the reverberation level?” (Likert item, 1–7).
“How realistically did the virtual radio and its reverberation blend into your listening environment?” (Likert item, 1–7).
The participants answered sets of detailed questions on the content of the radio feature, in two parts corresponding to the two episodes. Each set contained 14 statements on figs and on millet, respectively, all of them factually correct. The participants had to indicate whether or not each statement had actually been part of the radio feature. To validate the difficulty of this task, the CG participants were asked the same questions as the EG participants.
Our test design was a magnitude-estimation experiment: A certain stimulus was to be estimated by the participants on a given scale. In our case, a slider was used with a given beginning and end point but no further segmentation. Many effects are known for magnitude estimation, regardless of the presented modality (see Petzschner, Glasauer, and Stephan 2015). When we analyzed our data we took note of behavioral effects, such as the regression effect, where the reproduced range is smaller than the physical one, as well as sequential effects. We expected to find a hysteresis in the data, that is, a bias in estimates towards the recent history of the stimuli experienced.
Results of the RadioReverb Experiment
The data obtained from the MobMuPlat app were time series of estimated reverberation levels (response) in relation to the true reverberation levels (reference) for each participant (see one example in Figure 7a). To simplify comparisons, both are given as reverberation times, as was also done for the added FDN reverberation. As a first step in the analysis, we estimated the delay of the response with respect to the reference via the maximum of their cross-correlation. This average delay was between 0.54 sec and 2.22 sec across participants (mean 1.48 sec, standard deviation 0.55 sec). A moving-average instantaneous delay over time was computed in the same way, using a sliding rectangular window four minutes in length. This instantaneous delay ranged from 0 to 3.88 sec for all participants, with an average delay over time of 0.99 sec.
For further analysis, the time series were divided into segments of constant true reverberation (plateaus are shown in Figure 7a in the gray reference curve). The first segment was excluded from further analysis. In addition, the first 2.22 sec (i.e., the maximum average delay), as well as the last second of each segment were removed; for the remainder, the average over the response (i.e., the estimated reverberation) was calculated. After collecting average responses according to the corresponding reference level, we obtained six to nine estimates per participant and reverberation level, as shown in Figure 7b.
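The segmentation and trimmed averaging described above can be sketched as follows, using the trimming values stated in the text (2.22 sec at the start, 1 sec at the end of each segment):

```python
def plateau_averages(reference, response, rate=100, head=2.22, tail=1.0):
    """Split both time series where the true reverberation level changes,
    drop the first segment, trim `head` seconds at the start and `tail`
    seconds at the end of each remaining segment, and average the
    response per segment."""
    bounds = [0] + [i for i in range(1, len(reference))
                    if reference[i] != reference[i - 1]] + [len(reference)]
    skip_head, skip_tail = int(head * rate), int(tail * rate)
    averages = []  # (true level, mean estimated level)
    segments = list(zip(bounds, bounds[1:]))[1:]  # drop the first segment
    for start, end in segments:
        seg = response[start + skip_head:end - skip_tail]
        if seg:
            averages.append((reference[start], sum(seg) / len(seg)))
    return averages
```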
For each participant and level, the distribution of estimated reverberation was tested for normality with the Lilliefors test. The null hypothesis of normally distributed data was rejected in only 4 percent of the cases (and for a maximum of one level per participant). Therefore, we generally assumed a normal distribution for further statistical analysis. For each participant, we performed pairwise one-tailed Welch's t-tests on all possible pairs of reference levels, with a 5 percent threshold for significance. For all participants, a difference of three reverberation levels (e.g., fifth versus second level) always resulted in estimates that differed significantly from each other. Below that distance, the estimates of some participants were not significantly different from each other. When pooled over participants, all pairwise comparisons were significant; that is, the estimated reverberation of each level was significantly higher than that of all levels below and lower than that of all levels above.
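For reference, Welch's t statistic and the Welch–Satterthwaite degrees of freedom can be computed directly; this sketch stops at the statistic (p-values would then come from the t distribution, e.g., via scipy.stats, which is not used here):

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and the Welch-Satterthwaite degrees of freedom
    for two samples with possibly unequal variances."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df
```

For equal sample sizes and equal variances, df reduces to 2(n − 1), as in the test case below.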
When analyzing the estimated versus the true values, several different approaches can be used to define the number of levels that participants were able to identify correctly. As can be seen in the example of Figure 7b, the participants clearly did not resolve all seven reverberation levels. Owing to the small number of data points per level and the fact that each participant has an individual nonlinear mapping, we decided to use the statistical measure of effect size to analyze the data. Note that this choice follows the assumption that our levels are equally spread and that the perceptual distance between adjacent levels is similar over the whole range of reverberation. The data indicate a linear behavior; Figure 7b, for example, shows a rather linear function of average estimates over reverberation levels. The same behavior can be seen in the analysis of the average values when the data are pooled over participants.
For different levels of the probability of superiority, corresponding to thresholds of the effect size, the table shows the results for the number of levels discriminable by all listeners, by those who were content focused (CF), and by those who were sound focused (SF). There was a minimum of 3.0 discernible levels and a maximum of 6.8 when a lower probability of correct estimates was accepted.
The results of this analysis are given in Figure 8, averaged over all participants as well as averaged, separately, over the groups of SF and CF listeners. In Figure 8a, the number of discriminable levels is plotted against a continuous effect-size threshold, whereas for Figure 8b four selected thresholds (corresponding to four values of the probability of superiority) were chosen. When choosing a threshold of 2.77 (corresponding to a probability of superiority of 0.975), CF listeners achieved an average of 3.0 discriminable levels (standard deviation 0.4), whereas SF listeners achieved an average of 3.5 levels (standard deviation 0.4). The number of perceived levels was significantly higher for SF listeners than for CF listeners.
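The correspondence between the effect-size threshold and the probability of superiority quoted here (2.77 versus 0.975) follows, under an equal-variance normal model, from PS = Φ(d/√2), which needs only the standard error function:

```python
from math import erf

def prob_superiority(d):
    """Probability of superiority for effect size d under two
    equal-variance normal distributions: PS = Phi(d / sqrt(2)),
    i.e., 0.5 * (1 + erf(d / 2))."""
    return 0.5 * (1.0 + erf(d / 2.0))
```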
The instantaneous delay discussed above was further analyzed in a similar way as the estimated reverberation, by computing the average instantaneous delay per time segment and participant. When these data were pooled over all participants, however, no significant effects could be observed: There was no significant difference in delay between reverberation levels, between jump distances, or between jump directions.
Finally, Figure 10 shows the numerical results of the questionnaires. Participants in the EG indicated that their listening experience seemed more virtual than did the CG participants. Mapping the original Likert scale [1, 7] to a range of [−1, 1] produced mean values of 0.1 for the CG (standard deviation 0.9) and −0.6 for the EG (standard deviation 0.4). The EG participants did not feel excessively challenged by the need to listen to the radio while estimating the reverberation (the mean lies near the center, at 0.1, with standard deviation 0.3); however, they found it more challenging than only listening to the radio (as in the CG). Furthermore, they were rather confident of their estimates (mean 0.4, standard deviation 0.3).
Discussion of the RadioReverb Experiment
As we expected, listeners cannot discern as many levels of reverberation in a magnitude-estimation experiment as in a JND experiment with pairwise comparisons. The results of the RadioReverb experiment showed that we may reliably convey three levels (about 1.5 bits) of information by virtual reverberation in real time at the periphery of attention, while users are focused on another task. This number corresponds to a probability of superiority of 0.975, that is, 97.5 percent correct answers. The number of bits can be increased by lowering the probability of superiority, for instance to 0.8, which yields 5.8 levels, or about 2.5 bits. The choice of the required probability of superiority should be based on the criticality of the conveyed information. Another way of adapting a reverberation-based peripheral sonification is to take into account the sequential bias found in magnitude estimation, shown in Figure 9: the over- and underestimation of upward jumps and downward jumps, respectively. Our data show estimates that exaggerate the difference between consecutive reverberation settings. Changes from high to low reverberation values (downward jumps) were underestimated as compared with jumps from low to high (upward jumps); the latter showed more linear behavior.
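The bit counts above follow directly from the number of reliably discriminable levels: assuming the levels are equally likely, the conveyed information is the base-2 logarithm of that number.

```python
from math import log2

def bits(n_levels):
    """Information conveyed by n equally likely, reliably
    discriminable levels, in bits."""
    return log2(n_levels)

print(round(bits(3), 2))    # -> 1.58 (the "about 1.5 bits" above)
print(round(bits(5.8), 2))  # -> 2.54 (about 2.5 bits)
```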
Our analysis of the temporal correlation between reference reverberation levels and response showed that this sonification works well in real time. The average instantaneous delay of 1 sec includes the estimation of the participant and the setting of the slider; thus we may assume a real-time-capable display within physiological reaction times.
As was evident both in the qualitative responses to the questionnaire and in the time-series response data, some listeners concentrated more on content and others more on sound (CF versus SF listeners in Figure 6). We took only the CF listeners into account for the results of this experiment, as they fulfilled our criterion for peripheral listening to the reverberation. The RadioReverb experiment was designed to be ecologically valid; however, true peripheral listening without any experimental task (in our case, setting a slider) might lead to different results. Nevertheless, the fact that we were able to distinguish CF listeners from SF listeners on the basis of both the questionnaire and the experimental data provides evidence that our goal of designing a peripheral listening experiment was to some extent achieved.
Participants did not feel excessively challenged by the task in the experiment, and they felt confident of their estimates. Although no questions addressed these subjective reactions explicitly, the responses indicate that the sound was accepted and was not too disturbing. These issues should be addressed explicitly in a long-term, in situ follow-up experiment.
The EG participants experienced the radio and its reverberation as very unnatural. The responses of the CG participants were less negative; this is perhaps to be expected, as they were exposed to a constant reverberation. From this we may conclude that, for Question 6 in the description of the post hoc online survey above, the participants were assessing the reverberation more than the radio. Nevertheless, these responses indicate that the situation was experienced as less natural than we had expected, which reduces the ecological validity of the experiment. From personal observation and informal comments of some participants, however, we may say that the radio blended well into the customary soundscape. The abstract formulation of the question might explain why the questionnaire responses were so negative in this regard.
The binaural test design of the experiment was necessitated by the COVID-19 pandemic. Initially, the experiment was planned, and already set up, in a studio using Ambisonics technology and loudspeakers. Comparing the listening experience of the lab environment with the binaural room simulation, we found the two comparable; they would therefore probably have produced similar results. Although the lab version would have allowed a more controlled test procedure, the binaural design was more ecologically valid owing to the at-home setting.
Conclusions
We conducted two experiments to explore peripheral sonification that alters a room's reverberation by means of virtual acoustics. First, in the PilotKitchen experiment, we installed a prototype in a real setting, our institute's kitchen. Real-world sounds were recorded by a microphone and played back over loudspeakers to add virtual reverberation that depended on the kitchen's actual electrical power consumption. The evaluation of this experiment, carried out as a diary study, led to ambiguous results, revealing a heterogeneous attitude toward the system. As a proof of concept, however, the experiment permitted us to draw conclusions concerning the implementation of such a system. The second experiment, RadioReverb, was conducted with a total of 29 participants in their homes, using their smartphones to audition binaural renderings of a virtual radio and room response. The experimental task was, first, to listen to the radio feature and, second, to track changes of reverberation and estimate them on a GUI slider. As discussed above, within a plausible range of virtual room response, three levels of reverberation can be distinguished reliably. More levels can be distinguished if the criticality of the sonified information is lower and a larger number of incorrect estimates can be tolerated. The tuning of the sonification is described in detail in the supplementary material to this article and can be used to create peripheral sonifications with the tools we used or with other tools.
Overall, we conclude that the method works well as a peripheral auditory display. The sound design is rather unobtrusive; adding new sounds to an environment that is already noisy, especially in an at-home display, can often be difficult, and augmenting existing sounds produces better results. Furthermore, learnability, which may be low for abstract sonifications, can be assumed to be higher here, since tracking room characteristics such as reverberation is an evolutionary feature of human hearing. Although our experiments did not make use of musical sounds, we believe that our results could be extended to experiments that use reverberation as an informational layer in background music or in music performances.
Our outlook for future research includes a plan to install a peripheral sonification system long-term and in situ, in a semipublic or at-home environment, to test all evaluation criteria, such as those given by Matthews, Hsieh, and Mankoff (2009). The criteria of appeal and acceptance in particular, which could not be measured in our studies, are cornerstones for the successful application of any new technology and should be explored in future work. Furthermore, on a conceptual level, our finding that the EG participants could be divided into SF and CF listeners, on the basis of both qualitative and quantitative data, might be linked to typologies of listening, starting, for example, with Schafer (1993), and to ubiquitous music participation (cf. Keller, Schiavoni, and Lazzarini 2019).
Acknowledgments
We would like to thank our test participants, mainly our colleagues and students, who were first exposed to our virtual reverberation in the institute's kitchen and later installed the experiment environment on their private smartphones. Finally, many thanks to Henry Fullenwider and to the editors for their thorough proofreading of the manuscript.