The study of cognitive processes underlying natural behaviors implies departing from computerized paradigms and artificial experimental probes. The present study aims to assess the feasibility of capturing neural markers (P300 ERPs) of cognitive processes evoked in response to the identification of task-relevant objects embedded in a real-world environment. To this end, EEG and eye-tracking data were recorded while participants attended stimuli presented on a tablet and while they searched for books in a library. Initial analyses of the library data revealed that P300-like features shifted in time. A Dynamic Time Warping analysis confirmed the presence of P300 ERP in the library condition. Library data were then lag-corrected based on cross-correlation coefficients. Together, these approaches uncovered P300 ERP responses in the library recordings. These findings highlight the relevance of scalable experimental designs, joint brain and body recordings, and template-matching analyses to capture cognitive events during natural behaviors.
The emergence of mobile brain and body imaging (MoBI; Gramann et al., 2011) research methods provides an unprecedented opportunity to depart from artificial laboratory-based settings to study cognitive processes directly in real-world environments (De Vos, Gandras, & Debener, 2014; Gramann, Jung, Ferris, Lin, & Makeig, 2014; Makeig, Gramann, Jung, Sejnowski, & Poizner, 2009). Over the last decade, technical advances have been made toward the miniaturization of sensors, improving the portability of research-grade body and neuroimaging hardware (McDowell et al., 2013) and thus allowing brain data to be recorded outside of the laboratory over long periods (Hölle, Meekes, & Bleichner, 2021). More pointedly, mobile EEG and mobile eye-tracking (E-T) open new research avenues to better understand how people think and act in the real world. The exciting prospects offered by the exploitation of such mobile research methods have sparked interest in the development of novel signal processing approaches (Reis, Hebenstreit, Gabsteiger, von Tscharner, & Lochmann, 2014). Taken together, these developments enable the investigation of human cognition directly in naturalistic settings (Ladouce, Donaldson, Dudchenko, & Ietswaart, 2017) to tackle fundamental and applied questions across a wide range of research fields such as sport science (Park, Fairweather, & Donaldson, 2015), architecture (Djebbara, Fich, & Gramann, 2019) and urban planning (Birenboim, Helbich, & Kwan, 2021), neuroergonomics (Gramann et al., 2021; Dehais, Karwowski, & Ayaz, 2020; Ayaz & Dehais, 2018), spatial navigation (Do, Lin, & Gramann, 2021; Miyakoshi, Gehrke, Gramann, Makeig, & Iversen, 2021), perception of art, architecture and neuroaesthetics (Djebbara, Jensen, Parada, & Gramann, 2022; King & Parada, 2021; Djebbara, Fich, Petrini, & Gramann, 2019), and the development of assessment and rehabilitation methods for neurocognitive disorders (Lau-Zhu, Lau, & McLoughlin, 2019; Kranczioch, Zich, Schierholz, & Sterr, 2014).
As elegantly articulated by Parada (2018), the overarching challenges lying ahead of the MoBI approach to reach its full potential imply a progressive transition from highly controlled laboratory settings to the study of cognitive phenomena in real-world environments with high ecological validity. Initiating such an incremental approach, a series of influential out-of-laboratory studies have revisited experimental paradigms commonly used in neuroimaging research and performed them in naturalistic contexts. The following sections present this body of research that established the foundations upon which the present study is based.
Measuring Brain Activity during Motion and Outside of the Laboratory
In a seminal study, Gramann, Gwin, Bigdely-Shamlo, Ferris, and Makeig (2010) revealed that transient brain responses to the presentation of stimuli could be extracted from surface EEG data acquired while participants walked on a treadmill. More specifically, the authors examined the impact of walking speed (standing, walking, and walking briskly) on the P300 component amplitude, which is a widely studied feature of EEG signals whose robustness has established it as a gold standard of EEG research (Polich, 2007). The P300 component is a positive deflection in the time domain of the EEG signal occurring approximately 300 msec after the presentation of infrequent or task-related stimuli (typically within the frame of an oddball paradigm) and reflecting selective attention processes. These properties established the P300 as a relevant measure to assess the validity and quality of data acquired while participants were in motion or went outside of the laboratory.
Taking the EEG outside of the laboratory, Debener, Minow, Emkes, Gandras, and de Vos (2012) demonstrated the feasibility of recording the P300 component elicited through an auditory oddball paradigm when participants were walking outside versus sitting inside. The neural responses typically elicited by the presentation of target auditory stimuli were observed in both experimental conditions although they were attenuated in the outdoor-walking condition. In a follow-up study (De Vos et al., 2014), the authors controlled for the environmental factor by having the participants perform the same auditory oddball task while sitting and walking outdoors. Similar to the previous study, an attenuation of the P300 effect was observed for the walking condition, which was interpreted as reflecting either a lower signal-to-noise ratio, potentially related to the presence of motion artifacts contaminating the walking data, or a reallocation of attentional resources during walking. These early studies hinted at important distinctions between how the mind works under artificial and natural conditions. Circumventing signal-to-noise issues relative to gait-related artifacts, cycling studies (Scanlon, Redman, Kuziek, & Mathewson, 2020; Scanlon, Townsend, Cormier, Kuziek, & Mathewson, 2019; Zink, Hunyadi, Huffel, & Vos, 2016) confirmed that the attenuation previously reported was partly attributable to the physical activity related to cycling but also, and more importantly, to the higher cognitive demands of being outdoors. Ladouce, Donaldson, Dudchenko, and Ietswaart (2019) further specified the nature of the reallocation of cognitive resources underlying the P300 attenuation by demonstrating that it is not the locomotor demands themselves that take cognitive resources away from the task. Indeed, this research pinpointed that it is the displacement through space that is substantially taxing in terms of cognitive resources.
The reallocation of cognitive resources can therefore be attributed to the increased flow of vestibular and visual information that needs to be processed during locomotion. Consistently, Liebherr et al. (2021) reported a reduction of P300 amplitude when students performed an oddball task while finding their way through a university campus as compared with simply walking around a sports field, highlighting the cost of integrating sensorimotor information. Similarly, two studies conducted in real-flight conditions using passive (Dehais, Rida, et al., 2019) and active (Dehais, Duprès, et al., 2019) auditory oddball paradigms disclosed lower P300 amplitude when participants faced challenging flying conditions involving an increased flow of visual information to be processed. The MoBI approach has further been applied to investigate embodied aspects of perception and attentional processes during natural movement (Cao, Chen, & Haendel, 2020; Cao & Händel, 2019; Reiser, Wascher, & Arnau, 2019; Benjamin, Wailes-Newson, Ma-Wyatt, Baker, & Wade, 2018; Schmidt-Kassow, Heinemann, Abel, & Kaiser, 2013), as well as learning and memory (Schmidt-Kassow, Deusser, et al., 2013). Taken together, these findings further underlined that cognitive functions are altered when people are immersed in an ever-changing, dynamic, and complex environment.
To date, out-of-the-laboratory studies have mainly resorted to the presentation of artificial stimuli through computerized paradigms (e.g., visual or auditory oddball tasks), interactions with artificial apparatus taking place within highly controlled environments, or even experimental protocols characterized by the repetition of prototypical behaviors. An important step toward realizing the vision of studying human cognition in the real world would be to progressively transition from computerized paradigms to the study of cognitive processes in relation to embodied experiences that are grounded in the real world. Such a goal remains challenging as the study of cognitive phenomena in the real world prevents the use of external and artificial experimental probes. In the absence of these experimental markers, questions such as when cognitive processes happen and how to timestamp and segment the data accordingly become nontrivial issues that need to be addressed to accurately extract cognitive events. In contrast to computerized paradigms in which the timing of cognitive events is dictated by the course of the experiment and can therefore be derived from event markers with very high temporal accuracy, the definition of cognitive events in the real world poses several conceptual and technical questions.
Capturing Cognitive Events in the Real World
A relevant approach to segment and extract meaningful information from continuous brain recordings is to contextualize the data based on physiological and behavioral data recorded simultaneously. An example of such a multimodal approach can be found in a study carried out by Mavros, Austwick, and Smith (2016) in which Global Positioning System (GPS) tracking and brain data were combined to study spatial perception and cognitive experiences related to different types of urban environments. Banaei, Hatami, Yazdanfar, and Gramann (2017) used virtual reality environments and mobile EEG to study the impact of architectural design on spatial representations when individuals walked through them. Through the use of head orientation information provided by the virtual reality system, the brain signals could be segmented to extract brain dynamics related to the embodied experience of the virtual environment. Another example of contextual information being used to retrieve experimental events from continuous brain imaging recordings is illustrated by Mustile et al.'s (2021) study in which infrared motion sensors were placed in a real-world environment to detect the passage of individuals over obstacles and extract neural dynamics reflecting motor planning. Hölle, Blum, Kissner, Debener, and Bleichner (2022) implemented a microphone and a dedicated processing pipeline to detect the onset of natural sounds and lock the associated EEG analyses to unveil auditory attention processes. In a series of studies looking at spatial knowledge acquisition in natural environments (Wunderlich & Gramann, 2021), the experimenter was following the participants navigating through an urban environment and encoded the timing of experimental events manually through a smartphone interface that was synchronized to the EEG system. These manually encoded timestamps, although arguably not of the highest temporal precision, provide nevertheless a basis for the extraction of experimental events. 
The accuracy in the definition of experimental event timings, however, has important implications for later EEG analyses. Indeed, the millisecond scale of ERP components makes their analysis particularly sensitive to temporal lags. The presence of variance across trials in the latency of the responses will likely result in a smearing of the averaged ERP waveform amplitude or the contamination of ERP components at neighboring latencies.
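The smearing effect described above can be illustrated with a small simulation. The following is a minimal sketch, not part of the reported analyses: it assumes an idealized Gaussian-shaped component and a hypothetical set of per-trial latency shifts, and all function names are illustrative.

```python
import math

def gaussian_peak(center_ms, times_ms, width_ms=50.0, amplitude=1.0):
    """Idealized ERP component: a Gaussian bump centered at center_ms."""
    return [amplitude * math.exp(-0.5 * ((t - center_ms) / width_ms) ** 2)
            for t in times_ms]

def average(trials):
    """Point-by-point average across trials (the ERP waveform)."""
    return [sum(column) / len(trials) for column in zip(*trials)]

times = list(range(0, 800, 4))          # 4-msec steps, i.e., 250 Hz
lags = [-100, -50, 0, 50, 100]          # hypothetical latency jitter (msec)

# Same component on every trial, perfectly time-locked.
aligned = average([gaussian_peak(400, times) for _ in lags])
# Same component, but shifted by a different lag on each trial.
jittered = average([gaussian_peak(400 + lag, times) for lag in lags])

print("peak of aligned average: ", max(aligned))
print("peak of jittered average:", max(jittered))
```

In this toy case the single-trial responses are identical in amplitude, yet the peak of the jittered average is markedly lower and the averaged waveform is broader, which is the smearing that imprecise event timing introduces.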
In natural settings, first-person video and eye movements, both of which can be recorded using head-mounted E-T glasses, can serve as contextual reference points to segment and analyze the EEG signal to capture neural dynamics of visual processing. First, the video recording can be used to flag and label experimental events in the continuous EEG data by providing information about the onset and end of a visual search task. The E-T data provide a second layer of contextual information to the scene capture, increasing the temporal resolution of the experimental event segmentation. Indeed, superimposing gaze dynamics on the video recording allows the timing of visual events to be extracted more precisely (Hayhoe & Ballard, 2005). Gaze dynamics are composed of a sequence of discrete fixations whose features (e.g., duration, pattern, previous saccadic distance) provide information about the timing and depth of visual and attentional processes. Using the timing of the initial fixations on experimental objects to extract fixation-related potentials (FRPs; Baccino & Manunta, 2005) has been widely adopted to investigate the neural underpinnings of reading and visual search.
Applying the FRP analysis approach to a free viewing visual search paradigm, Brouwer, Reuderink, Vincent, van Gerven, and van Erp (2013) demonstrated that the P300 ERP component can be used to infer whether participants are looking at target or nontarget stimuli. Kaunitz et al. (2014) also observed the emergence of sensory and attentional components in the FRP associated with the detection of target faces in a scene crowded with distractor faces. The authors further contrasted the free visual search task to a control fixation task and revealed differences in terms of the topography and latency of the P300 component. The free viewing paradigm elicited P300 responses that were most prominent over centro-parietal electrode sites, whereas the traditional oddball paradigm exhibited an initial earlier peak at frontal sites followed by later peaks over parietooccipital sites. These results further demonstrated that the P300 component is robustly elicited upon target detection during visual exploration of natural scenes. Kamienkowski, Ison, Quiroga, and Sigman (2012) reported similar FRP components in a free viewing search task and a replay task during which individual elements of the scene are presented as discrete sequences. It is, however, important to note that in the context of free viewing search task paradigms, participants are typically instructed to keep fixating the target object for an extended period of time once they have identified it. This practice aims to facilitate later ERP analyses by preventing eye movement artifacts from contaminating the EEG signals time-locked to the fixations on targets. Applying such methods, Roberts et al. (2018) and Soto et al. (2018) explored neural markers of faces and economic value processing related to the viewing of visual elements (pictures presented on panels) placed in the real world using mobile E-T and wireless EEG.
In both studies, the panels comprised a central fixation cross to which participants had to return their gaze after looking at each individual element for a minimum of a few seconds. Despite the artificial nature of the task, these studies demonstrated the feasibility of capturing P300 FRP in a real-world environment. It remains unclear, however, whether such an approach applies in the context of a naturalistic behavior.
Aim of the Present Study
The present proof-of-concept study assesses the feasibility of capturing neural markers of visual processing of objects embedded in (i.e., being an integral part of) a real-world environment. To achieve this, the present study applies the concept of scalable experimental design (Parada, 2018; Ayaz & Dehais, 2018) according to which similar cognitive phenomena are studied over a spectrum of experiments ranging from highly controlled to naturalistic environments. Inspired by this approach, the present study contrasts two experimental conditions. The first condition consists of a classic computerized paradigm to elicit neural responses related to the processing of visual information. For this purpose, abstract visual stimuli are presented on a screen and participants are instructed to count the number of occurrences of a certain type of stimuli (i.e., targets). The artificial (as opposed to natural) elements of this condition make it far removed from experiences taking place in the real world. In contrast, the second condition maximizes ecological validity (i.e., the applicability of research findings to real-life contexts) by getting as close as possible to the recording of a natural behavior taking place in the real world. To this end, the second condition consists of the performance of a scripted (semistructured) but realistic behavior (i.e., searching for a book) grounded in a real-world environment (i.e., a library). Through this leap across the ecological validity continuum, differences related to embodied experiences can be explored whereas conceptual and methodological gaps pertaining to the study of cognitive processes in the real-world can be uncovered. An elegant example of how to fill the ecological validity gap is provided by the work of Chen, Cao, and Haendel (2022), in which translucent augmented reality glasses were used to superpose visual stimuli on a dynamically changing real-world environment as participants walked around. 
The authors report an amplification of early sensory responses (N1 component) to visual stimuli during locomotion. Interestingly, later EEG dynamics (N2pc ERP component and alpha oscillations) reflective of cognitive processes related to stimulus discrimination were not altered by the act of walking in the real world. By enabling the presentation of experimental stimuli superimposed on participants' natural field of view in a timely manner, augmented reality technology offers a practical solution to retain experimental control while studying visual experiences in real-world environments. In contrast, the present study sets out to address whether neural markers of visual cognitive processes elicited through computerized paradigms are also naturally present (i.e., embedded) in the real world. This aim not only implies methodological capacities in terms of recording meaningful neural data during real-world behaviors (which have been demonstrated by the aforementioned MoBI body of research) but also requires novel solutions to extract experiment-related neural signals from continuous recordings acquired in a dynamic and ever-changing environment without resorting to the introduction of artificial/extraneous stimuli (i.e., through computerized paradigms). The visual search task in the library and its high degree of ecological validity offer the opportunity to identify and address the challenges related to the capture of cognitive events as they occur naturally during real-world behaviors.
Twenty-four participants took part in the study. The participants were exempt from any motor, visual, and cognitive impairment. All the participants were provided with detailed information regarding the experimental protocol and were introduced to EEG and E-T recording procedures. Inclusion and exclusion criteria were checked through the completion of a questionnaire by the participants. The participants gave their written informed consent to take part in the study. The study was reviewed by a local research ethics committee, complied with data protection laws, and was carried out in accordance with the principles delineated in the declaration of Helsinki. The order of the conditions (tablet, library) was counterbalanced across participants to control for potential biases related to fatigue and training. Inconsistencies between synchronization pulses timing sent to the EEG and E-T data streams led to the exclusion of two data sets. The average gaze tracking proportion across participants was above 80% for the remaining participants, with most of the missing gaze data coinciding with displacements across the library. Overall, these missing data points were therefore not consequential with regard to analyses that concerned periods during which participants were static (i.e., standing in front of the library shelves). However, three additional data sets were excluded from the study because of their insufficient proportion of the E-T data containing pupil position (42%, 55%, and 58%). In those data sets, a substantial portion of gaze data were missing when the participants were scanning through the shelves. In the absence of gaze information at key moments of the experimental paradigm (i.e., when a participant locates a target book cover), it was therefore impossible to retrieve an estimation of experimental events timing. As a consequence of the aforementioned technical issues, only 19 out of the 24 initial recordings were included in the reported analyses. 
The remaining number of data sets complied with the minimal sample size of 16 individual data sets estimated based on previous mobile P300 ERP studies (Ladouce et al., 2019; Reiser et al., 2019; Debener et al., 2012).
Following a standard visual P300 elicitation oddball paradigm, infrequent target stimuli were presented within a series of frequent nontarget stimuli (at a 1:4 ratio). The visual stimuli were presented on a Windows Surface tablet positioned on a library shelf at 60-cm distance from participants' eye level as they were standing still. The target stimuli consisted of red circles and nontarget stimuli were blue squares of matching areas (circle diameter = 5.1 cm, square length = 4.52 cm). Therefore, visual stimuli width was maintained throughout the tablet condition at 4° of visual angle. All stimuli were presented in the center of the screen for a 200-msec period, which was followed by an 800-msec interstimulus interval. The participants were instructed to mentally count the number of target stimuli presented. A total of 300 (260 nontargets, 40 targets) stimuli were presented through a Python-based program operating on a 10.8-in. tablet (60-Hz refresh rate, Windows 10). The communication rate of devices and software used to send event markers to the amplifier was measured through blackbox testing (BBTK v2 system from Black Box ToolKit Ltd.). This approach measures the magnitude and variance of the interval difference between the local timing of stimulus onset on the external device (tablet running the visual presentation program) and the registration of the triggers received on the amplifier end that will be interpreted as timestamps of experimental events in the EEG trace. The accuracy of stimuli presentation timing was assessed over an hour-long recording. The test revealed a 50-msec difference between stimulus presentation onset and the EEG timestamp, with low variance throughout the recording (SD = 2 msec). Accordingly, event marker latencies were corrected to account for the measured delay by subtracting 50 msec (implemented at the beginning of the processing pipeline).
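The marker latency correction described above amounts to a constant shift. As a minimal sketch (the function name and the example marker times are hypothetical; only the 50-msec delay comes from the black-box test reported in the text):

```python
MARKER_DELAY_MS = 50.0  # trigger delay measured with the black-box test

def correct_marker_latencies(marker_times_ms, delay_ms=MARKER_DELAY_MS):
    """Shift each event marker earlier by the measured trigger delay."""
    return [t - delay_ms for t in marker_times_ms]

# Hypothetical marker times (msec) as registered on the amplifier side:
# one stimulus per second (200-msec stimulus + 800-msec interval).
recorded = [1050.0, 2050.0, 3050.0]
corrected = correct_marker_latencies(recorded)
print(corrected)  # [1000.0, 2000.0, 3000.0]
```

Because the measured delay had low variance (SD = 2 msec), a single constant offset is a reasonable correction; a delay that drifted over the recording would call for a time-varying correction instead.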
The library condition consisted of a natural visual search task and was designed as an analog of the tablet paradigm previously described. Participants (all native English speakers) were instructed to search for 40 books (whose titles were all written in modern English) in the library. Target visual objects were the books that the participants were instructed to find in the library, whereas nontarget objects were defined as the four book covers preceding the initial visual exploration of a target book cover. Each trial started at a specific location in the library where participants received instructions regarding the title and the location (aisle of the library where the bookshelf was located, and on which shelf the book could be found) of the target book on a sheet of paper that they carried with them. Once a book was found, participants returned to the starting position to pick up the next sheet providing them with instructions about the next book to find. In contrast to the tablet condition in which distance from stimuli presented on the tablet was fixed, the natural exploration of the library was completely free of restrictions. The lack of information regarding eye-to-stimulus distance and its potential trial-to-trial (and even moment-to-moment) variance complicates the computation of visual angle measures for the library condition. Assuming that participants maintained a distance ranging from 60 cm to 1 m away from the shelves while searching for book covers, the visual angle can be estimated as a range between 2°51′ and 4°. This approximation, however, does not allow for a valid definition of the different subfields of central vision as further explained in the Discussion section.
Several factors were considered for the selection of target books. All the target books selected were placed on shelves that were at the eye level of the participants to reduce the contamination of the EEG signal by artifacts related to neck movements and other muscular activity. Early works on visual attention have revealed the bottom–up influences of low-level visual features (i.e., intensity, contrast, and edge density) on the initial visual exploration of a scene (Peters, Iyer, Itti, & Koch, 2005). The comparison of computational models inspired by this visual saliency hypothesis with experimental data has later revealed that such bottom–up influences, although partially accounting for the visual exploration pattern, are not sufficient to accurately predict visual exploration within the frame of complex scenes (Henderson, Brockmole, Castelhano, & Mack, 2007). Furthermore, empirical evidence from scene perception and visual search experiments highlighted the major contribution of top–down processes (e.g., use of prior semantic information to orient visual search) in how complex scenes are explored and perceived (Birmingham, Bischof, & Kingstone, 2009; Underwood, 2009). As both stimulus-driven and cognitive-driven processes interact and influence how information embedded in complex scenes is perceived and processed, several measures were taken to ensure consistency in bottom–up and top–down influences across trials. The top–down influence was controlled through the homogeneity of books within a shelf in terms of their semantic field. Indeed, this homogeneity does not favor top–down driven exploration strategies such as parsing and skipping book covers based on prior semantic information gathered (e.g., the target book is more likely to be surrounded by books of the same semantic field). As a consequence of this consideration, shelves with semantically homogeneous books (i.e., related to the same lexical field) were included in the experiment.
Moreover, the classification system adopted by the library used as an experimental environment was not based on alphabetical order but followed a systematic catalog arrangement (i.e., books sorted in accordance with the domain and types of publications). This nonalphabetical ordering makes the filtering of content during the initial stage of a bookshelf exploration more difficult. The systematic catalog system could nevertheless be leveraged for semantic parsing of bookshelf content, but it requires metaknowledge regarding the classification system itself and domain-specific knowledge to navigate and effectively parse sections that do not correspond with the title of the target book. The wide variety (genre, types) of the books selected across trials further discouraged the adoption of such domain-specific search strategies by the participants over the course of the experiment. To minimize the bottom–up influences of objects whose visual properties make them stand out from the rest of a visual scene, the size and color of book covers were taken into consideration in the design of the experiment. Indeed, shelves containing books whose covers were particularly salient were not included in the experiment. The position of the book relative to the edge of a bookshelf was another factor taken into consideration for the selection of target books. Indeed, a qualitative inspection of preliminary E-T data confirmed that participants mainly scan through the shelves using an initial reading-like approach (left to right and top to bottom) as a default exploration strategy. Therefore, to gather sufficient fixations on individual books preceding the first fixation on the target book to allow for an analogous analysis, the position of the target book relative to the edges of the bookshelf was purposefully central.
EEG Data Recording and Processing
EEG data were recorded from 32 sensors fitted in an elastic cap following the International 10–20 system, which were tethered to a portable amplifier (eego sports from ANTNeuro) recording data at a sampling rate of 500 Hz (with a 0.1- to 250-Hz on-line bandpass filter). The amplifier was fitted in an ergonomic backpack carried by the participants. The data were initially referenced to channel CPz with the ground placed at the AFz electrode site. Electrode impedance was measured before each recording session, and each channel was maintained below 5 kΩ using electrode gel. EEG data were downsampled to 250 Hz, and mastoid electrodes (M1 and M2) were discarded. The data were then rereferenced to the average of all remaining electrodes.
Continuous EEG data from both recording conditions were processed jointly using the EEGLAB (Delorme & Makeig, 2004) open-source toolbox and custom MATLAB scripts (Version R2019b 9.7.0, The MathWorks Inc.). As an initial preprocessing step, the continuous data were visually examined and the portions of the EEG displaying extreme levels of noise (e.g., channel disconnections) were manually discarded. Following this manual data rejection preprocessing, the processing pipeline was divided into two stages. In the first stage, the data sets were filtered with a low-pass filter of 20 Hz and a high-pass filter of 1 Hz with a −6-dB cutoff and a filter order of 1650. Then, the continuous EEG was split into consecutive epochs of 1 sec. Epochs presenting abnormal values were pruned based on standard statistical criteria (more than 3 SDs from the mean).
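The statistical pruning of 1-sec epochs can be sketched as follows. The text specifies only the threshold (3 SDs from the mean), not the statistic it was applied to, so this minimal sketch assumes the peak absolute amplitude of each epoch on a single channel; the function name and example values are hypothetical, and this is not the EEGLAB routine used in the study.

```python
from statistics import mean, stdev

def prune_epochs(epochs, n_sd=3.0):
    """Keep epochs whose peak absolute amplitude lies within n_sd SDs
    of the mean peak amplitude across epochs.

    `epochs` is a list of epochs, each a list of voltage samples
    from one channel (an assumption made for this illustration).
    """
    peaks = [max(abs(v) for v in epoch) for epoch in epochs]
    mu, sd = mean(peaks), stdev(peaks)
    return [ep for ep, peak in zip(epochs, peaks)
            if abs(peak - mu) <= n_sd * sd]

# Twenty well-behaved epochs and one with an extreme excursion.
epochs = [[10.0, -5.0, 3.0] for _ in range(20)] + [[1000.0, 0.0, 2.0]]
kept = prune_epochs(epochs)
print(len(epochs), "->", len(kept))  # 21 -> 20
```

Note that a mean-and-SD criterion is itself influenced by the outliers it is meant to detect, so with very few epochs an extreme value may not exceed 3 SDs; robust alternatives (e.g., median-based thresholds) exist but are not what the text describes.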
Following the initial filtering and removal of noisy data, the first stage of artifact removal was carried out. An extended infomax Independent Component Analysis (ICA; Bell & Sejnowski, 1995) was performed on the remaining data, and the resulting independent components (ICs) decomposition matrices were saved. In a second stage, the IC features obtained during the first stage of the processing procedure were back-projected to the original filtered data. An automatic classification algorithm (ICLabel) was used to classify ICs (Pion-Tonachini, Kreutz-Delgado, & Makeig, 2019). The results of this classification were examined, and ICs identified as artifactual (i.e., ocular and cardiac components) were confirmed manually. The weights of ICs reflecting common artifacts such as eye blinks, eye movements, and heartbeats were subtracted. After this ICA-based data pruning, an average of 58% (SD = 8.2%) of the initial ICs remained across participants. This proportion of remaining components is in line with the guidelines proposed by Klug and Gramann (2021). The ICA-pruned continuous data sets were then epoched around the onset of experimental events (−2000 msec to 2000 msec). Epoched data were then split into the experimental paradigm (tablet and library) and stimulus type (target and nontarget) conditions and baseline corrected (the mean voltage recorded within the 200-msec prestimulus period was subtracted from the signal for each electrode and each trial). Averaging across epochs resulted in the obtention of ERP waveforms for each condition. The P300 effect amplitude was computed as the voltage difference (in microVolts) between target and nontarget ERP waveforms within the a priori time window ranging from 250 to 500 msec after stimulus onset. The P300 latency was extracted based on the maximal value recorded within the a priori time window on a single-trial basis.
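The ERP measures described above (baseline correction, averaging, P300 effect amplitude, and single-trial peak latency) can be sketched for a single channel as follows. This is an illustration under stated assumptions rather than the authors' MATLAB implementation: the effect amplitude is taken as the mean target-minus-nontarget difference over the window, latencies that fall between samples are rounded to a neighboring sample, and all function names are hypothetical.

```python
from statistics import mean

FS_HZ = 250             # sampling rate after downsampling (from the text)
EPOCH_START_MS = -2000  # epoch start relative to event onset (from the text)

def ms_to_index(t_ms):
    """Convert a latency relative to event onset into a sample index."""
    return round((t_ms - EPOCH_START_MS) * FS_HZ / 1000)

def baseline_correct(trial, start_ms=-200, end_ms=0):
    """Subtract the mean voltage of the 200-msec prestimulus period."""
    base = mean(trial[ms_to_index(start_ms):ms_to_index(end_ms)])
    return [v - base for v in trial]

def erp(trials):
    """Average baseline-corrected trials point by point."""
    corrected = [baseline_correct(t) for t in trials]
    return [mean(samples) for samples in zip(*corrected)]

def p300_effect(target_trials, nontarget_trials, win=(250, 500)):
    """Mean target-minus-nontarget difference (microvolts) in the window."""
    a, b = ms_to_index(win[0]), ms_to_index(win[1]) + 1
    diff = [t - n for t, n in zip(erp(target_trials), erp(nontarget_trials))]
    return mean(diff[a:b])

def peak_latency_ms(trial, win=(250, 500)):
    """Latency of the maximal value within the window on a single trial."""
    a, b = ms_to_index(win[0]), ms_to_index(win[1]) + 1
    segment = baseline_correct(trial)[a:b]
    peak = max(range(len(segment)), key=segment.__getitem__)
    return (a + peak) * 1000 / FS_HZ + EPOCH_START_MS

# Toy data: a flat nontarget trial and a target trial with a 2-microvolt
# plateau in the analysis window, plus a single-sample peak at 400 msec.
n_samples = ms_to_index(2000) + 1
nontarget = [0.0] * n_samples
target = [0.0] * n_samples
for i in range(ms_to_index(250), ms_to_index(500) + 1):
    target[i] = 2.0
spike = [0.0] * n_samples
spike[ms_to_index(400)] = 5.0

effect = p300_effect([target, target], [nontarget, nontarget])
latency = peak_latency_ms(spike)
print(effect, latency)
```

Single-trial peak picking of this kind is sensitive to residual noise, which is one reason the latency estimates it yields are usually interpreted at the group level.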
Although not subjected to statistical analyses, event-related spectral features were nevertheless examined. Time–frequency decomposition of the epoched data was performed through complex Morlet wavelet convolution. Wavelet frequency ranged from 1 to 20 Hz in 38 linearly spaced steps with the number of wavelet cycles increasing from 3 to 16 following a 0.8-step increase. Frequency-specific power was baseline-corrected using a decibel (dB) transform for each time point of the epoched data relative to the mean spectral activity recorded during the prestimulus period (−200 to 0 msec relative to stimulus onset) on a single-trial basis. Relative power change was averaged over time points within the data-driven time window used for P300 ERP analysis (300–500 msec).
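A minimal sketch of complex Morlet wavelet convolution with decibel baseline correction is given below for a single trial. It is not the authors' implementation: the function names are hypothetical, the Gaussian width is set to `n_cycles / (2 * pi * f)`, and the wavelet is truncated at three standard deviations. How the number of cycles maps onto each frequency is left to the caller, because the text does not fully specify it.

```python
import numpy as np

def morlet_power(signal, sfreq, freqs, n_cycles):
    """Power over time for each frequency via complex Morlet convolution."""
    power = np.empty((len(freqs), signal.size))
    for i, (f, c) in enumerate(zip(freqs, n_cycles)):
        s = c / (2 * np.pi * f)                  # Gaussian SD in seconds
        half = int(np.ceil(3 * s * sfreq))       # truncate wavelet at +/- 3 SD
        t = np.arange(-half, half + 1) / sfreq
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t ** 2 / (2 * s ** 2))
        power[i] = np.abs(np.convolve(signal, wavelet, mode="same")) ** 2
    return power

def db_baseline(power, times, t0=-0.2, t1=0.0):
    """Decibel change relative to the mean prestimulus power at each frequency."""
    base = power[:, (times >= t0) & (times < t1)].mean(axis=1, keepdims=True)
    return 10 * np.log10(power / base)
```

For example, `db_baseline(morlet_power(trial, 250.0, np.linspace(1, 20, 38), np.linspace(3, 16, 38)), times)` would give a frequency x time map of relative power change, where the linear spacing of the cycle counts is an assumption.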
E-T Data Recording and Processing
Gaze dynamics were recorded using a portable Tobii Pro Glasses 2 E-T system (Tobii Pro AB). E-T data were acquired from four eye cameras tracking pupil position and corneal reflection binocularly at a sampling rate of 100 Hz. Built-in parallax and slippage compensation methods were applied to maintain E-T tracking accuracy during movement. The calibration procedure consisted of presenting a target placed at 1-, 3-, and 5-m distance from the participants to ensure reliable tracking at different fixation depths. The E-T apparatus comprises the camera-equipped glasses and a recording unit to which the glasses were connected through HDMI. The recording unit was fitted in the backpack with the mobile EEG amplifier to which it was connected through micro USB to the 8-bit trigger input. A timestamp was generated by the E-T every 5 sec and sent to the EEG amplifier for synchronization purposes. Before the study, the accuracy of the synchronization triggers was extensively tested over hours-long recordings. The delay between recording systems remained below 10 msec and was consistent throughout the testing recordings. The raw E-T data were then reviewed visually, and periods characterized by poor tracking accuracy were recalibrated using known fixation points (e.g., participants were instructed to look at a fixation point at the beginning of each trial). Missing gaze samples were interpolated (using a moving median of five samples) if the gap between retrieved samples was shorter than 75 msec; otherwise, the samples were considered lost. The proportion of gaze samples retrieved throughout the recording (expressed in percentage) was over 75% for the majority of participants. As mentioned above, two outlying data sets had to be excluded because of their low proportion of E-T samples recorded. The continuous data were then subjected to a noise reduction function based on a nonweighted moving median filter with a window size of three samples.
A classification algorithm was then applied to the raw E-T data to identify fixations. The built-in Tobii I-VT Fixation Filter was used with a velocity (expressed in visual degrees per second) threshold of 30°/sec over a 20-msec window length. Gaze samples above the velocity threshold were classified as saccade samples. Short fixations lasting less than 50 msec were discarded. Adjacent short fixations were merged when their interfixation (saccade) duration was lower than 75 msec or when the visual angle difference between these fixations was lower than 0.5°. Henderson and Luke (2014) have reported that the mean fixation time was around 250 msec during complex scene visual search tasks and that this fixation duration, although prone to intersubject variability, was stable within and between sessions. Based on these findings, a lower threshold of 200 msec was used for the definition of visual fixations.
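The velocity-threshold logic behind this kind of fixation classification can be sketched as follows. This is a simplified illustration and not the Tobii I-VT Fixation Filter itself: velocity is computed between consecutive samples rather than over a 20-msec window, the merging of adjacent fixations is omitted, and the function name `ivt_fixations` is hypothetical.

```python
import numpy as np

def ivt_fixations(x_deg, y_deg, sfreq=100.0, vel_thresh=30.0, min_dur=0.2):
    """Return (start_s, end_s) pairs for runs of samples below the velocity threshold.

    x_deg, y_deg: gaze direction in visual degrees, one value per sample.
    Runs shorter than min_dur seconds are discarded.
    """
    # Sample-to-sample angular velocity in degrees per second
    vel = np.hypot(np.diff(x_deg), np.diff(y_deg)) * sfreq
    vel = np.concatenate([[vel[0]], vel])          # keep one value per sample
    is_fix = vel < vel_thresh

    # Locate the edges of consecutive runs of fixation samples
    padded = np.concatenate([[False], is_fix, [False]])
    edges = np.flatnonzero(np.diff(padded.astype(int)))
    starts, stops = edges[::2], edges[1::2]        # stop index is exclusive

    return [(s / sfreq, e / sfreq)
            for s, e in zip(starts, stops)
            if (e - s) / sfreq >= min_dur]
```

The default `min_dur=0.2` mirrors the 200-msec lower threshold adopted in the text.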
The processed E-T data were then visually inspected by the experimenter, and experimental event timings were manually annotated using the Tobii Pro Lab software. The onset of the initial fixation on a target book cover was used as a timestamp for the definition of a target trial. The timings of the onset of preceding fixations on four distinct book covers were used to retrieve nontarget experimental events. This approach to the definition of the library experimental event timings was adopted to allow for comparisons with the tablet condition in which target stimuli were presented in the midst of nontarget stimuli with a 1:4 ratio. It should be noted that an automatic solution for event extraction was available at the time of the study. This approach, however, required placing QR-code probes in the vicinity of experimental objects. The high contrast and odd nature of these probes with regard to a library environment would yield strong bottom–up influences on visual attention. Not only would those probes act as cues heavily orienting participants' visual exploration of the environment, but the number of probes required to tag every potential experimental object (book cover) made this solution inadequate within the frame of the present experimental design (Figure 1).
Statistical analyses were performed on the mean amplitudes within the P300 time window (300–500 msec) recorded at the Pz electrode site, where the P300 is most prominent (Polich et al., 1997; Alexander et al., 1996). Repeated-measures ANOVA and paired-samples t tests were performed on the extracted amplitude features. To ensure that parametric analysis was appropriate, a normality test was carried out to verify that the data followed a Gaussian distribution. In addition, Holm-Bonferroni correction for multiple comparisons was applied for all post hoc t tests. Partial eta squared (η2) and Cohen's d measures of effect sizes are reported for ANOVA and t tests, respectively.
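As an illustration of the post hoc statistics, the sketch below computes a paired-samples t statistic, a Cohen's d for paired data, and Holm-Bonferroni adjusted p values. The function names are hypothetical, and the choice of d as the mean difference divided by the SD of the differences (equivalently t/√n) is an assumption; it is, however, consistent with the values reported in the results, for example t(18) = 5.2 gives 5.2/√19 ≈ 1.19.

```python
import numpy as np

def paired_t_and_d(a, b):
    """Paired-samples t statistic and Cohen's d (mean difference / SD of differences)."""
    diff = np.asarray(a, float) - np.asarray(b, float)
    sd = diff.std(ddof=1)
    t = diff.mean() / (sd / np.sqrt(diff.size))
    d = diff.mean() / sd
    return float(t), float(d)

def holm_bonferroni(pvals):
    """Holm-Bonferroni adjusted p values, returned in the original order."""
    p = np.asarray(pvals, float)
    order = np.argsort(p)
    m = p.size
    # Multiply the k-th smallest p value by (m - k), then enforce monotonicity
    stepped = np.maximum.accumulate((m - np.arange(m)) * p[order])
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(stepped, 1.0)
    return adjusted
```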
P300 ERP Analyses
A repeated-measures ANOVA was carried out on ERP features with the Experimental Paradigm (tablet, library) and Stimulus Type (target, nontarget) as factors. Post hoc paired-samples t tests were carried out to explore the main effects.
The repeated-measures ANOVA revealed that both the paradigm, F(1, 18) = 8.98, p < .01, η2 = .33, and the stimulus type, F(1, 18) = 17.62, p < .001, η2 = .49, had a main effect on P300 ERP amplitude. Moreover, an interaction, F(1, 18) = 19.98, p < .001, η2 = .52, between the two factors was found.
Post hoc comparisons revealed that target stimuli elicited P300 ERP responses of significantly higher amplitude than nontarget stimuli for the tablet condition, t(18) = 5.2, p < .001, Cohen's d = 1.19, but not for the library condition, t(18) = 1.33, p = .209, Cohen's d = .29. The present results indicate that a P300 ERP response was consistently elicited in the tablet condition, whereas the effect was not observed in the library data (as illustrated in Figure 2).
As discussed in the previous sections, there are many unknown variables affecting the definition of the onset of a cognitive event in the real world. The a priori approaches based on gaze data (i.e., fixations on experimental objects) may not coincide with the actual onset of cognitive processing of that particular visual information. The cognitive processes related to a visual fixation may precede or follow the initial fixation. The temporal gaps in either of those scenarios would introduce variance in the latency of ERP responses, which would not survive averaging processes. The inspection of gaze data provided striking evidence that both scenarios (i.e., fixation-event related potentials (fERP)-based definition of visual processes onset being late or early) were commonly found within single recordings and across participants.
Indeed, the first fixation on the target book may already be relatively late with regard to the overall temporal course of the visual processing of that information, which may have already started when the information entered the peripheral visual field. This possibility is illustrated by the gaze pattern of the participants preceding target identification: The visual exploration strategy typically shifts from an orderly scan of the book covers to a sudden ballistic saccade toward the target. The large angular distance of these saccades further suggests that the book covers present in the periphery are already being processed semantically. In this scenario, the ERP responses related to the visual processing of these target trials would precede the initial fixations and, therefore, the ERP would be shifted into what is commonly called the prestimulus period. This shift is particularly problematic for the application of baseline correction approaches that aim to detrend the data using this prestimulus period. If the prestimulus period contains the signal of interest, then subtracting it from later time-series signals would introduce artifactual effects of opposite polarity.
In contrast, all recordings contained several trials in which the individuals visually explored the target book cover without identifying it as a target and continued their exploration of the shelves. The object may have only been identified as a target after having been fixated on. In that second scenario, using the fixation on the object as a timestamp for the onset of visual processing introduces a delay in the responses, shifting the neural signals to a later point in the time series. This second scenario is more likely to occur when the target object is less salient and/or there is more competing visual information in the visual field such that the bottom–up influences of the nontarget objects counteract top–down strategies. Although these “missed” trials were relatively infrequent (their number was too low to allow for a dedicated ERP analysis), averaging them with actual target identification trials would lower the signal-to-noise ratio of the ERP responses of the latter. The strict definition of the first fixation on a target book cover was therefore relaxed, and the fERP onset for such trials was changed to the first fixation on the target preceding its actual identification (i.e., the participant stops scanning the shelves and returns to the starting point).
Both of these phenomena are likely to occur in real-world environments where the visual features of objects and their surroundings are variable. Not only may the cover of the target books have been more or less salient, but the density of the books on the shelves, the visual properties of the books' covers (i.e., colors, width, orientation), the lighting in different parts of the library, and priming effects through semantic association (e.g., shared lexical field) induced by surrounding cover titles are all factors that may affect the timing between early visual and later attentional processing of target objects and their visual exploration. Therefore, the naive fixation-based approach for the timestamping of experimental events is inherently limited in its ability to accurately capture the onset of cognitive processing of visual information embedded in the real world.
Another source of temporal imprecision comes from the relatively low sampling rate of the eye-tracker scene camera. Indeed, the scene recording is captured at a rate of 30 frames per second. There is therefore a 33-msec gap between consecutive frames. This gap means that the accuracy of visual event timing achieved by reviewing the video recording frame by frame is limited by the temporal resolution of the scene capture. In addition to this temporal variance, the gaze data overlaid on the scene recording are acquired at a higher rate (100 Hz) and are smoothed to match both recordings. Considering the additional degrees of freedom that apply to a fully mobile E-T system, it is sensible to assume that the matching between the scene recording and gaze data points may be subject to some imprecision, especially during head movements and even more so during whole-body movements. Any mismatch between the data streams comes at the price of an additional 33-msec variance added on top of the original 33 msec. The millisecond scale of ERP components and the averaging process usually applied to uncover the signal from background activity make them particularly sensitive to subtle temporal variations. Taken together, the aforementioned considerations suggest that mobile E-T data, while providing contextual information for the definition of experimental events, may not provide a temporal estimate sufficiently precise to perform ERP analyses. The direct consequence of any of these sources of temporal imprecision is that there would be substantial variations across trials in terms of ERP latencies, essentially leading to smearing effects (Ouyang, Herzmann, Zhou, & Sommer, 2011) or even canceling out potential ERP components through averaging procedures.
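The smearing effect of intertrial latency variability on an averaged waveform can be illustrated with a toy simulation. The sketch below is not based on the recorded data: the Gaussian-shaped component, its 50-msec width, and the latency values passed to `average_with_jitter` are hypothetical choices made only to show the mechanism.

```python
import numpy as np

def gaussian_erp(times, latency, width=0.05, amp=1.0):
    """Idealized single-trial component: a Gaussian bump peaking at `latency` (s)."""
    return amp * np.exp(-0.5 * ((times - latency) / width) ** 2)

def average_with_jitter(times, latencies):
    """Average of noise-free single trials whose component peaks at the given latencies."""
    trials = np.stack([gaussian_erp(times, lat) for lat in latencies])
    return trials.mean(axis=0)

sfreq = 250.0
times = np.arange(-500, 500) / sfreq                    # -2 to 2 s epoch

aligned = average_with_jitter(times, np.full(61, 0.4))  # identical latencies
jittered = average_with_jitter(times, 0.4 + np.linspace(-0.3, 0.3, 61))

# The jittered average has a much lower and broader peak than the aligned one
print(aligned.max(), jittered.max())
```

Even without any noise, spreading single-trial latencies over a few hundred milliseconds strongly attenuates the peak of the average, which is the mechanism by which a component present on single trials can vanish from the averaged ERP.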
Although it appears plausible that the absence of ERP components time-locked to the initial fixation on experimental objects observed in the library condition may be caused by intertrial latency variability, it is nevertheless important to consider that such brain signals may simply not be present during the library visual search task. Indeed, the P300 ERP component could be an artificial response evoked by computerized paradigms that does not transfer to the real world. The former hypothesis implies that ERP components would manifest at the single-trial level but would be shifted in the time domain, whereas the latter hypothesis implies the total absence of such ERP components for the library condition. To elucidate these competing assumptions, single-trial ERP responses to target book covers were inspected over a larger time window around their fixation onset. As can be observed in Figure 3, signals sharing spatial (parietal topographical distribution) and spectral features (delta and theta band activation) of the P300 ERP are present at the single-trial level. The variance in the latency of such signals is, however, substantial, spanning across both the prefixation and postfixation periods, with a wide temporal distribution over the latter. These observations suggest that the P300 ERP response may be present in the library condition recordings but with substantial variance in its latency. The FRP approach applied to define the onset of a cognitive event may not be valid in the context of the present real-world data. It remains unknown whether the P300-like signals observed at the single-trial level are effectively a specific response to target stimuli. To address this question, library epoched data of both target and nontarget stimuli were compared with a subject-specific template of P300 ERP response based on the tablet ERP average waveform.
A higher similarity between the template and target waveforms than between the template and nontarget waveforms would provide further evidence that the library target responses reflect time-shifted P300 ERP responses.
Assessing Similarity between Library and Tablet Signals
A template matching method accounting for temporal shifts is required to assess whether the P300-like signals observed across the library EEG data are similar to time-locked ERP responses recorded in the tablet condition. Dynamic Time-Warping (DTW) algorithms compute measures of similarity between time series that may vary in speed and consequently be shifted in time. DTW is a ubiquitous approach commonly applied to speech recognition to handle variations in speaking speed. DTW algorithms compute an optimal match between two time series, in the present case a template based on the average ERP waveform of the tablet condition and single-trial ERP waveforms of the library condition. The sequences are nonlinearly warped (i.e., shifted) in the time domain, every data point of each sequence being matched with at least one data point from the other time series. A cost measure is computed as the sum of absolute distance values of each matched pair of indices. The optimal match is selected on a minimal cost basis. The similarity (sometimes referred to as dissimilarity depending on the application) measure provided by DTW accounts for amplitude differences between the signals following the nonlinear warping.
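The cost computation described above corresponds to the classic dynamic-programming formulation of DTW, sketched below with an absolute-difference local cost. This is a textbook version given for illustration; the specific DTW implementation and any constraints (e.g., a maximum warping window) used in the study are not stated, and the function name `dtw_cost` is hypothetical.

```python
import numpy as np

def dtw_cost(a, b):
    """Minimal cumulative absolute distance between sequences a and b under DTW.

    Every point of each sequence is matched with at least one point of the other,
    and matches are monotonic in time. A lower cost means more similar shapes.
    """
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)   # accumulated cost matrix
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor paths
            acc[i, j] = local + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])
```

For instance, `dtw_cost([0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0])` is zero because the second sequence is a delayed copy of the first, whereas comparing the same template with a flat sequence yields a strictly positive cost.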
For each participant, a template was defined as the average ERP waveform (1- to 8-Hz bandpass filtered) of target stimuli during the tablet condition (see Figure 4A). DTW similarity measures were computed between the template and single-trial time series (1- to 8-Hz bandpass filtered) for both target and nontarget stimuli of the library EEG data. The library target trials were significantly more similar to the template than the nontarget trials, t(18) = 4.58, p < .001, Cohen's d = 1.05, following nonlinear warping in the time domain. This similarity further supports the presence of P300 ERP responses in EEG data epoched around the first fixation on the target book covers, as these time series are more similar to the typical P300 ERP waveform elicited by a computerized paradigm.
Alignment of Real-world ERP Responses
The previous observations and analyses have provided evidence that P300 ERP features are present around the first fixations on target book covers but that these signals are not time-locked to the fixation onset. To perform ERP analyses on the library data, it is critical to address the latency variability of its ERP responses. Although DTW offers a metric to assess the similarity between temporal sequences, it cannot be applied to correct latency because of its nonlinearity (further developed in the Discussion section). The cross-correlation method computes the similarity between two time series that are shifted along each other. The result of this operation is a sliding dot product whose maximum value indicates the lag between the two series that optimizes similarity (see Figure 4B). This method is useful to search for known features within long signals. As in the DTW analysis, a subject-specific template of the tablet P300 ERP was slid over each library EEG data epoch. The temporal lag maximizing the similarity between the time series was then used to align the library single-trial data. The average temporal lag between the tablet P300 template and library single-trial data was −80 msec (SD = 344 msec). The first fixation on target stimuli timing used for epoch extraction was corrected on a single-trial basis. The continuous EEG data were then epoched around lag-corrected markers. The following sections present ERP analyses performed on time-corrected epoched data.
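The lag estimation described above can be sketched with NumPy's `correlate`, which computes the sliding dot product directly. This is an illustrative sketch, not the authors' code: the function names are hypothetical, and the sign convention (a positive lag means the template-like feature occurs later in the trial than in the template) is an assumption that may differ from the one behind the reported lags.

```python
import numpy as np

def best_lag_samples(trial, template):
    """Lag (in samples) at which the sliding dot product of trial and template peaks.

    Positive lag: the template-like feature occurs later in the trial than
    in the template; negative lag: it occurs earlier.
    """
    xc = np.correlate(trial, template, mode="full")
    lags = np.arange(-(len(template) - 1), len(trial))
    return int(lags[np.argmax(xc)])

def corrected_onset(fixation_onset_s, lag_samples, sfreq):
    """Shift an event marker (in seconds) by the estimated lag."""
    return fixation_onset_s + lag_samples / sfreq
```

Under this convention, a trial whose component peaks 80 msec after the template's peak at a 250-Hz sampling rate returns a lag of 20 samples, and the fixation marker would be moved 80 msec later before re-epoching the continuous data.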
Lag-corrected ERP Analyses
The repeated-measures ANOVA revealed that Condition no longer had a main effect on P300 ERP amplitude, F(1, 18) = 0.44, p = .51, η2 = .007, whereas Stimulus Type did, F(1, 18) = 16.82, p < .001, η2 = .27. There was no interaction, F(1, 18) = .996, p = .33, η2 = .007, between the two factors on P300 ERP amplitude. Post hoc comparisons revealed that target stimuli elicited P300 ERP responses of significantly higher amplitude than nontarget stimuli in the lag-corrected library condition, t(18) = 2.403, p = .027, Cohen's d = 0.551. Interestingly, no significant difference was found in P300 ERP amplitude between the lag-corrected library and tablet conditions, t(18) = .454, p = .655, Cohen's d = 0.104, as can be observed in Figure 5.
The present study was designed to assess the feasibility of capturing neural markers of visual attention related to the processing of objects embedded in the real world. Inspired by the scalable experimental design approach, two conditions at both ends of the experimental control and ecological validity continuum were contrasted: a classic visual oddball paradigm running on a tablet and a naturalistic visual search task of book covers in a library. Whereas the tablet paradigm presented a series of discrete stimuli whose onset was timestamped in the EEG data, the library EEG data were epoched around visual fixations on experimental objects.
In the initial analyses, the P300 ERP response was only found for the tablet condition. The inspection of single-trial data recorded in the library revealed signals whose features were similar to the P300 ERP found for the tablet condition. Such signals appeared to be shifted in time around the initial fixations on target book covers. Temporal jitters have important implications for analyses based on the averaging of brain signals acquired over multiple experimental trials. Indeed, averaging time-shifted signals may lead to a smearing of the resulting averaged waveform. This issue raises the question of how to determine the timing of the processing of the visual objects (i.e., books) embedded in the real world. Such a definition is highly dependent on the validity of the measure used. Basing that definition on visual information entering the individual's field of view already raises issues regarding the very definition of this visual field. Is the phenomenological experience of a visual object bound to its entrance into the foveal spotlight? If so, the initial fixation on a visual object (or the onset of the preceding saccade) appears to be a valid basis for extracting brain dynamics reflective of such a visual cognitive experience. This assumption, however, does not take into account empirical evidence from reading research that visual information entering parafoveal fields is already processed at sensory but also semantic levels (Pan, Frisson, & Jensen, 2021). Using the entrance within the parafoveal field of view to define the onset of a visual event, however, requires computing the angular distance between the individual's retina and the visual object.
In the absence of a continuous measure of object-to-eye distance over the course of the experiment added to variability in object dimensions, the visual subfield delimitation approach was not possible. As an alternative solution, we decided to apply template-matching approaches to investigate the presence of P300 ERP in the library data. We performed DTW similarity measures using the individual subject template of the P300 response based on the tablet data that we compared with signals recorded in the library. The similarity measures with the tablet template were substantially higher for signals around visual fixations on target book covers than on nontarget book covers, suggesting the presence of time-shifted P300 ERP that are specific to target trials in the library condition. Although DTW is a powerful method to evaluate the likeness of ERP responses between library and tablet conditions, it does not provide a measurement of how much time series have to be warped in the time domain to match the template. Moreover, the nonlinear warping may introduce distortions of the time series and the resulting warped series may be of different lengths (i.e., different number of data points), which may bias further statistical analyses (Zhao, Xu, Li, & Wu, 2020). Another approach was therefore needed to quantitatively characterize ERP features while accounting for temporal lags. The cross-correlation coefficient was therefore used to find the lag that would maximize the similarity between the tablet template and the library data upon which target events latency was then lag corrected.
The realignment of the library signals uncovered a P300 ERP response. The present findings suggest that neural markers of visual attention elicited by computerized paradigms are also present in a naturalistic environment. These results demonstrate the potential of the joint recording of EEG and E-T data to study visual cognition in the real world. Moreover, retrieval of cognitive event latencies through analyses carried out on EEG data can in turn be used to gain insight into gaze dynamics involved in object recognition and processing (see Figure 4C). These results also highlight the relevance of data-driven approaches such as template-matching to recover neural signals of interest in real-world data. However, these findings should be interpreted cautiously. For instance, does the P300 evoked response account for the same cognitive phenomena in the laboratory and real-world scenarios? Is it possible to compare the amplitude and the latency of this ERP across the two experimental conditions and draw a conclusion about the effect of environmental factors and task difficulty previously reported in several studies (Cortney Bradford, Lukos, Passaro, Ries, & Ferris, 2019; Dehais, Rida, et al., 2019; Ladouce et al., 2019)? The following sections will address critical differences between the two experimental settings (tablet and library conditions) and discuss the implications of such discrepancies.
Differences between Free-viewing and Gaze-fixed Paradigms
In a gaze-fixed scenario such as the tablet paradigm, no particular predictions of information localization or eye movements are required. In contrast, the active free-viewing search task in the library implies an exploration of the visual environment to determine the identity of a stimulus (target or nontarget). The decision of where to look next at any given time is therefore critical. Because of the high degree of freedom inherent in real-world settings such as in the library condition, our participants' cognitive experiences were more complex, dynamic, and multidimensional. Indeed, at any given time, there is a wide range of multimodal sensory information to process, with each piece of information having a certain number of potential states. Based on this information input, the nervous system works toward building an understanding of the present state of the environment and attempts to predict future states. These differences have several implications. First, the higher-order cognitive processes involved in the free-viewing library task to update contextual information and guide the visual exploration of the environment do not correspond to those involved in the gaze-fixed tablet condition. Second, the EEG data recorded in a free-viewing task inherently include both neural activity related to experimental events and ocular artifacts (especially within the frame of FRP analyses; see Ehinger & Dimigen, 2019). The temporal overlap between neural responses evoked by consecutive fixations is another critical issue related to free-viewing conditions. Indeed, the series of rapid eye movements that typically precede the initial fixation on an experimental object may all bring their respective neural signatures, which could be confounded with the visual ERPs because, given their delayed latencies and short temporal spacing, they may overlap with the latter.
Although later components such as the P300 are less prone to this issue than early sensory processing components (i.e., N1 and P1; Dandekar, Privitera, Carney, & Klein, 2012), it is worth noting that regression-based analyses approaches have been proposed to disentangle the contribution of consecutive eye movements on EEG signals based on E-T information (Dimigen & Ehinger, 2021). Furthermore, the presence of neighboring visual information requires increasing cognitive demands to select the target information and inhibit the exploration and processing of concurrent information (Hillyard, Hink, Schwent, & Picton, 1973). In the case of the free-viewing library task, top–down cognitive processes direct attentional resources toward the detection and identification of the target while inhibiting bottom–up influences of distracting information. The nature of this competition for attentional resources does not apply to the same extent to the tablet condition. Future studies applying a scalable experimental design coupled with template-matching approaches such as presented here should try to design the laboratory condition to be as close to the real-world condition as possible. In the case of the present study, a free-viewing paradigm where participants look for an object embedded in natural scenes could have been an intermediate condition between the tablet and the library condition.
Permanency and Uniqueness of Real-world Visual Experiences
The perceptual differences between target stimuli in the experimental tasks are another factor that could lead to differences in terms of neural responses recorded between the tablet and library conditions. It has been widely documented in traditional P300 experiments that a higher contrast between target and nontarget stimuli leads to an increase in P300 amplitude. The saliency of target stimuli has been shown to largely contribute to this effect (Luo & Ding, 2020). The tablet experiment uses variations of low-level perceptual properties of the stimuli such as shapes and colors to make the target and nontarget highly distinguishable from each other. The target book covers used in this experiment were all different, and so were their respective neighboring book covers. More than shapes and colors, the shelf density and the width of covers systematically varied across trials recorded in the naturalistic library setting. This means that each trial differed from every other in terms of how target and nontarget stimuli were contrasted and how relatively salient the target stimuli were. The objective of the present study was to dive directly into the investigation of neural dynamics during free visual exploration of a real-world environment. For this reason, the experimental setup took place in a naturalistic environment that was not altered by the introduction of experimental probes. Nevertheless, the additional degrees of freedom pertaining to the use of various books as target stimuli in the library condition could be circumvented through the use of identical book covers placed in the vicinity of other book covers sharing similar low-level perceptual properties.
Although such an approach is not an ideal solution to achieve the vision of studying cognitive experiences directly in the real world and may be impractical to set up, it could help reconcile the measures by reducing potential differences related to low-level visual processes between experimental conditions. Finally, the permanence of the visual information is yet another difference between the two experimental conditions. What is meant by permanence here is that the visual objects are always present in the environment, before and after being experienced by the user. This is in contrast to a discrete visual event such as a sudden flash of light or images appearing on a monitor. In the latter cases, the phenomenal visual experience can be defined in time as coinciding with the onset and offset of the physical appearance of the visual stimuli. A distinction should nevertheless be made between experimental events and cognitive phenomena. Computerized paradigms use the former to elicit the latter. In the real world, how such cognitive phenomena are expressed and how they can be captured remain unknown. Therefore, a sensible question is whether neural responses typically observed in the laboratory such as the P300 ERP are merely a by-product of the specific way sequences of stimuli are presented and are therefore not present in everyday life experiences.
Considerations Regarding Brain and Body Imaging in the Real World
Template Matching Approaches: Advantages and Limitations
This study clearly illustrates how capturing neural responses to visual objects embedded in the real world constitutes a substantial leap (both technically and conceptually) from recording evoked potentials elicited under highly controlled laboratory settings. These challenges were addressed through a combination of novel methodologies ranging from the experimental design to the recording and analysis of the data. However, the template-matching approaches (DTW and cross-correlation) applied to correct the library data event latencies present several pitfalls that are crucial to consider when interpreting the results. First, by comparing the library EEG signals recorded a few seconds before and after the first fixation on the target with a prototypical signal representing a P300 effect recorded using a computerized paradigm, a strong assumption regarding the expression of the visual cognitive event in the library is made. Although this assumption is directly derived from the original hypotheses of the study and therefore provides a sound basis for the application of such a method, the results should nevertheless be interpreted with caution. Indeed, by identifying features in the library data that resemble the template, the alignment resulting from the correction may produce a waveform whose features are similar to the template but whose origin may be different (i.e., brain signals not related to the task or even noise). In the present case, other peaks related to earlier or later components reflecting distinct cognitive processes might be picked up and realigned as if they were a P300.
Although this possibility cannot be entirely dismissed, it should be noted that the relative proximity of the corrected responses (indicated by the low lag values around the original fixation point), the presence of other components that were not included in the template, and the localization of the effect (as indicated by topographical maps) together suggest that the latency-corrected signals share properties with the tablet signals that go beyond mere similarities in the spectral and temporal domains. Second, the application of such an approach to the library data not only provides a means to correct the latency of the effect (provided that the previous assumption is valid) but also artificially reduces the temporal variance of the signals in the corrected data by imposing strong temporal constraints that are based on the template data. This issue has implications for the interpretation of both the amplitude and latency of the P300 ERP. Indeed, because of the time-locking and phase-locking artificially induced by the realignment, the corrected EEG signals exhibit minimal temporal variance, which may inflate the amplitude of the averaged waveform as it is not subject to the smearing that would naturally result from variance in P300 latency across trials. Moreover, variations in P300 latency may be particularly informative within the frame of real-world experiments, as several aspects of the visual stimuli (e.g., a more or less salient book cover) as well as environmental factors (e.g., competition from surrounding visual information) could have an impact on cognitive processing speed. Although the cross-correlation-based template matching approach is not an ideal solution, the corrected latencies derived from the EEG signal template provide information that opens a new range of analyses to be carried out on the multimodal data.
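The two pitfalls discussed above can be illustrated with a minimal simulation. The sketch below is not the analysis code used in this study (which is available in the repository listed in the Data Availability Statement); it is a simplified illustration on synthetic single-channel data, and every name and parameter in it (`estimate_lag`, `realign`, the Gaussian "P300-like" template, the sampling rate, jitter range, and noise level) is an assumption chosen for the example. It slides a template over each epoch, keeps the lag with the highest Pearson correlation, and averages the realigned windows.

```python
import numpy as np

def estimate_lag(epoch, template, nominal_start, max_lag):
    """Lag (in samples) at which a window of `epoch` best matches `template`.

    The candidate window starts at `nominal_start + lag`. Similarity is the
    Pearson correlation, so amplitude scaling does not influence the lag.
    """
    n = template.size
    lags = np.arange(-max_lag, max_lag + 1)
    scores = [
        np.corrcoef(epoch[nominal_start + lag:nominal_start + lag + n], template)[0, 1]
        for lag in lags
    ]
    return int(lags[int(np.argmax(scores))])

def realign(epochs, template, nominal_start, max_lag):
    """Cut one template-length window per epoch at its best-matching lag."""
    n = template.size
    lags = np.array([estimate_lag(e, template, nominal_start, max_lag) for e in epochs])
    windows = np.stack(
        [e[nominal_start + lag:nominal_start + lag + n] for e, lag in zip(epochs, lags)]
    )
    return windows, lags

# --- Synthetic data (all values are illustrative assumptions) ---
rng = np.random.default_rng(0)
fs = 250                                            # sampling rate in Hz
t = np.arange(int(0.8 * fs)) / fs                   # 0.8 s template window
template = np.exp(-0.5 * ((t - 0.3) / 0.05) ** 2)   # positive deflection peaking at 300 ms

n_trials, epoch_len = 200, 3 * fs                   # 3 s epochs around the fixation
nominal_start, max_lag = 275, 100                   # window start and search range (samples)
true_lags = rng.integers(-80, 81, size=n_trials)    # trial-wise latency jitter

def simulate(with_response):
    """White-noise epochs, optionally containing a jittered template-shaped response."""
    epochs = rng.normal(0.0, 0.3, size=(n_trials, epoch_len))
    if with_response:
        for epoch, lag in zip(epochs, true_lags):
            start = nominal_start + lag
            epoch[start:start + template.size] += template
    return epochs

# Case 1: a response is present but jittered relative to the fixation event.
signal_epochs = simulate(with_response=True)
uncorrected = signal_epochs[:, nominal_start:nominal_start + template.size].mean(axis=0)
windows, est_lags = realign(signal_epochs, template, nominal_start, max_lag)
corrected = windows.mean(axis=0)
lag_error = np.mean(np.abs(est_lags - true_lags))

# Case 2: no response at all; realignment still selects template-like noise.
noise_windows, _ = realign(simulate(with_response=False), template, nominal_start, max_lag)
noise_corr = np.corrcoef(noise_windows.mean(axis=0), template)[0, 1]

print(f"peak of uncorrected average: {uncorrected.max():.2f}")
print(f"peak of corrected average:   {corrected.max():.2f}")
print(f"mean absolute lag error (samples): {lag_error:.2f}")
print(f"correlation of realigned pure-noise average with template: {noise_corr:.2f}")
```

In this toy setting, the corrected average recovers a sharper and larger peak than the uncorrected one, which mirrors the amplitude inflation described above, and the pure-noise case shows why a template-like average is not, on its own, evidence that a P300 was present. Converging evidence such as low lag values, components absent from the template, and scalp topographies remains necessary.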
Intrusiveness of MoBI Devices: Toward Transparent Solutions
The participants were equipped with a research-grade mobile EEG system that comprised an EEG cap tethered to an amplifier that was fitted in a backpack and E-T glasses also tethered to a recording unit. Although the E-T glasses were deemed relatively unobtrusive following a short adaptation period, wearing the EEG cap in a public space such as a university library reportedly made participants feel self-conscious. Although data quality is critical for the selection of EEG systems, the degree of comfort and discretion are also key aspects to consider in the context of real-world studies. An elegant solution to this issue could be found in minimalist EEG devices. For example, around-the-ear electrode arrays are discreet, quick to set up, and can be worn comfortably for extended periods (Debener, Emkes, De Vos, & Bleichner, 2015). The minimalist and ear-centered montage does not, however, offer the spatial resolution of a whole-head EEG system and therefore may not be sensitive to far-field potentials (Meiser, Tadel, Debener, & Bleichner, 2020). Nonetheless, around-the-ear arrays have proven to be an effective solution to record neural markers of visual processes (Pacharra, Debener, & Wascher, 2017) and selective attention to auditory information (Mirkovic, Bleichner, De Vos, & Debener, 2016; Bleichner et al., 2015; Debener et al., 2015). This comfortable solution is particularly appealing for the study of everyday-life human cognition but also offers promising medical use cases such as sleep staging and long-term monitoring of epileptiform activity (Bleichner & Debener, 2017). The future of the MoBI research field is linked to the development of inconspicuous and comfortable recording devices, as they will enable the range of research to expand to social interactions in real-life situations.
Ethics of Real-world Brain and Body Imaging Data
A wide range of personal information is gathered from pervasive devices (e.g., mobile phones, smartwatches). The recording of GPS, heart rate, and accelerometer data can be used to track where an individual is and what they are doing at various times of the day. Although privacy concerns are raised sporadically, the quantity and variety of data collected from individuals have been steadily on the rise. More pointedly, instrumentation that was limited to medical and research purposes is now ported into consumer-grade devices. As this new generation of devices finds its way into households and its daily usage becomes widely adopted by the population, it is conceivable that the general public may become desensitized to the intimate nature of the data collected. Gaze data are a striking example of how personal preferences and otherwise covert information regarding one's experience of one's environment can be effectively derived. Indeed, metrics such as fixation duration and relative number of saccades when exploring a visual object provide objective measures of how much attention has been paid to the object and indicate the depth of cognitive processing of that information, revealing interest and preferences. Advances in the field of computer vision are providing efficient ways to label visual objects in video recordings in an automatic and increasingly reliable manner. The extraction of neural markers in relation to labeled visual data adds another layer of insight into how an individual's attention, semantic processing, and decision-making operate. The fusion of gaze and brain data offers novel opportunities to extend Brain–Computer Interface applications beyond the presentation of artificial interfaces and paradigms. It is through the acquisition of such contextual information that neural data will deliver their full potential for real-world applications.
Aside from the exciting prospects offered by the exploitation of multimodal data for everyday-life applications, it is critical to reflect on the sensitive nature of such data. Indeed, as sensors become ever more transparent, data will be acquired continuously without the user being conscious of it. The covert aspects of continuous physiological data recordings raise questions regarding the consent of the user for such information to be exploited at any given moment. To be clear, multimodal brain and body imaging data pose novel ethical issues regarding data privacy. The responsibility falls upon researchers to ensure that data privacy remains a priority by setting precedents of high standards in how such multimodal data are handled.
The challenges related to the study of human cognition in everyday life contexts are numerous. Abandoning computerized paradigms lessens experimental control over a wide range of variables that can add variance to participants' behavior and to the cognitive processes involved in a given task. In the context of EEG analyses, this high degree of freedom poses several conceptual and technical issues, notably related to the timing of experimental and cognitive events. The present findings provide evidence that such challenges may be overcome through a combination of scalable experimental design, recording of multimodal brain and body imaging data, and the application of state-of-the-art signal processing and template matching methods. By applying these approaches, neural markers of cognitive processes related to visual information embedded in a real-world environment could be captured. These encouraging results further highlight the relevance of scalable experiments to study human cognition in real-world contexts. Indeed, recording neural responses elicited by the discrete presentation of visual stimuli through a computerized paradigm (i.e., the tablet condition) was instrumental in creating a template that could be used to search for and extract similar responses in a more naturalistic, and therefore less controlled, recording setting such as the library condition. Moreover, the extended range of analyses enabled by the joint recording of gaze and brain dynamics showcases the complementarity of MoBI methods. The E-T data provided contextual information regarding the occurrence of experimental events that enabled time-domain analyses to be performed on the EEG data. Although the information provided about experimental event timing may not have the temporal resolution required to perform time-domain analyses at the millisecond scale, it can nevertheless serve as an initial estimate upon which template matching approaches are applied to extract ERP features.
In conclusion, adopting a scalable approach to experimental design and leveraging the potential of multimodal recording methodologies are important steps toward enabling the study of embodied aspects of human cognition in naturalistic environments.
The authors thank Catriona Bruce and Stephen Stewart for the technical assistance in setting up the experiment. Thank you to Ludovic Darmet for his advice on data analysis.
Reprint requests should be sent to Simon Ladouce, PhD, Department for Aerospace Vehicles Design and Control, Institut Supérieur de l'Aéronautique et de l'Espace (ISAE), Toulouse, Haute-Garonne, 31055, France, or via e-mail: firstname.lastname@example.org.
Data Availability Statement
Original EEG data with experimental events annotations along with the code used for processing and analysis of the data are available on the following online repository: https://osf.io/zhcr7/.
This work was supported by the Agence Innovation Defense - AID (RAPID Neurosynchrone project). Frédéric Dehais is an ANITI (Artificial and Natural Intelligence Toulouse Institute) chairholder. Simon Ladouce and Magda Mustile were supported by scholarships from the University of Stirling and a research grant from SINAPSE (Scottish Imaging Network: A Platform for Scientific Excellence).
Diversity in Citation Practices
Retrospective analysis of the citations in every article published in this journal from 2010 to 2021 reveals a persistent pattern of gender imbalance: Although the proportions of authorship teams (categorized by estimated gender identification of first author/last author) publishing in the Journal of Cognitive Neuroscience (JoCN) during this period were M(an)/M = .407, W(oman)/M = .32, M/W = .115, and W/W = .159, the comparable proportions for the articles that these authorship teams cited were M/M = .549, W/M = .257, M/W = .109, and W/W = .085 (Postle and Fulvio, JoCN, 34:1, pp. 1–3). Consequently, JoCN encourages all authors to consider gender balance explicitly when selecting which articles to cite and gives them the opportunity to report their article's gender citation balance.