Driver mental fatigue leads to thousands of traffic accidents. The increasing quality and availability of low-cost electroencephalogram (EEG) systems offer possibilities for practical fatigue monitoring. However, non-data-driven methods designed for practical, complex situations usually rely on handcrafted statistics of the EEG signals. To reduce human involvement, we introduce a data-driven methodology for online mental fatigue detection: self-weighted ordinal regression (SWORE). Reaction time (RT), the length of time people take to react to an emergency, is widely considered an objective behavioral measure of the mental fatigue state. Since regression methods are sensitive to extreme RTs, we propose an indirect, preference-based RT estimation to explore the relationship between EEG and RT, which generalizes to any scenario in which an objective fatigue indicator is available. In particular, SWORE evaluates the noisy EEG signals from multiple channels in terms of two states: the shaking state and the steady state. Modeling the shaking state discriminates the reliable channels from the uninformative ones, while modeling the steady state suppresses the task-nonrelevant fluctuation within each channel. In addition, an online generalized Bayesian moment matching (online GBMM) algorithm is proposed to efficiently calibrate SWORE online for each participant. Experimental results with 40 participants show that SWORE achieves high consistency with RT, demonstrating the feasibility and adaptability of our proposed framework for practical mental fatigue estimation.

## 1 Introduction

Mental fatigue, a common physiological phenomenon (Borghini, Astolfi, Vecchiato, Mattia, & Babiloni, 2014), induces suboptimal functioning and may even lead to accidents with severe consequences (Van Cutsem et al., 2017). The National Highway Traffic Safety Administration estimates that about 100,000 officially reported crashes each year are the direct result of driver mental fatigue, resulting in an estimated 1,550 deaths, 71,000 injuries, and $12.5 billion in monetary losses. In response to these critical issues, several algorithms have been developed to detect mental fatigue using the electrocardiogram (ECG) (Fallahi, Motamedzade, Heidarimoghadam, Soltanian, & Miyake, 2016), functional near-infrared spectroscopy (fNIRS), electrooculogram (EOG) (Laurent et al., 2013), and electroencephalogram (EEG) (Lin, Tsai, & Ko, 2013; Jagannath & Balasubramanian, 2014; Sauvet et al., 2014; Wang, Zhang, Wu, Darvas, & Chaovalitwongse, 2015), among others. Of these signals, EEG signals are assumed to be the most accurate and valid for providing information related to drivers' mental fatigue, owing to their high temporal resolution and the availability of a vast variety of preprocessing methods (Graimann, Allison, & Pfurtscheller, 2009; Sahayadhas, Sundaraj, & Murugappan, 2012; Palanivel Rajan & Dinesh, 2015).

Previous methods for developing automatic systems to detect driver drowsiness from EEG signals can be broadly classified into two categories: non-data-driven and data-driven. Non-data-driven approaches, such as power spectrum-based analysis (Jap, Lal, Fischer, & Bekiaris, 2009; Wang et al., 2018), entropy-based analysis (Kar, Bhagat, & Routray, 2010), and brain network-based analysis (Li, Li, Wang, Zhang, & Wang, 2017), usually resort to handcrafted estimators, such as changes in power or statistically related features, to evaluate mental fatigue using EEG signals from multiple channels (Gurudath & Riley, 2014; Gharagozlou et al., 2015). However, these evaluation metrics require expert interpretation and complex calculation processes. In addition, EEG signals are known to be highly specific and to vary greatly among individuals. Thus, non-data-driven approaches relying on predefined criteria are not robust enough to account for individual variability, limiting their practical implementation.

In terms of data-driven mental fatigue evaluation, the reaction time (RT) to a certain assigned task is widely adopted as supervision to indicate the fatigue level. Some linear (Lin et al., 2010; Resalat & Saba, 2015) and nonlinear (Liu, Lin, Wu, Chuang, & Lin, 2016; Cui & Wu, 2017; Pan, Tsang, Singh, Lin, & Sugiyama, 2020) methods show that it is possible to detect mental fatigue with high accuracy. These results are impressive but remain blind to the wealth of brain dynamics and behavioral variability (Müller et al., 2008; Ratcliff, Philiastides, & Sajda, 2009; Yarkoni, Barch, Gray, Conturo, & Braver, 2009; Xu, Min, & Hu, 2018). Although some recent work (Wei, Lin, Wang, Lin, & Jung, 2018; Cui, Xu, & Wu, 2019) suggested addressing the concerns of inter- and intrasubject variability through transfer learning, those techniques apply only to offline analysis with sufficient training samples.

Previous offline analysis methods often result in poor fatigue detection performance due to limited training data in practical implementation (see Figure 1). For example, deep learning (Goodfellow, Bengio, Courville, & Bengio, 2016) methods, which require massive training data, and Riemannian methods (Barachant, Bonnet, Congedo, & Jutten, 2012; Congedo, Barachant, & Bhatia, 2017), which incur high computation costs, fail to meet the harsh requirements of actual situations. In addition, mental fatigue, drops in mental alertness, and poor driving performance reflect brain dynamics among different brain areas. Recent work demonstrates the efficacy of discriminating functional interactions among different brain regions based on heuristic metrics (Wang et al., 2018; Richer, Zhao, Amores, Eskofier, & Paradiso, 2018) or complex analysis (Li et al., 2017). However, such analysis cannot fully reveal the functional interactions among multiple channels in terms of mental fatigue, since it is performed independently of the mental fatigue evaluation that follows.

To address these concerns, we introduce a data-driven methodology, self-weighted ordinal regression (SWORE), for online driver mental fatigue detection that models functional interactions among brain regions. Instead of formulating SWORE as a regression task with RT as the direct supervision, we consider a more general problem setting: learning to rank. SWORE learns from brain dynamics preferences and aims to achieve consistency with RT indirectly, in the sense of ranking. The brain dynamics preferences can be constructed via some objective fatigue indicator, such as RT if available, or some power spectral features (Wang et al., 2018; Bose et al., 2019). Preference-based, indirect mental fatigue evaluation has been shown to alleviate the overfitting issue of directly predicting RT in a regression task (Pan et al., 2020). In particular, SWORE models the brain dynamic preferences in terms of two states: the shaking state and the steady state. It automatically discriminates the reliable channels from the noninformative ones by modeling the shaking state and suppresses the mental fatigue nonrelevant fluctuation within each channel by modeling the steady state. Moreover, an online generalized Bayesian moment matching (online GBMM) algorithm is proposed for the Bayesian posterior update. Once a new sample (the reaction time corresponding to the newly recorded EEG signals) is available, online GBMM can efficiently calibrate the SWORE model with simple updating rules. In summary, the main contributions of this letter are as follows:

- •
We propose an online mental fatigue monitoring system that can evaluate mental fatigue quickly with a high prediction performance.

- •
We propose the SWORE model to reliably aggregate brain dynamics-related preferences from multiple noisy channels in terms of two states: shaking state and steady state.

- •
We propose an online generalized Bayesian moment matching (online GBMM) algorithm for calibrating the SWORE model online with analytic update rules.

- •
We conduct comprehensive experiments on 40 participants to verify the reliability of our system in online mental fatigue monitoring scenarios. Further, we explore the parameter sensitivity and model uncertainty of SWORE with regard to the online GBMM algorithm.

This letter is organized as follows. Section 2 introduces the background of mental fatigue monitoring and motivates the practice of online mental fatigue monitoring. In section 3, we introduce SWORE, an indirect mental fatigue monitoring model, to model the heterogeneous brain dynamic preferences in terms of two states. Section 4 describes an analytic update strategy for calibrating the SWORE model online. Section 5 discusses the details of mental fatigue evaluation in the online scenario. Section 6 demonstrates the reliability of the proposed SWORE model on EEG signals collected from 40 participants. Section 7 concludes the letter and envisions future work.

## 2 Background and Problem Statement

In this section, we introduce mental fatigue monitoring and discuss previous approaches in the online scenario. We list several subgoals that are necessary for achieving a robust online mental fatigue evaluation model.

### 2.1 Mental Fatigue Monitoring

The reaction time (RT) to an emergency is generally accepted as the most intuitive and resourceful metric for evaluating mental fatigue. EEG signals (Lin et al., 2013; Wang et al., 2015) are adopted as the feature vectors, since they are well known to be accurate and valid for supplying information related to the driver's mental fatigue (Graimann et al., 2009; Sahayadhas et al., 2012; Palanivel Rajan & Dinesh, 2015), compared with, for example, EOG, ECG, or fNIRS (Nguyen, Ahn, Jang, Jun, & Kim, 2017).

Therefore, a common practice for mental fatigue monitoring is to build a learning model that can predict humans' reaction time to an emergency using the EEG signals recorded beforehand (Lal, Craig, Boord, Kirkup, & Nguyen, 2003; Dornhege, del R. Millán, Hinterberger, McFarland, & Müller, 2007; Soon, Brass, Heinze, & Haynes, 2008; Jap et al., 2009).

### 2.2 Impaired Performance on Nonstationary Brain Dynamics in Online Applications

Some previous work based on linear (Resalat & Saba, 2015; Lin et al., 2010) and nonlinear (Liu et al., 2016; Cui & Wu, 2017; Pan et al., 2020) methods shows that it is possible to detect mental fatigue with high accuracy. These results are impressive but remain blind to the wealth of brain dynamics and behavioral variability (Müller et al., 2008; Ratcliff et al., 2009; Yarkoni et al., 2009; Wei et al., 2018; Cui et al., 2019). Brain dynamics are nonstationary; they are characterized by significant trial-by-trial and subject-by-subject variability (Ratcliff et al., 2009; Yarkoni et al., 2009). However, the above methods, designed for offline analysis with sufficient training samples, generalize poorly in practice without efficient online calibration.

For better illustration, we trained three support vector machine regressions (SVR)^{1} using 20, 40, and 60 sequential trials and visualized their prediction performances on the remaining trials, respectively. From Figure 1, we find that (1) apart from a few local mispredictions, SVR can exactly predict the RT on the training trials and is insensitive to extreme values. This shows that SVR has sufficient fitting capability for mental fatigue monitoring and is more robust to extreme values than deep regression models (Pan et al., 2020). (2) All SVR models show poor prediction performance^{2} on the remaining trials. This is consistent with our conjecture that previous offline analyses suffer from severe generalization issues in online scenarios. (3) Increasing the number of training trials only marginally improves prediction accuracy. In particular, when the number of training trials increases from 20 to 60, the mean absolute error decreases by only 0.34 s.
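This generalization gap can be reproduced in miniature. The sketch below is purely illustrative: it uses synthetic, drifting data and a tiny closed-form ridge regressor standing in for SVR (both are our assumptions, not the experimental setup of this letter), yet it exhibits the same pattern of low training error and much larger error on the later, unseen trials.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for one participant: 200 sequential trials whose
# EEG-to-RT mapping drifts over time (nonstationary brain dynamics).
n_trials, dim = 200, 8
X = rng.normal(size=(n_trials, dim))
drift = np.linspace(0.0, 2.0, n_trials)          # slow temporal drift
w_true = rng.normal(size=dim)
rt = X @ w_true + drift + 0.1 * rng.normal(size=n_trials)

def ridge_fit(X, y, lam=1.0):
    """Tiny closed-form ridge regressor standing in for SVR."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

for n_train in (20, 40, 60):
    w = ridge_fit(X[:n_train], rt[:n_train])
    mae_train = np.mean(np.abs(X[:n_train] @ w - rt[:n_train]))
    mae_test = np.mean(np.abs(X[n_train:] @ w - rt[n_train:]))
    print(f"train={n_train:2d}  MAE(train)={mae_train:.2f}  MAE(rest)={mae_test:.2f}")
```

Because the drift term is not representable by the fixed linear map, the error on the remaining trials stays large no matter how many early trials are used for fitting, mirroring observation (3) above.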

### 2.3 Online Mental Fatigue Evaluation

In pursuit of online mental fatigue evaluation, computational efficiency in terms of time and memory has been a major concern. Deep learning (Goodfellow et al., 2016) methods achieve superior performance (Pan et al., 2020) but require massive training data. Riemannian methods (Barachant et al., 2012; Congedo et al., 2017) achieve good performance with a small number of training trials but incur an overhead computational cost. Another shortcoming of existing methods is the lack of efficient aggregation mechanisms to distill reliable predictions from multiple noisy channels. In particular, majority voting and concatenation suffer from overfitting and poor generalization performance (Pan et al., 2020).

Based on this analysis, we summarize three subproblems, which we address in this letter to develop a robust online mental fatigue evaluation model:

- •
How to reliably detect mental fatigue using the EEG signals as well as the corresponding RTs

- •
How to automatically eliminate noninformative channels during the learning process

- •
How to effectively calibrate the learning model with an EEG signal when its ground-truth RT is available

## 3 Self-Weighted Ordinal Regression for Brain Dynamics

### 3.1 Brain Dynamic Preferences

As shown in Figure 1, it is usually difficult for a learning model to obtain an exact estimate of RT, since RT values do not change smoothly and the relationship between RT values and fatigue levels is relative rather than exact, varying over time and across subjects. The performance worsens in the online setting, when only a few training trials are available. Meanwhile, a rough but reliable estimate is acceptable in real-world mental fatigue monitoring (Colosio, Shestakova, Nikulin, Blagovechtchenski, & Klucharev, 2017). Therefore, we model the brain dynamics-related preferences instead of the exact values of RT.

**Remark 1**

(From Regression to Ordinal Regression). Let us revisit the prediction of RT from the perspective of ordinal regression. RT is defined on the completely ordered field $\mathbb{R}$ and thus carries ordinal structure. This relative structure information is entirely preserved in the pairwise comparisons of RTs. Therefore, if a learning model can maximally preserve all of this structure information, a new trial can find its own position (a rough estimate of RT) through its comparisons with previously recorded EEG signals. See section 5.3 for more details.

A brain dynamic preference^{3} $(x_t^n, x_{t+1}^n)$ (typically a pair of $d$-dimensional feature vectors) can be constructed from the corresponding pairwise EEG signals recorded from each channel ($\forall n = 1, 2, \ldots, N$). Every EEG sensor used for recording is assumed to record independently from the scalp without influencing other sensors (Homan, Herman, & Purdy, 1987; Teplan, 2002), so brain dynamic preferences are constructed for each channel independently. Therefore, $N$ brain dynamic preferences are constructed for each comparison.
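As a minimal illustration of this construction (the array shapes and the helper name `build_preferences` are hypothetical, chosen only for this sketch), each comparison of two trials yields one independent preference pair per channel:

```python
import numpy as np

def build_preferences(eeg_t, eeg_t1):
    """Pair the feature vectors of two trials channel-by-channel.

    eeg_t, eeg_t1 : arrays of shape (N, d) -- N channels, d features each.
    Returns a list of N per-channel preference pairs (x_t^n, x_{t+1}^n);
    channels are treated independently, matching the independence
    assumption on the EEG sensors.
    """
    assert eeg_t.shape == eeg_t1.shape
    return [(eeg_t[n], eeg_t1[n]) for n in range(eeg_t.shape[0])]

rng = np.random.default_rng(1)
N, d = 33, 16                      # 33 channels, 16-dim features (illustrative)
prefs = build_preferences(rng.normal(size=(N, d)), rng.normal(size=(N, d)))
print(len(prefs))                  # one preference per channel
```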

**Remark 2**

(Indirect Mental Fatigue Monitoring). The word *indirect* is adopted for comparing with the use of RT as the direct supervision in the regression task. Meanwhile, the objective fatigue indicator used for constructing brain dynamics preferences is not limited to RT. Other well-studied and easily accessible power spectral features (Borghini, Astolfi, Vecchiato, Mattia, & Babiloni, 2014; Chai et al., 2016), such as dynamic time warping, entropy, and functional connectivity, can also be adopted as fatigue supervision (Wang et al., 2018; Bose et al., 2019) for constructing the preferences.

Meanwhile, due to individual variability, the mental fatigue criteria defined by a specific RT value vary from person to person. Ranking-based criteria avoid this issue, since they capture the normal level by modeling the ordering of several EEG signals.

### 3.2 Heterogeneous Brain Dynamic Preferences

To improve model stability, we introduce an insensitive zone, which flattens the steepest gradient around the boundary and therefore enables the classification model to be less sensitive to the subtle difference between the response times.

#### 3.2.1 Shaking State

The shaking state $y \in Y_1$ has two cases: up ($RT_{t+1} > RT_t$) and down ($RT_{t+1} < RT_t$), which can be formulated as a learning-to-rank problem.

**Remark 3**

(Superiority over the Regular Weighted Average). From the perspective of EEG channel analysis, equation 3.4 provides a new aggregation mechanism to combine the information from different channels. Different from majority voting, which simply categorizes the channels into reliable and noisy ones, this equation performs a fine-grained analysis and further categorizes the noisy channels into nonrelevant ones and negative reliable ones. Therefore, three types of channels can be recognized with the channel reliability $\pi_n \in [0,1]$: positive reliable ones ($\pi_n \to 1^-$),^{4} nonrelevant ones ($\pi_n \approx 0.5$), and negative reliable ones ($\pi_n \to 0^+$), $\forall n = 1, 2, \ldots, N$.
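A small sketch of this aggregation idea as we read it (the exact form of equation 3.4 is not reproduced here; the mixture below is our simplified reading): the per-channel likelihood mixes a classifier and its complement, weighted by the channel reliability $\pi_n$, so the three channel types fall out of a single formula.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_likelihood(pi_n, score):
    """Mixture of two complementary classifiers sharing the score w . dx.

    pi_n -> 1  : trust the score as-is (positive reliable channel)
    pi_n -> 0  : trust the flipped score (negative reliable channel)
    pi_n ~ 0.5 : likelihood is exactly 0.5 regardless of the score,
                 so a nonrelevant channel contributes no information.
    """
    return pi_n * sigmoid(score) + (1.0 - pi_n) * sigmoid(-score)

score = 3.0                                     # a confident "up" preference
print(channel_likelihood(0.99, score))          # near sigmoid(3), high
print(channel_likelihood(0.50, score))          # exactly 0.5: channel ignored
print(channel_likelihood(0.01, score))          # near sigmoid(-3), low
```

Because `sigmoid(score) + sigmoid(-score) = 1`, a channel with `pi_n = 0.5` always yields likelihood 0.5, which is precisely the constant-likelihood behavior described for nonrelevant channels in remark 5.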

#### 3.2.2 Steady State

The steady state $y \in Y_2$ denotes the brain dynamic preferences with comparable RTs.

**Remark 4**

(Gradient Flattening Enhances Model Robustness). The gradient flattening used in equation 3.5 can be understood as a regularization. It enables our model to be robust to the fluctuation between brain dynamics, which is not relevant to RTs.

### 3.3 Self-Weighted Ordinal Regression Model

**Remark 5**

(Reliability of the SWORE Model). (1) Interchannel reliability: SWORE trusts the brain dynamic preferences from only the positive and negative reliable channels. Since SWORE trains a mixture of two complementary classifiers with shared parameter $w$, it categorizes the channels into positive channels ($\pi_n \to 1^-$), negative channels ($\pi_n \to 0^+$), and nonrelevant channels ($\pi_n \approx 0.5$). Based on the channel reliability $\pi$, SWORE can automatically choose the suitable classifier to extract the correct information from the positive and negative channels and update the shared parameter $w$ accordingly. Further, it ignores information from nonrelevant channels by assigning a constant likelihood (i.e., 0.5) to each brain dynamic preference from those channels.

(2) Intrachannel reliability: SWORE extracts only task-related information from each brain dynamic preference. Since the probability of the steady state $y \in Y_2$ (see equation 3.5) does not depend on the channel reliability $\pi_n$, gradient flattening actually acts as a regularization on the regression weight $w$ and enables SWORE to be robust to the random fluctuations that exist in brain dynamics.

## 4 Efficient Online Updating Strategy

As we discussed in section 2.2, brain dynamics are nonstationary. If the SWORE model cannot be updated, it would suffer from low generalization performance. Therefore, in this section, we introduce an efficient online updating strategy for it. It can update SWORE with high accuracy while introducing marginal computation cost.

### 4.1 Bayesian Moment Matching

Bayesian moment matching (BMM) is used to estimate the model parameters. Specifically, it estimates the parameters of the approximate posterior by matching a set of sufficient moments of the exact, complex posterior. Moreover, it can be extended to a sequential update paradigm for large-scale or streaming data sets, as in online BMM (Jaini et al., 2017). That is, the approximate posterior is updated with each sample, rather than with the entire data set, each time.

The main issue with equation 4.1 is that the joint posterior distribution $P(w,\pi|y,\Delta x^n)$ is complicated or even intractable. To keep the computation tractable, we adopt the mean-field assumption and project the posterior into the same form as the prior (the product of a normal with betas, that is, $P(w,\pi|y,\Delta x^n) \approx q(w)q(\pi) = \mathcal{N}(w|\mu,\Sigma)\prod_{n=1}^{N}\mathrm{Beta}(\pi_n|\alpha_n,\beta_n)$). Then the posterior parameters are estimated by matching a set of sufficient moments of the approximate posterior with the exact posterior:

- •
Match the moments between $q(w)$ and $P(w|y,\Delta x^n)$: $\int w\, q(w)\, dw = \int w\, P(w|y,\Delta x^n)\, dw$ and $\int w w^{T} q(w)\, dw = \int w w^{T} P(w|y,\Delta x^n)\, dw$. Due to the nonconjugacy between the marginalized likelihood $P(y|w,\Delta x^n)$^{5} and the normal prior $\mathcal{N}(w|\mu,\Sigma)$, the posterior $P(w|y,\Delta x^n)$ is complex. Therefore, the posterior parameters $(\mu^{new}, \Sigma^{new})$ cannot be computed analytically because of the intractability of the integrals in the moment constraints.

- •
Match the moments between $q(\pi)$ and $P(\pi|y,\Delta x^n)$: $\int \pi_n\, q(\pi)\, d\pi = \int \pi_n\, P(\pi|y,\Delta x^n)\, d\pi$ and $\int \pi_n^{2}\, q(\pi)\, d\pi = \int \pi_n^{2}\, P(\pi|y,\Delta x^n)\, d\pi$, $n = 1, 2, \ldots, N$. Fortunately, we can solve these moment constraints with closed-form integrals and obtain the posterior parameters $(\alpha_n^{new}, \beta_n^{new})$, $\forall n = 1, 2, \ldots, N$, accordingly.
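To make the closed-form branch concrete, here is a hedged sketch assuming the likelihood is linear in $\pi_n$, as in the two-classifier mixture of remark 3 (this linearity is our assumption; the paper's exact likelihood is not reproduced here). Under that assumption the first two posterior moments of each $\pi_n$ have closed forms, and a Beta distribution is refit to them:

```python
def beta_raw_moment(a, b, k):
    """k-th raw moment of Beta(a, b): prod over j < k of (a+j)/(a+b+j)."""
    m = 1.0
    for j in range(k):
        m *= (a + j) / (a + b + j)
    return m

def update_reliability(a, b, p):
    """Moment-matched Beta posterior for one channel reliability pi_n.

    Assumed likelihood, linear in pi:  L(pi) = pi * p + (1 - pi) * (1 - p),
    where p is the shared classifier's probability for the observed
    preference. Linearity makes the posterior moments closed form.
    """
    q = 1.0 - p
    e1, e2, e3 = (beta_raw_moment(a, b, k) for k in (1, 2, 3))
    Z = p * e1 + q * (1.0 - e1)                 # normalizing constant
    m1 = (p * e2 + q * (e1 - e2)) / Z           # E[pi | data]
    m2 = (p * e3 + q * (e2 - e3)) / Z           # E[pi^2 | data]
    common = (m1 - m2) / (m2 - m1 ** 2)         # matches Beta mean/variance
    return m1 * common, (1.0 - m1) * common     # new (alpha, beta)

a, b = 1.0, 1.0                                  # uninformative prior
for _ in range(20):                              # channel keeps agreeing (p = 0.9)
    a, b = update_reliability(a, b, 0.9)
print(a / (a + b))                               # mean reliability rises above 0.5
```

A channel whose preferences repeatedly agree with the shared classifier drifts toward the positive-reliable regime; feeding `p < 0.5` would drive it toward the negative-reliable regime instead.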

### 4.2 Generalized Bayesian Moment Matching

Inspired by the Bayesian approximation method proposed by Weng and Lin (2011), which extended Stein's lemma (Woodroofe, 1989), we propose to estimate the posterior parameters $(\mu^{new}, \Sigma^{new})$ of the approximate posterior $q(w)$ by differential operations instead of integral operations. The BMM algorithm is thereby extended to the general situation in which the likelihood function is twice differentiable.

**Theorem 1.**

We set $w = \mu$ as we expect the posterior density of $w$ to be concentrated on $\mu$ (Weng & Lin, 2011). See the appendix for the detailed proof of theorem 1.

In the following, we resort to the generalized Bayesian moment matching (GBMM) method to estimate the posterior parameters. We take the brain dynamic preference $(x_t^n, x_{t+1}^n)$ at the shaking state $y \in Y_1$ as an example; the equations can be easily extended to brain dynamic preferences at the steady state.
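The following sketch illustrates the derivative-based update in the style of Weng and Lin (2011). The exact forms of equations 4.3 and 4.4 are not reproduced here; the first-order/second-order update shape and the finite-difference derivatives are our illustrative assumptions, valid for any twice-differentiable log likelihood `f`:

```python
import numpy as np

def gbmm_gaussian_update(mu, Sigma, log_lik, eps=1e-4):
    """Derivative-based moment update for q(w) = N(mu, Sigma).

    In the style of Weng & Lin (2011), the integrals in the moment
    constraints are replaced by derivatives of the log likelihood f(w)
    evaluated at w = mu:
        mu'    ~= mu + Sigma @ grad f(mu)
        Sigma' ~= Sigma + Sigma @ hess f(mu) @ Sigma
    Derivatives here are numerical, so any twice-differentiable f works.
    """
    d = mu.size
    grad = np.zeros(d)
    hess = np.zeros((d, d))
    for i in range(d):
        e = np.zeros(d); e[i] = eps
        grad[i] = (log_lik(mu + e) - log_lik(mu - e)) / (2 * eps)
    for i in range(d):
        for j in range(d):
            ei = np.zeros(d); ei[i] = eps
            ej = np.zeros(d); ej[j] = eps
            hess[i, j] = (log_lik(mu + ei + ej) - log_lik(mu + ei - ej)
                          - log_lik(mu - ei + ej) + log_lik(mu - ei - ej)) / (4 * eps ** 2)
    return mu + Sigma @ grad, Sigma + Sigma @ hess @ Sigma

# Toy check: a logistic preference likelihood pulls mu toward agreeing with dx.
dx = np.array([1.0, -0.5])
log_lik = lambda w: -np.logaddexp(0.0, -(w @ dx))   # log sigmoid(w . dx)
mu, Sigma = np.zeros(2), np.eye(2)
mu, Sigma = gbmm_gaussian_update(mu, Sigma, log_lik)
print(mu)          # moves in the direction of dx
```

For this logistic example the update can be checked by hand: at $w = 0$ the gradient is $0.5\,dx$, so the new mean is $[0.5, -0.25]$, and the covariance shrinks along $dx$, which is the qualitative behavior expected of the analytic rules.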

### 4.3 Online GBMM for the Calibration of SWORE

According to this analysis, we summarize online GBMM for SWORE in algorithm 1. It is notable that both the weight update and the channel reliability update can be completed following the analytic rules (see equations 4.3, 4.4, 4.8a, and 4.8b). As a result of this efficient posterior updating procedure, online GBMM naturally enables SWORE to handle streaming preferences.

## 5 Online Mental Fatigue Evaluation

In this section, we apply the SWORE model (see equation 3.6) to perform online mental fatigue monitoring. First, we introduce data augmentation tricks to address the low data volume in the online scenario. Then we propose to maintain a brain dynamic table (BDtable), which sequentially stores the representative EEG signals. Finally, we summarize the entire framework for online mental fatigue evaluation.

### 5.1 Blank-Out Noise Model for Data Augmentation

For simplicity, we focus on the blank-out noise model^{6} (a.k.a. dropout) as the corrupting distribution, which randomly omits subsets of neurons (or features); more precisely,

Note that each dimension of the input $\Delta xn$ is corrupted independently. Equation 5.1 is also a promising technique to break up the complex coadaptations caused by high correlation among different dimensions of the EEG signals (in either the time or frequency domain). Since the presence of any particular dimension is unreliable, a dimension cannot rely on other specific dimensions to correct its mistakes. It must perform well in a wide variety of contexts provided by the other dimensions.
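A minimal sketch of blank-out corruption (the rescaling of surviving dimensions by $1/(1-\text{rate})$ is a standard unbiasedness convention, assumed here rather than taken from equation 5.1):

```python
import numpy as np

def blank_out(x, rate, n_copies, rng):
    """Blank-out (dropout-style) corruption for data augmentation.

    Each dimension of x is independently zeroed with probability `rate`;
    surviving dimensions are rescaled by 1/(1 - rate) so the corrupted
    copies are unbiased: E[corrupted] = x.
    """
    mask = rng.random((n_copies, x.size)) >= rate
    return mask * x / (1.0 - rate)

rng = np.random.default_rng(0)
x = np.arange(1.0, 9.0)                  # one 8-dim preference difference
copies = blank_out(x, rate=0.3, n_copies=5000, rng=rng)
print(copies.shape)                      # (5000, 8)
print(copies.mean(axis=0))               # close to x on average
```

Because each dimension is dropped independently, no dimension can rely on any specific other dimension surviving, which is exactly the coadaptation-breaking effect described above.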

### 5.2 Online Reservoir Sampling for BDtable

Our SWORE model requires brain dynamics-related preferences, which are constructed using current EEG signals and previously observed ones, for an update. Accordingly, brain dynamic table (BDtable) is introduced to store the EEG signals, which can help to calibrate our evaluations and guide the model updating process. Considering the requirement of the high computational efficiency in online applications, BDtable should provide a good summary of previous EEG signals.

Since no prior knowledge about each subject is available, we propose to build the BDtable with random sampling: each element of the BDtable is uniformly sampled from the EEG signals seen so far. In particular, reservoir sampling is proven to meet this requirement (Vitter, 1985). It is carried out to sequentially maintain the BDtable following algorithm 2, where $S$ denotes the size of the BDtable.
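Reservoir sampling keeps every signal seen so far in the table with equal probability while using only $O(S)$ memory; a minimal sketch of the classic algorithm (algorithm 2 itself is not reproduced here, and the function name is ours):

```python
import random

def reservoir_update(table, item, t, S, rng):
    """Maintain a uniform random sample of size S over a stream (Vitter, 1985).

    `t` is the 0-based index of the incoming item. After processing t + 1
    items, every item seen so far sits in the table with probability
    S / (t + 1), using O(S) memory and O(1) work per item.
    """
    if t < S:
        table.append(item)               # fill the reservoir first
    else:
        j = rng.randrange(t + 1)         # uniform over all items seen
        if j < S:
            table[j] = item              # replace a random slot
    return table

rng = random.Random(0)
S = 10
table = []
for t in range(1000):                    # stream of 1000 trial indices
    reservoir_update(table, t, t, S, rng)
print(len(table))                        # stays exactly S once t >= S
```

In the framework each stored item would be a pair (EEG signal, RT) rather than a bare index; the update cost is constant per trial, which suits the online setting.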

### 5.3 The Framework for Online Mental Fatigue Evaluation

Assume the SWORE model $\{w, \pi_{1:N}\}$^{7} and the BDtable $\{x_i^{1:N}, RT_i\}_{i=1:S}$ are updated to time $t-1$ following algorithms 1 and 2, respectively. Online mental fatigue monitoring refers to predicting $RT_t$ from the EEG signals $x_t^{1:N}$, extracted at time $t$, using the up-to-date SWORE model and BDtable.

The framework of online mental fatigue evaluation is summarized in Figure 3. In the first $S$ trials, we build the BDtable with the $S$ EEG signals and their corresponding RTs and then initialize SWORE. For a newly collected EEG signal $x_t^{1:N}$, the SWORE model conducts the indirect mental fatigue evaluation by giving a coarse estimate of $RT_t$ following equation 5.4. When the reaction time $RT_t$ becomes available, we calibrate the SWORE model following algorithm 1 and update the BDtable online following algorithm 2.
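A hypothetical sketch of the ranking-based prediction step (equation 5.4 is not reproduced; the vote-and-place rule below is our simplified reading of how a new trial "finds its own position" among the stored RTs, as in remark 1):

```python
import numpy as np

def estimate_rt(x_new, bd_signals, bd_rts, w):
    """Coarse RT estimate by ranking x_new against the BDtable entries.

    For each stored signal, the shared weight w scores the preference
    (stored, new); a positive score votes that the new trial's RT is
    larger. The new trial is then placed at the corresponding position
    in the RT ordering of the table: a rough, ranking-based estimate.
    """
    votes = sum(1 for s in bd_signals if w @ (x_new - s) > 0)
    sorted_rts = np.sort(bd_rts)
    if votes == 0:
        return sorted_rts[0]
    if votes == len(sorted_rts):
        return sorted_rts[-1]
    return 0.5 * (sorted_rts[votes - 1] + sorted_rts[votes])

rng = np.random.default_rng(2)
w = np.array([1.0, 0.5])                           # toy shared weight
signals = rng.normal(size=(10, 2))                 # BDtable: 10 stored signals
rts = signals @ w + 0.01 * rng.normal(size=10)     # RTs aligned with w-scores
x_new = signals.mean(axis=0) + 0.8                 # a fairly "slow" new trial
print(estimate_rt(x_new, signals, rts, w))
```

The estimate is deliberately coarse: it returns a position in the stored RT ordering rather than a regressed value, which is the trade-off argued for in section 3.1.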

### 5.4 Complexity Analysis of the Framework

In this section, we analyze the space complexity and computational complexity of our framework. In particular, let $d$, $S$, and $N$ denote the dimension of the feature vector, the size of the BDtable, and the number of channels, respectively. Meanwhile, we use $T$ to denote the number of data augmentations in equation 5.1 and $M$ to denote the number of sequential trials.

The storage of the online system consists of two parts: $O(d+N)$ for the model parameters $(\mu, \Sigma, \{\alpha_n, \beta_n\}_{n=1}^{N})$^{8} and $O(SNd)$ for the BDtable $\{x_i^{1:N}, RT_i\}_{i=1:S}$. Therefore, the overall space complexity is $O(SNd)$.

We analyze the complexity of the online system from the aspects of prediction and calibration, respectively:

- •
**Prediction complexity.** The prediction consists of two steps: equations 5.3 and 5.4. The computational complexity is $O(SNd)$ for equation 5.3 and $O(SN\log S)$ for equation 5.4. Therefore, the overall computational complexity of prediction for $M$ sequential trials is $O(MSN(d+\log S))$.

- •
**Calibration complexity.** According to algorithm 1, the most time-consuming step is the matrix multiplication in equation 4.4, with $O(d^3)$ time complexity. Therefore, the overall time complexity for $M$ calibration steps is $O(MTSNd^3)$. It decreases to $O(MTSNd)$ if a diagonal covariance matrix is adopted.

According to our analysis, the proposed online fatigue monitoring system is both space and time efficient, since both complexities are linearly related to each factor.

## 6 Numerical Experiments

In this section, we first introduce the experimental setup of mental fatigue monitoring. Then we explore the reliability of our SWORE model in online mental fatigue evaluation tasks. We also analyze the parameter sensitivity and the model uncertainty of SWORE with regard to the proposed online GBMM algorithm.

### 6.1 Experiment Setup

#### 6.1.1 Data Collection

This letter uses the EEG data introduced in Huang, Jung, and Makeig (2009). Forty healthy male adults aged 20 to 30 years were recruited to participate in a sustained-attention driving experiment in a virtual driving simulation environment (see Figure 4). All subjects participated in the experiment for 90 minutes, beginning between 1:00 p.m. and 2:00 p.m. At the beginning of the experiment, a 5-minute pretest was performed to ensure that every subject understood the instructions and did not suffer from simulator-induced nausea. During this sustained-attention driving task, the experimental paradigm simulated a nighttime driving situation on a four-lane highway, and lane changing was randomly triggered to make the car drift from the original cruising lane toward the left or the right. Each participant was instructed to quickly compensate by steering the wheel. A complete trial in this study, including a 1 s baseline, deviation onset, response onset, and response offset, is shown in Figure 4. EEG signals were recorded simultaneously. The next trial occurred within an interval of 5 s to 10 s after the completion of the current trial, during which the subject had to drive back to the centerline of the third lane. If a subject fell asleep during the experiment, no feedback was given to alert him. For each trial $t$, the 10 s EEG signals $\{x_{n,t}\}_{n=1}^{N}$ from $N$ (= 33)^{9} different EEG channels before the deviation onset were recorded, and the corresponding reaction time $RT_t$ was collected.

A wired EEG cap with 33 Ag/AgCl electrodes, including 30 EEG electrodes, 2 reference electrodes (A1 and A2), and 1 vehicle position channel (VP), was used to record the electrical activity of the brain from the scalp during the driving task. The EEG electrodes were placed according to a modified international 10-20 system. The contact impedance between all electrodes and the skin was kept below 5 k$\Omega$. The EEG recordings, amplified by the Scan SynAmps2 Express system (Compumedics Ltd., VIC, Australia), were digitized at 500 Hz with 16-bit resolution. Before data analysis, the raw EEG data were preprocessed. First, we used a digital bandpass (1-50 Hz) zero-phase FIR filter (the eegfilt.m routine from the EEGLAB toolbox) to remove power line noise and low-frequency drift. Then the signals were downsampled to 250 Hz to reduce the volume of data. Finally, we manually removed artifacts such as random and persistent disturbances from body motion, eye movement, eye blinking, muscle activity, EEG channel malfunction, and environmental noise.

#### 6.1.2 Data Preprocessing and Preferences Construction

Following Huang, Pal, Chuang, and Lin (2015) and Pan et al. (2020), we preprocessed the EEG signals as follows. Considering the time delay among the channels in the time domain, Fourier transforms (Welch, 1967) were applied to EEG signals to transform time series into the frequency domain. Further, to avoid overhead computation, EEG power within 0 to 30 Hz was selected, which is considered to be the most relevant to the RTs (Huang et al., 2015).

Two types of preferences were constructed following Pan et al. (2020). The shaking-state preferences $Y_1$ were constructed from RT comparisons $(RT_0, RT_1)$ with $RT_0 < RT_1$ satisfying $RT_0 < \min(RT_0 + \tau_1, \tau_2 \cdot RT_0) < RT_1$, and vice versa for $RT_0 > RT_1$. The steady-state preferences $Y_2$ were constructed from RT comparisons $(RT_0, RT_1)$ with $RT_0 < RT_1$ satisfying $RT_0 < RT_1 < \min(RT_0 + \tau_3, \tau_4 \cdot RT_0)$, and vice versa for $RT_0 > RT_1$. Note that $\tau_1 > \tau_3 > 0$ and $\tau_2 > \tau_4 > 1$ control the sensitivity of the mental fatigue evaluation. We empirically set $\tau_1 = 0.15$, $\tau_2 = 1.2$, $\tau_3 = 0.1$, and $\tau_4 = 1.1$ for all participants in our experiment.
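These construction rules can be expressed compactly. The sketch below uses the stated $\tau$ values and treats RT pairs falling between the two margins as discarded, which is our reading of the insensitive zone (the function name and labels are hypothetical):

```python
def label_preference(rt0, rt1, t1=0.15, t2=1.2, t3=0.1, t4=1.1):
    """Label an RT pair as a shaking- or steady-state preference.

    Shaking ("up"/"down"): the two RTs differ by more than both the
    additive margin (t1) and the multiplicative margin (t2).
    Steady: the RTs are comparable, within the tighter margins (t3, t4).
    Pairs in between are discarded (the insensitive zone).
    """
    lo, hi = sorted((rt0, rt1))
    if hi > min(lo + t1, t2 * lo):
        return "shaking_up" if rt1 > rt0 else "shaking_down"
    if hi < min(lo + t3, t4 * lo):
        return "steady"
    return None                          # insensitive zone: skip the pair

print(label_preference(0.60, 0.90))      # large gap  -> shaking_up
print(label_preference(0.60, 0.62))      # comparable -> steady
print(label_preference(0.60, 0.71))      # between margins -> None
```

With $\tau_1 > \tau_3$ and $\tau_2 > \tau_4$ the two label sets cannot overlap, so every retained pair belongs to exactly one of $Y_1$ or $Y_2$.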

In scenarios where RT is not available, other well-studied power spectral features can be adopted as fatigue indicators. For example, dynamic time warping, used in Wang et al. (2018) and Bose et al. (2019), was shown to be consistent with RT and can also be adopted for constructing the brain dynamics preferences.

#### 6.1.3 Evaluation Metric

$ACC \in [0,1]$ denotes the consistency between the prediction and the ordering of the response times; the higher, the better. $ACC = 1$ means the learning model can correctly capture the level of current mental fatigue based on the reference EEG signals. $ACC = 0.5$ means the learning model captures nothing about the level of current mental fatigue. $ACC = 0$ means the learning model captures the level of current mental fatigue based on the reference EEG signals but in the reverse order. The ranking-based metric ACC is a relaxation of the mean absolute error (MAE): ACC is more robust than MAE to extreme RTs and to local perturbations around RT. The optimum under MAE is also an optimum under ACC, but not vice versa. Therefore, reaching the optimum of ACC should be much easier than that of MAE.
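ACC as described can be computed as a pairwise ordering agreement; a minimal sketch (the handling of tied predictions as disagreements is our choice, not specified in the text):

```python
def ranking_acc(pred, truth):
    """Pairwise ordering consistency between predictions and true RTs.

    Counts unordered pairs (i, j) with truth[i] != truth[j] and checks
    whether pred orders them the same way. 1.0 = fully consistent,
    0.5 = uninformative, 0.0 = fully reversed.
    """
    agree = total = 0
    n = len(truth)
    for i in range(n):
        for j in range(i + 1, n):
            if truth[i] == truth[j]:
                continue                 # no ordering to recover
            total += 1
            if (pred[i] - pred[j]) * (truth[i] - truth[j]) > 0:
                agree += 1
    return agree / total if total else 0.5

truth = [0.5, 0.8, 1.2, 0.7]
print(ranking_acc([0.4, 0.9, 1.5, 0.6], truth))   # same ordering -> 1.0
print(ranking_acc([1.5, 0.9, 0.4, 1.0], truth))   # reversed ordering -> 0.0
```

Note that the absolute predicted values never enter the metric, only their ordering, which is why ACC is insensitive to extreme RTs and to local perturbations that leave the ordering intact.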

#### 6.1.4 Baselines

We consider only the data-driven mental fatigue evaluation approaches in the previous literature. Among the regression methods, support vector regression (SVR) (Bose et al., 2019) and neural-network-based regression (Pan et al., 2020) have been verified to achieve superior performance. Since neural-network-based regression requires very large training sets, we consider SVR (Bose et al., 2019), for which a small number of samples is sufficient. Among classification methods, CArank (Pan et al., 2020) also requires large numbers of training samples and is thus not suitable for our online scenario. Alternatively, we consider support vector machine (SVM), random forest, and logistic ordinal regression (LOR). LOR is a special case of SWORE that models only the shaking state ($Y_1$).

All baselines (SVR, SVM, Random Forest, LOR, and our SWORE) were implemented in Matlab. In particular, we adopt the RBF kernel for SVR and SVM following Bose et al. (2019). Since SVR, SVM, Random Forest, and LOR have no mechanism to evaluate the channel state beforehand, we simply concatenate the EEG signals from all channels into one feature vector; this achieved better performance than aggregating the outputs of different channels by majority voting. For all methods, we use the first 20 trials for pretraining. We fix SVR, SVM, and Random Forest after pretraining since no efficient online calibration methods are available for them, whereas LOR and SWORE can be efficiently calibrated online with the update strategy introduced in section 4. The size of the BDtable is set to 10 for LOR and SWORE, that is, $S = 10$ in algorithm 2. For a fair comparison, we calculate the prediction accuracy of all methods on each trial using the same dynamically updated BDtable. Note that two variants of LOR are considered for better comparison: LOR, denoting LOR without online calibration, and online LOR, denoting LOR with online calibration.

### 6.2 Comparison with Offline Regression/Classification Methods

Following the online mental fatigue evaluation framework proposed in Figure 3, we explored the reliability of SWORE in the online monitoring scenario. Specifically, we leveraged the prerecorded 20 trials to pretrain the embryonic SVR, SVM, Random Forest, LOR, and SWORE models, respectively. For SWORE, we randomly initialized $\mu$ in $[-10^{-2}, 10^{-2}]$ and $\Sigma$ in $[0, 10^{-4} \times I]$ and set $\alpha_n = \beta_n = 5$ according to our parameter sensitivity analysis in section 6.5. The data augmentation size $T$ is set to 1 during pretraining and 3 during online updating according to Figure 10, since these correspond to a data-sufficient and a data-insufficient scenario, respectively. The brain dynamics table size is fixed at 10. We then sequentially obtain the coarse estimation of RT for each new trial, determine the prediction accuracy, and update the SWORE model when the true RT becomes available. We ran the SWORE model following the procedure in Figure 3 100 times and recorded the prediction accuracy for each EEG signal accordingly.
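The online procedure can be sketched as follows. The `predict`/`update` interfaces and the toy `OracleModel` below are hypothetical stand-ins for illustration only; the actual SWORE prediction and calibration steps are those of sections 3 and 4:

```python
from collections import deque

class OracleModel:
    """Hypothetical stand-in for a SWORE-like model."""
    def predict(self, eeg):
        return sum(eeg)   # toy monotone score; real model: sections 3-4
    def update(self, eeg, rt):
        pass              # online calibration step would go here

def online_eval(model, trials, bd_size=10):
    """Sketch of the Figure 3 loop: score each incoming trial against a
    sliding BDtable of recent (EEG, RT) references, then update the model
    once the true RT becomes available."""
    bdtable, accs = deque(maxlen=bd_size), []
    for eeg, rt in trials:
        if bdtable:  # need at least one reference to rank against
            score = model.predict(eeg)
            hits = sum((score > model.predict(e)) == (rt > r)
                       for e, r in bdtable)
            accs.append(hits / len(bdtable))  # per-trial ranking accuracy
        model.update(eeg, rt)                 # calibrate with the true RT
        bdtable.append((eeg, rt))
    return accs

accs = online_eval(OracleModel(), [([0.1], 0.5), ([0.2], 0.6), ([0.3], 0.7)])
# accs == [1.0, 1.0]: the toy score reproduces the RT ordering perfectly
```

The `deque(maxlen=bd_size)` realizes the fixed-size BDtable ($S = 10$ in algorithm 2): appending a new trial automatically evicts the oldest reference.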

Table 1: Average prediction accuracy of all methods.

| Method | | Average Prediction Accuracy |
|---|---|---|
| Offline Regression | SVR | $69.1 \pm 0.36\%$ |
| Offline Classification | SVM | $67.4 \pm 0.26\%$ |
| | Random Forest | $67.7 \pm 0.39\%$ |
| | LOR | $63.1 \pm 0.35\%$ |
| Online Classification | LOR | $72.6 \pm 0.33\%$ |
| | SWORE | **$76.0 \pm 0.30\%$** |


Notes: We run each baseline independently 100 times and calculate the mean and 95% confidence interval. The best result is in bold.

From Table 1, we find that the offline regression method, SVR, achieves higher prediction accuracy than the offline classification methods, SVM, Random Forest, and LOR. This is consistent with the result in section 6.4 and makes sense because a classification task overfits a small data set more easily than a regression task. We also find that online calibration is necessary for reliable mental fatigue evaluation: both online methods, SWORE and online LOR, achieve significant improvements over the offline regression/classification methods, and SWORE achieves the highest prediction performance among all baselines.

### 6.3 Online Mental Fatigue Evaluation on One Participant

According to the experimental results in section 6.2, the offline regression method, SVR, achieves superior prediction accuracy to the offline classification methods: SVM, Random Forest, and offline LOR. In the following, we only consider the comparison between the offline regression method, SVR, and online classification methods, online LOR and SWORE. In particular, we conducted more detailed comparisons following the experimental setting in section 6.2.

#### 6.3.1 PDF and CDF of Online Prediction Accuracy

We estimated the probability density function (PDF) and cumulative distribution function (CDF) regarding the prediction accuracy of each EEG signal (see Figure 5).

From Figure 5, we first observe that SWORE gives the most reliable evaluation ($76\%$ average prediction accuracy) with the lowest variance for any new EEG signal, compared to LOR ($72.6\%$) and SVR ($69.1\%$). Second, for half of the samples ($y = 0.5$), SWORE gives a prediction accuracy of more than $77\%$, compared to $75\%$ for LOR and $68\%$ for SVR. Third, SWORE gives a prediction accuracy of more than $x = 70\%$ for $65\%$ ($1 - y_3$) of samples; SVR and LOR are much worse, with only $60\%$ ($1 - y_2$) and $48\%$ ($1 - y_1$) of samples, respectively, reaching a prediction accuracy of more than $x = 70\%$.

Table 2: Two-sample $t$-tests between each pair of methods.

| Equal Means without Assuming Equal Variances | Test Decision | $p$-value |
|---|---|---|
| SWORE versus LOR | Reject | 2.38e-42 |
| SWORE versus SVR | Reject | 1.039e-153 |
| LOR versus SVR | Reject | 1.03e-37 |


To further support our claim, we conducted a two-sample $t$-test with the null hypothesis of equal means, without assuming equal variances, using the ttest2 function in Matlab. In particular, we conducted the test between every pair of SWORE, LOR, and SVR at the $5\%$ significance level. The comparison results are listed in Table 2.

All test results in Table 2 indicate that the $t$-test rejects all three null hypotheses at the $5\%$ significance level without assuming equal variances: there is a significant difference between the results of any two of SWORE, LOR, and SVR.
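For reference, the statistic behind this test (Matlab's ttest2 without the equal-variance assumption, i.e., Welch's $t$-test) can be computed as follows. This is a stdlib-only Python sketch; the samples in the usage example are small illustrative numbers, not the reported accuracies:

```python
import statistics as st

def welch_t(a, b):
    """Welch's two-sample t statistic and its degrees of freedom
    (null hypothesis of equal means, no equal-variance assumption)."""
    ma, mb = st.mean(a), st.mean(b)
    va, vb = st.variance(a), st.variance(b)  # sample variances
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb                  # squared standard error
    t = (ma - mb) / se2 ** 0.5
    # Welch-Satterthwaite approximation of the degrees of freedom:
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

t, df = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
# t == -1.0 and df == 8.0 for these equal-variance samples
```

The $p$-values in Table 2 are then the tail probabilities of $|t|$ under a $t$ distribution with $df$ degrees of freedom, which ttest2 computes internally.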

#### 6.3.2 Showcase of Online Mental Fatigue Evaluation

To give an intuitive comparison, we calculated the prediction accuracy of each trial for one random experiment with three methods and then plotted the performance improvement of SWORE and SVR compared to LOR (see Figure 6).

Figure 6 shows that SWORE consistently achieves superior, or at least comparable, performance relative to LOR and SVR, which supports our claim that channel reliability indeed affects the efficacy of the learning model. The (regression-based) SVR suffers from high generalization error on new EEG signals compared to the (classification-based) SWORE and LOR. SWORE achieves an average prediction accuracy of $80.6\%$ per trial in one random experiment, higher than that of LOR ($75\%$) and SVR ($70.5\%$).

#### 6.3.3 Channel Reliability Estimation

Following our analysis in remark 3, the model parameter $\pi_n$ reveals the reliability of the $n$th channel, $n = 1, 2, \ldots, N$. Therefore, we visualize the estimated $\pi_n$ for each channel in Figure 8. Note that the 33-channel EEG data in our experiment consist of 30 EEG channels, 2 reference channels (A1 and A2), and the vehicle position channel, VP. We also visualized the relative contribution of each channel via the 30-channel layout of Topoplot in Figure 7. According to our analysis in remark 5, both positive and negative channels are considered informative and contribute equally to SWORE. We therefore introduce a new metric, $R = 2 \cdot |\pi - 0.5|$, to denote the contribution of each channel to the SWORE model. Like $\pi$, $R$ ranges between 0 and 1, but here a higher value (closer to 1) indicates that the corresponding EEG channel is more informative.
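A minimal sketch of this metric (Python for illustration; the original implementation was in Matlab, and `contribution` is our name):

```python
# R = 2 * |pi - 0.5| maps each channel's estimated pi in [0, 1]
# to a contribution score in [0, 1].
def contribution(pi):
    return 2 * abs(pi - 0.5)

# pi near 0 or 1 (strongly negative/positive channel) -> informative;
# pi near 0.5 -> uninformative (e.g., reference channels A1/A2 and VP).
scores = [round(contribution(p), 2) for p in (0.05, 0.5, 0.95)]
print(scores)  # -> [0.9, 0.0, 0.9]
```

Note that a channel with $\pi$ near 0 scores just as high as one with $\pi$ near 1, reflecting that negatively correlated channels are equally informative.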

From Figures 7 and 8, we find that the relative contributions of the channels differ, which is consistent with our motivation that different regions of the human brain perform different functions. We also find that the majority of channels ($29/33$) are considered reliable, and all three known nonrelevant channels (A1, A2, and VP) are automatically detected and removed during the learning process (Pan et al., 2020).

#### 6.3.4 Model Reliability with Fewer Channels

Fewer EEG channels are always preferred in the online scenario, as they mean lower computation cost, less storage, and minimal impact on drivers. Therefore, we explored the reliability of SWORE when fewer channels are available. According to Figure 8, we retrained SWORE with 5, 10, and 20 randomly selected reliable channels and compared these variants with the original SWORE model using all channels in Figure 9. For a fair comparison, we collected the prediction accuracy of all SWORE variants using the same dynamically updated BDtable.

Figure 9 shows that all SWORE variants achieve comparable prediction accuracy in terms of the overall average per trial. This demonstrates that the superior performance of SWORE is barely affected when fewer EEG channels are used and that the reliable channels detected by SWORE are trustworthy.

### 6.4 Ablation Study for Exploring the Efficacy of Our Contributions

In this letter, we have made contributions from several perspectives. Specifically, C1 denotes the robust multichannel aggregation method in equation 3.4, C2 the gradient flattening in equation 3.5, C3 the efficient online updating strategy (i.e., online GBMM) in section 4, and C4 the blank-out noise model for data augmentation in equation 5.1.

To explore the efficacy of these four contributions, we introduce five new baselines: LOR w/o C3, LOR without online calibration; SWORE w/o C3, SWORE without online calibration; SWORE w/o C2&C4, SWORE without steady-state modeling and without data augmentation; SWORE w/o C2, SWORE without steady-state modeling; and SWORE w/o C4, SWORE without data augmentation. We ran all methods 100 times on the first participant and report the results in Table 3.

Table 3: Ablation study on the first participant.

| Method | Calibration Option | Average Prediction Accuracy |
|---|---|---|
| LOR | w/o C3 | $63.1 \pm 0.35\%$ |
| | Full | $72.6 \pm 0.33\%$ |
| SWORE | w/o C3 | $64.5 \pm 0.33\%$ |
| | w/o C2 | $75.7 \pm 0.30\%$ |
| | w/o C4 | $74.6 \pm 0.32\%$ |
| | w/o C2 & C4 | $74.1 \pm 0.32\%$ |
| | Full | **$76.0 \pm 0.30\%$** |


Notes: We run each baseline independently 100 times and calculate the mean and 95% confidence interval. The best result is in bold.

From Table 3, we find that online calibration (C3) is necessary for reliable mental fatigue evaluation: SWORE and LOR each achieve a significant improvement (around $10\%$) over their offline versions by adopting efficient online calibration. Comparing SWORE w/o C2&C4 to LOR, we find that the robust multichannel aggregation strategy (C1) enables higher prediction accuracy by automatically eliminating the task-nonrelevant channels. In addition, comparing SWORE w/o C2 and SWORE w/o C4 to SWORE w/o C2&C4 shows that both the data augmentation (C4) and the gradient flattening (C2) are useful for improving model performance. Finally, the importance ranking of the four contributions on the first participant is C3 $>$ C1 $>$ C4 $>$ C2.

### 6.5 Offline Analysis of Parameter Sensitivity and Model Uncertainty

In this section, we explore the parameter sensitivity of SWORE with regard to the hyperparameters $(\mu ,\Sigma )$ and $(\alpha ,\beta )$. In particular, we generated the offline brain dynamic preferences as follows: the trials of each participant were randomly divided into two parts—$50%$ for training and $50%$ for testing—and offline brain dynamic preferences were constructed according to the pairwise comparisons between the RTs regardless of their sequential property.

#### 6.5.1 Sensitivity Analysis with Regard to Hyperparameters $(\mu ,\Sigma )$

For simplicity, we consider a diagonal covariance matrix here. Specifically, we randomly initialized $\mu$ in $[-10^{-a}, 10^{-a}]$ and $\Sigma$ in $[0, 10^{-b} I]$, with $a$ and $b$ each taking values in $\{0, 2, 4\}$. Further, we adopted a noninformative prior for $\pi_n$, namely $\alpha_n = \beta_n = 5$, to eliminate the effects of noisy channels. The data augmentation size $T$ is set to 1 since the training data are sufficient. The testing performance of SWORE under the different parameter settings is presented in Table 4.

Table 4: Test ACC (%) of SWORE under different hyperparameter initializations. Columns 2 to 10 give $(a,b)$ with $\mu = 10^{-a}$ and $\Sigma = 10^{-b} I$, with $(\alpha, \beta)$ fixed to $(5,5)$; columns 11 to 19 give $(\alpha, \beta)$ with $(\mu, \Sigma)$ fixed to $(10^{-2}, 10^{-4})$.

Test ACC | (0,0) | (2,0) | (4,0) | (0,2) | (2,2) | (4,2) | (0,4) | (2,4) | (4,4) | (1,1) | (1,3) | (1,5) | (3,1) | (3,3) | (3,5) | (5,1) | (5,3) | (5,5)
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
P1 | 50.00 | 50.00 | 79.89 | 77.23 | 78.29 | 78.75 | 50.00 | 78.71 | 79.28 | 78.48 | 78.26 | 78.29 | 78.71 | 78.52 | 78.37 | 78.64 | 78.83 | 78.71
P2 | 50.00 | 50.00 | 50.00 | 79.55 | 81.48 | 83.00 | 50.00 | 82.03 | 81.85 | 81.62 | 82.13 | 82.09 | 82.03 | 81.75 | 82.17 | 82.04 | 81.98 | 82.03
P3 | 50.00 | 50.00 | 50.00 | 85.32 | 84.54 | 84.29 | 70.07 | 82.89 | 83.14 | 83.47 | 83.47 | 83.47 | 82.89 | 83.02 | 83.55 | 83.55 | 83.72 | 82.89
P4 | 50.00 | 50.00 | 50.00 | 72.16 | 76.45 | 75.72 | 50.00 | 73.26 | 73.26 | 73.34 | 73.65 | 73.68 | 73.26 | 73.42 | 73.65 | 73.38 | 73.46 | 73.26
P5 | 50.00 | 50.00 | 50.00 | 85.34 | 85.52 | 85.21 | 50.00 | 85.07 | 85.00 | 85.17 | 85.14 | 85.13 | 85.07 | 85.06 | 85.17 | 85.06 | 85.07 | 85.07
P6 | 50.00 | 50.00 | 50.00 | 86.76 | 83.25 | 84.76 | 50.00 | 84.56 | 84.31 | 84.23 | 84.31 | 84.31 | 84.56 | 84.64 | 84.40 | 85.13 | 85.05 | 84.56
P7 | 50.00 | 50.00 | 50.00 | 75.44 | 75.21 | 75.10 | 50.00 | 75.18 | 75.31 | 75.19 | 75.12 | 75.19 | 75.18 | 75.22 | 75.10 | 75.19 | 75.23 | 75.18
P8 | 50.00 | 50.00 | 50.00 | 84.38 | 84.06 | 84.10 | 50.00 | 84.70 | 84.62 | 84.30 | 84.54 | 84.5 | 84.70 | 84.66 | 84.46 | 84.61 | 84.53 | 84.70
P9 | 50.00 | 50.00 | 50.00 | 83.12 | 83.01 | 83.12 | 50.00 | 83.26 | 83.46 | 83.25 | 82.86 | 82.87 | 83.26 | 83.22 | 83.00 | 82.85 | 83.14 | 83.26
P10 | 50.00 | 50.00 | 50.00 | 79.17 | 88.00 | 88.51 | 50.00 | 89.50 | 89.47 | 89.14 | 88.56 | 88.56 | 89.50 | 89.21 | 88.44 | 88.43 | 88.41 | 89.50
P11 | 50.00 | 50.00 | 50.00 | 80.45 | 75.99 | 76.39 | 50.00 | 77.49 | 77.42 | 77.06 | 76.14 | 76.14 | 77.49 | 77.29 | 76.26 | 77.11 | 77.09 | 77.49
P12 | 50.00 | 50.00 | 50.00 | 79.97 | 80.18 | 80.28 | 50.00 | 80.09 | 80.09 | 80.13 | 80.09 | 80.13 | 80.09 | 80.13 | 80.09 | 79.94 | 79.94 | 80.09
P13 | 50.00 | 50.00 | 50.00 | 81.09 | 80.75 | 80.75 | 50.00 | 81.52 | 81.52 | 81.33 | 81.23 | 81.23 | 81.52 | 81.33 | 81.14 | 81.04 | 81.18 | 81.52
P14 | 50.00 | 50.00 | 50.00 | 50.00 | 78.58 | 78.65 | 50.00 | 80.03 | 80.07 | 79.70 | 80.00 | 80.03 | 80.03 | 79.70 | 80.03 | 79.46 | 79.49 | 80.03
P15 | 50.00 | 50.00 | 50.00 | 89.58 | 89.92 | 89.86 | 50.00 | 89.93 | 89.97 | 89.95 | 89.95 | 89.95 | 89.93 | 89.90 | 89.95 | 90.04 | 89.97 | 89.93
P16 | 50.00 | 50.00 | 50.00 | 73.02 | 72.75 | 72.59 | 50.00 | 72.41 | 72.17 | 72.44 | 72.37 | 72.37 | 72.41 | 72.42 | 72.33 | 72.50 | 72.42 | 72.41
P17 | 50.00 | 50.00 | 50.00 | 50.00 | 76.99 | 77.63 | 50.00 | 78.05 | 78.09 | 78.09 | 77.94 | 77.79 | 78.05 | 78.24 | 77.45 | 77.86 | 77.75 | 78.05
P18 | 50.00 | 50.00 | 50.00 | 50.00 | 78.03 | 85.38 | 50.00 | 89.36 | 93.52 | 88.38 | 87.82 | 87.79 | 89.36 | 88.88 | 88.00 | 87.31 | 87.92 | 89.36
P19 | 50.00 | 50.00 | 50.00 | 77.97 | 77.94 | 77.80 | 50.00 | 77.92 | 77.89 | 77.69 | 77.62 | 77.61 | 77.92 | 77.92 | 77.83 | 77.68 | 77.89 | 77.92
P20 | 50.00 | 50.00 | 50.00 | 80.78 | 79.96 | 79.80 | 50.00 | 80.32 | 80.48 | 80.29 | 80.29 | 80.28 | 80.32 | 80.29 | 80.30 | 80.33 | 80.34 | 80.32
P21 | 50.00 | 50.00 | 50.00 | 50.00 | 69.92 | 73.51 | 50.00 | 75.74 | 78.23 | 79.13 | 78.89 | 78.90 | 75.74 | 74.38 | 79.05 | 79.29 | 79.37 | 75.74
P22 | 50.00 | 50.00 | 50.00 | 78.42 | 77.96 | 78.08 | 50.00 | 78.05 | 78.22 | 77.96 | 77.95 | 77.96 | 78.05 | 77.99 | 77.98 | 77.95 | 77.96 | 78.05
P23 | 50.00 | 50.00 | 50.00 | 84.72 | 84.34 | 84.47 | 50.00 | 84.66 | 84.58 | 84.55 | 84.55 | 84.55 | 84.66 | 84.64 | 84.55 | 84.80 | 84.69 | 84.66
P24 | 50.00 | 50.00 | 50.00 | 80.08 | 80.12 | 80.11 | 50.00 | 80.04 | 80.09 | 79.98 | 79.99 | 79.99 | 80.04 | 80.01 | 79.98 | 80.12 | 80.09 | 80.04
P25 | 50.00 | 50.00 | 50.00 | 82.12 | 82.18 | 82.26 | 50.00 | 82.17 | 82.39 | 82.18 | 82.20 | 82.21 | 82.17 | 82.31 | 81.99 | 82.18 | 81.99 | 82.17
P26 | 50.00 | 50.00 | 50.00 | 86.71 | 86.52 | 86.55 | 50.00 | 86.61 | 86.63 | 86.61 | 86.61 | 86.61 | 86.61 | 86.60 | 86.61 | 86.59 | 86.60 | 86.61
P27 | 50.00 | 50.00 | 50.00 | 81.17 | 77.35 | 82.60 | 50.00 | 82.71 | 83.20 | 82.79 | 82.94 | 82.93 | 82.71 | 82.77 | 82.86 | 82.97 | 83.00 | 82.71
P28 | 50.00 | 50.00 | 50.00 | 85.36 | 64.69 | 85.40 | 50.00 | 85.34 | 83.73 | 85.37 | 85.34 | 85.29 | 85.34 | 85.31 | 85.33 | 85.23 | 85.32 | 85.34
P29 | 50.00 | 50.00 | 50.00 | 84.12 | 84.06 | 84.13 | 50.00 | 83.87 | 83.86 | 83.83 | 83.85 | 83.85 | 83.87 | 83.85 | 83.86 | 83.85 | 83.84 | 83.87
P30 | 50.00 | 50.00 | 50.00 | 50.00 | 82.30 | 84.32 | 50.00 | 84.27 | 84.40 | 84.07 | 84.08 | 84.14 | 84.27 | 84.19 | 84.03 | 84.08 | 84.08 | 84.27
P31 | 50.00 | 50.00 | 50.00 | 82.28 | 83.80 | 83.33 | 50.00 | 83.55 | 83.60 | 83.55 | 83.53 | 83.52 | 83.55 | 83.57 | 83.52 | 83.51 | 83.53 | 83.55
P32 | 50.00 | 50.00 | 50.00 | 84.54 | 86.02 | 85.19 | 50.00 | 85.69 | 86.69 | 85.66 | 86.62 | 86.56 | 85.69 | 85.65 | 86.54 | 86.46 | 86.51 | 85.69
P33 | 50.00 | 65.27 | 50.00 | 80.05 | 80.59 | 80.62 | 50.00 | 80.90 | 81.34 | 80.83 | 81.00 | 80.98 | 80.90 | 80.83 | 80.79 | 81.26 | 81.18 | 80.90
P34 | 50.00 | 50.00 | 69.92 | 87.27 | 86.98 | 87.37 | 50.00 | 87.65 | 87.65 | 87.47 | 87.65 | 87.65 | 87.65 | 87.59 | 87.62 | 87.40 | 87.47 | 87.65
P35 | 50.00 | 50.00 | 50.00 | 74.24 | 75.32 | 74.28 | 50.00 | 74.77 | 74.95 | 74.81 | 74.77 | 74.74 | 74.77 | 74.76 | 74.69 | 74.90 | 74.82 | 74.77
P36 | 50.00 | 50.00 | 50.00 | 86.17 | 85.58 | 85.42 | 50.00 | 85.55 | 85.58 | 85.50 | 85.55 | 85.55 | 85.55 | 85.52 | 85.50 | 85.47 | 85.52 | 85.55
P37 | 50.00 | 50.00 | 50.00 | 90.96 | 89.81 | 90.25 | 50.00 | 90.20 | 90.64 | 89.81 | 89.43 | 89.43 | 90.20 | 90.03 | 89.49 | 89.98 | 89.92 | 90.20
P38 | 50.00 | 50.00 | 50.00 | 90.30 | 90.06 | 90.14 | 50.00 | 90.52 | 90.40 | 90.48 | 90.28 | 90.28 | 90.52 | 90.52 | 90.28 | 90.44 | 90.48 | 90.52
P39 | 50.00 | 50.00 | 50.00 | 85.09 | 84.65 | 84.65 | 50.00 | 84.90 | 84.98 | 84.94 | 84.90 | 84.90 | 84.90 | 84.98 | 84.98 | 84.68 | 84.79 | 84.90
P40 | 50.00 | 50.00 | 50.00 | 75.80 | 75.90 | 75.86 | 50.00 | 75.93 | 75.96 | 75.93 | 75.93 | 75.92 | 75.93 | 75.92 | 75.91 | 75.86 | 75.89 | 75.93


Notes: The best parameter settings are in gray. Some parameter settings do not consistently perform very well and may fail on some participants (marked in bold).

Table 4 shows that SWORE consistently performs well, with testing accuracy greater than $70\%$ on all participants, under small initializations ($2 \le a, b \le 4$) for $(\mu, \Sigma)$. With large initializations, the SWORE model suffers from spurious overflow and underflow problems at each updating step, due to the high-dimensional features ($L = 492$) and the exponential operator (within the sigmoid function). Also, although the performance of SWORE differs slightly across participants, it is robust to small initializations and shows comparable performance for the same participant under different initializations.

#### 6.5.2 Sensitivity Analysis with Regard to Hyperparameters $(\alpha ,\beta )$

To explore the effects of the hyperparameters $(\alpha_n, \beta_n)$ on the SWORE model, we initialized $\alpha_n$ and $\beta_n$ with values in $\{1, 3, 5\}$, respectively. We randomly initialized $\mu$ in $[-10^{-2}, 10^{-2}]$ and $\Sigma$ in $[0, 10^{-4} \times I]$. The corrupting size $T$ was set to 1 as before. The performance of SWORE on the testing data is reported in Table 4.

It is worth noting that SWORE is insensitive to the initialization of hyperparameters $(\alpha ,\beta )$. In particular, it achieves comparable performance for each participant under different initializations of $(\alpha ,\beta )$. SWORE consistently performs very well on all 40 participants, regardless of the different initializations for $(\alpha ,\beta )$.

#### 6.5.3 Sensitivity Analysis with Regard to Data Augmentation Size $T$

According to Table 4, we randomly initialized $\mu$ in $[-10^{-2}, 10^{-2}]$ and $\Sigma$ in $[0, 10^{-4} \times I]$, and we initialized the hyperparameters $(\alpha, \beta)$ to $(5, 5)$. We then collected the negative log-likelihood of the brain dynamic preferences on the training and test sets (see Figure 10) with the data augmentation size $T$ set to $\{0, 1, 3, 5\}$, respectively. Due to space constraints, we show only the results of the first participant.

From Figure 10, we make three observations. First, the SWORE model is prone to overfitting on the original EEG signal, since the dimensions of the EEG signals (in either the time or the frequency domain) are closely related to each other (see section 5.1 for more details). Second, the feature corruption trick ($T = 1$) achieves the best performance compared to the other settings, including the data augmentation methods ($T > 1$): the larger the data augmentation size $T$, the worse the generalization performance of SWORE. Third, it is interesting that SWORE with data augmentation ($T > 1$) performs extremely well with only a few samples (less than $20\%$ of the training data) but starts overfitting when updated with more samples.
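The corruption step can be sketched as follows. This is a generic dropout-style blank-out sketch in Python, not a reproduction of equation 5.1; the corruption rate `q` and the function name `augment` are our illustrative assumptions:

```python
import random

def augment(x, T=1, q=0.1, seed=0):
    """Return T blank-out-corrupted copies of feature vector x:
    each feature is independently zeroed with probability q
    (hypothetical rate; equation 5.1 defines the actual noise model)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    return [[0.0 if rng.random() < q else v for v in x] for _ in range(T)]

copies = augment([0.3, 1.2, 0.7, 2.1], T=3)  # three corrupted views
```

Setting `T=1` corresponds to the feature corruption trick above (one corrupted view replacing the original), while `T>1` yields the data augmentation regime whose generalization degrades with larger `T`.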

#### 6.5.4 Stability Analysis of the Online GBMM Algorithm

Here, we empirically analyze the stability of the online GBMM algorithm. According to our sensitivity analysis with regard to the hyperparameters $(\mu, \Sigma)$ and $(\alpha, \beta)$ (for both, see Table 4), we randomly initialized $\mu$ in $[-10^{-2}, 10^{-2}]$ and $\Sigma$ in $[0, 10^{-4} \times I]$, and we initialized the hyperparameters $(\alpha, \beta)$ to $(5, 5)$. The corrupting size $T$ is set to 1. We then repeated the online GBMM algorithm on the training data 20 times and summarized the prediction accuracy on the test data (see Figure 11).

Figure 11 shows that the test accuracies of each participant are quite stable across runs. In addition, SWORE consistently achieves high generalization performance (test accuracy above $80\%$) on 26 of the 40 participants with $95\%$ confidence. Note that the performance for each participant could be further improved with brain dynamic preferences tailored to that participant.

### 6.6 Online Mental Fatigue Evaluation on Forty Participants

Following the online experiment setting in section 6.3, we explored the reliability of SWORE on all 40 participants in the online monitoring scenario. Similarly, we leveraged the prerecorded 25 trials to pretrain the embryonic SWORE, LOR, and SVR models, respectively. We then ran SWORE and the other baselines independently 100 times and report the average prediction accuracy for all 40 participants in Table 5.

Table 5: Average prediction accuracy (%) of SVR, LOR, and SWORE for all 40 participants.

ACC | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8
---|---|---|---|---|---|---|---|---
SVR | 69.1 $\pm$ 0.36 | 76.9 $\pm$ 0.30 | 74.0 $\pm$ 0.31 | 70.4 $\pm$ 0.35 | 72.7 $\pm$ 0.34 | 70.6 $\pm$ 0.39 | 51.5 $\pm$ 0.47 | 76.6 $\pm$ 0.31
LOR | 72.6 $\pm$ 0.33 | 78.3 $\pm$ 0.31 | 73.1 $\pm$ 0.37 | 73.8 $\pm$ 0.33 | 73.7 $\pm$ 0.36 | 74.5 $\pm$ 0.32 | 75.1 $\pm$ 0.33 | 74.8 $\pm$ 0.32
SWORE | 76.0 $\pm$ 0.30 | 79.9 $\pm$ 0.29 | 76.5 $\pm$ 0.31 | 75.9 $\pm$ 0.31 | 76.7 $\pm$ 0.31 | 74.6 $\pm$ 0.33 | 78.3 $\pm$ 0.32 | 76.9 $\pm$ 0.30

ACC | P9 | P10 | P11 | P12 | P13 | P14 | P15 | P16
---|---|---|---|---|---|---|---|---
SVR | 69.7 $\pm$ 0.36 | 71.0 $\pm$ 0.37 | 73.7 $\pm$ 0.37 | 74.4 $\pm$ 0.33 | 73.7 $\pm$ 0.32 | 72.0 $\pm$ 0.35 | 74.2 $\pm$ 0.36 | 45.8 $\pm$ 0.44
LOR | 73.2 $\pm$ 0.35 | 73.3 $\pm$ 0.36 | 76.5 $\pm$ 0.32 | 78.3 $\pm$ 0.30 | 74.3 $\pm$ 0.34 | 72.6 $\pm$ 0.35 | 72.5 $\pm$ 0.35 | 76.3 $\pm$ 0.32
SWORE | 75.7 $\pm$ 0.33 | 75.2 $\pm$ 0.32 | 78.2 $\pm$ 0.30 | 78.4 $\pm$ 0.31 | 74.1 $\pm$ 0.36 | 74.0 $\pm$ 0.32 | 74.0 $\pm$ 0.36 | 79.9 $\pm$ 0.31

ACC | P17 | P18 | P19 | P20 | P21 | P22 | P23 | P24
---|---|---|---|---|---|---|---|---
SVR | 63.3 $\pm$ 0.41 | 65.6 $\pm$ 0.43 | 67.0 $\pm$ 0.38 | 68.8 $\pm$ 0.36 | 47.4 $\pm$ 0.45 | 63.7 $\pm$ 0.40 | 50.9 $\pm$ 0.38 | 75.3 $\pm$ 0.35
LOR | 79.8 $\pm$ 0.29 | 74.4 $\pm$ 0.34 | 77.6 $\pm$ 0.32 | 73.7 $\pm$ 0.36 | 76.6 $\pm$ 0.34 | 73.1 $\pm$ 0.34 | 84.2 $\pm$ 0.27 | 84.6 $\pm$ 0.24
SWORE | 80.0 $\pm$ 0.29 | 76.7 $\pm$ 0.31 | 79.0 $\pm$ 0.31 | 76.7 $\pm$ 0.35 | 76.4 $\pm$ 0.32 | 75.2 $\pm$ 0.35 | 84.0 $\pm$ 0.29 | 84.2 $\pm$ 0.25

ACC | P25 | P26 | P27 | P28 | P29 | P30 | P31 | P32
---|---|---|---|---|---|---|---|---
SVR | 71.7 $\pm$ 0.34 | 72.8 $\pm$ 0.32 | 75.2 $\pm$ 0.33 | 69.3 $\pm$ 0.39 | 72.6 $\pm$ 0.36 | 79.0 $\pm$ 0.32 | 70.1 $\pm$ 0.39 | 39.1 $\pm$ 0.41
LOR | 71.1 $\pm$ 0.37 | 76.8 $\pm$ 0.31 | 72.3 $\pm$ 0.35 | 76.3 $\pm$ 0.33 | 79.7 $\pm$ 0.30 | 78.9 $\pm$ 0.30 | 77.1 $\pm$ 0.33 | 80.8 $\pm$ 0.31
SWORE | 75.2 $\pm$ 0.32 | 77.8 $\pm$ 0.29 | 74.0 $\pm$ 0.32 | 76.9 $\pm$ 0.31 | 80.0 $\pm$ 0.29 | 79.5 $\pm$ 0.28 | 77.5 $\pm$ 0.30 | 81.1 $\pm$ 0.28

ACC | P33 | P34 | P35 | P36 | P37 | P38 | P39 | P40
---|---|---|---|---|---|---|---|---
SVR | 72.6 $\pm$ 0.35 | 59.3 $\pm$ 0.41 | 70.8 $\pm$ 0.35 | 75.5 $\pm$ 0.30 | 78.9 $\pm$ 0.29 | 74.0 $\pm$ 0.32 | 74.1 $\pm$ 0.35 | 56.1 $\pm$ 0.43
LOR | 74.5 $\pm$ 0.33 | 80.2 $\pm$ 0.28 | 77.1 $\pm$ 0.31 | 74.9 $\pm$ 0.31 | 79.1 $\pm$ 0.28 | 79.4 $\pm$ 0.29 | 79.5 $\pm$ 0.29 | 76.5 $\pm$ 0.33
SWORE | 77.7 $\pm$ 0.31 | 80.9 $\pm$ 0.29 | 77.3 $\pm$ 0.31 | 76.3 $\pm$ 0.31 | 79.8 $\pm$ 0.28 | 81.0 $\pm$ 0.28 | 80.6 $\pm$ 0.28 | 78.3 $\pm$ 0.32


Notes: We ran each baseline independently 100 times and report the mean and 95% confidence interval. The best result for each participant is in bold.

It can be observed from Table 5 that SWORE gives the most reliable evaluation, with the lowest variance, on the new EEG signals compared to LOR and SVR. In particular, SWORE achieves the highest average prediction accuracy on 34 of 40 participants and comparable results on the remaining ones. SWORE also evaluates different participants consistently: its average prediction accuracy exceeds 75% on 35 participants, versus 7 participants for SVR and 22 for LOR. The non-online method, SVR, is not trustworthy because it ignores the nonstationary properties of brain dynamics: on 7 participants (P7, P16, P21, P23, P32, P34, and P40), SVR achieves an average prediction accuracy below 60%. LOR and SWORE, equipped with efficient online calibration strategies, consistently achieve an average prediction accuracy above 75% on those same participants.
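As a concrete illustration of this evaluation protocol (pretrain on the first 25 trials, then predict each incoming trial before calibrating on it, and aggregate accuracy over independent runs), a minimal sketch follows. The names `MajorityModel`, `online_accuracy`, and `mean_ci95` are illustrative stand-ins, not the paper's implementation:

```python
import math

def mean_ci95(samples):
    """Mean and 95% confidence half-width over repeated independent runs."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return mean, 1.96 * math.sqrt(var / n)

class MajorityModel:
    """Toy stand-in for SWORE/LOR/SVR: predicts the majority label seen so far."""
    def __init__(self):
        self.counts = {}
    def update(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1
    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else 0

def online_accuracy(model, trials, n_pretrain=25):
    """Pretrain on the first trials, then predict-then-calibrate on the rest."""
    for x, y in trials[:n_pretrain]:
        model.update(x, y)                    # offline pretraining phase
    test = trials[n_pretrain:]
    correct = 0
    for x, y in test:
        correct += int(model.predict(x) == y) # evaluate on the new trial first ...
        model.update(x, y)                    # ... then calibrate online
    return correct / len(test)
```

Running `online_accuracy` once per independent run and passing the resulting accuracies to `mean_ci95` yields the "mean ± 95% CI" entries of the kind reported in Table 5.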

## 7 Conclusion

This letter takes an initial step toward calibrating prediction models on nonstationary brain dynamics. We proposed the self-weight ordinal regression (SWORE) model with a brain dynamics table (BDtable) for online mental fatigue monitoring. SWORE aggregates information from multiple noisy channels based on brain dynamic preferences, while the BDtable is used to calibrate the SWORE model online via a generalized Bayesian moment matching algorithm. Empirical results demonstrate that the proposed framework significantly outperforms baseline approaches such as SVR and LOR. As future work, we plan to assess the feasibility of deploying the online mental fatigue monitoring system with EEG signals together with other mental fatigue indicators.

## Appendix A: Proof for Theorem 1

We use the following lemma^{7} as a building block.

**Lemma 1**

Based on Lemma 1, we give the detailed proof in the following.

**Proof.**

① follows from the model representation in note 7. ② sets $z=0$; such a substitution is reasonable because we expect the posterior density of $z$ to be concentrated at 0. ③ follows the chain rule.

① follows from the model representation in note 7. ② sets $z=0$; such a substitution is reasonable because we expect the posterior density of $z$ to be concentrated at 0. ④ follows the chain rule. We give the proof for ③ as follows.

## Appendix B: Second-Order Taylor Approximation for $\mathbb{E}_{\mathcal{N}(w|\mu,\Sigma)}[\sigma(w^{T}\Delta x_n)]$
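This expectation admits the standard second-order expansion of $\sigma$ around the Gaussian mean. The following is a sketch of that approximation (the shorthand $m_n$ and $s_n^2$ is introduced here for compactness and is not the paper's notation):

```latex
\mathbb{E}_{\mathcal{N}(w|\mu,\Sigma)}\big[\sigma(w^{T}\Delta x_n)\big]
\approx \sigma(m_n)
+ \tfrac{1}{2}\, s_n^{2}\,\sigma(m_n)\big(1-\sigma(m_n)\big)\big(1-2\sigma(m_n)\big),
\quad
m_n=\mu^{T}\Delta x_n,\;\; s_n^{2}=\Delta x_n^{T}\Sigma\,\Delta x_n .
```

The first-order term vanishes because $\mathbb{E}[w^{T}\Delta x_n - m_n]=0$, leaving only the curvature correction $\tfrac{1}{2}\sigma''(m_n)s_n^2$ with $\sigma''(m)=\sigma(m)(1-\sigma(m))(1-2\sigma(m))$.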

## Appendix C: Posterior Moments of Beta Distribution
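For reference, the first two raw moments of a $\mathrm{Beta}(\alpha,\beta)$ distribution, and the inversion that recovers $(\alpha,\beta)$ from matched moments as in BMM-style updates, can be sketched as follows (function names are illustrative):

```python
def beta_moments(a, b):
    """First and second raw moments of Beta(a, b)."""
    m1 = a / (a + b)
    m2 = a * (a + 1) / ((a + b) * (a + b + 1))
    return m1, m2

def match_beta(m1, m2):
    """Recover (a, b) whose Beta distribution has raw moments (m1, m2)."""
    var = m2 - m1 * m1                 # central second moment
    s = m1 * (1 - m1) / var - 1        # equals a + b for a valid Beta
    return m1 * s, (1 - m1) * s
```

Round-tripping, e.g., `match_beta(*beta_moments(2, 3))` returns `(2, 3)` up to floating-point error, which is the consistency property moment matching relies on.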

## Appendix D: The Updating Rules for Hyperparameters $(\alpha_n^{\mathrm{new}},\beta_n^{\mathrm{new}})$

## Acknowledgments

I.W.T. is supported by ARC under grants DP180100106 and DP200101328. We thank Yinghua Yao, Peiyao Zhao, two anonymous reviewers, and the editor for helpful suggestions on this paper.

## Notes

^{1}

We adopted SVR due to its nonlinear properties and superior generalization performance on small training data sets (Schölkopf, Smola, & Bach, 2018). SVR is implemented using LIBSVM with the parameter options -s 3 -t 2 (epsilon-SVR with an RBF kernel).

^{2}

The $y$-axis is in log scale; the prediction discrepancy would be even more pronounced on a linear scale.

^{3}

We used the term “preference” intentionally to show that brain dynamics keep changing with human behavior, because the human brain prefers one decision over others.

^{4}

$\pi_n \to 1^-$ denotes that $\pi_n$ approaches 1 from below, while $\pi_n \to 0^+$ denotes that $\pi_n$ approaches 0 from above.

^{5}

$P(y|w,\Delta x_n)=\mathbb{E}_{\mathrm{Beta}(\pi|\alpha,\beta)}[P(y|w,\pi,\Delta x_n)]$.

^{6}

Although the data augmentation procedure generates a corrupted data set of larger size, the final computational cost (which scales linearly with $T$) remains acceptable, benefiting from the efficient updating rules.

^{7}

The parameters $\{w,\pi_{1:N}\}$ are used to represent the SWORE model, since equation 3.6 is fully determined by $\{w,\pi_{1:N}\}$. We omit the subscript $t-1$ for convenience.

^{8}

$\Sigma$ is restricted to a diagonal matrix in the experiments for simplicity.

^{9}

It consists of 30 EEG channels, 2 reference channels, and 1 vehicle position channel. We did not eliminate the 3 non-EEG channels beforehand, to demonstrate that SWORE can automatically remove this kind of noninformative channel during training.

## References

*IEEE Transactions on Biomedical Engineering*

*Neuroscience and Biobehavioral Reviews*

*IEEE Transactions on Cognitive and Developmental Systems*

*IEEE Journal of Biomedical and Health Informatics*

*Journal of Neuroscience*

*Brain-Computer Interfaces*

*Proceedings of the International Conference on Neural Information Processing*

*IEEE Transactions on Neural Systems and Rehabilitation Engineering*

*Toward brain-computer interfacing*

*Applied Ergonomics*

*Iranian Journal of Public Health*

*Deep learning*

*Brain-computer interfaces*

*Procedia Computer Science*

*Electroencephalography and Clinical Neurophysiology*

*Frontiers in Human Neuroscience*

*Proceedings of the International Conference on Foundations of Augmented Cognition*

*Applied Ergonomics*

*Proceedings of the International Conference on Learning Representations*

*Expert Systems with Applications*

*Transportation Research Part F: Traffic Psychology and Behaviour*

*Journal of Safety Research*

*Biomedical Signal Processing and Control*

*Journal of Medical and Biological Engineering*

*IEEE Transactions on Biomedical Circuits and Systems*

*IEEE Transactions on Neural Networks and Learning Systems*

*IEEE Transactions on Neural Networks and Learning Systems*

*Journal of Neuroscience Methods*

*Scientific Reports*

*International Journal of Applied Engineering Research*

*Neural Computation*

*Proceedings of the National Academy of Sciences*

*Journal of Machine Learning Research*

*Signal, Image and Video Processing*

*Proceedings of the 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society*

*Sensors*

*IEEE Transactions on Biomedical Engineering*

*Learning with kernels: Support vector machines, regularization, optimization, and beyond*

*Nature Neuroscience*

*Measurement Science Review*

*Sports Medicine*

*ACM Transactions on Mathematical Software*

*Cognitive Neurodynamics*

*IEEE Transactions on Intelligent Transportation Systems*

*NeuroImage*

*IEEE Transactions on Audio and Electroacoustics*

*Journal of Machine Learning Research*

*Annals of Statistics*

*Healthcare Technology Letters*

*Proceedings of the 20th International Conference on Machine Learning*

*PLOS One*

*Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval*