Introducing simple stimulus regularities facilitates learning of both simple and complex tasks. This facilitation may reflect an implicit change in the strategies used to solve the task when successful predictions regarding incoming stimuli can be formed. We studied the modifications in brain activity associated with fast perceptual learning based on regularity detection. We administered a two-tone frequency discrimination task and measured brain activation (fMRI) under two conditions: with and without a repeated reference tone. Although participants could not explicitly tell the difference between these two conditions, the introduced regularity affected both performance and the pattern of brain activation. The “No-Reference” condition induced larger activation in frontoparietal areas known to be part of the working memory network. However, only the condition with a reference showed fast learning, which was accompanied by reduced activity in two regions: the left intraparietal area, involved in stimulus retention, and the left posterior superior temporal area, involved in representing auditory regularities. We propose that this joint reduction reflects a decreased need for online storage of the compared tones. We further suggest that this change reflects an implicit strategic shift “backwards” from reliance mainly on working memory networks in the “No-Reference” condition to increased reliance on detected regularities stored in high-level auditory networks.
The dynamics of perceptual learning, particularly its initial stages, are not well understood. Previous studies have focused on the specificity of learning to trained stimuli, which was shown to be consistent with the stimulus selectivity of sensory areas (Spang, Grimsen, Herzog, & Fahle, 2010; Van Wassenhove & Nagarajan, 2007; Amitay, Hawkey, & Moore, 2005; Seitz & Watanabe, 2005; Demany & Semal, 2002; Ahissar & Hochstein, 1993, 1996; Levi & Polat, 1996; Karni & Sagi, 1991). However, such specificity mainly characterizes later stages of learning, when some expertise has been acquired (Jeter, Dosher, Liu, & Lu, 2010; Ahissar & Hochstein, 1997; Karni & Sagi, 1993). Ahissar and Hochstein (Ahissar, Nahum, Nelken, & Hochstein, 2009; Ahissar & Hochstein, 1997, 2004) suggested that when finer resolution is required, perceptual learning may progress backwards along the perceptual hierarchy, from crude generalizing representations to more local ones. This theory, termed the Reverse Hierarchy Theory, posits that perceptual learning is not limited to a specific brain site and progresses from high- to lower-level areas with practice. Nevertheless, it does not address the brain mechanisms underlying the very early stages of learning, when the task and its broad stimulus characteristics need to be sorted out. This initial stage is typically short and difficult to track and hence has rarely been studied, although it is probably crucial to subsequent learning dynamics (e.g., Ortiz & Wright, 2009; Hawkey, Amitay, & Moore, 2004; Karni, Jezzard, Adams, Turner, & Ungerleider, 1995).
One of the key features of the training procedure, particularly at early training stages, is the consistency of stimuli across consecutive trials. Consistent training with similar stimuli leads to fast learning (e.g., Otto, Herzog, Fahle, & Zhaoping, 2006) that is condition-specific (Cohen, Daikhin, & Ahissar, 2013), whereas training with a broad range of stimuli whose sequence is not predictable leads to slow learning (Parkosadze, Otto, Malaniya, Kezeli, & Herzog, 2008), if any (e.g., Kuai, Zhang, Klein, Levi, & Yu, 2005; Adini, Wilkonsky, Haspel, Tsodyks, & Sagi, 2004; Yu, Klein, & Levi, 2004). A very clear example of this dissociation was recently reported in the auditory modality for training on frequency (pitch) discrimination between sequentially presented tones. Whereas discrimination between tones whose frequencies were randomly chosen from a broad range improved slowly (over hundreds of trials), substantial and fast improvement was achieved when the first tone in each pair had a fixed frequency (Nahum, Daikhin, Lubin, Cohen, & Ahissar, 2010). This rapid improvement was attributed to the ability to form effective predictions of incoming stimuli when training with stimuli that obeyed a simple regularity (Ahissar et al., 2009; Ahissar & Hochstein, 2004). Here we asked whether introducing simple regularities that facilitate learning, perhaps by facilitating the “reverse hierarchy” process, is accompanied by a detectable concurrent change in the pattern of brain activation.
Although serial discrimination is considered a simple perceptual task, it requires two types of management processes, both of which involve frontoparietal networks. First, as in any new task (or situation), its basic structure in terms of neural representations should be set (Miller & Cohen, 2001). Many studies suggest that this task-setting is implemented by high-level networks, which include extensive frontal and parietal regions. These networks are largely general-purpose and form the “task-set” for various tasks (and were hence termed “the multiple-demand” network; Duncan, 2010; Duncan & Owen, 2000). Second, task performance requires retaining the relevant value of the first stimulus in each trial during the interstimulus interval and comparing this value with that of the second stimulus. This retain-and-compare process is a working memory operation (e.g., Romo, Brody, Hernández, & Lemus, 1999). Such operations were also shown to activate frontoparietal regions, which were thus termed the working memory network (Fedorenko, Behr, & Kanwisher, 2011; Koelsch et al., 2009; Baldo & Dronkers, 2006; Rainer, Asaad, & Miller, 1998). The exact role of this network in the retain-and-compare operation is still debated. Previous studies have suggested that these working memory areas both manage and store the task-relevant stimuli. However, very recent studies (reviewed in Sreenivasan, Curtis, & D'Esposito, 2014) posit that the stimuli are stored in posterior sensory areas and that the role of the working memory network primarily involves task-related management. A related concept is “attentional resources,” whose recruitment when a task is more demanding also activates partially overlapping posterior parietal regions (Magen, Emmanouil, McMains, Kastner, & Treisman, 2009).
The behavioral observation that a simple regularity in perceptual discriminations leads to fast perceptual learning that is specific to the trained regularity (Cohen et al., 2013; Nahum, Daikhin, et al., 2010) suggests that the load on management processes decreases. This decrease is expected because utilizing the regularity, that is, the repeated reference, increases reliance on the internal representation of the reference, which partially replaces the need to actively retain the first stimulus in each trial. We therefore hypothesized that a condition with no regularity would place a heavier load on management processes and hence would induce higher activation in the frontoparietal network. We further hypothesized that frontoparietal activity would quickly decrease when effective practice with the regularity-containing condition led to the formation of a reliable prediction of the expected stimuli, so that discrimination would increasingly rely on this stored regularity. Moreover, we reasoned that we might be able to track the formation of this auditory prediction in a high-level auditory area. This area is expected to show high activity at the initial stages of learning the regularity and then to decrease its activity with repetitions of this regularity (Karni et al., 1995, 1998), as long as the reference-containing condition is not interrupted.
To test these hypotheses, we measured both behavior and the BOLD response while participants performed a simple perceptual two-tone frequency discrimination task. On the basis of the observations of Nahum, Daikhin, et al. (2010), participants in the current study were administered the following two conditions. In one, the same tone was consistently presented in the first interval of each trial. This regularity is known to be detected quickly and yields fast and substantial improvement (Cohen et al., 2013; Nahum, Daikhin, et al., 2010). In the second condition, the same task and similar stimuli (though drawn from a broader frequency range) were used, but there was no cross-trial tone repetition. In this condition, participants' improvement has been reported to be very slow, not reaching the same level of performance even after many practice sessions.
We presented blocks of these two conditions in an interleaved manner (3 blocks of one condition followed by 3 blocks of the other condition). Because the stimuli were similar and the task was the same, participants were unaware of the switch between conditions. We asked which brain areas were sensitive to the difference between the two conditions and in which brain areas activity changed as a function of the rapid improvement we anticipated in the condition with a simple, easily detected stimulus regularity.
Nineteen participants (age = 29 ± 5 years; 10 women) took part in the study. Each of them performed frequency discrimination and another task (not reported here) in the scanner (except one participant, who was administered only the frequency discrimination task) and had an additional anatomical scan at the end of the session. Before entering the scanner, participants practiced a short version of the behavioral protocol that they performed during scanning. Participants signed an informed consent form and were paid for their participation.
In the frequency discrimination task participants were presented with pairs of tones and were asked to decide (and respond with a right/left button press) which tone was higher. The tones were 50 msec long, the ISI was ∼600 msec, and the trial duration (onset to onset) was 2 sec. We measured frequency discrimination under two conditions in a single fMRI run: (1) No-Reference condition (No-Ref; see schematic illustration in Figure 1A) with no cross-trial consistency. In this condition, the first tone was chosen from a frequency range of 800–1200 Hz and the second tone was chosen according to the frequency differences shown in Figure 1B (see description below). (2) Reference-1st condition (Ref-1st; see schematic illustration in Figure 1A), which employed the same procedure, but the first tone in each pair was always 1000 Hz.
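The two conditions differ only in how the first tone of each pair is drawn. The Python sketch below generates tone pairs for either condition. It is illustrative only: the function name `make_trials` is hypothetical, the difference sequence passed to it stands in for the one plotted in Figure 1B, and the rule that the second tone lies a given percentage above or below the first with equal probability is our assumption rather than a stated detail of the protocol.

```python
import random

def make_trials(condition, deltas_pct, rng=None):
    """Generate (f1, f2, answer) tuples for one condition.

    condition  : "No-Ref" (first tone drawn uniformly from 800-1200 Hz)
                 or "Ref-1st" (first tone fixed at 1000 Hz)
    deltas_pct : per-trial frequency differences in percent; a hypothetical
                 stand-in for the sequence shown in Figure 1B
    answer     : "first" or "second", whichever tone is higher
    """
    if rng is None:
        rng = random.Random()
    trials = []
    for delta in deltas_pct:
        if condition == "Ref-1st":
            f1 = 1000.0
        elif condition == "No-Ref":
            f1 = rng.uniform(800.0, 1200.0)
        else:
            raise ValueError(f"unknown condition: {condition!r}")
        # Assumption: the second tone is higher or lower with equal probability.
        direction = rng.choice((1, -1))
        f2 = f1 * (1 + direction * delta / 100.0)
        answer = "second" if f2 > f1 else "first"
        trials.append((f1, f2, answer))
    return trials

# Hypothetical sequence of shrinking differences (%), starting at 20%.
example = make_trials("Ref-1st", [20, 15, 10, 6, 4], random.Random(1))
```

In the Ref-1st output every first tone is 1000 Hz, so only the second tone carries trial-specific information; in the No-Ref output both tones vary from trial to trial.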
In both conditions, we administered sequences of tone pairs with an initially large (20%) frequency difference, which gradually decreased. The specific characteristics of these sequences (shown in Figure 1B) were based on the average frequency differences (of naive performers in their first assessment) obtained in an adaptive version of these conditions, which converged to 80% correct (see Nahum, Daikhin, et al., 2010). Using sequences that converged to the same level of performance (80% correct) was intended to control for difficulty differences between the conditions. Note that, in an adaptive protocol (i.e., at matched accuracy), the Ref-1st condition converges to much lower thresholds (Figure 1B). Each condition consisted of 180 trials, presented in 15 blocks of 12 trials each (24 sec per block, separated by 9 sec of rest). Each condition was presented in three consecutive blocks and was then switched to the other condition (order counterbalanced across participants). Thus, a single run contained five triads of blocks of each condition (see Figure 1C). Participants were typically unaware of the condition switch. Participants were asked to keep their eyes closed throughout the entire measurement. RT and accuracy were collected while participants performed the task inside the fMRI scanner. These were analyzed using repeated-measures ANOVAs with Condition (No-Ref, Ref-1st) and Block (First, Third) as within-participant factors.
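The block structure can be made concrete with a short schedule generator. This is a sketch under stated assumptions: the helper `block_schedule` is hypothetical, the run is assumed to start with a block at t = 0 sec, and every block is assumed to be followed by 9 sec of rest; any initial baseline or dummy volumes are not described here and are ignored. With 12 trials of 2 sec per block, each block lasts 24 sec, so consecutive block onsets are 33 sec apart.

```python
def block_schedule(first="Ref-1st", triads_per_condition=5, blocks_per_triad=3,
                   trials_per_block=12, trial_s=2.0, rest_s=9.0):
    """List the blocks of one run in presentation order.

    Assumes the run starts with a block at t = 0 s and that every block is
    followed by rest_s seconds of rest (no initial baseline is modeled).
    """
    other = {"Ref-1st": "No-Ref", "No-Ref": "Ref-1st"}[first]
    block_s = trials_per_block * trial_s          # 12 trials x 2 s = 24 s
    schedule, t = [], 0.0
    for k in range(2 * triads_per_condition):     # triads alternate between conditions
        condition = first if k % 2 == 0 else other
        for position in range(1, blocks_per_triad + 1):
            schedule.append({
                "onset_s": t,
                "condition": condition,
                "triad": k // 2 + 1,               # triad index within its condition
                "position": position,              # 1 = first, 3 = third block of the triad
            })
            t += block_s + rest_s
    return schedule
```

Under these assumptions the last of the 30 blocks starts at 957 sec, and the `position` field identifies the first and third blocks of each triad that enter the within-triad comparisons. Passing `first="No-Ref"` gives the counterbalanced order.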
fMRI Scanning Procedure
Scanning was performed in a 3-T Siemens Magnetom TimTrio scanner (Tim [102 × 32] TQ; Erlangen, Germany). For each participant, functional (T2*-weighted) and high-resolution anatomical reference data sets (T1-weighted) were acquired. Functional measurements were obtained with a single EPI sequence with an echo time of 30 msec and a repetition time of 3000 msec. Acquisition of the slices was arranged uniformly within the repetition time interval. The matrix acquired was 80 × 80 with a field of view of 240 mm, resulting in an in-plane resolution of 3 × 3 mm. The slice thickness was 3 mm. Anatomical scans were measured with a 3-D gradient-echo sequence with 1 × 1 × 1 mm resolution.
fMRI Data Analysis
Anatomical and functional data were analyzed using the BrainVoyager QX software package (Brain Innovation, Maastricht, The Netherlands). Functional data were corrected for motion using trilinear estimation and interpolation. To correct for the temporal offset between the slices acquired in one scan, cubic spline interpolation was applied. A temporal high-pass filter (three cycles per time course) and linear trend removal were used for baseline correction of the signal. The functional images were coregistered with the anatomical images, which were then transformed into Talairach space.
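BrainVoyager performs this filtering internally, and the sketch below is not its implementation. It illustrates one common way to combine linear trend removal with a Fourier-basis high-pass filter, assuming that “three cycles” means sine/cosine pairs with one to three full cycles per run, regressed out of each voxel time course together with a constant and a linear ramp. The function name `highpass_fourier` is hypothetical.

```python
import numpy as np

def highpass_fourier(ts, n_cycles=3):
    """Regress a constant, a linear trend, and sine/cosine pairs with
    1..n_cycles full cycles per run out of a 1-D time course.
    The mean is added back so the baseline level is preserved."""
    ts = np.asarray(ts, dtype=float)
    n = ts.size
    t = np.arange(n) / n                     # time as a fraction of the run
    cols = [np.ones(n), t - t.mean()]
    for k in range(1, n_cycles + 1):
        cols.append(np.sin(2 * np.pi * k * t))
        cols.append(np.cos(2 * np.pi * k * t))
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, ts, rcond=None)
    return ts - X @ beta + ts.mean()
```

Slow drifts (a few cycles per run) are removed, whereas the block-related fluctuation, which repeats every 33 sec, i.e., about 30 times per run, is almost untouched.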
The statistical evaluation was based on a least-squares estimation using the general linear model (GLM). Predictors in the design matrix were generated by convolving the block time courses with a hemodynamic response function. The time course of the BOLD signal obtained during the task was initially modeled using two predictors (one predictor per condition), as illustrated by the pattern of coloring in Figure 1C. To inspect within-condition changes, that is, the difference between the first and third blocks within a condition triad, the time course of the BOLD signal was remodeled using a separate predictor for each block within the triad (i.e., 3 predictors per condition × 2 conditions).
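A minimal NumPy sketch of how such block predictors can be built and fitted. Several details are assumptions, not taken from the text: the double-gamma HRF shape parameters (6 and 16, undershoot ratio 1/6, similar to common defaults), a 0.1-sec convolution grid, sampling each predictor once per 3-sec volume, and the hypothetical helper names `double_gamma_hrf`, `make_design`, and `fit_glm`. Passing two onset lists (one per condition) gives the two-predictor model; passing six lists (one per triad position and condition) gives the 3 × 2 model.

```python
import numpy as np
from math import gamma

def double_gamma_hrf(t, peak_shape=6.0, under_shape=16.0, ratio=1.0 / 6.0):
    """Double-gamma HRF (assumed shape parameters, unit time scale),
    scaled so that its samples sum to 1."""
    t = np.asarray(t, dtype=float)
    h = (t ** (peak_shape - 1) * np.exp(-t) / gamma(peak_shape)
         - ratio * t ** (under_shape - 1) * np.exp(-t) / gamma(under_shape))
    return h / h.sum()

def make_design(onsets_by_predictor, n_vols, tr=3.0, block_s=24.0, dt=0.1):
    """One HRF-convolved boxcar column per onset list, plus a constant column."""
    step = int(round(tr / dt))
    n_hi = n_vols * step
    t_hi = np.arange(n_hi) * dt
    hrf = double_gamma_hrf(np.arange(0.0, 32.0, dt))
    cols = []
    for onsets in onsets_by_predictor:
        box = np.zeros(n_hi)
        for on in onsets:
            box[(t_hi >= on) & (t_hi < on + block_s)] = 1.0
        # Convolve on the fine grid, then sample once per volume (TR).
        cols.append(np.convolve(box, hrf)[:n_hi][::step])
    cols.append(np.ones(n_vols))
    return np.column_stack(cols)

def fit_glm(X, y):
    """Ordinary least-squares beta estimates for one voxel or ROI time course."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```

The constant column absorbs the baseline signal level, so each condition beta expresses the block response relative to rest.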
Multisubject random effects GLM analyses and repeated-measures ANOVAs of beta values, with Condition (No-Ref, Ref-1st) and Block (First, Third) as within-participant factors, were applied to the data. The data were z-transformed before entering the random effects analyses. The results were corrected for multiple comparisons using a cluster-size threshold: for each volume map, a cluster-level statistical threshold estimator determined the minimal cluster size at the chosen significance level (see figures).
We first identified brain areas that were positively and significantly activated by the frequency discrimination task (random effects GLM contrast: all conditions > rest). The obtained map served as a mask for testing our hypotheses (see Figure 2A). Using a voxel-wise repeated-measures ANOVA of beta values, we examined which brain areas were sensitive to the differences between the conditions. To test sensitivity to the task conditions while controlling for the behavioral difference, we ran an ANCOVA on the beta values of each condition-sensitive region (separately for each region), with Condition as the within-participant factor and behavioral gain (ACC(Ref-1st) − ACC(No-Ref)) as a covariate.
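With only two conditions, a within-participant ANCOVA with one participant-level covariate can be computed as a regression of each participant's difference score (beta No-Ref − beta Ref-1st) on the covariate, testing the intercept; with 19 participants this yields F(1, 17), the degrees of freedom reported in the Results. The pure-Python sketch below illustrates this equivalence. The helper name is hypothetical, and the covariate value at which the condition effect is evaluated (`at`) is an explicit assumption: `at=0` asks whether the conditions differ for a participant with no behavioral gain, whereas `at` equal to the mean gain tests the difference at the average gain. The text does not state which variant was used.

```python
from math import sqrt

def within_ancova_two_levels(y_a, y_b, covariate, at=0.0):
    """Condition effect (a vs. b) for a two-level within-participant factor with
    one participant-level covariate, via regression of the difference score.
    Returns (F, df1, df2) for the intercept evaluated at covariate == at."""
    n = len(y_a)
    if not (n == len(y_b) == len(covariate)) or n < 3:
        raise ValueError("need >= 3 participants with matching lengths")
    d = [a - b for a, b in zip(y_a, y_b)]          # per-participant difference
    x = [c - at for c in covariate]                # covariate shifted to the test point
    mx, md = sum(x) / n, sum(d) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("covariate has no variance")
    sxy = sum((xi - mx) * (di - md) for xi, di in zip(x, d))
    slope = sxy / sxx
    intercept = md - slope * mx                    # predicted difference at covariate == at
    sse = sum((di - intercept - slope * xi) ** 2 for xi, di in zip(x, d))
    s2 = sse / (n - 2)                             # residual variance, n - 2 df
    se_int = sqrt(s2 * (1.0 / n + mx ** 2 / sxx))
    t = intercept / se_int
    return t * t, 1, n - 2
```

For 19 participants the function returns df = (1, 17), matching the F statistics reported for the ANCOVA.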
To assess within-condition learning-related changes, we remodeled the data using a different predictor for each of the three blocks within a triad, obtaining six predictors—three for each condition. We then compared the beta values obtained in the first and third blocks by applying voxel-wise repeated-measures ANOVA on the areas within the mask (see Figure 2A), separately for each condition. We examined which areas consistently changed their activity from the first to the third block of within-condition block triads. To assess learning-related modifications during the entire session, we remodeled the data using a different predictor for each block, obtaining 30 predictors—15 for each condition. We then compared the beta values obtained in the third block of the first triad to those obtained in the third block of the last triad by applying voxel-wise repeated-measures ANOVA on the areas within the mask (Figure 2A).
To compare the areas that we found with those reported in the literature, we calculated the distance, in anatomical (1 mm³) voxels, between the peak voxels of the regions reported in the literature and those of the areas found in our study.
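One way to express such comparisons, assuming Talairach coordinates in millimeters so that one 1-mm³ voxel corresponds to 1 mm along each axis, is to report the per-axis offsets together with their Euclidean length. The helper below is a hypothetical illustration; the Methods do not specify whether per-axis or Euclidean distances were used.

```python
from math import sqrt

def peak_offset(ours, reported):
    """Per-axis offsets (x, y, z) of our peak relative to a reported peak,
    and their Euclidean length, in 1-mm voxels (Talairach coordinates in mm)."""
    offsets = tuple(a - b for a, b in zip(ours, reported))
    return offsets, sqrt(sum(o * o for o in offsets))
```

For instance, an offset of (x + 1, y + 1, z + 2), as reported for the auditory localizer relative to Lockwood et al. (1999), corresponds to a Euclidean distance of about 2.4 voxels.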
Comparison to the Primary Auditory Cortex
Because we used simple auditory stimuli and a basic auditory discrimination task, we were interested in the impact of the experimental conditions on the primary auditory cortex, which shows automatic responses to auditory stimuli. To specifically compare our results to the dynamics of the signal there, we ran an auditory localizer on a subgroup of participants (n = 5). During the localizer period, participants were presented with auditory stimuli with rich and varying spectral content but with no clear semantic association. These included white noise, broad-band noise, pink noise, pitch shifts, intensity modulations, and sound effects (fading in, fading out, tremolo, stretching [paulstretch], amplitude modulation [wahwah], inversion). The stimulus duration was 1 sec. The stimuli were consecutively presented in seven blocks of 18 stimuli each. The blocks were separated by 15 sec of rest. Participants were requested to listen to the stimuli with their eyes closed. They were not asked to perform any task. Figure 2B shows the obtained auditory area resulting from the contrast: stimuli > rest (GLM random effects). The center of mass of the obtained area is close to the areas identified as primary auditory cortex in the literature (x + 1, y + 1, z + 2 from the peak voxel reported in Lockwood et al., 1999; x − 5, y + 1, z + 1 from the peak voxel reported in Binder et al., 2000). This area was used as a control ROI to compare beta values and average time courses with the regions that were obtained in each of our experimental questions. Importantly, ROI-based repeated-measures ANOVAs of beta values in the auditory area yielded no significant Condition or Block effects in the frequency discrimination task.
The Pattern of Activation Induced by Two-Tone Frequency Discrimination
Figure 2A shows the map of brain areas positively activated by the conditions of the frequency discrimination task. The map shows involvement of auditory areas in the superior temporal gyri and sulci of both hemispheres. It also shows somatosensory and motor areas in the precentral and postcentral gyri of the left hemisphere, together with premotor areas associated with participants' motor responses (made with the right hand) and their planning. Additionally, it shows activation of inferior prefrontal regions and parietal areas, mainly in the left hemisphere. Extensive involvement of the cerebellum and basal ganglia (BG) is also evident. Unexpectedly, we also found activation of visual areas (visible in the medial view of both hemispheres), although participants' eyes were closed throughout the assessments. Subsequent analyses were based on this map.
Figure 2B shows the whole-brain activation map obtained from the subgroup of five participants who were presented with the auditory localizer stimuli in the scanner. This localizer was composed of a sequence of auditory stimuli with rich and varying spectral content. The marked area was significantly more activated during auditory stimulus presentation than during rest, when there was no auditory stimulation (p < .01, corrected by cluster size). This area served as a control ROI for comparing beta values and average time courses with the regions obtained from analyzing frequency discrimination activations.
Sensitivity to Task Condition—With and Without Stimulus Regularity
To assess which areas were differentially activated by the two conditions (Condition effect), we applied a voxel-wise repeated-measures ANOVA to the beta values of the voxels activated in the contrast all conditions > rest (Figure 2A).
The comparison between the two task conditions revealed several areas that were differentially activated by the two conditions. As shown in Figure 3A, these were mainly high-level areas in the left hemisphere: lateral prefrontal (L-supPrefrontal: −46, −1, 33; L-infPrefrontal: −52, 0, 23), premotor (L-Premotor: −26, −16, 53), posterior intraparietal (L-intraParietal: −32, −60, 47; L-intraParietal-2: −40, −47, 44), and superior parietal (L-supParietal: −19, −74, 46). Several small areas in the right hemisphere also showed differential sensitivity to the task conditions: middle temporal (R-midTemporal: 48, −30, 1), medial premotor (R-medial-Premotor: 4, 4, 48), and medial occipital (R-medial-Occipital: 6, −70, −21).
This condition-sensitive increase in activity in the left prefrontal and parietal areas was in line with our prediction, as these regions are known to be part of the working memory network involved in managing the retention of sounds (Prefrontal: x, y + 6, z + 3 from the peak voxel reported in Zatorre, Perry, Beckett, Westbury, & Evans, 1998; x, y + 4, z − 7; x, y − 2, z + 7 from the peak voxels reported in Gaab, Gaser, Zaehle, Jancke, & Schlaug, 2003; x, y + 2, z + 7; x, y − 4, z + 6 from the peak voxels reported in Koelsch et al., 2009; parietal areas include the peak voxels reported in these studies). The premotor region has been reported to be involved in processing linguistic information and in the auditory–motor interface (Friederici, Kotz, Scott, & Obleser, 2010; Obleser & Kotz, 2010; Friederici, Makuuchi, & Bahlmann, 2009; Obleser, Wise, Alex Dresner, & Scott, 2007; Hickok & Poeppel, 2000, 2004; Davis & Johnsrude, 2003). The right middle temporal region was previously associated with auditory processing and working memory for pitch (Johnsrude, Penhune, & Zatorre, 2000; Zatorre & Samson, 1991). The involvement of the additional areas in the right hemisphere was not predicted by our working memory hypothesis.
Figure 3B shows the beta values (averaged across participants and blocks) obtained from each of the condition-sensitive regions. Beta values from the auditory cortex are also presented for comparison. The plot shows that the condition effect stems from higher beta values in the No-Ref condition. In contrast, the auditory cortex shows no difference between the beta values of the two conditions, F(1, 18) = 0, p = .99, in spite of being highly activated by the task. Figure 3C shows the time courses of the BOLD signal, indicating that the No-Ref condition induced a larger BOLD signal. Again, this condition-specific increase in activity was not found in the auditory cortex.
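Average time courses such as those in Figure 3C are typically obtained by cutting the ROI signal into epochs around block onsets and expressing each epoch as percent signal change from a pre-onset baseline. The NumPy sketch below illustrates this. The baseline definition (the mean of the volumes just before each onset), the epoch length, and the function name `block_average_psc` are assumptions, not details given in the text. With a TR of 3 sec, a 24-sec block spans 8 volumes and the 9-sec rest spans 3 volumes.

```python
import numpy as np

def block_average_psc(ts, onset_vols, pre=2, post=12):
    """Average an ROI time course around block onsets (given in volumes),
    as percent signal change relative to the mean of the `pre` volumes
    preceding each onset. Returns an array of length pre + post."""
    ts = np.asarray(ts, dtype=float)
    epochs = []
    for on in onset_vols:
        if on - pre < 0 or on + post > ts.size:
            continue  # skip epochs that run off either end of the run
        base = ts[on - pre:on].mean()
        epochs.append(100.0 * (ts[on - pre:on + post] - base) / base)
    if not epochs:
        raise ValueError("no complete epochs")
    return np.mean(epochs, axis=0)
```

Calling it separately with the No-Ref and Ref-1st onsets gives one averaged curve per condition, which can then be compared with the same curves from the auditory control ROI.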
Although we aimed to attain equal levels of difficulty (and hence of required general attentional resources) in the two conditions, and therefore used larger frequency differences in the No-Ref condition (based on Nahum, Daikhin, et al., 2010; see Methods), this condition was still slightly more difficult. Specifically, participants were less accurate (95 ± 1% vs. 89 ± 2% correct for Ref-1st vs. No-Ref; repeated-measures ANOVA, main effect of Condition: F(1, 18) = 19.33, p < .001) and somewhat slower (479 ± 20 msec vs. 530 ± 22 msec for Ref-1st vs. No-Ref; repeated-measures ANOVA, main effect of Condition: F(1, 18) = 19.2, p < .001), although they were not asked to respond quickly (but there was a fixed time interval of 1.4 sec between trials). The difference in activity patterns between these conditions may thus reflect this small, yet significant, difference in the required attentional resources rather than a difference in the ability to form effective predictions.
To control for this alternative account, we compared the beta values of the two conditions in each condition-sensitive region (Figure 3A) while regressing out the behavioral difference. We ran an ANCOVA on the beta values of each condition-sensitive region (separately for each region) with Condition as the within-participant factor and Behavioral gain (ACC(Ref-1st) − ACC(No-Ref)) as the covariate. In the parietal areas, the difference between the conditions remained significant when the behavioral difference was controlled for (L-intraParietal: F(1, 17) = 5.3, p = .03; L-intraParietal-2: F(1, 17) = 4.3, p = .05; L-supParietal: F(1, 17) = 11.2, p = .004). Similarly, the left premotor area and the right middle temporal region retained a significant difference between conditions (L-Premotor: F(1, 17) = 7, p = .02; R-midTemporal: F(1, 17) = 5, p = .04). However, the difference between the conditions decreased and became only marginally significant in the prefrontal areas (L-supPrefrontal: F(1, 17) = 3.5, p = .08; L-infPrefrontal: F(1, 17) = 3.2, p = .09). This reduction is in line with previous reports of prefrontal sensitivity to task difficulty (Fuster, 2001; Grady et al., 1996). The condition effects in the right hemisphere were also weakened by this control: the right medial occipital area became only marginally condition-sensitive (R-medial-Occipital: F(1, 17) = 4.2, p = .06), and the right medial premotor area was no longer condition-sensitive (R-medial-Premotor: F(1, 17) = 1.7, p = .21).
Taken together, the pattern of increased activity under the No-Ref condition indicates that this condition activates working memory networks to a greater extent than the Ref-1st condition, although the task was the same and participants were unaware of the difference between the conditions. The differences in brain activity, particularly those related to the posterior parietal region, cannot be attributed to a general difference in the overall attentional efforts required by these two conditions.
Fast Learning in the Regularity Containing Condition
The behavioral advantage of the Ref-1st over the No-Ref condition was reflected in the different dynamics of performance in the two conditions. Figure 4A shows the average (cross-participant) accuracy in each block of each condition. As expected, performance in the Ref-1st condition (left plot) improved quickly. Accuracy increased across consecutive short blocks (12 trials each) of this condition (93 ± 1.7% in the first blocks of the triads vs. 97.6 ± 1% in the third blocks; paired two-tailed t test, t(18) = −3.43, p = .003). However, this improvement was specific to this condition, that is, to its specific pattern of stimuli, and was degraded whenever No-Ref blocks were introduced.
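For a two-level within-participant comparison, the paired t test and the repeated-measures ANOVA are equivalent: F(1, n − 1) = t², with n − 1 = 18 for 19 participants. The helper below (hypothetical name) computes the paired t statistic from per-participant values, for example mean accuracy in the first and third blocks of the triads.

```python
from math import sqrt

def paired_t(x, y):
    """Paired t statistic and degrees of freedom for the differences x - y.
    Squaring t gives the F(1, n - 1) of a two-level repeated-measures ANOVA."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need >= 2 paired observations")
    d = [a - b for a, b in zip(x, y)]
    md = sum(d) / n
    sd = sqrt(sum((v - md) ** 2 for v in d) / (n - 1))  # sample SD of the differences
    return md / (sd / sqrt(n)), n - 1
```

A negative t, as reported here, simply means that the first argument (first blocks) had the lower mean.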
By contrast, performance in the No-Ref blocks (Figure 4A, right plot) did not improve significantly with these modest amounts of practice (88 ± 2% vs. 89.5 ± 2%; paired two-tailed t test, t(18) = −1.18, p = .25), in line with previous findings of very slow improvement in this condition (Nahum, Daikhin, et al., 2010).
To assess within-triad changes in brain activity, we remodeled the data using a different predictor for each of the three blocks within a triad, obtaining six predictors (three for each condition). We then compared the beta values obtained in the first and third blocks by applying voxel-wise repeated-measures ANOVAs to the areas within the mask (see Figure 2A), separately for each condition. Figure 4B shows the two regions that were sensitive to Block in the Ref-1st condition. The No-Ref condition is not shown because the comparison did not reach significance in any of the activated areas (at the chosen significance level, p < .01, cluster-size corrected), in line with the lack of significant behavioral improvement across consecutive blocks of this condition.
The two regions that showed a main effect of Block were located in the left hemisphere: in the intraparietal area (L-intralParietal; −38, −45, 42) and in the posterior superior temporal area (L-supTemporal; −49, −47, 13). The intraparietal area is associated with information storage (Koelsch et al., 2009; Baldo & Dronkers, 2006), although it is probably not the site of storage itself (Sreenivasan et al., 2014; Magen et al., 2009). The posterior superior temporal region is associated with the analysis of temporal auditory structures at different levels of complexity (Obleser & Kotz, 2010; Friederici et al., 2009; Davis & Johnsrude, 2003; Binder et al., 2000). Figures 4C and 4D show a reduction in activity in these areas between the first and third blocks. Beta values and time courses of activity for the auditory cortex are also presented. In the auditory cortex, despite high beta values and high BOLD signals, there was no significant effect of Block (ROI repeated-measures ANOVA, Block effect: F(1, 18) = 1.1, p = .31), suggesting that this area is not part of the fast “learning network” whose activity is modified across consecutive blocks of Ref-1st.
The results described above show only the effects of within-triad learning. To assess possible effects of learning over the entire session (across block triads), we compared activity and behavior in the third block of the first triad with those in the third block of the last triad. We remodeled the data using a different predictor for each block (see Methods) and applied a voxel-wise repeated-measures ANOVA to the masked voxels (Figure 2A) with Block (third block of the first triad, third block of the last triad) and Condition (No-Ref, Ref-1st) as within-participant factors. No area showed differential activity (at p < .05, corrected by cluster size). Behavior did not improve during the session, and there was even a slight tendency toward accumulated fatigue (Ref-1st, 99.6% vs. 95%: t = 1.46, p = .16; No-Ref, 97% vs. 90%: t = 2.1, p = .05). Thus, there was no evidence of cross-triad learning in either condition.
We studied the dynamics of brain activation during the performance of a two-tone frequency discrimination task in two conditions: with (Ref-1st) and without (No-Ref) an easily detected regularity in the stimulation pattern of consecutive trials. We conducted three ANOVAs that tested (1) which areas were differentially activated under these two behaviorally similar conditions and (2) which areas modified their activity across consecutive blocks of the same condition (two separate ANOVAs, one per condition). In addition, we tested the potential impact of the behavioral difference between the two conditions on activity in the condition-sensitive regions (ANCOVA results), as well as possible cross-triad learning. Participants were typically unaware of the existence of two different conditions. This is not surprising given the common behavioral task and trial structure and the similar range of stimuli (800–1200 Hz in the No-Ref condition vs. 950–1050 Hz in the Ref-1st condition, except for the first few trials, which spanned a broader range).
We hypothesized that the condition effect would reveal different levels of activation within the working memory network, because the No-Ref condition, which contained no regularities, placed a heavier load on working memory processes. This is because online management of comparison and retention of stimuli was more demanding in the No-Ref condition (Cohen et al., 2013; Nahum, Daikhin, et al., 2010). As hypothesized, the condition-sensitive regions were mainly located in the left frontoparietal and premotor areas, which have been associated with the working memory network for sound (Koelsch et al., 2009; Gaab et al., 2003; Zatorre et al., 1998; premotor area–auditory–motor interface: Hickok & Poeppel, 2000, 2004). Furthermore, increased reliance on successful stimulus-specific predictions in the Ref-1st condition perhaps also reduced the activity related to task-setting, as performance became more automatic (Miller & Cohen, 2001). The lack of cross-triad learning in either condition suggests that this reduction cannot be explained by a general decrease in difficulty and hence in the need to allocate general attentional resources.
In contrast to the Ref-1st condition, in which fast improvement was observed across consecutive short blocks, no such improvement was found in the No-Ref condition. This was expected from previous studies using this discrimination task (Nahum, Daikhin, et al., 2010) and from studies of other discrimination tasks that used many stimuli with no repeated pattern. In these studies, even with a more limited range of stimuli, when several repeated references were used in a randomly chosen sequence (“roving conditions”; Clarke, Grzeczkowski, Mast, Gauthier, & Herzog, 2014; Herzog, Aberg, Frémaux, Gerstner, & Sprekeler, 2012; Parkosadze et al., 2008), improvement was either absent or small and very slow. Our No-Ref condition is an extreme case of roving, in which stimuli were randomly chosen from a flat distribution. Interestingly, this variability was sufficient to block the accumulation of the fast learning in the simple Ref-1st condition, to the extent that performance in this condition did not improve between the first and last triads.
However, as expected, there was fast within-triad improvement in the Ref-1st condition. This improvement was specific to the simple predictable stimulation pattern of this condition and was interrupted (performance was degraded) by intervening No-Ref blocks. This interference was expected, because No-Ref blocks violate the expected pattern of stimulation that underlies the fast improvement. The cross-block (first to third) behavioral improvement was accompanied by modifications in two specific regions, namely, the left intraparietal area and the left posterior superior temporal area.
We interpret these results in the framework of the Reverse Hierarchy Theory, which suggests that successful detection of task-informative lower-level representations enables gradually increasing reliance on lower-level populations (e.g., Ahissar et al., 2009). Specifically, we propose that the temporal region retains the detected auditory regularity, whereas the intraparietal region controls this retention. Auditory regularity was successfully detected only in the Ref-1st condition. In this condition, the population that best decodes the average frequency is a reliable predictor on each trial, because the average frequency is the frequency of the first tone of every pair. In the No-Ref condition, the frequencies within a trial could not be reliably predicted. Thus, once this simple regularity was detected, the “managing effort” required from the intraparietal region may have decreased. This proposed division of labor is based on recent imaging studies suggesting that high-fidelity representations of stimuli held in working memory are kept in perceptual areas, whereas intraparietal regions only “manage” retention (reviewed in Sreenivasan et al., 2014). Indeed, a study that directly addressed this question concluded that intraparietal regions are not the site of storage itself but of the attentional resources required for maintaining online storage (Magen et al., 2009).
An alternative account of the reduced activation is a general reduction in task difficulty, which may have been greater in the Ref-1st condition because it was learned faster. This interpretation is unlikely, at least for the parietal region: comparing the two conditions while controlling for the behavioral difference (ANCOVA) did not eliminate the condition effect in the posterior parietal region. It did, however, reduce the significance of the condition effect in the frontal region, suggesting that for this region we cannot rule out a contribution of the small difference in overall difficulty between the two conditions. Note, moreover, that participants were completely unaware of the switch between conditions or of any change in the effort they were required to allocate at different stages of the session. These introspective reports are in line with the lack of a general improvement or a general change in activity during the session.
The Benefit of the Regularity—Integration of Interpretations
The cognitive literature attributes a unique role to the detection of regularities in sounds. For example, the mismatch negativity (MMN) ERP component (Näätänen, 1992) is considered an automatic response of the auditory cortex to a violation of regularities (e.g., Näätänen, Paavilainen, Rinne, & Alho, 2007; Picton, Alain, Otten, Ritter, & Achim, 2000). Our own interpretation of the fast improvement in the Ref-1st condition likewise stresses its easily detected regularity (detected within fewer than 10 trials), as described in a series of previous studies (Cohen et al., 2013; Oganian & Ahissar, 2012; Nahum, Daikhin, et al., 2010).
For example, Nahum, Daikhin, et al. (2010) interpreted this improvement as stemming from a shift from an initial working memory-based comparison of the stimuli presented in the two intervals of the trial to a comparison with an internal representation of the constant reference. This interpretation was also based on monkey studies (reviewed in Romo & de Lafuente, 2012) that found that, in the No-Ref condition, well-trained monkeys compare stimuli and activate working memory areas (premotor, prefrontal, parietal), thus producing “delayed activity.” However, when trained on the Ref-1st condition, monkeys do not compare stimuli online and do not produce delayed activity in higher level areas (Romo & Salinas, 2003; Brody, Hernández, Zainos, Lemus, & Romo, 2002; Romo et al., 1999; Hernández, Salinas, García, & Romo, 1997). Rather, they compare the second stimulus to the previously trained fixed reference stimulus maintained in their long-term memory, although no neural signature was found for the storage of this trained reference stimulus.
However, we also found that participants keep track of, and are heavily affected by, the statistics of the experiment even when it contains no regularities, as in the No-Ref condition. Raviv, Ahissar, and Loewenstein (2012) proposed a simple model, inspired by Bayesian inference, that accounts for these effects. According to this model, rather than comparing the two stimuli within a trial, listeners compare the second tone to a combined representation of the first tone's frequency (which is noisy because of the working memory noise added during the retention interval) and the prior. The prior in this case is simply the average frequency of the first tone on previous trials. According to Raviv et al.'s model, the same mechanism could be automatically implemented in both the No-Ref and Ref-1st conditions. Indeed, the same simple model also accounts for participants' behavior when a reference is introduced (Raviv, Lieder, Loewenstein, & Ahissar, 2014), suggesting that, despite its substantial behavioral advantage, Ref-1st may not be a qualitatively different condition.
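The following Python sketch illustrates the kind of mechanism described above: a simulated listener compares the second tone with a weighted mix of a noisy memory trace of the first tone and a prior formed from previous first tones. It is not Raviv et al.'s published implementation; the fixed prior weight, the exponential running average used as the prior, the Gaussian memory noise in log-frequency, and all numerical values are assumptions chosen for illustration.

```python
import math
import random

# Illustrative assumptions only; not the published model's parameters.
MEMORY_NOISE_SD = 0.05   # retention noise, in natural-log frequency units
PRIOR_WEIGHT = 0.5       # fixed weight given to the prior
PRIOR_DECAY = 0.2        # update rate of the running average that serves as the prior
DELTA = math.log(1.05)   # 5% frequency difference between the two tones


def simulate(first_tone_sampler, n_trials, seed=0):
    """Return the proportion of correct 'was the second tone higher?' judgements."""
    rng = random.Random(seed)
    prior = None
    correct = 0
    for _ in range(n_trials):
        f1 = first_tone_sampler(rng)                   # log-frequency of the first tone
        higher = rng.random() < 0.5
        f2 = f1 + DELTA if higher else f1 - DELTA
        memory = f1 + rng.gauss(0.0, MEMORY_NOISE_SD)  # noisy trace after retention
        if prior is None:
            estimate = memory                          # no history yet: memory trace only
        else:
            # Combine the noisy trace with the prior (average of previous first tones).
            estimate = (1 - PRIOR_WEIGHT) * memory + PRIOR_WEIGHT * prior
        said_higher = f2 > estimate
        correct += said_higher == higher
        # Update the prior with the presented first tone (assumption: noise is
        # added only during retention, not at presentation).
        prior = f1 if prior is None else prior + PRIOR_DECAY * (f1 - prior)
    return correct / n_trials


def ref_first(rng):
    return math.log(1000.0)  # hypothetical repeated reference


def no_ref(rng):
    return rng.uniform(math.log(800.0), math.log(1200.0))  # hypothetical flat range


acc_ref = simulate(ref_first, 5000)
acc_noref = simulate(no_ref, 5000)
print(f"Ref-1st: {acc_ref:.2f}  No-Ref: {acc_noref:.2f}")
```

Under these assumptions the same mechanism yields much higher accuracy when the first tone is always the reference: the prior then coincides with the true first tone, so averaging it with the memory trace (weight 0.5) halves the effective memory noise, whereas in the No-Ref case the prior often pulls the estimate away from the presented tone.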
These two perspectives can, however, be reconciled. Raviv et al.'s model does not take into account the reliability of the prior, which differs considerably between the two conditions: it is low in the No-Ref condition and high in the Ref-1st condition. The weight assigned to the prior should depend on its reliability. People have been shown to be sensitive to how reliably recent data indicate the current state of the environment (Nassar et al., 2012). Moreover, this reliability is reflected in pupil diameter, implying that it is tightly linked with activity in attentional systems. This is consistent with the idea that activity in the intraparietal regions is sensitive to the reliability of the prior: this activity decreases as the estimated reliability increases, because the required attentional resources can be reduced accordingly. Thus, the decrease in intraparietal activity may reflect a gradual shift of reliance from the externally presented stimulus to the temporally stored prior as the prior's estimated reliability increases.
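The reliability argument can be sketched with standard inverse-variance (precision) weighting, under which the weight on the prior grows as the prior's variability shrinks. This is a simplification for illustration, not the change-point model of Nassar et al. (2012) or a model fitted in this study; the memory-noise value, the use of the spread of recent first tones as the prior's unreliability, and the tone lists are hypothetical.

```python
import statistics


def prior_weight(memory_sd, prior_sd):
    """Weight given to the prior when combined with a noisy memory trace.

    Each source is weighted by its precision (1 / variance), so the prior's
    share is memory_var / (memory_var + prior_var). memory_sd must be positive;
    a perfectly reliable prior (prior_sd == 0) then receives all the weight.
    """
    memory_var, prior_var = memory_sd ** 2, prior_sd ** 2
    return memory_var / (memory_var + prior_var)


def combined_estimate(memory, prior_mean, weight):
    """Reliability-weighted estimate of the first tone's frequency."""
    return weight * prior_mean + (1 - weight) * memory


MEMORY_SD_HZ = 20.0  # hypothetical retention noise

# Assumption: the prior's unreliability is taken as the spread of recent first tones.
ref_first_tones = [1000.0] * 20                               # same reference every trial
no_ref_first_tones = [800.0, 900.0, 1000.0, 1100.0, 1200.0]   # widely spread first tones

w_ref = prior_weight(MEMORY_SD_HZ, statistics.pstdev(ref_first_tones))
w_no_ref = prior_weight(MEMORY_SD_HZ, statistics.pstdev(no_ref_first_tones))
print(round(w_ref, 3), round(w_no_ref, 3))  # 1.0 0.02
```

With a perfectly repeated reference the prior receives all of the weight, whereas with widely spread first tones it receives only about 2%, so the same combination rule behaves very differently in the two conditions.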
This interpretation suggests that learning the reliable prior should be reflected in a decrease of activity in the posterior superior temporal region. Such a decrease in activity has been described as a marker of the initial, fast, but condition-specific stage of learning (e.g., Karni et al., 1995, 1998). The mechanism underlying this “habituation-like” pattern is not well understood. It may reflect a match between the successfully detected prior and the incoming first tone, which would lead to stimulus-specific suppression, whereas the failure of such a match (a mismatch) in the No-Ref condition would not yield suppression.
The notion that this area is involved in auditory regularity detection, and perhaps in the storage of detected regularities, is consistent with previous studies that associated this area with the analysis of temporal structures at different levels of complexity (our peak voxel is located at x + 2, y + 3, z + 4 relative to the peak voxel reported in Davis & Johnsrude, 2003; at x, y + 5, z relative to that reported in Friederici et al., 2009; and at x, y + 4, z + 5 relative to that reported in Obleser & Kotz, 2010).
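Assuming the coordinate offsets are in millimetres (standard for MNI/Talairach space, though the unit is not stated here), the Euclidean distances between our peak and each previously reported peak follow directly from the listed offsets, as in this short Python calculation.

```python
import math

# Offsets (dx, dy, dz) of our peak voxel from previously reported peaks, as listed
# in the text. The millimetre unit is an assumption.
OFFSETS_MM = {
    "Davis & Johnsrude, 2003": (2, 3, 4),
    "Friederici et al., 2009": (0, 5, 0),
    "Obleser & Kotz, 2010": (0, 4, 5),
}

# Straight-line distance from the origin of each offset vector.
distances = {study: math.dist((0, 0, 0), offset) for study, offset in OFFSETS_MM.items()}
for study, d in distances.items():
    print(f"{study}: {d:.1f} mm")
```

Under that assumption, all three reported peaks lie between 5.0 and about 6.4 mm from ours.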
By extension, with additional cross-day learning, the response to this prior should gradually increase, as reported following several weeks of practice on a simple finger-sequencing task (Karni et al., 1995, 1998). This region may thus evolve into an area of expertise that gradually stores more elaborate priors.
No Indication of Regularity Learning in the Auditory Cortex
As expected, our frequency discrimination paradigm activated the primary auditory cortex. However, its response did not differ between conditions or blocks. Importantly, although the Ref-1st condition contained a smaller range of tones than the No-Ref condition, which could have induced greater adaptation in areas with narrow frequency tuning curves, no such reduction in activity was found. This observation is congruent with reports of broad adaptation tuning both at the level of single neurons (Kajikawa, de La Mothe, Blumell, & Hackett, 2005; Recanzone, Guard, & Phan, 2000; Ehret & Schreiner, 1997; Howard et al., 1996) and in ERPs (Daikhin & Ahissar, 2012). Nevertheless, we cannot exclude the possibility that a different experimental design, specifically aimed at measuring working memory-based retention and comparison processes, would have revealed working memory-related activity in the auditory cortex (e.g., Brechmann et al., 2007; Zatorre & Samson, 1991).
In conclusion, we found that the degree of regularity affects the pattern of brain activity even in simple discrimination tasks. The frontoparietal network involved in working memory is activated to a greater extent when no regularity is introduced. When a simple regularity is introduced, an effective prior is formed, leading to reduced activity in a region that controls retention (left intraparietal) and in a region that stores this effective prior (posterior superior-temporal region). We posit that this orchestrated modification in brain activity reflects a quick and implicit shift “backwards” when a reliable prior is detected. In other words, task performance relies more on posterior networks that store effective priors than on laborious online computations.
This work was supported by ISF grant 616/11 and the HUJI and EPFL Brain Collaboration. In addition, we thank Avi Mendelson and Tanya Orlov for their constructive comments and help with data analysis.
Reprint requests should be sent to Merav Ahissar, Department of Psychology and the Edmond and Lily Safra Center for Brain Sciences, Hebrew University of Jerusalem, Israel 91905, or via e-mail: firstname.lastname@example.org.