## Abstract

Children's gains in problem-solving skills during the elementary school years are characterized by shifts in the mix of problem-solving approaches, with inefficient procedural strategies being gradually replaced with direct retrieval of domain-relevant facts. We used a well-established procedure for strategy assessment during arithmetic problem solving to investigate the neural basis of this critical transition. We indexed behavioral strategy use by focusing on retrieval frequency and examined changes in brain activity and connectivity associated with retrieval fluency during arithmetic problem solving in second- and third-grade (7- to 9-year-old) children. Children with higher retrieval fluency showed elevated signal in the right hippocampus, parahippocampal gyrus (PHG), lingual gyrus (LG), fusiform gyrus (FG), left ventrolateral PFC (VLPFC), bilateral dorsolateral PFC (DLPFC), and posterior angular gyrus. Critically, these effects were not confounded by individual differences in problem-solving speed or accuracy. Psychophysiological interaction analysis revealed significant effective connectivity of the right hippocampus with bilateral VLPFC and DLPFC during arithmetic problem solving. Dynamical systems analysis revealed strong bidirectional interactions between the hippocampus and the left VLPFC and DLPFC. Furthermore, causal influences from the left VLPFC to the hippocampus served as the main top–down component, whereas causal influences from the hippocampus to the left DLPFC served as the main bottom–up component of this retrieval network. Our study highlights the contribution of hippocampal–prefrontal circuits to the early development of retrieval fluency in arithmetic problem solving and provides a novel framework for studying dynamic developmental processes that accompany children's development of problem-solving skills.

## INTRODUCTION

The ability to quickly and efficiently retrieve basic arithmetic facts from long-term memory is a core feature of children's early mathematical skill development. Basic fact retrieval also serves as a foundation for the efficient solution of more complex arithmetic problems, and its impairment is a cardinal deficit in dyscalculia (Kaufmann, 2002; McCloskey, Harley, & Sokol, 1991). Representative studies conducted in North America and Great Britain indicate that nearly one in four adults does not have the mathematical skills needed for success in many blue-collar, much less mathematics-intensive, occupations nor does he or she have the quantitative skills needed to manage many now-routine day-to-day activities (Every Child a Chance Trust, 2009; Bynner, 1997; Parsons & Bynner, 1997; Rivera-Batiz, 1992). The competencies assessed in these studies were basic arithmetic, measurement, and simple algebraic skills and thus indicated that a substantial number of adults have not mastered the mathematics expected of a middle school student. The personal and wider social consequences of this level of innumeracy were highlighted by the National Mathematics Advisory Panel (2008) in the United States and similar panels in Great Britain (Every Child a Chance Trust, 2009). To address this issue, one of the core recommendations is that children master whole number arithmetic and fractions during the elementary school years. At the foundation of these competencies is the fluent retrieval of basic arithmetic facts (Geary, 2006).

In typically developing children, efficient fact retrieval is preceded by an extended period during which children use a mix of counting, retrieval, and other procedures to solve addition problems; for example, they count to solve some problems and retrieve the answer to others (Geary, 1994; Siegler & Shrager, 1984). It is thought that the representation of addition facts in long-term memory results from the repeated use of counting and other procedures during problem solving (Siegler, Shipley, Simon, & Halford, 1995; Siegler & Shrager, 1984; Ashcraft, 1982; Groen & Parkman, 1972). As an example, counting up from 5 to 8 (“five, six, seven, eight”) to solve the problem “5 + 3 = ?” results in an association between the answer (“eight”) and the problem stem (“5 + 3”). After many such counts, children begin to directly retrieve the answer when presented with the stem (Siegler & Shrager, 1984). Efficient use of retrieval is also dependent on improvement in cognitive control over retrieval processes and inhibition of irrelevant information, such as incorrect answers, intermediate steps, and operand intrusions, from entering into working memory during problem solving (Barrouillet & Lepine, 2005; Passolunghi & Siegel, 2004).

Cognitive change that involves the gradual emergence of memory-based problem solving is not limited to arithmetic. Over the last three decades, detailed behavioral studies of children's problem solving led to a reconceptualization of cognitive development, from discrete Piagetian (Piaget, 1965) stages to one that is analogous to overlapping waves (Siegler, 2006, 2007; Lee & Karmiloff-Smith, 2002). The latter is in fact consistent with some neo-Piagetian approaches to cognitive development, whereby more and less sophisticated solutions compete for expression. In these models, as with the efficient use of retrieval, inhibition of less sophisticated solutions is a critical component of children's conceptual insights associated with more advanced Piagetian stages (Houdé et al., 2000, 2011). Thus, at any point in time, children have available to them a mix of procedural and memory-based approaches for solving problems. Early in skill development, procedures are used more frequently, and memory-based processes, less frequently. Computational models and behavioral studies suggest that procedural execution results in the formation of problem-specific answers, which in turn results in decreased frequency of procedural use and increased frequency of memory-based problem solving (Shrager & Siegler, 1998; Siegler, 1996; Siegler et al., 1995). Cognitive development is thus characterized by overlapping use of different strategic approaches to problem solving, with change reflected in less efficient procedures being gradually replaced by more efficient memory-based ones.

In contrast to these advances in our understanding of children's cognitive development at the behavioral level, little is known about the underlying changes in neural systems. Only one study to date has examined differences in brain response associated with individual differences in strategy use in children. Using the well-studied domain of simple addition, Cho et al. recently reported that activity in the ventrolateral PFC (VLPFC) was elevated for children who were efficient retrievers of arithmetic facts compared with performance-matched children who were skilled counters (Cho, Ryali, Geary, & Menon, 2011). Although not focusing on early development, Rivera and colleagues found that age (sampled from 8 to 19 years) was negatively correlated with activity levels in the VLPFC, dorsolateral PFC (DLPFC), and hippocampus but positively correlated with left posterior parietal cortex (PPC), leading to the proposal that increased expertise in arithmetic is accompanied by functional specialization of the PPC along with decreased reliance on attentional resources for sequencing and execution of procedures (Rivera, Reiss, Eckert, & Menon, 2005). Cho and colleagues' study was the first to identify distinct multivariate patterns of brain activity associated with children's use of different strategies but was based on relatively small groups (*n* < 20/group) that predominantly used one strategy for problem solving or another. Very little is known about the changes that occur along a continuum ranging from low retrieval fluency, and thus heavy dependence on counting, to high retrieval fluency indicative of movement toward fact mastery in children learning arithmetic. Furthermore, nothing is known about functional connectivity and dynamic causal interactions between PFC and medial temporal lobe (MTL) regions, including the hippocampus proper, that support fact retrieval in children.

Here, we use a large sample of second and third graders (*n* = 86) to capture the full range of children's problem-solving strategies, from those who relied predominantly on counting to those who used a mix of counting and retrieval and those who primarily used retrieval to solve addition problems. The major goals of this study were to investigate neurodevelopmental changes associated with increased use of fact retrieval strategies and to examine dynamic functional interactions in hippocampal circuits associated with controlled memory retrieval. Our study focuses on the role of the hippocampus and the extended MTL memory system in arithmetic fact retrieval. The role of these regions in the retrieval of mathematical facts has been largely ignored in previous brain imaging studies (see, however, Cho et al., 2011 and De Smedt, Holloway, & Ansari, 2010) mainly because the vast majority of them have focused on adults who appear to rely on neocortical systems rather than the MTL memory system for fact retrieval. To our knowledge, no previous studies have reported arithmetic fact retrieval deficits in adults with lesions localized to the MTL. In contrast, theories of memory consolidation argue that the MTL plays an important role in the early stages of learning and retrieval, but its involvement decreases over time with concomitant increase in reliance on neocortical memory systems (Wang & Morris, 2010; Takashima et al., 2009). Furthermore, MTL involvement in retrieval also depends on how well schema and domain knowledge are established (Tse et al., 2007), which is necessarily weaker in young children. If such a model applies to arithmetic fact learning, one would predict that, in the initial stages of formal skill acquisition, children will rely more on MTL memory systems than well-practiced adults.
Consistent with this view, Rivera and colleagues found that hippocampal responses decreased linearly between ages 8 and 19 years during arithmetic problem solving (Rivera et al., 2005). In the same vein, De Smedt and colleagues (De Smedt et al., 2010) found greater hippocampal response in children, compared with adults, when solving addition problems but not when solving subtraction problems that are less well rehearsed and more difficult to memorize because problems are not commutative (e.g., 5 − 3 ≠ 3 − 5). Neither of these studies, however, examined how MTL responses are related to individual differences in children's use of retrieval strategies during the early stages of skill acquisition. On the basis of these findings and the model discussed above, we hypothesized that higher levels of retrieval use in young children during the early stages of skill acquisition would be associated with greater hippocampal engagement.

The second related question we address is the differential role of VLPFC and DLPFC in use of retrieval. Although prominent developmental changes in engagement have been observed in both regions, the direction of effects has been mixed. Both VLPFC and DLPFC responses related to mental arithmetic have been reported to decrease from childhood to adulthood (Rivera et al., 2005). However, both VLPFC (Cho et al., 2011) and DLPFC show increased activity with greater task proficiency in second and third graders who are at the early stages of skill acquisition (Rosenberg-Lee, Barth, & Menon, 2011), raising the possibility that multiple regions of PFC are engaged during fact retrieval during the early phases of learning basic facts. Consistent with this view, domain-general studies of memory retrieval in adults suggest that the VLPFC plays an important and differential role in controlled retrieval of the contents of episodic and semantic memory (Badre & D'Esposito, 2009; Badre, 2008; Badre & Wagner, 2007; Simons & Spiers, 2003). In this study, we examine the differential role of the VLPFC and the DLPFC in retrieval of arithmetic facts in children. We predicted that VLPFC and DLPFC regions supporting cognitive control over retrieval would be engaged to a greater extent in children with higher retrieval use.

A third and novel aspect of our study is that we also investigated effective and dynamic causal interactions between hippocampus and PFC regions implicated in memory retrieval using two different analytic approaches. To our knowledge, no previous studies have examined developmental changes in hippocampal–prefrontal connectivity in this or any other cognitive domain. Furthermore, the dynamical systems modeling used in this study allowed us to dissociate bottom–up and top–down causal influences within this hippocampal–prefrontal memory system (Dove, Brett, Cusack, & Owen, 2006). First, we used psychophysiological interaction (PPI) analysis (Friston et al., 1997) to determine functional circuits, at the whole-brain level, associated with regions that showed strong retrieval fluency effects. Compared with the conventional model-free functional connectivity analysis, PPI analysis is a more powerful approach to identifying context-dependent functional interactions because it measures the temporal relationship between multiple brain regions while discounting the influence of task or common driving input. We predicted that the hippocampus would show strong task-related functional interactions with VLPFC, DLPFC, and PPC regions important for numerical problem solving (Menon, Rivera, White, Glover, & Reiss, 2000). Second, because PPI analyses do not provide information about the directional influences between regions, we used a novel multivariate dynamical systems (MDS) approach (Ryali, Supekar, Chen, & Menon, 2011) to assess causal interactions within the functional circuits identified by the PPI analysis. MDS is a state-space approach (Bishop, 2006) for estimating causal interactions in fMRI data that improves on many of the problems associated with existing methods such as Granger causal analysis and dynamic causal modeling (Ryali et al., 2011). We used MDS to assess causal interactions in functional circuits associated with greater retrieval fluency. 
We test the hypothesis that the left VLPFC exerts a strong top–down influence over the hippocampus and related MTL regions during retrieval of arithmetic facts (Badre & D'Esposito, 2009; Badre, 2008; Badre & Wagner, 2007; Simons & Spiers, 2003).

## METHODS

One-hundred twenty-one right-handed children (54 girls, age range = 7.0–9.9 years, *M* = 8.2 years, *SD* = 0.6 years) with a mean IQ of 109 (*SD* = 11.5) participated in the strategy assessment session. Children's average age in months was 97 (7.7 years), with 63 (52%) of them in the second grade. All protocols were approved by the institutional review board for human participants at the Stanford University School of Medicine. Participants received $50 per visit as compensation.

### Strategy Assessment

Each child's mix of strategies for solving single-digit addition problems (e.g., “2 + 9 = ?”) was first assessed using standardized, well-validated measures that classify strategies based on RT patterns, experimenter observation, and self-reports (Wu et al., 2008; Geary, Hoard, Byrd-Craven, & DeSoto, 2004). The task was programed using E-Prime (Psychology Software Tools, Inc., Pittsburgh, PA) on a PC running Windows XP. The 18 problems were presented one at a time on a computer monitor and were random pairs of integers from 2 to 9 (e.g., 2 + 9 = ?), with sums ranging from 6 to 17. Problems with two identical addends (e.g., 2 + 2, 5 + 5) or with addends of 0 and 1 were excluded because they evince less strategy variability (Siegler, 1987). Half of the problems were randomly selected to be presented in max (larger addend) + min (smaller addend) format, and the other half were presented in min + max format. No problem was repeated within the set. The child was asked to solve each problem (without the use of paper and pencil) as quickly as possible without making too many mistakes and to verbally state the answer out loud. There was no time limit. It was emphasized that the child could use whatever strategy was easiest to get the answer; for example, count with fingers, count verbally, or recall the answer. For each problem, the experimenter took detailed notes of overt signs of counting, such as finger usage, lip movement, or audible counting, and these were compared against the child's report of how the problem was solved; children were asked to report how they solved each problem immediately after stating the answer (Siegler, 1987). 
The experimenter classified and recorded the child's self-reported strategy into the following categories: “counting fingers” (min, max, sum), “verbal/mental counting” (min, max, sum), “look at fingers,” “retrieval” (including “guess” responses), “decomposition,” “multiply,” “count by numbers,” and “other/mixed” (see supporting material for further details).

A timer was started at the initial display of each problem, and the experimenter measured RT by pressing a key on the keyboard as soon as the child spoke the answer; all sessions were audio-recorded to check RT precision. Trials in which the RT was (1) below the 1st or above the 99th percentile for each child across all problems or (2) below the 1st or above the 99th percentile for each problem across all children were excluded. Trials in which the participant gave multiple verbal responses and the experimenter had stopped the timer after the response were also excluded. Trials in which the experimenter noted overt signs of counting even when the child reported a retrieval strategy were classified as counted. For each child, we computed the proportion of trials in which retrieval or counting strategies (combined across “counting fingers,” “verbal/mental counting,” “look at fingers,” and “count by numbers”) were used.
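The classification rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the trial record layout (`reported`, `overt_counting`) and the function names are hypothetical, the RT-based exclusions are assumed to have been applied already, and the example trials are invented for demonstration.

```python
from collections import Counter

# Strategy labels follow the categories named in the text; the record layout
# (keys "reported" and "overt_counting") is a hypothetical choice for this sketch.
COUNTING = {"counting fingers", "verbal/mental counting",
            "look at fingers", "count by numbers"}

def classify(trial):
    """Return 'retrieval', 'counting', or 'other' for one retained trial."""
    reported = trial["reported"]
    # Overt signs of counting override a self-report of retrieval.
    if reported == "retrieval" and trial["overt_counting"]:
        return "counting"
    if reported == "retrieval":
        return "retrieval"
    if reported in COUNTING:
        return "counting"
    return "other"

def strategy_proportions(trials):
    """Proportion of retained trials falling into each strategy class."""
    counts = Counter(classify(t) for t in trials)
    n = len(trials)
    return {k: counts[k] / n for k in ("retrieval", "counting", "other")}

# Invented example trials (not data from the study).
trials = [
    {"reported": "retrieval", "overt_counting": False},
    {"reported": "retrieval", "overt_counting": True},   # reclassified as counting
    {"reported": "counting fingers", "overt_counting": True},
    {"reported": "decomposition", "overt_counting": False},
]
props = strategy_proportions(trials)
print(props)
```

In this toy input, one of the two self-reported retrieval trials is reclassified because the experimenter noted overt counting, which is the rule described in the paragraph above.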

### fMRI Methods

#### Behavioral Task

The fMRI experiment consisted of four alternating blocks of (1) standard addition (hereafter referred to as addition task), (2) “plus 1” addition (hereafter referred to as control task), (3) number identification, and (4) passive fixation. In the addition task, equations with different addends (e.g., 3 + 4 = 7) were presented, and the children were asked to indicate via a button box whether the equation was correct or incorrect. One addend was from 2 to 9, the other was from 2 to 5 (tie problems, such as “5 + 5 = 10,” were excluded), and answers were correct in 50% of the trials. Incorrect answers deviated by ±2 or ±1 from the correct answer (Ashcraft & Battaglia, 1978). The range of values of the min addend was restricted to ≤5 to allow children to execute the min counting strategy (i.e., state the value of the larger addend and count the smaller one) within the 5-sec window provided for each problem in the scanner. The range was determined based on previous studies of the speed with which children in the assessed age range encode numbers and implicitly count (Ashcraft, Fierman, & Bartolotta, 1984; Geary & Brown, 1991). In other words, the smaller min values allowed children to complete the count within the allotted 5 sec. The control task was the same as the addition task except that one addend ranged from 2 to 9 whereas the other was “1” (e.g., 5 + 1 = 7). The “*n* + 1” task provides an ideal high-level control for the standard “addition” task, as it is in the same format as the addition task and requires the same response selection. Our use of this task was based on pilot studies, which suggested that children are consistently faster on these problems compared with other addition problems. Furthermore, children show less strategy variability for “*n* + 1” problems, thus serving as ideal control problems for our study (Siegler, 1987; Baroody, 1985).

Stimuli were presented in a block fMRI design to optimize signal detection and connectivity analysis (Friston, Zarahn, Josephs, Henson, & Dale, 1999). In each task, stimuli were displayed for 5 sec with an intertrial interval of 500 msec. There were 18 trials of each task condition, broken up into four blocks of four or five trials; thus, each block lasted either 22 or 27.5 sec. The order of the blocks was randomized across participants with the following constraints: In every set of four blocks, all of the conditions were presented and the complex and simple addition task blocks were always separated by either a number identification or a passive fixation block. All orders of addition and nonaddition task conditions were equally likely. The total length of the experimental run was 6 min and 36 sec (Supplementary Figure S1).
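The timing figures in this paragraph can be checked with simple arithmetic. The sketch below assumes that all four conditions, including passive fixation, occupy 18 trial-length slots each; the split into two 4-trial and two 5-trial blocks per condition is inferred from the stated totals rather than given explicitly in the text.

```python
STIM_S = 5.0                 # stimulus duration per trial (sec)
ITI_S = 0.5                  # intertrial interval (sec)
TRIALS_PER_CONDITION = 18
N_CONDITIONS = 4             # addition, control, number identification, fixation

trial_s = STIM_S + ITI_S                          # 5.5 sec per trial
block_lengths = {n: n * trial_s for n in (4, 5)}  # block duration by trial count

# Four blocks of four or five trials summing to 18 implies two blocks of each size.
trials_per_block = [4, 4, 5, 5]
assert sum(trials_per_block) == TRIALS_PER_CONDITION

run_s = N_CONDITIONS * TRIALS_PER_CONDITION * trial_s
minutes, seconds = divmod(run_s, 60)
print(block_lengths, f"{int(minutes)} min {int(seconds)} sec")
```

Under these assumptions the block durations (22 and 27.5 sec) and the total run length (6 min 36 sec) agree with the values reported above.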

#### Stimulus Presentation

The task was programed using E-Prime (Psychology Software Tools, Inc., Pittsburgh, PA) on a PC running Windows XP. The onsets of the fMRI scan and experimental task were synchronized using a TTL pulse delivered to the scanner timing microprocessor board from a serial response box connected to the computer. Stimuli were presented visually at the center of a screen using a custom-built magnet compatible projection system. The temporal precision of stimulus presentation and response onset detection was accurate to approximately ±1 msec.

#### fMRI Data Acquisition

Images were acquired on a 3-T GE Signa scanner (General Electric, Milwaukee, WI) using a custom-built head coil at the Stanford University Lucas Center. Head movement was minimized during the scan by a comfortable custom-built restraint. A total of 29 axial slices (4.0-mm thickness, 0.5-mm skip) parallel to the AC–PC line and covering the whole brain were imaged with a temporal resolution of 2 sec using a T2*-weighted gradient-echo spiral in–out pulse sequence (Glover & Lai, 1998) with the following parameters: repetition time = 2 sec, echo time = 30 msec, flip angle = 80°, 1 interleave. The field of view was 20 cm, and the matrix size was 64 × 64, providing an in-plane spatial resolution of 3.125 mm. To reduce blurring and signal loss from field inhomogeneity, an automated high-order shimming method based on spiral acquisitions was used before acquiring functional MRI scans (Kim, Adalsteinsson, Glover, & Spielman, 2002).

#### Preprocessing

fMRI data were analyzed using SPM8 (www.fil.ion.ucl.ac.uk/spm/). The first five volumes were not analyzed to allow for T1 equilibration. A linear shim correction was applied separately for each slice during reconstruction (Glover & Lai, 1998). ArtRepair software (spnl.stanford.edu/tools/ArtRepair/ArtRepair.htm) was used to correct for excessive participant movement (Mazaika, Whitfield-Gabrieli, Reiss, & Glover, 2007). Deviant volumes resulting from sharp movement or spikes in the global signal were interpolated using the two adjacent scans. No more than 20% of the volumes were interpolated. Finally, the images were corrected for errors in slice-timing, spatially transformed for registration to standard Montreal Neurological Institute (MNI) space, and smoothed again with a 4.5-mm FWHM Gaussian kernel. The two-step sequence of first smoothing with a 4-mm FWHM Gaussian kernel and later with a 4.5-mm FWHM Gaussian kernel approximates a total smoothing of 6 mm. Correction for movement during preprocessing (instead of including motion regressors in the individual statistics stage) provides a better method for correcting large movements typically found in pediatric and clinical populations.
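The 6-mm figure follows from a general property of Gaussian convolution: variances of successive Gaussian kernels add, so their FWHMs combine in quadrature. As a worked check for the two kernels named above:

$$\mathrm{FWHM}_{\text{total}} = \sqrt{4^{2} + 4.5^{2}}\ \text{mm} = \sqrt{36.25}\ \text{mm} \approx 6.02\ \text{mm}$$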

### fMRI Data Analysis

#### Statistical Analyses

Task-related brain activation was identified using a general linear model. In the individual subject analyses, interpolated volumes flagged at the preprocessing stage were de-weighted. Brain activity related to each task condition was modeled using boxcar functions in conjunction with a canonical hemodynamic response function (HRF) and a temporal dispersion derivative to account for voxelwise latency differences in hemodynamic response. Low-frequency drifts at each voxel were removed using a high-pass filter (0.5 cycles/min). Serial correlations were accounted for by modeling the fMRI time series as a first-degree autoregressive process (Friston et al., 1997). Contrast images for each participant were generated for each contrast of interest. These contrast images were then used in a general linear model to determine voxelwise group *t* statistics. Significant clusters of activation were determined within the gray matter using a voxelwise height threshold of *p* < .01, with family-wise error (FWE) correction for multiple spatial comparisons at *p* < .01. We used a nonparametric approach based on Monte Carlo simulations to determine the minimum cluster size that controls for false positive rate at *p* < .01 for both height and extent. Monte Carlo simulations were implemented in Matlab using methods similar to the AlphaSim procedure in AFNI (Nichols & Hayasaka, 2003; Ward, 2000; Forman et al., 1995). Ten thousand iterations of random 3-D images, with the same resolution and dimensions as the fMRI data, were generated. The resulting images were smoothed with the same 6-mm FWHM Gaussian kernel used to smooth the fMRI data. The maximum cluster size was then computed for each iteration, and the probability distribution was estimated across the 10,000 iterations. The cluster threshold corresponding to a FWE significance level of *p* < .01 was determined to be 141 voxels. All stereotaxic coordinates are reported in MNI space. 
Activation foci were superimposed on high-resolution T1-weighted images, and their locations were interpreted using known neuroanatomical landmarks (Mai & Assheuer, 1997) and the cytoarchitectonic maximum probability maps provided by the SPM Anatomy toolbox (Eickhoff et al., 2005).
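The Monte Carlo procedure for the cluster-extent threshold can be illustrated with a short sketch. It is not the authors' Matlab implementation: it assumes NumPy and SciPy, simulates a full rectangular volume rather than a gray matter mask, uses face-connected clusters, and the function names, volume size, and iteration count are all hypothetical. It therefore will not reproduce the 141-voxel threshold reported above.

```python
import numpy as np
from scipy import ndimage
from scipy.stats import norm

FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))

def max_cluster_size(shape, fwhm_vox, z_thresh, rng):
    """Largest suprathreshold cluster in one smoothed Gaussian noise volume."""
    noise = rng.standard_normal(shape)
    smooth = ndimage.gaussian_filter(noise, sigma=fwhm_vox * FWHM_TO_SIGMA)
    smooth /= smooth.std()                    # restore unit variance after smoothing
    labels, n_clusters = ndimage.label(smooth > z_thresh)
    if n_clusters == 0:
        return 0
    return int(np.bincount(labels.ravel())[1:].max())   # skip background label 0

def cluster_extent_threshold(shape, fwhm_vox, voxel_p=0.01, alpha=0.01,
                             n_iter=1000, seed=0):
    """Extent just above the (1 - alpha) quantile of the null max-cluster sizes."""
    rng = np.random.default_rng(seed)
    z_thresh = norm.isf(voxel_p)              # one-sided voxelwise height threshold
    sizes = np.array([max_cluster_size(shape, fwhm_vox, z_thresh, rng)
                      for _ in range(n_iter)])
    return int(np.quantile(sizes, 1.0 - alpha)) + 1, sizes

# Small illustrative run (toy volume, few iterations, looser alpha for speed).
k, sizes = cluster_extent_threshold((24, 24, 12), fwhm_vox=2.0,
                                    voxel_p=0.01, alpha=0.05, n_iter=200)
print("minimum cluster extent (voxels):", k)
```

A run matching the paper's description would use the image dimensions and resolution of the fMRI data, a 6-mm FWHM kernel expressed in voxels, 10,000 iterations, and *p* < .01 for both height and extent.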

#### PPI Analysis

We conducted a PPI analysis to examine the functional interaction between a specific ROI (seed ROI) and the rest of the brain, while removing sources of potential confounding influences such as task-related effects and common driving input. A seed ROI was defined as a sphere centered at the local peak of the group-level statistical *t* map with a radius of six voxels. The time series from the seed ROI was de-convolved so as to uncover neuronal activity (physiological variable) and multiplied with the task design (psychological variable) to form an interaction term (thus called PPI). This interaction term was convolved with an HRF to form the PPI regressor (Friston et al., 1997). In addition to the PPI regressor, the design matrix included the main effect of task (addition vs. control task) as well as the mean-corrected time series of the seed ROI. These latter covariates remove common driving effects from other brain regions as well as task-driven effects on brain connectivity. Brain regions showing significant PPI effects were determined by testing for a positive slope of the PPI regressor. Subject-level contrast images were generated and entered into a group-level statistical analysis as described above. The significance of the results was assessed in the same way as described in the Statistical Analyses section above.
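The construction of the PPI design matrix can be sketched as follows. This is a simplified illustration, not the SPM8 implementation: the deconvolution step is omitted (the neuronal estimate of the seed is assumed to be given), the HRF is a commonly used double-gamma form, the task is coded as a hypothetical 0/1 block regressor, and all names and simulated signals are invented for the example.

```python
import numpy as np
from math import gamma

def canonical_hrf(tr, duration=32.0):
    """Double-gamma HRF sampled every `tr` seconds (assumed parameterization)."""
    t = np.arange(0.0, duration, tr)
    peak = t ** 5 * np.exp(-t) / gamma(6)
    undershoot = t ** 15 * np.exp(-t) / gamma(16)
    hrf = peak - undershoot / 6.0
    return hrf / hrf.sum()

def ppi_design(seed_bold, neural_seed, task, tr):
    """Columns: PPI regressor, task main effect, mean-corrected seed, constant."""
    hrf = canonical_hrf(tr)
    n = len(task)
    psych = task - task.mean()                 # mean-centered psychological variable
    interaction = neural_seed * psych          # product formed at the neuronal level
    ppi = np.convolve(interaction, hrf)[:n]    # convolved back to the BOLD level
    task_reg = np.convolve(task, hrf)[:n]
    return np.column_stack([ppi, task_reg, seed_bold - seed_bold.mean(), np.ones(n)])

# Simulated example: the target couples to the seed only during task blocks.
rng = np.random.default_rng(0)
tr, n = 2.0, 198
task = ((np.arange(n) // 11) % 2).astype(float)        # hypothetical on/off blocks
neural_seed = rng.standard_normal(n)
hrf = canonical_hrf(tr)
seed_bold = np.convolve(neural_seed, hrf)[:n] + 0.05 * rng.standard_normal(n)
target_neural = 0.8 * neural_seed * task + 0.1 * rng.standard_normal(n)
target_bold = np.convolve(target_neural, hrf)[:n]

X = ppi_design(seed_bold, neural_seed, task, tr)
beta, *_ = np.linalg.lstsq(X, target_bold, rcond=None)
print("PPI slope estimate:", beta[0])
```

Testing for a positive slope on the first column corresponds to the test on the PPI regressor described above; the task and seed columns absorb task-driven effects and common input.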

#### MDS Modeling

$$s(t) = A\,s(t-1) + \sum_{j=1}^{J} v_j(t)\,C_j\,s(t-1) + D\,u(t) + w(t) \tag{1}$$

$$x_m(t) = \left[s_m(t),\ s_m(t-1),\ \ldots,\ s_m(t-L+1)\right]' \tag{2}$$

$$y_m(t) = b_m \Phi\, x_m(t) + e_m(t) \tag{3}$$

In Equation 1, *s*(*t*) is an *M* × 1 vector of latent signals at time *t* of *M* regions, *A* is an *M* × *M* connection matrix, *C*_{j} is an *M* × *M* connection matrix ensued by modulatory input *v*_{j}(*t*), and *J* is the number of modulatory inputs. The nondiagonal elements of *C*_{j} represent the coupling of brain regions in the presence of modulatory input *v*_{j}(*t*). *C*_{j}(*m*, *n*) denotes the strength of causal connection from the *n*th region to the *m*th region. Therefore, the latent signals *s*(*t*) in *M* regions at time *t* are a bilinear function of modulatory inputs *v*_{j}(*t*) and the previous state *s*(*t* − 1). *D* is an *M* × *M* diagonal matrix wherein *D*(*i*, *i*) denotes external stimuli strength to the *i*th region. *u*(*t*) is an *M* × 1 binary vector whose elements represent the external stimuli to the *m*th region under investigation. *w*(*t*) is an *M* × 1 state noise vector whose distribution is assumed to be Gaussian with a covariance matrix *Q* (*w*(*t*) ∼ *N*(0, *Q*)). Additionally, state noise vectors at time instances 1, 2, …, *T* (*w*(1), *w*(2) … *w*(*T*)) are assumed to be identical and independently distributed. Equation 1 represents the time evolution of latent signals in *M* brain regions. More specifically, the latent signals at time *t*, *s*(*t*), are expressed as a linear combination of latent signals at time *t* − 1, external stimulus at time *t* (*u*(*t*)), bilinear combination of modulatory inputs *v*_{j}(*t*), *j* = 1, 2, …, *J* and its previous state, and state noise *w*(*t*). The latent dynamics modeled in Equation 1 give rise to the observed fMRI time series represented by Equations 2 and 3.

We model the fMRI time series in region *m* as a linear convolution of HRF and latent signal *s*_{m}(*t*) in that region. To represent this linear convolution model as an inner product of two vectors, the past *L* values of *s*_{m}(*t*) are stored as a vector *x*_{m}(*t*) in Equation 2 and represent an *L* × 1 vector with *L* past values of latent signal at the *m*th region. In Equation 3, *y*_{m}(*t*) is the observed BOLD signal at *t* of the *m*th region. Φ is a *p* × *L* matrix whose rows contain bases for HRF. *b*_{m} is a 1 × *p* coefficient vector representing the weights for each basis function in explaining the observed BOLD signal *y*_{m}(*t*). Therefore, the HRF in the *m*th region is represented by the product *b*_{m}Φ. The BOLD response in this region is obtained by convolving the HRF (*b*_{m}Φ) with the *L* past values of the region's latent signal (*x*_{m}(*t*)) and is represented mathematically by the vector inner product *b*_{m}Φ *x*_{m}(*t*). Uncorrelated observation noise *e*_{m}(*t*) with zero mean and variance σ_{m}^{2} is then added to generate the observed signal *y*_{m}(*t*). *e*_{m}(*t*) is also assumed to be uncorrelated with *w*(τ) at all *t* and τ. Equation 3 represents the linear convolution between the embedded latent signal *x*_{m}(*t*) and the basis vectors for the HRF. Here, we use the canonical HRF and its time derivative as bases, as is common in most fMRI studies. Equations 1–3 together represent a state-space model for estimating the causal interactions in latent signals based on observed multivariate fMRI time series.
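To make the generative side of the state-space model concrete, the following sketch forward-simulates latent signals and BOLD observations. It is not the MDS estimation procedure, and it rests on assumptions: the state equation is taken to be *s*(*t*) = *A* *s*(*t* − 1) + Σ_{j} *v*_{j}(*t*) *C*_{j} *s*(*t* − 1) + *D* *u*(*t*) + *w*(*t*), as implied by the symbol definitions in the text; the HRF is a commonly used double-gamma form; and every parameter value, region count, and function name is hypothetical.

```python
import numpy as np
from math import gamma

def hrf_basis(tr, L):
    """Canonical double-gamma HRF and its time derivative at L lags: Phi is (p=2, L)."""
    t = np.arange(L) * tr
    h = t ** 5 * np.exp(-t) / gamma(6) - t ** 15 * np.exp(-t) / (6.0 * gamma(16))
    return np.vstack([h, np.gradient(h, tr)])

def simulate_mds(A, C, D, u, v, Q, b, Phi, obs_sd, rng):
    """Forward-simulate the assumed state and observation equations.
    A: (M, M); C: (J, M, M); D: (M, M) diagonal; u: (T, M); v: (T, J);
    Q: (M, M); b: (M, p); Phi: (p, L); obs_sd: (M,)."""
    T, M = u.shape
    L = Phi.shape[1]
    s = np.zeros((T, M))                       # latent signals, s(0) = 0
    w = rng.multivariate_normal(np.zeros(M), Q, size=T)
    for t in range(1, T):
        # Intrinsic coupling plus input-dependent (bilinear) coupling at time t.
        coupling = A + np.tensordot(v[t], C, axes=1)
        s[t] = coupling @ s[t - 1] + D @ u[t] + w[t]
    y = np.zeros((T, M))
    for t in range(T):
        # x_m(t): the L most recent latent values of each region, zero-padded.
        lo = max(0, t - L + 1)
        x = np.zeros((L, M))
        x[: t - lo + 1] = s[lo : t + 1][::-1]
        # y_m(t) = b_m Phi x_m(t) + e_m(t)
        y[t] = np.einsum("mp,pl,lm->m", b, Phi, x) + obs_sd * rng.standard_normal(M)
    return s, y

# Hypothetical three-region example with one modulatory input (task on/off).
rng = np.random.default_rng(0)
M, J, T, tr, L = 3, 1, 198, 2.0, 16
A = 0.5 * np.eye(M)
C = np.zeros((J, M, M))
C[0, 0, 1] = 0.4                    # C_j(m, n): influence from region n=1 to region m=0
D = np.diag([1.0, 0.0, 0.0])
task = ((np.arange(T) // 11) % 2).astype(float)
u = np.tile(task[:, None], (1, M))
v = task[:, None]
b = np.tile([1.0, 0.0], (M, 1))
s, y = simulate_mds(A, C, D, u, v, 0.01 * np.eye(M), b, hrf_basis(tr, L),
                    np.full(M, 0.05), rng)
print(s.shape, y.shape)
```

Estimation runs in the opposite direction: given only *y*, the method infers the latent signals and the coupling matrices, which is why the indexing convention for *C*_{j}(*m*, *n*) matters when reading directed links.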

Estimating causal interactions between *M* regions specified in the model is equivalent to estimating the parameters *C*_{j}, *j* = 1, 2, …, *J*. To estimate *C*_{j}, the other unknown parameters, *D*, *Q*, {*b*_{m}}_{m=1}^{M}, and {σ_{m}^{2}}_{m=1}^{M}, and the latent signal {*s*(*t*)}_{t=1}^{T} based on the observations {*y*_{m}^{s}(*t*)}_{m=1,s=1}^{M,S}, *t* = 1, 2, …, *T*, where *T* is the total number of time samples and *S* is the number of participants, need to be estimated. We use a variational Bayes approach for estimating the posterior probabilities of the unknown parameters of the MDS model given fMRI time series observations for *S* number of participants. We estimate a single model for all subjects given fMRI time series observations for each subject. The statistical significance of the parameters is assessed by examining the posterior probabilities, which are normal because of the assumptions in the model, of the parameters at a given level of significance. The statistical significance of the difference in causal links between two conditions is also tested using the posterior probabilities of the difference parameters between two conditions which are again normal. Multiple comparisons in testing the statistical significance are accounted using Bonferroni corrections. The mean time course from each ROI was first extracted for all participants. Each time series was then detrended, and its temporal mean was removed. MDS methods, as described above, were used to estimate the task-dependent causal interactions between brain regions while accounting for variations in hemodynamic responses in the ROIs.
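One way to read the significance test on normally distributed posteriors is sketched below. This is an interpretation, not the authors' code: it assumes each causal link has a normal posterior summarized by a mean and standard deviation, evaluates a two-sided tail probability for zero, and applies a Bonferroni correction over the number of links tested. The significance level, link names, and numerical values are hypothetical.

```python
from math import erfc, sqrt

def two_sided_p(mean, sd):
    """Two-sided tail probability of zero under a normal posterior N(mean, sd^2)."""
    return erfc(abs(mean) / (sd * sqrt(2.0)))

def significant_links(posterior, alpha=0.01):
    """posterior maps (source, target) -> (mean, sd); Bonferroni over all links."""
    threshold = alpha / len(posterior)
    return {link: two_sided_p(m, s) < threshold
            for link, (m, s) in posterior.items()}

# Invented posterior summaries for illustration only.
posterior = {
    ("VLPFC", "hippocampus"): (0.30, 0.05),
    ("hippocampus", "DLPFC"): (0.02, 0.05),
}
result = significant_links(posterior)
print(result)
```

The same calculation applies to a difference parameter between two conditions, using the posterior mean and standard deviation of the difference.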

## RESULTS

### Behavioral Results

#### Strategy Assessment

Strategies used during addition problem solving were assessed in 121 children. Overall, 80% (*SE* = 0.01) of the problems were solved correctly. On average, 52% of the correctly answered problems were solved through retrieval, 35% through counting, and 13% through other strategies. The mean of participants' correct trial median RT was 3602 msec (*SE* = 132 msec). Median RTs were faster (*F*(1, 99) = 101.74, *p* < .001) when the answer was retrieved (mean = 2804 msec, *SE* = 95 msec) than when it was counted (mean = 5480 msec, *SE* = 267 msec; Figure 1A). Retrieval accuracy (mean = 86%, *SE* = .02) was also higher than counting accuracy (mean = 72%, *SE* = .02; *F*(1, 98) = 23.51, *p* < .001). The percentage of trials correctly retrieved (“retrieval fluency”) was significantly correlated with correct trial median RT (*r* = −.48, *p* < .001; Figure 1B) but not accuracy (*r* = .01, *p* = .93). There were no differences in retrieval fluency between second and third graders or between female and male participants (*p*s > .20).
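
As a concrete illustration of the two behavioral measures reported above, the following Python sketch computes retrieval fluency and the correct trial median RT for one child. The trial record layout (`strategy`, `correct`, `rt_ms`) and the function names are hypothetical, and the denominator for retrieval fluency is assumed to be all assessed trials, which the text does not state explicitly.

```python
from statistics import median

def retrieval_fluency(trials):
    """Percentage of trials answered correctly via retrieval.
    The denominator (all assessed trials) is an assumption of this sketch."""
    hits = sum(1 for t in trials if t["correct"] and t["strategy"] == "retrieval")
    return 100.0 * hits / len(trials)

def median_correct_rt(trials):
    """Median RT (msec) over correctly answered trials only."""
    return median(t["rt_ms"] for t in trials if t["correct"])

# Hypothetical strategy-assessment data for a single child
trials = [
    {"strategy": "retrieval", "correct": True,  "rt_ms": 2000},
    {"strategy": "retrieval", "correct": False, "rt_ms": 2500},
    {"strategy": "counting",  "correct": True,  "rt_ms": 5000},
    {"strategy": "retrieval", "correct": True,  "rt_ms": 3000},
]

fluency = retrieval_fluency(trials)    # 2 of 4 trials correctly retrieved
median_rt = median_correct_rt(trials)  # median of 2000, 5000, 3000
```

Each child contributes one fluency value and one median RT, and the correlations reported above are computed across children on these per-child summaries.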

The validity of strategy self-reports in our study is supported by the finding of significant RT differences across retrieval and counting trials. Retrieval and counting RTs found in our sample are highly similar to those found for retrieval (2781 msec) and counting (6662 and 4980 msec for finger and verbal counting, respectively) in an independent sample of second-grade children and thus support the accuracy of the strategy classifications (Geary, Hoard, & Bailey, in press).

#### Behavioral Results from fMRI Session

Behavioral and imaging data were successfully acquired from 86 of the 121 children (see Supplementary Methods for exclusion criteria). For the addition task, mean accuracy was 78% (*SE* = 1%), and the mean of correct trial median RT was 2844 msec (*SE* = 69 msec). For the control task, mean accuracy was 87% (*SE* = 1%), and the mean of correct trial median RT was 2352 msec (*SE* = 56 msec). Mean accuracy (*F*(1, 85) = 40.49, *p* < .001) and correct trial median RT (*F*(1, 85) = 84.66, *p* < .001) were both significantly different across conditions (Figure 2A and B). Mean accuracy (*r* = .29, *p* = .007) and correct trial median RT (*r* = −.31, *p* = .004) for the addition task were both significantly correlated with retrieval fluency (Figure 2C and D). To rule out use of an approximation strategy to solve the addition verification problems in the scanner, we compared the effect of distance, that is, the numerical distance between the presented incorrect answer and the correct answer (±1 vs. ±2), using a repeated measures ANCOVA with retrieval fluency as the covariate. Use of an approximation strategy would be indicated if incorrect answers deviating by 2 from the correct answer were solved more quickly and accurately than answers deviating by 1, but there was no interaction between distance and retrieval fluency nor were there any main effects of distance on either accuracy or RT (all *p*s > .1).

### Brain Imaging Results

#### Brain Areas Involved in Addition Problem Solving Compared with the Control Task

We identified brain regions that had elevated signal intensity during addition problem solving compared with the control task. Greater activation was found in left intraparietal sulcus (IPS) and supramarginal gyrus (SMG) divisions of the PPC, bilateral insula and DLPFC, lateral occipital cortex, medial superior frontal gyrus (SFG), left motor cortex, and cerebellum (height threshold of *p* < .01, FWE corrected for multiple comparisons at *p* < .01; Supplementary Figure S2; Supplementary Table S1).

#### Brain Activation Related to Greater Retrieval Use

We examined brain regions in which signal intensity increased with retrieval fluency (Supplementary Figure S3 shows the distribution of retrieval fluency used as a covariate in this analysis). A positive correlation between brain activation and retrieval fluency was found in the right hippocampus, fusiform gyrus (FG) and lingual gyrus (LG), left VLPFC (pars triangularis), rostrolateral PFC, bilateral DLPFC, angular gyrus, and lateral occipital cortex (height threshold of *p* < .01, extent threshold of *p* < .01, FWE corrected for multiple comparisons; Figure 3; Table 1). At a more liberal threshold (height threshold of *p* < .01, FWE corrected for multiple comparisons at *p* < .05), retrieval fluency effects were also detected in the left hippocampus and parahippocampal gyrus (PHG). No brain areas showed a significant negative correlation with retrieval fluency. To examine whether the signal increases associated with greater retrieval use were confounded by the effects of behavioral performance, additional analyses were conducted using retrieval fluency, with accuracy and RT as covariates of noninterest. To control for individual differences in accuracy and RT, difference scores of performance accuracy (Δ accuracy = addition task accuracy − control task accuracy) and RT (Δ RT = addition task RT − control task RT) rather than absolute values were entered as covariates. This analysis yielded results similar to those described above (Supplementary Table S2).

Region | Cluster Size (Voxels) | Peak *t* Score | Peak MNI *x* (mm) | Peak MNI *y* (mm) | Peak MNI *z* (mm)
---|---|---|---|---|---
L angular gyrus | 434 | 4.37 | −42 | −78 | 18
L VLPFC | 593 | 4.24 | −36 | 38 | 6
R hippocampus/MTL | 1116 | 3.84 | 26 | −42 | −6
R LOC | 343 | 3.53 | 24 | −86 | 38
R DLPFC | 182 | 3.37 | 34 | 24 | 46
R angular gyrus | 141 | 3.22 | 40 | −68 | 26
L DLPFC | 221 | 3.06 | −34 | 24 | 52
R ventral striatum | 205 | 2.94 | 12 | 18 | −10

LOC = lateral occipital cortex.

#### Testing for Retrieval-specific Effects with Stepwise Regression

To further confirm the specificity of effects in brain areas that showed positive correlation with retrieval fluency, we conducted additional stepwise regression analyses. As above, difference scores of performance accuracy and RT as well as retrieval fluency were used as predictors in the model. The dependent variables were the percentages of signal change within the ROIs centered on local peaks in the left VLPFC, left DLPFC, right DLPFC, right fusiform, right hippocampus, and bilateral posterior angular gyrus. In every stepwise regression, only retrieval fluency was selected as the predictor, whereas the other two performance predictors were excluded. The results of the stepwise regression are summarized in Table 2.

ROI | Variable Entered | *r* (Entered) | *t* (Entered) | *p* (Entered) | Variable Excluded | *t* (Excluded) | *p* (Excluded)
---|---|---|---|---|---|---|---
L VLPFC | Retrieval fluency | .35 | 3.41 | .001 | Δaccuracy | .90 | .37
 | | | | | ΔRT | −.99 | .32
L DLPFC | Retrieval fluency | .26 | 2.49 | .015 | Δaccuracy | .83 | .41
 | | | | | ΔRT | −.89 | .38
R DLPFC | Retrieval fluency | .30 | 2.88 | .005 | Δaccuracy | .60 | .55
 | | | | | ΔRT | −1.84 | .07
R FG | Retrieval fluency | .30 | 2.92 | .004 | Δaccuracy | −1.3 | .90
 | | | | | ΔRT | 1.94 | .06
R hippocampus | Retrieval fluency | .31 | 2.99 | .004 | Δaccuracy | .65 | .52
 | | | | | ΔRT | −1.21 | .23
L angular gyrus | Retrieval fluency | .32 | 3.12 | .002 | Δaccuracy | 1.10 | .28
 | | | | | ΔRT | −.95 | .35
R angular gyrus | Retrieval fluency | .26 | 2.42 | .018 | Δaccuracy | −.04 | .97
 | | | | | ΔRT | −.08 | .94

#### Brain Activation in Relation to Performance

We conducted two additional analyses at the whole-brain level to further examine whether signal increases associated with greater retrieval fluency overlapped with individual differences in accuracy and RT. We first examined the relation between differential brain activation and differential accuracy across the two tasks; differential accuracy was defined as in the previous section. Although accuracy was correlated with increased activation in several brain areas (Supplementary Table S3), conjunction analysis did not reveal significant clusters of overlap between the neural correlates of accuracy and retrieval fluency even at liberal thresholds. We then examined the relation between differential brain activation and RT; differential RT was defined as in the previous section. RT was significantly correlated with decreased activation in several brain areas (Supplementary Table S4), and a conjunction analysis revealed a small overlap with retrieval fluency effects in the right anterior hippocampus that was detectable only at a liberal threshold (*p* < .01, cluster extent = 40 voxels; Supplementary Figure S4). No brain regions showed overlap between retrieval fluency, accuracy, and RT. These results suggest that the effects of retrieval fluency are largely independent of performance.

#### Brain Activation in Relation to Age, Grade, and IQ

To examine whether signal increases associated with greater retrieval use overlapped with individual differences in age, grade, or IQ, we conducted additional ANCOVAs at the whole-brain level using these variables as covariates. The right superior parietal lobule (SPL) was the only brain region that showed increased activation with age measured in months. This region did not overlap with any of the regions that showed retrieval fluency effects (Supplementary Table S5; Supplementary Figure S5). Between-grade differences were observed in the occipital cortex, LG, intraparietal sulcus, SPL, and left DLPFC (Supplementary Table S6). These grade-related effects were consistent with and overlapped with previously reported developmental changes (Rosenberg-Lee et al., 2011) but were distinct from regions that showed retrieval fluency effects. Finally, full-scale IQ was not associated with activation in any brain region. As a further validation, an additional analysis using retrieval fluency, age, and full-scale IQ as covariates yielded results similar to those from the analysis using only retrieval fluency as a covariate (Supplementary Table S7). Together, these findings suggest that retrieval fluency effects observed in our study are independent of age and IQ.

#### Connectivity of the Right Hippocampus Examined by PPI Analysis

A PPI analysis was used to examine functional circuits associated with hippocampal regions that showed significant correlations with retrieval fluency. A spherical ROI with a 6-mm radius in the right hippocampus was used as a seed for the analysis (Figure 4, top left). The PPI analysis revealed that the right hippocampus ROI had significantly greater arithmetic task-related connectivity with bilateral VLPFC, right DLPFC, FG and LG, bilateral insula, SFG, and ACC (height threshold of *p* < .01, extent threshold of *p* < .01, FWE corrected for multiple comparisons; Figure 4; Table 3).
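
The core idea of a PPI analysis, testing whether coupling with a seed region changes with task context, can be sketched in Python as below. This is a simplified illustration, not the pipeline used in the study: the task is assumed to be coded +1 for addition and −1 for control, the product is formed directly on the measured signals (implementations often form it on an estimated neural signal and reconvolve with the HRF, which is omitted here), and the function name `ppi_regressors` is hypothetical.

```python
def ppi_regressors(seed, task):
    """Build the three regressors of a simplified PPI design.

    seed: seed-region time series (e.g., mean signal of the hippocampal ROI)
    task: psychological variable per scan, assumed +1 (addition) / -1 (control)
    Returns (psych, physio, ppi), where ppi is the interaction term.
    """
    n = len(seed)
    mean_seed = sum(seed) / n
    physio = [v - mean_seed for v in seed]          # mean-centered seed signal
    psych = [float(v) for v in task]                # task main effect
    ppi = [p * q for p, q in zip(physio, psych)]    # psychophysiological interaction
    return psych, physio, ppi

# Hypothetical four-scan example: two addition scans, then two control scans
psych, physio, ppi = ppi_regressors([1.0, 2.0, 3.0, 4.0], [1, 1, -1, -1])
```

In a full analysis, each voxel's time series would be regressed on all three terms together; including the `psych` and `physio` main effects means that a significant weight on `ppi` reflects task-dependent coupling rather than shared task activation or overall correlation with the seed.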

Region | Size of Cluster (Voxels) | Peak *t* Score | Peak MNI *x* (mm) | Peak MNI *y* (mm) | Peak MNI *z* (mm)
---|---|---|---|---|---
L anterior cingulate gyrus | 920 | 4.41 | −2 | 32 | 0
L middle temporal gyrus | 188 | 4.04 | −62 | −60 | 4
R rostral frontal cortex | 323 | 3.94 | 50 | 36 | −6
L SFG | 1383 | 3.80 | 0 | 52 | 32
R cerebellum | 3275 | 3.63 | 8 | −44 | −14
L insular cortex/VLPFC | 1410 | 3.47 | −36 | −8 | −10
L supramarginal gyrus | 167 | 3.41 | −64 | −32 | 40
R DLPFC | 832 | 3.40 | 44 | 8 | 46
R angular gyrus | 155 | 3.38 | 64 | −52 | 20
R temporal pole | 504 | 3.26 | 48 | 10 | −10
L DLPFC | 147 | 2.94 | −38 | 14 | 52

#### MDS Modeling of Retrieval-specific Nodes

To examine the strength and direction of causal functional interactions between the hippocampus and PFC regions associated with retrieval fluency, we conducted an MDS analysis of causal interactions. We focused on three PFC regions that were consistently associated with both higher retrieval fluency and greater hippocampal PPI interactions—left DLPFC, right DLPFC, and left VLPFC. As depicted in Figure 5, MDS identified bidirectional links between all nodes (*p* < .01, Bonferroni corrected) except between the right DLPFC and the right hippocampus, which showed no net causal interactions. Comparison of the strength of causal influences revealed the following effects: (1) Causal influence from the left VLPFC to the right hippocampus was stronger than the influence from the left DLPFC to the right hippocampus (*p* < .05, Bonferroni corrected), (2) causal influence from the right hippocampus to the left DLPFC was stronger than the influence from the right hippocampus to the left VLPFC (*p* < .01, Bonferroni corrected), (3) causal influence from the left VLPFC to the right hippocampus was stronger than the influence in the reverse direction (*p* < .01, Bonferroni corrected), (4) causal influence from the right hippocampus to the left DLPFC was stronger than the influence in the reverse direction (*p* < .01, Bonferroni corrected), and (5) causal influence from the left DLPFC to the left VLPFC was stronger than the influence in the reverse direction (*p* < .05, Bonferroni corrected).

## DISCUSSION

### Examining Maturation of Retrieval Fluency in Development

The overall goal of our study was to examine the neural mechanisms underlying children's transition from counting-based procedures to retrieval-based problem solving, which largely occurs in the second and third grades. Importantly, cognitive developmental change should not be conflated with age-related change. To capture the mechanisms driving cognitive change, it is critical to focus on the age ranges in which the rate of change is the steepest. Merely examining age differences as in cross-sectional studies does not address this question because much of the change could occur between the ages of the different groups (e.g., comparing 7- and 12-year-olds will not be sensitive to change between the ages of 11 and 12 years). With respect to our study, younger children (first grade and below) will be strongly biased toward use of counting, whereas older children (fourth grade and above) will be strongly biased toward use of retrieval; contrasting younger and older groups may therefore miss the critical period when change is occurring rapidly and be less sensitive to the mechanisms contributing to such change. For the same reason, comparisons with adolescents and adults would be inappropriate. Critically, it cannot be assumed that learning in adults or contrasts between children and adults are comparable with learning in the developing brain (Karmiloff-Smith, 1981, 2010).

We have shown recently that developmental changes cannot be inferred from or characterized by a gross comparison between adults and children or by examining the effects of training on novel problems in adults (Cho et al., 2011). Indeed, most behavioral studies of cognition in children focus on developmental questions in children; comparisons with adults have rarely been the focus of these studies, as it is well understood that children's emerging memory and representational and logical thinking abilities are not analogous to those of a novice adult. Our approach to the study of cognitive development differs fundamentally from the dominant paradigm in brain imaging studies based on “child versus adult” comparisons. Rather, our neuroimaging approach is more consistent with modern theoretical and empirical work in cognitive development, advocated by Geary (Geary, 2006; Geary, 1994), Siegler (Siegler, 1996), Karmiloff-Smith (Karmiloff-Smith, 2010), and other prominent cognitive/developmental psychologists. In summary, we use a more appropriate developmental approach to gain novel insights into the role of memory and cognitive control processes involved in the development of arithmetic fact retrieval.

### Overview of Main Findings

Change in the mix of strategies used in problem solving is a cardinal feature of children's cognitive development (Siegler, 1996). Across all domains that have been studied, the strategy mix is dominated by the use of relatively slow and error-prone procedures early in development and of relatively fast and accurate memory-based processes later in development. One of the better documented domains in which children's gains in competence are reflected in this changing strategy mix is arithmetic, especially simple addition (Geary, 1994; Siegler & Shrager, 1984). The study of children's strategic approaches to solving addition problems not only informs us about their competence in this domain but provides insights into the mechanisms governing more general change in children's cognitive development.

We identified a system of brain regions engaged during children's use of memory retrieval to solve addition problems. Children whose strategy mix was dominated by memory retrieval had greater activity in the right hippocampus and bilateral DLPFC and VLPFC (Figure 3). Critically, brain activation in these regions was specifically related to individual differences in retrieval strategy use and showed minimal to no overlap with accuracy, problem-solving time, grade, age, or IQ (Supplementary Tables S2–S7; Supplementary Figures S4 and S5). Within the MTL and PFC, only the right hippocampus showed prominent overlap between retrieval fluency and performance. Specifically, activation in the right hippocampus was associated with both greater retrieval use and faster RTs, consistent with our behavioral findings that retrieval fluency was significantly correlated with faster RT. These results suggest that the right hippocampus plays a particularly important role in retrieval-mediated performance enhancements.

Effective connectivity analysis using PPI identified strong interactions of the hippocampus with bilateral DLPFC and VLPFC regions known to be involved in memory retrieval. Dynamic causal analysis revealed strong bidirectional interactions between the hippocampus and the left VLPFC and DLPFC. More specifically, the causal influence from the left VLPFC to the right hippocampus served as the main top–down component, whereas the influence of the right hippocampus to the left DLPFC served as the main bottom–up component of this retrieval network (Dove et al., 2006). These results provide novel insights into dynamic interactions in hippocampal–prefrontal circuits mediating arithmetic fact retrieval. We discuss the implication of these findings for the neural basis of arithmetic fact retrieval in children at an age important for the development of this skill.

### Retrieval Fluency and Behavior

The production task was used for the behavioral assessment because it is the most common and best validated method for assessing the mix of problem-solving strategies used by children to solve addition problems (Siegler, 1987). Consistent with previous findings (e.g., Geary et al., 2004; Siegler, 1987), the large sample of children assessed in this study represented considerable variation in the mix of strategies used to solve these problems, ranging from predominant use of procedural counting to a mix of strategies and to predominant use of retrieval (Supplementary Figure S3). Retrieval fluency was not related to grade level in this sample, indicating that it is not simply curriculum exposure driving the change in children's strategy mix (Geary, Bow-Thomas, Liu, & Siegler, 1996).

### Hippocampal Involvement in Children's Retrieval of Arithmetic Facts

The hippocampus is known to be critical for the formation of new declarative memory as well as flexible retrieval and use of these memories (Wang & Morris, 2010; Suzuki, 2007; Squire, Stark, & Clark, 2004). The potential role of the hippocampus in arithmetic fact retrieval has, however, been largely ignored in previous human brain imaging studies, presumably because most of them have focused on adults in whom the solution of simple arithmetic problems is highly automated and does not rely on the hippocampal system. In line with previous studies reporting stronger hippocampus activation for younger, compared with older, participants engaged in mental arithmetic (Rivera et al., 2005) and multivariate pattern differences between children whose dominant strategy was counting versus retrieval (Cho et al., 2011), this study provides definitive evidence for hippocampal involvement in young children's use of retrieval to solve addition problems. Greater engagement of the MTL in children for whom direct retrieval is a more dominant strategy suggests that retrieval and re-encoding mechanisms are actively involved during early periods of arithmetic learning. Cho and colleagues found that children who predominantly used retrieval or counting strategies but were otherwise matched on accuracy and response latencies differed significantly in multivariate activation patterns in very similar hippocampal regions as those identified here (Cho et al., 2011). They did not, however, differ in hippocampal activation levels. One possible reason for this difference is that the “counters” group assessed by Cho and colleagues had variable levels of retrieval use ranging from 0% to 30% (Cho et al., 2011), resulting in smaller between-group differences than would occur for groups that exclusively used counting or retrieval—such groups, however, are not likely to be found as nearly all children use at least two problem-solving strategies before retrieval is fully automated (Siegler, 1996). 
Our present results provide clear evidence that, in a sample that captures the full range of children's strategic variability, individual differences in retrieval fluency are associated with greater hippocampal responses.

Taken together with previous results (Rivera et al., 2005; Kawashima et al., 2004), our findings suggest that hippocampal engagement increases as children initially improve their retrieval fluency but gradually decreases as simple arithmetic problems become less effortful and more automated. We propose that the processes identified here are likely to be a crucial intermediate stage in complex developmental processes underlying memory consolidation to neocortical regions (Smith & Squire, 2009; Takashima et al., 2009; Frankland & Bontempi, 2005) and may serve as a critical neural mechanism for the well-documented behavioral changes in children's mix of problem-solving strategies across domains, not just in simple addition (Siegler, 1996).

### Increased PPC Activation with Retrieval Use

Retrieval fluency was also correlated with increased activation of the left and right angular gyrus. Lesion studies have consistently implicated the PPC as a critical region for numerical and mathematical information processing in adults (Nieder & Dehaene, 2009; Ansari, 2008; Menon et al., 2000). Specifically, in adults, greater use of retrieval has been associated with greater left angular gyrus response, whereas use of procedural strategies engaged more widespread regions in the bilateral IPS and the SPL (Grabner et al., 2009). Parallel studies of learning in which adults are trained on novel problem sets have also demonstrated differential response in the PPC, with reduced activation bilaterally in the intraparietal sulcus and greater responses in both the left and right angular gyrus for newly learned arithmetic facts (Ischebeck, Zamarian, Egger, Schocke, & Delazer, 2007; Delazer et al., 2005). Rivera and colleagues found that adults' arithmetic performance depends more on the anterior IPS and supramarginal gyrus than that of children (Rivera et al., 2005). Retrieval-related changes observed in 7- to 9-year-old children in this study are more posterior and bilateral to the locus of long-term changes identified by Rivera and colleagues. These results suggest that there may be multiple parietal loci undergoing developmental change at different points during the acquisition of strategic competence that eventually leads to memory consolidation. A complete analysis of developmental trajectories and their relation to retrieval, performance, and maturation will likely require longitudinal studies over a protracted period. 
The critical point here is that, like adults who show greater PPC activation compared with children (Kucian, von Aster, Loenneker, Dietrich, & Martin, 2008; Rivera et al., 2005; Kawashima et al., 2004), children who are more advanced in their transition toward reliance on memory-based problem solving also show greater PPC activations than other children. Taken together, these findings suggest that the PPC is a likely target of greater neocortical consolidation of arithmetic facts in children with high retrieval fluency.

### Both the VLPFC and DLPFC Contribute to Fluent Arithmetic Fact Retrieval in Children

An important finding of our study is that fluent arithmetic fact retrieval was associated with greater PFC response. Notably, linear increases with retrieval fluency were found in the VLPFC and DLPFC. No PFC regions showed decreases with retrieval fluency. Our findings suggest that PFC mechanisms contributing to controlled memory retrieval and inhibition of irrelevant information lead to better arithmetic fact retrieval, especially during early learning (Barrouillet & Lepine, 2005; Passolunghi & Siegel, 2004; McLean & Hitch, 1999). Consistent with this view, children who were highly proficient at retrieval showed greater left VLPFC activation than did proficient counters, even when both groups were closely matched on performance (both RT and accuracy; Cho et al., 2011). In this study, we found that, in addition to the left VLPFC, the left and right DLPFC also showed increased activation with higher retrieval fluency. This finding is consistent with behavioral studies suggesting that attentional resources can be important for mental arithmetic, even for simple, single-digit calculations, in typically developing children and adolescents, likely because of the need for goal-appropriate fact retrieval and inhibition of irrelevant facts (DeStefano & LeFevre, 2004; Kaufmann, Lochy, Drexler, & Semenza, 2004; Kaufmann, 2002; Logie, Gilhooly, & Wynn, 1994). We interpret these findings as reflecting greater involvement of PFC regions supporting cognitive control for memory processes as children's strategy mix shifts from predominantly counting to stable use of retrieval. On the other hand, children who predominantly rely on procedural strategies for arithmetic problem solving are thought to require working memory resources for the execution of serial procedures (Geary et al., 2004). Consistent with this idea, PFC responses related to mental arithmetic are known to decrease from childhood to adulthood (Rivera et al., 2005). 
However, we did not find decreases in activity in any subregion of PFC, perhaps because PFC responses related to sequential counting and working memory processes are more variable across children. Taken together, these observations indicate that, during this period of early learning in second and third grades, PFC regions supporting cognitive control may be engaged to a greater extent in children who are more advanced in their transition from counting to retrieval. Subsequently, when arithmetic facts become overpracticed and well learned after this transition period, cognitive requirements for controlled memory retrieval may be reduced, resulting in reduced demands on the VLPFC and DLPFC.

### Dynamic Interactions in Hippocampal–Prefrontal Circuits Involved in Arithmetic Fact Retrieval

Interactions between the MTL and PFC are thought to be important for declarative memory retrieval (Simons & Spiers, 2003), yet no previous studies have examined functional interactions underlying arithmetic fact retrieval in either children or adults. In this study, we present the first major steps in this direction. We identified functional circuits associated with the right hippocampus at the whole-brain level using PPI analysis of arithmetic task performance. The PPI analysis identified bilateral VLPFC and bilateral DLPFC as prominent loci of task-related connectivity with the right hippocampus. This result suggests that there is a strong functional interaction between multiple PFC regions and the hippocampus during arithmetic problem solving. Critically, these effects are independent of overall fluctuations in signal levels because PPI analysis regresses out regional responses and common task-related activations (Friston et al., 1997). Our findings are consistent with growing evidence that the VLPFC and DLPFC, together with the hippocampus, play an important role in memory retrieval (Badre & Wagner, 2007; Simons & Spiers, 2003; Dobbins, Foley, Schacter, & Wagner, 2002; Rugg, Fletcher, Chua, & Dolan, 1999). The relative contributions of dorsal versus ventral PFC regions are an important topic that warrants further investigation (Nagel, Schumacher, Goebel, & D'Esposito, 2008; Dobbins et al., 2002).

Although PPI analysis was useful for identifying VLPFC, DLPFC, and other brain areas that show greater task-related interactions with the hippocampus, it does not provide information about the direction and causal nature of interactions between these areas. Almost nothing is known about the causal interactions in hippocampal–prefrontal circuits involved in arithmetic fact retrieval and problem solving. To investigate this question, we identified ROIs in the right hippocampus and in three PFC regions: left VLPFC, left DLPFC, and right DLPFC. These PFC regions showed strong effects of retrieval fluency and were also found to have strong effective connectivity with the right hippocampus. Dynamic causal interactions between these nodes were investigated using MDS, which simultaneously estimates causal interactions between brain regions without having to compare multiple models (Ryali et al., 2011). MDS analysis revealed several interesting patterns of causal interactions between PFC and the hippocampus. First, MDS uncovered a strong top–down influence from the left VLPFC to the right hippocampus, consistent with previous studies in adults suggesting that this PFC region exerts goal-driven, selective retrieval of memory representations for words and pictures (Badre & Wagner, 2007; Badre, Poldrack, Paré-Blagoev, Insler, & Wagner, 2005). Second, significant causal hippocampal–prefrontal interactions were all lateralized to the left PFC. This left lateralization suggests that arithmetic fact retrieval and problem solving most likely engage verbal semantic processing during retrieval (Han, O'Connor, Eslick, & Dobbins, 2012; Whitney, Kirk, O'Sullivan, Lambon Ralph, & Jefferies, 2012; Snyder, Banich, & Munakata, 2011; Geary, 1993). There were direct, bidirectional, causal links between the right hippocampus and the left DLPFC, whereas the right DLPFC showed causal interactions with the left DLPFC but not directly with the right hippocampus. Third, in addition to the left VLPFC, the left DLPFC also exerted strong causal influences on the right hippocampus. Fourth, examining interactions in the reverse direction, we found that the right hippocampus exerted strong bottom–up influences on both the left DLPFC and the left VLPFC. On the basis of these findings, we suggest that the hippocampus likely signals the left DLPFC for postretrieval evaluation and monitoring of retrieved memory representations (Rossi et al., 2010; Simons & Spiers, 2003) and the left VLPFC for retrieval support when additional cognitive control is needed to overcome interference or competition (Nagel et al., 2008). Disambiguating the temporal dynamics of these causal influences on a trial-by-trial basis remains an important problem for future research in all cognitive domains involving semantic fact retrieval.

Taken together, these findings provide the first demonstration of dynamic causal hippocampal–prefrontal interactions involved in arithmetic fact retrieval in children. Our study draws attention to the essential role of controlled memory retrieval processes mediated by hippocampal–prefrontal interactions during a critical window of early learning characterized by shifts from reliance on counting to predominant use of fact retrieval associated with children's growing competence in arithmetic. The findings also have broader implications because the basic process of shifting from procedural- to retrieval-based problem solving has been found in all cognitive domains studied to date (Siegler, 1996). The implication is that the hippocampal–prefrontal system may be critical to children's cognitive development generally, not just addition fact retrieval.

### Limitations and Future Directions

Future studies would benefit from better matching of problems in the strategy assessment and the fMRI sessions. It would be a significant advance to be able to assess strategy use on a trial-by-trial basis during fMRI scanning to improve the correspondence between assessed strategy and measurements in the scanner. Another limitation of this study is the use of arithmetic production during strategy assessment versus verification during fMRI scanning. In a study of children's performance, Ashcraft and colleagues (1984) found evidence for similar approaches to solving production and verification problems, but some of the dynamics of memory retrieval may differ, at least in adults, depending on whether arithmetic production or verification is required (Campbell & Tarling, 1996). A recent study in adults has demonstrated the feasibility of using arithmetic production tasks during fMRI scanning (Andres, Pelgrims, Michaux, Olivier, & Pesenti, 2011), suggesting the possibility of resolving this limitation in future neurodevelopmental studies.

Although our study provides insights into causal interactions in hippocampal and PFC circuits involved in fact retrieval, further studies are needed to determine how causal interactions between multiple brain regions collectively relate to retrieval success and performance.

### Conclusion

Our study provides important and novel insights into the brain systems and their dynamic interactions underlying individual differences in the mix of strategies children use to solve addition problems (Geary, 1994; Siegler & Shrager, 1984). We found increased recruitment of the anterior and posterior hippocampus, posterior PHG, PPC, and PFC for children whose strategy mix is dominated by retrieval. Functional connectivity and dynamic causal modeling provided the most detailed information to date about the top–down and bottom–up mechanisms underlying the functioning of hippocampal–prefrontal circuits that mediate retrieval. These mechanisms are not only important for a deeper understanding of how the hippocampus and PFC contribute to the development of fluent retrieval and arithmetic problem solving but may provide broader insights into the brain mechanisms contributing to shifts in the mix of problem-solving approaches that are a cardinal feature of cognitive development in general. Our findings provide a more mechanistic understanding of brain systems involved in the development of arithmetic fact retrieval. Moreover, the novel combination of methods employed to demonstrate the effective and causal interactions in the hippocampal–prefrontal network we examined represents an important analytical advance in the integration of brain dynamics with the extant behavioral developmental literature. More broadly, the mechanisms identified here may be central to children's knowledge acquisition across content domains.

## Acknowledgments

We thank Tianwen Chen, Sarah Wu, Jose Anguiano, Maria Barth, Leeza Kondos, Kaustubh Supekar, Kevin Holmes, Katherine Keller, Mary Hoard, Lara Nugent, and Georg Matt for assistance with the study. This work was supported by the National Institutes of Health (R01-HD047520, R01-HD045914, and R37-HD045914) and the National Science Foundation (BCS/DRL-0449927).

Reprint requests should be sent to Vinod Menon, 401 Quarry Rd., Stanford University, Stanford, CA 94305-5719, or via e-mail: menon@stanford.edu.