To investigate potentially dissociable recognition memory responses in the hippocampus and perirhinal cortex, fMRI studies have often used confidence ratings as an index of memory strength. Confidence ratings, although correlated with memory strength, also reflect sources of variability, including task-irrelevant item effects and differences both within and across individuals in terms of applying decision criteria to separate weak from strong memories. We presented words one, two, or four times at study in each of two different conditions, focused and divided attention, and then conducted separate fMRI analyses of correct old responses on the basis of subjective confidence ratings or estimates from single- versus dual-process recognition memory models. Overall, the effect of focussing attention on spaced repetitions at study manifested as enhanced recognition memory performance. Confidence- versus model-based analyses revealed disparate patterns of hippocampal and perirhinal cortex activity at both study and test and both within and across hemispheres. The failure to observe equivalent patterns of activity indicates that fMRI signals associated with subjective confidence ratings reflect additional sources of variability. The results are consistent with predictions of single-process models of recognition memory.
Two theories presently compete to provide a compelling account of recognition memory effects. Single-process theories derived from signal detection theory propose that recognition memory represents the total strength of evidence that a studied item is old (the unequal variance signal detection [UVSD] model; Wixted, 2007; Donaldson, 1996). According to dual-process theories, recognition memory depends on the products of two different processes—retrieval of contextual information associated with a study episode (“recollection”) and simply knowing that an item has been encountered beforehand (“familiarity”) (e.g., Yonelinas, 2002; Mandler, 1980). These two processes can be implemented as distinct sources of information contributing to categorical or threshold-like decisions (the “process-pure” or dual-process signal detection (DPSD) approach; Diana, Yonelinas, & Ranganath, 2010; Yonelinas, 2002) or as sources of continuous information to be combined into a single strength-of-evidence dimension (the dual-process UVSD account; e.g., Wixted, 2007; Wixted & Stretch, 2004).
The process-pure DPSD account has often been used to constrain the design of fMRI studies, presupposing the validity of the approach (Wixted, 2009). Experimental designs involving comparisons of high confidence “old” or “remember” responses with correct rejections of new items or misses have been used to demonstrate “selective” activation of the hippocampal formation attributable to recollection and show perirhinal cortex activation associated with less confident or “know” responses attributable to familiarity (e.g., Diana, Yonelinas, & Ranganath, 2007; Brown & Aggleton, 2001; Eldridge, Knowlton, Furmanski, Bookheimer, & Engel, 2000). However, proponents of the dual-process version of the UVSD model consider high confidence (and remember) decisions to reflect high proportions of both recollection and familiarity and low confidence (and know) decisions to reflect low proportions of the same, interpreting them simply as strong and weak memories, respectively. Thus, rather than identifying qualitatively different memory processes in the hippocampus and perirhinal cortex, these earlier studies are viewed as involving a memory strength confound—a criticism equally applicable from the perspective of the single-process UVSD account (e.g., Wais, Squire, & Wixted, 2010; Squire, Wixted, & Clark, 2007; see also Gonsalves, Kahn, Curran, Norman, & Wagner, 2005).
An alternative design adopted by DPSD theorists in fMRI studies involves comparisons of either associative or source memory and item memory decisions. Successfully identifying an item as old and correctly identifying the source or associate is often attributed to recollection, whereas successful retrieval of item information alone is attributed to familiarity, although an alternative perspective attributes the latter to recollection of sometimes erroneous or irrelevant information (see Mitchell & Johnson, 2009). Again, the principal finding has been one of selective activation of the hippocampus for the former process (e.g., Eichenbaum, Yonelinas, & Ranganath, 2007; Mayes, Montaldi, & Migo, 2007; Kensinger & Schacter, 2006). Despite matching their conditions in terms of correct old responses, few studies accounted for the potential memory strength confound of confidence judgements being typically higher for old items that are accompanied by retrieval of correct source/associate information (see Squire et al., 2007). Recent studies attempting to control for memory strength in these designs have yielded different results with respect to familiarity signals in the hippocampus and perirhinal cortex (e.g., Diana et al., 2010; Wais et al., 2010; Kirwan, Wixted, & Squire, 2008; Staresina & Davachi, 2008).
Typically, the rationale given for using confidence ratings as an index of memory strength is that they are related to accuracy in old/new decisions, and it is often assumed that such ratings are equally spaced and linearly related to differences in memory strength (e.g., Rouder, Pratte, & Morey, 2010; Mickes, Wixted, & Wais, 2007). Yet item recognition memory receiver operating characteristic curves obtained with confidence ratings can differ from those obtained with Remember-Know (RK) decisions, a finding attributed to trial-to-trial variability in the application of decision criteria within participants as well as variability among participants (Malmberg & Xu, 2006; Rotello, Macmillan, & Reeder, 2004; Wixted & Stretch, 2004).¹ This variability or decision noise inherent in separating strong from weak memories can also influence the form of receiver operating characteristic curves, changing them from U-shaped to inverted U-shaped (e.g., Malmberg & Xu, 2006; Malmberg, 2002; Ratcliff, McKoon, & Tindall, 1994). Further, such variability may actually increase when participants are instructed to evenly distribute or use all available ratings (e.g., Yonelinas, Otten, Shaw, & Rugg, 2005). As Rouder et al. (2010) have noted, there may be no objective method for assessing the relative variability of memory strength distributions on the basis of confidence ratings. Comparisons of high and low confidence ratings can also elicit medial-temporal lobe (MTL) activity for unstudied (i.e., new) items in fMRI studies (Kirwan, Shrager, & Squire, 2009).
Task-irrelevant item differences can also be confounded with memory strength when participants are allowed to self-select items via ratings or RK decisions. For example, recognition memory strength for words is known to be correlated with a range of lexical variables, such as frequency, imageability, age of acquisition, neighborhood size, and so forth, and it is known that these variables can influence fMRI activity in the hippocampus (e.g., Freeman, Heathcote, Chalmers, & Hockley, 2010; Diana & Reder, 2006; Fliessbach, Weis, Klaver, Elger, & Weber, 2006; de Zubicaray, McMahon, Eastburn, Finnigan, & Humphreys, 2005a, 2005b; Malmberg, Holden, & Shiffrin, 2004). Thus, as low-frequency words are better recognized than high-frequency words, they are more likely to be given high confidence ratings, and high-frequency words are more likely to be given low confidence ratings. Words of high imageability are more likely to be given high confidence ratings. Any differences in fMRI signal, then, may also be a consequence of frequency or imageability differences when words are used as stimuli (e.g., Diana et al., 2010; Wais et al., 2010; Uncapher & Rugg, 2008; Yonelinas et al., 2005; Kensinger, Clarke, & Corkin, 2003). Given the sources of variability introduced by confidence ratings, it seems reasonable to conclude that the method is less than optimal for indexing memory strength.
An alternative to experimental designs that use subjectively derived responses to index memory strength involves manipulating memory strength signals directly in terms of predefined independent variables. To manipulate recognition memory strength in our fMRI experiment, we incremented the number of item presentations at study in two different conditions, focused and divided attention, to permit comparisons of identical old responses (hits) across conditions. Typically, item repetition at study strengthens memory representations, and divided attention produces an interference effect because of competition for central resources that results in poorer memory performance (Fernandes & Moscovitch, 2000; Craik, Govoni, Naveh-Benjamin, & Anderson, 1996; Crowder, 1976). The attentional resources devoted to the study of repeated spaced items are about the same as those devoted to the first presentation, because measures of secondary task performance remain relatively stable across presentations (only decreasing with massed repetitions, e.g., Johnston & Uhl, 1976; also Guez & Naveh-Benjamin, 2006).
Process-pure DPSD theorists consider divided attention at study to reduce recollection to a greater extent than familiarity (Uncapher & Rugg, 2005, 2008; Eichenbaum et al., 2007; Kensinger et al., 2003; Yonelinas, 2001, 2002). Previous fMRI studies conducted from a DPSD perspective therefore contrasted correct old responses in focused versus divided attention conditions at study, revealing increased activity in the hippocampus during word encoding that the authors collectively attributed to recollection (e.g., Uncapher & Rugg, 2005, 2008; Kensinger et al., 2003). Of note, one study in which participants self-selected studied items via RK decisions revealed posterior hippocampal activity (Uncapher & Rugg, 2008; recollected vs. missed items), whereas those comparing undifferentiated hits across conditions reported more anterior hippocampal activity (e.g., Uncapher & Rugg, 2005; Kensinger et al., 2003). The results of the latter two investigations may also be interpreted from a UVSD perspective as simply reflecting differences in memory strength between the study conditions.
The effect of item repetition is less clear-cut from the DPSD model perspective. For example, behavioral evidence indicates that repeating spaced items increases estimates of both recollection and familiarity, with a slightly larger effect for the former process (for a review, see Yonelinas, 2002). Yet DPSD proposals based around perirhinal cortex and hippocampal neuronal responses view item repetition as having the opposite effect, influencing familiarity to a greater extent than recollection (e.g., Eichenbaum et al., 2007, Table 1). The UVSD account instead predicts monotonic, although not necessarily linear, responses (e.g., Dunn, 2004, 2008; Squire et al., 2007; Wixted, 2007; Wixted & Stretch, 2004).
In the present study, we investigated the effects of allowing participants to self-select studied items via confidence ratings on fMRI activity in the hippocampus and perirhinal cortex, contrasting the effects with those obtained from analyses informed by both UVSD and DPSD model estimates. If the patterns of activity differ between analysis types at study and test, then this can be considered evidence that previous attempts to dissociate familiarity and recollection signals in the hippocampus and perirhinal cortex likely reflect variability introduced by the application of different decision criteria, item effects, or other task-irrelevant confounds.
Sixteen healthy, right-handed, native English-speaking volunteers (8 women) of mean age 25.6 years (SD = 7.0 years) were recruited from the university community. All gave written informed consent before participating in accordance with the protocol approved by the Medical Research Ethics Committee of the University of Queensland. They were reimbursed for participating.
The critical stimuli comprised 240 high-frequency monosyllabic words, ranging in length from four to six letters. These were assigned randomly to five study-test lists. Each study list consisted of 24 words, and each test list consisted of the 24 studied items and 24 novel, unstudied words. The practice stimuli comprised another 12 high-frequency monosyllabic words ranging in length from four to five letters (6 studied items and 6 novel items).
Five study-test phases were conducted, each separated by a brief (25-sec) retention interval. Instructions as to the nature of each phase were given at the start of the block with “learn” or “remember” appearing for 6 sec. Each study phase consisted of 24 words presented in uppercase 48-point font in the middle of the screen, one, two, or four times for a total of 56 trials. Study words were presented for 3 sec followed by a blank screen, with an SOA of 6 sec. Each word was preceded by a central fixation cross for 0.5 sec, followed by a blank screen for 0.5 sec. Participants were instructed to remember the words for a subsequent memory test.
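The reported total of 56 study trials per phase can be checked with a short calculation. The sketch below assumes the 24 words were divided equally among the three repetition levels (eight words each). That split is an assumption: the text gives only the total, and other splits would also sum to 56.

```python
# Hypothetical reconstruction of the trial count for one study phase.
PRESENTATION_LEVELS = (1, 2, 4)   # repetitions per word, from the text
WORDS_PER_LIST = 24               # from the text
SOA_SEC = 6.0                     # study-phase SOA, from the text

# Assumption: words are split equally across the repetition levels.
words_per_level = WORDS_PER_LIST // len(PRESENTATION_LEVELS)

trials = sum(words_per_level * reps for reps in PRESENTATION_LEVELS)
duration_sec = trials * SOA_SEC

print(trials)        # 56, matching the reported total
print(duration_sec)  # 336.0 seconds of study trials per phase
```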
Study words were presented either in isolation as above (focused attention condition) or as part of a divided attention task (adapted from Zeithamova & Maddox, 2006). In the divided attention task condition, a study word was presented concurrently with two flanking digits presented to the left and right of the word for 250 msec. The two digits were always different from each other and did not include zero, with one digit, chosen at random, being small (font size = 40 points) and the other large (font size = 80 points). The study word remained in view until the end of 3 sec, after which the question “size?” or “value?” appeared randomly, requiring the participant to indicate which digit (left or right) was physically larger or had the greater numerical value, respectively. Participants responded with a button press to indicate their choice using their right hand.
Each test phase comprised the 24 studied words and the 24 new (i.e., unstudied) words presented in pseudorandom order. To minimize study-test repetition lag variability, all studied words were presented in the same third of the test list as at study. Each test word was presented for 2.5 sec with an SOA of 6.5 sec. Participants were instructed to withhold their response during this period (i.e., until the word disappeared from the screen). Next the categories “certainly new,” “probably new,” “probably old,” and “certainly old” were presented together, in a cross formation around the center of the screen, for up to 2 sec. This served as a prompt to respond and to indicate which button should be pressed for a given response. Participants responded by pressing one of four buttons corresponding to their decision on a similarly arranged response pad using their right hand. They were instructed to adopt response criteria that enabled them to use each of the categories more or less equally. The selected label changed color to red for 0.2 sec to provide response feedback, and a blank screen was presented for the remainder of the 2-sec period.
Before scanning, participants completed a brief practice session comprising focused and divided attention study and test phases (six studied items and six novel items). Participants were given feedback concerning their accuracy at the end of each phase.
A laptop PC running Microsoft Visual Basic and ExacTicks (Ryle Design) software was used to deliver the word stimuli and to record responses from an MR-compatible four-button response box. Stimuli were presented in black on a luminous white background, enlarged and back-projected using a BenQ SL705X projector onto a screen that the participants viewed through a mirror mounted on the head coil. The stimuli subtended approximately 10° of visual arc when each participant was positioned for imaging.
Participants were imaged with a Bruker Medspec system operating at 4 T using a transverse electromagnetic head coil for radio-frequency transmission and reception (Vaughan et al., 2002). A gradient-echo EPI sequence optimized for both image quality and noise reduction (McMahon, Pringle, Eastburn, & Maillet, 2004) was used to acquire T2*-weighted images depicting BOLD contrast (64 × 64 matrix, 3.6 × 3.6-mm voxels). In each of five consecutive fMRI sessions, 330 image volumes of 36 axial 3.5-mm slices (0.1-mm gap) were acquired (repetition time = 2.1 sec, echo time = 30 msec, flip angle = 90°), for a total of 1650 images. The first five volumes from each session were discarded. Head movement was limited by foam padding within the head coil. A point-spread function (PSF) mapping sequence was acquired before the EPI time series acquisitions to correct geometric distortions (Zaitsev, Hennig, & Speck, 2003). A three-dimensional T1-weighted image was acquired using a magnetization-prepared rapid acquisition gradient-echo sequence (256³ matrix, 0.9-mm³ voxels) before the fourth fMRI session. Total imaging time was approximately 45 min.
Image preprocessing and analysis were conducted with statistical parametric mapping software (SPM8; Wellcome Department of Imaging Neuroscience, Queen Square, London). All volumes from the five study-test sessions were resampled using generalized interpolation to the acquisition of the middle slice in time to correct for the interleaved acquisition sequence, then realigned to the first volume of the initial session using the INRIAlign toolbox (Freire, Roche, & Mangin, 2002). A mean image was generated from the realigned series and coregistered to the T1-weighted image. The T1-weighted image was subsequently segmented using the “New Segment” procedure in SPM8. The “DARTEL” toolbox (Ashburner, 2007) was then used to create a custom group template from the gray and white matter images and individual flow fields that were used to normalize the realigned fMRI volumes to the MNI atlas T1 template. The resulting images were resampled to 2-mm³ voxels and smoothed with an 8-mm FWHM isotropic Gaussian kernel. Global signal effects were then estimated and removed using a voxel-level linear model (Macey, Macey, Kumar, & Harper, 2004).
We conducted a two-stage, mixed effects model statistical analysis. For both study and test phases, trial types corresponding to correct old responses and misses were defined for each of the divided and focused attention conditions for each presentation (1, 2, or 4). Correct rejections and false alarms were also defined as trial types for the test phase. These were modeled as effects of interest with delta functions representing each onset, along with a nuisance regressor consisting of response onsets, and convolved with a synthetic hemodynamic response function and accompanying temporal and dispersion derivatives. Standard high-pass (1/128 Hz) filtering and low-pass filtering with an autoregressive (AR1) model were applied. Linear contrasts were applied to each participant's parameter estimates at the fixed effects level, for correct old responses in the focused versus divided attention conditions for each presentation, then entered in a group-level repeated measures ANOVA in which covariance components were estimated using a restricted maximum likelihood procedure to correct for nonsphericity (Friston et al., 2002).
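To illustrate how onset delta functions become a regressor, the following minimal sketch convolves a stick function with a double-gamma response and samples it at the repetition time of 2.1 sec. The response shape (gamma shapes 6 and 16, undershoot ratio 1/6, 32-sec kernel) and the 0.1-sec grid are assumptions chosen for illustration; the text does not report these parameters. The temporal and dispersion derivatives and the filtering steps are omitted, and all function names are hypothetical.

```python
import math

def gamma_pdf(t, shape):
    """Gamma density with unit scale; zero for t <= 0."""
    if t <= 0:
        return 0.0
    return math.exp((shape - 1) * math.log(t) - t - math.lgamma(shape))

def canonical_hrf(t):
    """Assumed double-gamma response: early peak minus a late undershoot."""
    return gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0

def build_regressor(onsets, n_scans, tr, dt=0.1):
    """Convolve delta functions at `onsets` (sec) with the HRF, sampled every TR."""
    n_fine = int(round(n_scans * tr / dt))
    sticks = [0.0] * n_fine
    for onset in onsets:
        idx = int(round(onset / dt))
        if 0 <= idx < n_fine:
            sticks[idx] += 1.0
    kernel = [canonical_hrf(k * dt) for k in range(int(round(32.0 / dt)))]
    fine = [0.0] * n_fine
    for i, s in enumerate(sticks):
        if s:
            for k, h in enumerate(kernel):
                if i + k < n_fine:
                    fine[i + k] += s * h
    # Sample the fine-grained signal at the start of each scan.
    return [fine[min(int(round(scan * tr / dt)), n_fine - 1)]
            for scan in range(n_scans)]

# One hypothetical trial at time zero, 20 scans at TR = 2.1 sec.
regressor = build_regressor([0.0], n_scans=20, tr=2.1)
```

In a full model, one such regressor would be built per trial type and entered in the design matrix alongside its derivative terms.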
A priori ROIs for the hippocampus and the perirhinal cortex were defined in each hemisphere as explicit masks for all analyses using labeled probabilistic maps from the atlases provided by Eickhoff et al. (2005) and Holdstock, Hocking, Notley, Devlin, and Price (2009), respectively. A height threshold of p < .005 was adopted following previous studies (e.g., Diana et al., 2010) in conjunction with a cluster threshold of p < .05 estimated for each ROI using a Monte Carlo estimation procedure with 10,000 simulations (alphasim, implemented in Analysis of Functional NeuroImages toolkit, AFNI; National Institute of Mental Health, Bethesda, MD).
Overall, the participants demonstrated excellent recognition memory performance (“certainly old” mean hit rate = 0.75; “probably old” mean hit rate = 0.11), with hit rates approaching ceiling in the focused attention condition following four item presentations. The mean false alarm rate was 0.16. We fit both UVSD (see Dunn, 2004) and DPSD (Yonelinas, 1994) models to each participant's full set of responses (across four response categories) separately using maximum likelihood estimation to estimate contributions of either a single strength-of-evidence dimension or familiarity and threshold-like recollection processes in the different conditions (see Figure 1). One participant's data could not be fit by either model because of an insufficient number of “probable” responses, whereas another was excluded because of issues with their imaging data (see next section). Hence, the analyses presented here are from the 14 remaining participants. A repeated measures ANOVA on the memory sensitivity (d_a) values² derived from the UVSD model, with study condition and item presentations as within-subject variables, revealed main effects of both Attention, F(1, 15) = 7.87, MSE = 2.42, p < .05, and Presentations, F(1, 14) = 8.46, MSE = 2.28, p < .005. The interaction was not significant, F(1, 14) = 2.19, MSE = 1.82, p > .05. Inspection of Figure 1A shows that the d_a values increase as the number of study presentations increases in each attention condition, being higher in the focused attention condition overall. A similar ANOVA on the variance parameter estimates (s) failed to reveal main effects of Attention, F(1, 15) = 0.04, MSE = 1.19, p > .05, or Presentations, F(1, 14) = 1.01, MSE = 1.56, p > .05, although the interaction was significant, F(1, 14) = 5.29, MSE = 0.76, p < .05 (Figure 1C).
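The two models can be made concrete with a minimal sketch of the response probabilities each predicts for four confidence categories, together with the multinomial log-likelihood that a fitting routine would maximize. The parameterization is a common textbook form and is assumed here, not taken from the fitting code used in the study: new items are distributed N(0, 1), old items N(mu, sigma) under UVSD, and DPSD adds a recollection probability that always yields the highest-confidence response. All function names, and the example criteria and counts, are hypothetical.

```python
import math

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def to_categories(cumulative):
    """Turn P(strength > c_k), for ascending criteria, into category
    probabilities ordered from 'certainly new' to 'certainly old'."""
    bounds = [1.0] + list(cumulative) + [0.0]
    return [bounds[i] - bounds[i + 1] for i in range(len(bounds) - 1)]

def uvsd_probs(mu, sigma, criteria):
    """Unequal variance signal detection: old ~ N(mu, sigma), new ~ N(0, 1)."""
    old = to_categories([phi((mu - c) / sigma) for c in criteria])
    new = to_categories([phi(-c) for c in criteria])
    return old, new

def dpsd_probs(recollection, d_prime, criteria):
    """Dual process: threshold recollection plus equal-variance familiarity."""
    old = to_categories([recollection + (1.0 - recollection) * phi(d_prime - c)
                         for c in criteria])
    new = to_categories([phi(-c) for c in criteria])
    return old, new

def d_a(mu, sigma):
    """Sensitivity index that adjusts for unequal variances."""
    return mu * math.sqrt(2.0 / (1.0 + sigma ** 2))

def log_likelihood(counts_old, counts_new, probs_old, probs_new):
    """Multinomial log-likelihood of observed rating counts (constant omitted)."""
    eps = 1e-12  # guards against log(0)
    pairs = list(zip(counts_old, probs_old)) + list(zip(counts_new, probs_new))
    return sum(n * math.log(max(p, eps)) for n, p in pairs)

# Hypothetical criteria and rating counts for one participant and condition.
criteria = [-0.5, 0.5, 1.5]
old_p, new_p = uvsd_probs(mu=1.5, sigma=1.25, criteria=criteria)
ll = log_likelihood([2, 3, 4, 15], [12, 8, 3, 1], old_p, new_p)
```

Maximum likelihood estimation would then search over the model parameters and criteria for the values that maximize `log_likelihood`, separately for each model.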
We conducted a similar ANOVA on the DPSD model estimates of recollection (Figure 1E). This likewise revealed main effects of both Attention, F(1, 15) = 6.42, MSE = 0.05, p < .05, and Presentations, F(1, 14) = 35.97, MSE = 0.01, p < .001. The interaction was again not significant, F(1, 14) = 0.86, MSE = 0.02, p > .05. Inspection of Figure 1E shows that the estimates of recollection in each attention condition increase as the number of study presentations increases, with recollection being higher in the focused attention condition overall. This is consistent with previous behavioral data on item repetition and recollection estimates (see Yonelinas, 2002). However, it is not predicted by more recent dual-process cognitive neuroscience proposals distinguishing between perirhinal cortex (familiarity) and hippocampus (recollection) responses, in which the latter are considered relatively insensitive to item repetition (e.g., Eichenbaum et al., 2007). An ANOVA on the DPSD model estimates of familiarity revealed a main effect of Attention, F(1, 15) = 10.23, MSE = 0.14, p < .01, although not of Presentations, F(1, 14) = 1.68, MSE = 0.21, p > .05. The interaction was significant, F(1, 14) = 8.88, MSE = 0.13, p < .005 (Figure 1G). Again, this pattern is not predicted by more recent dual-process cognitive neuroscience proposals in which perirhinal cortex (familiarity) responses are considered more sensitive to item repetition (e.g., Eichenbaum et al., 2007; Brown & Aggleton, 2001).
Although both the UVSD sensitivity and the DPSD model recollection estimates showed a similar pattern of increasing across conditions and presentations, analyses of the differences between the respective model estimates for focused and divided attention conditions revealed a quite different pattern of results (Figure 1). This was accomplished for each model via the regression analysis for repeated measures data recommended by Lorch and Myers (1990, Method 3, p. 153).³ A linear regression analysis was computed for each individual participant's model estimate difference, with number of study presentations as predictor variable. In a final step, a t test was performed to test whether the regression weights of the group differed significantly from zero. For the UVSD model sensitivity difference estimates, the strength coefficient approached significance, t(13) = 2.01, SE = 0.23, p = .065, with the difference in memory sensitivity between focused and divided attention conditions showing a numerical increase across study presentations (Figure 1B). A significant linear effect was found for the difference in variance parameter estimates, t(13) = 2.78, SE = 0.17, p < .05 (Figure 1D). By contrast, the difference in recollection estimates between attention conditions remained relatively stable across study presentations according to the DPSD model, t(13) = 0.63, SE = 0.02, p = .54 (Figure 1F), whereas the difference in familiarity estimates between attention conditions showed a significant linear reduction, t(13) = −3.36, SE = 0.08, p < .01 (Figure 1H).
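The two-step regression procedure described here can be sketched as follows: an ordinary least-squares slope is computed for each participant with number of presentations (1, 2, 4) as the predictor, and the resulting slopes are tested against zero with a one-sample t test. This is a minimal illustration under that reading of the method. The function names and the data values are hypothetical and are not the study's estimates.

```python
import math
from statistics import mean, stdev

def ols_slope(x, y):
    """Least-squares slope of y regressed on x for one participant."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def slope_t_test(x, y_by_participant):
    """One-sample t test of the per-participant slopes against zero."""
    slopes = [ols_slope(x, y) for y in y_by_participant]
    n = len(slopes)
    se = stdev(slopes) / math.sqrt(n)  # standard error of the mean slope
    return {"mean_slope": mean(slopes), "se": se,
            "t": mean(slopes) / se, "df": n - 1}

presentations = [1, 2, 4]
# Hypothetical focused-minus-divided differences, one row per participant.
differences = [
    [0.2, 0.4, 0.8],
    [0.4, 0.8, 1.6],
    [0.3, 0.6, 1.2],
]
result = slope_t_test(presentations, differences)
```

With 14 participants the test has 13 degrees of freedom, which matches the t(13) values reported for the model estimates.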
fMRI Analyses: Study Phase
The imaging data from one participant were excluded because of problems encountered during the T1 image segmentation step. Analyses were thus conducted on the data from the remaining 15 participants and were designed to compare fMRI signals correlated with subjective estimates of memory strength derived from the confidence ratings and the more objective UVSD and DPSD model estimates.
We first conducted an analysis on only the high confidence (“certainly old”) old responses, contrasting them with responses to studied items that were not recognized (i.e., misses) in each condition (focused versus divided attention). This “subsequent memory” analysis is analogous to those performed to identify recollection-related activity in previous studies of focused and divided attention conducted from a DPSD perspective (e.g., Uncapher & Rugg, 2008). As the analysis is restricted to high confidence responses, it does not include a memory strength confound when comparing attention conditions proposed to differ in terms of recollection (cf. Wais et al., 2010). An ANOVA revealed significant activity in both left posterior and right middle hippocampus (peak −22, −22, −18, Z = 4.40, 278 voxels; peak 24, −10, −20, Z = 4.40, 203 voxels, respectively) and right perirhinal cortex (peak 30, 8, −22, Z = 3.23, 42 voxels). Of note, all three peaks evidenced an increase in activity in the focused condition and a reduction in the divided attention condition (see Figure 2A–C). No significant activity was observed in perirhinal cortex in the left hemisphere.
We next measured subsequent memory strength activity by directly contrasting correct old responses (hits) in focused versus divided attention conditions at study at each level of presentation (i.e., once, twice, or four times). According to the UVSD model sensitivity estimates, this analysis should reveal a pattern of increasing responses (Figure 1B). According to the DPSD model recollection estimates (Figure 1F), the analysis should reveal little or no difference across study presentations in hippocampal activity. The DPSD model familiarity estimates (Figure 1H) additionally predict a linear decrease in responses selectively in the perirhinal cortex. An ANOVA revealed significant activity in the left middle hippocampus, corresponding to the main effect of study presentations (peak voxel −22, −26, −20, Z = 3.36, 39 voxels; see Figure 2D). No significant activity was observed in the right hippocampus or in the perirhinal cortex in either hemisphere.
Because the ANOVA makes no assumption about the shape of the response across item presentations and may be significant if only one of the item presentation condition means differs from the others, we regressed the left middle hippocampal memory activity on the number of study presentations and tested the reliability of the response. This was accomplished via the regression analysis for repeated measures data recommended by Lorch and Myers (1990, Method 3, p. 153). This involved first extracting, for all participants, the beta values for focused versus divided attention contrasts at each study presentation from the peak voxel identified earlier. In a second step, a linear regression analysis was computed for each individual participant with number of study presentations as predictor variable. In a final step, a t test was performed to test whether the regression weights of the group differed significantly from zero. A significant positive linear relationship can be observed in Figure 2C, t(14) = 2.84, SE = 1.52, p < .05.
fMRI Analyses: Recognition Phase
For the test phase data, we again conducted separate analyses for the subjective confidence- versus model-based contrasts. The first analysis was conducted on the correct “certainly old” responses across study presentations, contrasting high confidence activity (or recollection) in focused and divided attention conditions at test with activity associated with misses (e.g., Uncapher & Rugg, 2008). An ANOVA failed to detect any significant activity in either the hippocampal or the perirhinal cortex ROIs across hemispheres. We next examined the pattern of activity separately for each condition directly using t contrasts. The right posterior hippocampus showed significantly greater activity for high confidence items studied under divided attention compared with missed items (peak 14, −34, 0, Z = 3.87, 41 voxels), whereas high confidence items studied with focused attention compared with missed items showed a nonsignificant trend toward greater activity in the identical peak (Z = 3.13, 11 voxels, p > .05 cluster thresholded) (see Figure 3A). No significant activity was observed in the left hippocampus or in the perirhinal cortex in either hemisphere for each contrast.
The above analyses using subjective confidence as a measure of recollection or memory strength indicate that hippocampal activity was relatively insensitive to the manipulation of attention. We next examined patterns of activity by again contrasting focused versus divided attention conditions using all items/trials with correct old responses (hits) at each presentation. As per the study phase data, the UVSD model sensitivity estimates predict a pattern of increasing responses (Figure 1B). However, the DPSD model recollection estimates (Figure 1F) predict little or no difference across study presentations in hippocampal activity, and a pattern of decreasing responses in the perirhinal cortex associated with the familiarity estimates (Figure 1H). An ANOVA revealed significant activity in the left posterior hippocampus (peak voxel −22, −34, −10, Z = 3.26, 44 voxels) and perirhinal cortex ROIs (peak voxel −38, −12, −28, Z = 3.33, 41 voxels), corresponding to the main effect of study Presentations (see Figure 3). No significant activity was observed in the right hemisphere in either ROI.
Next, we performed regression analyses for repeated measures data for the identified left hippocampal and perirhinal memory strength responses per the method applied for the study phase data discussed in the previous paragraph (Lorch & Myers, 1990). For the left hippocampal response, a significant positive linear relationship can be seen in Figure 3B, t(14) = 3.72, SE = 4.29, p < .005. A significant positive linear relationship can also be observed in the left perirhinal cortex, t(14) = 3.54, SE = 3.18, p < .005 (Figure 3C).
Finally, we compared the coefficients and slopes of the hippocampal and perirhinal memory strength responses to determine whether they differed significantly. The two coefficients did not differ significantly according to Steiger's Z test for correlated coefficients, Z = 0.45, p > .05 (Meng, Rosenthal, & Rubin, 1992), indicating both regions accounted for a similar amount of variance in terms of memory strength responses. Similarly, a t test comparing the two slopes indicated they did not differ significantly (t = 1.39, p > .05).
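For readers unfamiliar with the test for correlated coefficients, the sketch below implements the Z statistic as it is commonly stated from Meng, Rosenthal, and Rubin (1992). It compares two correlations that share a common variable, using their Fisher z transforms and the correlation between the two competing predictors. Which correlations were entered in the present analysis is not specified in the text, so the inputs shown are hypothetical placeholders, as is the function name.

```python
import math

def meng_z(r1, r2, r12, n):
    """Z test for two dependent correlations sharing one variable.

    r1, r2 : correlations of each measure with the common variable
    r12    : correlation between the two measures
    n      : number of observations
    Returns the Z statistic and its two-tailed p value.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)      # Fisher z transforms
    r_sq_bar = (r1 ** 2 + r2 ** 2) / 2.0
    f = min((1.0 - r12) / (2.0 * (1.0 - r_sq_bar)), 1.0)  # f is capped at 1
    h = (1.0 - f * r_sq_bar) / (1.0 - r_sq_bar)
    z = (z1 - z2) * math.sqrt((n - 3) / (2.0 * (1.0 - r12) * h))
    p = math.erfc(abs(z) / math.sqrt(2.0))       # two-tailed normal p value
    return z, p

# Hypothetical correlations for two regions measured in 15 participants.
z_stat, p_value = meng_z(r1=0.70, r2=0.65, r12=0.50, n=15)
```

A small Z with p > .05, as reported above, indicates that the two coefficients cannot be distinguished at the sample size available.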
The present experiment investigated whether hippocampal and perirhinal cortex recognition memory fMRI signals associated with subjective confidence ratings differ from those related to UVSD and DPSD model estimates. We manipulated memory strength directly in a word recognition task by incrementing item presentations across focused and divided attention conditions at study, finding qualitatively different patterns of activity in the hippocampus both at study and at test for the two analysis types. A similar relationship with memory strength was observed for the perirhinal cortex, although only at test. Overall, our results are consistent with predictions from single-process UVSD recognition memory models and indicate that recognition memory fMRI measures derived from confidence ratings, while correlated with memory strength, are also likely to reflect unwanted item or decision-related confounds.
Consistent with prior behavioral research, the effect of focussing attention on repeated spaced items at study was to increase recognition memory performance across presentations (e.g., Guez & Naveh-Benjamin, 2006; Johnston & Uhl, 1976; Homa & Fish, 1975). The UVSD model estimate of sensitivity and the DPSD model estimate of recollection both showed enhanced effects of focussing attention and repeating items at study, confirming the experimental manipulation of memory strength. The relatively stronger effect of repeating items on the DPSD model's estimate of recollection compared with familiarity is also consistent with prior work (e.g., Yonelinas, 2002). However, the estimates from the two models diverged when focused and divided attention conditions were contrasted directly, with the UVSD model's strength estimate increasing across study presentations, whereas the DPSD model's recollection estimate remained relatively stable and its familiarity estimate decreased. We therefore used these opposing estimates to inform our fMRI analyses. Other recent studies have similarly used the DPSD model to inform their fMRI analyses, although they did not derive estimates from the UVSD model for comparison (e.g., Diana et al., 2010).
At both study and test, the hippocampus showed different (i.e., spatially distributed) responses according to whether signals were associated with subjective confidence or contrasts informed by UVSD and DPSD model estimates. At study, the posterior hippocampus showed activity bilaterally when high confidence responses were contrasted with misses between focused and divided attention conditions, consistent with the findings of a recent fMRI study that used a similar subjective response-based contrast (Uncapher & Rugg, 2008). Of note, activity across hemispheres was increased in the focused condition and reduced in the divided attention condition. At test, the right posterior hippocampus also showed activity associated with high confidence responses relative to misses, although this did not differ between attention conditions. This latter result contrasts with the significant main effect of attention shown by the DPSD model recollection estimates and might indicate that fMRI measures of recollection on the basis of confidence ratings are not necessarily equivalent to model estimates. Moreover, it might indicate that activity associated with confidence ratings at study is not necessarily related to that observed at test, an issue not considered by previous studies (e.g., Uncapher & Rugg, 2008). Alternatively, the result may be considered consistent with that of Wais et al. (2010), who recently failed to find differential hippocampal activity for a contrast of high confidence responses with correct and incorrect source attributions, interpreted as supporting a memory strength account. According to the DPSD approach, contrasts of focused versus divided attention and correct versus incorrect source attribution conditions should be analogous in terms of identifying recollection (e.g., Eichenbaum et al., 2007; Yonelinas, 2002).
Conversely, the analysis informed by the model estimates demonstrated that BOLD signal differences between conditions increased across item presentations at both study and test in the middle and posterior hippocampus, respectively, and solely in the left hemisphere. This finding, predicted by the UVSD model, is consistent with the operation of a total strength-of-evidence variable (e.g., Wixted, 2007; Dunn, 2004) and is contrary to the DPSD model prediction of a relatively stable response for recollection across item presentations. However, as the DPSD model recollection estimate essentially predicted a null effect, we cannot exclude the possibility that the lack of significant activity observed in adjacent hippocampal voxels in fact reflects the operation of recollection. For the same reason, we cannot exclude the possibility that the lack of significant activity observed in adjacent perirhinal cortex voxels (see succeeding paragraphs) likewise reflects the operation of recollection, although the DPSD account views the latter region as being selectively associated with familiarity (e.g., Eichenbaum et al., 2007; Brown & Aggleton, 2001). Nevertheless, the fact that the confidence- and model-based analyses produced different hippocampal peak responses both within and across hemispheres indicates that the two approaches to measuring memory strength are not equivalent. We propose that this lack of equivalence likely reflects the item and decision process confounds that we highlighted in the Introduction.
We were unable to detect any fMRI responses in the perirhinal cortex at study that had a significant relationship with either model-based analysis, although the confidence-based analysis did show significant activity in the right hemisphere between attention conditions. Several fMRI studies using confidence ratings have reported perirhinal cortex activation during study related to successful item recognition in the context of item versus source/associate recognition memory tasks, and this has been proposed to reflect processes that might contribute to later recollection, such as binding of related item features (e.g., item-color associations; Staresina & Davachi, 2008; Ranganath et al., 2003) or source information encoded as an item detail (Diana et al., 2010). Hence, this result might be considered consistent with the DPSD model view, as contrasts of focused versus divided attention and correct versus incorrect source attribution conditions should produce similar results in terms of recollection (e.g., Eichenbaum et al., 2007; Yonelinas, 2002). In the model-based analyses, the DPSD model estimate of familiarity predicted a reduced response across item presentations between attention conditions, whereas the UVSD model estimate predicted the opposite relationship. Neither pattern was found in perirhinal cortex at study. Similarly, investigations of item memory comparing conditions varying in strength have typically not reported perirhinal cortex activity at study for successful word recognition, although they have reported hippocampal activation (e.g., low- vs. high-frequency words, de Zubicaray et al., 2005a; easy vs. hard divided attention, Uncapher & Rugg, 2005; Kensinger et al., 2003).
The absence of perirhinal cortex activity associated with high confidence responses at test is consistent with previous work conducted from the DPSD perspective (e.g., Yonelinas et al., 2005) and indicates that confidence-related perirhinal cortex activity observed at study does not necessarily entail retrieval-related activity in the same region. However, perirhinal cortex did demonstrate a pattern of increasing activity between conditions and across presentations at test in relation to the model-based contrasts. This was consistent with the UVSD model prediction and inconsistent with the DPSD model's familiarity estimate prediction of reduced activity. However, on the basis of statistical comparisons of the coefficients and slopes for the hippocampus and perirhinal cortex activity, our results indicate that the two MTL regions show equivalent strength responses, arguing against an anatomical division of labor in terms of different types of information processing. They also do not support an interpretation that hippocampal activity can be considered the primary determinant of memory strength (cf. Wixted, 2007).
The finding of equivalent memory strength responses in the hippocampus and perirhinal cortex can be interpreted as consistent with the dual-process UVSD approach, but it also sets that approach in contrast with the simpler single-process account. The dual-process UVSD account proposes that continuously distributed recollection and familiarity processes can be combined into a single, unidimensional memory strength variable (e.g., Wixted, 2007), whereas the single-process UVSD account simply proposes a total strength-of-evidence dimension. If the hippocampus and the perirhinal cortex demonstrate equivalent memory strength responses in terms of slope (cf. Squire et al., 2007), as is the case in the present study, then the value of proposing qualitatively different recollection and familiarity processes seems limited. The single-process UVSD approach does not make this additional assumption.
Another possible interpretation of our results is that the higher overall hit rates in the focused attention condition might be responsible for the BOLD signal differences observed at each level of item/memory strength. This seems unlikely because the difference in hit rates between focused and divided attention conditions actually decreased as item presentations increased, the opposite of the relationship observed. To address this possibility, we equated the hit rates for focused and divided conditions at each level of item presentation within participants in a post hoc analysis. The beta values extracted from the peak voxels in the hippocampus and perirhinal cortex again showed equivalent increasing responses in relation to the UVSD model estimates at both study and test.
The present fMRI study contrasted patterns of fMRI activity in the hippocampus and perirhinal cortex associated with subjective confidence ratings and contrasts informed by UVSD and DPSD model estimates at both study and test. The two types of analysis produced qualitatively different patterns of activity. Critically, the model-based analysis did not reflect sources of variability associated with item confounds or intra- and interindividual differences in separating weak from strong memories. This analysis revealed increasing responses in the hippocampus at both study and test and at test in the perirhinal cortex. Although consistent overall with a single-process model account, the results may be interpreted as providing only partial support for the DPSD model view or for an account that considers hippocampal activity to be the primary measure of memory strength. The nature of the relationship(s) between fMRI measures of MTL activity and memory strength requires further investigation, and this may be furthered by explicitly contrasting predictions derived from UVSD and DPSD models.
This study was supported by a Discovery Project grant (DP0878630) from the Australian Research Council (ARC) awarded to J. D., S. D., and G. Z. G. Z. was supported by an ARC Future Fellowship. The authors are grateful to two anonymous reviewers for their helpful comments on the manuscript.
Reprint requests should be sent to Greig I. de Zubicaray, School of Psychology, University of Queensland, St. Lucia, Brisbane, Queensland 4072, Australia, or via e-mail: firstname.lastname@example.org.
Similarly, the flatter and more linear ROC curves obtained from source or associative memory tasks appear to be an artifact of variability and averaging across different levels of subjective memory strength rather than indicating these tasks are more dependent on recollection (Hautus, Macmillan, & Rotello, 2008; cf. Eichenbaum et al., 2007).
The d_a sensitivity measure from the UVSD model differs from the conventional d′ measure by permitting the variances of the old and new distributions to differ (Macmillan & Creelman, 2005). The values can be interpreted similarly.
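To make the comparison with d′ concrete, the sketch below computes the unequal-variance sensitivity measure in its standard textbook form: the distance between the old and new distribution means, scaled by the root mean square of their standard deviations. The function names and numerical values are hypothetical, and the parameter estimation used in the present study (e.g., fitting to confidence-based ROC data) is not shown.

```python
import math
from statistics import NormalDist


def d_a(mu_old, sd_old, mu_new=0.0, sd_new=1.0):
    """Sensitivity from distribution parameters (unequal variances allowed)."""
    return (mu_old - mu_new) / math.sqrt((sd_old ** 2 + sd_new ** 2) / 2)


def d_a_from_rates(hit_rate, fa_rate, slope):
    """Sensitivity from one hit/false-alarm pair and the zROC slope.

    slope is the zROC slope, i.e., sd_new / sd_old.
    """
    z = NormalDist().inv_cdf
    return math.sqrt(2 / (1 + slope ** 2)) * (z(hit_rate) - slope * z(fa_rate))


# With equal variances the measure reduces to the conventional d'.
print(d_a(mu_old=1.5, sd_old=1.0))

# Hypothetical unequal-variance case: old items are more variable.
print(round(d_a(mu_old=1.5, sd_old=1.25), 3))
```

When the two standard deviations are equal, the scaling term equals 1 and the value matches d′, which is why the two measures can be read on the same scale.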
The critical distinction between Method 3 and the other two methods proposed by Lorch and Myers (1990) is that it computes the Subject × Linear term, with Mean Square (Subject × Linear) used as the error term to test Mean Square (linear), which is appropriate for our data.