## Abstract

People with superior face recognition have relatively thin cortex in face-selective brain areas, whereas those with superior vehicle recognition have relatively thick cortex in the same areas. We suggest that these opposite correlations reflect distinct mechanisms influencing cortical thickness (CT) as abilities are acquired at different points in development. We explore a new prediction regarding the specificity of these effects through the depth of the cortex: that face recognition selectively and negatively correlates with thickness of the deepest laminar subdivision in face-selective areas. With ultrahigh resolution MRI at 7T, we estimated the thickness of three laminar subdivisions, which we term “MR layers,” in the right fusiform face area (FFA) in 14 adult male humans. Face recognition was negatively associated with the thickness of deep MR layers, whereas vehicle recognition was positively related to the thickness of all layers. Regression model comparisons provided overwhelming support for a model specifying that the magnitude of the association between face recognition and CT differs across MR layers (deep vs. superficial/middle) whereas the magnitude of the association between vehicle recognition and CT is invariant across layers. The total CT of right FFA accounted for 69% of the variance in face recognition, and thickness of the deep layer alone accounted for 84% of this variance. Our findings demonstrate the functional validity of MR laminar estimates in FFA. Studying the structural basis of individual differences for multiple abilities in the same cortical area can reveal effects of distinct mechanisms that are not apparent when studying average variation or development.

## INTRODUCTION

Brain volumetrics are behaviorally relevant, task-independent biomarkers that can be studied in terms of their relation to visual abilities. Those abilities depend on both genetic and environmental factors, as well as on the developmental state of the brain. Importantly, many structural biomarkers can be measured independent of behavior and are therefore more likely to reflect stable variability across individuals rather than performance during a specific task. The relation between brain structure and behavioral abilities can provide a useful window into the underlying forces that culminate into individual differences.

The current work builds on our previous finding that differences in total cortical thickness (CT) of the same brain area in the same people predict different abilities in opposite directions: Vehicle recognition (VR) was associated with a relatively thicker fusiform face area (FFA), whereas face recognition (FR) was associated with a relatively thinner FFA (McGugin, Van Gulick, & Gauthier, 2016). The current study extends that previous work, using ultrahigh resolution (UHR) imaging to probe laminar thickness differences underlying the differences in total CT, and their relationship to behavioral abilities.

When we reported the surprising pattern of FFA CT variations across individuals, we suggested that those opposing results (thinner CT associated with better FR, thicker CT associated with better VR, all within the same region of the brain) could be explained by differences in the developmental state (plasticity) of the brain at the time these visual skills were learned (McGugin et al., 2016). This was predicted because individuation for exemplars in the categories used, faces and vehicles, were likely learned at different times in development. It seemed plausible that learning occurring at different times in development would have different effects on underlying brain structure, with some of these effects being selective to specific cortical layers. In other words, our predictions are based on intuitions relevant to two different fields: the development of behavioral recognition abilities across domains and the development of the associated cortical underpinnings. These are two areas of study that are difficult to study, but we based our conjectures on the following findings.

First, we predicted that observed differences in brain correlates of FR and VR in adults may reflect differential ages of acquisition for these abilities (McGugin et al., 2016). There is growing evidence that faces dominate the visual diet of infants. Three-month-old babies see faces for 25% of their waking time, an amount reduced to 10% by 12 months (Sugden & Moulson, 2019; Fausey, Jayaraman, & Smith, 2016), and early face exposure has been found to be critical to the development of face selectivity (Arcaro, Schade, Vincent, Ponce, & Livingstone, 2017). The age at which infants learn to individuate objects such as vehicles is less clear (Clerkin, Hart, Rehg, Yu, & Smith, 2017). Although face selectivity continues to develop during adolescence (Gomez et al., 2017; Scherf, Thomas, Doyle, & Behrmann, 2014) and FR continues to improve in adults (Germine, Duchaine, & Nakayama, 2011), it is reasonable to assume that FR also starts earlier than VR.

Second, we conjectured that if the earlier acquisition of FR (relative to VR) was responsible for a pattern where thinner cortex related to better performance, then the underlying structural biomarkers for the thinner cortex may be related to the early developmental pruning of connections critical to the acquisition of FR. The amygdala is critical to the development of FR (Schultz, 2005) and is connected to the occipital lobe via the inferior longitudinal fasciculus. Research on the connectivity between the inferior temporal (IT) areas TE and TEO (the monkey analog of the FFA) and the limbic system has revealed projections in the infant monkey that are pruned in the adult (Webster, Ungerleider, & Bachevalier, 1991). Such connections from IT to limbic areas are labeled predominantly in infragranual (deep) layers of IT (Stefanacci & Amaral, 2000). Accordingly, if this variability in the pruning of the inferior longitudinal fasciculus connections from IT to the amygdala is relevant to FR in humans, it should be most reflected in the thickness of deep cortical layers. In contrast, we expect the positive correlation between VR and variability in FFA's CT to reflect mechanisms that support learning-dependent structural plasticity in adults (Wenger et al., 2012). These mechanisms are not well understood but are proposed to include processes such as gliogenesis and angiogenesis that accompany synaptogenesis (Zatorre, Fields, & Johansen-Berg, 2012). We have no reason to expect these mechanisms to be confined to deep layers (but acknowledge no evidence against laminar specificity).

It is important to note that multiple mechanisms may underlie any observed differences in structure that are associated with development. For example, variation in the pruning of connections could occur as described in the predictions that motivated the current work, affecting the thickness of different layers. However, the predicted pattern of laminar-specific correlations could be consistent with pruning but could also reflect other mechanisms. In particular, myelination increases during development and the effects of pruning and myelination are difficult to distinguish, as myelination could shift the apparent gray matter (GM)–white matter (WM) boundary and result in thinner cortical measurements, an effect that would be attributed to deep cortical layers (Sowell et al., 2004). It is fortuitous that both of these plausible mechanisms predict that the earlier development of FR would result in variance across individuals for FR to be specifically related to the thickness of the deep layers in our measurements.

In summary, here we test whether changes in CT that are associated with behavior (McGugin et al., 2016) are predominantly driven by changes in deep layers of the cortex, suggestive of developmental underpinnings. We additionally demonstrate that UHR MRI can be used to image laminar thickness within the cortex, providing a potentially useful biomarker for probing the structure–function relationship of the brain in vivo. In testing our predictions, we contrasted two models (Figure 1). The first represents a null hypothesis with regard to laminar specificity: It simply assumes that the effects observed in overall CT in McGugin et al. (2016) will replicate here but will be the same across all layers (Model 1). In contrast, Model 2 instantiates our predictions based on differential early development and posits a selective linkage between the deep MR layer and variability in FR.

Figure 1.

Composite models tested. M1 specified that correlations and regression coefficients between FR and CT are negative in sign and equal across MR cortical layers and that correlations and regression coefficients between VR and CT are positive in sign and equal across layers. M2 specified a layer-specific pattern for the associations between FR and CT across layers, with the deep MR layer having a stronger, more negative pattern of association than the superficial and middle MR layers. Like M1, M2 specified an invariant of association between VR and CT across layers. As described in the text, we also tested three additional substantive models positing different patterns of association.

Figure 1.

Composite models tested. M1 specified that correlations and regression coefficients between FR and CT are negative in sign and equal across MR cortical layers and that correlations and regression coefficients between VR and CT are positive in sign and equal across layers. M2 specified a layer-specific pattern for the associations between FR and CT across layers, with the deep MR layer having a stronger, more negative pattern of association than the superficial and middle MR layers. Like M1, M2 specified an invariant of association between VR and CT across layers. As described in the text, we also tested three additional substantive models positing different patterns of association.

## METHODS

### Participants

Fourteen right-handed men aged 19–29 years (mean = 22.46 years, SD = 3.60 years) were recruited from the Vanderbilt University community based on previous participation. Only men were recruited to limit variability in brain anatomy. Our sample size was limited by the short period of time between when we developed scan protocols with sufficient resolution and before a major scanner upgrade. We recruited individuals with experience in the scanner and who, as a group, offered a wide range of performance with faces. One participant was excluded because of excessive head motion. Written informed consent was obtained from all participants in accordance with guidelines of the Vanderbilt University institutional review board and Vanderbilt University Medical Center. All participants reported normal or corrected-to-normal visual acuity and received monetary compensation for participation.

### Behavioral Testing

Participants performed a battery of matching and recognition memory tests outside the scanner. First, they completed the extended Cambridge Face Memory Test (Russell, Duchaine, & Nakayama, 2009) in which they studied several views of unfamiliar male faces and were tested in forced-choice trials of three faces on which they selected the face that matched a studied face. Second, participants completed various sections of the Vanderbilt Expertise Test (VET; McGugin, Richler, Herzmann, Speegle, & Gauthier, 2012). This included two face tests (VET-Males, VET-Females), two vehicle tests (VET-Car, VET-Plane), and two animal tests (VET-Bird, VET-Butterfly). In each case, participants studied six identities, models, or species for as long as needed, followed by three-alternative forced-choice test trials in which they indicated which item was studied. Third, participants completed the Vanderbilt Face Matching Test (Sunday, Lee, & Gauthier, 2018). On each trial, participants studied two faces for 4000 msec and then were shown a three-alternative forced choice and indicated which face was a new image of a face studied on that trial. Matching faces were different images from the original presentation. Fourth, participants completed sequential matching tests using cars, planes, birds, and butterflies (McGugin, Gatenby, Gore, & Gauthier, 2012). On each trial, an image appeared for 1000 msec, followed by a 500-msec mask, and then a second image. Participants judged whether the two images showed cars/planes of the same make and model regardless of year or birds/butterflies of the same species. See Table 1 for mean test performance, reliability of each test, and consistency across tests for each category. Aggregate face, vehicle, and animal indices were calculated based in each case on standardized performance for the four relevant tests.

Table 1.
Behavioral Results across 12 Independent Tests
Mean (SD)Cronbach's Alpha
Faces
CFMT+ 0.65 (0.13) 0.88
VET-Male (Acc) 0.83 (0.17) 0.78
VET-Female (Acc) 0.83 (0.12) 0.86
VFMT (Acc) 0.6 (0.07) 0.86

Vehicles
VET-Car (Acc) 0.73 (0.18) 0.93
VET-Plane (Acc) 0.76 (0.12) 0.93
Matching-Car (d′) 1.79 (1.00) 0.75
Matching-Plane (d′) 1.57 (0.77) 0.86

Animals
VET-Bird (Acc) 0.68 (0.17) 0.80
VET-Butterfly (Acc) 0.56 (0.17) 0.83
Matching-Bird (d′) 1.46 (0.39) 0.92
Matching-Butterfly (d′) 1.65 (0.52) 0.82
Mean (SD)Cronbach's Alpha
Faces
CFMT+ 0.65 (0.13) 0.88
VET-Male (Acc) 0.83 (0.17) 0.78
VET-Female (Acc) 0.83 (0.12) 0.86
VFMT (Acc) 0.6 (0.07) 0.86

Vehicles
VET-Car (Acc) 0.73 (0.18) 0.93
VET-Plane (Acc) 0.76 (0.12) 0.93
Matching-Car (d′) 1.79 (1.00) 0.75
Matching-Plane (d′) 1.57 (0.77) 0.86

Animals
VET-Bird (Acc) 0.68 (0.17) 0.80
VET-Butterfly (Acc) 0.56 (0.17) 0.83
Matching-Bird (d′) 1.46 (0.39) 0.92
Matching-Butterfly (d′) 1.65 (0.52) 0.82

For one participant, performance on VET-Male was near chance and an outlier relative to other participants and his own other scores, so this data point was dropped. The ICC4 (with 95% HPD interval) was .89 (.77, .96) for the aggregate of face tests, .83 (.66, .95) for the aggregate vehicle tests, and .85 (.70, .96) for the aggregate animal tests. CFMT+ = extended Cambridge Face Memory Test; VET = Vanderbilt Expertise Test; VFMT = Vanderbilt Face Matching Test.

### MRI Acquisition

All participants were imaged using a Philips Achieva whole-body 7T MRI scanner with a quadrature, head-only transmit coil and a 32-channel receive coil array. Imaging proceeded in three stages: whole-brain structural imaging, functional localization, and UHR susceptibility-weighted imaging (SWI).

### Whole-brain Structural Imaging Acquisition

Whole-brain structural imaging consisted of a sagittal whole-brain 3D T1 weighted MPRAGE image gathered at 1-mm isotropic resolution and resampled into axial and coronal orientations for accurate localization of SWI acquisitions. This was used to identify anatomic landmarks important to the planning of the subsequent UHR SWI acquisitions.

### fMRI Acquisition, Stimuli, Design, and Analyses

Functional localization of the FFA was accomplished via isotropic whole-brain fMRI data. Images were acquired with minimal distortion using a multishot 3D gradient-echo PRESTO EPI pulse sequence (Versluis et al., 2010): repetition time/echo time = 21.93/28.13 msec, flip = 12°, field of view = 211.4 mm, number of slices = 30, voxel size = 2.5 mm isotropic, EPI factor = 11, water/fat shift = 6.705 pix, number of dynamics = 160, time/dynamic = 2.02 sec.

We presented all stimuli with a Macintosh computer (Apple, Inc.) running MATLAB 2014a (The MathWorks) using Psychophysics Toolbox (Brainard, 1997). Stimuli were displayed on a rear projection screen using an Avotec projector. Immediately following the HR structural scan, each participant completed one to two runs (depending on time) of a functional localizer scan (160dynamics/run). We used grayscale images (36 faces, 36 objects) in a 1-back detection task across 20 alternating blocks of face and object images. Each image was presented for 1 sec, with 16 images per category block. The HR T1-weighted structural scans were transformed into the standard sagittal view with 1-mm isometric voxels to facilitate coregistration of functional and anatomical data sets. Functional data were analyzed using Brain Voyager (www.brainvoyager.com) and in-house MATLAB scripts. Preprocessing in Brain Voyager included registration to the structural scan, slice scan time correction (cubic spline), 3D motion correction (sinc interpolation), and temporal filtering (high-pass criterion of two cycles per run) with linear trend removal. No spatial smoothing was applied. ROIs were defined using the Face > Object contrast from the face localizer scan. We localized an ROI that responded more to faces than objects in the right middle fusiform gyrus (FFA2) of all participants.

Right FFA2s were localized on the 3 mm3 (interpolated) statistical maps from the localizer scans as the peak face-selective voxel, then grown to incorporate contiguous 3-mm3 voxels passing statistical threshold within a four-voxel or 108 mm3 fixed volume around the peak of face selectivity. Solely for the purpose of comparing the location of our rFFA2s with those reported in the literature and demonstrating homology of the rFFA2 across participants, the HR T1-weighted structural scans were normalized to Talairach space and functional data realigned to the Talairach-transformed images. Talairach coordinates for our ROIs (mm ± SD) are x = 38.92 ± 3.86; y = −47.54 ± 6.73; z = −20.62 ± 3.01.

### SWI Image Acquisition, Processing, and Averaging

A minimum of six UHR T2*-weighted images were acquired as separate and independent acquisitions in each participant using slice-selective Cartesian gradient-echo acquisitions. Both real and imaginary images were recorded. Measurement of individual differences in CT presents additional challenges over characterizing average laminar profiles because of the various sources of measurement error that can confound interindividual variability. Thus, care was taken to plan image acquisitions such that the frequency encoding direction was perpendicular to the ventral surface of the temporal lobe underneath the right FFA to minimize differences in partial volume effects across layers from participant to participant. Before analysis, UHR data were reconstructed at 0.1875 mm in-plane resolution.

SWI imaging parameters were as follows: field of view = 240 × 180.194 × 21.9 mm, yielding voxel dimensions of 0.194 × 0.194 × 1.00 mm, 20 slices, 0.1 mm gap, “shortest” (878.8 ± 8.29 msec) repetition time, “shortest” (27.5 ± 0.31 msec) echo time, 27.26 pix water/fat shift, 55° flip angle, flow compensation, 9 min 11 ± 5.2 sec total duration. Images were acquired with a 1-D phase navigator to correct for phase errors arising from respiration during acquisition, as has been previously shown to be effective in very high resolution SWI imaging (Versluis et al., 2010).

Real and imaginary images were used to calculate magnitude and phase images, which were then processed to create susceptibility weighted scans (Haacke, Xu, Cheng, & Reichenbach, 2004). Within each participant, each of the six resulting SWI scans were coregistered using SPM12 (www.fil.ion.ucl.ac.uk/spm/) and averaged in MATLAB. For those participants who had more than six T2*-weighted scans acquired, the optimal six were identified for analysis by calculating the normalized mutual information (NMI; Studholme, Hill, & Hawkes, 1999; Viola & Wells, 1997; Wells, Viola, Atsumi, Nakajima, & Kikinis, 1996) between all pairs of acquired and coregistered SWI scans and ranking each according to the sum of its pairwise NMI values. The six scans with the highest summed pairwise NMI were kept for further analysis. This ensured that the six scans most similar to each other and likely containing the least substantial run-dependent artifacts or misalignments were selected for the purpose of averaging in later stages of the analysis.

### Image Registration

Standard resolution functional localizer scans were registered to high-resolution (1 mm isotropic) structural scans in BrainVoyager (Figure 2A). Because we sought to avoid unnecessary interpolations of the UHR images, all intensity analyses were performed directly on the UHR images. We registered the high-resolution structural image and the functionally defined rFFA2 to the UHR slices. Specifically, high-resolution structural images and rFFA2 masks were converted from BrainVoyager's proprietary image formats to NifTI-1 images using the NeuroElf toolbox for MATLAB. Rigid body transformation matrices of the high-resolution structural images to the UHR images were calculated via SPM's SPM_coreg() coregistration function, and these matrices were used to reslice the rFFA2 masks into the UHR space. The 108 mm3 rFFA2 masks traversed four to seven UHR slices, depending on the participant (mean = 5.57, SD = 0.85). For each UHR slice, 16-bit TIFF-format two-dimensional slice images were saved using MATLAB.

Figure 2.

Trace acquisition, intensity sampling, and layer identification. (A) rFFA2 was defined functionally with a Face > Object contrast and at a fixed size of 108 mm3 in each participant, then coregistered onto the UHR slices. Color bar and values represent the face selectivity measured from the 3 mm scan. (B) The edges of the rFFA2 were traced (magenta) over the underlying cortical mantle. Boundaries between GM and CSF (superficial layer boundary) and between GM and WM (deep layer boundary) were manually drawn (green). (C) Traces were acquired as the series of lines originating at each superficial boundary voxel and terminating in each deep boundary voxel and vice versa. GM intensity was sampled across the length of the trace. (D) The line plot visualizes all traces shown in varying colors contributing to one slice of one participant's rFFA2. A fourth-order polynomial (bold black curve) was fit to an average trace, with points of inflection defining the depths at which image intensity changed.

Figure 2.

Trace acquisition, intensity sampling, and layer identification. (A) rFFA2 was defined functionally with a Face > Object contrast and at a fixed size of 108 mm3 in each participant, then coregistered onto the UHR slices. Color bar and values represent the face selectivity measured from the 3 mm scan. (B) The edges of the rFFA2 were traced (magenta) over the underlying cortical mantle. Boundaries between GM and CSF (superficial layer boundary) and between GM and WM (deep layer boundary) were manually drawn (green). (C) Traces were acquired as the series of lines originating at each superficial boundary voxel and terminating in each deep boundary voxel and vice versa. GM intensity was sampled across the length of the trace. (D) The line plot visualizes all traces shown in varying colors contributing to one slice of one participant's rFFA2. A fourth-order polynomial (bold black curve) was fit to an average trace, with points of inflection defining the depths at which image intensity changed.

### Image Segmentation and Trace Identification

Using Adobe Photoshop, the GM-WM (deep) and GM-cerebrospinal fluid (CSF; superficial) cortical boundaries were manually drawn by two image analysts trained to recognize differences in brain structures but blind to the experimental goals (Figure 2B). The interrater reliability of total rFFA2 thickness was very high (intraclass correlation [ICC] = .98). In addition to segmenting cortex from other tissue, blood vessels and image artifacts were manually marked as well. The resulting segmented data images were masked with the ROI images to create a series of ROI-specific cortical segmentation images. These segmented ROI images were next used to trace within-slice line segments through the thickness of cortex, along which we sampled the cortical signal intensity. Before tracing, each boundary was smoothed to avoid single-pixel concavities or convexities. Specifically, for each voxel that was part of the boundary, the three contiguous nearest neighbor boundary voxels in either direction (for a total of seven) were identified and a line segment fit to those voxels using linear regression. The set of such line segments was aggregated as the boundary was traversed and skeletonized (to removed thickenings) and cleaned of spurs using the MATLAB function bwmorph().

Because of the convoluted shape of the cortical sheath, points on the deep and superficial boundaries of cortex do not have a one-to-one correspondence. Using custom MATLAB code, we selected lines originating in each superficial boundary voxel and terminating in the nearest deep boundary voxel within that ROI in that slice (Figure 2C). We conducted the same procedure in the reverse direction as well, originating in each deep boundary voxel and terminating in each superficial boundary voxel. The resulting traces were subject to five constraints: First, traces could not intersect cortical boundaries (aside from at trace origination or termination) to ensure that all samples were restricted to cortical tissue even in areas of high cortical convolution. Second, no trace could cross a previously identified trace originating from the same boundary to avoid overrepresenting certain pieces of the cortical mantle. Third, the angle between the trace and boundary from which it originated could not be more than 20° off perpendicular to ensure that traces were approximately perpendicular to the cortical boundary. Fourth, any trace that was 3 SDs or more longer or shorter than the mean of all traces in a given slice was removed to control for outlier paths. Fifth, no trace could cross an area marked for exclusion, such as a vein, because MRI intensity values are not valid indicators of GM signal in veins. For any trace that violated one of these constraints, a new trace was drawn from the originating voxel to the next closest voxel in the other boundary.

#### Intensity Sampling and MR Layer Identification

First, average CT was defined in each ROI in each slice per participant as the average distance between the GM-CSF border and GM-WM border. Next, three cortical depths were identified by the points at which GM intensity trends changed (Figure 2D). Specifically, to estimate these depths, we fit a fourth-order polynomial to the set of intensity samples from each ROI in each slice. GM intensity was sampled at 500 equally spaced points along the length of each trace. This ensured that voxel intensities were sampled even when a trace only crossed a corner of a voxel. The sampled intensities were linearly interpolated to 1/100th mm intervals. The resulting intensities were thresholded since pixels above or below a certain intensity are assumed to reflect partial-volume or whole-pixel capture of non-GM. Specifically, any intensity values 3 SDs or more from the mean (over the entire set of trace intensities) were not included. Subsequently, intensity data were de-trended.

In all slices, inflection points were identified as the depths at which the second derivative of the polynomial was equal to zero. Inflection points outside the range of depths that fell within cortex were discarded, and the remaining points were taken as estimates of the boundaries between cortical layers. Thus, we operationalized three MR cortical layers as the distances between (1) GM-CSF border and the first inflection point (superficial cortical depth), (2) first and second inflection points (middle cortical depth), and (3) second inflection point and the GM-WM border (deep cortical depth). We hypothesize that these MR layers are related to, although they are not a direct measure of traditional cortical layers defined based on histological properties (Lifshits et al., 2018).

In 2 of 69 slices analyzed, the variation of intensity through the cortex was insufficient for accurately fitting a polynomial to the underlying data. These slices were omitted from further analyses.

### Transcortical Microvasculature

Transcortical microvasculature was segmented and mapped using a contour-based local minimization routine. Having manually identified the deep and superficial boundaries of the functionally defined rFFA2, intermediary contours parallel to the cortical surface were drawn throughout the CT. Transcortical vessels were segmented by mapping local minima along each contour, and the vascular area was quantified as the percentage of voxels within the rFFA2 labeled as belonging to transcortical vessels. The measure for each participant was an average from at least four and up to seven slices, with a weighted average ICC of .80.

## RESULTS

### Reliability of CT Measurements

#### Slices

We estimated the reliability of cortical measurements across slices (after centering them on the peak of face selectivity) using the correlation of thickness estimates for each slice relative to all the other slices in each participant. For total regional CT, which was based on manual tracings of high-contrast boundaries, correlations were higher than .8 for all slices. As a result, total CT measurements were averaged over all available slices. Interrater reliability of total CT was calculated using ICCs for distance from GM-WM boundary to GM-CSF boundary, averaged across all slices for the two analysts.

Thickness measurements of the middle MR layer were based on relatively lower contrast boundaries identified using an automated algorithm (see below). Correlations reached .5 for anterior slices, but slices that were posterior relative to the peak of face selectivity showed little to no evidence of reliability. Inspecting individual images, this proved to be because FFA2 ROIs were located closer to the posterior slices of the UHR stack of 20 images for most participants. Our measurements were made after coregistration of six UHR scans to increase contrast; thus, the final slices of the stack were more likely to show coregistration artifacts due to motion or respiration. We therefore used only the two to four reliable anterior slices for each participant and dropped the two to three most posterior and unreliable slices, for all measurements involving an internal laminar boundary (thickness estimates for deep, superficial and middle MR layers).

#### Average Thickness Measures

The WM-GM boundary and the GM-CSF boundary were traced manually by two image analysts. Across participants, the mean total CT of rFFA2 (distance from the WM-GM boundary to the GM-CSF boundary) was consistent with previous work. Between the two analysts blind to behavioral measurements, interrater reliability of total rFFA2 thickness was very high: ICC = .98). MR-defined laminar subdivisions are defined by two internal boundaries identified using an automated procedure (see Methods). The representation of each MR laminar subdivision proportional to total CT was also consistent with previous work (Figure 2D). Across participants, the mean total CT of rFFA2 (distance from the WM-GM boundary to the GM-CSF boundary) was in the range reported in extrastriate cortex (Fischl & Dale, 2000): average rFFA2 GM thickness = 3.37 mm, with a range between 2.50 and 4.67 mm. Furthermore, the representation of each laminar subdivision proportional to total CT was consistent with MR microscopy in area V4 in nonhuman primates (Chen, Wang, Gore, & Roe, 2012): superficial MR layer (1.07 mm, range = 0.74–1.61 mm, 31%), middle MR layer (1.04 mm, range = 0.64–1.49 mm, 31%), deep MR layer (1.26 mm, range = 0.56–2.72 mm, 38%). As an index of consistency across participants, ICCs for the three laminar subdivisions were: superficial (.77), middle (.73), and deep (.80).

### Association between Behavioral Measures and Total rFFA2 CT

Each behavioral measure was computed as the average of the standardized values of four tests. ICCs were greater than .84 in all cases. We replicated previous results showing that total FFA CT is negatively correlated with FR (r = −.83, p < .001) and positively correlated with the recognition of vehicles (r = .70, p < .001; Table 2), with the two correlations significantly different (t = 8.86, p < .001). Face and vehicle performance were not significantly related (r = −.33, p = .27) and partial correlations with rFFA2 controlling for performance on the other categories were comparable in value and statistical significance to the zero-order rs (Table 2; Figure 3A). We replicated these correlational results using the univariate observed–imposed bootstrap (e.g., Beasley et al., 2007). Multiple regression indicated that both face and vehicle scores independently predicted rFFA2 thickness (both ps ≤ .001) and together accounted for 60% of the variance on adjusted R2.

Table 2.
Pearson's Correlations between Behavioral Performance and CT and Laminar Depth Measurements
Total Regional ThicknessSuperficial Cortical SubdivisionMiddle Cortical SubdivisionDeep Cortical Subdivision
Zero-order Correlations
Faces −0.83 −0.38 −0.28 −0.76
Vehicles 0.70 0.44 0.48 0.45

Partial Correlations
Faces −0.90 −0.28 −0.15 −0.73
Vehicles 0.83 0.37 0.43 0.33
Total Regional ThicknessSuperficial Cortical SubdivisionMiddle Cortical SubdivisionDeep Cortical Subdivision
Zero-order Correlations
Faces −0.83 −0.38 −0.28 −0.76
Vehicles 0.70 0.44 0.48 0.45

Partial Correlations
Faces −0.90 −0.28 −0.15 −0.73
Vehicles 0.83 0.37 0.43 0.33

Zero-order Pearson correlation coefficients are shown for face and vehicle categories correlated with total regional thickness and cortical depths operationalized by the fourth-order polynomial and its second-order derivative. Partial-order Pearson correlation coefficients are given for each behavioral category regressing out the other two (including animals), correlated with total regional and laminar cortical depths. Significant correlations are bolded.

Figure 3.

The relation between GM thickness of rFFA2 or its laminar subdivisions, and behavioral performance. Behavioral performance on the x axes and CT and depth measurements on the y axes show residualized values (see text for actual values in mm). The scatter plots on the left (with face icons) show partial correlations with FR controlling for performance with vehicles, whereas scatter plots on the right (with car icons) show partial correlations with vehicles controlling for FR. (A) The brain is tilted as a result of slices being aligned perpendicular to the rFFA2. Shown is one slice for one participant showing the average of six susceptibility-weighted scan acquisitions. The dashed box is centered on the functionally defined rFFA2. Below the whole-brain slice are scatter plots depicting the relation between total regional CT of rFFA2 and behavioral performance with faces (left) and vehicles (right). Error bars represent the 95% confidence intervals. (B) The central inset shows a magnified view from the box in (A), centered on rFFA2 for one slice in one participant, with the three laminar subdivisions perceivable to the naked eye. Scatter plots represent partial correlations between the thickness of deep, middle, or superficial MR layers as a function of FR and VR, controlling for performance with the other category. Error bars show the 95% confidence intervals.

Figure 3.

The relation between GM thickness of rFFA2 or its laminar subdivisions, and behavioral performance. Behavioral performance on the x axes and CT and depth measurements on the y axes show residualized values (see text for actual values in mm). The scatter plots on the left (with face icons) show partial correlations with FR controlling for performance with vehicles, whereas scatter plots on the right (with car icons) show partial correlations with vehicles controlling for FR. (A) The brain is tilted as a result of slices being aligned perpendicular to the rFFA2. Shown is one slice for one participant showing the average of six susceptibility-weighted scan acquisitions. The dashed box is centered on the functionally defined rFFA2. Below the whole-brain slice are scatter plots depicting the relation between total regional CT of rFFA2 and behavioral performance with faces (left) and vehicles (right). Error bars represent the 95% confidence intervals. (B) The central inset shows a magnified view from the box in (A), centered on rFFA2 for one slice in one participant, with the three laminar subdivisions perceivable to the naked eye. Scatter plots represent partial correlations between the thickness of deep, middle, or superficial MR layers as a function of FR and VR, controlling for performance with the other category. Error bars show the 95% confidence intervals.

As in prior work (McGugin et al., 2016), the correlation between overall CT and recognition of animals, which may not be acquired especially early or late, was in the same direction as FR, although not significantly so (r = −.12, p = .34). Animal recognition also did not predict thickness of MR layers (superficial: r = .15; middle: r = .26; deep: r = −.26). Our results cannot be explained by variability in vascular volume in FFA2 because such differences were not related to total CT (r = −.31), or behavioral performance (face, r = .11; vehicle, r = −.09, animal, r = .21), despite being reliable (ICC = .75).

### Association between Behavior Measures and Thickness of rFFA2 MR Layers

#### Correlations

In accord with predictions of the differential early development model, FR was strongly negatively correlated with the thickness of deep MR layers of rFFA2 (r = −.76, p = .002; Figure 3B; Table 2). This relation is evident when examining participant-specific brain data listed as a function of FR performance (Figure 4). There were no significant correlations between FR and thickness of superficial or middle MR layers (Figure 3B, Table 2). In contrast, correlations between VR and thickness of all three MR layers were consistently positive in sign (rs ranged from .44 to .48, two-tailed ps ranging from .13 to .10), although not statistically significant. For the deep MR layer, the correlations of thickness with FR significantly differed from that with VR (t = 3.69, p < .005). These effects were replicated when bootstrapping was used.

Figure 4.

Plots of the fourth-order polynomial fit to the average rFFA2 intensity curve of the center-most slice for each participant. Participants are rank-ordered from highest (S1) to lowest based on FR performance, regressing out performance with vehicles. Inflection points marking the transitions from deep MR layers (red shading) to middle MR layers (no shading) to superficial MR layers (gray shading) are marked with red dots. Axis labels: WM, CSF. For every other participant, the centermost slice of rFFA2 is shown to the left. The spread of face-selective activation along the fusiform gyrus is outlined in gray. Labels: OTS = occipital temporal sulcus; CoS = collateral sulcus; MFS = middle fusiform sulcus. The rFFA2 most often fell on the lateral aspect of the fusiform gyrus, between MFS and OTS, corresponding to cytoarchitectonic FG4.

Figure 4.

Plots of the fourth-order polynomial fit to the average rFFA2 intensity curve of the center-most slice for each participant. Participants are rank-ordered from highest (S1) to lowest based on FR performance, regressing out performance with vehicles. Inflection points marking the transitions from deep MR layers (red shading) to middle MR layers (no shading) to superficial MR layers (gray shading) are marked with red dots. Axis labels: WM, CSF. For every other participant, the centermost slice of rFFA2 is shown to the left. The spread of face-selective activation along the fusiform gyrus is outlined in gray. Labels: OTS = occipital temporal sulcus; CoS = collateral sulcus; MFS = middle fusiform sulcus. The rFFA2 most often fell on the lateral aspect of the fusiform gyrus, between MFS and OTS, corresponding to cytoarchitectonic FG4.

#### Regression Models Predicting Behavioral Measures from MR Layer-specific CT

Although correlations revealed bivariate relations among pairs of variables, they do not address the degree to which the combination of MR layer-specific CT measures predict individual differences in behavior. To address this question, we conducted multiple regression analyses in which CT in the superficial, middle, and deep MR layers predicted FR and VR scores. According to adjusted R2, the combination of superficial, middle, and deep MR layer thickness accounted for 65% of the variance in FR (overall regression, F(3, 9) = 8.29, p = .006, unadjusted R2 = .73). Both standard and bootstrapped confidence intervals indicated that the thickness of the deep layer was the only statistically significant predictor of FR. We also computed semipartial correlations indicating the correlation between FR and each CT measure with the latter adjusted for the other CT measures. Only the semi-partial correlation for the deep MR layer (r = .72, p = .0125) was significantly greater than 0 (superficial r = −.29, p = .40; middle r = −.25, p = .45). Squared semipartial correlations quantifying the proportion of variance uniquely contributed by each predictor indicated that the thickness of the deep MR layer accounted for slightly over half the total variance of FR (deep sr2 = .52, superficial and middle sr2 = .08 and .06, respectively).

The multiple regression analyses with VR as the dependent variable had a different pattern of results. In this case, the overall regression model was significant, F(3, 9) = 3.98, p < .05, although none of the regression coefficients for the three predictors were significant (ps ranged from .07 to .12). Although the latter result could be due to the relatively small sample size, this result also suggests that it was the shared variance among the three predictors rather than the unique variance of each predictor that accounted for effects on VR.

#### Regression Models Predicting CT of MR Layers from Behavior

Our most extensive analyses used a multiple regression framework to assess alternative models of the effects of FR and VR on CT measures. We emphasized these analyses because they allowed us to test alternative hypotheses about the complete pattern of associations between the two behavioral measures and thickness of the three MR layers.

Our regression models used a repeated-measures framework, with the three MR layer-specific CT measures per person serving as the repeated dependent measure. Predictors were layer, FR, and VR. We created two new variables that decomposed the overall effect of the layer factor into two orthogonal (i.e., uncorrelated) contrasts. As noted above, we hypothesized that the association between FR and thickness would be more strongly negative in the deep MR layer relative to the superficial and middle MR layers. To capture this difference, we gave one contrast variable (denoted LDvsSM) a code of −1 for observations from the deep MR layer and codes of 0.5 for observations from the superficial and middle MR layers. The second contrast variable for layer (denoted LSvM) was orthogonal to the first and assigned a code of 0 to the deep layer, 1 to the superficial layer, and −1 to the middle layer. When both contrast variables are included in a model as a set, they estimate the overall (2 df) main effect of MR layer.

We compared six different regression models that imposed different restrictions on the association between measures of the CT of MR layers (hereafter “layers” for short in the model descriptions) and FR and VR.

##### Model 0 (M0): Baseline model.
In this model, the only predictor of CT levels was the overall main effect of layer,
$Model0:CTij=B0+B1LDvsSM+B2LSvM+εij$
where I denotes participant (1 through 13), j denotes layer (1 through 3), and ε denotes the prediction error for the CT measure in the jth layer for the ith participant.

We used M0 simply as a baseline comparison to verify that FR and VR measures, introduced in subsequent models, were significantly associated with CT.

##### Model 1 (M1): Equal coefficients across layers.
The next two models reflected our two primary alternative hypotheses. M1 predicted CT in a given layer from the two contrast variables for layer, FR, and VR,
$Model1:CTij=B0+B1LDvsSM+B2LSvM+B3FRi+B4VRi+εij$
M1 is a “main effects only” model with no interaction terms. As a result it specifies that the regression coefficient denoting the effect of FR on CT (B3) is equal (i.e., invariant) across all three layers and the coefficient for the effect of VR on CT (B4) is also equal across layers (see Figure 1). This model is the most parsimonious extension of our previous finding using overall CT (McGugin et al., 2016).
##### Model 2 (M2): FR differentially predicts deep versus superficial/middle MR layer CT.
M2 tested the hypothesis that (1) the association between FR and CT is more pronounced (i.e., more negative) in the deep layer than the middle and superficial layers, but (2) layer does not moderate the relation between VR and CT measures. To embody this specification, an interaction (i.e., product) term between FR and the deep versus superficial/middle contrast variable (LDvsSM) was added to M1,
$Model2:CTij=B0+B1LDvsSM+B2LSvM+B3FRi+B4VRi+B5FRi×LDvsSM+εij$
In M2, three unique regression coefficients are sufficient to account for the linkages between FR and VR and CT across layers: the coefficient for the regression of CT on FR within the deep layer (equal to B3B5), the coefficient for the regression of CT on FR within the middle and superficial layers (equal to B3 + B5/2), and the single coefficient for the regression of CT on VR within each of the three layers (B4). Thus, this model specifies a constant effect of VR across all three layers and the hypothesized variable effect for FR.

The remaining models are designed to assess patterns of association other than those directly hypothesized in Models 1 and 2.

##### Model 3 (M3): Extended M2 by adding an interaction term between the second layer contrast variable (LSvM) and FR.

This addition relaxes the assumption that the regression coefficients for FR in the middle and superficial layers were equal to one another. Thus, M3 goes beyond M2 by allowing all three regression coefficients for FR to differ from one another. Like M2, however, it restricts all three regression coefficients for VR to be equal.

##### Model 4 (M4): Added the full VR × Layer interaction to M1.

This model specifies that the regression coefficients involving FR are all the same but allows for unique regression coefficients for VR in each layer.

##### Model 5 (M5): Extended M3 and M4 by adding in both the overall FR × Layer interaction and the overall VR × Layer interaction.

This model allows for six unique regression coefficients (FR/VR × Superficial/Middle/Deep) to capture the association between FR and VR and CT measures. Support for this model would imply that layer moderates the relation between both FR and VR and CT.

We conducted analyses using the gls function (Gałecki & Burzykowski, 2013) in the nlme package (Pinheiro et al., 2018) in R (R Core Team, 2018). For all models, we accounted for the dependence among the three observations for each participant by imposing an unstructured covariance matrix among the residuals. Maximum likelihood estimation was used (e.g., Gałecki & Burzykowski, 2013). We evaluated the relative fit of the models using several indices. When models were nested, we used likelihood ratio tests to compare models. We also used three information indices to compare all models: the Akaike Information Criterion (Akaike, 1973), the small-sample corrected Akaike Information Criterion (Sugiura, 1978), and the Bayesian Information Criterion (Schwarz, 1978). These indices reward model fit but penalize for model complexity and thus are designed to prevent overfitting. Lower values indicate better fit. We also report −2 × the log likelihood (−2LL) value of each model computed at the maximum likelihood estimates. −2LL values are used both in the computation of likelihood ratio tests (for which the difference in the −2LL values of two models are distributed as chi-square variables with degrees of freedom equal to the difference in the number of parameters estimated by each model) and of information indices.

Table 3 shows the values of −2LL and the information indices for the six models. The values for M0 very clearly fit worse than all other models (e.g., all LR tests comparing other models to M0ps < .005), thus indicating that FR and VR contributed significantly to the prediction of CT. Most importantly, Table 4 indicates that M2 (FR Differentially Predicts Deep versus Superficial/Middle CT) is clearly the best-fitting model. It has the lowest values for all three information indices. Likelihood ratio tests comparing M2 to other models yielded a pattern consistent with the information indices. M2 fit significantly better than M112 = 5.94, p = .02). Although M2 is a restricted version of M3 and M5, neither fit any better than M2 (M2 vs. M3: χ12 = .01, p > .90; M2 vs. M5: χ32 = .20, p > .97). These results indicate that M2 is the model that best combines fit to the data with parsimony. Table 4 shows the unstandardized and standardized regression coefficients for the regression of CT on FR and VR that are implied by M2. FR has a stronger (negative) effect on CT in the deep MR layer relative to the superficial and middle MR layers. This difference is statistically significant, t(33) = 2.53, p < .02.

Table 3.
Fit Indices for Regression Models Predicting CT
ModelDescription−2LLAICAICcBICM2/M# BF
M0 Layer only 43.97 61.97 68.18 76.95 161.04
M1 M0 + Main effects for FR and VR 13.51 35.51 45.29 53.81 38.85
M2 M1 + LDvsSM 7.56 31.56 43.56 51.53 —
M3 M1 + Layer × FR Interaction 7.54 33.56 48.12 55.18 3.07
M4 M1 + Layer × VR interaction 12.52 38.52 53.09 60.15 139.52
M5 M1 + Layer × FR + Layer × VR interactions 7.34 37.34 58.21 62.29 19.90
ModelDescription−2LLAICAICcBICM2/M# BF
M0 Layer only 43.97 61.97 68.18 76.95 161.04
M1 M0 + Main effects for FR and VR 13.51 35.51 45.29 53.81 38.85
M2 M1 + LDvsSM 7.56 31.56 43.56 51.53 —
M3 M1 + Layer × FR Interaction 7.54 33.56 48.12 55.18 3.07
M4 M1 + Layer × VR interaction 12.52 38.52 53.09 60.15 139.52
M5 M1 + Layer × FR + Layer × VR interactions 7.34 37.34 58.21 62.29 19.90

−2LL = −2 × log likelihood of the model. This index is used in the computation of likelihood ratio tests comparing nested models and information indices. As lower values of information indices indicate better fit, all favor M2 (in bold). BFs compare M2 to each of the other models. All BFs indicate greater support for M2 than all other models. AIC = Akaike Information Criterion; AICc = AIC with small-sample correction; BIC = Bayesian Information Criterion.

Table 4.
Unstandardized and Standardized Regression Coefficients from M2
LayerFRVR
Superficial −.07 (−.10) .18* (.27)
Middle −.07 (−.10) .18* (.27)
Deep −.62* (−.57) .18* (.27)
LayerFRVR
Superficial −.07 (−.10) .18* (.27)
Middle −.07 (−.10) .18* (.27)
Deep −.62* (−.57) .18* (.27)

M2 is the best-fitting regression model that predicts that the FR is selectively associated with CT in the deep layer but that VR has the same association with CT in each layer. Unstandardized regression coefficients are on top and standardized regression coefficients are in parentheses.

*

p < .001.

Although the fit indices discussed so far clearly indicate that M2 is the best-fitting model, they do not precisely quantify the strength of the evidence for M2 relative to the other hypotheses. For this reason, we additionally computed Bayes factors (BFs; Kass & Raftery, 1995; Jeffreys, 1961) using the BayesFactor package in R (Morey et al., 2018). We specified the same six regression models. Instead of correlated residuals, we specified a random effect for participant to model the correlation among the CT values for each person. For all models, we used the Jeffreys–Zellner–Snow mixture of g-priors specification introduced by Liang, Paulo, Molina, Clyde, and Berger (2008; see also Rouder & Morey, 2012). Because BFs can be sensitive to prior distributions, we used three alternative specifications of the scale parameter of the inverse gamma distribution that, in turn, is used to specify the variances of the prior distribution for the regression coefficients. All three specifications produced identical conclusions. Below, we report the results for the default “medium” values of the scaling parameters. For each pair of models, BFs were computed as the ratio of the marginal likelihood of the first model to the marginal likelihood of the second. Although we caution against rules of thumb, Jeffreys' criteria specify that BFs in the range 1–3, 3–10, 10–30, 30 to 100, and > 100 indicate respectively weak, substantial, strong, very strong, and decisive support for the hypothesis linked to the numerator of the BF. Reciprocal values indicate degree of support for the model in the denominator.

An examination of the BFs yielded by the 15 pairings of models indicated that M2 was clearly the best-fitting model (i.e., for each of its five comparisons, M2 had the best fit). The final column in Table 3 displays the BFs comparing M2 to each of the other models. Perhaps the most important index here is the BF of 38.85 for the M2/M1 comparison. This value indicates that the evidence in the data for M2 is over 30 times as strong as the evidence for M1. All other comparisons yielded what Jeffrey's termed “strong” support for M2 except that involving M3. This is not surprising given that both models are highly similar. Both allow for a difference in the association between FR and CT between the deep and the superficial layers, and both specify that the regression coefficients for VR are the same across layers. Unlike M2, however, M3 allows for differences between the FR coefficients in the superficial and middle layers. That the evidence favoring M2 is over three times stronger indicates that the superficial and middle layer coefficients can be constrained to be equal. Thus, the best-fitting model in both the regression and Bayesian analyses (M2) required only three unique regression coefficients to capture the relation between CT and FR and VR across the three MR layers: the coefficient denoting the effect of FR on CT in the deep MR layer, the coefficient denoting the effect of FR on CT in the superficial and middle MR layers, and the coefficient denoting the effect of VR on CT in all three MR layers.

## DISCUSSION

We used susceptibility-weighted MRI in a 7T magnet to measure the thickness of deep, middle and superficial MR layers, distinctions that cannot be resolved with conventional MRI approaches at 3T or below. Although new developments in MRI have allowed some visualization of intracortical layers, validation of their functional relevance has been reserved to primary visual cortex (Trampel, Ott, & Turner, 2011). We were constrained to a single ROI because of a limited field of view and the fact that accurate measurement of laminar CT with nonisotropic voxels depends upon careful alignment of slices perpendicular to the cortex of a targeted brain area. We selected the rFFA2 based on earlier robust functional expertise effects for nonface objects (McGugin, Van Gulick, Tamber-Rosenau, Ross, & Gauthier, 2015).

Here, we replicate and extend the finding that VR is positively correlated with CT of FFA (McGugin et al., 2016), whereas FR is negatively correlated with FFA thickness (McGugin et al., 2016; Bi, Chen, Zhou, He, & Fang, 2014). This surprising pattern of correlations led us to postulate different mechanisms of plasticity behind these effects, with the mechanism relevant to FR operating earlier and selectively influencing the deep layers of rFFA2.

To measure the thickness of MR layers, we applied an automated algorithm to define borders between layers using signal intensity shifts along perpendicular traces through rFFA2 (Methods and Figure 2). We postulate that the three most prominent intensity differences are related to observed laminar divisions in the fusiform (Lorenz et al., 2017), corresponding to the densely-packed deep/infragranular layers (layers V–VI), middle/internal granular (layer IV), and superficial/supragranular (layers I–III). We successfully visualized laminar structure in rFFA2 and our measurements of the thickness of MR layers were reliable.

Although we compared several different models of the data, our primary focus was on two specific alternatives (see Figure 1). The first (M1) specified that the pattern of correlations observed in total CT (negative for faces and positive for vehicles) is constant across all laminar subdivisions. The second (M2) specified that the association between individual differences in FR and the thickness of the deep MR layer was stronger (i.e., more negative) than the association between FR and thickness of the superficial and middle MR layers. Like M1, M2 specified that the association between VR and CT was invariant across layer. Regression analyses consistently favored M2 relative to M1 and to the other models that we specified, across multiple indices (likelihood ratio tests, three information indices, and BFs). Comparisons of regression coefficients yielded by M2 indicated that the association between FR and the deep MR layer was significantly stronger than the association between FR and the superficial and middle MR layers. The thickness of the deep MR layer accounted for a much higher proportion of variance in FR than the thickness of the other two MR layers. Indeed, when considered in isolation, CT of the deep MR layer accounted for 58% of the variance in FR, which is 84% of the variance accounted for by all the layers together (using total CT). Thus, the strong association between total CT and FR is largely, if not exclusively, due to the deep MR layer. In contrast, for VR there was no evidence that the relation was moderated by layer, and the clearly best-fitting regression model (M2) constrained the regression coefficients for VR to be equal across layers.

The sample size of this study (n = 13) was small. Accordingly, we focused on regression models that specified the overall pattern of association across layers and behavioral measures, to increase power and precision (e.g., see Hoijtink, 2011). In the present case, for example, although the individual correlations between VR and CT in the superficial, middle, and deep layers were not significantly greater than 0, our regression models yielded significant effects for VR on CT when these were constrained to be equal across layers. More generally, that our analyses indicated the clear superiority of M2 even given the relatively small sample size is an indication of its strength as a representation of the underlying pattern of associations.

These results are among the most structurally precise correlates of behavioral ability to date and an important validation of the functional relevance of our in vivo laminar measurements. Future development of these methods to a larger field of view should allow exploration of how these fine structural effects relate to correlations of FR ability on a much more distributed scale (Elbich & Scherf, 2017). Our results also represent perhaps the most striking difference to date between FR and non-FR. FR has often been deemed special relative to the recognition of other categories (Farah, Wilson, Drain, & Tanaka, 1998), but many behavioral hallmarks of face expertise, such as the inversion effect or holistic processing, are observed with non-face objects like cars in expert participants (Bukach, Phillips, & Gauthier, 2010; Curby, Glazek, & Gauthier, 2009). Face and car recognition also show similar degrees of heritability (Shakeshaft & Plomin, 2015) and similar correlations with domain-general object recognition ability (Richler, Wilmer, & Gauthier, 2017). In fMRI studies, car recognition ability correlates positively with selective neural responses to images of cars in FFA (e.g., McGugin, Gatenby, et al., 2012), whereas FR ability also correlates positively with selective responses to faces in the same area (McGugin, Ryan, Tamber-Rosenau, & Gauthier, 2018). Despite these similarities, our results suggest that behaviors supported by common circuitry in a brain region may have developed at different times, with variability in structure as a footprint of the different acquisition history.

Our predictions were inspired by the idea that FR ability begins earlier than VR and that a skill acquired during this period would be influenced by the pruning of connections from IT to limbic areas, though this is not the only possibility. We do not know whether the relevant period of influence on FFA's structure occurs before about 5 years of age (by that age, children show many signs of the expertise adults display with faces; Jeffery et al., 2011), when pruning may be more important, or in later childhood and adolescence, when myelination may dominate developmental changes (Natu et al., 2018). An alternative explanation is that greater myelination near FFA during childhood and adolescence is associated with better FR. Recent work has questioned the prevalent theory that pruning is the main force behind cortical thinning during development, with more evidence for myelination than for pruning after 5 years of age in the fusiform gyrus (Natu et al., 2018). Also unknown are the specific mechanisms of plasticity that would support structural effects of visual learning in adults, leading us to assume the null hypothesis of no laminar specificity of the positive correlation with CT observed for vehicles. Importantly, the mechanism driving the largest amount of average change in CT in an area may not be the same mechanism driving individual differences in CT.

It is unknown whether negative correlations between CT and performance will prove to be specific to deep MR layers for other kinds of abilities and other brain regions. It is also unknown which of the two patterns (positive correlation with total regional CT or negative correlation with thickness of deep MR layers) is more representative of the majority of object categories. Here, as in prior work at 3T (McGugin et al., 2016), there was no significant correlation with animal recognition but the trend was more similar to effects found with faces than vehicles (a negative correlation for thickness of deep MR layers). Future studies should explore how laminar CT in FFA relates to a more extensive set of visual abilities and how this relation is affected by sex. Indeed, as of now, the crossover interaction between CT and visual recognition abilities has only been described in men, yet there are sex differences in the recognition of living and nonliving objects (McGugin, Richler, et al., 2012), and sex also moderates CT effects (Plessen, Hugdahl, Bansal, Hao, & Peterson, 2014).

By the time CT is measured in adults, effects in any given brain area will reflect a complex history of changes due to both experience- and non-experience-dependent developmental factors. This complicates interpreting relationships between behavior and local brain structure, including how structure underlies various neurological disorders. For example, both abnormal thinning of fusiform cortex over development (Raznahan et al., 2010) and lack of developmental improvements in FR (O'Hearn, Schroer, Minshew, & Luna, 2010) have been reported in autism spectrum disorders. The study of the laminar specificity of structural correlates of behavioral abilities may provide a useful window into these complex dynamics.

## Acknowledgments

We thank Carla Shatz for suggesting that different ages of acquisition should translate into differential effects on cortical layers. We thank Magen Speegle and Susan Benear for manual tracing on the gray matter–white matter boundaries; Anita Disney, Jesse Gomez, Thilo Womelsdorf, and Wolfe Zinke for comments; Maarten Versluis and Ed Mojahed for sharing pulse sequence designs that were used as a basis for those developed here. R. W. M. and I. G. conceived the study. R. W. M and A. T. N. collected the primary data. A. T. N. developed the image acquisition techniques and average susceptibility-weighted image reconstruction techniques. R. W. M., A. T. N., and B. T. R. developed the methods for laminar analysis of susceptibility-weighted images. A. T. conducted reliability, bootstrapping, and Bayesian analyses. All authors coordinated second-level analyses. R. W. M., A. T. N., and I. G. wrote the article with input from the other authors. This work was supported by the NSF (SBE-0542013 and SMA-1640681 and NSF award BCS-1840896), the Vanderbilt Vision Research Center (P30-EY008126), and the Vanderbilt University Medical Center (1 S10RR023047 01).

Reprint requests should be sent to Rankin W. McGugin, Department of Psychology, Vanderbilt University, Wilson Hall, PMB 407817, 2301 Vanderbilt Place, Nashville, TN 37240-7817, or via e-mail: Rankin.McGugin@gmail.com.

## REFERENCES

Akaike
,
H.
(
1973
).
Information theory and an extension of the maximum likelihood principle
. In
B. N.
Petrov
&
F.
Csaki
(Eds.),
Proceedings of the 2nd International Symposium on Information Theory
(pp.
267
281
).
Budapest, Hungary
:
.
Arcaro
,
M. J.
,
,
P. F.
,
Vincent
,
J. L.
,
Ponce
,
C. R.
, &
Livingstone
,
M. S.
(
2017
).
Seeing faces is necessary for face-domain formation
.
Nature Neuroscience
,
20
,
1404
1412
.
Beasley
,
W. H.
,
DeShea
,
L.
,
Toothaker
,
L. E.
,
Mendoza
,
J. L.
,
Bard
,
D. E.
, &
Rodgers
,
J. L.
(
2007
).
Bootstrapping to test for nonzero population correlation coefficients using univariate sampling
.
Psychological Methods
,
12
,
414
433
.
Bi
,
T.
,
Chen
,
J.
,
Zhou
,
T.
,
He
,
Y.
, &
Fang
,
F.
(
2014
).
Function and structure of human left fusiform cortex are closely associated with perceptual learning of faces
.
Current Biology
,
24
,
222
227
.
Brainard
,
D. H.
(
1997
).
The Psychophysics Toolbox
.
Spatial Vision
,
10
,
433
436
.
Bukach
,
C. M.
,
Phillips
,
W. S.
, &
Gauthier
,
I.
(
2010
).
Limits of generalization between categories and implications for theories of category specificity
.
Attention, Perception, & Psychophysics
,
72
,
1865
1874
.
Chen
,
G.
,
Wang
,
F.
,
Gore
,
J. C.
, &
Roe
,
A. W.
(
2012
).
Identification of cortical lamination in awake monkeys by high resolution magnetic resonance imaging
.
Neuroimage
,
59
,
3441
3449
.
Clerkin
,
E. M.
,
Hart
,
E.
,
Rehg
,
J. M.
,
Yu
,
C.
, &
Smith
,
L. B.
(
2017
).
Real-world visual statistics and infants' first-learned object names
.
Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences
,
372
,
20160055
.
Curby
,
K. M.
,
Glazek
,
K.
, &
Gauthier
,
I.
(
2009
).
A visual short-term memory advantage for objects of expertise
.
Journal of Experimental Psychology: Human Perception and Performance
,
35
,
94
107
.
Elbich
,
D. B.
, &
Scherf
,
S.
(
2017
).
Beyond the FFA: Brain-behavior correspondences in face recognition abilities
.
Neuroimage
,
147
,
409
422
.
Farah
,
M. J.
,
Wilson
,
K. D.
,
Drain
,
M.
, &
Tanaka
,
J. N.
(
1998
).
What is “special” about face perception?
Psychological Review
,
105
,
482
498
.
Fausey
,
C. M.
,
Jayaraman
,
S.
, &
Smith
,
L. B.
(
2016
).
From faces to hands: Changing visual input in the first two years
.
Cognition
,
152
,
101
107
.
Fischl
,
B.
, &
Dale
,
A. M.
(
2000
).
Measuring the thickness of the human cerebral cortex from magnetic resonance images
.
Proceedings of the National Academy of Sciences, U.S.A.
,
97
,
11050
11055
.
Gałecki
,
A.
, &
Burzykowski
,
T.
(
2013
).
Linear mixed-effects models using R: A step-by-step approach
.
New York
:
Springer
.
Germine
,
L. T.
,
Duchaine
,
B.
, &
Nakayama
,
K.
(
2011
).
Where cognitive development and aging meet: Face learning ability peaks after age 30
.
Cognition
,
118
,
201
110
.
Gomez
,
J.
,
Barnett
,
M. A.
,
Natu
,
V. S.
,
Mezer
,
A.
,
Palomero-Gallagher
,
N.
,
Weiner
,
K. S.
, et al
(
2017
).
Microstructural proliferation in human cortex is coupled with the development of face processing
.
Science
,
355
,
68
71
.
Haacke
,
E. M.
,
Xu
,
Y.
,
Cheng
,
Y.-C. N.
, &
Reichenbach
,
J. R.
(
2004
).
Susceptibility weighted imaging (SWI)
.
Magnetic Resonance in Medicine
,
52
,
612
618
.
Hoijtink
,
H.
(
2011
).
Informative hypotheses: Theory and practice for behavioral and social scientists
.
Boca Raton, FL
:
CRC Press
.
Jeffery
,
L.
,
Rhodes
,
G.
,
McKone
,
E.
,
Pellicano
,
E.
,
Crookes
,
K.
, &
Taylor
,
E.
(
2011
).
Distinguishing norm-based from exemplar-based coding of identity in children: Evidence from face identity aftereffects
.
Journal of Experimental Psychology: Human Perception and Performance
,
37
,
1824
1840
.
Jeffreys
,
H.
(
1961
).
Theory of probability
(3rd ed.).
Oxford
:
Clarendon Press
.
Kass
,
R. E.
, &
Raftery
,
A. E.
(
1995
).
Bayes factors
.
Journal of the American Statistical Association
,
90
,
773
795
.
Liang
,
F.
,
Paulo
,
R.
,
Molina
,
G.
,
Clyde
,
M. A.
, &
Berger
,
J. O.
(
2008
).
Mixtures of g priors for Bayesian variable selection
.
Journal of the American Statistical Association
,
103
,
410
423
.
Lifshits
,
S.
,
Tomer
,
O.
,
Shamir
,
I.
,
Barazany
,
D.
,
Tsarfaty
,
G.
,
Rosset
,
S.
, et al
(
2018
).
Resolution considerations in imaging of the cortical layers
.
Neuroimage
,
164
,
112
120
.
Lorenz
,
S.
,
Weiner
,
K. S.
,
Caspers
,
J.
,
Mohlberg
,
H.
,
Schleicher
,
A.
,
Bludau
,
S.
, et al
(
2017
).
Two new cytoarchitectonic areas on the human mid-fusiform gyrus
.
Cerebral Cortex
,
27
,
373
385
.
McGugin
,
R. W.
,
Gatenby
,
J. C.
,
Gore
,
J. C.
, &
Gauthier
,
I.
(
2012
).
High-resolution imaging of expertise reveals reliable object selectivity in the fusiform face area related to perceptual performance
.
Proceedings of the National Academy of Sciences, U.S.A.
,
109
,
17063
17068
.
McGugin
,
R. W.
,
Richler
,
J. J.
,
Herzmann
,
G.
,
Speegle
,
M.
, &
Gauthier
,
I.
(
2012
).
The Vanderbilt Expertise Test reveals domain-general and domain-specific sex effects in object recognition
.
Vision Research
,
69
,
10
22
.
McGugin
,
R. W.
,
Ryan
,
K. F.
,
Tamber-Rosenau
,
B. J.
, &
Gauthier
,
I.
(
2018
).
The role of experience in the face-selective response in right FFA
.
Cerebral Cortex
,
28
,
2071
2084
.
McGugin
,
R. W.
,
Van Gulick
,
A. E.
, &
Gauthier
,
I.
(
2016
).
Cortical thickness in fusiform face area predicts face and object recognition performance
.
Journal of Cognitive Neuroscience
,
28
,
282
294
.
McGugin
,
R. W.
,
Van Gulick
,
A. E.
,
Tamber-Rosenau
,
B. J.
,
Ross
,
D. A.
, &
Gauthier
,
I.
(
2015
).
Expertise effects in face-selective areas are robust to clutter and diverted attention, but not to competition
.
Cerebral Cortex
,
25
,
2610
2622
.
Morey
,
R. D.
,
Rouder
,
J. N.
,
Jamil
,
T.
,
Urbanek
,
S.
,
Forner
,
K.
, &
Ly
,
A.
(
2018
).
BayesFactor: Computation of Bayes factors for common designs (R package version 0.9.12-4.2)
. .
Natu
,
V. S.
,
Gomez
,
J.
,
Barnett
,
M.
,
Jeska
,
B.
,
Kirilina
,
E.
,
Jaeger
,
C.
, et al
(
2018
).
Apparent thinning of visual cortex during childhood is associated with myelination, not pruning
.
BioRxiv
,
368274
.
O'Hearn
,
K.
,
Schroer
,
E.
,
Minshew
,
N.
, &
Luna
,
B.
(
2010
).
Lack of developmental improvement on a face memory task during adolescence in autism
.
Neuropsychologia
,
48
,
3955
3960
.
Pinheiro
,
J.
,
Bates
,
D.
,
DebRoy
,
S.
,
Sarkar
,
D.
,
Heisterkamp
,
S.
, &
Van Willigen
,
B.
(
2018
).
nlme: Linear and nonlinear mixed effects models (R package version 3.1-137)
. .
Plessen
,
K. J.
,
Hugdahl
,
K.
,
Bansal
,
R.
,
Hao
,
X.
, &
Peterson
,
B. S.
(
2014
).
Sex, age, and cognitive correlates of asymmetries in thickness of the cortical mantle across the life span
.
Journal of Neuroscience
,
34
,
6294
6302
.
R Core Team
. (
2018
).
R: A language and environment for statistical computing
.
Vienna
:
R Foundation for Statistical Computing
. .
Raznahan
,
A.
,
Toro
,
R.
,
Daly
,
E.
,
Robertson
,
D.
,
Murphy
,
C.
,
Deeley
,
Q.
, et al
(
2010
).
Cortical anatomy in autism spectrum disorder: An in vivo MRI study on the effect of age
.
Cerebral Cortex
,
20
,
1332
1340
.
Richler
,
J. J.
,
Wilmer
,
J. B.
, &
Gauthier
,
I.
(
2017
).
General object recognition is specific: Evidence from novel and familiar objects
.
Cognition
,
166
,
42
55
.
Rouder
,
J. N.
, &
Morey
,
R. D.
(
2012
).
Default Bayes factors for model selection in regression
.
Multivariate Behavioral Research
,
47
,
877
903
.
Russell
,
R.
,
Duchaine
,
B.
, &
Nakayama
,
K.
(
2009
).
Super-recognizers: People with extraordinary face recognition ability
.
Psychonomic Bulletin & Review
,
16
,
252
257
.
Scherf
,
K. S.
,
Thomas
,
C.
,
Doyle
,
J.
, &
Behrmann
,
M.
(
2014
).
Emerging structure–function relations in the developing face processing system
.
Cerebral Cortex
,
24
,
2964
2980
.
Schultz
,
R. T.
(
2005
).
Developmental deficits in social perception in autism: The role of the amygdala and fusiform face area
.
International Journal of Developmental Neuroscience
,
23
,
125
141
.
Schwarz
,
G.
(
1978
).
Estimating the dimension of a model
.
Annals of Statistics
,
6
,
461
464
.
Shakeshaft
,
N. G.
, &
Plomin
,
R.
(
2015
).
Genetic specificity of face recognition
.
Proceedings of the National Academy of Sciences, U.S.A.
,
112
,
12887
12892
.
Sowell
,
E. R.
,
Thompson
,
P. M.
,
Leonard
,
C. M.
,
Welcome
,
S. E.
,
Kan
,
E.
, &
Toga
,
A. W.
(
2004
).
Longitudinal mapping of cortical thickness and brain growth in normal children
.
Journal of Neuroscience
,
24
,
8223
8231
.
Stefanacci
,
L.
, &
Amaral
,
D. G.
(
2000
).
Topographic organization of cortical inputs to the lateral nucleus of the macaque monkey amygdala: A retrograde tracing study
.
Journal of Comparative Neurology
,
421
,
52
79
.
Studholme
,
C.
,
Hill
,
D. L. G.
, &
Hawkes
,
D. J.
(
1999
).
An overlap invariant entropy measure of 3D medical image alignment
.
Pattern Recognition
,
32
,
71
86
.
Sugden
,
N. A.
, &
Moulson
,
M. C.
(
2019
).
These are the people in your neighbourhood: Consistency and persistence in infants' exposure to caregivers', relatives', and strangers' faces across contexts
.
Vision Research
,
157
,
230
241
.
Sugiura
,
N.
(
1978
).
Further analysis of the data by Akaike's information criterion and the finite corrections
.
Communications in Statistics—Theory and Methods
,
7
,
13
26
.
Sunday
,
M. A.
,
Lee
,
W.-Y.
, &
Gauthier
,
I.
(
2018
).
Age-related differential item functioning in tests of face and car recognition ability
.
Journal of Vision
,
18
,
2
.
Trampel
,
R.
,
Ott
,
D. V. M.
, &
Turner
,
R.
(
2011
).
Do the congenitally blind have a Stria of Gennari? First intracortical insights in vivo
.
Cerebral Cortex
,
21
,
2075
2081
.
Versluis
,
M. J.
,
Peeters
,
J. M.
,
van Rooden
,
S.
,
van der Grond
,
J.
,
van Buchem
,
M. A.
,
Webb
,
A. G.
, et al
(
2010
).
Origin and reduction of motion and f0 artifacts in high resolution T2*-weighted magnetic resonance imaging: Application in Alzheimer's disease patients
.
Neuroimage
,
51
,
1082
1088
.
Viola
,
P.
, &
Wells
,
W. M.
, III
(
1997
).
Alignment by maximization of mutual information
.
International Journal of Computer Vision
,
24
,
137
154
.
Webster
,
M. J.
,
Ungerleider
,
L. G.
, &
Bachevalier
,
J.
(
1991
).
Lesions of inferior temporal area TE in infant monkeys alter cortico-amygdalar projections
.
NeuroReport
,
2
,
769
772
.
Wells
,
W. M.
, III
,
Viola
,
P.
,
Atsumi
,
H.
,
Nakajima
,
S.
, &
Kikinis
,
R.
(
1996
).
Multi-modal volume registration by maximization of mutual information
.
Medical Image Analysis
,
1
,
35
51
.
Wenger
,
E.
,
Schaefer
,
S.
,
Noack
,
H.
,
Kühn
,
S.
,
Mårtensson
,
J.
,
Heinze
,
H.-J.
, et al
(
2012
).
.
Neuroimage
,
59
,
3389
3397
.
Zatorre
,
R. J.
,
Fields
,
R. D.
, &
Johansen-Berg
,
H.
(
2012
).
Plasticity in gray and white: Neuroimaging changes in brain structure during learning
.
Nature Neuroscience
,
15
,
528
536
.