Automated deep learning segmentation of high-resolution 7 Tesla postmortem MRI for quantitative analysis of structure-pathology correlations in neurodegenerative diseases

Abstract: Postmortem MRI allows brain anatomy to be examined at high resolution and pathology measures to be linked with morphometric measurements. However, automated segmentation methods for brain mapping in postmortem MRI are not well developed, primarily due to the limited availability of labeled datasets and heterogeneity in scanner hardware and acquisition protocols. In this work, we present a high-resolution dataset of 135 postmortem human brain tissue specimens imaged at 0.3 mm³ isotropic using a T2w sequence on a 7T whole-body MRI scanner. We developed a deep learning pipeline to segment the cortical mantle by benchmarking the performance of nine deep neural architectures, followed by post-hoc topological correction. We evaluate the reliability of this pipeline via overlap metrics with manual segmentation in 6 specimens, and via intra-class correlation between cortical thickness measures extracted from the automatic segmentation and expert-generated reference measures in 36 specimens. We also segment four subcortical structures (caudate, putamen, globus pallidus, and thalamus), white matter hyperintensities, and the normal-appearing white matter, providing a limited evaluation of accuracy. We show that the pipeline generalizes across whole-brain hemispheres in different specimens, and also to unseen images acquired at 0.28 mm³ and 0.16 mm³ isotropic with a T2*w fast low angle shot (FLASH) sequence at 7T. We report associations between localized cortical thickness and volumetric measurements across key regions, and semi-quantitative neuropathological ratings in a subset of 82 individuals with Alzheimer's disease (AD) continuum diagnoses. Our code, Jupyter notebooks, and the containerized executables are publicly available at the project webpage (https://pulkit-khandelwal.github.io/exvivo-brain-upenn/).


INTRODUCTION
Neurodegenerative diseases are increasingly understood to be heterogeneous, with multiple distinct neuropathological processes jointly contributing to neurodegeneration in most patients, a phenomenon called mixed pathology (Schneider et al., 2007). For example, many patients diagnosed at autopsy with Alzheimer's disease (AD) also have brain lesions associated with vascular disease, TDP-43 proteinopathy, and α-synuclein pathology (Matej et al., 2019; Robinson et al., 2018). Currently, some of these pathological processes (particularly TDP-43 and α-synuclein pathologies) cannot be reliably detected with antemortem biomarkers, which makes it difficult for clinicians to determine to what extent cognitive decline in individual patients is driven by AD versus other factors. The recent modest successes of AD treatments in clinical trials (van Dyck et al., 2022) make it ever more important to derive antemortem biomarkers that can detect and quantify mixed pathology, so that treatments can be prioritized for those most likely to benefit from them.
Importantly, the understanding and utility of antemortem biomarkers are augmented by the coupling of imaging and postmortem pathology. Following autopsy, histological examination of the donor's brain tissue provides a semi-quantitative assessment of the presence and severity of various pathological drivers of neurodegeneration. Associations between these pathology measures and regional measures of neurodegeneration, such as cortical thickness, can identify patterns of neurodegeneration probabilistically linked to specific pathological drivers (de Flores et al., 2020; Frigerio et al., 2021; Wisse et al., 2020, 2021). Such association studies can use either antemortem or postmortem imaging (Wisse et al., 2017). Both approaches have their limitations. When the time between antemortem imaging and death is substantial, the postmortem pathology may not accurately match the state of pathology at the time of imaging. Postmortem MRI generally requires dedicated imaging facilities and image analysis algorithms, and whatever neurodegeneration/pathology association patterns are discovered have to be "translated" into the antemortem imaging domain for use as antemortem biomarkers. However, postmortem MRI allows imaging at much greater resolution than antemortem MRI, so structure/pathology associations can be examined with greater granularity.
Furthermore, postmortem MRI of the brain can provide an advantage over antemortem MRI for visualizing detailed and intricate neuroanatomy and linking macroscopic morphometric measures such as cortical thickness to underlying cytoarchitecture and pathology (Adler et al., 2014; Alkemade et al., 2022; Augustinack et al., 2010; Beaujoin et al., 2018; García-Cabezas et al., 2020; Iglesias et al., 2015; Mancini et al., 2020; Pallebage-Gamarallage et al., 2018; Vega et al., 2021). Recent inquiries comparing postmortem imaging with histopathology have demonstrated relationships between atrophy measures and neurodegenerative pathology (Makkinejad et al., 2019; Ravikumar et al., 2021; Wisse et al., 2021; Yushkevich et al., 2021). Such associations corroborate patterns of neurodegeneration by specifically linking them with the underlying contributing pathology such as TAR DNA-binding protein 43 (TDP-43), phosphorylated tau (p-tau), and α-synuclein in AD. In particular, Ravikumar et al. (2021) found significant correlations between tau pathology and thickness in the entorhinal cortex (ERC) and stratum radiatum lacunosum moleculare (SRLM) consistent with early Braak stages. Separately, Wisse et al. (2021) found significant associations of TDP-43 with thickness in the hippocampal subregions. Postmortem imaging also helps in characterizing underlying anatomy at the scale of subcortical layers (Augustinack et al., 2013; Kenkhuis et al., 2019), such as hippocampal subfields in the medial temporal lobe (MTL) (Ravikumar et al., 2020; Yushkevich et al., 2021). Several studies have also explored pathology/MRI associations in other neurodegenerative diseases, such as frontotemporal lobar degeneration (FTLD) and amyotrophic lateral sclerosis (ALS) (Gordon et al., 2016; Irwin et al., 2015; Mackenzie et al., 2011). Previous work has identified correlations between high-resolution postmortem MRI and histopathology, to map myelin and iron deposits in cortical laminae (Bulk et al., 2020), due to oligodendrocytes and pathologic iron inclusions in astrocytes and microglia (Tisdall et al., 2021), which are a major source of iron because of myelination demands. Therefore, postmortem MRI would be helpful for validating and refining pathophysiological correlates derived from antemortem studies. Additionally, the volume of white matter hyperintensities (WMH) is an indirect marker of cerebrovascular pathology, and the associations between cortical thickness and WMH (Dadar et al., 2022; Du et al., 2005; Rizvi et al., 2018) are complementary to the associations between cortical thickness and tau, TDP-43, amyloid-β, and α-synuclein pathology. Also, van der Velpen et al. (2023) suggest that the subcortical brain structures are highly involved in dementia risk: smaller volumes and thickness measurements of the thalamus, amygdala, and hippocampus were associated with incident dementia.
Compared to antemortem MRI, postmortem MRI is not affected by head or respiratory motion artifacts and has much less stringent time constraints. Compared to histology, postmortem MRI is less affected by distortion or tearing, and provides a continuous 3D representation of brain anatomy. However, both histology and postmortem MRI are affected by changes occurring in the agonal state, during brain removal, and during tissue fixation and handling. Indeed, postmortem MRI is often used to provide a 3D reference space onto which 2D histological images are mapped. Combined analysis of postmortem MRI and histology makes it possible to link morphological changes in the brain to underlying pathology as well as to generate anatomically correct parcellations of the brain based on cytoarchitecture (Amunts et al., 2020; Schiffer, Spitzer, et al., 2021) and pathoarchitecture (Augustinack et al., 2013). Postmortem MRI could also act as a reference space to generate quantitative 3D maps of neurodegenerative proteinopathies from serial histology imaging (Ushizima et al., 2022; Yushkevich et al., 2021).
Given the rising use of high-resolution postmortem MRI in neurodegenerative disease research, automated techniques are imperative to effectively analyze such growing datasets. In particular, scaling structure-pathology association studies beyond a few dozen datasets requires reliable morphometry measurements from postmortem MRI via accurate 3D segmentation and reconstruction of the structures of interest. There has been substantial work in brain MRI parcellation, such as FreeSurfer (Fischl, 2012) and recent efforts based on deep learning (Chen et al., 2018; Henschel et al., 2020). However, these approaches focus on antemortem MRI, and there is limited work on developing automated segmentation methods for postmortem MRI. Postmortem segmentation methods have been region specific. Recent developments include automated deep learning methods for high-resolution cytoarchitectonic mapping of the occipital lobe in 2D histological sections (Eckermann et al., 2021; Kiwitz et al., 2020; Schiffer, Amunts, et al., 2021; Schiffer, Harmeling, et al., 2021; Schiffer, Spitzer, et al., 2021; Spitzer et al., 2018). Iglesias et al. (2015, 2018) developed an atlas to segment the MTL and the thalamus using manual segmentations in postmortem images. Yet, a postmortem segmentation method applicable to a variety of brain regions has yet to be described. This is attributable to several factors. Some groups have developed robust whole-brain postmortem image analysis tools (Casamitjana et al., 2024; Edlow et al., 2019; Jonkman et al., 2019; Mancini et al., 2020; Zeng et al., 2023), though overall there is limited availability of postmortem specimens, scans, segmentation algorithms, and labeled reference standard segmentations. Compared to antemortem structural MRI, postmortem MRI currently exhibits substantial heterogeneity in scanning protocols, larger image dimensions, greater textural complexity, and more profound artifacts. These issues can be addressed with new datasets and automated segmentation tools open to the public.
In this study, we expand upon our pilot study (Khandelwal, Duong, et al., 2022; Khandelwal, Sadaghiani, et al., 2022; Khandelwal et al., 2021) and develop a methodological framework to segment cortical gray matter, subcortical structures (caudate, putamen, globus pallidus, thalamus), white matter (WM), and WMH in high-resolution (0.3 x 0.3 x 0.3 mm³) 7 T T2w postmortem MRI scans of whole-brain hemispheres. We train and evaluate our approach using 135 brain hemisphere scans from the Center for Neurodegenerative Disease Research of the University of Pennsylvania. We measure cortical thickness at several key locations in the cortex based on our automatic segmentation of the cortex, and correlate these measures with thickness measurements obtained using a user-guided semi-automated protocol. High consistency between these two sets of measures supports the use of deep learning-based automated thickness measures for postmortem brain segmentation and morphometry. We then report regional patterns of association between cortical thickness at a set of anatomical locations and neuropathology ratings (regional measures of p-tau and neuronal loss, as well as global amyloid-β, Braak staging, and CERAD ratings) obtained from histology data and WMH burden. Additionally, we show that networks trained on T2-weighted spin echo images acquired at 7 T generalize to postmortem images obtained with T2*w gradient echo fast low angle shot (FLASH) 7 T MRI acquired at resolutions of 0.28 x 0.28 x 0.28 mm³ and 0.16 x 0.16 x 0.16 mm³.

Donor cohort
We analyze a dataset of 135 postmortem whole-hemisphere MRI scans selected from the Penn Integrated Neurodegenerative Disease Database (INDD) (Toledo et al., 2014). Patients were evaluated at the Penn Frontotemporal Degeneration Center (FTDC) or Alzheimer's Disease Research Center (ADRC) and followed to autopsy at the Penn Center for Neurodegenerative Disease Research (CNDR) as part of ongoing and previous clinical research programs (Arezoumandan et al., 2022; Irwin, 2016; Tisdall et al., 2021). The cohort included 62 female (sex assigned at birth) donors (Age: 75.37 ± 10.02 years, Age range: 53-97) and 73 male donors (Age: 73.95 ± 11.59 years, Age range: 44-101). Donors had Alzheimer's disease or related dementias (ADRD), such as Lewy body disease (LBD), FTLD-TDP43, or progressive supranuclear palsy (PSP), or were cognitively normal adults. Human brain specimens were obtained in accordance with local laws and regulations, and included informed consent from next of kin at time of death. The patients were evaluated at the FTDC and ADRC as per standard diagnostic criteria (Toledo et al., 2014), and imaged by the teams at the ADRC, the Penn Image Computing and Science Laboratory (PICSL), and the FTDC. Autopsy was performed at the CNDR. Figure 1 shows an example of a brain specimen with Parkinson's disease and LBD ready for autopsy. The brain is subsequently cut into 1 cm-thick slabs using a 3D-printed cutting mold, and slabs are further cut and embedded in paraffin for other ongoing studies in the laboratory. Supplementary Figure 1 illustrates the internal structures of the brain. Separately, the postmortem tissue photograph of a specimen with progressive non-fluent aphasia (PNFA) and globular glial tauopathy (GGT) is shown in Supplementary Figure 2. Table 1 details the primary neuropathological diagnostic groups in the cohort, with complete details tabulated in the Supplementary Spreadsheet.

MRI acquisition
During the autopsy of a specimen, one hemisphere was immersed in 10% neutral buffered formalin for at least 4 weeks prior to imaging. After the fixation time, samples were placed in Fomblin (California Vacuum Technology; Fremont, CA), enclosed in custom-built plastic bag holders. Samples were left to rest for at least 2 days to allow the air bubbles to escape from the tissue. The samples were scanned using either a custom-built small solenoid coil or a custom-modified quadrature birdcage (Varian, Palo Alto, CA, USA) coil (Edlow et al., 2019; Tisdall et al., 2021). The samples were then placed into a whole-body 7 T scanner (MAGNETOM Terra, Siemens Healthineers, Erlangen, Germany). T2-weighted images were acquired using a 3D-encoded T2 SPACE sequence with 0.3 x 0.3 x 0.3 mm³ isotropic resolution, 3 s repetition time (TR), echo time (TE) 383 ms, turbo factor 188, echo train duration 951 ms, and bandwidth 348 Hz/px in approximately 2-3 hours per measurement. Image reconstruction was done using the vendor's on-scanner reconstruction software, which also corrected the global frequency drift, combined the signal averages in k-space, and produced magnitude images for each echo. A total of four repeated measurements were acquired for each sample and subsequently averaged to generate the final image. Sample MRI slices are shown in Figure 2 for a range of specimens with different diseases. The image acquisition suffers from geometric distortions due to the non-linearity of the magnetic gradient field, which increases towards the anterior and posterior ends of the sample, and from B1-transmit inhomogeneity; both result in decreased image quality, as shown in Figure 2.
Imaging Neuroscience, Volume 2, 2024
We also acquire images at a much higher resolution using two separate T2*w FLASH MRI sequences, one at 280 microns and the other at 160 microns. We use these sequences for the generalization experiments as described in Section 4.3. Here, we briefly describe their acquisition parameters. For the 280 microns isotropic resolution: MRI data were reacquired with a 3D-encoded, 8-echo gradient-recalled echo (GRE) sequence with non-selective RF pulses. To maintain readout polarity and minimize distortions due to field inhomogeneity, each readout was followed by a flyback rephasing gradient. The final echo was followed by an additional completely rephased readout to measure frequency drifts. Each line of k-space was acquired with multiple averages sequentially before advancing to the next phase-encode step. Common parameters for the sequence were: 280 microns isotropic resolution, 25° flip angle, 60 ms repetition time (TR), minimum echo time (TE) 3.48 ms, echo spacing 6.62 ms, and bandwidth 400 Hz/px. The field of view was adapted to each sample, and subsequently TRs and TEs were slightly modified based on the necessary readout duration. Total scan times were 8-10 hours for each sample.
Fig. 2. MRI of the T2w sequence representative of Alzheimer's disease and related dementias (ADRD) spectrum with mixed pathology and diagnoses of five subjects. The heterogeneity among the subjects can be appreciated through the three different viewing planes. Notice that the MRI signal drops off at the anterior and posterior ends of the hemisphere, a drawback of the current acquisition protocol. AD, Alzheimer's disease; CVD, cerebrovascular disease; LATE, limbic-predominant age-related TDP-43 encephalopathy; LBD, Lewy body disease; CTE, chronic traumatic encephalopathy; FTLD-TDP, frontotemporal lobar degeneration with TDP inclusions; GGT, globular glial tauopathy; PART, primary age-related tauopathy; PSP, progressive supranuclear palsy.
For the 160 microns isotropic resolution: MRI data were acquired with a 3D-encoded, 3-echo GRE sequence with non-selective RF pulses and the following parameters: 25° flip angle, 60 ms repetition time (TR), minimum echo time (TE) 9.37 ms, echo spacing 11.33 ms, and bandwidth 90 Hz/px, similar to the methods described in Tisdall et al. (2022).

Localized thickness measurement pipeline at key cortical locations
Our center has adopted an expert-supervised semi-automatic protocol to obtain localized quantitative measures of cortical thickness in all 7 T postmortem MRI scans, as described by Sadaghiani et al. (2022; Section 2.3 and supplementary material S1) and Wisse et al. (2021; Section "MTL thickness measurements" and supplementary material "Thickness measurement"). In the current study, we use these measures as the reference standard for evaluating automated cortical segmentation. In each hemisphere, 16 cortical landmarks are identified and labeled on the MRI scan as shown in Figure 4A. To measure cortical thickness at these locations, a semi-automatic level set segmentation of the surrounding cortical ribbon is performed, and the maximal sphere that is inscribed in the cortical segmentation and contains the landmark is found; the diameter of this sphere gives the thickness at that landmark (Fig. 4B).
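The maximal inscribed sphere criterion can be illustrated with a short sketch. This is not the protocol's implementation, which relies on Voronoi skeletonization; it is a brute-force stand-in that assumes a small binary patch, a landmark given as a voxel index, and a hypothetical function name `thickness_at_landmark`. Every foreground voxel is treated as a candidate sphere centre whose radius is its distance to the nearest background voxel, and the largest such sphere that still contains the landmark is kept.

```python
import numpy as np

def thickness_at_landmark(mask, landmark, spacing=(0.3, 0.3, 0.3)):
    """Diameter (in mm) of the largest inscribed sphere containing the landmark.

    Brute-force illustration for small patches; a Euclidean distance
    transform would replace the pairwise distances on full volumes.
    """
    mask = np.asarray(mask, dtype=bool)
    spacing = np.asarray(spacing, dtype=float)
    if not mask[tuple(landmark)]:
        raise ValueError("landmark must lie inside the segmentation")
    fg = np.argwhere(mask) * spacing    # candidate sphere centres (mm)
    bg = np.argwhere(~mask) * spacing   # background voxel centres (mm)
    if len(bg) == 0:
        raise ValueError("segmentation has no background to bound the sphere")
    # Radius of the largest sphere centred on each foreground voxel.
    diff = fg[:, None, :] - bg[None, :, :]
    radius = np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)
    # Keep only spheres that contain the landmark.
    to_landmark = np.linalg.norm(fg - np.asarray(landmark) * spacing, axis=1)
    contains = to_landmark <= radius
    return 2.0 * float(radius[contains].max())
```

Because distances are measured between voxel centres, the result can differ from the true thickness by roughly one voxel; the sketch is meant to convey the geometric definition rather than to reproduce the reference measurements.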

METHODS
In Section 3.1, we describe the manual segmentation protocols developed for cortical gray matter, the four subcortical structures, WMH, and normal-appearing white matter. Then, in Sections 3.2 and 3.3, we describe the deep learning-based pipeline and performance evaluation criteria developed for the automated segmentation of these structures. In Section 3.4, we describe the post-hoc topology correction step employed to provide geometrically accurate segmentations. Finally, in Section 3.5, we describe the statistical models used to link neuropathological ratings of p-tau, amyloid-β, neuronal loss, CERAD scores, and Braak staging with morphometry based on volume and regional thickness measurements obtained from the automated segmentations.

Manual segmentation protocols
We developed protocols to manually segment structures in postmortem MRI: cortical gray matter, subcortical structures (caudate, putamen, globus pallidus, thalamus), WM, and WMH. We used these manual segmentations along with the corresponding MR images to train the neural networks. All manual segmentations were done by raters in ITK-SNAP (Yushkevich et al., 2019). Supplementary Table 1 shows which subjects were manually segmented to obtain the training data and the reference standard for the different labels, and the subsequent cross-validation studies as described in Sections 3.2 and 3.3.
Fig. 4. Cortical thickness is measured at the 16 landmarks (A). A dot (shown here: motor, visual, BA35) is first placed to define an anatomical landmark, around which a semi-automatic level set segmentation of the surrounding cortical ribbon is provided. A maximally inscribed sphere is then computed using Voronoi skeletonization (Ogniewicz & Ilg, 1992), and the diameter of the sphere gives thickness at that landmark (B).

Cortical gray matter
We sampled five 3D image patches of size 64 x 64 x 64 voxels, as shown in Figure 5, around the orbitofrontal cortex (OFC), anterior temporal cortex (ATC), inferior prefrontal cortex (IPFC), primary motor cortex (PMC), and primary somatosensory cortex (PSC) from 6 brain hemispheres, resulting in a total of 30 patches. These regions were selected to represent variable levels of pathology in ADRD cases, together with control regions (PMC, PSC) (Tisdall et al., 2021). For example, OFC, ATC, and IPFC have high pathology in PSP, whereas the PSC, which is generally less affected in most neurodegenerative diseases, was sampled as a negative control. In each patch, gray matter was segmented manually in ITK-SNAP. Five manual raters, divided into groups of two (E.K. and G.C.) and three (E.H., B.L., and E.M.), labeled gray matter as the foreground and the rest of the image as the background using a combination of manual tracing and the semi-automated segmentation tool. The manual segmentation of cortical gray matter was supervised by author P.K.
Figure 5 shows sample patch images and the corresponding reference standard labels with 3D renderings. We followed some guiding principles for manual segmentation: (1) the white layer enveloping the cortex is not an imaging artifact but the outermost layer of the cortex, and is thus labeled as gray matter; (2) adjoining gyri in deep sulcal regions are labeled as gray matter, and the deep sulci between them are demarcated as background; (3) in several regions, gray matter has intensities similar to those of nearby WMH, and these cases were resolved by visual inspection of texture in the surrounding region; and (4) because the 64 x 64 x 64 patches provided little context when segmenting the GM, the corresponding whole-hemisphere image was displayed in a separate ITK-SNAP window for the user to examine the structures surrounding the given image patch. Inter-rater reliability scores were computed for these manual segmentations in terms of the Dice coefficient (DSC): Raters 1&2: 95.26 ± 1.37%, Raters 1&3: 94.64 ± 1.64%, Raters 2&3: 94.54 ± 1.20%, and Raters 4&5: 92.04 ± 4.26%. After the review of the segmentations by an expert, the patches segmented by raters 2 (G.C.) and 5 (E.M.) were selected for training the deep learning models.
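For reference, the Dice coefficient between two binary segmentations A and B is 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch, assuming both raters' labels are available as arrays of identical shape; the function name `dice_coefficient` and the convention of returning 1.0 for two empty masks are choices made here for illustration.

```python
import numpy as np

def dice_coefficient(seg_a, seg_b):
    """Dice similarity coefficient between two binary masks of equal shape."""
    a = np.asarray(seg_a, dtype=bool)
    b = np.asarray(seg_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must have the same shape")
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom
```

Multiplying the result by 100 gives the percentage values reported for the rater pairs.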

Subcortical structures
Four subcortical gray matter structures (caudate, putamen, globus pallidus, thalamus) were manually segmented on seven specimens, selected to span varying pathological diagnoses (including Alzheimer's spectrum, p-tauopathies, TDP-43 encephalopathies, LBD, cerebrovascular disease, and mixed disease), the range of postmortem age and levels of cerebral atrophy, tissue quality and tearing, mineral deposition and calcification, blood vessel size, and vascular pathology on imaging and autopsy. Figure 6 shows an example manual segmentation of the four subcortical structures. First, structures were segmented across the axial plane based on several anatomical considerations. (1) The borders of the caudate (head, body, and tail) were defined by the frontal horns of the lateral ventricles (cerebrospinal fluid, anterior border) and the internal capsule (WM, boundaries of the body). Periventricular WM disease adjacent to the head of the caudate was excluded from the caudate segmentation. (2) The borders of the putamen were determined by the globus pallidus (medial) and the claustrum (lateral).
(3) The globus pallidus was annotated to include the pallidum interna and externa, bounded by WM surrounding the thalamus and subthalamic nucleus. Mineral deposition (such as calcium) was noted by areas of heterogeneous T2 hypointensities and predominantly localized to the lentiform nucleus; where needed, the contours of some segmentations were adjusted to include these regions of heterogeneous signal within the lentiform structures. (4) The thalamus was segmented as the medial gray matter bounded by the subthalamic nucleus, midbrain, mamillary bodies, corpus callosum, lateral ventricles, and caudate. After segmentation in the axial plane, volumes were edited in the coronal and sagittal planes to ensure smoothness of the segmentation across all three planes. Boundaries between structures (such as striations between caudate and putamen) were agreed upon among authors. Manual segmentation of the four subcortical structures was performed by author E.C. and supervised and edited by M.T.D. Subcortical segmentations were discussed in consensus meetings with P.K., P.A.Y., S.R.D., and D.A.W.
Fig. 6. An example of manual segmentation of the thalamus, putamen, caudate, globus pallidus, and white matter hyperintensities as per the protocol described in Sections 3.1.2 and 3.1.3 for a subject with Alzheimer's disease and Lewy body disease (87 years old).

White matter hyperintensities
Nine specimens were chosen to segment WMHs across a gamut of vascular pathology with differing levels of WMH appearance, ranging from focal small-vessel ischemia to intermediate periventricular and juxtacortical patterns to large, diffuse cerebrovascular disease. General principles for segmentation were applied as follows. (1) Segmented lesions should be generally larger than 1 cm³. (2) Segmentations should appear for at least 4-6 slices to be a 1 cm³ region. (3) Segmented WMH should include both periventricular (anterior frontal and posterior temporal/occipital horns of the lateral ventricles) and juxtacortical lesions. This distinguished WMH from insular cortex, claustrum, basal ganglia, and other gray matter structures embedded in WM. (4) WMH segmentations generally exhibited signal intensity above a threshold of 450-550, given that the image intensity range was normalized to 0-1000 (but this was influenced by field inhomogeneity artifacts, either between images or within the same image, often at the anterior and posterior cortical poles). (5) WMH segmentations included T2 hyperintense perivascular spaces and cortical venules but must also include surrounding T2 hyperintense white matter regions that occupy a region larger than 1 cm³. (6) WMH was segmented primarily in the axial plane and then assessed for contiguity and smoothness in sagittal and axial planes as well as 3D renderings. Manual segmentation was performed by the author A.E.D. and supervised and edited by M.T.D. WMH segmentations were discussed in consensus meetings with P.K., P.A.Y., S.R.D., and D.A.W. Figure 6 shows an example manual segmentation of the WMH.

White matter
For two specimens with manually labeled subcortical structures and WMH, and with nnU-Net-based automated segmentations of the cortical mantle, the images were thresholded to obtain the WM label. This label was then manually corrected by the author A.E.D., supervised by P.K., to remove spurious, incorrectly labeled voxels and to fill in holes, thereby defining a clear GM/WM boundary.

Automated deep learning-based segmentation for cortical gray matter
We first developed an approach for labeling cortical gray matter in postmortem 7 T MRI scans using the 30 patch-level cortical gray matter segmentations from six subjects (described in Section 3.1.1) for training and cross-validation. Given the exceptional performance of convolutional neural network (CNN) models for antemortem medical image segmentation (Schiffer, Spitzer, et al., 2021), our effort focused on benchmarking leading existing CNN models, rather than developing another model for the postmortem cortical segmentation task. We benchmarked the following variants of popular biomedical image segmentation deep learning models: (1) nnU-Net (Isensee et al., 2021); four variants of AnatomyNet (Zhu et al., 2019) based on squeeze-and-excitation blocks (Roy et al., 2018); (7) VoxResNet (Chen et al., 2018); (8) VNet (Milletari et al., 2016); and (9) Attention U-Net (Oktay et al., 2018). Supplementary Material provides architectural details of the nine deep neural networks. We use PyTorch 1.5.1 and a single Nvidia Quadro RTX 5000 GPU to train the models using user-annotated images described in Sections 3.1.1-3.1.3. To train the deep neural networks, the input images were standardized, and then normalized between 0 and 1.
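The intensity preprocessing can be sketched as follows. The text does not state whether statistics are computed per image or over a foreground mask, so this sketch assumes per-image z-scoring over all voxels followed by min-max rescaling; the function name `standardize_then_rescale` and the `eps` guard are illustrative choices, not part of the published pipeline.

```python
import numpy as np

def standardize_then_rescale(image, eps=1e-8):
    """Z-score an image, then rescale the result to the [0, 1] range.

    Assumption: statistics are taken over the whole image; a constant
    image maps to all zeros thanks to the eps guard.
    """
    image = np.asarray(image, dtype=np.float64)
    z = (image - image.mean()) / (image.std() + eps)   # standardization
    lo, hi = z.min(), z.max()
    return (z - lo) / (hi - lo + eps)                   # min-max normalization
```

Note that a min-max rescaling applied after a z-score yields the same values as min-max rescaling of the raw image (up to `eps`), so the two steps matter mainly when clipping or masking is inserted between them.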
To ensure a fair and systematic evaluation of the nine networks, we trained and evaluated the nine network architectures within the nnU-Net framework under matched conditions (i.e., same split of data into training/validation/testing subsets, same data augmentation strategy, and same hyper-parameter tuning strategy). We first evaluated the accuracy of cortical segmentation by six-fold cross-validation on the patch-level manual segmentations. We report average DSC and average 95th percentile Hausdorff distance (HD95) across the 30 segmented patches in the six-fold cross-validation experiments.
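HD95 can be illustrated with a small sketch. Definitions vary slightly between toolkits; the version below assumes the common convention of taking the 95th percentile of each directed surface-to-surface distance set and returning the larger of the two. Surface voxels are defined here as foreground voxels with at least one 6-connected background neighbour, and distances are computed by brute force, which is only practical for patch-sized inputs. The names `surface_points` and `hd95` are illustrative.

```python
import numpy as np

def surface_points(mask, spacing):
    """Physical coordinates of foreground voxels that touch background."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1)  # pad with background so edges count as surface
    interior = mask.copy()
    for axis in range(mask.ndim):
        for shift in (1, -1):
            neighbour = np.roll(padded, shift, axis=axis)
            interior &= neighbour[(slice(1, -1),) * mask.ndim]
    return np.argwhere(mask & ~interior) * np.asarray(spacing, dtype=float)

def hd95(seg_a, seg_b, spacing=(1.0, 1.0, 1.0)):
    """95th percentile symmetric Hausdorff distance between two masks."""
    pts_a = surface_points(seg_a, spacing)
    pts_b = surface_points(seg_b, spacing)
    if len(pts_a) == 0 or len(pts_b) == 0:
        raise ValueError("both segmentations must be non-empty")
    # Pairwise distances between the two surfaces (brute force).
    d = np.linalg.norm(pts_a[:, None, :] - pts_b[None, :, :], axis=-1)
    a_to_b = d.min(axis=1)
    b_to_a = d.min(axis=0)
    return float(max(np.percentile(a_to_b, 95), np.percentile(b_to_a, 95)))
```

Passing `spacing=(0.3, 0.3, 0.3)` would express the distance in millimetres for the T2w scans described above.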
To choose the best performing model for subsequent analyses, the nine trained models were then used to segment the whole-hemisphere cortical mantle across a subset of the cohort (N = 36, for which the user-supervised semi-automated segmentations are available). We used these segmentations to compute the thickness at the 16 cortical landmarks in these 36 subjects. We then compared the thickness measurements to corresponding measures obtained via user-supervised semi-automated segmentations (reference standard) using our protocol (Section 2.4). In Supplementary Figure 4, for each of the 16 landmarks, we report Spearman's correlation coefficient between the automated and the reference manual segmentation-based cortical thickness measurements, and the average fixed raters intra-class correlation coefficient (ICC) (Shrout & Fleiss, 1979). Based on the combination of cross-validation DSC accuracy, agreement between the automatic and manual segmentation-based cortical thickness measures, and overall visual inspection of the segmentations, a single model was selected for subsequent experiments. As reported in Section 4.1, this best-performing method was the nnU-Net model.
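As a sketch of the agreement measure, the average fixed raters ICC corresponds to ICC(3,k) in the notation of Shrout & Fleiss (1979), computed from a two-way ANOVA as (BMS − EMS) / BMS, where BMS is the between-target mean square and EMS the residual mean square. The code below assumes a complete ratings matrix with one row per specimen and one column per method (for example, automated and reference thickness); the function name `icc3k` is illustrative.

```python
import numpy as np

def icc3k(ratings):
    """ICC(3,k): two-way mixed effects, consistency, average of k raters.

    ratings: array of shape (n_targets, k_raters) with no missing values.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_total = ((x - grand) ** 2).sum()
    ss_targets = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_raters = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_error = ss_total - ss_targets - ss_raters
    bms = ss_targets / (n - 1)              # between-target mean square
    ems = ss_error / ((n - 1) * (k - 1))    # residual mean square
    return (bms - ems) / bms
```

Because ICC(3,k) measures consistency, a constant offset between the automated and reference measures does not lower it; an absolute-agreement form would be needed to penalize such a bias.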

Automated deep learning-based segmentation for other structures
We trained another vanilla nnU-Net model to obtain the automated segmentation of subcortical structures, WMH, and normal-appearing WM. In particular, to report DSC and HD95 scores in five-fold cross-validation studies, we trained an nnU-Net model on the manually labeled training data for subcortical structures (seven subjects) and WMH (nine subjects), with the manually labeled WM and the automated cortical mantle segmentations obtained from the nnU-Net model of Section 3.2 as additional labels. Supplementary Table 1 lists the subjects used for training and evaluation of all seven labels.

Post-hoc automated topology correction of segmentations
As shown in Figure 2, the image acquisition suffers from geometric distortions due to the non-linearity of the magnetic gradient field and might also suffer from partial volume averaging, which generally introduces holes and handles in the segmentation and causes the WM surface to lack a spherical topology. Furthermore, the opposing banks of a sulcus often appear to be fused, which makes it harder for the segmentation algorithm to correctly segment the cortex and thus might produce erroneous cortical thickness values. It is therefore important to correct such missegmentations with a topology correction method. We employ the methods developed in CRUISE: Cortical reconstruction using implicit surface evolution (Han et al., 2004), made available as part of the "nighres" package (Huntenburg et al., 2018), with default settings, for post-hoc topology correction of the WM surface and for constraining the GM segmentations. In particular, the method uses a fast marching-based method (Bazin & Pham, 2007), a graph-based topology correction algorithm (GTCA) (Han et al., 2002), and a topology preserving geometric deformable model (TGDM) (Han et al., 2003) to rectify holes and handles, thus producing a WM surface with a spherical topology. Separately, the Anatomically Consistent Enhancement (ACE) method is used to provide a GM representation that has evidence of sulci where it might not otherwise exist due to the partial volume effect. Thus, we obtain geometrically accurate and topologically correct segmentations for the cortical GM and WM. In the current study, we apply this topology correction step to the automated GM/WM segmentation labels obtained from the nnU-Net model trained in the previous Section 3.3. We refer to the post-hoc topology corrected model as "nnU-Net-CRUISE."
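To make the notions of holes and handles concrete, the sketch below computes the Euler characteristic of a binary volume, a simple diagnostic that is separate from CRUISE and from the nighres implementation. It assumes voxels are treated as closed cubes (so voxels touching at a face, edge, or corner count as connected) and counts vertices, edges, faces, and cubes of the resulting cubical complex. A single solid component without handles or cavities gives 1; each handle lowers the value by one and each enclosed cavity raises it by one. The function names are illustrative.

```python
import numpy as np

def _merge(m, axis):
    """OR each voxel with its next neighbour along one axis."""
    lo = [slice(None)] * m.ndim
    hi = [slice(None)] * m.ndim
    lo[axis] = slice(0, -1)
    hi[axis] = slice(1, None)
    return m[tuple(lo)] | m[tuple(hi)]

def euler_characteristic(mask):
    """Euler characteristic V - E + F - C of a 3D binary mask."""
    m = np.pad(np.asarray(mask, dtype=bool), 1)
    cubes = int(m.sum())
    # A face exists if either of the two voxels sharing it is set.
    faces = sum(int(_merge(m, a).sum()) for a in range(3))
    # An edge parallel to axis a is shared by 4 voxels in the other two axes.
    edges = 0
    for a in range(3):
        b, c = [ax for ax in range(3) if ax != a]
        edges += int(_merge(_merge(m, b), c).sum())
    # A vertex is shared by 8 voxels.
    vertices = int(_merge(_merge(_merge(m, 0), 1), 2).sum())
    return vertices - edges + faces - cubes
```

A value of 1 is necessary but not sufficient for spherical topology of the WM surface, since a handle and a cavity cancel each other, so this check complements rather than replaces a dedicated topology correction method.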

Linking neuropathology ratings with morphometry
We computed Spearman's correlation between thickness measurements obtained from automated segmentations at the 16 anatomical landmarks described above and semi-quantitative pathology ratings from approximately corresponding anatomical locations in the contralateral hemisphere (regional p-tau score, regional neuronal loss), as well as global pathology ratings (CERAD stage, Braak stage, β-amyloid). We repeat this analysis with thickness measurements obtained from user-annotated manual reference segmentations, and thereby test the hypothesis that pathology correlations show similar trends for automated and manual thickness measures, which, in turn, would imply that automated segmentations are viable for morphometry-based studies (Section 4.4). In particular, we follow the experimental design from our recent work (Sadaghiani et al., 2022) for a subset of the cohort within the AD spectrum (N = 82 out of 135) having AD as their primary diagnosis and also a diagnosis of either LBD, PART, LATE, or CBD. The criteria for the AD continuum are based on excluding cases with diagnoses of FTLD or non-AD tauopathy (whether primary or secondary). Finally, we normalized the WMH volumes by the corresponding WM volumes, and then computed the one-sided Spearman correlation of the normalized WMH volume with regional cortical thickness and subcortical volumes for the subjects within the AD spectrum (N = 82). We include nuisance covariates of age, sex, and postmortem interval (PMI) in all of our analyses.
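One common way to compute a Spearman correlation that controls for nuisance covariates is to rank-transform all variables, regress the covariate ranks out of both variables of interest, and correlate the residuals. The sketch below follows that recipe in plain Python. It is an assumption about how such a partial correlation can be obtained, not a description of the software used in this study, and it returns only the coefficient (no one-sided p-value). All function names and the toy values are hypothetical.

```python
def rank(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def residualize(target, covariates):
    """Residuals of `target` after least-squares regression on an intercept
    and the covariates, via Gram-Schmidt orthogonalization."""
    basis = []
    for column in [[1.0] * len(target)] + covariates:
        v = list(column)
        for b in basis:
            coef = dot(v, b) / dot(b, b)
            v = [vi - coef * bi for vi, bi in zip(v, b)]
        if dot(v, v) > 1e-12:  # skip columns that are linearly dependent
            basis.append(v)
    r = list(target)
    for b in basis:
        coef = dot(r, b) / dot(b, b)
        r = [ri - coef * bi for ri, bi in zip(r, b)]
    return r


def partial_spearman(x, y, covariates):
    """Spearman correlation of x and y after removing covariate ranks."""
    rc = [rank(c) for c in covariates]
    ex = residualize(rank(x), rc)
    ey = residualize(rank(y), rc)
    return dot(ex, ey) / (dot(ex, ex) * dot(ey, ey)) ** 0.5


# Toy values (hypothetical, not study data): thickness in mm,
# a semi-quantitative rating, and age as the only covariate.
thickness = [2.9, 2.5, 2.7, 2.1, 2.3, 1.9]
rating = [0, 1, 1, 2, 3, 3]
age = [70, 82, 65, 90, 77, 85]
print(partial_spearman(thickness, rating, [age]))
```

With an empty covariate list the function reduces to the ordinary Spearman coefficient, which is a convenient sanity check.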

Dice coefficient volume overlap and qualitative analysis
Table 2 tabulates the patch-level cortical gray matter segmentation performance of the nine different networks across six-fold cross-validation. AnatomyNet and its variants attain the highest patch-level DSC, closely followed by VoxResNet. The nnU-Net model has a slightly lower DSC than the best AnatomyNet model, but the difference is less than 1%. However, since the patches used to train the segmentation networks were sampled only from select regions of the hemispheres, cross-validation accuracy on these patches is not necessarily indicative of the networks' ability to generalize to other brain regions. AnatomyNet and its variants are able to distinguish gray matter from white matter in high-contrast regions, but fail to segment the anterior and posterior regions where contrast is lower due to limitations of the MRI coil. There is also some systematic under-segmentation (see white arrows) of the cortex even in higher-contrast regions (see the white circled regions in Fig. 7). By contrast, nnU-Net clearly demarcates the GM/WM boundary even in low-contrast regions, which is remarkable considering that these regions were not captured by the training patches. Supplementary Figure 3 shows the Dice scores and the HD95 scores as box plots with pairwise paired t-tests with Bonferroni correction for all nine network architectures in Table 2. The significant comparisons are mainly those between VNet, the lowest-performing network, and the rest. This supports our choice of nnU-Net, given that it does not perform significantly worse than any of the other architectures.
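For reference, the Dice similarity coefficient between a predicted and a reference binary mask is twice the number of overlapping foreground voxels divided by the total number of foreground voxels in both masks. The sketch below is a generic implementation on flattened masks; the function name and the toy masks are hypothetical, and returning 1.0 for two empty masks is a convention chosen here.

```python
def dice(pred, ref):
    """Dice similarity coefficient for two flattened binary masks."""
    pred, ref = list(pred), list(ref)
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, r in zip(pred, ref) if p and r)
    total = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Toy masks (hypothetical): one of two foreground voxels overlaps.
print(dice([1, 1, 0, 0], [1, 0, 1, 0]))
```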

Intra-class correlation coefficient
We compare cortical thickness (mm) at 16 cortical landmarks between automated gray matter segmentations obtained by the nine networks and the user-supervised semi-automated segmentation method, which serves as the reference standard.
Table 3. Average fixed raters intra-class correlation coefficient (ICC) scores for the regional cortical thickness measurements between automated and reference standard manual segmentations for all nine neural network architectures, and for the topologically corrected nnU-Net-CRUISE segmentations. Across each row, each cell is color coded, with darker shades indicating a higher ICC value.
Table 3 tabulates the average fixed raters ICC scores for all nine networks trained as described in Section 3.2. We observe that nnU-Net (mean ICC = 0.72) is clearly the best among the nine patch-based models. The final nnU-Net-CRUISE model, which has been topologically corrected as described in Sections 3.3 and 3.4, slightly exceeds the patch-based vanilla nnU-Net on average (mean ICC = 0.73). The variants of AnatomyNet have mean ICC values of 0.40 (Vanilla), 0.47 (CE), 0.40 (CE+SE), and 0.34 (SE). The variants of AnatomyNet were the top-performing models when evaluated using DSC scores at the patch level, but did not generalize to robustly segment the entire cortical mantle, and thereby failed to show good correlation of regional cortical thickness with the reference standard cortical thickness. The other four models also had very low ICC values: VoxResNet: 0.45, VNet: 0.28, 3D Unet: 0.47, and Attention Unet: 0.35.
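As an illustration of the agreement metric, the sketch below computes an average-measures, fixed-raters ICC from a two-way ANOVA decomposition. It assumes that "average fixed raters" corresponds to the consistency form commonly written ICC(3,k), that is, (MSR - MSE) / MSR, where MSR is the between-subject mean square and MSE the residual mean square; the source does not give the formula, and the function name and toy values are hypothetical.

```python
def icc_average_fixed_raters(ratings):
    """ICC(3,k) for a table with one row per subject and one column per rater."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / ms_rows


# Toy thickness values in mm (hypothetical): columns are
# (manual reference, automated) for four specimens at one landmark.
table = [[2.0, 2.1], [2.5, 2.4], [3.0, 3.2], [3.5, 3.3]]
print(icc_average_fixed_raters(table))
```

Because this is a consistency form, a constant offset between the two methods does not lower the score; systematic bias is better examined with a Bland-Altman analysis.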
The Bland-Altman plots in Figure 8 show strong agreement between reference standard and post-hoc topologically corrected automated nnU-Net segmentation-based thickness measurements for the 16 cortical landmarks. Furthermore, Supplementary Figure 4 shows that 13 out of the 16 regions have a correlation coefficient (r) greater than 0.6. We also observe high ICC scores, with 12 regions having an ICC greater than 0.7. These results indicate that the automated segmentations are accurate enough to yield reliable cortical thickness measurements.
Therefore, based on quantitative evaluation in terms of DSC and HD95 scores, ICC values, and qualitative visual inspection of the segmentations for the different neural network architectures, we conclude that nnU-Net-CRUISE is the best-performing model. Figure 9 shows the post-hoc topology-corrected automated nnU-Net segmentations of cortical gray matter and white matter in sagittal view for five randomly chosen subjects. The first column shows the MRI slice; the automated nnU-Net segmentations before and after the topology correction step are shown in columns 2 and 3. Columns 4 and 5 show the zoomed-in area, with the red arrows indicating the regions where topology correction improves the segmentation. We notice that the opposite banks of sulci are no longer fused and are in fact well demarcated after correcting for topology. These segmentations will therefore provide more reliable and accurate estimates of cortical thickness.
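The quantities summarized by a Bland-Altman plot can be sketched as follows: the bias is the mean of the paired differences, and the limits of agreement are conventionally placed at the bias plus or minus 1.96 standard deviations of the differences. The direction of the difference (manual minus automated) follows the Figure 8 caption; the function name and the toy values are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(manual, automated):
    """Return (bias, lower limit, upper limit) of agreement.

    Differences are taken as manual minus automated.
    """
    diffs = [m - a for m, a in zip(manual, automated)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample standard deviation
    return bias, bias - spread, bias + spread


# Toy thickness values in mm (hypothetical, not study data).
manual = [2.0, 2.5, 3.0]
automated = [1.9, 2.5, 3.1]
bias, lower, upper = bland_altman(manual, automated)
print(bias, lower, upper)
```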

Other structures: subcortical, WMH, and WM segmentation
Based on the superior performance of nnU-Net, as explained in the previous section, we employ nnU-Net to perform the multi-label segmentation for WMH, caudate, putamen, globus pallidus, thalamus, and WM.
The mean DSC scores for the four subcortical structures and WMH across all leave-one-out cross-validation experiments were: WMH: 79.70%, caudate: 88.18%, putamen: 85.20%, globus pallidus: 80.12%, and thalamus: 87.29%. Note that we currently do not have a sample size large enough to perform cross-validation evaluation on the WM, due to the time-consuming process of manually correcting the segmentations. Obtaining manual WM labels is beyond the scope of the current study. Figures 10 and 11 show qualitative results for all the segmented structures in 2D and 3D, respectively, for the 36 subjects used for quantitative evaluation in the previous section, in sagittal view.

Generalization to other imaging sequences and protocols
Further highlighting the strong generalization properties of nnU-Net, Figure 12 illustrates that the nnU-Net model trained on 7 T 0.3 x 0.3 x 0.3 mm³ T2w images is able to generalize well to MRI sequences and resolutions unseen during training. In particular, we evaluate the trained model on 7 T T2*w GRE FLASH sequence postmortem images (Tisdall et al., 2021) acquired at 0.28 x 0.28 x 0.28 mm³ (N = 13) and 0.16 x 0.16 x 0.16 mm³ (N = 73) resolution, as shown qualitatively in Figure 12. We observe good generalization performance for GM, WM, and WMH, but observe some under-segmentation in the subcortical structures for both the 160 and 280 micron sequences. Currently, we only provide a qualitative assessment on a representative sample of the FLASH images. Quantitative assessment in terms of Dice score and ICC is beyond the scope of the current study, as it is not yet possible due to the lack of reference manual segmentations.
Fig. 8. Bland-Altman plots for the cortical regions. We observe good agreement between the manual and nnU-Net-CRUISE automated segmentation-based thickness (mm), which supports the hypothesis that topologically corrected deep learning-based automated segmentations are reliable for morphometric measurements. Note that the highlighted differences are in the direction: manual minus automated segmentations. Supplementary Figure 3 shows the corresponding correlation plots.
Supplementary Table 3 shows the one-sided Spearman's correlation (controlling for age, sex, and PMI) between the cortical thickness measures derived from the topologically corrected nnU-Net-CRUISE gray matter segmentation and the corresponding medial temporal lobe (MTL) ratings of p-tau pathology and neuronal loss density, as the MTL is a region linked to early neurodegeneration in Alzheimer's disease. The MTL pathology ratings are computed by taking the average pathology of the entorhinal cortex, CA1 and subiculum, and the dentate gyrus. We notice significant results in the entorhinal cortex (r = -0.53, p = 10⁻⁵) with the MTL p-tau rating; and in posterior cingulate (r = -0.396, p = 0.0008), entorhinal cortex (r = -0.47, p = 10⁻⁵), and BA35 (r = -0.321, p = 0.0026) with the MTL neuronal loss density.
Separately, Supplementary Table 4 tabulates the morphometry associations with the underlying neuropathology for the AD continuum subjects (N = 21) from the subset of the cohort that has manual segmentations (N = 36), as described in Section 3.2. We observe that the analysis based on manual segmentations shows a similar trend to that based on the nnU-Net-CRUISE automated segmentations, suggesting that automated segmentations obtained from the developed pipeline provide meaningful associations with the underlying neuropathological measurements, are thus reliable for validation of clinical ratings, and can act as surrogates for the time-consuming user-supervised segmentations. Finally, Supplementary Figures 7-13 plot all the correlations along with the p-values to show the data distribution.

Normalized WMH volume correlation patterns with regional cortical thickness and subcortical volumes
Table 5 reports the correlation of regional cortical thickness and subcortical volumes with the normalized WMH volume (WMH volume divided by the total WM volume) for measurements based on automated nnU-Net-CRUISE for the subjects in the AD continuum (N = 82). We observed significant negative one-sided Spearman's correlations, which survived Bonferroni correction for multiple comparisons, between normalized WMH volume and the automated nnU-Net segmentation-based regional thickness measurements in posterior cingulate (r = -0.448, p = 0.0001) and midfrontal cortex (r = -0.325, p = 0.0019); and between normalized WMH volume and subcortical volumes in caudate (r = -0.399, p = 0.0001) and thalamus (r = -0.352, p = 0.0007). All the tests were controlled for age, sex, and PMI. Supplementary Figures 14-16 plot these correlations along with the p-values to show the data distribution.
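The normalization and the multiple-comparison criterion described above can be sketched as follows. The WMH volume is divided by the WM volume, and a test is taken to survive Bonferroni correction when its p-value does not exceed alpha divided by the number of tests. The family size of 16 used in the example is an assumption (one test per cortical landmark) that the text does not state, and the function names are hypothetical; the two p-values are those reported above for posterior cingulate and midfrontal cortex.

```python
def normalized_wmh(wmh_volume, wm_volume):
    """WMH volume as a fraction of the corresponding WM volume."""
    return wmh_volume / wm_volume


def survives_bonferroni(p_values, n_tests, alpha=0.05):
    """Map each test name to whether its p-value passes alpha / n_tests."""
    threshold = alpha / n_tests
    return {name: p <= threshold for name, p in p_values.items()}


# p-values reported in the text for the cortical thickness correlations.
reported = {"posterior cingulate": 0.0001, "midfrontal": 0.0019}

# n_tests=16 is an assumption (one test per cortical landmark).
print(survives_bonferroni(reported, n_tests=16))
```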

Segmentation pipeline
To our knowledge, the current study is the most comprehensive assessment of automated segmentation of 7 T postmortem human brain MRI. Our evaluation is performed in a large cohort of 135 brain specimens with a range of neurodegenerative pathologies, and focuses on multiple tasks: cortical gray matter segmentation, subcortical gray matter structure segmentation, as well as white matter and WMH segmentation. For cortical segmentation, we evaluated nine deep learning architectures using direct metrics of segmentation accuracy (cross-validation DSC), derived morphological metrics (ICC of regional cortical thickness with the reference standard; comparison of associations with pathology), and visual assessment.
Significance levels (table footnote): *: 0.01 < p ≤ 0.05; **: 0.001 < p ≤ 0.01; ***: 0.0001 < p ≤ 0.001; ****: 0.00001 < p ≤ 0.0001.
Our paper stands apart from recent work on automated segmentation of postmortem brain MRI, which has either been performed in lower-resolution 3T MRI scans (Mancini et al., 2020), or in smaller high-resolution datasets (Jonkman et al., 2019). Mancini et al. (2020) used FreeSurfer (Fischl, 2012) and a Bayesian modeling technique, SAMSEG (Puonti et al., 2016), to map a single postmortem specimen imaged at 3 T but did not evaluate on higher-resolution 7 T. In addition to segmenting the cortical gray matter, Mancini et al. (2020) parcellate the cortex into anatomical regions. A separate study (Kotrotsou et al., 2014) also segmented the entire hemisphere but was applied to a smaller dataset of 7 subjects with a slice thickness of 1.5 mm, imaged at 3 T. Both these methods relied on multi-atlas-based image segmentation, which depends on registration between high-resolution postmortem images and low-resolution antemortem atlases. Inter-modality registration between postmortem and antemortem images currently remains challenging, especially for higher-resolution 7T postmortem MRI (Casamitjana et al., 2021; Casamitjana et al., 2022; Daly et al., 2021). Other recent studies on postmortem human brain morphometry (Adler et al., 2014, 2018; Augustinack et al., 2014; DeKraker et al., 2018, 2021; Iglesias et al., 2015; Ravikumar et al., 2021; Wisse et al., 2021; Yushkevich et al., 2021) focused on specific areas such as the hippocampus or the MTL, and relied on manual segmentation to guide inter-subject registration and atlas generation.
Our evaluation demonstrates that deep learning-based segmentation pipelines, particularly nnU-Net, can generate high-quality segmentations of cortical gray matter, subcortical structures, normal-appearing white matter, and white matter lesions even with very limited training data. With an inference time of around 15 minutes on a CPU, our pipeline represents the first step towards fast, automated, and reliable brain mapping of high-resolution postmortem whole-hemisphere MRI. Our nnU-Net-based pipeline generalized well to areas of low contrast unseen during training, as well as to other MRI protocols and resolutions. Furthermore, our pipeline imposed post-hoc constraints on the automated segmentations to produce geometrically accurate and topologically correct segmentations using deformable surface-based methods, which have shown tremendous success in the antemortem literature. We were therefore able to correctly label the challenging boundaries in deep sulci and the cortical gray matter, leading to more reliable cortical thickness estimates. Moreover, thickness measures derived from the deep learning-based automated segmentations concur with the reference standard, and similar associations between thickness and pathology are detected using automatically derived and reference standard thickness measurements, albeit with the latter usually having higher effect sizes. This suggests that fully automated cortical thickness analysis is feasible for postmortem MRI. Indeed, with further improvements to accuracy (e.g., a larger training set covering more of the hemisphere), automated postmortem segmentation may make the labor-intensive and subjective semi-automated approach to cortical thickness measurement unnecessary. Thus, the study suggests the feasibility of a fully automated group-wise cortical thickness analysis in postmortem MRI, analogous to the way FreeSurfer is used for antemortem MRI morphometry, which forms the basis of our future research direction.
The limitations of the current pipeline include a relatively small whole-hemisphere cortical gray matter segmentation training set, which limited our ability to use direct metrics such as DSC to evaluate overall segmentation accuracy. Indeed, the method that performed best in terms of cross-validation DSC (AnatomyNet) performed worse than nnU-Net in areas unseen during training. To address this limitation, in future work we plan to train the method on manual segmentations of whole hemispheres; however, generating such a dataset will require a significant additional investment in time and effort. A limitation of the cross-validation experiment comparing the nine network architectures (Table 2) is that the 30 patches were not stratified by specimen during cross-validation, that is, patches from the same specimen appeared in both training and test subsets, which may have led to a slight overestimation of Dice coefficients in Table 2. However, the patches within each subject were very different in terms of anatomical location, and crucially, the main validation experiment in this paper (ICC between expert-guided and automatic cortical thickness measurements in Table 4) is free of such data leakage. We still rely on manually placed landmarks to measure thickness in specific anatomical regions, and we do not show 3D maps of thickness as is common in antemortem morphometry studies. For example, the variability in the placement of the BA35 landmarks may explain our observation of stronger thickness/pathology associations in the ERC than in BA35, even though BA35 is thought to have early tau pathology. Moreover, we noticed that four regions (BA35, superior parietal, angular gyrus, and the visual cortices) have the lowest ICC scores between the manual reference and automated nnU-Net-CRUISE segmentations. In Supplementary Figure 6, we show a sample segmentation where we observe some discrepancy between the manual reference and automated nnU-Net-CRUISE segmentations. Possible reasons for these discrepancies include over- or under-segmentation, MRI artefacts, or, as mentioned above, the lack of precision in the placement of the landmarks on which the thickness computation directly depends.
Imaging Neuroscience, Volume 2, 2024
Due to the limited availability of reference standard segmentations, parcellating the brain into subregions, as is done in most antemortem MRI studies, currently remains a challenge. We intend to address this limitation by semi-manually annotating the brain into different cortical and subcortical structures, guided by anatomical priors derived from antemortem MRI studies. We could then develop techniques for groupwise normalization studies by building a template for postmortem MRI. Towards this goal, we are currently developing deep learning-based methods for registration between postmortem and antemortem MRI. Our work is limited to ADRD, but in future we plan to evaluate our methodological pipeline on a cohort of non-demented specimens obtained from a separate postmortem dataset (Boon et al., 2019; Frigerio et al., 2021; Jonkman et al., 2019).
Certain limitations of postmortem imaging should be noted. Wisse et al. (2017) compared the cortical thickness of the MTL substructures between antemortem and postmortem MRI, and also between postmortem 3T and postmortem 9.4T MRI. They found differences in thickness across MRI scans and attributed them to several factors: (1) an actual difference in size between antemortem and postmortem tissue, as studies have suggested that tissue changes may occur during or after death because the agonal state causes hypoxia and ischemia, which results in brain swelling; (2) an increase in size resulting from brain extraction, for example, through relief of intracranial pressure after autopsy; and (3) shrinkage of the brain after several weeks of formalin fixation.
Separately, at the moment, we do not have a way to quantitatively assess segmentation accuracy on the T2*w FLASH images. In follow-up work, we plan to collect reference data by manually correcting the mis-segmentations and then re-train the deep learning model in a few-shot learning setting to boost segmentation performance for T2*w FLASH MRI. We will then quantitatively assess the segmentation performance of this generalization procedure against the user-supervised, manually corrected reference segmentations.

Neuropathology associations
With the help of the proposed segmentation and morphometry pipeline, we were able to conduct studies of postmortem MRI that have not been possible before: for example, we replicated some of the findings from Sadaghiani et al. (2022), and drew associations between WMH volume, cortical thickness, and subcortical volumes. Here, we discuss some of the interesting findings. Strong postmortem image analysis frameworks allow us to better understand the distinct roles and degrees by which antemortem pathology affects neurodegeneration. Prior work shows how amyloid-β, p-tau, TDP-43, etc. have differential influences on atrophy (Dugger & Dickson, 2017; Matej et al., 2019; Negash et al., 2011; Robinson et al., 2018), and an automated pipeline and dataset as shown here can bolster the quantitative interrogation of these open questions. It will allow future work to study links between macrostructure and other local processes beyond pathology, including inflammatory markers, gene expression, etc. We also note that measuring WMH in postmortem imaging adds value to histological studies, as traditional autopsy provides less clear measures of vascular burden; this will allow probing of WMH to better understand their pathologic correlates given their non-specific nature.
The current study demonstrates the associations between regional cortical thickness measurements and the underlying semi-quantitative neuropathological ratings for the AD cohort. Negative correlations between p-tau and cortical thickness were found to be significant in angular gyrus and midfrontal regions, which is in line with previous research in antemortem (Das et al., 2019; Harrison et al., 2021; LaPoint et al., 2017; Whitwell et al., 2018; Xia et al., 2017) and postmortem in situ MRI (Frigerio et al., 2021) studies. Tau pathology is concurrent with neuronal loss in ADRD (Dawe et al., 2011; Jack Jr et al., 2018; Ohm et al., 2021) and the loss of neurons is likely a key source of cortical atrophy. We observed that cortical thickness showed a significant negative correlation with neuronal loss in BA35 and entorhinal cortex, regions where p-tau pathology has predicted the atrophy rate (La Joie et al., 2020; LaPoint et al., 2017; Xie et al., 2018) in antemortem studies. Significant negative correlations were observed between Braak staging and cortical thickness in midfrontal, ERC and BA35, regions consistent with high p-tau uptake in positron emission tomography (PET) imaging with cortical thickness on MRI. Tau pathology in Braak regions plays an important role in cortical atrophy and cognitive decline during the course of AD. Similar findings are reported for global cortical thickness with in situ postmortem MRI in Frigerio et al. (2021). The relationship between amyloid-β and neurodegeneration is thought to be rather indirect (Gómez-Isla & Frosch, 2022; Jack Jr et al., 2018). Nevertheless, we did find strong negative correlations between thickness and amyloid-β in midfrontal, inferior frontal, and ventrolateral temporal cortex, brain regions implicated in working memory capacity (Barbey et al., 2013; Chiou & Ralph, 2018). Our observation of a significant negative correlation of CERAD scores with cortical thickness in the superior parietal region is consistent with previous studies relating CERAD with cortical thickness (Paajanen et al., 2013; Santos et al., 2011).
Lastly, WMH has been implicated in age-related cognitive decline and AD, which is characterized by atrophy in the cortical mantle and the MTL (Dadar et al., 2022; Du et al., 2005; Rizvi et al., 2018). In our study, we observed significant negative correlations between normalized WMH and thickness in posterior cingulate and superior temporal regions. Previous work (Reijmer et al., 2015) showed that the disruption of structural and functional connectivity has an impact on executive functioning and memory among individuals with high WMH volume. To this point, our study found that subcortical atrophy was significantly negatively correlated with WMH volume in caudate and thalamus, suggesting more global effects on brain volume.
In our dataset, among the 135 specimens, 82 had AD pathology with co-pathologies. Future studies may apply this dataset and pipeline to help disentangle the differential contributions of unique pathologies to individual atrophy patterns. Separately, we are aware that the pathology measures and MRI segmentation-based measures were obtained from contralateral hemispheres, which could potentially weaken the observed associations. However, pathology in AD is usually largely symmetrical between the hemispheres, which leaves less room for bias in the observed correlations, as reported in a recent study (Ravikumar et al., 2021) showing that correlations between MTL thickness maps and both contralateral and ipsilateral semi-quantitative p-tau pathology scores did not yield substantially different correlation patterns. Overall, the fact that strong thickness-pathology associations were observed for many brain regions even in this sub-optimal setting supports our overall hypothesis that thickness measures derived using the proposed automatic pipeline are suitable for brain morphometry-pathology association studies. In future work, we plan to examine associations between these thickness measures and ipsilateral quantitative measures of pathology matched via multi-modality image registration.
Another limitation is that our study relies on semi-quantitative measures of neuropathology, which are subjective and might not reflect a linear pathology burden. We are currently in the process of obtaining neuropathology measurements from same-hemisphere histology, and developing machine learning-based quantitative pathological ratings to further validate our work. The current study provides support for future work to use larger datasets and quantitative pathology measures to describe the contribution of multiple pathologies to brain morphology in neurodegenerative diseases. Overall, however, we observe a trend similar to that described in our recent work (Sadaghiani et al., 2022), which looked at regional cortical thickness with p-tau burden. These limitations could be mitigated by expanding our analysis to a larger dataset, which we are actively working towards.
Lastly, we should mention that our postmortem imaging project was launched shortly before the COVID-19 pandemic, which greatly interfered with our ability to maintain consistency in formalin fixation. This has been addressed in our Center's more recent autopsies, where we aim for a consistent 60 days of fixation. As with the other limitations, we hypothesize that more consistent formalin fixation would only have led to stronger associations between structure and pathology. The plot in Supplementary Figure 5 shows the mismatch in thickness (in mm) between automated (nnU-Net-CRUISE) and manual reference segmentations versus fixation time (in days). The plot does not reveal a systematic relationship between fixation time and thickness mismatch, either in terms of bias or in terms of variance. This is consistent with the literature showing a relative plateauing of T2 values in postmortem MRI after an initial 1-2 months of fixation (Dawe et al., 2009). We conclude that the large variation in fixation times in our study did not significantly affect the thickness computation based on the obtained segmentations.

CONCLUSION
While there is increased interest in using high-resolution postmortem MRI of the human brain for discovering associations between brain structure and pathology, automated tools for the analysis of such complex images have received much less attention compared to antemortem MRI. Our study used a relatively large dataset of 135 high-resolution T2w 7 T postmortem whole-hemisphere MRI scans to evaluate multiple deep learning image segmentation architectures and to develop an automatic segmentation pipeline that labels cortical gray matter, four subcortical structures (caudate, globus pallidus, putamen, and thalamus), WMH, and normal-appearing white matter. We report good agreement between thickness measures derived from our deep learning pipeline and the reference standard of semi-automated thickness measurement. Our analysis linking morphometry measures and pathology demonstrated that automated analysis of postmortem MRI yields similar findings to a labor-intensive semi-automated approach, and more broadly, that automated segmentation of postmortem MRI can complement and inform antemortem neuroimaging in neurodegenerative diseases. We have released our pipeline as a stand-alone containerized tool that can be readily applied to other postmortem brain datasets.

Fig. 1. Postmortem tissue blockface photograph of a donor with diagnosis of Parkinson's disease (not demented) and Lewy body disease (deceased at the age of 79). Shown are the lateral (A) and medial (B) views of the right hemisphere. The tissue is then placed in a mold (C) and is subsequently slabbed (cut into different slices) as shown (D). See Lasserve et al., 2020 for more details. See Supplementary Figure 1 for the slices of the given brain tissue.

Fig. 5. Manual segmentation of the cortical gray matter patches of size 64 x 64 x 64 using the protocol explained in Section 3.1.1 for two subjects with FTLD-TDP (A) and GGT (B) in three viewing planes with the corresponding 3D renderings.
Fig. 7. (A) For a subset of 10 subjects, each column shows an example sagittal view of the automated cortical mantle segmentation predicted across the whole-brain hemisphere using the different neural network architectures. We notice that all architectures except nnU-Net show either under- or over-segmentation of the cortical gray matter, which, together with the results reported in Table 3 (ICC results), prompted us to select nnU-Net as the preferred model for cortical gray matter segmentation. For example, notice how networks such as 3D Unet and Attention Unet incorrectly segment large chunks of WM as cortical GM (white circles), whereas nnU-Net performs best in difficult-to-segment areas such as the anterior and posterior regions of the brain with poor MRI signal. (B) Again, notice that AnatomyNet (CE) is not able to segment certain areas in the cortical mantle (white arrows). The 3D Unet and VNet networks over-segment the cortical mantle into the WM. However, nnU-Net overcomes all of these limitations.

Fig. 9. nnU-Net-CRUISE. Topologically corrected automated nnU-Net segmentations of cortical gray matter and white matter shown in sagittal view for five subjects. The first column shows the MRI slice; the automated nnU-Net segmentations before and after the topology correction step (nnU-Net-CRUISE) are shown in columns 2 and 3, respectively. Columns 4 and 5 show the zoomed-in area, with the red arrows indicating the regions where topology correction improves the segmentation.

Fig. 11. Three-dimensional renderings of automated segmentations by nnU-Net of cortical gray matter, WM, WMH, and the four subcortical structures for the subject with primary age-related tauopathy and cerebrovascular disease.

Fig. 12. Generalization to other unseen imaging sequences. (A) The nnU-Net architecture trained on 7 T T2w images at 0.3 x 0.3 x 0.3 mm³ generalized well to images acquired at 7 T T2*w at a high resolution of 0.16 x 0.16 x 0.16 mm³. (B) The nnU-Net architecture trained on 7 T T2w images at 0.3 x 0.3 x 0.3 mm³ generalized well to images acquired at 7 T T2*w at a high resolution of 0.28 x 0.28 x 0.28 mm³.

Table 1. Demographics of the Alzheimer's disease and related dementias (ADRD) patient cohort in the current study.

Table 5. White matter hyperintensities volume correlations. Shown is the one-sided Spearman's correlation between normalized WMH volume (obtained by dividing the WMH volume by the corresponding WM volume) and regional cortical thickness based on nnU-Net-CRUISE. For the four subcortical structures, the partial one-sided Spearman's correlation, with the uncorrected p-value in brackets, is shown. All the tests were controlled for age, sex, and PMI. The asterisk indicates that the test survived Bonferroni correction for multiple tests. Each cell is color coded, with darker shades indicating more negative correlations. CI indicates 95% confidence interval.