Identification of rare cortical folding patterns using unsupervised deep learning

Abstract Like fingerprints, cortical folding patterns are unique to each brain even though they follow a general species-specific organization. Some folding patterns have been linked with neurodevelopmental disorders. However, due to the high inter-individual variability, the identification of rare folding patterns that could become biomarkers remains a very complex task. This paper proposes a novel unsupervised deep learning approach to identify rare folding patterns and assess the degree of deviations that can be detected. To this end, we preprocess the brain MR images to focus the learning on the folding morphology and train a beta variational auto-encoder (β−VAE) on the inter-individual variability of the folding to identify outliers. We compare the detection power of the latent space and of the reconstruction errors, using synthetic benchmarks and one actual rare configuration related to the central sulcus. Finally, we assess the generalization of our method on a developmental anomaly located in another region and we validate the relevance of our approach on patients suffering from drug-resistant epilepsy. Our results suggest that this method enables encoding relevant folding characteristics that can be enlightened and better interpreted based on the generative power of the β−VAE. The latent space and the reconstruction errors bring complementary information and enable the identification of rare patterns of different nature. This method generalizes well to a different region on another dataset and demonstrates promising results on the epileptic patients. Code is available at https://github.com/neurospin-projects/2022_lguillon_rare_folding_detection.


Introduction
During gestation, the human cortex folds and gets its convoluted shape composed of gyri-the ridges of white matter-that are delimited by furrows-the sulci.Historically, their shape and characteristics have been described by neuroanatomists based on specimens.In the human population, stability of the folding patterns is observed with an overall similarity of location, shape and arrangements (Ono et al., 1990).This stability is important enough to enable to define a road map and a nomenclature of sulci and to develop methods that automate sulci recognition (Rivière et al., 2002;Borne et al., 2020).Despite this homogeneity, each brain displays a unique cortical folding, acting as a fingerprint (Wachinger et al., 2015).Fig. 1A and B. show examples of the variability in the central sulcus region, which is one of the most stable.The folding variability is so complex that it has long been overlooked.
However, thanks to advances in the neuroimaging field, studies have tried to characterize sulci with elementary shapes that amount to building blocks of alternative patterns.For instance, the central sulcus is typically composed of one or several knobs (Yousry et al., 1997).Similarly, the mid-fusiform sulcus presents an omega pattern (Weiner et al., 2014).In contrast, some very rare patterns have also been described, such as the interruption of the central sulcus that can be found in only about 1% of the population (Mangin et al., 2019).Four examples of interrupted central sulci in the right hemisphere are presented in Fig. 1C.
Folding patterns have also proved to be very interesting as they are related to function.For example, the central sulcus divides the cortex into the motor and the sensory areas, and specific parts of the central sulcus have been correlated with the cortical areas of the tongue, foot, and hand among others (Mangin et al., 2019;Germann et al., 2020).Specifically, the central sulcus main knob has been linked to the hand motricity and is called the "hand knob" (Yousry et al., 1997).In another region, cingulate sulcus patterns have been associated with the inhibitory control (Borst et al., 2014).
Specific patterns were also correlated to neurodevelopmental disorders.The Power Button Sign (PBS), a rare configuration of the precentral sulcus, may be associated with a certain type of epilepsy (Mellerio et al., 2014).Patterns in the superior temporal sulcus (STS), central, intraparietal and frontal regions could be related to autism (Levitt et al., 2003;Auzias et al., 2014;Hotier et al., 2017).Hence, deciphering sulcal complexity and having a better understanding of the underlying shape variability is of great interest: folding patterns could become biomarkers of neurodevelopmental disorders.
Folding patterns can be analyzed with two approaches.On one hand, morphometric features can be extracted such as the depth, the surface curvature, or the opening of each sulcus.On the other hand, one can look directly at the shapes of the sulci.Working on shapes rather than on morphometric values is particularly interesting as they constitute "trait features" opposite to "state features" (Cachia et al., 2016).Unlike state features that can evolve during the lifespan, trait features remain fixed afterbirth.For example, the sulcal opening is a state feature because it increases with aging (Kochunov et al., 2005;Jin et al., 2018).In return, the pattern of the cingulate sulcus area is a trait feature because it is stable throughout life after infancy.This difference between trait and state features has also been demonstrated in the study of the effects of handedness on the central sulcus shape.For example, forced dextral subjects show similarities to sinistral subjects in shape, but changes in elongation mimicking dextrals, occur when they are constrained to use the right hand for writing (Sun et al., 2012).Different strategies can be adopted for exploring the shapes: a finite number of shapes can be considered, using clustering for instance (Meng et al., 2018;Duan et al., 2019), or shapes can be represented in a continuous way, such as manifold-based analyses (Sun et al., 2012), (de Vareilles et al., 2022).For both strategies, a first step is required to represent the folding patterns.Folding shapes' complexity can be reduced based on the similarity between different sulci (Sun et al., 2009(Sun et al., , 2012;;de Vareilles et al., 2022) or between sulcal graphs (Im et al., 2011;Meng et al., 2018).These similarity measures are then either directly analyzed, or projected to a lower dimensional space.
Advances in machine learning are now opening up new possibilities for studying folding patterns, identifying typical or rare patterns, and hopefully, emerging sulcal biomarkers.Recently, a neural network classifier was used to map geometric shapes to the broken-H shape pattern in the orbitofrontal region (Roy et al., 2020).However, this method requires having pre-identified the geometric shape to be mapped, which makes it difficult for unknown patterns to emerge.To tackle this issue, unsupervised deep learning techniques seem to be promising.Thus, two unsupervised deep learning models, a β − V AE and SimCLR, were compared in the task of identifying typical patterns in the cingulate region (Guillon et al., 2022).Other works have focused on the task of identifying abnormal folding patterns thanks to unsupervised deep learning in the region of the superior temporal sulcus branches (Guillon et al., 2021).Anomaly detection has been a subject of great interest in the domain of biomedical imaging: many studies, including for brain MR images, have tried to identify abnormal samples (Chalapathy and Chawla, 2019;Fernando et al., 2022).A common framework is to use auto-encoders as they implement a latent space with fewer dimensions than the input which makes it hard to encode uncommon features.Then, the identification of anomalies is usually performed based on the reconstruction error rather than in the latent space.
In this work, we investigate whether an unsupervised deep learning model can learn normal folding variability to identify deviating regional patterns; and if so, what granularity of deviations can be detected?Here, we define granularity as the characteristics and properties of the anomalies, such as their size or nature.The analysis of the granularity that can be identified aims to characterize the abnormal features that can be detected and at what level of detail.We also seek to describe which space is the most relevant to identify deviating patterns: is it based on the reconstruction error, in the input space, that is to say in our case, the folding space, or is it the latent space?
Folding mechanisms may lead to both global and regional anomalies and these two scales have led to correlations with function disorders (Fernández et al., 2016).Here, we focus on regional patterns rather than on a global representation.
Specifically, we concentrate on the central sulcus which is a good candidate for our work.Indeed, it is one of the first folds to appear and it is stable enough to be a first step in modeling inter-individual variability.More importantly, usually long and continuous, the central sulcus can be interrupted in very rare cases, making interrupted central sulci relevant patterns to assess our method.Finally, this region is of clinical interest as it is linked to hand motricity and asymmetries have been described (Sun et al., 2012;Bo et al., 2015).
To perform our study, we worked on the HCP database ( Van Essen et al., 2013).From the MR images, we focused on the folding morphology of the central sulcus area with a specific preprocessing that does not require labeling the sulci of the studied subjects.Indeed, in the future, we wish to be able to apply our methodology to new databases whether the sulci are labeled or not.Even if automatic recognition tools exist and are efficient, some errors may remain and thus lead to the selection of another sulcus and to a contaminated learning set, especially regarding unusual patterns.We then trained a β − V AE to learn the inter-individual variability.Due to the small number of interrupted central sulci and to be able to characterize the detected granularity, we designed synthetic benchmarks of rare patterns to assess our methodology more reliably.We then investigated the detection power of our methodology both on the latent space and on the folding space, using either our synthetic outliers or actual interrupted central sulci, an actual rare pattern.Finally, we assessed the generalization of our approach on another dataset presenting abnormal folding patterns in the cingulate region.
We stress out that rare patterns are not necessarily abnormal and associated with some disorders.However, whether they have a link with pathologies or not, rare patterns are interesting objects to study as they can constitute traces of neurodevelopmental processes.Therefore, in this article, the only abnormal pattern that we study is the one in the cingulate region.We consider our synthetic benchmarks and interrupted central sulci as rare patterns.

Database
We used T1 weighted MR images of the Human Connectome Project (HCP) dataset ( Van Essen et al., 2013).Data were acquired on a single Siemens Skyra Connectom scanner at an isotropic resolution of 0.7mm.Subjects are healthy controls from 22 to 36 years old.In the context of our study we considered only the right hemispheres of the right-handed subjects leading to a total of 1001 subjects.The long-term goal of this work is to identify rare folding patterns that have not been characterized yet.However, we first need to assess our method.To do so, we decided to work on a rare pattern already described, the interrupted central sulci (CS).A previous study identified in this database seven sulci in the right hemisphere and two in the left (Mangin et al., 2019).The identification was based on the depth profiles of the sulcal pit maps, which are defined as the locally deepest point of the cortical surface (Lohmann et al., 2008), and it was then visually confirmed.We chose to work on the right hemisphere rather than on the left in order to have the highest number of rare patterns.

Folding representation
In this work, we consider the folds as the skeleton of a negative cast of the brain, that is to say, voxels located in the cerebrospinal fluid, (Fig. 2A.5) (Mangin et al., 1995), which can be represented as ribbons located between the gyri (green ribbons in Fig. 2A.7).Folding or sulcal patterns are defined as the combination and arrangements of shapes of one or several folds.
BrainVisa/Morphologist pipeline.Structural MR images hold numerous pieces of information beyond the morphology of cortical folding.In order to focus on the folding characteristics, we developed a preprocessing pipeline.The raw MR images are first processed by the BrainVISA/Morphologist software (https:// brainvisa.info/).This pipeline is composed of several steps that include skull stripping, bias correction, segmentation of the brain and of the hemispheres, skeletonization of the grey matter and the cerebrospinal fluid union (Fig. 2A) (Rivière et al., 2002).This step leads to so-called skeletons, 3D images representing only the folding which is then segmented into simple surfaces (SS) depending on various parameters such as the sulcal depth or topological properties (Fig. 2B) (Mangin et al., 1995).For example, in Fig. 2B, small branches (SS2 and SS4) are represented as different simple surfaces from the main ones, SS1 and SS3.The depth variation resulting from the buried gyrus leads to two distinct simple surfaces (SS1 and SS3).Therefore, in this case, the central sulcus is composed of four simple surfaces.All in all, the obtained outputs are 3D images that correspond to a negative cast of the brain (first step of Fig. 3).
From skeletons to distance maps.Skeleton-based images have proved to be relevant and were the object of previous studies for fold recognition (Borne et al., 2020(Borne et al., , 2021)), as well as for folding patterns representation (Guillon et al., 2021(Guillon et al., , 2022)).However, in this work, we applied an additional preprocessing step to convert our skeleton images into distance maps.Indeed, we believe that skeletons have several shortcomings that could limit the performance of our approach.First, skeletons are binary images representing the folding patterns; hence, they are very sparse images.Therefore, only very few voxels hold explicit sulcal information in skeletons.As shown in Fig. 4, on average, sulci represent 3500 voxels in our region of interest (ROI), which stand for less than 5% of the voxels.We argue that it would be interesting to have a representation where the information devoted to folding patterns is more distributed.In addition, skeleton images are not smooth and the local regions of interest in our application correspond to high frequencies making the skeleton details harder to represent and reconstruct.It is also complex to reconstruct folding patterns in skeletons as they are not continuous images, so there is no notion of the proximity of a voxel to a sulcus.This makes the reconstruction error and the gradient-based learning less efficient.Finally, distance maps are built based on the whole hemisphere; therefore, if we work on a ROI, they give, especially near the border of the ROI, information about objects outside the ROI, which is not the case for crops based on skeletons.Therefore, to tackle these shortcomings, we convert the resulting skeletons into distance maps based on the Chamfer distance, which approximates the Euclidean distance.Sulci are considered objects, and the further away a sulcus is, the larger the value of a voxel is.An example  A mask of the central sulcus area is defined based on a distinct manually labeled dataset.HCP is processed with Morphologist to obtain folding graphs, which are used to obtain 3D images of skeletons (1).The Chamfer distance is applied to the skeletons to obtain geodesic distance maps (2).Distance maps are then downsampled and cropped according to the mask (3) and are fed as input to a β − V AE.
of a distance map is presented in Fig. 3 step 2.
Focusing on a single region: crop definition.As we are interested in capturing the local folding patterns variability, such as the hand knob, rather than the global hemisphere-wide arrangements of sulci, we chose to focus our study on a sub-region of the right hemisphere, the central sulcus area.To define the ROI, we learned a mask of the central sulcus over a manually labeled dataset comprising 62 healthy controls (Borne et al., 2020).Subjects are first affinely registered to the ICBMc2009 space and resampled to an isotropic resolution of 1 mm.Then, for each subject, a mask is incremented for all the voxels of the central sulcus represented as a set of simple surfaces.The resulting mask is slightly dilated by 5 mm to include potential central sulcus locations not represented in our database.We crop the distance maps of the HCP subjects according to the mask bounding box using the same affine normalization procedure (step 3 of Fig. 3).The mask is applied on the fly during the training of our network.
Distance maps and folds visualization.Data visualization, and shape characterization in particular, can be per-formed directly based on the distance maps.However, this type of input enables only 2D slice views.To better visualize the folds of our crops, we binarize our distance maps with an empirically defined threshold of 0.4 and convert them to meshes.

beta-VAE
In order to identify rare patterns we first seek to model the inter-individual variability.In the outlier detection field based on unsupervised methods, auto-encoder (AE) models are widely used as they implement a latent space, also known as the bottleneck, that has far fewer dimensions than the input space.Usually, training is performed only on control subjects.The assumption is that the model learns a representation of the normal variability and that at inference when facing outliers, it will not be able to encode and reconstruct them as well as control data.To overcome some shortcomings of simple convolutional AE and to regularize the latent space, variational auto-encoder (VAE) was introduced (Kingma and Welling, 2014).Its strength also lies in its generative power, enabling not only to reconstruct but also to generate new data.Other AE-based models have been used for anomaly detection in the biomedical field such as Generative Adversarial Networks (GAN) (Schlegl et al., 2017(Schlegl et al., , 2019)), which were then transposed to brain images (Simarro Viana et al., 2021).A comparison of AE models showed that VAE was one of the most efficient in brain MR images (Baur et al., 2020).In the context of representation learning of folding patterns, VAE was proved to be well adapted (Guillon et al., 2021(Guillon et al., , 2022)).
In the VAE framework, a sample of input space X is mapped to a distribution in a latent space Z of L dimensions, by an encoder θ.A vector z is then drawn from this distribution and reconstructed by a decoder φ.The objective function seeks to minimize both the reconstruction error and the Kullback-Liebler (KL) divergence (D KL ).The model is thus trained to maximize: (1) where p(z) refers to the prior distribution (in this work, a reduced centered Gaussian distribution) which is approximated with q φ (z|x), the posterior distribution.β − V AE is an extension of the VAE where the KL divergence is weighted by β (Higgins et al., 2017).

Training procedure
Preprocessing.The input data of the model are the previously defined just cropped, then masked distance maps.For augmentation purposes, random rotations between [-10°, 10°], centered on the mask center, are drawn from a uniform distribution at each epoch and applied to the whole brain, before applying the mask that strictly remains at the same position.Such rotations are also sought to limit the edge effects.More precisely, the central sulcus is surrounded by two main folds, the precentral and the postcentral sulci.Parts of these sulci are included in the ROI.Therefore, rotating the distance map under the mask enables to capture a wider context and to try to limit their influence.We observe that skeleton voxels equal 0 in the initial distance maps X and the values increase with the distance to a sulcus, possibly ranging up to 10 mm.Potential reconstruction errors near the sulci would be minor compared to the voxels located far from them at the edge of the mask, whereas we wish the model to concentrate more on the sulci.Thus, to limit the impact of distance variability far from the sulci, we perform a normalization according to equation 2, resulting in values between [0, 1], with the highest values on the folds and a saturation at about 4-5 mm which corresponds to half of the typical width of gyri.An example is shown in Annex 1 of the supplementary.Finally, we apply a small padding, resulting in samples of dimensions 80 x 80 x 96.
Training.Dataset was split into train, validation, and test sets of respectively 640, 161, and 200 subjects.Training is only performed on control data, all identified interrupted central sulci (CS) were added to the test set.
The interrupted central sulci were identified based on the detection of the two main sulcal pits of the central sulcus, between which a depth profile was computed to determine the depth of the "pli de passage frontal-moyen" (PPFM), usually a buried gyrus in the sulcus.Subjects with a shallow PPFM were then manually inspected to determine whether the surrounding central sulci were interrupted (Mangin et al., 2019).However, we point out that there may remain some undetected interrupted central sulci in the training set as all subjects were not individually inspected.To model the normal inter-individual variability, we used a classic convolutional β − V AE of depth 3.In order to choose the best values for β and latent space dimension L, we performed a gridsearch (β=2-10, L=4-150).To assess each parameter configuration, we used two criteria.Our first criterion is the reconstruction quality.Indeed, we seek to leverage the reconstruction and generative power of the β − V AE, hence the reconstruction quality must be sufficient.Our second criterion is the detection power on a proxy for the interrupted central sulci.The pre-central and post-central sulci demonstrate some similarities with the central sulcus in terms of orientation, size, and shape.However, they tend to be more interrupted and to present a higher number of ramifications.Therefore, we used the HCP dataset crops of these two other regions as fake outliers.We selected only preand post-central sulci which presented some ambiguities with the central sulcus based on the procedure described in Annex 2 of the supplementary.Finally, our ambiguous set was composed of 28 precentral sulci and 18 post-central sulci.For each hyperparameter combination, we trained a β − V AE on the train set, then a linear SVM was trained to classify between the latent codes of the validation samples and of the pre-or post-central sulci.We kept the hyperparameters that led to the best classification results and good reconstructions (based on reconstruction error and visual inspection).

Generating synthetic rare patterns
One of the challenges of our work is the lack of consensual rare patterns to evaluate our methodology.In addition, it would be interesting to be able to quantify the degree of deviation that our model is able to detect.Therefore, several sets of synthetic rare patterns were generated to be used as benchmarks.Both benchmarks were generated from the test set subjects.

Deletion benchmark
Our first benchmark consists of subjects for whom we have erased one simple surface (SS).Erasing small simple surfaces could be a good proxy to simulate rare patterns because some fold branches may be missing in some people, or a sulcus may be shorter or absent.Large simple surfaces are less likely to be missing but allow us to assess the degree of deviation that can be detected.Deleting simple surfaces directly on the distance maps would not be interesting as the voxels next to the simple surface indicate the SS position.To tackle this issue, the suppression was done during the generation of the raw skeletons.The distance map is then computed based on the pruned skeletons.To analyze the granularity of anomaly that can be detected by our method, we generated several benchmarks which vary according to the size of the deleted simple surface (SS).As such, we created four sets where SS size was between 200-500 voxels, 500-700 voxels, 700-1000 voxels, and simple surfaces of more than 1000 voxels.In the following, we name each set with the minimum number of voxels: for instance, 200 corresponds to the benchmark where simple surfaces of size between 200 and 500 were erased.To be deleted, simple surfaces must have a number of voxels included inside the mask corresponding to the range of the different sets.If several simple surfaces meet the criteria, one is randomly chosen to be erased.Otherwise, a subject may not have a simple surface satisfying the requirements.In such cases, the subject is not included in the benchmarks.Finally, from the 200 test subjects, benchmark 200 contains 180 subjects; benchmark 500, 68; benchmark 700, 108 and benchmark 1000, 151 subjects.To have a better representation of the amount of deleted sulci, Fig. 4 shows the simple surface sizes distribution in the central sulcus region.The figure shows that our crops contain a large majority of very small simple surfaces (less than 500 voxels) and far fewer large simple surfaces.The smaller simple surfaces are mostly part of the precentral and postcentral sulci, representing more than 85% of the surfaces between 200 and 500 voxels.On the contrary, larger simple surfaces correspond to the central sulcus.Therefore, beyond deleting simple surfaces of varying sizes, the nature of the sulci and thus the location, are also different, especially between the set 200 and the others.The right part of Fig. 4 shows the number of voxels corresponding to skeletons in our crops.It demonstrates the progressive intensity of anomalies when deleting simple surfaces from 200 voxels to more than 1000 voxels.Indeed, when simple surfaces of more than 1000 voxels are deleted, it corresponds to a third or a quarter of the skeleton crop.Distance maps are then generated according to 2.2.An example is presented in Fig. 5.

Asymmetry benchmark
Our second benchmark corresponds to the equivalent crop but in the left hemisphere.Left hemisphere distance maps are generated according to the same methodology as the right.Like our control crops of the right hemisphere, we computed a left central sulcus mask on the labeled dataset.To enforce the exact same crop size, we adapted the mask to match the adequate dimensions by adding or deleting a few voxels.Once the crops were obtained, they were flipped.During training, the right central sulcus mask was applied on the fly.We emphasize that we did not use the interhemispheric plane-symmetric coordinates but a mask specifically designed for the left central sulcus.This is especially important since there is a slight asymmetry in the position of the central sulcus between the two hemispheres (Davatzikos and Bryan, 2002).An example is presented in Fig. 6.

Identifying Outliers
Once the model has learned a representation of the inter-individual variability, outliers identification can be performed at two levels.Traditionally, anomaly detection with AE is done based on the reconstruction error and an error map can be obtained comparing the input and the output (Schlegl et al., 2017(Schlegl et al., , 2019;;Pinaya et al., 2018).But one can also wonder about the distribution of outliers in the latent space.Are the outliers distributed differently?To answer this question, we investigated the detection power in the outliers' distribution in the latent space based on the reconstruction errors performed in the input space-which we call folding space in our case, as we study folding patterns.For both approaches, control test images and outlier images (deletion benchmarks, asymmetry benchmark and interrupted sulci) are encoded and reconstructed by our trained model.

A specification on data
As mentioned in 2.3.2,our control test set comprises 200 subjects.However, when studying our different outliers sets, data subsets were different since some subjects did not have any SS meeting the benchmark's criteria.
• Deletion benchmarks: to avoid any bias, we used only control subjects with a simple surface meeting the benchmark's criteria for each benchmark.Therefore, for benchmark 200, we used 90 controls that have a simple surface between 200 and 500 voxels but that has not been erased, and 90 benchmark subjects, for whom simple surfaces were actually erased.Resulting in n 200 control = n 200 deletion = 90, n 500 control = n 500 deletion = 34, n 700 control = n 700 deletion = 54 and n 1000 control = 75 and n 1000 deletion = 76.
• Asymmetry benchmark: all subjects have their asymmetric counterpart.Hence, 100 subjects were randomly picked among the subjects from the test set for whom we took their asymmetric version.Resulting in n control = n asymmetry = 100.
• Interrupted central sulci: the whole test set is used as control data, leading to n control = 200 and n interrupted = 7.

On the Latent Space
A hint from the visualization.For both of our benchmarks and the interrupted central sulci, we first sought to have a visualization of data distribution in the latent space.Therefore we projected encoded data into a smaller space of two dimensions with UMAP algorithm (McInnes et al., 2018(McInnes et al., , 2020)).This projection enables us to get a first hint as to how outliers are represented.
Assessing the detection power on the benchmarks.However, the UMAP algorithm drastically reduces dimensions, leading to some information loss.We tried to assess whether relevant information regarding folding patterns was encoded in the latent space.Therefore, we trained linear support-vector machines (SVM) (Pedregosa et al., 2011) on the latent codes with stratified cross-validation to classify between control data and benchmark.Performance is assessed based on the ROC curve.Quantifying the marginality of interrupted central sulci.
As interrupted central sulci are very few, we cannot use classification as we did for the benchmarks.Classic machine learning out-of-distribution algorithms are more suited.Therefore, to quantify whether the interrupted sulci are likely to be detected from their location in this reduced space, we applied two classic algorithms, One-Class SVM (OCSVM) (Schölkopf et al., 2001;Pedregosa et al., 2011), and isolation forest (Liu et al., 2008;Pedregosa et al., 2011) based on the data coordinates in the UMAP space.However, interrupted central sulci may not be the rarest pattern, and other folding configurations may be very scarce.Therefore, we also looked at control subjects repeatedly predicted as outliers by these algorithms.
Travelling through the latent space.Finally, to better understand the encoded properties and the learned representations, we leverage the generative power of the β − V AE.
We computed average representations from different sets of data points, taking the mean for each dimension of the latent space.We then reconstructed these vectors.To further analyze the latent space, we traveled through it, going from one point, either the average pattern or a subject, to another point in the latent space, linearly interpolating vectors and reconstructing them.

On the Folding Space
Outlier identification in the folding space relies on the model's error.The reconstruction errors' distributions were compared visually and assessed with the Kolmogorov-Smirnov test for the benchmarks and with the Mann-Whitney U-test for interrupted central sulci.For both cases, the null hypothesis was that the two distributions were identical.The other strength of analyzing this space rather than the latent space is that the model's errors can help understand and locate the rare patterns' characteristics.To localize the errors, we commonly look at the residuals, which are the difference between the input and the reconstruction of the model.This corresponds to what the model has missed or added.To differentiate these two types of errors, we looked at them independently, computing the difference between the input and the output, i.e., the model's omissions, and between the output and the input, i.e., the model's additions.It is particularly interesting in the case of interrupted sulci, as we could expect that the model makes them continuous.

Generalization to another region
To assess the reproducibility in another region and to ensure that our framework is not limited to the central sulcus, we transposed our methodology to the isolated corpus callosum dysgenesis (CCD) which leads to a cortex anomaly located in the cingulate region.This disorder is a congenital malformation that results in a complete or partial absence of the corpus callosum.The corpus callosum is composed of fibers that connect the two hemispheres.

Database
The dataset includes 7 children between 9 and 13 years old presenting an isolated CCD and 7 matched control children (Bénézit et al., 2015).Among the patients, 3 present a complete agenesis, 3 a partial agenesis, and one a hypoplasia, corresponding to "a homogeneous reduction of the callosal size" (Tovar-Moll et al., 2007).In this case, the corpus callosum is completely formed, but abnormally small (Bodensteiner et al., 1994).For all children, the CCD was not associated with other malformations or developmental disorders.As presented before, we used T1-w MR images obtained from a Siemens Tim Trio 3T scanner with an isotropic resolution of 1mm.

Transposition of the method
The described anatomical anomalies associated with CCD include "sulci radiating on hemisphere medial surface, complete or partial absence of the callosomarginal sulcus and of the cingulate gyrus" (Bénézit et al., 2015).Therefore, we transposed our method to the cingulate sulcus region.Using the same methodology as presented before, we computed a mask of the cingulate sulcus (gathering the calloso-marginal anterior and posterior fissure in the BrainVISA nomenclature), resulting in crops of dimensions 30 x 128 x 125 and 30 x 130 x 108, which were padded up to 32 x 128 x 128 and 32 x 136 x 112 respectively for the right and left hemispheres.We used the same data split as before.We used the hyperparameters obtained with the gridsearch on the central sulcus region for training.Choosing these parameters may lead to suboptimal performances but enables us to have a first validation of our methodology.Analyses of the latent and the folding spaces are performed following the method described above for the central sulcus.Since the corpus callosum connects the two hemispheres, CCD can be studied equally in both hemispheres.Therefore, we conducted our experiments in the right and in the left hemisphere.

Each training lasted for approximately 1 hour on an
Nvidia Quadro RTX5000 GPU.We obtained with our gridsearch β = 2 and L = 75.

On the Latent Space
UMAP latent space visualizations for the four deletion benchmarks are presented in Fig. 7.For the benchmark 200, benchmark data are rather homogeneously distributed among control data, suggesting that simple surfaces of sizes between 200 and 500 voxels are too subtle to be encoded differently.Indeed, as shown in Fig. 5, small, simple surfaces can correspond to tiny branches that display a high variability in the population.Therefore these synthetic anomalies may be included in the normal variability.The distribution of benchmark 500 seems to be not completely similar to the control's, but the restricted number of subjects makes it hard to conclude.However, the trend becomes more pronounced for benchmarks 700 and 1000 where fake anomalies are gradually gathered and their distributions are different from the controls.These results are confirmed by the ROC curves (Fig. 7).Even when using all the latent dimensions, classification results are very poor for benchmark 200 (AUC = 0.51), supporting that the deleted branches may be too melted into the inter-individual variability.Classification performances are also very low for benchmark 500 (AUC = 0.70).They start to be slightly better for benchmark 700 (AUC = 0.81) but are very good only for benchmark 1000 (AUC = 0.96).
For the asymmetry benchmark, UMAP visualization demonstrates a good separation between the right and the left hemisphere (Fig. 8A), which is verified by the classification of the whole latent space (AUC=0.82).These results suggest that specific shape features are encoded among other properties in the latent space.
To better understand the asymmetry characteristics encoded by the model, we leveraged the generative power of our β − V AE.Fig. 8B.and C. show the average patterns for the right (green) and the left hemisphere (blue) as encoded by our model.The hand knob of the right central sulcus seems to be slightly higher and shallower than in the left hemisphere.Moreover, the double-knob configuration appears more prominent in the left hemisphere.To further highlight the main differences between the two hemispheres, we selected the most important dimensions for the classifier, here dimensions 9 and 36.In Fig. 9A., control and benchmark data are represented according to these two dimensions.Even if the separation is not well marked, we can observe a trend represented by the arrow.We tried to understand the features encoded by the 9th dimension.We took the average for all 75 dimensions of the latent space, and we traveled from the minimum to the maximum of the 9th dimension and reconstructed the resulting vector.Fig. 9B. 1, 2, and 3 represent the reconstructions.These interpolations confirm the trend observed previously.We observe a double-knob configuration in the left hemisphere.The view from underneath and the side view enable visualizing the pli de passage frontal moyen (PPFM).A pli de passage is a gyrus that connects two gyri and which is buried in the depth of some furrows (Mangin et al., 2019).Fig. 2B.1.and 2. propose a visualization of a "pli de passage" located in the central sulcus, the PPFM.According to the different views from Fig. 9B. 1, 2 and 3, it seems that the PPFM is smaller in the right hemisphere and located higher in the central sulcus.

On the Folding Space
We then investigated whether the folding space, i.e, reconstruction errors, was relevant for identifying outliers.For deletion benchmarks, we observe a similar trend as in the latent space.For deletion 200, we cannot see a difference of distributions (p-value = 0.38).However, from deletion 500 we can see a stall with the deletion benchmarks having significantly higher reconstruction errors (p-values of 0.044, 5.3e-14 and 1.2e-24 for benchmarks 500, 700 and 1000 respectively) (Fig. 7).On the contrary, for the asymmetry benchmark, there is no significant difference, nor a trend, in the reconstruction error distributions (Fig. 8).

On the Latent Space
The UMAP projection from the latent space is shown in Fig. 10A.On this distribution, we can observe that most interrupted central sulci are at the margin of the point cloud except for one.Thus, it appears that the representation learned by our model enables to project rare patterns at the margin of the population.Interestingly, when we look at the pattern of each one of the interrupted sulci, it seems that a specific pattern, the "T-shape" pattern (Mangin et al., 2019) is specifically located on one side of the representation.Fig. 10B.shows the assessment of the marginality of the interrupted sulci based on an OCSVM and isolation forest.Error margins correspond to various UMAP projections, suggesting that the ability to detect interrupted CS in the UMAP space is very dependent on the UMAP projection.Interrupted CS detection is within the confidence interval, but the curves are close to the superior bound suggesting a tendency.However, interrupted CS positions in the UMAP space are not enough to detect them: detecting 5 interrupted CS out of 7 would lead to more than 40% of false positives.Nevertheless, some other patterns considered as controls and detected as outliers might also be rare.
Fig. 11 presents the controls' patterns most often predicted as outliers by the OCSVM.First, we note that the outliers are logically located at the border of the distribution.Moreover, we observe distinct patterns in different regions of the UMAP space.We visually highlighted the subjects of the four regions.Analyzing the corresponding crops' meshes, we observe similarities within the groups.Group B seems to demonstrate a very wide open knob.In addition, the knobs are well defined by the upper and the bottom part of the sulcus.On the contrary, the sulci of group C appear to have larger knobs than usual but they show more continuity with the upper and the bottom parts.The pattern of group D seems to correspond to a rather flat central sulcus with a close, long and continuous postcentral sulcus.The shape characteristics of A are less obvious but the sulci give the impression of having several small knobs, two or even three in the two bottom cases and a small part of the precentral inferior opposite to an upper part of the postcentral sulci.Fig. 12 provides a better understanding of these features.For each pattern, we go from the centroid to one of the subjects in each group by interpolating and generating samples.Fig. 12A.presents the interpolations from the centroid to the several-knobs pattern.We gradually see the upper part of the hand knob curving and becoming more pronounced until forming a first knob at the top of the sulcus.Another knob in the bottom part appears similarly.Likewise, patterns B, C and D vary progressively until they match the centroid's shape.

On the Folding Space
When analyzing the detection power on interrupted CS in the folding space, we first note that the reconstruction errors' distributions seem to be different between HCP controls and interrupted CS (p-value = 0.0011).This result suggests that our model has more difficulties to reconstruct the input and that reconstruction error could constitute a relevant metric to detect rare or abnormal patterns.However, having only seven subjects strongly limits our conclusions and this should be replicated with more data.
Observing the reconstructions and the residual maps of Fig. 13 gives clues into the way our model has encoded the interrupted CS.First, we can note that the reconstruction quality is quite good visually.The model's omissions appear to be quite noisy (blue small fold pieces).The arrow points out an omission beyond the noise which corresponds to a perpendicular branch pointing toward the frontal cortex.Such a pattern might be an uncharacteristic configuration.It is interesting to note that in six out of seven cases, the model transformed interrupted sulci into continuous patterns.This is highlighted by the "outputinput" visualizations.Unlike the omissions, the model additions are rather localized.Moreover, the asterisks show where the model has filled the interrupted sulci.Such visualization could be useful to identify rare patterns like interruptions or perpendicular branches.

On the Latent Space
We first compare distributions of CCD children (n=7) with control children (n=7) acquired in the same conditions and with HCP adult subjects (n=200).UMAP projections, presented in Fig. 14A., give different results depending on the hemisphere.For the right hemisphere, it seems that most children controls are included in the distribution of adult controls (hcp test in green).Five out of the seven subjects having a CCD are located at the margin of the controls, suggesting that their latent representation differs from the average cingulate sulcus pattern.However, two subjects, one with a complete and one with a partial agenesis, are in the middle of the controls.In the left hemisphere, only three control children are clearly in the control adult distribution.The other four are closer to the CCD subjects but they seem to be still distinct.Indeed, CCD subjects are gathered very close to each other.This could be due to the fact that there may be an age effect between children's and adults' brains or a site effect (different scanners, resolution), which we tried to reduce by using skeleton-based images but which may still remain.Nevertheless, we can still observe a difference in distribution between control children and CCD subjects.

On the Folding Space
Regarding reconstruction error distributions (Fig. 14B.), we observe for both hemispheres that control children seem to have the same distribution as adult controls, which is confirmed by Fig. 14C.(p-value=0.034 and 0.017 respectively for right and left hemisphere).On the contrary, CCD subjects present higher reconstruction errors that are significantly different from both HCP controls (p-value=3.6e-06for the two hemispheres) and children controls (p-value=0.0011for the two hemispheres).Therefore, it seems that there is a complete individual separability of the CCD patients which is very promising and should be replicated with more data.
The reconstructions presented in Fig. 15 highlight the singularities of CCD.The model's additions mostly make the cingulate more continuous than initially.The model's omissions are mainly small branches perpendicular to the cingulate sulcus that are radially oriented.

Discussion
This work proposed a methodology to study rare folding patterns which was applied to the central sulcus region and to a described rare pattern, interrupted central sulci.Specifically, we represented folding patterns with distance maps and leveraged the generative power of the β − V AE to have a better understanding of the learned representations.In addition, we proposed a way to study the granularity of deviations that can be identified and we brought to light several rare patterns in the region.We also compared the identification power of both the latent space and the folding space.Finally, we assessed the generalization of our methodology on a developmental anomaly in another region.
4.1.Latent Space and Folding Space, Two Complementary Information In many anomaly detection works applied to medical images, the detection is performed based on the reconstruction error rather than in the latent space (Schlegl et al., 2019;Baur et al., 2020;Behrendt et al., 2022;Tschuchnig and Gadermayr, 2022).However, both of these spaces have their interest and could bring complementary information.In our work, we studied four types of rare patterns, two synthetic types, deletion and asymmetry benchmarks, and two actual rare patterns.These four categories differ from control data by their own characteristics and thus help to study the granularity detected, that is to say, the typology of rare features that can be identified.For instance, the asymmetry benchmark includes more double-knob configurations.Depending on the size of the deleted simple surface, deletion benchmarks represent different features: benchmarks 200 and 500 represent mainly a missing branch with increasing size, which may represent the normal variability of branches.Benchmark 700 could look like an interrupted sulcus in some cases or in others, like benchmark 1000, an unlikely configuration.Interrupted central sulci present a clear interruption and a rare arrangement of the shapes forming the central sulcus.Last, CCD subjects demonstrate a missing sulcus or missing sulcal parts and branches with different orientations.
These different kinds of deviations from the norm pro- vide clues to the characteristics of rare patterns that can be identified respectively in the latent space or in the folding space of our model.As a matter of fact, the identification performances in the latent and in the folding space vary depending on the kind of patterns.For deletion benchmarks, the folding space, based on the reconstruction error, seems to enable the identification of unusual patterns from smaller modifications: different distributions are observed from 500 deleted voxels.Whereas in the latent space, the detection requires at least 1000 deleted voxels.Likewise, for the interrupted central sulci, despite the small number of samples, their detection seems to be easier on the basis of reconstruction error than in the la-tent space.Similar results were obtained on CCD subjects even if the latent representation was encouraging.In return, the error distribution of the asymmetry benchmark is not different from that of the controls, but the benchmark is well detected in the latent space.Therefore, the latent space could be more sensitive to shape arrangements than the folding space.The lack of difference in the error distributions may be due to the fact that the voxel-to-voxel differences between the right and left central sulci are local and subtle and could be embedded in the normal variability.In addition, the reconstruction error is for the entire image.Therefore, in the case of small and very local deviations from the norm, the reconstruction error alone  is likely to be insufficient.A way to limit such effects could be to use a more local error, applied to sub-regions or patches for instance.
The difference in the outlier detection performance may also lie in the way our model encodes the outliers.Based on our results, we can consider several cases.First, a rare configuration is represented by several samples present in the training set.This would be the case with the asymmetry benchmark.Indeed, there are more double-knob configurations in the left hemisphere but single and doubleknob patterns coexist on both sides.In such a case, the distribution support of the left and right hemispheres are the same, but the densities differ, which could lead to a projection of the outlier at the margin of the latent space but to a good reconstruction.Second, the rare configuration is almost never represented in the training set and the model has not detected and thus encoded its local specificity.Then, the subject would be encoded with a "default" representation and projected in the middle of the other subjects.This would be consistent with the results of (Guillon et al., 2021), where major anomalies (differ-ent parts of the brain from the one considered in the train set) were projected in the middle of the point cloud and reconstructed as the average reconstruction.It could be the case of the benchmark deletion 500 and of the interrupted central sulcus that is projected in the point cloud.Last, the outlier configuration is almost never represented in the training set but the model has detected the rare characteristic.The subject is then projected at the margin of the point cloud and the decoder has not learned this part of the latent space leading to a poor reconstruction (interrupted central sulci, CCD subjects).
Nevertheless, in all cases, a strength of the folding space is the possibility to localize the reconstruction errors and, in some cases, the unusual features.If not too noisy, reconstruction errors can be very informative.For example, in the case of interrupted central sulci, looking at the model's addition permits clearly localizing what is atypical in a subject (Fig. 13).Similarly, in the case of the CCD subjects, the reconstruction errors highlight the presence of radial small branches that are typical of this brain disorder (Bénézit et al., 2015).But some noise re- mains, and it might be interesting to add an additional constraint to represent only errors that correspond to a minimum number of contiguous voxels.This could lead to a good explanation of the abnormality which is of major importance in the field and especially when applied to medical images.Other explanation methods exist, directly on the network such as Grad-CAM (Selvaraju et al., 2020) or on an OC-SVM applied on the learned features (Sohn et al., 2022) for instance; but the use of the reconstruction error is immediate and easy to implement.Hence, the latent space and the folding space, based on reconstruction error, can provide complementary information and both can be used to identify rare patterns.

Data size limitations and unknown number of rare patterns
The method should be further qualified because of the low number of our examples of rare patterns.While the study of a known rare pattern is interesting and important, having only seven samples severely limits our conclusions.Similarly, the poor results of benchmarks 500 and 700 in the latent space could be due to their small size, and having larger benchmark datasets could lead to increased performances.
Also, we assessed our method in the CS area on the benchmarks and on an existing rare pattern, but because few rare patterns have been described in this region, there may be other rare configurations in what we consider the control population.For instance, three morphologic variants in the central sulcus region have been introduced, representing 2.9%, 7.0% and 1.8% of the studied population, opposed to 78.2% of "omega" shape, i.e. the central sulcus knob and 10.1% of "epsilon" shape which corresponds to the double-knob configuration (Caulo et al., 2007).This multiplication of rare patterns in the populations would make the identification of interrupted central sulci more difficult.

Relevance of synthetic benchmarks
Moreover, we can wonder about the relevance of our synthetic benchmarks.Although synthetic rare patterns are of high interest as they enable to quantify the performances on different degrees of deviations from the norm, few works have been interested in them to our knowledge  (Guillon et al., 2021;Meissen et al., 2022).But the use of fake deviations raises the question: do they constitute adequate rare patterns?Few studies introduced rare folding patterns based on the arrangement of their shapes such as the PBS (Mellerio et al., 2014), an interrupted central sulcus (Mangin et al., 2019) or a flat central sulcus (Sun et al., 2017).Here, we emphasize their advantage in the study of our understanding of the brain: they are evidence of neurodevelopmental processes and then stable throughout life.But other abnormal sulcal features have been studied and found to be important and correlated with neurodevelopmental disorders, such as the depth, which demonstrated anomalies in autism spectrum disorder (Nordahl et al., 2007;Dierker et al., 2015) or Williams syndrome (Essen et al., 2006) for instance.Despite being another subject of study, a benchmark corresponding to central sulcus depth variations could be interesting to assess whether our framework can be extended to detect such anomalies.
Regarding the current benchmarks we use, we said that small erased SS could remain undetected as this deletion could be embedded in the normal variability.However, there may be several categories of deletion deviations.Some may be minor, as a small SS representing a tiny branch.On the opposite, some small SS, for instance one corresponding to depth change, representing the presence of a pli de passage, and thus leading to an interrupted central sulcus would be expected to be a major feature of the topology.Hence, our criterion, only based on the size of the SS may be insufficient and it could be interesting to add another one, such as topological criteria.
In any case, having an unusual feature (e.g., a missing simple surface or unusual depth) that can be incrementally increased, or comparing several types of features, helps characterize the detection power of a model and the features likely to be detected.

Learning Relevant Representations
When dealing with sulcal patterns and their high complexity, it may be easier to use representations of the folding which attempt to gather several subjects with similar patterns.Local averages of sulci, also called moving averages, enable to concentrate on the main features of the different patterns and are thus very useful to analyze folding patterns (Sun et al., 2012;de Vareilles et al., 2022;Foubet et al., 2022;Guillon et al., 2022).From a graphbased representation of the sulci, the identification of patterns can be done after computing similarity and applying a clustering (Meng et al., 2018).Our approach proposes another method to learn sulcal representations.From our cropped distance maps, the β − V AE learns a mapping to a latent representation which can then be reconstructed.Therefore, rather than explicitly computing pairwise similarity between the subjects, gathering them, and then analyzing the patterns, we hope that our β − V AE directly learns shapes that can be combined and arranged in patterns.The representations learned by our model seem to be relevant and consistent with some morphological characteristics of the central sulcus area.First, the reconstruction of the average representation of the right central sulcus is composed of an upper knob whereas the left average tends more towards a doubleknob configuration (respectively green and blue sulci in Fig. 8B).This is one of the main known asymmetries in terms of patterns and it appears early in the development.It has been detected in infants of 30 weeks postmenstrual age (de Vareilles et al., 2022) and in adults (Sun et al., 2012).We also observed differences in terms of curvature of the hand-knob, with a hand-knob more pronounced in the left hemisphere than in the right (Fig. 8B.).Considering that we study a right-handed population, this could be related to handedness.With the lateralization of the hand motricity, we expect the motor area of the right hand in the left hemisphere and particularly the precentral gyrus to be more developed for right-handed subjects, pushing backward the upper part of the central sulcus which would result in a knob more pronounced.This interpretation is consistent with a study on one-handed subjects that showed that subjects born without a hand had a flatter central sulcus contralateral to the missing hand (Sun et al., 2017).
Another interesting property that was successfully encoded is the PPFM.This pli de passage was first described in 1888 (Broca and Pozzi, 1888) and has been a source of growing interest due to its link with the motor hand area (Boling and Olivier, 2004) and in the context of understanding the formation of the knob regarding evolutionary questions (Hopkins et al., 2014).Our model was able to encode the PPFM in the latent space as well as its asymmetry characteristics.Indeed, we observed that the PPFM is smaller in the right hemisphere which corresponds to central sulcus depth variations described in (Amunts et al., 1996).This is also consistent as the PPFM has been correlated to the hand.Therefore, right-handed subjects tend to have a more developed hand area in the left hemisphere and thus a larger PPFM.Hence, our latent space has learned relevant normal characteristics that are consistent with the region's morphology.It has also enabled to propose four other groups of likely rare patterns (Fig. 11).The pattern representing a rather flat central sulcus is indeed a non-typical configuration.Less than 2% of the studied subjects were reported to have such a configuration in (Caulo et al., 2007).Moreover, flat central sulci appeared as the most important feature when comparing controls to congenital onehanded subjects who tended to demonstrate flatter central sulci (Sun et al., 2017), confirming that flat central sulci are less frequent patterns.The groups representing large knobs and wide open knobs (Fig. 11B.and C.) are also an atypical configuration that is present at one extremity of the axis representing the most extreme variations in Human and is closer to configurations we observe in Chimpanzees (Foubet et al., 2022).

Generative power of β − V AE and comparison with
other strategies Since our proposed framework is able to encode relevant features regarding folding patterns, the generative power of the β − V AE can be exploited.Indeed, reconstructions and interpolations are tools to understand the folding variability.We have just mentioned that the learned patterns were relevant and consistent with those obtained by other methods, but our method has the advantage of being able to reconstruct and interpolate.For instance, interpolations along the main axis of asymmetry variations highlight the evolution from a right to a left hemisphere.It can also be useful to understand the folding process and in particular the formation of interrupted central sulci.As a matter of fact, on Fig. 12C., an interruption of the central sulcus happens when interpolating from the central subject to one control outlier.When observing the PPFM, we can see that the PPFM increases until reaching the surface of the brain and thus interrupting the central sulcus.Jointly, the inferior sulcal part connects to the precentral sulcus.Such observations may provide additional clues in our understanding of the folding processes.
But other deep learning models could be interesting to study folding patterns.For instance, β − V AE reconstructions are known to be blurry contrary to GAN's.Currently, this shortcoming is limited as we seek to have a simpler representation of folding patterns, still, for more subtle details, another model may be better suited.In addition, in the anomaly detection field, models that add constraints on controls distribution are quite appealing.For instance, deep One-Class Classification and its derivatives have been proposed to push control data into the smallest hypersphere in the latent space (Ruff et al., 2018).This could help increase the detection performance in the latent space.Nevertheless, no matter the architecture or the framework, an important limit to understanding what our model has really encoded is the high number of latent dimensions.
One can also wonder about the representation of the folding patterns.In the introduction, we mentioned two main strategies: clustering and manifold.Usually, these two approaches are applied to a continuous space.Nevertheless, if we consider sulcal shapes as symbolic entities that can be combined and arranged, we could represent folding patterns based on a discrete space rather than a continuous one.As such, VQ-VAE (van den Oord et al., 2017) seems to be an interesting representation to compare with our present results.
Finally, this framework of outlier detection based on training on control subjects alone may sensitive to outliers present in the training set.Having a contaminated dataset could severely limit the detection performances, at least in the folding space which is based on the reconstruction error.It has been reported in a brain tumor detection problem that having 3% of outliers in the training set (about 1000 samples) leads to a decrease of 5% of the AU-ROC and to a 13% decrease if the contamination reaches 12% of the training set (Behrendt et al., 2022).Therefore, one serious shortcoming of our paradigm is that we do not know the outliers we are looking for.Applying our framework to a control population alone in order to bring out rare patterns may limit the different patterns that can be identified.A way to tackle this issue and to increase the patterns detected would be to exploit the presence of outliers in the training set as proposed in (Qiu et al., 2022).In their technique, the authors introduce an iterative joint training where they assign labels (anomalous or control) to the examples, and then optimize the network's parameters to better identify the anomalies.Such a method could also enable to project the outliers more at the margin of the latent space.The impact of the presence of outliers during training on the latent space has not yet been investigated to our knowledge.If, as we suggested before, outliers present in the training phase are encoded at the margin of the distribution, i.e. in a different area of the latent space, it could be interesting to deepen our analysis, based on clustering for instance.4.6.Generalization of the approach: towards an analysis of the whole brain?This work has shown that our approach had successfully encoded some relevant features of the folding patterns in the central sulcus region but it is attractive to think about the behavior and results we could obtain in other parts of the brain.Here, we assessed the generalizability of the framework on another dataset and another region.Our results suggest that our method can well transpose in other brain regions.Specifically, even if we use the hyperparameters (β and L) optimized for another area, the learned representations still enable us to distinguish between control and outlier subjects.This is all the more interesting that the two studied regions are rather different.The central sulcus is one of the first folds to form and is rather stable, contrary to the cingulate region that is more variable (Sun et al., 2009).Therefore, it seems that no matter the folding variability of the zone, our framework can be applied.This encouraging result raises a question regarding the procedure to adopt to extend our analysis to the whole brain.A way could be to define a set of regions, consistent with the cytoarchitecture and function and to train our β − V AE on each region.In particular, some areas seem to be interesting from a clinical point of view (Provost et al., 2003;Yücel et al., 2003;Gervais et al., 2004;Borst et al., 2014;Hotier et al., 2017).Our future works may thus focus on proposing an adequate methodology to tackle the whole brain.
On the other hand, when we applied this framework to CCD subjects, we also operated a domain shift.Indeed, the dataset to explore included exclusively children while the β − V AE was trained on young adults.Despite folding patterns being reported as trait features (Cachia et al., 2016), such an age variation may have an impact.In addition, beyond dealing with children, the site and the scanner are different.Such differences have been reported to affect the generalizability and the performances on various targeted tasks.In terms of distributions in the latent space, despite the fact that the distribution of the controls does not seem to completely overlap the distribution of the HCP controls, the patients still seem to present a different distribution than both controls' populations.Moreover, the domain shift does not seem to have an effect on the folding space where controls reconstruction errors are not significantly different from HCP contrary to CCD subjects that have significantly higher reconstruction errors.However, having only seven subjects makes it difficult to conclude on the importance of these age and site effects for our task.We will explore these questions in further studies.
To conclude, this study proposed a framework to identify rare and abnormal folding patterns based on the modeling of the inter-individual variability.With a new representation of folding patterns, we proposed a model that was able to encode relevant folding characteristics.The use of synthetic rare patterns enlightened the identification power of our model on both the latent space and the folding space.Finally, we successfully generalized our approach to another clinical brain anomaly in the cingulate region.Our results open up several avenues of work such as the definition of new synthetic benchmarks that match the characteristics of other known anomalies, the use of other deep learning models that exploit the presence of outliers in the training set, or the use of our framework to better understand the folding process.

Funding
This work was supported by the European Union's Horizon 2020 Research and Innovation Programme under Grant Agreement No.

Figure 1 :
Figure 1: Central sulcus region variability.A. Localization of the studied region of interest (ROI) on a 3D view of one right hemisphere.The colored ribbons represent sulci, defined as a negative cast of the furrows.The central sulcus is red.B. Examples of non-interrupted central sulci.C. Examples of interrupted central sulci.

Figure 2 :
Figure 2: Overview of the BrainVISA/Morphologist pipeline's main steps and of the folds representation.A. Main steps of BrainVISA/Morphologist pipeline.1. Raw T1-w MRI, 2. Bias-corrected image, 3. Segmentation of the brain, 4. Segmentation of the hemispheres and of the grey and white matter, 5. Skeleton representation of the folding graph, representing a negative cast of the 4. 6. Mesh representation of the white matter of the right hemisphere, 7. Folding graph that represents the folds (in green) as the negative cast of the white matter of the right hemisphere (white mesh).B. Folds representation.1. Example of a central sulcus, which is composed of several elementary entities called simple surfaces (SS).(Orientation: A: Anterior, P: Posterior, S: Superior, I: Inferior).2. Corresponding schematic representation of the sulcus represented in 1, which is formed by four simple surfaces.Depth variation caused by the buried gyrus and the presence of two branches lead to the division into four different simple surfaces.3. Corresponding folding graph.

Figure 3 :
Figure 3: Pipeline.A mask of the central sulcus area is defined based on a distinct manually labeled dataset.HCP is processed with Morphologist to obtain folding graphs, which are used to obtain 3D images of skeletons (1).The Chamfer distance is applied to the skeletons to obtain geodesic distance maps (2).Distance maps are then downsampled and cropped according to the mask (3) and are fed as input to a β − V AE.

Figure 4 :
Figure 4: Skeleton's description of the test set.Left: Stacked histogram representing the distribution of simple surfaces sizes for the test subjects for the three main sulci of our crop, the central sulcus (S.C. right), the precentral sulcus (S.Pe.C. right) and the postcentral sulcus (S.Po.C. right).(Note:The labeling used is automatic and therefore not entirely reliable, but these labels are sufficient to draw conclusions regarding the SS size distribution.)Right: Distribution of the number of skeletons' fold voxels for the test subjects when the mask is applied to the crops.

Figure 5 :
Figure 5: Deletion benchmarks.Visualization of original sulcal pattern and its altered versions from the four deletion benchmarks showing patterns with increasing simple surface size deleted.Upper row: Mesh visualization.Middle and bottom rows: distance maps on axial view, visualization at depths 15 and 37.

Figure 6 :
Figure 6: Asymmetry benchmark.Visualization of the original sulcal pattern and its flipped contralateral version for two subjects.

Figure 7 :
Figure 7: Deletion benchmarks results.For each row, controls are represented in green and benchmark data in pink.Left column: UMAP projection of benchmark and control data.Middle column: ROC curves of classification of control and benchmark data.Right column: reconstruction error distributions and p-value of the Kolmogorov-Smirnov test with the null hypothesis that the two samples come from the same distribution.

Figure 8 :
Figure 8: Asymmetry benchmark results.Controls are represented in green and benchmark data in blue. A. UMAP projection of benchmark and control data, ROC curves of classification of control and benchmark data, and reconstruction error distributions.B. Averages for the control subjects, i.e. right hemispheres (in green), and for the highlighted asymmetry subjects, i.e. left hemispheres (in blue).These averages are also placed on the UMAP dimensions.C. 1. and 2. Respectively side and bottom views of the averages of B. The single star indicates a single-knob configuration, and the two stars indicate the second knob of a double-knob configuration.C. 3. Superposition of the two averages respectively in upper and bottom view.

Figure 9 :
Figure 9: Travelling through the 9 th dimension of the latent space.A. Visualization of controls and asymmetry benchmark according to the most important features of the classifier.B. Interpolations along the 9 th dimension.1, 2, and 3, respectively correspond to the upper, bottom and side view of these interpolations.C. Superposition of extreme interpolations.

Figure 10 :
Figure 10: Interrupted central sulci on UMAP space.A. Interrupted central sulci shape distribution in the UMAP space.The 3D folding patterns of the subjects are positioned according to their location in the UMAP space.For instance, the pattern located in the lower left corner corresponds to subject 510225 in the UMAP representation.Subjects with interrupted sulci on the upper left of the UMAP visualization seem to correspond to an interruption with a T-shape pattern.B. Outlier detection performances using OCSVM and isolation forest on the interrupted CS. C. Controls and interrupted CS reconstruction error distributions.

Figure 11 :
Figure 11: Control subjects identified as outliers.A, B, C and D correspond to groups of visually similar patterns.The UMAP projection is the same as the one in Fig.10.Control subjects identified as outliers are in pink and subjects with interrupted central sulci are still represented in red.

Figure 12 :
Figure12: Travelling through the latent space from the centroid to the margin of the UMAP space.The centroid is the centroid of HCP controls.Then, for each row, interpolations between the centroid and one of the patterns of each group are computed and then reconstructed.

Figure 13 :
Figure 13: Reconstructions and residuals for all seven interrupted sulci.A. For all rows, distance maps are converted to meshes for an easier visualization.First row: input data.Second row: reconstruction of the model.Third row: Reconstruction of the model with the difference between the input and the output, i.e. the model's omissions (in blue).The purple arrow highlights an omission corresponding to a perpendicular branch pointing toward the frontal cortex.Last row: Reconstruction of the model with the difference between the output and the input, i.e. the model's additions (in purple).B. Rotated view of the reconstructions represented with asterisks in the last row of A.

Figure 14 :
Figure 14: Results on corpus callosum dysgenesis (CCD) subjects.First row: right hemisphere.Bottom row: left hemisphere.For both rows: A. UMAP projections of CCD subjects, control children and HCP test.B. Reconstruction error distributions for the CCD subjects, control children and HCP test.C. Reconstruction error variations for the CCD subjects, control children and HCP test.Significant differences between populations according to the Mann-Whitney test are indicated with an asterisk.

Figure 15 :
Figure 15: Right cingulate sulcus reconstructions and residuals for the CCD subjects and one control.A. CCD subjects.B. One control subject from the same cohort.For both A. and B.: each column corresponds to a subject.For all rows, distance maps are converted to meshes for an easier visualization.First row: input data.Second row: reconstruction of the model.Third row: Reconstructions of the model with the difference between the input and the output, i.e. the model's omissions.Last row: Reconstructions of the model with the difference between the output and the input, i.e. the model's additions.The arrows highlight interesting features added or missed by the model.