This letter presents a mathematical model of figure-ground articulation that takes into account both local and global gestalt laws and is compatible with the functional architecture of the primary visual cortex (V1). The local gestalt law of good continuation is described by means of suitable connectivity kernels that are derived from Lie group theory and quantitatively compared with long-range connectivity in V1. Global gestalt constraints are then introduced in terms of spectral analysis of a connectivity matrix derived from these kernels. This analysis performs grouping of local features and individuates perceptual units with the highest salience. Numerical simulations are performed, and results are obtained by applying the technique to a number of stimuli.
Gestalt laws have been proposed to explain several phenomena of visual perception, such as grouping and figure-ground segmentation (Wertheimer, 1938; Kohler, 1929; Koffka, 1935; for a recent review, see Wagemans et al., 2012). In order to individuate perceptual units, gestalt theory introduced local and global laws. Among the local laws, we recall the principles of proximity, similarity, and good continuation. The local law of good continuation plays a central role in perceptual grouping (see Figure 1, top row, left).
Regarding global laws, in the construction of percepts, the feature of saliency is crucial, yet it is not easy to model quantitatively. In Berliner Gestalt theory, the concept of salience denotes the relevance of a form with respect to a contextual frame, that is, the power of an object to be present in the visual field. The role of salience is also pivotal in figure-ground articulation. Due to the perceptual grouping process, scenes are perceived as constituted by a finite number of figures, and the salience assigns a discrete value to each of them. The most salient configuration pops up from the ground and becomes a figure (Merleau-Ponty, 1945). Note that in the case of continuous deformation of the visual stimulus, the salient figures can change abruptly from one percept to another (Merleau-Ponty, 1945). This happens, for example, in the top row of Figure 1, where a regular deformation is applied to the Kanizsa square: we progressively perceive a more curved square, until it suddenly disappears and the four inducers are perceived as standing alone (see Lee & Nguyen, 2001; Pillow & Nava, 2002; Petitot, 2008).
A number of results have been provided in order to refine the principles of psychology of form and assess neural correlates of the good continuation law. In particular, Grossberg and Mingolla (1985) introduced a “cooperation field” to model illusory contour formation. Similar fields of association and perceptual grouping have been produced by Parent and Zucker (1989). In the 1990s, Kellman and Shipley provided a theory of object perception that addressed the perception of partially occluded objects and illusory contours (Kellman & Shipley, 1991; Shipley & Kellman, 1992, 1994). Heitger and von Der Heydt (Von Der Heydt, Heitger, & Peterhans, 1993) provided a theory of figural completion that can be applied to illusory contour figures (such as the Kanizsa triangle) and real images. Field et al. (1993) introduced through psychophysical experiments the notion of association fields, describing the Gestalt principle of good continuation. They studied how the perceptual unit visualized in Figure 1b pops up from a stimulus of Gabor patches (see Figure 1a). Through a series of similar experiments, they constructed an association field that defines the pattern of position-orientation elements of stimuli that can be associated with the same perceptual unit (see Figure 1c).
Starting from the classical results of Hubel and Wiesel (1977), it has been possible to justify these perceptual phenomena on neurophysiological bases. The results of Bosking, Zhang, Schofield, and Fitzpatrick (1997) and Frégnac and Shulz (1999) confirmed that neurons sensitive to similar orientation are preferentially connected. This suggests that the rules of proximity and good continuation are implemented in the horizontal connectivity of low-level visual cortices. A stochastic model that takes into account the structure of the cortex, with position and orientation features, was proposed by Mumford (1994), and further exploited by Williams and Jacobs (1997) and August and Zucker (2000). They modeled similar fields with Fokker-Planck equations, taking into account different geometric features, such as orientation and curvature. Petitot and Tondut (1999) introduced a model of the functional architecture of V1, compatible with the association field. Citti and Sarti (2006) proposed the model of functional architecture as a Lie group, showing the relation between geometric integral curves, association fields, and cortical properties. This method has been implemented in Sanguinetti, Citti, and Sarti (2008) and Boscain, Duplaix, Gauthier, and Rossi (2012). An exact solution of the Fokker-Planck equation has been provided by Duits and van Almsick (2008), and their results have been applied by Duits and Franken (2009) to image processing.
The local laws are insufficient to explain the constitution of a percept, since a perceived form is characterized by global consistency. Different authors have qualitatively defined this consistency as pregnancy or global saliency (Merleau-Ponty, 1945), but only a few quantitative models have been proposed (Koch & Ullman, 1985). In particular, a spectral approach for image processing was proposed by Perona and Freeman (1998), Shi and Malik (2000), Weiss (1999), and Coifman and Lafon (2006). Sarti and Citti (2015) showed how this spectral mechanism is implemented in neural morphodynamics in terms of symmetry breaking of mean field neural equations. In that sense, Sarti and Citti (2015) can be considered an extension of Bressloff, Cowan, Golubitsky, Thomas, and Wiener (2002).
In this letter, we further develop the approach introduced in Sarti and Citti (2015) and describe an algorithm for the individuation of perceptual units using both local and global constraints: local constraints are modeled by suitable connectivity kernels, which represent neural connections, and the global percepts are computed by means of spectral analysis. The model is described in the geometric setting of a Lie group equipped with a sub-Riemannian metric introduced in Petitot and Tondut (1999), Citti and Sarti (2006), and Sarti, Citti, and Petitot (2008). Despite its apparent mathematical difficulty, this setting helps to clarify in a rigorous way the gestalt law of good continuation.
Here we introduce various substantial differences from the techniques in the literature. While studying the local properties of the model, we focus on the properties of the connectivity kernels. The Fokker-Planck and the Laplacian kernel in the motion group are already largely used for the description of the connectivity, since they qualitatively fit the experimental data (Sarti & Citti, 2015). Here we perform a quantitative fitting between the computed kernels and the experimental ones in order to validate the model. We show that the cortical architecture is a realization of stochastic sample functions and how, through this realization, we can construct the connectivity kernel. We compare the fundamental solution of the Fokker-Planck equation with the experimental data of Bosking et al. (1997), Ben-Shahar and Zucker (2004), and Gilbert, Das, Ito, Kapadia, and Westheimer (1996), showing how the stochastic paths are implemented in the neural network. In particular, we consider the distribution of a tracer through lateral connection, modeling each injection with stochastic paths. The bouton distributions are realizations of a stochastic process, in particular, of a random walk in space. We show how the probability density obtained as a combination of Fokker-Planck is an integration of stochastic paths. Moreover, we also propose using the subelliptic Laplacian kernel in order to account for the variability of connectivity patterns. Second, we accomplish grouping with a spectral analysis inspired by the work of Sarti and Citti (2015), who proved the neurophysiological plausibility of this process. In the experiments, we manipulate the stimuli to demonstrate the relation between the pop-up of the figure and the eigenvalue analysis. We analyze in particular the swap between one solution and the other while smoothly changing the stimulus in many grouping experiments.
Finally, we enrich the model, exploiting the role of the polarity feature, which allows us to work with two competing kernels.
The plan of the letter is the following. Section 2 is divided into two parts, first describing local constraints and then global ones. We first recall the neurogeometry of the visual cortex and see how the cortical connectivity is represented by the fundamental solution of Fokker-Planck, sub-Riemannian Laplacian, and isotropic Laplacian equations. We propose a method for the individuation of perceptual units, first recalling the notions of spectral analysis of connectivity matrices, obtained by the connectivity kernels. We will see how eigenvectors of this matrix represent perceptual units in the image. In section 3, we present numerical approximations of the kernels and compare the kernels with neurophysiological data of horizontal connectivity (Angelucci et al., 2002; Bosking et al., 1997). We will see how the differential equations that we need to solve originate from a stochastic process, estimated with the efficient numerical technique of Markov chain Monte Carlo methods (MCMC). We also perform a quantitative validation of the kernel considering the experiment of Gilbert et al. (1996), showing the link between the connectivity kernel and the cell’s response. Finally, in section 4, we present the results of simulations using the implemented connectivity kernels. We identify perceptual units in different Kanizsa figures, highlighting the role of polarity and discussing and comparing the behavior of the different kernels.
2 The Mathematical Model
In this section, we identify a possible neural basis for local gestalt laws in the functional architecture of the primary visual cortex, the first cortical structure that underlies the processing of visual stimuli. We do not claim that the process of grouping has to be attributed exclusively to V1, since several cortical areas are involved in segmentation of a figure. However, neural evidence indicates that it takes place in V1 (see Lee & Nguyen, 2001; Pillow & Nava, 2002). Hence, we focus on this area, where the first elaboration is made and which is therefore important for the geometrical aspects of the process.
2.1 Local Constraints: The Neurogeometry of V1
In the 1970s, Hubel and Wiesel (1962, 1977) discovered that this cortical area is organized in the so-called hypercolumnar structure. This means that for each retinal point $(x, y)$, there is an entire set of cells, each one sensitive to a specific orientation $\theta$ of the stimulus.
The first geometrical models of this structure are due to Hoffman (1989), Koenderink and van Doorn (1987), Williams and Jacobs (1997), and Zucker (2006). They described the cortical space as a fiber bundle, where the retinal plane is the basis, while the fiber coincides with the hypercolumnar variable $\theta$. More recently, Petitot and Tondut (1999), Citti and Sarti (2006), and Sarti et al. (2008) proposed describing this structure as a Lie group with a sub-Riemannian metric (see also the results of Duits & Franken, 2009). This expresses the fact that each filter can be recovered from a fixed one by translation of the point $(x, y)$ and rotation of an angle $\theta$. In particular, the visual cortex can be described as the subset of points $(x, y, \theta)$ of $\mathbb{R}^2 \times S^1$. Every simple cell is characterized by its receptive field, classically defined as the domain of the retina to which the neuron is sensitive. The shape of the response of the cell in the presence of a visual input is called the receptive profile (RP) and can be reconstructed by electrophysiological recordings (Ringach, 2002). In particular, simple cells of V1 are sensitive to orientation and are strongly oriented. Hence, their RPs are interpreted as Gabor patches (Daugman, 1985; Jones & Palmer, 1987). They are constituted by two coupled families of cells: an even-symmetric and an odd-symmetric one.
Via the retinotopy, the retinal plane can be identified with the two-dimensional plane $\mathbb{R}^2$. A visual stimulus at the retinal point $(x, y)$ activates the whole hypercolumnar structure over that point. All cells fire, but the cell with the same orientation as the stimulus is maximally activated, giving rise to orientation selectivity.
2.2 A Model of Cortical Connectivity
From a neurophysiological point of view, there is experimental evidence of the existence of connectivity between simple cells belonging to different hypercolumns, the so-called long-range horizontal connectivity. Combining optical imaging of intrinsic signals with small injections of biocytin in the cortex, Bosking et al. (1997) clarified properties of horizontal connections in V1 of the tree shrew. The propagation of the tracer is strongly directional, and the direction of propagation coincides with the preferential direction of the activated cells. Hence, connectivity can be summarized as preferentially linking neurons with co-circularly aligned receptive fields.
Figure 2a shows an isosurface of the symmetrized kernel , with its typical twisted butterfly shape. The kernel has been proposed in Sanguinetti et al. (2008) as a model of the statistical distribution of edge co-occurrence in natural images. The similarity between the two is proved at both a qualitative and a quantitative level (Sanguinetti et al., 2008; see also Figures 2a and 2b).
We will see in section 3.2 that a combination of Fokker-Planck and sub-Riemannian Laplacian fits the connectivity map measured by Bosking et al. (1997), where the Fokker-Planck fundamental solution represents well the long distances of the trajectory, while the sub-Riemannian Laplacian represents the short ones. A combination of different Fokker-Planck fundamental solutions can also be used to model the functional architecture of primates experimentally measured by Angelucci et al. (2002).
In section 3.1 we describe a numerical technique to construct the three kernels we have described.
2.3 Global Integration
Since the beginning of the twentieth century, perception has been considered by gestaltists as a global process. Moreover, following Koch and Ullman (1985) and Merleau-Ponty (1945), visual perception is a process of the visual field that individuates figure and background at the same time. It then continues by segmenting the structures through successive differentiations.
A cortical mechanism responsible for this analysis has been outlined by Sarti and Citti (2015), starting from the classical mean field equation of Ermentrout and Cowan (1980) and Bressloff and Cowan (Bressloff et al., 2002; Bressloff & Cowan 2003). This equation describes the evolution of cortical activity and depends on connectivity kernels. The discrete output of the simple cells selects in the cortical space the set of active cells, and the cortical connectivity, restricted on this set, defines a neural affinity matrix. The eigenvectors of this matrix describe the stationary states of the mean field equation—hence, the emergent perceptual units. The system will tend to the eigenvector associated with the highest eigenvalue, which corresponds to the most important object in that scene. Mathematically the approach is strongly linked to spectral analysis techniques for locality-preserving embeddings of large data sets (Coifman & Lafon, 2006; Belkin & Niyogi, 2003; Roweis & Saul, 2000), data segregation and partitioning (Perona & Freeman, 1998; Meila & Shi, 2001; Shi & Malik, 2000), and a grouping process in real images (Weiss, 1999).
2.4 The Cortical Activity Equation
Hence these are the emergent states of the cortical activity that individuate the coherent perceptual units in the scene and allow segmenting it. This is why we will assign to the eigenvalues of the affinity matrix the meaning of a salience index of the objects. Since we have defined three different kernels, we will define different affinity matrices. However, all kernels are real and symmetric, so each affinity matrix is a real symmetric matrix. Its eigenvalues are real, and the highest eigenvalue is well defined. The associated principal eigenvectors emerge as symmetry breaking of the stationary solutions of mean field equations, and they pop up abruptly as emergent solutions. The first eigenvalue will correspond to the most salient object in the image.
2.5 Individuation of Perceptual Units
Since the three different kernels assign different roles to different directions of connectivity, the different affinity matrices and their spectrum will reflect these different behaviors. Consequently, the resulting data set partitioning will be stronger in the straight direction using the Fokker-Planck kernel or will allow rotation using the sub-Riemannian Laplacian kernel (see also Cocci, Barbieri, Citti, & Sarti, 2015, for a deeper analysis). Using the isotropic Laplacian kernel, we expect an equal grouping capability in the collinear and the ladder directions.
In Figure 3 we show the affinity matrix of the image presented in Figure 10a. It is a square matrix with dimensions $N \times N$, where $N$ is exactly the number of active patches. It represents the affinity of each patch with respect to all the others. The affinity matrix is composed of blocks, and the principal ones correspond to coherent objects. On the right, we visualize the complete set of eigenvalues in a graph (eigenvalue number, eigenvalues). We explicitly note that the first eigenvector will have the meaning of the emergent perceptual unit. The other eigenvectors do not describe an ordered sequence of figures with different rank. However, their presence is important, especially when two eigenvalues have similar values. In this case, a small deformation of the stimulus can induce a change in the order of the eigenvalues and produce a sudden emergence of the corresponding eigenvector with an abrupt change in the perceived image.
This is in good agreement with the perceptual characteristics of salient figures of temporal and spatial discontinuity, since they pop up abruptly from the background, while the background is perceived as undifferentiated (Merleau-Ponty, 1945). Spectral approaches account for the discontinuous character of figure-ground articulation better than continuous models, which instead introduce a gradual change in the perception of figure and background (Lorenceau & Alais, 2001).
To find the remaining objects in the image, the process is then repeated on the vector space orthogonal to the first eigenvector; the second and the following eigenvectors can be found, until the associated eigenvalue is sufficiently small. In this way, only a few eigenvectors are selected, fewer than the number of active patches, so this procedure reduces the dimensionality of the description. This procedure neurally reinterprets the process introduced by Perona and Freeman (1998).
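The selection of the leading eigenvectors can be illustrated with a minimal sketch. It assumes a precomputed real symmetric affinity matrix; the function name `select_units` and the relative threshold `rel_tol` are hypothetical choices made for illustration. Because the eigenvectors of a symmetric matrix are mutually orthogonal, computing all eigenpairs at once is used here as a stand-in for repeating the search on successive orthogonal complements.

```python
import numpy as np

def select_units(affinity, rel_tol=0.5):
    """Return the eigenpairs of a symmetric affinity matrix, largest
    eigenvalue first, keeping only eigenvalues >= rel_tol * largest."""
    affinity = np.asarray(affinity, dtype=float)
    # eigh is meant for symmetric matrices; eigenvalues come back ascending
    vals, vecs = np.linalg.eigh(affinity)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    keep = vals >= rel_tol * vals[0]
    return vals[keep], vecs[:, keep]

# Toy block matrix (illustrative values): two coherent groups of
# 3 and 2 elements, plus one isolated element.
A = np.zeros((6, 6))
A[:3, :3] = 1.0
A[3:5, 3:5] = 0.9
A[5, 5] = 0.1
vals, vecs = select_units(A)
```

In this toy case only two eigenvectors survive the threshold, one per block, so six elements are described by two units.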
3 Quantitative Kernel Validations
3.1 Numerical Approximations of the Kernels
In this section, we numerically approximate the connectivity kernels defined in section 2.
3.2 Stochastic Paths and Cortical Connectivity
In this section, we describe the cortical architecture as a realization of stochastic sample functions; in particular, we will see how the connectivity is associated with random paths. We will show that the position of presynaptic boutons in the images of Bosking et al. (1997) can be seen as the realization of stochastic paths via the anatomy. Every random walk starts from the injection site of a tracer and gives the position of a set of boutons, as visualized in Figure 5a. The probability density, which is described as a kernel, is obtained as the integration of all the random paths. From a neural point of view, this integration, which can be interpreted as the action of a columnar population, provides an estimation of the density of the boutons.
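The idea that a kernel is the density of many random paths can be illustrated with a short Monte Carlo sketch. It assumes a Mumford-type direction process, in which each path advances along its current orientation while the orientation itself diffuses; the step size, the diffusion coefficient `sigma`, the grid, and the function name `path_density` are illustrative choices, not values taken from the experiments.

```python
import numpy as np

def path_density(n_paths=2000, n_steps=40, dt=0.05, sigma=0.5,
                 bins=21, extent=2.0, seed=0):
    """Estimate a planar density by histogramming the points visited by
    random paths that start at the origin with orientation 0."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_paths)
    y = np.zeros(n_paths)
    theta = np.zeros(n_paths)
    edges = np.linspace(-extent, extent, bins + 1)
    hist = np.zeros((bins, bins))
    for step in range(n_steps):
        # deterministic drift along the current orientation
        x += np.cos(theta) * dt
        y += np.sin(theta) * dt
        # random diffusion of the orientation
        theta += sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
        counts, _, _ = np.histogram2d(x, y, bins=[edges, edges])
        hist += counts
    # normalize the visit counts to a probability density on the grid
    return hist / hist.sum(), edges

density, edges = path_density()
```

The first axis of `density` indexes the x bins, so the mass lies almost entirely in the forward half of the grid, which is the elongated shape expected for paths that start with a fixed orientation.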
We consider a hypercolumn of the ice cube scheme visual cortex, composed of approximately 100 neurons. In the connectivity map in Figure 5a, we notice the presence of an average of six boutons. In this way, the number of possible connections in the visual cortex is , and in our model, we use a number of paths compatible with these data. Now we make a comparison between the connectivity kernel previously defined and the experiments of Bosking et al. (1997) and Ben-Shahar and Zucker (2004).
In Figure 4, we can see the results of Ben-Shahar and Zucker (2004). On the left are the mean and standard deviation of the distribution of long-range connections of seven injection sites considering the data of Bosking et al. (1997), and in the middle are the expected median distribution for seven cells from the curve model described in Ben-Shahar and Zucker (2004). They noticed that the standard deviation is nonmonotonic, finding two local minima at approximately and degrees. Confirming their results, we show that our model implies a nonmonotonically changing variance as the orientation difference increases. In particular, on the right side of Figure 4, the mean and the standard deviation of seven random paths, at a fixed orientation, are visualized. We notice the presence of the nonmonotonicity of the standard deviation and that the two local minima at almost 30 degrees are preserved.
Moreover, the fact that the mean and the variance of the model are similar to the experimental data suggests that the choice of the normal distribution allows us to find physiological values. For these reasons, the connectivity represents the anatomical implementation of random paths.
We will now examine to what extent the Fokker-Planck and sub-Riemannian Laplacian kernels are models of connectivity. The isotropic Laplacian kernel is used for comparison and to show that a uniform Euclidean kernel does not capture the anisotropic structure of the cortex. The random paths that we compute through MCMC are implemented in the functional architectures in terms of the horizontal connectivity of a single cell. However, the connectivity of an entire population of cells corresponds to the set of all single cells’ connectivities and then to the Fokker-Planck fundamental solution.
A first qualitative comparison between these kernels and the connectivity pattern has been provided in Sarti and Citti (2015). Here we follow the same framework, but we propose a more accurate quantitative comparison.
It is well known that the 3D cortical structure is implemented in the 2D cortical layer as a pinwheel structure, which codes for position and orientations (see Figure 5b). The pinwheel structure has a large variability from one subject to the next, but within each species, common statistical properties have been obtained. Cortico-cortical connectivity has been measured by Bosking et al. (1997) by injecting a tracer in a simple cell and recording the trajectory of the tracer. In Figure 5a, the propagation through the lateral connections is represented by black points. Bosking et al. found a large variability among injections, which nevertheless share common stochastic properties, such as the direction of propagation, a patchy structure with small blobs at an approximately fixed distance, and the decay of the density of tracer away from the injection site.
We model each injection with stochastic path solutions of equation 2.3. Then we evaluate the stochastic paths on the pinwheel structure.
Due to the different role of the directions and in the definition of these kernels, the sub-Riemannian Laplacian paths and the Fokker-Planck paths have different structures.
The sub-Riemannian Laplacian allows diffusion in direction and favors the change of the angle; it can be used to describe short-range connectivity as described in section 4.4. Hence, it is responsible for the center blob in a neighborhood of injection points (see Figure 5c). The Fokker-Planck kernel produces an elongated, patchy structure and seems responsible for the long-range connection (see Figure 5d). We apply our quantitative fit only to the long-range connectivity, discarding the tracer in the neighborhood of the injection. For this reason, the sub-Riemannian Laplacian is not involved in validating the model.
The method is first applied to fit the image of the tracer taken by Bosking et al. (1997; see Figure 5a). We evaluate all the kernels on the pinwheels (see Figure 5b) to obtain a patchy structure. In order to apply formula 3.4, we cover both the image of the tracer and the Fokker-Planck kernel with a regular distribution of rectangles, with edges equal to the mean distances between pinwheels (see Figures 5c and 5d); clearly, we do not cover the central zone, where we cannot fit the Fokker-Planck kernel. The resulting error value is , showing that the model accurately represents the experimental distribution.
A similar procedure has been applied to the image of the tracer provided in Angelucci et al. (2002; see Figure 5e). They obtain their result with various injections in the neighborhood of a pinwheel, so that all orientations are present and the tracer propagates in all directions. In this case, we do not have natural pinwheels; hence we use artificial pinwheels, obtained with the algorithm presented in Barbieri, Citti, Sanguinetti, and Sarti (2012; see Figure 5f), with the constraint that the mean distance between the artificial pinwheels is equal to the mean distance between the blobs produced by the tracer. Here we consider Fokker-Planck paths with all directions to obtain the apparent isotropic diffusion. Also in this case, we cover it with rectangles and perform a best fit; the minimum error value is (see Figures 5g and 5h).
Bosking et al. (1997) showed a famous image with the tracer superimposed on the pinwheel structure (see Figure 5i). In this case, we have the tracer and the pinwheel of the same animal. This allows going below the scale of the pinwheel, and we correctly recover the orientation with the pinwheel (see Figure 5j). The estimated kernel is again a combination of Fokker-Planck. As before, we focus on orientations; hence, we model only the long-range part of the image, discarding the center blob. The evaluation of the error is made with squared regions at a scale smaller than the pinwheel, and the error goes below .
3.3 Perceptual Facilitation and Density Kernels
In order to obtain a stable and deterministic estimate of this stochastic model, we used the density kernel, which is a regular deterministic function, coding the main properties of the process. We perform a quantitative validation of these regular kernels, comparing them to an experiment by Gilbert et al. (1996). They study the capability of cells to integrate information out of the single receptive field of the cells. This integration process is due to the long-range horizontal connections; hence, it can be used to validate our model of long-range connectivity. As we recognized in the previous section, it is the Fokker-Planck kernel that can be considered a model for long-range connectivity; hence, we use this kernel.
Figure 6 (left) shows the results of Gilbert et al. (1996), visualized by the cell’s response to randomly placed and oriented lines in a black histogram. A vertical line is present in the receptive field of a cell selective to this orientation, and the intensity of its response is represented in the first column of the histograms. If the stimulus is surrounded by random elements aligned with the first one, the cell’s response increases (see the second, third, and fourth columns of the histograms). When the other random elements are not aligned with the fixed one (as in the fifth, sixth, and seventh columns), the cell’s response decreases because it reflects an inhibitory effect.
On the right in the blue histogram, we evaluate the probability density modeled by the kernel in equation 2.5 in the presence of the same configuration of elements. The same trend is obtained considering the probability density distribution, as visualized in Figure 6 (right). In order to consider the inhibitory effect, we evaluate the kernel with 0 mean. A quantitative analysis of the differences between them has been evaluated considering the mean square error between the two normalized histograms. The error of underlines how this connectivity kernel well represents neural connections.
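The comparison between the two histograms can be sketched as follows. This is only an illustration: the normalization (dividing each histogram by its largest bar) is an assumption, since the text does not specify it, the function name `normalized_mse` is hypothetical, and the seven-column values are placeholders rather than measured or simulated data.

```python
import numpy as np

def normalized_mse(hist_a, hist_b):
    """Mean square error between two histograms after each one is
    rescaled so that its largest bar equals 1 (assumed normalization)."""
    a = np.asarray(hist_a, dtype=float)
    b = np.asarray(hist_b, dtype=float)
    a = a / a.max()
    b = b / b.max()
    return float(np.mean((a - b) ** 2))

# Placeholder seven-column histograms, NOT data from the experiment
response = [1.0, 1.6, 2.0, 2.2, 0.8, 0.6, 0.5]
density = [0.9, 1.5, 2.1, 2.0, 0.9, 0.5, 0.4]
err = normalized_mse(response, density)
```

With this normalization the error is insensitive to the overall scale of either histogram, so a firing rate and a probability density can be compared directly.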
4 Emergence of Percepts
In the following experiments, some numerical simulations will be performed in order to test the reliability of the method for performing grouping and detecting perceptual units in an image. The kernel considered here depends only on orientation. Hence, it can be applied to detect the salience of geometrical figures, which can be very well described using this feature.
The purpose is to select the perceptual units in these images, using the following algorithm:
1. Define the affinity matrix $A$ from the connectivity kernel.
2. Solve the eigenvalue problem $A u_i = \lambda_i u_i$, where the order of the indices $i$ is such that $\lambda_i$ is decreasing.
3. Find and project on the segments the eigenvector $u_1$ associated with the largest eigenvalue.
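The three steps above can be sketched on a toy stimulus. The kernel used here is not one of the kernels of section 2: it is a simple stand-in, assumed for illustration, that is large when two elements are close and both are aligned with the segment joining them. The names `toy_affinity`, `first_unit`, and the parameters `sigma_d`, `sigma_a`, `rel_thresh` are hypothetical.

```python
import numpy as np

def wrap(angle):
    """Wrap an angle difference to [-pi/2, pi/2) (orientations mod pi)."""
    return (angle + np.pi / 2) % np.pi - np.pi / 2

def toy_affinity(elems, sigma_d=1.5, sigma_a=0.3):
    """Symmetric affinity between oriented elements (x, y, theta).
    Stand-in kernel: high when two elements are near each other and
    both are aligned with the segment joining them."""
    x, y, th = elems[:, 0], elems[:, 1], elems[:, 2]
    dx = x[None, :] - x[:, None]
    dy = y[None, :] - y[:, None]
    phi = np.arctan2(dy, dx)            # direction from element i to j
    a = wrap(th[:, None] - phi)         # misalignment of element i
    b = wrap(th[None, :] - phi)         # misalignment of element j
    w = np.exp(-(dx**2 + dy**2) / (2 * sigma_d**2))
    w *= np.exp(-(a**2 + b**2) / (2 * sigma_a**2))
    np.fill_diagonal(w, 0.0)
    return w

def first_unit(elems, rel_thresh=0.3):
    """Affinity matrix, eigen-decomposition, and projection of the
    leading eigenvector back onto the elements."""
    A = toy_affinity(elems)
    vals, vecs = np.linalg.eigh(A)      # eigenvalues in ascending order
    lead = np.abs(vecs[:, -1])          # sign of an eigenvector is arbitrary
    members = np.flatnonzero(lead > rel_thresh * lead.max())
    return members, vals[::-1]

# Six collinear horizontal elements plus four hand-placed distractors
line = [(float(i), 0.0, 0.0) for i in range(6)]
clutter = [(0.5, 2.0, 1.2), (2.5, -2.0, 0.7), (4.0, 2.5, 2.0), (1.5, -3.0, 2.6)]
elems = np.array(line + clutter)
unit, eigenvalues = first_unit(elems)
```

Here `unit` holds the indices of the elements selected by the first eigenvector, and `eigenvalues` is the spectrum in decreasing order, which plays the role of the salience index.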
4.1 The Field, Hayes, and Hess Experiment
In this section we consider some experiments similar to the ones of Field et al. (1993), where a subset of elements organized in a coherent way is presented out of a ground formed by a random distribution of elements. A first stimulus of this type is represented in Figure 7a. The connectivity among these elements is defined as in equations 2.4 and 2.7.
After computing the affinity matrix and its eigenvalues, the eigenvector corresponding to the highest eigenvalue is shown in red. The results show that the stimulus is well segmented with the fundamental solutions of Fokker-Planck and sub-Riemannian Laplacian equations (see Figure 7b).
Now we consider a similar experiment proposed in Field et al. (1993), where the orientation of successive elements differs by 15, 30, 45, 60, and 90 degrees and the ability of the observer to detect the path was measured experimentally. It was shown that the path can be identified when the successive elements differ by 60 degrees or less. With our method, we obtain similar results: if the angle between successive elements is less than 60 degrees (Figures 7c–7e), the identification of the unit is correctly performed. With an angle equal to 60 degrees (see Figure 7f), only part of the curve is correctly detected: this can be interpreted as the observer’s increasing difficulty in detecting the path. Finally, with higher angles (Figure 7g), the first eigenvector of the affinity matrix corresponds to random inducers, confirming the results.
Finally, we present an example where there are two units in the scene with roughly equal salience and roughly equal eigenvalues. In the first and the second rows of Figure 8, the stimuli are composed of a curve and a line in a background of random elements. In stimulus a, represented in the first row, the elements composing the curve are perfectly aligned and very close to one another, so the curve has the highest saliency; it corresponds to the eigenvector associated with the first eigenvalue (as shown in red in Figure 8b). The second eigenvalue in this case is slightly smaller. After computing the first eigenvector, the stimulus is updated (see Figure 8c); the first eigenvector of the new affinity matrix is computed, corresponding to the inducers of the line (see Figure 8d).
In the second row (see Figure 8e), we slightly modify the stimuli, in particular the alignment of the elements forming the curve (e.g., by an angle of $\pi/18$). As a consequence, the line becomes the most salient perceptual unit and the first eigenvector (see Figure 8f). The stimulus is updated (see Figure 8g), and the first eigenvector of the new affinity matrix corresponds to the inducers of the curve (see Figure 8h). It is notable that in this case, a small change of the eigenvalues corresponds to a small change of the eigenvectors, but the first eigenvalue swaps with the second one, and, consequently, we obtain an abrupt change in the perceived object.
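The swap between the first and second eigenvalue can be reproduced on a toy two-block affinity matrix. This is a sketch under stated assumptions, not the stimulus of Figure 8: each group is modeled as a block with constant internal affinity, and the names `leading_group`, `curve_strength`, `line_strength`, and `cross` are hypothetical.

```python
import numpy as np

def leading_group(curve_strength, line_strength=0.8, n=4, cross=0.001):
    """Report which block ('curve' or 'line') carries the leading
    eigenvector of a two-block affinity matrix, with the top two
    eigenvalues. `cross` is a weak affinity between the two groups."""
    A = np.full((2 * n, 2 * n), cross)
    A[:n, :n] = curve_strength
    A[n:, n:] = line_strength
    np.fill_diagonal(A, 0.0)
    vals, vecs = np.linalg.eigh(A)      # eigenvalues in ascending order
    weight = vecs[:, -1] ** 2           # squared components sum to 1
    label = "curve" if weight[:n].sum() > weight[n:].sum() else "line"
    return label, vals[-1], vals[-2]

# Smoothly weaken the internal affinity of the curve
strengths = (0.90, 0.85, 0.81, 0.79, 0.75)
labels = [leading_group(s)[0] for s in strengths]
```

As `curve_strength` decreases past `line_strength`, the top two eigenvalues change only slightly, yet the group selected by the first eigenvector switches from the curve to the line in a single step.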
In the previous examples, we have considered contours with almost the same length. We show here that length does not affect saliency. The two perceptual units in Figures 8i and 8m have different lengths. The results underline how the proximity of the elements is stronger than length: the shortest units with nearer segments are the first perceptual units, associated with the most salient eigenvectors (see Figures 8j and 8n). Then the stimuli are updated (see Figures 8k and 8o), and the second eigenvectors are visualized in Figures 8l and 8p.
In this analysis, different features can be considered. In particular, the distances between the segments also play a central role. Consider, for example, the straight line in Figure 8a. If one or more segments are missing from the contour, we could obtain a less accurate segmentation (a similar effect is noticed in the case of unaligned segments). Favali, Abbasi-Sureshjani, ter Haar Romeny, and Sarti (2016) considered a similar analysis with small or disconnected contours applied to the study of vessel connectivities.
4.2 The Role of Polarity
Taking polarity into account introduces contrast into the model: contours with the same orientation but opposite contrast are assigned opposite angles. For this reason, we assume that the orientation θ takes values in [0, 2π) when we consider the odd filters and in [0, π) when we consider the even ones.
The response of the odd filters in the presence of a cartoon image is schematically represented in Figure 9. At every boundary point, the maximally activated cell is the one with the same direction as the boundary. Thus the maximally firing cells are aligned with the boundary (see Figure 9, top right).
In order to clarify the role of polarity, we consider the image in Figure 9a, studied by Kanizsa (1980) in the context of a study of convexity in perception. In this case, if we consider only the orientation of the boundaries without polarity, we completely lose any contrast information and obtain the grouping in Figure 9b. Here, the upper edge of the square is grouped as a unique perceptual unit. On the other hand, when polarity is inserted, the Gabor patches on the upper edge boundary of the black or white region have opposite contrast and detect values of θ that differ by π (see Figure 9c). There is no affinity between these patches; the first eigenvector of the affinity matrix, represented in red, correctly detects the unit present in the image and corresponds to the inducers of the semicircle (see Figure 9d). This underlines the important role of polarity in perceptual individuation and segmentation. We also note that the first perceptual unit detected is the convex one, as predicted by the gestalt law (see Kanizsa, 1980).
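The effect of polarity on the comparison of two orientations can be made concrete with a small sketch. It assumes that orientations with polarity range over [0, 2π) and orientations without polarity over [0, π); the function name `angular_distance` is hypothetical and the code is only an illustration of this convention, not part of the model's implementation.

```python
import math

def angular_distance(theta1, theta2, polarity=True):
    """Distance between two orientations on a circle.

    With polarity the angles live in [0, 2*pi), so opposite contrasts
    (angles differing by pi) are maximally distant. Without polarity the
    angles are taken modulo pi, so opposite contrasts coincide.
    """
    period = 2 * math.pi if polarity else math.pi
    d = abs(theta1 - theta2) % period
    return min(d, period - d)

# Two patches on the same edge with opposite contrast (hypothetical values).
without_polarity = angular_distance(0.0, math.pi, polarity=False)
with_polarity = angular_distance(0.0, math.pi, polarity=True)
print(without_polarity, with_polarity)
```

Without polarity the two patches have the same orientation and would be grouped along the edge; with polarity they are as far apart as two angles can be, which is consistent with the absence of affinity between them noted above.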
4.3 The Kanizsa Illusory Figures
We consider here stimuli formed by Kanizsa figures, represented by oriented segments that simulate the output of simple cells. Lee and Nguyen (2001) describe the completion of Kanizsa figures taking place in V1.
The results of simulations with the fundamental solutions of Fokker-Planck and sub-Riemannian Laplacian equations are shown in Figure 10. The first eigenvector, visualized in red, corresponds to the inducers of the Kanizsa triangle (see Figure 10c). In this example, after computing the first eigenvector of the affinity matrix, we update the matrix by removing the identified perceptual unit and then compute the first eigenvector of the new matrix (see Figure 10d). These simulations show that circles are associated with the less salient eigenvectors. In this way, the first eigenvalue can be considered a quantitative measure of saliency because it allows the segmentation of the most important object in the scene, and the results of simulations confirm the visual grouping. When the affinity matrix has several eigenvectors with almost the same eigenvalue, as in Figure 10d, it is not possible to recognize the most salient object because they all have the same influence. Here, we show just one inducer in red. The other two have the same eigenvalue. This also happens, for example, when the inducers are not aligned circularly or are rotated.
Now we consider the Kanizsa square as stimulus and change the angle between the inducers, so that the subjective contours become curved (see Figures 10e–h, second row). Whether illusory figures are perceived depends on a limit curvature. Indeed, we perceive a square in the first three cases but not in the last one. The results of simulations with the fundamental solutions of Fokker-Planck and sub-Riemannian Laplacian equations confirm the visual grouping (see Figures 10e–h, third row). When the angle between the inducers is not too large (Figures 10e to 10g), the first eigenvector corresponds to the inducers that form the square; otherwise, in Figure 10h, the “Pacman” becomes the most salient object in the image. In this case, we obtain four eigenvectors with almost the same eigenvalue.
Now we consider the Kanizsa bar (see Figure 10i, second row), which is perceived only if the inducers are aligned. The result of the simulation confirms the visual perception if we use the fundamental solutions of the Fokker-Planck and the sub-Riemannian Laplacian equations (see Figure 10i, third row). When the inducers are not aligned, all the kernels confirm the visual perception, showing two different perceptual units (see Figure 10j).
When the stimulus is composed of rotated or unaligned inducers, as in Figures 10k and 10l, no illusory figure is perceived, and the results of simulations, using all the connectivity kernels described, confirm the visual grouping. In that case, the affinity matrix has three eigenvectors with almost the same eigenvalue, which represent the three perceptual units in the scene.
4.4 Sub-Riemannian Fokker-Planck versus Sub-Riemannian Laplacian
The two kernels we analyze are not mutually exclusive and can be implemented in different cells. The presence of different populations of cells in relation to mathematical models has also been studied in Ben-Shahar and Zucker (2004). We have outlined in sections 2.2 and 3.2 that the Fokker-Planck kernel accounts for long-range connectivity and the sub-Riemannian Laplacian for short-range connectivity. In the previous examples, we obtained good results with both kernels, but the difference emerges when we change the parameters. In Figure 11, we compare the action of these two kernels.
In the first row of Figure 11, we see some segments that form a unique perceptual unit. If they are not too far apart, the grouping is correctly performed by both the Fokker-Planck and the sub-Riemannian Laplacian kernels (see Figures 11a and 11b). When we separate the inducers, the perceptual unit is correctly detected by the Fokker-Planck kernel (see Figure 11c), while the sub-Riemannian Laplacian is unable to perform the grouping (see Figure 11d). This confirms that the Fokker-Planck kernel is responsible for long-range connectivity. In the second row, we consider an angle. When the angle is sufficiently large, the Fokker-Planck kernel becomes unable to perform the grouping (see Figure 11e), while the sub-Riemannian Laplacian correctly performs the grouping of the perceptual unit (see Figure 11f). This confirms that the sub-Riemannian Laplacian can be used as a model for short-range connectivity.
4.5 Sub-Riemannian versus Riemannian Kernels
In order to further validate the sub-Riemannian model, we show that the model applied with the isotropic Laplacian kernel does not perform correctly. In Figure 12 (top), the visual perception is not correctly modeled: the first eigenvectors coincide with one of the inducers and the squares are not recognized. That also happens for the stimulus of Figure 10a and when the inducers are unaligned circularly or are rotated.
5 Conclusion
In this work, we have presented a neurally based model for figure-ground segmentation using spectral methods, where segmentation has been performed by computing eigenvectors of affinity matrices.
Different connectivity kernels that are compatible with the functional architecture of the primary visual cortex have been used. We have modeled them as fundamental solutions of Fokker-Planck, sub-Riemannian Laplacian, and isotropic Laplacian equations and compared their properties.
With this model, we have identified perceptual units of different Kanizsa figures, showing that this can be considered a good quantitative model for the constitution of perceptual units equipped with their saliency. We have also shown that the fundamental solutions of Fokker-Planck and sub-Riemannian Laplacian equations are good models for the continuation law, while the isotropic Laplacian equation is less representative of this gestalt law. However, it retrieves information about ladder parallelism, a feature that can be analyzed in the future. All three kernels are able to accomplish boundary completion, with a preference for the Fokker-Planck and the sub-Riemannian Laplacian operators.
The proposed mathematical model is thus able to integrate local and global gestalt laws as a process implemented in the functional architecture of the visual cortex. The kernels considered here depend only on orientation. Hence, the model can be applied to detect the salience of geometrical figures, which can be very well described using this feature. The same method can be applied to natural images if their main features are related to orientation, as in retinal images (see Favali et al., 2016). The ideas presented here could be extended to more general kernels able to detect geometrical features other than orientation, such as curvature (Abbasi et al., 2016), and we are confident that there is a relation between the first eigenvector and the salient object. However, for general images, we cannot rely on this simple geometric method, since different cortical areas can be involved in the definition of the salience, with a modulatory effect on the connectivity of V1.
We are grateful to the anonymous reviewers for their constructive comments, which helped us to improve this letter. The research leading to these results has received funding from the People Programme (Marie Curie Actions) of the European Union’s Seventh Framework Programme FP7/2007-2013/ under REA grant agreement 607643.