We show how a Hopfield network with modifiable recurrent connections undergoing slow Hebbian learning can extract the underlying geometry of an input space. First, we use a slow-fast analysis to derive an averaged system whose dynamics derives from an energy function and therefore always converges to equilibrium points. The equilibria reflect the correlation structure of the inputs, a global object extracted through local recurrent interactions only. Second, we use numerical methods to illustrate how learning extracts the hidden geometrical structure of the inputs. Indeed, multidimensional scaling methods make it possible to project the final connectivity matrix onto a Euclidean distance matrix in a high-dimensional space, with the neurons labeled by spatial position within this space. The resulting network structure turns out to be roughly convolutional. The residual of the projection defines the nonconvolutional part of the connectivity, which is minimized in the process. Third, we show how restricting the dimension of the space in which the neurons live gives rise to patterns similar to cortical maps. We motivate this using an energy efficiency argument based on wire length minimization. Finally, we show how this approach leads to the emergence of ocular dominance or orientation columns in primary visual cortex via the self-organization of recurrent rather than feedforward connections. In addition, we establish that the nonconvolutional (or long-range) connectivity is patchy and co-aligned in the case of orientation learning.
Activity-dependent synaptic plasticity is generally thought to be the basic cellular substrate underlying learning and memory in the brain. In 1949, Donald Hebb postulated that learning is based on the correlated activity of synaptically connected neurons: if both neurons A and B are active at the same time, then the synapses from A to B and from B to A should be strengthened proportionally to the product of the activities of A and B. However, as it stands, Hebb's learning rule diverges. Therefore, various modifications of Hebb's rule have been developed, which basically take one of three forms (see Gerstner & Kistler, 2002, and Dayan & Abbott, 2001). First, a decay term can be added to the learning rule so that each synaptic weight is able to “forget” what it previously learned. Second, each synaptic modification can be normalized or projected onto different subspaces. These constraint-based rules may be interpreted as implementing some form of competition for energy between dendrites and axons (for details, see Miller, 1996; Miller & MacKay, 1996; Ooyen, 2001). Third, a sliding threshold mechanism can be added to Hebbian learning. For instance, a postsynaptic threshold rule multiplies the presynaptic activity by the deviation of the postsynaptic activity from its average value, which is referred to as covariance learning (see Sejnowski & Tesauro, 1989). Probably the best known of these rules is the BCM rule presented in Bienenstock, Cooper, and Munro (1982). It should be noted that history-based rules can also be defined without changing the qualitative dynamics of the system. Instead of considering the instantaneous value of the neurons' activity, these rules consider its weighted mean over a time window (see Földiák, 1991; Wallis & Baddeley, 1997). Recent experimental evidence suggests that learning may also depend on the precise timing of action potentials (see Bi & Poo, 2001).
Contrary to most Hebbian rules that detect only correlations, these rules can also encode causal relationships in the patterns of neural activation. However, the mathematical treatment of these spike-timing-dependent rules is much more difficult than rate-based ones.
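As a point of comparison, the divergence of plain Hebbian learning and its stabilization by a decay term can be illustrated in a few lines of numpy. The rates, the learning rate eta, and the decay rate kappa below are illustrative assumptions, not values from the letter:

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.uniform(0.0, 1.0, size=(1000, 2))    # hypothetical rate samples (v_A, v_B)

def hebb_with_decay(w, pre, post, eta=0.01, kappa=0.01):
    """Hebbian growth plus a linear decay ('forgetting') term."""
    return w + eta * pre * post - kappa * w

def covariance_rule(w, pre, post, post_mean, eta=0.01):
    """Covariance learning: presynaptic rate times the deviation of the
    postsynaptic rate from its average (shown for comparison, not run)."""
    return w + eta * pre * (post - post_mean)

w = 0.0
for v_a, v_b in v:
    w = hebb_with_decay(w, v_a, v_b)

# The decay term stabilizes the weight near (eta/kappa) * E[v_A * v_B]
# instead of letting it diverge.
print(round(w, 3))
```

Without the `- kappa * w` term, the same loop grows without bound whenever the rates are positively correlated, which is the divergence mentioned above.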
Hebbian-like learning rules have often been studied within the framework of unsupervised feedforward neural networks (see Oja, 1982; Bienenstock et al., 1982; Miller & MacKay, 1996; Dayan & Abbott, 2001). They also form the basis of most weight-based models of cortical development, assuming fixed lateral connectivity (e.g., Mexican hat) and modifiable vertical connections (see the review of Swindale, 1996).1 In these developmental models, the statistical structure of input correlations provides a mechanism for spontaneously breaking some underlying symmetry of the neuronal receptive fields, leading to the emergence of feature selectivity. When such correlations are combined with fixed intracortical interactions, there is a simultaneous breaking of translation symmetry across the cortex, leading to the formation of a spatially periodic cortical feature map. A related mathematical formulation of cortical map formation has been developed in Takeuchi and Amari (1979) and Bressloff (2005) using the theory of self-organizing neural fields. Although very irregular, the two-dimensional cortical maps observed at a given stage of development can be unfolded in higher dimensions to obtain smoother geometrical structures. Indeed, Bressloff, Cowan, Golubitsky, Thomas, and Wiener (2001) suggested that the network of orientation pinwheels in V1 is a direct product between a circle for orientation preference and a plane for position, based on a modification of the ice cube model of Hubel and Wiesel (1977). From a more abstract geometrical perspective, Petitot (2003) has associated such a structure to a 1-jet space and used this to develop some applications to computer vision. More recently, more complex geometrical structures such as spheres and hyperbolic surfaces, which incorporate additional stimulus features such as spatial frequency and textures, were considered, respectively, in Bressloff and Cowan (2003) and Chossat and Faugeras (2009).
In this letter, we show how geometrical structures related to the distribution of inputs can emerge through unsupervised Hebbian learning applied to recurrent connections in a rate-based Hopfield network. Throughout this letter, the inputs are presented as an external nonautonomous forcing to the system and not as an initial condition, as is often the case in Hopfield networks. It has previously been shown that in the case of a single fixed input, there exists an energy function that describes the joint gradient dynamics of the activity and weight variables (see Dong & Hopfield, 1992). This implies that the system converges to an equilibrium during learning. We use averaging theory to generalize the above result to the case of multiple inputs, under the adiabatic assumption that Hebbian learning occurs on a much slower timescale than both the activity dynamics and the sampling of the input distribution. We then show that the equilibrium distribution of weights, when embedded into ℝ^k for sufficiently large integer k, encodes the geometrical structure of the inputs. Finally, we numerically show that the embedding of the weights in two dimensions (k=2) gives rise to patterns that are qualitatively similar to experimentally observed cortical maps, with the emergence of feature columns and patchy connectivity. In contrast to standard developmental models, cortical map formation arises via the self-organization of recurrent connections rather than feedforward connections from an input layer. Although the mathematical formalism we introduce here could be extended to most of the rate-based Hebbian rules in the literature, we present the theory for Hebbian learning with decay because of the simplicity of the resulting dynamics.
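The fast-slow structure described above can be sketched numerically. The following toy simulation is only a schematic instance of the model: the tanh rate function, the Euler time steps, the periodic input presentation, and the learning rate eps are all assumptions chosen for illustration, not the letter's exact equations. It shows activity evolving quickly while a Hebbian rule with decay slowly builds a symmetric weight matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_inputs = 20, 5
S = np.tanh                                  # assumed sigmoidal rate function
inputs = rng.normal(0.0, 0.5, size=(n_inputs, n))

V = np.zeros(n)                              # fast variable: firing activity
W = np.zeros((n, n))                         # slow variable: recurrent weights
dt, eps = 0.1, 1e-3                          # eps << 1: adiabatic assumption

for step in range(20000):
    I = inputs[(step // 200) % n_inputs]     # inputs presented periodically
    V += dt * (-V + W @ S(V) + I)            # fast activity dynamics
    s = S(V)
    W += dt * eps * (np.outer(s, s) - W)     # slow Hebbian learning with decay

# The learned weights approximate the average correlation of the activity
# and are symmetric, since each Hebbian update is symmetric.
print(np.allclose(W, W.T))  # True
```

Because the update `np.outer(s, s) - W` preserves symmetry and `W` starts symmetric, the equilibrium weight matrix is symmetric, which is what makes the energy-function argument of section 2 possible.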
The use of geometrical objects to describe the emergence of connectivity patterns has previously been proposed by Amari in a different context. Based on the theory of information geometry, Amari considers the geometry of the set of all the networks and defines learning as a trajectory on this manifold for perceptron networks in the framework of supervised learning (see Amari, 1998) or for unsupervised Boltzmann machines (see Amari, Kurata, & Nagaoka, 1992). He uses differential and Riemannian geometry to describe an object that is at a larger scale than the cortical maps considered here. Moreover, Zucker and colleagues are currently developing a nonlinear dimensionality-reduction approach to characterize the statistics of natural visual stimuli (see Zucker, Lawlor, & Holtmann-Rice, 2011; Coifman, Maggioni, Zucker, & Kevrekidis, 2005). Although they do not use learning neural networks and stay closer to the field of computer vision than this letter does, it turns out their approach is similar to the geometrical embedding approach we are using.
The structure of the letter is as follows. In section 2, we apply mathematical methods to analyze the behavior of a rate-based learning network. We formally introduce a nonautonomous model, which is then averaged. This allows us to study the stability of the learning dynamics in the presence of multiple inputs by constructing an appropriate energy function. In section 3, we determine the geometrical structure of the equilibrium weight distribution and show how it reflects the structure of the inputs. We also relate this approach to the emergence of cortical maps. Finally, the results are discussed in section 4.
2. Analytical Treatment of a Hopfield-Type Learning Network
2.1.1. Neural Network Evolution.
2.1.2. Correlation-Based Hebbian Learning.
Let us make a few remarks about the existence and uniqueness of solutions. First, the boundedness of S implies that the solutions of the system are bounded. More precisely, if I is bounded, the solutions are bounded. To prove this, note that the right-hand side of the equation for W is the sum of a bounded term and a linear decay term in W. Therefore, W is bounded, and hence the recurrent input term is also bounded. The same reasoning applies to V. Since S is Lipschitz continuous, the right-hand side of the system is Lipschitz. This is sufficient to prove the existence and uniqueness of the solution by applying the Cauchy-Lipschitz theorem. In the following, we derive an averaged autonomous dynamical system, which will be well defined for the same reasons.
2.2. Averaging the System.
The system is nonautonomous and difficult to analyze because the inputs change periodically. It has already been studied in the case of a single input (see Dong & Hopfield, 1992), but it remains to be analyzed in the case of multiple inputs. We show in section A.1 that this system can be approximated by an autonomous Cauchy problem, which is much more convenient to handle. This averaging method exploits the separation of timescales in the system.
In the remainder of the letter, we focus on the system whose solutions are close to those of the original system provided condition A.2 in the appendix is satisfied, that is, the network is weakly connected. Finally, note that it is straightforward to extend our approach to time-functional rules (e.g., sliding threshold or BCM rules as described in Bienenstock et al., 1982), which, in this new framework, would be approximated by simple ordinary differential equations (as opposed to time-functional differential equations) provided S is redefined appropriately.
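The averaging step can be checked numerically on a toy network: running the nonautonomous system with slowly cycling inputs alongside the autonomous averaged system yields nearly identical equilibrium weights. All dynamical details below (tanh rates, Euler integration, the rates 0.05 and 0.05/M) are illustrative assumptions, not the letter's equations:

```python
import numpy as np

rng = np.random.default_rng(2)
n, M = 10, 4
S = np.tanh                                   # assumed rate function
inputs = rng.normal(0.0, 0.5, size=(M, n))

def relax(W, I, dt=0.1, steps=500):
    """Run the fast activity dynamics to (approximate) equilibrium."""
    V = np.zeros(n)
    for _ in range(steps):
        V += dt * (-V + W @ S(V) + I)
    return S(V)

# Averaged (autonomous) system: W tracks the input-averaged correlation.
W_avg = np.zeros((n, n))
for _ in range(300):
    C = np.mean([np.outer(s, s) for s in (relax(W_avg, I) for I in inputs)], axis=0)
    W_avg += 0.05 * (C - W_avg)

# Nonautonomous system: inputs presented one at a time, learning M times slower.
W_per = np.zeros((n, n))
for step in range(300 * M):
    s = relax(W_per, inputs[step % M])
    W_per += (0.05 / M) * (np.outer(s, s) - W_per)

print(np.max(np.abs(W_avg - W_per)))  # small: the two equilibria nearly coincide
```

The residual discrepancy shrinks as the learning rate is decreased, which is the content of the adiabatic approximation.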
2.3.1. Lyapunov Function.
The initial value problem for the system with , converges to an equilibrium state.
See section A.3.
It follows that neither oscillatory nor chaotic attractor dynamics can occur.
2.3.2. Linear Stability.
Although we have shown that the system converges to fixed points, not all of these fixed points are stable. However, we can apply a linear stability analysis to the system to derive a simple sufficient condition for a fixed point to be stable. The method we use in the proof could be extended to more complex rules.
See section A.4.
This condition is strikingly similar to that derived in Faugeras, Grimbert, and Slotine (2008) (in fact, it is stronger than the contraction condition they find). It says that the network may converge to a weakly connected regime. It also says that the dynamics is likely (the condition being only sufficient) to be contracting and therefore free of bifurcations: a fully recurrent learning neural network is likely to have simple dynamics.
2.4. Equilibrium Points.
3. Geometrical Structure of Equilibrium Points
3.1. From a Symmetric Connectivity Matrix to a Convolutional Network.
In practice, it is not always possible to find a geometry for which the connectivity is a distance matrix. Therefore, we project the appropriate matrix onto the set of Euclidean distance matrices. This is the set of matrices M such that with . More precisely, we define , where g−1 is applied to the coefficients of . We then search for the distance matrix such that is minimal. In this letter, we consider an L2-norm. Although the choice of an Lp-norm will be motivated by the wiring length minimization argument in section 3.3, the particular choice of an L2-norm is somewhat arbitrary and corresponds to penalizing long-distance connections. The minimization turns out to be a least-squares minimization whose parameters are the positions xi. This can be implemented by a family of methods known as multidimensional scaling, which are reviewed in Borg and Groenen (2005). In particular, we use the stress majorization (SMACOF) algorithm with the stress1 cost function throughout this letter. This leads to writing and therefore .
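For readers who want to experiment, here is a compact sketch of the embedding step using classical (Torgerson) multidimensional scaling rather than the SMACOF stress-majorization algorithm used in the letter. Classical MDS is a simpler spectral variant that is exact when the dissimilarity matrix is already a Euclidean distance matrix:

```python
import numpy as np

def classical_mds(D, k):
    """Classical (Torgerson) MDS: embed a dissimilarity matrix D into R^k.
    Exact when D is a Euclidean distance matrix; otherwise the neglected
    eigenvalues play the role of the nonconvolutional residual."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)               # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]             # keep the top-k eigenpairs
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# Toy check: recover a planar configuration from its distance matrix.
rng = np.random.default_rng(3)
x_true = rng.normal(size=(30, 2))
D = np.linalg.norm(x_true[:, None] - x_true[None, :], axis=-1)
x = classical_mds(D, k=2)
D_hat = np.linalg.norm(x[:, None] - x[None, :], axis=-1)
print(np.max(np.abs(D - D_hat)))  # ~0 up to numerical error
```

SMACOF minimizes the stress1 cost iteratively and typically handles non-Euclidean dissimilarities better, but the spectral version above is enough to illustrate what "projecting onto a distance matrix in k dimensions" means.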
We now consider two choices of the function g:
See section A.5.
3.2. Unveiling the Geometrical Structure of the Inputs.
We hypothesize that the space defined by the xi reflects the underlying geometrical structure of the inputs. We have not found a way to prove this, so we provide numerical examples that illustrate this claim. Therefore, the following is only a (numerical) proof of concept. For each example, we feed the network with inputs having a defined geometrical structure and then show how this structure can be extracted from the connectivity by the method outlined in section 3.1. In particular, we assume that the inputs are uniformly distributed over a manifold with fixed geometry. This strong assumption amounts to considering that the feedforward connectivity (which we do not consider here) has already properly filtered the information coming from the sensory organs. More precisely, define the set of inputs by the matrix such that where the za are uniformly distributed points over , the yi are the positions on that label the ith neuron, and f is a decreasing function on . The norm is the natural norm defined over the manifold . For simplicity, assume so that the inputs are localized bell-shaped bumps on the shape .
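A hypothetical instance of this input construction, with a Gaussian profile for f and neurons on a regular grid in the unit square (both assumptions made here for illustration), can be generated as follows:

```python
import numpy as np

rng = np.random.default_rng(4)
side = 10
grid = np.linspace(0.0, 1.0, side)
y = np.array([[u, v] for u in grid for v in grid])  # neuron labels y_i on a grid
z = rng.uniform(0.0, 1.0, size=(50, 2))             # input centers z_a, uniform

sigma, I_m = 0.15, 1.0                              # assumed width and amplitude
dist = np.linalg.norm(z[:, None, :] - y[None, :, :], axis=-1)
I = I_m * np.exp(-dist**2 / (2.0 * sigma**2))       # one bell-shaped bump per row

print(I.shape)  # (50, 100): 50 inputs over 100 neurons
```

Each row of `I` is a localized bump centered at one of the `z_a`, so the correlation between two neurons' inputs decays with the distance between their labels `y_i`, which is the structure the recurrent weights are expected to pick up.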
3.2.1. Planar Retinotopy.
If the standard deviation of the inputs is properly chosen, as in Figure 3b, we observe that the neurons are distributed on a regular grid, which is retinotopically organized. In other words, the network has learned the geometric shape of the inputs. This can also be observed in Figure 3d, which corresponds to the same connectivity matrix as in Figure 3b but is represented in three dimensions. The neurons self-organize on a two-dimensional saddle shape that accounts for the border distortions observed in two dimensions (which we discuss in the next paragraph). If the standard deviation is too large, as can be observed in Figure 3c, the final result is poor. Indeed, the inputs are no longer local and cover most of the visual field. Therefore, the neurons saturate for all the inputs, and no structure can be read off the activity variable. If the standard deviation is small, the neurons still seem to self-organize (as long as the inputs are not completely localized on single neurons), but with significant border effects.
There are several reasons why we observe border distortions in Figure 3. We believe the most important is an unequal average excitation of the neurons. Indeed, the neurons corresponding to the border of the “visual field” are less excited than the others. For example, consider a neuron on the left border of the visual field. It has no neighbors on its left and is therefore less likely to be excited by its neighbors, hence less excited on average. The consequence is that it is less connected to the rest of the network (see, e.g., the top line of the right side of Figure 2), because the connections depend on the level of excitation through the correlation of the activity. Therefore, it ends up farther away from the other neurons, which is what we observe. When the inputs are localized, the border neurons are even less excited on average and thus are even farther away, as shown in Figure 3a.
Another way to produce distortions in the positions xi is to excessively reduce or increase the amplitude Im=maxi,a|I(a)i| of the inputs. Indeed, if it is small, the equilibrium activity described by equation 2.14 is also small and likely to lie in the flat part of the sigmoid. In this case, neurons tend to be more homogeneously excited and less sensitive to the particular shape of the inputs. Therefore, the network loses some information about the underlying structure of the inputs. Actually, the neurons become relatively more sensitive to the neighborhood structure of the network, and the border neurons behave differently from the rest of the network, as shown in Figure 3e. The parameter has much less impact on the final shape since it corresponds only to a homogeneous scaling of the final connectivity.
So far, we have assumed that the inputs were uniformly spread on the manifold . If this assumption is broken, the final position of the neurons will be affected. As shown in Figure 3f, where 50 inputs were added to the case of Figure 3b in only half of the visual field, the neurons that code for this area tend to be closer. Indeed, they tend not to be equally excited on average (as supposed in proposition 1), and a distortion effect occurs. This means that a proper understanding of the role of the vertical connectivity would be needed to complete this geometrical picture of the functioning of the network. This is, however, beyond the scope of this letter.
3.2.2. Toroidal Retinotopy.
We now assume that the inputs are uniformly distributed over a two-dimensional torus. That is, the input labels za are randomly distributed on the torus. The neuron labels yi are regularly and uniformly distributed on the torus. The inputs and the final stable weight matrix of the simulated system are shown in Figure 4. The positions xi of the neurons after multidimensional scaling for k=3 are shown in Figure 5 and appear to form a cloud of points distributed on a torus. In contrast to the previous example, there are no distortions now because the torus has no borders. In fact, the neurons are equally excited on average in this case, which makes proposition 1 valid.
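The only change relative to the planar case is the norm: on the flat torus, distances wrap around in each coordinate. A minimal sketch, assuming the unit square with periodic boundaries (a choice of normalization made here for illustration):

```python
import numpy as np

def torus_distance(u, v):
    """Geodesic distance on the flat 2-torus [0,1)^2 with periodic boundaries."""
    d = np.abs(u - v)
    d = np.minimum(d, 1.0 - d)   # wrap around in each coordinate
    return np.sqrt(np.sum(d**2, axis=-1))

a = np.array([0.05, 0.5])
b = np.array([0.95, 0.5])
print(torus_distance(a, b))      # ~0.1: points near opposite edges are close
```

Substituting this metric for the Euclidean one in the bump-input construction produces inputs whose correlation structure has no boundary, which is why the embedded point cloud closes up into a torus.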
3.3. Links with Neuroanatomy.
The brain is subject to energy constraints, which are completely neglected in the above formulation. These constraints most likely have a significant impact on the positions of real neurons in the brain. Indeed, it seems reasonable to assume that the positions and connections of neurons reflect a trade-off between the energy costs of biological tissue and the need to process information effectively. For instance, it has been suggested that a principle of wire length minimization may operate in the brain (see Swindale, 1996; Chklovskii, Schikorski, & Stevens, 2002). In our neural mass framework, one may consider that the stronger two neural masses are connected, the larger the number of real axons linking the neurons together. Therefore, minimizing total axonal length can be read as: the stronger the connection between two neural masses, the closer they should be. This is consistent with the convolutional part of the weight matrix. However, the underlying geometry of natural inputs is likely to be very high-dimensional, whereas the brain lives in a three-dimensional world. In fact, the cortex is so flat that it is effectively two-dimensional. Hence, the positions of real neurons differ from the positions in a high-dimensional vector space: since the cortex is roughly two-dimensional, the positions could be realized physically only if k = 2. Therefore, the three-dimensional toric geometry, or any higher-dimensional structure, could not be perfectly implemented in the cortex without the help of nonconvolutional long-range connections. Indeed, we suggest that the cortical connectivity is made of two parts: (1) a local convolutional connectivity corresponding to the convolutional term g in equations 3.2 and 3.3, which is consistent with the requirements of energy efficiency, and (2) a nonconvolutional connectivity corresponding to the factor M in equations 3.2 and 3.3, which is required in order to represent various stimulus features.
If the cortex were higher-dimensional (), then there would be no nonconvolutional connectivity M, that is, for the linear convolutional kernel or for the exponential one.
We illustrate this claim by considering two examples based on the functional anatomy of the primary visual cortex: the emergence of ocular dominance columns and orientation columns, respectively. We proceed by returning to the case of planar retinotopy (see section 3.2.1) but now with additional input structure. In the first case, the inputs are taken to be binocular and isotropic, whereas in the second case, they are taken to be monocular and anisotropic. The details are presented below. Given a set of prescribed inputs, the network evolves according to equation 2.10, and the lateral connections converge to a stable equilibrium. The resulting weight matrix is then projected onto the set of distance matrices for k=2 (as described in section 3.1) using the stress majorization (SMACOF) algorithm with the stress1 cost function, as described in Borg and Groenen (2005). We thus assign a position to the ith neuron, . (Note that the position xi extracted from the weights using multidimensional scaling is distinct from the “physical” position yi of the neuron in the retinocortical plane; the latter determines the center of its receptive field.) The convolutional connectivity (g in equations 3.2 and 3.3) is therefore completely defined. On the planar map of points xi, neurons are isotropically connected to their neighbors; the closer the neurons are, the stronger their convolutional connection. Moreover, since the stimulus feature preferences (orientation, ocular dominance) of each neuron i are prescribed, we can superimpose these feature preferences on the planar map of points xi. In both examples, we find that neurons with the same ocular or orientation selectivity tend to cluster together: interpolating these clusters then generates corresponding feature columns. It is important to emphasize that the retinocortical positions yi do not have any columnar structure, that is, they do not form clusters with similar feature preferences.
Thus, in contrast to standard developmental models of vertical connections, the columnar structure emerges from the recurrent weights after learning, which are interpreted as Euclidean distances. It follows that neurons coding for the same feature tend to be strongly connected; indeed, the multidimensional scaling algorithm has the property that it positions strongly connected neurons close together. Equations 3.2 and 3.3 also suggest that the connectivity has a nonconvolutional part, M, which is a consequence of the low dimensionality (k=2). In order to illustrate the structure of the nonconvolutional connectivity, we select a neuron i in the plane and draw a link from it at position xi to the neurons at positions xj for which M(xi, xj) is maximal in Figures 7, 8, and 9. We find that M tends to be patchy: it connects neurons having the same feature preferences. In the case of orientation, M also tends to be coaligned, that is, it connects neurons with similar orientation preference along a vector in the plane of the same orientation.
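The decomposition into a convolutional kernel plus a nonconvolutional residual can be sketched on synthetic data. The exponential kernel g and the toy weight matrix below are assumptions made for illustration, not the learned weights of the letter:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 40
x = rng.uniform(0.0, 1.0, size=(n, 2))        # embedded positions x_i (k = 2)
D = np.linalg.norm(x[:, None] - x[None, :], axis=-1)

def g(d):
    """Assumed convolutional kernel (exponential decay with distance)."""
    return np.exp(-3.0 * d)

W = g(D) + 0.05 * rng.random((n, n))          # toy weights: kernel + residual
M = W - g(D)                                  # nonconvolutional part

i = 0                                         # a selected neuron
top = np.argsort(M[i])[::-1][:5]              # its strongest long-range links
print(top)
```

Drawing links from neuron i to the indices in `top` is exactly the visualization used in Figures 7 to 9; in the trained network these links turn out to be patchy rather than random.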
Note that the proposed method for obtaining cortical maps is artificial: first, the network learns its connectivity, and second, the equilibrium connectivity matrix is used to assign a position to each neuron. In biological tissue, cortical maps are likely to emerge slowly during learning. Therefore, a more biological way of addressing the emergence of cortical maps may consist of repositioning the neurons at each time step, taking the positions of the neurons at time t as the initial condition of the algorithm at time t+1. This would correspond better to the real effect of the energetic constraints (leading to wire length minimization), which operate not only after learning but also at each time step. However, when learning is finished, the energy landscape corresponding to the minimization of the wire length would still be the same. In fact, repositioning the neurons successively during learning changes only the local minimum to which the algorithm converges: it corresponds to a different initialization of the neurons' positions. Yet we do not think this corresponds to a more relevant minimum, since the positions of the points at time t=0 are still randomly chosen. Although not biological, the a posteriori method is not significantly worse than applying the algorithm at each time step, and it gives an intuition of the shapes of the local minima of the functional leading to the minimization of the wire length.
3.3.1. Ocular Dominance Columns and Patchy Connectivity.
First, we illustrate the results of ocular dominance simulations in one dimension in Figure 6. Although it is not biologically realistic, taking the visual field to be one-dimensional makes it possible to visualize the emergence of ocular dominance columns more easily. Indeed, in Figure 6 we analyze the role of the binocular disparity of the network, that is, we change the value of . If (the blue curves in Figures 6a and 6b and top pictures in Figures 6c and 6d), there are virtually no differences between left and right eyes, and we observe much less segregation than in the case (the green curves in Figures 6a and 6b and the bottom pictures in Figures 6c and 6d). Increasing the binocular disparity between the two eyes results in the emergence of ocular dominance columns. Yet there does not seem to be any spatial scale associated with these columns: they form on various scales, as shown in Figure 6d.
In Figures 7 and 8, we plot the results of ocular dominance simulations in two dimensions. In particular, we illustrate the role of changing the binocular disparity , changing the standard deviation of the inputs , and using different convolutional kernels g. We plot the points xi obtained by performing multidimensional scaling on the final connectivity matrix for k=2 and superimposing on this the ocular dominance map obtained by interpolating between clusters of neurons with the same eye preference. The convolutional connectivity (g in equations 3.2 and 3.3) is implicitly described by the position of the neurons: the closer the neurons, the stronger their connections. We also illustrate the nonconvolutional connectivity (M in equations 3.2 and 3.3) by linking one selected neuron to the neurons it is most strongly connected to. The color of the link refers to the color of the target neuron. The multidimensional scaling algorithm was applied for each set of parameters with different initial conditions, and the best final solution (i.e., with the smallest nonconvolutional part) was kept and plotted. The initial conditions were random distributions of neurons or artificially created ocular dominance stripes with different numbers of neurons per stripe. It turns out the algorithm performed better on the latter. (The number of tunable parameters was too high for the system to converge to a global equilibrium for a random initial condition.) Our results show that nonconvolutional or long-range connections tend to link cells with the same ocular dominance provided the inputs are sufficiently strong and different for each eye.
3.3.2. Orientation Columns and Collinear Connectivity.
In this letter, we have shown how a neural network can learn the underlying geometry of a set of inputs. We have considered a fully recurrent neural network whose dynamics is described by a simple nonlinear rate equation, together with unsupervised Hebbian learning with decay that occurs on a much slower timescale. Although several inputs are periodically presented to the network, so that the resulting dynamical system is nonautonomous, we have shown that such a system has a fairly simple dynamics: the network connectivity matrix always converges to an equilibrium point. We then demonstrated how this connectivity matrix can be expressed as a distance matrix in ℝ^k for sufficiently large k, which can be related to the underlying geometrical structure of the inputs. If the connectivity matrix is embedded in a lower, two-dimensional space (k=2), then the emerging patterns are qualitatively similar to experimentally observed cortical feature maps, although we have considered simplistic stimuli and kept the feedforward connectivity fixed. That is, neurons with the same feature preferences tend to cluster together, forming cortical columns within the embedding space. Moreover, the recurrent weights decompose into a local isotropic convolutional part, which is consistent with the requirements of energy efficiency, and a longer-range nonconvolutional part that is patchy. This suggests a new interpretation of cortical maps: they correspond to two-dimensional embeddings of the underlying geometry of the inputs.
Geometric diffusion methods (see Coifman et al., 2005) are also an efficient way to reveal the underlying geometry of sets of inputs. There are differences with our approach, although both share the same philosophy. The main difference is that geometric harmonics deals with probabilities of co-occurrence, whereas our approach focuses on wiring length, which is indirectly linked to the inputs through Hebbian coincidence. From an algorithmic point of view, our method is concerned with the positions of the N neurons and not the M inputs, which can be a clear advantage in certain regimes. Indeed, we deal with matrices of size , whereas the total size of the inputs is , which is potentially much higher. Finally, this letter is devoted to decomposing the connectivity into a convolutional and a nonconvolutional part, which is why we focus not only on the spatial structure but also on the shape of the activity equation on this structure. These two results come together when decomposing the connectivity. Actually, this focus on the connectivity was necessary to use the energy minimization argument of section 2.3.1 and to compute the cortical maps in section 3.3.
One of the limitations of applying simple Hebbian learning to recurrent cortical connections is that it takes into account only excitatory connections, whereas 20% of cortical neurons are inhibitory. Indeed, in most developmental models of feedforward connections, it is assumed that the local and convolutional connections in cortex have a Mexican hat shape with negative (inhibitory) lobes for neurons that are sufficiently far from each other. From a computational perspective, it is possible to obtain such a weight distribution by replacing Hebbian learning with some form of covariance learning (see Sejnowski & Tesauro, 1989). However, it is difficult to prove convergence to a fixed point in the case of the covariance learning rule, and the multidimensional scaling method cannot be applied directly unless the Mexican hat function is truncated so that it is invertible. Another limitation of rate-based Hebbian learning is that it does not take into account causality, in contrast to more biologically detailed mechanisms such as spike-timing-dependent plasticity.
The approach taken here is very different from standard treatments of cortical development (as in Miller, Keller, & Stryker, 1989; Swindale, 1996), in which the recurrent connections are assumed to be fixed and of convolutional Mexican hat form while the feedforward vertical connections undergo some form of correlation-based Hebbian learning. In the latter case, cortical feature maps form in the physical space of retinocortical coordinates yi, rather than in the more abstract planar space of points xi obtained by applying multidimensional scaling to recurrent weights undergoing Hebbian learning in the presence of fixed vertical connections. A particular feature of cortical maps formed by modifiable feedforward connections is that the mean size of a column is determined by a Turing-like pattern-forming instability and depends on the length scales of the Mexican hat weight function and the two-point input correlations (see Miller et al., 1989; Swindale, 1996). No such Turing mechanism exists in our approach, so the resulting cortical maps tend to be more fractal-like (many length scales) compared to real cortical maps. Nevertheless, we have established that the geometrical structure of cortical feature maps can also be encoded by modifiable recurrent connections. This should have interesting consequences for models that consider the joint development of feedforward and recurrent cortical connections. One possibility is that the embedding space of points xi arising from multidimensional scaling of the weights becomes identified with the physical space of retinocortical positions yi. The emergence of local convolutional structures, together with sparser long-range connections, would then be consistent with energy efficiency constraints in physical space.
Our letter also draws a direct link between the recurrent connectivity of the network and the positions of neurons in some vector space such as . In other words, learning corresponds to moving neurons or nodes so that their final positions match the geometrical structure of the inputs. Similarly, the Kohonen algorithm detailed in Kohonen (1990) describes a way to move nodes according to the inputs presented to the network; it also converges toward the underlying geometry of the set of inputs. Although the two approaches are not formally equivalent, they appear to share the same qualitative behavior. However, our method is more general in the sense that no neighborhood structure is assumed a priori; such a structure emerges via the embedding into .
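To make the embedding step concrete, here is a minimal classical (Torgerson) MDS sketch: given a Euclidean distance matrix, double centering yields a Gram matrix whose top eigenvectors recover the positions up to a rigid transformation. The hidden points and the embedding dimension below are arbitrary choices for illustration, not the letter's data:

```python
import numpy as np

# Classical MDS sketch: recover positions from a distance matrix built
# from hidden points, mimicking the projection of a learned weight matrix
# onto a Euclidean distance matrix (illustrative, not the letter's exact
# pipeline).
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 3))                      # hidden neuron positions
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # pairwise distances

n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n                   # centering matrix
B = -0.5 * J @ (D ** 2) @ J                           # Gram matrix (double centering)
evals, evecs = np.linalg.eigh(B)
idx = np.argsort(evals)[::-1][:3]                     # top three eigenpairs
Y = evecs[:, idx] * np.sqrt(evals[idx])               # recovered coordinates

# Distances are recovered up to numerical error (positions up to rotation).
D_hat = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
print(np.allclose(D, D_hat, atol=1e-8))
```

When the weight matrix is only approximately of distance form, the discarded eigenpairs quantify the nonconvolutional residual discussed above.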
Finally, note that we have used a discrete formalism based on a finite number of neurons. However, the resulting convolutional structure obtained by expressing the weight matrix as a distance matrix in (see equations 3.2 and 3.3) allows us to take an appropriate continuum limit. This then generates a continuous neural field model in the form of an integro-differential equation whose integral kernel is given by the underlying weight distribution. Neural fields have been used increasingly to study large-scale cortical dynamics (see Coombes, 2005 for a review). Our geometrical learning theory provides a developmental mechanism for the formation of these neural fields. One of the useful features of neural fields from a mathematical perspective is that many of the methods of partial differential equations can be carried over. Indeed, for a general class of connectivity functions defined over continuous neural fields, a reaction-diffusion equation can be derived whose solution approximates the firing rate of the associated neural field (see Degond & Mas-Gallic, 1989; Cottet, 1995; Edwards, 1996). It appears that the necessary connectivity functions are precisely those that can be written in the form 3.2. This suggests that a network that has been trained on a set of inputs with an appropriate geometrical structure behaves as a diffusion equation in a high-dimensional space together with a reaction term corresponding to the inputs.
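The convolutional structure underlying this continuum limit can be checked numerically: if neurons sit on a ring and the weights depend only on the distance between them, the weight matrix is circulant and its action on a rate vector is a circular convolution, the discrete analogue of the neural-field integral term. The Gaussian kernel g below is an arbitrary choice:

```python
import numpy as np

# On a ring of n neurons with weights depending only on distance, the
# weight matrix is circulant, so W @ r equals a circular convolution
# (illustrative sketch; the kernel g is an assumed shape).
n = 64
g = lambda d: np.exp(-(d ** 2) / 0.02)
pos = np.arange(n) / n
d = np.minimum(np.abs(pos[:, None] - pos[None, :]),
               1 - np.abs(pos[:, None] - pos[None, :]))  # ring distance
W = g(d)

rng = np.random.default_rng(2)
r = rng.random(n)                                  # arbitrary rate vector

direct = W @ r
kernel = g(np.minimum(pos, 1 - pos))               # first row of W
via_fft = np.real(np.fft.ifft(np.fft.fft(kernel) * np.fft.fft(r)))

print(np.allclose(direct, via_fft))
```

In the continuum limit the sum over neurons becomes the integral term of the neural field equation, with g playing the role of the integral kernel.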
A.1. Proof of System's Averaging.
Here, we show that system can be approximated by the autonomous Cauchy problem . Indeed, we can simplify system by applying Tikhonov's theorem for slow and fast systems and then classical averaging methods for periodic systems.
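A minimal numerical illustration of this reduction, with a toy slow-fast pair standing in for the network equations (the particular right-hand sides are our own choice, not the letter's system): the fast variable collapses onto the root of the fast equation, and the slow variable is well approximated by the reduced system obtained by substituting that root.

```python
# Toy slow-fast system illustrating the Tikhonov reduction: a fast
# "activity" u relaxes onto the slow manifold u = w + 1, while the slow
# "weight" w drifts. (Stand-in equations, not the letter's network.)
eps, dt, T = 1e-3, 1e-4, 2.0
steps = int(T / dt)

u, w = 0.0, 0.0                        # fast activity, slow weight
for _ in range(steps):
    du = (-u + w + 1.0) / eps          # fast: u relaxes to the root u = w + 1
    dw = u - 2.0 * w                   # slow: Hebb-like drive with decay
    u += dt * du
    w += dt * dw

# Reduced system: substitute u = w + 1 and evolve w alone.
w_red = 0.0
for _ in range(steps):
    w_red += dt * ((w_red + 1.0) - 2.0 * w_red)

print(abs(w - w_red) < 1e-2, abs(u - (w + 1.0)) < 1e-2)
```

The discrepancy between the full and reduced slow variables is of order eps, consistent with Tikhonov's theorem.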
A.1.1. Tikhonov's Theorem.
The equation 0 = g(x, y, t) is solved by y = φ(x, t), where φ is a continuous function and y = φ(x, t) an isolated root. Also suppose that y = φ(x, t) is an asymptotically stable solution of the fast equation, uniformly in the parameters x and t.
y(0) is contained in an interior subset of the domain of attraction of this asymptotically stable solution.
A.1.2. Periodic Averaging.
In this letter, we focus on system whose solutions are close to those of the original system provided condition A.2 is satisfied, that is, the network is weakly connected. Clearly the fixed points of system satisfy . Therefore, equation A.2 says that if , then Tikhonov's theorem can be applied, and systems and can be reasonably considered as good approximations of each other.
A.2. Proof of the Convergence to the Symmetric Attractor .
We need to prove two points: (1) is an invariant set and (2) for all , converges to as . Since is the direct sum of the set of symmetric connectivities and the set of antisymmetric connectivities, we write , where WS is symmetric and WA is antisymmetric.
A.3. Proof of Theorem 1.
is lower-bounded. Indeed, and W are bounded. Given that and S are also bounded, it is clear that is bounded.
is negative semidefinite on the trajectories as shown in equation A.13.
Then LaSalle's invariance principle tells us that the solutions of system approach the set . Equation A.13 implies that . Since and everywhere, M consists of the equilibrium points of the system. This completes the proof.
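The Lyapunov argument can be checked numerically using the textbook continuous Hopfield energy, taken here as a stand-in for the letter's exact functional: for symmetric W, the energy evaluated along a simulated trajectory never increases (up to roundoff).

```python
import numpy as np

# Sanity check of the Lyapunov property: with symmetric W, the classical
# continuous Hopfield energy is nonincreasing along trajectories.
# (Dynamics, gains, and inputs are illustrative choices.)
rng = np.random.default_rng(3)
n = 10
A = rng.standard_normal((n, n))
W = (A + A.T) / (2 * np.sqrt(n))      # symmetric, moderately scaled weights
I = 0.5 * rng.standard_normal(n)      # constant external input

def energy(x):
    s = np.tanh(x)                    # firing rates S(x) = tanh(x)
    # closed form of the integral of S^{-1} from 0 to s_i
    integral = s * np.arctanh(s) + 0.5 * np.log1p(-s ** 2)
    return -0.5 * s @ W @ s - I @ s + integral.sum()

x = rng.standard_normal(n)
dt = 1e-3
energies = []
for _ in range(20000):
    energies.append(energy(x))
    x += dt * (-x + W @ np.tanh(x) + I)   # Hopfield rate dynamics

diffs = np.diff(energies)
print(np.all(diffs <= 1e-9))          # energy never increases (up to roundoff)
```

Since the energy is bounded below and decreases along trajectories, the simulated network settles at an equilibrium, as the invariance principle predicts.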
A.4. Proof of Theorem 2.
A.5. Proof of Proposition 1.
Observe that and separate the two cases for the choice of function g:
- If with , then . Therefore, define . We now want to apply h1 to equation A.20, but h1 is monotonic only if . Therefore, divide equation A.20 by 2 and apply h1 on both sides to get for . Because h1 is increasing, it is clear that (and similarly for y), such that reaches its maximum for . Besides, . This proves that . Moreover, it is easy to observe that for all a>1. This concludes the proof for .
- If g(x)=ae−x, then . As before, define . We still want to apply h2 to equation A.20, but h2 is not defined for , which is likely for the right-hand side of equation A.20. Therefore, we apply h2 to equation A.20, divided by two, and use the fact that h2 is increasing on . This leads to . First, we use the convexity of h2 to get , and then we use the fact that for , with . This would conclude the proof, but we have to make sure the angles remain in . Actually, we can compute , which verifies . This leads to . In fact, the coefficients of u, v, and w are strictly positive and larger than S(0). Therefore, the angles between them are strictly smaller than . More precisely, . Therefore, a necessary condition for the result to be true is , using the fact that leads to .
M.N.G. and O.D.F. were partially funded by ERC advanced grant NerVi number 227747. M.N.G. was partially funded by the région PACA, France. This letter was based on work supported in part by the National Science Foundation (DMS-0813677) and by Award KUK-C1-013-4 made by King Abdullah University of Science and Technology. P.C.B. was also partially supported by the Royal Society Wolfson Foundation. This work was partially supported by the IP project BrainScales No. 269921.