Abstract
The human visual system has the remarkable ability to recognize objects largely invariantly to their position, rotation, and scale. One way to interpret the underlying neurobiological findings is through a computational model that simulates the signal processing of the visual cortex. In part, this invariance is likely achieved step by step from early to late areas of visual perception. While several algorithms have been proposed for learning feature detectors, only a few studies address the biologically plausible learning of such invariance. In this study, a set of Hebbian learning rules based on calcium dynamics and homeostatic regulation of single neurons is proposed. Their performance is verified within a simple model of the primary visual cortex that learns so-called complex cells from a sequence of static images. As a result, the learned complex-cell responses are largely invariant to phase and position.
1. Introduction
Object recognition, the ability to perceive, recognize, and distinguish objects in the real world, is one of the most remarkable properties of the human visual system. Although researchers have made great strides in the development of artificial vision systems over the past few decades, these systems are still no match for the huge variety of tasks the human brain can perform. Although objects can vary in form, texture, size, color, and other characteristics, the brain recognizes them effortlessly, even from different viewpoints, against different backgrounds, or when they are partially obscured. One possible key to successful object recognition in artificial vision systems lies in studying the underlying principles of the visual system of the primate brain.
Fortunately, the primate visual system has been well investigated functionally, anatomically, and computationally. It is widely accepted that the visual system achieves some degree of invariance gradually from early visual areas to high-level areas. The first area in the cortical visual system where invariant cell properties can be found is the primary visual cortex (V1). The behaviors of V1 cells were first explored by Hubel & Wiesel, who coined the terms simple and complex cells. The edge-detecting ability of V1 simple cells gives important information about the structure of the visual input. The receptive fields of simple cells consist of excitatory and inhibitory regions (Hubel & Wiesel, 1962), and their arrangement can be well described by Gabor filters (Jones & Palmer, 1987). In contrast to simple cells, complex-cell responses cannot be fully described by a simple map of inhibitory and excitatory regions. Optimal stimuli for complex cells do not have to be at a special position in their receptive fields, that is, complex cells are slightly invariant to position and phase (Hubel & Wiesel, 1962; De Valois, Albrecht, & Thorell, 1982; Adelson & Bergen, 1985; Carandini et al., 2005). This behavior can be explained as deriving from simple-cell behavior, that is, complex cells in layer 2/3 of the primary visual cortex obtain their inputs from many simple cells in layer 4 with a similar orientation tuning (Hubel & Wiesel, 1962).
There are many approaches for learning simple-cell properties (Olshausen & Field, 1996; Bell & Sejnowski, 1997; van Hateren & van der Schaaf, 1998; Hoyer & Hyvärinen, 2000; Falconbridge, Stamps, & Badcock, 2006; Hamker & Wiltschut, 2007; Rehn & Sommer, 2007; Weber & Triesch, 2008; Wiltschut & Hamker, 2009), but only a few for learning the invariance properties of complex cells using temporal correlations in the input. It has been shown that invariance properties can be learned by Hebbian learning with the additional constraint of an activity trace, often modeled by using an (artificial) history of previous activations in place of the instantaneous activation in Hebb-type learning rules (Földiák, 1991; Wallis & Rolls, 1997; Einhäuser, Kayser, König, & Körding, 2002; Spratling, 2005). Other approaches use an objective function that minimizes the difference of the output units between two consecutive inputs (Kayser, Einhäuser, Dümmer, König, & Körding, 2001; Körding, Kayser, Einhäuser, & König, 2004; Berkes & Wiskott, 2005) or, similarly, maximizes the sparseness of this difference (Hashimoto, 2003). Berkes and Wiskott (2005) applied slow feature analysis (SFA; Wiskott & Sejnowski, 2002) to determine the parameters of polynomial functions by optimizing the slowness of their variation under slight image transformations. Hashimoto (2003) proposed maximizing the sparseness of temporal output differences to obtain complex-cell properties. Kayser et al. (2001) and Körding et al. (2004) used subspace energy detectors and learned their receptive fields by optimizing the temporal stability of the output, following an earlier approach by Kohonen (1996) that also used subspace energy detectors. All of these approaches exploit temporally coherent input, as it is present in the real world.
Other research has shown that complex-cell properties can emerge without using temporal correlations of the input (Hyvärinen & Hoyer, 2000, 2001; Osindero, Welling, & Hinton, 2006; Karklin & Lewicki, 2009; Köster & Hyvärinen, 2010). In independent subspace analysis (Hyvärinen & Hoyer, 2000) and topographic ICA (Hyvärinen & Hoyer, 2001), pooling residual dependencies between linear filters leads to units with complex-cell-like properties. Köster and Hyvärinen (2010) unify these approaches using weight estimation by score matching (Hyvärinen, 2005), an estimation principle for energy-based models. Their model learns to connect features with similar orientation and frequency but differing phase. Similarly, Osindero et al. (2006) use a product of Student-t (PoT) approach to model the statistical structure of natural image data. Karklin and Lewicki (2009) show that a neuronal population encoding the statistical distribution of natural images also exhibits complex-cell and V2-cell properties. Besides these approaches, Stringer, Perry, Rolls, and Proske (2006), using Hebbian learning, report that it is possible to learn invariance from spatial rather than temporal continuity. Given this previous work, it remains an open question whether temporal coherence in the input is exploited by the visual cortex to learn invariance properties.
While much previous work has used artificial data sets such as bar problems (Földiák, 1991; Spratling, 2005; Stringer et al., 2006), or seminatural stimuli such as faces on different backgrounds (Wallis & Rolls, 1997; Stringer & Rolls, 2000), more recent work is based on natural images and video sequences (Kohonen, 1996; Hyvärinen and Hoyer, 2000, 2001; Kayser et al., 2001; Einhäuser et al., 2002; Hashimoto, 2003; Körding et al., 2004; Berkes & Wiskott, 2005; Osindero et al., 2006; Karklin & Lewicki, 2009; Köster & Hyvärinen, 2010). Despite this advancement in the field, previous algorithms have been tailored to learn only a subset of weights simultaneously and often require a trial-based design.
In this letter, we focus on learning shift- and phase-invariance properties comparable to those of complex cells in area V1, using a sequence of slightly shifted static natural images. We propose a new learning rule based on the conceptual design of the previously developed learning rule for simple cells (Wiltschut & Hamker, 2009). The previous rule built on ideas from Hebbian learning (Oja, 1982), covariance learning (Sejnowski, 1977), and anti-Hebbian decorrelation (Földiák, 1990) and led to largely independent responses of V1 simple cells when trained on natural scenes. Our new rule expands these ideas by incorporating aspects of BCM learning (Bienenstock, Cooper, & Munro, 1982; Shouval, Castellani, Blais, Yeung, & Cooper, 2002; Yeung, Shouval, Blais, & Cooper, 2004; Castellani, Quinlan, Bersani, Cooper, & Shouval, 2005), particularly by using the level of calcium, rather than neural activity, in the learning rules. Moreover, learning in this model is fully continuous; no reset or change of values between successive presentations is used.
2. Model
2.1. Architecture.
Scheme of the model architecture. The input image is filtered by a set of Gabor functions. Small patches are cut out of the Gabor-filtered images, which represent the V1-Simple cell responses. This simple-cell population obtains the input for the V1-Complex layer, where the learning of invariance, using Hebbian and anti-Hebbian learning mechanisms, occurs.
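As a rough illustration of this first stage, the Gabor filtering of an input image into simple-cell response maps could be sketched as follows. The kernel size (11 × 11) and gaussian extents follow section 2.2, while the spatial frequency `f = 0.25`, the use of four phases per orientation, and all function names are illustrative assumptions:

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(theta, phase, size=11, f=0.25, sigma_x=2.4, sigma_y=3.2):
    """Gabor kernel at orientation theta and the given phase.
    The frequency f (cycles/pixel) is an assumed value."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)   # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    env = np.exp(-(xr**2 / (2 * sigma_x**2) + yr**2 / (2 * sigma_y**2)))
    return env * np.cos(2 * np.pi * f * xr + phase)

def simple_cell_maps(image, n_orient=4, n_phase=4):
    """Convolve an image with the Gabor bank ('valid' region only,
    avoiding boundary effects) and normalize each map to [0, 1]."""
    maps = []
    for i in range(n_orient):
        for j in range(n_phase):
            k = gabor_kernel(theta=i * np.pi / n_orient,
                             phase=j * 2 * np.pi / n_phase)
            r = convolve2d(image, k, mode='valid')
            r = (r - r.min()) / (r.max() - r.min() + 1e-12)
            maps.append(r)
    return np.stack(maps)  # 16 maps for 4 orientations x 4 phases

maps = simple_cell_maps(np.random.default_rng(0).random((64, 64)))
```

With `n_orient=8`, the same sketch yields the 32 maps mentioned for the eight-orientation variant.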
2.2. V1-Simple.
Gabor functions with constant frequency f and gaussian extent σ (σx = 2.4, σy = 3.2) are used. This set of Gabor functions is applied as a convolution kernel (11 × 11 pixels) to every image at every image position. As a result of the convolution, 16 (resp. 32) different maps of simple-cell responses per image are obtained. These maps are normalized to the admissible range [0, 1]. To avoid boundary effects, the convolution is calculated only in the valid inner area of an image.

2.3. V1-Complex.
2.4. Neuronal Calcium Level.
2.5. Calcium-Dependent Hebbian Learning.
2.5.1. Time Constant for Calcium-Dependent Synaptic Change.
2.5.2. Calcium-Dependent Synaptic Change.
The design of this learning rule follows the ideas of normalized Hebbian learning using a covariance rule (Sejnowski, 1977; Oja, 1982; Wiltschut & Hamker, 2009), but also includes the biological considerations of calcium dependency (Shouval, Castellani et al., 2002; Yeung et al., 2004; Castellani et al., 2005), where the amount and speed of learning are directly dependent not on the cellular activity but on the synaptic calcium level.
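A minimal numerical sketch of such a calcium-gated covariance update is given below, with population-mean thresholds and a softly applied normalization constraint. All names, constants, and the exact handling of the constraint are illustrative assumptions, not the paper's equations:

```python
import numpy as np

def calcium_hebbian_step(W, ca_pre, ca_post, eta=0.01, alpha=0.1):
    """One covariance-style weight update gated by calcium levels
    instead of activities. The thresholds are the population means of
    the calcium traces; the constraint handling is an assumption."""
    d_pre = ca_pre - ca_pre.mean()    # presynaptic deviation from the population mean
    d_post = ca_post - ca_post.mean() # postsynaptic deviation from the population mean
    dW = eta * np.outer(d_post, d_pre)               # Hebbian covariance term
    constraint = alpha * (ca_post**2)[:, None] * W   # soft normalization (alpha constraint)
    # apply the constraint only where it does not amplify the weight change
    damp = np.where(np.sign(constraint) == np.sign(dW), constraint, 0.0)
    return np.clip(W + dW - damp, 0.0, None)         # keep excitatory weights non-negative

rng = np.random.default_rng(0)
W0 = rng.random((8, 16)) * 0.1
W1 = calcium_hebbian_step(W0, rng.random(16), rng.random(8))
```

Neurons whose calcium level lies below the population mean receive a negative postsynaptic term, which implements the slow homosynaptic LTD described in the text.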
The subtraction of the corresponding threshold from the presynaptic or postsynaptic calcium level yields the pre- and postsynaptic learning terms, respectively. Furthermore, the velocity of the synaptic change is given by a calcium-dependent learning rate (see equation 2.4). The thresholds are population means: they are calculated as means over the corresponding calcium levels of the neuronal populations (here, V1-Simple and V1-Complex). Moreover, the factor for weight normalization is not static; it constrains the weights toward a maximum firing rate via the alpha constraint, where α is adaptive and given by equation 2.7.

When the calcium level falls below the mean, homosynaptic LTD occurs, similar to BCM learning. Thus, cells that are not significantly excited (or lose the competition against other cells) slowly decrease their connection strength, but only for those parts of the input that drive the other, strongly firing cells. Because LTD happens on a much slower timescale than LTP (due to the calcium-dependent learning rate), such cells slowly disconnect from the specific input configurations that other cells prefer. The alpha constraint is applied only when it does not amplify the weight change.

2.5.3. Metaplasticity and Homeostatic Regulation.
Metaplasticity refers to mechanisms that regulate neural parameters such as a synaptic weight in dependence of other parameters (Abraham & Bear, 1996). A number of recent studies support the idea of homeostatic regulation (Turrigiano, Leslie, Desai, Rutherford, & Nelson, 1998; Desai, Cudmore, Nelson, & Turrigiano, 2002). Neurons seem to stabilize their firing rate within a certain target range through global homeostatic regulation of synaptic strength. Increased activity of a cell results in a reduction of the sensitivity, and reduced activity is followed by an enhancement of the sensitivity. Evidence supports a synaptic scaling mechanism in which, unlike in Hebbian learning, all synapses onto a cell are scaled up or down (Turrigiano & Nelson, 2004). The process operates over hours to days and seems to be linked to activity or input current sensors in the neuron (Marder & Prinz, 2002; MacLean, Zhang, Johnson, & Harris-Warrick, 2003). Despite strong evidence for a homeostatic regulation of synaptic strength, little is known about the induction mechanisms, and little computational work exists so far.
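A minimal Euler-integrated sketch of such a homeostatic mechanism, in the spirit of the adaptive constraint used in this model, is shown below: a rate sensor H charges while the firing rate exceeds a target and drives an adaptive factor α upward, while α otherwise decays slowly. The exact coupling is an assumed reading of the model description, and the constants mirror those given for it (γ = 0.7, α− = 0.0005, K = 0.05, τα = 10,000 ms, τH = 100 ms):

```python
import numpy as np

def homeostasis_step(alpha, H, r, dt=1.0, gamma=0.7, alpha_minus=0.0005,
                     K=0.05, tau_alpha=10_000.0, tau_H=100.0):
    """One Euler step (dt in ms) of an assumed reading of the adaptive
    alpha constraint: H charges while the rate r exceeds gamma and
    otherwise discharges; alpha integrates H and slowly decays."""
    dH = (np.where(r > gamma, 1.0, 0.0) - H - K) / tau_H
    H = np.clip(H + dt * dH, 0.0, None)
    dalpha = (H - alpha_minus) / tau_alpha
    alpha = np.clip(alpha + dt * dalpha, 0.0, None)
    return alpha, H

# A neuron held above the target rate: alpha ramps up slowly and would
# eventually restrict the growth of its weights.
alpha, H = np.zeros(1), np.zeros(1)
for _ in range(1000):
    alpha, H = homeostasis_step(alpha, H, np.array([0.9]))
```

Note the separation of timescales: H reacts within hundreds of milliseconds, while α changes on the order of tens of seconds, matching the slow regulation described above.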
We introduce a dependency of α on the firing rate of a neuron, αk(rk), where the increase of αk is determined by Hk and the decrease by α− = 0.0005, a small chosen constant. Hk increases (see equation 2.8) if the firing rate rk is above a certain threshold γ = 0.7 and otherwise decreases by its own value and a small constant, K = 0.05. As a consequence of this mechanism, αk reaches a value that restricts the increase of weights to appropriate values, so that maximum firing rates are kept close to γ. αk increases only until the firing rate no longer exceeds the γ threshold, and it decreases slowly enough to remain nearly stable over a sufficiently long period of time. The speed of adaptation is given by the time constants τα = 10,000 ms and τH = 100 ms.

2.6. Anti-Hebbian Learning of Lateral Inhibitory Connections.
We use anti-Hebbian learning to learn lateral inhibitory weights in the way that a cell inhibits another cell if they often fire together. This mechanism leads to statistically independent responses and a sparse code (Wiltschut & Hamker, 2009).
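A minimal sketch of such an anti-Hebbian step for the lateral inhibitory weights is given below; the time constant of section 2.6.1 and the exact rule are omitted, and the names and learning rate are illustrative:

```python
import numpy as np

def anti_hebbian_step(C, r, eta=0.005):
    """Cells that fire together above the population mean strengthen
    their mutual inhibition, decorrelating the population response."""
    d = r - r.mean()                   # deviation from the population mean
    dC = eta * np.outer(d, d)          # correlated firing -> more inhibition
    np.fill_diagonal(dC, 0.0)          # no self-inhibition
    return np.clip(C + dC, 0.0, None)  # inhibitory strengths stay non-negative

r = np.array([1.0, 1.0, 0.0, 0.0])    # cells 0 and 1 co-fire
C = anti_hebbian_step(np.zeros((4, 4)), r)
```

Over many presentations, frequently co-active cells accumulate mutual inhibition and their responses become more independent and sparse.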
2.6.1. Time Constant for Anti-Hebbian Learning.
2.6.2. Learning Rule.
Table 1: Shift distances between consecutive patch presentations and their probabilities.

| Distance | Probability |
|---|---|
| 1 pixel | 0.51 |
| 2 pixels | 0.25 |
| 3 pixels | 0.12 |
| 4 pixels | 0.06 |
| 5 pixels | 0.03 |
| 6 pixels | 0.02 |
| 7 pixels | 0.01 |
3. Materials and Methods
3.1. Training.
A set of 10 monochrome images (512 × 512 pixels) of natural scenes, taken from Bruno Olshausen's Sparsenet (http://redwood.berkeley.edu/bruno/sparsenet/), was used in the learning phase. The same image set has been used successfully in the literature to learn from natural scenes (Olshausen & Field, 1996; Rehn & Sommer, 2007; Wiltschut & Hamker, 2009). Each image has been normalized to the range [0, 1].
The network was initialized with small, randomly chosen weights and trained for around 1 million presentations. Convergence to stable receptive fields starts after about 500,000 presentations. For learning position invariance at the level of V1, changes in the input from one fixation to the next must be very small, as in fixational eye movements (Dodge, 1907; Zuber, Crider, & Stark, 1964; Martinez-Conde, Macknik, & Hubel, 2004; Rolfs, 2009). Here we follow the observation that while fixating a certain point in space, the eyes still perform several movements of very small amplitude around the fixation point, successively leading to slightly different views of the scene. We generated sequences of 50 image patch presentations from the same image, with the patch position slightly shifted between presentations to mimic fixational eye movements (see Table 1). To rule out the possibility that the results depend on too long a sequence length, our model has also been tested with a shorter sequence length of 10 consecutive image patches. Furthermore, the ability of our model to maintain orientation selectivity is tested using simple-cell input representing four and eight orientations.
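The generation of such a fixation sequence can be sketched with the shift-distance probabilities of Table 1; the patch size, the uniformly random shift direction, and the boundary clipping are illustrative assumptions:

```python
import numpy as np

# Shift-distance distribution from Table 1 (fixational eye movements)
DISTANCES = np.arange(1, 8)
PROBS = np.array([0.51, 0.25, 0.12, 0.06, 0.03, 0.02, 0.01])

def patch_sequence(image, patch=16, length=50, rng=None):
    """Generate a sequence of slightly shifted patches from one image,
    mimicking fixational eye movements around a fixation point."""
    rng = rng or np.random.default_rng()
    h, w = image.shape
    y = int(rng.integers(0, h - patch))
    x = int(rng.integers(0, w - patch))
    seq = []
    for _ in range(length):
        seq.append(image[y:y + patch, x:x + patch].copy())
        d = rng.choice(DISTANCES, p=PROBS)   # shift magnitude from Table 1
        angle = rng.uniform(0.0, 2.0 * np.pi)  # assumed: uniform direction
        y = int(np.clip(y + int(round(d * np.sin(angle))), 0, h - patch))
        x = int(np.clip(x + int(round(d * np.cos(angle))), 0, w - patch))
    return np.stack(seq)

seq = patch_sequence(np.random.default_rng(1).random((512, 512)))
```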
The differential equations, describing the neuronal behavior, are computed through the Euler method. Note that there is no reset of the neuron activations and weights.
3.2. Circular Response Images.
We use circular response images (see Figure 2) to visualize the phase and orientation tuning of the learned complex cells within a single image. The circular test image is a 256 × 256 pixel image (codomain [0, 0.04]), generated with the same Matlab function as used by Berkes and Wiskott (2005). The inner 15-pixel radius of the image around the center is empty (value 0.02), and emanating from the center, the image is a set of circular sine waves with logarithmically decreasing frequency toward the borders of the image. The frequency spectrum of the test image lies between a lower and an upper cutoff frequency.
The input to each neuron is determined by shifting a patch over the whole test image in 1 pixel steps. White denotes maximal excitation of a neuron from one population to the corresponding part in the test image, and black denotes maximal inhibition. For presentation purposes in small plots, we use discrete rather than continuous gray values. In the resulting images, orientation selectivity can be observed if a neuron responds to only parts of the circles (angular selectivity). Moreover, the excitatory and inhibitory receptive field components can be seen. Furthermore, the simple (resp. complex) cell property can be observed. Phase-invariant complex cells show a smooth activation profile in the radial direction, whereas simple cells are sensitive to the exact phase and thus show oscillations.
Based on the circular response images, the orientation bandwidth of each cell is determined by measuring the angle of the maximal response using a half-maximum criterion to define the border. For model evaluation, the mean orientation bandwidths of all cells with an activity above the value of 20% of the most active cell are considered.
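A sketch of such a half-maximum bandwidth measurement on a tuning curve sampled in 1-degree steps is shown below. The paper measures the angle on the circular response images themselves; the circular walk around the peak here is an illustrative simplification:

```python
import numpy as np

def orientation_bandwidth(responses_deg):
    """Angular width (degrees) of the contiguous region around the peak
    of a tuning curve where the response exceeds half its maximum.
    Assumes one sample per degree over [0, 180), wrapped circularly."""
    r = np.asarray(responses_deg, dtype=float)
    half = r.max() / 2.0
    peak = int(np.argmax(r))
    n = len(r)
    width = 1          # count the peak bin itself
    i = peak
    while r[(i - 1) % n] > half and width < n:  # walk left of the peak
        i -= 1; width += 1
    j = peak
    while r[(j + 1) % n] > half and width < n:  # walk right of the peak
        j += 1; width += 1
    return width * (180.0 / n)

# Example: a gaussian tuning curve centered at 90 degrees (sigma = 10),
# whose full width at half maximum is about 23.5 degrees.
angles = np.arange(180.0)
tuning = np.exp(-((angles - 90.0) ** 2) / (2.0 * 10.0 ** 2))
bw = orientation_bandwidth(tuning)
```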
3.3. Relative Modulation Index.
The relative modulation index has been previously applied in electrophysiological studies (De Valois et al., 1982). Complex cells respond largely invariantly to gratings shifted in phase, whereas simple cells respond to phase shifts with large oscillations, as measured by the ratio of the modulation response (F1) to the mean firing rate (F0) given a grating with optimal frequency and orientation (Skottun et al., 1991; Einhäuser et al., 2002; Johnson, Hawken, & Shapley, 2008; Berkes, Turner, & Sahani, 2009). Relative modulation index values higher than one indicate that the neuron has simple-cell characteristics; lower values indicate complex-cell characteristics.
The same Gabor functions as for the preprocessing of the input images are used as test stimuli (codomain [0, 0.04]). The modulation response has been determined through a Fourier analysis, where the modulation response (F1) is the first harmonic of the Fourier-transformed response. The reported relative modulation index values for the models with short and long calcium trace lengths are averaged across 10 different runs.
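The F1/F0 computation described above can be sketched as follows, given a neuron's response sampled over one full phase cycle of the test grating; the sampling density and the two example response profiles are illustrative:

```python
import numpy as np

def modulation_index(response):
    """F1/F0: amplitude of the first harmonic of the response over one
    grating cycle, divided by the mean response. Values above one
    indicate simple-cell, values below one complex-cell behavior."""
    r = np.asarray(response, dtype=float)
    F = np.fft.rfft(r) / len(r)
    f0 = F[0].real              # mean firing rate (F0)
    f1 = 2.0 * np.abs(F[1])     # modulation response (F1), first harmonic
    return f1 / (f0 + 1e-12)

phases = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
simple_like = np.maximum(0.0, np.cos(phases))  # phase-sensitive (rectified)
complex_like = np.ones(64)                     # phase-invariant
```

A half-wave rectified response yields an index above one (simple-cell-like), while a flat, phase-invariant response yields an index near zero.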
3.4. Slowness.
One hundred randomly selected natural image sequences, similar to but different from those used for training, containing N = 50 small shifted presentations, are presented to the network, and the neuronal responses r of the V1-Complex layer are recorded. The differences in the response of every neuron i to the previous presentation are calculated and normalized by the mean response of the neuron for presentations of the same sequence (〈…〉N, where N denotes the length of the sequence). The mean of the squares of these normalized response differences is divided by the variance of the normalized neuronal responses from the same sequence. Higher values denote slower changes in the activities than lower values.
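Following the verbal description above, the slowness of a single neuron's response over one sequence could be computed as below; the log scale and sign convention of the reported histogram are assumptions:

```python
import numpy as np

def slowness(responses):
    """Slowness of one neuron's response over a sequence: responses are
    normalized by their sequence mean, and the mean squared temporal
    difference is compared with the variance of the normalized
    responses. The log scale and sign are assumed conventions."""
    r = np.asarray(responses, dtype=float)
    r_norm = r / (r.mean() + 1e-12)
    diffs = np.diff(r_norm)                      # change between presentations
    ratio = np.mean(diffs**2) / (np.var(r_norm) + 1e-12)
    return -np.log10(ratio + 1e-12)              # assumed: higher = slower

t = np.linspace(0.0, 1.0, 50)
slow_resp = 1.0 + 0.5 * t                            # drifts slowly
fast_resp = 1.0 + 0.5 * ((-1.0) ** np.arange(50))    # flips every step
```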
3.5. Spatial Response Images.
A further important property of complex cells in the primary visual cortex is that they respond to their preferred stimuli independent of its exact position. Position invariance further generalizes phase invariance, which measures only the sensitivity of phase changes orthogonal to the preferred orientation. The spatial response images used here highlight the spatial regions at which a V1-Complex neuron shows a significant response to a Gabor stimulus. For neurons with simple-cell characteristics, narrower spatial regions are expected. Neurons with complex-cell characteristics should result in larger, smooth spatial regions.
4. Results
The goal of invariance learning is to learn a high degree of selectivity to features that change slowly in the input sequence while becoming only broadly tuned to features that rapidly change in the input. In our design, the model should learn a precise mapping in the orientation domain while establishing a divergent mapping in spatial position. This is nontrivial, since a single patch from natural scenes does not contain only a single preferred orientation but leads to responses of multiple cells with different orientation tuning.
Two model variations, one with a short calcium trace length (τCa,Complex = 10 ms) and the second with a longer trace length (τCa,Complex = 500 ms), are compared to one another. The short trace length is chosen comparable to the time constant of neural dynamics, typically too short to learn temporal correlations between the successive patches.
4.1. Circular Response Images.
Both model variations (using four oriented filters) show high orientation selectivity, with strong inhibition for nonpreferred orientations (see Figures 3 and 4). The mean orientation bandwidths are 20.2 degrees (τCa,Complex = 10 ms) and 26.3 degrees (τCa,Complex = 500 ms), which are close to the orientation bandwidth of the Gabor functions used. The neurons of the two models mainly differ in their ability to respond phase-invariantly to preferred orientations. The model with the longer trace length (see Figure 4) typically responds equally strongly to stimuli of different phase, whereas the model with the shorter trace length (see Figure 3) shows a more phase-sensitive activation pattern to the preferred orientation stimuli, which can be seen in the oscillations of the activity, namely an excitatory response to the preferred phase and an inhibitory response to shifted phases (see Figure 5; see also the appendix for the feedforward weight matrices).
Circular response images for every model neuron obtained with τCa,Complex = 10 ms. White denotes maximal excitation of a neuron to the corresponding part in the test image, and black denotes maximal inhibition. While all neurons show a high orientation selectivity and the neuron population as a whole represents all possible edge orientations, most neurons show a phase-sensitive activation pattern to the preferred orientation stimuli.
Circular response images for every model neuron obtained with τCa,Complex = 500 ms and four oriented filters in the input. White denotes maximal excitation of a neuron to the corresponding part in the test image, and black denotes maximal inhibition. All neurons have learned a high orientation selectivity while at the same time having learned invariance to phase, as can be seen in the equally strong responses among phase variations in the test image.
Circular response images for two example neurons of the simulations with τCa,Complex = 10 ms and τCa,Complex = 500 ms. These images visualize the differences in the phase sensitivity of neuronal responses to the test image. The left neuron shows a high oscillation in the activity to phase variations in the input. White denotes maximal excitation of a neuron from one population to the corresponding part in the test image, and black denotes maximal inhibition.
As a control experiment, a model (τCa,Complex = 500 ms; 64 cells) using eight orientations in the simple-cell input is evaluated (see Figure 6). This model variation still shows a high orientation selectivity (the mean orientation bandwidth is 25.1°). Nearly all cells respond to a single orientation in the input set of eight orientations, and the majority of the cells show phase-invariant behavior for their preferred stimuli.
Circular response images for the 40 most excited or inhibited model neurons obtained with τCa,Complex = 500 ms and eight orientations representing the input. White denotes maximal excitation of a neuron to the corresponding part in the test image, and black denotes maximal inhibition. Most neurons have learned a high orientation selectivity while at the same time having learned invariance to phase, as can be seen in the equally strong responses among phase variations in the test image.
The results of the proposed models are compared to those of SFA (Berkes & Wiskott, 2005) using the simple-cell responses to eight orientations as input (see Figure 7). SFA also leads to orientation selectivity, but frequently for conjunct orientations (mean orientation bandwidth: 40.8°). However, all functions appear to be phase invariant. Following the unit classification in Berkes and Wiskott (2005), some of the units can be classified as orthogonal inhibited; nonorthogonal inhibited units, whose responses depend on the mean of the test image, can also be found. The slowest units are nonoriented and respond to differences in brightness.
Static circular response images for the 48 slowest functions ascertained with the slow feature analysis (SFA) on simple-cell responses (8 orientations). The majority of basis functions are orientation selective, but frequently to conjunct orientations. However, the slowest functions are phase invariant.
The SFA algorithm has also been applied to simple-cell responses to four orientations as input, with less satisfying results: it leads to cells selective for multiple orientations. When testing SFA on the raw images, similarly to Berkes and Wiskott (2005), the slowest 48 basis functions become more narrowly tuned in orientation, but typically to multiple orientations. Generally, the responses show a rich repertoire of properties, including frequency inhibition. Berkes and Wiskott (2005) reported more cells tuned to single orientations, but they tested their model with a motion component in the test image, whereas we report our results using a test image with a movement speed of zero. However, when we tested with nonstatic test images using movement speeds of up to four pixels per frame, we obtained equivalent results.
4.2. Relative Modulation Index.
The relative modulation index for the model with the longer trace shows that 93% of the cells can be classified as complex cells, while the shorter trace leads to only 46% complex cells. Switching the input protocol to random patch presentation reveals the importance of a temporally correlated input for the longer trace: the amount of complex cells drops to 40% with the longer trace and remains at 44% with the shorter trace. Thus, the 46% of complex cells found in the model with the short calcium trace can be explained by fluctuations that occur even in random sequences, while the increase to 93% of complex cells is due to the calcium trace learning. Reducing the sequence length in the input (10 consecutive patches) leads to 86% complex cells for the model with the longer trace, less than with longer sequences but substantially more than with random patch presentation or a short trace length. Hence, the exact sequence length (here, 50) has only a marginal influence on the development of complex cells.
4.3. Slowness.
Histogram of the slowness values obtained with τCa,Complex = 10 ms and τCa,Complex = 500 ms. The average slowness of the simulation with the shorter trace is −1.59, with a peak around −2, whereas the simulation with the longer trace has an average slowness of −1.18, with an accumulation at values around −1.
4.4. Spatial Response Images.
The spatial response profile shows clear differences between neurons from the short trace model and neurons from the long trace model. Those of the latter model have more broadly tuned response regions than the neurons from the short trace model. Figure 9 shows exemplary spatial response images for two neurons per model. Each subimage of a neuron shows the spatial responses to phase variations of the preferred stimulus. The complex- and simple-cell characteristics can be seen in the broadness of the response. The model with the longer trace shows much higher invariance to phase shifts of the preferred stimuli than its counterpart.
Spatial response images of four example cells. The first two cells are from the short trace model; the second two are from the longer trace model. Each panel shows the response to the preferred orientation at each spatial position. The different panels vary in the phase of the preferred stimulus. The spatial response images obtained with the model with the longer trace typically show broader response regions and much higher invariance to phase shifts of the preferred stimuli.
5. Conclusion
Invariance is a general property of the processing in the visual cortex and appears to be of fundamental importance for object recognition. It has been shown that cells in IT respond invariantly to a variety of stimulus transformations (Logothetis & Sheinberg, 1996; Tanaka, 1996). Such properties are not rigidly encoded in the visual system; they are a product of learning and adaption in the visual system (Cox, Meier, Oertelt, & DiCarlo, 2005; Li & DiCarlo, 2010).
Here we demonstrated that invariance can be learned from natural images through a biologically plausible learning algorithm at the level of the primary visual cortex, using fixational eye movements to generate input sequences. In our model, V1 complex cells learn largely invariant responses to position and phase while at the same time being selective for orientation. The model with a slowly varying calcium trace develops strongly orientation-selective cells that are predominantly invariant to phase variations of the stimuli and whose responses change only slowly with changes in the environment. Their spatial response regions are broader than those of simple cells. The model with a short trace shows no more invariance than a model trained on a random sequence, indicating that a sufficiently long calcium trace could be a crucial neural correlate for invariance learning. However, even these simulations show that around 40% of the cells can be classified as complex cells. This substantial baseline is consistent with previous reports that residual dependencies in the input can be sufficient for learning invariance (Hyvärinen & Hoyer, 2000, 2001). Our results show that the number of invariant cells increases strongly when temporal correlations are used in learning.
The learning of position and phase invariance while maintaining selectivity in orientation is not trivial. Previous models have often used strongly simplified inputs in the form of artificial bar-like patterns, or only single categories such as faces, to avoid too many fluctuations in the input. Other recent approaches artificially restricted learning to the most activated neuron (Einhäuser et al., 2002; Spratling, 2005). Despite the large variety of responses of SFA (Berkes & Wiskott, 2005) on our data, SFA leads to less orientation tuning than our model.
The invariant, more temporally stable, and highly feature-selective representation of the learned complex cells should facilitate processing in further cortical stages without the loss of important identity information. The loss of the exact retinal position should not be a problem for the visual system, because real-world objects consist of many basic structural elements. Consequently, the proposed model and learning algorithm are a good basis for the development of a more comprehensive model of the visual system; future work has to demonstrate invariance learning on even more complex inputs and, finally, at the level of objects.
Appendix: Feedforward Weight Matrices
As supplementary material, the excitatory feedforward connections of six cells obtained from two simulations, using τCa = 500 ms (see Figure 10) and τCa = 10 ms (see Figure 11), respectively, and four orientations in the input, are presented. The ellipses show the orientation and position of the related subunits (Hoyer & Hyvärinen, 2002). The gray value represents the connection strength, and black denotes the maximum weight value of all cells in the network. For display reasons, two consecutive ellipses are represented by a single ellipse whose value is the mean of both. The connection patterns obtained using τCa = 500 ms show that each cell is highly orientation selective while being evenly connected to all phases over broad regions of visual space. In contrast, the connection patterns obtained using τCa = 10 ms lack these even, broad connections, resulting in the reported oscillating responses to slight stimulus transformations.
Visualization of the feedforward matrices for six cells, obtained using τCa = 500 ms and four orientations in the input.
Acknowledgments
This work has been supported by the German Science Foundation (DFG HA2630/6-1). We thank Pietro Berkes and Laurenz Wiskott for providing the code to generate the circular response test image and for helpful comments on the application of slow feature analysis to our data set.