Abstract

We propose a model for pattern recognition in the insect brain. Starting from a well-known body of knowledge about the insect brain, we investigate which of the potentially present features may be useful to learn input patterns rapidly and in a stable manner. The plasticity underlying pattern recognition is situated in the insect mushroom bodies and requires an error signal to associate the stimulus with a proper response. As a proof of concept, we used our model insect brain to classify the well-known MNIST database of handwritten digits, a popular benchmark for classifiers. We show that the structural organization of the insect brain appears to be suitable for both fast learning of new stimuli and reasonable performance in stationary conditions. Furthermore, the system is extremely robust to damage to the brain structures involved in sensory processing. Finally, we suggest that spatiotemporal dynamics can improve the level of confidence in a classification decision. The proposed approach allows testing the effect of hypothesized mechanisms rather than speculating on their benefit for system performance or confidence in its responses.

1.  Introduction

A foraging moth or bee can visit on the order of 100 flowers in a day. During these trips, the colors, shapes, textures, and odors of the flowers are associated with nectar rewards. The association process is dynamical in nature, and the stored information is perpetually updated to the varying conditions (Smith, Wright, & Daly 2005; Mackintosh, 1974; Rescorla, 1988). The machinery involved in this process is designed to learn fast and reliably. Honeybees, in particular, can associate a stimulus with a reward within just two or three paired presentations in control conditions (Wright & Smith, 2004; Menzel & Bitterman, 1983; Bitterman, Menzel, Fietz, & Schäfer, 1983; Smith, Abramson, & Tobin, 1991). This observation led us to investigate what aspects of the structural organization of the insect brain are most suitable for fast, robust, and efficient formation of associative memories while not jeopardizing reasonable performance after a long training time.

The mushroom bodies (MB) are areas of the insect brain that have been shown to be involved in memory formation (Heisenberg, 2003; Dubnau, Chiang, & Tully, 2003; Menzel, 2001; McGuire, Le, & Davis, 2001; Dubnau, Grady, Kitamoto, & Tully, 2001; Zars, Fischer, Schulz, & Heisenberg, 2000; Mizunami, Weibrecht, & Strausfeld, 1998). The MBs are organized in two modules: the calyx/Kenyon cells (KCs) and the mushroom body lobes (Strausfeld, Hansen, Li, Gomez, & Ito, 1998). The calyx receives and integrates multimodal sensory information (Strausfeld et al., 1998; Heisenberg, 2003), and the mushroom body lobes are involved in memory formation and storage (McGuire et al., 2001; Dubnau et al., 2001; Zars et al., 2000). There is a large number of KCs in the MB: 200,000 in cockroach, 170,000 in the honeybee, 50,000 in locust, and 2500 in the fruit fly Drosophila. This large group of neurons sends afferents to the MB lobes, which contain on the order of a few hundred output neurons.

Corresponding to the difference in cell numbers, the connectivity from the antennal lobe (AL) to the KCs of the MB is highly divergent and the connectivity from KCs to output neurons is highly convergent. In Huerta, Nowotny, Garcia-Sanchez, Abarbanel, and Rabinovich (2004) and Nowotny, Huerta, Abarbanel, and Rabinovich (2005), we analyzed the effects of this structural organization on classification of structured sets of generic stimuli. We showed that random divergence and convergence of connectivity, sparse activity of the neurons in the middle layer, the KCs, and standard Hebbian learning can account for efficient, potentially self-organized classification of the input space. Here, we propose to include reinforcement learning to accomplish fast and efficient classification of predefined pattern classes. In order to prove the efficiency of the structural organization of the insect brain for general pattern recognition, we use the well-known MNIST database of handwritten digits (LeCun & Cortes, 1998). This database contains 60,000 training samples and 10,000 test samples of handwritten digits. Many pattern recognition schemes have been tested on this database to benchmark their abilities in different situations. In Garcia-Sanchez and Huerta (2003) and Huerta et al. (2004), the connectivity to the calyx of the MB was chosen randomly, based on the argument that it is not specifically tuned to a given input space. We argued that such nonspecific connectivity should accommodate the large variety of information types in the multimodal input to the MB (visual, olfactory, tactile, and motor activity) with each modality's own very different statistics. Here, we aim to substantiate this claim by testing our artificial MB, which was developed with olfaction in mind, on the unrelated task of classifying the MNIST database.

To evaluate the performance of the artificial MB as a universal classifier, rather than the quality of preprocessing, we present each of the digits of the MNIST database to our insect brain as is (see Figure 1), even though smart feature extraction in the images is known to improve classification considerably. We furthermore threshold the gray scales into binary representations to take into account that KC processing of projection neuron (PN) activity is likely in the form of coincidence detection of single spikes (Perez-Orive et al., 2002; Assisi, Stopfer, Laurent, & Bazhenov, 2007), which are all-or-none events.

Figure 1:

Diagram of the basic system: 28 × 28 input neurons are randomly connected to NKC= 50,000 Kenyon cells. These are read out by 10 output neurons in the MB lobes. The output neurons are connected to all KCs with fairly homogeneous initial weights. Subsequently these connections are modified by the learning rule described in section 4.4.

The fundamental learning rule is characterized by two parameters, p+ and p−. We analyze a succession of increasingly sophisticated learning mechanisms. In the most basic one, plasticity occurs only if an input elicits a correct response. In this case, connections from active KCs to the correctly firing output neuron are enhanced with probability p+, corresponding to the imperative for this correct output neuron to respond to the features present in the input in question. Synapses from inactive KCs are decreased with probability p−, corresponding to the requirement to disregard features that are not present. This learning rule needs a selective reward signal, which gates plasticity whenever a digit is classified correctly. Only when this signal arrives will connections involved in the response change. The biological basis for the reward signal is found in a class of giant neurons that receive input from the gustatory system (see Figure 1c in Hammer & Menzel, 1998). These neurons release neuromodulators at various locations throughout the insect brain, including the MB, when the honeybee tastes food. Release of octopamine onto the MB replicated the behavior observed during odor conditioning (Hammer & Menzel, 1998; Menzel, 2001).

It may be relevant at this point to note that the proposed learning rule is consistent with the recently found STDP-type plasticity of the synapses in question (Cassenaer & Laurent, 2007). Synapses are potentiated if a presynaptic KC spike is followed by a postsynaptic output spike. The depression part can be interpreted as a general decay of inactive synapses or a depression for unpaired postsynaptic spikes (both conditions for which the STDP rule does not make direct predictions). The learning implemented here, however, differs from a simple STDP rule insofar as it is gated by a reward signal.

In an incremental series of refinements, we added a mechanism for adjusting the KC response profiles, a modification of the input layer through populations of on and off cells, an additional negative error signal for incorrectly recognized inputs, and a correlate of the dynamics of the AL. To understand the role of each ingredient in the learning process, we introduced them one at a time and compared the resulting gains in speed and robustness of performance. As a benchmark and general reference, we also compared the model to support vector machines (SVMs; Cortes & Vapnik, 1995), one of the most successful learning machines. SVMs are known to perform outstandingly well on the MNIST database (Burges & Schölkopf, 1997). This comparison is meant as a guide to judge which characteristics of the insect brain model affect which aspect of performance (e.g., speed, final recognition rate) relative to a standard method.

The letter is organized as follows. First, we describe the structural organization of the system; then we show the effect of different types of learning; next, we investigate the problem of robustness; and finally, we provide the SVM performance as a reference.

2.  State of the Art

Probably the most representative biomimetic approach to solving pattern recognition problems is the Rosenblatt perceptron (Rosenblatt, 1962), a three-layered neural network resembling the MBs of insects. Follow-up work on Rosenblatt's perceptron by Kussul and coworkers (Kussul, Baidyk, Kasatkina, & Lukovich, 2001) 40 years later showed how the Rosenblatt network can achieve competitive performance on the MNIST database. Rosenblatt himself was critical of the abilities of his three-layered perceptron because of the excessive size the system required. With today's powerful computers and the potential offered by massively parallel electronic implementations (Arthur & Boahen, 2007; Indiveri, Chicca, & Douglas, 2006; Vogelstein, Mallik, Culurciello, Cauwenberghs, & Etienne-Cummings, 2007), this is no longer a barrier. The size of the system and the number of neurons may still be important, but they are no longer a critical issue. For example, Kussul and colleagues (2001) emphasize the importance of the size of the middle layer of the Rosenblatt perceptron, but the numbers they propose are easily handled by current PCs at competitive computational speed.

More recently, approaches trying to replicate the structure of the cortex have proved useful for solving complex pattern recognition problems (Johansson & Lansner, 2006; Peper & Shirazi, 1996; Bartlett & Sejnowski, 1998; Amit & Mascaro, 2001). In these approaches, feature extractors are used, in analogy to the visual cortex, whose outputs are associated by means of attractor networks (Hopfield, 1982). Placing an attractor network in the cortex might be optimistic, but the common thread in all these approaches is the use of Hebbian learning and of inhibition as a means to enhance competitive learning.

It is interesting to note that although all the approaches mentioned emphasize the need for local learning rules and smart local feature extraction algorithms in order to parallelize the schemes, the speed of learning is not discussed much. Insects and mammals can learn very quickly compared to the prevalent biomimetic technical systems. One of the aspects we explore here is how a structure similar to the MB of insects allows learning that is both quick and stable.

3.  Biological Basis of the Model

While making certain simplifications, our model is firmly based on biological observations, in particular:

  1. Structurally, the organization of information processing in the brain is in layered, feedforward networks (Abeles, 1991; Diesmann, Gewaltig, & Aertsen, 1999; Hertz & Prügel-Bennet, 1996; Câteau & Fukai, 2001; Nowotny & Huerta, 2003). In particular, early sensory processing is typically organized in a layered fan-out, fan-in structure. In the olfactory system of insects, the ratio of the number of neurons in the antennal lobe to the number of KCs in the MB is on the order of 1:10. The KCs send projections to output neurons in the MB lobes, which are a lot less numerous (Strausfeld et al., 1998; Heisenberg, 2003). The original model of Rosenblatt (1962) already had this structure without a direct motivation from neurobiology.

  2. KCs in the MB calyces rarely fire (Perez-Orive et al., 2002; Szyszka, Ditzen, Galkin, Galizia, & Menzel, 2005) but do so reliably whenever there is sufficient coincident input (Wüstenberg et al., 2004). The combination of sparse activity and the apparent absence of intrinsic dynamics in KCs makes McCulloch-Pitts neurons a valid approximation for their behavior.

  3. The sparse activity in the calyx (Perez-Orive et al., 2002; Szyszka et al., 2005) matches the observation that sparse activity is very effective in artificial associative models (Willshaw, Buneman, & Longuet-Higgins, 1969; Marr, 1969; Palm, 1980; Tsodyks & Feigel'man, 1988; Amari, 1989; Buhman, 1989; Vicente & Amit, 1989; Curti, Mongillo, Camera, & Amit, 2004; Itskov & Abbott, 2008; Ranzato & LeCun, 2007; Lee, Chaitanya, & Ng, 2008), which in our view strengthens the hypothesis that the MBs are associative learning machines.

  4. The connectivity between the AL and the MB is probably unspecific. While the connections from the AL to the protocerebrum appear to have a good amount of similarity across individuals, the investigation of connectivity patterns from the AL to the MB has not revealed any clear structure (Masuda-Nakagawa, Tanaka, & O'Kane, 2005; Wong, Wang, & Axel, 2002).

  5. Behavioral studies have localized the plasticity underlying learning (“the memory trace”) predominantly in the MB (Heisenberg, 2003; Dubnau et al., 2001, 2003; Menzel, 2001; McGuire et al., 2001; Zars et al., 2000; Mizunami et al., 1998).

  6. In support of the behavioral evidence, direct electrophysiological evidence for synaptic plasticity, in the form of spike-timing-dependent plasticity (Gerstner, Kempter, van Hemmen, & Wagner, 1996; Markram, Lübke, Frotscher, & Sakmann, 1997; Bi & Poo, 2001), has recently been discovered in the synapses from the Kenyon cells to the output neurons of the MB (Cassenaer & Laurent, 2007).

  7. This type of local synaptic plasticity differs from common global learning methods such as gradient descent and quadratic programming.

  8. Local inhibition, for example, the local GABAergic neurons recently identified in the MB lobes of bees (Schürmann, Frambach, & Elekes, 2008), may underlie competition among output neurons. Mutual inhibition is the most likely mechanism by which the nervous system can select the proper classifier in a multiclass problem (O'Reilly, 2001).

  9. Synaptic changes do not occur in a deterministic manner (Harvey & Svoboda, 2007). Changes in the maximal conductance of individual synapses may best be described by a stochastic process of transitions between discrete states (Abarbanel, Talathi, Gibb, & Rabinovich, 2005). Axons make additional connections to dendrites of other neurons with some probability. Thus, if new synapses are formed to strengthen a connection between two neurons, the more realistic model is a stochastic process as well. Stochastic learning in neural systems has already been proposed (e.g., in Seung, 2003), and an interesting application can be found in birdsong learning (Fiete, Fee, & Seung, 2007).

  10. Giant neurons that receive gustatory input release octopamine in the MBs when a reward stimulus is presented (Hammer & Menzel, 1998; Menzel, 2001). This observation is the basis of the reinforcement signals in our model.

Our model, as detailed below, is based on these key observations. Starting from this basis, we investigate how refining the information representations, and the learning rules in particular, improves the performance of the system on the MNIST problem. These additions are hypotheses for mechanisms we expect to be found in future experimental studies of insect brains.

4.  System Organization

Following Garcia-Sanchez and Huerta (2003), Huerta et al. (2004), and Nowotny et al. (2005) and as shown in Figure 1, there are four essential elements in our model: (1) a nonlinear expansion of the input digits, x, that resembles the connectivity from the antennal lobe to the MBs; (2) a gain control mechanism in the MB to achieve a uniform level of sparse activity of the KCs, y (but see also section 5.1); (3) a classification stage in which the connections from the KCs to the output neurons, z, are modified according to a Hebbian learning rule with mutual inhibition among the output neurons; and (4) an error signal that determines when and which output neuron's synapses are potentiated or depressed.

4.1.  Mushroom Body Projection.

It has been shown in locusts that the activity patterns in the AL are practically time-discretized by a periodic feedforward inhibition onto the MB calyces, and it is well known that the activity levels in KCs are very low (Perez-Orive et al., 2002). Accordingly, the information is represented by time-discrete, sparse activity patterns in the MB in which each KC fires at most once in each 50 ms local field potential oscillation cycle. This time discretization is enhanced by general properties of random feedforward networks (Nowotny & Huerta, 2003) and, potentially, synaptic plasticity (Nowotny, Zhigulin, Selverston, Abarbanel, & Rabinovich, 2003; Cassenaer & Laurent, 2007). Given the representation in discrete activity “snapshots,” we chose simple McCulloch-Pitts neurons (McCulloch & Pitts, 1943) to represent all neurons in our system. The neural activity values taken by this neural model are binary (0= no spike and 1= spike). More explicitly, the McCulloch-Pitts KCs are described by
\[
y_j \;=\; \Theta\!\left(\sum_{i=1}^{N_{\mathrm{AL}}} c_{ji}\, x_i \;-\; \theta_{\mathrm{KC}}\right)
\tag{4.1}
\]
where the firing threshold θKC is an integer number and the Heaviside function Θ(·) is unity when its argument is positive and zero otherwise. The vector x is the representation of the MNIST digits, consisting of 28 × 28 pixels. It has dimension NAL = 28 × 28 = 784. The components of the vector x = (x1, x2, …, xNAL) are gray tones in the range from 0 to 255 in the original MNIST database. Considering the transmission of these patterns by coincidence detection on individual spikes (Perez-Orive et al., 2002), we thresholded these values to x′i = 0 for xi < 50 and x′i = 1 for xi ⩾ 50 (see the images in Figure 2). This binary representation x′ will be used in the following sections. The state vector y of the KC layer is NKC dimensional, and cji ∈ {0, 1} are the components of the connectivity matrix, which is NAL × NKC in size.
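The thresholding and the KC update of equation 4.1 can be sketched as follows. This is a minimal NumPy sketch with toy sizes; the function names, the reduced NKC, and the toy threshold are ours for illustration only (the full model uses NKC = 50,000 and θKC = 92):

```python
import numpy as np

def binarize(x, cutoff=50):
    """Threshold gray values (0-255) into the binary representation x'."""
    return (x >= cutoff).astype(int)

def kc_response(x_bin, c, theta_kc):
    """Equation 4.1: KC j fires iff sum_i c_ji * x'_i exceeds theta_KC."""
    return (c @ x_bin > theta_kc).astype(int)

# Toy sizes; the paper uses N_KC = 50,000 and theta_KC = 92.
rng = np.random.default_rng(0)
N_AL, N_KC = 784, 1000
c = (rng.random((N_KC, N_AL)) < 0.1).astype(int)  # p_PN->KC = 0.1 Bernoulli
x = rng.integers(0, 256, size=N_AL)               # stand-in for one digit
y = kc_response(binarize(x), c, theta_kc=70)      # binary KC snapshot
```

In the full model, x would be one MNIST image flattened to 784 pixels, and the threshold is set so that only a few percent of KCs fire.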
Figure 2:

Illustrative example of the MNIST handwritten digits. In the upper row are the original gray-scale digits and in the lower row, the thresholded digits in a {0, 1} representation.

It is known that the use of features in the input patterns (e.g., based on the topology of digits or on chemical features of odorants) can improve classification performance (Hinton, Dayan, & Revow, 1997; Belongie, Malik, & Puzicha, 2000; Schmuker & Schneider, 2007). We will not exploit this observation here and will stick to the idea of having a naive and unspecific system that is capable of learning in many different sensory spaces. Our system does not preserve the topology of the digits in the projection from the AL onto the KC neurons. Instead, the connectivity matrix cji is determined randomly by independent Bernoulli processes with probability pPN→KC for each cji to be one and 1 − pPN→KC to be zero, following the same approach as in Garcia-Sanchez and Huerta (2003) and Huerta et al. (2004). The existence of nonspecific connectivity is substantiated in Wong et al. (2002) and Masuda-Nakagawa et al. (2005), where PN neurons appear to connect to different locations in the MBs in different individuals. The choice for the degree of connectivity from the input neurons to the KC neurons was guided by the imperative to avoid information loss from the input to the output (Garcia-Sanchez & Huerta, 2003). In the following, we always worked with a connection probability pPN→KC = 0.1.

4.2.  Gain Control in the Mushroom Body.

It has been shown experimentally that the activity levels in the MBs are very sparse (Perez-Orive et al., 2002; Szyszka et al., 2005). Our theoretical work has also demonstrated that sparse coding is advantageous in an unsupervised learning system (Huerta et al., 2004; Nowotny et al., 2005). Sparse activity in the basic system described above is, however, very unstable against fluctuations in the total number of active input neurons (pixels) due to the strong divergence of the connectivity. This almost precludes a sparse activity that is constant across inputs. A potential mechanism to remove this instability is gain control by feedforward inhibition (Nowotny et al., 2005; Assisi et al., 2007). For our purposes, we predetermined a number nKC of simultaneously active KCs, and allowed spikes only in the nKC neurons that receive the most excitation according to equation 4.1. As we will see, this gain control mechanism becomes unnecessary with the introduction of on and off cells because the number of active inputs (pixels) then becomes exactly the same for all inputs. We removed the gain control at this point in our investigation.
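The winner-selection form of gain control described above is easy to state in code. The following is a sketch under our own naming; `argpartition` is simply one convenient way to keep only the nKC most excited cells:

```python
import numpy as np

def gain_control(excitation, n_active):
    """Allow spikes only in the n_active KCs receiving the most excitation
    (an abstract stand-in for global feedforward inhibition)."""
    y = np.zeros(excitation.shape, dtype=int)
    top = np.argpartition(excitation, -n_active)[-n_active:]
    y[top] = 1
    return y

rng = np.random.default_rng(1)
exc = rng.integers(0, 100, size=500)   # summed synaptic drive per KC
y = gain_control(exc, n_active=25)     # enforce 5% sparse activity
```

This guarantees exactly n_active spikes per input, independent of how many input pixels happen to be active.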

4.3.  Mushroom Body Fan-In.

It is known that the MBs are involved in memory formation and storage (McGuire et al., 2001; Dubnau et al., 2001; Zars et al., 2000). In addition, it was recently discovered that the synapses from the KC neurons to the output neurons exhibit spike-timing-dependent plasticity (Cassenaer & Laurent, 2007). Based on this biological evidence, one can extend the model into the MB lobes as
\[
z_l \;=\; \Theta\!\left(\sum_{j=1}^{N_{\mathrm{KC}}} w^{F}_{lj}\, y_j \;-\; \theta_{\mathrm{LB}}\right)
\tag{4.2}
\]
Here, the index LB denotes the MB lobes. The output vector z of the MB lobes has dimension NLB, and θLB is the threshold for the decision neurons in the MB lobes. The NKC × NLB matrix of filtered connection strengths wFlj is governed by
\[
w^{F}_{lj} \;=\; \frac{1}{1 + \exp\!\big(-(w_{lj} - w_{1/2})/w_{s}\big)}
\tag{4.3}
\]
with underlying integer entries wlj; the midpoint w1/2 and steepness ws are the free parameters of the filter. These underlying synaptic strengths wlj are subject to changes during learning according to a Hebbian-type plasticity rule described in the following section. The sigmoid filter, equation 4.3, ensures that the strength of synapses does not grow unbounded and reflects that biological synapses have limited resources that limit their maximal strength.
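The saturating effect of the filter can be sketched as a logistic squashing of the integer weights. The midpoint and steepness values below are illustrative assumptions on our part; the essential property is only that the output is bounded and monotonic:

```python
import numpy as np

def filter_weights(w, w_mid=7500.0, w_scale=500.0):
    """Sigmoidal squashing of integer weights w_lj into bounded effective
    strengths w^F_lj in (0, 1).  w_mid and w_scale are illustrative
    parameter choices, not values taken from the model."""
    return 1.0 / (1.0 + np.exp(-(w - w_mid) / w_scale))
```

However large wlj grows under potentiation, the effective strength wFlj saturates below 1; however often it is depressed, wFlj stays above 0.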

Every row vector wFl (the entries wFlj for fixed l) of the connectivity matrix defines a hyperplane in the intrinsic KC layer coordinates y: the plane with normal vector wFl. There is a different hyperplane for each MB lobe neuron, and the combinatorial placement of these hyperplanes determines the classification space.

We hypothesize that mutual inhibition exists in the MB lobes and, in joint action with Hebbian learning, is able to organize a nonoverlapping response of the decision neurons. This is often considered a neural mechanism for self-organization in the brain. The combination of Hebbian learning and mutual inhibition has already been proposed as a biologically feasible mechanism to account for learning in neural networks (O'Reilly, 2001; Nowotny et al., 2005).

In connectionist models with McCulloch-Pitts neurons, mutual inhibition is implemented in an abstract way. Similar to the gain control noted above, we allow only the decision neuron that receives the highest synaptic input to fire. One could also allow a population code with groups of nW simultaneously responsive neurons, a classical nW-winner-take-all configuration.
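The abstract winner-take-all readout (the nW = 1 case) can be sketched as follows, with function and variable names of our own choosing:

```python
import numpy as np

def read_out(y, w_f):
    """Abstract mutual inhibition: only the output neuron with the largest
    synaptic drive fires (1-winner-take-all).  w_f is the matrix of
    filtered weights (one row per output neuron), y the binary KC vector."""
    drive = w_f @ y
    z = np.zeros(w_f.shape[0], dtype=int)
    z[np.argmax(drive)] = 1
    return z

rng = np.random.default_rng(2)
y = (rng.random(200) < 0.05).astype(int)   # sparse KC snapshot (toy size)
w_f = rng.random((10, 200))                # 10 output neurons, one per digit
z = read_out(y, w_f)                       # exactly one output neuron fires
```

Replacing `argmax` with the nW largest drives would give the population-code variant mentioned above.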

4.4.  Hebbian and Type I Learning.

The hypothesis of locating Hebbian learning in the mushroom bodies goes back to Montague, Dayan, Person, and Sejnowski (1995), who proposed a predictive form of Hebbian learning that succeeded in explaining the foraging behavior of bees. The problem we address here is related but different because we focus on classification. Nonetheless, the learning mechanism in our model certainly builds on the existence of giant reward neurons that send massive inputs to many parts of the insect brain, including the MBs (Hammer & Menzel, 1995).

Every digit prototype is associated with an output neuron of the MB that represents this digit. During training, a digit is presented to the AL, which elicits an output response. If the response matches the correct digit, then a reinforcement signal will be sent back to apply Hebbian learning. If the output does not match, no learning will be applied.

The plasticity rule is applied to a naive connectivity matrix in which all entries are randomly and independently chosen as wlj ∈ {7500, …, 7502}. This small variation ensures that initially not all neurons have the same inputs and fire together. We tried other random initial distributions and observed differences in learning speed but not in final performance. For example, for initial connections wlj ∈ {6500, …, 8500}, the standard deviation of the performance in 10 independent trials reached a maximum of 0.041 after about 12,000 inputs, while it was only 6 × 10⁻⁴ for the final performance after 7 × 10⁵ inputs.

The inputs are presented to the system in an arbitrary order. The (unfiltered) entries of the connectivity matrix at the time of the nth input are denoted by wlj(n). If the next input leads to a correct output, the updated values wlj(n + 1) are given by the rule
\[
w_{lj}(n+1) \;=\; \big[\, w_{lj}(n) + \Delta w_{lj}(n) \,\big]_{+}
\tag{4.4}
\]
where
\[
\Delta w_{lj}(n) \;=\; z_l(n)\,\Big( y_j(n)\,\Theta\big(p_{+} - \xi\big) \;-\; \big(1 - y_j(n)\big)\,\Theta\big(p_{-} - \xi\big) \Big)
\tag{4.5}
\]
where ξ is a uniformly distributed random variable with values in [0, 1] and [x]+ denotes the positive part of x:
\[
[x]_{+} \;=\;
\begin{cases}
x, & x > 0 \\
0, & x \leqslant 0
\end{cases}
\tag{4.6}
\]
A synaptic connection is strengthened with probability p+ if presynaptic activity is accompanied by postsynaptic activity. The connection is weakened with probability p− if postsynaptic activity occurs in the absence of presynaptic excitation. In the remaining cases, the synapse remains unaltered. This learning rule is inspired by the original ideas of Hebb (1949) and by the observation that postsynaptic activity, and in particular the ensuing changes in intracellular Ca levels, seems to be mandatory for synaptic plasticity to occur (Yang, Tang, & Zucker, 1999; Abarbanel, Gibb, Huerta, & Rabinovich, 2003). The stochastic nature of the learning rule reflects the nondeterministic manner in which synapses are formed (Harvey & Svoboda, 2007).
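The reward-gated stochastic rule can be sketched as follows. This is a toy NumPy version with our own function and variable names; it applies the update only to the winning output neuron and, as in the text, is meant to be called only when the response was correct:

```python
import numpy as np

def type1_update(w, y, z, p_plus, p_minus, rng):
    """One reward-gated stochastic update of the KC->output weights.

    For the winning output neuron(s) (z_l = 1), synapses from active KCs
    are incremented with probability p_plus, and synapses from silent KCs
    are decremented with probability p_minus; weights are clipped at zero
    (the [.]_+ operation).  Call only when the response was correct."""
    w = w.copy()
    post = z.astype(bool)                       # which outputs fired
    pre = y.astype(bool)                        # which KCs fired
    xi = rng.random((int(post.sum()), y.size))  # one draw per synapse
    dw = np.where(pre, (xi < p_plus).astype(int),
                  -(xi < p_minus).astype(int))
    w[post] = np.maximum(w[post] + dw, 0)       # positive part
    return w

rng = np.random.default_rng(3)
w = np.full((10, 200), 7500)                    # naive weights (toy size)
y = (rng.random(200) < 0.05).astype(int)        # sparse KC snapshot
z = np.zeros(10, dtype=int); z[4] = 1           # correct output neuron fired
w2 = type1_update(w, y, z, p_plus=1.0, p_minus=0.2, rng=rng)
```

With p_plus = 1.0 the potentiation branch becomes deterministic, which, as shown in the results, speeds up learning.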

The protocol of learning is as follows: (1) the input is presented in the AL; (2) one output neuron representing a given class or digit wins the response; (3) if the response is correct, a positive reinforcement signal is sent back, and the learning rule is applied; and (4) if the response is not correct, the system does not receive a reinforcement signal and no plasticity takes place. We name this learning mechanism type I learning.

5.  Results

The system is trained with type I learning on the training set, and the performance levels shown in Figure 3 are then obtained directly on the test set. This allows us to track the progression of system performance during training. The most influential and experimentally least constrained parameters in this basic type I learning system are the speeds of potentiation, p+, and depression, p−, at the synapses from the KCs to the MB outputs. They can be interpreted as the speed of learning to pay attention to some feature (encoded by the activity of a given KC) or to disregard it. We measured the performance of the system on the 10,000-digit MNIST test set at logarithmically increasing intervals with respect to the number of presented training examples. Figure 3A shows examples of performance with respect to training time for nine (p+, p−) parameter pairs. The performance is acceptable when p− ⩾ 0.2 p+. Within this constraint, the exact values do not seem to matter too much for the final performance (see Figure 3B). The figures also show clearly that the speed of learning is improved by using a deterministic rule, p+ = 1, for potentiation of connections. The prediction at the experimental level is that positive synaptic changes from the KCs to the MB lobes have to be very effective when concurrent depolarization of intrinsic KCs and output neurons occurs.

Figure 3:

Classification performance after different training stages using type I learning in a system with NKC = 50,000. The performance shown is calculated directly on the 10,000-digit MNIST test set. (A) With an increasing number of inputs seen on the training set, the performance increases for most (p+, p−) pairs. However, if p− is too large, the performance decreases again after more learned examples. (B) The final performance depends on p+ and p− and shows a clear division between successful pairings obeying roughly p− ⩾ 0.2 p+ and unsuccessful ones for smaller p−. One reason for the premature flattening, or even decrease, of the recognition performance after about 1000 seen digits is that some synapses are overtrained. (C) Distribution of synaptic strengths to the neuron representing digit 0 (as an example). There are two peaks, around 0 and around the maximal synaptic strength (note the log scale on the y-axis).

One detrimental factor for the performance became apparent from the response profile of KCs. Some KCs respond to almost any input (see Figure 4), which obviously cannot be helpful for discrimination. Similarly, KCs that never fire, a large proportion of the population, cannot contribute. Thus, we introduce a new ingredient to the system such that KC responses are sparse but nonsilent over time. This requirement is consistent with experimental observations in many neural systems, where neurons are seen to adjust their firing threshold or the efficacy of the incoming synaptic input (Desai, Rutherford, & Turrigiano, 1999; Him & Dutia, 2001) to achieve some average target activity level. In practice, we form the AL → MB connectivity in three stages. We first use independent Bernoulli processes to determine an initial connectivity as before. Then we present inputs to the system (without calculating a final output), average the responses of the KCs over time, and rescale the synaptic input to KCs that fire in response to more than fmax = 10% of the inputs or that fail to fire altogether. The synaptic inputs of KCs that are too active are scaled down by a factor kdown = 0.9, and the synapses of silent KCs are scaled up by kup = 1.1. This process is repeated 25 times, followed by another 25 iterations in which only neurons that fire too often are considered.
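The pretraining stage described above can be sketched as follows. The toy sizes and toy θKC are ours; the parameters fmax, kdown, and kup follow the text:

```python
import numpy as np

def pretrain(c, inputs, theta_kc, f_max=0.10, k_down=0.9, k_up=1.1,
             n_both=25, n_down_only=25):
    """Homeostatic tuning of the AL->KC synaptic strengths.

    c starts as a 0/1 connectivity matrix (one row per KC) and is rescaled
    per KC: rows of KCs firing on more than f_max of the inputs are scaled
    by k_down, rows of silent KCs by k_up.  After n_both iterations, only
    over-active KCs are adjusted for n_down_only further rounds."""
    c = c.astype(float)
    for it in range(n_both + n_down_only):
        rates = ((c @ inputs.T) > theta_kc).mean(axis=1)  # firing fraction
        c[rates > f_max] *= k_down                        # tame busy KCs
        if it < n_both:
            c[rates == 0.0] *= k_up                       # wake silent KCs
    return c

rng = np.random.default_rng(4)
inputs = (rng.random((100, 784)) < 0.2).astype(int)  # toy binary "digits"
c0 = (rng.random((500, 784)) < 0.1).astype(int)      # initial connectivity
c = pretrain(c0, inputs, theta_kc=20)
```

After this loop, no KC should fire on more than fmax of the inputs, mirroring the removal of the fat tail of very responsive cells shown in Figure 4.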

Figure 4:

Change of the response characteristic of KCs due to pretraining. (A) Distribution of the number of active KCs without pretraining. (B) Distribution of the number of active KCs with pretraining. Both distributions were calculated on the training set and are shown on a linear scale (main panels) and a logarithmic scale (insets). Note the reduction of the peak at 0 (cells active <2%, 29,081 before and 1533 after pretraining) and the removed fat tail of very responsive cells (cells with more than 30% response, 1579 before and 0 after pretraining).


The effect of this pretraining adjustment of the AL-MB connectivity on the response profile of the MB is illustrated in Figure 4. The long tail of very active KCs is removed, and the sharp peak at zero response is strongly attenuated. In the typical example shown in the figure, the number of KCs that are active more than 30% of the time drops from 1579 before pretraining to 0 after, and the number of KCs that are active less than 2% of the time drops from 29,081 to 1533.

The effect of this tuning of KC responses on the performance can be seen in Figure 5 for the dark circles (type I learning without pretraining) and gray circles (type I learning with pretraining). The other curves reflect type II learning, explained below, where the dark triangles are without pretraining and the gray ones use pretraining. Overall, the pretraining process substantially helps system performance.

Figure 5:

Performance of the system with the previous set of features (p+ = 0.5, p− = 0.2; black circles), pretrained AL → KC connectivity (gray circles), and on and off cells (p+ = 0.2, p− = 0.05; diamonds) compared to further modified systems. The introduction of “punishment” in the form of depression of synapses connecting active KCs and a wrong active output neuron (type II learning) improves performance for pretrained MBs (p+ = 1, p− = 0.05; gray triangles) but not for the fully random connections (black triangles). Results are shown for pPN→KC = 0.1, NKC = 50,000, and θKC = 92, which leads to an average activity level of about 2% to 5% in the MB. Note that the displayed performance is calculated directly on the 10,000-digit MNIST test set.


5.1.  On and Off Cells.

Encoding the MNIST digits digitally as 0 and 1 and analyzing them with equally digital 0- and 1-valued connections may not extract the most information from the input patterns. For example, there cannot be a KC that specifically fires if certain pixels are not activated. In a sense, we are extracting only the “light” information, not the “dark” parts. Kussul et al. (2001) suggested using representations of +1, −1 and synapses of +1, −1 to overcome this deficit. Nature has found its own solution to the problem: it is well known that in the visual system, so-called on cells, which respond to light in their receptive field, coexist with off cells, which are particularly active when their receptive field does not contain light (Jones & Palmer, 1987; DeAngelis, Ohzawa, & Freeman, 1993). We adopt this view here and introduce a second population of input neurons that responds in the opposite way to the original cells (see Figure 6). Connections are then formed to neurons of both populations in the same unspecific manner as before.

Figure 6:

Different strategies to conserve or extract more information from the original digit. The gray-scale image can be thresholded to give a representation with values +1 and −1 (Kussul et al., 2001; left), or one can use two populations of neurons that respond inversely to each other, loosely equivalent to on and off cells in the visual system (used here, right).


As a free benefit of this on-off cell representation, the number of active cells is constant, removing the need for gain control. We can (and will) therefore use the more primitive system with a fixed firing threshold θKC, which is chosen as θKC = 92 for pPN→KC = 0.1 and NKC = 50,000. In general, we adjust θKC such that typically about 2% to 5% of the KC population is active at any given time, consistent with experimental observations (Perez-Orive et al., 2002; Szyszka et al., 2005).
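The constant-activity property of the on-off code is easy to see in a small sketch (the function name is ours):

```python
import numpy as np

def on_off_encode(binary_image):
    """Concatenate 'on' cells (active where a pixel is set) with 'off' cells
    (active where it is not). Exactly one cell per pixel is active, so the
    total input drive is constant and a fixed KC threshold suffices."""
    on = binary_image.ravel()
    off = 1 - on
    return np.concatenate([on, off])

# v.sum() equals the number of pixels regardless of the digit shown
v = on_off_encode(np.array([[1, 0], [0, 1]]))
```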

In Figure 5, the circles represent the previous result with the 0 and 1 representation and type I reinforcement learning. The diamonds show the performance with on-off cell input. We observe a slight improvement, now reaching about 81% correct recognition. However, learning takes slightly more digit presentations to reach significant levels of performance.

5.2.  Type II Reinforcement: Negative Reinforcement.

Rather than only reinforcing synapses that fire the correct output neuron and thus making it more likely to fire, one can, in addition, decrease the efficacy of synapses that caused a wrong output neuron to fire. This form of additional negative reinforcement can be added to our learning rule, equation 4.5, as follows:
c_kj(t + 1) = c_kj(t) + H(c_kj(t), y_j(t), z_k(t)) + K(c_kj(t), y_j(t), z_k(t)),        (5.1)
where c_kj denotes the strength of the synapse from KC j to output neuron k, y_j the activity of KC j, z_k the activity of output neuron k, and K is given by
K(c, y, z) = −1 with probability p− if y = 1 and z = 1, and K(c, y, z) = 0 otherwise,        (5.2)
and the rule is applied whenever output neuron k fires and is not the correct output. The protocol therefore is as follows: (1) the input is presented in the AL; (2) an output neuron representing a given class or digit wins the response; (3) if the response is correct, a positive reinforcement signal is sent back, and the learning rule H in section 4.4 is applied; and (4) if the response is not correct, a negative signal is sent back, and the rule K above is applied to the active output neuron. We name this learning mechanism type II learning.
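Schematically, one presentation under type II learning can be written as below. The concrete update magnitudes and the stand-in Hebbian step are our own simplifications; the actual probabilistic rules H and K are those of section 4.4 and equation 5.2:

```python
import numpy as np

rng = np.random.default_rng(0)

def type2_step(W, kc, label, p_plus=1.0, p_minus=0.05, dw=0.1, w_max=1.0):
    """One digit presentation with type II reinforcement.

    W: (n_classes, n_kc) KC-to-output synapses; kc: binary KC activity.
    The winner-take-all output is compared with the label: a correct winner
    triggers probabilistic potentiation, a wrong winner probabilistic
    depression of its active synapses.
    """
    winner = int(np.argmax(W @ kc))
    active = kc.astype(bool)
    if winner == label:   # positive reinforcement (Hebbian rule H)
        mask = active & (rng.random(kc.size) < p_plus)
        W[winner, mask] = np.minimum(W[winner, mask] + dw, w_max)
    else:                 # negative reinforcement (rule K)
        mask = active & (rng.random(kc.size) < p_minus)
        W[winner, mask] = np.maximum(W[winner, mask] - dw, 0.0)
    return winner
```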

We expect the additional mechanism of negative reinforcement to help disambiguate confusing inputs. The simulations show that this expectation is correct. The additional negative reinforcement improves the final performance (see Figure 5, triangles). Furthermore, it becomes important only in the late stages of training: at the beginning, the performance is almost unchanged, but it continues to improve where the earlier variants had already flattened.

5.3.  MB Size.

We also reexamined the role of MB size in system performance. From our earlier work (Garcia-Sanchez & Huerta, 2003; Huerta et al., 2004; Nowotny et al., 2005), we expected it to play an essential role in successful classification. This expectation was confirmed (see Figure 7): the performance of the system depends strongly on the MB size.

Figure 7:

Recognition performance as a function of the size of the MB. Note that the performance is calculated directly on the 10,000-digit MNIST test set. Performance depends strictly monotonically on the size, and the most prominent improvements are seen up to NKC = 10,000, even though the performance never fully saturates. This graph reflects that larger MB sizes in insects may correlate with more complex behaviors.


Biologically, the question of MB size is interesting because it is known that insects have quite different cell numbers in their MBs. Notably, honeybees have much larger MBs than Drosophila and, not surprisingly in the light of this investigation, show much more complex behaviors and a larger ability to learn different classes of odors.

5.4.  Robustness.

An important characteristic of biological systems is robustness. The brain can be damaged but still find ways to recover from a malfunction. Resilience is an inherent property of the structural organization of the insect brain: the system as described does not depend on specific feature detectors or identified neurons but is highly parallel, distributed, and redundant. We therefore expect it to be very stable against perturbations, for example, in the form of failing elements (neurons). We tested this hypothesis by randomly removing KCs (see Figure 8, triangles) and input neurons (see Figure 8, diamonds). The system is highly stable against failures of the KCs: a measurable decrease in performance appears only when more than 90% of the KCs are removed. Part of this resilience is that not all KCs are used, so that removing unused cells does not affect performance. If this were the only effect, however, one would expect cases where, by chance, the important cells are removed, leading to much worse performance. We do not observe such an effect: the worst and best performance in 10 independent trials of cell removal (shown as error bars in Figure 8) do not deviate notably from the average. We conclude that it is not just the luck of removing only unimportant cells that makes the system so robust.

Figure 8:

Robustness of the system against removal of cells. The system remains at peak performance beyond 90% of KC removal (triangles), while it is more sensitive to damage of the input space (diamonds). The performance with KC removal is very similar to the performance of systems that have a smaller MB size from the outset (compare to Figure 7; e.g., 99.8% KC removal corresponds to a MB size of 100 KCs). The error bars mark the worst and best performance seen in 10 independent instances of removed cells. This test was run with pPN→KC = 0.1, on and off cells, no gain control in the MB, and temporally sparse KCs with a target activity of 10% responses to inputs. The learning rule was type II (see equation 5.2) with p+ = 1 and p− = 0.05. Note that the displayed performance is calculated directly on the 10,000-digit MNIST test set after the system had learned from the training set.


The impact of removing input cells is more dramatic. The KC firing thresholds have to be adjusted such that the number of active KCs remains roughly the same for different numbers of input neurons. In Figure 8 (diamonds) we can see that the system is less resilient to damage of the input space than to KC removal. This result suggests that, in experiments, the performance of insects should degrade faster for AL damage than for damage to the MB calyces.

5.5.  Dynamics in the Input Space: Investigating the Role of the Antennal Lobe Spatiotemporal Activity.

It is known that early olfactory processing generates spatiotemporal dynamics (Stopfer, Bhagavan, Smith, & Laurent, 1997; Friedrich & Laurent, 2001; Rabinovich et al., 2001; Wehr & Laurent, 1996; Gelperin, 1999; Christensen, Waldrop, & Hildebrand, 1998, 2003; Abel, Rybak, & Menzel, 2001; Meredith, 1986; Wellis, Scott, & Harrison, 1989; Lam, Cohen, Wachowiak, & Zochowski, 2000). There is still an open debate on whether the dynamics of the odor input is really needed for pattern recognition purposes. To shed some light on this question within our framework, we introduced a correlate of AL dynamics and investigated its additional value for classification. As a metaphor for the nonlinear AL dynamics, we transformed the digits with a nonlinear transformation that maps pixels from the center toward the sides. In particular, the gray level ρ(x, y) is added to ρ′(x′, y′), where the displaced coordinates (x′, y′) are given by equation 5.3 and d(x, y) is the maximum of the normed distances from the center in the x or y direction (width w = 28, height h = 28, x = 0 … w − 1, y = 0 … h − 1). Pixels closest to the center are moved most, while those on the outer boundary stay where they are. The resulting new gray scales ρ′ are cut off at a maximum of 255 to obtain the same representation as before.
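As an illustration, a transformation with the stated properties (pixels nearest the center displaced most, boundary pixels fixed, displaced gray levels added and capped at 255) can be sketched as follows; the displacement rule below is our own stand-in, not the exact mapping of equation 5.3:

```python
import numpy as np

def stretch(img):
    """Center-out stretch of a gray-scale image: each pixel is displaced
    outward by an amount that shrinks to zero at the image boundary."""
    h, w = img.shape
    out = np.zeros_like(img)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    for y in range(h):
        for x in range(w):
            # maximum of the normed distances from the center in x or y
            d = max(abs(x - cx) / cx, abs(y - cy) / cy)
            shift = 1.0 - d  # center pixels move most, boundary not at all
            xp = int(round(x + shift * np.sign(x - cx)))
            yp = int(round(y + shift * np.sign(y - cy)))
            xp, yp = min(max(xp, 0), w - 1), min(max(yp, 0), h - 1)
            # add displaced gray levels and cap at 255
            out[yp, xp] = min(out[yp, xp] + img[y, x], 255)
    return out
```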

This process was applied once to give the equivalent of a second snapshot of AL activity and again for a third (see Figure 9). Then, during training, these three input patterns were presented in sequence for each input digit, and the learning rule was applied for each of them. During testing, the three patterns were again presented in sequence, and a recognition decision was made based on a voting rule similar to Freund and Schapire (1999).

Figure 9:

Primitive analog of a dynamical transformation in the AL. Digits are stretched toward the edge of the 28 × 28 cell according to equation 5.3. All three representations are learned within the same system, unlike in boosting techniques.


We compared how unanimous the recognition for the three snapshots was and how often the resulting classification was correct (see Figure 10). When all three snapshots led to the same recognition decision, the decision was true in 93.4% of the cases. If the three snapshots all gave different results, the success rate was at most one-third by construction (22.1% in this case). But if the decision was a 2:1 vote, the rate of correct recognition was 73.5%. In other words, if there is a unanimous result, we can be 93.4% sure that we have recognized the digit correctly; if the decision is 2:1, our chance to be right is about 73.5%; and if we are thoroughly undecided, we know that our chance is only 22.1%. Depending on the importance of the correct recognition, we can therefore adjust our behavior based on this additional confidence measure. We hypothesize based on these results that the temporal dynamics in the AL may be a mechanism to convey a measure of confidence in a decision, in addition to other hypothesized functions of decorrelation of very similar inputs (Galán, Sachse, Galizia, & Herz, 2004; Bazhenov, Stopfer, Rabinovich, Huerta, et al., 2001; Bazhenov, Stopfer, Rabinovich, Abarbanel, et al., 2001) or added robustness against uncorrelated noise in the input signal. The existence and importance of neural correlates of confidence have recently been shown in the orbitofrontal cortex (Kepecs, Uchida, Zariwala, & Mainen, 2008), where neuronal activity, in addition to carrying the outcome of a decision, also appears to provide a measure of the confidence (risk) in the decision.
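The voting scheme and its confidence tags can be sketched in a few lines; the tags and the quoted success rates come from the simulations reported above, and the function name is ours:

```python
from collections import Counter

def vote(predictions):
    """Combine per-snapshot classifications into a decision plus a
    confidence tag derived from the split of the vote."""
    counts = Counter(predictions)
    digit, n = counts.most_common(1)[0]
    if n == len(predictions):
        confidence = "unanimous"   # ~93.4% correct in our runs
    elif n > 1:
        confidence = "majority"    # ~73.5% correct (2:1 vote)
    else:
        confidence = "split"       # ~22.1% correct (all different)
    return digit, confidence
```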

Figure 10:

Integration over several temporal snapshots provides additional information on the level of confidence for a classification decision. If only one snapshot is used, the overall success rate is almost the same as when three are used. However, if two or three snapshots are used and the vote on what digit is recognized is split (1:1 for two snapshots or 2:1 and 1:1:1 for three), we have the additional information that the resulting recognition may be less successful than with a unanimous vote. Panels A and B show how often the voting situations appear and how successful recognition is in each of the situations.


A similar idea of voting on the output of several perceptrons to achieve better performance has been proposed by Freund and Schapire (1999) (the voted-perceptron algorithm). The combination of this idea with the fact that the AL generates spatiotemporal patterns provides a rationale for the hypothesized need for spatiotemporal dynamics in the AL. However, the overall performance does not improve significantly. Rather, one gains, for each output, additional information on the level of confidence in the outcome based on the split of the vote. Note that this output-by-output confidence measure is somewhat different from the classical notion of a margin, which describes the quality of a classifier as a whole.

5.6.  Comparison to Support Vector Machines.

The performance of artificial systems on the MNIST database is already outstanding (LeCun, Bottou, Bengio, & Haffner, 1998; Belongie et al., 2000; Burges & Schölkopf, 1997); it has been suggested that it already surpasses human abilities. Our goal has not been to create yet another system for pattern classification but rather to determine what aspects of the structural organization of the insect brain enable fast yet stable performance. Nevertheless, it is interesting to show what this simplified insect brain that uses reinforcement learning can do in comparison with one of the most successful supervised machine learning devices: support vector machines (SVMs) (Burges & Schölkopf, 1997). We compare both systems under similar conditions by using exactly the same amount of data for both the artificial MB and the SVM. We also do not preprocess the digits in either case beyond thresholding them to binary values. We used the standard libsvm implementation (Chang & Lin, 2007) as a well-tested SVM toolbox and classified data of on-off cell patterns, as described in section 5.1. Figure 11 shows the classification results for a polynomial kernel of order 3, which appeared to give the best results for a wide range of parameters of the training algorithm. The best performance obtained was 93% for the binarized input space analyzed in this letter. Note that SVMs can achieve over 98% success with the full digits and different types of kernels, and with appropriate preprocessing schemes, performances greater than 99% have been achieved.
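A comparison along these lines can be reproduced in outline with any standard SVM library. The sketch below uses scikit-learn's SVC as a stand-in for the libsvm tools and random toy patterns in place of MNIST; the data and labels are our own illustrative construction:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(200, 64))   # binary stand-ins for thresholded digits
# toy labels: which half of the pattern has more active pixels
y = (X[:, :32].sum(axis=1) > X[:, 32:].sum(axis=1)).astype(int)
X_onoff = np.hstack([X, 1 - X])          # on and off channels, as in section 5.1

# polynomial kernel of order 3, as used for the comparison in the text
clf = SVC(kernel="poly", degree=3).fit(X_onoff[:150], y[:150])
acc = clf.score(X_onoff[150:], y[150:])
```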

Figure 11:

Comparison of an SVM trained on partial data sets (squares) and the insect brain after a limited number of seen digits (triangles).


In terms of learning speed, our insect classifier processes each digit only once; only when more than the 60,000 inputs of the database are processed are digits repeated. To compare with partial training on n presented digits, we gave a subset of n training samples to the SVM training. These were then used to minimize a quadratic problem under constraints, as usual for SVMs. In this process, the number of evaluations of each digit in the quadratic optimization will be at least as high as, and probably much higher than, in the one-shot learning of the insect classifier. As expected, the SVM performs slightly better because it is a supervised learning scheme. It serves to provide a hypothetical upper limit on what the artificial insect brain can do. Surprisingly, the SVM does not perform exceptionally well on the binarized on-off cell input.

6.  Conclusion

We have shown that the structural organization of the MB can account for successful classification in the very well-known problem of handwritten digit recognition. The most critical parameters are the learning probability p+ and the different forms of reinforcement signals described in Table 1. We also propose a new interpretation of the role of temporal coding: it can introduce a notion of confidence into individual recognition decisions.

Table 1:
Summary of the Different Types of Learning Signals Included in the Model of the Mushroom Bodies.

Type I: The input is processed, and the output is checked. If it is correct, a reinforcement signal is received, and Hebbian learning is applied to the synapses leading to the active neuron (see section 4.4).
Type II: Type I plus selective elimination of corrupted synapses by negative reinforcement (see section 5.2).

Typically the learning curves have two stages. The first stage is a fast rise of performance up to a saturation value, typically reached after 1000 or fewer input presentations. In the second stage, performance continues to improve, but very slowly. We have shown that the simplest type of reinforcement signal, which we denoted type I reinforcement, leads to sufficiently quick learning, but performance saturates early, after only about 1000 presentations. This is due to an “overload” of information in some of the synapses. Refinements of the MB response profile, the on-off cell representation of the inputs, and the selective removal of wrong synapses help to improve the second phase of learning. These refinements of the basic classification system have not been demonstrated experimentally. However, they are all analogous to known mechanisms in neural systems:

  1. Neurons adjust their excitability analogous to our pretraining.

  2. The notion of a second population of off cells was borrowed from the toolbox of the vision community and is an interesting hypothesis for two reasons. First, the MB is a multimodal integration region: it receives visual input in a separate population of cells in the bee (Mobbs, 1982; Ehmer & Gronenberg, 2002) but in the same population as olfactory input in the ant (Ehmer & Gronenberg, 2004). Second, the identification of an off cell in the olfactory system is not as simple as in the visual system, because it is hard to determine whether a PN is active in response to the activity of a certain ORN type or in response to the quiescence of certain ORNs. The findings of intrinsically active ORNs (de Bruyne, Foster, & Carlson, 2001) and of inhibitory responses of PNs (Laurent, Wehr, & Davidowitz, 1996; see Figure 1) illustrate that information transduction in the AL is not a simple excitatory feedforward pathway.

  3. The existence of giant reward cells in the insect brain suggests that different forms of reinforcement may operate in the system.

The methodology that we followed to increase the level of complexity in the learning procedure was parsimonious: we started from the most plausible or simple mechanisms and proceeded to more complex but still biologically possible ones. We always remained within the limits of what is consistent with the body of knowledge from biological experimentation. One of the most interesting observations we made is that there is a broad range of parameter values of p+ and p−, of connectivities, and of sparseness that leads to satisfactory results (see, e.g., Figure 3B). This indicates that the structural organization of the mushroom bodies is generically very well suited to support classification with Hebbian-type learning.

In addition to good classification abilities, the organization of the mushroom bodies offers a surprising robustness of performance against damage to the system. Part of this robustness may be attributed to simple redundancy in large populations of neurons, but resilience beyond 90% damage to the system exceeds what one would naively expect. We also concluded that the system is more sensitive to damage in the AL than in the MB.

Finally, we proposed a new computational role for temporal dynamics in the neural code. We observed that presenting different activity snapshots over time can convey information on the confidence in the answer of the classifier for each individual input by a simple voting procedure. In general, such voting procedures can be easily implemented by temporal integration and lateral inhibition using the winner-take-all concept. This certainly is a departure from previous suggestions that focused on decorrelation of similar inputs as the main function of temporal dynamics.

More than a decade ago, Montague and collaborators (1995) proposed the need for reinforcement and Hebbian learning in the mushroom bodies of insects. The body of knowledge has grown since then, and it is now possible to acquire multiunit recordings and to genetically manipulate the insect brain with a large variety of techniques. Although most of the work in reinforcement learning has been directed toward the mammalian brain (see, e.g., Schultz, Dayan, & Montague, 1997), we see good prospects for finding the neural substrates and mechanisms of reinforcement learning in the insect brain.

Acknowledgments

R.H. acknowledges partial support by ONR N00014-07-1-0741 and CAM S-SEM-0255-2006. T.N. was partially supported by a RCUK fellowship of the Research Councils of the U.K. and by the Biotechnology and Biological Sciences Research Council (grant number BB/F005113/1).

References

Abarbanel
,
H. D. I.
,
Gibb
,
L.
,
Huerta
,
R.
, &
Rabinovich
,
M. I.
(
2003
).
Biophysical model of synaptic plasticity dynamics
.
Biol. Cybern.
,
89
(
3
),
214
226
.
Abarbanel
,
H. D. I.
,
Talathi
,
S.
,
Gibb
,
L.
, &
Rabinovich
,
M.I.
(
2005
).
Synaptic plasticity with discrete state synapses
.
Phys. Rev. E
,
72
,
031914
.
Abel
,
R.
,
Rybak
,
J.
, &
Menzel
,
R.
(
2001
).
Structure and response patterns of olfactory interneurons in the honeybee
Apis mellifera. J. Comp. Neurol.
,
437
,
363
383
.
Abeles
,
M.
(
1991
).
Corticonics: Neural circuits of the cerebral cortex
.
Cambridge
:
Cambridge University Press
.
Amari
,
S.
(
1989
).
Characteristics of sparsely encoded associative memory
.
Neural Networks
,
2
,
451
457
.
Amit
,
Y.
, &
Mascaro
,
M.
(
2001
).
Attractor networks for shape recognition
.
Neural Comput.
,
13
(
6
),
1415
1442
.
Arthur
,
J.
, &
Boahen
,
K.
(
2007
).
Silicon neurons that inhibit to synchronize
.
IEEE International Symposium on Circuits and Systems, 2007
(pp.
1186
1186
).
Piscataway, NJ
:
IEEE Press
.
Assisi
,
C.
,
Stopfer
,
M.
,
Laurent
,
G.
, &
Bazhenov
,
M.
(
2007
).
Adaptive regulation of sparseness by feedforward inhibition
.
Nat. Neurosci.
,
10
(
9
),
1176
1184
.
Bartlett
,
M. S.
, &
Sejnowski
,
T. J.
(
1998
).
Learning viewpoint-invariant face representations from visual experience in an attractor network
.
Network
,
9
(
3
),
399
417
.
Bazhenov
,
M.
,
Stopfer
,
M.
,
Rabinovich
,
M. I.
,
Abarbanel
,
H. D. I.
,
Sejnowski
,
T.
, &
Laurent
,
G.
(
2001
).
Model of cellular and network mechanisms for odor-evoked temporal patterning in the locust antennal lobe
.
Neuron
,
30
,
569
581
.
Bazhenov
,
M.
,
Stopfer
,
M.
,
Rabinovich
,
M. I.
,
Huerta
,
R.
, &
Abarbanel
,
H. D. I.
,
Sejnowski
,
T.
, et al
(
2001
).
Model of transient oscillatory synchronization in the locust antennal lobe
.
Neuron
,
30
,
553
567
.
Belongie
,
S.
,
Malik
,
J.
, &
Puzicha
,
J.
(
2000
).
Shape context: A new descriptor for shape matching and object recognition
In
T. K. Leen, T. G. Dietterich, & V. Tresp.
(Eds.),
Advances in neural information processing systems
,
13
(pp.
831
837
).
Cambridge, MA
:
MIT Press
.
Bi
,
G.
, &
Poo
,
M.
(
2001
).
Synaptic modification by correlated activity: Hebb's postulate revisited
.
Annu. Rev. Neurosci.
,
24
,
139
166
.
Bitterman
,
M. E.
,
Menzel
,
R.
,
Fietz
,
A.
, &
Schäfer
,
S.
(
1983
).
Classical conditioning of proboscis extension in honeybees (Apis mellifera)
.
J. Comp. Psychol.
,
97
(
2
),
107
119
.
Buhman
,
J.
(
1989
).
Oscillations and low firing rates in associative memory neural networks
.
Phys. Rev. A
,
40
(
7
),
4145
4148
.
Burges
,
C.
, &
Schölkopf
,
B.
(
1997
).
Improving the accuracy and speed of support vector machines
. In
D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo
(Eds.),
Advances in neural information processing systems
,
9
.
Cambridge, MA
:
MIT Press
.
Cassenaer
,
S.
, &
Laurent
,
G.
(
2007
).
Hebbian STDP in mushroom bodies facilitates the synchronous flow of olfactory information in locusts
.
Nature
,
448
(
7154
),
709
713
.
Câteau
,
H.
, &
Fukai
,
T.
(
2001
).
Fokker-Planck approach to the pulse packet propagation in synfire chain
.
Neural Netw.
,
14
(
6–7
),
675
685
.
Chang
,
C.
, &
Lin
,
C.J.
(
2007
).
LIBSVM—A library for support vector machines
, .
Christensen
,
T. A.
,
Lei
,
H.
, &
Hildebrand
,
J. G.
(
2003
).
Coordination of central odor representations through transient, non-oscillatory synchronization of glomerular output neurons
.
P. Natl. Acad. Sci. U.S.A.
,
100
,
11076
11081
.
Christensen
,
T. A.
,
Waldrop
,
B. R.
, &
Hildebrand
,
J. G
(
1998
).
Multitasking in the olfactory system: Context-dependent responses to odors reveal dual GABA-regulated coding mechanisms in single olfactory projection neurons
.
J. Neurosci.
,
18
(
15
),
5999
6008
.
Cortes
,
C.
, &
Vapnik
,
V
(
1995
).
Support vector networks
.
Mach. Learn.
,
20
,
273
297
.
Curti
,
E.
,
Mongillo
,
G.
,
Camera
,
G. L.
, &
Amit
,
D. J.
(
2004
).
Mean field and capacity in realistic networks of spiking neurons storing sparsely coded random memories
.
Neural Comput.
,
16
(
12
),
2597
2637
.
DeAngelis
,
G. C.
,
Ohzawa
,
I.
, &
Freeman
,
R. D.
(
1993
).
Spatiotemporal organization of simple-cell receptive fields in the cat's striate cortex. I. General characteristics and postnatal development
.
J. Neurophysiol.
,
69
(
4
),
1091
1117
.
de Bruyne
,
M.
,
Foster
,
K.
, &
Carlson
,
J. R.
(
2001
).
Odor coding in th
.
Drosophila antenna. Neuron
,
30
,
537
552
.
Desai, N. S., Rutherford, L. C., & Turrigiano, G. G. (1999). Plasticity in the intrinsic excitability of cortical pyramidal neurons. Nat. Neurosci., 2(6), 515–520.
Diesmann, M., Gewaltig, M. O., & Aertsen, A. (1999). Stable propagation of synchronous spiking in cortical neural networks. Nature, 402(6761), 529–533.
Dubnau, J., Chiang, A.-S., & Tully, T. (2003). Neural substrates of memory: From synapse to system. J. Neurobiol., 54(1), 238–253.
Dubnau, J., Grady, L., Kitamoto, T., & Tully, T. (2001). Disruption of neurotransmission in Drosophila mushroom body blocks retrieval but not acquisition of memory. Nature, 411(6836), 476–480.
Ehmer, B., & Gronenberg, W. (2002). Segregation of visual input to the mushroom bodies in the honeybee (Apis mellifera). J. Comp. Neurol., 451, 362–373.
Ehmer, B., & Gronenberg, W. (2004). Mushroom body volumes and visual interneurons in ants: Comparison between sexes and castes. J. Comp. Neurol., 469, 198–213.
Fiete, I. R., Fee, M. S., & Seung, H. S. (2007). Model of birdsong learning based on gradient estimation by dynamic perturbation of neural conductances. J. Neurophysiol., 98(4), 2038–2057.
Freund, Y., & Schapire, R. E. (1999). Large margin classification using the perceptron algorithm. Mach. Learn., 37(3), 277–296.
Friedrich, R. W., & Laurent, G. (2001). Dynamic optimization of odor representations by slow temporal patterning of mitral cell activity. Science, 291, 889–894.
Galán, R. F., Sachse, S., Galizia, C. G., & Herz, A. V. M. (2004). Odor-driven attractor dynamics in the antennal lobe allow for simple and rapid olfactory pattern classification. Neural Comput., 16, 999–1012.
Garcia-Sanchez, M., & Huerta, R. (2003). Design parameters of the fan-out phase of sensory systems. J. Comput. Neurosci., 15, 5–17.
Gelperin, A. (1999). Oscillatory dynamics and information processing in olfactory systems. J. Exp. Biol., 202, 1855–1864.
Gerstner, W., Kempter, R., van Hemmen, J. L., & Wagner, H. (1996). A neuronal learning rule for sub-millisecond temporal coding. Nature, 383(6595), 76–81.
Hammer, M., & Menzel, R. (1995). Learning and memory in the honeybee. J. Neurosci., 15(3 Pt. 1), 1617–1630.
Hammer, M., & Menzel, R. (1998). Multiple sites of associative odor learning as revealed by local brain microinjections of octopamine in honeybees. Learn. Mem., 5, 146–156.
Harvey, C. D., & Svoboda, K. (2007). Locally dynamic synaptic learning rules in pyramidal neuron dendrites. Nature, 450(7173), 1195–1200.
Hebb, D. (1949). The organization of behavior. New York: Wiley.
Heisenberg, M. (2003). Mushroom body memoir: From maps to models. Nat. Rev. Neurosci., 4(4), 266–275.
Hertz, J., & Prügel-Bennett, A. (1996). Learning short synfire chains by self-organization. Network: Comput. Neural Syst., 7, 357–363.
Him, A., & Dutia, M. B. (2001). Intrinsic excitability changes in vestibular nucleus neurons after unilateral deafferentation. Brain Res., 908(1), 58–66.
Hinton, G. E., Dayan, P., & Revow, M. (1997). Modeling the manifolds of images of handwritten digits. IEEE Trans. Neural Netw., 8(1), 65–74.
Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. U.S.A., 79(8), 2554–2558.
Huerta, R., Nowotny, T., Garcia-Sanchez, M., Abarbanel, H. D. I., & Rabinovich, M. I. (2004). Learning classification in the olfactory system of insects. Neural Comput., 16, 1601–1640.
Indiveri, G., Chicca, E., & Douglas, R. (2006). A VLSI array of low-power spiking neurons and bistable synapses with spike-timing dependent plasticity. IEEE Trans. Neural Netw., 17(1), 211–221.
Itskov, V., & Abbott, L. (2008). Pattern capacity of a perceptron for sparse discrimination. Phys. Rev. Lett., 101, 018101.
Johansson, C., & Lansner, A. (2006). A hierarchical brain inspired computing system. In Proc. International Symposium on Nonlinear Theory and Its Applications (NOLTA06) (pp. 599–602). N.p.
Jones, J. P., & Palmer, L. A. (1987). An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. J. Neurophysiol., 58(6), 1233–1258.
Kepecs, A., Uchida, N., Zariwala, H. A., & Mainen, Z. F. (2008). Neural correlates, computation and behavioural impact of decision confidence. Nature, 455(7210), 227–231.
Kussul, E., Baidyk, T., Kasatkina, L., & Lukovich, V. (2001). Rosenblatt perceptrons for handwritten digit recognition. In Proceedings of the International Joint Conference on Neural Networks (IJCNN '01) (Vol. 2, pp. 1516–1520). Piscataway, NJ: IEEE Press.
Lam, Y.-W., Cohen, L. B., Wachowiak, M., & Zochowski, M. R. (2000). Odors elicit three different oscillations in the turtle olfactory bulb. J. Neurosci., 20, 749–762.
Laurent, G., Wehr, M., & Davidowitz, H. (1996). Temporal representations of odors in an olfactory network. J. Neurosci., 16, 3837–3847.
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proc. IEEE, 86(11), 2278–2324.
LeCun, Y., & Cortes, C. (1998). MNIST database.
Lee, H., Ekanadham, C., & Ng, A. Y. (2008). Sparse deep belief net model for visual area V2. In J. C. Platt, D. Koller, Y. Singer, & S. Roweis (Eds.), Advances in neural information processing systems, 20. Cambridge, MA: MIT Press.
Mackintosh, N. J. (1974). The psychology of animal learning. Orlando, FL: Academic Press.
Markram, H., Lübke, J., Frotscher, M., & Sakmann, B. (1997). Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs. Science, 275, 213–215.
Marr, D. (1969). A theory of cerebellar cortex. J. Physiol., 202, 437–470.
Masuda-Nakagawa, L. M., Tanaka, N. K., & O'Kane, C. J. (2005). Stereotypic and random patterns of connectivity in the larval mushroom body calyx of Drosophila. Proc. Natl. Acad. Sci. U.S.A., 102(52), 19027–19032.
McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys., 5, 115–133.
McGuire, S. E., Le, P. T., & Davis, R. L. (2001). The role of Drosophila mushroom body signaling in olfactory memory. Science, 293(5533), 1330–1333.
Menzel, R. (2001). Searching for the memory trace in a mini-brain, the honeybee. Learn. Mem., 8(2), 53–62.
Menzel, R., & Bitterman, M. E. (1983). Learning by honeybees in an unnatural situation. In F. Huber & H. Markl (Eds.), Neuroethology and behavioral physiology (pp. 206–215). New York: Springer.
Meredith, M. (1986). Patterned response to odor in mammalian olfactory bulb: The influence of intensity. J. Neurophysiol., 56(3), 572–597.
Mizunami, M., Weibrecht, J. M., & Strausfeld, N. J. (1998). Mushroom bodies of the cockroach: Their participation in place memory. J. Comp. Neurol., 402(4), 520–537.
Mobbs, P. G. (1982). The brain of the honeybee Apis mellifera. I. The connections and spatial organization of the mushroom bodies. Philos. Trans. R. Soc. Lond. Biol., 289, 309–354.
Montague, P. R., Dayan, P., Person, C., & Sejnowski, T. J. (1995). Bee foraging in uncertain environments using predictive Hebbian learning. Nature, 377(6551), 725–728.
Nowotny, T., & Huerta, R. (2003). Explaining synchrony in feedforward networks: Are McCulloch-Pitts neurons good enough? Biol. Cybern., 89, 237–241.
Nowotny, T., Huerta, R., Abarbanel, H. D. I., & Rabinovich, M. I. (2005). Self-organization in the olfactory system: Rapid odor recognition in insects. Biol. Cybern., 93, 436–446.
Nowotny, T., Zhigulin, V. P., Selverston, A. I., Abarbanel, H. D. I., & Rabinovich, M. I. (2003). Enhancement of synchronization in a hybrid neural circuit by spike timing dependent plasticity. J. Neurosci., 23, 9776–9785.
O'Reilly, R. C. (2001). Generalization in interactive networks: The benefits of inhibitory competition and Hebbian learning. Neural Comput., 13(6), 1199–1241.
Palm, G. (1980). On associative memory. Biol. Cybern., 36(1), 19–31.
Peper, F., & Shirazi, M. N. (1996). A categorizing associative memory using an adaptive classifier and sparse coding. IEEE Trans. Neural Netw., 7, 669–675.
Perez-Orive, J., Mazor, O., Turner, G. C., Cassenaer, S., Wilson, R. I., & Laurent, G. (2002). Oscillations and sparsening of odor representations in the mushroom body. Science, 297(5580), 359–365.
Rabinovich, M., Volkovskii, A., Lecanda, P., Huerta, R., Abarbanel, H. D. I., & Laurent, G. (2001). Dynamical encoding by networks of competing neuron groups: Winnerless competition. Phys. Rev. Lett., 87, 068102.
Ranzato, M., & LeCun, Y. (2007). A sparse and locally shift invariant feature extractor applied to document images. In Proceedings of the International Conference on Document Analysis and Recognition (ICDAR). Piscataway, NJ: IEEE Press.
Rescorla, R. A. (1988). Behavioral studies of Pavlovian conditioning. Annu. Rev. Neurosci., 11, 329–352.
Rosenblatt, F. (1962). Principles of neurodynamics. New York: Spartan Books.
Schmuker, M., & Schneider, G. (2007). Processing and classification of chemical data inspired by insect olfaction. Proc. Natl. Acad. Sci. U.S.A., 104(51), 20285–20289.
Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275(5306), 1593–1599.
Schürmann, F.-W., Frambach, I., & Elekes, K. (2008). GABAergic synaptic connections in mushroom bodies of insect brains. Acta Biol. Hung., 59(Suppl.), 173–181.
Seung, H. S. (2003). Learning in spiking neural networks by reinforcement of stochastic synaptic transmission. Neuron, 40, 1063–1073.
Smith, B. H., Abramson, C. I., & Tobin, T. R. (1991). Conditional withholding of proboscis extension in honeybees (Apis mellifera) during discriminative punishment. J. Comp. Psychol., 105(4), 345–356.
Smith, B. H., Wright, G. A., & Daly, K. C. (2005). Learning-based recognition and discrimination of floral odors. In N. Dudareva & E. Pichersky (Eds.), Biology of floral scent (pp. 263–295). Boca Raton, FL: CRC Press.
Stopfer, M., Bhagavan, S., Smith, B. H., & Laurent, G. (1997). Impaired odour discrimination on desynchronization of odour-encoding neural assemblies. Nature, 390(6655), 70–74.
Strausfeld, N. J., Hansen, L., Li, Y., Gomez, R. S., & Ito, K. (1998). Evolution, discovery, and interpretations of arthropod mushroom bodies. Learn. Mem., 5(1–2), 11–37.
Szyszka, P., Ditzen, M., Galkin, A., Galizia, C. G., & Menzel, R. (2005). Sparsening and temporal sharpening of olfactory representations in the honeybee mushroom bodies. J. Neurophysiol., 94(5), 3303–3313.
Tsodyks, M. V., & Feigel'man, M. V. (1988). The enhanced storage capacity in neural networks with low activity level. Europhys. Lett., 6, 101–105.
Vicente, C. J. P., & Amit, D. J. (1989). Optimised network for sparsely coded patterns. J. Phys. A, 22(5), 559–569.
Vogelstein, R. J., Mallik, U., Culurciello, E., Cauwenberghs, G., & Etienne-Cummings, R. (2007). A multichip neuromorphic system for spike-based visual information processing. Neural Comput., 19(9), 2281–2300.
Wehr, M., & Laurent, G. (1996). Odor encoding by temporal sequences of firing in oscillating neural assemblies. Nature, 384, 162–166.
Wellis, D. P., Scott, J. W., & Harrison, T. A. (1989). Discrimination among odorants by single neurons of the rat olfactory bulb. J. Neurophysiol., 61(6), 1161–1177.
Willshaw, D. J., Buneman, O. P., & Longuet-Higgins, H. C. (1969). Nonholographic associative memory. Nature, 222(5197), 960–962.
Wong, A. M., Wang, J. W., & Axel, R. (2002). Spatial representation of the glomerular map in the Drosophila protocerebrum. Cell, 109, 229–241.
Wright, G. A., & Smith, B. H. (2004). Different thresholds for detection and discrimination of odors in the honey bee (Apis mellifera). Chem. Senses, 29(2), 127–135.
Wüstenberg, D. G., Boytcheva, M., Grünewald, B., Byrne, J. H., Menzel, R., & Baxter, D. A. (2004). Current- and voltage-clamp recordings and computer simulations of Kenyon cells in the honeybee. J. Neurophysiol., 92, 2589–2603.
Yang, S. N., Tang, Y. G., & Zucker, R. S. (1999). Selective induction of LTP and LTD by postsynaptic [Ca2+]i elevation. J. Neurophysiol., 81(2), 781–787.
Zars, T., Fischer, M., Schulz, R., & Heisenberg, M. (2000). Localization of a short-term memory in Drosophila. Science, 288(5466), 672–675.