Abstract

The model organism Drosophila melanogaster and the mosquito Anopheles gambiae use 60 and 79 odorant receptors, respectively, to sense their olfactory world. However, a commercial “electronic nose” in the form of an insect olfactory biosensor demands a very low number of receptors at its detection front end because of the difficulties of receptor/sensor integration and functionalization. In this letter, we demonstrate how computation via artificial neural networks (ANNs), in the form of multilayer perceptrons (MLPs), can be successfully incorporated as the signal processing back end of the biosensor to drastically reduce the number of receptors to three while still retaining 100% of the odorant detection performance of a full complement of receptors. In addition, we provide a detailed performance comparison between D. melanogaster and A. gambiae odorant receptors and demonstrate that A. gambiae receptors provide superior olfaction detection performance over D. melanogaster for very low receptor numbers. The results from this study present the possibility of using the computation of MLPs to discover ideal biological olfactory receptors for an olfactory biosensor device to provide maximum classification performance of unknown odorants.

1  Introduction

There is an inherent need for fast and reliable odor recognition techniques. Current sensors are not always reliable, and with human testers there is the possibility of inconsistent sample evaluation (Baldwin, Bai, Plotto, & Dea, 2011). An olfactory biosensor based on the olfactory system of model organisms can provide an alternative to current odor recognition techniques (Baldwin et al., 2011). In this letter, we aim to use artificial neural networks (ANNs) in the form of a multilayer perceptron (MLP) as a signal processing back end of an olfactory biosensor. We use the MLP to compare and contrast the performance of Anopheles gambiae odorant receptors (AgOrs) to Drosophila melanogaster odorant receptors (DmOrs) as the sensory array system of the biosensor to identify the chemical class that an odorant belongs to. In our previous work, we have demonstrated that ANNs can be used for discriminating classes of odorants using chemical descriptor values (Bachtiar, Unsworth, Newcomb, & Crampin, 2011a) and from DmOr responses (Bachtiar, Unsworth, Newcomb, & Crampin, 2011b; Bachtiar, Unsworth, Newcomb, & Crampin, 2013; Unsworth, Bachtiar, Newcomb, & Crampin, 2011). In this work, we investigate the application of such techniques using the AgOr data.

Though it has proven to be a difficult insect for electrophysiological studies (van den Broek & den Otter, 1999), the mosquito A. gambiae is increasingly studied in insect olfaction because of its role as a malaria vector (Couto, Alenius, & Dickson, 2005). The fruit fly D. melanogaster is a model organism and more widely used to study insect olfaction (Beshel & Zhong, 2013; Carey, Wang, Su, Zwiebel, & Carlson, 2010; Carlson, 1996; Clyne, Grant, O’Connell, & Carlson, 1997; de Bruyne, Foster, & Carlson, 2001). One particular advantage for using the D. melanogaster model is the ability to perform behavioral assays and in vivo physiological recordings (Anderson, Michalski, Michalski, Carbonell, & Mitchell, 1986; Beshel & Zhong, 2013; Carlson, 1996; de Bruyne et al., 2001; Dobritsa, van der Goes van Naters, Wynand, Warr, Steinbrecht, & Carlson, 2003; Hallem & Carlson, 2006; Leal, 2013). In this letter, we aim to compare and contrast the odorant classification performance of the MLP system when applied to the odor responses of A. gambiae and D. melanogaster.

For a commercial biosensor, minimizing the number of receptors for low-cost production would be desirable (Carlson, 1996). In this study, we also investigate the effect of reducing the number of receptors used for processing, which will allow us to identify certain AgOrs and DmOrs that may be particularly informative (Bachtiar et al., 2013). We aim to reduce the number of receptors used in the biosensor while maintaining a satisfactory degree of odorant classification performance. Nowotny, Berna, Binions, and Trowell (2013), in particular, found that larger sensory arrays used for odor identification may not necessarily produce the best results. By identifying combinations of insect odorant receptors (Ors) that perform well in combination, we present the possibility of using an efficiently minimized sensory array together with an MLP system as a signal processing back end for the development of an effective olfactory biosensor.

1.1  The Olfactory System of Anopheles gambiae

The olfactory system of A. gambiae resides within the antennae and maxillary palps, which are populated by sensilla that house the olfactory receptor neurons (ORNs; see Figure 1). The ORNs encode qualitative, quantitative, temporal, and spatial information about the odors in the environment (Su, Menuz, & Carlson, 2009). Many behaviors of A. gambiae are mediated by olfaction (Zwiebel & Takken, 2004), including host seeking and feeding. Olfactory signals are recognized as the most important group of external stimuli affecting mosquito host-seeking behavior (Takken & Knols, 1999; Zwiebel & Takken, 2004).

Figure 1:

Illustration of the Anopheles gambiae head. The antennae and maxillary palps are involved in odorant sensing (Inscent, 2006). The magnification of the maxillary palps shows how their surface is covered with hair-like sensilla (Benton, Sachse, Michnick, & Vosshall, 2006).

Many complex biochemical and electrophysiological processes make up olfactory perception. Odorant detection begins with the volatile odorant molecules entering the sensilla through numerous pores or slits. The odorant then binds to an odorant binding protein, which produces a solubilized complex that is transported to Ors on the surface of ORN dendrites, which extend into the sensillar lymph (Leal, 2013). Insect Ors contain a seven-transmembrane-domain subunit that determines their odorant specificity and a ubiquitous subunit called the odorant receptor coreceptor (Orco). Recent studies have shown that the Or-Orco complex acts as an odorant-gated ion channel (Sato et al., 2008; Smart et al., 2008; Wicher et al., 2008). On activation by an odorant stimulus, the channel opens, which leads to changes in the cell membrane potential. Odor information is then transmitted in the form of electrical signals to the antennal lobe of the brain, where it is processed and then transmitted to higher brain centers (Wilson, 2013).

Activation of different subsets of Ors by odor molecules is the basis for neural coding (Hallem, Ho, & Carlson, 2004; Leal, 2013; Zwiebel & Takken, 2004). Electrophysiological studies have found that different classes of ORN are functionally distinct, exhibiting characteristic spontaneous firing rates and odorant specificities. ORNs typically express a single Or. The functional properties of Ors have been studied using a mutant D. melanogaster “empty” ORN lacking a functional receptor. An Or is ectopically expressed in this empty neuron using recombinant gene technology, and the responses exhibited by this receptor are then measured electrophysiologically (Carey et al., 2010; Carey & Carlson, 2011; Clyne et al., 1997; de Bruyne, Clyne, & Carlson, 1999; de Bruyne et al., 2001; Dobritsa et al., 2003; Hallem et al., 2004; Lu et al., 2007). These studies have shown that the odorant specificities of ORNs are conferred by the Or expressed in each ORN.

Figure 2 presents a schematic of how an olfactory biosensor can potentially emulate the olfactory processes of an insect. In a model organism such as A. gambiae, ligands of the volatile chemical odorants interact with their conjugate receptor expressed on the surface of ORNs (see Figure 2a). Formation of the odorant-receptor complex induces a change in membrane potential of the ORN, and this information propagates to the glomerulus in the antennal lobe. Clusters of ORNs that express the same receptor converge on a single glomerulus. A single projection neuron connects the glomerulus to the higher brain (the mushroom bodies and lateral horn in insects) for information processing that results in odorant discrimination. Changes in ORN activity, that is, increases or decreases of the firing rate, are used in the biosensor system (see Figure 2b). Similar to the model organism, the olfactory biosensor employs a sensory array of Ors to detect the volatile chemical odorants. The change in membrane potential of the mutant olfactory receptor neurons (mORNs) is recorded and processed by an algorithm. In this work, we use an ANN in the form of an MLP to process the mORN responses in order to classify odorants into their chemical class.

Figure 2:

Olfactory discrimination by (a) an organism such as the Anopheles gambiae and (b) olfactory biosensor. (a) In the organism, the volatile odorants bind to a corresponding odorant receptor (Or) located on the surface of the olfactory receptor neurons (ORNs). The resulting change in cell membrane potential is propagated to the corresponding glomerulus (G) of the ORN cluster. A single projection neuron transmits the signal to the higher brain centers and is translated into behavior of the organism. (b) Volatile odorants induce electrophysiological responses of the mutant olfactory receptor neurons (mORNs) in the biosensor, which are fed into the multilayer perceptron (MLP) for analysis and allow for odorant classification.

1.2  Analyzing ORN Responses to Volatile Odorants

The different pattern recognition methods used in the field of electronic noses can be divided into two main categories: those that discriminate among classes, such as linear discriminant analysis (LDA) and partial least squares discriminant analysis (PLS-DA), and those that aim toward modeling classes, such as unequal dispersed classes (UNEQ) and soft independent modeling of class analogy (SIMCA) (Berrueta, Alonso-Salces, & Héberger, 2007; Massart et al., 1997). Studies (Mazzatorta, Benfenati, Lorenzini, & Vighi, 2004; Todeschini, Ballabio, Consonni, Mauri, & Pavan, 2007) have found that discriminant analysis has classification abilities comparable to those of class-modeling methods (Boyle, McInally, & Ray, 2013). Particular success has been achieved in the application of discriminant analysis techniques to artificial nose studies (Boyle et al., 2013; Li, Heinemann, & Sherry, 2007; Marco & Gutierrez-Gálvez, 2009, 2012; Raman, Stopfer, & Semancik, 2011; Tauxe, MacWilliam, Boyle, Guda, & Ray, 2013; Yongwei, Wang, Zhou, & Lu, 2009). Another powerful tool that addresses the underlying multiclass classification problem of odorant detection is the support vector machine (SVM) (Cortes & Vapnik, 1995; Huerta & Nowotny, 2009; Huerta, Vembu, Amigó, Nowotny, & Elkan, 2012). SVMs present high generalization capability, and with the possibility of employing a kernel trick, they have proven to be a very popular machine learning technique. Particular success is found in bio-inspired olfactory detection models (Cortes & Vapnik, 1995; Distante, Ancona, & Siciliano, 2003; Galán, Sachse, Galizia, & Herz, 2004; Huerta et al., 2012; Pardo & Sberveglieri, 2005). In this work, we employ artificial neural networks (ANNs) to discriminate odorants based on their chemical classes. The perceptron is one of the earliest forms of ANNs (Rosenblatt, 1958) and was inspired by the activity of the biological neural network system (McCulloch & Pitts, 1943). Increasingly sophisticated ANNs have since been developed, leading to ANNs abstracting from the original biological model. However, biologically inspired models are still used (Huerta, Nowotny, García-Sanchez, Abarbanel, & Rabinovich, 2004; Huerta & Nowotny, 2009).

2  Motivation

Preliminary studies were performed in which we successfully identified the chemical classes of odorants using an MLP trained on the response of D. melanogaster Ors (Bachtiar et al., 2011b, 2013; Bachtiar, Unsworth, & Newcomb, 2014a, 2014b) and MLPs trained on chemical descriptor values (Bachtiar et al., 2011a). The motivation for this work is to provide support for the development of an olfactory biosensor through the design of a processing system and proposal of an efficient sensory array. By analyzing A. gambiae Ors as a possible sensor compared to D. melanogaster Ors, we produced an MLP system that operates efficiently with the sensory array of choice to provide high accuracy odorant classification.

3  Methods

3.1  Data Used

In this study, we used the responses of 50 AgOrs to 110 different volatile compounds that were measured by Carey et al. (2010) through the use of the “empty neuron” system and extracellular electrophysiological recording methods (Dobritsa et al., 2003; Hallem et al., 2004). We represent an odor as a vector where each component within the vector is defined as the spontaneous firing rate response of ORN type (Laurent et al., 2001). These odorant response vectors are fed into the ANN to be analyzed. We will compare the results obtained using AgOrs to our previous work that involved DmOrs (Bachtiar et al., 2011a, 2011b, 2013). Further tests that examine the MLP classification performance difference between using AgOrs and DmOrs will also be presented.

The A. gambiae data set used in this work is presented in Table 1. A representative chemical of each chemical class is included to show the range of excitatory and inhibitory responses of the AgOrs. The data set includes 50 AgOr responses to 110 different chemical odorants, which belong to 12 chemical classes. The “Other” class consists of various odors released by A. gambiae hosts such as carbon dioxide and 1-octen-3-ol. These chemicals allow A. gambiae to target hosts for feeding (Mboera & Takken, 1997). From previous work (Bachtiar, Unsworth, & Newcomb, 2013), we have found that these chemicals do not possess similar characteristics and as such are difficult to classify. Furthermore, as the amine, lactone, sulfur compounds, and other chemical classes, highlighted in bold in Table 1, do not have enough chemical odorants for processing, these classes were removed from our analysis.

Table 1:
Firing Rate Responses of Representative Chemicals from the Various Chemical Classes of the Anopheles gambiae Data Set.
   Odorant Receptor 
Chemical Class Total Odorants Chemical Odorant AgOr1 AgOr2 AgOr76 
Amines 3 Ammonia −1 1 4 
Lactones 2 G-decalactone 2 4 −4 
Carboxylic acids 24 Propanoic acid −6 
Sulfur compounds 1 Dimethyl sulfide −19 6 −7 
Terpenes Geraniol −2 
Aldehydes Hexanal −13 −10 
Ketones Acetone −11 −3 
Aromatics 18 Phenol 234 170 −10 
Heterocyclics Thiazole 26 −14 
Alcohols 15 Methanol 
Esters 11 Ethyl acetate −11 −5 
Other Heptane −2 −2 −5 

Notes: The “Chemical Odorant” column contains a representative member of the chemical class and the corresponding firing rates of three odorant receptors: AgOr1, AgOr2, and AgOr76. Due to the shortage of odorants in the amine, lactone, sulfur compound, and Other chemical classes, highlighted in bold, they were not included in this study. Positive values indicate excitatory responses, whereas negative values indicate inhibitory responses. Spontaneous firing rates have been subtracted and responses to diluents have not (Carey et al., 2010). Odorants containing both a phenol ring and an ester moiety are classified as aromatics; terpenes containing an ester moiety are classified as terpenes.

The study by Carey et al. (2010) follows the empty neuron method outlined by Hallem and Carlson (2006), where an insect Or response is recorded in a D. melanogaster mutant olfactory receptor neuron (mORN). The work by Carey et al. (2010) includes 34 odorants that were also used in the study of D. melanogaster Ors by Hallem and Carlson (2006). By establishing a common odorant data set, we are able to provide a performance comparison between the Or repertoires of the two insects. Table 2 presents the common odorant data set, including the chemical classes of the 34 common odorants used in this work. A representative chemical from each class is included to illustrate the different responses of the AgOrs and DmOrs; receptor responses to the representative chemicals are presented in their respective positions (x, x, x, and x) in the MLP input layer. It should be noted that only 24 DmOrs were available, whereas 50 AgOrs were available in the A. gambiae data set.

Table 2:
Firing Rate Values of Representative Chemicals from the Common Chemical Classes of the Anopheles gambiae (AgOr) and Drosophila melanogaster (DmOr) Data Sets.
Chemical Class Total Odorants Chemical Odorant Insect Receptor Input Layer Position (x, x, x, x)
Acids Propanoic acid AgOr −6 
   DmOr 11 NA 
Ketones Acetone AgOr −11 −4 −3 
   DmOr −10 −1 NA 
Aromatics Methyl benzoate AgOr −6 −3 18 −4 
   DmOr −27 226 NA 
Alcohols Methanol AgOr 
   DmOr 10 22 NA 
Esters Ethyl acetate AgOr −11 −5 −5 
   DmOr −3 23 NA 

Notes: Common chemicals used in the studies conducted by Carey et al. (2010) and Hallem, Dahanukar, and Carlson (2006) allow for a comparison of the two insect Or data sets. Odorant receptor responses toward the representative chemicals are presented in their respective positions (x, x, x, and x) in the MLP input layer. Only 24 DmOrs were available; thus NA is listed under position x50 for D. melanogaster, in comparison to A. gambiae, where 50 AgOrs were available.

Prior to ANN testing, the data were preprocessed to improve the network’s performance (Huang, Tan, & Tang, 2004). The preprocessing involved de-meaning and normalization, which improves network learning and prediction (Bishop, 1994; Sola & Sevilla, 1997). The statistical normalization method employed in particular modifies large raw values such that they cluster in an appropriate area, which further enhances the learning rate of the ANN (Sola & Sevilla, 1997). Statistical normalization involves subtracting the mean and dividing by the standard deviation as follows (Bachtiar et al., 2013):
\[
x_i = \frac{X_i - \mu}{\sigma} \tag{3.1}
\]
where $X_i$ is the raw value of the $i$th training vector, $\mu$ the mean, $\sigma$ the standard deviation, and $x_i$ the calculated normalized value. The resulting data are centered about zero, with a zero mean and a standard deviation of 1.
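The statistical normalization of equation 3.1 can be illustrated with a minimal Python sketch. The function name, the sample values, and the choice of the population (rather than sample) standard deviation are assumptions made for illustration; the text does not state which form was used or along which axis of the response matrix the statistics were computed.

```python
import statistics

def normalize(values):
    """Statistically normalize raw values to zero mean and unit standard deviation."""
    mean = statistics.fmean(values)
    # Assumption: population standard deviation; the sample form would also be plausible.
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("cannot normalize a constant vector")
    return [(v - mean) / sd for v in values]

# Hypothetical raw firing rate responses (spikes/s) of one receptor to five odorants.
raw = [-6.0, -11.0, 234.0, 26.0, -2.0]
scaled = normalize(raw)
```

After this step the large excitatory response no longer dominates the scale of the input, which is the effect on learning that the normalization is meant to provide.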

3.2  The Multilayer Perceptron Artificial Neural Network and the Learning Scheme Employed

The ANN model used in this study is the multilayer perceptron (MLP). It is a feedforward network in which an input vector, the Or array response profile to an odorant, is given to the input layer and passed through the subsequent hidden layer(s) and output layer. An investigation of different MLP architectures was performed by alternating between single and double hidden layers and by combining a series of MLPs into a hybrid MLP system. A simple schematic of each MLP system is shown in Figures 3a to 3d. The output vector $(c_1, \ldots, c_n)$ of the multineuron output layer, single MLP system (see Figures 3a and 3b) represents the classification performance of the validation set. Each value of the vector corresponds to the classification of a validation vector of a specified chemical class. This value is then converted to classification (%) accuracy. The calculation method is further discussed in section 3.3 (Bachtiar et al., 2013). The hybrid MLP system (see Figures 3c and 3d) consists of multiple single-neuron output MLPs in series, wherein each MLP of the hybrid system corresponds to a chemical class. These hybrid MLP structures have been found to be beneficial to multiclass problems such as the one addressed in this work (Bachtiar et al., 2011b, 2013; Gardner, Hines, & Wilkinson, 1990; Longstaff & Cross, 1987). Machine learning techniques, namely support vector machine methods such as one-against-all, one-against-one, and the directed acyclic graph SVM (DAGSVM), have also been found useful for addressing multiclass problems (Acevedo, Maldonado, Dominguez, Narvaez, & Lopez, 2007; Hsu & Lin, 2002; Weston & Watkins, 1998).
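The difference between the single MLP system and the hybrid MLP system can be illustrated by how the training targets might be laid out. The following Python sketch is an assumption for illustration only: the 0/1 target encoding, the function names, and the example labels are hypothetical and are not taken from the original implementation.

```python
def one_hot_targets(labels, classes):
    """Targets for a single MLP with one output neuron per chemical class."""
    return [[1.0 if label == c else 0.0 for c in classes] for label in labels]

def hybrid_targets(labels, classes):
    """One list of scalar targets per class, for a bank of single-output MLPs."""
    return {c: [1.0 if label == c else 0.0 for label in labels] for c in classes}

# Chemical classes of the common odorant data set; the three labels are hypothetical.
classes = ["acid", "ketone", "aromatic", "alcohol", "ester"]
labels = ["ketone", "ester", "acid"]

single = one_hot_targets(labels, classes)   # one row per odorant
hybrid = hybrid_targets(labels, classes)    # one entry per class-specific MLP
```

In the hybrid layout each class-specific MLP only has to separate its own class from all others, which is one plausible reading of why such structures help with multiclass problems.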

Figure 3:

Schematic of the different multilayer perceptron (MLP) systems used: (a) single hidden layer, single MLP system; (b) double hidden layer, single MLP system; (c) single hidden layer, hybrid MLP system; and (d) double hidden layer, hybrid MLP system. The input layer corresponds to the input vector of receptor responses to odorants, the hidden layer(s) have different numbers of hidden neurons according to the best-performing length, and the output layer corresponds to the output values of the different chemical classes. The hexagons represent the weighting functions belonging to the different layers. (e) A representative output of the change in network error and (f) a representative illustration of the change in weighting value over the defined 200-epoch training period.

As illustrated by the hexagons in Figure 3, each layer of the MLPs contains weighting functions that were drawn from a symmetric gaussian distribution with a zero mean and variance of unity. A sigmoid activation is applied to the input vector-weighting function complex to provide an output value from 0 to 1; the calculation at each neuron of the MLP is as follows:
\[
y_j^{l+1} = f\left( \sum_{k=1}^{m} w_{jk}^{l+1} \, x_k^{l} \right) \tag{3.2}
\]
where $x_k^{l}$ is the $k$th input vector of layer $l$ and $w_{jk}^{l+1}$ is the weighting function of the $j$th hidden neuron of layer $l+1$. A summation of all $m$ input vector and weighting function complexes is presented to the sigmoid activation function $f$ to produce $y_j^{l+1}$, the $j$th output of layer $l+1$. Small weighting values were initially chosen to optimize a momentum function that is known to improve network generalization by preventing the MLP from idling at local minima (Churchland & Sejnowski, 1992). It should be noted that each MLP of the hybrid system is independent (see Figures 3c and 3d), possessing an independent set of weighting functions.
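The per-neuron calculation described above (a weighted sum of the inputs passed through a sigmoid) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the layer sizes, the input values, the random seed, and the absence of bias terms are choices made here and are not specified in the text.

```python
import math
import random

def sigmoid(v):
    """Logistic sigmoid, mapping any real input to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def init_weights(n_out, n_in, rng):
    """Draw one weight per connection from a zero-mean, unit-variance gaussian."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]

def layer_forward(weights, inputs):
    """Output of each neuron: sigmoid of the weighted sum of its inputs."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]

rng = random.Random(0)                 # fixed seed, for reproducibility only
x = [0.5, -1.2, 0.3]                   # hypothetical normalized responses of three Ors
w_hidden = init_weights(4, 3, rng)     # 3 inputs -> 4 hidden neurons (assumed sizes)
w_out = init_weights(2, 4, rng)        # 4 hidden neurons -> 2 output neurons
hidden = layer_forward(w_hidden, x)
outputs = layer_forward(w_out, hidden)
```

Because every layer ends in a sigmoid, each entry of `outputs` lies strictly between 0 and 1, matching the output range stated in the text.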

Learning ability can be regarded as a fundamental characteristic of intelligence, and in the context of ANNs, the learning process is observed by updating the network architecture via the weighting functions. Implementation of machine learning techniques has been found to improve odorant identification prediction (Tauxe et al., 2013). The learning paradigm used in this study is the supervised learning method that involves presenting a correct answer or output to the network for every input pattern. We employed a backpropagation algorithm as a learning scheme for the MLP system. It is a gradient descent algorithm that operates by adjusting the initially randomly chosen weights based on the error calculated at each neuron. This learning scheme can be said to have been derived from the Hebbian learning rule, which is based on the observation of neurobiological experiments that if neurons on both sides of a synapse are activated synchronously and repeatedly, the synapse’s strength is selectively increased (Bachtiar et al., 2011a; Jain, Mao, & Mohiuddin, 1996). Learning is performed locally where the change in the weighting functions depends on only the activities of the neurons connected by it. Backpropagation is weak against local minima, which may appear on the error surface of a network with hidden units (Churchland & Sejnowski, 1992). A momentum function is included to enhance the gradient descent algorithm, allowing for a more stabilized network, preventing it from stagnating at local minima (Churchland & Sejnowski, 1992). The degree of change is based on the feedback signal (see Figure 4), calculated from the cumulative error between the network output signal and target signal. Backpropagation essentially involves finding the network weights that minimize the network cumulative error.

Figure 4:

Schematic of a double hidden layer MLP demonstrating backpropagation in which the cumulative error is based on the total difference between the output signal and target signal.

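An example of the backpropagation algorithm used in this work is presented in the schematic of a double hidden layer, single MLP in Figure 4. Backpropagation begins at output layer $l+2$ of the MLP. The weighting function $w_{ij}^{l+2}$ corresponds to the weighting function between output layer $l+2$ and the preceding layer $l+1$. The weight update $\Delta w_{ij}^{l+2}$ relates to the weighting function from neuron $j$ of layer $l+1$ to neuron $i$ of layer $l+2$; it is the change applied between $w_{ij}^{l+2}(t)$ and $w_{ij}^{l+2}(t+1)$, before and after the presentation of a single training vector at time $t$. The index $i$ represents neurons of output layer $l+2$, the index $j$ neurons of the second hidden layer $l+1$, the index $k$ neurons of the first hidden layer $l$, and the index $p$ neurons of the input layer $l-1$.

Network error $e_i = d_i - y_i^{l+2}$ is the difference between the target or desired signal $d_i$ of the input vector and the output $y_i^{l+2}$. The corresponding error signal $\delta_i^{l+2}$ is calculated as follows:
\[
\delta_i^{l+2} = e_i \, y_i^{l+2} \left( 1 - y_i^{l+2} \right) \tag{3.3}
\]
where the factor $y_i^{l+2}(1 - y_i^{l+2})$ is the derivative of the sigmoid activation function. The error signal is subsequently used to calculate the weight change $\Delta w_{ij}^{l+2}$:
\[
\Delta w_{ij}^{l+2} = \eta \, \delta_i^{l+2} \, y_j^{l+1} \tag{3.4}
\]
The parameter $\eta$ is a value of 0.05 and represents the learning rate parameter of network training. This value was defined after performing initial simulations and was chosen to enhance MLP classification performance while preventing overtraining of the MLP system. The algorithm advances to the preceding layer $l+1$, in which the error signal $\delta_j^{l+1}$ is calculated using the error signal $\delta_i^{l+2}$ and weighting function $w_{ij}^{l+2}$:
\[
\delta_j^{l+1} = y_j^{l+1} \left( 1 - y_j^{l+1} \right) \sum_{i=1}^{N_{l+2}} \delta_i^{l+2} \, w_{ij}^{l+2} \tag{3.5}
\]
The index $N_{l+2}$ denotes the number of neurons of layer $l+2$. The weight change of layer $l+1$ is given by
\[
\Delta w_{jk}^{l+1} = \eta \, \delta_j^{l+1} \, y_k^{l} \tag{3.6}
\]
The final error signal of layer $l$ is calculated by
\[
\delta_k^{l} = y_k^{l} \left( 1 - y_k^{l} \right) \sum_{j=1}^{N_{l+1}} \delta_j^{l+1} \, w_{jk}^{l+1} \tag{3.7}
\]
The index $N_{l+1}$ denotes the number of neurons of layer $l+1$. The corresponding weight change of layer $l$ is calculated by
\[
\Delta w_{kp}^{l} = \eta \, \delta_k^{l} \, y_p^{l-1} \tag{3.8}
\]
where $y_p^{l-1}$ is the $p$th component of the input vector. After presentation of the training vector at time $t$, the weighting functions are updated with the newly calculated weight changes (see equations 3.4, 3.6, and 3.8) and an additional momentum term, weighted by a momentum coefficient $\alpha$, for each layer:
\[
w_{ij}^{l+2}(t+1) = w_{ij}^{l+2}(t) + \Delta w_{ij}^{l+2}(t) + \alpha \left[ w_{ij}^{l+2}(t) - w_{ij}^{l+2}(t-1) \right] \tag{3.9}
\]
\[
w_{jk}^{l+1}(t+1) = w_{jk}^{l+1}(t) + \Delta w_{jk}^{l+1}(t) + \alpha \left[ w_{jk}^{l+1}(t) - w_{jk}^{l+1}(t-1) \right] \tag{3.10}
\]
\[
w_{kp}^{l}(t+1) = w_{kp}^{l}(t) + \Delta w_{kp}^{l}(t) + \alpha \left[ w_{kp}^{l}(t) - w_{kp}^{l}(t-1) \right] \tag{3.11}
\]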
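The backpropagation procedure with momentum described above can be sketched for a double hidden layer MLP as follows. This is a minimal illustration rather than the original implementation: the learning rate of 0.05 is taken from the text, whereas the momentum coefficient `ALPHA`, the layer sizes, the training vector, the target, and the omission of bias terms are assumptions made here.

```python
import math
import random

ETA = 0.05    # learning rate stated in the text
ALPHA = 0.5   # momentum coefficient: hypothetical value, not given in the text

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def forward(weights, x):
    """Return the activations of every layer, starting with the input vector."""
    acts = [x]
    for w in weights:
        acts.append([sigmoid(sum(wi * a for wi, a in zip(row, acts[-1]))) for row in w])
    return acts

def train_step(weights, prev_change, x, target):
    """One backpropagation update with momentum; returns the squared output error."""
    acts = forward(weights, x)
    out = acts[-1]
    # Output layer error signal: network error times the sigmoid derivative.
    delta = [(d - y) * y * (1.0 - y) for d, y in zip(target, out)]
    deltas = [delta]
    # Propagate the error signal backward through the hidden layers.
    for layer in range(len(weights) - 1, 0, -1):
        y_prev, w = acts[layer], weights[layer]
        delta = [
            y_prev[j] * (1.0 - y_prev[j]) * sum(delta[i] * w[i][j] for i in range(len(w)))
            for j in range(len(y_prev))
        ]
        deltas.insert(0, delta)
    # Apply the gradient step plus the momentum term (previous total change).
    for layer, w in enumerate(weights):
        for i, row in enumerate(w):
            for j in range(len(row)):
                change = ETA * deltas[layer][i] * acts[layer][j]
                change += ALPHA * prev_change[layer][i][j]
                row[j] += change
                prev_change[layer][i][j] = change
    return sum((d - y) ** 2 for d, y in zip(target, out))

def init(sizes, rng):
    """Gaussian initial weights and a zeroed record of the previous weight changes."""
    weights = [[[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
               for n_in, n_out in zip(sizes, sizes[1:])]
    prev_change = [[[0.0] * len(row) for row in w] for w in weights]
    return weights, prev_change

rng = random.Random(1)
weights, prev_change = init([3, 4, 4, 2], rng)   # input, two hidden layers, output
x, target = [0.2, -0.7, 1.1], [1.0, 0.0]         # one hypothetical training vector
errors = [train_step(weights, prev_change, x, target) for _ in range(200)]
```

All error signals are computed before any weight is changed, so the backward pass uses the weights from time $t$ throughout. Repeating the step on the single toy vector only shows that the error falls; a real run would loop over a shuffled training set once per epoch.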

After extensive trials, we determined a fixed stopping criterion of 200 epochs, as it produced the desired convergence error of 0.01 (where an epoch is defined as the presentation of a complete, randomly shuffled training set). Figure 3e shows a representative output of the change in network error as the MLP adjusts its weighting functions (see Figure 3f) over the defined 200-epoch training period.

3.3  Network Training and Validation

Training of the network involves approximately 90% of the data; the remaining 10% is used as a validation set to assess the degree of network learning (Bishop, 1994). The validation set is obtained by randomly choosing a set number of odorants from each chemical class. For the A. gambiae data set, the validation set was composed of carboxylic acid (2 odorants), terpene (1), aldehyde (1), ketone (1), aromatic (2), heterocyclic (1), alcohol (1), and ester (1). The validation set of the D. melanogaster data was composed of lactone (1), acid (2), terpene (2), aldehyde (1), ketone (2), aromatic (2), alcohol (2), and ester (3). For the common odorant data set that includes both A. gambiae and D. melanogaster Ors, the validation set was composed of acid (1), ketone (1), aromatic (1), alcohol (1), and ester (1). All tests were initially performed using a fixed, randomly obtained validation set. This fixed validation set was used for simulations that required longer processing times, such as the procedure for finding the optimal number of hidden neurons of the MLP systems and the combination of Ors that enhanced MLP classification performance. The final performance results were obtained through bootstrapping, which provides a more accurate representation of the networks’ prediction performance (Efron, 1983; Harrell, 2001). The application of bootstrapping involved 10,000 different training and validation sets obtained from the raw data, which were presented to the MLP to provide 10,000 independent prediction results of the validation sets (Efron & Tibshirani, 1993). Thus, an initial catalog of (n) odorants from (m) chemical classes is used to train the MLPs. Once trained, the MLPs can predict the chemical class of odorants on which they were not initially trained, which we refer to as predicting unseen chemicals.
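The repeated drawing of training and validation sets can be sketched as a per-class random holdout. The sketch below is an assumption about how such resampling might be organized: the catalog, the odorant names, the per-class counts, and the 1,000 repetitions (instead of the 10,000 used in this work) are hypothetical and chosen to keep the example small.

```python
import random

def stratified_split(odorants_by_class, n_validation, rng):
    """Randomly hold out a set number of odorants per class for validation."""
    train, validation = [], []
    for chem_class, odorants in odorants_by_class.items():
        held_out = set(rng.sample(odorants, n_validation[chem_class]))
        for name in odorants:
            bucket = validation if name in held_out else train
            bucket.append((name, chem_class))
    return train, validation

# Hypothetical toy catalog: placeholder names stand in for real odorants.
catalog = {
    "ketone": [f"ketone_{i}" for i in range(5)],
    "ester": [f"ester_{i}" for i in range(6)],
    "alcohol": [f"alcohol_{i}" for i in range(4)],
}
per_class = {"ketone": 1, "ester": 1, "alcohol": 1}

rng = random.Random(42)
splits = [stratified_split(catalog, per_class, rng) for _ in range(1000)]
```

Each element of `splits` is one training/validation pair; training a fresh MLP on every pair and collecting the validation predictions would give the distribution of prediction results from which the final performance is summarized.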

Classification results from the MLP system represent the prediction (%) accuracy of correctly identifying a previously unseen chemical of the validation set; a value close to 100% indicates a high probability that the chemical has been correctly classified. The output values of the network are continuous, and the prediction (%) values are obtained by dividing the output validation vector by the absolute sum of the output signal (Bachtiar et al., 2013):
\[
P_i = \frac{c_i}{\sum_{j=1}^{n} \left| c_j \right|} \times 100\% \tag{3.12}
\]
where $c_i$ is the $i$th component of the output vector, $n$ the number of output components, and $P_i$ the corresponding prediction (%) value.
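The conversion of continuous network outputs to prediction (%) values can be sketched as follows. The function name and the example outputs are hypothetical, and the sketch assumes that the “absolute sum” is the sum of the absolute output values.

```python
def prediction_percent(outputs):
    """Scale each output by the sum of absolute outputs and express it in percent."""
    total = sum(abs(o) for o in outputs)
    if total == 0:
        raise ValueError("all outputs are zero; prediction (%) is undefined")
    return [100.0 * o / total for o in outputs]

# Hypothetical outputs of a four-class network for one validation odorant.
outputs = [0.82, 0.10, 0.05, 0.03]
percent = prediction_percent(outputs)
```

With sigmoid outputs, which are never negative, the percentages of one validation vector add up to 100.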

An additional measure of performance is the mean prediction value of the whole validation set. This value is obtained by calculating the sum of the prediction (%) of each validation vector and dividing by the number of validation vectors, which gives an appreciation of how well the complete validation set performed. To quantify the degree of network learning, a conservative threshold value was implemented. This value was determined by observing the probability of choosing a chemical from the validation set that belongs to the largest chemical class present in the data set, with a 5% margin included as an added safeguard. A successful classification was identified as a prediction (%) value exceeding this threshold value, while values below the threshold are interpreted as a failure of classification. For example, in the A. gambiae data set, the largest chemical class is the carboxylic acid group; the probability of randomly choosing a carboxylic acid from the validation set of 10 odorants is 2/10, or 20%. With an added safeguard of 5%, the threshold value for the A. gambiae data set is 25%. The conservative threshold value for both the D. melanogaster data set and common odorant data set was also calculated to be 25%.
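The threshold arithmetic can be checked with a short Python sketch using the validation set compositions listed in section 3.3. The function name is hypothetical, and the sketch assumes that the largest class of the data set is also the most represented class in the validation set, which holds for the three compositions given.

```python
def classification_threshold(validation_counts, margin=5.0):
    """Chance (%) of drawing the most represented class, plus a safety margin."""
    total = sum(validation_counts.values())
    largest = max(validation_counts.values())
    return 100.0 * largest / total + margin

# Validation set compositions as listed in section 3.3.
a_gambiae = {"carboxylic acid": 2, "terpene": 1, "aldehyde": 1, "ketone": 1,
             "aromatic": 2, "heterocyclic": 1, "alcohol": 1, "ester": 1}
d_melanogaster = {"lactone": 1, "acid": 2, "terpene": 2, "aldehyde": 1,
                  "ketone": 2, "aromatic": 2, "alcohol": 2, "ester": 3}
common = {"acid": 1, "ketone": 1, "aromatic": 1, "alcohol": 1, "ester": 1}

thresholds = {
    "A. gambiae": classification_threshold(a_gambiae),
    "D. melanogaster": classification_threshold(d_melanogaster),
    "common": classification_threshold(common),
}
```

The three cases work out to 2/10, 3/15, and 1/5, that is, 20% each, so all three thresholds come to 25%, in agreement with the values reported above.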

3.4  Incrementing the MLP’s Hidden Neurons Employed and Exploring Insect Odorant Receptor Combinations

The length of the input vector of the MLP system matches the number of A. gambiae Ors used. The number of hidden neurons in the MLPs is very important to finding a quality solution because network generalization, the ability of a network to operate on unseen data, is affected by network size and architecture (Kavzoglu, 1999). By incrementally adjusting the number of hidden neurons, we are able to obtain an MLP structure that produces a high-quality solution. The upper range of hidden neurons tested is twice the length of the input layer (Kavzoglu, 1999). This limit is intended to guard against overfitting and overcomplication of the network. Approximately 40,000 simulations were performed to find the optimal hidden layer length(s). Extensive testing showed that a stopping criterion of 200 epochs was suitable for efficient learning and provided a run time of 3 minutes for each simulation, which results in an approximate 2- to 3-week period to complete the simulations on a standard PC.
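The hidden-layer search space can be enumerated with a few lines of Python. This is a sketch under stated assumptions: the text gives only the upper bound of twice the input length, so the lower bound of one neuron, the step size of one, and the function name `hidden_layer_candidates` are ours.

```python
import itertools


def hidden_layer_candidates(n_inputs, n_hidden_layers):
    """List hidden-layer sizes from 1 up to twice the input length, per layer."""
    upper = 2 * n_inputs  # upper bound taken from the text
    sizes = range(1, upper + 1)  # lower bound and step are assumptions
    return list(itertools.product(sizes, repeat=n_hidden_layers))


single = hidden_layer_candidates(n_inputs=3, n_hidden_layers=1)
double = hidden_layer_candidates(n_inputs=3, n_hidden_layers=2)
print(len(single), len(double))  # prints "6 36"
```

Each candidate would then be trained on the fixed validation split, and the structure with the best validation performance retained.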

With 50 AgOr and 24 DmOr sensors available for processing, it is pertinent to minimize the number of AgOrs and DmOrs used while maximizing the classification performance of the system (Gutierrez-Osuna, 2002; Phaisangittisagul & Nagle, 2008; Wilson & Baietto, 2009). The reason for Or minimization is to reduce experimental cost, as Or experiments are known to be demanding and time consuming (Hallem et al., 2004). We investigate MLP classification performance using 3, 5, 10, 20, 24, 30, 40, and 50 Ors. However, because of the large number of possible combinations, it was not realistic to simulate all Or combinations. For that reason, we formed a bank of all possible combinations and randomly drew a group of 10,000 Or sets to be used in the simulations at each Or length tested (Bachtiar et al., 2013). Furthermore, we implemented a variation of the wrapper method (Hall & Smith, 1997) in which we assessed the MLP classification performance on particular combinations of AgOrs and DmOrs. Combinations were ranked by the prediction accuracy of the validation odorant with the lowest prediction across the validation set, and the combination with the highest such value was preferred. In particular, the use of a wrapper method by Nowotny et al. (2013) for optimal feature selection of a metal oxide sensor array was found to enhance classification.
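The size of the combination space and the selection rule can be illustrated in Python. This is a hedged sketch, not the authors' procedure: the text describes building a bank of all combinations and sampling from it, whereas the code below draws distinct random subsets directly, which avoids enumerating the bank. The function names and the `evaluate` callback (a stand-in for training an MLP and returning the prediction (%) of each validation odorant) are hypothetical.

```python
import math
import random


def sample_or_combinations(n_available, subset_size, n_samples, seed=0):
    """Draw up to n_samples distinct Or subsets without listing every combination."""
    rng = random.Random(seed)
    total = math.comb(n_available, subset_size)
    wanted = min(n_samples, total)
    chosen = set()
    while len(chosen) < wanted:
        chosen.add(tuple(sorted(rng.sample(range(n_available), subset_size))))
    return sorted(chosen)


def best_combination(combos, evaluate):
    """Wrapper-style choice: prefer the subset whose weakest odorant scores highest."""
    return max(combos, key=lambda combo: min(evaluate(combo)))


# Number of possible 3-Or subsets for each receptor repertoire.
print(math.comb(50, 3), math.comb(24, 3))  # prints "19600 2024"

# Hypothetical evaluation: returns made-up prediction (%) values per odorant.
def toy_evaluate(combo):
    return [30.0 + sum(combo), 40.0 - combo[0]]

combos = sample_or_combinations(n_available=6, subset_size=3, n_samples=10)
print(best_combination(combos, toy_evaluate))
```

For three Ors the full space is small enough to search exhaustively (19,600 subsets of 50 AgOrs and 2,024 subsets of 24 DmOrs), but it grows rapidly with subset size, which is what motivates random sampling at larger Or lengths.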

4  Results

In this section, a linear predictor (Poggio & Girosi, 1990) was initially tested; the failure of this predictor to classify the validation set suggested that the data set required a nonlinear MLP system for successful classification. We tested a variety of MLP architectures and investigated the effect of reducing the number of insect Ors. Finally, we compared the performance using AgOrs and DmOrs through MLP analysis of the common odorant data set.

4.1  Initial Analysis with a Linear Predictor

A linear predictor was initially tested to confirm the suitability of the nonlinear MLP classifier. The results, presented in Figure 5, show a mean prediction of 16.25% across the fixed validation set, with three odorants (from the terpene, aromatic, and alcohol classes) exceeding the threshold value and seven others failing. The right bar in Figure 5 presents the results from bootstrapping, which produced a mean prediction value of 20.20% across the validation set. After bootstrapping, the mean prediction value of every odorant vector fell below the threshold; that is, on average, no odorants were correctly classified. Error bars are calculated from the standard deviation of the 10,000 bootstrapping training and validation sets and are included to illustrate the variation in performance due to bootstrapping. We define the worst possible case as the number of detections above the threshold when all the validation vectors are attributed with the lower error bar. Similarly, we define the best possible case as the number of detections above the threshold when all validation vectors are attributed with the upper error bar. Hence, because the lower error bar of each odorant in Figure 5 is under the defined threshold, the worst possible case is 0/10, or 0%. There are 6/10 odorants with upper error bars above the threshold, which produces a best possible case of 60%. This poor result validates our assumption that a linear predictor is insufficient and a nonlinear method is needed for more effective odorant classification.
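The average, worst possible, and best possible cases defined above reduce to three counts against the threshold. The Python sketch below shows one way to compute them; the function name is ours, and the means and standard deviations are hypothetical values chosen only to reproduce the 0%, 0%, and 60% pattern reported for the linear predictor, not the data behind Figure 5.

```python
def classification_cases(means, stds, threshold=25.0):
    """Percentage of odorants above threshold on average, at worst, and at best."""
    n = len(means)
    average = sum(m > threshold for m in means)
    worst = sum(m - s > threshold for m, s in zip(means, stds))  # lower error bar
    best = sum(m + s > threshold for m, s in zip(means, stds))   # upper error bar
    return {"average": 100.0 * average / n,
            "worst": 100.0 * worst / n,
            "best": 100.0 * best / n}


# Hypothetical bootstrapped prediction (%) means and standard deviations
# for a 10-odorant validation set.
means = [12.0, 18.0, 21.0, 23.0, 24.0, 22.0, 20.0, 15.0, 24.5, 22.5]
stds = [5.0, 8.0, 6.0, 4.0, 3.0, 2.0, 6.0, 4.0, 1.0, 1.0]

print(classification_cases(means, stds))
# prints {'average': 0.0, 'worst': 0.0, 'best': 60.0}
```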

Figure 5:

Performance of the linear predictor on the Anopheles gambiae data set. A fixed validation set (left bar) and bootstrapping (right bar) method is applied. The horizontal dashed lines represent the 25% conservative threshold value that defines correct classification of an odorant of the validation set. Results from bootstrapping include the number of odorants on average that surpass the threshold, the worst possible (lower error bar) and best possible (upper error bar) classification cases, and the mean prediction value of the validation set.

4.2  Finding an Optimal MLP Structure

To find the optimal MLP system for the data set that produces the finest classification performance, the number of neurons of the hidden layer(s) was incrementally altered and the subsequent change in performance established. Hybrid MLP systems were also tested because they have been found to overcome the complexity of multiclass data sets to produce better classification performance (Bachtiar et al., 2011a, 2011b, 2013; Kavzoglu, 1999). By reducing the output layer to a single neuron, we produced a series of MLPs corresponding to each chemical class present in the data. Each MLP of the hybrid system is trained to present a desired output close to 1 for corresponding chemicals and a desired output of near 0 for all other chemicals. A fixed validation set was used to determine the best-performing structure as it allows for comparisons between the MLP systems, presented in the upper four plots in Figure 6. Bootstrapping was subsequently performed and is presented by the lower four plots in Figure 6. Both single- (S1 and H1) and double-hidden-layer (S2 and H2) architectures were also investigated and presented in Figure 6.
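The difference between the single MLP and the hybrid system lies in how the training targets are laid out. The Python sketch below illustrates this with hypothetical function names and a toy label list; it uses exact targets of 1 and 0, whereas the text describes desired outputs "close to 1" and "near 0", so the exact target values are an assumption.

```python
def single_mlp_targets(labels, classes):
    """One network with one output neuron per class: a target vector per odorant."""
    return [[1.0 if label == c else 0.0 for c in classes] for label in labels]


def hybrid_mlp_targets(labels, classes):
    """One single-output network per class: a scalar target per odorant, per class."""
    return {c: [1.0 if label == c else 0.0 for label in labels] for c in classes}


classes = ["acid", "ketone", "aromatic", "alcohol", "ester"]
labels = ["ketone", "ester", "acid"]  # toy training odorants

print(single_mlp_targets(labels, classes)[0])
# prints [0.0, 1.0, 0.0, 0.0, 0.0]
print(hybrid_mlp_targets(labels, classes)["ester"])
# prints [0.0, 1.0, 0.0]
```

With five chemical classes, the hybrid system therefore consists of five separately trained networks, each answering a yes-or-no question about one class.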

Figure 6:

Performance of a single MLP system and hybrid MLP with an optimal number of hidden neurons. A fixed validation (upper row) and bootstrapping method (lower row) are applied on the MLP architectures. The use of a single (S1 and H1) and double (S2 and H2) hidden layers is also presented. The letter S indicates a single MLP, whereas the letter H indicates a hybrid MLP system; the numbers following these letters correspond to the number of hidden layers used in the system (e.g., H2 represents a hybrid MLP system with two hidden layers). The horizontal dashed lines represent the 25% conservative threshold value that defines correct classification of an odorant of the validation set.

Results from the fixed validation simulations showed that when either single- or double-hidden layer designs are used, both a single MLP and hybrid MLP successfully classified the complete validation set (upper row of Figure 6). With bootstrapping, on average all four MLP architectures classified the entire validation set. However, the performance when using a single or double hidden layer showed slight variation in terms of the best and worst possible classification. When we used a single MLP with a single hidden layer (see Bootstrapping S1, Figure 6), the mean prediction of the validation set was 74.83%, with a worst possible classification of 90% of the validation set and a best possible case of 100%. With a double hidden layer (see Bootstrapping S2, Figure 6), a mean validation set prediction of 79.53% was presented, with a 70% worst possible case and a best possible case of 100%. Our use of a hybrid MLP showed a higher mean prediction of the validation set. With a single hidden layer (see Bootstrapping H1, Figure 6), the hybrid MLP gave a 78.83% mean prediction of the validation set, with a worst possible case of 90% and a best possible case of 100%. The double-hidden-layer hybrid MLP design (see Bootstrapping H2, Figure 6) produced a mean prediction of 72.09%, with a worst possible case of 80% and a best possible case of 100%. Because the bootstrapping performances of the MLP systems were comparable, we proceeded to use the different systems for subsequent simulations.

4.3  Effect of Reducing the Number of Anopheles gambiae Odorant Receptors

It was of interest to investigate different AgOr combinations and determine their subsequent performances with the MLP systems. The combination of Ors that produced the best classification performance was selected based on the highest prediction accuracy of the validation odorant with the lowest prediction from the fixed validation set (i.e., the relative strength of the output neuron of the correct class). Bootstrapping was then subsequently applied to the system. Classification results are presented in Figure 7, in which the y-axis highlights the reduction of AgOrs used in the MLP system. Across the x-axis of Figure 7 are the results of the four MLP systems that were included to ensure optimal classification performance.

Figure 7:

Changes in MLP classification performance of different MLP structures when decreasing the number of Anopheles gambiae Ors used. The corresponding number of Ors used and the MLP system employed are presented. The letter S indicates a single MLP, whereas the letter H indicates a hybrid MLP system; the numbers following these letters correspond to the number of hidden layers used in the system (e.g., H2 represents a hybrid MLP system with two hidden layers). The horizontal dashed lines represent the 25% conservative threshold value that defines correct classification of an odorant of the validation set. The graphs highlighted by the boxes with a broken border indicate the MLP system that presented the best classification with the particular number of Ors used.

The observed trend is that classification performance decreases as fewer AgOrs are used, particularly in the bootstrapping prediction of each validation vector and in the worst possible cases. Of the different Or lengths tested, using 30 Ors produced the best performance: on average, it correctly classified the complete validation set, with a mean prediction value of 56.31% across the validation set, a worst possible case of 90%, and a best possible case of 100% classification. The boxes with a dashed border in Figure 7 highlight the best-performing MLP architecture for the particular number of Ors used. The performance comparison is based on the overall classification performance: the mean validation set prediction and the worst and best possible cases based on the error bars with respect to the defined threshold. It appears that a single MLP was the favored architecture when fewer Ors were used (3 Ors, 5 Ors, and 10 Ors). When using 20 Ors, 30 Ors, and 40 Ors, a hybrid MLP system was the better-suited architecture, producing higher classification performance.

The AgOrs chosen for the different Or length combinations are displayed in Table 3. A number of AgOrs were used in more than one combination, such as AgOr10, AgOr27, AgOr38, AgOr42, AgOr57, AgOr65, and AgOr67, which suggests the importance of using these particular Ors. Of particular importance are AgOr38, AgOr57, and AgOr67, which were selected in the best-performing 3-AgOr combination.

Table 3:
The Best-Performing Anopheles gambiae Odorant Receptor (AgOr) Combinations with Various Odorant Receptor (Or) Lengths.
MLP Structure    n OSNs
S2               3
S1               5
S1               10
H1               20
H1               30
H2               40

AgOr columns: 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 15, 16, 18, 20, 21, 23, 25, 26, 27, 30, 31, 32, 33, 35, 38, 39, 41, 42, 43, 44, 45, 46, 48, 50, 53, 54, 56, 57, 59, 61, 63, 64, 65, 66, 67, 73, 75, 76

Notes: There are 50 different AgOrs available; the best-performing combinations when using different lengths of Ors are checked. The corresponding MLP structures that produced the best odorant classification performance are also included.

4.4  Performance Comparison of Anopheles gambiae and Drosophila melanogaster Odorant Receptors

The difference in classification performance between the A. gambiae and D. melanogaster data sets was investigated. As mentioned earlier, 50 AgOrs were available for analysis, whereas the D. melanogaster data set contains only 24 DmOrs. One of the aims of this work was to investigate the best-performing 3-Or combination between the two insects to allow for efficient development of the olfactory biosensor sensory array. Results from using 3-Or combinations are presented in Figure 8. The performances of the complete Or array of the A. gambiae and D. melanogaster data sets are also included, in which the corresponding optimal MLP system was used.

Figure 8:

Output of the optimal MLP systems when using the (a) Anopheles gambiae and (b) Drosophila melanogaster data sets. The best performing three-odorant receptor (Or) combination (left bar) and full receptor array (right bar) results for both data sets. The 25% conservative threshold value is presented as a horizontal broken line.

When the A. gambiae data set was used, the 3-Or array produced a performance where on average, 90% of the odorant vectors were correctly classed with a mean prediction of 47.04% across the validation set, with a worst possible case of 60% and a best possible case of 90% classification. When the complete A. gambiae Or array was used, the MLP system on average completely classed the validation set with a mean prediction of 59.55% across the validation set, with a worst case of 80% and a best case of 100%. Analysis of the D. melanogaster data set with three Ors found that on average, the MLP classed 66.7% of the validation set with a mean prediction of 29.14%, with a worst case of 40% and a best case of 73.3%. Using the complete D. melanogaster Or array yielded correct classification of on average 86.7% odorant vectors with a mean prediction of 50.95% across the validation set, with a worst case of 60% and a best case of 93.3%. From these results, the A. gambiae data set outperforms the D. melanogaster data set when using three Ors and when using their respective complete Or array. To perform a more comparative investigation between the two insects, analysis using the common odorant chemicals between the two data sets was performed, and the effect of reducing the number of Ors was examined.

The recordings of Hallem et al. (2006) and Carey et al. (2010) were obtained under similar experimental conditions, which allows for a comparison of MLP classification performance between the two data sets. The common chemical data set contains 34 chemical odorants belonging to 5 chemical classes (see Table 2). Simulations were performed with Or lengths of 3, 5, 10, 20, 24, 30, 40, and 50. AgOrs and DmOrs were compared up to and including 24 Ors; from 24 to 50 Ors, only A. gambiae Or classification performance was assessed. The MLP systems used are unique to the different Or lengths tested; the A. gambiae MLP architectures were found and presented earlier, while the D. melanogaster MLP architectures can be found in detail in our previous work (Bachtiar et al., 2013). Thus, the results presented in Figure 9 are those of the best-performing Or combinations and their corresponding MLP systems, with the application of bootstrapping.

Figure 9:

Prediction performance and receiver operating characteristics (ROC) plots of the ANN system on the common odorant data set and effect of altering the number of Anopheles gambiae and Drosophila melanogaster Ors used. The data set consists of common chemical odorants for both AgOr and DmOrs. A conservative threshold value of 25% is illustrated by the horizontal dashed lines. The AUC scalar value of the ROC plots represents the area under the curve of each ROC plot. Only 24 DmOrs were available for analysis; consequently simulations using 30 to 50 Ors present N/A for the D. melanogaster.

Surprisingly, when only three Ors were used, the AgOrs correctly classified the complete validation set with a mean prediction of 62.81% across the validation set, outperforming the DmOr combination, which produced a mean prediction of 32.36% and, on average, classified 60% of the vectors, with a worst possible case of 60% and a best possible case of 80% (see Figure 9). The superior performance of the AgOr set may be attributed to the MLP architecture chosen, which was found to be optimal for the particular input space of three Ors. Comparing the full 24-DmOr array with the best 24-AgOr combination showed very close classification performance, with a mean prediction of 61.88% for D. melanogaster compared to 59.09% for A. gambiae. Overall, the two 24-Or combinations performed equally well, each correctly classifying the complete validation set: 100% classification in both the worst and best possible cases. The AgOrs continued to perform well beyond 24 Ors. However, using the complete 50-Or array yielded a reduction in classification performance: on average, all validation vectors were classified, but with a worst possible case of 80% and a best possible case of 100%. This may be due to the large input space created by the 50 AgOrs used for analysis. The single MLP was unable to discriminate between classes, which suggests that an additional nonlinear function, in the form of a double-hidden-layer system, is needed.

A receiver operating characteristics (ROC) graph visually presents the performance of classifiers used in machine learning. ROC graphs can be used to show the balance between the hit rate (true positive rate) and the false alarm rate (false positive rate) of a classifier (Egan, 1975; Swets, 1988). In this work, a true positive occurs if the MLP system provides a positive output for the classification of a chemical into its correct class; a false positive occurs if the MLP provides a positive output for an incorrect classification of a chemical. Furthermore, classifier performance can be judged from the area under the curve (AUC) of an ROC plot. The AUC value of a classifier is equivalent to the probability that the classifier will rank a randomly chosen positive instance higher than a randomly chosen negative instance (Fawcett, 2006). Thus, the AUC scalar value, which lies between 0 and 1, can represent the expected performance of a classifier. An AUC no less than 0.5 is typical for most classifiers, and the larger the AUC, the better the average performance of the system. The perfcurve function of the Matlab Statistics Toolbox was used to produce ROC curves and calculate the AUC (MathWorks, 2014). The raw values of the MLP output neurons (values between 0 and 1) and their corresponding labels (a value of 0 or 1) were fed into the perfcurve function. The perfcurve function internally varies the threshold level and assesses the classifier output scores with respect to their labels in order to calculate the performance at each threshold level. Because the perfcurve function handles a vector of outputs, several output neurons, such as those of the single MLP, can be assessed in the ROC analysis. Finally, the arrays of values produced by the perfcurve function correspond to the true positive rate (TPR) and false positive rate (FPR), which allow for the plotting of the ROC curve.
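The threshold sweep behind an ROC curve and its AUC can be sketched in plain Python. This is a stand-in written for illustration, not the Matlab perfcurve implementation: the function names are ours, and the scores and labels are hypothetical values rather than MLP outputs from the study.

```python
def roc_points(scores, labels):
    """Sweep the threshold from high to low and return (FPR, TPR) points."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical output-neuron scores (0 to 1) and their 0/1 labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(round(auc(roc_points(scores, labels)), 3))  # prints 0.889
```

In this toy example, eight of the nine positive-negative pairs are ranked correctly, so the AUC is 8/9, consistent with the ranking interpretation given by Fawcett (2006).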

The ROC plots in Figure 9 show how classifier performance changes with the number of Ors used by the system; the changes are evident in the shape of the ROC plots and in the difference in AUC values. As more Ors are used, the ROC plots shift closer to the point of perfect classification (0, 1): a true positive rate of 1 and a false positive rate of 0 (Fawcett, 2006). This trend is observed in the A. gambiae data set, with an AUC of 0.896 when using 50 AgOrs compared to an AUC of 0.855 with 3 AgOrs. With the D. melanogaster data set, a full complement of 24 DmOrs produced an AUC of 0.874 compared to an AUC of 0.706 when using only 3 DmOrs. As discussed earlier, the A. gambiae data set outperformed the D. melanogaster data set in terms of prediction (%), and as seen from the ROC plots and AUC values in Figure 9, the MLP system using the A. gambiae data presents better average classification performance.

5  Conclusion

In this letter, we have investigated the use of Anopheles gambiae olfactory receptor neurons (ORNs) as a sensory array for an olfactory biosensor. We employed an artificial neural network (ANN) of the multilayer perceptron (MLP) architecture to analyze A. gambiae ORN responses to volatile chemical odors for the classification of unknown odorants into their respective chemical classes. We also investigated the performance difference between A. gambiae odorant receptors (AgOrs) and Drosophila melanogaster odorant receptors (DmOrs), which we have explored in previous work.

A linear predictor was initially used to test the nature of the data, and the system failed to adequately classify odorants (see Figure 5), suggesting that a nonlinear MLP system would be more suitable. A number of MLP architectures were used in this work. In addition, the optimal sizes of these MLP systems were found by incrementally adjusting the number of hidden neurons. The different MLP systems on average classified the complete validation set and presented minor performance differences (see Figure 6). Because the performances of these MLP systems were very similar, we tested all of them for subsequent simulations.

The analysis of Ors is a lengthy and demanding process (Hallem et al., 2004); thus, we investigated the effect of reducing the number of A. gambiae Ors (AgOrs) on the classification performance of the MLP. An investigation of metal oxide sensors by Nowotny et al. (2013) revealed the prospect of fewer sensors outperforming larger sensory arrays, wherein a model that uses a maximal number of sensors does not always necessarily yield the best results. We were subsequently able to determine the best-performing AgOr combination of different Or lengths and their corresponding optimal MLP architecture (see Figure 7 and Table 3). At 5 to 30 Ors, a single hidden layer was the preferred architecture, while a double hidden layer was selected when using 3 Ors and 40 Ors. In addition, the performance of a single MLP and a hybrid MLP varied as the number of Ors changed: using 3 Ors required a single MLP with a double hidden layer, whereas using 40 Ors required a hybrid MLP system with a double hidden layer. The results suggest that the selected Ors supplemented MLP classification by their nature to detect the presence of odorant classes but not specifically identifying or discriminating distinct odorants (Hallem et al., 2006). Using 30 Ors presented the best performance wherein the MLP system on average classified the complete validation set with a mean of 56.31% across the validation set, a worst possible case of 90%, and a best possible case of 100%. The general trend observed was that as fewer Ors are used, the classification performance of the MLP decreased.

In our previous work (Bachtiar et al., 2013), we utilized the Hallem and Carlson (2006) data set of DmOr responses to odorants and successfully identified DmOrs that elicited high classification performance of unknown odors. The data used in this work were obtained from a study by Carey et al. (2010) that is modeled on and follows the experimental procedure of Hallem and Carlson (2006). By employing a common odorant data set, we were able to compare the MLP classification performance of the two insect receptor repertoires. When using only three Ors, the AgOrs correctly classified the complete validation set, whereas the DmOrs classified on average 60% of the vectors with a worst possible case of 60% and a best possible case of 80%. This appears to correspond to the study by Nowotny et al. (2013) in which fewer sensors can possibly outperform larger sensory arrays. Furthermore, when using the complete 24-DmOr array, we found that the best 24-AgOr combination performed equally well, correctly classifying the complete validation set: 100% classification in both the best and worst possible cases. As a result, we found that MLP classification with the AgOr data set outperformed the DmOr data set. We observed with interest that among the three AgOrs identified as the best-performing combination of three receptors, two (AgOr57 and AgOr38) are among the four most broadly tuned of the entire AgOr repertoire (Carey et al., 2010). By contrast, the third receptor of this trio, AgOr67, responds to very few of the tested odorants (Carey et al., 2010). It should be noted that these particular receptors were selected to enhance the classification performance of the MLP system. The combination of their tuning breadths provides separation of odorant classes for the MLP system to distinguish between the chemical classes, which allows for the classification of unknown odors. Consequently, the receptors identified (AgOr38, AgOr57, and AgOr67) may not necessarily be the best ones for in vitro biological studies, as these studies prioritize receptor responses regardless of specificity (Carey et al., 2010; Hallem et al., 2004).

A study by Fonollosa, Gutierrez-Galvez, and Marco (2012) also investigated the role of different and diverse types of Ors for odor discrimination. The rat Ors (RtOrs) were grouped according to their receptive range (RR), which is the ratio between the number of analytes to which the receptor shows a positive response and the total number of odorants available (Fonollosa et al., 2012). Fonollosa et al. examined different combinations of RtOrs, which were chosen according to their respective RR value, and assessed their odorant encoding capabilities based on mutual information (MI). They illustrate the correlation found among narrowly and broadly tuned RtOrs: narrowly tuned RtOrs showed less correlation, whereas broadly tuned RtOrs responded to a larger odor space and a larger number of odorants. The data set used by Fonollosa et al. (2012) allowed for the mapping of RtOr activity in the rat olfactory bulb (OB), which provides an excellent illustration of RR range across the OB: less selective RtOrs were found to be grouped in the medial-caudal and lateral-caudal regions and selective receptors in the ventral region. The use of RR appears to be very effective, as it provides a reliable measure of the odor coding performance of the different RtOrs tested.

In this work, we use the recordings of Or responses from A. gambiae and D. melanogaster to train an MLP to assign unknown odorants to their correct chemical class. For that reason, unlike the study by Fonollosa et al. (2012), our aim is to investigate the different combinations of AgOrs and DmOrs that enhance the classification performance of the MLP system. The emphasis of our study on the classifier’s performance separates our findings from Fonollosa et al.’s more biologically directed study. Thus, by using fewer insect Ors, we present an efficient olfactory sensory array with a robust discriminatory power for identifying a range of chemical classes of volatile odorant chemicals, an essential characteristic for an olfactory biosensor (Wilson & Baietto, 2009). Utilizing a sensory array with a small number of Ors is desirable due to the lengthy and demanding process of constructing and developing recordable Ors (Hallem et al., 2004). By identifying the Ors that enhanced the performance of the MLP for the identification of odors based on their chemical classes, we are able to assist in the overall development of the sensory system of the olfactory biosensor. Overall, we have found that designing an olfactory biosensor must involve close examination of the insect Ors that are to be used in the sensory array, and we have found that A. gambiae Ors together with an optimal MLP system can provide accurate odorant classification.

Acknowledgments

This work was supported by New Zealand’s New Economy Research Fund C06X0701. We thank John Carlson and Allison Carey from the Department of Molecular, Cellular and Developmental Biology, Yale University, for kindly providing the data and their helpful discussions in the preparation of this manuscript.

References

Acevedo, F., Maldonado, S., Dominguez, E., Narvaez, A., & Lopez, F. (2007). Probabilistic support vector machines for multi-class alcohol identification. Sensors and Actuators B: Chemical, 122(1), 227–235.
Anderson, J. R., Michalski, R. S., Carbonell, J. G., & Mitchell, T. M. (1986). Machine learning: An artificial intelligence approach. San Francisco: Morgan Kaufmann.
Bachtiar, L. R., Unsworth, C. P., & Newcomb, R. D. (2013). Application of artificial neural networks on mosquito olfactory receptor neurons for an olfactory biosensor. In Proceedings of the Engineering in Medicine and Biology Society (EMBC), 2013 35th Annual International Conference of the IEEE (pp. 5390–5393). Piscataway, NJ: IEEE.
Bachtiar, L. R., Unsworth, C. P., & Newcomb, R. D. (2014a). “Super E-noses”: Multi-layer perceptron classification of volatile odorants from the firing rates of cross-species olfactory receptor arrays. In Proceedings of the Engineering in Medicine and Biology Society (EMBC), 2014 36th Annual International Conference of the IEEE. Piscataway, NJ: IEEE.
Bachtiar, L. R., Unsworth, C. P., & Newcomb, R. D. (2014b). Artificial neural network prediction of specific VOCs and blended VOCs for various concentrations from the olfactory receptor firing rates of Drosophila melanogaster. In Proceedings of the Engineering in Medicine and Biology Society (EMBC), 2014 36th Annual International Conference of the IEEE. Piscataway, NJ: IEEE.
Bachtiar, L. R., Unsworth, C. P., Newcomb, R. D., & Crampin, E. J. (2011a). Predicting odorant chemical class from odorant descriptor values with an assembly of multi-layer perceptrons. In Proceedings of the Engineering in Medicine and Biology Society, EMBC, 2011 Annual International Conference of the IEEE (pp. 2756–2759). Piscataway, NJ: IEEE.
Bachtiar, L. R., Unsworth, C. P., Newcomb, R. D., & Crampin, E. J. (2011b). Using artificial neural networks to classify unknown volatile chemicals from the firings of insect olfactory sensory neurons. In Proceedings of the Engineering in Medicine and Biology Society, EMBC, 2011 Annual International Conference of the IEEE (pp. 2752–2755). Piscataway, NJ: IEEE.
Bachtiar, L. R., Unsworth, C. P., Newcomb, R. D., & Crampin, E. J. (2013). Multilayer perceptron classification of unknown volatile chemicals from the firing rates of insect olfactory sensory neurons and its application to biosensor design. Neural Computation, 25(1), 259–287.
Baldwin, E. A., Bai, J., Plotto, A., & Dea, S. (2011). Electronic noses and tongues: Applications for the food and pharmaceutical industries. Sensors, 11(5), 4744–4766.
Benton, R., Sachse, S., Michnick, S. W., & Vosshall, L. B. (2006). Atypical membrane topology and heteromeric function of Drosophila odorant receptors in vivo. PLoS Biology, 4(2), e20.
Berrueta, L. A., Alonso-Salces, R. M., & Héberger, K. (2007). Supervised pattern recognition in food analysis. Journal of Chromatography A, 1158(1), 196–214.
Beshel, J., & Zhong, Y. (2013). Graded encoding of food odor value in the Drosophila brain. Journal of Neuroscience, 33(40), 15693–15704. doi:10.1523/JNEUROSCI.2605-13.2013
Bishop, C. M. (1994). Novelty detection and neural network validation. Vision, Image and Signal Processing, IEE Proceedings, 141(4), 217–222.
Boyle, S. M., McInally, S., & Ray, A. (2013). Expanding the olfactory code by in silico decoding of odor-receptor chemical space. eLife, 2, e01120. doi:10.7554/eLife.01120
Carey, A. F., & Carlson, J. R. (2011). Insect olfaction from model systems to disease control. Proc. Natl. Acad. Sci. USA, 108(32), 12987–12995. doi:10.1073/pnas.1103472108
Carey, A. F., Wang, G., Su, C., Zwiebel, L. J., & Carlson, J. R. (2010). Odorant reception in the malaria mosquito Anopheles gambiae. Nature, 464(7285), 66–71.
Carlson, J. R. (1996). Olfaction in Drosophila: From odor to behavior. Trends in Genetics, 12(5), 175–180.
Churchland, P. S., & Sejnowski, T. J. (1992). The computational brain. Cambridge, MA: MIT Press.
Clyne, P., Grant, A., O’Connell, R., & Carlson, J. R. (1997). Odorant response of individual sensilla on the Drosophila antenna. Invertebrate Neuroscience, 3(2–3), 127–135.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.
Couto, A., Alenius, M., & Dickson, B. J. (2005). Molecular, anatomical, and functional organization of the Drosophila olfactory system. Current Biology, 15(17), 1535–1547.
de Bruyne, M., Clyne, P. J., & Carlson, J. R. (1999). Odor coding in a model olfactory organ: The Drosophila maxillary palp. Journal of Neuroscience, 19(11), 4520–4532.
de Bruyne, M., Foster, K., & Carlson, J. R. (2001). Odor coding in the Drosophila antenna. Neuron, 30(2), 537–552.
Distante, C., Ancona, N., & Siciliano, P. (2003). Support vector machines for olfactory signals recognition. Sensors and Actuators B: Chemical, 88(1), 30–39.
Dobritsa, A. A., van der Goes van Naters, W., Warr, C. G., Steinbrecht, R. A., & Carlson, J. R. (2003). Integrating the molecular and cellular basis of odor coding in the Drosophila antenna. Neuron, 37(5), 827–841.
Efron, B. (1983). Estimating the error rate of a prediction rule: Improvement on cross-validation. Journal of the American Statistical Association, 316–331.
Efron, B., & Tibshirani, R. (1993). An introduction to the bootstrap. Boca Raton, FL: Chapman & Hall/CRC.
Egan, J. P. (1975). Signal detection theory and ROC analysis. New York: Academic Press.
Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861–874.
Fonollosa, J., Gutierrez-Galvez, A., & Marco, S. (2012). Quality coding by neural populations in the early olfactory pathway: Analysis using information theory and lessons for artificial olfactory systems. PloS One, 7(6).
Galán, R. F., Sachse, S., Galizia, C. G., & Herz, A. V. (2004). Odor-driven attractor dynamics in the antennal lobe allow for simple and rapid olfactory pattern classification. Neural Computation, 16(5), 999–1012.
Gardner, J. W., Hines, E. L., & Wilkinson, M. (1990). Application of artificial neural networks to an electronic olfactory system. Measurement Science and Technology, 1, 446–451.
Gutierrez-Osuna, R. (2002). Pattern analysis for machine olfaction: A review. Sensors Journal, IEEE, 2(3), 189–202.
Hall, M. A., & Smith, L. A. (1997). Feature subset selection: A correlation based filter approach. In Proceedings of the 1997 International Conference on Neural Information Processing and Intelligent Information Systems (pp. 855–858). New York: Springer.
Hallem, E. A., & Carlson, J. R. (2006). Coding of odors by a receptor repertoire. Cell, 125(1), 143–160.
Hallem, E. A., Dahanukar, A., & Carlson, J. R. (2006). Insect odor and taste receptors. Annu. Rev. Entomol., 51, 113–135.
Hallem, E. A., Ho, M. G., & Carlson, J. R. (2004). The molecular basis of odor coding in the Drosophila antenna. Cell, 117(7), 965–979.
Harrell, F. E. (2001). Regression modeling strategies: With applications to linear models, logistic regression, and survival analysis. Berlin: Springer-Verlag.
Hsu, C., & Lin, C. (2002). A comparison of methods for multiclass support vector machines. IEEE Transactions on Neural Networks, 13(2), 415–425.
Huang, S., Tan, K. K., & Tang, K. Z. (2004). Neural network control: Theory and applications. Baldock, Hertfordshire, England: Research Studies Press.
Huerta, R., & Nowotny, T. (2009). Fast and robust learning by reinforcement signals: Explorations in the insect brain. Neural Computation, 21(8), 2123–2151.
Huerta, R., Nowotny, T., García-Sanchez, M., Abarbanel, H. D., & Rabinovich, M. I. (2004). Learning classification in the olfactory system of insects. Neural Computation, 16(8), 1601–1640.
Huerta, R., Vembu, S., Amigó, J. M., Nowotny, T., & Elkan, C. (2012). Inhibition in multiclass classification. Neural Computation, 24(9), 2473–2507.
Inscent. (2006). The insect chemosensory system. http://www.inscent.com/chemosensory_system.php
Jain, A. K., Mao, J., & Mohiuddin, K. M. (1996). Artificial neural networks: A tutorial. Computer, 29(3), 31–44.
Kavzoglu, T. (1999). Determining optimum structure for artificial neural networks. In Proceedings of the 25th Annual Technical Conference and Exhibition of the Remote Sensing Society (pp. 675–682). N.p.
Laurent, G., Stopfer, M., Friedrich, R. W., Rabinovich, M. I., Volkovskii, A., & Abarbanel, H. D. (2001). Odor encoding as an active, dynamical process: Experiments, computation, and theory. Annual Review of Neuroscience, 24(1), 263–297.
Leal, W. S. (2013). Odorant reception in insects: Roles of receptors, binding proteins, and degrading enzymes. Annual Review of Entomology, 58, 373–391.
Li, C., Heinemann, P., & Sherry, R. (2007). Neural network and Bayesian network fusion models to fuse electronic nose and surface acoustic wave sensor data for apple defect detection. Sensors and Actuators B: Chemical, 125(1), 301–310.
Longstaff, I. D., & Cross, J. F. (1987). A pattern recognition approach to understanding the multi-layer perceptron. Pattern Recognition Letters, 5(5), 315–319.
Lu, T., Qiu, Y. T., Wang, G., Kwon, J. Y., Rutzler, M., Kwon, H., … Carlson, J. R. (2007). Odor coding in the maxillary palp of the malaria vector mosquito Anopheles gambiae. Current Biology, 17(18), 1533–1544.
Marco, S., & Gutierrez-Gálvez, A. (2009). Recent developments in the application of biologically inspired computation to chemical sensing. In Olfaction and Electronic Nose: Proceedings of the 13th International Symposium on Olfaction and Electronic Nose (pp. 151–154). Piscataway, NJ: IEEE.
Marco, S., & Gutiérrez-Gálvez, A. (2012). Signal and data processing for machine olfaction and chemical sensing: A review. IEEE Sensors Journal, 12(11), 3189–3214.
Massart, D. L., Vandeginste, B., Buydens, L., De Jong, S., Lewi, P., & Smeyers-Verbeke, J. (1997). Handbook of chemometrics and qualimetrics: Part A. Dordrecht: Elsevier.
MathWorks. (2014). Compute receiver operating characteristic (ROC) curve or other performance curve for classifier output. http://www.mathworks.com.au/help/stats/perfcurve.html
Mazzatorta, P., Benfenati, E., Lorenzini, P., & Vighi, M. (2004). QSAR in ecotoxicity: An overview of modern classification techniques. Journal of Chemical Information and Computer Sciences, 44(1), 105–112.
Mboera, L., & Takken, W. (1997). Carbon dioxide chemotropism in mosquitoes (Diptera: Culicidae) and its potential in vector surveillance and management programmes. Rev. Med. Vet. Entomol., 85(4), 355–368.
McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biology, 5(4), 115–133.
Nowotny, T., Berna, A. Z., Binions, R., & Trowell, S. (2013). Optimal feature selection for classifying a large set of chemicals using metal oxide sensors. Sensors and Actuators B: Chemical, 187, 471–480.
Pardo, M., & Sberveglieri, G. (2005). Classification of electronic nose data with support vector machines. Sensors and Actuators B: Chemical, 107(2), 730–737.
Phaisangittisagul, E., & Nagle, H. T. (2008). Sensor selection for machine olfaction based on transient feature extraction. IEEE Transactions on Instrumentation and Measurement, 57(2), 369–378.
Poggio, T., & Girosi, F. (1990). Networks for approximation and learning. Proceedings of the IEEE, 78(9), 1481–1497.
Raman, B., Stopfer, M., & Semancik, S. (2011). Mimicking biological design and computing principles in artificial olfaction. ACS Chemical Neuroscience, 2(9), 487–499.
Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386–408.
Sato, K., Pellegrino, M., Nakagawa, T., Nakagawa, T., Vosshall, L. B., & Touhara, K. (2008). Insect olfactory receptors are heteromeric ligand-gated ion channels. Nature, 452(7190), 1002–1006.
Smart, R., Kiely, A., Beale, M., Vargas, E., Carraher, C., Kralicek, A. V., … Warr, C. G. (2008). Drosophila odorant receptors are novel seven transmembrane domain proteins that can signal independently of heterotrimeric G proteins. Insect Biochemistry and Molecular Biology, 38(8), 770–780.
Sola, J., & Sevilla, J. (1997). Importance of input data normalization for the application of neural networks to complex industrial problems. IEEE Transactions on Nuclear Science, 44(3), 1464–1468.
Su, C., Menuz, K., & Carlson, J. R. (2009). Olfactory perception: Receptors, cells, and circuits. Cell, 139(1), 45–59.
Swets, J. A. (1988). Measuring the accuracy of diagnostic systems. Science, 240(4857), 1285–1293.
Takken, W., & Knols, B. G. (1999). Odor-mediated behavior of afrotropical malaria mosquitoes. Annu. Rev. Entomol., 44, 131–157. doi:10.1146/annurev.ento.44.1.131
Tauxe, G. M., MacWilliam, D., Boyle, S. M., Guda, T., & Ray, A. (2013). Targeting a dual detector of skin and CO2 to modify mosquito host seeking. Cell, 155(6), 1365–1379.
Todeschini, R., Ballabio, D., Consonni, V., Mauri, A., & Pavan, M. (2007). CAIMAN (classification and influence matrix analysis): A new approach to the classification based on leverage-scaled functions. Chemometrics and Intelligent Laboratory Systems, 87(1), 3–17.
Unsworth, C. P., Bachtiar, L. R., Newcomb, R. D., & Crampin, E. J. (2011). The Cybernose Project: Predicting the sense of smell. Paper presented at the Annual Joint Symposium on Neural Computation, San Diego, CA.
van den Broek, I. V. F., & den Otter, C. J. (1999). Olfactory sensitivities of mosquitoes with different host preferences (Anopheles gambiae s.s., An. arabiensis, An. quadriannulatus, An. m. atroparvus) to synthetic host odours. Journal of Insect Physiology, 45(11), 1001–1010.
Weston, J., & Watkins, C. (1998). Multi-class support vector machines (Technical Report CSD-TR-98-04). London: University of London.
Wicher, D., Schäfer, R., Bauernfeind, R., Stensmyr, M. C., Heller, R., Heinemann, S. H., & Hansson, B. S. (2008). Drosophila odorant receptors are both ligand-gated and cyclic-nucleotide-activated cation channels. Nature, 452(7190), 1007–1011.
Wilson, A. D., & Baietto, M. (2009). Applications and advances in electronic-nose technologies. Sensors, 9(7), 5099–5148.
Wilson, R. I. (2013). Early olfactory processing in Drosophila: Mechanisms and principles. Annual Review of Neuroscience, 36(1), 217–241. doi:10.1146/annurev-neuro-062111-150533
Yongwei, W., Wang, J., Zhou, B., & Lu, Q. (2009). Monitoring storage time and quality attribute of egg based on electronic nose. Analytica Chimica Acta, 650(2), 183–188.
Zwiebel, L. J., & Takken, W. (2004). Olfactory regulation of mosquito-host interactions. Insect Biochem. Mol. Biol., 34(7), 645–652. doi:10.1016/j.ibmb.2004.03.017