Recent work suggests that changing convolutional neural network (CNN) architecture by introducing a bottleneck in the second layer can yield changes in learned function. Understanding this relationship fully requires a way of quantitatively comparing trained networks. The fields of electrophysiology and psychophysics have developed a wealth of methods for characterizing visual systems that permit such comparisons. Inspired by these methods, we propose an approach to obtaining spatial and color tuning curves for convolutional neurons that can be used to classify cells in terms of their spatial and color opponency. We perform these classifications for a range of CNNs with different depths and bottleneck widths. Our key finding is that networks with a bottleneck show a strong functional organization: almost all cells in the bottleneck layer become both spatially and color opponent, and cells in the layer following the bottleneck become nonopponent. The color tuning data can further be used to form a rich understanding of how a network encodes color. As a concrete demonstration, we show that shallower networks without a bottleneck learn a complex nonlinear color system, whereas deeper networks with tight bottlenecks learn a simple channel opponent code in the bottleneck layer. We develop a method of obtaining a hue sensitivity curve for a trained CNN that enables high-level insights that complement the low-level findings from the color tuning data. We go on to train a series of networks under different conditions to ascertain the robustness of the discussed results. Ultimately our methods and findings coalesce with prior art, strengthening our ability to interpret trained CNNs and furthering our understanding of the connection between architecture and learned representation. Trained models and code for all experiments are available at https://github.com/ecs-vlc/opponency.

1  Introduction

The tendency for learning machines to exhibit oriented-edge receptive fields, similar to those found in nature, has long been observed (Bell & Sejnowski, 1997; Krizhevsky, Sutskever, & Hinton, 2012; Lehky & Sejnowski, 1988; Lindsey, Ocko, Ganguli, & Deny, 2019; Olah et al., 2020a; Olshausen & Field, 1996; Shan, Zhang, & Cottrell, 2007; Wang, Cottrell, & Kanan, 2015). However, learning machines rarely exhibit the functional organization found in nature. In convolutional neural networks, we typically find oriented-edge receptive fields in early layers, rather than a progression from center-surround receptive fields to oriented-edge receptive fields as is common in biological vision (Hubel & Wiesel, 2004). In an important work, Lindsey et al. (2019) demonstrate that the addition of a bottleneck to a deep convolutional network can induce center-surround receptive fields, suggesting a causal link between anatomical constraints and the nature of learned visual processing. In order to refine our understanding of this causal relationship, we pursue an electrophysiological interpretation of convolutional networks that incorporates opponency and color tuning.

Cells with center-surround and oriented-edge receptive fields are spatially opponent. From the classic work of Kuffler (1953), Hubel and Wiesel (1962, 2004), and others summarized in Troy and Shou (2002) and Martinez and Alonso (2003), these neurons form the building blocks of feature extraction in the primary visual cortex. Formally, a neuron that is excited by a particular stimulus and inhibited by another in the same stimulus space is said to be opponent to that space. For example, if a neuron is excited by a stimulus in some part of the visual field and inhibited by a stimulus in another, it is spatially opponent. Alternatively, if a neuron is excited by a stimulus of a certain wavelength and inhibited by a stimulus of another, it is spectrally opponent. Spectral opponency, first hinted at by the complementary color system from Goethe (1840) and later detailed by Hering (1920), was observed and characterized at a cellular level only around 1960 (Daw, 1967; De Valois, Smith, Kitai, & Karoly, 1958; Naka & Rushton, 1966; Wagner, MacNichol, & Wolbarsht, 1960; Wiesel & Hubel, 1966). Combined, the theories of spatially opponent feature extraction in the visual cortex (Hubel & Wiesel, 1962, 2004; Kuffler, 1953; Troy & Shou, 2002), trichromacy (Helmholtz, 1852; Maxwell, 1860; Young, 1802), and spectral opponency (De Valois, Abramov, & Jacobs, 1966) constitute a deep understanding of the early layers of visual processing in nature.

The notional elegance of the above theories has served to motivate much of the progress made in computer vision, most notably including the development of multilayer (deep) convolutional neural networks (CNNs) (Bottou et al., 1994; Le Cun & Bengio, 1995; Le Cun et al., 1990) that are now so focal in our collective interests. Multilayer CNNs are learning models designed to mimic the functional properties, namely, spatial feature extraction and retinotopy, of the retina, lateral geniculate nucleus (LGN), and primary visual cortex. By virtue of the ease with which one can train such models, multilayer CNNs offer a unique opportunity to study the emergence of visual phenomena across the full gamut of constraints and conditions of interest. It is widely observed that trained convolutional neurons exhibit the same kinds of receptive fields as those found in nature and that the learned features become successively more abstract with depth (Krizhevsky et al., 2012; Olah et al., 2020a; Olah, Mordvintsev, & Schubert, 2017; Zeiler & Fergus, 2014). However, we do not typically see structural organization of these cell types. For example, edge and color information is confounded in the first layer of ZFNet (Zeiler & Fergus, 2014), with some color information also encoded in the second layer. Furthermore, as Lindsey et al. (2019) addressed, none of the convolutional neurons have center-surround receptive fields of the kind observed in retinal ganglion cells. Rafegas and Vanrell (2018) analyzed color selectivity in a deep CNN, finding cells that are excited by two groups of stimuli that are roughly opposite in hue. To classify these cells as opponent would additionally require an understanding of the stimuli that inhibit each cell. There has been some exploration of the role of inhibition in deep CNNs (Olah et al., 2018), although we are not aware of any demonstration that learned convolutional cells are ever truly opponent in the sense that they are both inhibited below and excited above a baseline by some stimuli.

With the exception of recent developments in metalearning (Tan & Le, 2019; Zoph & Le, 2017), new convolutional architectures are typically designed with the aim of increasing either width (Zagoruyko & Komodakis, 2016) or depth (He, Zhang, Ren, & Sun, 2016; Szegedy et al., 2015) while preventing the vanishing gradient problem with auxiliary losses (Szegedy et al., 2015), skip connections (He et al., 2016), dense connections (Huang, Liu, Van Der Maaten, & Weinberger, 2017), or stochastic depth (Huang, Sun, Liu, Sedra, & Weinberger, 2016), to name a few. However, the finding by Lindsey et al. (2019) that network architecture can affect the fundamental type of function that is learned (rather than simply affecting capacity) suggests a new approach to both architecture design and interpretability. Specifically, if we can improve our understanding of the bias introduced by the network architecture, we may be able to design new architectures with specific goals in mind or better interpret the performance of preexisting ones.

Clearly, research in this space has the potential to affect our understanding of both deep learning and the neuroscience of vision. In order to realize this potential, large-scale studies are needed that properly establish the connections between the model architecture, the data space, and the kind of visual processing that is learned. Lindsey et al. (2019) mainly rely on qualitative assessment for the identification of center-surround and oriented edge receptive fields, but do propose some quantitative analyses such as the variance in gradient with respect to different inputs as a measure of the linearity of the neuron. The highly detailed analyses of Olah et al. (2020b) give a comprehensive understanding of the function of particular neurons or circuits in deep networks; however, each functional unit or group is currently identified manually. The procedure that Rafegas and Vanrell (2018) proposed could be automated but involves the costly process of determining the image patches that most excite each cell. The Brain-Score project from Schrimpf et al. (2018) is an attempt at providing an assessment of the similarity between a given network and various neural and behavioral recordings from primates. This is uninformative in the sense that it does not provide any information regarding precisely how the function of the network is similar to that of the primate visual system. The same could be said of the work of Gomez-Villa, Martin, Vazquez-Corral, and Bertalmio (2019), who find evidence that CNNs are susceptible to the same visual illusions as those that fool human observers.

In this letter, we develop a framework for automatically classifying convolutional cells in terms of their spatial and color opponency, based on electrophysiological definitions from the neuroscience literature. In addition, we propose a method of obtaining a hue sensitivity curve for a given network, inspired by similar methods in psychophysics. Combined, these approaches provide a descriptor of the functions learned by CNNs that offers rich insight into how they encode information. We apply our framework to a color variant of the model from Lindsey et al. (2019) and demonstrate that following the introduction of a bottleneck, different cell types tend to be organized according to their depth in the network, with no such organization found in networks without a bottleneck. We detail the relationship of data, architecture, and learned representation through a series of control experiments. In total, we have trained 2490 models over nine different settings, all of which have been made publicly available, alongside code for all of our experiments, via PyTorch-Hub at https://github.com/ecs-vlc/opponency.

2  The Physiology and Psychophysics of Early Color Vision

Since the advent of Holmgren's electrophysiology experiments in 1866, which first showed the flow of electrical current in the retina, vision scientists have sought to understand the cellular mechanisms that allow us to see. In a series of articles, Adrian and Matthews (1927a, 1927b, 1928) started to explore the electrical response of the retina to light. In later experiments, practitioners explored how single cells respond to different stimuli; for example, Hartline's (1938) early measurements of the response of single optic nerve fibers to illumination, Barlow's “fly detectors,” Lettvin, Maturana, McCulloch, and Pitts's (1959) “bug perceivers,” and Hubel and Wiesel's (1962) classic experiments in understanding receptive field structure. As a consequence of these experiments, a number of different observations and subsequent classifications of the behavioral characteristics of single cells and cell populations have been made. Although many of these classifications have been disproved or disputed, a number have stood the test of time. In particular, there is now a good shared understanding of how cells in the early parts of the visual systems of a range of primates respond to different spatial and spectral stimuli. This understanding covers the main pathway from the retina, through the lateral geniculate nucleus (LGN) and into early parts of the visual cortex (e.g., V1 and V2). In the following sections we highlight the key findings from previous physiological studies that directly relate to the work presented in this letter.

2.1  Spatial Opponency in Cells

Following Adrian and Matthews (1928), Hartline (1938, 1940) discovered evidence for different types of cellular behavior in response to stimuli and, in particular, found that inhibitory interactions were sometimes revealed when multiple receptors were excited (Hartline, Wagner, & Macnichol, 1952). Kuffler (1953) and Barlow (1953) investigated this finding further and discovered cells with spatial receptive fields that are opponent to each other. These early results, obtained by presenting spots of light to different parts of the receptive field, showed an antagonism (opponency) between an inner center and outer surround. It is now widely accepted that such center-surround cells can be found in the retina and LGN (Hubel & Wiesel, 2004). In contrast, the majority of cells in V1 are orientation tuned (Livingstone & Hubel, 1984). One approach to analyzing this spatial selectivity involves the presentation of drifting high-contrast sinusoidal gratings (De Valois, Albrecht, & Thorell, 1982; Johnson, Hawken, & Shapley, 2001, 2008; Lennie, Krauskopf, & Sclar, 1990; Levick & Thibos, 1982; Zhao, Chen, Liu, & Cang, 2013). For example, one can characterize orientation selectivity through presentation of gratings with fixed frequency and contrast at a range of orientations (Johnson et al., 2008; Lennie et al., 1990; Levick & Thibos, 1982; Zhao et al., 2013). Similarly, a spatial frequency tuning curve can be obtained through the use of a fixed orientation and contrast (De Valois et al., 1982; Johnson et al., 2001). These analyses again grant a notion of spatial antagonism (spatial opponency here) in the cortex, where there exists a grating configuration that excites the cell and an opponent grating configuration that inhibits the cell (Shapley & Hawken, 2011). Note that, although atypical, the presentation of grating stimuli has also been used to detect center-surround organization in the retina (Bilotta & Abramov, 1989), since these are cells that are highly tuned to frequency but not orientation selective.

2.2  Color Vision and Color Opponency

With respect to color vision, the first major physiological finding relates to the discovery of two broad classes of cell that respond to color: those that exhibit opponent spectral sensitivity, and those (nonopponent) that do not. Experiments by De Valois et al. (1966) discovered spectrally opponent cells in the LGN of a trichromatic primate that are excited by particular single-wavelength stimuli and inhibited by others. Additionally, they discovered that, broadly speaking, the cells could be grouped into those that were excited by red and inhibited by green (and vice versa) and cells that were excited by blue and inhibited by yellow (and vice versa). Indeed, these cells would appear to align with Hering's unique hues (red, green, blue, and yellow) (Hering, 1920), which are unique in the sense that none of them can be viewed as a combination of the others. However, the experiments from Derrington et al. (1984) reveal that the cardinal axes of the chromatic response in the macaque LGN are not aligned to Hering's unique hues but to cone responses. The consequence of this finding is that spectrally opponent cells in early primate vision are best described as cone opponent. It has similarly been argued that so-called red/green opponency is better described as magenta/cyan and that these should be viewed as complementary colors rather than opponent (Pridmore, 2005, 2011). (For a more in-depth exposition of the contention between the physiological and psychophysical understanding of spectral opponency see Shevell & Martin, 2017.) Cells that are spectrally nonopponent have also been observed in primate LGN; these are cells that are not sensitive to specific wavelengths but respond to a broad range of wavelengths in the same way (either inhibitory or excitatory) (De Valois, Smith, Karoly, & Kitai, 1958; Jacobs, 1964). In V1, it has been suggested that cells described as selective to orientation but not color by Livingstone and Hubel (1984) are in fact color opponent but with unbalanced cone inputs such that they respond to general changes in luminance (Johnson et al., 2001; Lennie et al., 1990).

More recently, techniques such as functional magnetic resonance imaging (fMRI) have been used to explore population coding of vision and color-related processes (Boynton, 2002; Engel, Zhang, & Wandell, 1997; Seymour, Williams, & Rich, 2015; Wade, Augath, Logothetis, & Wandell, 2008). In particular, studies have shown strong responses in V1 to stimuli that are preferred by spectrally opponent cells (Engel et al., 1997; Kleinschmidt, Lee, Requardt, & Frahm, 1996; Schluppeck & Engel, 2002). The work of Wade et al. (2008) validates that the early visual system of the macaque (where many of the single-cell measurements of color vision have been taken) correlates strongly with humans in terms of overall population responses to chromatic contrast; this is important to our work since we seek functional archetypes that are of general efficacy in visual intelligence. It is, however, worth noting that Wade et al. also show that in later areas of the visual pathway, the topographical organization of the macaque is fundamentally different.

Following De Valois et al.'s initial findings, there has been a realization that cells responsive to color could be further grouped into single opponent and double opponent cells. The defining characteristic of double opponent cells is that they respond strongly to color patterns but are nonresponsive or weakly responsive to full-field color stimuli (e.g., solid color across the receptive field, slow gradients, or low-frequency changes in color) (Shapley & Hawken, 2011). In the retina, double opponency presents as spectrally opponent cells with center-surround organization (Troy & Shou, 2002). In the primary visual cortex, there are both the spectrally opponent cells with oriented receptive fields mentioned above and nonoriented double opponent cells in the cytochrome oxidase rich blobs (Livingstone & Hubel, 1984). Note that one interpretation is that double opponent cells are both spatially and spectrally opponent.

2.3  Linearity of Retinal Ganglion Cell Response

There is a connection between anatomy and the relative presence of linear and nonlinear cells in the retina. For example, midget cells, which are well approximated by a linear model (Smith et al., 1992), are the most prevalent ganglion cell type in the human retina (Dacey, 1993). In contrast, the most prevalent ganglion cell type in the mouse retina is a nonlinear feature detector that is thought to act as an overhead predator detection mechanism (Zhang, Kim, Sanes, & Meister, 2012), not dissimilar to the previously noted fly detectors and bug perceivers. In their experiments with CNNs, Lindsey et al. (2019) suggest that the contrast between the anatomy of the primate and mouse visual systems can be considered in terms of network depth. The authors subsequently present evidence that the natural differences in function derive from these associated differences in visual system anatomy. In particular, deeper networks learn linear features in early layers, whereas shallower networks learn nonlinear features.

3  Opponency in Artificial Vision

The notion of a spatially opponent receptive field has a long history in computer vision. Notably, the Marr-Hildreth algorithm for edge detection (Marr & Hildreth, 1980) performs a Laplacian of gaussian (often approximated by a difference of gaussian (DoG)), which resembles the function performed by center-surround ganglion cells in the retina. Oriented-edge receptive fields were also modeled in early approaches to visual recognition. In particular, edge orientation histograms (Freeman & Roth, 1995; McConnell, 1986) and later histograms of oriented gradients (Dalal & Triggs, 2005) are similar in principle to a layer of neurons with oriented-edge receptive fields with different rotation, frequency, and phase. DoG and edge orientation assignment are also integral components of the well-known scale invariant feature transform (SIFT) descriptor (Lowe, 1999).

In addition to approaches that directly model opponent receptive fields, several studies have shown emergent opponency in learning machines. For example, Lehky and Sejnowski (1988) found evidence for orientation selectivity in a neural network trained with backpropagation to determine the curvature of simple surfaces in procedurally generated images. Olshausen and Field (1996) demonstrated the emergence of basis functions that resemble oriented receptive fields when learning an efficient sparse linear code for a set of images. Similar results are presented by Bell and Sejnowski (1997), who show that a nonlinear infomax network, which performs independent component analysis (ICA), trained on images of natural scenes, produces sets of visual filters that show orientation and spatial selectivity. Lehky and Sejnowski (1999) use a four-layer neural network to map cone responses to a population of gaussian tuning curves in CIE color space and demonstrate color opponent neurons in the hidden layers. Karklin and Lewicki (2003) propose a hierarchical probabilistic approach to learning a nonlinear efficient code. The authors demonstrate the emergence of higher-order features such as object location, scale, and texture. Alternatively, Shan et al. (2007) introduced recursive ICA, where the outputs of a previous application of ICA are transformed such that it may be reapplied. The authors again demonstrate the emergence of these higher-order features when applying their model to natural images. Wang et al. (2015) use recursive ICA to automatically learn visual features that accord with those found in the early visual cortex. The authors subsequently model the object recognition pathway using gnostic fields (Kanan, 2013, 2014), a brain-inspired model of object categorization. Wang et al. (2015) demonstrate that the features in the first ICA layer, trained on natural images, are oriented-edges with the color opponent characteristics typical of V1 neurons (dark-light, yellow-blue, red-green). The second-layer filters are sensitive to edges of different frequency and orientation, reminiscent of complex cells in V1. Cells that exhibit responses similar to simple and complex neurons in V1 can only be observed in the two ICA layers.

In this work we are primarily concerned with opponency in deep CNNs, for which some early approaches used variants of ICA to learn the filters (Le, Karpenko, Ngiam, & Ng, 2011). Modern CNNs are trained using the backpropagation algorithm, similar to the work of Lehky and Sejnowski (1988, 1999), such that the features learned are dependent on the objective function of the model. In addition, CNNs are typically constructed with many more layers of nonlinear feature extraction than the one or two layers used in ICA. As a result, CNNs permit a notion of functional organization: “what happens where” rather than just “what happens.” Due to the connections between CNNs and ICA, one might reasonably expect CNNs to exhibit emergent opponency. This is indeed the case, with multiple works pointing out that learned filters in early layers appear to be spatially and color selective (Krizhevsky et al., 2012; Lindsey et al., 2019; Olah et al., 2020a; Rafegas & Vanrell, 2018; Zeiler & Fergus, 2014).

Rafegas and Vanrell (2018) propose an automated measurement of the spectral selectivity of convolutional neurons. For their approach, the authors find image patches that maximally excite each neuron and construct an index with high values when these patches are consistent in color. The authors further suggest that a neuron is double opponent if it is selective to two distinct colors that are roughly opposite in hue. Note that these definitions of opponency are not direct correlates of the previously discussed definition. The key difference is that the electrophysiological definition requires an understanding of the stimuli that inhibit cells in addition to the stimuli that excite them. This is important since although cells that are excited by two colors may be projecting the input on an opponent axis, they may also just be activating for both colors indiscriminately. The double color selective neurons found by Rafegas and Vanrell (2018) are typically red-cyan, blue-yellow, and magenta-green. These do not closely reflect the opponent axes of the primate LGN. This is to be expected since cone opponency observed in nature translates to channel opponency in a convolutional model, and so we can reasonably expect the opponent axes to be aligned with extreme RGB values rather than cone responses or Hering's unique hues (although note that these are a subset of the RGB extrema).

4  Methods

In this section, we detail our methodology for classifying convolutional cells according to their spatial and color processing. Generalizing the discussed physiological definitions, to classify a cell as opponent, we require a set of stimuli, the ability to measure the response of the cell to each stimulus, and a measurement of the baseline response of the cell (in order to establish excitation and inhibition). The response of each neuron to the input is readily available in a deep network, and we define the baseline as the response of the cell to a black input (a matrix of zeros). If there exists a stimulus for which the cell is excited (responds above the baseline) and a stimulus for which the cell is inhibited (responds below the baseline), then the cell is opponent to the axis of variance of the stimuli set. We first describe the two stimuli sets that we will use for the classification of spatial and color opponency. We go on to discuss automatic classification of a cell as double opponent and how we can infer the specific type of an opponent cell. In addition, we introduce an approach for studying the hue sensitivity curve of a deep network, inspired by Bedford and Wyszecki (1958). The experiments in this section form our core results. We later perform a control study to determine how well these results extend to different settings.
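
The classification rule described above can be sketched in a few lines. This is a minimal illustration, not the released implementation: the function name `classify_cell` and the string labels are assumptions made here, and the nonopponent and unresponsive labels follow the definitions given in section 4.1. Comparisons are strict, in keeping with the choice not to use a tolerance.

```python
def classify_cell(responses, baseline):
    """Label one cell from its responses to a stimulus set.

    `responses` holds the cell's response to each stimulus and `baseline`
    is its response to a black (all-zero) input. Names and labels are
    illustrative assumptions, not taken from the released code.
    """
    excited = any(r > baseline for r in responses)    # above baseline for some stimulus
    inhibited = any(r < baseline for r in responses)  # below baseline for some stimulus
    if excited and inhibited:
        return "opponent"       # crosses the baseline in both directions
    if excited or inhibited:
        return "nonopponent"    # responds, but only on one side of the baseline
    return "unresponsive"       # never leaves the baseline
```

The same function applies unchanged to the grating and hue stimulus sets, since only the list of responses differs between them.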

4.1  Spatial Opponency

To classify spatial opponency, we require a set of stimuli that vary spatially. Following Johnson et al. (2001), we construct a set of high-contrast grayscale gratings produced from a sinusoidal function for a range of rotations, frequencies, and phases. Figure 1 gives some example stimuli generated using PsychoPy (Peirce et al., 2019) with various degrees of rotation. We compute the response of each cell to each stimulus and compare these to the baseline to obtain a tuning curve that can be used to perform an automatic classification. In addition to spatially opponent, we have spatially nonopponent and spatially unresponsive. A nonopponent cell is one that may be excited or inhibited by the stimuli but does not cross the baseline. An unresponsive cell is one that is neither excited nor inhibited by any of the stimuli. We do not use any form of tolerance when making these classifications. The reason is that each cell will activate in its own space, and so the relative effect of a fixed tolerance could vary greatly among cells. As a consequence of this design decision, it is likely that the output of any unresponsive cells lies in a clipped region of the activation function. For example, if using rectified linear units (ReLUs), the cell response may remain at zero for all of the stimuli. Such cells are either highly tuned to a particular, complex stimulus or merely unresponsive to all stimuli. That said, our interests here are primarily bound to the existence and distribution of opponent and nonopponent cells only; we are not aware of any demonstration that unresponsive cells are found in nature. Recall that although the described stimuli are oriented edges, they can still be used to infer center-surround spatial opponency since a cell with a characteristic center-surround receptive field would be highly tuned to a particular frequency and phase but responsive to a broad range of angles.

Figure 1: Examples of grating patterns used as stimuli for the spatial opponency experiments. These samples have been generated using PsychoPy (Peirce et al., 2019), with different angles (θ), frequency of 4, and phase of 0.
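
For readers without PsychoPy, a comparable stimulus set can be approximated with the standard library alone. The sketch below is a stand-in for the PsychoPy stimuli, not a reproduction of them: the image size, the units of frequency (cycles per image), the orientation convention, and the particular grids of angles, frequencies, and phases are all assumptions chosen for illustration.

```python
import math

def grating(size, theta_deg, frequency, phase=0.0):
    """Return a size x size grayscale sinusoidal grating with values in [0, 1].

    `frequency` is taken to be in cycles per image and `theta_deg` rotates
    the direction of modulation; both conventions are assumptions.
    """
    theta = math.radians(theta_deg)
    image = []
    for row in range(size):
        y = row / size - 0.5
        line = []
        for col in range(size):
            x = col / size - 0.5
            # Position along the direction in which the luminance varies
            u = x * math.cos(theta) + y * math.sin(theta)
            line.append(0.5 + 0.5 * math.sin(2 * math.pi * frequency * u + phase))
        image.append(line)
    return image

# Illustrative sweep over rotation, frequency, and phase
stimuli = {
    (theta, freq, phase): grating(32, theta, freq, phase)
    for theta in range(0, 180, 15)
    for freq in (1, 2, 4, 8)
    for phase in (0.0, math.pi / 2, math.pi, 3 * math.pi / 2)
}
```

Each grating would then be replicated across the three color channels before being passed to the network, so that the stimulus varies only spatially.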

4.2  Color Opponency

To classify spectral opponency, De Valois et al. (1966) vary the stimuli according to wavelength. For our experiments, we propose using stimuli that vary in hue rather than wavelength. The reason for this is that the trained networks will expect an RGB input, and there is no exact mapping from wavelength to RGB. We could consider a more biologically valid color representation such as the cone response space used by Lehky and Sejnowski (1999) but opt for RGB as it is the standard practice in deep learning. We sample colors in the hue, saturation, lightness (HSL) color space for all integer hue values with saturation of 1.0 and lightness of 0.5. We then convert our stimuli to RGB before forwarding to the network and constructing the color tuning curve. We can perform classification by following the same process of comparing to the baseline as in the spatial setting. We use the terms hue opponency and color opponency interchangeably to refer to the different cell types found through this process.
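
The hue sweep can be generated with Python's built-in `colorsys` module, as in the sketch below. The function name `hue_stimuli` is an assumption; note that `colorsys.hls_to_rgb` takes hue in [0, 1] and orders its arguments as hue, lightness, saturation.

```python
import colorsys

def hue_stimuli(saturation=1.0, lightness=0.5):
    """Map each integer hue in [0, 360) to an (r, g, b) triple in [0, 1]."""
    colors = {}
    for hue in range(360):
        # colorsys expects hue scaled to [0, 1] and uses H, L, S argument order
        colors[hue] = colorsys.hls_to_rgb(hue / 360.0, lightness, saturation)
    return colors

colors = hue_stimuli()
```

One plausible way to use these triples, assumed here rather than stated in the text, is to fill an input-sized image uniformly with each color and record the cell's response, giving one point on the tuning curve per hue.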

4.3  Double Opponency

As discussed, we can automatically classify a cell as double opponent if it is both color and spatially opponent. Our interests here lie in whether double opponent cells emerge in convolutional networks trained with a classification objective. Note that it has been observed that most spectrally opponent cells in macaque V1 are also orientation selective (Johnson et al., 2008), that is, they are double opponent. Unlike in the single-opponent cases, we do not define a notion of double nonopponency or double unresponsiveness (although such classifications could be made if required).

4.4  Excitatory and Inhibitory Colors

Using the color tuning curve, we can further determine the hue that most excites or inhibits each cell. Since cells are typically equipped with a nonlinear activation function, there may be a wide range of stimuli for which they produce the lowest response. As such, we use the preactivation output to infer the most inhibitory stimulus. This excitation and inhibition data will allow us to plot the distribution of colors to which cells in networks are tuned. Note that this distribution is insufficient to describe the type of opponency since it does not permit an understanding of whether there are distinct classes of opponent cell. For example, the distribution of excitation and inhibition does not distinguish between two groups of cells that are red/green opponent and blue/yellow opponent, respectively, or many groups of cells that are red/green opponent, green/blue opponent, blue/red opponent and so on. One option would be to apply a clustering technique to the most excitatory and inhibitory responses. However, this would introduce additional challenges through the need for appropriate algorithm and hyperparameter choice. Instead, we can additionally study the conditional distribution of maximal excitation, given maximal inhibition by some colors in a chosen range. We suggest evaluation of these conditional distributions for the following hue ranges: red ([315,45)), yellow ([45,75)), green ([75,165)), cyan ([165,195)), blue ([195,285)), and magenta ([285,315)). By enabling direct assessment of the inhibition/excitation pairs, this will give a much deeper understanding of the kinds of opponency present in the networks being analyzed.
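
The following sketch shows one way to extract the most excitatory and most inhibitory hue for each cell and to group excitatory hues by the range containing the inhibitory hue. The function names are hypothetical, and the sketch assumes each tuning curve is a list of preactivation values indexed by integer hue. The only subtlety is the red range, which wraps around 0 degrees.

```python
from collections import defaultdict

# Hue ranges from the text as (name, start, end), half-open; red wraps past 360
HUE_RANGES = [
    ("red", 315, 45), ("yellow", 45, 75), ("green", 75, 165),
    ("cyan", 165, 195), ("blue", 195, 285), ("magenta", 285, 315),
]

def hue_name(hue):
    """Return the name of the range containing `hue` (in degrees)."""
    hue = hue % 360
    for name, start, end in HUE_RANGES:
        if start < end:
            if start <= hue < end:
                return name
        elif hue >= start or hue < end:  # wrapped interval, e.g. [315, 45)
            return name
    raise ValueError(f"hue {hue} not covered by any range")

def extreme_hues(preactivations):
    """Return (most excitatory hue, most inhibitory hue) for one cell."""
    hues = range(len(preactivations))
    most_excitatory = max(hues, key=lambda h: preactivations[h])
    most_inhibitory = min(hues, key=lambda h: preactivations[h])
    return most_excitatory, most_inhibitory

def excitation_given_inhibition(curves):
    """Group each cell's most excitatory hue by its inhibitory hue range."""
    table = defaultdict(list)
    for curve in curves:
        excitatory, inhibitory = extreme_hues(curve)
        table[hue_name(inhibitory)].append(excitatory)
    return dict(table)
```

A histogram of each list in the returned dictionary then approximates the conditional distribution of maximal excitation for that inhibitory color.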

4.5  Hue Sensitivity

In addition to the hue tuning curve, we can consider the hue sensitivity of a network. Specifically, we look to replicate the experiments of Bedford and Wyszecki (1958), who showed that the change needed to elicit a just-noticeable difference in hue to a human observer is a complex function of wavelength. Long, Yang, and Purves (2006) further suggest that the reason for this nonuniform spectral sensitivity derives from the statistics of natural scenes, showing that the curve predicted from a data set of natural images bears a strong resemblance to that obtained for a human observer. Another way to explain the discrimination curve is in terms of cone responses (Zhaoping, Geisler, & May, 2011). This is more direct since scene statistics can be seen as indirectly controlling wavelength discrimination through evolutionary modifications of cone properties. It is expected that such a sensitivity curve, though over hue rather than wavelength, will enable a more holistic view of color tuning.

To perform a similar experiment to Bedford and Wyszecki (1958), note that the just-noticeable difference is inversely related to the gradient of the perceived color with respect to wavelength, which can be seen as a form of sensitivity. By virtue of automatic differentiation, it is trivial to obtain the gradient of the activation in a layer of our network with respect to the RGB input. Since the conversion from HSL to RGB is piecewise differentiable, we can further obtain the approximate gradient of the activation with respect to hue. Note that we use the hidden-layer activation of a network rather than a notion of perceived color, so it is unclear whether these results should reflect the biological data. Furthermore, in light of the above, one might expect that the predominant features of the sensitivity curve should derive from the relative responses of the RGB channels as a function of hue.
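
The chain rule behind this computation can be sketched as follows. This is an illustration only: it replaces automatic differentiation with a central finite difference through the standard-library HSL-to-RGB conversion, and it stands in for the network with a hypothetical linear cell whose gradient with respect to RGB is simply its weight vector `w`. The lightness of 0.5 and saturation of 1.0 are also assumptions.

```python
import colorsys


def rgb_from_hue(hue_deg, lightness=0.5, saturation=1.0):
    """Convert an HSL hue in degrees to an RGB triple in [0, 1]."""
    return colorsys.hls_to_rgb((hue_deg % 360) / 360.0, lightness, saturation)


def drgb_dhue(hue_deg, eps=1e-3):
    """Central-difference estimate of d(RGB)/d(hue), per degree of hue."""
    hi = rgb_from_hue(hue_deg + eps)
    lo = rgb_from_hue(hue_deg - eps)
    return tuple((a - b) / (2 * eps) for a, b in zip(hi, lo))


def hue_sensitivity(grad_rgb, hue_deg):
    """Chain rule: d(response)/d(hue) = sum_c d(response)/d(c) * d(c)/d(hue)."""
    return sum(g * d for g, d in zip(grad_rgb, drgb_dhue(hue_deg)))


# Hypothetical linear cell: response = w . rgb, so d(response)/d(rgb) = w.
w = (1.0, -1.0, 0.0)  # excited by red, inhibited by green
s = hue_sensitivity(w, 30.0)
```

Between hues of 0 and 60 degrees only the green channel changes, rising linearly, so this red-excited, green-inhibited cell has a constant negative sensitivity there. The estimate is unreliable at multiples of 60 degrees, where the conversion is not differentiable.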

5  Results

We now present the results for our core experiments with Retina-Net models trained on color CIFAR-10. We later perform a control study and provide an in-depth discussion of the implications of these results; our aim in this section is merely to present the core findings of this work.

5.1  Retina-Net

Since we are interested in understanding the link between architectural constraints and learned representation, we adopt the same deep convolutional model of the visual system as Lindsey et al. (2019), referred to as Retina-Net. This model, depicted in Figure 2a, consists of a model of the retina that feeds into a model of the visual cortex and ventral visual stream (VVS). The retina model consists of a pair of convolutional layers with ReLU nonlinearities. The ventral network is a stack of convolutional layers (again with ReLUs) followed by a two-layer MLP (with 1024 ReLU neurons in the hidden layer, and a 10-way softmax on the output layer). Note that Lindsey et al. (2019) additionally explore a model of the LGN, which can be considered as an extension of the retinal bottleneck (Ghodrati, Khaligh-Razavi, & Lehky, 2017). We do not include such an exploration in this work as we are primarily focused on opponency and color tuning in the bottleneck layer.
Figure 2:

(a) Schematic of the Retina-Net model from Lindsey et al. (2019). (b, c) CIFAR-10 test accuracy for the different combinations of retinal bottleneck and ventral depth explored in the experiments. Mean and standard error given over 10 trials.

As with Lindsey et al.'s work, the networks are trained to perform classification on the CIFAR-10 data set (Krizhevsky, 2009), the only difference being that our model expects RGB inputs rather than grayscale. The choice of an object categorization task is validated by previous studies showing a strong correlation between neural unit responses of CNNs trained on such a task and the neural activity observed in the primate visual stream (Cadena et al., 2017; Güçlü & van Gerven, 2015; Yamins et al., 2014). (For further discussion of these results, refer to Lindsey et al., 2019.) Note that there may be many other learning tasks that are biologically valid in the sense that they yield similar functional properties. For example, self-supervised learning through deep information maximization (Hjelm et al., 2018) and contrastive predictive coding (Hénaff et al., 2019) may present viable alternatives to the supervised object recognition used here.

We train models across the same range of hyperparameters as Lindsey et al. (2019): bottleneck width $N_{BN} \in \{1, 2, 4, 8, 16, 32\}$ and ventral depth $D_{VVS} \in \{0, 1, 2, 3, 4\}$. Again following Lindsey et al. (2019), we perform 10 repeats, with error bars denoting the standard deviation of the result across all repeats. Networks were trained for 20 epochs with the RMSProp optimizer and a learning rate of 1e-4, with initial weights sampled via the Xavier method (Glorot & Bengio, 2010). We note that in order to replicate the results from Lindsey et al. (2019), we required additional regularization. Specifically, we use a weight decay of 1e-6 and data augmentation (random translations of 10% of the image width/height, and random horizontal flipping). Figures 2b and 2c give the average terminal accuracy for models trained on grayscale and color images, respectively. The grayscale accuracy curves match those given in Lindsey et al. (2019). The accuracy for networks trained on color images is generally higher, particularly for networks with no ventral layers. Later sections consider additional training settings that are variants of the above.
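
To make the size of this experimental grid concrete, the following sketch enumerates one configuration per combination of bottleneck width, ventral depth, and repeat. The numerical settings are those quoted above; the dictionary keys and the `experiment_grid` helper are hypothetical names chosen for illustration and need not match the released code.

```python
from itertools import product

BOTTLENECK_WIDTHS = (1, 2, 4, 8, 16, 32)  # N_BN
VENTRAL_DEPTHS = (0, 1, 2, 3, 4)          # D_VVS
N_REPEATS = 10

# Settings quoted in the text; the key names are assumptions of this sketch.
TRAIN_SETTINGS = {
    "epochs": 20,
    "optimizer": "RMSProp",
    "learning_rate": 1e-4,
    "weight_decay": 1e-6,
    "init": "xavier",
    "max_translation_fraction": 0.1,
    "horizontal_flip": True,
}


def experiment_grid():
    """Yield one configuration per (bottleneck width, ventral depth, repeat)."""
    grid = product(BOTTLENECK_WIDTHS, VENTRAL_DEPTHS, range(N_REPEATS))
    for n_bn, d_vvs, repeat in grid:
        yield {"n_bn": n_bn, "d_vvs": d_vvs, "repeat": repeat, **TRAIN_SETTINGS}


runs = list(experiment_grid())
```

Under these settings the grid contains 6 × 5 × 10 = 300 runs for each input type (grayscale or color).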

5.2  Characterizing Single Cells

To begin, we illustrate our framework for characterizing single cells. Figure 3 shows the first-order receptive field approximations, orientation tuning curves, and color tuning curves for four cells in the bottleneck layer of a network with $N_{BN} = 4$ and $D_{VVS} = 2$. Following Lindsey et al. (2019), the receptive field approximation is the gradient (obtained through backpropagation) of the output of a single convolutional filter in a single spatial position (that is, a single convolutional “neuron”) with respect to a blank input with a constant value of 0.01. This small, positive amount is required to ensure that each of the cells is in the linear region of the ReLU activation function (that is, the gradient is nonzero). The gradient image is then normalized and scaled so that it can be interpreted visually. By inspection, cells 1, 3, and 4 appear to be grayscale edge filters, whereas cell 2 is red/blue or magenta/cyan center-surround. However, the limitation of this analysis is the noise in the approximation. For example, one could argue that cell 1 is center-surround with a dark center and a magenta surround. Assessments given for any of the cells will be similarly contentious. Furthermore, this representation permits no understanding of inhibition. For example, cell 2 may be better described as tuned to blue hues in the interval (180°, 270°) rather than center-surround opponent.

To further characterize each cell, we employ our described approach. To characterize spatial opponency, in Figures 3b and 3c, we provide orientation tuning curves for the frequency and phase that elicit the weakest and strongest responses, respectively. If the cell responds above the baseline in one tuning curve and below in the other, or if either curve crosses the baseline, then the cell is spatially opponent. We can therefore say that by our definition, cells 1, 3, and 4 are spatially opponent. In contrast, cell 2 is merely spatially nonopponent, always responding above the baseline for any choice of rotation, frequency, and phase. In addition to classifying opponency, we can identify the orientation tuning of each cell by further study of the curves in Figure 3c. Figure 3d gives the color tuning curves for each cell. As hue is the only parameter to consider, classification here is simpler: the cell is hue opponent if the tuning curve crosses the baseline. Given this definition, we can say that cells 1, 3, and 4 are hue opponent, although the extent of inhibition is different in each case. Furthermore, for every cell, we can identify the range of hues to which it is tuned.
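
The classification rules in this paragraph can be sketched as a small function. This is an illustration rather than the released implementation: the tolerance `tol`, the use of a single shared `baseline` for the spatial and hue curves, and the treatment of a cell whose responses never leave the baseline as unresponsive are assumptions of the sketch.

```python
def classify_opponency(responses, baseline, tol=1e-6):
    """Classify a cell from its tuning-curve responses relative to a baseline.

    `responses` may pool several tuning curves (for example, the weakest and
    strongest orientation curves). A cell is opponent if it responds both
    above and below the baseline somewhere in the pooled responses.
    """
    excited = any(r > baseline + tol for r in responses)
    inhibited = any(r < baseline - tol for r in responses)
    if excited and inhibited:
        return "opponent"
    if excited or inhibited:
        return "nonopponent"
    return "unresponsive"


def classify_cell(weakest_curve, strongest_curve, hue_curve, baseline):
    """Return (spatial class, hue class, double-opponent flag) for one cell."""
    pooled = list(weakest_curve) + list(strongest_curve)
    spatial = classify_opponency(pooled, baseline)
    hue = classify_opponency(hue_curve, baseline)
    return spatial, hue, spatial == "opponent" and hue == "opponent"


# Hypothetical tuning curves for a double opponent cell and a nonopponent cell.
cell_a = classify_cell([-0.2, -0.1], [0.3, 0.5], [0.2, -0.1, 0.4], baseline=0.0)
cell_b = classify_cell([0.1, 0.2], [0.4, 0.5], [0.3, 0.2], baseline=0.0)
```

Pooling the two orientation curves covers both conditions in the text: one curve above and the other below the baseline, or a single curve that crosses it.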

Figure 3:

Characterization of the four cells in the second retinal layer of a network with $N_{BN} = 4$ and $D_{VVS} = 2$. (a) The receptive field approximation obtained from the gradient of the cell with respect to a blank image. (b, c) Orientation tuning curves for the frequency and phase combination that yielded the smallest and largest response, respectively. (d) Color tuning curve over the hue wheel. Cells 1, 3, and 4 are double opponent; cell 2 is nonopponent.

Following interpretation of the tuning curves, we can now state that cells 1, 3, and 4 are double opponent and cell 2 is nonopponent both spatially and with regard to hue. Furthermore, for each cell, we can state the orientation and hue to which it is tuned. For example, cell 2 is broadly excited by blue stimuli but with a distinct peak at a hue of around 240°. Cell 2 is spatially tuned to lines oriented in the interval (0°, 45°). Although it is true that this approach gives us a deeper understanding of each cell, the real value is in the fact that each of the above can trivially be automated over the whole cell population. We therefore transition away from studying single cells and instead consider the distributions of different cell types for the remainder of the letter.

5.3  Characterizing Cell Populations

For each result in this section, we automate cell classification following our described method and present the distribution of each cell type as a function of retinal bottleneck width and ventral depth. This allows us to understand the effect that these two architectural variables have on the kinds of cells that are learned and where they are found in the network. Note, however, that cells in deeper layers are expected to have a highly nonlinear response and thus may have receptive field properties that are quite different from the opponent cells observed in shallower layers. As such, observations regarding these deeper layers (Ventral 2 in particular) should be considered only in the context of our approach and may not generally apply to the broader understanding of opponency.

5.3.1  Spatial Opponency

Figure 4 gives the distribution of spatially opponent, spatially nonopponent, and spatially unresponsive cells as a function of bottleneck width for a range of ventral depths. For a small bottleneck, the vast majority of cells in the second retinal layer are spatially opponent. Conversely, cells in the first ventral layer are predominantly spatially nonopponent. For deeper networks with less constrained bottlenecks, the distributions are approximately equal in each of the layers. Almost all cells respond to some configuration of the grating stimulus, with only a small fraction of the population being spatially unresponsive. These findings are consistent with the neuroscience literature, in which unresponsive cells have not been reported and the majority of cells in primate V1 are orientation tuned (Livingstone & Hubel, 1984). Regarding ventral depth, the results show a consistent reduction in spatial opponency in the last convolutional layer (Retina 2 when depth is 0, Ventral 1 when depth is 1). There is a corresponding spike in spatial opponency in the penultimate convolutional layer (Retina 2 when depth is 1, Ventral 1 when depth is 2). The average number of opponent cells in each layer does not differ greatly.
Figure 4:

Distribution of spatially opponent, nonopponent, and unresponsive cells in different layers of our model as a function of bottleneck width, for a range of ventral depths. Functional organization emerges for networks with tight bottlenecks. The last convolutional layer (e.g., Retina 2 when depth is 0, Ventral 1 when depth is 1 and so on) exhibits a reduction in spatial opponency. The penultimate convolutional layer (Retina 2 when depth is 1, Ventral 1 when depth is 2, and so on) exhibits an increase.

5.3.2  Color Opponency

Curves showing how the distributions of the color opponent classes change for the second retinal and first two ventral layers as the bottleneck is increased, for a range of ventral depths, are given in Figure 5. As the bottleneck decreases, the second retina layer exhibits a strong increase in hue opponency, nearing 100% for a bottleneck of one. Conversely, cells in the first ventral layer show a decrease in hue opponency over the same region. For all but the tightest bottlenecks, up to half of the cells are hue nonopponent. Hue nonopponent cells show almost the exact opposite pattern to hue opponent cells. The implication of this result is that networks with strong hue opponent representations in the bottleneck layer exhibit an increase in hue nonopponent cells in Ventral 1. Since this spike in opponency returns in Ventral 2, we speculate that Ventral 1 merely preserves the opponent code from Retina 2 for downstream processing and learns a set of filters that are tuned but nonopponent. This is inconsistent with the evidence that spatially tuned cells in primate V1 are also color opponent (Lennie et al., 1990; Johnson et al., 2001). However, it should be stressed that our model of the primary visual cortex and ventral stream is highly simplified. In particular, we do not explicitly model the LGN or subsequent projections to different layers of V1, and greater similarity may well be observed in such a case. Similar to the results for spatial opponency, there is a consistent reduction/spike in hue opponency in the last and penultimate convolutional layers, respectively. Averaged over bottleneck width, the number of hue opponent cells is generally lower than the number of spatially opponent cells.

5.3.3  Double Opponency

Figure 6 shows the distribution of double opponent cells as a function of bottleneck size and ventral depth, giving a similar picture to the spatial and hue opponency plots. The results suggest that the majority of hue opponent cells are also spatially opponent. This finding is in alignment with the observation that most hue opponent cells in the macaque V1 are also orientation selective (Johnson et al., 2008).

5.3.4  Types of Opponency

The plots in Figure 7 show the distribution over the hue wheel of the most excitatory and most inhibitory colors for cells in our models before and after training. The key observation here is that maximal excitation and inhibition before training are naturally aligned to the hues that correspond with RGB values of 255 or 0. This is a quirk of the convolutional architecture. Since at initialization the function of the network is smooth, if the cell is excited by a particular channel, it will be most excited when that channel is maximized and vice versa. The effect of training, regarding both excitation and inhibition, is to reduce the proportion of cells that are tuned to red, yellow, cyan, and blue and increase the proportion of cells that are tuned to green and magenta. In addition, some cells in the bottleneck layer (Retina 2) become most excited by orange/red and cyan/blue. Unlike in the random networks, this distribution changes as a function of depth, tending to broaden the range of excitatory and inhibitory hues. Note that this corresponds to the network learning a complex, nonlinear color system. Cells in deeper layers are excited not only by particular channels but by the specific hue of the input.
Figure 5:

Distribution of color opponent, nonopponent, and unresponsive cells in different layers of our model as a function of bottleneck width, for a range of ventral depths. Functional organization again emerges for networks with tight bottlenecks. Furthermore, the last and penultimate convolutional layers exhibit a reduction and increase in color opponency, respectively. This echoes the spatial findings from Figure 4.

Figure 6:

Distribution of double opponent cells in different layers of our model as a function of bottleneck width and ventral depth. Most spatially opponent cells are also color opponent, and so these distributions bear a strong similarity to those in Figures 4 and 5.

Figure 7:

Distribution of excitatory and inhibitory hues for cells in different layers of networks with random weights and networks trained on RGB images. Maximal excitation and inhibition before training are naturally aligned to the hues that correspond to RGB values of 255 or 0. Trained networks show a preference for green and magenta. Some cells are highly nonlinear, maximally excited by orange/red and cyan/blue.

One could speculate that the distribution in Figure 7 indicates that the type of opponency that is learned corresponds well with the cone opponency observed in primates. However, as discussed, Figure 7 does not permit an understanding of the discrete types of opponency that are learned. Furthermore, the figure does not differentiate the different model architectures. In Figure 8, we additionally plot the distribution of excitatory colors for all cells in the bottleneck layer, given that they are most inhibited by red, green, magenta, cyan, yellow, and blue, respectively. These are plotted for Shallow ($D_{VVS} \in \{0, 1\}$) and Deep ($D_{VVS} \in \{3, 4\}$) networks with Narrow ($N_{BN} \in \{1, 2, 4\}$) and Wide ($N_{BN} \in \{8, 16, 32\}$) bottlenecks.
Figure 8:

Conditional distribution of excitatory hues for cells that are most inhibited by red ([315,45)), yellow ([45,75)), green ([75,165)), cyan ([165,195)), blue ([195,285)), and magenta ([285,315)) for Shallow ($D_{VVS} \in \{0, 1\}$) and Deep ($D_{VVS} \in \{3, 4\}$) networks with Narrow ($N_{BN} \in \{1, 2, 4\}$) and Wide ($N_{BN} \in \{8, 16, 32\}$) bottlenecks. Narrow networks learn a simple color system, with cells that are maximally excited/inhibited by extreme RGB values (dashed vertical lines). Deep networks show an increase in cells that are most excited by blue.

We can now observe that the primary opponent axis in our networks is green/magenta, with cells that are inhibited by red or magenta and excited by green being unique to the Wide/Shallow networks. In addition, we can say that the majority of hue opponent cells (that is, cells in the Narrow networks) are channel opponent. In the Wide networks, we find cells that are broadly excited by orange/red and cyan/blue. These cells persist in the first ventral layer and are not typically present in Narrow networks. This suggests that the Wide networks are responsible for the peaks in Figure 7. We find the presence of cells that are excited by blue and inhibited by yellow, red, and green more prominently in the Deep networks, with particular prevalence in the Narrow + Deep networks. In general, the range of excitatory and inhibitory hues is greater in the Wide networks, suggesting increased prevalence of complex, nonlinear cells. This mirrors the finding from Lindsey et al. (2019) that cells in this setting tend to have a non-linear receptive field. Note that we have found that cells in the ventral layer (not included in the figure) are excited and inhibited by a much wider range of hues, particularly in the Narrow networks. This suggests that the bottleneck induces an efficient color code that enables cells in later layers to become attuned to highly specific hues. Recall that we observe an increase in the proportion of color tuned but nonopponent cells in Ventral 1 in models with tight bottlenecks, corroborating this assertion.

5.3.5  Hue Sensitivity

Figure 9 gives the results for the hue sensitivity experiment. We again provide plots for Shallow and Deep networks with Narrow and Wide bottlenecks so that these results can be understood in the context of the previous section. Since we are taking the gradient of the response, the sensitivity is undefined where there are discontinuities in the conversion from HSL to RGB (dotted vertical lines). The first point to note is that the straight lines in the sensitivity curves correspond to at most a quadratic response to hue. In contrast, nonlinear sensitivity curves suggest a higher-order hue response. In light of this, we can observe a general transition from highly nonlinear hue response in the Wide + Shallow networks to a more linear hue response in the Narrow + Deep networks. This is in line with our findings in the previous section and again mirrors the findings from Lindsey et al. (2019). Regarding tuning to specific colors, Lehky and Sejnowski (1990) note that the gradient of the tuning curve (such as the curves in Figure 3d) is maximal when the stimulus is to either side of the peak. As such, where there are peaks in the distribution of cells that are excited by a particular hue, we should expect corresponding troughs in the sensitivity curve. As an example, note that the troughs in sensitivity to orange/red in the Wide + Shallow (and to a lesser extent in the Narrow + Shallow and Wide + Deep) networks match the peaks in excitation observed in Figure 8. A similar observation can be made regarding the trough in sensitivity to cyan/blue in the Wide + Shallow networks. The blue excitation that is a uniquely prominent feature in the Narrow networks has also resulted in a corresponding dip in sensitivity. However, this has manifested as a sudden drop rather than a smooth transition since it lies at the discontinuous boundary between blue with a green component and blue with a red component. Ultimately, these results demonstrate that low-level analysis of the color tuning distributions provides valid insights into the high-level functional properties of the networks.
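
The link between straight-line sensitivity and an at most quadratic hue response follows from a one-line calculation. As a sketch, write the response on one segment of the hue wheel as $r(h)$, where the symbols $r$, $h$, $a$, $b$, and $c$ are introduced here purely for illustration:

```latex
r(h) = a h^{2} + b h + c
\quad\Longrightarrow\quad
\frac{\mathrm{d} r}{\mathrm{d} h} = 2 a h + b .
```

Conversely, integrating a sensitivity that is linear in $h$ over a segment gives a response that is at most quadratic there, so any curvature in the sensitivity curve implies cubic or higher-order terms in the response.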
Figure 9:

Mean gradient of the sum of the bottleneck layer response with respect to hue for Shallow ($D_{VVS} \in \{0, 1\}$) and Deep ($D_{VVS} \in \{3, 4\}$) networks with Narrow ($N_{BN} \in \{1, 2, 4\}$) and Wide ($N_{BN} \in \{8, 16, 32\}$) bottlenecks. The shaded region indicates the standard error across the trained models. Discontinuities derive from the conversion from HSL to RGB. Sensitivity is an approximately linear function of hue for Narrow networks, and particularly in the Narrow + Deep setting, again showing a simple color code in the bottleneck layer. Conversely, Wide + Shallow networks exhibit a highly nonlinear sensitivity to hue.

6  Control Experiments

In this section, we perform a series of targeted experiments to assess how well our results extend to different settings. These experiments are intended to improve our understanding of the conditions under which the various forms of opponency emerge, supporting a comprehensive discussion.

6.1  Random Weights

Although we have presented strong evidence that cells in trained networks exhibit spatial, color, and double opponency, we have not yet demonstrated that this is learned. To determine if this opponency is learned, we require a demonstration that it is not present at initialization (when the weights are random). We have therefore performed our experiments on randomly initialized models. We find that networks with random weights (that is, following the Xavier initialization; Glorot & Bengio, 2010) never exhibit spatial or hue opponency. Instead, most cells are nonopponent, and their distribution over the layers does not change significantly with the bottleneck size. These results demonstrate that all of the opponency in our networks is learned. However, it could be the case that opponency derives from simple statistics of the convolutional filters. In order to understand this further, we experimented with networks whose filter weights are Gaussian with the same mean and variance as the filters of the same depth in a reference pretrained model. The results for this experiment are given in Figure 10. Although we do find some opponency in this case, we do not find the same structure. Of particular note are the unresponsive cells, which are not present in the trained networks. These findings reflect the fact that a degree of structure in the receptive fields is required in order for a cell to exhibit a consistent opponent or nonopponent response.
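
The Gaussian control can be sketched as follows, assuming the weights of one layer are available as a flat list of floats. The function name, the flat-list representation, and the toy `reference` layer are assumptions made for illustration; the released code may organize this differently.

```python
import random
import statistics


def gaussian_matched_weights(reference_weights, seed=None):
    """Sample i.i.d. Gaussian weights with the mean and variance of a reference layer.

    Only the first two moments are preserved, so any spatial or chromatic
    structure in the reference filters is discarded.
    """
    rng = random.Random(seed)
    mean = statistics.fmean(reference_weights)
    std = statistics.pstdev(reference_weights)
    return [rng.gauss(mean, std) for _ in reference_weights]


# Hypothetical reference layer with structured (alternating-sign) weights.
reference = [0.5, -0.5] * 2000
sampled = gaussian_matched_weights(reference, seed=0)
```

The sampled weights share the mean and spread of the reference layer but not its alternating pattern, which is the distinction this control is designed to probe.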
Figure 10:

Distribution of spatially and color opponent, nonopponent, and unresponsive cells in different layers of our models with Gaussian weights (mean and variance from filters of the same depth in a reference pretrained model with $N_{BN} = 32$ and $D_{VVS} = 4$) as a function of bottleneck width. Some opponency is explained by simple statistics of the filters. Functional organization emerges only as a result of training.

6.2  Grayscale

As previously mentioned, in addition to the color models, we trained a batch of models with grayscale inputs. The results in Figure 11a validate that spatially opponent cells still emerge and have a similar distribution throughout the layers as that of cells in models trained with RGB inputs. Furthermore, this validates our classification approach since, from Lindsey et al. (2019), it is known that the Retina-Net model learns center-surround and oriented edge (both spatially opponent) filters with grayscale input.
Figure 11:

(a) Distribution of spatially opponent, nonopponent, and unresponsive cells in different layers of our model as a function of bottleneck width, for models trained with grayscale images, showing that the known spatial opponency from Lindsey et al. (2019) is detected by our method. (b) Distribution of excitatory and inhibitory hues for cells in different layers of networks trained on images with distorted color (hue rotation of 90°). The most prevalent excitatory and inhibitory colors are aligned with the RGB extremes closest to a 90° rotation of the peaks in Figure 7.

6.3  Distorted Color

To further explore the idea that the opponency in our networks derives from the statistics of the data, we trained a batch of models on images with distorted color. Specifically, we convert the images into HSV space and offset the hue channel by 90°, before converting back into RGB and forwarding to the network. Our interest here is not in whether opponency emerges, but in the effect this distortion has on it. Figure 11b shows the distribution of excitatory and inhibitory colors in networks trained with distorted inputs. Here, the most prevalent excitatory and inhibitory colors are aligned with the RGB extremes closest to a 90° rotation of the peaks in Figure 7. This is consistent with our observation that the vast majority of color opponent neurons are channel opponent. In contrast, the additional excitation peak has been rotated by exactly 90° from orange/red to green. This demonstrates that the cells that are excited by specific hues emerge as a result of the statistics of the data, not of the input color space.
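
The hue distortion can be sketched with the standard-library `colorsys` module, as below. This is a per-pixel illustration assuming RGB channels in [0, 1] and a positive offset direction; the function names are hypothetical, and a practical pipeline would apply the same operation to whole image tensors.

```python
import colorsys


def rotate_hue(rgb, degrees=90.0):
    """Rotate the hue of one RGB pixel (channels in [0, 1]) via HSV space."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    h = (h + degrees / 360.0) % 1.0
    return colorsys.hsv_to_rgb(h, s, v)


def rotate_image_hue(pixels, degrees=90.0):
    """Apply the same hue rotation to every pixel of a nested-list image."""
    return [[rotate_hue(px, degrees) for px in row] for row in pixels]


# Pure red (hue 0 degrees) moves to hue 90 degrees, a yellowish green.
shifted = rotate_hue((1.0, 0.0, 0.0))
```

Because only the hue channel is offset, saturation and value are preserved, and achromatic pixels are left unchanged.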

6.4  CIELAB Space

In a similar vein to our experiments with distorted color, we now perform experiments to validate whether opponency is still a feature in networks trained on images in the CIELAB color space. The CIELAB color space encodes color in terms of lightness (L*), and two opponent axes: green/red (a*), and blue/yellow (b*). Each axis is nonlinear such that uniform changes in CIELAB space correspond to uniform perceptual changes in color. This will allow us to understand if functional organization still emerges when receptive fields are naturally opponent, that is, when structure would require the cells to learn to “ignore” the inherent opponency of the a* and b* channels. Figure 12a shows the distribution of spatially and color opponent cells in this setting. The distribution is nearly identical to that of networks trained on images in RGB space, with the same characteristic functional organization. In Figure 12b, we plot the distribution of most excitatory and inhibitory colors in CIELAB space for random and trained networks. We again find that the distribution for random networks naturally aligns to hues that represent extreme values in the input color space; CIELAB encodes color on green/red and blue/yellow axes. Following training, we find that the majority of cells are most excited or inhibited by either green or blue. This bears some similarity to the RGB networks, which aligned to green and magenta following training. Again in accordance with the RGB networks, we find cells that are most excited by orange/red or cyan/blue, further showing that tuning to these particular colors is an artifact of the data rather than the color space.
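
For readers unfamiliar with CIELAB, the following sketch shows one standard conversion from an sRGB pixel to L*, a*, and b*. The sRGB transfer function, the RGB-to-XYZ matrix, and the D65 white point used here are assumptions of this sketch; the text does not state which conversion was used to prepare the training images.

```python
def _srgb_to_linear(c):
    """Undo the sRGB transfer function for one channel in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _lab_f(t):
    """Piecewise cube-root nonlinearity used by CIELAB."""
    delta = 6.0 / 29.0
    if t > delta ** 3:
        return t ** (1.0 / 3.0)
    return t / (3.0 * delta ** 2) + 4.0 / 29.0


def srgb_to_lab(rgb, white=(0.95047, 1.0, 1.08883)):
    """Convert an sRGB pixel (channels in [0, 1]) to (L*, a*, b*), D65 white."""
    r, g, b = (_srgb_to_linear(c) for c in rgb)
    # Linear sRGB to CIE XYZ (assumed D65 primaries matrix).
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    fx, fy, fz = (_lab_f(v / w) for v, w in zip((x, y, z), white))
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)


lab_red = srgb_to_lab((1.0, 0.0, 0.0))
lab_green = srgb_to_lab((0.0, 1.0, 0.0))
```

The signs of a* for red and green inputs (positive and negative, respectively) illustrate the built-in green/red opponent axis that cells would have to work against or exploit.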
Figure 12:

(a) Distribution of spatially and color opponent, nonopponent, and unresponsive cells in different layers of models trained on images in LAB space as a function of bottleneck width, showing that functional organization is not unique to RGB. (b) Excitatory/inhibitory hues in LAB space for random and trained networks. Training increases prevalence of blue/green and excitation by orange/red and cyan/blue.

6.5  Street View House Numbers

In addition to our experiments with CIFAR-10, we have trained a batch of networks on the Street View House Numbers (SVHN) data set (Netzer et al., 2011). This is a digit recognition (10 classes) problem with the same spatial resolution (32 × 32) as CIFAR-10. The distributions of the different cell types for these models are shown in Figure 13. The results show that spatial opponency is abundant in the second retina layer and increases for models with tight bottlenecks. In the first two ventral layers, the proportion of spatially opponent cells is generally lower (20%) and does not change significantly with bottleneck size. Color opponency is muted in comparison to the CIFAR-10 experiments and increases in the second retina layer only slightly for tight bottlenecks. This is unsurprising since color is not expected to be an important feature in the house number recognition problem. The largest networks in this setting achieved over 90% accuracy.
Figure 13:

Distribution of spatially and color opponent, nonopponent, and unresponsive cells in different layers of models trained on Street View House Numbers (SVHN) (Netzer et al., 2011) as a function of bottleneck width. Spatial opponency is present, with a similar distribution to the networks trained on CIFAR-10. Color opponency is generally lower, increasing only slightly for networks with narrow bottlenecks.

6.6  ImageNet

Our experiments thus far have focused on low-resolution (32 × 32) images. It is now important to understand whether our findings generalize to a higher-resolution setting. To that end, we have trained networks on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) data set (Russakovsky et al., 2015) at a resolution of 128 × 128. Due to hardware constraints, we perform only three repeats across the full range of ventral depths and bottleneck widths. We adapt the Retina-Net model slightly, adding average pooling with a window of size 4 before the first fully connected layer. Figure 14 gives the distributions of the spatial and color cell types for models trained on ImageNet. As with CIFAR-10, the results show an increase in the proportion of spatially and color opponent cells in the bottleneck layer of networks with a tight bottleneck. Unlike the CIFAR-10 results, this opponency decays rapidly, and the emergent organization is observed only partially in the networks with the tightest bottlenecks. The percentage of opponent cells is generally lower than in networks trained on CIFAR-10. This could be related to the fact that the Retina-Net model does not fit ImageNet effectively; the trained models achieved accuracies between 5% and 20%. Although the number of double opponent cells in this setting (not shown in the figure) is much lower, the vast majority of spatially opponent cells are also color opponent. In general, we do not find a spike in opponency in the penultimate convolutional layer of the ImageNet-trained models.
Figure 14:

Distribution of spatially and color opponent, nonopponent, and unresponsive cells in different layers of models trained on ImageNet (Russakovsky et al., 2015) as a function of bottleneck width, showing how our findings transfer to a higher resolution setting. There is an increase in opponency for narrow bottlenecks, which decays rapidly. Emergent organization is observed only partially in the networks with the tightest bottlenecks.
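
The effect of the added pooling layer on the size of the classifier input can be sketched as follows. The text states only that the window has size 4; the stride (assumed equal to the window, so that pooling is non-overlapping), the 128 × 128 feature-map size, and the helper name `average_pool` are assumptions made for illustration.

```python
def average_pool(feature_map, window=4):
    """Non-overlapping average pooling over a 2-D list (stride == window)."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for top in range(0, height - window + 1, window):
        row = []
        for left in range(0, width - window + 1, window):
            # Collect the window x window block and replace it by its mean.
            block = [feature_map[top + i][left + j]
                     for i in range(window) for j in range(window)]
            row.append(sum(block) / len(block))
        pooled.append(row)
    return pooled


# Hypothetical single-channel feature map at the input resolution.
fmap = [[float(r * 128 + c) for c in range(128)] for r in range(128)]
pooled = average_pool(fmap, window=4)
print(len(pooled), len(pooled[0]))  # prints: 32 32
```

Under these assumptions each pooled map holds 16 times fewer values than the unpooled map, which reduces the number of inputs to the first fully connected layer by the same factor.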

6.7  Intel Scene Classification

In addition to our experiments with ImageNet, we have trained a batch of models on the Intel scene classification challenge (Intel, 2018) data set. This is a natural scene classification problem with six classes and the same spatial resolution as ImageNet. As such, these models allow us to explore opponent cell types in a high-resolution setting where the model obtains stronger performance (up to 80% for the largest models). The results in Figure 15 show that models trained in this setting exhibit a much higher percentage of spatially and color opponent cells than models trained on ImageNet, particularly in the second retina layer. We again find that the percentage of opponent cells in the first two ventral layers is nearly equal and broadly independent of bottleneck size, not showing the emergent structure observed in CIFAR-10 except in the extreme case of $N_{BN} = 1$.
Figure 15:

Distribution of spatially and color opponent, nonopponent, and unresponsive cells in different layers of models trained on the Intel scene classification challenge data set (Intel, 2018) as a function of bottleneck width. With fewer classes (six in this case), the number of opponent cells is much higher. The distribution of opponent cells in Retina 2 bears strong similarity to the results from CIFAR-10. This does not extend to the ventral layers, which have near-identical cell distributions.

6.8  Classifying Mosaics

We also performed experiments to determine whether there are conditions under which certain types of opponency can be removed. To attempt to ablate spatial opponency, we trained a batch of models to classify mosaic images. These are images that have been separated into smaller squares that have then been shuffled (see the examples in Figure 16 for reference). Figure 16 gives the distribution of spatially and color opponent cells in these networks. The figure shows that some of the spatial opponency is removed in this setting. Notably, Retina 2 exhibits a moderately lower proportion of spatial opponency, such that it is now in line with Ventral 2. This could be due to the fact that the impact of the mosaic images depends heavily on the size of the receptive field. We further note that color opponency is affected to the same extent as spatial opponency. This suggests that the efficacy of color opponent cells is in some way reduced in the mosaic setting.
Figure 16:

Distribution of spatially and color opponent, nonopponent, and unresponsive cells in different layers of models trained on mosaic images as a function of bottleneck width with example mosaic images. These results show that when the spatial structure of the input is removed, some spatial opponency, particularly in Retina 2, is removed also. Color opponency is similarly affected, suggesting a complex dependence between spatial and color processing.
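
The mosaic construction can be sketched as below. This is a minimal illustration rather than the procedure used in the experiments: the tile size, the choice to draw a fresh permutation for every image, and the name `make_mosaic` are all assumptions, since the text specifies only that images are cut into smaller squares that are then shuffled.

```python
import random


def make_mosaic(image, tile, rng=random):
    """Cut an image (list of pixel rows) into tile x tile squares and shuffle them."""
    height, width = len(image), len(image[0])
    if height % tile or width % tile:
        raise ValueError("image sides must be multiples of the tile size")
    rows, cols = height // tile, width // tile

    # One source tile (tile row, tile column) for every destination slot.
    sources = [(r, c) for r in range(rows) for c in range(cols)]
    rng.shuffle(sources)

    mosaic = [[None] * width for _ in range(height)]
    for index, (src_r, src_c) in enumerate(sources):
        dst_r, dst_c = divmod(index, cols)
        for i in range(tile):
            for j in range(tile):
                pixel = image[src_r * tile + i][src_c * tile + j]
                mosaic[dst_r * tile + i][dst_c * tile + j] = pixel
    return mosaic
```

Each square keeps its internal pixel arrangement, so local structure at scales smaller than the tile survives while larger-scale spatial structure is destroyed. This is consistent with the observation that the effect depends on receptive field size.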

6.9  Shuffled Color Channels

To attempt to ablate color opponency, we remove color information by randomly shuffling the channels of inputs to the network (see examples in Figure 17 for reference). The resultant distribution plots in Figure 17 show that this alteration removes the vast majority of color opponent cells, while spatial opponency remains. Since the information present in shuffled images is the same, this experiment demonstrates that color opponency arises out of a need to consistently infer the colors in the inputs. We speculate that this aids in classification since each class will be associated with a set of features that vary both spatially and in hue. By shuffling the channels, we remove the ability to repeatedly associate a particular input tuple with a particular class. This view is supported by the fact that the models in this setting generally reached a lower accuracy than the models trained on standard color images and sometimes failed to match the models trained on grayscale images.
Figure 17:

Distribution of spatially and color opponent, nonopponent, and unresponsive cells in different layers of models trained on images with shuffled color channels as a function of bottleneck width with example shuffled images. When consistent color information is removed, most color opponency is also removed. Spatial opponency remains.
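
A minimal sketch of the channel shuffling is given below. It assumes that one random permutation of the three channels is drawn per image and applied to every pixel of that image; the text says only that channels are shuffled randomly, so the sampling scheme and the name `shuffle_channels` are assumptions.

```python
import random


def shuffle_channels(image, rng=random):
    """Apply a single random channel permutation to every pixel of an image.

    `image` is a list of rows, each row a list of (c0, c1, c2) tuples.
    """
    order = [0, 1, 2]
    rng.shuffle(order)  # e.g. [2, 0, 1] maps (R, G, B) to (B, R, G)
    return [[tuple(pixel[k] for k in order) for pixel in row] for row in image]
```

Because the same permutation is used across the whole image, the spatial structure within each channel is preserved. Only the consistent assignment of a channel to a position in the input tuple is lost, which matches the observation that spatial opponency remains while color opponency is largely removed.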

7  Discussion

Equipped with the results of our experiments, we now discuss the conclusions that can be drawn regarding spatial and color processing in convolutional neural networks. In addition, we suggest possible directions for future work building on these findings. Our primary finding is that the addition of a bottleneck in the Retina-Net model induces functional organization when trained on color (RGB) CIFAR-10. We have further shown that this finding generalizes to networks trained on images in the CIELAB color space. There is some evidence that this result differs when the networks are trained on other data sets, although the key finding, that structure emerges only with the tightest bottlenecks, remains. In the case of ImageNet, more experimentation with a model capable of fitting to the data would be required to understand this fully. Regarding network depth, our experiments have uncovered an increase in the number of opponent cells in the penultimate convolutional layer of the network and a corresponding decrease in the last convolutional layer. Our experiments with random networks demonstrate that all of the discussed opponency is learned and that most opponency is not a result of simple statistics of the weights.

In addition to these high-level observations, we have shown that an analysis based on approaches from neuroscience can yield a rich understanding of the function performed by a trained network. For example, we have shown that the deep Retina-Net model with a tight bottleneck learns a set of double opponent filters in the bottleneck layer, followed by a set of spatially and color-tuned but nonopponent filters in the first ventral layer, with opponency returning in the second ventral layer. Cells that are maximally excited by blue are a unique feature of these networks not present when the bottleneck is relaxed. Furthermore, these networks tend to learn linear, channel opponent neurons rather than neurons opponent to specific hues. We speculate that this is due to the increased need to learn an efficient color code in the tight bottleneck case.

The key implication of our core findings is that the model architecture can be the source of an inductive bias toward the number of opponent cells. While this finding alone may be of interest, whether it is of any practical significance depends on whether opponency is desirable. By virtue of the fact that opponent cells represent a more efficient encoding of the input, one might speculate that an increase in opponency could lead to increased generalization performance. This view is mildly supported by the plot in Figure 2c, where the networks with $N_{BN} = 2$ and $D_{VVS} = 4$ obtained the highest accuracy. We further suggest that opponent cells may be of greater utility in applications such as transfer learning. Specifically, one can envisage a scenario where the prebottleneck weights are fixed and the postbottleneck weights are updated to fit a new data set. Before such a setting could be considered, our findings would need to be demonstrated on a much more capable network architecture that can obtain competitive performance on standard data sets. The finding that the penultimate layer exhibits a spike in opponency may provide insight into the efficacy of layer-wise training procedures such as deep cascade learning (Marquez, Hare, & Niranjan, 2018). Note that cascade learning has been found to work well with transfer learning (Du, Farrahi, & Niranjan, 2019). Based on the evidence presented in this work, one might speculate that cascade learning increases the number of opponent cells and that these cells perform well for the transfer learning task. That said, and as previously discussed, whether the opponent cells in later layers inherit the same properties as opponent cells in earlier layers remains to be determined.

We have also demonstrated a number of similarities between the learned representations of our networks and representations observed in nature. The large number of double opponent cells we find in the retina layer of networks with tight bottlenecks is consistent with what is known about cells in the retina and LGN (Hubel & Wiesel, 2004). There are some consistencies and some inconsistencies between the ventral layers of the model and what is known about spatial and color processing in the visual cortex. However, as discussed, it is not clear that the ventral convolutional architecture is a good analog of the structure of the visual cortex, so such comparisons should be treated with skepticism. Our finding that the type of opponency learned is aligned with extreme values in the input color space accords with the physiological finding that opponency in early stages of the visual pathway is aligned with cone responses (Shevell & Martin, 2017).

The consequence of these demonstrations is not to suggest that convolutional neurons and biological neurons are similar. Instead, we have shown that similarity in the data space, architecture, and problem setting can give rise to similarity in the emergent functional properties. In addition, we have demonstrated some settings in which opponency is either hindered or removed entirely. This kind of controlled experiment may enable the exploration of hypotheses relating to the neuroscience of vision. Specifically, through construction of a data set that mimics an environment or an architecture that mimics an anatomy, one might seek a better explanation of the differences in visual processing between species. This potential is hinted at by our experiments with SVHN, which show that networks trained on the digit recognition task have fewer color opponent cells.

In conclusion, our considerations here provide a strong mandate for future research across a range of interests. Work should be conducted to understand whether the presence of opponent cells promotes increased adversarial robustness. Such research will require the ability to apply our methods to state-of-the-art architectures in order to be of practical relevance. In particular, it remains to be seen whether the introduction of a bottleneck is enough to promote opponency in more complex architectures. Indeed, this may require more sophisticated approaches such as cascade learning. Additionally, future research should attempt to further explore the connection between the problem space and the nature of learned visual processing. For example, it could be possible to construct a model that permits a notion of learnable monochromacy or dichromacy. This would make it possible to better understand the connection between problem complexity and the need for color acuity. Finally, experimentation with networks trained on hyperspectral images, where a complete spectrum is collected for each pixel, may enable more finely grained comparison with physiological data.

References

Adrian
,
E. D.
, &
Matthews
,
R.
(
1927a
).
The action of light on the eye
.
Journal of Physiology
,
63
(
4
), pp.
378
414
. doi:10.1113/jphysiol.1927.sp002410
Adrian
,
E. D.
, &
Matthews
,
R.
(
1927b
).
The action of light on the eye
.
Journal of Physiology
,
64
(
3
),
279
301
. doi:10.1113/jphysiol.1927.sp002437
Adrian
,
E. D.
, &
Matthews
,
R.
(
1928
).
The action of light on the eye
.
Journal of Physiology
,
65
(
3
),
273
298
. doi:10.1113/jphysiol.1928.sp002475
Barlow
,
H. B.
(
1953
).
Summation and inhibition in the frog's retina
.
Journal of Physiology
,
119
(
1
),
69
88
. doi:10.1113/jphysiol.1953.sp004829
Bedford
,
R.
, &
Wyszecki
,
G. W.
(
1958
).
Wavelength discrimination for point sources
.
JOSA
,
48
(
2
), 129–135.
Bell
,
A. J.
, &
Sejnowski
,
T. J.
(
1997
).
The “independent components” of natural scenes are edge filters
.
Vision Research
,
37
(
23
),
3327
3338
. https://doi.org/10.1016/S0042-6989(97)00121-1
Bilotta
,
J.
, &
Abramov
,
I.
(
1989
).
Spatial properties of goldfish ganglion cells
.
Journal of General Physiology
,
93
(
6
),
1147
1169
.
Bottou
,
L.
,
Cortes
,
C.
,
Denker
,
J. S.
,
Drucker
,
H.
,
Guyon
,
I.
,
Jackel
,
L. D.
, …
Vapnik
,
V.
(
1994
).
Comparison of classifier methods: A case study in handwritten digit recognition
. In
Proceedings of the International Conference on Pattern Recognition
(pp.
77
).
Piscataway, NJ
:
IEEE
.
Boynton
,
G. M.
(
2002
).
Color vision: How the cortex represents color
.
Current Biology
,
12
(
24
),
R838
R840
. https://doi.org/10.1016/S0960-9822(02)01347-7
Cadena
,
S. A.
,
Denfield
,
G. H.
,
Walker
,
E. Y.
,
Gatys
,
L. A.
,
Tolias
,
A. S.
,
Bethge
,
M.
, &
Ecker
,
A. S.
(
2017
).
Deep convolutional models improve predictions of macaque V1 responses to natural images
. bioRxiv. doi:10.1101/201764
Dacey
,
D. M.
(
1993
).
The mosaic of midget ganglion cells in the human retina
.
Journal of Neuroscience
,
13
(
12
),
5334
5355
.
Dalal
,
N.
, &
Triggs
,
B.
(
2005
).
Histograms of oriented gradients for human detection
. In
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
(Vol. 1, pp.
886
893
).
Piscataway, NJ
:
IEEE
.
Daw
,
N. W.
(
1967
).
Goldfish retina: Organization for simultaneous color contrast
.
Science
,
158
(
3803
),
942
944
.
De Valois
,
R. L.
,
Abramov
,
I.
, &
Jacobs
,
G. H.
(
1966
).
Analysis of response patterns of LGN cells
.
JOSA
,
56
(
7
),
966
977
. doi:10.1364/JOSA.56.000966
De Valois
,
R. L.
,
Albrecht
,
D. G.
, &
Thorell
,
L. G.
(
1982
).
Spatial frequency selectivity of cells in macaque visual cortex
.
Vision Research
,
22
(
5
),
545
559
.
De Valois
,
R. L.
,
Smith
,
C. J.
,
Karoly
,
A.
, &
Kitai
,
S.
(
1958
).
Electrical responses of primate visual system: I. Different layers of macaque lateral geniculate nucleus
.
Journal of Comparative and Physiological Psychology
,
51
(
6
), 662.
De Valois
,
R. L.
,
Smith
,
C.
,
Kitai
,
S. T.
, &
Karoly
,
A.
(
1958
).
Response of single cells in monkey lateral geniculate nucleus to monochromatic light
.
Science
,
127
,
238
239
.
Derrington
,
A. M.
,
Krauskopf
,
J.
, &
Lennie
,
P.
(
1984
).
Chromatic mechanisms in lateral geniculate nucleus of macaque
.
Journal of Physiology
,
357
(
1
),
241
265
.
Du
,
X.
,
Farrahi
,
K.
, &
Niranjan
,
M.
(
2019
).
Transfer learning across human activities using a cascade neural network architecture
. In
Proceedings of the 23rd International Symposium on Wearable Computers
(pp.
35
44
).
New York
:
ACM
.
Engel
,
S.
,
Zhang
,
X.
, &
Wandell
,
B.
(
1997
).
Colour tuning in human visual cortex measured with functional magnetic resonance imaging
.
Nature
,
388
(
6637
),
68
71
. doi:10.1038/40398
Freeman
,
W. T.
, &
Roth
,
M.
(
1995
).
Orientation histograms for hand gesture recognition
. In
International Workshop on Automatic Face and Gesture Recognition
(Vol. 12, pp.
296
301
).
Washington, DC
:
IEEE Computer Society
.
Ghodrati
,
M.
,
Khaligh-Razavi
,
S.-M.
, &
Lehky
,
S. R.
(
2017
).
Towards building a more complex view of the lateral geniculate nucleus: Recent advances in understanding its role
.
Progress in Neurobiology
,
156
,
214
255
.
Glorot
,
X.
, &
Bengio
,
Y.
(
2010
).
Understanding the difficulty of training deep feedforward neural networks
. In
Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics
(pp.
249
256
).
Goethe
,
J. W. v.
(
1840
).
Theory of colours
(vol.
3
).
Cambridge, MA
:
MIT Press
.
Gomez-
Villa
,
A.
,
Martin
,
A.
,
Vazquez-Corral
,
J.
, &
Bertalmio
,
M.
(
2019
).
Convolutional neural networks can be deceived by visual illusions
. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Piscataway, NJ
:
IEEE
.
Güçlü
,
U.
, &
van Gerven
,
M. A.
(
2015
).
Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream
.
Journal of Neuroscience
,
35
(
27
),
10005
10014
.
Hartline
,
H. K.
(
1938
).
The response of single optic nerve fibers of the vertebrate eye to illumination of the retina
.
American Journal of Physiology—Legacy Content
,
121
(
2
),
400
415
. doi:10.1152/ajplegacy.1938.121.2.400
Hartline
,
H. K.
(
1940
).
The nerve messages in the fibers of the visual pathway
.
JOSA
,
30
(
6
),
239
247
. doi:10.1364/JOSA.30.000239
Hartline
,
H. K.
,
Wagner
,
H. G.
, &
Macnichol
,
E. F.
(
1952
).
The peripheral origin of nervous activity in the visual system
.
Cold Spring Harbor Symposia on Quantitative Biology
,
17
, 125-41.
He
,
K.
,
Zhang
,
X.
,
Ren
,
S.
, &
Sun
,
J.
(
2016
).
Deep residual learning for image recognition
. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(pp.
770
778
).
Piscataway, NJ
:
IEEE
.
Helmholtz
,
H. v.
(
1852
).
On the theory of compound colours
.
London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science
,
4
(
28
),
519
534
.
Hénaff
,
O. J.
,
Srinivas
,
A.
,
De Fauw
,
J.
,
Razavi
,
A.
,
Doersch
,
C.
,
Eslami
,
S.
, &
Oord
,
A. v. d.
(
2019
).
Data-efficient image recognition with contrastive predictive coding
. arXiv:1905.09272.
Hering
,
E.
(
1920
).
Grundzüge der lehre vom lichtsinn
.
Berlin
:
Springer
.
Hjelm
,
R. D.
,
Fedorov
,
A.
,
Lavoie-Marchildon
,
S.
,
Grewal
,
K.
,
Bachman
,
P.
,
Trischler
,
A.
, &
Bengio
,
Y.
(
2018
).
Learning deep representations by mutual information estimation and maximization
. arXiv:1808.06670.
Huang
,
G.
,
Liu
,
Z.
,
Van Der Maaten
,
L.
, &
Weinberger
,
K. Q.
(
2017
).
Densely connected convolutional networks
. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(pp.
4700
4708
).
Piscataway, NJ
:
IEEE
.
Huang
,
G.
,
Sun
,
Y.
,
Liu
,
Z.
,
Sedra
,
D.
, &
Weinberger
,
K. Q.
(
2016
).
Deep networks with stochastic depth
. In
Proceedings of the European Conference on Computer Vision
(pp.
646
661
).
Berlin
:
Springer
.
Hubel
,
D. H.
, &
Wiesel
,
T. N.
(
1962
).
Receptive fields, binocular interaction and functional architecture in the cat's visual cortex
.
Journal of Physiology
,
160
(
1
),
106
154
.
Hubel
,
D. H.
, &
Wiesel
,
T. N.
(
2004
).
Brain and visual perception: The story of a 25-year collaboration
. Oxford: Oxford University Press.
Jacobs
,
G. H.
(
1964
).
Single cells in squirrel monkey lateral geniculate nucleus with broad spectral sensitivity
.
Vision Research
,
4
(
3–4
),
221
IN3
.
Johnson
,
E. N.
,
Hawken
,
M. J.
, &
Shapley
,
R.
(
2001
).
The spatial transformation of color in the primary visual cortex of the macaque monkey
.
Nature Neuroscience
,
4
(
4
), 409.
Johnson
,
E. N.
,
Hawken
,
M. J.
, &
Shapley
,
R.
(
2008
).
The orientation selectivity of color-responsive neurons in macaque V1
.
Journal of Neuroscience
,
28
(
32
),
8096
8106
.
Kanan
,
C.
(
2013
).
Recognizing sights, smells, and sounds with gnostic fields
.
PLOS One
,
8
(
1
).
Kanan
,
C.
(
2014
).
Fine-grained object recognition with gnostic fields
. In
Proceedings of the IEEE Winter Conference on Applications of Computer Vision
(pp.
23
30
).
Piscataway, NJ
:
IEEE
.
Karklin
,
Y.
, &
Lewicki
,
M. S.
(
2003
).
Learning higher-order structures in natural images
.
Network: Computation in Neural Systems
,
14
(
3
),
483
499
.
Kleinschmidt
,
A.
,
Lee
,
B. B.
,
Requardt
,
M.
, &
Frahm
,
J.
(
1996
).
Functional mapping of color processing by magnetic resonance imaging of responses to selective p- and m-pathway stimulation
.
Experimental Brain Research
,
110
(
2
),
279
288
.
Krizhevsky
,
A.
(
2009
).
Learning multiple layers of features from tiny images
(Technical Report).Toronto: University of Toronto.
Krizhevsky
,
A.
,
Sutskever
,
I.
, &
Hinton
,
G. E.
(
2012
). Imagenet classification with deep convolutional neural networks. In
F.
Pereira
,
C. J. S.
Burges
,
L.
Bottou
, &
K. Q.
Weinberger
(Eds.),
Advances in neural information processing systems
(pp.
1097
1105
).
Red Hook, NY
:
Curran
.
Kuffler
,
S. W.
(
1953
).
Discharge patterns and functional organization of mammalian retina
.
Journal of Neurophysiology
,
16
(
1
),
37
68
.
Le
,
Q. V.
,
Karpenko
,
A.
,
Ngiam
,
J.
, &
Ng
,
A. Y.
(
2011
).
ICA with reconstruction cost for efficient overcomplete feature learning
. In
J. Shawe
-
Taylor
,
R.
Zemel
,
P.
Bartlett
,
F.
Pereira
, &
K. Q.
Weinberger
(Eds.),
Advances in neural information processing systems
(pp.
1017
1025
)
Red Hook, NY
:
Curran
.
Le Cun
,
Y.
, &
Bengio
,
Y.
(
1995
).
Convolutional networks for images, speech, and time series
. In
M. A.
Arbib
(Ed.),
The handbook of brain theory and neural networks
.
Cambridge, MA
:
MIT Press
.
Le Cun, Y.,
Matan
,
O.
,
Boser
,
B.
,
Denker
,
J. S.
,
Henderson
,
D.
,
Howard
,
R. E.
, …
Baird
,
H. S.
(
1990
).
Handwritten zip code recognition with multilayer networks
. In
Proc. 10th International Conference on Pattern Recognition
(Vol. 2, pp.
35
40
).
Piscataway, NJ
:
IEEE
.
Lehky
,
S. R.
, &
Sejnowski
,
T. J.
(
1988
).
Network model of shape-from-shading: Neural function arises from both receptive and projective fields
.
Nature
,
333
(
6172
),
452
454
.
Lehky
,
S. R.
, &
Sejnowski
,
T. J.
(
1990
).
Neural model of stereoacuity and depth interpolation based on a distributed representation of stereo disparity
.
Journal of Neuroscience
,
10
(
7
),
2281
2299
.
Lehky
,
S. R.
, &
Sejnowski
,
T. J.
(
1999
).
Seeing white: Qualia in the context of decoding population codes
.
Neural Computation
,
11
(
6
),
1261
1280
.
Lennie
,
P.
,
Krauskopf
,
J.
, &
Sclar
,
G.
(
1990
).
Chromatic mechanisms in striate cortex of macaque
.
Journal of Neuroscience
,
10
(
2
),
649
669
.
Lettvin
,
J. Y.
,
Maturana
,
H. R.
,
McCulloch
,
W. S.
, &
Pitts
,
W. H.
(
1959
).
What the frog's eye tells the frog's brain
. In
Proceedings of the IRE
,
47
(
11
),
1940
1951
.
Levick
,
W.
, &
Thibos
,
L.
(
1982
).
Analysis of orientation bias in cat retina
.
Journal of Physiology
,
329
, 243.
Lindsey
,
J.
,
Ocko
,
S. A.
,
Ganguli
,
S.
, &
Deny
,
S.
(
2019
).
A unified theory of early visual representations from retina to cortex through anatomically constrained deep CNNs
. In
Proceedings of the International Conference on Learning Representations.
OpenReview.net. https://openreview.net/forum?id=S1xq3oR5tQ
Livingstone
,
M. S.
, &
Hubel
,
D. H.
(
1984
).
Anatomy and physiology of a color system in the primate visual cortex
.
Journal of Neuroscience
,
4
(
1
),
309
356
.
Long
,
F.
,
Yang
,
Z.
, &
Purves
,
D.
(
2006
).
Spectral statistics in natural scenes predict hue, saturation, and brightness
. In
Proceedings of the National Academy of Sciences
,
103
(
15
),
6013
6018
.
Lowe
,
D. G.
(
1999
).
Object recognition from local scale-invariant features
. In
Proceedings of the Seventh IEEE International Conference on Computer Vision
(Vol. 2, pp.
1150
1157
).
Piscataway, NJ
:
IEEE
.
Marquez
,
E. S.
,
Hare
,
J. S.
, &
Niranjan
,
M.
(
2018
).
Deep cascade learning
.
IEEE Transactions on Neural Networks and Learning Systems
,
29
(
11
),
5475
5485
.
Marr
,
D.
, &
Hildreth
,
E.
(
1980
).
Theory of edge detection
. In
Proceedings of the Royal Society of London. Series B. Biological Sciences
,
207
(
1167
),
187
217
.
Martinez
,
L. M.
, &
Alonso
,
J.-M.
(
2003
).
Complex receptive fields in primary visual cortex
.
Neuroscientist
,
9
(
5
),
317
331
.
Maxwell
,
J. C.
(
1860
).
IV. On the theory of compound colours, and the relations of the colours of the spectrum
.
Philosophical Transactions of the Royal Society of London
,
150
,
57
84
.
McConnell
,
R. K.
(
1986, January 28
).
Method of and apparatus for pattern recognition.
Google Patents. (US Patent 4,567,610)
Naka
,
K.
, &
Rushton
,
W. A.
(
1966
).
S-potentials from colour units in the retina of fish (cyprinidae)
.
Journal of Physiology
,
185
(
3
),
536
555
.
Netzer
,
Y.
,
Wang
,
T.
,
Coates
,
A.
,
Bissacco
,
A.
,
Wu
,
B.
, &
Ng
,
A. Y.
(
2011
).
Reading digits in natural images with unsupervised feature learning
. NIPS Workshop on Deep Learning and Unsupervised Feature Learning.
Olah
,
C.
,
Cammarata
,
N.
,
Schubert
,
L.
,
Goh
,
G.
,
Petrov
,
M.
, &
Carter
,
S.
(
2020a
).
An overview of early vision in inceptionV1
.
Distill
,
5
(
4
),
e00024
002
.
Olah
,
C.
,
Cammarata
,
N.
,
Schubert
,
L.
,
Goh
,
G.
,
Petrov
,
M.
, &
Carter
,
S.
(
2020b
).
Zoom in: An introduction to circuits
.
Distill
,
5
(
3
),
e00024
001
.
Olah
,
C.
,
Mordvintsev
,
A.
, &
Schubert
,
L.
(
2017
).
Feature visualization
.
Distill
,
2
(
11
), e7.
Olah
,
C.
,
Satyanarayan
,
A.
,
Johnson
,
I.
,
Carter
,
S.
,
Schubert
,
L.
,
Ye
,
K.
, &
Mordvintsev
,
A.
(
2018
).
The building blocks of interpretability
.
Distill
,
3
(
3
), e10.
Olshausen
,
B. A.
, &
Field
,
D. J.
(
1996
).
Emergence of simple-cell receptive field properties by learning a sparse code for natural images
.
Nature
,
381
(
6583
),
607
609
.
Peirce
,
J.
,
Gray
,
J. R.
,
Simpson
,
S.
,
MacAskill
,
M.
,
Höchenberger
,
R.
,
Sogo
,
H.
, …
Lindeløv
,
J. K.
(
2019
).
Psychopy2: Experiments in behavior made easy
.
Behavior Research Methods
,
51
(
1
),
195
203
.
Pridmore
,
R. W.
(
2005
).
Theory of corresponding colors as complementary sets
.
Color Research and Application
,
30
(
5
),
371
381
.
Pridmore
,
R. W.
(
2011
).
Complementary colors theory of color vision: Physiology, color mixture, color constancy and color perception
.
Color Research and Application
,
36
(
6
),
394
412
.
Rafegas
,
I.
, &
Vanrell
,
M.
(
2018
).
Color encoding in biologically-inspired convolutional neural networks
.
Vision Research
,
151
,
7
17
.
Russakovsky
,
O.
,
Deng
,
J.
,
Su
,
H.
,
Krause
,
J.
,
Satheesh
,
S.
,
Ma
,
S.
, …
Fei-Fei
,
L.
(
2015
).
Imagenet large scale visual recognition challenge
.
International Journal of Computer Vision
,
115
(
3
),
211
252
.
Schluppeck
,
D.
, &
Engel
,
S. A.
(
2002
).
Color opponent neurons in V1: A review and model reconciling results from imaging and single-unit recording
.
Journal of Vision
,
2
(
6
),
5
5
.
Schrimpf, M., Kubilius, J., Hong, H., Majaj, N. J., Rajalingham, R., Issa, E. B., … DiCarlo, J. J. (2018). Brain-score: Which artificial neural network for object recognition is most brain-like? https://doi.org/10.1101/407007
Seymour, K. J., Williams, M. A., & Rich, A. N. (2015). The representation of color across the human visual cortex: Distinguishing chromatic signals contributing to object form versus surface color. Cerebral Cortex, 26(5), 1997–2005. doi:10.1093/cercor/bhv021
Shan, H., Zhang, L., & Cottrell, G. W. (2007). Recursive ICA. In B. Schölkopf, J. C. Platt, & T. Hoffman (Eds.), Advances in neural information processing systems, 19 (pp. 1273–1280). Cambridge, MA: MIT Press.
Shapley, R., & Hawken, M. J. (2011). Color in the cortex: Single- and double-opponent cells. Vision Research, 51(7), 701–717. https://doi.org/10.1016/j.visres.2011.02.012
Shevell, S. K., & Martin, P. R. (2017). Color opponency: Tutorial. JOSA, 34(7), 1099–1108.
Smith, V. C., Lee, B., Pokorny, J., Martin, P., & Valberg, A. (1992). Responses of macaque ganglion cells to the relative phase of heterochromatically modulated lights. Journal of Physiology, 458(1), 191–221.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., … Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1–9). Piscataway, NJ: IEEE.
Tan, M., & Le, Q. V. (2019). Efficientnet: Rethinking model scaling for convolutional neural networks. arXiv:1905.11946.
Troy, J. B., & Shou, T. (2002). The receptive fields of cat retinal ganglion cells in physiological and pathological states: Where we are after half a century of research. Progress in Retinal and Eye Research, 21(3), 263–302.
Wade, A. R., Augath, M., Logothetis, N. K., & Wandell, B. A. (2008). FMRI measurements of color in macaque and human. Journal of Vision, 8(10), 6.1–619.
Wagner, H. G., MacNichol, E., & Wolbarsht, M. L. (1960). Opponent color responses in retinal ganglion cells. Science, 131(3409), 1314–1314.
Wang, P., Cottrell, G. W., & Kanan, C. (2015). Modeling the object recognition pathway: A deep hierarchical model using gnostic fields. In Proceedings of the 37th Annual Conference of the Cognitive Science Society. Red Hook, NY: Curran.
Wiesel, T. N., & Hubel, D. H. (1966). Spatial and chromatic interactions in the lateral geniculate body of the rhesus monkey. Journal of Neurophysiology, 29(6), 1115–1156.
Yamins, D. L., Hong, H., Cadieu, C. F., Solomon, E. A., Seibert, D., & DiCarlo, J. J. (2014). Performance-optimized hierarchical models predict neural responses in higher visual cortex. In Proceedings of the National Academy of Sciences, 111(23), 8619–8624.
Young, T. (1802). II. The Bakerian Lecture: On the theory of light and colours. Philosophical Transactions of the Royal Society of London, 92, 12–48.
Zagoruyko, S., & Komodakis, N. (2016). Wide residual networks. In E. R. H. Richard, C. Wilson, & W. A. P. Smith (Eds.), Proceedings of the British Machine Vision Conference (pp. 87.1–87.12). Durham, U.K.: BMVA Press. doi:10.5244/C.30.87
Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Proceedings of the European Conference on Computer Vision (pp. 818–833). Berlin: Springer.
Zhang, Y., Kim, I.-J., Sanes, J. R., & Meister, M. (2012). The most numerous ganglion cell type of the mouse retina is a selective feature detector. In Proceedings of the National Academy of Sciences, 109(36), E2391–E2398.
Zhao, X., Chen, H., Liu, X., & Cang, J. (2013). Orientation-selective responses in the mouse lateral geniculate nucleus. Journal of Neuroscience, 33(31), 12751–12763.
Zhaoping, L., Geisler, W. S., & May, K. A. (2011). Human wavelength discrimination of monochromatic light explained by optimal wavelength decoding of light of unknown intensity. PLOS One, 6(5), e19248.
Zoph, B., & Le, Q. V. (2017). Neural architecture search with reinforcement learning. In Proceedings of the International Conference on Learning Representations. https://openreview.net/forum?id=r1Ue8Hcxg

Author notes

The authors made equal contributions to this letter.