Under difficult viewing conditions, the brain’s visual system uses a variety of recurrent modulatory mechanisms to augment feedforward processing. One resulting phenomenon is contour integration, which occurs in the primary visual (V1) cortex and strengthens neural responses to edges if they belong to a larger smooth contour. Computational models have contributed to an understanding of the circuit mechanisms of contour integration, but less is known about its role in visual perception. To address this gap, we embedded a biologically grounded model of contour integration in a task-driven artificial neural network and trained it using a gradient-descent variant. We used this model to explore how brain-like contour integration may be optimized for high-level visual objectives as well as its potential roles in perception. When the model was trained to detect contours in a background of random edges, a task commonly used to examine contour integration in the brain, it closely mirrored the brain in terms of behavior, neural responses, and lateral connection patterns. When trained on natural images, the model enhanced weaker contours and distinguished whether two points lay on the same versus different contours. The model learned robust features that generalized well to out-of-training-distribution stimuli. Surprisingly, and in contrast with the synthetic task, a parameter-matched control network without recurrence performed the same as or better than the model on the natural-image tasks. Thus, a contour integration mechanism is not essential to perform these more naturalistic contour-related tasks. Finally, the best performance in all tasks was achieved by a modified contour integration model that did not distinguish between excitatory and inhibitory neurons.

Deep neural networks (DNNs) are often used as models of the visual system (Kriegeskorte, 2015; Yamins & DiCarlo, 2016; Spoerer et al., 2017; Nayebi et al., 2018; Schrimpf et al., 2018; Lindsay, 2020). It has been argued that they are mechanistic models (Lindsay, 2020) because some of their computational elements have analogies in the brain. But they lack many other biological mechanisms, which may contribute to differences in representations (Schrimpf et al., 2018; Tripp, 2017; Shi et al., 2022) and behavior (Geirhos et al., 2019; Rajalingham et al., 2018; Szegedy et al., 2013; Nguyen et al., 2015; Hendrycks & Dietterich, 2019; Serre, 2019; Lake et al., 2015). In contrast, there are many physiological models of circuits that underlie localized neural phenomena (Haeusler & Maass, 2007; Rubin et al., 2015; Carandini & Heeger, 2012; Baker & Bair, 2016; Hurzook et al., 2013; Piëch et al., 2013), but these models tend to be isolated from larger circuits and to have uncertain connections with ethologically important visual tasks.

The limitations of both deep networks and isolated circuit models might potentially be addressed by combining them, that is, incorporating detailed circuit models into deep networks. In this direction, recent studies have incorporated details of interlaminar and interareal connectivity into deep networks (Kubilius et al., 2018; Lindsey et al., 2019; Tripp, 2019; Shi et al., 2022). Few studies (Guerguiev et al., 2017; Sacramento et al., 2018; Linsley et al., 2018; Iyer et al., 2020) have incorporated biologically grounded microcircuits into functionally sophisticated deep networks, but doing so may be an important step in understanding how microcircuits contribute to behavior and reproducing the superior generalization abilities of the brain (Sinz et al., 2019).

Contour integration (Field et al., 1993; Li et al., 2006; Hess et al., 2014; Roelfsema, 2006) is a phenomenon in the V1 cortex where stimuli from outside a neuron’s classical receptive field (cRF) modulate its feedforward responses (see Figure 1). In particular, a neuron’s response is enhanced if a preferred stimulus within the cRF is part of a larger contour. Li et al. (2006) found that these elevated V1 responses were highly correlated with contour detectability. Under difficult viewing conditions, it is thought that the visual system uses contour integration to pop out smooth contours. Contour integration is mediated by intra-area lateral and higher-layer feedback connections (Chen et al., 2017; Liang et al., 2017). Past computational models (Li, 1998; Piëch et al., 2013; Ursino & La Cara, 2004; Hu & Niebur, 2017; Mély et al., 2018) have tested potential mechanisms and successfully replicated neurophysiological data. However, a limitation of all of these circuit models is that they are stand-alone models that do little to clarify the roles of contour integration in natural vision.

Figure 1:

Contour integration. (A) Contour integration has been studied using stimuli in which line segments form a contour within a larger field of randomly oriented segments. Subjectively, the contour pops out from the background. (B) Contour integration is thought to be mediated by long-range intra-area lateral connections and feedback connections from higher layers. The association field model (Field et al., 1993) is commonly used to model intra-area lateral connections. These long-range connections preferentially connect neurons with co-linear or co-circular orientation preferences. (C) Microcircuit architecture of the circuit model (Piëch et al., 2013) on which our work is based, which focuses on the role of lateral connections in V1. The outgoing connections of one of the excitatory nodes are highlighted in blue, while those of its paired inhibitory node are shown in red. Connections ending in a circle are excitatory, and those ending in a bar are inhibitory.


In this work, we embedded a circuit model of contour integration within a deep network. We used this model to investigate two broad questions. First, we tested whether key characteristics of biological contour integration would emerge as a result of the network learning to identify contours within backgrounds of randomly oriented edges (a kind of stimulus that has often been used to study contour integration). We found that the trained model was consistent with biological data on behavior (detection of contours), electrophysiology (unit responses versus contour length and contour-fragment spacing), and connectivity (structure of learned lateral connections). This provides new evidence that these particular circuit characteristics benefit the perception of contours within these synthetic visual stimuli.

Second, we used our model to investigate whether contour integration improved performance of two natural scene tasks. One of these was detection of weak edges in natural scenes, a role that has previously been proposed for contour integration. The second was a new task that required distinguishing connected contours from nearby unconnected contours. In the first task, the contour integration model performed similarly to a parameter-matched feedforward network. In the second task, surprisingly, the model performed much worse than the control network. However, it generalized better to a variation of the task that it was not trained on. Furthermore, a variation of the model that allowed excitatory neurons to inhibit some of their targets substantially outperformed the control. This suggests that contour integration is relevant to the second task, but the model we adopted was not optimal, either because biological contour integration is not optimal or because important biological elements were missing from the model.

2.1  Contour Integration Block

We adapted an existing circuit model of V1 contour integration and incorporated it into an artificial neural network (ANN). We used the current-based, subtractive-inhibition model of Piëch et al. (2013). This model focuses on within-layer lateral interactions between V1 orientation columns (co-located populations of neurons that respond to edges of similar orientations over a small area of visual space).

Orientation columns, the basic building block of the model, were modeled individually as a pair of reciprocally connected excitatory (E) and inhibitory (I) nodes whose temporal dynamics were defined as
$$\tau_x \frac{dx}{dt} = -x + J_{xx} f_x(x) - J_{xy} f_y(y) + \sum_{x'} L_{xx'} f_x(x') + I + I_{0x} \tag{2.1}$$
$$\tau_y \frac{dy}{dt} = -y + J_{yx} f_x(x) + \sum_{x'} L_{yx'} f_x(x') + I_{0y} \tag{2.2}$$
Here, x and y are the membrane potentials of the E and I nodes; Jxx, Jxy, and Jyx model within-pair interactions and represent the E→E (self), I→E, and E→I connection strengths, respectively; fx(·) and fy(·) are nonlinear activation functions that transform membrane potentials into firing rates; τx and τy are membrane time constants; I is the external input current to the model (the edge extraction outputs of the preceding layer); I0x and I0y are background currents; and Lxx′ and Lyx′ are the lateral connection strengths from an E node x′ in a neighboring column within the extra-classical receptive field (e-cRF) to the excitatory and inhibitory nodes, respectively. An E-I pair of this model and all of its connections are shown in Figure 1C.

Functionally, E nodes process incoming edge extraction responses from preceding layers, while I nodes subtractively modulate E node activities. Both of these nodes also receive recurrent inputs from nearby columns via lateral connections. Piëch et al. (2013) designed these anisotropically distributed connections (Stettler et al., 2002) with connectivity patterns suggested by the association field model (Field et al., 1993) (see Figure 1B): each orientation column preferentially connects with nearby columns that respond to stimuli that are co-linear or co-circular with the column’s preferred orientation. A similar but orthogonally oriented association field was used to model inhibitory lateral connections.

Piëch et al. (2013) defined the full model over a 2D grid of spatial locations. Each spatial location contained a set of orientation columns with the same frequency selectivities and a range of orientation preferences. The lateral connections of each orientation column were hard-coded. The dynamics of the full model were realized as the joint activities of all columns.

We made minimal adaptations to this circuit model to implement it as a trainable block inside a convolutional network. First, we replaced summations over e-cRFs with convolutions. The convolution operates over columns in nearby locations as well as at the same location. It incorporates the excitatory-self connection, Jxx, and the lateral connections. Second, we used Euler’s method to express the dynamics as difference equations (Linsley et al., 2018; Tallec & Ollivier, 2018). Third, we defined all model parameters including the lateral connections to be learnable and used task-level optimization to learn their optimal settings. Piëch et al. (2013) distinguished excitatory and inhibitory neurons in their model, consistent with Dale’s principle (Dale, 1935; Eccles et al., 1954). In contrast, neurons in convolutional networks typically do not make this distinction but allow weights to take on whatever values maximize performance. This consistently results in each neuron exciting some of its targets and inhibiting others. To ensure individual nodes were consistent with Dale’s principle, we constrained weights to be positive or negative, as appropriate. For connections between paired excitatory and inhibitory neurons, a logistic sigmoid nonlinearity was applied to the learned weight parameter to prevent changes in sign. The same method was used to retain the sign of the model’s time constants. For lateral connection kernels, a positive-only constraint was imposed on each element during training.
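For the positive-only constraint on lateral kernels, one simple approach is to clamp kernel weights after each optimizer step. The sketch below illustrates one possible implementation, not necessarily the exact one used here; the attribute names (`w_e`, `w_i`) are hypothetical.

```python
import torch

def clamp_lateral_kernels(ci_block):
    """Enforce the positive-only constraint on lateral connection kernels.

    Intended to be called after every optimizer step. `ci_block.w_e` and
    `ci_block.w_i` are assumed to be nn.Conv2d layers holding the lateral
    excitatory-targeting and inhibitory-targeting kernels, respectively.
    """
    with torch.no_grad():
        ci_block.w_e.weight.clamp_(min=0.0)
        ci_block.w_i.weight.clamp_(min=0.0)
```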

With these modifications, the activity of an orientation column is expressed as
$$x_t = (1 - \sigma(a))\, x_{t-1} + \sigma(a) \left[ W_e \circledast f_x(X_{t-1}) - \sigma(J_{xy})\, f_y(y_{t-1}) + I + I_{0x} \right] \tag{2.3}$$
$$y_t = (1 - \sigma(b))\, y_{t-1} + \sigma(b) \left[ W_i \circledast f_x(X_{t-1}) + \sigma(J_{yx})\, f_x(x_{t-1}) + I_{0y} \right] \tag{2.4}$$

where $x_0 = y_0 = 0$.

Here, x and y are membrane potentials; fx(·) and fy(·) are nonlinear activation functions; σ(a) and σ(b) are membrane time constants; σ(Jxy) and σ(Jyx) are the local I→E and E→I connection strengths; σ(·) is the logistic sigmoid function, which constrains time constants and local connection strengths to be positive; We are lateral excitatory connections from E nodes in nearby columns onto E nodes; Wi are connections from nearby E nodes onto I nodes; fx(Xt) is the output of all modeled nodes at time t; ⊛ is the convolution operator; I is the external input; and I0 is a node’s background activity.

This final form is a recurrent neural network that can be trained using standard neural network training techniques (Tallec & Ollivier, 2018). Finally, we included batch normalization (BN) (Ioffe & Szegedy, 2015) layers after every convolutional layer to model weak omnidirectional inhibition (Kapadia et al., 2000). We refer to this transformed model as the contour integration (CI) block and include it as a whole inside ANNs. Parameters of the CI block and their settings are described in section 6.1.
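To make the discretized dynamics concrete, the following PyTorch sketch implements one possible version of the CI block update (equations 2.3 and 2.4). It is a minimal illustration under our own naming and parameter choices (e.g., `n_iters`, the channel count, and the omission of the BN layers described above), not an exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CIBlock(nn.Module):
    """Sketch of the contour integration (CI) block: Euler-discretized
    E-I dynamics with learned lateral connections (eqs. 2.3 and 2.4)."""
    def __init__(self, channels=64, lateral_size=15, n_iters=5):
        super().__init__()
        self.n_iters = n_iters
        padding = lateral_size // 2
        # Lateral (and self) connections, implemented as convolutions.
        self.w_e = nn.Conv2d(channels, channels, lateral_size, padding=padding, bias=False)
        self.w_i = nn.Conv2d(channels, channels, lateral_size, padding=padding, bias=False)
        # Per-channel scalars; a sigmoid keeps each one positive (see text).
        self.a = nn.Parameter(torch.zeros(channels))     # E time constant
        self.b = nn.Parameter(torch.zeros(channels))     # I time constant
        self.j_xy = nn.Parameter(torch.zeros(channels))  # I -> E strength
        self.j_yx = nn.Parameter(torch.zeros(channels))  # E -> I strength
        self.i0_x = nn.Parameter(torch.zeros(channels))  # E background current
        self.i0_y = nn.Parameter(torch.zeros(channels))  # I background current

    def forward(self, ff_input):
        # ff_input: edge-extraction output, shape (batch, channels, H, W).
        x = torch.zeros_like(ff_input)  # E membrane potentials (x_0 = 0)
        y = torch.zeros_like(ff_input)  # I membrane potentials (y_0 = 0)
        def per_ch(p):  # reshape per-channel scalars for broadcasting
            return p.view(1, -1, 1, 1)
        for _ in range(self.n_iters):
            fx, fy = F.relu(x), F.relu(y)  # firing rates
            a = torch.sigmoid(per_ch(self.a))
            b = torch.sigmoid(per_ch(self.b))
            x = (1 - a) * x + a * (
                self.w_e(fx)                               # lateral + self excitation
                - torch.sigmoid(per_ch(self.j_xy)) * fy    # subtractive inhibition
                + ff_input + per_ch(self.i0_x))
            y = (1 - b) * y + b * (
                self.w_i(fx)                               # lateral drive onto I
                + torch.sigmoid(per_ch(self.j_yx)) * fx
                + per_ch(self.i0_y))
        return F.relu(x)  # E node outputs
```

The sigmoid-wrapped scalars keep the time constants and local E-I strengths positive, while the positive-only constraint on `w_e` and `w_i` would be enforced separately during training, as in the clamping sketch above.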

2.2  Visual Inference Network

The full model is composed of edge extraction, CI, and classification blocks (see Figure 2). For edge extraction, we used the first convolutional layer of a ResNet50 (He et al., 2016) that was pretrained on the ImageNet (Deng et al., 2009) data set. We additionally added BN and max-pooling after the convolutional layer in all tasks other than edge detection in natural images. This helped reduce computational complexity (by reducing the spatial dimensions over which the recurrent CI block acts) and improved performance as well. For the task of edge detection in natural images, only the BN was added. Outputs of the edge extraction block were fed into the CI block. The same CI block was used across all tasks.
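Assembled end to end, the pipeline might look as follows. This is a sketch: `CIBlock` refers to the block sketched in section 2.1, `classifier` stands in for the task-specific heads of Figure 2, and the torchvision weights argument is illustrative.

```python
import torch.nn as nn
from torchvision.models import resnet50

class ContourIntegrationNet(nn.Module):
    """Sketch of the full model: edge extraction -> CI block -> classifier."""
    def __init__(self, ci_block, classifier, use_maxpool=True):
        super().__init__()
        backbone = resnet50(weights="IMAGENET1K_V1")
        self.edge_extract = backbone.conv1  # first conv layer only (pretrained)
        layers = [nn.BatchNorm2d(64)]
        if use_maxpool:  # omitted for edge detection in natural images
            layers.append(nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
        self.post = nn.Sequential(*layers)
        self.ci_block = ci_block      # shared across tasks
        self.classifier = classifier  # task-specific head

    def forward(self, img):
        x = self.post(self.edge_extract(img))
        x = self.ci_block(x)
        return self.classifier(x)
```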

Figure 2:

Model architectures. The main component of the model is the contour integration (CI) block. It consists of a 3D grid of orientation columns and models the horizontal interactions between them. Each orientation column is modeled by a pair of excitatory (E) and inhibitory (I) nodes. Each orientation column receives as input the output of an edge extraction unit at the same spatial location and channel. Horizontal connections connect orientation columns with other orientation columns at different spatial locations and channels. These connections are learned by optimizing performance on high-level tasks. The full model consists of three main blocks: edge extraction, CI, and classification blocks. Edge extraction and CI blocks are common for all tasks. For edge extraction, the first convolutional layer of a ResNet50 (He et al., 2016) that was previously trained on ImageNet (Deng et al., 2009) was used. Task specific classification blocks (edge detector, fragments classifier, binary classifier) map contour integration activations to output labels. For each convolutional (conv) layer, the square brackets specify the number, size, and stride length of kernels. Batch normalization (BN) layers were typically used after convolutional blocks. Bilinear interpolation was used for upsampling in the edge detector classification block.


Outputs of the CI blocks were passed to classification blocks, which mapped CI block outputs to required label sizes for each task. These blocks had two convolutional layers each. Deeper classification blocks might have allowed better task performance, but we chose shallower classification blocks so that the CI block would play an essential role in network function. Description of each of the classification blocks is in section 6.2. The architectures of all the models we used are shown in Figure 2.

2.2.1  Feedforward Control Network

We compared our contour-integration model (the visual inference network described above) with a feedforward control network of matching capacity (number of parameters). Feedforward networks can be parameterized to match capacity in several different ways (Spoerer et al., 2020; Tan & Le, 2019). Because we were interested in modeling V1 lateral connections, we used convolutional kernels of the same size as the model. Compared to standard convolutional kernels, these were much larger and were specifically designed to model lateral connections, which may spread out up to eight times the cRF of V1 neurons (Stettler et al., 2002).

The control network used the same edge extraction and classification blocks as the model. Only the middle block was different. The control’s middle block used the same convolutional layers as the model’s CI block but ordered them sequentially. Additionally, batch normalization and dropout layers (pdropout = 0.3) were added after every convolutional layer to prevent the control from overfitting the training data. Finally, no positive-only weight constraint was enforced on the control network; it was free to adopt any weight changes that improved performance. Compared to the control, the CI block performs approximately Niter times as many computations per image and has a longer inference time because it is recurrent. However, this is consistent with contour integration in the brain, which affects late-phase responses of V1 neurons rather than their initial responses (Li et al., 2006).
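A sketch of the control’s middle block under this design: the same two large convolutions as the CI block, applied sequentially rather than recurrently, with batch normalization and dropout after each layer. The ReLU placement here is our assumption.

```python
import torch.nn as nn

def control_middle_block(channels=64, lateral_size=15, p_dropout=0.3):
    """Parameter-matched feedforward stand-in for the CI block: the same
    two large-kernel convolutions, applied sequentially, with BN and
    dropout after each layer and no sign constraints on the weights."""
    padding = lateral_size // 2
    return nn.Sequential(
        nn.Conv2d(channels, channels, lateral_size, padding=padding),
        nn.BatchNorm2d(channels),
        nn.ReLU(inplace=True),
        nn.Dropout2d(p_dropout),
        nn.Conv2d(channels, channels, lateral_size, padding=padding),
        nn.BatchNorm2d(channels),
        nn.ReLU(inplace=True),
        nn.Dropout2d(p_dropout),
    )
```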

3.1  Contour Detection

We first trained the networks with stimuli that are typically used to study biological contour integration (Field et al., 1993; Li et al., 2006, 2008). These consisted of many small edges, a few of which were aligned to form a contour, while the rest were randomly oriented to form the background (see Figure 3). Li et al. (2008) found that macaque monkeys progressively improved at detecting contours and had higher contour-enhanced V1 responses with experience on these stimuli. Hence, contour integration is learnable from these stimuli.

Figure 3:

Contour fragments stimuli. (A, B) Example training stimuli. All line segments are identical Gabor fragments. A few adjacent fragments were aligned to form a smooth contour (highlighted in red). Remaining fragments were randomly distributed. Training stimuli consisted of curved and straight contours that differed in their location, length, interfragment curvature, and component Gabors. (C, D) Test stimuli used to analyze the impact of length and interfragment spacing at the behavioral and neuronal levels. Test stimuli consisted of centrally located straight contours with different lengths (C) and different spacing between contour fragments (D).


We constructed a data set containing 64,000 training and 6400 validation images in which contours differed in their locations, lengths (number of edges, lc = [1, 3, 5, 7, 9]), curvature (random interedge rotations of β = [0°, ±15°]), and component edges (64 Gabors with different parameters). Details of the full data set are described in section 6.5.

Networks were tasked with identifying fragments that were part of the contour. A fragments classifier block (see Figure 2) followed the CI block to map its outputs to the desired label size. Details of the training process are described in section 6.3. Network performances were evaluated using mean intersection over union (IoU) scores between predictions and labels (see section 6.4.1). We refer to this task as contour detection due to its similarity with object detection in computer vision, but note that it differs from the kind of detection used in monkey experiments, which involves two patches of line segments and requires only selection of the patch that contains a contour (Li et al., 2008).
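Concretely, a mean IoU score over a batch of predicted and labeled fragment masks can be computed as in the sketch below; the threshold value is illustrative (see section 6.4.1 for the exact metric).

```python
import torch

def mean_iou(pred, label, threshold=0.5):
    """Mean intersection-over-union between thresholded predictions and
    binary labels, averaged over a batch of shape (B, C, H, W)."""
    pred_bin = (pred > threshold).float()
    inter = (pred_bin * label).sum(dim=(1, 2, 3))
    union = ((pred_bin + label) > 0).float().sum(dim=(1, 2, 3))
    return (inter / union.clamp(min=1)).mean()
```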

Averaged peak IoU scores after training are shown in Table 1. For each network, the results were averaged over five independent runs, each initialized with different random seeds. The model outperformed the control by approximately 11% (validation score).

Table 1:

Peak IoU Scores on the Contour Fragments Data Set.

Network    Train (%)        Validation (%)
Model      87.33 ± 0.28     84.48 ± 0.30
Control    71.62 ± 0.35     73.61 ± 0.38

Note: Peak values (mean ± 1 SD) were averaged across five independent runs for each network.

To ensure the validity of our lateral kernel size choice, we also tested control models with more standard kernel sizes (5×5,3×3). The validation IoU for the 3×3 control reached approximately 36%, while for the 5×5 control, it reached approximately 44%. Both of these figures were lower than the score achieved by our selected control model.

3.1.1  Effect of Contour Length and Interfragment Spacing

To determine whether networks learned to integrate contours in a manner similar to the brain, we analyzed them for consistency with behavioral and neurophysiological data. Li et al. (2006) concurrently monitored behavioral performance and V1 neural responses of macaque monkeys as the length of embedded contours and the spacing between contour fragments were varied. At the behavioral level, contours became more salient as lengths increased. Furthermore, when contours extended in the direction of the preferred orientation of V1 neurons, firing rates monotonically increased. Conversely, when spacing between fragments increased, contours became less salient and V1 firing rates decreased monotonically.

In a similar manner, we analyzed trained networks behaviorally, at the network outputs, and neurophysiologically, at the outputs of centrally located neurons in the CI blocks. For the contour-integration model network, this corresponded to the outputs of E neurons, while for the control network, it corresponded to the outputs of the second convolutional layer. Behavioral performance was quantified using task-level mean IoU scores. Similar to Li et al. (2006), neurophysiological responses were quantified by the contour integration gain,
$$G = \frac{\text{CI block output at } (l_c, \text{RCD})}{\text{CI block output at } (l_c = 1, \text{RCD} = 1)} \tag{3.1}$$
where the relative co-linear distance (RCD) quantifies interfragment spacing and was defined as the ratio of interfragment spacing to fragment length in pixels. The condition lc = 1, RCD = 1 is the case in which a neuron receives its optimal stimulus within its cRF and no neighboring contour fragments align with it.
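In code, the gain of equation 3.1 for a single recorded unit might be computed as follows, where `ci_response` is a hypothetical helper that returns the unit’s CI block output for a given contour length and RCD.

```python
def contour_integration_gain(ci_response, length, rcd):
    """Gain (eq. 3.1): a unit's CI block output for a contour of a given
    length and fragment spacing, relative to its response to a single
    optimal fragment in the cRF (l_c = 1, RCD = 1).

    `ci_response(length, rcd)` is a hypothetical helper returning the
    recorded unit's output for the given stimulus condition.
    """
    baseline = ci_response(1, 1.0)
    return ci_response(length, rcd) / baseline
```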

We constructed separate test stimuli (similar to those of Li et al., 2006) for each recorded neuron. These consisted of centrally located contours of varying length and interfragment spacing, where each contour fragment was a spatially shifted copy of the neuron’s optimal within-cRF stimulus. A detailed description of the test stimuli is given in section 6.6. Examples are shown in Figures 3C and 3D.

Average IoU scores as contour length increased are shown in Figure 4A. Results were averaged over five copies of each network, each trained in the same way but initialized with different random weights. For centrally located straight contours, behavioral performance of both networks was similar. Both the contour-integration model and control networks excelled (95% or higher) at detecting the absence of contours. There were dips in performance for length-three contours as they were the hardest to detect. For all other lengths, prediction accuracy increased with length, with the model outperforming the control at larger contour lengths.

Figure 4:

Synthetic contour fragments results. (A) IoU versus contour length for straight contours. Behavioral classification performance of the model and control were similar. (B, C) Population average gains versus length and versus fragment spacing, respectively. Contour lengths are expressed in number of fragments, and spacing between fragments is expressed as relative co-linear distance (RCD). RCD is defined as the ratio between the spacing between fragments to the length of a fragment. Neurophysiological results are from Li et al. (2006). The plot shows the weighted average gains from the two monkeys used in their study. Dark lines show mean values, and shaded areas represent unit standard deviation from means over neurons from five different training runs for each model. (D, E) Gradients of linear fits of the outputs of individual neurons as contour length and as interfragment spacing were increased. (F, G) Similar plots as panels D and E but for the control. The contour-integration model showed consistent trends with neurophysiological data, while the control behaved differently.


Larger contrasts between the model and control were observed when neural response gains were analyzed. Figure 4B shows population average gains as contour lengths changed, along with averaged gains from two monkeys in Li et al. (2006). In the contour-integration model network, average gains increased monotonically with contour length, similar to the monkey data. In contrast, average gains in the control network did not change appreciably with contour length. Figure 4C shows population average gains as the spacing between fragments increased. Model network gains decreased monotonically with spacing, consistent with the monkey data (Li et al., 2006). Control network gains, unexpectedly, increased with spacing.

To calculate gains in both the model and the control network, we excluded neurons that did not respond to any single Gabor fragment in the cRF (no optimal stimulus). Out of the 320 possible neurons, 188 model and 178 control neurons were retained according to this criterion. Furthermore, for population average gains, neurons that were unresponsive to any contour condition (all zero gains) and those that had outlier gains (20 or more) for any contour condition were also removed. Typically, these large gains were seen for neurons that had small responses to lc=1 contours, and small changes in the CI block outputs significantly affected their gains. This resulted in the removal of an additional 36 model and 144 control neurons. Across each population (model and control), there was a wide range of enhancement gains exhibited by individual neurons as shown in the mean ±1 SD shaded area in Figure 4B.

To better understand how responses varied across neuron populations, we plotted histograms of the slopes of linear fits to CI block outputs versus contour length and interfragment spacing. This was done for all neurons for which the optimal stimulus was found. Since outputs rather than gains were considered, we included neurons with outlier gains in these histograms. Results of the model network are shown in Figures 4D and 4E, while those of the control network are shown in Figures 4F and 4G. Most model neurons showed positive slopes as contour lengths increased and negative slopes as fragment spacing increased, consistent with trends in the monkey data. In contrast, the slopes of control-network responses versus fragment length and spacing were both clustered slightly above zero. While the task performance of the model and control networks was similar, they employed different strategies to solve the task, and only the contour-integration model network was consistent with neurophysiological data.
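The slopes were obtained from per-neuron least-squares linear fits, for example (a NumPy sketch with illustrative values):

```python
import numpy as np

def response_slope(conditions, responses):
    """Slope of a least-squares linear fit of a neuron's CI block output
    versus contour length (or interfragment spacing)."""
    slope, _intercept = np.polyfit(conditions, responses, deg=1)
    return slope

# e.g., one neuron measured at contour lengths 1, 3, 5, 7, 9:
# response_slope([1, 3, 5, 7, 9], [0.2, 0.35, 0.5, 0.6, 0.7])
```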

3.1.2  Lateral Connectivity Patterns

We additionally analyzed the lateral kernels of trained models for consistency with neuroanatomical properties of V1 lateral connections. To maintain consistency with Dale’s principle and the approach used in the model of Piëch et al. (2013), the signs of all lateral kernel weights were constrained to be positive. Moreover, separate kernels were used to model excitatory connections onto excitatory neurons and onto inhibitory neurons. These constraints also facilitated visualizing these multidimensional connection patterns (see section 6.8). Example learned outgoing lateral kernels for a trained model are shown in Figures 5A and 5B. The full set of excitatory and inhibitory learned lateral kernels of this trained model are shown in Figures S2 and S3, respectively. Their corresponding feedforward kernels are shown in Figure S1. Qualitatively, many excitatory-targeting connections were anisotropically distributed and spread out densely in the preferred orientations of the source neurons, while inhibitory-targeting connections were shorter and more omnidirectional.

Figure 5:

Lateral kernel analysis results. (A, B) Example learned excitatory (E) and inhibitory (I) lateral connections. The left-most subplot shows a kernel in the feedforward (FF) edge extraction layer. The red line through its center shows its preferred orientation. The middle subplot shows its learned outgoing lateral connections to E neurons, while the right-most subplot shows its learned outgoing lateral connections to I neurons. The procedure we used to visualize lateral connections is described in section 6.8. (C, D) Histograms of normalized index of ellipticity of lateral E and I connections, respectively. Lateral E connections spread out farther and are more directed than inhibitory connections. (E) Axis-of-elongation of lateral connections plotted against the orientation of their corresponding feedforward edge extraction kernels. Each point is scaled by its normalized index of ellipticity; larger markers are more directed kernels. Dashed lines show ±90° angular difference. Lateral kernels that lie on these lines are orthogonal to feedforward kernels.


We quantified the spread of lateral connections using a procedure adapted from Sincich and Blasdel (2001). They injected axon-staining dye into V1 orientation columns and characterized the staining pattern around each injection with an averaging vector, R. Its magnitude, r, indicated the directional selectivity of lateral connections, while its angle pointed in the direction of the densest staining. Directional selectivity was quantified using a normalized index of ellipticity, rn, which was obtained by normalizing r by the mean length of all lateral connection vectors. More details of the procedure are in section 6.7. An rn of zero indicates an omnidirectional spread of lateral connections, while a value of one indicates a straight line. Finally, they compared the axes of elongation of lateral connections with the orientation preferences of V1 columns. In 11 of the 14 injection sites, they found a highly elliptical distribution of lateral connections (rn = 0.42), as well as a close correspondence between the axis of elongation of lateral connections and the preferred orientation of the injected V1 columns (mean difference of 11°).
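Applied to a lateral kernel rather than a staining pattern, this analysis can be sketched as follows. The sketch is our own simplification: it weights each spatial offset by its connection strength and uses double-angle averaging for axial data; see section 6.7 for the exact procedure we used.

```python
import numpy as np

def ellipticity_index(kernel):
    """Normalized index of ellipticity r_n for a 2D lateral kernel.
    Each spatial offset from the kernel center is treated as a connection
    vector, weighted by its (non-negative) connection strength. Because
    orientation is axial (period 180 deg), angles are doubled before
    averaging. Returns (r_n, axis of elongation in degrees)."""
    h, w = kernel.shape
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = (xs - w // 2).ravel(), (ys - h // 2).ravel()
    weights = np.clip(kernel, 0, None).ravel()
    angles = np.arctan2(dy, dx)
    lengths = np.hypot(dx, dy)
    # Averaging vector in double-angle space.
    R = np.sum(weights * lengths * np.exp(2j * angles))
    r = np.abs(R) / np.sum(weights)
    mean_len = np.sum(weights * lengths) / np.sum(weights)
    r_n = r / mean_len            # 0 = omnidirectional, 1 = straight line
    axis = np.angle(R) / 2        # axis of elongation
    return r_n, np.degrees(axis)
```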

We analyzed the directional selectivity and axis-of-elongation of lateral connections in our trained models in a similar manner. Details of how we adapted their analysis for our models are described in section 6.7. rn distributions for excitatory-targeting and inhibitory-targeting kernels of a trained model are shown in Figures 5C and 5D, respectively. The average rn for excitatory-targeting kernels was found to be 0.20, while for the inhibitory-targeting kernels, it was substantially lower at 0.07. Across the five trained models, we found a population-average excitatory-targeting rn of 0.19 ± 0.01 (mean ± 1 SD) and inhibitory-targeting rn of 0.07 ± 0.01. Excitatory-targeting connections were substantially more directed than inhibitory-targeting ones.

These rns were lower than those reported by Sincich and Blasdel (2001). Two differences in our analysis may contribute to this. First, Sincich and Blasdel (2001) were only able to include connections that were outside a radius of 200 μm of the injection location, while we considered all lateral connections. Second, we weighted all lateral connections by their connection strengths so that stronger connections had a greater influence on the averaging vector, while Sincich and Blasdel (2001) considered all patch vectors to have equal weight.

Orientation differences, θdiff, between neurons’ orientation preferences and the axes of elongation of their lateral connections are shown in Figure 5E for a trained model. Each marker is scaled by the kernel’s normalized index of ellipticity, so that larger markers show more anisotropic connections. Because orientation has a period of 180°, angular differences have a potential range of ±90°. Most neurons’ axes of elongation were close to their feedforward kernel orientations (see Figure 5E; mean excitatory-targeting θdiff = 22°, mean inhibitory-targeting θdiff = 31°). A smaller number of neurons had axes of elongation nearly orthogonal to their preferred orientation. The results were consistent across the five independently trained models (population average excitatory-targeting θdiff = 22° ± 3° and inhibitory-targeting θdiff = 28° ± 2°). The difference between the axes of elongation of lateral connections and feedforward orientation preferences was larger than what Sincich and Blasdel (2001) found, but the trend was similar: most lateral excitatory connections projected along the preferred orientation of their associated feedforward kernel.

Excitatory lateral connections onto inhibitory neurons in our model have a net inhibitory effect on excitatory neurons in surrounding columns. Previous contour integration models with fixed connection structures (Piëch et al., 2013; Li, 1998; Ursino & La Cara, 2004) typically used a similar spatial extent for both excitatory and inhibitory interactions. In contrast, our model learned smaller and more omnidirectional inhibitory-targeting kernels. Moreover, previous models aligned lateral inhibition kernels orthogonal to the preferred orientation of feedforward kernels, consistent with Kapadia et al. (2000). Our model instead learned inhibitory-targeting connections that were mostly aligned with the preferred orientations of feedforward kernels but more omnidirectional (see Figure 5D). These kernels are consistent with observations that short-range connections in superficial layers of V1 tend to be omnidirectional and largely suppressive (Malach et al., 1993). They are also related to a recent version of the association field model that includes short-range, omnidirectional inhibition (Field et al., 2013).

In summary, the lateral kernels in our model were qualitatively realistic in three respects: degree of elongation, alignment of elongation with neurons’ preferred orientations, and relatively omnidirectional short-range inhibitory interactions. Together with the realistic responses discussed in previous sections, this indicates that a physiologically realistic contour integration mechanism is consistent with optimizing the network for this contour detection task.

3.2  Edge Detection in Natural Images

Next, we explored whether brain-like contour integration can be learned from tasks in our natural viewing environment and whether contour integration is useful in performing these tasks. Despite substantial research on the mechanisms of contour integration and the phenomenon of contour pop-out, little is known about the role of contour integration in natural vision and survival. Perhaps the most specific proposal to date is that contour integration may enhance detection of parts of a contour with weak local cues, such as poor contrast (Piëch et al., 2013; Li, 1998). To test this idea, we trained our network to detect edges in natural images. We used the Barcelona Images for Perceptual Edge Detection (BIPED) data set (Poma et al., 2020) because it labels all contours rather than only object boundaries. This is important because our focus is on contour integration in V1, whereas object awareness relies on more abstract representations in deeper layers. The data set contains 200 training and 50 validation (image, edge map) pairs; it was expanded to 57,600 training images using data augmentation. Sample images and ground-truth labels are shown in Figures 6A and 6B.

Figure 6:

Edge detection in natural image stimuli. (A) Example images from the BIPED (Poma et al., 2020) data set. Each row shows a different image. (B) Ground truth edge maps for input images shown in panel A. (C, D) Corresponding predictions of the control and contour-integration model, respectively.


Networks were tasked with detecting all contours in input images. Performance was evaluated using mean IoU scores (see section 6.4.1) between network predictions and ground-truth labels over all pixels in an image and all images in the data set. An edge detection block (see Figure 2) was used to map CI block outputs to the same dimensions as labels. Details of the training process are described in section 6.3.

Example predictions of the trained control and model networks are shown in Figures 6C and 6D, respectively. Visually, differences between their predictions are subtle. Validation IoU scores over the time course of training, for a detection threshold of 0.3 (see section 6.4.1), are shown in Figure 7A. Both networks achieved their highest mean IoU scores (0.45) at this threshold, and their scores were similar throughout training. The CI block had little impact on overall performance, suggesting that the physiology of contour integration may not be essential for reliable detection of a wide variety of edges in natural scenes. To further explore this point, we trained a version of the model in which the lateral connections had a much smaller spatial extent: 3×3 kernels rather than 15×15. This model also reached the same peak performance.

Figure 7:

Edge detection in natural images. (A) Validation IoU scores of the contour-integration model, control, and a model with reduced 3×3 lateral kernels over training. All networks had similar performances. (B, C) Average prediction difference between model and control over the validation data set as a function of prediction strength. Model and control outputs were compared over the entire validation data set pixel-by-pixel. Using a sliding window of width 0.2 and a step size of 0.1, control predictions within the window were highlighted and compared with corresponding model predictions. Average differences for edge pixels are shown in panel B and for nonedges in panel C. Positive values indicate higher model predictions compared to control. Solid line shows mean differences, and the shaded area shows unit standard deviation around the mean.


3.2.1  Weak versus Strong Edge Pixel Detection

In natural images, contours have nonuniform strengths, and some parts are easier to detect than others. Li (1998) and Piëch et al. (2013) showed that contour integration can potentially enhance weak contours, but their results were qualitative and based on a single image. Although we found that contour integration did not improve detection performance overall, including for weak contours, it may still strengthen low-level responses to weak contours. To investigate this question, we plotted the difference between model and control outputs as a function of the control outputs, pixel by pixel. Details of the procedure we used are described in section 6.9.
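The comparison can be sketched as follows (window width 0.2 and step 0.1, as in Figure 7; the array handling is our own simplification):

```python
import numpy as np

def windowed_prediction_difference(control_pred, model_pred, mask,
                                   width=0.2, step=0.1):
    """Mean (model - control) prediction difference as a function of
    control prediction strength, over pixels selected by `mask` (edge
    pixels or nonedge pixels). All inputs are flat arrays in [0, 1]."""
    c, m = control_pred[mask], model_pred[mask]
    centers, diffs = [], []
    lo = 0.0
    while lo + width <= 1.0 + 1e-9:
        in_win = (c >= lo) & (c < lo + width)
        if in_win.any():
            centers.append(lo + width / 2)
            diffs.append((m[in_win] - c[in_win]).mean())
        lo += step
    return np.array(centers), np.array(diffs)
```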

The results are shown in Figure 7B for edge pixels and in Figure 7C for nonedge pixels. On average, the model had higher edge predictions for weaker edges (up to a control output of 0.3). For stronger edges, the control network responded more strongly on average. For nonedge pixels, model outputs were on average lower than control outputs for all control outputs above 0.2, showing that the model had a lower tendency toward false-positive edge detection. In summary, contour integration strengthened the representation of weak edges, but this had little practical effect on detection of weak edges at the most effective discrimination threshold.

3.3  Naturalistic Contour Processing

Contour integration may support other kinds of reasoning about contours in natural scenes—for example, determining which branch to climb in order to reach some fruit. To investigate this possibility, we devised a new visual perception task. Specifically, we trained the model to detect whether two points in a natural scene were part of the same contour. We placed two markers in each image. In some cases, the markers were connected by a single contour in the image, while in others, they were placed on different contours. We additionally punctured input images with occlusion bubbles to fragment the contours. This made it difficult to rely solely on edge extraction to solve the task. Example images are shown in Figure 8C.

Figure 8:

Contour tracing in natural images stimuli. (A) Sample starting image from the BIPED data set (Poma et al., 2020). (B) Using its edges label, two markers were randomly placed on edge pixels. (C) During training, images were punctured with occlusion bubbles to randomly fragment all image contours. (D) After training, the impact of fragment spacing was analyzed using test contours with equidistant occlusion bubbles that were placed along contours with various interfragment spacing. The top row shows an example connected class stimulus, and the bottom row shows an example unconnected class stimulus.


We constructed a data set of 50,000 training contours and 5000 validation contours that were extracted from the BIPED data set (Poma et al., 2020). Details of the data set and how it was constructed are described in section 6.10. A binary classifier block (see Figure 2) was used to map CI block outputs to binary decisions, that is, whether the pair of markers in each image was connected by a smooth contour. Performance was measured by comparing the accuracy of network predictions with labels. Training details are described in section 6.3.

Table 2 shows peak classification accuracies averaged across five independent runs for all networks. Over the whole data set, the model performed about 5% worse than the control (validation accuracy).

Table 2:

Peak Classification Accuracies on the Contour Tracing in Natural Images Task.

Network    Train (%)         Validation (%)    Test (%)
Model      70.52 ± 0.95      77.27 ± 1.55      70.39
Control    77.54 ± 0.44      82.67 ± 0.53      65.65

Notes: Peak values (mean ± 1 SD) were averaged across five independent runs for each network. The Test column shows results for test stimuli that were not seen during training and that had a constant interfragment spacing of RCD = 1.

3.3.1  Effect of Interfragment Spacing

We wondered whether this natural-image task might elicit the same kinds of sensitivity to contour fragment spacing as synthetic contours (Li et al., 2006) or whether responses to contour spacing were unique to the stimuli used in monkey experiments. We designed a variation of the task that allowed us to investigate this question in the artificial networks. To quantify the effects of interfragment spacing, we created new test stimuli in which interfragment spacing was changed in a controlled manner; occlusion bubbles were added along contours at fixed intervals. Contours were punctured with bubbles of sizes 7, 9, 11, 13, 15, and 17 pixels, corresponding to fragment spacing of [7, 9, 11, 13, 15, and 17]/7 RCD. An example test stimulus is shown in Figure 8D. Details of the stimulus construction are described in section 6.11. Binary classification accuracy was used to quantify behavioral performance, while neuron responses were quantified by the contour integration gain for natural images,
$$G_{NI} = \frac{\text{CI Output @ RCD} = rcd}{\text{CI Output @ RCD} = 1} \tag{3.2}$$
where CI Output @ RCD = 1 is the output activation of an individual neuron responding to its optimal stimulus within the cRF and with the contour fragmented with gaps the same size as the cRF stimulus, while CI Output @ RCD = rcd is the response of the neuron when the spacing between fragments was changed.

When occlusion bubbles were systematically added along contours rather than randomly placed throughout the image, classification accuracies of all networks dropped even for the smallest bubble size (see Table 2, Test column). However, the relative drop in performance for the model (about 6%) was significantly less than that of the control (about 17%), showing that the strategy employed by the model generalized better from the training data to these new stimuli. Figure 9A shows the results of fragment spacing on the behavioral performance of networks. From the least to the most spacing, model performance monotonically dropped by about 4%, consistent with trends in the synthetic contour detection task. The control was unaffected by interfragment spacing.

Figure 9:

Contour tracing in natural images stimuli. (A) Classification accuracy of the model (blue) and control (red) versus fragment spacing. Dark lines show mean accuracies, and shaded areas show unit standard deviation around means. Model performance dropped as spacing increased, consistent with observed behavioral trends. (B) Population average GNI versus spacing. Gains of both networks dropped with spacing. (C, D) Histograms of gradients of linear fits of gain versus spacing results for the control and model, respectively. For all networks, gains decreased with spacing. The model was more sensitive to interfragment spacing. Insets show histograms of gradients of CI block input activations versus fragment spacing. Input gradients did not change significantly with spacing, showing that observed trends were learned by CI blocks.


Figure 9B shows population-averaged contour integration gains as interfragment spacing increased. Population averages were found by averaging gains of individual neurons for which the optimal stimuli were found and across all five networks (trained from different random initializations) of each type. Model results were averaged across 293 neurons, while control results were averaged across 120 neurons. Response gains of the control network were similar regardless of spacing, in contrast with their marked increase with spacing in the synthetic contour task. Response gains in the model decreased with increasing fragment spacing, consistent with the synthetic contour task, although the changes were less pronounced in this case.

We further analyzed the impact of fragment spacing on output activations using linear fits of output activation versus fragment spacing of individual neurons. Histograms of the slopes are plotted in Figures 9C and 9D for the control and the model networks, respectively. Similar to population-averaged gain results, model outputs dropped more sharply while control output activations only dropped slightly as spacing increased.

Overall, the model behaved more consistently than the control. Its performance was less affected by new stimuli outside the training distribution, and its responses to fragment spacing were similar for synthetic contours and natural images.

3.4  The Effect of Separating Excitation and Inhibition

Following Piëch et al. (2013) and consistent with physiology, our model has separate excitatory and inhibitory neurons. This requires a constraint on the synaptic weights that is rarely used in deep learning and may have an impact on performance as well as neuron responses. We created a version of the model without this constraint to test its effects. We refer to this variant as the relaxed-positivity-constraint model (RPCM). In the RPCM, each element of the lateral-connection kernels was allowed to take on any value individually. However, net lateral interactions were still restricted to be positive. This was accomplished with ReLU nonlinearities operating on the weighted sums of the lateral inputs to each neuron. This is similar to the approach of Linsley et al. (2020) to model biologically plausible lateral interactions. However, individual neurons no longer conform with Dale’s principle (Dale, 1935; Eccles et al., 1954) as they can have both excitatory and inhibitory influences on other neurons. With this modification, the membrane potential equations of E and I nodes were defined as
$$x_t = (1 - \sigma(a))\, x_{t-1} + \sigma(a) \left[ \mathrm{ReLU}\!\left( W_e \circledast f_x(X_{t-1}) \right) - \sigma(J_{xy})\, f_y(y_{t-1}) + I + I_{0x} \right] \tag{3.3}$$
$$y_t = (1 - \sigma(b))\, y_{t-1} + \sigma(b) \left[ \mathrm{ReLU}\!\left( W_i \circledast f_x(X_{t-1}) \right) + \sigma(J_{yx})\, f_x(x_{t-1}) + I_{0y} \right] \tag{3.4}$$
where parameters are defined in section 2.1.
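Relative to the base model’s update (equations 2.3 and 2.4), the only changes are the ReLU on the lateral sums and the unconstrained kernel elements. A sketch of one RPCM time step (function and argument names are ours):

```python
import torch
import torch.nn.functional as F

def rpcm_step(x, y, ff_input, w_e, w_i, a, b, j_xy, j_yx, i0_x, i0_y):
    """One RPCM time step (cf. eqs. 3.3 and 3.4). `w_e` and `w_i` are
    unconstrained convolutions; F.relu keeps the *net* lateral interaction
    positive, so individual kernel elements may be negative (violating
    Dale's principle at the single-neuron level). The scalar parameters
    are per-channel tensors, broadcastable against x and y."""
    fx, fy = F.relu(x), F.relu(y)
    sa, sb = torch.sigmoid(a), torch.sigmoid(b)
    x_new = (1 - sa) * x + sa * (F.relu(w_e(fx))
                                 - torch.sigmoid(j_xy) * fy
                                 + ff_input + i0_x)
    y_new = (1 - sb) * y + sb * (F.relu(w_i(fx))
                                 + torch.sigmoid(j_yx) * fx
                                 + i0_y)
    return x_new, y_new
```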

On the fragmented contours data set, the RPCM network outperformed the model by about 7% and the control by about 18% (Train IoU = 94.11 ± 0.05%, Validation IoU = 91.40 ± 0.12%, averaged across three networks), even though it was trained for half the time (see section 6.3). The effect of contour length on behavioral performance is shown in Figure 10A. For all contour lengths, IoU scores of the RPCM network were higher than those of the model. Moreover, performance monotonically increased for contours of length three or longer, consistent with behavioral data. Neuron response gains also increased monotonically with contour length (see Figure 10B; results averaged over 149 neurons from three networks). However, these increases were not as pronounced as those of the model network. Similarly, RPCM neurons responded less to more widely spaced fragments, but the difference was not as pronounced as in the model network (see Figure 10C).

Figure 10:

RPCM network results. The RPCM network is similar to the model but does not use a strict positive-only constraint on lateral kernels. (A) IoU versus contour length for centrally located straight contours. RPCM network IoU scores were higher than the model’s for all contour lengths. (B, C) Population average contour integration gains versus length and versus fragment spacing, respectively. Similar to the model, gains monotonically increased with contour length, although they were smaller. For interfragment spacing of up to 1.5 RCD, gains monotonically decreased. For larger spacing, gains increased slightly. (D) Validation IoU scores of the RPCM network on the edge detection task in natural images. The RPCM network had a slightly higher IoU score than all other networks. (E, F) Average prediction difference between RPCM and control networks for edges (E) and nonedges (F) as a function of control prediction strength. Similar to the model, the RPCM network had stronger responses to weak edges compared to the control network. (G) Classification accuracy of the RPCM network versus fragment spacing in the contour tracing in natural images task. Performance dropped monotonically as interfragment spacing increased. (H) Population average GNI versus spacing. Neuronal gains dropped more sharply than the control’s but not as much as the model’s. (I) Histograms of gradients of linear fits of gain versus spacing results for the RPCM network. Similar to the model, a range of primarily negative contour integration gains was observed at the output of the CI block.

Figure 10:

RPCM network results. The RPCM network is similar to the model but does not use a strict positive-only constraint on lateral kernels. (A) IoU versus contour length for centrally located straight contours. RPCM network IoU scores were higher than the models for all contour lengths. (B, C) Population average contour integration gains versus length and versus fragment spacing, respectively. Similar to the model, gains monotonically increased with contour length, although they were smaller. For interfragment spacing of up to 1.5 RCD, gains monotonically decreased. For larger spacing, gains increased slightly. (D) Validation IoU scores of the RPCM network on the edge detection task in natural images. The RPCM network had a slightly higher IoU score than all other networks. (E, F) Average prediction difference between RPCM and control networks for edges (E) and nonedges (F) as a function of control prediction strength. Similar to the model, the RPCM network had stronger responses to weak edges compared to the control network. (G) Classification accuracy of the RPCM network versus fragment spacing in the contour tracing in natural images task. Performance dropped monotonically as interfragment spacing increased. (H) Population average GNI versus. spacing. Neuronal gains dropped more sharply than the control but not as much as model gains. (I) Histograms of gradients of linear fits of gain versus spacing results for the RPCM network. Similar to the model, a range of primarily negative contour integration gains was observed at the output of the CI block.

Close modal

On the task of edge detection in natural images, RPCM networks peaked at a mean IoU score of 0.46 and slightly outperformed other networks (see Figure 10D). Like contour-integration model neurons, RPCM neurons had larger responses to weak edges than control neurons (see Figure 10E). Relative to control responses, RPCM responses varied in much the same way as model responses, although the variations were somewhat less pronounced. Similar to the contour-integration model, RPCM networks enhanced weaker contours, but this did not substantially affect task performance.

There were larger differences between the model and the RPCM on the task of contour tracing in natural images. The RPCM network outperformed the model by about 13% and the control by about 8% (train accuracy = 92.90 ± 0.14%, validation = 90.62 ± 0.21%). When tested with contours that were fragmented with fixed interfragment spacing, RPCM network performance dropped by about 6% (test = 84.36%). The drop was similar to that of the model and substantially smaller than that of the control. RPCM networks therefore retained the generalization properties of the model while improving overall performance. Performance also dropped monotonically with interfragment spacing (see Figure 10G), similar to the model. Neural response gains in the RPCM also decreased with increasing fragment spacing (see Figure 10H; averaged across 257 neurons), intermediate between the model and control gains.

In summary, RPCM networks trained more quickly than contour-integration model networks and outperformed both the model and the control on every task. RPCM neurons' responses to contour length and fragment spacing were intermediate between those of control and model neurons but qualitatively consistent with monkey data (i.e., stronger responses with longer contours and tighter fragment spacing). Thus, Dale's principle may help to account for monkeys' neural responses, while at the same time being functionally counterproductive in these networks and tasks.

Despite the separation of excitation and inhibition in the brain, the functional connection from any neuron to another could, in principle, be either excitatory or inhibitory depending on the strengths of direct connections and indirect connections through inhibitory interneurons (Parisien et al., 2008). We wondered whether there was a similar equivalence in our model network. Analysis of the dynamic equations (see the appendix) indicated that the contour-integration model could become functionally equivalent to the RPCM at steady state. This suggests that functional differences may be due to transient responses and/or the model being more difficult to optimize with standard algorithms in deep learning.

As a category, deep networks are the most realistic models of the brain in terms of neural representations (Schrimpf et al., 2018) and behavior, including near-human performance on a wide range of vision tasks. However, they lack many well-known mechanisms that seem to prominently affect the function of real brains. Local circuit models (Li, 1998; Piëch et al., 2013; Mély et al., 2018; Rubin et al., 2015; Carandini & Heeger, 2012) have the opposite limitation. They reflect specific physiological phenomena faithfully but lack sophisticated perceptual abilities. Each of these approaches has limitations that might be alleviated by integration with the other, but such integration is rare.

Contour integration in particular has been studied extensively, but the scope of its role in visual perception is uncertain. Contour integration in V1 may occur too late (Li et al., 2006) to drive core object recognition, which involves selectivity in inferotemporal cortex 100 ms after stimulus onset (DiCarlo et al., 2012). It is not necessary for visual motion perception, which proceeds robustly in the absence of contours (Shadlen et al., 1996). Contour integration may play a role in later stages of object recognition, together with dynamics in higher areas of the ventral stream. It could also bias core object recognition if inferotemporal neurons learn to predict their future inputs. Such a mechanism might help to account for humans’ greater reliance on contours in object recognition compared with deep networks (Geirhos et al., 2019; Baker et al., 2018). Contour integration has been proposed to strengthen the representation of weak edges in complex scenes (Li, 1998; Piëch et al., 2013). It seems also to play a role in perceptual grouping, related to the Gestalt laws of good continuation, proximity, and similarity (Wertheimer, 1938; Elder & Goldberg, 2002). It may also be involved in segmentation, or in other kinds of reasoning about visual scenes. Integrating local circuit models into a deep network may help to clarify the plausibility of various potential roles of contour integration in higher-level visual tasks and may lead to new questions and predictions.

4.1  Main Findings

Our integration of a contour integration model with a deep network has produced new insights, discussed below.

Realistic physiology emerges from training the model to detect contours in a background of randomly oriented line segments. In contrast with past work, our model was initialized with random synaptic weights and optimized as a whole to perform various tasks. When we trained the model to perform a contour detection task, similar to tasks that have been used to study contour integration in monkeys and humans, the model learned a physiologically realistic local circuit organization. Specifically, neurons in the trained model had local edge responses that were enhanced in the presence of contours, and this enhancement varied with contour length and contour fragment spacing in physiologically realistic ways. Neurons in a similar feedforward network that was trained to perform the same task did not have physiologically realistic contour responses: their responses did not depend appreciably on contour length, and they increased instead of decreasing with contour fragment spacing. Furthermore, our contour integration model learned excitatory lateral connections that were elongated and largely aligned with neurons' preferred orientations, as observed in the brain (Sincich & Blasdel, 2001). Past models established that such lateral connection patterns can produce realistic contour responses, but they did so by imposing these connection patterns on the model. Our work reinforces this link by showing that the patterns emerge consistently from an optimization process. In other words, we showed that both the lateral connections and physiological responses associated with contour integration are optimal for detecting contours in these synthetic stimuli, among a fairly generic family of networks with broad lateral connections and separate excitatory and inhibitory neurons.

Contour fragment spacing affects response gains similarly in natural and synthetic images. We occluded contours in natural images to test how spacing of visible contour fragments would affect contour gains. We found that greater fragment spacing monotonically reduced response strength. This result was qualitatively similar to the effect of contour spacing in synthetic images, although it was less pronounced. We do not believe that the effect of contour fragment spacing in natural images has been tested in monkeys. This would be informative, because the response patterns observed so far may only occur in response to specialized synthetic images, which would limit their ethological relevance. However, our computational results suggest that the phenomenon can generalize beyond synthetic images.

A contour integration model strengthens representation of edges with weak local cues in natural images. We trained the contour integration network to detect edges in natural scenes. Compared with a feedforward control network, this network responded more uniformly to local edge cues, with stronger responses to weak edges and weaker responses to strong edges. This confirms a suggestion by Piëch et al. (2013) and Li (1998) that was previously tested with only a single image. However, despite these changes in local edge representation, we did not find that the contour integration model facilitated edge detection overall. The weakest edges were strengthened the most, but not enough to exceed the detection threshold. Indeed, because the transition from strengthening to weakening occurred near the detection threshold, and because the differences were not sufficiently pronounced (specifically, the slope in Figure 7B was less steep than -1), the differences in representation had little effect on edge detection. These results elaborate a previous proposal about the role of contour integration in natural images. However, while the use of natural images goes part of the way toward confronting the role of contour integration in natural life, edge detection per se has limited survival value. It may be fruitful in the future to consider edge representations in service of a higher-level perceptual task. In such a context, effects of contour integration below the edge detection threshold may become more relevant.

The contour integration mechanism can impair contour following. When we trained the model to determine whether two points in a natural image belonged to the same contour, the model performed substantially worse than the feedforward control (about 77% versus about 83% correct; chance performance was 50%). This outcome was consistent with the impressive performance of standard convolutional networks in a wide range of vision tasks. However, it was unexpected, because the task directly involved contours. This outcome was also complicated by two factors. First, the model was better able to generalize to new stimuli than the control network. Second, the RPCM variation of the model, which did not respect Dale's principle, outperformed the control (about 91% correct). The RPCM appropriately constrains the signs of net lateral influences and exhibits physiological responses that are more realistic than those of the control network. These results indicate that recurrence in general facilitates this task and, more specifically, that recurrence with some physiological properties can be beneficial. Results with the model network also show that contour integration can produce a solution that generalizes well outside the range of prior experience. However, the results do not support our expectation that physiologically realistic contour integration would improve performance of this task.

Dale’s principle consistently impaired performance. As a general rule, neurons release the same small-molecule neurotransmitter at each synapse (Dale’s principle), leading to distinct groups of excitatory, inhibitory, and modulatory neurons. Accordingly, our model had separate groups of excitatory and inhibitory neurons. We also tested a variant of the model (the relaxed positivity constraint model, RPCM) that did not respect Dale’s principle but allowed the optimization process to make any synaptic weight either excitatory or inhibitory. In every task, the RPCM outperformed the more biologically grounded model. This is unsurprising because Dale’s principle amounts to a constraint on the model parameters. It is for this reason that Dale’s principle has not been adopted in deep learning.

It is unclear why Dale’s principle has been adopted in the brain, for that matter. Exceptions suggest that it could have been otherwise. For example, glutamate is normally excitatory but has inhibitory effects associated with certain receptors (Katayama et al., 2003). Some neurons elicit a biphasic inhibitory-excitatory response due to cotransmission of dopamine and GABA (Liu et al., 2013) or glutamate and GABA (Shabel et al., 2014), and others change from excitatory to inhibitory depending on the presence of brain-derived neurotrophic factor (Yang et al., 2002). So the fact that excitation and inhibition are largely separate in the brain seems to suggest that this separation is consistent with effective information processing in ways that have yet to be exploited in deep networks.

The fact that Dale’s principle impaired our model could indicate that it impairs performance of contour-related tasks in the brain or that our model is missing other factors (e.g., feedback from higher areas, or a different kind of plasticity) that keep it from impairing performance in the brain. Consistent with the former possibility, the model that respected Dale’s principle produced the most physiologically realistic responses. However, there may be another solution that has both realistic physiology and superior task performance. Analysis of the dynamic equations indicates that the model and RPCM can become equivalent in certain conditions. This may suggest that the constrained and unconstrained models could learn similar behavior given suitable learning rules. Related to this, recent work (Cornford et al., 2021) has shown that carefully designed feedforward networks with separate layers of excitatory projection neurons and intermediate inhibitory neurons can learn as well as standard deep networks, and an extension of this approach to recurrent networks was proposed, a promising direction for future work, although separating excitation and inhibition has been shown to introduce new modes of instability in recurrent networks (Tripp & Eliasmith, 2016). Alternatively, while our model learned task-optimized lateral connections, unsupervised learning of lateral connections, as in Iyer et al. (2020), might be more effective.

4.2  Related Work

Apart from an earlier version of this work (Khan et al., 2020), our model is most closely related to the horizontal gated recurrent unit (hGRU) model (Linsley et al., 2018), which similarly embeds a learnable circuit model of a low-level neural phenomenon into a larger ANN. Here we discuss some of the distinctions from that work. First, the objectives were different. Whereas we sought to test a physiologically grounded circuit model within a deep network, the purpose of the hGRU model was to improve task-level performance by using lateral connections to address the inefficient detection of long-range spatial dependencies in CNNs; in that work, many biological constraints were relaxed to achieve higher performance. Second, the two models use different embedded circuit models. The hGRU model uses the circuit model of Mély et al. (2018), a model of surround modulation, while our model uses the contour integration circuit model of Piëch et al. (2013). Third, recurrent interactions in the hGRU model are derived from gated recurrent unit (GRU) networks (Chung et al., 2014). These networks are trainable and expressive, but their internal architectures are complex and difficult to map onto circuits of the brain. Fourth, because we constrained our learned lateral connections to be positive only, a more detailed analysis and comparison of lateral kernels was possible. In particular, we were able to compare the axis-of-elongation of lateral kernels with orientation preferences.

The V1Net model of Veerabadran & de Sa (2020) also incorporates biologically inspired lateral connections into ANNs for contour grouping tasks. The model is similar to the hGRU (Linsley et al., 2018) but derives its recurrent interactions from convolutional long short-term memory (conv-LSTM) networks (Shi et al., 2015). Consistent with the results of the hGRU model, they found that certain recurrent ANNs, especially those with biological constraints, can match or outperform a variety of feedforward networks, including those with many more parameters. Moreover, on these tasks, the recurrent networks trained more quickly and were more sample efficient.

Local circuits are of much interest in neuroscience, but their roles in perception and behavior are mediated by the rest of the brain. Ideas about these relationships can be tested for plausibility by integrating biologically grounded models of local circuits into functionally sophisticated deep networks. Overall, our work to integrate a contour integration model into a deep network has not supported a role for this circuit in the natural-image tasks we investigated (a contour following task and detection of edges in complex natural images). This may be due to limitations of the model, although the model’s physiologically realistic responses suggest that it has much in common with the brain circuit. More work is needed to determine whether incorporating other physiological factors might produce a model that is more effective (similar to our model variant without constraints on the weight signs) without being less realistic and to test the role of contour integration in a wider range of tasks. This line of work may be important for understanding the role of contour integration in natural life.

6.1  Contour Integration Block Parameters

The architecture of the model’s contour integration (CI) block is shown in Figure 2. In the brain, V1 lateral connections of orientation columns are sparse and preferentially connect with other orientation columns with similar selectivities (Malach et al., 1993). Furthermore, these connections are long and can extend up to eight times the classical receptive fields (cRF) of V1 neurons (Stettler et al., 2002). Rather than using hard-coded lateral connections, we connected all columns within an S × S neighborhood and used task-level optimization to learn the connection strengths. For edge extraction, we used the first convolutional layer of a ResNet50 (He et al., 2016), which uses 7 × 7 kernels; we set the lateral neighborhood S × S to 15 × 15. Additionally, a sparsity constraint was used during training to retain only the most important connections (see section 6.3).

Incoming feedforward signals iterated through the CI block for N_iters steps before E node outputs were passed to deeper layers. We used N_iters = 5, which we found to be a good trade-off between performance and run time. Connection strengths σ(J_xy), σ(J_yx) were initialized to 0.1, while time constants σ(a), σ(b) were initialized to 0.5. Each neuron incorporated a rectified linear unit (ReLU) activation function, except where noted.
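As a rough sketch of this initialization (names and per-channel scalar shapes are assumptions, not the repository's API), the learnable parameters can be stored pre-sigmoid and set by inverting the sigmoid:

```python
import torch
import torch.nn as nn

def inv_sigmoid(v):
    # Invert the sigmoid so that sigma(parameter) equals the desired value.
    v = torch.tensor(v)
    return torch.log(v / (1.0 - v))

n_channels = 64  # assumed to match the edge extraction layer
J_xy = nn.Parameter(inv_sigmoid(0.1) * torch.ones(n_channels))  # sigma(J_xy) = 0.1
J_yx = nn.Parameter(inv_sigmoid(0.1) * torch.ones(n_channels))  # sigma(J_yx) = 0.1
a = nn.Parameter(torch.zeros(n_channels))  # sigma(a) = 0.5
b = nn.Parameter(torch.zeros(n_channels))  # sigma(b) = 0.5
n_iters = 5  # recurrent iterations of the CI block
```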

6.2  Classification Blocks of the Network

For the task of detecting fragmented contours, CI block outputs were fed into the fragments classifier block (see Figure 2). It consisted of two convolutional layers: the first contained 16 kernels of size 3 × 3 and used a stride of 1, while the second used a single kernel of size 1 × 1. There was a batch normalization layer between the two convolutional layers. The final convolutional layer used a sigmoid nonlinearity to generate prediction maps.
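A minimal PyTorch sketch of this block follows (the 64 input channels and the padding of the first layer are assumptions):

```python
import torch.nn as nn

fragments_classifier = nn.Sequential(
    nn.Conv2d(64, 16, kernel_size=3, stride=1, padding=1),  # 16 kernels, 3 x 3
    nn.BatchNorm2d(16),
    nn.Conv2d(16, 1, kernel_size=1),  # single 1 x 1 kernel
    nn.Sigmoid(),  # prediction map
)
```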

For the task of edge detection in natural images, CI block outputs were passed to an edge detection block (see Figure 2). These outputs were upsampled by a factor of 4, using bilinear interpolation, to restore them to the input size. Upsampled activations were passed through two convolutional layers before prediction maps were generated. The first convolutional layer contained eight kernels of size 3 × 3, used a stride of one, and was followed by a batch normalization layer. The last convolutional layer contained a single kernel of size 1 × 1 and was used to flatten activations to a single channel. Outputs of the final convolutional layer were passed through a logistic sigmoid nonlinearity to generate prediction maps.
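An analogous sketch for the edge detection block, again with assumed input channel count and padding:

```python
import torch.nn as nn

edge_detect = nn.Sequential(
    nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
    nn.Conv2d(64, 8, kernel_size=3, stride=1, padding=1),  # eight 3 x 3 kernels
    nn.BatchNorm2d(8),
    nn.Conv2d(8, 1, kernel_size=1),  # flatten activations to one channel
    nn.Sigmoid(),  # prediction map
)
```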

For the task of detecting whether two markers were connected by a smooth contour, CI block outputs were passed to the binary classifier block (see Figure 2) that also consisted of two convolutional layers. The first convolutional layer consisted of eight kernels of size 3 × 3 and used a stride of three. As in the other detection blocks, there was a batch normalization layer after the first convolutional layer. The final convolutional layer used a single kernel of size 1 × 1 and used a stride of one. Finally, a global average pooling layer (Lin et al., 2013) mapped output activations to a single value that could be compared with image labels.
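A sketch of the binary classifier block, with global average pooling implemented as a spatial mean (the input channel count is an assumption):

```python
import torch
import torch.nn as nn

class BinaryClassifier(nn.Module):
    def __init__(self, in_ch=64):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, 8, kernel_size=3, stride=3)  # eight 3 x 3 kernels, stride 3
        self.bn1 = nn.BatchNorm2d(8)
        self.conv2 = nn.Conv2d(8, 1, kernel_size=1, stride=1)

    def forward(self, x):
        x = self.conv2(self.bn1(self.conv1(x)))
        return x.mean(dim=(2, 3))  # global average pooling: one value per image
```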

6.3  Training

Networks were trained to minimize binary cross-entropy loss,
$$L_{\mathrm{BCE}} = -\frac{1}{N}\sum_{n=1}^{N}\big[p_n \log q_n + (1 - p_n)\log(1 - q_n)\big] \qquad (6.1)$$

where $p_n \in \{0, 1\}$ is the label and $q_n \in [0, 1]$ is the network prediction. Here, N is the total number of predictions, summed over all images and all predictions per image.

To encourage sparse lateral connections, an L1 regularization loss multiplied with an inverted 2D gaussian mask was applied over the excitatory and inhibitory lateral kernels,
$$L_{\mathrm{sparse}} = \sum\big|\big(1 - G(\sigma_M)\big) \odot W_e\big| + \sum\big|\big(1 - G(\sigma_M)\big) \odot W_i\big| \qquad (6.2)$$
where $G(\cdot)$ is a normalized 2D gaussian mask whose spatial spread is defined by its standard deviation, $\sigma_M$. The use of the gaussian mask encouraged a more gradual reduction of connection strength with distance.
The total loss was defined as
$$L_{\mathrm{total}} = L_{\mathrm{BCE}} + \lambda L_{\mathrm{sparse}} \qquad (6.3)$$
where $\lambda$ is a weighting term for the sparsity loss.
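A sketch of the masked L1 penalty on one set of lateral kernels follows; the exact normalization of the gaussian mask is an assumption:

```python
import torch

def sparsity_loss(w, sigma_m=10.0):
    # w: lateral kernels with trailing spatial dimensions S x S.
    s = w.shape[-1]
    ax = torch.arange(s, dtype=torch.float32) - (s - 1) / 2.0
    xx, yy = torch.meshgrid(ax, ax, indexing="ij")
    g = torch.exp(-(xx**2 + yy**2) / (2 * sigma_m**2))
    g = g / g.max()  # normalized 2D gaussian mask
    return ((1.0 - g) * w.abs()).sum()  # inverted mask: distant weights cost more

# total_loss = bce_loss + lam * (sparsity_loss(w_e) + sparsity_loss(w_i))
```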

For the sparsity constraint, σM was set to 10 pixels while λ was set to 1e-4. Learned lateral connections of the model (but not the control) were restricted to be positive only. After every weight update step, negative weights were clipped to 0.
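A minimal sketch of the clipping step (variable names are illustrative):

```python
import torch

with torch.no_grad():
    for w in (lateral_e_kernels, lateral_i_kernels):  # illustrative names
        w.clamp_(min=0.0)  # clip negative lateral weights to zero
```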

All networks were trained with the Adam (Kingma & Ba, 2014) optimizer. In the synthetic contour fragments detection and the contour tracing in natural images tasks, both the model and control were trained for 100 epochs with a starting learning rate (lr) of 1e-4, which was reduced by a factor of two after 80 epochs. The RPCM network was trained for 50 epochs with the same starting lr which was dropped by a factor of two after 40 epochs. Trained RPCM networks had fully converged after 50 epochs and did not noticeably improve with additional training. For edge detection in natural images, networks were trained for 50 epochs with an initial lr of 1e-3, which was reduced by a factor of two after 40 epochs. A fixed batch size of 32 images was used in all tasks.

All input images were fixed to a size of 256 × 256 pixels, resizing images and labels when necessary. Input pixels were preprocessed to be approximately zero-centered with a standard deviation of one on average. Synthetic contour fragment images were normalized with data set channel means and standard deviations while natural images were normalized with ImageNet values. In the contour tracing in natural images tasks, input images were punctured with occlusion bubbles as described in section 6.10.

6.4  Metrics

6.4.1  Mean Intersection-over-Union

For tasks with multiple outputs per image, behavioral performance was measured using the mean intersection-over-union (IoU) metric,
$$\mathrm{IoU} = \frac{1}{N}\sum_{n=1}^{N}\frac{|A_n \cap B_n|}{|A_n \cup B_n|} \qquad (6.4)$$
where N is the number of images in the data set and $A_n$ and $B_n$ are the per-tile/pixel binary network predictions and labels for image n, respectively.

To get binary network predictions for an image, network outputs were passed through a sigmoid nonlinearity and thresholded. The intersection with the labels was found by multiplying the predictions with their corresponding labels, while the union was found by summing labels and predictions and subtracting their intersection. An IoU score of 1 signifies a perfect match between predictions and labels, while an IoU score of 0 means no overlap between them. The mean IoU score was found by averaging IoU scores over the full data set.
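This computation can be summarized in a short sketch (tensor shapes are assumptions):

```python
import torch

def mean_iou(logits, labels, threshold=0.5):
    # logits: raw network outputs; labels: binary maps, both [N, H, W].
    preds = (torch.sigmoid(logits) > threshold).float()
    inter = (preds * labels).sum(dim=(1, 2))          # elementwise product
    union = (preds + labels).sum(dim=(1, 2)) - inter  # sum minus intersection
    return (inter / union.clamp(min=1e-8)).mean()     # average over the data set
```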

For the contour fragments data set, a threshold of 0.5 was used, while for the contour detection in natural images task, a value of 0.3 returned the best scores. For all networks, IoU scores dropped monotonically as the detection threshold deviated from 0.3.

6.5  Synthetic Contour Fragments Stimuli

We used stimuli similar to those of Field et al. (1993). Each input stimulus consisted of a 2D grid of tiles that contained Gabor fragments that were identical except for their orientations and positions. The orientations and locations of a few adjacent fragments were aligned to form a smooth contour. The remaining (background) fragments had randomly varying orientations and positions.

To construct each stimulus, first, a Gabor fragment, a contour length in number of fragments, l_c, and a contour curvature, β, were selected. Each Gabor fragment was a square tile the same size as the cRF (kernel spatial size) of the preceding edge extracting layer. Second, a blank image was initialized with the mean pixel value of all boundary pixels of the selected Gabor. Third, the input image was sectioned into a grid of squares (full tiles) whose length was set to the pixel length of a fragment plus the desired interfragment spacing, d_full. The grid was aligned so that the center of the image coincided with the center of the middle full tile. Fourth, a starting contour fragment was randomly placed in the image. Fifth, the location of the next contour fragment was found by projecting a vector of length d_full ± d_full/8 and orientation equal to the previous fragment's orientation ± β. The random direction change of ±β and the distance jitter were added to prevent regular spacing and alignment from acting as cues to the network. Sixth, a fragment rotated by β was added at this position. The fifth and sixth steps were repeated until l_c/2 contour fragments were added to both ends of the starting fragment. Seventh, background fragments were added to all unoccupied full tiles. Background fragments were randomly rotated and positioned inside the larger full tiles. Finally, a binary label was created for each full tile indicating whether it contained the center of a contour fragment.
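The core of steps 4 to 6, chaining fragments with orientation and distance jitter, can be sketched as follows (a simplified version that grows the contour in one direction only and omits tile quantization; names and defaults are illustrative):

```python
import numpy as np

def chain_fragments(start_xy, start_theta, n_frags, d_full, beta, rng):
    # Grow a contour from a starting fragment: each step projects a jittered
    # vector and rotates the fragment orientation by +/- beta.
    positions, orientations = [np.asarray(start_xy, float)], [start_theta]
    for _ in range(n_frags - 1):
        theta = orientations[-1] + rng.choice([-1.0, 1.0]) * beta  # direction change
        d = d_full + rng.uniform(-d_full / 8, d_full / 8)          # distance jitter
        step = d * np.array([np.cos(theta), np.sin(theta)])
        positions.append(positions[-1] + step)
        orientations.append(theta)
    return positions, orientations

rng = np.random.default_rng(0)
pos, ori = chain_fragments((128, 128), 0.0, n_frags=4, d_full=14,
                           beta=np.radians(15), rng=rng)
```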

In all training images, interfragment spacing and fragment length were equal. A fixed input image size of 256 × 256 pixels was used. Gabor fragments of size 7 × 7 pixels and full tiles of size 14 × 14 pixels were used in stimulus construction. This resulted in labels of size 19 × 19 for each input stimulus.

The full data set contained 64,000 training and 6400 validation images. In its construction, 64 different Gabor types, l_c of 1, 3, 5, 7, and 9 fragments, and interfragment rotations β of 0 ± 15° were used. Gabor parameters were manually picked, with the only restriction that the Gabor fragment visually appear as a well-defined line segment. Each Gabor fragment was defined over three channels, and the data set included colored as well as standard black-and-white stimuli. l_c = 1 stimuli were included to teach the model not to do contour integration when there are no co-aligned fragments outside the cRF. Contour integration requires inputs from outside the cRF, and the model had to learn when not to apply enhancement gains. For these stimuli, the label was set to all zeros. An equal number of images was generated for each condition. Due to the random distance jitter, interfragment rotations, and contour locations, multiple unique contours were possible for each condition. Moreover, background fragments varied in each image.

6.6  Test Synthetic Contour Fragments Stimuli

We used test stimuli similar to those of Li et al. (2006). These consisted of centrally located contours of different lengths and interfragment spacings. Test stimuli were similar to training stimuli except that the starting contour fragment was always centered at the image center. This ensured that centrally located neurons (whose outputs were measured) always received a full stimulus within their cRF. Furthermore, test stimuli were constructed in an online manner, whereby the optimal stimulus of each centrally located neuron in each channel was first found by checking which of the 64 Gabor fragments elicited the maximum response in the cRF. Next, contours were extended in the direction of the preferred orientations of the selected Gabors. The effects of contour length were analyzed using l_c = 1, 3, 5, 7, 9 fragments and a fixed spacing of RCD = 1 (see Figure 3C). The effects of interfragment spacing were analyzed using RCD = [7, 8, 9, 10, 11, 12, 13, 14]/7 and a fixed l_c = 7 fragments (see Figure 3D). For each condition, results were averaged across 100 different images.

6.7  Lateral Kernel Analysis

We followed the method of Sincich and Blasdel (2001) to quantify the directional selectivity and find the axis-of-elongation of lateral connections. First, Sincich and Blasdel (2001) identified locations where stained lateral connections terminated in clusters (patches). Next, they constructed vectors originating at the injection site and ending at patch centers. Given the set of patch vectors of a V1 orientation column, an average vector R was computed. Because patch vectors pointing in opposite directions represent lateral connections extending along the same axis, the orientations of individual patch vectors were doubled before computing the vector sum. Consequently, patches in opposite directions summed constructively, while orthogonal patches summed destructively. After computing the vector sum, the resultant angle was halved to get the axis-of-elongation, θ. To quantify directional selectivity, the magnitude of the average vector was normalized by the sum of the magnitudes of all patch vectors, giving a normalized index of ellipticity, r_n.

Similarly, for trained models, we analyzed the lateral kernels of the CI block. Each input channel of the lateral kernels receives output from a specific kernel in the previous feedforward layer. This signal is passed to all other neurons in a defined area as specified by its individual connection kernel. For each input channel, we constructed patch vectors starting at the kernel center and extending to each nonzero weight. This was slightly different from the approach of Sincich and Blasdel (2001), as we considered all weights rather than patch centers only. Moreover, individual patch vectors were weighted by their connection strengths, so stronger weights contributed more to the average vector than weaker ones. Next, similar to Sincich and Blasdel (2001), we computed an average vector and used it to compute the axis-of-elongation and directional selectivity of lateral connections.
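A compact sketch of this analysis for a single 2D lateral kernel follows (function name is illustrative; handling of all-zero kernels and other edge cases is omitted):

```python
import numpy as np

def axis_of_elongation(weights):
    # weights: one S x S lateral kernel (nonnegative in the model).
    s = weights.shape[0]
    c = (s - 1) / 2.0
    ys, xs = np.nonzero(weights)
    dx, dy = xs - c, ys - c
    mags = weights[ys, xs] * np.hypot(dx, dy)  # strength-weighted patch vectors
    angles = np.arctan2(dy, dx)
    R = np.sum(mags * np.exp(2j * angles))     # double angles before summing
    theta = np.angle(R) / 2.0                  # halve to recover the axis
    r_n = np.abs(R) / np.sum(mags)             # normalized index of ellipticity
    return theta, r_n
```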

Axes-of-elongation of lateral connections were compared with the preferred orientation of source feedforward edge extraction neurons. To find the preferred orientation of a kernel in the edge extraction layer, we least-square-fit each channel to a 2D Gabor function that was defined by eight parameters: the x and y location of its center, its amplitude, the orientation, wavelength, and phase offset of its sinusoid component, and the spatial extent and ratio of the spread in the x versus y direction of its gaussian envelope. The orientation of the channel with the highest amplitude was selected as the kernel’s preferred orientation. Orientation preferences of the pretrained edge extraction kernels are shown in Figure S1 (see the Supplementary Information section). We found Gabor fits for 42 out of the 64 kernels of the edge extracting layer.

Only those lateral kernels for which the orientation of feedforward kernels were found were considered in the analysis. Excitatory-targeting and inhibitory-targeting lateral kernels were analyzed separately.

6.8  Lateral Kernel Visualization

In the model, lateral connections were implemented using convolutional layers within the CI block. Each convolutional layer had kernels of size [ch_in, ch_out, S, S], where ch_in is the number of feedforward input channels, ch_out is the number of output channels, and S × S is the spatial extent of the lateral connections. To visualize the lateral connections of a particular feedforward kernel in the preceding layer (input channel), we summed over the output channels and plotted the resulting spatial spread.
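In code, this reduces to a one-line reduction over the output-channel dimension (tensor names are illustrative):

```python
import matplotlib.pyplot as plt

# lateral_kernels: tensor of shape [ch_in, ch_out, S, S] (illustrative name).
spatial_map = lateral_kernels[channel_idx].sum(dim=0)  # sum over ch_out -> [S, S]
plt.imshow(spatial_map.detach().cpu().numpy())
```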

6.9  Comparison of Network Prediction Strengths in Natural Images

To compare predictions of the model and the control at different edge strengths, first prethreshold control and model outputs over the entire BIPED validation data set were collected. Second, a sliding window of size 0.2 was run over control outputs to highlight pixels whose predictions lay within the desired range. Third, corresponding predictions of the model were found. Fourth, the average differences between model and control predictions were calculated. The process was repeated over the full range of predictions (0, 1) by sliding the window at intervals of 0.1.

Edge pixels and nonedge pixels were separately analyzed. To extract edge predictions, network outputs were multiplied with the ground-truth mask, while to separate nonedge pixels, network outputs were multiplied with the inverted ground truth mask. Considering edge pixels, if the mean difference is above zero, this suggests that the model is better at detecting pixels of the corresponding strength. Considering nonedge pixels, if the mean difference is below zero, then the model has a lower tendency for false positives.
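A sketch of the sliding-window comparison for edge pixels follows (the same procedure applies to nonedge pixels with the inverted mask; array names are illustrative):

```python
import numpy as np

# control_out, model_out: flattened prethreshold predictions over the BIPED
# validation set; edge_mask: flattened binary ground truth.
window, step = 0.2, 0.1
edge_ctrl = control_out[edge_mask == 1]
edge_model = model_out[edge_mask == 1]
for lo in np.arange(0.0, 0.81, step):
    sel = (edge_ctrl >= lo) & (edge_ctrl < lo + window)  # this control-strength band
    if sel.any():
        mean_diff = (edge_model[sel] - edge_ctrl[sel]).mean()
        print(f"[{lo:.1f}, {lo + window:.1f}): mean(model - control) = {mean_diff:.3f}")
```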

6.10  Contour Tracing in Natural Images Stimuli

The construction of stimuli for the contour tracing in natural images task required selecting contours in natural images. We randomly extracted a smooth contour, C1, from a BIPED (Poma et al., 2020) image using its edge map. Contours were extracted by first selecting a random starting edge pixel from the edge map. Valid starting pixels had to be part of a straight contour in their 3 × 3 pixel vicinity, whether vertical, horizontal, or diagonal. Next, this starting contour was extended at both ends by adding contiguous edge pixels that were at most ±π/4 radians from the local direction of the contour, defined as the direction between the last two points of the contour. If there was more than one candidate edge pixel, the candidate with the smallest offset from the contour direction was selected. The process was repeated until there were no more edge pixels at candidate positions or until the selected candidate pixel was already part of C1 (a closed contour). Additionally, once the contour length exceeded eight pixels, a large-scale smooth curvature constraint was applied, checking that the angle difference between the (n, n-4) and (n-4, n-8) contour segments was not greater than π/4 radians, where n is the last point on the contour. Contour extraction was also stopped if this large-scale curvature constraint was not met.

After extracting C1, one of its end points was chosen as the position of the first marker, M1. Next, a second edge pixel that did not lie on C1 was randomly selected. To ensure that connected and unconnected stimuli had similar separation distances, the selection process used a nonuniform probability distribution that favored edge pixels whose distance from M1 was similar to that of the unselected end point of C1. First, the distances of all edge pixels from M1 were calculated. Next, the absolute difference between each edge pixel's distance and the distance to the unselected end point of C1 was calculated. A softmax function was used to convert the negative distance differences to probabilities. Edge pixels at a similar distance as the unselected end point of C1 had distance differences close to zero and were more likely to be selected, while edge pixels at a very different distance had large negative distance differences and were less probable.
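A sketch of this selection step (array names are illustrative):

```python
import numpy as np

# edge_xy: candidate edge-pixel coordinates [K, 2]; m1: first marker position;
# far_end: the unselected end point of C1.
d_target = np.linalg.norm(far_end - m1)
d_candidates = np.linalg.norm(edge_xy - m1, axis=1)
logits = -np.abs(d_candidates - d_target)  # 0 for distance-matched pixels, negative otherwise
probs = np.exp(logits - logits.max())
probs /= probs.sum()                       # softmax over negative distance differences
choice = np.random.choice(len(edge_xy), p=probs)
```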

Given the location of the second edge pixel, a second contour, C2, was extended from it. If any point on C2 overlapped with C1, a new starting-edge pixel was selected and the process was repeated until a nonoverlapping pair of contours was found. The location of the second marker, M2, was determined by the type of stimulus. For connected stimuli, the opposite end of C1 was selected as M2, while for unconnected stimuli, one of the end points of C2 was chosen. Once marker positions were determined, markers were placed at corresponding positions in the input image. Each marker consisted of a bull’s-eye of alternating red and blue concentric circles (see Figure 8B). Markers were added directly to input natural images, and networks were given no information about the selected contours.

To fragment contours, occlusion bubbles were added to input images. Following Gosselin and Schyns (2001), bubbles with a 2D gaussian profile were used to reduce the impact of bubble edges. Each image was punctured using 200 bubbles of multiple sizes. Bubble sizes were specified by the full-width at half-maximum (FWHM) of 2D gaussian functions and were chosen to correspond to the bubble sizes used to explore the effects of fragment spacing on neurophysiological gains (see section 6.11). Individual bubbles were defined over a 2 × FWHM square area. After randomly selecting bubble sizes and locations, bubbles were placed in a mask that was used to blend the input image with image channel mean values using
(6.5)

Within a mask, bubbles were allowed to overlap, and a different mask was used for each image. Values in the bubble mask lay in the range [0, 1]. Sample input training images for the contour tracing task are shown in Figure 8C.
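A sketch of bubble construction and blending follows; the mask normalization and the direction of the blend (mask equal to 1 inside bubbles) are assumptions:

```python
import numpy as np

def gaussian_bubble(fwhm):
    # One occlusion bubble with a 2D gaussian profile, defined over a
    # 2 x FWHM square area.
    sigma = fwhm / 2.355  # FWHM = 2*sqrt(2*ln(2)) * sigma
    size = int(2 * fwhm)
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    return np.exp(-(xx**2 + yy**2) / (2 * sigma**2))

# Blending (equation 6.5), assuming the accumulated mask is in [0, 1]:
# pixels under bubbles are pulled toward the channel mean values.
# punctured = (1 - mask[..., None]) * image + mask[..., None] * channel_means
```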

The train data set contained 50,000 contours extracted from BIPED train images, while the validation data set contained 5000 contours extracted from BIPED test images. Since the BIPED test data set contains only 50 images, multiple contours per image were extracted, with care taken to avoid duplicate contours. Puncturing of input images was done as a preprocessing step inside the training loop; consequently, each exposure of an image to a network was unique. Connected and unconnected stimuli were generated with equal probability.

6.11  Test Contour Tracing in Natural Images Stimuli

As when the effects of interfragment spacing were analyzed using synthetic fragmented contours, the optimal stimuli of the monitored neurons first needed to be found. In the synthetic contour fragments data set, test images were designed to contain the optimal stimuli of monitored neurons. For natural images, however, inputs cannot be designed in the same way, so a new procedure was devised. To find the optimal stimulus of an individual channel, multiple unoccluded connected contours were presented to the networks (see Figure 8B). For each image, the position of the most active neuron of each channel in the CI block was found. If it was within three pixels (the stride length of the subsequent convolutional layer) of the contour, the image and the position of the most active neuron were stored. The process was repeated over 5000 contours, and the top 50 (contour, most active neuron) pairs were retained for each channel. New random contours were selected from the augmented BIPED train data set; the train data set, as opposed to the test data set, was used because it contained more images and a larger variety of contours.

Given the optimal stimulus for a channel, each input contour was fragmented by inserting occlusion bubbles at specific positions along the contour. Different bubble sizes were used to fragment contours with different interfragment spacings. A fixed fragment length of seven pixels, the same size as the cRF of edge extracting neurons, was used. To ensure that the cRF of the most active neuron was unaffected by bubbles, the position of the closest point on the contour was found first. Bubbles were then inserted along the contour at offsets of ±(l_frag + l_bubble)/2, ±3(l_frag + l_bubble)/2, ±5(l_frag + l_bubble)/2, . . . , up to the ends of the contour. Finally, the blending-in area of bubbles was restricted to FWHM pixels to ensure that visible contour fragments were unaffected.

The model’s dynamic equations are
(A.1)
(A.2)
The term inside the square brackets in equation 2.1 is the drive into x,
(A.3)
y does not receive inhibitory input, so if I_i^0 is positive, then y is positive and the rectifying function f_y can be ignored. Suppose we set σ(J_xy) = 1. Under these conditions, d can be simplified to
(A.4)
As y_t approaches steady state,
(A.5)
We can set σ(J_yx) = 0 by absorbing this factor into W_i. Then,
(A.6)
In summary, W_i and W_e affect x_t in the same way under the following conditions: (1) y reaches steady state; (2) I_i^0 ≥ 0; (3) σ(J_xy) = 1; and (4) σ(J_yx) = 0. In these conditions, if both matrices were unconstrained and contained both positive and negative values, Dale’s principle could be reestablished by moving the positive values to W_e and the negative values to W_i, without change of function.

More generally, the latter two conditions do not have to be enforced. If σ(J_yx) > 0, it can be moved into the diagonal of W_i. Similarly, if σ(J_xy) = g < 1, it can be multiplied by 1/g while W_i and I_i^0 are multiplied by g, without changing the function.

This suggests that differences between the model and the RPCM are due to transient dynamics and learning dynamics (i.e., the model may be structurally capable of RPCM performance, but the solution may not be reachable via backpropagation and Adam).

Figure S1:

Feedforward edge extraction kernels and their preferred orientations. Each subplot shows one of the 64 kernels of the first convolutional layer of a ResNet50 model that was trained on ImageNet (Deng et al., 2009). This layer served as the main component of the edge extraction block. It contains 64 kernels, each with three input channels and a spatial spread of 7 × 7 pixels. Each kernel was fit to a 2D Gabor function to find its preferred orientation (red lines). The fitting algorithm found the orientation preferences of 42 kernels. Kernels for which no fit was found (no red line) were not used in the analysis.
Figure S2:

Learned lateral excitatory kernels. Each subplot shows one of the 64 learned excitatory lateral kernels of a model trained on the synthetic contour fragments data set. Individual kernels had 64 channels and a spatial spread of 15 × 15. To view the kernels, the channel dimension was compressed by summing over all channels. Many excitatory kernels appear to be highly directed, spreading out along one axis more than others. See section 6.8 for the procedure used to visualize these kernels.
Figure S3:

Learned lateral inhibitory kernels. Each subplot shows one of the 64 learned inhibitory lateral kernels of a model trained on the synthetic contour fragments data set. Individual kernels had 64 channels and a spatial spread of 15 × 15. To view the kernels, the channel dimension was compressed by summing over all channels. The spatial extent of inhibitory kernels was smaller than that of excitatory kernels and mostly omnidirectional. See section 6.8 for the procedure used to visualize these kernels.

The source code for all networks, experiments and analysis that were performed as well as for generating data sets used in this work is available at https://github.com/salkhan23/contour_integration_pytorch.

1. Many other computational models of contour integration exist in the literature, including those based on edge co-occurrence probabilities in natural images; for reviews, see Elder and Goldberg (2002) and Geisler et al. (2001). However, because we are interested in the brain’s mechanisms of contour integration, we restrict our comparisons to mechanistic models only.

Baker
,
N.
,
Lu
,
H.
,
Erlikhman
,
G.
, &
Kellman
,
P. J.
(
2018
).
Deep convolutional networks do not classify based on global object shape
.
PLOS Computational Biology
,
14
(
12
), e1006613.
Baker
,
P. M.
, &
Bair
,
W.
(
2016
).
A model of binocular motion integration in MT neurons
.
Journal of Neuroscience
,
36
(
24
),
6563
6582
.
Carandini
,
M.
, &
Heeger
,
D. J.
(
2012
).
Normalization as a canonical neural computation
.
Nature Reviews Neuroscience
,
13
(
1
),
51
62
.
Chen
,
R.
,
Wang
,
F.
,
Liang
,
H.
, &
Li
,
W.
(
2017
).
Synergistic processing of visual contours across cortical layers in V1 and V2
.
Neuron
,
96
(
6
),
1388
1402
.
Chung
,
J.
,
Gulcehre
,
C.
,
Cho
,
K.
, &
Bengio
,
Y.
(
2014
).
Empirical evaluation of gated recurrent neural networks on sequence modeling.
.
Cornford
,
J.
,
Kalajdzievski
,
D.
,
Leite
,
M.
,
Lamarquette
,
A.
,
Kullmann
,
D. M.
, &
Richards
,
B. A.
(
2021
).
Learning to live with Dale’s principle: ANNs with separate excitatory and inhibitory units
. In
Proceedings of the International Conference on Learning Representations.
Dale
,
H.
(
1935
).
Pharmacology and nerve-endings
.
Proceedings of the Royal Society of Medicine
,
28
(
3
),
319
332
.
Deng
,
J.
,
Dong
,
W.
,
Socher
,
R.
,
Li
,
L.-J.
,
Li
,
K.
, &
Fei-Fei
,
L.
(
2009
).
ImageNet: A large-scale hierarchical image database
. In
Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition
(pp.
248
255
).
DiCarlo
,
J. J.
,
Zoccolan
,
D.
, &
Rust
,
N. C.
(
2012
).
How does the brain solve visual object recognition?
Neuron
,
73
(
3
),
415
434
.
Eccles
,
J. C.
,
Fatt
,
P.
, &
Koketsu
,
K.
(
1954
).
Cholinergic and inhibitory synapses in a pathway from motor-axon collaterals to motoneurones
.
Journal of Physiology
,
126
(
3
),
524
562
.
Elder
,
J. H.
, &
Goldberg
,
R. M.
(
2002
).
Ecological statistics of gestalt laws for the perceptual organization of contours
.
Journal of Vision
,
2
(
4
), 5.
Field
,
D. J.
,
Golden
,
J. R.
, &
Hayes
,
A.
(
2013
).
Contour integration and the association field
. In
L. M.
Chalupa
&
J. S.
Werner
(Eds.)
,
The new visual neurosciences
(pp.
627
638
).
MIT Press
.
Field
,
D. J.
,
Hayes
,
A.
, &
Hess
,
R. F.
(
1993
).
Contour integration by the human visual system: Evidence for a local “association field.”
Vision Research
,
33
(
2
),
173
193
.
Geirhos
,
R.
,
Rubisch
,
P.
,
Michaelis
,
C.
,
Bethge
,
M.
,
Wichmann
,
F. A.
, &
Brendel
,
W.
(
2019
).
ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness
. In
Proceedings of the International Conference on Learning Representations
.
Geisler
,
W. S.
,
Perry
,
J. S.
,
Super
,
B.
, &
Gallogly
,
D.
(
2001
).
Edge co-occurrence in natural images predicts contour grouping performance
.
Vision Research
,
41
(
6
),
711
724
.
Gosselin
,
F.
, &
Schyns
,
P. G.
(
2001
).
Bubbles: A technique to reveal the use of information in recognition tasks
.
Vision Research
,
41
(
17
),
2261
2271
.
Guerguiev
,
J.
,
Lillicrap
,
T. P.
, &
Richards
,
B. A.
(
2017
).
Towards deep learning with segregated dendrites
.
eLife
,
6
, e22901.
Haeusler
,
S.
, &
Maass
,
W.
(
2007
).
A statistical analysis of information-processing properties of lamina-specific cortical microcircuit models
.
Cerebral Cortex
,
17
(
1
),
149
162
.
He
,
K.
,
Zhang
,
X.
,
Ren
,
S.
, &
Sun
,
J.
(
2016
).
Deep residual learning for image recognition
. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(pp.
770
778
).
Hendrycks
,
D.
, &
Dietterich
,
T.
(
2019
).
Benchmarking neural network robustness to common corruptions and perturbations.
.
Hess
,
R. F.
,
May
,
K. A.
, &
Dumoulin
,
S. O.
(
2014
).
Contour integration: Psychophysical, neurophysiological, and computational perspectives
. In
J.
Wagemans
(Ed.)
,
The Oxford Handbook of Perceptual Organization
.
Oxford University Press
.
Hu
,
B.
, &
Niebur
,
E.
(
2017
).
A recurrent neural model for proto-object based, contour integration and figure-ground segregation
.
Journal of Computational Neuroscience
,
43
(
3
),
227
242
.
Hurzook
,
A.
,
Trujillo
,
O.
, &
Eliasmith
,
C.
(
2013
).
Visual motion processing and perceptual decision making
. In
Proceedings of the Annual Meeting of the Cognitive Science Society
, vol.
35
.
Ioffe
,
S.
, &
Szegedy
,
C.
(
2015
).
Batch normalization: Accelerating deep network training by reducing internal covariate shift
. In
Proceedings of the International Conference on Machine Learning
(pp.
448
456
).
Iyer
,
R.
,
Hu
,
B.
, &
Mihalas
,
S.
(
2020
).
Contextual integration in cortical and convolutional neural networks
.
Frontiers in Computational Neuroscience
,
14
, 31.
Kapadia
,
M. K.
,
Westheimer
,
G.
, &
Gilbert
,
C. D.
(
2000
).
Spatial distribution of contextual interactions in primary visual cortex and in visual perception
.
Journal of Neurophysiology
,
84
(
4
),
2048
2062
.
Katayama
,
J.
,
Akaike
,
N.
, &
Nabekura
,
J.
(
2003
).
Characterization of pre-and post- synaptic metabotropic glutamate receptor-mediated inhibitory responses in substantianigra dopamine neurons
.
Neuroscience Research
,
45
(
1
),
101
115
.
Khan
,
S.
,
Wong
,
A.
, &
Tripp
,
B. P.
(
2020
).
Task-driven learning of contour integration responses in a V1 model
. In
Proceeding of the NeurIPS 2020 Workshop SVRHM
.
Kingma
,
D. P.
, &
Ba
,
J.
(
2014
).
Adam: A method for stochastic optimization
. .
Kriegeskorte
,
N.
(
2015
).
Deep neural networks: A new framework for modelling biological vision and brain information processing
.
bioRxiv:029876
.
Kubilius
,
J.
,
Schrimpf
,
M.
,
Nayebi
,
A.
,
Bear
,
D.
,
Yamins
,
D. L.
, &
DiCarlo
,
J. J.
(
2018
).
Cornet: Modeling the neural mechanisms of core object recognition
.
bioRxiv:408385
.
Lake
,
B. M.
,
Salakhutdinov
,
R.
, &
Tenenbaum
,
J. B.
(
2015
).
Human-level concept learning through probabilistic program induction
.
Science
,
350
(
6266
),
1332
1338
.
Li
,
W.
,
Piëch
,
V.
, &
Gilbert
,
C. D.
(
2006
).
Contour saliency in primary visual cortex
.
Neuron
,
50
(
6
),
951
962
.
Li
,
W.
,
Piëch
,
V.
, &
Gilbert
,
C. D.
(
2008
).
Learning to link visual contours
.
Neuron
,
57
(
3
),
442
451
.
Li
,
Z.
(
1998
).
A neural model of contour integration in the primary visual cortex
.
Neural Computation
,
10
(
4
),
903
940
.
Liang
,
H.
,
Gong
,
X.
,
Chen
,
M.
,
Yan
,
Y.
,
Li
,
W.
, &
Gilbert
,
C. D.
(
2017
).
Interactions between feedback and lateral connections in the primary visual cortex
. In
Proceedings of the National Academy of Sciences
,
114
(
32
),
8637
8642
.
Lin
,
M.
,
Chen
,
Q.
, &
Yan
,
S.
(
2013
).
Network in network
. .
Lindsay
,
G. W.
(
2020
).
Convolutional neural networks as a model of the visual system: Past, present, and future
.
Journal of Cognitive Neuroscience
,
33
(
10
),
1
15
.
Lindsey
,
J.
,
Ocko
,
S. A.
,
Ganguli
,
S.
, &
Deny
,
S.
(
2019
).
A unified theory of early visual representations from retina to cortex through anatomically constrained deep CNNs
. .
Linsley
,
D.
,
Kim
,
J.
,
Ashok
,
A.
, &
Serre
,
T.
(
2020
).
Recurrent neural circuits for contour detection
. In
Proceedings of the International Conference on Learning Representations
.
Linsley
,
D.
,
Kim
,
J.
,
Veerabadran
,
V.
,
Windolf
,
C.
, &
Serre
,
T.
(
2018
).
Learning long-range spatial dependencies with horizontal gated recurrent units
. In
S.
Bengio
,
H.
Wallach
,
H.
Larochelle
,
K.
Grauman
,
N.
Cesa-Bianchi
, &
R.
Gartnett
(Eds.)
,
Advances in neural information processing systems
,
31
(pp.
152
164
).
Curran
.
Liu
,
S.
,
Plachez
,
C.
,
Shao
,
Z.
,
Puche
,
A.
, &
Shipley
,
M. T.
(
2013
).
Olfactory bulb short axon cell release of GABA and dopamine produces a temporally biphasic inhibition–excitation response in external tufted cells
.
Journal of Neuroscience
,
33
(
7
),
2916
2926
.
Malach
,
R.
,
Amir
,
Y.
,
Harel
,
M.
, &
Grinvald
,
A.
(
1993
).
Relationship between intrinsic connections and functional architecture revealed by optical imaging and in vivo targeted biocytin injections in primate striate cortex
.
Proceedings of the National Academy of Sciences
,
90
(
22
),
10469
10473
.
Mély
,
D. A.
,
Linsley
,
D.
, &
Serre
,
T.
(
2018
).
Complementary surrounds explain diverse contextual phenomena across visual modalities
.
Psychological Review
,
125
(
5
), 769.
Nayebi
,
A.
,
Bear
,
D.
,
Kubilius
,
J.
,
Kar
,
K.
,
Ganguli
,
S.
,
Sussillo
,
D.
, . . .
Yamins
,
D. L.
(
2018
).
Task-driven convolutional recurrent models of the visual system
. In
S.
Bengio
,
H.
Wallach
,
H.
Larochelle
,
K.
Grauman
,
N.
Cesa-Bianchi
, &
R.
Garnett
(Eds.)
,
Advances in neural information processing systems
,
31
.
Curran
.
Nguyen
,
A.
,
Yosinski
,
J.
, &
Clune
,
J.
(
2015
).
Deep neural networks are easily fooled: High confidence predictions for unrecognizable images
. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(pp.
427
436
).
Parisien
,
C.
,
Anderson
,
C. H.
, &
Eliasmith
,
C.
(
2008
).
Solving the problem of negative synaptic weights in cortical models
.
Neural Computation
,
20
(
6
),
1473
1494
.
Piëch
,
V.
,
Li
,
W.
,
Reeke
,
G. N.
, &
Gilbert
,
C. D.
(
2013
).
Network model of top-down influences on local gain and contextual interactions in visual cortex
.
Proceedings of the National Academy of Sciences
,
110
(
43
),
E4108
E4117
.
Poma
,
X. S.
,
Riba
,
E.
, &
Sappa
,
A.
(
2020
).
Dense extreme inception network: Towards a robust CNN model for edge detection
. In
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision
(pp.
1923
1932
).
Rajalingham
,
R.
,
Issa
,
E. B.
,
Bashivan
,
P.
,
Kar
,
K.
,
Schmidt
,
K.
, &
DiCarlo
,
J. J.
(
2018
).
Large-scale, high-resolution comparison of the core visual object recognition behavior of humans, monkeys, and state-of-the-art deep artificial neural networks
.
Journal of Neuroscience
,
38
(
33
),
7255
7269
.
Roelfsema
,
P. R.
(
2006
).
Cortical algorithms for perceptual grouping
.
Annual Review of Neuroscience
,
29
,
203
227
.
Rubin
,
D. B.
, Van
Hooser
,
S. D.
, &
Miller
,
K. D.
(
2015
).
The stabilized supralinear network: A unifying circuit motif underlying multi-input integration in sensory cortex
.
Neuron
,
85
(
2
),
402
417
.
Sacramento
,
J.
,
Ponte Costa
,
R.
,
Bengio
,
Y.
, &
Senn
,
W.
(
2018
).
Dendritic cortical microcircuits approximate the backpropagation algorithm
. In
S.
Bengio
,
H.
Wallach
,
H.
Larochelle
,
K.
Grauman
,
N.
Cesa-Bianchi
, &
R.
Gartnett
(Eds.)
,
Advances in neural information processing systems
,
31
.
Curran
.
Schrimpf
,
M.
,
Kubilius
,
J.
,
Hong
,
H.
,
Majaj
,
N. J.
,
Rajalingham
,
R.
,
Geiger
,
F.
, . . .
DiCarlo
,
J. J.
(
2018
).
Brain-score: Which artificial neural network for object recognition is most brain-like?
bioRxiv:407007
.
Serre, T. (2019). Deep learning: The good, the bad, and the ugly. Annual Review of Vision Science, 5, 399–426.
Shabel, S. J., Proulx, C. D., Piriz, J., & Malinow, R. (2014). GABA/glutamate corelease controls habenula output and is modified by antidepressant treatment. Science, 345(6203), 1494–1498.
Shadlen, M. N., Britten, K. H., Newsome, W. T., & Movshon, J. A. (1996). A computational analysis of the relationship between neuronal and behavioral responses to visual motion. Journal of Neuroscience, 16(4), 1486–1510.
Shi, J., Tripp, B., Shea-Brown, E., Mihalas, S., & Buice, M. A. (2022). MouseNet: A biologically constrained convolutional neural network model for the mouse visual cortex. PLOS Computational Biology, 18(9), e1010427.
Shi, X., Chen, Z., Wang, H., Yeung, D.-Y., Wong, W.-K., & Woo, W.-C. (2015). Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, & R. Garnett (Eds.), Advances in neural information processing systems, 28. Curran.
Sincich, L. C., & Blasdel, G. G. (2001). Oriented axon projections in primary visual cortex of the monkey. Journal of Neuroscience, 21(12), 4416–4426.
Sinz, F. H., Pitkow, X., Reimer, J., Bethge, M., & Tolias, A. S. (2019). Engineering a less artificial intelligence. Neuron, 103(6), 967–979.
Spoerer, C. J., Kietzmann, T. C., Mehrer, J., Charest, I., & Kriegeskorte, N. (2020). Recurrent neural networks can explain flexible trading of speed and accuracy in biological vision. PLOS Computational Biology, 16(10), e1008215.
Spoerer, C. J., McClure, P., & Kriegeskorte, N. (2017). Recurrent convolutional neural networks: A better model of biological object recognition. Frontiers in Psychology, 8, 1551.
Stettler, D. D., Das, A., Bennett, J., & Gilbert, C. D. (2002). Lateral connectivity and contextual interactions in macaque primary visual cortex. Neuron, 36(4), 739–750.
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2013). Intriguing properties of neural networks. arXiv:1312.6199.
Tallec, C., & Ollivier, Y. (2018). Can recurrent neural networks warp time? In Proceedings of the International Conference on Learning Representations.
Tan, M., & Le, Q. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning (pp. 6105–6114).
Tripp, B. P. (2017). Similarities and differences between stimulus tuning in the inferotemporal visual cortex and convolutional networks. In Proceedings of the 2017 International Joint Conference on Neural Networks (pp. 3551–3560).
Tripp, B. (2019). Approximating the architecture of visual cortex in a convolutional network. Neural Computation, 31(8), 1551–1591.
Tripp, B., & Eliasmith, C. (2016). Function approximation in inhibitory networks. Neural Networks, 77, 95–106.
Ursino, M., & La Cara, G. E. (2004). A model of contextual interactions and contour detection in primary visual cortex. Neural Networks, 17(5–6), 719–735.
Veerabadran, V., & de Sa, V. R. (2020). Learning compact generalizable neural representations supporting perceptual grouping.
Wertheimer, M. (1938). Laws of organization in perceptual forms. In W. D. Ellis (Ed.), A source book of Gestalt psychology (pp. 627–638). Kegan Paul, Trench, Trubner.
Yamins, D. L., & DiCarlo, J. J. (2016). Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience, 19(3), 356–365.
Yang, B., Slonimsky, J. D., & Birren, S. J. (2002). A rapid switch in sympathetic neurotransmitter release properties mediated by the p75 receptor. Nature Neuroscience, 5(6), 539–545.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. For a full description of the license, please visit https://creativecommons.org/licenses/by/4.0/legalcode