## Abstract

Contour is a critical feature for image description and object recognition in many computer vision tasks. However, detecting object contours remains a challenging problem because of disturbances from texture edges. This letter proposes a scheme that handles texture edges by implementing contour integration. The proposed scheme integrates structural segments into contours while inhibiting texture edges with the help of an orientation histogram-based center-surround interaction model. In the model, local edges within the surroundings exert a modulatory effect on central contour cues based on the co-occurrence statistics of local edges, described by the divergence of orientation histograms in the local region. We evaluate the proposed scheme on two well-known challenging boundary detection data sets (RuG and BSDS500). The experiments demonstrate that our scheme achieves an F-measure of up to 0.74. The results show that our scheme integrates accurate contours while eliminating most texture edges, offering a novel approach to long-range feature analysis.

## 1 Introduction

Contour plays a fundamental and essential role in pattern recognition, especially in object segmentation and recognition of complex scenes. Compared with edge extraction, contour detection is more complicated and higher level because it aims to isolate objects accurately and distinguish outlines belonging to different objects while eliminating textures and noncontour edges (Forsyth & Ponce, 2003). Most contour detection algorithms are edge based: they make use of image gradients to find local luminance changes and other low-level cues.

Numerous efforts relying on luminance changes (Forsyth & Ponce, 2003; Canny, 1986; Grigorescu, Petkov, & Westenberg, 2003; Papari, Petkov, & Neri, 2007) have been made to detect contours. Other work (Martin, Fowlkes, & Malik, 2004; Maire, Arbelaez, Fowlkes, & Malik, 2008; Arbelaez, Maire, Fowlkes, & Malik, 2011) has proposed methods to combine multiple local cues into a unified framework for object segmentation and contour detection. Recently, robust contours have been constructed by deep neural networks (Hwang & Liu, 2014; Ganin & Lempitsky, 2014; Kivinen, Williams, & Heess, 2014; Bertasius, Shi, & Torresani, 2015; Shen, Wang, Wang, & Bai, 2015; Xie & Tu, 2015), which model high-level abstractions in images by using multiple processing layers with complex structures.

Contour integration (Field, Hayes, & Hess, 1993; Hess, Hayes, & Field, 2003; Field, Golden, & Hayes, 2014; Vilankar, Golden, Chandler, & Field, 2014; Morimichi & Cornelia, 2016) is an effective approach for boundary detection and object segmentation. This approach holds that a contour is a combination of local segments subject to consistency constraints; therefore, contours can be generated by grouping and integrating many local low-level features such as contour segments and edges. However, it remains challenging for contour integration to conform to a globalization contour architecture built from many local contour segments, and texture edges are also a key problem.

Fortunately, the human visual system has evolved to integrate contour segments rapidly and segregate an image into objects or background effectively based on the contours of objects (Albright & Stoner, 2002; Bar, 2004; Landy & Graham, 2004). It has been proposed that neurons in the primary visual system (V1) not only respond to visual stimuli and abstract local low-level features, but also group those features from broad parts of the visual field to participate in more complex perceptual tasks, such as the perception of object contours (Jones, Grieve, Wang, & Sillito, 2001; Kapdia, Westheimer, & Gillert, 2000; Walker, Ohzawa, & Freeman, 2000). For contour detection tasks, the low-level features are extracted and serve as local evidence of the presence of contours and objects. Furthermore, simple local evidence is united with prior knowledge by making use of globalization regularities, for example, the Gestalt principles, and structural local pieces are then grouped into a globalization percept.

In this letter, we propose an integration scheme for contour detection based on a novel bio-inspired model, orientation histogram-based center-surround interaction, which mimics the nonlinear long-range interaction among neurons in the primary visual cortex (V1). The basic assumption is that contours can be detected by the globalization integration of structural edges or segments, which are easily detected based on local knowledge, while fractional nonmeaningful texture edges are suppressed by the interaction from neighbors. Furthermore, the integration is implemented by means of nonlinear interactions between the center region and its neighbors, and the interaction intensity is governed by the orientation co-occurrence statistics of neighboring local edges.

First, the scheme takes the raw image as input and creates a multidirectional edge map by computing the Gabor convolution energy within local regions. Second, the orientation histogram within a local region centered on each edge point is extracted for the center-surround interactions. Finally, the globalization integration is implemented by a nonlinear regression that accepts surrounding interactions to determine the contour intensity of the centered point. In our scheme, the co-occurrence statistics of edges are measured by the divergence of the orientation statistical histograms of local edges.
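For concreteness, the first stage (directional Gabor energy and the preferred orientation at each pixel) can be sketched in NumPy as follows. The kernel size, aspect ratio, wavelength, and the choice of nine orientations are illustrative assumptions of this sketch, not the letter's actual parameter settings:

```python
import numpy as np

def gabor_pair(sigma, theta, lam, gamma=0.5):
    """Even (cosine) and odd (sine) Gabor kernels at orientation theta."""
    size = int(6 * sigma) | 1                    # odd kernel width, ~3 sigma radius
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)   # axis of intensity variation
    yr = -x * np.sin(theta) + y * np.cos(theta)
    env = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
    arg = 2 * np.pi * xr / lam
    return env * np.cos(arg), env * np.sin(arg)

def filter2d(img, k):
    """'Same'-size 2D filtering with reflected borders (NumPy only)."""
    ph, pw = k.shape[0] // 2, k.shape[1] // 2
    pad = np.pad(img, ((ph, ph), (pw, pw)), mode='reflect')
    win = np.lib.stride_tricks.sliding_window_view(pad, k.shape)
    return np.einsum('ijkl,kl->ij', win, k)

def gabor_energy(image, n_orient=9, sigma=2.0, lam=8.0):
    """Per-pixel Gabor energy at n_orient orientations in [0, pi); returns
    the energy stack, the maximal energy, and the index of the preferred
    (maximal-energy) orientation at each pixel."""
    stack = []
    for theta in np.linspace(0, np.pi, n_orient, endpoint=False):
        even, odd = gabor_pair(sigma, theta, lam)
        stack.append(np.hypot(filter2d(image, even), filter2d(image, odd)))
    stack = np.stack(stack)
    return stack, stack.max(axis=0), stack.argmax(axis=0)
```

Combining the even and odd responses via the square root of the sum of squares gives a phase-insensitive energy, which matches the role of Gabor energy in the scheme.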

The rest of this letter is organized as follows. Section 2 reviews related work on bio-inspired contour detection approaches. Section 3 introduces our center-surround interaction model and presents the proposed surroundings structure. Section 4 details the integration approach to contour detection, including the spatial position weight function, the orientation inhibition function, and the co-occurrence statistics of edges. Experimental results and performance evaluations are presented in section 5. Section 6 offers discussion and a summary.

## 2 Related Work

The human visual system can extract object contours effectively, which inspires us to design a biologically motivated algorithm for robust contour detection. Building on the studies of Hubel (1988) and Field et al. (1993), numerous frameworks have been proposed to describe the functional architecture for object perception (e.g., Bressloff & Cowan, 2003; Sarti & Citti, 2015; Citti & Sarti, 2006; Sarti, Citti, & Petitot, 2008; Duits & Franklin, 2010a, 2010b).

Physiological research has clearly shown that the area beyond the classical receptive field (CRF) of the majority of neurons in V1, namely, the nonclassical receptive field (non-CRF) or surroundings, although unresponsive to visual stimuli on its own, can exert modulatory effects on the response to stimuli in the CRF. Usually the modulatory effects that come from the surrounding area of the CRF are called center-surround interaction: stimuli in the broad area outside the CRF participate in complex nonlinear effects among neuron links that play an important role in representing visual information and in feature integration over a broad range (Jones et al., 2001; Kapdia et al., 2000; Walker et al., 2000; Hubel & Wiesel, 1968; Li & Li, 1994; Li, 1996).

In a contour extraction task, the modulatory effect makes use of the co-occurrence statistics between the central area and its surroundings to suppress noncontour edges while keeping the contour integral. In general, the inhibition exerts its effect in a ring-formed surrounding area. Furthermore, physiological findings show that the intensity of the inhibition decreases as the distance from the concerned point of the receptive field increases. Therefore, this form of inhibition can be realized using the difference of two gaussian functions, known as the difference-of-gaussian (DOG) model, shown in Figure 1A. However, the DOG model, which cannot discriminate an object contour from noncontour edges, is invariable and insensitive to the pattern of the input image, especially the gradient direction in the local region. Subsequently, electrophysiological studies (Li & Li, 1994; Li, 1996) showed that the non-CRF area can be described as four symmetrical parts according to the stimulus orientation at the center point: two end regions and two side regions, shown in Figure 1B. This is called the butterfly-shaped model and is flexible in analyzing and understanding different patterns of input images.

Based on the idea of center-surround interaction, several inhibition-based contour detection models have been proposed (Grigorescu et al., 2003; Papari et al., 2007; Ursino & La Cara, 2004; La Cara & Ursino, 2008; Long & Li, 2008; Petkov & Westenberg, 2003; Ren, 2008; Tang, Sang, & Zhang, 2007). The typical models, proposed by Grigorescu et al. (2003) and Petkov and Westenberg (2003) and named the isotropic and anisotropic inhibition models, employ a ring-formed area as the surrounding structure and the DOG model for computing the inhibition effects. These models enhance contour detection by eliminating some local edges of the textured background, making them more effective than the classical Canny edge detector (Canny, 1986). However, the uniform inhibition affects contours and textures alike, so information in the surrounding areas always exerts equivalent suppression on edges belonging to object contours. This issue, known as self-inhibition, not only suppresses texture but also makes weak contours difficult to pop out. To resolve the issue of self-inhibition, researchers have to balance the inhibition between object contours and background textures. Grigorescu et al. (2007) changed the surrounding structure by splitting the inhibition area into two truncated half-rings, leaving a band region oriented along the concerned edge with no inhibition. In addition, Papari and Petkov (2011) developed an operator based on steerable filters for computing inhibition to overcome self-inhibition.

Based on the butterfly-shaped surroundings hypothesis, numerous inhibition-based contour detection algorithms (Long & Li, 2008; Tang et al., 2007; Zeng, Li, & Li, 2011; Zeng, Li, Yang, & Li, 2011) have been proposed. Those methods adopt different inhibition in the end and side regions of surroundings to restrain background textures while retaining edges belonging to the contours of objects. Furthermore, several approaches (La Cara & Ursino, 2008; Ren, 2008; Wei, Lang, & Zuo, 2013) exert inhibition in different scales to realize more effective contour detection through integrating multiscale, multiregion information. Yang, Li, and Li (2014) proposed a multifeature-based center-surround framework in which multiple features are combined according to a scale-guided strategy, to improve the performance of perceptually salient contour detection. Aiming to combine brightness and color features to maximize the reliability of contour detection, a framework based on color-opponent mechanisms was proposed (Yang, Gao, Li, & Li, 2013; Yang, Gao, Guo, Li, & Li, 2015).

Although the accurate physiological structure of non-CRF is unknown, there is no doubt that different types of non-CRF exist, and they are flexible enough to analyze and understand visual information. Therefore, it is clear that the adaptive interaction among different areas of receptive fields of V1 neurons is one of the foundations for V1 to capture and group different patterns of visual stimuli effectively.

The model we propose in this letter is composed of a central region and surrounding subregions. Each subregion extracts a local image pattern and exerts a flexible nonlinear modulatory effect on the response of the central region, with the modulation intensity determined by orientation co-occurrence statistics in the local region. Mathematically, the center-surround interaction is implemented by a nonlinear regression learned from the statistical divergence of orientation histograms of local contour segments. Depending on the interactions between each central region and its surrounding subregions, the model groups structural local segments into contours while suppressing texture edges.

## 3 The Center-Surround Model

Our model evolves from both the ring-shaped and the butterfly-shaped surroundings. Although these have achieved some effective results for contour detection, they are of limited use for contour integration because they keep a uniform inhibition over a broad area without flexibly distinguishing the contextual configuration.

The hypothesis behind our model, illustrated in Figure 1C, is that the surroundings of each central region have a number of independent subregions that can extract local low-level image features like edges, gradient orientation, energy, and contrast. These features represent the presence of contour segments in the local region. Moreover, the subregion exerts a nonlinear modulatory effect on the response of the central local region. The nonlinear modulation is used to identify consistent local contour segments and align them to provide powerful evidence of the contour presence since the surroundings structure composed of subregions could implement flexible interaction among neighboring neurons.

### 3.1 Local Contour Segments

The response of the simple neuron in the visual system represents local patterns that act as fundamental elements of visual objects. The local low-level features like edges, gradient orientation, energy, and contrast provide evidence of the object contour. In the hierarchical architecture, the low-level features are grouped into high-level object blocks for visual recognition. Consequently, the primary function of our model is to extract the contour segments based on local low-level image features.

### 3.2 Surroundings Structure

To achieve flexible and self-adaptive surround interaction, our hypothesis about the surroundings of a central region is that the surrounding area is composed of several subregions, each with an independent capability to perceive local low-level features, and that the inputs of those subregions exert different effects on the output of the central local region. It is reasonable to assume that such a broad receptive field can flexibly perceive more complete visual appearances based on variable modulations and self-adaptive interaction between the central and surrounding regions.

Second, unlike other approaches, each subregion in our model perceives local low-level contour cues independently. The perception is designed in a Gabor-simulated form, the same as the function of the central region, since the center-surround interaction is based on a comparison of local features between each subregion and its central region. The perception of the ellipse-shaped subregion is given by equations 3.5 and 3.6. In the same way as the capture process in the central region, the subregion perceives local contour cues by convolving the stimuli in the ellipse-shaped subregion with Gabor functions and captures the preferred orientation and the corresponding Gabor energy map in the subregion.

Based on the subregion model, the surroundings are divided into several ellipse-shaped subregions (see Figure 1C), which can capture the local image pattern of the inputs. The size of the subregions is adaptive to the pattern of the input image; to simplify the model, we choose a fixed size for the subregions in the surroundings in this letter.

## 4 Orientation Histogram-Based Interaction and Contour Detection

According to the findings of physiological research, the types of neurons with inhibitory effects at both the end and side regions show weak or no response to broad field homogeneous textures but respond vigorously to various textural contrasts (Zeng, Li, & Li, 2011). Consequently, this feature-selective inhibitory effect was viewed as the neural basis for our contour detection model.

In the contour detection algorithm, the contour intensity in the central region is determined by the local contour cues and interactions with the surrounding subregions depending on the segment’s co-occurrence statistics. In addition, the interaction is weighted by a spatial position function and an orientation inhibition function. In short, the interaction is determined based on spatial position, orientation inhibition, and the co-occurrence statistics. (We discuss each part of the contour detection algorithm in section 4.1.)

$$R(x, y) = \left[ E(x, y) + \sum_{i=1}^{N} \alpha_i\, M_i(x, y) \right]^{+} \qquad (4.1)$$

where $R(x, y)$ is the final output of the contour detector proposed in our work at location $(x, y)$, and $E(x, y)$ is the response to the input raw image presented in the local region, given by equation 3.6. $M_i(x, y)$ denotes the modulatory term from the $i$th of the $N$ subregions surrounding the center region, which is determined by the spatial position function, the orientation inhibition function, and the co-occurrence statistics introduced in section 4.1. The factor $\alpha_i$ is the modulation coefficient that controls the linking intensity of the modulation from the subregion. $[\,\cdot\,]^{+}$ is a half-wave rectification operator simulating the fact that a cortical cell can fire only when positive excitation is received. Consequently, the final output is represented by a rectified linear regression of the local response and its surrounding interactions. Equation 4.1 indicates that the contour intensity presented in the center region is modulated by the nonlinear combination of several subregions within its surroundings. As a result, the center-surround interaction is transformed into a rectified linear regression; therefore, the parameters $\alpha_i$ can be learned by the gradient descent algorithm (Boyd & Vandenberghe, 2004).
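A minimal sketch of this rectified linear combination, assuming the modulatory maps and modulation coefficients have already been computed:

```python
import numpy as np

def modulated_response(energy, modulations, alphas):
    """Rectified linear combination in the spirit of equation 4.1: the
    central Gabor energy plus alpha-weighted modulatory terms from N
    surrounding subregions, passed through half-wave rectification [.]^+ .
    energy: (H, W); modulations: (N, H, W); alphas: (N,)."""
    combined = energy + np.tensordot(np.asarray(alphas), modulations, axes=1)
    return np.maximum(combined, 0.0)
```

Because the combination is linear in the coefficients before rectification, fitting the coefficients reduces to the rectified linear regression mentioned above.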

### 4.1 Spatial Position Weight Function

The spatial position weight is modeled by the difference of two gaussian functions,

$$\mathrm{DOG}_{\sigma}(x, y) = \frac{1}{2\pi (k\sigma)^2} \exp\!\left(-\frac{x^2 + y^2}{2(k\sigma)^2}\right) - \frac{1}{2\pi \sigma^2} \exp\!\left(-\frac{x^2 + y^2}{2\sigma^2}\right),$$

which defines a ring-shaped region, so the location weight model is called the difference-of-gaussian (DOG) model. Here $k$ is the ratio of the standard deviations of the two gaussian functions; in most research, $k$ is set to 4 based on neurophysiological findings.
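With the ratio of standard deviations set to 4, the ring-shaped spatial weights can be sketched as follows; keeping only the positive part of the difference and normalizing to unit sum are assumptions of this sketch:

```python
import numpy as np

def dog_spatial_weight(size=31, sigma=2.0, k=4.0):
    """Spatial position weights from a difference of two gaussians
    (broad minus narrow). Only the positive part is kept, so the weights
    form a ring around the center; they are normalized to sum to 1."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    r2 = x**2 + y**2
    g = lambda s: np.exp(-r2 / (2 * s**2)) / (2 * np.pi * s**2)
    w = np.maximum(g(k * sigma) - g(sigma), 0.0)  # ring-shaped support
    return w / w.sum()
```

The narrow gaussian dominates near the center, so the clipped difference vanishes there and the inhibition acts only in the surround, as the ring-formed area requires.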

### 4.2 Orientation Inhibition Function

The curve of the inhibition function (see Figure 2A) shows that the weight of the modulatory effect increases with the angular position $\Delta\theta$, from the minimum when $\Delta\theta$ is equal to 0 to the maximum when $\Delta\theta$ is valued as $\pi/2$. To simplify the interpretation of the model, we choose the parameter as a constant. The extreme values of the inhibition function are located at the integer multiples of $\pi/2$ (e.g., $0$, $\pi/2$, and $\pi$), and it possesses positive values when $\Delta\theta$ is on the intervals $[\pi/4, 3\pi/4]$ and $[5\pi/4, 7\pi/4]$. This means that the regions of the surroundings corresponding to these two intervals exert a more powerful inhibition on the response to input at the center than other regions when the model is considered an inhibitory one. These regions correspond to the side regions in the butterfly-shaped model. By the crossing points of the weight function with zero, the surroundings can be divided into four regions. The end regions correspond to $\Delta\theta$ on the intervals $[-\pi/4, \pi/4]$ and $[3\pi/4, 5\pi/4]$, where the area exerts weak inhibition because edges belonging to object contours generally traverse the center and the end regions of the surroundings simultaneously. The side regions are defined by $\Delta\theta$ on the intervals $[\pi/4, 3\pi/4]$ and $[5\pi/4, 7\pi/4]$. However, the weight function in our model, unlike that of the butterfly-shaped model, varies with the location of each point of the surroundings; furthermore, the transition from one region to another is smooth. In this study, we call this weight function an orientation inhibition function.
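A simple cosine form with the properties described above (minimum at 0, maximum at π/2, sign changes at odd multiples of π/4, smooth transitions between regions) can be sketched as follows; the letter's exact equation 4.4 is not reproduced here, so this particular functional form is an assumption:

```python
import numpy as np

def orientation_inhibition_weight(delta_theta):
    """Assumed smooth cosine weight over the angular position delta_theta
    of a surround point relative to the central preferred orientation:
    minimal at 0 and pi (end regions), maximal at pi/2 and 3*pi/2
    (side regions), crossing zero at odd multiples of pi/4."""
    return -np.cos(2.0 * np.asarray(delta_theta, dtype=float))
```

Any function with the same extrema and zero crossings would partition the surround into the same two end regions and two side regions while keeping the transitions smooth.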

### 4.3 Orientation Histogram-Based Interaction

The key issue of contour integration is aligning local contour segments to form the globalization contour architecture based on the co-occurrence statistics of contour segments. Inspired by the HOG algorithm (Dalal & Triggs, 2005), which counts occurrences of gradient orientation in the local position of an image, we assume that the distribution of the preferred orientation of segments in a local region can describe the local appearance and characteristics of contours. Consequently, the statistical divergence of the distribution can also represent the co-occurrence of contour segments, which implies the structural consistency of the contours. Thus, we first consider the distribution of preferred orientation; then we calculate the statistical divergence of the distribution to control the interaction of the local regions.
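As an illustration, one common statistical divergence between two orientation histograms, the symmetric chi-square divergence, can be computed as follows; it is chosen here only as an example, and the letter's specific divergence measure may differ:

```python
import numpy as np

def histogram_divergence(h_center, h_sub, eps=1e-12):
    """Symmetric chi-square divergence between two orientation histograms,
    used as a stand-in for the statistical divergence that controls the
    interaction strength. Both histograms are normalized first; identical
    distributions give 0, and more dissimilar ones give larger values."""
    p = h_center / (h_center.sum() + eps)
    q = h_sub / (h_sub.sum() + eps)
    return 0.5 * np.sum((p - q) ** 2 / (p + q + eps))
```

A small divergence between the center and a subregion indicates consistent orientation statistics (aligned contour segments), while a large divergence indicates texture-like dissimilarity.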

#### 4.3.1 Orientation Histogram

The orientation histogram is calculated based on the directional Gabor energy within the surroundings of the central region and relies on the preferred orientation and the corresponding Gabor energy. The first step is to create the histogram of the preferred orientation in each subregion. For each subregion within the surroundings, a local one-dimensional histogram of the preferred orientation and Gabor energy is accumulated over the points of the subregion. Each point of the subregion casts a weighted vote for an orientation-based histogram channel according to its preferred orientation, where the vote is a function of the Gabor energy itself. The votes are accumulated in the orientation bins over the subregion within the CRF surroundings. There are nine histogram channels: the orientations from 0 to $\pi$ are divided into nine channels, so each channel spans 20 degrees in our work, because the unsigned channels perform best experimentally. In computing the orientation binning, the subregion and the surroundings of our model play the same roles as the cell and the block in the HOG algorithm.

Let $\mathit{center}_b$ denote the center of the bin that the preferred orientation falls in. The votes belonging to this bin are calculated as

$$v_b = \mathit{Gabor\_energy}(x, y) \left( 1 - \frac{|\mathit{orientation}(x, y) - \mathit{center}_b|}{\mathit{bin\_width}} \right),$$

and the votes on the neighboring bin are calculated as

$$v_{b'} = \mathit{Gabor\_energy}(x, y)\, \frac{|\mathit{orientation}(x, y) - \mathit{center}_b|}{\mathit{bin\_width}},$$

where $\mathit{Gabor\_energy}(x, y)$ and $\mathit{orientation}(x, y)$ are the Gabor energy and the preferred orientation of the point $(x, y)$ in the subregion, respectively; $b$ and $b'$ are the indices of the neighboring bins; and $\mathit{bin\_width}$ denotes the width of each bin. The histogram of the preferred orientation is represented by one such histogram for each subregion of the surroundings.
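The voting procedure can be sketched as follows, with each energy-weighted vote interpolated linearly between the two nearest bin centers over [0, π), wrapping circularly since orientation is unsigned:

```python
import numpy as np

def orientation_histogram(orientations, energies, n_bins=9):
    """Accumulate an n_bins-bin histogram over [0, pi) of preferred
    orientations, each point voting with its Gabor energy, split linearly
    between the two nearest bin centers (as in HOG cell binning)."""
    bin_width = np.pi / n_bins
    hist = np.zeros(n_bins)
    for theta, e in zip(np.ravel(orientations), np.ravel(energies)):
        pos = (theta % np.pi) / bin_width - 0.5   # continuous bin coordinate
        lo = int(np.floor(pos))
        frac = pos - lo                           # distance past bin lo's center
        hist[lo % n_bins] += e * (1.0 - frac)
        hist[(lo + 1) % n_bins] += e * frac
    return hist
```

The interpolated votes make the histogram vary smoothly as orientations drift across bin boundaries, so the divergence measured between neighboring histograms is stable.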

#### 4.3.2 Interactions Based on Co-Occurrence Statistics

### 4.4 Contour Detection Algorithm

Based on the center-surround interaction model, we propose a novel contour detection algorithm in this section. Figure 3 shows schematic drawings illustrating the general flowchart of the proposed algorithm. Given a raw image, Gabor energy maps and the preferred orientations are calculated by convolution with a series of directional Gabor functions. The energy maps indicate the appearance of local edges and textures at different orientations, spatial frequencies, and phases. For each pixel of the image, the central region and the surrounding subregions are defined. Then the histograms of the preferred orientation are counted in each central region and subregion. Based on the divergence of the orientation histograms, adaptive modulations are calculated for each subregion. The modulatory effects from every subregion are exerted on the Gabor energy, weighted by the spatial position and orientation inhibition functions. The modulation is a subtraction operation when the surround is inhibitive and an addition operation when it is facilitative. Finally, the well-known nonmaximum suppression is used to construct the binary contour map. For the contour integration and detection algorithm, see algorithm 1.
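As an illustration of the final thinning step, a minimal nonmaximum suppression over a modulated energy map and per-pixel preferred orientations might look as follows; the orientation convention (indices encoding the direction of maximal intensity variation) and the fixed threshold are assumptions of this sketch:

```python
import numpy as np

def nonmax_suppress(energy, orient_idx, n_orient=9, threshold=0.1):
    """Thin the modulated energy map into a binary contour map: keep a
    pixel only if its energy exceeds threshold and is a local maximum
    along its preferred orientation axis (assumed here to encode the
    direction of maximal intensity variation, i.e., across the edge)."""
    H, W = energy.shape
    out = np.zeros((H, W), dtype=bool)
    for yy in range(1, H - 1):
        for xx in range(1, W - 1):
            e = energy[yy, xx]
            if e < threshold:
                continue
            theta = orient_idx[yy, xx] * np.pi / n_orient
            dx = int(round(np.cos(theta)))      # step across the edge
            dy = int(round(np.sin(theta)))
            if e >= energy[yy + dy, xx + dx] and e >= energy[yy - dy, xx - dx]:
                out[yy, xx] = True
    return out
```

Comparing each pixel only with its two neighbors across the edge keeps contours one pixel wide without breaking them along their length.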

## 5 Performance Evaluations

In this section, we present the results of a number of experiments. To evaluate the robustness of the proposed contour detection method, the model with various combinations of specific parameters is tested on the database RuG data set (Grigorescu et al., 2003) and the performance is shown with box-and-whisker diagrams. Then we give benchmarking results of the contour detection algorithm compared with some other bio-inspired or other methods on the challenging database BSDS500 (Martin, Fowlkes, Tal, & Malik, 2001).

### 5.1 Robust Evaluation

where $\mathit{card}(S)$ denotes the number of elements in the set $S$.
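For reference, the set-based performance measure commonly used with the RuG data set (Grigorescu et al., 2003) can be sketched as follows; this version uses exact pixel matching, whereas the original measure also tolerates small spatial offsets between detected and ground-truth pixels:

```python
def rug_performance(detected, ground_truth):
    """Performance measure in the style of Grigorescu et al. (2003):
    P = card(E) / (card(E) + card(E_FP) + card(E_FN)), where E is the set
    of correctly detected contour pixels, E_FP the spurious detections,
    and E_FN the missed contour pixels. Inputs are sets of (row, col)."""
    e = detected & ground_truth          # correctly detected contour pixels
    e_fp = detected - ground_truth       # false positives
    e_fn = ground_truth - detected       # false negatives
    return len(e) / (len(e) + len(e_fp) + len(e_fn))
```

P equals 1 only when the detected map matches the ground truth exactly, and it decreases with both spurious and missed pixels.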

Performance is usually affected by the parameters of the model. To evaluate robustness, we test the model with various combinations of parameters. For the Gabor functions in our model, there is only a single parameter to be tuned, the standard deviation of the Gabor filters in equation 3.1. We use . To simplify the test, we adopt a fixed value for the modulation coefficient in equation 4.1 for every subregion. For the last part of our contour detection algorithm, where nonmaximum suppression is used to construct the binary contour map, we use .

For each of the 40 test images in the RuG data set, the 80 performance values computed with the 80 groups of parameter combinations are statistically illustrated in Figure 4 using box-and-whisker diagrams. Each whisker represents the range of the performance values over the 80 groups of parameter combinations for each image, the top end of each whisker represents the optimal performance, and the horizontal red line shows the median performance. In the graph, the results of our model are compared with most of the recently proposed surround inhibition-based contour approaches, for example, the standard anisotropic and isotropic inhibition models (Grigorescu et al., 2003; Petkov & Westenberg, 2003), the butterfly-shaped inhibition model (Zeng, Li, & Li, 2011; Zeng, Li, Yang, & Li, 2011), and the multiscale integration model (Wei et al., 2013). From the plot, we can see that our model achieves better performance than the other methods. For most test images, the top ends of our model's whiskers are obviously higher than those of the other models, which reveals that our model obtains the optimal contour detection performance compared with the other methods. In addition, the whiskers for most images are compact, indicating that our model is insensitive to parameters. Finally, our model produces a better median value in most cases.

In Figure 5, we compare our model with most of the bio-inspired models. Those contour maps are the best in the results tested with many groups of parameter combinations. The rows from top to bottom show the original images, corresponding ground truth, results from anisotropic and isotropic inhibition models (Grigorescu et al., 2003; Petkov & Westenberg, 2003), the adaptive inhibition model (Zeng, Li, & Li, 2011; Zeng, Li, Yang et al., 2011), the multiscale integration model (Wei et al., 2013), and our model, respectively. The results clearly show that our model could effectively suppress noncontour edges and background textures and perform better than the other models in detecting object contour from cluttered background.

Furthermore, the contour maps of our model retain as many integral contours of tiny objects as possible, for example, the eyes of the animals and the bird in the second image, which are lost with the other methods.

### 5.2 Benchmarking Results

Contour detection performance is evaluated in the precision-recall framework (Jesse & Mark, 2006). The precision-recall curve captures the trade-off between accuracy and noise. High precision means that an algorithm returns more true positives than false positives, and high recall means that an algorithm returns more true positives than false negatives. Precision and recall are computed from the result maps of the model and the ground truths. For contour detection, high precision indicates that the detected contour map contains more contour pixels belonging to the ground truth than noise, while high recall suggests that the map contains most of the contour pixels belonging to the ground truth. In performance evaluations of contour detection, high precision and high recall are expected simultaneously, but more noise is introduced as recall increases. Generally, a convex precision-recall curve is expected. In addition, the F-measure combines precision and recall as $F = 2PR/(P + R)$. The larger the $F$, the better.
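The precision, recall, and F-measure described above can be computed for pixel sets as follows; this sketch uses exact matching, whereas the BSDS benchmark additionally allows a small distance tolerance when matching detected pixels to ground truth:

```python
def precision_recall_f(detected, ground_truth):
    """Precision, recall, and F-measure (harmonic mean of the two) for a
    set of detected contour pixels against a ground-truth set."""
    tp = len(detected & ground_truth)
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

Because F is a harmonic mean, it rewards detectors that balance the two quantities rather than maximizing either one alone.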

In Figure 6, we compare with some previous contour detectors. We choose state-of-the-art methods for a more extensive comparison, for example, Crisp Boundaries (Isola, Zoran, Krishnan, & Adelson, 2014), the gPb-owt-ucm detector (Arbelaez et al., 2011), Sketch Tokens (Lim, Zitnick, & Dollar, 2013), Normalized Cuts (Cour, Benezit, & Shi, 2005), the Canny algorithm (Canny, 1986), and other contour detection methods (Dollar & Zitnick, 2013; Ren & Bo, 2012; Comaniciu & Meer, 2002; Felzenszwalb & Huttenlocher, 2004). The comparisons are based on the widely used benchmark for contour detection algorithms, BSDS500, the Berkeley segmentation data set, which contains 500 test images and corresponding ground truths. We report performance with the precision-recall curve and the F-measure, which show that our approach to contour detection matches the state of the art.

## 6 Discussion and Conclusion

Most bio-inspired contour detection models are based on center-surround interaction of the V1 neurons, but those models are so inflexible that they cannot accurately distinguish contours from edges and textures. This letter proposes a novel scheme to group structural edges into contours by the orientation-based center-surround interaction model, which integrates structural edges or contour segments into a globalization contour architecture based on edge co-occurrence statistics. Noncontour edges and textures are suppressed effectively through a nonlinear regression from surroundings. The experimental results exhibit its superior performance in contour detection.

The scheme is robust in suppressing noncontour edges and textures for three reasons. First, our proposed center-surround interaction model has a flexible structure. There is no strict definition of the side and end regions. Rather, the side and end regions are represented by the cosine-simulated weight function in equation 4.4 and Figure 2, which changes smoothly within each region and across regions. This means that the modulation intensity is multivalued rather than limited to fixed states of inhibition or none. Also, the surroundings of a V1 neuron's receptive field are made up of many subregions that can perceive local low-level contour cues independently, such as edges and contour segments. In other words, the proposed model can achieve pixel-wise contour detection.

Second, the long-range interaction is described as a nonlinear modulation from the surrounding area on the response of the central region. The modulatory effect is determined by the partial position, orientation inhibition, and co-occurrence statistics of the edges. Consequently, the contour detector perceives information from a large range of vision and groups the aligned edges into structural contours.

Finally, although some content of our model is speculative, there is physiological evidence supporting our basic idea about center-surround interaction. A growing number of studies show that the V1 neuron acts as an adaptive information processor whose processing depends on the visual content. Furthermore, this adaptive behavior originates from dynamic center-surround interaction (Barthelemy, Vanzetta, & Masson, 2006).

Our task has been to develop an integration approach to contour detection. Although the proposed approach is useful for contour detection, it remains extremely difficult to judge noncontour edges accurately and to deal with jagged contours. More work could be done, for example, on the prediction of continuous contours and on contour integration across different scales. These will be the subjects of our further studies.

## Acknowledgments

This work was supported by the scientific research foundation of Central South University. We thank N. Petkov et al. and J. Malik et al. for the RuG data set and BSDS500 benchmarks. We are very grateful to the anonymous reviewers for their valuable comments and suggestions on the manuscript and to Kun Zhan, Yide Ma, and Hongjuan Zhang for many useful suggestions.