## Abstract

Robust coding has been proposed as a solution to the problem of minimizing decoding error in the presence of neural noise. Many real-world problems, however, have degradation in the input signal, not just in neural representations. This generalized problem is more relevant to biological sensory coding where internal noise arises from limited neural precision and external noise from distortion of sensory signal such as blurring and phototransduction noise. In this note, we show that the optimal linear encoder for this problem can be decomposed exactly into two serial processes that can be optimized separately. One is Wiener filtering, which optimally compensates for input degradation. The other is robust coding, which best uses the available representational capacity for signal transmission with a noisy population of linear neurons. We also present spectral analysis of the decomposition that characterizes how the reconstruction error is minimized under different input signal spectra, types and amounts of degradation, degrees of neural precision, and neural population sizes.

## 1. Introduction

We address the problem of how to linearly transform *N*-dimensional inputs into *M*-dimensional representations in order to best transmit signal information through noisy gaussian channels when the input signal is degraded. Here the degradation is modeled by additive white gaussian noise as well as a linear distortion such as blurring. Most earlier studies have addressed the problem without such degradations (for a review, see Palomar & Jiang, 2007), although they are inevitable in almost any real system. It is therefore important to characterize and understand their impact on the optimal encoder. The main result of this note is that under a minimum mean squared error (MMSE) objective, the optimal linear transform can be decomposed exactly into optimal denoising and deblurring (Wiener filtering), followed by optimal coding for a noisy gaussian channel (robust coding; Lee, 1975; Lee & Petersen, 1976; Doi, Balcan, & Lewicki, 2007). Such a decomposition was shown earlier in the context of source coding and quantization (Dobrushin & Tsybakov, 1962; Sakrison, 1968) and proven in a general case (Wolf & Ziv, 1970). Although it has been extensively studied (for a review, see Gray & Neuhoff, 1998), a unified treatment in the context of MMSE linear coding and its characterization has not been provided. We also offer an alternative proof of the decomposition in the course of characterizing the solution.

A special case of this problem was examined previously in which the linear transform was assumed to be convolutional, implying that the channel (or neural) dimension is equal to the input (or sensory) dimension and that individual coding units are shifted versions of each other with an identical filter shape (Ruderman, 1994). These simplifying assumptions have commonly been made in the study of optimal sensory coding because they match certain sensory systems (e.g., the foveal retina) reasonably well and are analytically more tractable (Atick & Redlich, 1990, 1992; Atick, Li, & Redlich, 1990; van Hateren, 1992). Also, an approximate decomposition along similar lines has been used under an information maximization objective (Atick & Redlich, 1992; Atick, 1992; Dayan & Abbott, 2001). Here, we assume no such restrictions and provide a general solution with the exact decomposition, in which the channel dimension may be smaller than, equal to, or larger than the input dimension (undercomplete, complete, or overcomplete representations, respectively), and individual coding units are not restricted to having the same filter shape.

## 2. Problem Formulation

The observed signal is modeled as **x** = **Hs** + **n**, where the signal **s** is zero mean with covariance Σ_{s}, **H** is a fixed linear distortion, and **n** is sensory (or input) noise with variance σ^{2}_{n}. Note that the signal **s** may not necessarily be gaussian distributed.

Encoding is assumed to be a linear transform into an arbitrary dimension, **r** = **Wx** + **δ**, and the resulting representations may be undercomplete, complete, or overcomplete (*M* < *N*, *M* = *N*, or *M* > *N*, respectively), with **W** the *M* × *N* linear encoder, **δ** the neural (channel, or output) noise with variance σ^{2}_{δ}, and **r** the representation. Decoding is assumed to be linear: **ŝ** = **Ar**, where **A** is the linear decoder and **ŝ** is the estimate of the original signal. The power of the representation is constrained either in total over the population or individually per coding unit.

The problem is to find **W** and **A** that minimize the mean squared error *E*[‖**s** − **ŝ**‖^{2}] subject to one of the power constraints above.
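The generative model and objective above can be made concrete with a small numerical sketch. All dimensions, noise levels, and the (unoptimized) encoder and decoder below are assumed values for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 16, 8                     # sensory and neural dimensions (undercomplete here)
sigma_n, sigma_d = 0.1, 0.5      # input (sensory) and channel (neural) noise amplitudes

# Signal covariance with a 1/f^2-like eigenvalue spectrum (illustrative choice).
E = np.linalg.qr(rng.standard_normal((N, N)))[0]   # orthonormal eigenvectors
lam = 1.0 / (1.0 + np.arange(N)) ** 2              # eigenvalues (power spectrum)
Sigma_s = E @ np.diag(lam) @ E.T

H = 0.9 * np.eye(N)                                # a simple linear distortion

# Arbitrary (not optimized) encoder/decoder, just to evaluate the objective.
W = 0.1 * rng.standard_normal((M, N))
A = 0.1 * rng.standard_normal((N, M))

T = 20000
s = rng.multivariate_normal(np.zeros(N), Sigma_s, size=T).T   # signal samples, N x T
x = H @ s + sigma_n * rng.standard_normal((N, T))             # degraded input
r = W @ x + sigma_d * rng.standard_normal((M, T))             # noisy representation
s_hat = A @ r                                                 # linear reconstruction
mse = np.mean(np.sum((s - s_hat) ** 2, axis=0))               # empirical E||s - s_hat||^2
print(mse)
```

The remainder of the note concerns choosing **W** and **A** to minimize this quantity.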

## 3. Results

### 3.1. Optimal Linear Decoder.

For any fixed encoder **W**, the decoder that minimizes the mean squared error is the standard linear MMSE (Wiener) decoder, **A** = ⟨**sr**^{T}⟩⟨**rr**^{T}⟩^{−1}.
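For a fixed encoder, the standard linear MMSE decoder has the closed form **A** = ⟨**sr**^{T}⟩⟨**rr**^{T}⟩^{−1}. The sketch below (with assumed small dimensions, spectra, and noise levels) checks numerically that this closed form attains a lower error than nearby perturbed decoders:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 6, 4
sigma_n, sigma_d = 0.2, 0.3

E = np.linalg.qr(rng.standard_normal((N, N)))[0]
lam = 1.0 / (1.0 + np.arange(N)) ** 2
Sigma_s = E @ np.diag(lam) @ E.T
H = np.diag(rng.uniform(0.5, 1.0, N))        # assumed diagonal distortion, for simplicity
W = rng.standard_normal((M, N))              # arbitrary fixed encoder

Sigma_x = H @ Sigma_s @ H.T + sigma_n**2 * np.eye(N)   # covariance of x = Hs + n
Sigma_r = W @ Sigma_x @ W.T + sigma_d**2 * np.eye(M)   # covariance of r = Wx + delta
Sigma_sr = Sigma_s @ H.T @ W.T                         # cross-covariance <s r^T>

A = Sigma_sr @ np.linalg.inv(Sigma_r)                  # MMSE linear decoder

def mse(A):
    # E||s - A r||^2 in closed form for this linear model
    return np.trace(Sigma_s - 2 * A @ Sigma_sr.T + A @ Sigma_r @ A.T)

base = mse(A)
worse = min(mse(A + 0.01 * rng.standard_normal(A.shape)) for _ in range(20))
print(base <= worse)   # the closed-form decoder beats random perturbations
```

Because the objective is quadratic in **A** with positive-definite curvature (σ^{2}_{δ} > 0), any perturbation of the closed-form decoder can only increase the error.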

### 3.2. Exact Decomposition.

If the input of robust coding **y** is the Wiener estimate **s*** (and hence robust coding reconstructs **s***), then it can be shown that the product of Wiener filtering and robust coding, **W** = **W**_{rc}**W**_{wf}, is the optimal linear transform for the generalized problem (see Figure 1).

To show this, we whiten **x** using the eigenvalue decomposition of its covariance Σ_{x}, and also rewrite the linear encoder as **W** = **V**Σ^{−1/2}_{x}, where Σ^{−1/2}_{x} whitens the observed signal **x**, and **V** is the linear encoder with respect to the whitened signal. Note that the inverse of Σ_{x} always exists because of the nonzero input noise σ^{2}_{n}.

Assuming that Σ_{s} and **H** share the same eigenvectors,^{1} we rewrite their eigenvalue decompositions in this common basis, and equation 3.12 is further simplified into the objective of robust coding: finding the optimal encoder **V** for an *M*-dimensional channel whose *N*-dimensional input spectrum is that of the Wiener estimate **s*** (Lee, 1975; Lee & Petersen, 1976; Doi et al., 2007).^{2} The optimal **V** that satisfies the total or the individual power constraint can be derived directly from these earlier studies. □

This decomposition implies that the optimal filtering in the MSE sense is given by denoising and deblurring of the input signal, followed by the optimal coding of this cleaned signal, where the signal transmission is restricted by channel noise and the limited channel dimension. The first filtering is separable from the second, implying that input degradation cannot be compensated for by increasing the representational capacity of the channel, for example, by decreasing channel noise or increasing the channel dimension.

In summary, the MMSE linear transform can be obtained by the following steps:

1. Find the Wiener filter given the input signal spectrum, the linear distortion, and the amplitude of input noise: **W**_{wf} = Σ_{s}**H**^{T}(**H**Σ_{s}**H**^{T} + σ^{2}_{n}**I**)^{−1}.

2. Find the robust coding solution given the Wiener estimate spectrum, the amplitude of channel noise, and the channel dimension: **W**_{rc}.

3. The optimal linear transform for the generalized problem is **W** = **W**_{rc}**W**_{wf}.
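The steps above can be sketched numerically in a diagonal (per-eigenvalue) setting. The spectrum, blur gains, noise levels, and power budget below are assumed values, and the robust coding gains follow the water-filling-type form of equation 3.21; the budget is chosen large enough that every component falls above the threshold:

```python
import numpy as np

N = 8
sigma_n, sigma_d, P = 0.2, 0.5, 50.0       # assumed noise amplitudes and total power

lam_s = 1.0 / (1.0 + np.arange(N)) ** 2    # signal spectrum (1/f^2-like, illustrative)
h = np.full(N, 0.8)                        # diagonal blur gains (assumed)

# Step 1: Wiener filter for the degraded input x = h*s + n (diagonal case),
# and the power spectrum of the Wiener estimate s*.
w_wf = lam_s * h / (h**2 * lam_s + sigma_n**2)
lam_star = w_wf * h * lam_s                # eigenvalues of s*

# Step 2: robust coding gains for spectrum lam_star. With P this large, all N
# components exceed the threshold l0.
sqrt_l0 = sigma_d**2 * np.sqrt(lam_star).sum() / (P + N * sigma_d**2)
l0 = sqrt_l0**2
assert (lam_star > l0).all()               # every component is encoded here
g2 = (sigma_d**2 / lam_star) * (np.sqrt(lam_star / l0) - 1)

# Step 3: compose the encoder, W = W_rc * W_wf (elementwise in this diagonal setting).
w_total = np.sqrt(g2) * w_wf
print(np.isclose((g2 * lam_star).sum(), P))   # total power budget is met exactly
```

The power check holds by construction of the threshold: summing *g*^{2}_{i}λ*_{i} over encoded components recovers *P* exactly.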

### 3.3. Spectral Characterization.

The exact decomposition into Wiener filtering and robust coding allows us to gain insight into how the reconstruction error is minimized by analyzing how the power spectrum (eigenvalues) of a generic signal is transformed through the two serial processes.

In the eigenbasis of the Wiener estimate, the robust coding encoder takes the form **W**_{rc} = **U** diag(*g*_{i}) **E**^{T} (equation 3.20), where **U** is an *M*-dimensional orthogonal matrix^{3} and *g*_{i} is the gain of the input signal in the eigenspace (note that the input signal of this second process, **y** = **s***, is represented in the signal's eigenspace by **E**^{T} in equation 3.20). The *i*th element of this gain (squared, for clarity) is given by

*g*^{2}_{i} = (σ^{2}_{δ}/λ*_{i})(√(λ*_{i}/*l*_{0}) − 1) if λ*_{i} > *l*_{0}, and 0 otherwise (equation 3.21),

where *l*_{0} is the threshold: only signal components whose eigenvalues are greater than this value are encoded with robust coding, and *K* is the total number of such components (without loss of generality, we assume λ*_{i} is sorted in descending order; see Lee, 1975; Lee & Petersen, 1976, for derivation). The threshold is determined by the power constraint,

√*l*_{0} = σ^{2}_{δ} Σ_{i≤K} √λ*_{i} / (*P* + *K*σ^{2}_{δ}) (equation 3.22).

Because *l*_{0} and *K* depend on each other, *K* needs to be computed numerically.
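The interdependence of *l*_{0} and *K* can be resolved by a simple search: assume the top *K* components are encoded, compute the implied threshold, and accept the largest *K* that is self-consistent. A minimal sketch under an assumed spectrum, channel noise, and power budget:

```python
import numpy as np

def robust_coding_threshold(lam, sigma_d2, P):
    """Find the threshold l0 and the number K of encoded components for
    robust coding, given eigenvalues lam sorted in descending order, channel
    noise variance sigma_d2, and total power P (water-filling-type search)."""
    N = len(lam)
    for K in range(N, 0, -1):
        sqrt_l0 = sigma_d2 * np.sqrt(lam[:K]).sum() / (P + K * sigma_d2)
        l0 = sqrt_l0**2
        # consistent if the K-th component is above threshold and the rest are below
        if lam[K - 1] > l0 and (K == N or lam[K] <= l0):
            return l0, K
    raise ValueError("no consistent threshold found")

lam = np.sort(1.0 / (1.0 + np.arange(8)) ** 2)[::-1]   # 1/f^2-like spectrum (assumed)
l0, K = robust_coding_threshold(lam, sigma_d2=0.25, P=1.0)
print(K, lam[K - 1] > l0)   # only the top K components are encoded
```

With this budget only the strongest few components clear the threshold; the rest are discarded, as described above.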

Our analysis reduces to that in Ruderman (1994) when the linear encoder is restricted to be convolutional, namely, when the eigenvector matrix **E** is the Fourier basis and the *M*-dimensional orthogonal matrix **U** is also given by the same Fourier basis (note that *M* = *N* in this case).

Unlike Wiener filtering, robust coding is much less widely known, and its characteristics have not been fully examined except for one- or two-dimensional signal problems (Doi et al., 2007). Next we illustrate four major characteristics of robust coding for a general *N*-dimensional signal.

First, the threshold *l*_{0} defines which input signal dimensions are encoded. Figure 2a shows the signal's eigenvalues (power spectrum). Components that exceed *l*_{0} are encoded, and the rest are discarded in robust coding. The corresponding “critical” index (or frequency), *K*, is indicated with the gray vertical line in all panels in Figure 2. From equation 3.22, we observe the following:

- The threshold *l*_{0} gets lower (and hence the critical frequency *K* goes higher; see Figure 2a) if a larger total power (see equation 2.8) is available in the neural population. The total power can be increased by increasing the power available to individual neurons, by increasing the neural population size while the individual neural power is fixed, or both. Note that overcompleteness in our model is useful for minimizing MSE because it can increase the total power. If the overcompleteness is increased while the total power is fixed, the MMSE will not change. (This could still be beneficial for other reasons, for example, if a large number of low-power coding units is more economical than a small number of high-power coding units.) The changes of the encoder and the critical frequency caused by doubling the population size at a fixed individual neural power are illustrated in Figure 2b, and the resulting change of the error ratio is shown in Figure 2f.
- The threshold gets lower if the channel noise σ^{2}_{δ} is smaller. Note that *P*/(*K*σ^{2}_{δ}) in equation 3.22 is the effective SNR, where *K* is the dimension of the subspace represented by the neural population; it is different from the apparent average SNR of the neural population, which is given by *P*/(*M*σ^{2}_{δ}), where *M* is the neural population size.
- The thresholding depends on the anisotropy (or distribution) of the input spectrum λ*_{i}, and there is no thresholding when it is isotropic. This was thoroughly analyzed for the two-dimensional signal case (Doi et al., 2007).

Second, the squared gain of the input signal (see equation 3.21 illustrated in Figure 2b) has the maximum at the frequency *L* whose eigenvalue is λ*_{L} = 4*l*_{0} (see Figure 2a). (This can be shown by the first- and the second-order derivatives of equation 3.21a with respect to λ*_{i}.) It is interesting that the bandpass characteristic emerges under the MMSE objective, with the 1/*f*^{2} signal. In contrast, under the information maximization objective, the optimal filtering is whitening (Atick, 1992; Cover & Thomas, 2006), and hence it is high-pass filtering with 1/*f*^{2} signal (its squared gain is proportional to *f*^{2}).
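The location of the gain maximum can be checked directly: under an assumed threshold and channel noise variance, the squared gain (σ^{2}_{δ}/λ)(√(λ/*l*_{0}) − 1) peaks at λ = 4*l*_{0}:

```python
import numpy as np

l0, sigma_d2 = 0.05, 0.25                    # assumed threshold and channel noise variance
lam = np.linspace(l0 * 1.001, 2.0, 200001)   # eigenvalues above the threshold
g2 = (sigma_d2 / lam) * (np.sqrt(lam / l0) - 1)   # squared gain, equation 3.21 form
lam_peak = lam[np.argmax(g2)]
print(abs(lam_peak - 4 * l0) < 1e-3)         # maximum occurs at lambda ~= 4*l0
```

This matches setting the derivative of the squared gain with respect to λ to zero.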

Third, the power spectrum of the representation **r** (given by the product of the squared gain of robust coding, equation 3.21a, and the power spectrum of the signal, λ*_{i}, plus the channel noise spectrum, σ^{2}_{δ}) is proportional to the square root of the signal spectrum if the signal is beyond the threshold; otherwise, it is identical to the noise spectrum, that is, those components are all noise (see Figure 2c):

*g*^{2}_{i}λ*_{i} + σ^{2}_{δ} = σ^{2}_{δ}√(λ*_{i}/*l*_{0}) if λ*_{i} > *l*_{0}, and σ^{2}_{δ} otherwise.

This may be seen as an intermediate between the original signal spectrum λ*_{i} and the flat spectrum generated by whitening. This “half-whitening” is a distinct feature of robust coding.
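The half-whitening identity can be verified numerically: for components above threshold, the output spectrum *g*^{2}λ + σ^{2}_{δ} equals σ^{2}_{δ}√(λ/*l*_{0}), that is, it is proportional to √λ. A small check with assumed values:

```python
import numpy as np

l0, sigma_d2 = 0.02, 0.25                        # assumed threshold and noise variance
lam = np.array([0.5, 0.2, 0.1, 0.05])            # eigenvalues above the threshold l0
g2 = (sigma_d2 / lam) * (np.sqrt(lam / l0) - 1)  # robust coding squared gains
out = g2 * lam + sigma_d2                        # output (representation) spectrum
print(np.allclose(out, sigma_d2 * np.sqrt(lam / l0)))   # proportional to sqrt(lam)
```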

Fourth, the ratio of the reconstruction error to the signal power of component *i* is √(*l*_{0}/λ*_{i}) (see Figure 2f). This results from the optimal allocation of a limited representational resource: more resources are allocated to signal components whose power is higher. In contrast, the information maximization objective leads to whitening instead of robust coding, in which the reconstruction spectrum is proportional to the input spectrum and the error ratio is constant for any signal component, even if the power spectrum of the signal is not uniform (Doi & Lewicki, 2005). This may be interpreted as allocating the representational resources evenly over the signal components even if their signal strengths are different. The suboptimality of whitening in the MMSE sense has been observed in the literature (Bethge, 2006; Doi et al., 2007; Eichhorn, Sinz, & Bethge, 2009), and our analysis provides an explanation as to why that is the case.
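The per-component error ratio √(*l*_{0}/λ*_{i}) can likewise be checked against the closed-form MMSE of each noisy channel component, σ^{2}_{δ}/(*g*^{2}_{i}λ*_{i} + σ^{2}_{δ}). A small numerical sketch with assumed values:

```python
import numpy as np

l0, sigma_d2 = 0.02, 0.25                          # assumed threshold and noise variance
lam = np.array([0.5, 0.2, 0.1, 0.05])              # components above the threshold
g2 = (sigma_d2 / lam) * (np.sqrt(lam / l0) - 1)    # robust coding squared gains
# Per-component error relative to the signal power:
ratio = sigma_d2 / (g2 * lam + sigma_d2)
print(np.allclose(ratio, np.sqrt(l0 / lam)))       # error ratio falls as 1/sqrt(lam)
```

Stronger components thus retain a smaller fractional error, in contrast to the constant error ratio of whitening.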

## 4. Conclusion

We investigated how to optimally organize a population of noisy neurons in order to represent noisy input signals. Our analysis provides an exact solution and its characterization under an idealized setting, which is applicable in a wide range of conditions. For example, the degradation of input signals can model optical blur and additive noise in image formation and could be set from measurements in a specific system. Similarly, the neural (channel) noise and population size can also be determined for the system of interest. This provides a way to predict the optimal code for a wide variety of biological systems. The application need not be restricted to vision or even to biological systems, and our results could be used to design optimal signal processors.

## Acknowledgments

We thank a reviewer for pointing out the prior study on the decomposition in a general case (Wolf & Ziv, 1970).

## Notes

^{1}

This holds in the application of image coding. Specifically, the image signal **s** is shift invariant and the linear distortion **H** is optical blur, and hence convolutional. In this case, the eigenvectors are given by the Fourier basis functions. More precisely, we further assume the periodic boundary condition for both **s** and **H**; then Σ_{s} and **H** are circulant in addition to Toeplitz, and the eigenvector matrix of a circulant Toeplitz matrix is the DFT matrix. Alternatively, we can employ an approximation for a noncirculant Toeplitz matrix without assuming the periodic boundary condition, called the circulant approximation (Gray, 2006).

^{2}

^{3}

Under the total power constraint, the robust coding solution is unique up to an orthogonal matrix **U** (see equation 3.20). Note that **U** cancels out of equation 2.8. Under the individual power constraint, **U** acts to distribute the total power evenly over the coding units. This problem is known as the inverse eigenvalue problem (Chu & Golub, 2005), and the existence of such an orthogonal matrix and an algorithm to find it were shown in Lee (1975) and Lee & Petersen (1976).