## Abstract

We present an approach for constructing nonlinear empirical mappings from high-dimensional domains to multivariate ranges. We employ radial basis functions and skew radial basis functions for constructing a model using data that are potentially scattered or sparse. The algorithm progresses iteratively, adding a new function at each step to refine the model. The placement of the functions is driven by a statistical hypothesis test that accounts for correlation in the multivariate range variables. The test is applied to training and validation data and reveals nonstatistical or geometric structure when it fails. At each step, the added function is fit to data contained in a spatiotemporally defined local region to determine the parameters, in particular, the scale of the local model. The scale of the function is determined by the zero crossings of the autocorrelation function of the residuals. The model parameters and the number of basis functions are determined automatically from the given data, and there is no need to initialize any ad hoc parameters save for the selection of the skew radial basis functions. Compactly supported skew radial basis functions are employed to improve model accuracy, order, and convergence properties. The extension of the algorithm to higher-dimensional ranges produces reduced-order models by exploiting the existence of correlation in the range variable data. Structure is tested not just in a single time series but between all pairs of time series. We illustrate the new methodologies using several example problems, including modeling data on manifolds and the prediction of chaotic time series.

## 1. Introduction

In data-driven modeling problems, generally the goal is to construct a representation of the data that will be both descriptive and predictive. The requirement that the model be descriptive dictates that it accurately reproduces the relationships in the observed data, while the requirement that the model be predictive means that unobserved inputs, or domain values, be mapped to meaningful outputs, or range values. In general terms, it is the predictive quality, or ability to generalize from observations, of the model that leads to added value and the potential for knowledge discovery.

When data may be viewed as being sampled from a manifold, we may appeal to some powerful theorems from differential geometry to guide our modeling approach. For example, Whitney's (1936) (easy) embedding theorem pertains to representing data as the graph of a nonlinear function and provides guidance concerning the dimension of the domain when it is not known a priori. More precisely, Whitney's theorem states that an *m*_{d}-dimensional manifold residing in an *n*-dimensional ambient space may be embedded in a 2*m*_{d}+1-dimensional subspace.^{1} The proof of the theorem is constructive and shows that there is a large set of projections such that there exists a nonlinear inverse that affords perfect reconstruction of the data.^{2}

Whitney's theorem pertains not to data but to manifolds. When data may be viewed as a sampled manifold, the question arises concerning how to construct the nonlinear function that Whitney's theorem proves exists.

In contrast, Takens' theorem (1980) indicates that data sampled from an *m*_{d}-dimensional smooth manifold may be reconstructed (up to a diffeomorphic copy) by using a time-delay embedding in a space of dimension 2*m*_{d}+1. What is surprising about this theorem is that it states that only a single component, a scalar function, of the time-dependent trajectory must be observed to permit this reconstruction. This permits a now-well-known framework for the time-series prediction problem (Packard, Crutchfield, Farmer, & Shaw, 1980). These geometric theorems suggest that data on manifolds may possess a structure that is richer than simple nonlinearity and may be exploited in the modeling process. This is the primary motivation for the algorithm proposed in this letter.

The objective of this letter is to present a new approach for building nonlinear models for multivariate data that is particularly useful for modeling time-evolving trajectories on manifolds. The main contribution is a practical algorithm to model geometric structure not just in the individual components of the range of the model but also to identify and model structure that exists between all pairs of components. The proposed algorithm is an extension to our previous work (Jamshidi, 2004; Jamshidi & Kirby, 2007; Anderle & Kirby, 2001), which deals with modeling where the range dimension is one. Similar to the univariate case, the proposed algorithm progresses iteratively, adding a new function at each step to refine the model. The placement of the functions is driven by a statistical hypothesis test, but now in higher dimensions, that reveals geometric structure when it fails. At each step, the added function is fit to data contained in a spatiotemporally defined local region to determine the model parameters. A great advantage of the proposed algorithm is that it does not require ad hoc parameters save for the selection of the skew radial basis functions. Thus, the number of basis functions required for an accurate fit is determined automatically by the algorithm, making it a powerful tool for modeling multivariate time series, especially when the additional manifold structure may be exploited.

This letter is organized as follows. In section 2, the geometric theorems are discussed in the context of data analysis. In section 3.1, basic facts about radial basis functions (RBFs) are reviewed, and a recent variation, skew RBFs (sRBFs), is described. In section 4, the theoretical aspects of the multivariate RBF algorithm are presented. In section 5, the basic multivariate RBF algorithm is presented in detail. Numerical experiments are described in section 6, illustrating applications of Whitney's and Takens' theorems. Section 7 summarizes the main contributions of this letter.

## 2. Geometric Framework

Here we outline the geometric framework of the algorithms. We shall see that at the heart of each problem is a need for construction of a nonlinear mapping of multivariate data.

### 2.1. Representing Data as a Graph of a Function.

Suppose the data consist of points *x* in an *n*-dimensional ambient space, and let *P* be a rank-*d* orthogonal projector. We may then view *x* as being the sum of the portion of *x* in the range of *P*, that is, *Px*, and the portion in the null space of *P*, that is, (*I*−*P*)*x*:

$$x = Px + (I-P)x. \tag{2.1}$$

A rank-*d* projector may be defined by *P*=*UU*^{T}, where *U* is an *n*×*d* matrix with orthonormal columns. Similarly, the complementary projector may be defined as *Q*=*I*−*P*=*VV*^{T}, where the columns of the *n*×(*n*−*d*) matrix *V* span the null space of *P*. We note that the quantity *Px*=*U*(*U*^{T}*x*) is an *n*-tuple in the ambient basis. It is the expansion coefficients that provide the *d*-dimensional representation, and these are given as *p*=*U*^{T}*x*. Similarly, the reduced representation for *q*=*Qx* consists of *n*−*d* components, *V*^{T}*x*. Taking these together, we rewrite equation 2.1 in terms of the reduced variables as

$$V^T x = f(U^T x), \tag{2.2}$$

where *f* is now a mapping from the *d*-dimensional projected data to the (*n*−*d*)-dimensional set of residuals of the projection. The reconstruction of the original data then may be written in terms of two components, one linear and one nonlinear:

$$x = Up + Vf(p), \tag{2.3}$$

so that *x* is given as a mapping from the *d*-dimensional linear space to the (*n*−*d*)-dimensional residual. In general, one does not know the domain of the graph, and in these cases, one must identify a good projection to determine this.

If the rank of *P* is *d* where *d*>2*m*_{d}, then Whitney's theorem ensures the existence of a global map from the range of the projector to its null space; that is, there exists an *f* such that equation 2.2 will hold. We have referred to the representation in equation 2.3 as the Whitney reduction network (Broomhead & Kirby, 2001). Note that the special case of principal component analysis is obtained if the linear term contains enough dimensions to fully reconstruct, or encapsulate, the data. However, if the data have the structure of a manifold, then it is more efficient to take the rank of *P* to be as small as possible and to have the nonlinear term encode the manifold structure. Determining parsimonious representations for *f* in this expression is one of our main motivations for developing efficient algorithms for multivariate RBFs.
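The linear-plus-nonlinear split of equations 2.1 to 2.3 can be sketched in a few lines of code. The bases `U` and `V` and the residual surface z = x^2 + y^2 are toy assumptions chosen only so that the reconstruction is exact; in practice f would be the empirical RBF model.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Orthonormal bases for the range and null space of a rank-2 projector
# in a 3-dimensional ambient space (columns stored as rows).
U = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
V = [[0.0, 0.0, 1.0]]

def reduce_point(x):
    """Reduced coordinates p = U^T x and residual coordinates q = V^T x."""
    return [dot(u, x) for u in U], [dot(v, x) for v in V]

def reconstruct(p, f):
    """Equation 2.3: x = U p + V f(p), with f modeling the residual."""
    q = f(p)
    return [sum(U[j][i] * p[j] for j in range(len(p))) +
            sum(V[j][i] * q[j] for j in range(len(q)))
            for i in range(3)]

# Toy residual map: the data are assumed (for illustration only) to lie
# on the surface z = x**2 + y**2, so f recovers q exactly from p.
f = lambda p: [p[0] ** 2 + p[1] ** 2]
x = [0.5, -0.25, 0.3125]
p, q = reduce_point(x)
x_hat = reconstruct(p, f)
```

When the data do not lie exactly on a graph over the chosen projection, the reconstruction error of this composition is one diagnostic for the quality of the projection.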

### 2.2. Time-Delay Embedding.

The time-series prediction problem may be cast as the construction of a mapping *f* from time-delayed samples of an observed series to future samples. Here *T* corresponds to a separation interval between predicted samples, *T*_{s} corresponds to a separation interval between domain samples, and *m* indicates the number of ambient dimensions of the range of the map *f*. Typically the values of *T* and *T*_{s} are problem dependent and must be determined for each data set. In general, it is not possible to say, a priori, whether such a function *f* exists for a given data set. One may answer this question empirically by attempting to construct *f* from the data.

If the dynamics of a system evolve on a smooth manifold of dimension *m*_{d}, then it is sufficient to observe a scalar value from this system, say *x*(*t*), to be able to reconstruct the geometry of the manifold up to a diffeomorphism; this result is known as Takens' theorem (Takens, 1980). In particular, the time-delay coordinates (*x*(*t*), *x*(*t*−τ), …, *x*(*t*−(*n*−1)τ)) of the scalar time series create a vector-valued time series (Packard et al., 1980). This time series traces out a manifold structure that is effectively a reconstruction of the original manifold as long as, according to Takens' theorem, *n* is taken large enough; this is guaranteed for *n*>2*m*_{d}. In this context, *n* is referred to as the global embedding dimension, and τ is the time delay (see, e.g., Abarbanel, Gilpin, & Rotenberg, 2005, for additional implementation details).

In the multivariate setting, *n*_{r} is the number of time series of interest, and each of the *n*_{r} time series has its own time delay and embedding dimension. Again, a priori it is often not possible to know whether the data of interest possess a manifold structure or if a prediction function *f* exists.
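The delay-coordinate construction described above can be sketched as follows; the helper name and signature are ours, not from the text.

```python
def delay_embed(series, n, tau):
    """Map a scalar series to n-dimensional time-delay vectors
    (s[t], s[t - tau], ..., s[t - (n - 1) * tau]), following the
    Takens/Packard et al. construction."""
    start = (n - 1) * tau  # first index with a full history available
    return [[series[t - j * tau] for j in range(n)]
            for t in range(start, len(series))]
```

For a scalar series sampled from a dynamical system, the rows returned by this function trace out the diffeomorphic copy of the underlying manifold once `n` exceeds twice its dimension.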

## 3. Relationship to Other Work

Here we describe the background of RBFs, as well as provide an overview of recent work in the general area of manifold learning.

### 3.1. Data Fitting with Radial Basis Functions.

RBFs have been widely used for data approximation. Multiquadrics, that is, functions of the form φ(*r*)=(*r*^{2}+*c*^{2})^{1/2}, were introduced by Hardy (1971) for modeling scattered data on topographical surfaces. Thin plate splines, φ(*r*)=*r*^{2}log *r*, were introduced by Harder and Desmarais (1972) for surface interpolation. Gaussian RBFs, φ(*r*)=exp(−*r*^{2}/σ^{2}), were proposed by Broomhead and Lowe (1988) for data fitting and classification. There is a significant literature treating theoretical aspects of RBFs, including universal approximation theorems (see, e.g., Powell, 1992; Park & Sandberg, 1991, 1993; Schaback & Wendland, 2000). A large number of algorithms have been proposed in the literature for computing the model parameters (Platt, 1991; Karayiannis & Mi, 1997; Holmes & Mallick, 1998; Yingwei, Sundararajan, & Saratchandran, 1998; Nabney, McLachlan, & Lowe, 1996). The research monographs (Lee & Haykin, 2001; Buhmann, 2003; Wendland, 2005) contain references treating additional theory, algorithms, and applications.
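For concreteness, the classical radial profiles just cited can be written down directly. The parameterizations below are common textbook forms (e.g., the placement of the shape parameters varies across the literature) and are assumptions of this sketch rather than the letter's own equations.

```python
import math

# Common textbook forms of the classical radial basis functions,
# evaluated at the radius r = ||x - c||.
def multiquadric(r, c=1.0):
    return math.sqrt(r * r + c * c)

def thin_plate(r):
    # r^2 log r, with the standard convention phi(0) = 0.
    return 0.0 if r == 0.0 else r * r * math.log(r)

def gaussian(r, sigma=1.0):
    return math.exp(-(r * r) / (sigma * sigma))

def rbf_model(x, centers, weights, phi):
    """An RBF fit: a weighted sum of translates of one radial profile."""
    return sum(w * phi(abs(x - c)) for c, w in zip(centers, weights))
```

The same `rbf_model` skeleton applies to all three profiles; only the radial function `phi` changes.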

The sRBF is obtained by modulating a radially symmetric basis function with a skew function of its argument *z*. These functions, taken together, result in the Arctan-Hanning sRBF of equation 3.7.

The issue now is to determine the parameters associated with the data-fitting
problem. Of particular importance are the locations of the fitting functions and
the widths of the basis functions, as well as the number of fitting functions *N*_{c}. A method for initializing these using the correlation and
cross-correlation structure is described in section 4. Once these parameters are
initialized, they are refined, together with the remaining parameters, using a
variety of nonlinear optimization routines.

### 3.2. Manifold Learning.

This letter is particularly concerned with modeling multivariate time series on
manifolds, spaces whose structure locally behaves like Euclidean space. This
setting is attractive given the existence of Whitney's theorem and Takens'
theorem described in section 2, which
indicate that data on manifolds are particularly amenable to nonlinear modeling
by constructing an empirical representation of a function *f* that captures the structure of the data from examples. The review presented
here is not intended to be comprehensive but rather a sampling of the literature
on this topic.

The modeling procedure presented here represents a special case of a more general area now referred to as manifold learning. We note that this field comprises a collection of approaches to model data assuming a special structure, in the same way lines of best fit are used to model data where we assume some linear trend exists. It is not essential that the data actually reside on a manifold to make these techniques useful in the same way that data need not fit exactly on a line when we use linear least squares. Of course, the more like a manifold the data are, the better we can expect the results to be.

There are a variety of ways to approach the manifold learning problem. A local approach based on data charts models the data in patches that, when taken together, form a complete manifold model. Each chart consists of a connected neighborhood of the manifold, equipped with a projection mapping from the manifold to a linear space (whose dimension is the topological dimension of the manifold) and an inverse function from the linear space to the manifold.

The composition of these functions behaves as the identity mapping. (For additional details on this approach, see Hundley, 1998; Hundley, Kirby, & Miranda, 1999.) A probabilistic approach for generating an atlas of charts is provided in Brand (2003). This algorithm employs a mapping that preserves local geometric relations in the manifold and is pseudo-invertible. The algorithm decomposes the sample data into locally linear low-dimensional patches that are merged into a single low-dimensional coordinate system by stochastic optimization.

A manifold learning approach referred to as local linear embedding (Roweis & Saul, 2000) represents each point in the ambient space as a weighted sum of its nearest neighbors. This weighting matrix is then used to generate a configuration of points with the same spatial relationships in a space of reduced dimension. We note that this approach results in an eigenvector problem where the matrix has the same size as the number of points in the sample. Although, strictly speaking, this is not an isometric embedding of the data, it may be viewed loosely as topology preserving.

Multidimensional scaling (MDS) approaches this problem from a different perspective (Mardia, Kent, & Bibby, 1980). Given a set of distances between points, is it possible to locate these points in Euclidean space such that they have the same interpoint distances? MDS provides a solution in terms of the eigenvectors of the interpoint distance matrix. An extension to this approach for data on manifolds, referred to as ISOMAP, is based on computing geodesic distances numerically from sampled data and using these distances in the MDS eigenvector problem (Tenenbaum, de Silva, & Langford, 2000). The result is an embedding of the data in Euclidean space with distances between the data points, as measured on the manifolds, preserved. Other techniques have been proposed for constructing isometric embeddings, notably Belkin and Niyogi (2003), Donoho and Grimes (2003), and Coifman and Lafon (2006).

In Roweis, Saul, and Hinton (2002), the local linear models are represented by a mixture of factor analyzers, and the global coordination of these models is achieved by adding a regularizing term to the standard maximum likelihood objective function favoring models whose internal coordinate systems are aligned in a consistent way. As a result, the internal coordinates change smoothly as one traverses a connected path on the manifold even when the path crosses the domains of many different local models. To get an efficient algorithm that allows separate local models to learn consistent global representations, an automatic alignment procedure that maps the disparate internal representations learned by several local dimensionality-reduction experts into a single coherent global coordinate system for the original data space is studied in Teh and Roweis (2002).

Kernel principal component analysis (KPCA; Verbeek, Roweis, & Vlassis, 2004) treats the processing of data on manifolds by first flattening the data out using an appropriately selected kernel function, for example, a Veronese embedding (Schölkopf, Smola, & Müller, 1998). This approach has the advantage of turning nonlinear problems into tractable linear problems via the kernel trick. KPCA is also related to the well-known classification technique of support vector machines (Vapnik, 1995).

Projection pursuit was proposed as a linear approach for reducing the dimension of data to reveal clusters in data and their separation (Friedman & Tukey, 1974). It is often used in the context of an interactive data visualization system such as the projection pursuit guided tour (Cook, Buja, Cabrera, & Hurley, 1995).

A global approach to manifold learning that involves reducing data such that it is constrained to lie on a prescribed manifold in the reduced space was presented in Kirby and Miranda (1996). For example, one can construct the best topological circle through a data set using this approach. This idea was extended to spheres in Hundley, Kirby, and Miranda (1995).

## 4. Statistical Background for the Multivariate sRBF Algorithm

We define the residuals of the model with *N*_{c} basis functions as

$$e_k = y_k - f(x_k),$$

where *e*_{k} is the *m*-variate residual of the *k*th data point and *L* is the cardinality of the training set. In addition, μ is the mean vector *E*(*e*_{k}), and Γ(*h*) is the covariance matrix at lag *h*. An unbiased estimate for μ is given by the sample mean

$$\hat{\mu} = \frac{1}{L}\sum_{k=1}^{L} e_k.$$

An estimate of the covariance matrix is given by

$$\hat{\Gamma}(h) = \frac{1}{L}\sum_{k=1}^{L-h} (e_{k+h}-\hat{\mu})(e_k-\hat{\mu})^T.$$

Similar to the univariate case, we decompose the autocovariance function (ACVF) into its components as

$$\hat{\Gamma}(h) = \sum_{k=1}^{L-h} \hat{\Gamma}_k(h), \qquad \hat{\Gamma}_k(h) = \frac{1}{L}(e_{k+h}-\hat{\mu})(e_k-\hat{\mu})^T.$$

Furthermore, $\hat{\gamma}_{ij}(h)$ is the (*i*, *j*)-component of $\hat{\Gamma}(h)$. In other words,

$$\hat{\gamma}_{ij}(h) = \sum_{k=1}^{L-h} \frac{1}{L}\big(e_{k+h}^{(i)}-\hat{\mu}_i\big)\big(e_k^{(j)}-\hat{\mu}_j\big).$$

For a fixed lag *h*, the quantity $\hat{\Gamma}_k(h)$ is the contribution of the *k*th residual to the autocorrelation function, and the quantity $\frac{1}{L}\big(e_{k+h}^{(i)}-\hat{\mu}_i\big)\big(e_k^{(j)}-\hat{\mu}_j\big)$ is the contribution of the *i*th and *j*th time series at the *k*th residual of the autocovariance function. Later we focus on this quantity and illustrate that it reveals critical information concerning where new basis functions should be placed.

The sample cross-correlation function $\hat{R}(\cdot)$ is then given by

$$\hat{\rho}_{ij}(h) = \frac{\hat{\gamma}_{ij}(h)}{\big(\hat{\gamma}_{ii}(0)\,\hat{\gamma}_{jj}(0)\big)^{1/2}},$$

where $\hat{\rho}_{ij}(h)$ is the (*i*, *j*)-component of $\hat{R}(h)$. If *i*=*j*, $\hat{\rho}_{ii}(h)$ reduces to the sample autocorrelation function of the *i*th series. (For the asymptotic behavior and the convergence properties of the sample mean and covariance functions, see Brockwell & Davis, 1991.)
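The estimator above can be computed directly from two residual series; the helper name below is ours, and for simplicity the sketch treats a single pair (i, j) of univariate series at a nonnegative lag.

```python
def cross_correlation(e1, e2, h):
    """Sample cross-correlation rho_ij(h) of two residual series at
    lag h >= 0, normalized by the lag-0 autocovariances of each series."""
    L = len(e1)
    m1 = sum(e1) / L
    m2 = sum(e2) / L
    # Biased (1/L) covariance estimates, as is standard for sample ACVFs.
    g12 = sum((e1[k + h] - m1) * (e2[k] - m2) for k in range(L - h)) / L
    g11 = sum((x - m1) ** 2 for x in e1) / L
    g22 = sum((x - m2) ** 2 for x in e2) / L
    return g12 / (g11 * g22) ** 0.5
```

Setting `e1 = e2` recovers the sample autocorrelation of a single series, mirroring the i = j case in the text.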

An *m*-variate series $\{e_t\}$ is said to be white noise with mean 0 and covariance matrix Σ, written $\{e_t\} \sim \mathrm{WN}(0, \Sigma)$, if and only if $\{e_t\}$ is stationary with mean vector 0 and covariance matrix function

$$\Gamma(h) = \begin{cases} \Sigma, & h = 0,\\ 0, & \text{otherwise.} \end{cases}$$

We use the notation $\{e_t\} \sim \mathrm{IID}(0, \Sigma)$ to indicate that the random vectors are i.i.d. with mean 0 and covariance Σ.

_{t}In general, the derivation of the asymptotic distribution of the sample cross-correlation function is quite complicated even for multivariate moving averages (Brockwell & Davis, 1991). The methods employed for the univariate case are not immediately adaptable to the multivariate case. An important special case arises when the two component time series have independent moving averages. The asymptotic distribution of for such a process is given in the following theorem:

**Theorem 1.** *Suppose that*

$$X_{t1} = \sum_{j=-\infty}^{\infty} \alpha_j Z_{t-j,1}, \qquad X_{t2} = \sum_{j=-\infty}^{\infty} \beta_j Z_{t-j,2},$$

*where the two sequences $\{Z_{t1}\} \sim \mathrm{IID}(0, \sigma_1^2)$ and $\{Z_{t2}\} \sim \mathrm{IID}(0, \sigma_2^2)$ are independent, and $\sum_j |\alpha_j| < \infty$, $\sum_j |\beta_j| < \infty$. If $h \geq 0$, then*

$$\hat{\rho}_{12}(h) \ \text{is } \mathrm{AN}\Big(0,\ L^{-1}\sum_{j=-\infty}^{\infty} \rho_{11}(j)\,\rho_{22}(j)\Big).$$

*If $h \geq 0$ and $k \geq 0$, $h \neq k$, then the vector $(\hat{\rho}_{12}(h), \hat{\rho}_{12}(k))$ is asymptotically normal (AN) with mean 0, variances as above, and covariance*

$$L^{-1}\sum_{j=-\infty}^{\infty} \rho_{11}(j)\,\rho_{22}(j+k-h).$$

As reported in Brockwell and Davis (1991), without knowing the correlation function of each of the processes, it is impossible to decide if the two processes are uncorrelated with one another. The problem is resolved by prewhitening the two series before computing the cross-correlation $\hat{\rho}_{12}(h)$, that is, transforming the two series to white noise by the application of suitable filters. In other words, any test for independence of the two component series cannot be based solely on estimated values of the cross-correlation without taking into account the nature of the two component series. Note that since in practice the true model is nearly always unknown and since the data *X*_{tj}, *t*⩽0, are not available, it is convenient to replace the sequences $\{Z_{tj}\}$ by the residuals, which, if we assume that the fitted models are in fact the true models, are white noise sequences. To test the hypothesis *H*_{0} that the two component series are independent, we observe that under *H*_{0}, the corresponding two prewhitened series are also independent. Under *H*_{0}, theorem 1 implies that the sample cross-correlations $\hat{\rho}_{12}(h)$ and $\hat{\rho}_{12}(-h)$, *h*>0, of the two prewhitened series are asymptotically independent normal with mean 0 and variances *L*^{−1}. An appropriate test for independence can therefore be obtained by comparing the values of $|\hat{\rho}_{12}(h)|$ with $1.96\,L^{-1/2}$. If we prewhiten only one of the two original series, then under *H*_{0}, theorem 1 implies that the sample cross-correlations $\hat{\rho}_{12}(h)$ and $\hat{\rho}_{12}(-h)$, *h*>0, are asymptotically independent normal with mean 0, variances *L*^{−1}, and the covariance given in theorem 1. Hence, for any fixed *h*, $\hat{\rho}_{12}(h)$ also falls (under *H*_{0}) between the bounds $\pm 1.96\,L^{-1/2}$ with a probability of approximately 0.95.

If one computes the sample correlations up to lag *h* and finds that more than 0.05*h* of the samples fall outside the bounds, or that one value falls far outside the bounds, the i.i.d. hypothesis is rejected. This test can equivalently be written in terms of the χ^{2} distribution. Given

$$Q = L \sum_{h=1}^{L-1} \hat{\rho}^2(h),$$

Brockwell and Davis (1991) showed that *Q* has a χ^{2} distribution with *L*−1 degrees of freedom. The adequacy of the model is therefore rejected at level α if

$$Q > \chi^2_{1-\alpha}(L-1).$$
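A minimal sketch of the resulting whiteness check, assuming the 5%-of-lags rejection rule quoted above; the function names are ours.

```python
def sample_acf(e, h):
    """Sample autocorrelation of a univariate residual series at lag h."""
    L = len(e)
    m = sum(e) / L
    g0 = sum((x - m) ** 2 for x in e) / L
    gh = sum((e[k + h] - m) * (e[k] - m) for k in range(L - h)) / L
    return gh / g0

def iid_test(e, hmax):
    """Accept the i.i.d. hypothesis unless more than 5% of the sample
    autocorrelations up to lag hmax leave the band +-1.96 / sqrt(L)."""
    bound = 1.96 / len(e) ** 0.5
    outside = [h for h in range(1, hmax + 1)
               if abs(sample_acf(e, h)) > bound]
    return len(outside) <= 0.05 * hmax, outside
```

In the algorithm, a rejection at any lag both triggers the addition of a basis function and (through the lags in `outside`) points to where residual structure remains.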

## 5. Multivariate Algorithm Implementation

The main difference with the univariate algorithm is the statistical hypothesis test.
Again, the question of whether a new basis function should be added is answered by
the i.i.d. test. We shall see that this test also indicates where the new basis
function should be initialized. First, we compute the autocorrelation functions of
all the *m* time series. If all of these pass the white noise (WN) or
i.i.d. test, then the cross-correlations among the time series are considered. If
there is structure in the autocorrelations or cross-correlations of the time series,
then the i.i.d. hypothesis will be rejected.

As in the univariate case, the next requirement is to determine where the new basis function should be located to optimally reduce the structure in the model residuals. In our extension, we look for the point in the domain that makes the largest contribution to the auto- or cross-correlation that has caused the test to fail.

The index *k* is inherited from the data labels and, in the case of a time series, corresponds to a time ordering. For simplicity, we assume that the contribution to the correlation statistic decreases monotonically for both increasing and decreasing values of *k* until it crosses zero at the indices *k*_{l} and *k*_{r}. Here we use *l*, *r* to indicate left and right, respectively. We now compute the distances $d_l = \|x_{k_l} - c_0\|$ and $d_r = \|x_{k_r} - c_0\|$, as these indicate the size of the data ball around the center *c*_{0}. The subset of the data employed to update the added basis function is then

$$\mathcal{X}_{loc} = \{x \in \mathcal{X} : \|x - c_0\| \leq d_c\},$$

where $\mathcal{X}$ is the entire training set. The distance *d*_{c} can be selected in a variety of ways, and here we select $d_c = \max(d_l, d_r)$. Note that $\mathcal{X}_{loc}$ may now contain data whose indices have values that are substantially different from *k*, *k*_{l}, and *k*_{r}.
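The local-region selection can be sketched as follows for a one-dimensional domain. The function name is ours, `contrib[k]` stands for the per-point contribution to the failed correlation statistic, and taking the larger of the two zero-crossing distances as the ball radius is one simple choice among the options mentioned above.

```python
def local_ball(X, contrib, k_star):
    """Walk left and right from the peak-contribution index k_star until
    the contribution changes sign, then keep every training point within
    d_c = max(d_l, d_r) of the center c0 = X[k_star]."""
    kl = k_star
    while kl > 0 and contrib[kl] > 0:
        kl -= 1
    kr = k_star
    while kr < len(X) - 1 and contrib[kr] > 0:
        kr += 1
    c0 = X[k_star]
    dc = max(abs(X[kl] - c0), abs(X[kr] - c0))
    # Spatial selection: points far away in index may still enter the ball.
    return [x for x in X if abs(x - c0) <= dc]
```

Because the final filter is spatial rather than temporal, the returned subset can include points whose time indices differ substantially from the crossing indices, exactly as noted in the text.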

The center *c*_{0} is initialized at the point of most structure according to our test, that is, at the domain point making the largest contribution to the auto- or cross-correlation that caused the test to fail. The vector of widths is very effectively initialized using the diagonal elements of the covariance matrix of the local data $\mathcal{X}_{loc}$. The initial value for the multivariate weight is calculated via least squares using the initial values for the center location and widths. We initialize the skew parameters at zero. The parameters associated with the new basis function are then optimized using BFGS with the cost function

$$E(v) = \sum_{x_k \in \mathcal{X}_{loc}} \|y_k - f(x_k)\|^2,$$

where *v* contains the new sRBF's parameters: the center, widths, weights, and skew parameters. Note that all the multivariate range values associated with $\mathcal{X}_{loc}$ contribute to the optimization procedure.

## 6. Numerical Results

In this section, we provide numerical evidence that the proposed *k*-variate algorithm results in a model that significantly
outperforms *k* univariate models. The enhanced parsimony of the
model is a result of accounting for correlations between the multivariate ranges.

### 6.1. Multivariate *Pringle* Data Set.

To begin, we consider the application of the multivariate RBF algorithm to the Pringle data set (Broomhead & Kirby, 2000, 2001). As described below, we build a two-variate data set with correlation one between the first and the second time series, as a first illustration of the methodology. This problem illustrates the challenges of modeling data as the graph of a function with range dimension two.

The Pringle data set, as proposed in Broomhead and Kirby (2005), can be generated as the solution to a system of ordinary differential equations depending on two parameters. A numerically integrated trajectory of the attracting cycle (*x*, *y*, *z*) is shown in Figure 1a.^{3} A second data set, also displayed in Figure 1a, is constructed by amplifying the *z* coordinate as (*x*, *y*, 2*z*).

A good projection for this problem, in the sense of minimizing the Lipschitz
constant of *f*, has been shown to be the *x*, *y*-plane (Broomhead & Kirby, 2005). Thus, the (*x*, *y*) data values are the inputs to our function, and the
range data values (*z*, 2*z*) constitute the
associated output of the function. The training set was selected to consist of
101 points (almost two cycles), while an additional 100 points were used for
validation. In this experiment, the testing data set was selected to be the next
500 points, or approximately nine cycles. We use the Hanning-RBF in equation 3.6 in the numerical
experiments in this section.

Figure 2 shows the output of the first RBF allocated by the multivariate algorithm. Figure 2a is the output associated with the first component of the multivariate range data, while Figure 2b is the output associated with the second component. The final model consists of four RBFs, and training was terminated when the residuals passed the i.i.d. test. The training, validation, and testing data sets and the output of the multivariate algorithm on the multivariate Pringle data set are shown in Figure 1b. The performance of the RBF fit on the multivariate Pringle data set in the RMSE sense is shown in Figure 3a. The confidence level of the fitted model on the training and validation data sets as new basis functions are added to the model is shown in Figure 3b. After four RBFs have been added to the model, the confidence level of the i.i.d. hypothesis test is well above 95% for both the training and validation data, indicating a lack of structure in the residuals.

This simple example illustrates the ideal behavior of the multivariate algorithm when the output time series are highly correlated. Two univariate models in this instance would double the model complexity. Our next example, while harder to visualize, is a more challenging test for the algorithm.

### 6.2. Multivariate Mackey-Glass.

The Mackey-Glass time series is generated with parameters *a*=0.2 and *b*=0.1, integrated numerically using the trapezoidal rule with initial conditions specified on the delay interval. The initial 1000 data points corresponding to transient behavior are discarded. The next 3000 data points are reserved for the training set; points 4001 to 5000 are used as the validation set. Finally, the test set consists of 500 data points starting from point 5001. In these experiments, uniformly distributed noise with a standard deviation of 0.05 was added to the data. We use the Arctan-Hanning sRBF in equation 3.7 in the numerical experiments in this section.

For each *n*, a forecast is made of the values *s*_{n+25} and *s*_{n+50}, that is, 25 and 50 samples ahead, using the time-delay embedding consisting of the samples *s*_{n}, *s*_{n−6}, *s*_{n−12}, and *s*_{n−18}. Hence, for each *n*, the desired mapping *f* should take the time-delayed vector to this pair of future samples:

$$f : (s_n, s_{n-6}, s_{n-12}, s_{n-18}) \mapsto (s_{n+25}, s_{n+50}).$$

In the experiments that follow, we first consider this pair of prediction horizons. These parameters are similar to those selected in Yingwei et al. (1998), where only a univariate output is considered (see also Liebert, Pawelzik, & Schuster, 1991; Farmer & Sidorowich, 1987; Kennel, Brown, & Abarbanel, 1992). The autocorrelation function of this time series goes through zero between lags 13 and 14 and is approximately −0.75 at lag 25. Given that the two time series in the range have significant correlation, we expect that fitting a two-variate model using the proposed algorithm rather than two univariate models will result in a reduced-order model. As we shall see below, this is indeed the case.

Figure 4 shows the confidence level of the
multivariate model as new Arctan-Hanning sRBFs are added to the model for the
case of 25- and 50-step-ahead prediction. We provide the confidence levels for all pairs of the time
series. In almost all cases, the confidence level of 95% was achieved for the
training set and validation set with approximately the same number of RBFs. In
general, we advocate that this confidence level be attained for both sets unless
the RMSE on the validation set increases, suggesting the possibility of
overfitting. The number of sRBFs required to achieve 95% confidence on both
the training and validation data sets is 43 for the two-variate problem. In this
case, the RMSE of the final model is 0.0175, which is below the noise floor. The
associated univariate 25-step-ahead prediction problem requires 25 sRBFs to
achieve 95% confidence on the training and validation sets and results in a
model with an RMSE of 0.0128, again below the noise floor. The univariate model
for 50-step-ahead prediction requires 28 sRBFs to reach 95% confidence on the
training and validation sets, with an RMSE below the noise floor at 0.0192. So,
in summary, for this data set, the sRBF representation of *f* using two univariate models, consisting of 53 sRBFs in total, is over 20% larger than the
two-variate model consisting of 43 sRBFs, while the errors for both
representations are under the noise floor.

Figure 5 shows both outputs of the two-variate model for the Mackey-Glass experiment. Figure 5a shows the output for the 25-step-ahead prediction, while Figure 5b provides the model output for the 50-step-ahead prediction. The log plot of the RMSE of the model as new sRBFs are added is shown in Figure 5c.

We repeated this experiment for a second pair of prediction horizons. For this case, the two-univariate model representation was over 50% larger than the single two-variate model, consisting of 74 and 49 sRBFs, respectively, with comparable errors below the noise floor. This example provides convincing evidence of the savings one may obtain by exploiting the cross-correlation structure of the residuals of the model. Details of this second example may be found in Jamshidi (2008).

## 7. Conclusion

We observed that the extension of the ACF test for i.i.d. noise to *k*-variate ranges produces models of significantly smaller order
than using *k*-univariate models while attaining comparable
prediction errors. This is a consequence of the fact that the correlations as well
as the cross-correlations of the residuals of the multivariate model are being
exploited during the model-building process. We applied the proposed algorithm to
the problem of representing data as the graph of a function as well as the benchmark
problem of forecasting the chaotic time series produced by the Mackey-Glass
equation. In each case, we saw a substantial reduction in the order of the *k*-variate model when compared to *k*-univariate
models.

The opportunity for applications of the proposed RBFs is significant: various problems in nonlinear signal processing, optimal control, computer vision, pattern recognition, and prediction, such as the financial time-series problem. In future work, we will apply these functions for representing data on manifolds as graphs of functions (Broomhead & Kirby, 2000, 2001), as well as the low-dimensional modeling of dynamical systems (Broomhead & Kirby, 2005).

The algorithm proposed here exploits the existence of cross-correlation between the output time series. Hence, we expect the advantage of this approach to be especially significant in cases where the range data possess a related geometric or statistical structure.

## Acknowledgments

This work is partially supported by NSF grants MSPA-MCS 0434351, ATM-530884, and DOD-USAF-Air Force FA-9550-08-1-0166. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or the AFOSR.

## Notes

^{1}

This theorem is particularly helpful when *n*≫2*m*_{d}+1.

^{2}

In mathematical terms, there is an open and dense set of projections that allow
one to reduce the dimension to 2*m*_{d}+1.

^{3}

In this example, we are concerned only with fitting data on the limit cycle and ignore transients.

## References

## Author notes

Arta Jamshidi is now at the Department of Electrical and Electronic Engineering, Imperial College London, London SW7 2AZ, U.K.