Abstract

We propose a method for intrinsic dimension estimation. By fitting the power of distance from an inspection point and the number of samples included inside a ball with a radius equal to the distance, to a regression model, we estimate the goodness of fit. Then, by using the maximum likelihood method, we estimate the local intrinsic dimension around the inspection point. The proposed method is shown to be comparable to conventional methods in global intrinsic dimension estimation experiments. Furthermore, we experimentally show that the proposed method outperforms a conventional local dimension estimation method.

1  Introduction

In common data analysis situations, an observed datum is expressed by a -dimensional vector . In general, the apparent data dimension and its intrinsic dimension are different. A basic assumption in many data analysis and machine learning methods is that the intrinsic dimension is low even when the apparent dimension is high and the data distribution is constrained onto a low-dimensional manifold. Examples of such methods include manifold learning (Tenenbaum, Silva, & Langford, 2000; Brand, 2002; Ma & Fu, 2011), subspace methods (Oja, 1983) and visualization and dimensionality-reduction methods (Cook & Yin, 2001; Kokiopoulou & Saad, 2007). The key to the success of dimensionality reduction, manifold learning, and latent variable analysis lies in the accurate estimation of the intrinsic dimension of the data set at hand.

Existing intrinsic dimension estimation (IDE) methods can be roughly classified into two categories: projection-based methods (Fukunaga & Olsen, 1971; Verveer & Duin, 1995; Kambhatla & Leen, 1997; Bruske & Sommer, 1998) and distance-based methods (Pettis, Bailey, Jain, & Dubes, 1979; Grassberger & Procaccia, 1983; Kégl, 2002; Levina & Bickel, 2005; Hein & Audibert, 2005; Fan, Qiao, & Zhang, 2009; Gupta & Huang, 2010; Eriksson & Crovella, 2012). The basic concept of projection-based methods is to first divide the data distribution space into cells. Then, in each cell, principal component analysis is performed on the subset of data falling into that cell, and the number of significant eigenvalues is used as an estimate of the intrinsic dimension in the cell. These methods yield the local dimension estimate in each cell, a preferable characteristic for analyzing complicated data sets. The performance of such methods is heavily dependent on how the input space is divided. Moreover, these methods lack an objective criterion for determining the number of significant eigenvalues. This approach is suitable for dialogical analysis, that is, iterative parameter tuning and investigation of the obtained results. In this letter, we focus on automatic dimension determination and do not consider a dialogical approach.

Further, we consider a class of methods based on the distance between objects. The fundamental concept underlying this class of methods is based on the fact that, in -dimensional space, the probability mass within the hyperball of radius centered at the inspection point is proportional to . Historically, the intrinsic dimension defined by the ratio of the probability mass and is called the fractal dimension (Falconer, 1990). Depending on the measure of volume element in a metric space, there are several varieties of the fractal dimension, including the capacity dimension (Hentschel & Procaccia, 1983), the information dimension (Rényi, 1959), and the correlation dimension (Grassberger & Procaccia, 1983). Most of the notions of fractal dimension are considered as special cases of the -dimension (Pesin, 1993, 1997). We note that Houle (2013) discussed general properties of fractal dimension.

It is worth noting that there is another approach for IDE based on the minimum spanning tree. Costa and Hero (2004) propose a method for estimating -divergence (Ali & Silvey, 1966; Csiszár & Shields, 2004) based on the total edge length of the minimum spanning tree of the observed objects; they also used this method to estimate the intrinsic dimension. This work is based on probability and graph theory and is rich in theoretical implications. For example, this approach has been recently applied to the efficient estimation of Fisher information and -divergence for multidimensional distributions (Sricharan & Hero, 2012; Moon & Hero, 2014). However, the number of samples required for stable estimation is large and the computational cost is high. For these, reasons, we do not consider this approach in this letter.

Existing IDE methods are classified into two categories: global and local methods. Global methods assume that all the observed data points share the same intrinsic dimension, while local methods allow different dimensions for different points. Most of the methods introduced above were originally global methods. However, Farahmand, Szepesvári, and Audibert (2007) estimated the global dimension by the average of the local dimensions, and this approach can be used as a local method.

In this letter, we propose a local IDE method based on a higher-order expansion of the probability mass and the generalized linear model(Dobson, 2002). The proposed method is formulated as a special case of the local fractal dimension. The method is similar to the one proposed in Farahmand et al. (2007), but it is shown to yield a more accurate results in our experiments.

The remainder of this letter is organized as follows. Section 2 introduces existing definitions and methods for IDE. Section 3 describes the proposed method. Section 4 evaluates the proposed method and conventional methods through numerical experiments. Finally, section 5 discusses our findings and concludes the letter.

2  Definitions of Intrinsic Dimension

In this letter, we consider the intrinsic dimension based on the distance between objects, which is called the fractal dimension or topological dimension (Falconer, 1990) and is briefly reviewed in this section.

2.1  -Dimension

The -dimension has been proposed for dealing with fractal dimensions in a unified manner (Pesin, 1993, 1997). Our method is also a variant of the -dimension; hence, we explain the -dimension and its special cases in detail. It is possible to define the notion of dimension for a general metric space, although we consider a -dimensional metric space with a metric induced from the Euclidean norm and assume that for simplicity. Let be the Borel measure in the metric space .

For , we define the average volume as
2.1
with respect to the measure . In the above definition, is a hyperball of radius centered at .
Definition 1.
For , the upper and lower -dimensions (Pesin, 1993, 1997) are defined as
2.2
and
2.3
respectively. If , their common value is denoted by and is called the -dimension.
The -dimension is not defined when the upper and lower limits do not coincide. In this letter, we assume that they take the same finite value and simply define the -dimension as
2.4
An alternative definition of the -dimension uses hypercubes of edge length to cover the support of the measure instead of the closed balls (Hentschel & Procaccia, 1983). Let be the number of hypercubes intersecting the support of and denote the natural measures of these cubes, that is, the probability that these cubes are populated, by . The number is called the covering number. Then the -dimension based on the grid of hypercubes is defined as
2.5
It is known that in the limit , equations 2.4 and 2.5 give the same value.

In the following sections, we introduce some specific definitions of dimensions derived from the -dimension as well as methods for estimating them.

2.2  Correlation Dimension

The correlation dimension (Grassberger & Procaccia, 1983) is a special case of the -dimension when . For the set of observed data , using a sufficiently small , the average volume for is defined as
2.6
which is called the correlation integral. In equation 2.6, is the Euclidean norm, and is the step function defined as
2.7
Using the average volume , the correlation dimension is defined as
2.8
Intuitively, the number of sample pairs with distance smaller than would increase in proportion to , where is the intrinsic dimension. The correlation dimension exploits this property, , to define the intrinsic dimension . Grassberger and Procaccia (1983) proposed the use of the finite sample estimates
2.9
of the average volume with two different radii, and , in order to estimate the correlation dimension, equation 2.8, as
2.10
Instead of using the step function for counting the number of sample pairs, Hein and Audibert (2005) proposed the use of a U-statistic of the form
2.11
using a kernel function with bandwidth and replaced the correlation integral by . The convergence of this U-statistic with , by an argument similar to kernel bandwidth selection (Tsybakov, 2009), requires that and . Hein and Audibert (2005) used this condition to derive a formula for estimating the global intrinsic dimension .

2.3  Capacity Dimension

By setting in the definition of -dimension 2.5, we obtain the capacity dimension (Hentschel & Procaccia, 1983) as
2.12
The capacity dimension is directly determined by the covering number ; it is also called the box counting dimension. The problem of estimating is reduced to the problem of estimating , but finding the covering number even of a finite set of data points is computationally intractable. Kégl (2002) proposed replacing of the covering number with the -packing number . Given a metric space with distance , a set is said to be -separated if for all . The -packing number is defined by the maximum cardinality of an -separated subset of the data space , and it is known that the inequalities hold. Considering the fact that the capacity dimension is defined as the limit ,
2.13
holds. Capacity dimension based on the packing number is estimated using the estimates of the packing number at two different radii, and , as
2.14
where is the estimate of the packing number. The -packing number has been estimated using a greedy algorithm in Kégl (2002) and a hierarchical clustering algorithm in Eriksson and Crovella (2012).

2.4  Information Dimension

When we consider in the -dimension, we obtain
2.15
which is called the information dimension (Rényi, 1959) because the minus sign of the Shannon entropy appears in the numerator. Estimating the information dimension is difficult because the natural measure is unknown in general, but when we assume that , holds.

The proposed method described in section 3 is based on the local information dimension. Farahmand et al. (2007) proposed the local information dimension estimator, but the relationship between the -dimension and the information dimension, but they did not investigate their proposed method in their original work. We show that their result can be regarded as the local information dimension. Further, we propose novel methods based on higher-order expansions of the probability mass function in section 3.

2.5  Other Approaches

We note that there is another line of approach to distance-based IDE besides the fractal dimension. These methods are based on the th nearest distance from inspection points (Pettis et al., 1979; Levina & Bickel, 2005; Gupta & Huang, 2010). One of the oldest methods of this type was proposed in Pettis et al. (1979), where the distribution of the distance from a data point to its -nearest point is approximated by a Poisson distribution. The estimated average of the distance includes the intrinsic dimension , and it is estimated by an iterative algorithm. Another well-known IDE method is based on the maximum likelihood method (Levina & Bickel, 2005; Gupta & Huang, 2010). The likelihood of the rate parameter of the Poisson process on is considered, and the maximum likelihood estimator (MLE) for the intrinsic dimension is derived. It is also worth noting that there is another line of studies on distance- and model-based dimension estimation. The notion of expansion rate is introduced for characterizing the maximum rate of increase of the neighborhood set sizes as the neighborhood radius is doubled (Karger & Ruhl, 2002). The logarithm of the expansion rate is used as a measure of intrinsic dimensionality, and a number of extensions are proposed for intrinsic dimension estimation (Houle, Kashima, & Nett, 2012; Amsaleg et al., 2015).

3  Proposed Method

Let be an absolutely continuous probability measure on a metric space , and let the corresponding probability density function be . Consider the problem of estimating the value of the probability density function (pdf) at a point by using a set of observations . We start with the introduction of the local information dimension estimation. Then we propose a novel local IDE method based on a higher-order expansion of the probability mass function and multiple regression.

3.1  First-Order Method: Manifold Adaptive Dimension Estimation

Let the -dimensional hyperball of radius centered at be . The probability mass of the ball is defined as
3.1
where is the uniform measure in -dimensional Euclidean space. Let
3.2
be the local volume of the ball around the inspection point . We note that equation 2.1 is the average volume of the ball of radius in the entire metric space , while equation 3.2 is the local volume around . Noting that for a nonzero value , and , we obtain
the probability mass is considered as the local information dimension. We assume that for a sufficiently small radius , the value of the pdf is approximately a constant within the ball . Under this assumption, using the Taylor series expansion of the probability mass, we obtain
where . The volume of the ball with the uniform measure is , where and is the gamma function. In this expansion, the integration is performed within the -ball; hence, is of the same order as . The term with the first derivative of the density function vanishes owing to symmetry. When we fix the number of samples falling within the ball instead of the radius , the radius is determined by the distance between the inspection point and its th nearest neighbor. In this letter, denotes the radius determined by . When we fix the radius , the number of samples falling within the -ball centered at an inspection point is determined, and it is denoted by . In Farahmand et al. (2007), is approximated by the ratio of and the sample size as
3.3
Then, for different radii , taking the logarithm of the above approximation formula gives
Solving this system of equations with respect to the dimension yields the estimate of the local information dimension:
3.4
Intuitively, as shown in Figure 1, even when the dimensionality of the ambient space is two, the spread of data is one-dimensional and the increase in the number of samples included in with respect to the increase in is close to that of one-dimensional Euclidean space; hence, the intrinsic dimension is estimated as one. Since the convergence rate of this estimator is independent of the ambient dimension but depends on the intrinsic dimension, is called the manifold adaptive dimension estimator in Farahmand et al. (2007). This estimator is simple and easy to implement. It also provides a finite sample error bound. However, in our experiments, it does not show better performance than the other methods.
Figure 1:

Intrinsic dimension and assumption of local uniformity.

Figure 1:

Intrinsic dimension and assumption of local uniformity.

3.2  Second-Order Method: GLM for Dimension Estimation

In this letter, we propose accurate local IDE methods based on a higher-order expansion of the probability mass function. By using the second-order Taylor series expansion for , we obtain the following proposition:

Proposition 1.
The probability mass of the -ball centered at is expressed in the form
3.5
(See the appendix for the proof of proposition 2.) By approximating the probability mass empirically with the ratio (i.e., the number of samples falling into the -ball to the whole sample size), we obtain the relationship
by ignoring the higher-order term with respect to . Furthermore, by multiplying both sides of each equation by and letting the coefficients of and be and , respectively, we obtain
3.6
For estimating the intrinsic dimension using the second-order Taylor series expansion, we fit a generalized linear model (GLM; Dobson, 2002) to equation 3.6, which can express the counting nature of the left-hand side of that equation.
Let the intrinsic dimension at the inspection point be . We denote realizations of a vector-valued random variable , which is composed of and , where is the distance from the inspection point, by
3.7
We also introduce realizations of a random variable , which is the number of samples included in the ball . Specifically, we consider a pair of random variables and fix a radius corresponding to a trial that results in realizations . Since the realization is the number of samples within the -ball, we assume that the error structure of is a Poisson distribution. Then we can formulate the relationship between the distance from the inspection point and the number of samples falling within the -ball by a GLM with a Poisson error structure and linear link function as
3.8
A set of different radii is denoted as , that is, , and maximize the log likelihood of the Poisson distribution with observation with respect to the coefficient vector . In this work, we simply consider the th nearest neighbor with and let the Euclidean distance from the inspection point to its th nearest point be . Then we uniformly sample radii from a uniform distribution in . A theoretical investigation on the choice of and optimal selection of the radii will be performed in future work. Let the observation vector and design matrix, which are composed of realizations of and , be
3.9
3.10
We consider a GLM (Dobson, 2002) with the linear predictor and identity link
3.11
and we estimate by maximizing the log-likelihood function of the Poisson distribution:
3.12
3.13
The proposed IDE methods assume that the intrinsic dimension is , and the optimal IDE is estimated on the basis of the goodness of fit of the data to the regression model, equation 3.6. We use the log likelihood, equation 3.13, to measure the goodness of fit. Further, we note that the number of parameters is always two, even when we change the assumed IDE : hence, the problem of overfitting by maximizing the likelihood is avoided in our setting.

3.3  Method 1: Discrete Dimension Estimation

Let the ambient dimension of the observation be . We assume that the intrinsic dimension is , and for every , we fit the regression model, equation 3.6, by maximizing the log likelihood, equation 3.13, with respect to and employ the dimension that maximizes the likelihood
3.14
The proposed method is summarized in algorithm 1. This method returns a natural number as the intrinsic dimension.

3.4  Method 2: Continuous Dimension Estimation

The log likelihood, equation 3.13, is a function of both the regression coefficients and the intrinsic dimension . Given a set of observations, we consider directly maximizing the log-likelihood function with respect to . Since it is difficult to obtain a closed-form solution for the maximizer of the likelihood, equation 3.13, we numerically maximize the likelihood. The derivatives of the likelihood function with each of variables are
3.15
3.16
3.17
Using these derivatives, we obtain the estimate,
3.18
by using the quasi-Newton method (the BFGS method) for maximizing the likelihood. The initial value for the method is set to the one obtained by proposed method 1. We note that this method returns a real number as the intrinsic dimension ; hence, when we expect a noninteger dimension, this method is more suitable than the proposed method 1. The proposed method 2 is summarized in algorithm 2.

3.5  Method 3: Continuous Dimension Estimation by EM Algorithm

Maximum likelihood estimate is, under mild regularity conditions, known to be asymptotically unbiased and asymptotically consistent, and it holds asymptotic normality. However, when we consider as parameters simultaneously in our model, equation 3.13, we cannot apply standard theory of maximum likelihood estimation. For example, the Hessian matrix of the likelihood function is
3.19
The second row of is equal to the first row multiplied by . The Hessian is degenerate, and the asymptotic variance of the estimates cannot be calculated.
We derive another algorithm that has asymptotic normality and theoretically estimable finite sample bias based on the EM algorithm (Dempster, Laird, & Rubin, 1977) by treating as a missing variable and as the target parameter to be estimated. Let be the complete data composed of observed data and missing data , respectively. The probability model for is
3.20
The EM algorithm is composed of E- and M-steps, and it iteratively maximizes the likelihood function with respect to a parameter, which is the intrinsic dimension in our case. In the E-step of the EM algorithm, the conditional expectation of complete data log likelihood given the observation and the th temporary value ,
3.21
is evaluated. In expression 3.21, the conditional distribution of is not known; hence, we approximate the distribution by the point mass function concentrated at the maximizer of the likelihood function given current estimate :
3.22
We note that the procedure to obtain is nothing but the maximum likelihood estimate of the Poisson model with fixed . Then the MAP estimate of the function is written as
3.23
In the M-step of the algorithm, we maximize with respect to and obtain the updated estimate of the intrinsic dimension:
3.24
This univariate maximization problem is easily solved by an arbitrary optimization routine. In our implementation, we use the golden section search method. Optimization problems 3.22 and 3.24 are iterated until the gap between and is over prespecified threshold , which was set to 0.01 in our experiments. The proposed EM algorithm for IDE is summarized in algorithm 3.

In the following, we show some theoretical results on algorithm 3.

3.5.1  Convexity

In the E-step of the algorithm, we consider as a constant. The likelihood function with respect to is shown to be convex by calculating the Hessian of the likelihood function:
3.25
3.26
Second derivatives are given by
3.27
3.28
3.29
Then the minus of the Hessian of the likelihood function is given by
which is nonnegative due to the Cauchy-Schwarz inequality. The E-step is shown to be able to find a global optimal hidden variable in each iteration.
We next consider the convexity of the likelihood function in the M-step. The second derivative of the likelihood function with respect to parameter is
3.30
It is obvious that , . In the expression , since is the estimate of , it is assumed to be nonnegative. As for , it can take a negative value when the trace of the Hessian of the pdf is negative. However, assuming that and , it is reasonable to assume . Under this assumption, , and the likelihood function is convex with respect to ; hence, the E-step results in a global optimal estimate of when is fixed.

3.5.2  Asymptotic Variance

The EM algorithm is an algorithm for maximum likelihood estimation. The maximum likelihood estimator has asymptotic normality under standard regularity conditions. Our proposed algorithm 3 also has asymptotic normality of the form
3.31
where is the true intrinsic dimension, is estimated intrinsic dimension by algorithm 3, and is the asymptotic variance. The EM algorithm does not generate an estimate of the asymptotic variance of the estimate as its by-product. However, a number of approximation methods have been proposed. Meng and Rubin (1991) derived the following formula,
3.32
3.33
where is the conditional expected value of the information of the complete data and is the convergence rate of the EM algorithm. In our model, the information is defined by
3.34
where is the estimated parameters by the EM algorithm. It is empirically estimated as
3.35
Combined with the estimated convergence rate, we can evaluate the asymptotic variance of the estimate of intrinsic dimension by the proposed method.

3.5.3  Finite-Sample Bias

The maximum likelihood estimator is in general not unbiased in finite-sample situations. We derive a finite-sample bias of the intrinsic dimension estimator following the method established inRilstone, Srivastava, and Ullah, (1996). From proposition 3.2 of Rilstone et al. (1996), under mild regularity conditions, the bias of the estimate is given by
3.36
where
3.37
is the score function and
3.38
The expectation is taken with respect to the distribution of radius . Substituting , and in equation 3.36, we obtain an explicit formula of the finite-sample bias of the estimate of the intrinsic dimension:
3.39
From equation 3.39, it is seen that the bias of the estimate decreases as the number of epsilons increases, as expected.

3.6  Computational Cost

We conclude this section with estimating the computational costs of the proposed algorithms. For different radii, we calculate by comparing each of sampled to the distances from the inspection point to observed data points and count the number of samples within epsilons. To obtain , we do not have to perform full sorting of distances but only need top selection. For this purpose, we can use partial select algorithm (Hoare, 1961), with computational complexity . Counting requires computation. To maximize the likelihood function, we used the BFGS method, whose computational cost is of order , where is the number of iterations and is the number of variables. is typically below 10, and for algorithms 1 and 3 and for algorithm 2; hence, the computational cost for BFGS is considered to be a constant in our problem. Algorithm 1 maximizes the likelihood function at most times, where is the ambient dimensionality; hence, the computational cost of algorithm 1 is of order . Algorithm 2 utilizes the result of algorithm 1 as an initial estimate. Since the cost for the BFGS algorithm can be regarded as a constant, the order of the computational cost for algorithm 2 remains the same as that of algorithm 1. Algorithm 3 also utilizes the result of algorithm 1, and the computational cost of the following likelihood maximizations with respect to and to are also considered as constants. However, the likelihood maximization is iteratively done by the EM algorithm; hence, the computational cost for algorithm 3 is of order , where is the number of iterations for the convergence of the EM steps in algorithm 3. These computational costs are for estimating a local intrinsic dimension around an inspection point. To obtain a global dimension estimate, the computational costs of our algorithms are for algorithms 1 and 2 and for algorithm 3.

4  Experimental Results

We present some experimental results to illustrate the capability of the proposed method in determining the intrinsic dimension of manifolds. We note that different methods for IDE capture different aspects of data sets. In particular, we consider several IDE methods based on fractal dimensions, and each method is based on a different definition of fractal dimensions. Our aim is to see that our proposed methods offer reasonable estimates when applied to the dimension estimation task of representative data sets in the literature and consistent with conventional intrinsic dimension estimation methods. First, we consider several synthetic data sets with ground-truth intrinsic dimensions in order to show that the proposed method is comparable or superior to conventional methods. It is also shown that as a local intrinsic dimension estimation method, the proposed method is better than the existing method. Then the proposed and conventional methods are evaluated on two image data sets, and it is shown that the proposed method returns reasonable estimates.

We compare the three proposed IDE methods, prop1, prop2, and prop3, with seven conventional IDE methods. The conventional methods based on the fractal dimension include the correlation dimension estimation “corint” in equation 2.10 by Grassberger and Procaccia (1983), a method convU, which is based on the convergence rate of the correlation integral with a -statistic by Hein and Audibert (2005), and the capacity dimension estimators with packing number estimation by a greedy algorithm, packG (Kégl, 2002), and by hierarchical clustering, packT (Eriksson & Crovella, 2012). The manifold adaptive local information dimension estimator “mada” is also included; it is the most similar method to the proposed methods. Other approaches based on the distance distribution from inspection points are the iterative method “ite” proposed in Pettis et al. (1979) and the corrected “mle” proposed in Levina and Bickel (2005) and corrected by MacKay and Ghahramani.1 Most of these methods require two different neighbors: . As with the -nearest neighbor density estimation, must be such that and as . It is also known that through the argument of the minimum least square error, the asymptotically optimal order of is  (Biau, Chazal, Cohen-Steiner, Devroye, & Rodriguez, 2011). The optimal relationship between and is not known. In our preliminary experiments, setting , offered favorable results in many cases: hence, we adopted this parameter setting in our experiments. As for the manifold adaptive dimension estimation, the number of neighbors is set to , which is the recommended value in the original paper (Farahmand et al., 2007).

The local IDEs can estimate the global IDE by adopting the average, median, or mode as the global estimate. When we compare mada and the proposed methods with the global methods, we adopt the median of the local estimates as the estimate of the global IDE.

4.1  Estimation of Global Intrinsic Dimensions

Most examples in this section are two-dimensional manifolds embedded in a three-dimensional space because they enable us to easily understand how the data are embedded and how the applied methods work on them. We enumerate seven data sets for global IDE experiments.

• Spiral (), a one-dimensional curve in a two-dimensional space, with the parametric equations
4.1
where is uniformly sampled from the interval .
• Trefoil knot (), a compact one-dimensional manifold embedded in a three-dimensional space. The parametric equations are ,
4.2
4.3
4.4
where is uniformly sampled from the interval . This is a well-known example of a nontrivial knot in the literature on topology.
• Torus (), a compact two-dimensional manifold embedded in a three-dimensional space. The parametric equations are
4.5
where is uniformly sampled from the interval .
• Swiss roll (), a two-dimensional manifold embedded in a three-dimensional space, with parametric equations
4.6
where is uniformly sampled from the interval .
• Nondeformable Swiss roll (),
4.7
where is uniformly distributed in the interval .
• Möbius strip (), a two-dimensional manifold embedded in a three-dimensional space, with parametric equations
4.8
where is uniformly sampled from the interval .
• Spherical shell (), a compact -dimensional manifold embedded in a -dimensional space. When , its parametric equations are
4.9
We used this data set for observing the effect of dimensionality on the accuracy of the intrinsic dimension estimation methods by varying from three to nine by two.

The realizations of each data set are shown in Figures 2a to 2g.

Figure 2:

Samples of data sets.

Figure 2:

Samples of data sets.

To evaluate estimation performance, we repeated random sampling for generating these data sets 50 times and estimated the IDEs using nine methods. The number of observations for each data set was set to , or 3000. In addition, it is interesting to investigate the effects of errors and outliers on the observations. For each of the observations , we put an additive noise with zero mean and standard deviation equal to 1% of the range of the observation to generate the data sets with noise. We also generated the data sets with outliers by replacing 5% of the observations by the realizations of the uniform distribution with the domain equal to the range of the observation . The box plots of the estimated dimensions are shown in Figures 3 to 9.

Figure 3:

Estimated dimensions for Spiral data.

Figure 3:

Estimated dimensions for Spiral data.

Figure 4:

Estimated dimensions for Trefoil knot data.

Figure 4:

Estimated dimensions for Trefoil knot data.

Figure 5:

Estimated dimensions for Torus data.

Figure 5:

Estimated dimensions for Torus data.

Figure 6:

Estimated dimensions for Swiss roll data.

Figure 6:

Estimated dimensions for Swiss roll data.

Figure 7:

Estimated dimensions for Nondeformable Swiss roll data.

Figure 7:

Estimated dimensions for Nondeformable Swiss roll data.

Figure 8:

Estimated dimensions for Möbius strip data.

Figure 8:

Estimated dimensions for Möbius strip data.

Figure 9:

Estimated dimensions for Spherical shell data ().

Figure 9:

Estimated dimensions for Spherical shell data ().

These results show that the three proposed methods are comparable to or superior to conventional IDE methods.

Since the additive noise would affect the dimensionality (i.e., when the noise is perpendicular to the tangent of the manifold), it is not clear whether the intrinsic dimension of the data set should be regarded as the one without noise or it should be increased. It is reasonable that proposed method 1, which outputs only natural numbers as estimates of the IDE, is robust against noise, while the other methods, including proposed methods 2 and 3, tend to overestimate the IDE when there is additive noise. Also, we can see that although it is possible to derive some theoretical properties and guarantees for proposed algorithm 3, in the sense of the performance on dimension estimation tasks, algorithms 1 or 2 would be the choice.

4.1.1  Computational Cost

Overall, the estimation accuracy of the proposed methods are superior to other conventional methods in many data sets, and our methods show robustness to noise and outliers. Of course, the accuracy of the proposed methods cannot be obtained without cost. We measured the computational costs of dimension estimation methods for the Swiss roll data set, where the number of observations varied from 1000 to 3000 by 500. Data sets of size are repeatedly generated 100 times. The results are shown in Figure 10.

Figure 10:

Computational costs and number of observations.

Figure 10:

Computational costs and number of observations.

The computational cost of algorithm 3 (prop.3) is more than double those of other methods, and its accuracy is not significantly better than those of algorithms 1 and 2 (prop.1 and prop.2). Although algorithm 3 allows us theoretical analysis, we recommend algorithms 1 or 2 in practical use. The computational costs of our proposed methods grow almost quadratically, as we expected from the theoretical evaluation in section 3.6. Among various methods, mle and ite are computationally very efficient. The computational cost of our proposed methods is higher than most of the other methods, excluding packT. Therefore, our proposed methods offer accurate estimation results, though they should be used when we have enough time.

4.1.2  Effect of Dimension

For the Spherical shell data set, we ran another experiment in which the number of observations was fixed as and the ambient dimensionality was varied from to at intervals of two. The experimental results are shown in Figure 11. The ground-truth IDEs are shown at the top of the panels. These results show that when the intrinsic dimension is low, most of the methods work well; however, the estimates have downward biases when the intrinsic dimension increases. It is known that distance-based dimension estimators tend to underestimate the true dimension and that noise may pollute the estimation. There are a few approaches to alleviate this bias (Camastra & Vinciarelli, 2002; Rozza, Lombardi, Ceruti, Casiraghi, & Campadelli, 2012), and considering the bias correction in our proposed methods is an important direction of our future work.

Figure 11:

Spherical shell data with increasing dimensionalities and their estimates. The ground-truth intrinsic dimension is shown at the top of each panel.

Figure 11:

Spherical shell data with increasing dimensionalities and their estimates. The ground-truth intrinsic dimension is shown at the top of each panel.

4.2  Estimation of Local Intrinsic Dimensions

The manifold adaptive dimension estimation method mada and the three proposed methods, prop1, prop2, and prop3, are capable of estimating local intrinsic dimension. To evaluate their performance as local IDE methods, we generated a data set with different intrinsic dimensions for different points and evaluated their point-wise estimation performance. The data set is shown in Figure 2h, where a line (), a disc (), a filled ball (), and another line () are stacked along the vertical axis in a three-dimensional space. Data points are uniformly distributed within each region. The proportions of the number of points for the four parts of the data set are 5%, 20%, 70%, and 5%, respectively. The total number of data sets was set to , or 5000. Data sets of size were randomly generated 50 times. The average and standard deviation of the root mean square errors of the estimated dimensions for the four different regions are listed in Table 1. From this table, it can be seen that the proposed methods, particularly the continuous method, prop2, outperforms the conventional method, mada. This is also observed in Figure 12, which shows the averages of the estimated dimensions along the vertical axis of the data set in Figure 2h.

Figure 12:

Averages of 50 trials for local IDE by the manifold adaptive method and proposed methods for discrete and continuous local IDE. The sample size is . The vertical axis corresponds to that of the data set line-disc-ball-line, and the horizontal axis corresponds to the estimated dimensions. The red dotted lines represent one, two, and three dimensions. The proposed methods offer accurate and stable estimates compared to the conventional method.

Figure 12:

Averages of 50 trials for local IDE by the manifold adaptive method and proposed methods for discrete and continuous local IDE. The sample size is . The vertical axis corresponds to that of the data set line-disc-ball-line, and the horizontal axis corresponds to the estimated dimensions. The red dotted lines represent one, two, and three dimensions. The proposed methods offer accurate and stable estimates compared to the conventional method.

Table 1:
Root Mean Square Errors of the IDEs for the Joints of Manifolds with Different Dimensions.
LineDiscBallLine
Sample Size100030005000100030005000100030005000100030005000
mada 0.48(0.13) 0.46(0.11) 0.40(0.11) 1.27(0.64) 1.26(1.08) 1.13(1.02) 1.29(0.32) 1.34(0.51) 1.15(0.36) 0.53(0.20) 0.52(0.20) 0.39(0.08)
prop1 0.06(0.13) 0.05(0.08) 0.06(0.12) 0.80(0.13) 0.66(0.11) 0.63(0.11) 0.99(0.10) 1.05(0.10) 1.08(0.10) 0.15(0.23) 0.13(0.22) 0.07(0.13)
prop2 0.08(0.11) 0.08(0.03) 0.10(0.07) 0.74 (0.12) 0.58(0.11) 0.50(0.11) 0.44(0.06) 0.39(0.04) 0.38(0.04) 0.17(0.06) 0.11(0.06) 0.09(0.03)
prop3 0.08(0.12) 0.10(0.04) 0.12(0.08) 0.76 (0.13) 0.62(0.11) 0.60(0.11) 0.97(0.10) 1.04(0.10) 1.06(0.10) 0.17(0.20) 0.17(0.19) 0.12(0.10)
LineDiscBallLine
Sample Size100030005000100030005000100030005000100030005000
mada 0.48(0.13) 0.46(0.11) 0.40(0.11) 1.27(0.64) 1.26(1.08) 1.13(1.02) 1.29(0.32) 1.34(0.51) 1.15(0.36) 0.53(0.20) 0.52(0.20) 0.39(0.08)
prop1 0.06(0.13) 0.05(0.08) 0.06(0.12) 0.80(0.13) 0.66(0.11) 0.63(0.11) 0.99(0.10) 1.05(0.10) 1.08(0.10) 0.15(0.23) 0.13(0.22) 0.07(0.13)
prop2 0.08(0.11) 0.08(0.03) 0.10(0.07) 0.74 (0.12) 0.58(0.11) 0.50(0.11) 0.44(0.06) 0.39(0.04) 0.38(0.04) 0.17(0.06) 0.11(0.06) 0.09(0.03)
prop3 0.08(0.12) 0.10(0.04) 0.12(0.08) 0.76 (0.13) 0.62(0.11) 0.60(0.11) 0.97(0.10) 1.04(0.10) 1.06(0.10) 0.17(0.20) 0.17(0.19) 0.12(0.10)

Note: The average and one standard deviation of errors over 50 trials are shown.

4.3  Image Data Sets

Finally, we compared the IDE methods on two image data sets: the Isomap face database2 and the CMU hand rotation sequence3 used in Kégl (2002) and Levina and Bickel (2005).

The face database consists of images of an artificial face under three varying conditions: illumination, and vertical and horizontal orientations. The hand image data constitute a real video sequence of a hand rotating along a one-dimensional curve in space; hence, its intrinsic dimension is one.

The estimated dimensions are listed in Table 2. For the face data set, most methods estimate that the intrinsic dimension is around three, which is consistent with our intuition that there are three degree of freedom corresponding to illumination, and vertical and horizontal orientations. For the hand rotation data, the intrinsic dimension should be one, and our proposed methods successfully estimate the intrinsic dimension as one. We note that mle results in 2.88 and corint, convU, packG, and ite result in about two. Kégl (2002) showed that their proposed method estimates intrinsic dimensions of this hand data between one to four depending on the tuning parameter, while Levina and Bickel (2005) argued that for representing image, we need three bases; hence the estimated intrinsic dimension around three is reasonable. There are no ground-truth intrinsic dimensions in these data sets, but all of the methods estimated between one and three as the intrinsic dimension, and we can conclude that as a consensus of all methods, the intrinsic complexity of these data sets is relatively low despite their apparent high dimensionalities. Among many methods, our algorithm 2 estimates the intrinsic dimension, which coincides with our intuition on these data sets.

Table 2:
Estimated Dimensions for Popular Image Data Sets for Manifold Learning.
Faces  698 4.43 2.53 3.00 3.77 2.12 2.65 2.49 3.00 2.00 2.89
Hands  481 3.58 3.76 3.00 4.60 3.17 4.07 3.37 4.00 1.21 4.02
Faces  698 4.43 2.53 3.00 3.77 2.12 2.65 2.49 3.00 2.00 2.89
Hands  481 3.58 3.76 3.00 4.60 3.17 4.07 3.37 4.00 1.21 4.02

5  Conclusion

We proposed local intrinsic dimension estimation methods based on the estimation of a local information dimension, which is a special case of the fractal dimension. In contrast to conventional methods, the proposed methods are based on a higher-order expansion of the probability mass function around a data point. For estimating the intrinsic dimension, a generalized linear model in which the response variable follows a Poisson distribution is fitted by the maximum likelihood method with respect to both the intrinsic dimension and the regression coefficients. The proposed methods offered estimation accuracies that are comparable to or better than those of conventional methods over various data sets for global IDE. For local IDE, the proposed methods outperformed the conventional method to a considerable extent.

As with most conventional methods, the proposed methods need to determine the parameters for specifying the neighborhood. Directions for future work include the development of a method to determine parameters based on the asymptotic theory of density estimation and the generalized linear model and more detailed theoretical error analysis of the proposed methods.

The IDE plays the essential role of preprocessing in manifold learning methods (Ma and Fu, 2011), dimensionality reduction methods (Lee & Verleysen, 2007), and subspace methods (Oja, 1983); hence, it is important to investigate the effectiveness of the proposed methods when used with the above-mentioned methods.

Appendix: Second-Order Expansion of Probability Mass

The probability mass contained within the -ball centered at inspection point is defined as
A.1
Using a second-order Taylor series expansion of , we obtain
A.2
where . The integrand of the quadratic term is written as . By symmetry, the off-diagonal elements of the quadratic term become zero when integrated in the ball, and only diagonal elements remain. The remaining integral is
Substituting the above results back to equation A.2, we obtain
which proves the proposition.

Acknowledgments

We express our special thanks to the editor and reviewers whose comments led to valuable improvements of this letter. This work was partly supported by JSPS KAKENHI No. 25120009, 25120011, 16H02842, and 16K16108.

References

Ali
,
S. M.
, &
Silvey
,
S. D.
(
1966
).
A general class of coefficients of divergence of one distribution from another
.
Journal of the Royal Statistical Society. Series B (Methodological)
,
28
,
131
142
.
Amsaleg
,
L.
,
Chelly
,
O.
,
Furon
,
T.
,
Girard
,
S.
,
Houle
,
M. E.
,
Kawarabayashi
,
K.
, &
Nett
,
M.
(
2015
).
Estimating local intrinsic dimensionality
. In
Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
(pp.
29
38
),
New York
:
ACM
.
Biau
,
G.
,
Chazal
,
F.
,
Cohen-Steiner
,
D.
,
Devroye
,
L.
, &
Rodriguez
,
C.
(
2011
).
A weighted -nearest neighbor density estimate for geometric inference
.
Electronic Journal of Statistics
,
5
,
204
237
.
Brand
,
M.
(
2002
).
Charting a manifold
. In
S.
Becker
,
S.
Thrun
, &
K.
Obermayer
(Eds.),
Advances in neural information processing systems, 15
(pp.
961
968
).
Cambridge, MA
:
MIT Press
.
Bruske
,
J.
, &
Sommer
,
G.
(
1998
).
Intrinsic dimensionality estimation with optimally topology preserving maps
.
IEEE Transactions on Pattern Analysis and Machine Intelligence
,
20
(
5
),
572
575
.
Camastra
,
F.
, &
Vinciarelli
,
A.
(
2002
).
Estimating the intrinsic dimension of data with a fractal-based method
.
IEEE Transactions on Pattern Analysis and Machine Intelligence
,
24
(
10
),
1404
1407
.
Cook
,
R. D.
, &
Yin
,
X.
(
2001
).
Dimension reduction and visualization in discriminant analysis (with discussion)
.
Australian and New Zealand Journal of Statistics
,
43
(
2
),
147
199
.
Costa
,
J. A.
, &
Hero
,
A. O. III
(
2004
).
Geodesic entropic graphs for dimension and entropy estimation in manifold learning
.
IEEE Transactions on Signal Processing
,
52
,
2210
2221
.
Csiszár
,
I.
, &
Shields
,
P. C.
(
2004
).
Information theory and statistics: A tutorial
.
Communication of Information Theory
,
1
(
4
),
417
528
.
Dempster
,
A. P.
,
Laird
,
N. M.
, &
Rubin
,
D. B.
(
1977
).
Maximum likelihood from incomplete data via the EM algorithm
.
Journal of the Royal Statistical Society, Series B
,
39
,
1
38
.
Dobson
,
A. J.
(
2002
).
An introduction to generalized linear models
.
Boca Raton, FL
:
Chapman & Hall/CRC
.
Eriksson
,
B.
, &
Crovella
,
M.
(
2012
).
Estimating intrinsic dimension via clustering
. In
Proceedings of IEEE Workshop on Statistical Signal Processing
(pp.
760
763
).
Piscataway, NJ
:
IEEE
.
Falconer
,
K. J.
(
1990
).
Fractal geometry: Mathematical foundations and applications
.
New York
:
Wiley
.
Fan
,
M.
,
Qiao
,
H.
, &
Zhang
,
B.
(
2009
).
Intrinsic dimension estimation of manifolds by incising balls
.
Pattern Recognition
,
42
(
5
),
780
787
.
Farahmand
,
A. M.
,
Szepesvári
,
C.
, &
Audibert
,
J-Y.
(
2007
).
. In
Z.
Ghahramani
(Ed.).
Proceedings of the Twenty-Fourth International Conference on Machine Learning
(pp.
265
272
).
Corvallis, OR
:
Omnipress
.
Fukunaga
,
K.
, &
Olsen
,
D. R.
(
1971
).
An algorithm for finding intrinsic dimensionality of data
.
IEEE Transaction on Computers
,
20
(
2
),
176
183
.
Grassberger
,
P.
, &
Procaccia
,
I.
(
1983
).
Measuring the strangeness of strange attractors
.
Physica D: Nonlinear Phenomena
,
9
(
1–2
),
189
208
.
Gupta
,
M. D.
, &
Huang
,
T. S.
(
2010
).
Regularized maximum likelihood for intrinsic dimension estimation
. In
P.
Grunwald
, &
P.
Spirtes
(Eds.),
Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence
(pp.
220
227
).
Corvallis, OR
:
AUAI Press
.
Hein
,
M.
, &
Audibert
,
J-Y.
(
2005
).
Intrinsic dimensionality estimation of submanifolds in R
. In
Proceedings of the Twenty-Second International Conference on Machine Learning
(pp.
289
296
).
New York
:
ACM
.
Hentschel
,
H. G. E.
, &
Procaccia
,
I.
(
1983
).
The infinite number of generalized dimensions of fractals and strange attractors
.
Physica D: Nonlinear Phenomena
,
8
(
3
),
435
444
.
Hoare
,
C. A. R.
(
1961
).
Algorithm 64: Quicksort
.
Commun. ACM
,
4
(
7
),
321
.
Houle
,
M. E.
(
2013
).
Dimensionality, discriminability, density and distance distributions
. In
Proceedings of the 13th IEEE International Conference on Data Mining Workshops
(pp.
468
473
).
:
SIAM
.
Houle
,
M. E.
,
Kashima
,
H.
, &
Nett
,
M.
(
2012
)
Generalized expansion dimension
. In
Proceedings of the 13th IEEE International Conference on Data Mining Workshops
(pp.
587
594
).
:
SIAM
.
Kambhatla
,
N.
, &
Leen
,
T. K.
(
1997
).
Dimension reduction by local principal component analysis
.
Neural Computation
,
9
(
7
),
1493
1516
.
Karger
,
D. R.
, &
Ruhl
,
M.
(
2002
).
Finding nearest neighbors in growth-restricted metrics
. In
Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing
(pp.
741
750
).
New York
:
ACM
.
Kégl
,
B.
(
2002
).
Intrinsic dimension estimation using packing numbers
. In
S. Becker S.
Thrun
, &
K.
Obermayer
(Eds.),
Advances in neural information processing systems, 15
(pp.
681
688
).
Cambridge, MA
:
MIT Press
.
Kokiopoulou
,
E.
, &
,
Y.
(
2007
).
Orthogonal neighborhood preserving projections: A projection-based dimensionality reduction technique
.
IEEE Transactions on Pattern Analysis and Machine Intelligence
,
29
(
12
),
2143
2156
.
Lee
,
J. A.
, &
Verleysen
,
M.
(
2007
).
Nonlinear dimensionality reduction
.
New York
:
Springer
.
Levina
,
E.
, &
Bickel
,
P. J.
(
2005
).
Maximum likelihood estimation of intrinsic dimension
. In
L. K.
Saul
,
Y.
Weiss
, &
L.
Bottou
(Eds.),
Advances in neural information processing systems, 17
(pp.
777
784
).
Cambridge, MA
:
MIT Press
.
Ma
,
Y.
, &
Fu
,
Y.
(
2011
).
Manifold learning theory and applications
.
Boca Raton, FL
:
CRC Press
.
Meng
,
X-L.
, &
Rubin
,
D. B.
(
1991
).
Using EM to obtain asymptotic variance-covariance matrices: The SEM algorithm
.
Journal of the American Statistical Association
,
86
(
416
),
899
909
.
Moon
,
K. R.
, &
Hero
,
A. O.
(
2014
).
Multivariate f-divergence estimation with confidence
. In
Z.
Ghahramani
,
M.
Welling
,
C.
Cortes
,
N. D.
Lawrence
, &
K. Q.
Weinberger
(Eds.),
Advances in neural information processing systems, 27
(pp.
2420
2428
).
Red Hook, NY
:
Curran
.
Oja
,
E.
(
1983
).
Subspace methods of pattern recognition
.
Hertfordshire, England
:
Research Studies Press
.
Pesin
,
Y. B.
(
1993
).
On rigorous mathematical definitions of correlation dimension and generalized spectrum for dimensions
.
Journal of Statistical Physics
,
71
(
3
),
529
547
.
Pesin
,
Y. B.
(
1997
).
Dimension theory in dynamical systems: Contemporary views and applications
.
Chicago
:
University of Chicago Press
.
Pettis
,
K. W.
,
Bailey
,
T. A.
,
Jain
,
A. K.
, &
Dubes
,
R. C.
(
1979
).
An intrinsic dimensionality estimator from near-neighbor information
.
IEEE Transactions on Pattern Analysis and Machine Intelligence
,
1
(
1
),
25
37
.
Rényi
,
A.
(
1959
).
On the dimension and entropy of probability distributions
.
,
10
(
1
),
193
215
.
Rilstone
,
P.
,
Srivastava
,
V. K.
, &
Ullah
,
A.
(
1996
).
The second-order bias and mean squared error of nonlinear estimators
.
Journal of Econometrics
,
75
(
2
),
369
395
.
Rozza
,
A.
,
Lombardi
,
G.
,
Ceruti
,
C.
,
Casiraghi
,
E.
, &
,
P.
(
2012
).
Novel high intrinsic dimensionality estimators
.
Machine Learning
,
89
(
1
),
37
65
.
Sricharan
,
K.
, &
Hero
,
A. O.
(
2012
).
Ensemble weighted kernel estimators for multivariate entropy estimation
. In
F.
Pereira
,
C.J.C.
Burges
,
L.
Bottou
, &
K. Q.
Weinberger
(Eds.).
Advances in neural information processing systems
,
25
(pp.
566
574
).
Red Hook, NY
:
Curran
.
Tenenbaum
,
J. B.
,
Silva
,
V.
, &
Langford
,
J. C.
(
2000
).
A global geometric framework for nonlinear dimensionality reduction
.
Science
,
290
(
5500
),
2319
2323
.
Tsybakov
,
A. B.
(
2009
).
Introduction to nonparametric estimation
.
New York
:
Springer
.
Verveer
,
P. J.
, &
Duin
,
R. P. W.
(
1995
).
An evaluation of intrinsic dimensionality estimators
.
IEEE Transactions on Pattern Analysis and Machine Intelligence
,
17
(
1
),
81
86
.