Hideitsu Hino
Journal Articles
Neural Computation (2023) 35 (1): 82–103.
Published: 01 January 2023
Abstract
We propose a nonlinear probabilistic generative model of Koopman mode decomposition based on an unsupervised gaussian process. Existing data-driven methods have focused on estimating the quantities specified by Koopman mode decomposition: eigenvalues, eigenfunctions, and modes. Our model enables the simultaneous estimation of these quantities and of latent variables governed by an unknown dynamical system. Furthermore, we introduce an efficient strategy for estimating the parameters of our model through low-rank approximations of covariance matrices. Applying the proposed model to both synthetic data and a real-world epidemiological data set, we show that various analyses are available using the estimated parameters.
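For context on the quantities named above, here is a minimal sketch of exact dynamic mode decomposition (DMD), the standard deterministic point estimate of Koopman eigenvalues and modes. This is not the paper's gaussian process model; the linear toy system and the rank r are illustrative assumptions.

```python
# Exact DMD: estimate Koopman eigenvalues and modes from snapshot pairs.
import numpy as np

def dmd(X, Y, r):
    """Columns of X, Y are snapshot pairs (x_t, x_{t+1}); returns
    approximate Koopman eigenvalues and modes of rank r."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :r], s[:r], Vh[:r]
    A_tilde = U.conj().T @ Y @ Vh.conj().T / s       # projected linear operator
    eigvals, W = np.linalg.eig(A_tilde)              # Koopman eigenvalues
    modes = Y @ Vh.conj().T @ np.diag(1.0 / s) @ W   # exact DMD modes
    return eigvals, modes

# toy linear system x_{t+1} = A x_t (an assumption for illustration)
rng = np.random.default_rng(0)
A = np.array([[0.9, -0.2], [0.2, 0.9]])
X = np.empty((2, 50)); X[:, 0] = rng.standard_normal(2)
for t in range(49):
    X[:, t + 1] = A @ X[:, t]
eigvals, _ = dmd(X[:, :-1], X[:, 1:], r=2)
print(np.sort_complex(eigvals), np.sort_complex(np.linalg.eigvals(A)))  # match
```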
Neural Computation (2022) 34 (12): 2432–2466.
Published: 08 November 2022
Abstract
Domain adaptation aims to transfer knowledge of labeled instances obtained from a source domain to a target domain to fill the gap between the domains. Most domain adaptation methods assume that the source and target domains have the same dimensionality. Methods that are applicable when the number of features is different in each domain have rarely been studied, especially when no label information is given for the test data obtained from the target domain. In this letter, it is assumed that common features exist in both domains and that extra (new additional) features are observed in the target domain; hence, the dimensionality of the target domain is higher than that of the source domain. To leverage the homogeneity of the common features, the adaptation between these source and target domains is formulated as an optimal transport (OT) problem. In addition, a learning bound in the target domain for the proposed OT-based method is derived. The proposed algorithm is validated using both simulated and real-world data.
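A rough, hedged illustration of the setup rather than the authors' algorithm: with equally sized samples and uniform weights, optimal transport on the shared features reduces to an assignment problem, and source labels can be carried along the resulting coupling. The data, dimensions, and squared-Euclidean cost below are toy assumptions.

```python
# Align source and target instances by OT on the common features only,
# then transport source labels to the matched target points.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
n, d_common, d_extra = 100, 5, 3
Xs = rng.normal(size=(n, d_common))               # source: common features only
ys = (Xs[:, 0] > 0).astype(int)                   # source labels
Xt_common = rng.normal(size=(n, d_common)) + 0.3  # shifted target, common part
Xt_extra = rng.normal(size=(n, d_extra))          # extra features, target only

# OT between two uniform empirical measures of equal size is an assignment
C = cdist(Xs, Xt_common, metric="sqeuclidean")    # cost on common features
row, col = linear_sum_assignment(C)               # optimal coupling (permutation)
yt_pred = np.empty(n, dtype=int)
yt_pred[col] = ys[row]                            # labels ride along the plan
print("transported labels for first 10 target points:", yt_pred[:10])
```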
Neural Computation (2022) 34 (9): 1944–1977.
Published: 16 August 2022
Abstract
Many machine learning methods assume that the training and test data follow the same distribution. However, in the real world, this assumption is often violated. In particular, a change in the marginal distribution of the data, called covariate shift, is one of the most important research topics in machine learning. We show that the well-known family of covariate shift adaptation methods is unified in the framework of information geometry. Furthermore, we show that the parameter search for a geometrically generalized covariate shift adaptation method can be performed efficiently. Numerical experiments show that our generalization can achieve better performance than the existing methods it encompasses.
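As a concrete anchor for the family being unified, here is a minimal sketch of its classical member, importance-weighted learning, where training losses are reweighted by the density ratio w(x) = p_test(x)/p_train(x). The gaussian densities and the linear model are toy assumptions, not the paper's generalized method.

```python
# Importance-weighted least squares under a known covariate shift.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x_tr = rng.normal(0.0, 1.0, 200)                         # p_train = N(0, 1)
y_tr = np.sin(x_tr) + 0.1 * rng.standard_normal(200)
w = norm.pdf(x_tr, 1.0, 0.5) / norm.pdf(x_tr, 0.0, 1.0)  # density-ratio weights

# weighted least squares for a linear model y = a*x + b
Phi = np.column_stack([x_tr, np.ones_like(x_tr)])
G = Phi.T @ (w[:, None] * Phi)
a, b = np.linalg.solve(G, Phi.T @ (w * y_tr))
print(f"importance-weighted fit: y = {a:.3f} x + {b:.3f}")
```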
Neural Computation (2020) 32 (10): 1901–1935.
Published: 01 October 2020
Abstract
Principal component analysis (PCA) is a widely used method for data processing, such as dimension reduction and visualization. Standard PCA is known to be sensitive to outliers, and various robust PCA methods have been proposed. It has been shown that the robustness of many statistical methods can be improved by using mode estimation instead of mean estimation, because mode estimation is not significantly affected by the presence of outliers. This study therefore proposes modal principal component analysis (MPCA), a robust PCA method based on mode estimation. The proposed method finds the minor component by estimating the mode of the projected data points. As theoretical contributions, the probabilistic convergence property, influence function, finite-sample breakdown point, and its lower bound are derived for the proposed MPCA. The experimental results show that the proposed method has advantages over conventional methods.
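A small sketch of the ingredient MPCA builds on (not the MPCA algorithm itself): mode estimation is barely moved by outliers, while mean estimation is. The bandwidth and grid below are arbitrary choices.

```python
# Locate the mode by maximizing a gaussian kernel density estimate.
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 200),
                    rng.normal(20.0, 1.0, 20)])   # ~10% gross outliers

h = 0.5                                           # kernel bandwidth
grid = np.linspace(x.min(), x.max(), 2000)
kde = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / h) ** 2).mean(axis=1)
print("mean:", x.mean())                          # dragged toward the outliers
print("mode:", grid[np.argmax(kde)])              # stays near 0
```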
Neural Computation (2017) 29 (7): 1838–1878.
Published: 01 July 2017
Abstract
We propose a method for intrinsic dimension estimation. By fitting a regression model that relates a power of the distance from an inspection point to the number of samples inside a ball of that radius, we estimate the goodness of fit. Then, using the maximum likelihood method, we estimate the local intrinsic dimension around the inspection point. The proposed method is shown to be comparable to conventional methods in global intrinsic dimension estimation experiments. Furthermore, we experimentally show that the proposed method outperforms a conventional local dimension estimation method.
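A hedged illustration of the geometric fact the method exploits, not the authors' estimator: near a point on a d-dimensional manifold, the number of samples within radius r grows like r^d, so the slope of log count versus log r recovers d. The embedded disk and radius range are toy assumptions.

```python
# Local dimension from the growth rate of neighbor counts.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
# a 2-d manifold (flat unit disk) embedded in 10-d space
theta, rad = rng.uniform(0, 2 * np.pi, n), np.sqrt(rng.uniform(0, 1, n))
X = np.zeros((n, 10))
X[:, 0], X[:, 1] = rad * np.cos(theta), rad * np.sin(theta)

p = X[0]                                           # inspection point
dist = np.linalg.norm(X - p, axis=1)
radii = np.linspace(0.05, 0.3, 20)
counts = np.array([(dist < r).sum() for r in radii])
slope, _ = np.polyfit(np.log(radii), np.log(counts), 1)
print(f"estimated local intrinsic dimension: {slope:.2f}")  # roughly 2
```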
Neural Computation (2016) 28 (12): 2687–2725.
Published: 01 December 2016
Abstract
This study considers the common situation in data analysis when there are few observations of the distribution of interest, or target distribution, while abundant observations are available from auxiliary distributions. In this situation, it is natural to compensate for the lack of data from the target distribution by using data sets from these auxiliary distributions; in other words, approximating the target distribution in a subspace spanned by a set of auxiliary distributions. Mixture modeling is one of the simplest ways to integrate information from the target and auxiliary distributions in order to express the target distribution as accurately as possible. There are two typical mixtures in the context of information geometry: the m- and e-mixtures. The m-mixture is applied in a variety of research fields because of the well-known expectation-maximization algorithm for its parameter estimation, whereas the e-mixture is rarely used because of its difficulty of estimation, particularly for nonparametric models. The e-mixture, however, is a well-tempered distribution that satisfies the principle of maximum entropy. To model a target distribution with scarce observations accurately, this letter proposes a novel framework for nonparametric modeling of the e-mixture and a geometrically inspired estimation algorithm. As numerical examples of the proposed framework, a transfer learning setup is considered. The experimental results show that this framework works well for three types of synthetic data sets, as well as a real-world EEG data set.
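For two 1-d gaussian components the two mixtures can be written out directly: the m-mixture is the arithmetic combination w*p1 + (1-w)*p2, while the e-mixture is the normalized geometric mean p1^w * p2^(1-w) / Z. A minimal numeric sketch; the component parameters are arbitrary.

```python
# m-mixture versus e-mixture of two gaussian densities.
import numpy as np

x = np.linspace(-8, 8, 4001)
def gauss(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

p1, p2, w = gauss(x, -2, 1), gauss(x, 2, 1), 0.5
m_mix = w * p1 + (1 - w) * p2                 # arithmetic mixture: bimodal
e_mix = p1 ** w * p2 ** (1 - w)               # geometric mean, then renormalize
e_mix /= e_mix.sum() * (x[1] - x[0])          # e-mixture: unimodal, max entropy
print("m-mixture peak (one of two):", x[np.argmax(m_mix)])  # near -2 or 2
print("e-mixture peak:", x[np.argmax(e_mix)])               # near 0
```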
Neural Computation (2014) 26 (9): 2074–2101.
Published: 01 September 2014
Abstract
Clustering is a representative unsupervised learning task and one of the important approaches in exploratory data analysis. By its very nature, clustering without strong assumptions on the data distribution is desirable. Information-theoretic clustering is a class of clustering methods that optimize information-theoretic quantities such as entropy and mutual information. These quantities can be estimated in a nonparametric manner, and information-theoretic clustering algorithms are capable of capturing various intrinsic data structures. It is also possible to estimate information-theoretic quantities from a data set with a sampling weight for each datum. Assuming that each datum is sampled from a certain cluster and assigning different sampling weights depending on the cluster, the cluster-conditional information-theoretic quantities can be estimated. In this letter, a simple iterative clustering algorithm is proposed based on a nonparametric estimator of the log-likelihood for weighted data sets. The clustering algorithm is also derived from the principle of conditional entropy minimization with maximum entropy regularization. The proposed algorithm contains no tuning parameters and is experimentally shown to be comparable to or to outperform conventional nonparametric clustering methods.
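A hedged sketch of the building block described above, not the proposed clustering algorithm: a nonparametric entropy estimate from a weighted sample, H ≈ -Σ_i w_i log p̂(x_i), with p̂ a leave-one-out kernel density estimate. Bandwidth and data are toy assumptions.

```python
# Weighted nonparametric (KDE-based) entropy estimate.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, 1000)
w = np.full(1000, 1.0 / 1000)                # sampling weights (here uniform)

h = 0.4                                      # kernel bandwidth
D2 = (x[:, None] - x[None, :]) ** 2
K = np.exp(-0.5 * D2 / h**2) / (h * np.sqrt(2 * np.pi))
np.fill_diagonal(K, 0.0)                     # leave-one-out
p_hat = (K * w[None, :]).sum(1) / (1.0 - w)  # weighted KDE at each sample
H_est = -(w * np.log(p_hat)).sum()
# compare with the closed form for N(0, sigma^2): 0.5*log(2*pi*e*sigma^2)
print(H_est, 0.5 * np.log(2 * np.pi * np.e * 4.0))
```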
Neural Computation (2014) 26 (7): 1455–1483.
Published: 01 July 2014
Abstract
A graph is a mathematical representation of a set of variables in which some pairs of variables are connected by edges. Common examples of graphs are railroad networks, the Internet, and neural networks. It is both theoretically and practically important to estimate the intensity of direct connections between variables. In this study, the problem of estimating the intrinsic graph structure from observed data is considered. The observed data form a matrix whose elements represent dependency between nodes in the graph. The dependency represents more than direct connections because it includes the influence of various paths; for example, each element of the observed matrix may represent a co-occurrence of events at two nodes or a correlation of variables corresponding to two nodes. In this setting, spurious correlations make the estimation of direct connections difficult. To alleviate this difficulty, a digraph Laplacian is used to characterize a graph. A generative model of the observed matrix is proposed, along with a parameter estimation algorithm for the model. A notable advantage of the proposed method is its ability to deal with directed graphs, while conventional graph structure estimation methods, such as covariance selection, are applicable only to undirected graphs. The algorithm is experimentally shown to identify the intrinsic graph structure.
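An illustration of the problem described above, using the classical undirected baseline the abstract contrasts with rather than the paper's digraph Laplacian model: on a chain 0 → 1 → 2, the observed correlation matrix shows a spurious 0-2 dependency that the precision matrix (the covariance selection view) correctly zeroes out.

```python
# Spurious path-induced correlation versus direct connection.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x0 = rng.standard_normal(n)
x1 = 0.8 * x0 + rng.standard_normal(n)       # direct edge 0 -> 1
x2 = 0.8 * x1 + rng.standard_normal(n)       # direct edge 1 -> 2
X = np.stack([x0, x1, x2])

C = np.corrcoef(X)
P = np.linalg.inv(np.cov(X))
print("corr(0,2):", round(C[0, 2], 3))       # clearly nonzero (path effect)
print("precision(0,2):", round(P[0, 2], 3))  # ~0: no direct connection
```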
Neural Computation (2012) 24 (7): 1853–1881.
Published: 01 July 2012
Abstract
Kernel methods are known to be effective for nonlinear multivariate analysis. One of the main issues in the practical use of kernel methods is the selection of the kernel, and there have been many studies on kernel selection and kernel learning. Multiple kernel learning (MKL) is one of the promising kernel optimization approaches. Kernel methods are applied to various classifiers, including Fisher discriminant analysis (FDA). FDA gives the Bayes optimal classification axis if the data distribution of each class in the feature space is a gaussian with a shared covariance structure. Based on this fact, an MKL framework based on the notion of gaussianity is proposed. As a concrete implementation, an empirical characteristic function is adopted to measure gaussianity in the feature space associated with a convex combination of kernel functions, and two MKL algorithms are derived. Experimental results on several data sets show that the proposed kernel learning followed by FDA offers strong classification power.
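A hedged sketch of the measuring device mentioned above, applied in input space rather than the kernel-induced feature space: the squared deviation between the empirical characteristic function of standardized data and the gaussian characteristic function exp(-t^2/2) serves as a simple non-gaussianity score. The grid of t values is an arbitrary choice.

```python
# Non-gaussianity via the empirical characteristic function.
import numpy as np

def nongaussianity(x):
    z = (x - x.mean()) / x.std()                    # standardize
    ts = np.linspace(0.1, 2.0, 20)                  # evaluation frequencies
    ecf = np.exp(1j * ts[:, None] * z[None, :]).mean(axis=1)
    gcf = np.exp(-0.5 * ts ** 2)                    # CF of N(0, 1)
    return np.mean(np.abs(ecf - gcf) ** 2)

rng = np.random.default_rng(0)
print("gaussian :", nongaussianity(rng.standard_normal(5000)))   # ~0
print("uniform  :", nongaussianity(rng.uniform(-1, 1, 5000)))    # larger
print("laplace  :", nongaussianity(rng.laplace(size=5000)))      # larger
```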
Neural Computation (2011) 23 (6): 1623–1659.
Published: 01 June 2011
Abstract
The Bradley-Terry model is a statistical representation of preference or ranking data based on pairwise comparisons of items. For estimating the model, several methods based on the sum of weighted Kullback-Leibler divergences have been proposed in various contexts. The purpose of this letter is to interpret the estimation mechanism of the Bradley-Terry model from the viewpoint of flatness, a fundamental notion in information geometry. Based on this point of view, a new estimation method is proposed within the framework of the em algorithm. The proposed method differs from conventional methods in its objective function, especially in the treatment of unobserved comparisons, and it is consistently interpreted in a probability simplex. An estimation method with weight adaptation is also proposed from the viewpoint of sensitivity. Experimental results show that the proposed method works appropriately and that weight adaptation improves the accuracy of the estimates.
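For reference, the model itself and its classical fit (not the paper's information-geometric estimator): Bradley-Terry posits P(i beats j) = p_i / (p_i + p_j), and the standard minorization-maximization fixed point recovers the strengths p from pairwise win counts. The win matrix below is invented toy data.

```python
# Classical MM fit of the Bradley-Terry strengths.
import numpy as np

wins = np.array([[0, 7, 9],      # wins[i, j] = times item i beat item j
                 [3, 0, 6],
                 [1, 4, 0]], dtype=float)
n = len(wins)
games = wins + wins.T                             # comparisons per pair
p = np.ones(n)
for _ in range(200):
    denom = games / (p[:, None] + p[None, :])     # pairwise terms; diag is 0
    p = wins.sum(1) / denom.sum(1)                # standard MM update
    p /= p.sum()                                  # fix the overall scale
print("estimated strengths:", p.round(3))         # item 0 strongest
```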
Neural Computation (2010) 22 (11): 2887–2923.
Published: 01 November 2010
Abstract
Reducing the dimensionality of high-dimensional data without losing its essential information is an important task in information processing. When class labels of training data are available, Fisher discriminant analysis (FDA) has been widely used. However, the optimality of FDA is guaranteed only in a very restricted ideal circumstance, and it is often observed that FDA does not provide a good classification surface for many real problems. This letter treats the problem of supervised dimensionality reduction from the viewpoint of information theory and proposes a framework of dimensionality reduction based on class-conditional entropy minimization. The proposed linear dimensionality-reduction technique is validated both theoretically and experimentally. Then, through kernel Fisher discriminant analysis (KFDA), the multiple kernel learning problem is treated in the proposed framework, and a novel algorithm that iteratively optimizes the parameters of the classification function and the kernel combination coefficients is proposed. The algorithm is experimentally shown to be comparable to or better than KFDA on large-scale benchmark data sets, and comparable to other multiple kernel learning techniques on the yeast protein function annotation task.
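A toy, hedged version of the criterion named above, not the paper's algorithm: among 1-d projections of two-class data, choose the direction minimizing the sum of class-conditional entropies, each estimated nonparametrically from the projected points. The data, bandwidth, and angle grid are assumptions.

```python
# Linear dimensionality reduction by class-conditional entropy minimization.
import numpy as np

def kde_entropy(x, h=0.3):
    """Leave-one-out KDE estimate of the differential entropy of 1-d data."""
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2) / (h * np.sqrt(2 * np.pi))
    np.fill_diagonal(K, 0.0)
    p = K.sum(1) / (len(x) - 1)
    return -np.log(p).mean()

rng = np.random.default_rng(0)
X0 = rng.normal([0, 0], [0.5, 3.0], (200, 2))    # class 0: tight in x, wide in y
X1 = rng.normal([2, 0], [0.5, 3.0], (200, 2))    # class 1: shifted along x
angles = np.linspace(0, np.pi, 90, endpoint=False)
scores = [kde_entropy(X0 @ d) + kde_entropy(X1 @ d)
          for d in (np.array([np.cos(a), np.sin(a)]) for a in angles)]
best = angles[int(np.argmin(scores))]
print(f"best projection angle: {np.degrees(best):.1f} deg")  # ~0: the x-axis
```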
Neural Computation (2010) 22 (9): 2417–2451.
Published: 01 September 2010
Abstract
Given a set of rating data for a set of items, determining the preference levels of the items is a matter of importance. Various probability models have been proposed for this task. One such model is the Plackett-Luce model, which parameterizes the preference level of each item by a real value. In this letter, the Plackett-Luce model is generalized to cope with grouped ranking observations such as movie or restaurant ratings. Since it is difficult to maximize the likelihood of the proposed model directly, a feasible approximation is derived, and the em algorithm is adopted to find the model parameters by maximizing the approximate likelihood, which is easily evaluated. The proposed model is extended to a mixture model, and two applications are proposed. To show the effectiveness of the proposed model, numerical experiments with real-world data are carried out.
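A sketch of the base model being generalized: under Plackett-Luce with item worths p, a ranking is generated by repeatedly choosing the next item with probability proportional to its worth among the remaining items, so P(ranking) is a product of such ratios. The worths below are arbitrary.

```python
# Probability of a full ranking under the Plackett-Luce model.
from itertools import permutations
import numpy as np

def ranking_prob(ranking, p):
    """P(ranking) = prod_i p[r_i] / (p[r_i] + p[r_{i+1}] + ...)."""
    rest = np.asarray(p, dtype=float)[list(ranking)]
    prob = 1.0
    for i in range(len(rest)):
        prob *= rest[i] / rest[i:].sum()
    return prob

p = [5.0, 3.0, 1.0]                      # worths of items 0, 1, 2
print(ranking_prob([0, 1, 2], p))        # most likely order: 5/9 * 3/4
print(ranking_prob([2, 1, 0], p))        # least likely order
# sanity check: probabilities over all 3! rankings sum to 1
print(sum(ranking_prob(r, p) for r in permutations(range(3))))
```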