Skip Nav Destination

*PDF*
*PDF*
*PDF*
*PDF*
*PDF*
*PDF*

Update search

### NARROW

Format

Journal

TocHeadingTitle

Date

Availability

1-6 of 6

Shotaro Akaho

Close
**Follow your search**

Access your saved searches in your account

Would you like to receive an alert when new items match your search?

*Close Modal*

Sort by

Journal Articles

Publisher: Journals Gateway

*Neural Computation*(2021) 33 (8): 2274–2307.

Published: 26 July 2021

Abstract

View article
The Fisher information matrix (FIM) plays an essential role in statistics and machine learning as a Riemannian metric tensor or a component of the Hessian matrix of loss functions. Focusing on the FIM and its variants in deep neural networks (DNNs), we reveal their characteristic scale dependence on the network width, depth, and sample size when the network has random weights and is sufficiently wide. This study covers two widely used FIMs for regression with linear output and for classification with softmax output. Both FIMs asymptotically show pathological eigenvalue spectra in the sense that a small number of eigenvalues become large outliers depending on the width or sample size, while the others are much smaller. It implies that the local shape of the parameter space or loss landscape is very sharp in a few specific directions while almost flat in the other directions. In particular, the softmax output disperses the outliers and makes a tail of the eigenvalue density spread from the bulk. We also show that pathological spectra appear in other variants of FIMs: one is the neural tangent kernel; another is a metric for the input signal and feature space that arises from feedforward signal propagation. Thus, we provide a unified perspective on the FIM and its variants that will lead to more quantitative understanding of learning in large-scale DNNs.

Journal Articles

Publisher: Journals Gateway

*Neural Computation*(2017) 29 (7): 1838–1878.

Published: 01 July 2017

FIGURES
| View All (14)

Abstract

View article
We propose a method for intrinsic dimension estimation. By fitting the power of distance from an inspection point and the number of samples included inside a ball with a radius equal to the distance, to a regression model, we estimate the goodness of fit. Then, by using the maximum likelihood method, we estimate the local intrinsic dimension around the inspection point. The proposed method is shown to be comparable to conventional methods in global intrinsic dimension estimation experiments. Furthermore, we experimentally show that the proposed method outperforms a conventional local dimension estimation method.

Journal Articles

Publisher: Journals Gateway

*Neural Computation*(2016) 28 (12): 2687–2725.

Published: 01 December 2016

FIGURES
| View All (60)

Abstract

View article
This study considers the common situation in data analysis when there are few observations of the distribution of interest or the target distribution, while abundant observations are available from auxiliary distributions. In this situation, it is natural to compensate for the lack of data from the target distribution by using data sets from these auxiliary distributions—in other words, approximating the target distribution in a subspace spanned by a set of auxiliary distributions. Mixture modeling is one of the simplest ways to integrate information from the target and auxiliary distributions in order to express the target distribution as accurately as possible. There are two typical mixtures in the context of information geometry: the - and -mixtures. The -mixture is applied in a variety of research fields because of the presence of the well-known expectation-maximazation algorithm for parameter estimation, whereas the -mixture is rarely used because of its difficulty of estimation, particularly for nonparametric models. The -mixture, however, is a well-tempered distribution that satisfies the principle of maximum entropy. To model a target distribution with scarce observations accurately, this letter proposes a novel framework for a nonparametric modeling of the -mixture and a geometrically inspired estimation algorithm. As numerical examples of the proposed framework, a transfer learning setup is considered. The experimental results show that this framework works well for three types of synthetic data sets, as well as an EEG real-world data set. Abstract This study considers the common situation in data analysis when there are few observations of the distribution of interest or the target distribution, while abundant observations are available from auxiliary distributions. In this situation, it is natural to compensate for the lack of data from the target distribution by using data sets from these auxiliary distributions—in other words, approximating the target distribution in a subspace spanned by a set of auxiliary distributions. Mixture modeling is one of the simplest ways to integrate information from the target and auxiliary distributions in order to express the target distribution as accurately as possible. There are two typical mixtures in the context of information geometry: the - and -mixtures. The -mixture is applied in a variety of research fields because of the presence of the well-known expectation-maximazation algorithm for parameter estimation, whereas the -mixture is rarely used because of its difficulty of estimation, particularly for nonparametric models. The -mixture, however, is a well-tempered distribution that satisfies the principle of maximum entropy. To model a target distribution with scarce observations accurately, this letter proposes a novel framework for a nonparametric modeling of the -mixture and a geometrically inspired estimation algorithm. As numerical examples of the proposed framework, a transfer learning setup is considered. The experimental results show that this framework works well for three types of synthetic data sets, as well as an EEG real-world data set. Abstract This study considers the common situation in data analysis when there are few observations of the distribution of interest or the target distribution, while abundant observations are available from auxiliary distributions. In this situation, it is natural to compensate for the lack of data from the target distribution by using data sets from these auxiliary distributions—in other words, approximating the target distribution in a subspace spanned by a set of auxiliary distributions. Mixture modeling is one of the simplest ways to integrate information from the target and auxiliary distributions in order to express the target distribution as accurately as possible. There are two typical mixtures in the context of information geometry: the - and -mixtures. The -mixture is applied in a variety of research fields because of the presence of the well-known expectation-maximazation algorithm for parameter estimation, whereas the -mixture is rarely used because of its difficulty of estimation, particularly for nonparametric models. The -mixture, however, is a well-tempered distribution that satisfies the principle of maximum entropy. To model a target distribution with scarce observations accurately, this letter proposes a novel framework for a nonparametric modeling of the -mixture and a geometrically inspired estimation algorithm. As numerical examples of the proposed framework, a transfer learning setup is considered. The experimental results show that this framework works well for three types of synthetic data sets, as well as an EEG real-world data set. Abstract This study considers the common situation in data analysis when there are few observations of the distribution of interest or the target distribution, while abundant observations are available from auxiliary distributions. In this situation, it is natural to compensate for the lack of data from the target distribution by using data sets from these auxiliary distributions—in other words, approximating the target distribution in a subspace spanned by a set of auxiliary distributions. Mixture modeling is one of the simplest ways to integrate information from the target and auxiliary distributions in order to express the target distribution as accurately as possible. There are two typical mixtures in the context of information geometry: the - and -mixtures. The -mixture is applied in a variety of research fields because of the presence of the well-known expectation-maximazation algorithm for parameter estimation, whereas the -mixture is rarely used because of its difficulty of estimation, particularly for nonparametric models. The -mixture, however, is a well-tempered distribution that satisfies the principle of maximum entropy. To model a target distribution with scarce observations accurately, this letter proposes a novel framework for a nonparametric modeling of the -mixture and a geometrically inspired estimation algorithm. As numerical examples of the proposed framework, a transfer learning setup is considered. The experimental results show that this framework works well for three types of synthetic data sets, as well as an EEG real-world data set. Abstract This study considers the common situation in data analysis when there are few observations of the distribution of interest or the target distribution, while abundant observations are available from auxiliary distributions. In this situation, it is natural to compensate for the lack of data from the target distribution by using data sets from these auxiliary distributions—in other words, approximating the target distribution in a subspace spanned by a set of auxiliary distributions. Mixture modeling is one of the simplest ways to integrate information from the target and auxiliary distributions in order to express the target distribution as accurately as possible. There are two typical mixtures in the context of information geometry: the - and -mixtures. The -mixture is applied in a variety of research fields because of the presence of the well-known expectation-maximazation algorithm for parameter estimation, whereas the -mixture is rarely used because of its difficulty of estimation, particularly for nonparametric models. The -mixture, however, is a well-tempered distribution that satisfies the principle of maximum entropy. To model a target distribution with scarce observations accurately, this letter proposes a novel framework for a nonparametric modeling of the -mixture and a geometrically inspired estimation algorithm. As numerical examples of the proposed framework, a transfer learning setup is considered. The experimental results show that this framework works well for three types of synthetic data sets, as well as an EEG real-world data set. Abstract This study considers the common situation in data analysis when there are few observations of the distribution of interest or the target distribution, while abundant observations are available from auxiliary distributions. In this situation, it is natural to compensate for the lack of data from the target distribution by using data sets from these auxiliary distributions—in other words, approximating the target distribution in a subspace spanned by a set of auxiliary distributions. Mixture modeling is one of the simplest ways to integrate information from the target and auxiliary distributions in order to express the target distribution as accurately as possible. There are two typical mixtures in the context of information geometry: the - and -mixtures. The -mixture is applied in a variety of research fields because of the presence of the well-known expectation-maximazation algorithm for parameter estimation, whereas the -mixture is rarely used because of its difficulty of estimation, particularly for nonparametric models. The -mixture, however, is a well-tempered distribution that satisfies the principle of maximum entropy. To model a target distribution with scarce observations accurately, this letter proposes a novel framework for a nonparametric modeling of the -mixture and a geometrically inspired estimation algorithm. As numerical examples of the proposed framework, a transfer learning setup is considered. The experimental results show that this framework works well for three types of synthetic data sets, as well as an EEG real-world data set.

Journal Articles

Publisher: Journals Gateway

*Neural Computation*(2014) 26 (7): 1455–1483.

Published: 01 July 2014

FIGURES
| View All (24)

Abstract

View article
A graph is a mathematical representation of a set of variables where some pairs of the variables are connected by edges. Common examples of graphs are railroads, the Internet, and neural networks. It is both theoretically and practically important to estimate the intensity of direct connections between variables. In this study, a problem of estimating the intrinsic graph structure from observed data is considered. The observed data in this study are a matrix with elements representing dependency between nodes in the graph. The dependency represents more than direct connections because it includes influences of various paths. For example, each element of the observed matrix represents a co-occurrence of events at two nodes or a correlation of variables corresponding to two nodes. In this setting, spurious correlations make the estimation of direct connection difficult. To alleviate this difficulty, a digraph Laplacian is used for characterizing a graph. A generative model of this observed matrix is proposed, and a parameter estimation algorithm for the model is also introduced. The notable advantage of the proposed method is its ability to deal with directed graphs, while conventional graph structure estimation methods such as covariance selections are applicable only to undirected graphs. The algorithm is experimentally shown to be able to identify the intrinsic graph structure.

Journal Articles

Publisher: Journals Gateway

*Neural Computation*(2004) 16 (1): 115–137.

Published: 01 January 2004

Abstract

View article
This letter analyzes the Fisher kernel from a statistical point of view. The Fisher kernel is a particularly interesting method for constructing a model of the posterior probability that makes intelligent use of unlabeled data (i.e., of the underlying data density). It is important to analyze and ultimately understand the statistical properties of the Fisher kernel. To this end, we first establish sufficient conditions that the constructed posterior model is realizable (i.e., it contains the true distribution). Realizability immediately leads to consistency results. Subsequently, we focus on an asymptotic analysis of the generalization error, which elucidates the learning curves of the Fisher kernel and how unlabeled data contribute to learning. We also point out that the squared or log loss is theoretically preferable-because both yield consistent estimators-to other losses such as the exponential loss, when a linear classifier is used together with the Fisher kernel. Therefore, this letter underlines that the Fisher kernel should be viewed not as a heuristics but as a powerful statistical tool with well-controlled statistical properties.

Journal Articles

Publisher: Journals Gateway

*Neural Computation*(2000) 12 (6): 1411–1427.

Published: 01 June 2000

Abstract

View article
Theories of learning and generalization hold that the generalization bias, defined as the difference between the training error and the generalization error, increases on average with the number of adaptive parameters. This article, however, shows that this general tendency is violated for a gaussian mixture model. For temperatures just below the first symmetry breaking point, the effective number of adaptive parameters increases and the generalization bias decreases. We compute the dependence of the neural information criterion on temperature around the symmetry breaking. Our results are confirmed by numerical cross-validation experiments.