Shotaro Akaho
1-8 of 8
Neural Computation (2022) 34 (6): 1398–1424.
Published: 19 May 2022
Abstract
The full-span log-linear (FSLL) model introduced in this letter is considered an nth-order Boltzmann machine, where n is the number of all variables in the target system. Let X = (X_0, ..., X_{n-1}) be finite discrete random variables that can take |X| = |X_0| ... |X_{n-1}| different values. The FSLL model has |X| - 1 parameters and can represent arbitrary positive distributions of X. The FSLL model is a highest-order Boltzmann machine; nevertheless, we can compute the dual parameter of the model distribution, which plays important roles in exponential families, in O(|X| log |X|) time. Furthermore, using properties of the dual parameters of the FSLL model, we can construct an efficient learning algorithm. The FSLL model is limited to small probabilistic models with up to |X| ≈ 2^25 states; however, in this problem domain, it flexibly fits various true distributions underlying the training data without any hyperparameter tuning. The experiments showed that the FSLL model successfully learned six training data sets with |X| = 2^20 within one minute on a laptop PC.
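The computational claim in this abstract rests on the fact that, for binary variables encoded as +1/-1, the natural parameters of a full log-linear model and its dual (expectation) parameters are connected by a Walsh-Hadamard transform, which a fast transform evaluates in O(|X| log |X|). The sketch below illustrates only this conversion under that binary encoding; it is not the paper's learning algorithm, and the variable names are my own.

```python
import numpy as np

def fwht(a):
    # Fast Walsh-Hadamard transform (unnormalized); returns a new array.
    # Runs in O(|X| log |X|) for a vector of length |X| = 2**n.
    a = np.asarray(a, dtype=float).copy()
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            x = a[i:i + h].copy()
            y = a[i + h:i + 2 * h].copy()
            a[i:i + h] = x + y
            a[i + h:i + 2 * h] = x - y
        h *= 2
    return a

rng = np.random.default_rng(0)
n = 10                      # number of +/-1 variables, |X| = 2**n states
theta = rng.normal(scale=0.05, size=2 ** n)   # one natural parameter per subset of variables

# Natural parameters -> probabilities: log p~(x) = sum_S theta_S * prod_{i in S} x_i
log_unnorm = fwht(theta)
p = np.exp(log_unnorm - log_unnorm.max())
p /= p.sum()

# Probabilities -> dual (expectation) parameters: eta_S = E[prod_{i in S} x_i]
eta = fwht(p)
print(eta[0])               # eta for the empty subset is always 1 (a sanity check)
```

The same pair of transforms is what makes the |X| = 2^20 scale mentioned in the abstract tractable, since each conversion touches every state only O(log |X|) times.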
Neural Computation (2022) 34 (5): 1189–1219.
Published: 15 April 2022
Abstract
This letter proposes an extension of principal component analysis for gaussian process (GP) posteriors, denoted by GP-PCA. Since GP-PCA estimates a low-dimensional space of GP posteriors, it can be used for meta-learning, a framework for improving the performance of target tasks by estimating a structure of a set of tasks. The issue is how to define a structure of a set of GPs with an infinite-dimensional parameter, such as a coordinate system and a divergence. In this study, we reduce the infiniteness of the GP to the finite-dimensional case under the information-geometrical framework by considering a space of GP posteriors that have the same prior. In addition, we propose an approximation method for GP-PCA based on variational inference and demonstrate the effectiveness of GP-PCA as meta-learning through experiments.
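As a very rough finite-dimensional analogue of the idea, and emphatically not the paper's information-geometric GP-PCA, the sketch below fits GP posteriors for several related tasks under one shared prior and inspects the posterior means on a common grid with an ordinary SVD; the kernel, tasks, and grid are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf(a, b, ell=0.3):
    # Squared-exponential kernel playing the role of the shared GP prior.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

grid = np.linspace(0, 1, 50)                    # shared evaluation points

post_means = []
for task in range(8):                           # a few related regression tasks
    x = rng.uniform(0, 1, 15)
    f = np.sin(2 * np.pi * (x + 0.05 * task))   # tasks differ by a small phase shift
    y = f + 0.1 * rng.normal(size=x.size)
    K_xx = rbf(x, x) + 0.01 * np.eye(x.size)    # prior covariance plus noise variance
    K_gx = rbf(grid, x)
    post_means.append(K_gx @ np.linalg.solve(K_xx, y))   # GP posterior mean on the grid

M = np.array(post_means)
M -= M.mean(axis=0)
_, s, _ = np.linalg.svd(M, full_matrices=False)
print(s[:3])    # a few singular values dominate: the tasks lie near a low-dimensional subspace
```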
Neural Computation (2021) 33 (8): 2274–2307.
Published: 26 July 2021
Abstract
The Fisher information matrix (FIM) plays an essential role in statistics and machine learning as a Riemannian metric tensor or a component of the Hessian matrix of loss functions. Focusing on the FIM and its variants in deep neural networks (DNNs), we reveal their characteristic scale dependence on the network width, depth, and sample size when the network has random weights and is sufficiently wide. This study covers two widely used FIMs for regression with linear output and for classification with softmax output. Both FIMs asymptotically show pathological eigenvalue spectra in the sense that a small number of eigenvalues become large outliers depending on the width or sample size, while the others are much smaller. It implies that the local shape of the parameter space or loss landscape is very sharp in a few specific directions while almost flat in the other directions. In particular, the softmax output disperses the outliers and makes a tail of the eigenvalue density spread from the bulk. We also show that pathological spectra appear in other variants of FIMs: one is the neural tangent kernel; another is a metric for the input signal and feature space that arises from feedforward signal propagation. Thus, we provide a unified perspective on the FIM and its variants that will lead to more quantitative understanding of learning in large-scale DNNs.
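A minimal numerical illustration of the object under study: the empirical FIM of a random-weight one-hidden-layer network with linear (scalar) output, whose spectrum already shows a few eigenvalues far above a much smaller bulk. The architecture, sizes, and 1/sqrt(width) scaling below are assumptions made for this sketch, not the paper's precise setting or analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
D, M, N = 10, 300, 200                 # input dim, hidden width, sample size
X = rng.normal(size=(N, D)) / np.sqrt(D)
W = rng.normal(size=(M, D))            # random first-layer weights
v = rng.normal(size=M)                 # random output weights

H = np.tanh(X @ W.T)                   # hidden activations, N x M
Dh = 1.0 - H ** 2                      # derivative of tanh at the pre-activations

# Per-sample gradient of the output f(x) = v . tanh(W x) / sqrt(M)
# with respect to all parameters (W flattened, then v).
J_W = (Dh * v)[:, :, None] * X[:, None, :] / np.sqrt(M)    # N x M x D
J_v = H / np.sqrt(M)                                       # N x M
J = np.concatenate([J_W.reshape(N, -1), J_v], axis=1)      # N x P

# Nonzero eigenvalues of the FIM F = J^T J / N coincide with those of J J^T / N.
eig = np.linalg.eigvalsh(J @ J.T / N)[::-1]
print(eig[:5], np.median(eig))         # leading eigenvalues dwarf the median of the bulk
```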
Neural Computation (2017) 29 (7): 1838–1878.
Published: 01 July 2017
Abstract
We propose a method for intrinsic dimension estimation. By fitting a regression model to the relationship between a power of the distance from an inspection point and the number of samples contained in a ball of that radius, we evaluate the goodness of fit. Then, using the maximum likelihood method, we estimate the local intrinsic dimension around the inspection point. The proposed method is shown to be comparable to conventional methods in global intrinsic dimension estimation experiments. Furthermore, we experimentally show that the proposed method outperforms a conventional local dimension estimation method.
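The scaling behind such estimators is that, near a point on a d-dimensional manifold, the number of samples within radius r grows roughly like r^d. The sketch below estimates a local dimension by a plain log-log regression of neighbor counts against radii; it is only a crude stand-in for the paper's regression-plus-maximum-likelihood estimator, and the data set is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Samples from a 3-dimensional manifold embedded in 10 dimensions.
Z = rng.normal(size=(2000, 3))
A = rng.normal(size=(3, 10))
X = np.tanh(Z @ A)

def local_dimension(X, idx, k=50):
    # Within a small ball the count of points grows like r^d, so
    # log(count) is roughly linear in log(radius) with slope d.
    d = np.linalg.norm(X - X[idx], axis=1)
    r = np.sort(d)[1:k + 1]                 # distances to the k nearest neighbors
    counts = np.arange(1, k + 1)
    slope, _ = np.polyfit(np.log(r), np.log(counts), 1)
    return slope

# Average local estimate over a few inspection points: close to the true value 3.
print(np.mean([local_dimension(X, i) for i in range(50)]))
```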
Neural Computation (2016) 28 (12): 2687–2725.
Published: 01 December 2016
Abstract
This study considers the common situation in data analysis when there are few observations of the distribution of interest or the target distribution, while abundant observations are available from auxiliary distributions. In this situation, it is natural to compensate for the lack of data from the target distribution by using data sets from these auxiliary distributions—in other words, approximating the target distribution in a subspace spanned by a set of auxiliary distributions. Mixture modeling is one of the simplest ways to integrate information from the target and auxiliary distributions in order to express the target distribution as accurately as possible. There are two typical mixtures in the context of information geometry: the m- and e-mixtures. The m-mixture is applied in a variety of research fields because of the presence of the well-known expectation-maximization algorithm for parameter estimation, whereas the e-mixture is rarely used because of its difficulty of estimation, particularly for nonparametric models. The e-mixture, however, is a well-tempered distribution that satisfies the principle of maximum entropy. To model a target distribution with scarce observations accurately, this letter proposes a novel framework for nonparametric modeling of the e-mixture and a geometrically inspired estimation algorithm. As numerical examples of the proposed framework, a transfer learning setup is considered. The experimental results show that this framework works well for three types of synthetic data sets, as well as an EEG real-world data set.
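For intuition about the two mixtures, the sketch below contrasts them on two discrete auxiliary distributions: the m-mixture averages probabilities directly, while the e-mixture averages in the log domain and renormalizes, giving the flatter, higher-entropy ("well-tempered") compromise described above. This is the textbook definition for finite distributions, not the paper's nonparametric estimator.

```python
import numpy as np

def m_mixture(ps, w):
    # m-mixture: ordinary convex combination of probability vectors.
    return np.tensordot(w, ps, axes=1)

def e_mixture(ps, w):
    # e-mixture: convex combination in the log domain (a weighted
    # geometric mean), renormalized so it sums to one.
    log_p = np.tensordot(w, np.log(ps), axes=1)
    p = np.exp(log_p - log_p.max())
    return p / p.sum()

# Two auxiliary distributions over 5 states, combined with equal weights.
ps = np.array([[0.70, 0.10, 0.10, 0.05, 0.05],
               [0.05, 0.05, 0.10, 0.10, 0.70]])
w = np.array([0.5, 0.5])
print(m_mixture(ps, w))   # keeps most mass on the two ends
print(e_mixture(ps, w))   # flatter, higher-entropy compromise between the two
```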
Neural Computation (2014) 26 (7): 1455–1483.
Published: 01 July 2014
Abstract
A graph is a mathematical representation of a set of variables where some pairs of the variables are connected by edges. Common examples of graphs are railroads, the Internet, and neural networks. It is both theoretically and practically important to estimate the intensity of direct connections between variables. In this study, a problem of estimating the intrinsic graph structure from observed data is considered. The observed data in this study are a matrix with elements representing dependency between nodes in the graph. The dependency represents more than direct connections because it includes influences of various paths. For example, each element of the observed matrix represents a co-occurrence of events at two nodes or a correlation of variables corresponding to two nodes. In this setting, spurious correlations make the estimation of direct connection difficult. To alleviate this difficulty, a digraph Laplacian is used for characterizing a graph. A generative model of this observed matrix is proposed, and a parameter estimation algorithm for the model is also introduced. The notable advantage of the proposed method is its ability to deal with directed graphs, while conventional graph structure estimation methods such as covariance selection are applicable only to undirected graphs. The algorithm is experimentally shown to be able to identify the intrinsic graph structure.
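A toy illustration of why such an observed dependency matrix obscures direct connections: if dependency accumulates over all directed paths, many node pairs look dependent even when no edge joins them, and recovering the direct weights requires undoing that accumulation. The generative rule below (a geometric series of the adjacency matrix) is my own simplification, not the paper's digraph-Laplacian model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Sparse directed "direct connection" weights; values kept small so the
# path series below converges.
A = rng.uniform(0.05, 0.15, size=(n, n)) * (rng.random((n, n)) < 0.25)
np.fill_diagonal(A, 0.0)

# Observed dependency accumulating influence over all directed paths:
# C = A + A^2 + A^3 + ... = (I - A)^{-1} - I
C = np.linalg.inv(np.eye(n) - A) - np.eye(n)

# C typically has far more nonzero entries than A: indirect paths create
# apparent (spurious) dependencies between unconnected nodes.
print((A > 0).sum(), (C > 1e-9).sum())

# Under this toy generative rule the direct weights can be recovered exactly.
A_hat = np.eye(n) - np.linalg.inv(np.eye(n) + C)
print(np.allclose(A_hat, A))
```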
Neural Computation (2004) 16 (1): 115–137.
Published: 01 January 2004
Abstract
This letter analyzes the Fisher kernel from a statistical point of view. The Fisher kernel is a particularly interesting method for constructing a model of the posterior probability that makes intelligent use of unlabeled data (i.e., of the underlying data density). It is important to analyze and ultimately understand the statistical properties of the Fisher kernel. To this end, we first establish sufficient conditions under which the constructed posterior model is realizable (i.e., it contains the true distribution). Realizability immediately leads to consistency results. Subsequently, we focus on an asymptotic analysis of the generalization error, which elucidates the learning curves of the Fisher kernel and how unlabeled data contribute to learning. We also point out that the squared and log losses are theoretically preferable to other losses such as the exponential loss, because both yield consistent estimators, when a linear classifier is used together with the Fisher kernel. Therefore, this letter underlines that the Fisher kernel should be viewed not as a heuristic but as a powerful statistical tool with well-controlled statistical properties.
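To make the construction concrete, the sketch below computes a Fisher kernel for a deliberately simple generative model, a product of Bernoullis fitted to unlabeled binary data: the feature map is the score of the log-likelihood whitened by the (here diagonal) Fisher information, and examples are then compared through the resulting kernel, e.g., inside a linear classifier. The choice of generative model and all names are illustrative assumptions, not the setting analyzed in the letter.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Unlabeled" binary data, used only to fit the generative model.
X_unlabeled = (rng.random((500, 20)) < rng.uniform(0.2, 0.8, 20)).astype(float)
mu = X_unlabeled.mean(axis=0)          # MLE of a product-of-Bernoullis model

def fisher_features(X, mu, eps=1e-6):
    # Score of log p(x|mu) = sum_j [x_j log mu_j + (1 - x_j) log(1 - mu_j)],
    # whitened by the diagonal Fisher information 1 / (mu_j (1 - mu_j)).
    mu = np.clip(mu, eps, 1 - eps)
    return (X - mu) / np.sqrt(mu * (1 - mu))

def fisher_kernel(X1, X2, mu):
    return fisher_features(X1, mu) @ fisher_features(X2, mu).T

K = fisher_kernel(X_unlabeled[:5], X_unlabeled[:5], mu)
print(K.shape, np.allclose(K, K.T))    # a symmetric kernel matrix over the examples
```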
Neural Computation (2000) 12 (6): 1411–1427.
Published: 01 June 2000
Abstract
Theories of learning and generalization hold that the generalization bias, defined as the difference between the training error and the generalization error, increases on average with the number of adaptive parameters. This article, however, shows that this general tendency is violated for a gaussian mixture model. For temperatures just below the first symmetry breaking point, the effective number of adaptive parameters increases and the generalization bias decreases. We compute the dependence of the neural information criterion on temperature around the symmetry breaking. Our results are confirmed by numerical cross-validation experiments.
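For reference, the (network/neural) information criterion the abstract refers to is usually written as the training error plus a complexity correction whose trace term acts as the effective number of parameters. The display below is the commonly cited form rather than a quotation from the article, and the notation is mine: N is the sample size, ℓ the per-sample loss, Q the Hessian of the expected loss, and G the covariance of the per-sample gradient, both evaluated at the trained parameters.

```latex
% Commonly cited form of the NIC (assumed notation, not quoted from the article).
\[
  \mathrm{NIC}
  \;=\;
  \frac{1}{N}\sum_{i=1}^{N} \ell(x_i;\hat{\theta})
  \;+\;
  \frac{1}{N}\,\operatorname{tr}\!\left(G\,Q^{-1}\right),
  \qquad
  \operatorname{tr}\!\left(G\,Q^{-1}\right)\;\approx\;\text{effective number of parameters}.
\]
```

When G = Q, as for a well-specified model with the log loss, the trace reduces to the raw parameter count (the familiar AIC-style correction); the article's point is that near the symmetry-breaking temperature this effective count and the generalization bias can move in opposite directions.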