Taiji Suzuki
Neural Computation (2020) 32 (6): 1168–1221.
Published: 01 June 2020
Abstract
Sparse regularization such as ℓ1 regularization is a powerful and widely used strategy for high-dimensional learning problems. The effectiveness of sparse regularization has been supported both practically and theoretically by several studies. However, one of the biggest issues in sparse regularization is that its performance is quite sensitive to correlations between features. Ordinary ℓ1 regularization selects variables that are correlated with each other under weak regularization, which degrades not only the estimation error but also the interpretability. In this letter, we propose a new regularization method, independently interpretable lasso (IILasso), for generalized linear models. Our proposed regularizer suppresses the selection of correlated variables, so that each active variable affects the response independently in the model. Hence, we can interpret the regression coefficients intuitively, and performance is also improved by avoiding overfitting. We analyze the theoretical properties of the IILasso and show that the proposed method is advantageous for sign recovery and achieves an almost minimax optimal convergence rate. Synthetic and real data analyses also indicate the effectiveness of the IILasso.
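To make the idea concrete, the following is a minimal coordinate-descent sketch of an IILasso-style estimator for the squared-loss (linear regression) case. The specific pairwise penalty weights (absolute feature correlations), the fixed hyperparameters, and the toy data are assumptions for illustration, not the letter's exact formulation.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def iilasso(X, y, lam=0.1, alpha=1.0, n_iter=200):
    """Coordinate descent for squared loss with an IILasso-style penalty
    lam * (||b||_1 + (alpha / 2) * |b|^T R |b|), where R holds absolute
    feature correlations (an assumed choice) with a zero diagonal."""
    n, d = X.shape
    R = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(R, 0.0)
    beta = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(d):
            r = y - X @ beta + X[:, j] * beta[j]        # partial residual
            rho = X[:, j] @ r / n
            # The effective l1 weight on beta_j grows when features correlated
            # with x_j are already active, discouraging selecting them together.
            lam_j = lam * (1.0 + alpha * R[j] @ np.abs(beta))
            beta[j] = soft_threshold(rho, lam_j) / col_sq[j]
    return beta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=200)      # two highly correlated features
    y = X[:, 0] + 0.5 * X[:, 4] + 0.1 * rng.normal(size=200)
    print(np.round(iilasso(X, y), 3))
```

On the toy data, the correlation-weighted penalty makes it costly to keep both of the two nearly collinear features active at once, which is the mechanism the abstract describes.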
Neural Computation (2014) 26 (6): 1169–1197.
Published: 01 June 2014
Abstract
We propose a new method for detecting changes in Markov network structure between two sets of samples. Instead of naively fitting two Markov network models separately to the two data sets and computing their difference, we directly learn the change in network structure by estimating the ratio of the two Markov network models. This density-ratio formulation naturally allows us to introduce sparsity in the network structure change, which greatly enhances interpretability. Furthermore, the computation of the normalization term, a critical bottleneck of the naive approach, is substantially alleviated. We also give a dual formulation of the optimization problem, which further reduces the computation cost for large-scale Markov networks. Through experiments, we demonstrate the usefulness of our method.
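As a rough illustration of the direct density-ratio idea, the sketch below fits the log-ratio of two samples with pairwise features x_u x_v and an ℓ1 penalty, so that nonzero coefficients flag changed interactions. The KLIEP-style objective, the plain proximal-gradient solver, and the fixed hyperparameters are simplifying assumptions rather than the letter's exact formulation.

```python
import numpy as np

def pairwise_features(X):
    """Pairwise features x_u * x_v (u <= v), indexing second-order interactions."""
    iu = np.triu_indices(X.shape[1])
    return X[:, iu[0]] * X[:, iu[1]], iu

def change_detection(Xp, Xq, lam=0.05, step=0.05, n_iter=500):
    """Fit log p(x)/q(x) ~ theta^T psi(x) - log Z(theta), with the normalizer
    estimated on the denominator sample, and an l1 penalty applied through a
    proximal (soft-threshold) step."""
    Fp, iu = pairwise_features(Xp)
    Fq, _ = pairwise_features(Xq)
    theta = np.zeros(Fp.shape[1])
    for _ in range(n_iter):
        w = np.exp(Fq @ theta)
        grad = Fp.mean(axis=0) - (Fq * w[:, None]).sum(axis=0) / w.sum()
        theta = theta + step * grad                     # gradient ascent step
        theta = np.sign(theta) * np.maximum(np.abs(theta) - step * lam, 0.0)
    return theta, iu                                    # nonzeros flag changed (u, v) pairs

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    Xq = rng.normal(size=(500, 4))
    Xp = rng.normal(size=(500, 4))
    Xp[:, 1] = 0.8 * Xp[:, 0] + 0.6 * rng.normal(size=500)   # one extra edge in p
    theta, iu = change_detection(Xp, Xq)
    print(np.round(theta, 2))
```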
Neural Computation (2013) 25 (10): 2734–2775.
Published: 01 October 2013
Abstract
We address the problem of estimating the difference between two probability densities. A naive approach is a two-step procedure of first estimating the two densities separately and then computing their difference. However, this procedure does not necessarily work well because the first step is performed without regard to the second, and thus a small estimation error incurred in the first step can cause a large error in the second. In this letter, we propose a single-shot procedure for directly estimating the density difference without separately estimating the two densities. We derive a nonparametric finite-sample error bound for the proposed single-shot density-difference estimator and show that it achieves the optimal convergence rate. We then show how the proposed density-difference estimator can be used in L2-distance approximation. Finally, we experimentally demonstrate the usefulness of the proposed method in robust distribution comparison tasks such as class-prior estimation and change-point detection.
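A minimal sketch of the least-squares route to single-shot density-difference estimation, assuming a Gaussian-kernel model for p - q: the coefficients have a closed-form regularized least-squares solution, and the fit is plugged into an L2-distance estimate. The fixed kernel width and regularization parameter are placeholders for the model selection the letter would perform.

```python
import numpy as np

def lsdd(Xp, Xq, sigma=1.0, lam=1e-3):
    """Least-squares fit of f(x) = sum_l theta_l * exp(-||x - c_l||^2 / (2 sigma^2))
    to the density difference p - q, followed by an L2-distance plug-in."""
    C = np.vstack([Xp, Xq])                              # kernel centers at all samples
    d = C.shape[1]

    def K(A):                                            # kernel matrix vs. the centers
        sq = ((A[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * sigma ** 2))

    # H_ll' = integral k(x, c_l) k(x, c_l') dx has a Gaussian closed form;
    # h_l is the difference of empirical kernel means under p and q.
    sqC = ((C[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    H = (np.pi * sigma ** 2) ** (d / 2) * np.exp(-sqC / (4 * sigma ** 2))
    h = K(Xp).mean(axis=0) - K(Xq).mean(axis=0)
    theta = np.linalg.solve(H + lam * np.eye(len(C)), h)
    l2_sq = 2 * h @ theta - theta @ H @ theta            # estimate of ||p - q||_2^2
    return theta, l2_sq

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    Xp = rng.normal(0.0, 1.0, size=(200, 1))
    Xq = rng.normal(0.5, 1.0, size=(200, 1))
    _, l2_sq = lsdd(Xp, Xq)
    print(f"estimated squared L2 distance: {l2_sq:.3f}")
```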
Neural Computation (2013) 25 (5): 1324–1370.
Published: 01 May 2013
Abstract
Divergence estimators based on direct approximation of density ratios, without separate approximation of the numerator and denominator densities, have been successfully applied to machine learning tasks that involve distribution comparison, such as outlier detection, transfer learning, and two-sample homogeneity testing. However, since density-ratio functions often fluctuate wildly, divergence estimation is a challenging task in practice. In this letter, we use relative divergences for distribution comparison, which involve the approximation of relative density ratios. Since relative density ratios are always smoother than the corresponding ordinary density ratios, our proposed method is favorable in terms of nonparametric convergence speed. Furthermore, we show that the proposed divergence estimator has an asymptotic variance independent of the model complexity under a parametric setup, implying that the proposed estimator hardly overfits even with complex models. Through experiments, we demonstrate the usefulness of the proposed approach.
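A minimal sketch of relative density-ratio estimation under a Gaussian-kernel model: the relative ratio p / (alpha p + (1 - alpha) q) is fit by regularized least squares in closed form and plugged into a relative Pearson-divergence estimate. The fixed kernel width, regularization parameter, and the particular plug-in formula are illustrative assumptions; such choices would normally be tuned by cross-validation.

```python
import numpy as np

def gauss_kernel(A, C, sigma):
    sq = ((A[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def rulsif(Xp, Xq, alpha=0.5, sigma=1.0, lam=1e-3):
    """Regularized least-squares fit of the relative density ratio
    r_alpha(x) = p(x) / (alpha p(x) + (1 - alpha) q(x)) with kernel centers
    on the numerator sample, plus a relative Pearson-divergence plug-in."""
    C = Xp
    Kp, Kq = gauss_kernel(Xp, C, sigma), gauss_kernel(Xq, C, sigma)
    H = alpha * Kp.T @ Kp / len(Xp) + (1 - alpha) * Kq.T @ Kq / len(Xq)
    h = Kp.mean(axis=0)
    theta = np.linalg.solve(H + lam * np.eye(len(C)), h)
    r_p = Kp @ theta                                     # fitted ratio on the p-sample
    pe_div = 0.5 * r_p.mean() - 0.5                      # relative Pearson divergence
    return theta, pe_div

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    Xp = rng.normal(0.0, 1.0, size=(300, 1))
    Xq = rng.normal(1.0, 1.0, size=(300, 1))
    _, pe = rulsif(Xp, Xq, alpha=0.5)
    print(f"relative Pearson divergence estimate: {pe:.3f}")
```

Setting alpha closer to 1 makes the target ratio smoother and bounded, which is the smoothing effect the abstract appeals to.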
Neural Computation (2013) 25 (3): 725–758.
Published: 01 March 2013
Abstract
The goal of sufficient dimension reduction in supervised learning is to find the low-dimensional subspace of input features that contains all of the information about the output values that the input features possess. In this letter, we propose a novel sufficient dimension-reduction method using a squared-loss variant of mutual information as a dependency measure. We apply a density-ratio estimator for approximating squared-loss mutual information that is formulated as a minimum contrast estimator on parametric or nonparametric models. Since cross-validation is available for choosing an appropriate model, our method does not require any prespecified structure on the underlying distributions. We elucidate the asymptotic bias of our estimator on parametric models and the asymptotic convergence rate on nonparametric models. The convergence analysis utilizes the uniform tail bound of a U-process, and the convergence rate is characterized by the bracketing entropy of the model. We then develop a natural gradient algorithm on the Grassmann manifold for sufficient subspace search. The analytic formula of our estimator allows us to compute the gradient efficiently. Numerical experiments show that the proposed method compares favorably with existing dimension-reduction approaches on artificial and benchmark data sets.
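The sketch below illustrates only the dependency criterion: squared-loss mutual information (SMI) between a projection z = Wx and the output y is estimated by least-squares density-ratio fitting with product Gaussian kernels, and a crude search over random one-dimensional directions stands in for the letter's natural gradient algorithm on the Grassmann manifold. The kernel width, regularization parameter, and the specific plug-in SMI formula are assumptions for illustration.

```python
import numpy as np

def smi_estimate(Z, Y, sigma=1.0, lam=1e-3):
    """Least-squares plug-in estimate of squared-loss mutual information
    between Z and Y, using all samples as centers of product Gaussian kernels."""
    n = len(Z)
    Kz = np.exp(-((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1) / (2 * sigma ** 2))
    Ky = np.exp(-((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1) / (2 * sigma ** 2))
    H = (Kz.T @ Kz) * (Ky.T @ Ky) / n ** 2               # product-kernel design matrix
    h = (Kz * Ky).mean(axis=0)
    theta = np.linalg.solve(H + lam * np.eye(n), h)
    return 0.5 * h @ theta - 0.5                         # plug-in SMI estimate

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    X = rng.normal(size=(150, 5))
    Y = X[:, :1] + 0.1 * rng.normal(size=(150, 1))       # only the first feature matters
    best_w, best_smi = None, -np.inf
    for _ in range(50):                                  # crude 1-D subspace search
        w = rng.normal(size=5)
        w /= np.linalg.norm(w)
        smi = smi_estimate(X @ w[:, None], Y)
        if smi > best_smi:
            best_w, best_smi = w, smi
    print(np.round(best_w, 2), round(best_smi, 3))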
Neural Computation (2011) 23 (1): 284–301.
Published: 01 January 2011
Abstract
Accurately evaluating statistical independence among random variables is a key element of independent component analysis (ICA). In this letter, we employ a squared-loss variant of mutual information as an independence measure and give a method for estimating it. Our basic idea is to estimate the ratio of probability densities directly without separately estimating each density, thereby avoiding the difficult task of density estimation. In this density-ratio approach, a natural cross-validation procedure is available for hyperparameter selection. Thus, all tuning parameters, such as the kernel width or the regularization parameter, can be objectively optimized. This is an advantage over recently developed kernel-based independence measures and is a highly useful property in unsupervised learning problems such as ICA. Based on this novel independence measure, we develop an ICA algorithm named least-squares independent component analysis.
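A toy sketch of the least-squares ICA idea: after whitening, a demixing rotation is chosen to minimize an estimated squared-loss mutual information between the recovered components. The grid search over a single 2-D rotation angle and the fixed kernel width and regularization parameter are simplifications for illustration; the letter develops a full algorithm and, as noted above, selects tuning parameters by cross-validation.

```python
import numpy as np

def smi_estimate(u, v, sigma=0.5, lam=1e-3):
    """Least-squares plug-in estimate of squared-loss mutual information
    between two 1-D variables, used here as the independence measure."""
    n = len(u)
    Ku = np.exp(-((u[:, None] - u[None, :]) ** 2) / (2 * sigma ** 2))
    Kv = np.exp(-((v[:, None] - v[None, :]) ** 2) / (2 * sigma ** 2))
    H = (Ku.T @ Ku) * (Kv.T @ Kv) / n ** 2
    h = (Ku * Kv).mean(axis=0)
    theta = np.linalg.solve(H + lam * np.eye(n), h)
    return 0.5 * h @ theta - 0.5

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    S = np.vstack([np.sign(rng.normal(size=300)),        # two non-Gaussian sources
                   rng.uniform(-1, 1, size=300)])
    X = np.array([[1.0, 0.6], [0.4, 1.0]]) @ S           # observed mixtures
    X = X - X.mean(axis=1, keepdims=True)
    evals, evecs = np.linalg.eigh(np.cov(X))
    Z = np.diag(evals ** -0.5) @ evecs.T @ X             # whitening
    angles = np.linspace(0.0, np.pi / 2, 30)
    smis = []
    for a in angles:
        R = np.array([[np.cos(a), -np.sin(a)],
                      [np.sin(a),  np.cos(a)]])
        Y = R @ Z                                        # candidate demixed signals
        smis.append(smi_estimate(Y[0], Y[1]))
    print(f"least-dependent rotation angle: {angles[int(np.argmin(smis))]:.2f} rad")
```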