1-12 of 12
Takafumi Kanamori
Journal Articles
Neural Computation (2023) 35 (7): 1288–1339.
Published: 12 June 2023
Abstract
We consider the scenario of deep clustering in which the available prior knowledge is limited. In this scenario, few existing state-of-the-art deep clustering methods perform well on both noncomplex-topology and complex-topology data sets. To address this problem, we propose a constraint based on symmetric InfoNCE, which is added to the objective of a deep clustering method so that the trained model is effective not only for noncomplex-topology but also for complex-topology data sets. We also provide several theoretical explanations of why the constraint can enhance the performance of deep clustering methods. To confirm the effectiveness of the proposed constraint, we introduce a deep clustering method named MIST, which combines an existing deep clustering method with our constraint. Our numerical experiments with MIST demonstrate that the constraint is effective. In addition, MIST outperforms other state-of-the-art deep clustering methods on most of the 10 commonly used benchmark data sets.
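As a rough illustration of the symmetric InfoNCE term mentioned above, the following PyTorch sketch computes InfoNCE in both directions between two batches of paired representations and averages them; the function name, temperature, and pairing convention are illustrative assumptions, not taken from the paper.

    import torch
    import torch.nn.functional as F

    def symmetric_infonce(z1, z2, temperature=0.5):
        # z1, z2: (batch, dim) representations of paired views of the same samples
        z1 = F.normalize(z1, dim=1)
        z2 = F.normalize(z2, dim=1)
        logits = z1 @ z2.t() / temperature            # pairwise cosine similarities
        targets = torch.arange(z1.size(0), device=z1.device)
        # InfoNCE in both directions (view 1 -> view 2 and view 2 -> view 1), averaged
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))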
Journal Articles
Neural Computation (2019) 31 (8): 1718–1750.
Published: 01 August 2019
Abstract
In this letter, we propose a variable selection method for general nonparametric kernel-based estimation. The proposed method consists of two-stage estimation: (1) construct a consistent estimator of the target function, and (2) approximate the estimator using a few variables by ℓ1-type penalized estimation. We see that the proposed method can be applied to various nonparametric kernel estimators such as kernel ridge regression, kernel-based density estimation, and density-ratio estimation. We prove that the proposed method has the property of variable selection consistency when the power series kernel is used. Here, the power series kernel is a certain class of kernels containing polynomial and exponential kernels. This result is regarded as an extension of the variable selection consistency for the nonnegative garrote (NNG), a special case of the adaptive Lasso, to the kernel-based estimators. Several experiments, including simulation studies and real data applications, show the effectiveness of the proposed method.
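A much-simplified scikit-learn sketch of the two-stage idea: stage 1 fits a flexible kernel estimator, and stage 2 approximates its output with an ℓ1-penalized model, keeping the variables with nonzero coefficients. The paper's second stage penalizes variable-wise scaling factors (nonnegative-garrote style) and uses power series kernels; the plain Lasso surrogate and all constants below are illustrative.

    import numpy as np
    from sklearn.kernel_ridge import KernelRidge
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=200)  # only x0, x1 relevant

    # Stage 1: flexible (but dense) estimate of the target function
    f_hat = KernelRidge(kernel="rbf", alpha=1e-2, gamma=0.5).fit(X, y).predict(X)

    # Stage 2: sparse l1-penalized approximation of the stage-1 estimate
    stage2 = Lasso(alpha=0.05).fit(X, f_hat)
    print("selected variables:", np.flatnonzero(stage2.coef_ != 0))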
Journal Articles
Neural Computation (2017) 29 (5): 1406–1438.
Published: 01 May 2017
Abstract
Nonconvex variants of support vector machines (SVMs) have been developed for various purposes. For example, robust SVMs attain robustness to outliers by using a nonconvex loss function, while extended ν-SVM (Eν-SVM) extends the range of the hyperparameter ν by introducing a nonconvex constraint. Here, we consider an extended robust support vector machine (ER-SVM), a robust variant of Eν-SVM. ER-SVM combines two types of nonconvexity from robust SVMs and Eν-SVM. Because of the two nonconvexities, the existing algorithm we proposed needs to be divided into two parts depending on whether the hyperparameter value is in the extended range or not. The algorithm also heuristically solves the nonconvex problem in the extended range. In this letter, we propose a new, efficient algorithm for ER-SVM. The algorithm deals with two types of nonconvexity while never entailing more computations than either Eν-SVM or robust SVM, and it finds a critical point of ER-SVM. Furthermore, we show that ER-SVM includes the existing robust SVMs as special cases. Numerical experiments confirm the effectiveness of integrating the two nonconvexities.
Journal Articles
Neural Computation (2014) 26 (11): 2541–2569.
Published: 01 November 2014
Abstract
Financial risk measures have been used recently in machine learning. For example, the ν-support vector machine (ν-SVM) minimizes the conditional value at risk (CVaR) of the margin distribution. The measure is popular in finance because of the subadditivity property, but it is very sensitive to a few outliers in the tail of the distribution. We propose a new classification method, extended robust SVM (ER-SVM), which minimizes an intermediate risk measure between the CVaR and value at risk (VaR), expecting that the resulting model becomes less sensitive than ν-SVM to outliers. We can regard ER-SVM as an extension of robust SVM, which uses a truncated hinge loss. Numerical experiments suggest that ER-SVM can achieve better prediction performance with proper parameter settings.
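To make the two risk measures concrete, the sketch below estimates VaR and CVaR of a heavy-tailed sample of margin errors; the quantile level and the simple tail-mean estimator of CVaR are illustrative assumptions.

    import numpy as np

    def var_cvar(losses, beta=0.9):
        # VaR_beta: beta-quantile of the loss; CVaR_beta: mean loss in the tail beyond VaR
        var = np.quantile(losses, beta)
        return var, losses[losses >= var].mean()

    rng = np.random.default_rng(0)
    losses = rng.standard_t(df=3, size=10_000)        # heavy-tailed margin errors
    print(var_cvar(losses))                           # CVaR exceeds VaR: it is pulled up by the tail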
Journal Articles
Neural Computation (2013) 25 (10): 2734–2775.
Published: 01 October 2013
Abstract
We address the problem of estimating the difference between two probability densities. A naive approach is a two-step procedure of first estimating two densities separately and then computing their difference. However, this procedure does not necessarily work well because the first step is performed without regard to the second step, and thus a small estimation error incurred in the first stage can cause a big error in the second stage. In this letter, we propose a single-shot procedure for directly estimating the density difference without separately estimating two densities. We derive a nonparametric finite-sample error bound for the proposed single-shot density-difference estimator and show that it achieves the optimal convergence rate. We then show how the proposed density-difference estimator can be used in L2-distance approximation. Finally, we experimentally demonstrate the usefulness of the proposed method in robust distribution comparison such as class-prior estimation and change-point detection.
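A minimal numpy sketch of such a single-shot least-squares density-difference estimator with Gaussian basis functions: modeling f(x) = Σ_l θ_l k(x, c_l) and minimizing the regularized L2 criterion gives the closed form θ = (U + λI)^{-1}(h_p - h_q). The choice of centers, kernel width, and regularization constant below is illustrative; in practice they would be tuned by cross-validation.

    import numpy as np

    def density_difference(Xp, Xq, sigma=1.0, lam=1e-3):
        C = np.vstack([Xp, Xq])                       # Gaussian kernel centers
        d = C.shape[1]
        def K(A, B):                                  # Gaussian kernel matrix
            sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
            return np.exp(-sq / (2 * sigma ** 2))
        # U[l, l'] = integral of k(x, c_l) k(x, c_l') dx (Gaussian closed form)
        sq = ((C[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        U = (np.pi * sigma ** 2) ** (d / 2) * np.exp(-sq / (4 * sigma ** 2))
        h = K(Xp, C).mean(0) - K(Xq, C).mean(0)
        theta = np.linalg.solve(U + lam * np.eye(len(C)), h)
        return lambda X: K(X, C) @ theta              # estimate of p(x) - q(x)

    rng = np.random.default_rng(0)
    f = density_difference(rng.normal(0, 1, (200, 1)), rng.normal(0.5, 1, (200, 1)))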
Journal Articles
Neural Computation (2013) 25 (5): 1324–1370.
Published: 01 May 2013
Abstract
Divergence estimators based on direct approximation of density ratios without going through separate approximation of numerator and denominator densities have been successfully applied to machine learning tasks that involve distribution comparison such as outlier detection, transfer learning, and two-sample homogeneity testing. However, since density-ratio functions often possess high fluctuation, divergence estimation is a challenging task in practice. In this letter, we use relative divergences for distribution comparison, which involve approximation of relative density ratios. Since relative density ratios are always smoother than the corresponding ordinary density ratios, our proposed method is favorable in terms of nonparametric convergence speed. Furthermore, we show that the proposed divergence estimator has asymptotic variance independent of the model complexity under a parametric setup, implying that the proposed estimator hardly overfits even with complex models. Through experiments, we demonstrate the usefulness of the proposed approach.
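A hedged numpy sketch of least-squares fitting of a relative density ratio r_α(x) = p(x) / (αp(x) + (1-α)q(x)) with Gaussian basis functions, in the spirit of the estimator described above; the fit reduces to solving a small linear system. Kernel centers, width, α, and λ are illustrative and would be tuned by cross-validation in practice.

    import numpy as np

    def relative_ratio(Xp, Xq, alpha=0.5, sigma=1.0, lam=1e-3):
        C = Xp                                        # kernel centers placed on the numerator samples
        def K(A):
            sq = ((A[:, None, :] - C[None, :, :]) ** 2).sum(-1)
            return np.exp(-sq / (2 * sigma ** 2))
        Kp, Kq = K(Xp), K(Xq)
        H = alpha * Kp.T @ Kp / len(Xp) + (1 - alpha) * Kq.T @ Kq / len(Xq)
        h = Kp.mean(0)
        theta = np.linalg.solve(H + lam * np.eye(len(C)), h)
        return lambda X: K(X) @ theta                 # estimate of the relative density ratio

    rng = np.random.default_rng(1)
    r = relative_ratio(rng.normal(0, 1, (100, 1)), rng.normal(0.3, 1, (100, 1)))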
Journal Articles
Neural Computation (2013) 25 (3): 759–804.
Published: 01 March 2013
Abstract
A wide variety of machine learning algorithms such as the support vector machine (SVM), minimax probability machine (MPM), and Fisher discriminant analysis (FDA) exist for binary classification. The purpose of this letter is to provide a unified classification model that includes these models through a robust optimization approach. This unified model has several benefits. One is that the extensions and improvements intended for SVMs become applicable to MPM and FDA, and vice versa. For example, we can obtain nonconvex variants of MPM and FDA by mimicking Perez-Cruz, Weston, Hermann, and Schölkopf's (2003) extension from convex ν-SVM to nonconvex Eν-SVM. Another benefit is to provide theoretical results concerning these learning methods at once by dealing with the unified model. We give a statistical interpretation of the unified classification model and prove that the model is a good approximation for the worst-case minimization of an expected loss with respect to the uncertain probability distribution. We also propose a nonconvex optimization algorithm that can be applied to nonconvex variants of existing learning methods and show promising numerical results.
Journal Articles
Neural Computation (2009) 21 (2): 533–559.
Published: 01 February 2009
Abstract
The goal of regression analysis is to describe the stochastic relationship between an input vector x and a scalar output y. This can be achieved by estimating the entire conditional density p(y|x). In this letter, we present a new approach for nonparametric conditional density estimation. We develop a piecewise-linear path-following method for kernel-based quantile regression. It enables us to estimate the cumulative distribution function of p(y|x) in piecewise-linear form for all x in the input domain. Theoretical analyses and experimental results are presented to show the effectiveness of the approach.
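The basic idea can be mimicked crudely with off-the-shelf tools: fit quantile regressors on a grid of quantile levels τ and read the fitted quantiles off at a query point to trace the conditional CDF. The paper instead follows the exact piecewise-linear solution path of kernel quantile regression in τ; the gradient-boosting quantile regressors below are only a stand-in.

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(-2, 2, (400, 1))
    y = np.sin(2 * X[:, 0]) + 0.3 * (1 + np.abs(X[:, 0])) * rng.normal(size=400)

    taus = np.linspace(0.05, 0.95, 19)
    models = [GradientBoostingRegressor(loss="quantile", alpha=t).fit(X, y) for t in taus]

    x0 = np.array([[0.5]])
    quantiles = np.array([m.predict(x0)[0] for m in models])
    # (quantiles, taus) traces an estimate of the conditional CDF F(y | x = 0.5);
    # differentiating a smoothed version would give a conditional density estimate.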
Journal Articles
Neural Computation (2008) 20 (6): 1596–1630.
Published: 01 June 2008
Abstract
We discuss robustness against mislabeling in multiclass labels for classification problems and propose two boosting algorithms, the normalized Eta-Boost.M and Eta-Boost.M, based on the Eta-divergence. These two boosting algorithms are closely related to models of mislabeling in which a label is erroneously exchanged for another. For the two boosting algorithms, theoretical aspects supporting robustness against mislabeling are explored. We apply the two proposed boosting methods to synthetic and real data sets to investigate their performance, focusing on robustness, and confirm the validity of the proposed methods.
Journal Articles
Neural Computation (2007) 19 (8): 2183–2244.
Published: 01 August 2007
Abstract
Boosting is known as a gradient descent algorithm over loss functions. It is often pointed out that the typical boosting algorithm, AdaBoost, is highly affected by outliers. In this letter, loss functions for robust boosting are studied. Based on the concept of robust statistics, we propose a transformation of loss functions that makes boosting algorithms robust against extreme outliers. Next, the truncation of loss functions is applied to contamination models that describe the occurrence of mislabels near decision boundaries. Numerical experiments illustrate that the proposed loss functions derived from the contamination models are useful for handling highly noisy data in comparison with other loss functions.
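One way to picture the effect: in a boosting step, examples are reweighted by the negative derivative of the loss at their current margin, so flattening the loss beyond a threshold caps the influence of extreme outliers. The specific truncation and threshold below are illustrative, not the paper's exact transformation.

    import numpy as np

    def exp_weight(margin):
        return np.exp(-margin)                        # AdaBoost: weight = -d/dm exp(-m)

    def truncated_weight(margin, c=2.0):
        # loss flattened once -margin exceeds c, so its derivative (the weight) vanishes there
        return np.where(-margin <= c, np.exp(-margin), 0.0)

    margins = np.array([2.0, 0.5, -0.5, -5.0])        # last example: a badly mislabeled outlier
    for w_fn in (exp_weight, truncated_weight):
        w = w_fn(margins)
        print(w_fn.__name__, np.round(w / w.sum(), 3))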
Journal Articles
Neural Computation (2004) 16 (7): 1437–1481.
Published: 01 July 2004
Abstract
We aim at an extension of AdaBoost to U-Boost, in the paradigm of building a stronger classification machine from a set of weak learning machines. A geometric understanding of the Bregman divergence defined by a generic convex function U leads to the U-Boost method in the framework of information geometry extended to the space of the finite measures over a label set. We propose two versions of U-Boost learning algorithms by taking account of whether the domain is restricted to the space of probability functions. In the sequential step, we observe that the two adjacent and the initial classifiers are associated with a right triangle in the scale via the Bregman divergence, called the Pythagorean relation. This leads to a mild convergence property of the U-Boost algorithm as seen in the expectation-maximization algorithm. Statistical discussions of consistency and robustness elucidate the properties of the U-Boost methods based on a stochastic assumption for training data.
Journal Articles
Neural Computation (2002) 14 (10): 2469–2496.
Published: 01 October 2002
Abstract
In the presence of a heavy-tailed noise distribution, regression becomes much more difficult. Traditional robust regression methods assume that the noise distribution is symmetric, and they downweight the influence of so-called outliers. When the noise distribution is asymmetric, these methods yield biased regression estimators. Motivated by data-mining problems for the insurance industry, we propose a new approach to robust regression tailored to deal with asymmetric noise distributions. The main idea is to learn most of the parameters of the model using conditional quantile estimators (which are biased but robust estimators of the regression) and to learn a few remaining parameters to combine and correct these estimators, to minimize the average squared error in an unbiased way. Theoretical analysis and experiments show the clear advantages of the approach. Results are reported on artificial data as well as insurance data, using both linear and neural network predictors.
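A hedged scikit-learn sketch of the combine-and-correct idea: fit several (robust but biased) linear conditional-quantile estimators, then learn a small number of combination weights by ordinary least squares so that the combined predictor targets the conditional mean under asymmetric noise. The quantile levels and the use of linear models are illustrative simplifications.

    import numpy as np
    from sklearn.linear_model import QuantileRegressor, LinearRegression

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 4, (500, 1))
    y = 2.0 * X[:, 0] + rng.exponential(scale=1.0, size=500) - 1.0   # asymmetric, heavy right tail

    # Robust but biased estimators: conditional quantiles at a few levels
    taus = [0.25, 0.5, 0.75]
    Q = np.column_stack([
        QuantileRegressor(quantile=t, alpha=0.0).fit(X, y).predict(X) for t in taus
    ])

    # A few combination/correction parameters learned to minimize squared error
    combiner = LinearRegression().fit(Q, y)
    y_hat = combiner.predict(Q)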