Lei Xu: 1–9 of 9 results
Graph-Regularized Tensor Regression: A Domain-Aware Framework for Interpretable Modeling of Multiway Data on Graphs
Neural Computation (2023) 35 (8): 1404–1429. Published: 12 July 2023.
Abstract
Modern data analytics applications are increasingly characterized by exceedingly large and multidimensional data sources. This represents a challenge for traditional machine learning models, as the number of model parameters needed to process such data grows exponentially with the data dimensions, an effect known as the curse of dimensionality. Recently, tensor decomposition (TD) techniques have shown promising results in reducing the computational costs associated with large-dimensional models while achieving comparable performance. However, such tensor models are often unable to incorporate the underlying domain knowledge when compressing high-dimensional models. To this end, we introduce a novel graph-regularized tensor regression (GRTR) framework, whereby domain knowledge about intramodal relations is incorporated into the model in the form of a graph Laplacian matrix. This is then used as a regularization tool to promote a physically meaningful structure within the model parameters. By virtue of tensor algebra, the proposed framework is shown to be fully interpretable, both coefficient-wise and dimension-wise. The GRTR model is validated in a multiway regression setting, compared against competing models, and shown to achieve improved performance at reduced computational costs. Detailed visualizations are provided to help readers gain an intuitive understanding of the employed tensor operations.
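The role of the graph Laplacian as a regularizer can be illustrated in a few lines: a quadratic penalty w^T L w shrinks the differences between coefficients of nodes that the graph declares related. The sketch below is a minimal illustration for the plain vector-regression case, not the paper's GRTR tensor factorization; the function names and the ridge-style closed form are assumptions.

```python
import numpy as np

def graph_laplacian(A):
    """Unnormalized graph Laplacian L = D - A from an adjacency matrix A."""
    return np.diag(A.sum(axis=1)) - A

def graph_regularized_ls(X, y, A, lam=1.0):
    """Least squares with a quadratic Laplacian penalty lam * w^T L w.

    Solves  min_w ||X w - y||^2 + lam * w^T L w,
    which smooths coefficients across nodes connected in A.
    (Illustrative vector case; GRTR applies a Laplacian penalty to the
    factors of a tensor decomposition instead.)
    """
    L = graph_laplacian(A)
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

# Toy example: three features whose graph says features 0 and 1 are related.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, 1.0, -2.0]) + 0.1 * rng.normal(size=50)
A = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)
print(graph_regularized_ls(X, y, A, lam=5.0))
```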
Simple Neural-Like P Systems for Maximal Independent Set Selection
Neural Computation (2013) 25 (6): 1642–1659. Published: 01 June 2013.
Abstract
Membrane systems (P systems) are distributed computing models inspired by living cells where a collection of processors jointly achieves a computing task. The problem of maximal independent set (MIS) selection in a graph is to choose a set of nonadjacent nodes to which no further nodes can be added. In this letter, we design a class of simple neural-like P systems to solve the MIS selection problem efficiently in a distributed way. This new class of systems possesses two features that are attractive for both distributed computing and membrane computing: first, the individual processors do not need any information about the overall size of the graph; second, they communicate using only one-bit messages.
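For orientation, the sketch below shows a Luby-style randomized selection procedure for the MIS problem, not the paper's neural-like P system. It shares the first highlighted feature (no node needs to know the overall size of the graph), though it exchanges random priorities rather than one-bit messages; names and data structures are illustrative assumptions.

```python
import random

def randomized_mis(adj, seed=0):
    """Luby-style maximal independent set selection (illustrative sketch only;
    not the neural-like P system of the paper).

    adj: dict mapping each node to the set of its neighbors.
    Each round, every active node draws a random priority and joins the MIS
    if its priority beats those of all active neighbors; winners and their
    neighbors then become inactive. No node needs the size of the graph.
    """
    rng = random.Random(seed)
    active = set(adj)
    mis = set()
    while active:
        priority = {v: rng.random() for v in active}
        winners = {v for v in active
                   if all(priority[v] > priority[u]
                          for u in adj[v] if u in active)}
        mis |= winners
        # Winners and their neighbors drop out of the remaining rounds.
        active -= winners
        for v in winners:
            active -= adj[v]
    return mis

# 5-cycle: any maximal independent set contains two nonadjacent nodes.
adj = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}
print(randomized_mis(adj))
```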
One-Bit-Matching Theorem for ICA, Convex-Concave Programming on Polyhedral Set, and Distribution Approximation for Combinatorics
Neural Computation (2007) 19 (2): 546–569. Published: 01 February 2007.
Abstract
According to the proof by Liu, Chiu, and Xu (2004) of the so-called one-bit-matching conjecture (Xu, Cheung, and Amari, 1998a), all the sources can be separated as long as there is a one-to-one same-sign correspondence between the kurtosis signs of all source probability density functions (pdf's) and the kurtosis signs of all model pdf's, a condition that is widely believed and implicitly supported by many empirical studies. However, this proof holds only in a weak sense: the conjecture is true when the global optimal solution of an independent component analysis criterion is reached. Thus, it cannot account for the successes of many existing iterative algorithms, which usually converge to one of the local optimal solutions. This article presents a new mathematical proof in a strong sense: the conjecture is also true when any one of the local optimal solutions is reached, established by investigating convex-concave programming on a polyhedral set. Theorems are also provided not only on partial separation of sources when there is a partial matching between the kurtosis signs, but also on a duality of maximization and minimization in source separation. Moreover, corollaries are obtained on this duality, with supergaussian sources separated by maximization and subgaussian sources separated by minimization. A further corollary confirms the symmetric orthogonalization implementation of the kurtosis extreme approach for separating multiple sources in parallel, which works empirically but previously lacked a mathematical proof. Furthermore, a linkage is set up to combinatorial optimization from a distribution approximation perspective and a Stiefel manifold perspective, with algorithms that guarantee convergence as well as satisfaction of constraints.
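The condition at the heart of the theorem is purely a sign comparison: each source and each model density contributes only the sign of its (excess) kurtosis. A minimal sketch of that check follows; the function names and the use of sample kurtosis are assumptions for illustration.

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis E[(x - mu)^4] / sigma^4 - 3."""
    x = np.asarray(x, dtype=float)
    c = x - x.mean()
    return np.mean(c**4) / np.mean(c**2) ** 2 - 3.0

def one_bit_matching_holds(source_samples, model_kurtosis_signs):
    """Check the one-bit-matching condition: a one-to-one same-sign pairing
    between source kurtosis signs and model kurtosis signs. Because only the
    signs matter, it suffices that both sides contain the same number of
    positive (supergaussian) and negative (subgaussian) entries.
    """
    src_signs = sorted(np.sign(excess_kurtosis(s)) for s in source_samples)
    mdl_signs = sorted(np.sign(k) for k in model_kurtosis_signs)
    return src_signs == mdl_signs

rng = np.random.default_rng(0)
sources = [rng.laplace(size=10_000),           # supergaussian: positive kurtosis
           rng.uniform(-1, 1, size=10_000)]    # subgaussian: negative kurtosis
print(one_bit_matching_holds(sources, model_kurtosis_signs=[+1.0, -1.0]))  # True
```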
A Further Result on the ICA One-Bit-Matching Conjecture
Neural Computation (2005) 17 (2): 331–334. Published: 01 February 2005.
Abstract
The one-bit-matching conjecture for independent component analysis (ICA) has been widely believed in the ICA community. Theoretically, it has been proved that, under the assumption of zero skewness for the model probability density functions, the global maximum of a cost function derived from the typical ICA objective function with the one-bit-matching condition corresponds to a feasible solution of the ICA problem. In this note, we further prove that all the local maxima of the cost function correspond to feasible solutions of the ICA problem in the two-source case under the same assumption. That is, as long as the one-bit-matching condition is satisfied, the two-source ICA problem can be successfully solved using any local descent algorithm of the typical objective function with the assumption of zero skewness for all the model probability density functions.
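In the two-source case a whitened mixture leaves only one rotation angle to estimate, so a local search on a kurtosis-based objective is easy to sketch. The toy below is illustrative only: the grid search and the sum-of-absolute-kurtoses objective are assumptions, not the note's construction.

```python
import numpy as np

def excess_kurtosis(x):
    c = x - x.mean()
    return np.mean(c**4) / np.mean(c**2) ** 2 - 3.0

def whiten(X):
    """Zero-mean, unit-covariance version of the mixed signals X (2 x T)."""
    X = X - X.mean(axis=1, keepdims=True)
    cov = X @ X.T / X.shape[1]
    d, E = np.linalg.eigh(cov)
    return E @ np.diag(d ** -0.5) @ E.T @ X

def separate_two_sources(X, n_grid=360):
    """Toy two-source ICA: after whitening, scan the rotation angle and keep
    the one maximizing the sum of absolute excess kurtoses of the outputs.
    (Any local search on this objective would serve the same purpose.)"""
    Z = whiten(X)
    best_theta, best_val = 0.0, -np.inf
    for theta in np.linspace(0, np.pi / 2, n_grid):
        W = np.array([[np.cos(theta), np.sin(theta)],
                      [-np.sin(theta), np.cos(theta)]])
        val = sum(abs(excess_kurtosis(y)) for y in W @ Z)
        if val > best_val:
            best_theta, best_val = theta, val
    W = np.array([[np.cos(best_theta), np.sin(best_theta)],
                  [-np.sin(best_theta), np.cos(best_theta)]])
    return W @ Z

rng = np.random.default_rng(0)
S = np.vstack([rng.laplace(size=20_000), rng.uniform(-1, 1, size=20_000)])
A = np.array([[1.0, 0.6], [0.4, 1.0]])   # unknown mixing matrix
Y = separate_two_sources(A @ S)          # rows recover the sources up to sign/order
```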
One-Bit-Matching Conjecture for Independent Component Analysis
Neural Computation (2004) 16 (2): 383–399. Published: 01 February 2004.
Abstract
The one-bit-matching conjecture for independent component analysis (ICA) can be understood from different perspectives but is basically stated as “all the sources can be separated as long as there is a one-to-one same-sign correspondence between the kurtosis signs of all source probability density functions (pdf's) and the kurtosis signs of all model pdf's” (Xu, Cheung, & Amari, 1998a). This conjecture has been widely believed in the ICA community and is implicitly supported by many ICA studies, such as the Extended Infomax (Lee, Girolami, & Sejnowski, 1999) and the soft switching algorithm (Welling & Weber, 2001). However, there has been no mathematical proof to confirm the conjecture theoretically. In this article, only skewness and kurtosis are considered, and such a mathematical proof is given under the assumption that the skewness of the model densities vanishes. Moreover, empirical experiments demonstrate the robustness of the conjecture when the vanishing-skewness assumption is violated. As a by-product, we also show that the kurtosis maximization criterion (Moreau & Macchi, 1996) is actually a special case of the minimum mutual information criterion for ICA.
A Lagrange Multiplier and Hopfield-Type Barrier Function Method for the Traveling Salesman Problem
Neural Computation (2002) 14 (2): 303–324. Published: 01 February 2002.
Abstract
A Lagrange multiplier and Hopfield-type barrier function method is proposed for approximating a solution of the traveling salesman problem. The method is derived from applications of Lagrange multipliers and a Hopfield-type barrier function, and it attempts to produce a high-quality solution by generating a minimum point of a barrier problem for a sequence of descending values of the barrier parameter. For any given value of the barrier parameter, the method searches for a minimum point of the barrier problem along a feasible descent direction, which has the desirable property that lower and upper bounds on the variables are automatically satisfied whenever the step length lies between zero and one. At each iteration, the feasible descent direction is found by updating the Lagrange multipliers with a globally convergent iterative procedure. For any given value of the barrier parameter, the method converges to a stationary point of the barrier problem without any condition on the objective function. Theoretical and numerical results suggest that the method is more effective and efficient than the softassign algorithm.
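As background for the comparison, the sketch below shows a softassign-style annealing scheme on the simpler linear assignment problem, not the paper's TSP formulation or its Lagrange-multiplier barrier method: an entropic relaxation at temperature T, projected onto doubly stochastic matrices by Sinkhorn balancing, with T annealed downward. Parameter choices and names are illustrative assumptions.

```python
import numpy as np

def softassign_assignment(C, T0=1.0, T_min=0.01, anneal=0.9, n_sinkhorn=50):
    """Softassign-style solver for the linear assignment problem
    min tr(P^T C) over permutation matrices P (illustrative baseline only).

    At temperature T the relaxed solution has P_ij proportional to
    exp(-C_ij / T); Sinkhorn balancing projects it onto the doubly stochastic
    matrices, and annealing T downward pushes P toward a hard permutation.
    C is assumed to have entries of order one so exp(-C/T) never underflows
    at the smallest temperature used here.
    """
    T = T0
    P = None
    while T > T_min:
        P = np.exp(-C / T)
        for _ in range(n_sinkhorn):            # Sinkhorn balancing
            P /= P.sum(axis=1, keepdims=True)  # normalize rows
            P /= P.sum(axis=0, keepdims=True)  # normalize columns
        T *= anneal
    return P

rng = np.random.default_rng(0)
C = rng.random((5, 5))
P = softassign_assignment(C)
print(np.round(P, 2))       # approximately a 0/1 permutation matrix
print(P.argmax(axis=1))     # the recovered assignment, one column per row
```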
Asymptotic Convergence Rate of the EM Algorithm for Gaussian Mixtures
Neural Computation (2000) 12 (12): 2881–2907. Published: 01 December 2000.
Abstract
It is well known that the convergence rate of the expectation-maximization (EM) algorithm can be faster than those of conventional first-order iterative algorithms when the overlap in the given mixture is small. But this argument had not been mathematically proved. This article studies the problem asymptotically in the setting of gaussian mixtures under the theoretical framework of Xu and Jordan (1996). It is proved that the asymptotic convergence rate of the EM algorithm for gaussian mixtures locally around the true solution Θ* is o(e^{0.5−ε}(Θ*)), where ε > 0 is an arbitrarily small number, o(x) denotes a higher-order infinitesimal as x → 0, and e(Θ*) is a measure of the average overlap of the gaussians in the mixture. In other words, the large-sample local convergence rate of the EM algorithm tends to be asymptotically superlinear as e(Θ*) tends to zero.
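The qualitative claim is easy to probe empirically: EM on a well-separated gaussian mixture converges in far fewer iterations than on a heavily overlapping one. The 1-D sketch below is illustrative only; it does not reproduce the overlap measure e(Θ*), and it simply controls separation through the distance between the true means.

```python
import numpy as np

def em_gmm_1d(x, n_iter=200, tol=1e-8, seed=0):
    """EM for a two-component 1-D gaussian mixture; returns the parameters and
    the number of iterations until the log-likelihood change drops below tol.
    (Densities omit the common 1/sqrt(2*pi) factor, which cancels in the
    responsibilities and only shifts the log-likelihood by a constant.)"""
    rng = np.random.default_rng(seed)
    pi, mu, var = 0.5, np.sort(rng.choice(x, 2)), np.array([x.var(), x.var()])
    prev_ll = -np.inf
    for it in range(1, n_iter + 1):
        # E-step: posterior responsibility of component 1 for each point.
        p1 = pi * np.exp(-0.5 * (x - mu[0]) ** 2 / var[0]) / np.sqrt(var[0])
        p2 = (1 - pi) * np.exp(-0.5 * (x - mu[1]) ** 2 / var[1]) / np.sqrt(var[1])
        ll = np.sum(np.log(p1 + p2))
        h = p1 / (p1 + p2)
        # M-step: update mixing proportion, means, and variances.
        pi = h.mean()
        mu = np.array([np.sum(h * x) / np.sum(h),
                       np.sum((1 - h) * x) / np.sum(1 - h)])
        var = np.array([np.sum(h * (x - mu[0]) ** 2) / np.sum(h),
                        np.sum((1 - h) * (x - mu[1]) ** 2) / np.sum(1 - h)])
        if ll - prev_ll < tol:
            return (pi, mu, var), it
        prev_ll = ll
    return (pi, mu, var), n_iter

rng = np.random.default_rng(1)
for sep in (6.0, 1.0):   # well-separated vs. heavily overlapping means
    x = np.concatenate([rng.normal(0, 1, 2000), rng.normal(sep, 1, 2000)])
    _, iters = em_gmm_1d(x)
    print(f"separation {sep}: EM stopped after {iters} iterations")
```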
On Convergence Properties of the EM Algorithm for Gaussian Mixtures
Neural Computation (1996) 8 (1): 129–151. Published: 01 January 1996.
Abstract
We build up the mathematical connection between the “Expectation-Maximization” (EM) algorithm and gradient-based approaches for maximum likelihood learning of finite gaussian mixtures. We show that the EM step in parameter space is obtained from the gradient via a projection matrix P, and we provide an explicit expression for the matrix. We then analyze the convergence of EM in terms of special properties of P and provide new results analyzing the effect that P has on the likelihood surface. Based on these mathematical results, we present a comparative discussion of the advantages and disadvantages of EM and other algorithms for the learning of gaussian mixture models.
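For the mean parameters this gradient relation can be checked directly: the EM update of mean j equals the current mean plus the log-likelihood gradient premultiplied by Σ_j divided by the total responsibility of component j, which is the mean block of a matrix playing the role of P. The 1-D sketch below verifies this numerically; the parameter values and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 1000), rng.normal(3, 1.5, 1000)])

# Current (non-optimal) parameters of a two-component 1-D gaussian mixture.
alpha = np.array([0.4, 0.6])      # mixing proportions
mu = np.array([-1.0, 2.0])        # means
var = np.array([1.5, 1.0])        # variances

# Posterior responsibilities h[j, t] under the current parameters.
dens = np.stack([alpha[j] * np.exp(-0.5 * (x - mu[j]) ** 2 / var[j])
                 / np.sqrt(2 * np.pi * var[j]) for j in range(2)])
h = dens / dens.sum(axis=0)

# Gradient of the log-likelihood with respect to each mean:
#   dL/dmu_j = sum_t h[j, t] * (x_t - mu_j) / var_j
grad_mu = np.array([np.sum(h[j] * (x - mu[j])) / var[j] for j in range(2)])

# EM update of the means, and the same step written as mu + P_j * gradient
# with P_j = var_j / sum_t h[j, t] (the mean block of the projection matrix).
mu_em = np.array([np.sum(h[j] * x) / np.sum(h[j]) for j in range(2)])
mu_proj = mu + (var / h.sum(axis=1)) * grad_mu
print(np.allclose(mu_em, mu_proj))   # True: the EM step equals P times the gradient
```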
Bayesian Self-Organization Driven by Prior Probability Distributions
Neural Computation (1995) 7 (3): 580–595. Published: 01 May 1995.
Abstract
Recent work by Becker and Hinton (1992) shows a promising mechanism, based on maximizing mutual information assuming spatial coherence, by which a system can self-organize to learn visual abilities such as binocular stereo. We introduce a more general criterion, based on Bayesian probability theory, and thereby demonstrate a connection to Bayesian theories of visual perception and to other organization principles for early vision (Atick and Redlich 1990). Methods for implementation using variants of stochastic learning are described.