Naftali Tishby
Neural Computation (2006) 18 (8): 1739–1789.
Published: 01 August 2006
Abstract
The information bottleneck (IB) method is an unsupervised, model-independent data organization technique. Given a joint distribution p(X, Y), this method constructs a new variable, T, that extracts partitions, or clusters, over the values of X that are informative about Y. Algorithms motivated by the IB method have already been applied to text classification, gene expression, neural code, and spectral analysis. Here we introduce a general, principled framework for multivariate extensions of the IB method, which allows us to consider multiple, interrelated systems of data partitions. Our approach uses Bayesian networks to specify the systems of clusters and the information terms that should be maintained. We show that this construction provides insights about bottleneck variations and enables us to characterize their solutions. We also present four different algorithmic approaches that allow us to construct solutions in practice and apply them to several real-world problems.
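For readers new to the objective this paper generalizes: the single-variable IB seeks a compressed variable T that minimizes I(X; T) − βI(T; Y) over p(t|x). The following is a minimal sketch of the classic self-consistent IB iterations, not the multivariate algorithms introduced in the paper; the numpy implementation, toy joint distribution, and function names are my own assumptions.

```python
# A minimal sketch of the original single-variable IB iterations; an
# illustration of the objective, not the paper's multivariate methods.
import numpy as np

def kl_rows(p, q, eps=1e-12):
    """KL divergence D[p || q_t] of one distribution p against each row of q."""
    p, q = p + eps, q + eps
    return np.sum(p * np.log(p / q), axis=-1)

def iterative_ib(pxy, n_clusters, beta, n_iter=300, seed=0):
    """Self-consistent iterations for min_{p(t|x)} I(X;T) - beta * I(T;Y)."""
    rng = np.random.default_rng(seed)
    px = pxy.sum(axis=1)                        # p(x)
    py_x = pxy / px[:, None]                    # p(y|x)
    qt_x = rng.dirichlet(np.ones(n_clusters), size=len(px))  # init p(t|x)
    for _ in range(n_iter):
        qt = px @ qt_x                          # p(t) = sum_x p(x) p(t|x)
        qy_t = (qt_x * px[:, None]).T @ py_x / qt[:, None]   # p(y|t)
        # p(t|x) is proportional to p(t) * exp(-beta * KL[p(y|x) || p(y|t)])
        d = np.array([kl_rows(py_x[i], qy_t) for i in range(len(px))])
        logits = np.log(qt + 1e-12)[None, :] - beta * d
        qt_x = np.exp(logits - logits.max(axis=1, keepdims=True))
        qt_x /= qt_x.sum(axis=1, keepdims=True)
    return qt_x

# Toy joint with two block-structured groups of X values.
pxy = np.array([[.20, .18, .01, .01],
                [.18, .20, .01, .01],
                [.01, .01, .20, .18],
                [.01, .01, .18, .20]])
pxy /= pxy.sum()
print(iterative_ib(pxy, n_clusters=2, beta=5.0).round(2))
```

With this block-structured toy joint and a moderate β, the iterations assign the first two and last two values of X to separate clusters, which is the compression-versus-relevance trade-off the multivariate framework extends to systems of such variables.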
Neural Computation (2001) 13 (12): 2681–2708.
Published: 01 December 2001
Abstract
The detection of a specific stochastic pattern embedded in unknown background noise is a difficult pattern recognition problem encountered in many applications, such as word spotting in speech. A similar problem emerges when trying to detect a multineural spike pattern in a single electrical recording, embedded in the complex cortical activity of a behaving animal. Solving this problem is crucial for the identification of neuronal code words with specific meaning. The technical difficulty of this detection lies in the lack of a good statistical model for the background activity, which changes rapidly with the recording conditions and the activity of the animal. This work introduces an adversary background model, which assumes that the background “knows” the pattern sought, up to its first-order statistics, and that this “knowledge” creates a background composed of all permutations of the pattern. We show that this background model is tightly connected to the type-based information-theoretic approach. Furthermore, we show that computing the likelihood ratio amounts to decomposing the log-likelihood distribution according to the types of the empirical counts. We demonstrate the application of this method to the detection of reward patterns in the basal ganglia of behaving monkeys, yielding some unexpected biological results.
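Since the adversary background is composed of all permutations of the sought pattern, a simple Monte Carlo counterpart to the paper's type-based computation is to score a candidate window under the pattern model and compare it against the scores of its own random permutations. The sketch below is only a hedged illustration of that idea; the position-dependent pattern model, function names, and parameters are my assumptions, not the paper's construction.

```python
# A permutation-null sketch in the spirit of the adversary background
# model: the null "knows" the window's first-order statistics, so it is
# sampled by permuting the observed window itself. All names are
# illustrative assumptions.
import numpy as np

def pattern_loglik(window, log_p):
    """Log-likelihood of a symbol window under a position-dependent
    pattern model log_p[t, symbol]."""
    return sum(log_p[t, s] for t, s in enumerate(window))

def permutation_detector(window, log_p, n_perm=10_000, seed=0):
    """Return the observed score and a permutation p-value: the fraction
    of permuted windows scoring at least as high as the observed one."""
    rng = np.random.default_rng(seed)
    obs = pattern_loglik(window, log_p)
    null = np.array([pattern_loglik(rng.permutation(window), log_p)
                     for _ in range(n_perm)])
    return obs, (np.sum(null >= obs) + 1) / (n_perm + 1)
```

A small p-value then indicates that the particular arrangement of the window, not merely its symbol counts, matches the pattern, which is exactly the discrimination the adversary model demands.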
Neural Computation (2001) 13 (11): 2409–2463.
Published: 01 November 2001
Abstract
We define predictive information I_pred(T) as the mutual information between the past and the future of a time series. Three qualitatively different behaviors are found in the limit of large observation times T: I_pred(T) can remain finite, grow logarithmically, or grow as a fractional power law. If the time series allows us to learn a model with a finite number of parameters, then I_pred(T) grows logarithmically with a coefficient that counts the dimensionality of the model space. In contrast, power-law growth is associated, for example, with the learning of infinite-parameter (or nonparametric) models such as continuous functions with smoothness constraints. There are connections between the predictive information and measures of complexity that have been defined both in learning theory and in the analysis of physical systems through statistical mechanics and dynamical systems theory. Furthermore, in the same way that entropy provides the unique measure of available information consistent with some simple and plausible conditions, we argue that the divergent part of I_pred(T) provides the unique measure for the complexity of dynamics underlying a time series. Finally, we discuss how these ideas may be useful in problems in physics, statistics, and biology.
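To make the first of the three regimes concrete, here is a small hedged sketch (my own plug-in estimator and parameter choices, not the paper's analysis) that estimates I(past; future) for a binary Markov chain; because such a source has a finite, known parameterization, the estimate saturates at a finite value as the window length grows.

```python
# A minimal sketch: plug-in estimate of the mutual information between
# length-w past and future blocks of a binary Markov chain. The chain,
# estimator, and names are illustrative assumptions.
import numpy as np
from collections import Counter

def sample_markov(n, p_stay=0.9, seed=0):
    """Binary Markov chain that repeats its last symbol w.p. p_stay."""
    rng = np.random.default_rng(seed)
    x = np.empty(n, dtype=int)
    x[0] = rng.integers(2)
    for t in range(1, n):
        x[t] = x[t - 1] if rng.random() < p_stay else 1 - x[t - 1]
    return x

def plugin_mi(pairs):
    """Plug-in mutual information (bits) from joint samples (a, b)."""
    n = len(pairs)
    pab = Counter(pairs)
    pa = Counter(a for a, _ in pairs)
    pb = Counter(b for _, b in pairs)
    return sum(c / n * np.log2((c / n) / (pa[a] * pb[b] / n**2))
               for (a, b), c in pab.items())

def predictive_information(x, w):
    """Estimate I(past; future) with length-w past and future windows."""
    pairs = [(tuple(x[t - w:t]), tuple(x[t:t + w]))
             for t in range(w, len(x) - w + 1)]
    return plugin_mi(pairs)

x = sample_markov(200_000)
for w in (1, 2, 4):  # values saturate: a Markov chain's I_pred stays finite
    print(w, round(predictive_information(x, w), 3))
```

Logarithmic growth would appear instead if the chain's transition probabilities themselves had to be learned from the data, which is the second regime described in the abstract.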