Search results for Ilya Nemenman: 1-3 of 3
Journal Articles
Data Efficiency, Dimensionality Reduction, and the Generalized Symmetric Information Bottleneck
Publisher: Journals Gateway
Neural Computation (2024) 36 (7): 1353–1379.
Published: 07 June 2024
Abstract
The symmetric information bottleneck (SIB), an extension of the more familiar information bottleneck, is a dimensionality-reduction technique that simultaneously compresses two random variables to preserve information between their compressed versions. We introduce the generalized symmetric information bottleneck (GSIB), which explores different functional forms of the cost of such simultaneous reduction. We then explore the data set size requirements of such simultaneous compression. We do this by deriving bounds and root-mean-squared estimates of statistical fluctuations of the involved loss functions. We show that in typical situations, the simultaneous GSIB compression requires qualitatively less data to achieve the same errors compared to compressing variables one at a time. We suggest that this is an example of a more general principle that simultaneous compression is more data efficient than independent compression of each of the input variables.
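For orientation, the familiar information bottleneck compresses a single variable X into Z_X while preserving information about a relevance variable Y; the symmetric variant compresses X and Y simultaneously. A minimal sketch of the two objectives, with Z_X and Z_Y the compressed variables and beta the trade-off parameter (the paper's GSIB replaces the simple additive compression costs with more general functional forms, so its exact objective may differ):
\[
\mathcal{L}_{\mathrm{IB}} \;=\; I(X;Z_X) \;-\; \beta\, I(Z_X;Y),
\qquad
\mathcal{L}_{\mathrm{SIB}} \;=\; I(X;Z_X) \;+\; I(Y;Z_Y) \;-\; \beta\, I(Z_X;Z_Y),
\]
both minimized over the stochastic encoders p(z_X | x) and p(z_Y | y).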
Journal Articles
Fluctuation-Dissipation Theorem and Models of Learning
Publisher: Journals Gateway
Neural Computation (2005) 17 (9): 2006–2033.
Published: 01 September 2005
Abstract
Advances in statistical learning theory have resulted in a multitude of different designs of learning machines. But which ones are implemented by brains and other biological information processors? We analyze how various abstract Bayesian learners perform on different data and argue that it is difficult to determine which learning-theoretic computation is performed by a particular organism using just its performance in learning a stationary target (learning curve). Based on the fluctuation-dissipation relation in statistical physics, we then discuss a different experimental setup that might be able to solve the problem.
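For context, a minimal statement of the equilibrium fluctuation-dissipation relation the abstract invokes (the textbook static form, not necessarily the exact relation used in the paper): if a small field h couples to an observable A by adding -hA to the energy, the linear response of any observable B is fixed by equilibrium fluctuations,
\[
\left.\frac{\partial \langle B\rangle_h}{\partial h}\right|_{h=0}
\;=\; \beta\,\bigl(\langle A B\rangle_0 - \langle A\rangle_0 \langle B\rangle_0\bigr),
\qquad \beta = 1/(k_B T),
\]
where the subscript 0 denotes an unperturbed equilibrium average. The analogous experimental proposal here, presumably, is to probe a learner's response to small perturbations of the target rather than to rely on the learning curve alone.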
Journal Articles
Predictability, Complexity, and Learning
Publisher: Journals Gateway
Neural Computation (2001) 13 (11): 2409–2463.
Published: 01 November 2001
Abstract
We define predictive information I_pred(T) as the mutual information between the past and the future of a time series. Three qualitatively different behaviors are found in the limit of large observation times T: I_pred(T) can remain finite, grow logarithmically, or grow as a fractional power law. If the time series allows us to learn a model with a finite number of parameters, then I_pred(T) grows logarithmically with a coefficient that counts the dimensionality of the model space. In contrast, power-law growth is associated, for example, with the learning of infinite-parameter (or nonparametric) models such as continuous functions with smoothness constraints. There are connections between the predictive information and measures of complexity that have been defined both in learning theory and in the analysis of physical systems through statistical mechanics and dynamical systems theory. Furthermore, in the same way that entropy provides the unique measure of available information consistent with some simple and plausible conditions, we argue that the divergent part of I_pred(T) provides the unique measure for the complexity of dynamics underlying a time series. Finally, we discuss how these ideas may be useful in problems in physics, statistics, and biology.
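In symbols (a sketch consistent with the abstract; the paper's conventions for the past and future windows may differ): if S(T) is the entropy of the signal observed over a window of duration T, the mutual information between a past of length T and a future of length T' is
\[
I(T;T') \;=\; S(T) + S(T') - S(T+T').
\]
Writing S(T) = S_0 T + S_1(T), the extensive terms cancel, so the predictive information is governed entirely by the subextensive part S_1(T), and for a long future I_pred(T) tracks S_1(T). For a model with K independent parameters learnable from the data, the logarithmic case mentioned above takes the familiar form I_pred(T) ~ (K/2) log T.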