Andreas Wichert (1-3 of 3)
Journal Articles
Neural Computation (2024) 36 (8): 1626–1642.
Published: 19 July 2024
Abstract
In computer vision research, convolutional neural networks (CNNs) have demonstrated remarkable capabilities at extracting patterns from raw pixel data, achieving state-of-the-art recognition accuracy. However, they differ significantly from human visual perception, prioritizing pixel-level correlations and statistical patterns and often overlooking object semantics. To explore this difference, we propose an approach that isolates core visual features crucial for human perception and object recognition: color, texture, and shape. In experiments on three benchmarks (Fruits 360, CIFAR-10, and Fashion MNIST), each visual feature is individually input into a neural network. Results reveal data set-dependent variations in classification accuracy, highlighting that deep learning models tend to learn pixel-level correlations instead of fundamental visual features. To validate this observation, we used various combinations of concatenated visual features as input for a neural network on the CIFAR-10 data set. CNNs excel at learning statistical patterns in images, achieving exceptional performance when training and test data share similar distributions. To substantiate this point, we trained a CNN on the CIFAR-10 data set and evaluated its performance on the "dog" class from CIFAR-10 and on an equivalent number of examples from the Stanford Dogs data set. The CNN's poor performance on Stanford Dogs images underlines the disparity between deep learning and human visual perception, highlighting the need for models that learn object semantics. Specialized benchmark data sets with controlled variations hold promise for aligning learned representations with human cognition in computer vision research.
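The feature-isolation experiment described above can be pictured with a short sketch. The snippet below is a minimal illustration (not the authors' code) of feeding one isolated visual feature, a coarse per-channel color histogram, to a small classifier instead of the raw pixels; the dummy batch, network sizes, and histogram binning are assumptions made for the example.

```python
import numpy as np
import torch
import torch.nn as nn

def color_histogram(img, bins=8):
    """img: H x W x 3 uint8 array -> normalized, concatenated per-channel histogram."""
    feats = [np.histogram(img[..., c], bins=bins, range=(0, 255))[0] for c in range(3)]
    h = np.concatenate(feats).astype(np.float32)
    return h / h.sum()

# A small MLP that sees only the 24-dimensional color feature, never the raw pixels.
model = nn.Sequential(nn.Linear(24, 64), nn.ReLU(), nn.Linear(64, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Dummy batch standing in for 32x32 RGB images (e.g., CIFAR-10) and their labels.
images = np.random.randint(0, 256, size=(16, 32, 32, 3), dtype=np.uint8)
labels = torch.randint(0, 10, (16,))
x = torch.from_numpy(np.stack([color_histogram(im) for im in images]))

optimizer.zero_grad()
loss = criterion(model(x), labels)   # one training step on the color feature alone
loss.backward()
optimizer.step()
```

The same pipeline would be repeated with texture and shape descriptors, or with concatenations of them, to compare against a CNN trained on the raw images.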
Journal Articles
Neural Computation (2021) 33 (12): 3334–3350.
Published: 12 November 2021
Abstract
Convolutional neural networks (CNNs) evolved from Fukushima's neocognitron model, which is based on the ideas of Hubel and Wiesel about the early stages of the visual cortex. Unlike other branches of neocognitron-based models, the typical CNN is based on end-to-end supervised learning by backpropagation and removes the focus from built-in invariance mechanisms, using pooling not as a way to tolerate small shifts but as a regularization tool that decreases model complexity. These properties of end-to-end supervision and flexibility of structure allow the typical CNN to become highly tuned to the training data, leading to extremely high accuracies on typical visual pattern recognition data sets. However, in this work, we hypothesize that there is a flip side to this capability: a hidden overfitting. More concretely, a supervised, backpropagation-based CNN will outperform a neocognitron/map transformation cascade (MTC) when trained and tested inside the same data set. Yet if we take both trained models and test them on the same task but on another data set (without retraining), the overfitting appears. Other neocognitron descendants, like the What-Where model, go in a different direction. In these models, learning remains unsupervised, but more structure is added to capture invariance to typical changes. Knowing that, we further hypothesize that if we repeat the same experiments with this model, the lack of supervision may make it worse than the typical CNN inside the same data set, but the added structure will make it generalize even better to another one. To put our hypothesis to the test, we choose the simple task of handwritten digit classification and take two well-known data sets for it: MNIST and ETL-1. To make the two data sets as similar as possible, we experiment with several types of preprocessing. However, regardless of the type in question, the results align exactly with our expectations.
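As a rough sketch of the evaluation protocol described above (train once on one data set, then test both on held-out data from the same set and on a second set without retraining), the following outline uses hypothetical loaders and a generic fit/predict classifier; the loaders, preprocessing, and model are assumptions, not the paper's implementation.

```python
import numpy as np

def accuracy(model, images, labels):
    """Fraction of correctly classified examples."""
    preds = model.predict(images.reshape(len(images), -1))
    return float(np.mean(preds == labels))

def cross_dataset_eval(model, train_set, same_test_set, other_test_set):
    """Train once on data set A, then evaluate on A's test split and on data set B."""
    x, y = train_set
    model.fit(x.reshape(len(x), -1), y)                      # trained inside data set A only
    return {
        "same data set": accuracy(model, *same_test_set),    # in-distribution accuracy
        "other data set": accuracy(model, *other_test_set),  # generalization without retraining
    }

# Hypothetical usage, assuming load_mnist()/load_etl1() return identically
# preprocessed (images, labels) arrays and clf is any fit/predict classifier:
#   report = cross_dataset_eval(clf, load_mnist("train"), load_mnist("test"), load_etl1("test"))
```

The gap between the two reported accuracies is what the abstract calls the hidden overfitting.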
Journal Articles
Neural Computation (2020) 32 (1): 136–152.
Published: 01 January 2020
Abstract
Willshaw networks are single-layered neural networks that store associations between binary vectors. Using only binary weights, these networks can be implemented efficiently to store large numbers of patterns and allow for fault-tolerant recovery of those patterns from noisy cues. However, this is only the case when the involved codes are sparse and randomly generated. In this letter, we use a recently proposed approach that maps visual patterns into informative binary features. By doing so, we manage to transform MNIST handwritten digits into well-distributed codes that we then store in a Willshaw network in autoassociation. We perform experiments with both noisy and noiseless cues and observe only a slight impact on the relevant information in the recovered patterns. More specifically, we were able to perform retrieval after filling the memory with several times as many patterns as the network has units, while preserving the information of the class to which each pattern belongs.
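For readers unfamiliar with the storage and retrieval rules involved, here is a minimal sketch of a Willshaw memory in autoassociation with sparse binary codes: clipped Hebbian (OR of outer products) storage and thresholding of the dendritic sums at the cue's own activity. The code sizes and threshold choice are illustrative assumptions, not the exact setup of the letter.

```python
import numpy as np

class WillshawMemory:
    def __init__(self, n_units):
        self.W = np.zeros((n_units, n_units), dtype=bool)    # binary weight matrix

    def store(self, pattern):
        """Store a sparse binary vector in autoassociation (clipped Hebbian learning)."""
        p = pattern.astype(bool)
        self.W |= np.outer(p, p)                             # OR in the outer product

    def retrieve(self, cue):
        """Threshold the dendritic sums at the cue's own activity level."""
        c = cue.astype(int)
        sums = self.W.astype(int) @ c                        # active cue bits seen by each unit
        return (sums >= c.sum()).astype(np.uint8)

# Tiny usage example: store one sparse code, then recall it from a degraded cue.
rng = np.random.default_rng(0)
n, k = 256, 8                                                # n units, k active bits per code
pattern = np.zeros(n, dtype=np.uint8)
pattern[rng.choice(n, size=k, replace=False)] = 1
mem = WillshawMemory(n)
mem.store(pattern)
cue = pattern.copy()
cue[np.flatnonzero(pattern)[0]] = 0                          # drop one active bit from the cue
print(np.array_equal(mem.retrieve(cue), pattern))            # True while the memory is lightly loaded
```

As more patterns are stored, the shared weight matrix fills up and spurious active units appear, which is why the letter measures how much class-relevant information survives at high memory loads.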