Bernd Porr
Neural Computation (2020) 32 (11): 2122–2144.
Published: 01 November 2020
Abstract
A reflex is a simple closed-loop control approach that tries to minimize an error but always fails to do so because it reacts too late. An adaptive algorithm can use this error to learn a forward model with the help of predictive cues: a driver, for example, learns to improve steering by looking ahead, avoiding last-moment corrections. In order to process complex cues such as the road ahead, deep learning is a natural choice. However, this is usually achieved only indirectly, by employing deep reinforcement learning with a discrete state space. Here, we show how it can be achieved directly by embedding deep learning into a closed-loop system while preserving its continuous processing. We show specifically, in z-space, how error backpropagation can be achieved, and in general how gradient-based approaches can be analyzed in such closed-loop scenarios. The performance of this learning paradigm is demonstrated with a line follower, both in simulation and on a real robot, which shows very fast and continuous learning.
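A minimal sketch of this closed-loop learning idea, with an illustrative linear learner standing in for the deep network (all names, values, and the toy disturbance below are assumptions, not the paper's implementation): the closed-loop error that the late reflex would otherwise have to correct is used directly as the gradient signal for the weights of the predictive cues.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cues = 5                              # number of hypothetical predictive cues
w = np.zeros(n_cues)                    # learned forward model (linear stand-in)
lr = 0.01                               # learning rate (assumed value)

for step in range(2000):
    cues = rng.normal(size=n_cues)      # predictive input, e.g. the road ahead
    disturbance = 0.2 * cues.sum()      # the cues genuinely predict the disturbance
    action = w @ cues                   # anticipatory action derived from the cues
    error = disturbance + action        # residual error the late reflex would see
    w -= lr * error * cues              # gradient step on 0.5 * error**2

print(abs(error))                       # ~0: anticipation has replaced the reflex
```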
Neural Computation (2013) 25 (11): 2976–3019.
Published: 01 November 2013
Abstract
We present a formal specification and verification of a robot moving in a complex network, using temporal sequence learning to avoid obstacles. Our aim is to demonstrate the benefit of a formal approach as a complement to simulation when analyzing such a system. We first describe a classical closed-loop simulation of the system and compare this approach to one in which the system is analyzed using formal verification. We show that formal verification has advantages over classical simulation and finds deficiencies our classical simulation did not identify. Specifically, we present a formal specification of the system, defined in the Promela modeling language, and show how the associated model is verified using the Spin model checker. We then introduce an abstract model that is suitable for verifying the same properties for any environment with obstacles under a given set of assumptions. We outline how we can prove that our abstraction is sound: any property that holds for the abstracted model will hold in the original (unabstracted) model.
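The contrast between the two approaches can be illustrated with a toy example (Python instead of Promela, and an invented one-dimensional world): a model checker exhaustively explores every reachable state to verify a safety property, whereas a simulation only ever visits the states of one particular run.

```python
from collections import deque

WIDTH = 4
OBSTACLES = {2}                     # cell the robot must never enter (toy layout)

def successors(pos):
    """The robot may step left or right; steps beyond the edge are dropped."""
    return [p for p in (pos - 1, pos + 1) if 0 <= p < WIDTH]

def verify(start):
    """Breadth-first search over *all* reachable states, as a model checker
    such as Spin does, rather than sampling one trajectory."""
    seen, frontier = {start}, deque([start])
    while frontier:
        pos = frontier.popleft()
        if pos in OBSTACLES:
            return False            # counterexample: an unsafe state is reachable
        for nxt in successors(pos):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return True                     # the safety property holds everywhere

print(verify(start=0))              # False: cell 2 is reachable from cell 0
```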
Neural Computation (2009) 21 (4): 1173–1202.
Published: 01 April 2009
Abstract
In this theoretical contribution, we provide mathematical proof that two of the most important classes of network learning—correlation-based differential Hebbian learning and reward-based temporal difference learning—are asymptotically equivalent when learning is timed by a modulatory signal. This opens the opportunity to reformulate most of the abstract reinforcement learning framework consistently from a correlation-based perspective that is more closely related to the biophysics of neurons.
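As a schematic reminder of the two rules being related (standard textbook forms with assumed symbols, not necessarily the paper's notation: u_i is a presynaptic activity, v the postsynaptic output, M the modulatory signal; r, gamma, and V are the usual temporal difference quantities):

```latex
% Differential Hebbian learning, timed (gated) by a modulatory signal M(t):
\dot{w}_i(t) \;=\; \mu \, M(t)\, u_i(t)\, \dot{v}(t)

% Temporal difference learning with error \delta and value estimate V:
\delta_t \;=\; r_t + \gamma V(s_{t+1}) - V(s_t), \qquad
w \;\leftarrow\; w + \alpha\, \delta_t\, \nabla_{w} V(s_t)
```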
Neural Computation (2007) 19 (10): 2694–2719.
Published: 01 October 2007
Abstract
It is a well-known fact that Hebbian learning is inherently unstable because of its self-amplifying terms: the more a synapse grows, the stronger the postsynaptic activity, and therefore the faster the synaptic growth. This unwanted weight growth is driven by the autocorrelation term of Hebbian learning, in which the same synapse drives its own growth. The cross-correlation term, on the other hand, performs the actual learning, correlating different inputs with each other. Consequently, we would like to minimize the autocorrelation and maximize the cross-correlation. Here we show that we can achieve this with a third factor that switches on learning when the autocorrelation is minimal or zero and the cross-correlation is maximal. The biological counterpart of such a third factor is a neuromodulator that switches on learning at a certain moment in time. We show in a behavioral experiment that our three-factor learning clearly outperforms classical Hebbian learning.
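A minimal sketch of such a three-factor update (variable names and the form of the gating signal are illustrative; the article derives when the modulator should actually switch on):

```python
import numpy as np

def three_factor_update(w, u_pre, dv_post, modulator, mu=0.01):
    """Hebbian-style weight update gated by a third factor.

    u_pre     : presynaptic activities (vector)
    dv_post   : derivative of the postsynaptic activity
    modulator : third factor in [0, 1]; ~1 at moments when the
                cross-correlation dominates and the self-amplifying
                autocorrelation is near zero, ~0 otherwise
    """
    return w + mu * modulator * u_pre * dv_post
```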
Neural Computation (2006) 18 (6): 1380–1412.
Published: 01 June 2006
Abstract
Currently, all important low-level, unsupervised network learning algorithms follow the Hebbian paradigm, in which input and output activity are correlated to change the connection strength of a synapse. As a consequence, however, classical Hebbian learning always carries a potentially destabilizing autocorrelation term, because every input is reflected, in weighted form, in the neuron's output. This self-correlation can lead to positive feedback, where increasing weights increase the output and vice versa, which may result in divergence. This can be avoided by strategies such as weight normalization or weight saturation, which, however, introduce problems of their own. Consequently, in most cases high learning rates cannot be used for Hebbian learning, leading to relatively slow convergence. Here we introduce a novel correlation-based learning rule that is related to our isotropic sequence order (ISO) learning rule (Porr & Wörgötter, 2003a) but replaces the derivative of the output in the learning rule with the derivative of the reflex input. Hence, the new rule uses input correlations only, effectively implementing strict heterosynaptic learning. This looks like a minor modification but leads to dramatically improved properties. Eliminating the output from the learning rule removes the unwanted, destabilizing autocorrelation term, allowing us to use high learning rates. As a consequence, we can show mathematically that the theoretical optimum of one-shot learning can be reached under ideal conditions with the new rule. This result is then tested against four different experimental setups, and we show that in all of them very few (and sometimes only one) learning experiences are needed to achieve the learning goal. As a consequence, the new learning rule is up to 100 times faster and in general more stable than ISO learning.
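The core modification, as a sketch (this is the rule often referred to as ICO learning in later work; names and the discrete derivative are illustrative):

```python
def ico_update(w, u_pred, du_reflex, mu=0.1):
    """Input-correlation-only update: the derivative of the *reflex input*
    replaces the derivative of the output, so the update contains no
    self-amplifying autocorrelation term and tolerates high learning rates.

    u_pred    : bandpass-filtered predictive inputs (vector)
    du_reflex : temporal derivative of the bandpass-filtered reflex input
    """
    return w + mu * u_pred * du_reflex
```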
Neural Computation (2006) 18 (5): 1156–1196.
Published: 01 May 2006
Abstract
Biped walking remains a difficult problem, and robot models can greatly facilitate our understanding of the underlying biomechanical principles as well as their neuronal control. The goal of this study is to demonstrate that stable biped walking can be achieved by combining the physical properties of the walking robot with a small, reflex-based neuronal network governed mainly by local sensor signals. Building on earlier work (Taga, 1995; Cruse, Kindermann, Schumm, Dean, & Schmitz, 1998), this study shows that human-like gaits emerge without specific position or trajectory control and that the walker is able to compensate for small disturbances through its own dynamical properties. The reflexive controller used here has the following characteristics, which distinguish it from earlier approaches: (1) Control is mainly local. Only two signals (anterior extreme angle and ground contact) operate at the interjoint level; all other signals operate at single joints. (2) Neither position control nor trajectory tracking control is used. Instead, the approximate nature of the local reflexes at each joint allows the robot mechanics itself (e.g., its passive dynamics) to contribute substantially to the overall computation of the gait trajectory. (3) The motor control scheme used in the local reflexes of our robot is more straightforward and more biologically plausible than those of other robots, because the outputs of the motor neurons in our reflexive controller directly drive the motors of the joints rather than serving as references for position or velocity control. As a consequence, the neural controller and the robot mechanics are closely coupled as a neuromechanical system, and this study emphasizes that dynamically stable biped walking gaits emerge from the coupling between neural computation and physical computation. This is demonstrated by different walking experiments using a real robot as well as by a Poincaré map analysis applied to a model of the robot in order to assess its stability.
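A deliberately simplified sketch of one such local reflex (the thresholds, gains, and signal routing are invented for illustration; the robot's actual network is described in the article). The point is that the motor-neuron output is the motor drive itself, with no position or trajectory controller in between.

```python
def hip_motor_drive(hip_angle, ground_contact, aea=0.3, gain=2.0):
    """Local swing/stance reflex for one hip joint (illustrative values).

    While the leg is in the air, the reflex drives it toward the anterior
    extreme angle (AEA); once the AEA is reached or the foot touches the
    ground, the drive reverses. The return value goes straight to the motor.
    """
    if ground_contact or hip_angle >= aea:
        return -gain * hip_angle            # stance: retract the leg
    return gain * (aea - hip_angle)         # swing: move toward the AEA
```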
Neural Computation (2005) 17 (2): 245–319.
Published: 01 February 2005
Abstract
In this review, we compare methods for temporal sequence learning (TSL) across the disciplines of machine control, classical conditioning, neuronal models of TSL, and spike-timing-dependent plasticity (STDP). The review introduces the most influential models and focuses on two questions: To what degree are reward-based (e.g., TD) learning and correlation-based (Hebbian) learning related? And how do the different models correspond to possibly underlying biological mechanisms of synaptic plasticity? We first compare the different models in an open-loop condition, where behavioral feedback does not alter the learning, and observe that reward-based and correlation-based learning are indeed very similar there. Machine control is then used to introduce the problem of closed-loop control (e.g., actor-critic architectures). Here we discuss the problem of evaluative (reward) versus nonevaluative (correlation) feedback from the environment, showing that the two learning approaches are fundamentally different in the closed-loop condition. In trying to answer the second question, we compare neuronal versions of the different learning architectures to the anatomy of the involved brain structures (basal ganglia, thalamus, and cortex) and to the molecular biophysics of glutamatergic and dopaminergic synapses. Finally, we discuss the different algorithms used to model STDP and compare them to reward-based learning rules. Certain similarities are found in spite of the strongly different timescales. Here we focus on the biophysics of the different calcium-release mechanisms known to be involved in STDP.
Neural Computation (2004) 16 (3): 595–625.
Published: 01 March 2004
Abstract
Spike-timing-dependent plasticity (STDP) manifests as long-term potentiation (LTP) when a presynaptic event precedes a postsynaptic event, and as long-term depression (LTD) when the temporal order is reversed. In this article, we present a biophysical model of STDP based on a differential Hebbian learning rule (ISO learning). This rule correlates the presynaptic NMDA channel conductance with the derivative of the membrane potential at the synapse, which serves as the postsynaptic signal. The model is able to reproduce the generic STDP weight change characteristic. We find that (1) the actual shape of the weight change curve depends strongly on the NMDA channel characteristics and on the shape of the membrane potential at the synapse; (2) the typical antisymmetric STDP curve (LTD and LTP) can become similar to a standard Hebbian characteristic (LTP only) without any change to the learning rule, which occurs if the membrane depolarization has a shallow onset and is long lasting; and (3) since the membrane potential is known to vary along the dendrite, as a result of the active or passive backpropagation of somatic spikes or of local dendritic processes, our model predicts that learning properties will differ at different locations on the dendritic tree. In conclusion, such site-specific synaptic plasticity would provide a neuron with powerful learning capabilities.
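A toy numerical rendering of this rule (the kernels, time constants, and amplitudes below are invented; the article uses detailed NMDA kinetics and realistic membrane potentials): the weight change integrates the presynaptic NMDA conductance against the derivative of the postsynaptic membrane potential, and its sign flips with the pre/post spike order.

```python
import numpy as np

dt = 0.1                                    # ms, assumed time step
t = np.arange(0.0, 100.0, dt)

def nmda_conductance(t_pre, tau=20.0):      # toy exponential NMDA kernel
    return np.where(t >= t_pre, np.exp(-(t - t_pre) / tau), 0.0)

def membrane(t_post, tau=5.0):              # toy depolarization after a spike
    return np.where(t >= t_post, np.exp(-(t - t_post) / tau), 0.0)

def weight_change(t_pre, t_post, mu=1.0):
    g = nmda_conductance(t_pre)             # presynaptic signal
    dv = np.gradient(membrane(t_post), dt)  # postsynaptic signal: dV/dt
    return mu * np.sum(g * dv) * dt         # dw = mu * integral g(t) V'(t) dt

print(weight_change(t_pre=10.0, t_post=15.0))   # pre before post -> LTP (> 0)
print(weight_change(t_pre=15.0, t_post=10.0))   # post before pre -> LTD (< 0)
```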
Neural Computation (2003) 15 (4): 831–864.
Published: 01 April 2003
Abstract
In this article, we present an isotropic unsupervised algorithm for temporal sequence learning. No special reward signal is used; all inputs are completely isotropic. All input signals are bandpass filtered before converging onto a linear output neuron, and all synaptic weights change according to the correlation of the bandpass-filtered inputs with the derivative of the output. We investigate the algorithm in an open- and a closed-loop condition, the latter defined by embedding the learning system into a behavioral feedback loop. In the open-loop condition, we find that the linear structure of the algorithm allows the shape of the weight change to be calculated analytically; it is strictly heterosynaptic and follows the shape of the weight change curves found in spike-timing-dependent plasticity. Furthermore, we show that synaptic weights stabilize automatically, without additional normalizing measures, as soon as no temporal differences remain between the inputs. In the second part of this study, the algorithm is placed in an environment that creates a closed sensor-motor loop: a robot is programmed with a prewired retraction reflex in response to collisions. Through isotropic sequence order (ISO) learning, the robot achieves collision avoidance by learning the correlation between its early range-finder signals and the later-occurring collision signal. Synaptic weights stabilize at the end of learning, as theoretically predicted. Finally, we discuss the relation of ISO learning to other drive reinforcement models and to the commonly used temporal difference learning algorithm. This study is followed up by a mathematical analysis of the closed-loop situation in the companion article in this issue, “ISO Learning Approximates a Solution to the Inverse-Controller Problem in an Unsupervised Behavioral Paradigm” (pp. 865–884).
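The core of the rule as a discrete-time sketch (the bandpass filters and their parameters are omitted here; u is assumed to hold the already-filtered inputs):

```python
def iso_update(w, u, dv, mu=0.001):
    """ISO learning step: each weight changes with the correlation of its
    own bandpass-filtered input u_i and the derivative dv of the linear
    output v = w . u. Weights stabilize once the filtered inputs carry no
    remaining temporal differences (dv -> 0 on average).
    """
    return w + mu * u * dv
```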
Neural Computation (2003) 15 (4): 865–884.
Published: 01 April 2003
Abstract
In “Isotropic Sequence Order Learning” (pp. 831–864 in this issue), we introduced a novel algorithm for temporal sequence learning (ISO learning). Here, we embed this algorithm in a formal, nonevaluating (teacher-free) environment that establishes sensor-motor feedback. The system is initially guided by a fixed reflex reaction, which has the objective disadvantage that it can react only after a disturbance has occurred. ISO learning eliminates this disadvantage by replacing the reflex-loop reactions with earlier anticipatory actions. In this article, we analytically demonstrate that this process can be understood in terms of control theory, showing that the system learns the inverse controller of its own reflex. Thereby, the system is able to learn a simple form of feedforward motor control.
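Schematically, in transfer-function form (the symbols here are assumed for illustration, and the article's derivation is more detailed): with P(s) the path from the learner's action to the reflex input and F(s) the learned feedforward pathway driven by the earlier cue, the reflex falls silent once the anticipatory action cancels the disturbance at the reflex input, which is the condition that F inverts the path the reflex would otherwise have to control.

```latex
% Convergence condition (schematic): the feedforward action cancels the
% disturbance before the reflex fires, i.e. F inverts the controlled path.
F(s)\,P(s) \approx -1
\quad\Longleftrightarrow\quad
F(s) \approx -P^{-1}(s)
```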