Mitsuo Kawato
Journal Articles
Neural Computation (2020) 32 (11): 2069–2084.
Published: 01 November 2020
Abstract
The cerebellum is known to have an important role in the sensing and execution of precise time intervals, but the mechanism by which arbitrary time intervals can be recognized and replicated with high precision is unknown. We propose a computational model in which precise time intervals can be identified from the pattern of individual spike activity in a population of parallel fibers in the cerebellar cortex. The model depends on the presence of repeatable sequences of spikes in response to conditioned stimulus input. We emulate granule cells using a population of Izhikevich neuron approximations driven by random but repeatable mossy fiber input, and we emulate long-term depression (LTD) and long-term potentiation (LTP) synaptic plasticity at the parallel fiber to Purkinje cell synapse. We simulate a delay conditioning paradigm with a conditioned stimulus (CS) presented to the mossy fibers and an unconditioned stimulus (US) issued to the Purkinje cells some time later as a teaching signal. We show that Purkinje cells rapidly adapt to decrease their firing probability following onset of the CS only at the interval at which the US had occurred. We suggest that detection of replicable spike patterns provides an accurate and easily learned timing structure that could be an important mechanism for behaviors requiring the identification and production of precise time intervals.
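As a rough illustration of the ingredients named in this abstract, the sketch below drives a population of Izhikevich-style units with frozen (repeatable) random input and applies LTD to parallel-fiber weights active just before the US, with weak LTP elsewhere. The neuron parameters, eligibility window, and learning rates are generic assumptions for illustration, not the values used in the paper.

```python
import numpy as np

def izhikevich_step(v, u, I, a=0.02, b=0.2, c=-65.0, d=8.0):
    """One 1 ms update of the Izhikevich model (v integrated in two 0.5 ms
    half steps, as in Izhikevich's reference implementation)."""
    spiked = v >= 30.0
    v = np.where(spiked, c, v)                # reset membrane potential after a spike
    u = np.where(spiked, u + d, u)            # reset recovery variable after a spike
    for _ in range(2):
        v = v + 0.5 * (0.04 * v ** 2 + 5.0 * v + 140.0 - u + I)
    u = u + a * (b * v - u)
    return v, u, spiked

rng = np.random.default_rng(0)
n_gc, T = 500, 300                    # granule cells, trial length in ms
v = np.full(n_gc, -65.0)
u = 0.2 * v
w = np.full(n_gc, 0.5)                # parallel fiber -> Purkinje cell weights
mossy = 6.0 * rng.random((T, n_gc))   # frozen noise: repeatable mossy fiber drive (the CS)
us_time = 200                         # CS-US interval to be learned (ms)
window = 25                           # eligibility window preceding the US (ms)

last_spike = np.full(n_gc, -np.inf)
for t in range(T):
    v, u, spiked = izhikevich_step(v, u, mossy[t])
    last_spike[spiked] = t
    if t == us_time:                  # US arrives as a teaching signal
        eligible = (t - last_spike) < window
        w[eligible] -= 0.05           # LTD: depress fibers active just before the US
        w[~eligible] += 0.005         # LTP: mild potentiation of the rest
```

Because the mossy fiber input is frozen noise, the same granule cells fire at the same times on every trial, so depressing the weights of fibers active just before the US selectively reduces Purkinje firing at that interval after CS onset.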
Journal Articles
Neural Computation (2012) 24 (3): 577–606.
Published: 01 March 2012
Abstract
Reinforcement learning (RL) can provide a basic framework for autonomous robots to learn to control and maximize future cumulative rewards in complex environments. To achieve high performance, RL controllers must take into account the complex external dynamics of movements and of the task (reward function) and optimize their control commands. For example, a robot playing tennis and squash needs to cope with the different dynamics of a tennis or squash racket and with dynamic environmental factors such as the wind. In addition, the robot has to tailor its tactics to the rules of whichever game it is playing. This double complexity of external dynamics and reward function is compounded when both the dynamics and the reward functions switch implicitly, as in a real (multi-agent) game of tennis, where one player cannot observe the intentions of her opponents or her partner. The robot must then consider its opponent's and its partner's unobservable behavioral goals (reward functions). In this article, we address how an RL agent should be designed to handle this double complexity of dynamics and reward. We previously proposed modular selection and identification for control (MOSAIC) to cope with nonstationary dynamics, in which appropriate controllers are selected and learned among many candidates based on the error of each controller's paired dynamics predictor, the forward model. Here we extend this framework to RL and propose the MOSAIC-MR architecture. It resembles MOSAIC in spirit: it selects and learns an appropriate RL controller based on that controller's TD error, computed using the errors of the dynamics (forward-model) and reward predictors. Furthermore, unlike other MOSAIC variants for RL, the RL controllers are not paired a priori with fixed predictors of dynamics and reward. Simulation results demonstrate that MOSAIC-MR outperforms its counterparts because of this flexible association among RL controllers, forward models, and reward predictors.
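The core selection idea can be sketched with a softmax over prediction errors. The snippet below is a toy illustration, not the paper's algorithm: it turns the errors of hypothetical forward models and reward predictors into responsibilities and combines them to pick a controller; in MOSAIC-MR the selection additionally uses each RL controller's TD error, and the association between controllers and predictors is learned rather than fixed.

```python
import numpy as np

def responsibilities(errors, sigma=1.0):
    """Softmax over negative squared prediction errors (MOSAIC-style gating)."""
    logits = -np.asarray(errors, dtype=float) ** 2 / (2.0 * sigma ** 2)
    logits -= logits.max()                      # numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Hypothetical one-step prediction errors of three forward models (dynamics)
# and two reward predictors, e.g. different rackets and different rule sets.
dyn_err = np.array([0.1, 0.9, 1.6])
rew_err = np.array([0.3, 1.2])

lam_dyn = responsibilities(dyn_err)             # which dynamics is currently active?
lam_rew = responsibilities(rew_err)             # which reward function is active?

# Joint responsibility over (dynamics, reward) combinations gates the learning
# and the action selection of the associated RL controllers.
lam = np.outer(lam_dyn, lam_rew)
best = np.unravel_index(np.argmax(lam), lam.shape)
print("most responsible (forward model, reward predictor):", best)
```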
Journal Articles
Neural Computation (2002) 14 (6): 1347–1369.
Published: 01 June 2002
Abstract
We propose a modular reinforcement learning architecture for nonlinear, nonstationary control tasks, which we call multiple model-based reinforcement learning (MMRL). The basic idea is to decompose a complex task into multiple domains in space and time based on the predictability of the environmental dynamics. The system is composed of multiple modules, each of which consists of a state prediction model and a reinforcement learning controller. The “responsibility signal,” given by the softmax function of the prediction errors, is used to weight the outputs of the multiple modules, as well as to gate the learning of the prediction models and the reinforcement learning controllers. We formulate MMRL for both the discrete-time, finite-state case and the continuous-time, continuous-state case. The performance of MMRL is demonstrated, for the discrete case, in a nonstationary hunting task in a grid world and, for the continuous case, in a nonlinear, nonstationary control task of swinging up a pendulum with variable physical parameters.
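Written out, the responsibility signal described here is a softmax over the modules' prediction errors; the Gaussian form below is one standard way to express it (the scale parameter σ is an assumption of this sketch), with the composite action formed as the responsibility-weighted sum of the module outputs:

```latex
\lambda_i(t) = \frac{\exp\!\left(-\lVert x(t)-\hat{x}_i(t)\rVert^2 / 2\sigma^2\right)}
                    {\sum_j \exp\!\left(-\lVert x(t)-\hat{x}_j(t)\rVert^2 / 2\sigma^2\right)},
\qquad
u(t) = \sum_i \lambda_i(t)\, u_i(t)
```

Here \(\hat{x}_i(t)\) is the state predicted by module \(i\)'s prediction model and \(u_i(t)\) is the action proposed by its reinforcement learning controller; the same \(\lambda_i(t)\) also gates how strongly each module's predictor and controller are updated.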
Journal Articles
Neural Computation (2001) 13 (10): 2201–2220.
Published: 01 October 2001
Abstract
Humans demonstrate a remarkable ability to generate accurate and appropriate motor behavior under many different and often uncertain environmental conditions. We previously proposed a modular architecture, the modular selection and identification for control (MOSAIC) model, for motor learning and control based on multiple pairs of forward (predictor) and inverse (controller) models. The architecture simultaneously learns the multiple inverse models necessary for control as well as how to select the set of inverse models appropriate for a given environment. It combines both feedforward and feedback sensorimotor information so that the controllers can be selected both prior to movement and subsequently during movement. This article extends and evaluates the MOSAIC architecture in the following respects. First, learning in the architecture is implemented by both the original gradient-descent method and a newly derived expectation-maximization (EM) algorithm; unlike gradient descent, the EM algorithm is robust to the initial starting conditions and the learning parameters. Second, simulations of an object manipulation task show that the architecture can learn to manipulate multiple objects and switch between them appropriately, and that, after learning, the model generalizes to novel objects whose dynamics lie within the polyhedra of already learned dynamics. Finally, when each of the dynamics is associated with a particular object shape, the model is able to select the appropriate controller before movement execution; when presented with a novel shape-dynamics pairing, inappropriate activation of modules is observed, followed by on-line correction.
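To make the EM variant concrete, the sketch below treats each module's forward model as a linear predictor with isotropic Gaussian prediction errors: the E-step computes posterior responsibilities from the prediction errors and a prior, and the M-step updates each forward model in proportion to its responsibility. The linear models, Gaussian noise, and gradient-style M-step are simplifications assumed for illustration; the paper derives the full EM updates for the paired forward and inverse models of MOSAIC.

```python
import numpy as np

def e_step(x_next, preds, prior, sigma=1.0):
    """Posterior responsibility of each module given its one-step prediction.

    x_next : observed next state, shape (d,)
    preds  : predictions of each module's forward model, shape (n_modules, d)
    prior  : prior responsibility of each module, shape (n_modules,)
    """
    err2 = np.sum((preds - x_next) ** 2, axis=1)
    likelihood = np.exp(-err2 / (2.0 * sigma ** 2))
    post = prior * likelihood
    return post / post.sum()

def m_step(W, x, x_next, resp, lr=0.1):
    """Responsibility-weighted update of linear forward models x_next ≈ W[i] @ x."""
    for i, r in enumerate(resp):
        err = x_next - W[i] @ x
        W[i] += lr * r * np.outer(err, x)   # modules learn in proportion to how
    return W                                # well they already predict

# Toy usage: two modules, 2-D state
rng = np.random.default_rng(0)
W = [rng.normal(scale=0.1, size=(2, 2)) for _ in range(2)]
x, x_next = np.array([1.0, 0.0]), np.array([0.9, 0.2])
resp = e_step(x_next, np.stack([Wi @ x for Wi in W]), prior=np.array([0.5, 0.5]))
W = m_step(W, x, x_next, resp)
```

The same posterior responsibility can also gate the inverse models (controllers), which is what lets the architecture switch controllers both before movement, via priors derived from object shape, and during movement, via the prediction errors.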