1-4 of 4
Jun Morimoto
Journal Articles
Implicit Contact Dynamics Modeling With Explicit Inertia Matrix Representation for Real-Time, Model-Based Control in Physical Environment
Publisher: Journals Gateway
Neural Computation (2022) 34 (2): 360–377.
Published: 14 January 2022
Abstract
Model-based control has great potential for use in real robots due to its high sampling efficiency. Nevertheless, dealing with physical contacts and generating accurate motions are unavoidable requirements in practical robot control tasks, such as precise manipulation. For a real-time, model-based approach, the difficulty of contact-rich tasks that require precise movement lies in the fact that a model needs to accurately predict forthcoming contact events within a limited length of time rather than detect them afterward with sensors. Therefore, in this study, we investigate whether and how neural network models can learn a task-related model useful enough for model-based control, that is, a model that predicts future states, including contact events. To this end, we propose a structured neural network model predictive control (SNN-MPC) method, whose neural network architecture is designed with an explicit inertia matrix representation. To train the proposed network, we develop a two-stage modeling procedure for contact-rich dynamics from a limited number of samples. As a contact-rich task, we take up a trackball manipulation task using a physical 3-DoF finger robot. The results showed that the SNN-MPC outperformed MPC with a conventional fully connected network model on the manipulation task.
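A minimal sketch (assuming PyTorch; not the authors' implementation) of the structural idea the abstract describes: one network head outputs a Cholesky factor L(q) so that the inertia matrix M(q) = L(q) L(q)^T is symmetric positive definite by construction, a second head models the remaining bias forces, and forward dynamics follow from solving M(q) qdd = tau - h(q, qd). Layer sizes, the softplus diagonal, and the one-step MPC-style usage at the end are illustrative assumptions.

import torch
import torch.nn as nn


class StructuredDynamicsNet(nn.Module):
    """Dynamics model with an explicit (learned) inertia-matrix representation."""

    def __init__(self, n_dof: int = 3, hidden: int = 64):
        super().__init__()
        self.n_dof = n_dof
        n_tril = n_dof * (n_dof + 1) // 2                  # entries of the Cholesky factor
        self.chol_head = nn.Sequential(                    # q -> L(q)
            nn.Linear(n_dof, hidden), nn.Tanh(), nn.Linear(hidden, n_tril))
        self.bias_head = nn.Sequential(                    # (q, qd) -> h(q, qd)
            nn.Linear(2 * n_dof, hidden), nn.Tanh(), nn.Linear(hidden, n_dof))
        self.tril_idx = torch.tril_indices(n_dof, n_dof)

    def inertia(self, q):
        raw = self.chol_head(q)                            # (batch, n_tril)
        rows, cols = self.tril_idx
        # softplus on the diagonal keeps M = L L^T positive definite
        vals = torch.where(rows == cols,
                           torch.nn.functional.softplus(raw) + 1e-3, raw)
        L = torch.zeros(q.shape[0], self.n_dof, self.n_dof, dtype=q.dtype)
        L[:, rows, cols] = vals
        return L @ L.transpose(1, 2)

    def forward(self, q, qd, tau):
        M = self.inertia(q)
        h = self.bias_head(torch.cat([q, qd], dim=-1))
        # joint acceleration from M(q) qdd = tau - h(q, qd)
        return torch.linalg.solve(M, (tau - h).unsqueeze(-1)).squeeze(-1)


# One-step prediction as it might be used inside a sampling-based MPC rollout
# (Euler integration; the hypothetical 3-DoF finger has state (q, qd)):
model = StructuredDynamicsNet(n_dof=3)
q, qd, tau, dt = torch.zeros(1, 3), torch.zeros(1, 3), torch.randn(1, 3), 0.01
qdd = model(q, qd, tau)
q_next, qd_next = q + dt * qd, qd + dt * qdd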
Journal Articles
Efficient Sample Reuse in Policy Gradients with Parameter-Based Exploration
Publisher: Journals Gateway
Neural Computation (2013) 25 (6): 1512–1547.
Published: 01 June 2013
Abstract
The policy gradient approach is a flexible and powerful reinforcement learning method, particularly for problems with continuous actions such as robot control. A common challenge is how to reduce the variance of policy gradient estimates for reliable policy updates. In this letter, we combine the following three ideas to give a highly effective policy gradient method: (1) policy gradients with parameter-based exploration, a recently proposed policy search method with low variance of gradient estimates; (2) an importance sampling technique, which allows us to reuse previously gathered data in a consistent way; and (3) an optimal baseline, which minimizes the variance of gradient estimates while maintaining their unbiasedness. For the proposed method, we give a theoretical analysis of the variance of gradient estimates and show its usefulness through extensive experiments.
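A minimal NumPy sketch of how the three ingredients fit together, under simplifying assumptions (independent Gaussian exploration over the policy parameters, episodic returns, and the standard per-dimension form of the variance-minimizing baseline); it is not the authors' code, and the function names are illustrative.

import numpy as np


def gaussian_logpdf(theta, mu, sigma):
    # log N(theta | mu, diag(sigma^2)), summed over parameter dimensions
    return -0.5 * np.sum(((theta - mu) / sigma) ** 2
                         + np.log(2 * np.pi * sigma ** 2), axis=-1)


def iw_pgpe_gradient(thetas, returns, mu_behavior, sigma_behavior, mu, sigma):
    """Importance-weighted PGPE gradient w.r.t. mu with a per-dimension baseline.

    thetas:  (N, D) policy parameters sampled from N(mu_behavior, sigma_behavior^2)
    returns: (N,)   episodic returns of the corresponding deterministic rollouts
    """
    # importance weights: current parameter distribution over behavior distribution
    w = np.exp(gaussian_logpdf(thetas, mu, sigma)
               - gaussian_logpdf(thetas, mu_behavior, sigma_behavior))
    score = (thetas - mu) / sigma ** 2                 # d/dmu log N(theta | mu, sigma)
    g2 = (w[:, None] * score) ** 2
    # baseline chosen to minimize the variance of the (unbiased) estimator
    baseline = (g2 * returns[:, None]).sum(0) / (g2.sum(0) + 1e-12)
    return np.mean(w[:, None] * (returns[:, None] - baseline) * score, axis=0)


# Typical usage sketch: sample thetas from N(mu, sigma^2), run one deterministic
# rollout per sample to collect returns, keep the old samples, and ascend:
#   mu += learning_rate * iw_pgpe_gradient(thetas, returns, mu_old, sigma_old, mu, sigma)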
Journal Articles
Reinforcement Learning State Estimator
Publisher: Journals Gateway
Neural Computation (2007) 19 (3): 730–756.
Published: 01 March 2007
Abstract
In this study, we propose a novel use of reinforcement learning for estimating hidden variables and parameters of nonlinear dynamical systems. A critical issue in hidden-state estimation is that we cannot directly observe estimation errors. However, by defining errors of observable variables as a delayed penalty, we can apply a reinforcement learning framework to state estimation problems. Specifically, we derive a method to construct a nonlinear state estimator by finding an appropriate feedback input gain using the policy gradient method. We tested the proposed method on single-pendulum dynamics and show that the joint angle variable could be successfully estimated by observing only the angular velocity, and vice versa. In addition, we show that we could acquire a state estimator for the pendulum swing-up task in which a swing-up controller is also acquired by reinforcement learning simultaneously. Furthermore, we demonstrate that it is possible to estimate the dynamics of the pendulum itself while the hidden variables are estimated in the pendulum swing-up task. Application of the proposed method to a two-link biped model is also presented.
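A minimal NumPy sketch of the construction the abstract describes, under illustrative assumptions: a simulated pendulum where only the angular velocity is observed, an observer x_hat[t+1] = f(x_hat[t]) + K * (y[t] - y_hat[t]) whose gain K plays the role of the policy, and the accumulated observation error as the delayed penalty. For brevity, a smoothed two-sided finite-difference estimate of the gradient (clipped for robustness) stands in here for the policy-gradient update of the paper; all constants are made up.

import numpy as np

DT, G, L = 0.02, 9.81, 1.0                         # step size, gravity, pendulum length


def pendulum_step(x):
    # Euler step of the pendulum state x = (angle, angular velocity)
    theta, omega = x
    return np.array([theta + DT * omega, omega - DT * (G / L) * np.sin(theta)])


def estimation_cost(K, x0=np.array([1.0, 0.0]), T=300):
    """Accumulated squared error between observed and predicted angular velocity."""
    x, x_hat, cost = x0.copy(), np.zeros(2), 0.0
    for _ in range(T):
        y, y_hat = x[1], x_hat[1]                  # only the angular velocity is observed
        x_hat = pendulum_step(x_hat) + K * (y - y_hat)   # feedback-corrected prediction
        x = pendulum_step(x)
        cost += (y - y_hat) ** 2
    return cost


# Tune the two gain entries by stochastic descent on the accumulated penalty.
rng = np.random.default_rng(0)
K, noise_scale, lr = np.zeros(2), 0.05, 1e-4
for _ in range(200):
    eps = noise_scale * rng.standard_normal(2)
    grad = np.clip((estimation_cost(K + eps) - estimation_cost(K - eps))
                   / (2 * noise_scale ** 2) * eps, -1e3, 1e3)
    K -= lr * grad
# K is nudged toward gains that reduce the observation error and, implicitly,
# the error of the unobserved joint angle tracked inside x_hat.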
Journal Articles
Robust Reinforcement Learning
Publisher: Journals Gateway
Neural Computation (2005) 17 (2): 335–359.
Published: 01 February 2005
Abstract
This letter proposes a new reinforcement learning (RL) paradigm that explicitly takes into account input disturbance as well as modeling errors. The use of environmental models in RL is quite popular, both for off-line learning using simulations and for online action planning. However, the difference between the model and the real environment can lead to unpredictable, and often unwanted, results. Based on the theory of H∞ control, we consider a differential game in which a “disturbing” agent tries to make the worst possible disturbance while a “control” agent tries to make the best control input. The problem is formulated as finding a min-max solution of a value function that takes into account the amount of the reward and the norm of the disturbance. We derive online learning algorithms for estimating the value function and for calculating the worst disturbance and the best control in reference to the value function. We tested the paradigm, which we call robust reinforcement learning (RRL), on the control task of an inverted pendulum. In the linear domain, the policy and the value function learned by the online algorithms coincided with those derived analytically by linear H∞ control theory. For a fully nonlinear swing-up task, RRL achieved robust performance with changes in the pendulum weight and friction, while a standard reinforcement learning algorithm could not deal with these changes. We also applied RRL to the cart-pole swing-up task, and a robust swing-up policy was acquired.
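A minimal NumPy sketch of the min-max structure behind the approach, written out for the linear case only: a quadratic value function x'Px is iterated so that the control minimizes and the disturbance maximizes the stage cost x'Qx + u'Ru - gamma^2 w'w. The online actor/disturber/critic learning described in the letter is replaced here by an exact game value iteration on a linearized inverted pendulum; all matrices and the attenuation level gamma are illustrative.

import numpy as np

dt, g, l, m = 0.02, 9.81, 1.0, 1.0
A = np.array([[1.0, dt], [g / l * dt, 1.0]])        # pendulum linearized about upright
B = np.array([[0.0], [dt / (m * l ** 2)]])          # control input channel
D = B.copy()                                        # disturbance enters the same channel
Q, R, gamma = np.eye(2), np.array([[0.1]]), 5.0     # state cost, control cost, attenuation

P = np.zeros((2, 2))
for _ in range(500):
    # joint stationarity conditions of the stage problem:
    #   min over u, max over w of  x'Qx + u'Ru - gamma^2 w'w + (Ax + Bu + Dw)' P (Ax + Bu + Dw)
    H = np.block([[R + B.T @ P @ B, B.T @ P @ D],
                  [D.T @ P @ B, D.T @ P @ D - gamma ** 2 * np.eye(1)]])
    K = -np.linalg.solve(H, np.vstack([B.T @ P @ A, D.T @ P @ A]))
    Ku, Kw = K[:1], K[1:]                           # control gain and worst-disturbance gain
    Acl = A + B @ Ku + D @ Kw
    P = Q + Ku.T @ R @ Ku - gamma ** 2 * Kw.T @ Kw + Acl.T @ P @ Acl

# u = Ku x is the robust feedback policy; w = Kw x is the worst-case disturbance
# implied by the converged value function x' P x.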