## Abstract

Model-based control has great potential for use in real robots due to its high sampling efficiency. Nevertheless, dealing with physical contacts and generating accurate motions are inevitable for practical robot control tasks, such as precise manipulation. For a real-time, model-based approach, the difficulty of contact-rich tasks that requires precise movement lies in the fact that a model needs to accurately predict forthcoming contact events within a limited length of time rather than detect them afterward with sensors. Therefore, in this study, we investigate whether and how neural network models can learn a task-related model useful enough for model-based control, that is, a model predicting future states, including contact events. To this end, we propose a structured neural network model predictive control (SNN-MPC) method, whose neural network architecture is designed with explicit inertia matrix representation. To train the proposed network, we develop a two-stage modeling procedure for contact-rich dynamics from a limited number of samples. As a contact-rich task, we take up a trackball manipulation task using a physical 3-DoF finger robot. The results showed that the SNN-MPC outperformed MPC with a conventional fully connected network model on the manipulation task.

## 1 Introduction

The application of reinforcement learning or optimal control methods using (deep) neural networks to robot control has been a popular research topic (Gu, Holly, Lillicrap, & Levine, 2017; Levine, Pastor, Krizhevsky, Ibarz, & Quillen, 2018). In particular, model-based policy learning has great potential for use in physical robots due to its sample efficiency (Deisenroth, Fox, & Rasmussen, 2015; Lenz, Knepper, & Saxena, 2015; Nagabandi, Kahn, Fearing, & Levine, 2018). However, most of the previous studies have evaluated their methods in either simulated environments (Pan & Theodorou, 2014; Gupta et al., 2020), real environments but without any external contacts (Deisenroth & Rasmussen, 2011; Gamboa Higuera, Meger, & Dudek, 2018; Lutter, Listmann, & Peters, 2019), or real environments with external contact but for tasks that do not require very precise movements (Deisenroth et al., 2015; Lenz et al., 2015). Conversely, dealing with external contacts and generating accurate motions are inevitable for real robot control tasks, such as precise manipulation. Particularly for a real-time, model-based approach, contact-rich tasks requiring precise movement pose a fundamentally difficult problem (Fazeli, Kolbert, Tedrake, & Rodriguez, 2017). Concretely, a model needs to accurately predict forthcoming contact events within a limited length of time rather than detect them afterward with sensors. Furthermore, we need to train such accurate models from a limited number of training samples because acquiring data from a physical system is costly and time-consuming. Therefore, in the study we present in this letter, we develop both a neural network structure for modeling contact-rich dynamics and a sample-efficient training procedure for the network.

As a sample efficient training procedure, we develop a two-stage model learning method that utilizes human demonstration for contact-rich tasks. We train a contact-free dynamics model using random movement data in the first stage. In the second stage, we sample contact-rich data by making the robot track human-demonstrated contact-rich trajectories using the derived model-based controller from the first stage. We then used these movement data sampled through the trajectory tracking to train a model that implicitly includes contact dynamics (see Figure 1b).

We adopt model predictive control (MPC) as an online model-based control method to generate real-time robot movements using the neural network model. The models used in previous neural network–based MPC (NN-MPC) studies vary in both input domain and architecture; for example, they used camera images with convolutional neural networks (Lenz et al., 2015; Finn & Levine, 2017; Zhang et al., 2019) or joint angles and velocities with a fully connected network (Nagabandi et al., 2018). We propose to use the joint angles and velocities to model system dynamics and introduce physics-based structural constraints into the network architecture. Our structured NN-MPC (SNN-MPC) combines MPC with the structured neural network for dynamics modeling.

Introducing physical knowledge of rigid-body dynamics into neural network models has also been explored. Díaz Ledezma and Haddadin (2017) embedded a robotic system's kinematic structure into a neural network. They focused on learning an inverse dynamics model and did not apply their technique for model-based control. Lutter et al. (2019) proposed representing the Lagrangian mechanics as the neural network structure and applied the learned model to a real pendulum control. However, they did not consider contact forces from the environment. Also, since the Lagrangian formulation provides the inverse dynamics, the inertia matrix needs to be inverted to derive the forward model. Gupta, Menda, Manchester, and Kochenderfer (2020) additionally introduced a network to predict dissipative forces such as joint friction, while Lutter et al. (2019) needed to measure these forces, but evaluations were limited in simulation environments. Zhong, Dey, and Chakraborty (2021) proposed an approach to incorporate a contact model into energy-conserving neural networks that take advantage of physical knowledge of Lagrangian (Lutter et al., 2019) or Hamiltonian mechanics (Greydanus, Dzamba, & Yosinski, 2019; Finzi, Wang, & Wilson, 2020) with adopting an idea introduced in Agrawal et al. (2019). To activate the contact model, they needed to detect contact events. Moreover, in their evaluations, they focused on simulating target dynamics rather than controlling mechanical systems.

In our approach, we design a network to represent the forward dynamics to use with MPC. Inverse calculation for the inertia matrix is not necessary. Furthermore, we take into account the existence of the contact forces in our network structure. An interesting feature of our approach is that we do not need to explicitly detect the contact event to derive the contact forces. The proposed network rather predicts the contact forces from the current state and control variables. As a contact-rich task for experimental evaluation, we employed a trackball manipulation task using a real 3-DoF finger robot. Results show that the manipulation performance of the SNN-MPC, which has the constrained network models with explicit inertia-matrix representation, was better than that of a conventional fully connected network model. We also evaluated control performance with different control periods to clarify the importance of fast calculation of the network model, that is, the importance of a small control time step, for the precise manipulation task.

The contributions of our study are following:

We introduced physics-based structural constraints into the network architecture for the online and real-time MPC calculation. In our network model, we implicitly took into account the existence of the contact dynamics.

A two-stage network training procedure for our SNN-MPC was developed to learn a forward dynamics model for a contact-rich task efficiently.

We applied our proposed approach to the contact-rich trackball manipulation task using a real 3-Dof finger robot. Our SNN-MPC method outperformed MPC with a conventional fully connected network model.

The rest of this letter is organized as follows. Section 2 provides the background of the NN-MPC framework. Section 3 introduces the proposed SNN-MPC framework and its implementation. Section 4 explains the experimental system and procedure for evaluating the proposed methods. Section 5 demonstrates the performance of the network. Section 6 discusses the findings and summarizes the study.

## 2 Using Model Predictive Control with Neural Networks

This section summarizes the background of the neural-network-based model predictive control (NN-MPC) framework. Here we use MPC to derive the optimal control signal using a dynamics model at each time step.

An MPC derives optimal control sequence under the system dynamics, equation 2.1, at each time step, with initial state $x0$ set to the currently observed state. Once the optimization is completed, we apply only $u0*$, which is the first element of $U0*$, to the system. When a new state is observed, the same procedure is repeated.

However, one cannot analytically derive the optimal control sequence, equation 2.3, under nonlinear system dynamics in general. Here we adopted an iterative linear quadratic regulator (iLQR; Li & Todorov, 2004) to approximately derive the optimal control sequence. In iLQR, the dynamics and the cost function are linearly and quadratically approximated. Since it efficiently computes the control sequence, iLQR is widely used in the MPC studies (Tassa, Erez, & Smart, 2008; Tassa, Erez, & Todorov, 2012).

To derive the linear system and the quadratic cost function for iLQR calculation, one has to compute the derivatives^{1} of dynamics ($fx$, $fu$) and cost functions ($lx$, $lu$, $lxx$, $lxu$, and $luu$). When a neural network is used as a dynamics model, the derivatives of dynamics can be computed on the network. The computation technique for those derivatives is provided in detail in section 3.3.

## 3 Proposed Method

This section introduces our structured NN-MPC (SNN-MPC) framework, which uses a neural network model with an explicit representation of the inertia matrix. We also explain how we train the neural network model in our SNN-MPC framework with the two-stage network training strategy.

### 3.1 Structured Neural Network for Modeling Robot Dynamics

### 3.2 Two-Stage Model Identification in Contact-Rich Environments

To train an accurate task-related dynamics model in a contact-rich environment, it is essential to have training data (i.e., robot motion data) recorded with appropriate contact force that militates to the system. However, acquiring such motion data with contact is often difficult on physical systems. Thus, we developed a two-stage model identification method in contact-rich environments (see Figure 1b):

In the first stage, we sampled the robot movement data by applying random torque command to the robot. Then we trained a

*contact-free*model using these random movements of the robot system.For the second stage, we first prepared a variety of

*contact-rich*reference trajectories by kinesthetic teaching by a human operator. Then we controlled the robot to track the reference trajectories using SNN-MPC with the dynamics model acquired in the first stage. Finally, we trained a model that implicitly includes contact dynamics using movement data sampled through this tracking.

### 3.3 Deriving Controllers Using an Acquired Network Model

To derive an optimal control sequence $U*$ with iLQR, we have to compute the derivatives of system dynamics ($fx$ and $fu$) and the cost functions ($lx$, $lxx$, $lxu$, and $luu$). The cost functions $l$ are subject to our design; hence, we can make them analytically differentiable. Let us now consider the derivatives of the system dynamics. Although Carpentier and Mansard (2018) proposed a method to calculate analytical derivatives of a ridged-body dynamics model, this algorithm cannot be used with a contact model. If contact modeling is necessary and the number of DoF becomes larger, analytically differentiating a multiridged-body dynamics model (i.e., deriving $fx$ and $fu$) tends to be impractical. In such a case, using numerical differentiation is the standard approach. However, for a many-DoF system, the numerical differentiation process constitutes the most time-consuming part of the computation of optimal control signals (Erez et al., 2013). On the other hand, modeling the system dynamics using a neural network allows us to analytically compute the derivatives using backpropagation. Here, we assume that our neural network represents a function that has similar smoothness to the real dynamical system. With that assumption, by minimizing the error between the output of the network and that of the real system, we can expect that the error between the gradients of the network model and the real one will also be decreased. We empirically validated this assumption through evaluating the model-based control performances.

The discrete-time system dynamics $f$ in equation 2.1 is computed by numerical integration of the generalized acceleration $q\xa8$, and the derivatives of $f$ are computed using the derivatives of $q\xa8=g(q,q\u02d9,u)$ defined in equation 3.1. Here, $g$ is approximated by a neural network; thus, we can analytically compute the derivatives of $g$ (i.e., $gx$ and $gu$) via backpropagation on the neural network instead of time-consuming numerical differentiation. For brevity, we concatenate $gx$ and $gu$ into one matrix $gz$ by defining $z\u2261x\u22a4,u\u22a4\u22a4$. Using backpropagation, we can compute the $i$th row of $gz$, which corresponds to the $i$th element of $q\xa8$, at once. We first set the initial gradient vector $gq\xa8,i$ on the output layer (i.e., $q\xa8$) to satisfy $gq\xa8,i,j=\delta i,j$, where $gq\xa8,i,j$ is the $j$th element of $gq\xa8,i$ and $\delta i,j$ is the Kronecker delta. We then compute the gradient vector $\u2202q\xa8i/\u2202z$ by backpropagating this initial gradient vector from the output layer to the input layer ($x$ and $u$), where $q\xa8i$ denote the $i$th element of $q\xa8$. By computing $\u2202q\xa8i/\u2202z$ for all $i$, we get $gz$, hence $gx$ and $gu$.

Here, there are a couple of caveats to implement a faster iLQR program with a neural network dynamics model:

Compute the derivatives of $g$ in a batch of the whole nominal trajectory using matrix multiplication rather than computing them for each time step in a loop.

Because the computation of $\u2202q\xa8i/\u2202z$ is mutually independent with regard to $i$, we can parallelize them.

## 4 Experiments

To evaluate the proposed SNN-MPC approach, we compared the control performance of the SNN-MPC and an NN-MPC method with a fully connected network in a contact-rich driving game task using a 5-Dof system.

### 4.1 System Setup

The system was controlled in hard real time. The state $xt$ composed of the robot joint angles, angular velocities, the trackball 2-D rotation, and rotational velocity was sampled every 10 ms. The SNN-MPC, implemented on the server equipped with two Intel Xeon E5-2690 v4 processors, derived an optimal control signal $u$ for the next time step within 10 ms. This control signal represented the desired torque, and motor drivers converted to the desired current. Note that there were no contact sensors mounted on the system, but the $F$-network implicitly represents contact dynamics.

### 4.2 Training Dynamics Models

We implemented the structured neural network introduced in section 3.1 using the Chainer deep learning framework (Tokui, Oono, Hido, & Clayton, 2015). In this experiment, we applied additional constraints on the $M$-network by the nature of our robot-and-trackball system:

The robot and the trackball are physically unconnected; thus, the inertia matrix is separated into two submatrices, and the remaining elements are zero.

The trackball is a spherical rigid body and is geometrically symmetric; thus, the diagonal elements of the trackball submatrix take the same value.

To train the neural network described above, we recorded the robot motion trajectories composed of tuples of $q$, $q\u02d9$, $u$, and $q\xa8$. We adopted our two-stage network training procedure introduced in section 3.2.

In the first stage, we sampled the movement data while applying random control sequences to the robot to acquire a contact-less model. We used the zero-order hold for 10 time steps. Concretely, we sampled the reference torque every 10 time steps (100 ms) from a normal distribution and fed the sampled signal to the robot for 10 time steps. We determined the variance of the normal distribution for generating random control sequences regarding the torque limits of the motors.

In the second stage, this contact-less model was used to control the robot around the contact surface. We handcrafted reference joint trajectories in which the robot and the trackball kept touch by manually moving the robot. We obtained a contact-rich trajectory data set by controlling the robot to follow these joint trajectories. By learning from this data set, the neural network ended up representing the contact-rich model.

We collected the training data sets only for 10 minutes (60000 data points) in both training stages. Eighty percent of these data sets were used for training, while the remaining 20% were used for validation. Note that we collected those movement data sets without considering the car driving task (explained in the subsequent section). This means the parameters of the network model were not specially tuned for the given task.

To compare the task performances between the proposed structured network and the conventional fully connected network, we trained various-sized networks with 1, 2, 3, 5, and 8 hidden layers, each of which had 20 to 500 hidden units. Note that some of these, such as the 8-layer 500-unit network, were omitted because the iLQR optimization on such large networks violated the 10 ms limitation of the MPC time step. For the car driving task, the best performance was achieved by a structured neural network model that had one layer with 20 units in each $M$-, $C$-, and $F$-network. The fully connected network that achieved the best performance among them had one layer and 75 units. Note that the fully connected networks with two hidden layers or more could not learn the dynamics model from the limited training data set. We used the above network models for the comparisons in the result section.

### 4.3 The Car Driving Task

We evaluated our real-time, model-based control method with the neural network models on the car driving game task (see Figure 3). The horizontal position of the car in the game was controlled by the lateral rotation of the trackball, while the car was moved with constant speed for the vertical direction. The task duration was 24 s. The yellow line at the center of the course in the figure is the desired trajectory for the car.

To evaluate how the control frequency affects the performance of MPC, we conducted the experiments with three different control time steps: $\Delta t=10,20$, and 30 ms. In this task, we set the time horizon for MPC as $Tmpc\xd7\Delta t=500$ ms. Note that the MPC horizon $Tmpc$ was adjusted according to the step size $\Delta t$. Since the task duration was 24 s ($=24,000$ ms), the total number of steps for one trial was $Ttrial=24,000/\Delta t$, that is, $Ttrial=2400,1200$, and 800 for the trials with the time steps $\Delta t=10,20$, and 30 ms, respectively.

### 4.4 Experimental Evaluation

## 5 Results

We evaluated both the proposed SNN-MPC and the standard NN-MPC techniques with a fully connected network on the car driving game. We conducted the driving game using MPCs with each network model for 15 trials.

### 5.1 Task Performance Evaluation

Figure 4b shows the trajectory tracking error for each trial. Here, we omitted the result with the 30 ms time step because the task completion rates were very low. We observe that our SNN-MPC outperformed the fully connected network. With a 20 ms time step, the SNN-MPC failed in three trials, showing three outliers in the box plot. The error is much larger for the fully connected network.

We performed a two-way ANOVA on these results. We observed significant effects on both network type ($p<10-3$) and time step ($p<10-3$). We also noticed a relationship between the network type and the time step ($p=0.005$). This indicates that the performance of the fully connected network deteriorated more rapidly than the structured neural network as the time step increased.

Figure 4c shows the total cost score in equation 4.5 for both networks tested with 10 ms time steps because, from Figure 4b, they showed reasonable control performances only with the shortest time step of 10 ms. The SNN-MPC method with a 10 ms control time step achieved the best median performance score (35.2). It outperformed the NN-MPC method with a fully connected network, which failed to follow the course in two trials out of 15 and whose median score was 48.9.

### 5.2 Prediction Performances of Learned Network Models

Figure 6c shows the average 500 ms trajectory prediction error for 14 different trajectories. The prediction error of the structured neural network is smaller than that of the fully connected network. These results support that the performance of the SNN-MPC method was better than that of the fully connected one because it modeled task dynamics more accurately.

## 6 Conclusion

In this letter, we proposed the structured neural network model predictive control (SNN-MPC) technique and the two-stage training procedure to model contact-rich task-related dynamics. The structured network model implicitly predicted the contact dynamics while explicitly represented the inertial matrix.

In the driving game experiments, the SNN-MPC outperformed the standard NN-MPC method with a fully connected network. This result is possibly attributed to the fact that unconstrained, fully connected networks tend to have redundant representations with deeper network structures and are susceptible to overfitting when trained with limited data sets. Furthermore, shorter control time steps resulted in performances superior to those with the lower-frequency ones. We showed that the structured neural network achieved a smaller trajectory prediction error than the fully connected network, which reflects the accuracy of the modeled dynamics. This result also validated the effectiveness of our approach to introduce the structural constraint into the network architecture.

If a robot system is equipped with reliable contact sensors, we can possibly combine our network model with the contact model proposed by Zhong et al. (2021) to further improve the prediction performance. We empirically showed that such physics-based structural constraints for neural networks could improve robot control. Further research on the theoretical validity of such constraints is necessary.

## Conflict of Interest

The authors declare no competing financial interests.

## Note

^{1}

We use vector subscripts to denote partial derivatives: $fx\u2261\u2202f/\u2202x$ and $lxx\u2261\u22022l/\u2202x2$.

## Acknowledgments

This research was supported by JSPS KAKENHI grants JP16H06565 and JP19J20669 and also partially supported by the New Energy and Industrial Technology Development Organization (NEDO) JPNP20006, JST-Mirai Program Grant JPMJMI18B8 and JPMJI21B1, MIC-SCOPE (182107105), Tateishi Science and Technology Foundation, and ImPACT Program of Council for Science, Technology and Innovation (Cabinet Office, Government of Japan).

## References

*Advances in neural information processing systems, 32*

*Proceedings of the Robotics: Science and Systems XIV.*

*IEEE Transactions on Pattern Analysis and Machine Intelligence*

*Proceedings of the 28th International Conference on Machine Learning*

*Proceedings of the IEEE-RAS 17th International Conference on Humanoid Robotics*

*Proceedings of the 13th IEEE-RAS International Conference on Humanoid Robots*

*International Journal of Robotics Research*

*Proceedings of the IEEE International Conference on Robotics and Automation*

*Advances in neural information processing systems*

*Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems*

*Advances in neural information processing systems, 32*

*Proceedings of the IEEE International Conference on Robotics and Automation*

*Proceedings of the Second Annual Conference on Learning for Dynamics and Control*

*Proceedings of the Robotics: Science and Systems XI.*

*International Journal of Robotics Research*

*Proceedings of the First International Conference on Informatics in Control, Automation and Robotics*

*Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems*

*Proceedings of the IEEE International Conference on Robotics and Automation.*

*Advances in neural information processing systems*

*Advances in neural information processing systems*

*Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems*

*Proceedings of the Workshop on Machine Learning Systems in the 29th Annual Conference on Neural Information Processing Systems*

*Proceedings of the 36th International Conference on Machine Learning*

*A differentiable contact model to extend Lagrangian and Hamiltonian neural networks for modeling hybrid dynamics*