Paper Review - DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills

October 25, 2020
Simulation

📖 Link to the Paper - DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills
Peng, Xue Bin, Pieter Abbeel, Sergey Levine, and Michiel van de Panne. "Deepmimic: Example-guided deep reinforcement learning of physics-based character skills." ACM Transactions on Graphics (TOG) 37, no. 4 (2018): 1-14.
💡 Link to Project Website - DeepMimic

Main Contribution

The research problem in this paper is physics-based character animation. In animation, it is important to produce physically realistic motion for dynamic actions performed by humans and animals. Inspired by the success of Reinforcement Learning (RL) in motion control, the authors aim to further improve the quality of learned controllers by incorporating reference animation data.

The main contribution is a physics-based character animation system that combines goal-directed reinforcement learning with reference motion data. The reference can be supplied as motion capture clips or keyframes handcrafted by a technical artist, which makes the system versatile and practical.

Method

The proposed system takes as input a character model, a reference motion (e.g. a jump, jog, or run), and a task (e.g. striking or throwing a target, running in a target direction) defined by a reward function. It outputs a policy that enables the character to reproduce the reference motion while satisfying the task objective. Compared to prior work, I think combining motion imitation with the completion of "a task" is a novel technique introduced here; DeepMimic uses it to learn controllers that drive natural, realistic character animation with reinforcement learning.
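As a mental model, here is a minimal sketch of the control interface, assuming (as the paper describes) that actions specify target angles for per-joint PD controllers; the function and variable names below are my own illustration, not the authors' code.

```python
# Illustrative only: the policy pi(a | s, g) maps the character state s and a
# task goal g (e.g. a target location or heading) to an action a, which is
# interpreted as PD target angles for each joint.

def control_step(policy, pd_controller, state, goal):
    """One control step of a trained DeepMimic-style policy (sketch)."""
    target_angles = policy(state, goal)            # action: target joint angles
    torques = pd_controller(target_angles, state)  # low-level PD control in simulation
    return torques
```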

The reward contains two terms: the first comes from the imitation objective and encourages the character to mimic the reference motion; the second comes from the task objective and encourages completion of the task. The policy is modelled with a neural network consisting of two fully-connected layers, with additional convolutional layers followed by a fully-connected layer when the task requires a terrain heightmap as input. During training, two key techniques, Reference State Initialization (RSI) and Early Termination (ET), are applied to tackle extremely challenging motions such as the backflip.
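To make the reward blending and network structure concrete, here is a minimal PyTorch-style sketch under my own assumptions; the blend weights and the layer sizes (1024 and 512 units) follow my reading of the paper but should not be taken as exact, and the class and function names are illustrative.

```python
import torch
import torch.nn as nn

def blended_reward(r_imitation, r_task, w_imitation=0.7, w_task=0.3):
    # r_t = w^I * r^I_t + w^G * r^G_t : imitation term plus task (goal) term
    return w_imitation * r_imitation + w_task * r_task

class PolicyNet(nn.Module):
    """Two fully-connected layers; a convolutional branch would be prepended
    when the task supplies a terrain heightmap (omitted here)."""
    def __init__(self, state_dim, goal_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, action_dim),  # mean of a Gaussian over PD target angles
        )

    def forward(self, state, goal):
        return self.net(torch.cat([state, goal], dim=-1))
```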

In discussing RSI, the authors give a vivid example: a backflip is very sensitive to the initial conditions at takeoff, and its later, high-reward states are hard to reach from a fixed starting pose. RSI therefore initializes each episode from a state sampled randomly from the reference motion, so the policy can gradually discover those high-reward states. With ET, when the character falls into a bad state with no hope of recovery, the training episode is terminated early to avoid collecting uninformative data, which also mitigates the class-imbalance problem in this setting.
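Both tricks are easy to express in pseudocode. The sketch below assumes a hypothetical environment API (reset_to, state_at, and contact flags are my own names); only the ideas of RSI and ET are taken from the paper.

```python
import random

def reset_with_rsi(env, reference_motion):
    """Reference State Initialization: start the episode at a random phase of
    the reference clip, so that late, high-reward states (e.g. mid-backflip)
    are visited even before the policy can reach them on its own."""
    phase = random.uniform(0.0, 1.0)
    init_state = reference_motion.state_at(phase)  # pose and velocities at that phase
    return env.reset_to(init_state, phase)

def should_terminate_early(state):
    """Early Termination: end the episode once the character enters a state it
    cannot recover from (here, a torso or head contact with the ground), so the
    collected data is not dominated by low-reward 'fallen' samples."""
    return state.torso_contacts_ground or state.head_contacts_ground
```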

Future Work

The paper discusses multi-clip integration for more complex tasks: multiple policies are learnt by training separate neural networks individually and then linearly blending them at runtime. This seems a little counter-intuitive to me. I wonder whether a single policy could instead be trained jointly on the composition of motion clips, so that the transitions between motions are learnt smoothly. Moreover, as the paper admits, this approach only supports a small number of clips; scaling up to a large number of clips could be a natural extension of this work.
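For concreteness, the runtime blending I have in mind looks roughly like the sketch below; the blending rule and the weight handling are my simplification of what the paper describes, not its exact composite-policy formulation.

```python
def blended_action(policies, weights, state, goal):
    """Linearly combine the actions proposed by individually trained policies.
    'weights' are assumed non-negative and to sum to one."""
    actions = [policy(state, goal) for policy in policies]
    return sum(w * a for w, a in zip(weights, actions))
```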

In this work, the authors propose a data-driven RL method that leverages reference motion data. The ability to combine motion imitation with task-related demands is a key point, with the task encoded as the task objective in the reward function. I am curious to understand how the authors choose the tasks, for example by their complexity (simple enough) or commonality (common enough): why are we interested in striking or throwing a target, rather than, say, controlling the character to pick up, carry, and collect objects?

Another extension would be to incorporate pose estimation as a pre-processing step that extracts kinematic reference motion directly from video sequences in the wild. This would remove the need for human-authored keyframes or motion-capture reference data.