Official implementation of MSR-GCN (ICCV 2021 paper)

Last update: Nov 07, 2022

MSR-GCN

Official implementation of MSR-GCN: Multi-Scale Residual Graph Convolution Networks for Human Motion Prediction (ICCV 2021 paper)

[Paper] [Supp] [Poster] [Slides]

Authors

Lingwei Dang, School of Computer Science and Engineering, South China University of Technology, China
Yongwei Nie, School of Computer Science and Engineering, South China University of Technology, China
Chengjiang Long, JD Finance America Corporation, USA
Qing Zhang, School of Computer Science and Engineering, Sun Yat-sen University, China
Guiqing Li, School of Computer Science and Engineering, South China University of Technology, China

Overview

Human motion prediction is a challenging task due to the stochasticity and aperiodicity of future poses. Recently, graph convolutional networks (GCNs) have proven very effective at learning dynamic relations among pose joints, which is helpful for pose prediction. On the other hand, one can abstract a human pose recursively to obtain a set of poses at multiple scales. As the abstraction level increases, the motion of the pose becomes more stable, which also benefits pose prediction. In this paper, we propose a novel multi-scale residual graph convolution network (MSR-GCN) for the human pose prediction task in an end-to-end manner. The GCNs are used to extract features from fine to coarse scale and then from coarse to fine scale. The extracted features at each scale are then combined and decoded to obtain the residuals between the input and target poses. Intermediate supervision is imposed on all the predicted poses, which forces the network to learn more representative features. Our proposed approach is evaluated on two standard benchmark datasets, i.e., the Human3.6M dataset and the CMU Mocap dataset. Experimental results demonstrate that our method outperforms the state-of-the-art approaches.
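The residual decoding and intermediate supervision described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the repository's loss: the function names are hypothetical, and both the Euclidean per-joint error and the equal weighting of scales are assumptions.

```python
import math


def add_residual(input_pose, residual):
    """Recover a predicted pose as the input pose plus a decoded residual."""
    return [tuple(a + b for a, b in zip(joint, delta))
            for joint, delta in zip(input_pose, residual)]


def mean_joint_error(pred, target):
    """Mean Euclidean distance between matching joints of two poses.

    Each pose is a list of (x, y, z) tuples with the same joint count.
    """
    assert len(pred) == len(target)
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(pred)


def intermediate_supervision_loss(preds_by_scale, targets_by_scale):
    """Sum the per-scale errors so that every scale is supervised.

    Equal weighting of the scales is an assumption made for this sketch.
    """
    return sum(mean_joint_error(p, t)
               for p, t in zip(preds_by_scale, targets_by_scale))


# Toy two-joint example: the first joint moves by (0, 3, 4), the second stays put.
last_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
residual = [(0.0, 3.0, 4.0), (0.0, 0.0, 0.0)]
pred = add_residual(last_pose, residual)
```

Predicting a residual means the network only has to model how far each joint moves away from the observed pose, rather than regressing absolute positions.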

Dependencies

PyTorch 1.7.0+cu110
Python 3.8.5
NVIDIA RTX 3090

Get the data

Human3.6M in exponential map format can be downloaded from here.

CMU Mocap was obtained from the repository of the ConvSeq2Seq paper.

About datasets

Human3.6M

A pose in H3.6M has 32 joints, from which we choose 22 and build the multi-scale poses by dividing them 22 -> 12 -> 7 -> 4.
We use S5 / S11 as the test / validation sets and the remaining subjects as the training set. Testing is done on each of the 15 actions separately, using all data for each action instead of 8 randomly selected samples.
Some of the original 32 joints share the same position.
The input / output length is 10 / 25 frames.
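The 22 -> 12 -> 7 -> 4 abstraction can be sketched as repeated averaging of joint groups. The sketch below assumes that a coarse joint is the mean of the fine joints in its group, and it uses contiguous index groups purely as a placeholder; the repository defines its own anatomically meaningful groups, which are not reproduced here. All function names are hypothetical.

```python
def even_groups(n_joints, n_groups):
    """Split joint indices 0..n_joints-1 into n_groups contiguous groups.

    Placeholder grouping for illustration only (an assumption of this sketch).
    """
    base, extra = divmod(n_joints, n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        size = base + (1 if g < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def downscale(pose, groups):
    """Average the 3D joints in each group to obtain a coarser pose."""
    coarse = []
    for group in groups:
        coords = [pose[i] for i in group]
        coarse.append(tuple(sum(c[k] for c in coords) / len(coords)
                            for k in range(3)))
    return coarse


def build_scales(pose, sizes=(22, 12, 7, 4)):
    """Return the pose at every scale, finest first."""
    assert len(pose) == sizes[0]
    scales = [pose]
    for n_fine, n_coarse in zip(sizes, sizes[1:]):
        scales.append(downscale(scales[-1], even_groups(n_fine, n_coarse)))
    return scales


# Toy pose: 22 joints laid out along the x axis.
pose = [(float(i), 0.0, 0.0) for i in range(22)]
scales = build_scales(pose)
scale_sizes = [len(s) for s in scales]
```

For CMU Mocap the same idea applies with `sizes=(25, 12, 7, 4)`.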

CMU Mocap dataset

A pose in CMU Mocap has 38 joints, from which we choose 25 and build the multi-scale poses by dividing them 25 -> 12 -> 7 -> 4.
CMU Mocap does not have a validation set. Testing is done on each of the 8 actions separately, using all data for each action instead of 8 randomly selected samples.
Some of the original 38 joints share the same position.
The input / output length is 10 / 25 frames.
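The 10 / 25 split can be illustrated as a sliding window over one motion sequence, where each window of 35 frames yields 10 observed frames and 25 target frames. This is a sketch under assumptions: the function name and the `stride` parameter are hypothetical, and the repository may sample windows differently.

```python
def split_windows(sequence, input_n=10, output_n=25, stride=1):
    """Yield (observed, target) frame windows from one motion sequence."""
    total = input_n + output_n
    for start in range(0, len(sequence) - total + 1, stride):
        window = sequence[start:start + total]
        yield window[:input_n], window[input_n:]


# Toy sequence of 40 frame indices standing in for 40 poses.
frames = list(range(40))
pairs = list(split_windows(frames))
```

Evaluating on every such window of a test sequence, rather than on a handful of sampled ones, is one way to read "use all data" above.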

Train

Train on Human3.6M:

python main.py --exp_name=h36m --is_train=1 --output_n=25 --dct_n=35 --test_manner=all

Train on CMU Mocap:

python main.py --exp_name=cmu --is_train=1 --output_n=25 --dct_n=35 --test_manner=all
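Note that `--dct_n=35` equals the input length plus the output length (10 + 25). Assuming the flag is the number of discrete cosine transform (DCT) coefficients used to encode each joint trajectory, as in LearnTrajDep, the encoding can be sketched as below. The padding scheme (repeating the last observed value) and all function names are assumptions of this sketch, not confirmed details of the repository.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n)
                     for i in range(n)])
    return rows


def pad_trajectory(observed, output_n):
    """Repeat the last observed value output_n times (assumed padding)."""
    return list(observed) + [observed[-1]] * output_n


def dct(signal, basis):
    """Project a trajectory onto the DCT basis."""
    return [sum(b * s for b, s in zip(row, signal)) for row in basis]


def idct(coeffs, basis):
    """Invert the orthonormal DCT by applying the transposed basis."""
    n = len(basis)
    return [sum(basis[k][i] * coeffs[k] for k in range(n)) for i in range(n)]


# One coordinate of one joint over 10 observed frames, padded to 35 frames.
basis = dct_matrix(35)
trajectory = pad_trajectory([float(i) for i in range(10)], output_n=25)
coeffs = dct(trajectory, basis)
reconstructed = idct(coeffs, basis)
```

With all 35 coefficients kept, the transform is lossless, so the reconstruction matches the padded trajectory up to floating-point error.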

Evaluate and visualize results

Evaluate on Human3.6M:

python main.py --exp_name=h36m --is_load=1 --model_path=ckpt/pretrained/h36m_in10out25dctn35_best_err57.9256.pth --output_n=25 --dct_n=35 --test_manner=all

Evaluate on CMU Mocap:

python main.py --exp_name=cmu --is_load=1 --model_path=ckpt/pretrained/cmu_in10out25dctn35_best_err37.2310.pth --output_n=25 --dct_n=35 --test_manner=all

Results

H3.6M-10/25/35-all	80ms	160ms	320ms	400ms	560ms	1000ms	avg
walking	12.16	22.65	38.65	45.24	52.72	63.05	-
eating	8.39	17.05	33.03	40.44	52.54	77.11	-
smoking	8.02	16.27	31.32	38.15	49.45	71.64	-
discussion	11.98	26.76	57.08	69.74	88.59	117.59	-
directions	8.61	19.65	43.28	53.82	71.18	100.59	-
greeting	16.48	36.95	77.32	93.38	116.24	147.23	-
phoning	10.10	20.74	41.51	51.26	68.28	104.36	-
posing	12.79	29.38	66.95	85.01	116.26	174.33	-
purchases	14.75	32.39	66.13	79.63	101.63	139.15	-
sitting	10.53	21.99	46.26	57.80	78.19	120.02	-
sittingdown	16.10	31.63	62.45	76.84	102.83	155.45	-
takingphoto	9.89	21.01	44.56	56.30	77.94	121.87	-
waiting	10.68	23.06	48.25	59.23	76.33	106.25	-
walkingdog	20.65	42.88	80.35	93.31	111.87	148.21	-
walkingtogether	10.56	20.92	37.40	43.85	52.93	65.91	-
Average	12.11	25.56	51.64	62.93	81.13	114.18	57.93

CMU-10/25/35-all	80ms	160ms	320ms	400ms	560ms	1000ms	avg
basketball	10.24	18.64	36.94	45.96	61.12	86.24	-
basketball_signal	3.04	5.62	12.49	16.60	25.43	49.99	-
directing_traffic	6.13	12.60	29.37	39.22	60.46	114.56	-
jumping	15.19	28.85	55.97	69.11	92.38	126.16	-
running	13.17	20.91	29.88	33.37	38.26	43.62	-
soccer	10.92	19.40	37.41	47.00	65.25	101.85	-
walking	6.38	10.25	16.88	20.05	25.48	36.78	-
washwindow	5.41	10.93	24.51	31.79	45.13	70.16	-
Average	8.81	15.90	30.43	37.89	51.69	78.67	37.23
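The numbers in the two tables are internally consistent, which the following short check illustrates. It suggests that the final column of each Average row is the mean of the six per-horizon averages (matching the error embedded in the pretrained checkpoint file names), and that each Average entry is the mean over actions. The values are copied from the tables; the helper name is hypothetical.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Average rows of the two tables, one value per prediction horizon.
h36m_avg = [12.11, 25.56, 51.64, 62.93, 81.13, 114.18]
cmu_avg = [8.81, 15.90, 30.43, 37.89, 51.69, 78.67]

# First error column of the CMU table, one value per action.
cmu_80 = [10.24, 3.04, 6.13, 15.19, 13.17, 10.92, 6.38, 5.41]

h36m_overall = mean(h36m_avg)   # compare with 57.93 in the table
cmu_overall = mean(cmu_avg)     # compare with 37.23 in the table
cmu_80_mean = mean(cmu_80)      # compare with 8.81 in the Average row
```

Small differences in the last digit are expected because the table entries are already rounded to two decimals.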

Citation

If you use our code, please cite our work:

@InProceedings{Dang_2021_ICCV,
    author    = {Dang, Lingwei and Nie, Yongwei and Long, Chengjiang and Zhang, Qing and Li, Guiqing},
    title     = {MSR-GCN: Multi-Scale Residual Graph Convolution Networks for Human Motion Prediction},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {11467-11476}
}

Acknowledgments

Some of our evaluation and data processing code was adapted/ported from LearnTrajDep by Wei Mao.

Licence

MIT
