This is an official implementation for "DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation"

Last update: Dec 24, 2022

Overview

DeciWatch: A Simple Baseline for 10× Efficient 2D and 3D Pose Estimation

This repo is the official implementation of "DeciWatch: A Simple Baseline for 10× Efficient 2D and 3D Pose Estimation". [Paper] [Project]

Update

A clean version has been released! It currently includes code, data, logs, and models for the following tasks:
2D human pose estimation
3D human pose estimation
Body recovery via a SMPL model

TODO

Provide different sample interval checkpoints/logs
Add DeciWatch in MMHuman3D

Description

This paper proposes DeciWatch, a simple baseline framework for video-based 2D/3D human pose estimation that achieves a 10× efficiency improvement over existing works without any performance degradation. Unlike current solutions that estimate every frame in a video, DeciWatch introduces a simple yet effective sample-denoise-recover framework that only watches sparsely sampled frames, taking advantage of the continuity of human motion and the lightweight pose representation. Specifically, DeciWatch uniformly samples fewer than 10% of the video frames for detailed estimation, denoises the estimated 2D/3D poses with an efficient Transformer architecture, and then accurately recovers the remaining frames using another Transformer-based network. Comprehensive experimental results on three video-based human pose estimation and body mesh recovery tasks, as well as efficient labeling in videos, with four datasets validate the efficiency and effectiveness of DeciWatch.
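
The sampling and recovery steps can be illustrated with a toy sketch. It keeps one frame out of every N and fills in the skipped frames by linear interpolation. Linear interpolation is only a stand-in chosen here for illustration; DeciWatch itself uses Transformer-based networks for denoising and recovery, and the function names below are hypothetical, not part of this repo.

```python
from typing import List, Sequence

Pose = List[float]  # flattened joint coordinates for one frame


def sample_indices(num_frames: int, interval: int) -> List[int]:
    """Indices of the frames that would be sent to the pose estimator."""
    return list(range(0, num_frames, interval))


def recover_linear(sampled: Sequence[Pose], interval: int, num_frames: int) -> List[Pose]:
    """Fill in skipped frames by linear interpolation between sampled poses.

    Stand-in for the learned recovery network. Frames after the last
    sampled frame simply repeat the last sampled pose.
    """
    out: List[Pose] = []
    last = len(sampled) - 1
    for t in range(num_frames):
        left = min(t // interval, last)
        right = min(left + 1, last)
        # Interpolation weight of the right neighbour (0 at a sampled frame).
        w = (t - left * interval) / interval if right != left else 0.0
        out.append([(1 - w) * a + w * b for a, b in zip(sampled[left], sampled[right])])
    return out


# Toy one-dimensional "pose" that moves linearly over 21 frames.
full = [[float(t)] for t in range(21)]
idx = sample_indices(len(full), interval=10)  # only these frames are "watched"
recovered = recover_linear([full[i] for i in idx], 10, len(full))
```

Because the toy motion is linear, interpolation recovers it almost exactly; real motion is not, which is where a learned recovery network is expected to help.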

Getting Started

Environment Requirement

DeciWatch has been implemented and tested on PyTorch 1.10.1 with Python >= 3.6. It supports both GPU and CPU inference.

Clone the repo:

git clone https://github.com/cure-lab/DeciWatch.git

We recommend you install the requirements using conda:

# conda
source scripts/install_conda.sh

Prepare Data

All the data used in our experiments can be downloaded from Google Drive or Baidu Netdisk.

The available data includes:

Dataset	Pose Estimator	3D Pose	2D Pose	SMPL
Sub-JHMDB	SimplePose		✔
3DPW	EFT	✔		✔
3DPW	PARE	✔		✔
3DPW	SPIN	✔		✔
Human3.6M	FCN	✔
AIST++	SPIN	✔		✔

Please refer to doc/data.md for detailed data information and data preparation.

Training

Run the commands below to start training:

python train.py --cfg [config file] --dataset_name [dataset name] --estimator [backbone estimator you use] --body_representation [smpl/3D/2D] --sample_interval [sample interval N]

For example, you can train on 3D representation of 3DPW using backbone estimator SPIN with sample interval 10 by:

python train.py --cfg configs/config_pw3d_spin.yaml --dataset_name pw3d --estimator spin --body_representation 3D --sample_interval 10
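
If you launch several runs, for example over different sample intervals, the command can be assembled programmatically. The sketch below uses only the flags shown above; the helper name `build_train_command` is hypothetical, and the script prints the command rather than running it.

```python
import shlex
from typing import List


def build_train_command(cfg: str, dataset_name: str, estimator: str,
                        body_representation: str, sample_interval: int) -> List[str]:
    """Assemble the train.py argument list from the documented flags."""
    return [
        "python", "train.py",
        "--cfg", cfg,
        "--dataset_name", dataset_name,
        "--estimator", estimator,
        "--body_representation", body_representation,
        "--sample_interval", str(sample_interval),
    ]


cmd = build_train_command("configs/config_pw3d_spin.yaml", "pw3d", "spin", "3D", 10)
# Quote each argument so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

To actually start training from Python, the same list can be passed to `subprocess.run(cmd, check=True)` from the repo root.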

Note that the training and testing datasets should be downloaded and prepared before training.

You may refer to doc/training.md for more training details.

Evaluation

Results on 2D Pose

Dataset	Estimator	PCK 0.05 (INPUT/OUTPUT)	PCK 0.1 (INPUT/OUTPUT)	PCK 0.2 (INPUT/OUTPUT)	Download
Sub-JHMDB	simplepose	57.30%/79.32%	81.61%/94.27%	93.94%/98.85%	Baidu Netdisk / Google Drive
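
For readers unfamiliar with the metric, PCK (Percentage of Correct Keypoints) counts a keypoint as correct when its distance to the ground truth is within a threshold times a reference size. The sketch below assumes the reference size is the larger side of the person bounding box, which is a common convention; the normalization actually used by this repo may differ, so check doc/evaluate.md. Function and variable names are illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def pck(pred: Sequence[Point], gt: Sequence[Point],
        bbox_size: float, threshold: float) -> float:
    """Fraction of keypoints within threshold * bbox_size of the ground truth."""
    limit = threshold * bbox_size
    correct = sum(
        1 for p, g in zip(pred, gt)
        if math.hypot(p[0] - g[0], p[1] - g[1]) <= limit
    )
    return correct / len(gt)


# Four keypoints with errors of 1, 8, 0, and 30 pixels in a 100-pixel box.
gt_points = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
pred_points = [(1.0, 0.0), (10.0, 8.0), (0.0, 10.0), (40.0, 10.0)]
scores = {thr: pck(pred_points, gt_points, 100.0, thr) for thr in (0.05, 0.1, 0.2)}
```

A tighter threshold such as 0.05 is stricter, which is why the PCK 0.05 column shows the lowest numbers in the table above.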

Results on 3D Pose

Dataset	Estimator	MPJPE (INPUT/OUTPUT)	Accel (INPUT/OUTPUT)	Download
3DPW	SPIN	96.92/93.34	34.68/7.06	Baidu Netdisk / Google Drive
3DPW	EFT	90.34/89.02	32.83/6.84	Baidu Netdisk / Google Drive
3DPW	PARE	78.98/77.16	25.75/6.90	Baidu Netdisk / Google Drive
AIST++	SPIN	107.26/71.27	33.37/5.68	Baidu Netdisk / Google Drive
Human3.6M	FCN	54.56/52.83	19.18/1.47	Baidu Netdisk / Google Drive

Results on SMPL

Dataset	Estimator	MPJPE (INPUT/OUTPUT)	Accel (INPUT/OUTPUT)	MPVPE (INPUT/OUTPUT)	Download
3DPW	SPIN	100.13/97.53	35.53/8.38	114.39/112.84	Baidu Netdisk / Google Drive
3DPW	EFT	91.60/92.56	33.57/8.75	110.34/109.27	Baidu Netdisk / Google Drive
3DPW	PARE	80.44/81.76	26.77/7.24	94.88/95.68	Baidu Netdisk / Google Drive
AIST++	SPIN	108.25/82.10	33.83/7.27	137.51/106.08	Baidu Netdisk / Google Drive

Note that although our main contribution is the efficiency improvement, using DeciWatch as a post-processing step also helps improve accuracy and smoothness.

You may refer to doc/evaluate.md for evaluation details.
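
As a reminder of what the 3D and SMPL tables report, MPJPE is the mean Euclidean distance between predicted and ground-truth joints, and Accel is commonly computed as the mean distance between predicted and ground-truth joint accelerations, where acceleration is the second finite difference over frames. The sketch below follows those common definitions in plain Python. It is not taken from this repo's evaluation code, and details such as root alignment and units are omitted.

```python
import math
from typing import List

Joint = List[float]   # (x, y, z) of one joint
Frame = List[Joint]   # all joints in one frame
Clip = List[Frame]    # all frames in a clip


def mpjpe(pred: Clip, gt: Clip) -> float:
    """Mean Euclidean distance over all joints and frames."""
    dists = [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(p, g)))
        for pf, gf in zip(pred, gt)
        for p, g in zip(pf, gf)
    ]
    return sum(dists) / len(dists)


def acceleration(clip: Clip) -> Clip:
    """Second finite difference x[t-1] - 2*x[t] + x[t+1] for every joint."""
    return [
        [[a - 2 * b + c for a, b, c in zip(j0, j1, j2)]
         for j0, j1, j2 in zip(f0, f1, f2)]
        for f0, f1, f2 in zip(clip, clip[1:], clip[2:])
    ]


def accel_error(pred: Clip, gt: Clip) -> float:
    """Mean distance between predicted and ground-truth accelerations."""
    return mpjpe(acceleration(pred), acceleration(gt))


# One joint moving at constant speed; the prediction jitters on frame 2 only.
gt_clip = [[[float(t), 0.0, 0.0]] for t in range(5)]
jitter_clip = [[[float(t), 1.0 if t == 2 else 0.0, 0.0]] for t in range(5)]
print(mpjpe(jitter_clip, gt_clip), accel_error(jitter_clip, gt_clip))
```

In this toy case a single-frame jitter changes MPJPE only slightly but produces a much larger acceleration error, which illustrates why both metrics are reported side by side.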

Quick Demo

Run the command below to visualize the demo:

python demo.py --cfg [config file] --dataset_name [dataset name] --estimator [backbone estimator you use] --body_representation [smpl/3D/2D] --sample_interval [sample interval N]

Place the corresponding images or videos in the following directory structure:

|-- data
    |-- videos
        |-- pw3d 
            |-- downtown_enterShop_00
                |-- image_00000.jpg
                |-- ...
            |-- ...
        |-- jhmdb
            |-- catch
            |-- ...
        |-- aist
            |-- gWA_sFM_c01_d27_mWA2_ch21.mp4
            |-- ...
        |-- ...
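
Before running the demo, it can help to check that the frames are where the layout above expects them. The sketch below lists the sequence folders under a dataset directory and counts the `.jpg` frames in each. The `data/videos` root comes from the tree above, but the helper `count_frames` is hypothetical and not part of the repo; it also ignores datasets such as aist that store `.mp4` files directly.

```python
from pathlib import Path
from typing import Dict


def count_frames(dataset: str, root: str = "data/videos") -> Dict[str, int]:
    """Map each sequence folder under root/dataset to its number of .jpg frames."""
    dataset_dir = Path(root) / dataset
    if not dataset_dir.is_dir():
        return {}  # nothing prepared yet for this dataset
    return {
        seq.name: len(list(seq.glob("*.jpg")))
        for seq in sorted(dataset_dir.iterdir())
        if seq.is_dir()
    }


if __name__ == "__main__":
    for name, n in count_frames("pw3d").items():
        print(f"{name}: {n} frames")
```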

For example, you can visualize the results on the 3D representation of 3DPW using backbone estimator SPIN with sample interval 10 by:

python demo.py --cfg configs/config_pw3d_spin.yaml --dataset_name pw3d --estimator spin --body_representation 3D --sample_interval 10

Please refer to the dataset website for the raw images. You may change the config in lib/core/config.py for different visualization parameters.

You may refer to doc/visualize.md for visualization details.

Citing DeciWatch

If you find this repository useful for your work, please consider citing it as follows:

@article{zeng2022deciwatch,
  title={DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation},
  author={Zeng, Ailing and Ju, Xuan and Yang, Lei and Gao, Ruiyuan and Zhu, Xizhou and Dai, Bo and Xu, Qiang},
  journal={arXiv preprint arXiv:2203.08713},
  year={2022}
}

Please remember to cite all the datasets and backbone estimators if you use them in your experiments.

Acknowledgement

Many thanks to Xuan Ju for her great efforts in cleaning up almost all of the original code!

License

This code is available for non-commercial scientific research purposes as defined in the LICENSE file. By downloading and using this code you agree to the terms in the LICENSE. Third-party datasets and software are subject to their respective licenses.
