This is an official implementation for "DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation"

Overview

DeciWatch: A Simple Baseline for 10× Efficient 2D and 3D Pose Estimation

This repo is the official implementation of "DeciWatch: A Simple Baseline for 10× Efficient 2D and 3D Pose Estimation". [Paper] [Project]

Update

  • Clean version is released! It currently includes code, data, log and models for the following tasks:
  • 2D human pose estimation
  • 3D human pose estimation
  • Body recovery via a SMPL model

TODO

  • Provide different sample interval checkpoints/logs
  • Add DeciWatch in MMHuman3D

Description

This paper proposes a simple baseline framework for video-based 2D/3D human pose estimation that can achieve 10 times efficiency improvement over existing works without any performance degradation, named DeciWatch. Unlike current solutions that estimate each frame in a video, DeciWatch introduces a simple yet effective sample-denoise-recover framework that only watches sparsely sampled frames, taking advantage of the continuity of human motions and the lightweight pose representation. Specifically, DeciWatch uniformly samples less than 10% video frames for detailed estimation, denoises the estimated 2D/3D poses with an efficient Transformer architecture, and then accurately recovers the rest of the frames using another Transformer-based network. Comprehensive experimental results on three video-based human pose estimation, body mesh recovery tasks and efficient labeling in videos with four datasets validate the efficiency and effectiveness of DeciWatch.

Getting Started

Environment Requirement

DeciWatch has been implemented and tested on Pytorch 1.10.1 with python >= 3.6. It supports both GPU and CPU inference.

Clone the repo:

git clone https://github.com/cure-lab/DeciWatch.git

We recommend you install the requirements using conda:

# conda
source scripts/install_conda.sh

Prepare Data

All the data used in our experiment can be downloaded here.

Google Drive

Baidu Netdisk

Valid data includes:

Dataset Pose Estimator 3D Pose 2D Pose SMPL
Sub-JHMDB SimplePose
3DPW EFT
3DPW PARE
3DPW SPIN
Human3.6M FCN
AIST++ SPIN

Please refer to doc/data.md for detailed data information and data preparing.

Training

Run the commands below to start training:

python train.py --cfg [config file] --dataset_name [dataset name] --estimator [backbone estimator you use] --body_representation [smpl/3D/2D] --sample_interval [sample interval N]

For example, you can train on 3D representation of 3DPW using backbone estimator SPIN with sample interval 10 by:

python train.py --cfg configs/config_pw3d_spin.yaml --dataset_name pw3d --estimator spin --body_representation 3D --sample_interval 10

Note that the training and testing datasets should be downloaded and prepared before training.

You may refer to doc/training.md for more training details.

Evaluation

Results on 2D Pose

Dataset Estimator PCK 0.05 (INPUT/OUTPUT) PCK 0.1 (INPUT/OUTPUT) PCK 0.2 (INPUT/OUTPUT) Download
Sub-JHMDB simplepose 57.30%/79.32% 81.61%/94.27% 93.94%/98.85% Baidu Netdisk / Google Drive

Results on 3D Pose

Dataset Estimator MPJPE (INPUT/OUTPUT) Accel (INPUT/OUTPUT) Download
3DPW SPIN 96.92/93.34 34.68/7.06 Baidu Netdisk / Google Drive
3DPW EFT 90.34/89.02 32.83/6.84 Baidu Netdisk / Google Drive
3DPW PARE 78.98/77.16 25.75/6.90 Baidu Netdisk / Google Drive
AIST++ SPIN 107.26/71.27 33.37/5.68 Baidu Netdisk / Google Drive
Human3.6M FCN 54.56/52.83 19.18/1.47 Baidu Netdisk / Google Drive

Results on SMPL

Dataset Estimator MPJPE (INPUT/OUTPUT) Accel (INPUT/OUTPUT) MPVPE (INPUT/OUTPUT) Download
3DPW SPIN 100.13/97.53 35.53/8.38 114.39/112.84 Baidu Netdisk / Google Drive
3DPW EFT 91.60/92.56 33.57/8.7 5 110.34/109.27 Baidu Netdisk / Google Drive
3DPW PARE 80.44/81.76 26.77/7.24 94.88/95.68 Baidu Netdisk / Google Drive
AIST++ SPIN 108.25/82.10 33.83/7.27 137.51/106.08 Baidu Netdisk / Google Drive

Noted that although our main contribution is the efficiency improvement, using DeciWatch as post processing is also helpful for accuracy and smoothness improvement.

You may refer to doc/evaluate.md for evaluate details.

Quick Demo

Run the commands below to visualize demo:

python demo.py --cfg [config file] --dataset_name [dataset name] --estimator [backbone estimator you use] --body_representation [smpl/3D/2D] --sample_interval [sample interval N]

You are supposed to put corresponding images with the data structure:

|-- data
    |-- videos
        |-- pw3d 
            |-- downtown_enterShop_00
                |-- image_00000.jpg
                |-- ...
            |-- ...
        |-- jhmdb
            |-- catch
            |-- ...
        |-- aist
            |-- gWA_sFM_c01_d27_mWA2_ch21.mp4
            |-- ...
        |-- ...

For example, you can train on 3D representation of 3DPW using backbone estimator SPIN with sample interval 10 by:

python demo.py --cfg configs/config_pw3d_spin.yaml --dataset_name pw3d --estimator spin --body_representation 3D --sample_interval 10

Please refer to the dataset website for the raw images. You may change the config in lib/core/config.py for different visualization parameters.

You may refer to doc/visualize.md for visualization details.

Citing DeciWatch

If you find this repository useful for your work, please consider citing it as follows:

@article{zeng2022deciwatch,
  title={DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation},
  author={Zeng, Ailing and Ju, Xuan and Yang, Lei and Gao, Ruiyuan and Zhu, Xizhou and Dai, Bo and Xu, Qiang},
  journal={arXiv preprint arXiv:2203.08713},
  year={2022}
}

Please remember to cite all the datasets and backbone estimators if you use them in your experiments.

Acknowledgement

Many thanks to Xuan Ju for her great efforts to clean almost the original code!!!

License

This code is available for non-commercial scientific research purposes as defined in the LICENSE file. By downloading and using this code you agree to the terms in the LICENSE. Third-party datasets and software are subject to their respective licenses.

Fuzzy Overclustering (FOC)

Fuzzy Overclustering (FOC) In real-world datasets, we need consistent annotations between annotators to give a certain ground-truth label. However, in

2 Nov 08, 2022
Domain Adaptation with Invariant RepresentationLearning: What Transformations to Learn?

Domain Adaptation with Invariant RepresentationLearning: What Transformations to Learn? Repository Structure: DSAN |└───amazon |    └── dataset (Amazo

DMIRLAB 17 Jan 04, 2023
Capture all information throughout your model's development in a reproducible way and tie results directly to the model code!

Rubicon Purpose Rubicon is a data science tool that captures and stores model training and execution information, like parameters and outcomes, in a r

Capital One 97 Jan 03, 2023
Deep Structured Instance Graph for Distilling Object Detectors (ICCV 2021)

DSIG Deep Structured Instance Graph for Distilling Object Detectors Authors: Yixin Chen, Pengguang Chen, Shu Liu, Liwei Wang, Jiaya Jia. [pdf] [slide]

DV Lab 31 Nov 17, 2022
Code for Domain Adaptive Video Segmentation via Temporal Consistency Regularization in ICCV 2021

Domain Adaptive Video Segmentation via Temporal Consistency Regularization Updates 08/2021: check out our domain adaptation for sematic segmentation p

36 Dec 12, 2022
Implementation of "Generalizable Neural Performer: Learning Robust Radiance Fields for Human Novel View Synthesis"

Generalizable Neural Performer: Learning Robust Radiance Fields for Human Novel View Synthesis Abstract: This work targets at using a general deep lea

163 Dec 14, 2022
An exploration of log domain "alternative floating point" for hardware ML/AI accelerators.

This repository contains the SystemVerilog RTL, C++, HLS (Intel FPGA OpenCL to wrap RTL code) and Python needed to reproduce the numerical results in

Facebook Research 373 Dec 31, 2022
IMBENS: class-imbalanced ensemble learning in Python.

IMBENS: class-imbalanced ensemble learning in Python. Links: [Documentation] [Gallery] [PyPI] [Changelog] [Source] [Download] [知乎/Zhihu] [中文README] [a

Zhining Liu 176 Jan 04, 2023
CowHerd is a partially-observed reinforcement learning environment

CowHerd is a partially-observed reinforcement learning environment, where the player walks around an area and is rewarded for milking cows. The cows try to escape and the player can place fences to h

Danijar Hafner 6 Mar 06, 2022
A new version of the CIDACS-RL linkage tool suitable to a cluster computing environment.

Fully Distributed CIDACS-RL The CIDACS-RL is a brazillian record linkage tool suitable to integrate large amount of data with high accuracy. However,

Robespierre Pita 5 Nov 04, 2022
Reference implementation for Structured Prediction with Deep Value Networks

Deep Value Network (DVN) This code is a python reference implementation of DVNs introduced in Deep Value Networks Learn to Evaluate and Iteratively Re

Michael Gygli 55 Feb 02, 2022
PyTorch implementation of adversarial patch

adversarial-patch PyTorch implementation of adversarial patch This is an implementation of the Adversarial Patch paper. Not official and likely to hav

Jamie Hayes 172 Nov 29, 2022
An implementation of the proximal policy optimization algorithm

PPO Pytorch C++ This is an implementation of the proximal policy optimization algorithm for the C++ API of Pytorch. It uses a simple TestEnvironment t

Martin Huber 59 Dec 09, 2022
Implementation of paper "Towards a Unified View of Parameter-Efficient Transfer Learning"

A Unified Framework for Parameter-Efficient Transfer Learning This is the official implementation of the paper: Towards a Unified View of Parameter-Ef

Junxian He 216 Dec 29, 2022
Computer Vision Paper Reviews with Key Summary of paper, End to End Code Practice and Jupyter Notebook converted papers

Computer-Vision-Paper-Reviews Computer Vision Paper Reviews with Key Summary along Papers & Codes. Jonathan Choi 2021 The repository provides 100+ Pap

Jonathan Choi 2 Mar 17, 2022
​TextWorld is a sandbox learning environment for the training and evaluation of reinforcement learning (RL) agents on text-based games.

TextWorld A text-based game generator and extensible sandbox learning environment for training and testing reinforcement learning (RL) agents. Also ch

Microsoft 983 Dec 23, 2022
免费获取http代理并生成proxifier配置文件

freeproxy 免费获取http代理并生成proxifier配置文件 公众号:台下言书 工具说明:https://mp.weixin.qq.com/s?__biz=MzIyNDkwNjQ5Ng==&mid=2247484425&idx=1&sn=56ccbe130822aa35038095317

说书人 32 Mar 25, 2022
Modular Probabilistic Programming on MXNet

MXFusion | | | | Tutorials | Documentation | Contribution Guide MXFusion is a modular deep probabilistic programming library. With MXFusion Modules yo

Amazon 100 Dec 10, 2022
Lunar is a neural network aimbot that uses real-time object detection accelerated with CUDA on Nvidia GPUs.

Lunar Lunar is a neural network aimbot that uses real-time object detection accelerated with CUDA on Nvidia GPUs. About Lunar can be modified to work

Zeyad Mansour 276 Jan 07, 2023
🌊 Online machine learning in Python

In a nutshell River is a Python library for online machine learning. It is the result of a merger between creme and scikit-multiflow. River's ambition

OnlineML 4k Jan 02, 2023