This repository is the offical Pytorch implementation of ContextPose: Context Modeling in 3D Human Pose Estimation: A Unified Perspective (CVPR 2021).

Last update: Nov 21, 2022

Overview

Context Modeling in 3D Human Pose Estimation: A Unified Perspective (CVPR 2021)

Introduction

This repository is the offical Pytorch implementation of ContextPose, Context Modeling in 3D Human Pose Estimation: A Unified Perspective (CVPR 2021). Below is the example pipeline of using ContextPose for 3D pose estimation.

Quick start

Environment

This project is developed using >= python 3.5 on Ubuntu 16.04. NVIDIA GPUs are needed.

Installation

Clone this repo, and we'll call the directory that you cloned as ${ContextPose_ROOT}.
Install dependences.
1. Install pytorch >= v1.4.0 following official instruction.
2. Install other packages. This project doesn't have any special or difficult-to-install dependencies. All installation can be down with:
```
pip install -r requirements.txt
```
Download data following the next section. In summary, your directory tree should be like this

${ContextPose_ROOT}
├── data
├── experiments
├── mvn
├── logs 
├── README.md
├── process_h36m.sh
├── requirements.txt
├── train.py
`── train.sh

Data

Note: We provide the training and evaluation code on Human3.6M dataset. We do NOT provide the source data. We do NOT own the data or have permission to redistribute the data. Please download according to the official instructions.

Human3.6M

Install CDF C Library by following (https://stackoverflow.com/questions/37232008/how-read-common-data-format-cdf-in-python/58167429#58167429), which is neccessary for processing Human3.6M data.
Download and preprocess the dataset by following the instructions in mvn/datasets/human36m_preprocessing/README.md.
To train ContextPose model, you need rough estimations of the pelvis' 3D positions both for train and val splits. In the paper we use the precalculated 3D skeletons estimated by the Algebraic model proposed in learnable-triangulation (which is an opensource repo and we adopt their Volumetric model to be our baseline.) All pretrained weights and precalculated 3D skeletons can be downloaded at once from here and placed to ./data/pretrained. Here, we fine-tuned the pretrained weight on the Human3.6M dataset for another 20 epochs, please download the weight from here and place to ./data/pretrained/human36m.
We provide the limb length mean and standard on the Human3.6M training set, please download from here and place to ./data/human36m/extra.
Finally, your data directory should be like this (for more detailed directory tree, please refer to README.md)

${ContextPose_ROOT}
|-- data
    |-- human36m
    |   |-- extra
    |   |   | -- una-dinosauria-data
    |   |   | -- ...
    |   |   | -- mean_and_std_limb_length.h5
    |   `-- ...
    `-- pretrained
        |-- human36m
            |-- human36m_alg_10-04-2019
            |-- human36m_vol_softmax_10-08-2019
            `-- backbone_weights.pth

Train

Every experiment is defined by .config files. Configs with experiments from the paper can be found in the ./experiments directory. You can use the train.sh script or specifically:

Single-GPU

To train a Volumetric model with softmax aggregation using 1 GPU, run:

python train.py \
  --config experiments/human36m/train/human36m_vol_softmax_single.yaml \
  --logdir ./logs

The training will start with the config file specified by --config, and logs (including tensorboard files) will be stored in --logdir.

Multi-GPU

Multi-GPU training is implemented with PyTorch's DistributedDataParallel. It can be used both for single-machine and multi-machine (cluster) training. To run the processes use the PyTorch launch utility.

To train our model using 4 GPUs on single machine, run:

python -m torch.distributed.launch --nproc_per_node=4 --master_port=2345 --sync_bn\
  train.py  \
  --config experiments/human36m/train/human36m_vol_softmax_single.yaml \
  --logdir ./logs

Evaluation

After training, you can evaluate the model. Inside the same config file, add path to the learned weights (they are dumped to logs dir during training):

model:
    init_weights: true
    checkpoint: {PATH_TO_WEIGHTS}

Single-GPU

Run:

python train.py \
  --eval --eval_dataset val \
  --config experiments/human36m/eval/human36m_vol_softmax_single.yaml \
  --logdir ./logs

Multi-GPU

Using 4 GPUs on single machine, Run:

python -m torch.distributed.launch --nproc_per_node=4 --master_port=2345 \
  train.py  --eval --eval_dataset val \
  --config experiments/human36m/eval/human36m_vol_softmax_single.yaml \
  --logdir ./logs

Argument --eval_dataset can be val or train. Results can be seen in logs directory or in the tensorboard.

Results & Model Zoo

We evaluate ContextPose on two available large benchmarks: Human3.6M and MPI-INF-3DHP.
To get the results reported in our paper, you can download the weights and place to ./logs.

Dataset to be evaluated	Weights	Results
Human3.6M	link	43.4mm (MPJPE)
MPI-INF-3DHP	link	81.5 (PCK), 43.6 (AUC)

For H36M, the main metric is MPJPE (Mean Per Joint Position Error) which is L2 distance averaged over all joints. To get the result, run as stated above.
For 3DHP, Percentage of Correctly estimated Keypoints (PCK) as well as Area Under the Curve (AUC) are reported. Note that we directly apply our model trained on H36M dataset to 3DHP dataset without re-training to evaluate the generalization performance. To prevent from over-fitting to the H36M-style appearance, we only change the training strategy that we fix the backbone to train 20 epoch before we train the whole network end-to-end. If you want to eval on MPI-INF-3DHP, you can save the results and use the official evaluation code in Matlab.

Human3.6M

MPI-INF-3DHP

Citation

If you use our code or models in your research, please cite with:

@article{ma2021context,
  title={Context Modeling in 3D Human Pose Estimation: A Unified Perspective},
  author={Ma, Xiaoxuan and Su, Jiajun and Wang, Chunyu and Ci, Hai and Wang, Yizhou},
  journal={arXiv preprint arXiv:2103.15507},
  year={2021}
}

Acknowledgement

This repo is built on https://github.com/karfly/learnable-triangulation-pytorch. Part of the data are provided by https://github.com/una-dinosauria/3d-pose-baseline.

This repository is the offical Pytorch implementation of ContextPose: Context Modeling in 3D Human Pose Estimation: A Unified Perspective (CVPR 2021).

Related tags

Overview

Context Modeling in 3D Human Pose Estimation: A Unified Perspective (CVPR 2021)

Introduction

Quick start

Environment

Installation

Data

Human3.6M

Train

Single-GPU

Multi-GPU

Evaluation

Single-GPU

Multi-GPU

Results & Model Zoo

Human3.6M

MPI-INF-3DHP

Citation

Acknowledgement

Owner

ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive Image Synthesis

A Weakly Supervised Amodal Segmenter with Boundary Uncertainty Estimation

PyTorch Implementation of Unsupervised Depth Completion with Calibrated Backprojection Layers (ORAL, ICCV 2021)

Code to use Augmented Shapiro Wilks Stopping, as well as code for the paper "Statistically Signifigant Stopping of Neural Network Training"

DropNAS: Grouped Operation Dropout for Differentiable Architecture Search

This repository contains the code for the binaural-detection model used in the publication arXiv:2111.04637

Code that accompanies the paper Semi-supervised Deep Kernel Learning: Regression with Unlabeled Data by Minimizing Predictive Variance

Official code release for ICCV 2021 paper SNARF: Differentiable Forward Skinning for Animating Non-rigid Neural Implicit Shapes.

Code for the ICME 2021 paper "Exploring Driving-Aware Salient Object Detection via Knowledge Transfer"

SAMO: Streaming Architecture Mapping Optimisation

Reinforcement-learning - Repository of the class assignment questions for the course on reinforcement learning

Pytorch implementation of ProjectedGAN

Code for Referring Image Segmentation via Cross-Modal Progressive Comprehension, CVPR2020.

Code, pre-trained models and saliency results for the paper "Boosting RGB-D Saliency Detection by Leveraging Unlabeled RGB Images".

Attack on Confidence Estimation algorithm from the paper "Disrupting Deep Uncertainty Estimation Without Harming Accuracy"

Course about deep learning for computer vision and graphics co-developed by YSDA and Skoltech.

Awesome Monocular 3D detection

Pytorch implementation of NeurIPS 2021 paper: Geometry Processing with Neural Fields.

Pytorch implementation of "Training a 85.4% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet"

An inofficial PyTorch implementation of PREDATOR based on KPConv.