2D Human Pose estimation using transformers. Implementation in Pytorch

Overview

PE-former: Pose Estimation Transformer

Vision transformer architectures perform very well for image classification tasks. Efforts to solve more challenging vision tasks with transformers rely on convolutional backbones for feature extraction.

POTR is a pure transformer architecture (no CNN backbone) for 2D body pose estimation. It uses an encoder-decoder architecture with a vision transformer as an encoder and a transformer decoder (derived from DETR).

You can use the code in this repository to train and evaluate different POTR configurations on the COCO dataset.

Model

POTR is based on building blocks derived from recent SOTA models. As shown in the figure there are two major components: A Visual Transformer encoder, and a Transformer decoder.

model

The input image is initially converted into tokens following the ViT paradigm. A position embedding is used to help retain the patch-location information. The tokens and the position embedding are used as input to transformer encoder. The transformed tokens are used as the memory input of the transformer decoder. The inputs of the decoder are M learned queries. For each query the network will produce a joint prediction. The output tokens from the transformer decoder are passed through two heads (FFNs).

  • The first is a classification head used to predict the joint type (i.e class) of each query.
  • The second is a regression head that predicts the normalized coordinates (in the range [0,1]) of the joint in the input image.

Predictions that do not correspond to joints are mapped to a "no object" class.

Acknowledgements

The code in this repository is based on the following:

Thank you!

Preparing

Create a python venv and install all the dependencies:

python -m venv pyenv
source pyenv/bin/activate
pip install -r requirements.txt

Training

Here are some CLI examples using the lit_main.py script.

Training POTR with a deit_small encoder, patch size of 16x16 pixels and input resolution 192x256:

python lit_main.py --vit_arch deit_deit_small --patch_size 16 --batch_size 42 --input_size 192 256 --hidden_dim 384 --vit_dim 384 --gpus 1 --num_workers 24

POTR with Xcit_small_p16 encoder:

 python lit_main.py --vit_arch xcit_small_12_p16 --batch_size 42 --input_size 288 384 --hidden_dim 384 --vit_dim 384 --gpus 1 --num_workers 24   --vit_weights https://dl.fbaipublicfiles.com/xcit/xcit_small_12_p16_384_dist.pth

POTR with the ViT as Backbone (VAB) configuration:

 python lit_main.py --vit_as_backbone --vit_arch resnet50 --batch_size 42 --input_size 192 256 --hidden_dim 384 --vit_dim 384 --gpus 1 --position_embedding learned_nocls --num_workers 16 --num_queries 100 --dim_feedforward 1536 --accumulate_grad_batches 1

Baseline that uses a resnet50 (pretrained with dino) as an encoder:

 python lit_main.py --vit_arch resnet50 --patch_size 16 --batch_size 42 --input_size 192 256 --hidden_dim 384 --vit_dim 384 --gpus 1 --num_workers 24 --vit_weights https://dl.fbaipublicfiles.com/dino/dino_resnet50_pretrain/dino_resnet50_pretrain.pth --position_embedding learned_nocls

Check the lit_main.py cli arguments for a complete list.

python lit_main.py --help

Evaluation

Evaluate a trained model using the evaluate.py script.

For example to evaluate POTR with an xcit_small_12_p8 encoder:

python evaluate.py --vit_arch xcit_small_12_p8 --patch_size 8 --batch_size 42 --input_size 192 256 --hidden_dim 384 --vit_dim 384  --position_embedding enc_xcit --num_workers 16 --num_queries 100 --dim_feedforward 1536 --init_weights paper_experiments/xcit_small12_p8_dino_192_256_paper/checkpoints/checkpoint-epoch\=065-AP\=0.736.ckpt --use_det_bbox

Evaluate POTR with a deit_small encoder:

 python evaluate.py --vit_arch deit_deit_small --patch_size 16 --batch_size 42 --input_size 192 256 --hidden_dim 384 --vit_dim 384 --num_workers 24 --init_weights lightning_logs/version_0/checkpoints/checkpoint-epoch\=074-AP\=0.622.ckpt  --use_det_bbox

Set the argument of --init_weights to your model's checkpoint.

Model Zoo

name input params AP AR url
POTR-Deit-dino-p8 192x256 36.4M 70.6 78.1 model
POTR-Xcit-p16 288x384 40.6M 70.2 77.4 model
POTR-Xcit-dino-p16 288x384 40.6M 70.7 77.9 model
POTR-Xcit-dino-p8 192x256 40.5M 71.6 78.7 model
POTR-Xcit-dino-p8 288x384 40.5M 72.6 79.4 model

Check the experiments folder for configuration files and evaluation results.

All trained models and tensorboard training logs can be downloaded from this drive folder.

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Owner
Panteleris Paschalis
Panteleris Paschalis
Conservative Q Learning for Offline Reinforcement Reinforcement Learning in JAX

CQL-JAX This repository implements Conservative Q Learning for Offline Reinforcement Reinforcement Learning in JAX (FLAX). Implementation is built on

Karush Suri 8 Nov 07, 2022
NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.

NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.

880 Jan 07, 2023
CCAFNet: Crossflow and Cross-scale Adaptive Fusion Network for Detecting Salient Objects in RGB-D Images

Code and result about CCAFNet(IEEE TMM) 'CCAFNet: Crossflow and Cross-scale Adaptive Fusion Network for Detecting Salient Objects in RGB-D Images' IEE

zyrant丶 14 Dec 29, 2021
[CVPR 2022 Oral] Crafting Better Contrastive Views for Siamese Representation Learning

Crafting Better Contrastive Views for Siamese Representation Learning (CVPR 2022 Oral) 2022-03-29: The paper was selected as a CVPR 2022 Oral paper! 2

249 Dec 28, 2022
Immortal tracker

Immortal_tracker Prerequisite Our code is tested for Python 3.6. To install required liabraries: pip install -r requirements.txt Waymo Open Dataset P

74 Dec 03, 2022
PyTorch code for our ECCV 2020 paper "Single Image Super-Resolution via a Holistic Attention Network"

HAN PyTorch code for our ECCV 2020 paper "Single Image Super-Resolution via a Holistic Attention Network" This repository is for HAN introduced in the

五维空间 140 Nov 23, 2022
LSTM built using Keras Python package to predict time series steps and sequences. Includes sin wave and stock market data

LSTM Neural Network for Time Series Prediction LSTM built using the Keras Python package to predict time series steps and sequences. Includes sine wav

Jakob Aungiers 4.1k Jan 02, 2023
RTSeg: Real-time Semantic Segmentation Comparative Study

Real-time Semantic Segmentation Comparative Study The repository contains the official TensorFlow code used in our papers: RTSEG: REAL-TIME SEMANTIC S

Mennatullah Siam 592 Nov 18, 2022
Implementation for "Conditional entropy minimization principle for learning domain invariant representation features"

Implementation for "Conditional entropy minimization principle for learning domain invariant representation features". The code is reproduced from thi

1 Nov 02, 2022
A PyTorch implementation of DenseNet.

A PyTorch Implementation of DenseNet This is a PyTorch implementation of the DenseNet-BC architecture as described in the paper Densely Connected Conv

Brandon Amos 771 Dec 15, 2022
Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechanism

Period-alternatives-of-Softmax Experimental Demo for our paper 'Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechani

slwang9353 0 Sep 06, 2021
SMPL-X: A new joint 3D model of the human body, face and hands together

SMPL-X: A new joint 3D model of the human body, face and hands together [Paper Page] [Paper] [Supp. Mat.] Table of Contents License Description News I

Vassilis Choutas 1k Jan 09, 2023
Official implementation of the PICASO: Permutation-Invariant Cascaded Attentional Set Operator

PICASO Official PyTorch implemetation for the paper PICASO:Permutation-Invariant Cascaded Attentive Set Operator. Requirements Python 3 torch = 1.0 n

Samira Zare 0 Dec 23, 2021
A modified version of DeepMind's Alphafold2 to divide CPU part (MSA and template searching) and GPU part (prediction model)

ParallelFold Author: Bozitao Zhong This is a modified version of DeepMind's Alphafold2 to divide CPU part (MSA and template searching) and GPU part (p

Bozitao Zhong 77 Dec 22, 2022
The repository includes the code for training cell counting applications. (Keras + Tensorflow)

cell_counting_v2 The repository includes the code for training cell counting applications. (Keras + Tensorflow) Dataset can be downloaded here : http:

Weidi 113 Oct 06, 2022
Differentiable scientific computing library

xitorch: differentiable scientific computing library xitorch is a PyTorch-based library of differentiable functions and functionals that can be widely

98 Dec 26, 2022
Code for our work "Activation to Saliency: Forming High-Quality Labels for Unsupervised Salient Object Detection".

A2S-USOD Code for our work "Activation to Saliency: Forming High-Quality Labels for Unsupervised Salient Object Detection". Code will be released upon

15 Dec 16, 2022
BLEURT is a metric for Natural Language Generation based on transfer learning.

BLEURT: a Transfer Learning-Based Metric for Natural Language Generation BLEURT is an evaluation metric for Natural Language Generation. It takes a pa

Google Research 492 Jan 05, 2023
Official implementation for the paper "Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D Object Detection"

Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D Object Detection PyTorch code release of the paper "Attentive Prototypes for Sour

Deepti Hegde 23 Oct 17, 2022
Implementation of Hierarchical Transformer Memory (HTM) for Pytorch

Hierarchical Transformer Memory (HTM) - Pytorch Implementation of Hierarchical Transformer Memory (HTM) for Pytorch. This Deepmind paper proposes a si

Phil Wang 63 Dec 29, 2022