[BMVC2021] "TransFusion: Cross-view Fusion with Transformer for 3D Human Pose Estimation"

Last update: Dec 23, 2022

Overview

TransFusion-Pose

TransFusion: Cross-view Fusion with Transformer for 3D Human Pose Estimation
Haoyu Ma, Liangjian Chen, Deying Kong, Zhe Wang, Xingwei Liu, Hao Tang, Xiangyi Yan, Yusheng Xie, Shih-Yao Lin and Xiaohui Xie
In BMVC 2021
[Paper] [Video]

Overview

We propose the TransFusion, which apply the transformer architecture to multi-view 3D human pose estimation
We propose the Epipolar Field, a novel and more general form of epipolar line. It readily integrates with the transformer through our proposed geometry positional encoding to encode the 3D relationships among different views.
Extensive experiments are conducted to demonstrate that our TransFusion outperforms previous fusion methods on both Human 3.6M and SkiPose datasets, but requires substantially fewer parameters.

Installation

Clone this repo, and we'll call the directory that you cloned multiview-pose as ${POSE_ROOT}

git clone https://github.com/HowieMa/TransFusion-Pose.git

Install dependencies.

pip install -r requirements.txt

Download TransPose models pretrained on COCO.

wget https://github.com/yangsenius/TransPose/releases/download/Hub/tp_r_256x192_enc3_d256_h1024_mh8.pth

You can also download it from the official website of TransPose

Please download them under ${POSE_ROOT}/models, and make them look like this:

${POSE_ROOT}/models
└── pytorch
    └── coco
        └── tp_r_256x192_enc3_d256_h1024_mh8.pth

Data preparation

Human 3.6M

For Human36M data, please follow H36M-Toolbox to prepare images and annotations.

Ski-Pose

For Ski-Pose, please follow the instruction from their website to obtain the dataset.
Once you download the Ski-PosePTZ-CameraDataset-png.zip and ski_centers.csv, unzip them and put into the same folder, named as ${SKI_ROOT}.
Run python data/preprocess_skipose.py ${SKI_ROOT} to format it.

Your folder should look like this:

${POSE_ROOT}
|-- data
|-- |-- h36m
    |-- |-- annot
        |   |-- h36m_train.pkl
        |   |-- h36m_validation.pkl
        |-- images
            |-- s_01_act_02_subact_01_ca_01 
            |-- s_01_act_02_subact_01_ca_02

|-- |-- preprocess_skipose.py
|-- |-- skipose  
    |-- |-- annot
        |   |-- ski_train.pkl
        |   |-- ski_validation.pkl
        |-- images
            |-- seq_103 
            |-- seq_103

Training and Testing

Human 3.6M

# Training
python run/pose2d/train.py --cfg experiments-local/h36m/transpose/256_fusion_enc3_GPE.yaml --gpus 0,1,2,3

# Evaluation (2D)
python run/pose2d/valid.py --cfg experiments-local/h36m/transpose/256_fusion_enc3_GPE.yaml --gpus 0,1,2,3  

# Evaluation (3D)
python run/pose3d/estimate_tri.py --cfg experiments-local/h36m/transpose/256_fusion_enc3_GPE.yaml

Ski-Pose

# Training
python run/pose2d/train.py --cfg experiments-local/skipose/transpose/256_fusion_enc3_GPE.yaml --gpus 0,1,2,3

# Evaluation (2D)
python run/pose2d/valid.py --cfg experiments-local/skipose/transpose/256_fusion_enc3_GPE.yaml --gpus 0,1,2,3

# Evaluation (3D)
python run/pose3d/estimate_tri.py --cfg experiments-local/skipose/transpose/256_fusion_enc3_GPE.yaml

Our trained models can be downloaded from here

Citation

If you find our code helps your research, please cite the paper:

@inproceedings{ma2021transfusion,
  title={TransFusion: Cross-view Fusion with Transformer for 3D Human Pose Estimation},
  author={Ma, Haoyu and Chen, Liangjian and Kong, Deying and Wang, Zhe and Liu, Xingwei and Tang, Hao and Yan, Xiangyi and Xie, Yusheng and Lin, Shih-Yao and Xie, Xiaohui},
  booktitle={British Machine Vision Conference},
  year={2021}
}

[BMVC2021] "TransFusion: Cross-view Fusion with Transformer for 3D Human Pose Estimation"

Related tags

Overview

TransFusion-Pose

Overview

Installation

Data preparation

Human 3.6M

Ski-Pose

Training and Testing

Human 3.6M

Ski-Pose

Citation

Acknowledgement

Owner

Haoyu Ma

Hardware-accelerated DNN model inference ROS2 packages using NVIDIA Triton/TensorRT for both Jetson and x86_64 with CUDA-capable GPU

Pytorch implementation of Feature Pyramid Network (FPN) for Object Detection

Company clustering with K-means/GMM and visualization with PCA, t-SNE, using SSAN relation extraction

Evaluating saliency methods on artificial data with different background types

Generating Fractals on Starknet with Cairo

X-VLM: Multi-Grained Vision Language Pre-Training

An index of algorithms for learning causality with data

Deconfounding Temporal Autoencoder: Estimating Treatment Effects over Time Using Noisy Proxies

Stable Neural ODE with Lyapunov-Stable Equilibrium Points for Defending Against Adversarial Attacks

Exemplo de implementação do padrão circuit breaker em python

Text and code for the forthcoming second edition of Think Bayes, by Allen Downey.

Rax is a Learning-to-Rank library written in JAX

Make your first PR. A beginner friendly repository made specifically for open source beginners. Add any program under any language (it can be anything from a simple program to a complex data structure algorithm). Happy coding...

Deep Learning for Natural Language Processing SS 2021 (TU Darmstadt)

TCTrack: Temporal Contexts for Aerial Tracking (CVPR2022)

Brain Tumor Detection with Tensorflow Neural Networks.

AgML is a comprehensive library for agricultural machine learning

Gym for multi-agent reinforcement learning

PoseViz – Multi-person, multi-camera 3D human pose visualization tool built using Mayavi.

Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging