This is the official PyTorch implementation of the CVPR 2020 paper "TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting".

Last update: Dec 11, 2022

Overview

TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting

Project Page | YouTube | Paper

This is the official PyTorch implementation of the CVPR 2020 paper "TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting".

Environment

conda install pytorch torchvision cudatoolkit=<your cuda version>
conda install pyyaml scikit-image scikit-learn opencv
pip install -r requirements.txt

Data

Mixamo

Mixamo is a synthesized 3D character animation dataset.

Download mixamo data here.
Extract under data/mixamo

For directions for downloading 3D Mixamo data please refer to this link.

SoloDance

SoloDance is a collection of dancing videos on youtube. We use DensePose to extract skeleton sequences from these videos for training.

Download the extracted skeleton sequences here.
Extract under data/solo_dance

The original videos can be downloaded here.

Preprocessing

run sh scripts/preprocess.sh to preprocess the two datasets above.

Pretrained model

Download the pretrained models here.

Inference

For Skeleton Extraction, please consider using a pose estimation library such as Detectron2. We require the input skeleton sequences to be in the format of a numpy .npy file:
- The file should contain an array with shape 15 x 2 x length.
- The first dimension (15) corresponds the 15 body joint defined here.
- The second dimension (2) corresponds to x and y coordinates.
- The third dimension (length) is the temporal dimension.
For Motion Retargeting Network, we provide the sample command for inference:

python infer_pair.py 
--config configs/transmomo.yaml 
--checkpoint transmomo_mixamo_36_800_24/checkpoints/autoencoder_00200000.pt # replace with actual path
--source a.npy  # replace with actual path
--target b.npy  # replace with actual path
--source_width 1280 --source_height 720 
--target_height 1920 --target_width 1080

For Skeleton-to-Video Rendering, please refer to Everybody Dance Now.

Training

To train the Motion Retargeting Network, run

python train.py --config configs/transmomo.yaml

To train on the SoloDance dataest, run

python train.py --config configs/transmomo_solo_dance.yaml

Testing

For testing motion retargeting MSE, first generate the motion-retargeted motions with

python test.py
--config configs/transmomo.yaml # replace with the actual config used for training
--checkpoint transmomo_mixamo_36_800_24/checkpoints/autoencoder_00200000.pt
--out_dir transmomo_mixamo_36_800_24_results # replace actual path to output directory

And then compute MSE by

python scripts/compute_mse.py 
--in_dir transmomo_mixamo_36_800_24_results # replace with the previous output directory

Project Structure

transmomo.pytorch
├── configs - configuration files
├── data - place for storing data
├── docs - documentations
├── lib
│   ├── data.py - datasets and dataLoaders
│   ├── networks - encoders, decoders, discriminators, etc.
│   ├── trainer.py - training pipeline
│   ├── loss.py - loss functions
│   ├── operation.py - operations, e.g. rotation, projection, etc.
│   └── util - utility functions
├── out - place for storing output
├── infer_pair.py - perform motion retargeting
├── render_interpolate.py - perform motion and body interpolation
├── scripts - scripts for data processing and experiments
├── test.py - test MSE
└── train.py - main entrance for training

TODOs

Detailed documentation
Add example files
Release in-the-wild dancing video dataset (unannotated)
Tool for visualizing Mixamo test error
Tool for converting keypoint formats

Citation

Z. Yang*, W. Zhu*, W. Wu*, C. Qian, Q. Zhou, B. Zhou, C. C. Loy. "TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting." IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020. (* indicates equal contribution.)

BibTeX:

@inproceedings{transmomo2020,
  title={TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting},
  author={Yang, Zhuoqian and Zhu, Wentao and Wu, Wayne and Qian, Chen and Zhou, Qiang and Zhou, Bolei and Loy, Chen Change},
  booktitle={The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020}
}

Acknowledgement

This repository is partly based on Rundi Wu's Learning Character-Agnostic Motion for Motion Retargeting in 2D and Xun Huang's MUNIT: Multimodal UNsupervised Image-to-image Translation. The skeleton-to-rendering part is based on Everybody Dance Now. We sincerely thank them for their inspiration and contribution to the community.

This is the official PyTorch implementation of the CVPR 2020 paper "TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting".

Related tags

Overview

TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting

Project Page | YouTube | Paper

Environment

Data

Mixamo

SoloDance

Preprocessing

Pretrained model

Inference

Training

Testing

Project Structure

TODOs

Citation

Acknowledgement

Owner

Zhuoqian Yang

A PyTorch Toolbox for Face Recognition

Auxiliary Raw Net (ARawNet) is a ASVSpoof detection model taking both raw waveform and handcrafted features as inputs, to balance the trade-off between performance and model complexity.

A very simple tool for situations where optimization with onnx-simplifier would exceed the Protocol Buffers upper file size limit of 2GB, or simply to separate onnx files to any size you want.

Predictive AI layer for existing databases.

Relative Uncertainty Learning for Facial Expression Recognition

MoveNet Single Pose on OpenVINO

Experiments on continual learning from a stream of pretrained models.

A variational Bayesian method for similarity learning in non-rigid image registration (CVPR 2022)

Elegy is a framework-agnostic Trainer interface for the Jax ecosystem.

frida工具的缝合怪

Deep Markov Factor Analysis (NeurIPS2021)

U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection

DECA: Detailed Expression Capture and Animation (SIGGRAPH 2021)

PyTorch Implementation of ECCV 2020 Spotlight TuiGAN: Learning Versatile Image-to-Image Translation with Two Unpaired Images

Re-implememtation of MAE (Masked Autoencoders Are Scalable Vision Learners) using PyTorch.

[ICCV 2021] Deep Hough Voting for Robust Global Registration

The Hailo Model Zoo includes pre-trained models and a full building and evaluation environment

Localizing Visual Sounds the Hard Way

My tensorflow implementation of "A neural conversational model", a Deep learning based chatbot

IndoNLI: A Natural Language Inference Dataset for Indonesian