The official code for the ICCV-2021 paper "Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates".

Overview

SpeechDrivesTemplates

The official repo for the ICCV-2021 paper "Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates".

[arxiv / video]

Our paper and this repo focus on upper-body pose generation from audio. To synthesize images from poses, please refer to this Pose2Img repo.

  • Code
  • Model
  • Data preparation

Package Hierarchy

|-- config
|     |-- default.py
|     |-- voice2pose_s2g_speech2gesture.yaml        # baseline: speech2gesture
|     |-- voice2pose_sdt_vae_speech2gesture.yaml    # ours (VAE)
|     |-- pose2pose_speech2gesture.yaml             # gesture reconstruction  
|     `-- voice2pose_sdt_bp_speech2gesture.yaml     # ours (Backprop)
|
|-- core
|     |-- datasets
|     |-- netowrks
|     |-- pipelines
|     \-- utils
|
|-- dataset
|     \-- speech2gesture  # create a soft link here
|
|-- output
|     \-- <date-config-tag>  # A directory for each experiment
|
`-- main.py

Setup the Dataset

Datasets shuold be placed in the dataset directory. Just create a soft link like this:

ln -s <path-to-SPEECH2GESTURE-dataset> ./dataset/speech2gesture

For your own dataset, you need to implement a subclass of torch.utils.data.Dataset in core/datasets/custom_dataset.py.

Train

Train a Model from Scratch

python main.py --config_file configs/voice2pose_sdt_bp_speech2gesture.yaml \
    --tag DEV \
    SYS.NUM_WORKERS 32
  • --tag set the name of the experiment which wil be displayed in the outputfile.
  • You can overwrite the any parameters defined in voice2pose_default.py by simply adding it at the end of the command. The example above set SYS.NUM_WORKERS to 32 temporarily.

Resume Training from an Interrupted Experiment

python main.py --config_file configs/voice2pose_sdt_bp_speech2gesture.yaml \
    --resume_from <checkpoint-to-continue-from>
  • This command will load the state_dict from the checkpoint for both the model and the optimizer, and write results to the original directory that the checkpoint lies in.

Training from a pretrained model

python main.py --config_file configs/voice2pose_sdt_bp_speech2gesture.yaml \
    --pretrain_from <checkpoint-to-continue-from> \
    --tag DEV
  • This command will only load the state_dict for the model, and write results to a new base directory.

Test

To test the model, run this command:

python main.py --config_file configs/voice2pose_sdt_bp_speech2gesture.yaml \
    --tag DEV \
    --test-only \
    --checkpoint <path-to-checkpoint>

Demo

python main.py --config_file configs/voice2pose_sdt_bp_speech2gesture.yaml \
    --tag <DEV> \
    --demo_input <audio.wav> \
    --checkpoint <path-to-checkpoint> \
    DATASET.SPEAKER oliver \
    SYS.VIDEO_FORMAT "['mp4']"

Important Details

Dataset caching

We turn on dataset caching (DATASET.CACHING) by default to speed up training.

If you encounter errors in the dataloader like RuntimeError: received 0 items of ancdata, please increase ulimit by running the command ulimit -n 262144. (refer to this issue)

DataParallel and DistributedDataParallel

We use single GPU (warpped by DataParallel) by default since it is fast enough with dataset caching. For multi-GPU training, we recommand using DistributedDataParallel (DDP) because it provide SyncBN across GPU cards. To enable DDP, set SYS.DISTRIBUTED to True and set SYS.WORLD_SIZE according to the number of GPUs.

When using DDP, assure that the batch_size can be divided exactly by SYS.WORLD_SIZE.

Misc

  • To run any module other than the main files in the root directory, for example the core\datasets\speech2gesture.py file, you should run python -m core.datasets.speech2gesture rather than python core\datasets\speech2gesture.py. This is an interesting problem of Python's relative importing which deserves in-depth thinking.
  • We save a checkpoint and conduct validation after each epoch. You can change the interval in the config file.
  • We generate and save 2 videos in each epoch when training. During validation, we sample 8 videos for each epoch. These videos are saved in tensorborad (without sound) and mp4 (with sound). You can change the SYS.VIDEO_FORMAT parameter to select one or two of them.
  • We usually sett NUM_WORKERS to 32 for best performance. If you encounter any error about memory, try lower NUM_WORKERS.
@inproceedings{qian2021speech,
  title={Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates},
  author={Qian, Shenhan and Tu, Zhi and Zhi, YiHao and Liu, Wen and Gao, Shenghua},
  journal={International Conference on Computer Vision (ICCV)},
  year={2021}
}
Owner
Qian Shenhan
Qian Shenhan
This is a real life mario project using python and mediapipe

real-life-mario This is a real life mario project using python and mediapipe How to run to run this just run - realMario.py file requirements This req

Programminghut 42 Dec 22, 2022
Code for paper "Role-based network embedding via structural features reconstruction with degree-regularized constraint"

Role-based network embedding via structural features reconstruction with degree-regularized constraint Train python main.py --dataset brazil-flights

wang zhang 1 Jun 28, 2022
CTPN + DenseNet + CTC based end-to-end Chinese OCR implemented using tensorflow and keras

简介 基于Tensorflow和Keras实现端到端的不定长中文字符检测和识别 文本检测:CTPN 文本识别:DenseNet + CTC 环境部署 sh setup.sh 注:CPU环境执行前需注释掉for gpu部分,并解开for cpu部分的注释 Demo 将测试图片放入test_images

Yang Chenguang 2.6k Dec 29, 2022
BD-ALL-DIGIT - This Is Bangladeshi All Sim Cloner Tools

BANGLADESHI ALL SIM CLONER TOOLS INSTALL TOOL ON TERMUX $ apt update $ apt upgra

MAHADI HASAN AFRIDI 2 Jan 19, 2022
Official code for ROCA: Robust CAD Model Retrieval and Alignment from a Single Image (CVPR 2022)

ROCA: Robust CAD Model Alignment and Retrieval from a Single Image (CVPR 2022) Code release of our paper ROCA. Check out our video, paper, and website

123 Dec 25, 2022
Using python libraries to track hands

Python-HandTracking Using python libraries to track hands on a camera Uses cv2 and mediapipe libraries custom hand tracking module PyCharm IDE Final E

Martin Matsudaira 1 Dec 17, 2021
A tensorflow implementation of EAST text detector

EAST: An Efficient and Accurate Scene Text Detector Introduction This is a tensorflow re-implementation of EAST: An Efficient and Accurate Scene Text

2.9k Jan 02, 2023
Give a solution to recognize MaoYan font.

猫眼字体识别 该 github repo 在于帮助xjtlu的同学们识别猫眼的扭曲字体。已经打包上传至 pypi ,可以使用 pip 直接安装。 猫眼字体的识别不出来的原理与解决思路在采茶上 使用方法: import MaoYanFontRecognize

Aruix 4 Jun 30, 2022
An Implementation of the FOTS: Fast Oriented Text Spotting with a Unified Network

FOTS: Fast Oriented Text Spotting with a Unified Network Introduction This is a pytorch re-implementation of FOTS: Fast Oriented Text Spotting with a

GeorgeJoe 171 Aug 04, 2022
MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition

MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition Python 2.7 Python 3.6 MORAN is a network with rectification mechanism for

Canjie Luo 595 Dec 27, 2022
Awesome anomaly detection in medical images

A curated list of awesome anomaly detection works in medical imaging, inspired by the other awesome-* initiatives.

Kang Zhou 57 Dec 19, 2022
Fast style transfer

faststyle Faststyle aims to provide an easy and modular interface to Image to Image problems based on feature loss. Install Making sure you have a wor

Lucas Vazquez 21 Mar 11, 2022
Introduction to image processing, most used and popular functions of OpenCV

👀 OpenCV 101 Introduction to image processing, most used and popular functions of OpenCV go here.

Vusal Ismayilov 3 Jul 02, 2022
Thresholding-and-masking-using-OpenCV - Image Thresholding is used for image segmentation

Image Thresholding is used for image segmentation. From a grayscale image, thresholding can be used to create binary images. In thresholding we pick a threshold T.

Grace Ugochi Nneji 3 Feb 15, 2022
Fun program to overlay a mask to yourself using a webcam

Superhero Mask Overlay Description Simple project made for fun. It consists of placing a mask (a PNG image with transparent background) on your face.

KB Kwan 10 Dec 01, 2022
A simple component to display annotated text in Streamlit apps.

Annotated Text Component for Streamlit A simple component to display annotated text in Streamlit apps. For example: Installation First install Streaml

Thiago Teixeira 312 Dec 30, 2022
ERQA - Edge Restoration Quality Assessment

ERQA - a full-reference quality metric designed to analyze how good image and video restoration methods (SR, deblurring, denoising, etc) are restoring real details.

MSU Video Group 27 Dec 17, 2022
Code for the "Sensing leg movement enhances wearable monitoring of energy expenditure" paper.

EnergyExpenditure Code for the "Sensing leg movement enhances wearable monitoring of energy expenditure" paper. Additional data for replicating this s

Patrick S 42 Oct 26, 2022
Fully-automated scripts for collecting AI-related papers

AI-Paper-Collector Web demo: https://ai-paper-collector.vercel.app/ (recommended) Colab notebook: here Motivation Fully-automated scripts for collecti

772 Dec 30, 2022
A python screen recorder for low-end computers, provides high quality video output.

RecorderX - v1.0 A screen recorder made in Python with the help of OpenCv, it has ability to record your screen in high quality. No matter what your P

Priyanshu Jindal 4 Nov 10, 2021