The official code for the ICCV-2021 paper "Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates".

Overview

SpeechDrivesTemplates

The official repo for the ICCV-2021 paper "Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates".

[arxiv / video]

Our paper and this repo focus on upper-body pose generation from audio. To synthesize images from poses, please refer to this Pose2Img repo.

  • Code
  • Model
  • Data preparation

Package Hierarchy

|-- config
|     |-- default.py
|     |-- voice2pose_s2g_speech2gesture.yaml        # baseline: speech2gesture
|     |-- voice2pose_sdt_vae_speech2gesture.yaml    # ours (VAE)
|     |-- pose2pose_speech2gesture.yaml             # gesture reconstruction  
|     `-- voice2pose_sdt_bp_speech2gesture.yaml     # ours (Backprop)
|
|-- core
|     |-- datasets
|     |-- netowrks
|     |-- pipelines
|     \-- utils
|
|-- dataset
|     \-- speech2gesture  # create a soft link here
|
|-- output
|     \-- <date-config-tag>  # A directory for each experiment
|
`-- main.py

Setup the Dataset

Datasets shuold be placed in the dataset directory. Just create a soft link like this:

ln -s <path-to-SPEECH2GESTURE-dataset> ./dataset/speech2gesture

For your own dataset, you need to implement a subclass of torch.utils.data.Dataset in core/datasets/custom_dataset.py.

Train

Train a Model from Scratch

python main.py --config_file configs/voice2pose_sdt_bp_speech2gesture.yaml \
    --tag DEV \
    SYS.NUM_WORKERS 32
  • --tag set the name of the experiment which wil be displayed in the outputfile.
  • You can overwrite the any parameters defined in voice2pose_default.py by simply adding it at the end of the command. The example above set SYS.NUM_WORKERS to 32 temporarily.

Resume Training from an Interrupted Experiment

python main.py --config_file configs/voice2pose_sdt_bp_speech2gesture.yaml \
    --resume_from <checkpoint-to-continue-from>
  • This command will load the state_dict from the checkpoint for both the model and the optimizer, and write results to the original directory that the checkpoint lies in.

Training from a pretrained model

python main.py --config_file configs/voice2pose_sdt_bp_speech2gesture.yaml \
    --pretrain_from <checkpoint-to-continue-from> \
    --tag DEV
  • This command will only load the state_dict for the model, and write results to a new base directory.

Test

To test the model, run this command:

python main.py --config_file configs/voice2pose_sdt_bp_speech2gesture.yaml \
    --tag DEV \
    --test-only \
    --checkpoint <path-to-checkpoint>

Demo

python main.py --config_file configs/voice2pose_sdt_bp_speech2gesture.yaml \
    --tag <DEV> \
    --demo_input <audio.wav> \
    --checkpoint <path-to-checkpoint> \
    DATASET.SPEAKER oliver \
    SYS.VIDEO_FORMAT "['mp4']"

Important Details

Dataset caching

We turn on dataset caching (DATASET.CACHING) by default to speed up training.

If you encounter errors in the dataloader like RuntimeError: received 0 items of ancdata, please increase ulimit by running the command ulimit -n 262144. (refer to this issue)

DataParallel and DistributedDataParallel

We use single GPU (warpped by DataParallel) by default since it is fast enough with dataset caching. For multi-GPU training, we recommand using DistributedDataParallel (DDP) because it provide SyncBN across GPU cards. To enable DDP, set SYS.DISTRIBUTED to True and set SYS.WORLD_SIZE according to the number of GPUs.

When using DDP, assure that the batch_size can be divided exactly by SYS.WORLD_SIZE.

Misc

  • To run any module other than the main files in the root directory, for example the core\datasets\speech2gesture.py file, you should run python -m core.datasets.speech2gesture rather than python core\datasets\speech2gesture.py. This is an interesting problem of Python's relative importing which deserves in-depth thinking.
  • We save a checkpoint and conduct validation after each epoch. You can change the interval in the config file.
  • We generate and save 2 videos in each epoch when training. During validation, we sample 8 videos for each epoch. These videos are saved in tensorborad (without sound) and mp4 (with sound). You can change the SYS.VIDEO_FORMAT parameter to select one or two of them.
  • We usually sett NUM_WORKERS to 32 for best performance. If you encounter any error about memory, try lower NUM_WORKERS.
@inproceedings{qian2021speech,
  title={Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates},
  author={Qian, Shenhan and Tu, Zhi and Zhi, YiHao and Liu, Wen and Gao, Shenghua},
  journal={International Conference on Computer Vision (ICCV)},
  year={2021}
}
Owner
Qian Shenhan
Qian Shenhan
A Joint Video and Image Encoder for End-to-End Retrieval

Frozen️ in Time ❄️ ️️️️ ⏳ A Joint Video and Image Encoder for End-to-End Retrieval (arXiv) Repository to contain the code, models, data for end-to-end

225 Dec 25, 2022
Using Opencv ,based on Augmental Reality(AR) and will show the feature matching of image and then by finding its matching

Using Opencv ,this project is based on Augmental Reality(AR) and will show the feature matching of image and then by finding its matching ,it will just mask that image . This project ,if used in cctv

1 Feb 13, 2022
Localization of thoracic abnormalities model based on VinBigData (top 1%)

Repository contains the code for 2nd place solution of VinBigData Chest X-ray Abnormalities Detection competition. The goal of competition was to auto

33 May 24, 2022
The code for CVPR2022 paper "Likert Scoring with Grade Decoupling for Long-term Action Assessment".

Likert Scoring with Grade Decoupling for Long-term Action Assessment This is the code for CVPR2022 paper "Likert Scoring with Grade Decoupling for Lon

10 Oct 21, 2022
【Auto】原神⭐钓鱼辅助工具 | 自动收竿、校准游标 | ✨您只需要抛出鱼竿,我们会帮你完成一切✨

原神钓鱼辅助工具 ✨ 作者正在努力重构代码中……会尽快带给大家一个更完美的脚本 ✨ 「您只需抛出鱼竿,然后我们会帮您搞定一切」 如果你觉得这个脚本好用,请点一个 Star ⭐ ,你的 Star 就是作者更新最大的动力 点击这里 查看演示视频 ✨ 欢迎大家在 Issues 中分享自己的配置文件 ✨ ✨

261 Jan 02, 2023
Automatically download multiple papers by keywords in CVPR

CVFPaperHelper Automatically download multiple papers by keywords in CVPR Install mkdir PapersToRead cd PaperToRead pip install requests tqdm git clon

46 Jun 08, 2022
Script para controlar o movimento do mouse usando Python e openCV com câmera em tempo real que detecta pontos de referência da mão, rastreia padrões de gestos em vez de um mouse físico.

mouserController Script para controlar o movimento do mouse usando Python e openCV com câmera em tempo real que detecta pontos de referência da mão, r

Vinícius Azevedo 6 Jun 28, 2022
Primary QPDF source code and documentation

QPDF QPDF is a command-line tool and C++ library that performs content-preserving transformations on PDF files. It supports linearization, encryption,

QPDF 2.2k Jan 04, 2023
PAGE XML format collection for document image page content and more

PAGE-XML PAGE XML format collection for document image page content and more For an introduction, please see the following publication: http://www.pri

PRImA Research Lab 46 Nov 14, 2022
Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. This Neural Network (NN) model recognizes the text contained in the images of segmented words.

Handwritten-Text-Recognition Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. T

27 Jan 08, 2023
Kornia is a open source differentiable computer vision library for PyTorch.

Open Source Differentiable Computer Vision Library

kornia 7.6k Jan 06, 2023
Volume Control using OpenCV

Gesture-Volume-Control Volume Control using OpenCV Here i made volume control using Python and OpenCV in which we can control the volume of our laptop

Mudit Sinha 3 Oct 10, 2021
Face Recognizer using Opencv Python

Face Recognizer using Opencv Python The first step create your own dataset with file open-cv-create_dataset second step You can put the photo accordin

Han Izza 2 Nov 16, 2021
Image processing in Python

scikit-image: Image processing in Python Website (including documentation): https://scikit-image.org/ Mailing list: https://mail.python.org/mailman3/l

Image Processing Toolbox for SciPy 5.2k Dec 30, 2022
The code for “Oriented RepPoints for Aerail Object Detection”

Oriented RepPoints for Aerial Object Detection The code for the implementation of “Oriented RepPoints”, Under review. (arXiv preprint) Introduction Or

WentongLi 207 Dec 24, 2022
A buffered and threaded wrapper for the OpenCV VideoCapture object. Can speed up video decoding significantly. Supports

A buffered and threaded wrapper for the OpenCV VideoCapture object. Can speed up video decoding significantly. Supports "with"-syntax.

Patrice Matz 0 Oct 30, 2021
📷 This repository is focused on having various feature implementation of OpenCV in Python.

📷 This repository is focused on having various feature implementation of OpenCV in Python. The aim is to have a minimal implementation of all OpenCV features together, under one roof.

Aditya Kumar Gupta 128 Dec 04, 2022
GDB python tool to pretty print and debug c++ xtensor containers

gdb_xt2np GDB python tool to pretty print, examine, and debug c++ Xtensor containers. Xtensor is a c++ library for scientific computing using multidim

Christopher Burke 4 Oct 29, 2021
[ICCV, 2021] Cloud Transformers: A Universal Approach To Point Cloud Processing Tasks

Cloud Transformers: A Universal Approach To Point Cloud Processing Tasks This is an official PyTorch code repository of the paper "Cloud Transformers:

Visual Understanding Lab @ Samsung AI Center Moscow 27 Dec 15, 2022
Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition:

Multi-Type-TD-TSR Check it out on Source Code of our Paper: Multi-Type-TD-TSR Extracting Tables from Document Images using a Multi-stage Pipeline for

Pascal Fischer 178 Dec 27, 2022