Code for One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning (AAAI 2022)

Last update: Jan 03, 2023

Overview

One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning (AAAI 2022)

Paper | Demo

Requirements

Python >= 3.6 , Pytorch >= 1.8 and ffmpeg
Set up OpenFace
- We use the OpenFace tools to extract the initial pose of the reference image
- Make sure you have installed this tool, and set the OPENFACE_POSE_EXTRACTOR_PATH in config.py. For example, it should be the absolute path of the "FeatureExtraction.exe" for Windows.
Other requirements are listed in the 'requirements.txt'

Pretrained Checkpoint

Please download the pretrained checkpoint from google-drive and unzip it to the directory (/checkpoints). Or manually modify the settings of GENERATOR_CKPT and AUDIO2POSE_CKPT in the config.py.

Extract phoneme

We employ the CMU phoneset to represent phonemes, the extra 'SIL' means silence. All the phonesets can be seen in 'phindex.json'.

We have extracted the phonemes for the audios in the 'sample/audio' directory. For other audios, you can extract the phonemes by other ASR tools and then map them to the CMU phoneset. Or email to [email protected] for help.

Generate Demo Results

python test_script.py --img_path xxx.jpg --audio_path xxx.wav --phoneme_path xxx.json --save_dir "YOUR_DIR"

Note that the input images must keep the same height and width and the face should be appropriately cropped as in samples/imgs. You can also preprocess your images with image_preprocess.py.

License and Citation

@InProceedings{wang2021one,
author = Suzhen Wang, Lincheng Li, Yu Ding, Xin Yu
title = {One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning},
booktitle = {AAAI 2022},
year = {2022},
}

Acknowledgement

This codebase is based on First Order Motion Model and imaginaire, thanks for their contributions.

Code for One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning (AAAI 2022)

Related tags

Overview

One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning (AAAI 2022)

Paper | Demo

Requirements

Pretrained Checkpoint

Extract phoneme

Generate Demo Results

License and Citation

Acknowledgement

Owner

FuxiVirtualHuman

Distributing reference energies for SMIRNOFF implementations

Analyses of the individual electric field magnitudes with Roast.

A tutorial on training a DarkNet YOLOv4 model for the CrowdHuman dataset

Organseg dags - The repository contains the codebase for multi-organ segmentation with directed acyclic graphs (DAGs) in CT.

Official codebase for Pretrained Transformers as Universal Computation Engines.

[SDM 2022] Towards Similarity-Aware Time-Series Classification

Dist2Dec: A Simplicial Neural Network for Homology Localization

Implementation of various Vision Transformers I found interesting

A curated (most recent) list of resources for Learning with Noisy Labels

ByteTrack(Multi-Object Tracking by Associating Every Detection Box)のPythonでのONNX推論サンプル

Pose Detection and Machine Learning for real-time body posture analysis during exercise to provide audiovisual feedback on improvement of form.

Awesome Artificial Intelligence, Machine Learning and Deep Learning as we learn it

Real-ESRGAN aims at developing Practical Algorithms for General Image Restoration.

Python Jupyter kernel using Poetry for reproducible notebooks

Building blocks for uncertainty-aware cycle consistency presented at NeurIPS'21.

Implementation of ICLR 2020 paper "Revisiting Self-Training for Neural Sequence Generation"

Code to accompany our paper "Continual Learning Through Synaptic Intelligence" ICML 2017

Pairwise Learning for Neural Link Prediction for OGB (PLNLP-OGB)

領域を指定し、キーを入力することで画像を保存するツールです。クラス分類用のデータセット作成を想定しています。

CVPR 2022 "Online Convolutional Re-parameterization"