Code for Talking Face Generation by Adversarially Disentangled Audio-Visual Representation (AAAI 2019)

Overview

Talking Face Generation by Adversarially Disentangled Audio-Visual Representation (AAAI 2019)

We propose Disentangled Audio-Visual System (DAVS) to address arbitrary-subject talking face generation in this work, which aims to synthesize a sequence of face images that correspond to given speech semantics, conditioning on either an unconstrained speech audio or video.

[Project] [Paper] [Demo]

Recommondation of our CVPR21 repo

This repo is barely maintaining since the version of this code is out of date. If you are interested in the topic of Talking Face Generation, feel free to try the CODE of our CVPR2021 PAPER!

Requirements

Generating test results

Create the default folder "checkpoints" and put the checkpoint in it or get the CHECKPOINT_PATH
  • Samples for testing can be found in this folder named 0572_0019_0003. This is a pre-processed sample from the Voxceleb Dataset.

  • Run the testing script to generate videos from video:

python test_all.py  --test_root ./0572_0019_0003/video --test_type video --test_audio_video_length 99 --test_resume_path CHECKPOINT_PATH
  • Run the testing script to generate videos from audio:
python test_all.py  --test_root ./0572_0019_0003/audio --test_type audio --test_audio_video_length 99 --test_resume_path CHECKPOINT_PATH

Sample Results

  • Talking Effect on Human Characters

  • Talking Effect on Non-human Characters (Trained on Human Faces Only)

Create more samples

  • The face detection tool used in the demo videos can be found at RSA. It will return a Matfile with 5 key point locations in a row for each image. Other face alignment methods are also appliable such as dlib. The key points for face alignement we used are the two for the center of the eyes and the average point of the corners of the mouth. With each image's PATH and the face POINTS, you can find our way of face alignment at preprocess/face_align.py.

  • Our preprocessing of the audio files is the same and borrowed from the matlab code of SyncNet. Then we save the mfcc features into bin files.

Preparing Training Data

  • We used the LRW dataset for training.
  • The directories are arranged like this:
data
├── train, val, test
|	├── 0, 1, 2 ... 499 (one folder for each class)
|	│   ├── 0, 1, 2 ... #videos per class
|	│   │   ├── align_face256
|	│   │   |   ├── 0, 1, ... 28.jpg
|	│   |   ├── mfcc20
|	│   │   |   ├── 2, 3 ... 26.bin

where each video is extracted to frames and aligned using our protocol, and each audio is processed and saved using Matlab.

Training

python train.py
  • This is still a beta version of the training code which only disentangles wid information from pid space. Running the train.py only might not be able to fully reproduce the paper. However, it can be served as a reference for how we implement the whole training process.
  • During our own implementation, the classification part (without generation and disentanglement) is pretrained first. The pretraining training code is temporarily not provided.

Postprocessing Details (Optional)

  • The directly generated results may suffer from a "zoom-in-and-out" condition which we assume is caused by our alignment of the training set. We solve the unstable problem using Subspace Video Stabilization in the demos.

License and Citation

The use of this software is RESTRICTED to non-commercial research and educational purposes.

@inproceedings{zhou2019talking,
  title     = {Talking Face Generation by Adversarially Disentangled Audio-Visual Representation},
  author    = {Zhou, Hang and Liu, Yu and Liu, Ziwei and Luo, Ping and Wang, Xiaogang},
  booktitle = {AAAI Conference on Artificial Intelligence (AAAI)},
  year      = {2019},
}

Acknowledgement

The structure of this codebase is borrowed from pix2pix.

Owner
Hang_Zhou
Ph.D. @ MMLab-CUHK
Hang_Zhou
GAN-based Matrix Factorization for Recommender Systems

GAN-based Matrix Factorization for Recommender Systems This repository contains the datasets' splits, the source code of the experiments and their res

Ervin Dervishaj 9 Nov 06, 2022
Official repo for AutoInt: Automatic Integration for Fast Neural Volume Rendering in CVPR 2021

AutoInt: Automatic Integration for Fast Neural Volume Rendering CVPR 2021 Project Page | Video | Paper PyTorch implementation of automatic integration

Stanford Computational Imaging Lab 149 Dec 22, 2022
Official PyTorch code for Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling (HCFlow, ICCV2021)

Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling (HCFlow, ICCV2021) This repository is the official P

Jingyun Liang 159 Dec 30, 2022
SPTAG: A library for fast approximate nearest neighbor search

SPTAG: A library for fast approximate nearest neighbor search SPTAG SPTAG (Space Partition Tree And Graph) is a library for large scale vector approxi

Microsoft 4.3k Jan 01, 2023
PyTorch implementation of NeurIPS 2021 paper: "CoFiNet: Reliable Coarse-to-fine Correspondences for Robust Point Cloud Registration"

CoFiNet: Reliable Coarse-to-fine Correspondences for Robust Point Cloud Registration (NeurIPS 2021) PyTorch implementation of the paper: CoFiNet: Reli

76 Jan 03, 2023
Official PyTorch implementation of "Improving Face Recognition with Large AgeGaps by Learning to Distinguish Children" (BMVC 2021)

Inter-Prototype (BMVC 2021): Official Project Webpage This repository provides the official PyTorch implementation of the following paper: Improving F

Jungsoo Lee 16 Jun 30, 2022
Joint learning of images and text via maximization of mutual information

mutual_info_img_txt Joint learning of images and text via maximization of mutual information. This repository incorporates the algorithms presented in

Ruizhi Liao 10 Dec 22, 2022
Learning Off-Policy with Online Planning, CoRL 2021

LOOP: Learning Off-Policy with Online Planning Accepted in Conference of Robot Learning (CoRL) 2021. Harshit Sikchi, Wenxuan Zhou, David Held Paper In

Harshit Sikchi 24 Nov 22, 2022
The code for Expectation-Maximization Attention Networks for Semantic Segmentation (ICCV'2019 Oral)

EMANet News The bug in loading the pretrained model is now fixed. I have updated the .pth. To use it, download it again. EMANet-101 gets 80.99 on the

Xia Li 李夏 663 Nov 30, 2022
implement of SwiftNet:Real-time Video Object Segmentation

SwiftNet The official PyTorch implementation of SwiftNet:Real-time Video Object Segmentation, which has been accepted by CVPR2021. Requirements Python

haochen wang 64 Dec 14, 2022
Code, Data and Demo for Paper: Controllable Generation from Pre-trained Language Models via Inverse Prompting

InversePrompting Paper: Controllable Generation from Pre-trained Language Models via Inverse Prompting Code: The code is provided in the "chinese_ip"

THUDM 101 Dec 16, 2022
Finetune alexnet with tensorflow - Code for finetuning AlexNet in TensorFlow >= 1.2rc0

Finetune AlexNet with Tensorflow Update 15.06.2016 I revised the entire code base to work with the new input pipeline coming with TensorFlow = versio

Frederik Kratzert 766 Jan 04, 2023
PointCNN: Convolution On X-Transformed Points (NeurIPS 2018)

PointCNN: Convolution On X-Transformed Points Created by Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Xinhan Di, and Baoquan Chen. Introduction PointCNN

Yangyan Li 1.3k Dec 21, 2022
Codes for our paper The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders published to EMNLP 2021.

The Stem Cell Hypothesis Codes for our paper The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders published to EMNLP

Emory NLP 5 Jul 08, 2022
A python library for highly configurable transformers - easing model architecture search and experimentation.

A python library for highly configurable transformers - easing model architecture search and experimentation.

Anthony Fuller 51 Nov 20, 2022
PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place Recognition, CVPR 2018

PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place Recognition PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place

Mikaela Uy 294 Dec 12, 2022
This is the PyTorch implementation of GANs N’ Roses: Stable, Controllable, Diverse Image to Image Translation

Official PyTorch repo for GAN's N' Roses. Diverse im2im and vid2vid selfie to anime translation.

1.1k Jan 01, 2023
This is the official Pytorch implementation of the paper "Diverse Motion Stylization for Multiple Style Domains via Spatial-Temporal Graph-Based Generative Model"

Diverse Motion Stylization (Official) This is the official Pytorch implementation of this paper. Diverse Motion Stylization for Multiple Style Domains

Soomin Park 28 Dec 16, 2022
Pseudo-rng-app - whos needs science to make a random number when you have pseudoscience?

Pseudo-random numbers with pseudoscience rng is so complicated! Why cant we have a horoscopic, vibe-y way of calculating a random number? Why cant rng

Andrew Blance 1 Dec 27, 2021
[arXiv'22] Panoptic NeRF: 3D-to-2D Label Transfer for Panoptic Urban Scene Segmentation

Panoptic NeRF Project Page | Paper | Dataset Panoptic NeRF: 3D-to-2D Label Transfer for Panoptic Urban Scene Segmentation Xiao Fu*, Shangzhan zhang*,

Xiao Fu 111 Dec 16, 2022