Official code for CVPR2022 paper: Depth-Aware Generative Adversarial Network for Talking Head Video Generation

Last update: Jan 04, 2023

Overview

📖 Depth-Aware Generative Adversarial Network for Talking Head Video Generation (CVPR 2022)

🔥 If DaGAN is helpful in your photos/projects, please help to ⭐ it or recommend it to your friends. Thanks 🔥

[Paper] [Project Page] [Demo] [Poster Video]

Fa-Ting Hong, Longhao Zhang, Li Shen, Dan Xu
The Hong Kong University of Science and Technology

Cartoon Sample

cartoon.mp4

Human Sample

celeb.mp4

Voxceleb1 Dataset

🚩 Updates

🔥 🔥 ✅ May 19, 2022: The depth face model trained on Voxceleb2 is released! (The corresponding DaGAN checkpoint will be released soon.) Click the LINK
🔥 🔥 ✅ April 25, 2022: Integrated into Huggingface Spaces 🤗 using Gradio. Try out the web demo: (GPU version will come soon!)
🔥 🔥 ✅ Add SPADE model, which produces more natural results.

🔧 Dependencies and Installation

Python >= 3.7 (Anaconda or Miniconda recommended)
PyTorch >= 1.7
Optional: NVIDIA GPU + CUDA
Optional: Linux
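
The requirements above can be checked from Python before installing anything else. This is a minimal sketch, not part of the repository: `parse_version`, `meets_minimum` and `check_environment` are hypothetical helper names, and the only facts taken from this README are the minimum versions (Python 3.7, PyTorch 1.7) and that CUDA is optional.

```python
import sys


def parse_version(text):
    """Turn a version string such as '1.7.1+cu110' into a tuple of ints."""
    numbers = []
    # Drop any local build tag after '+', then read the leading digits of each part.
    for part in text.split("+")[0].split("."):
        digits = ""
        for ch in part:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        numbers.append(int(digits))
    return tuple(numbers)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment():
    """Report whether Python and PyTorch satisfy the README requirements."""
    report = {}
    python_version = ".".join(str(n) for n in sys.version_info[:3])
    report["python>=3.7"] = meets_minimum(python_version, "3.7")
    try:
        import torch
    except ImportError:
        report["torch>=1.7"] = False
        report["cuda available (optional)"] = False
    else:
        report["torch>=1.7"] = meets_minimum(torch.__version__, "1.7")
        report["cuda available (optional)"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {ok}")
```

A `False` for the CUDA entry is not an error, since a GPU is listed as optional.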

Installation

We now provide a clean version of DaGAN, which does not require customized CUDA extensions.

Clone repo

git clone https://github.com/harlanhong/CVPR2022-DaGAN.git
cd CVPR2022-DaGAN

Install dependent packages

pip install -r requirements.txt

## Install the Face Alignment lib
cd face-alignment
pip install -r requirements.txt
python setup.py install

⚡ Quick Inference

We use the paper version as an example. More models can be found here.

YAML configs

See config/vox-adv-256.yaml for a description of each parameter.

Pre-trained checkpoint

The pre-trained checkpoint of the face depth network and our DaGAN checkpoints can be found at the following link: OneDrive.

Inference! To run a demo, download a checkpoint and run the following command:

CUDA_VISIBLE_DEVICES=0 python demo.py  --config config/vox-adv-256.yaml --driving_video path/to/driving --source_image path/to/source --checkpoint path/to/checkpoint --relative --adapt_scale --kp_num 15 --generator DepthAwareGenerator

The result will be stored in result.mp4. The driving videos and source images should be cropped before they can be used in our method. To obtain some semi-automatic crop suggestions, you can use python crop-video.py --inp some_youtube_video.mp4. It will generate crop commands that use ffmpeg.
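
If you want to drive the demo from a script, the command above can be assembled as an argument list. The sketch below only reuses the flags shown in this README; `build_demo_command` and `demo_environment` are hypothetical helper names, not functions provided by the repository, and the paths are the same placeholders as in the command.

```python
import os
import shlex


def build_demo_command(source_image, driving_video, checkpoint,
                       config="config/vox-adv-256.yaml",
                       kp_num=15, generator="DepthAwareGenerator"):
    """Return the demo.py invocation from this README as an argument list."""
    return [
        "python", "demo.py",
        "--config", config,
        "--driving_video", driving_video,
        "--source_image", source_image,
        "--checkpoint", checkpoint,
        "--relative", "--adapt_scale",
        "--kp_num", str(kp_num),
        "--generator", generator,
    ]


def demo_environment(gpu_ids=(0,)):
    """Copy the current environment and restrict the visible GPUs."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


if __name__ == "__main__":
    command = build_demo_command(
        "path/to/source", "path/to/driving", "path/to/checkpoint"
    )
    # Only print the command here. To execute it, pass `command` and
    # `demo_environment()` to subprocess.run(command, env=..., check=True).
    print(" ".join(shlex.quote(arg) for arg in command))
```

Keeping the arguments in a list avoids shell quoting problems when paths contain spaces.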

💻 Training

Datasets

VoxCeleb. Please follow the instructions at https://github.com/AliaksandrSiarohin/video-preprocessing.

Train on VoxCeleb

To train a model on a specific dataset, run:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --master_addr="0.0.0.0" --master_port=12348 run.py --config config/vox-adv-256.yaml --name DaGAN --rgbd --batchsize 12 --kp_num 15 --generator DepthAwareGenerator

The code will create a folder in the log directory (each run will create a new name-specific directory). Checkpoints will be saved to this folder. To check the loss values during training, see log.txt. By default, the batch size is tuned to run on 8 GeForce RTX 3090 GPUs (you can obtain the best performance after about 150 epochs). You can change the batch size in train_params in the .yaml file.

🚩 Please use multiple GPUs to train your own model. If you use only one GPU, you will run into the in-place operation problem.

Also, you can watch the training loss by running the following command:

tensorboard --logdir log/DaGAN/log

If you kill the process in the middle of training for any reason, a zombie process may be left behind. You can kill it using the tool we provide:

python kill_port.py PORT
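
Before restarting training, it can help to confirm whether the master port is still held by a leftover process. The following is a minimal sketch using only the Python standard library; `port_in_use` is a hypothetical helper, not part of the repository, and it does not reproduce what kill_port.py does internally. The port number 12348 is the `--master_port` value from the training command above.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return probe.connect_ex((host, port)) == 0


if __name__ == "__main__":
    master_port = 12348  # value of --master_port in the training command
    if port_in_use(master_port):
        print(f"Port {master_port} is still in use; a previous run may be alive.")
    else:
        print(f"Port {master_port} looks free.")
```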

Training on your own dataset

Resize all the videos to the same size, e.g. 256x256. The videos can be in '.gif' or '.mp4' format, or a folder with images. We recommend the latter: for each video, make a separate folder with all the frames in '.png' format. This format is lossless, and it has better I/O performance.
Create a folder data/dataset_name with 2 subfolders, train and test. Put the training videos in train and the testing videos in test.
Create a config file config/dataset_name.yaml. In dataset_params, specify the root directory as root_dir: data/dataset_name. Also adjust the number of epochs in train_params.
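
The folder layout described in the steps above can be created and sanity-checked with a short script. This is a sketch under stated assumptions: the helper names (`create_dataset_layout`, `describe_video_entry`, `summarize_dataset`) are hypothetical and not part of the repository, and the only rules it encodes are the ones listed above (train and test subfolders, videos as '.gif', '.mp4', or a folder of '.png' frames). It does not resize or decode any video.

```python
import tempfile
from pathlib import Path

VIDEO_SUFFIXES = {".gif", ".mp4"}


def create_dataset_layout(root):
    """Create root/train and root/test if they do not exist yet."""
    root = Path(root)
    for split in ("train", "test"):
        (root / split).mkdir(parents=True, exist_ok=True)
    return root


def describe_video_entry(entry):
    """Classify one entry of train/ or test/."""
    if entry.is_dir():
        frames = sorted(entry.glob("*.png"))
        return "frame folder" if frames else "empty folder"
    if entry.suffix.lower() in VIDEO_SUFFIXES:
        return "video file"
    return "unsupported"


def summarize_dataset(root):
    """Count the kinds of entries found in each split."""
    root = Path(root)
    summary = {}
    for split in ("train", "test"):
        counts = {}
        split_dir = root / split
        entries = sorted(split_dir.iterdir()) if split_dir.is_dir() else []
        for entry in entries:
            kind = describe_video_entry(entry)
            counts[kind] = counts.get(kind, 0) + 1
        summary[split] = counts
    return summary


if __name__ == "__main__":
    # Demonstrate on a throwaway directory instead of touching real data.
    with tempfile.TemporaryDirectory() as tmp:
        root = create_dataset_layout(Path(tmp) / "data" / "dataset_name")
        print(summarize_dataset(root))
```

Any entry reported as "unsupported" or "empty folder" is worth fixing before training starts.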

📜 Acknowledgement

Our DaGAN implementation is inspired by FOMM. We thank the authors of FOMM for making their code publicly available.

📜 BibTeX

@inproceedings{hong2022depth,
  title={Depth-Aware Generative Adversarial Network for Talking Head Video Generation},
  author={Hong, Fa-Ting and Zhang, Longhao and Shen, Li and Xu, Dan},
  booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022}
}

📧 Contact

If you have any questions, please email [email protected].
