Code for Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation (CVPR 2021)

Overview

Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation (CVPR 2021)

Hang Zhou, Yasheng Sun, Wayne Wu, Chen Change Loy, Xiaogang Wang, and Ziwei Liu.

Project | Paper | Demo

We propose Pose-Controllable Audio-Visual System (PC-AVS), which achieves free pose control when driving arbitrary talking faces with audios. Instead of learning pose motions from audios, we leverage another pose source video to compensate only for head motions. The key is to devise an implicit low-dimension pose code that is free of mouth shape or identity information. In this way, audio-visual representations are modularized into spaces of three key factors: speech content, head pose, and identity information.

Requirements

  • Python 3.6 and Pytorch 1.3.0 are used. Basic requirements are listed in the 'requirements.txt'.
pip install -r requirements.txt

Quick Start: Generate Demo Results

  • Download the pre-trained checkpoints.

  • Create the default folder ./checkpoints and unzip the demo.zip at ./checkpoints/demo. There should be a 5 pths in it.

  • Unzip all *.zip files within the misc folder.

  • Run the demo scripts:

bash experiments/demo_vox.sh
  • The --gen_video argument is by default on, ffmpeg >= 4.2.0 is required to use this flag in linux systems. All frames along with an avconcat.mp4 video file will be saved in the ./id_517600055_pose_517600078_audio_681600002/results folder in the following form:

From left to right are the reference input, the generated results, the pose source video and the synced original video with the driving audio.

Prepare Testing Meta Data

  • Automatic VoxCeleb2 Data Formulation

The inference code experiments/demo.sh refers to ./misc/demo.csv for testing data paths. In linux systems, any applicable csv file can be created automatically by running:

python scripts/prepare_testing_files.py

Then modify the meta_path_vox in experiments/demo_vox.sh to './misc/demo2.csv' and run

bash experiments/demo_vox.sh

An additional result should be seen saved.

  • Metadata Details

Detailedly, in scripts/prepare_testing_files.py there are certain flags which enjoy great flexibility when formulating the metadata:

  1. --src_pose_path denotes the driving pose source path. It can be an mp4 file or a folder containing frames in the form of %06d.jpg starting from 0.

  2. --src_audio_path denotes the audio source's path. It can be an mp3 audio file or an mp4 video file. If a video is given, the frames will be automatically saved in ./misc/Mouth_Source/video_name, and disables the --src_mouth_frame_path flag.

  3. --src_mouth_frame_path. When --src_audio_path is not a video path, this flags could provide the folder containing the video frames synced with the source audio.

  4. --src_input_path is the path to the input reference image. When the path is a video file, we will convert it to frames.

  5. --csv_path the path to the to-be-saved metadata.

You can manually modify the metadata csv file or add lines to it according to the rules defined in the scripts/prepare_testing_files.py file or the dataloader data/voxtest_dataset.py.

We provide a number of demo choices in the misc folder, including several ones used in our video. Feel free to rearrange them even across folders. And you are welcome to record audio files by yourself.

  • Self-Prepared Data Processing

Our model handles only VoxCeleb2-like cropped data, thus pre-processing is needed for self-prepared data.

  • Coming soon

Train Your Own Model

  • Coming soon

License and Citation

The usage of this software is under CC-BY-4.0.

@InProceedings{zhou2021pose,
author = {Zhou, Hang and Sun, Yasheng and Wu, Wayne and Loy, Chen Change and Wang, Xiaogang and Liu, Ziwei},
title = {Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2021}
}

Acknowledgement

Owner
Hang_Zhou
Ph.D. Candidate @ MMLab-CUHK
Hang_Zhou
A lightweight library to compare different PyTorch implementations of the same network architecture.

TorchBug is a lightweight library designed to compare two PyTorch implementations of the same network architecture. It allows you to count, and compar

Arjun Krishnakumar 5 Jan 02, 2023
Neural Fixed-Point Acceleration for Convex Optimization

Licensing The majority of neural-scs is licensed under the CC BY-NC 4.0 License, however, portions of the project are available under separate license

Facebook Research 27 Oct 06, 2022
[RSS 2021] An End-to-End Differentiable Framework for Contact-Aware Robot Design

DiffHand This repository contains the implementation for the paper An End-to-End Differentiable Framework for Contact-Aware Robot Design (RSS 2021). I

Jie Xu 60 Jan 04, 2023
Tool cek opsi checkpoint facebook!

tool apa ini? cek_opsi_facebook adalah sebuah tool yang mengecek opsi checkpoint akun facebook yang terkena checkpoint! tujuan dibuatnya tool ini? too

Muhammad Latif Harkat 2 Jul 17, 2022
The official PyTorch implementation of the paper: *Xili Dai, Xiaojun Yuan, Haigang Gong, Yi Ma. "Fully Convolutional Line Parsing." *.

F-Clip — Fully Convolutional Line Parsing This repository contains the official PyTorch implementation of the paper: *Xili Dai, Xiaojun Yuan, Haigang

Xili Dai 115 Dec 28, 2022
Towards Flexible Blind JPEG Artifacts Removal (FBCNN, ICCV 2021)

Towards Flexible Blind JPEG Artifacts Removal (FBCNN, ICCV 2021) Jiaxi Jiang, Kai Zhang, Radu Timofte Computer Vision Lab, ETH Zurich, Switzerland 🔥

Jiaxi Jiang 282 Jan 02, 2023
Implementation of TransGanFormer, an all-attention GAN that combines the finding from the recent GanFormer and TransGan paper

TransGanFormer (wip) Implementation of TransGanFormer, an all-attention GAN that combines the finding from the recent GansFormer and TransGan paper. I

Phil Wang 146 Dec 06, 2022
git git《Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking》(CVPR 2021) GitHub:git2] 《Masksembles for Uncertainty Estimation》(CVPR 2021) GitHub:git3]

Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking Ning Wang, Wengang Zhou, Jie Wang, and Houqiang Li Accepted by CVPR

NingWang 236 Dec 22, 2022
Converts geometry node attributes to built-in attributes

Attribute Converter Simplifies converting attributes created by geometry nodes to built-in attributes like UVs or vertex colors, as a single click ope

Ivan Notaros 12 Dec 22, 2022
KoRean based ELECTRA pre-trained models (KR-ELECTRA) for Tensorflow and PyTorch

KoRean based ELECTRA (KR-ELECTRA) This is a release of a Korean-specific ELECTRA model with comparable or better performances developed by the Computa

12 Jun 03, 2022
Video Instance Segmentation with a Propose-Reduce Paradigm (ICCV 2021)

Propose-Reduce VIS This repo contains the official implementation for the paper: Video Instance Segmentation with a Propose-Reduce Paradigm Huaijia Li

DV Lab 39 Nov 23, 2022
Blender Add-on that sets a Material's Base Color to one of Pantone's Colors of the Year

Blender PCOY (Pantone Color of the Year) MCMC (Mid-Century Modern Colors) HG71 (House & Garden Colors 1971) Blender Add-ons That Assign a Custom Color

Don Schnitzius 15 Nov 20, 2022
Pseudo-Visual Speech Denoising

Pseudo-Visual Speech Denoising This code is for our paper titled: Visual Speech Enhancement Without A Real Visual Stream published at WACV 2021. Autho

Sindhu 94 Oct 22, 2022
Pytorch codes for "Self-supervised Multi-view Stereo via Effective Co-Segmentation and Data-Augmentation"

Self-Supervised-MVS This repository is the official PyTorch implementation of our AAAI 2021 paper: "Self-supervised Multi-view Stereo via Effective Co

hongbin_xu 127 Jan 04, 2023
Voice Conversion by CycleGAN (语音克隆/语音转换):CycleGAN-VC3

CycleGAN-VC3-PyTorch 中文说明 | English This code is a PyTorch implementation for paper: CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectr

Kun Ma 110 Dec 24, 2022
A PyTorch Toolbox for Face Recognition

FaceX-Zoo FaceX-Zoo is a PyTorch toolbox for face recognition. It provides a training module with various supervisory heads and backbones towards stat

JDAI-CV 1.6k Jan 06, 2023
CellRank's reproducibility repository.

CellRank's reproducibility repository We believe that reproducibility is key and have made it as simple as possible to reproduce our results. Please e

Theis Lab 8 Oct 08, 2022
Leveraging OpenAI's Codex to solve cornerstone problems in Music

Music-Codex Leveraging OpenAI's Codex to solve cornerstone problems in Music Please NOTE: Presented generated samples were created by OpenAI's Codex P

Alex 2 Mar 11, 2022
Weakly Supervised Segmentation by Tensorflow.

Weakly Supervised Segmentation by Tensorflow. Implements semantic segmentation in Simple Does It: Weakly Supervised Instance and Semantic Segmentation, by Khoreva et al. (CVPR 2017).

CHENG-YOU LU 52 Dec 27, 2022
DECAF: Deep Extreme Classification with Label Features

DECAF DECAF: Deep Extreme Classification with Label Features @InProceedings{Mittal21, author = "Mittal, A. and Dahiya, K. and Agrawal, S. and Sain

46 Nov 06, 2022