This repository contains a PyTorch implementation of "AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis".

Last update: Dec 29, 2022

Overview

AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis

| Project Page | Paper |

Prerequisites

You can create an anaconda environment called adnerf with:

conda env create -f environment.yml
conda activate adnerf

PyTorch3D

We recommend installing it from a local clone:

git clone https://github.com/facebookresearch/pytorch3d.git
cd pytorch3d && pip install -e .

Basel Face Model 2009

Put "01_MorphableModel.mat" in data_util/face_tracking/3DMM/, then change into data_util/face_tracking and run:
```
python convert_BFM.py
```

Train AD-NeRF

Data preprocessing (using $id = Obama as an example)
```
bash process_data.sh Obama
```
- Input: A 25 fps portrait video that contains voice audio (dataset/vids/$id.mp4)
- Output: folder dataset/$id that contains all files for training
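
The input and output locations listed above can be written down as a small path helper. This is a minimal sketch, not part of the repository: the function names `preprocess_paths` and `check_input` are hypothetical, and only the two paths themselves come from this README.

```python
from pathlib import Path


def preprocess_paths(person_id, root=Path("dataset")):
    """Return the input video and output folder this README names for one person.

    Hypothetical helper: only the path layout is taken from the README.
    """
    return {
        "video": root / "vids" / f"{person_id}.mp4",  # input read by process_data.sh
        "output_dir": root / person_id,               # folder holding the training files
    }


def check_input(person_id, root=Path("dataset")):
    """Raise early if the input video is missing, before starting preprocessing."""
    paths = preprocess_paths(person_id, root)
    if not paths["video"].is_file():
        raise FileNotFoundError(f"expected input video at {paths['video']}")
    return paths
```

Calling `check_input("Obama")` from the repository root before `bash process_data.sh Obama` would catch a misplaced video. It does not verify the 25 fps requirement.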

Train Two NeRFs (Head-NeRF and Torso-NeRF)
- Train Head-NeRF with the command:
```
python NeRFs/HeadNeRF/run_nerf.py --config dataset/$id/HeadNeRF_config.txt
```
- Copy the latest trained model from dataset/$id/logs/$id_head to dataset/$id/logs/$id_com
- Train Torso-NeRF with the command:
```
python NeRFs/TorsoNeRF/run_nerf.py --config dataset/$id/TorsoNeRF_config.txt
```
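
The copy step between the two training runs can be sketched in Python. This is a sketch under stated assumptions: the helper name `copy_latest_head_checkpoint` is hypothetical, "latest" is taken to mean the most recently modified file, and the `*.tar` pattern is a guess at the checkpoint extension that may need adjusting for your logs.

```python
import shutil
from pathlib import Path


def copy_latest_head_checkpoint(person_id, root=Path("dataset"), pattern="*.tar"):
    """Copy the newest Head-NeRF checkpoint into the <id>_com log folder.

    Assumptions (not confirmed by the README): checkpoints match `pattern`,
    and the newest file by modification time is the latest trained model.
    """
    logs = root / person_id / "logs"
    src_dir = logs / f"{person_id}_head"
    dst_dir = logs / f"{person_id}_com"

    # Sort candidates by modification time so the last entry is the newest.
    candidates = sorted(src_dir.glob(pattern), key=lambda p: p.stat().st_mtime)
    if not candidates:
        raise FileNotFoundError(f"no files matching {pattern} in {src_dir}")

    dst_dir.mkdir(parents=True, exist_ok=True)
    # copy2 preserves timestamps and returns the path of the copied file.
    return Path(shutil.copy2(candidates[-1], dst_dir))
```

If you do the copy in a shell instead, note that bash reads `$id_head` as a variable named `id_head`, so the folder names need to be written as `${id}_head` and `${id}_com`.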

Run AD-NeRF for rendering

Reconstruct original video with audio input

python NeRFs/TorsoNeRF/run_nerf.py --config dataset/$id/TorsoNeRFTest_config.txt --aud_file=dataset/$id/aud.npy --test_size=300

Drive the target person with another audio input (replace ${deepspeechfile.npy} with the path to that audio's .npy feature file)

python NeRFs/TorsoNeRF/run_nerf.py --config dataset/$id/TorsoNeRFTest_config.txt --aud_file=${deepspeechfile.npy} --test_size=-1
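
Both rendering commands differ only in the audio feature file and the `--test_size` value, so they can be assembled by one small helper. This is a minimal sketch: the function name `render_command` is hypothetical, while the script path, config file name, and the three flags are copied from the commands above.

```python
import sys
from pathlib import Path


def render_command(person_id, aud_file=None, test_size=300, root=Path("dataset")):
    """Build the argument list for a Torso-NeRF rendering run.

    Hypothetical helper. With aud_file=None it falls back to the person's
    own dataset/<id>/aud.npy, which matches the reconstruction command.
    """
    person_dir = root / person_id
    if aud_file is None:
        aud_file = person_dir / "aud.npy"
    return [
        sys.executable,
        "NeRFs/TorsoNeRF/run_nerf.py",
        "--config", str(person_dir / "TorsoNeRFTest_config.txt"),
        f"--aud_file={aud_file}",
        f"--test_size={test_size}",
    ]
```

From the repository root, `subprocess.run(render_command("Obama"), check=True)` would reproduce the reconstruction command. Passing another feature file together with `test_size=-1` mirrors the second command.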

Acknowledgments

We use face-parsing.PyTorch for parsing head and torso maps, and DeepSpeech for audio feature extraction. The NeRF model is implemented based on NeRF-pytorch.
