3D Avatar Lip Synchronization from Speech (JALI-based face rigging)

Last update: Dec 20, 2022

Overview

visemenet-inference

Inference demo of "VisemeNet-tensorflow"
- VisemeNet is an audio-driven, animator-centric speech animation model: given input audio, it drives a JALI or standard FACS-based face rig.
- The original repo is outdated, and setting up an environment to test the pretrained model is difficult. This code provides a clean inference module based on the original author's repo.

How to freeze graph

This repo does not need a Bazel build for the "freeze_graph" step.
Thanks to https://github.com/lighttransport/VisemeNet-infer for providing some examples.

Requirements

Python 3.6.x (installed with pyenv)
TensorFlow 1.1.0

Set up the environment and packages

# Create and activate a virtualenv with pyenv (needs the pyenv-virtualenv plugin)
pyenv install 3.6.5
pyenv virtualenv 3.6.5 visemenet-freeze
pyenv activate visemenet-freeze

# Install packages
pip install tensorflow==1.1.0

Clone the repo

# Clone the VisemeNet repo and download the pretrained model
git clone https://github.com/yzhou359/VisemeNet_tensorflow.git
curl -L "https://www.dropbox.com/sh/7nbqgwv0zz8pbk9/AAAghy76GVYDLqPKdANcyDuba?dl=0" > pretrained_model.zip
unzip pretrained_model.zip -d VisemeNet_tensorflow/data/ckpt/pretrain_biwi/
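
A share link can sometimes return an HTML page instead of the archive, in which case unzip fails with a confusing error. The sketch below is one optional way to check the download before extracting it. The function name extract_checkpoint is hypothetical and is not part of this repo; only the Python standard library is used.

```python
import zipfile
from pathlib import Path


def extract_checkpoint(zip_path, dest_dir):
    """Extract zip_path into dest_dir, refusing files that are not zip archives."""
    zip_path = Path(zip_path)
    if not zipfile.is_zipfile(zip_path):
        # Covers a missing file as well as an HTML page saved under a .zip name.
        raise ValueError(f"{zip_path} is not a valid zip archive; re-check the download")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        # Return the extracted member names so the caller can inspect them.
        return archive.namelist()
```

For example, extract_checkpoint("pretrained_model.zip", "VisemeNet_tensorflow/data/ckpt/pretrain_biwi/") would mirror the unzip command above.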

Freeze Graph and Save as pb

# Freeze Graph
python freeze_graph.py
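
For readers unfamiliar with graph freezing, here is a minimal sketch of how a TF 1.x checkpoint can be frozen in pure Python, with no Bazel build. It is not the contents of freeze_graph.py, which may differ. The function name freeze_checkpoint and all three arguments are hypothetical, and the output node names must be looked up in the actual graph.

```python
def freeze_checkpoint(ckpt_prefix, output_node_names, out_pb_path):
    """Freeze a TF 1.x checkpoint into one .pb file and return its node count.

    ckpt_prefix       -- checkpoint path without extension (hypothetical value)
    output_node_names -- list of node names to keep (hypothetical values)
    out_pb_path       -- where to write the frozen graph
    """
    # TF 1.x API; imported here so this module can be loaded without TensorFlow.
    import tensorflow as tf

    graph = tf.Graph()
    with graph.as_default():
        with tf.Session(graph=graph) as sess:
            # Rebuild the graph from the .meta file and restore the trained weights.
            saver = tf.train.import_meta_graph(ckpt_prefix + ".meta", clear_devices=True)
            saver.restore(sess, ckpt_prefix)
            # Replace every variable with a constant holding its current value.
            frozen_graph_def = tf.graph_util.convert_variables_to_constants(
                sess, graph.as_graph_def(), output_node_names
            )

    # Serialize the frozen GraphDef to disk.
    with tf.gfile.GFile(out_pb_path, "wb") as f:
        f.write(frozen_graph_def.SerializeToString())
    return len(frozen_graph_def.node)
```

Freezing folds the weights into the graph definition, so inference needs only the single .pb file rather than the checkpoint files.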

Model Inference

Colab Demo

This repo provides simple, clean inference code with the unneeded parts of the original removed.
It is compatible with TensorFlow 2.x.
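
As background on the TF 2.x compatibility, a graph frozen with TF 1.x can generally be loaded under TF 2.x through the tf.compat.v1 API. The sketch below shows that general pattern. It is not necessarily how VisemeRegressor is implemented, and the function name load_frozen_graph is hypothetical.

```python
def load_frozen_graph(pb_filepath):
    """Load a frozen TF 1.x GraphDef (.pb) into a new tf.Graph under TF 2.x."""
    # Imported here so this module can be loaded without TensorFlow installed.
    import tensorflow as tf

    # Parse the serialized GraphDef from disk.
    graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile(pb_filepath, "rb") as f:
        graph_def.ParseFromString(f.read())

    # Import it into a fresh graph; name="" keeps the original node names.
    graph = tf.Graph()
    with graph.as_default():
        tf.compat.v1.import_graph_def(graph_def, name="")
    return graph
```

Once loaded, [op.name for op in graph.get_operations()] lists the node names, which is a practical way to find the input and output tensors of a frozen model.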

Requirements

TensorFlow 2.x
numpy
scipy
python_speech_features
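
Before running inference, it can help to confirm that the packages listed above are importable. The snippet below is a small optional check using only the standard library. The helper name missing_modules is hypothetical, and it assumes each package's import name matches the name listed above.

```python
import importlib.util

# Import names of the packages listed in the requirements above.
REQUIRED_MODULES = ("tensorflow", "numpy", "scipy", "python_speech_features")


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are available.")
```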

How to run inference

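import numpy as np
from inference import VisemeRegressor

# Input and output paths (adjust to your own files)
pb_filepath = "./visemenet_frozen.pb"        # frozen graph (.pb)
wav_file_path = "./test_audio.wav"           # input speech audio
out_txt_path = "./maya_viseme_outputs.txt"   # where the predictions are saved

# Load the frozen graph
viseme_regressor = VisemeRegressor(pb_filepath=pb_filepath)

# Predict viseme outputs from the audio file
viseme_outputs = viseme_regressor.predict_outputs(wav_file_path=wav_file_path)

# Save the predictions as text with four decimal places
np.savetxt(out_txt_path, viseme_outputs, fmt='%.4f')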
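
The saved text file can be read back for inspection or for import into an animation tool. The sketch below assumes viseme_outputs is a 2-D array with one row per animation frame, which this README does not state. The helper names load_viseme_rows and frame_times are hypothetical, and fps is a placeholder that must be set to the model's actual output frame rate.

```python
def load_viseme_rows(txt_path):
    """Read a whitespace-delimited text file (as written by np.savetxt) into lists of floats."""
    rows = []
    with open(txt_path) as f:
        for line in f:
            if line.strip():  # skip blank lines
                rows.append([float(value) for value in line.split()])
    return rows


def frame_times(num_frames, fps):
    """Return the start time in seconds of each frame, assuming a constant frame rate."""
    return [index / fps for index in range(num_frames)]
```

Pairing the two, zip(frame_times(len(rows), fps), rows) yields (time, values) tuples that can be keyed onto rig controls.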

Owner

Junhwan Jang
