Demo programs for the Talking Head Anime from a Single Image 2: More Expressive project.

Overview

Demo Code for "Talking Head Anime from a Single Image 2: More Expressive"

This repository contains demo programs for the Talking Head Anime from a Single Image 2: More Expressive project. Similar to the previous version, it has two programs:

  • The manual_poser lets you manipulate the facial expression and the head rotation of an anime character, given in a single image, through a graphical user interface. The poser is available in two forms: a standard GUI application, and a Jupyter notebook.
  • The ifacialmocap_puppeteer lets you transfer your facial motion, captured by a commercial iOS application called iFacialMocap, to an image of an anime character.

Try the Manual Poser on Google Colab

If you do not have the required hardware (discussed below) or do not want to download the code and set up an environment to run it, click this link to try running the manual poser on Google Colab.

Hardware Requirements

Both programs require a recent and powerful Nvidia GPU to run. I personally ran them at good speed on an Nvidia Titan RTX. However, I think recent high-end gaming GPUs such as the RTX 2080, the RTX 3080, or better would do just as well.

The ifacialmocap_puppeteer requires an iOS device that is capable of computing blend shape parameters from a video feed. This means that the device must be able to run iOS 11.0 or higher and must have a TrueDepth front-facing camera. (See this page for more info.) In other words, if you have the iPhone X or something better, you should be all set. Personally, I have used an iPhone 12 mini.

Software Requirements

Both programs were written in Python 3. To run the GUIs, the following software packages are required:

  • Python >= 3.8
  • PyTorch >= 1.7.1 with CUDA support
  • SciPy >= 1.6.0
  • wxPython >= 4.1.1
  • Matplotlib >= 3.3.4

In particular, I created the environment to run the programs with Anaconda, using the following commands:

> conda create -n talking-head-anime-2-demo python=3.8
> conda activate talking-head-anime-2-demo
> conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
> conda install scipy
> pip install wxPython
> conda install matplotlib

To run the Jupyter notebook version of the manual_poser, you also need:

  • Jupyter Notebook >= 6.2.0
  • IPyWidgets >= 7.6.3

This means that, in addition to the commands above, you also need to run:

> conda install -c conda-forge notebook
> conda install -c conda-forge ipywidgets
> jupyter nbextension enable --py widgetsnbextension

Lastly, the ifacialmocap_puppeteer requires iFacialMocap, which is available in the App Store for 980 yen. You also need to install the paired desktop application on your PC or Mac. (Linux users, I'm sorry!) Your iOS device and your computer must also be on the same network. (For example, you may connect them to the same wireless router.)

Automatic Environment Construction with Anaconda

You can also use Anaconda to download and install all Python packages in one command. Open your shell, change the directory to where you cloned the repository, and run:

conda env create -f environment.yml

This will create an environment called talking-head-anime-2-demo containing all the required Python packages.
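Whichever way you build the environment, it can help to confirm that the installed packages meet the minimum versions listed under Software Requirements. The following is a minimal sketch, not part of the repository: the distribution names ("torch", "scipy", "wxPython", "matplotlib") are assumed to be the usual PyPI names, and the helper functions are hypothetical.

```python
import re
from importlib import metadata

# Minimum versions copied from the Software Requirements list.
# The distribution names are an assumption (the usual PyPI names).
MINIMUM_VERSIONS = {
    "torch": "1.7.1",
    "scipy": "1.6.0",
    "wxPython": "4.1.1",
    "matplotlib": "3.3.4",
}


def version_tuple(version):
    """Turn '1.7.1+cu102' into (1, 7, 1); non-numeric suffixes are ignored."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def is_at_least(installed, minimum):
    """Compare two version strings after padding them to the same length."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(minimums=MINIMUM_VERSIONS, get_version=metadata.version):
    """Return {package: (installed_version_or_None, is_new_enough)}."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, is_at_least(installed, minimum))
    return report


for name, (installed, ok) in check_requirements().items():
    status = "ok" if ok else "missing or too old"
    print(f"{name}: {installed} ({status})")
```

This only compares version numbers. It does not confirm that PyTorch was built with CUDA support; for that, running torch.cuda.is_available() in the activated environment is a quick check.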

Download the Model

Before running the programs, you need to download the model files from this Dropbox link and unzip them into the data folder of the repository's directory. In the end, the data folder should look like:

+ data
  + illust
    - waifu_00.png
    - waifu_01.png
    - waifu_02.png
    - waifu_03.png
    - waifu_04.png
    - waifu_05.png
    - waifu_06.png
    - waifu_06_buggy.png
  - combiner.pt
  - eyebrow_decomposer.pt
  - eyebrow_morphing_combiner.pt
  - face_morpher.pt
  - two_algo_face_rotator.pt
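To confirm that the archive was unzipped into the right place, a short script can compare the data folder against the listing above. This is a sketch, not part of the repository: the function name missing_model_files is hypothetical, and it assumes you run it from the repository's root directory.

```python
from pathlib import Path

# File names copied from the directory listing above.
EXPECTED_FILES = [
    "illust/waifu_00.png",
    "illust/waifu_01.png",
    "illust/waifu_02.png",
    "illust/waifu_03.png",
    "illust/waifu_04.png",
    "illust/waifu_05.png",
    "illust/waifu_06.png",
    "illust/waifu_06_buggy.png",
    "combiner.pt",
    "eyebrow_decomposer.pt",
    "eyebrow_morphing_combiner.pt",
    "face_morpher.pt",
    "two_algo_face_rotator.pt",
]


def missing_model_files(data_dir="data"):
    """Return the expected files that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


missing = missing_model_files("data")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("The data folder looks complete.")
```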

The model files are distributed with the Creative Commons Attribution 4.0 International License, which means that you can use them for commercial purposes. However, if you distribute them, you must, among other things, say that I am the creator.

Running the manual_poser Desktop Application

Open a shell. Change your working directory to the repository's root directory. Then, run:

> python tha2/app/manual_poser.py

Note that before running the command above, you might have to activate the Python environment that contains the required packages. If you created an environment using Anaconda as was discussed above, you need to run

> conda activate talking-head-anime-2-demo

if you have not already activated the environment.

Running the manual_poser Jupyter Notebook

Open a shell. Activate the environment. Change your working directory to the repository's root directory. Then, run:

> jupyter notebook

A browser window should open. In it, open tha2.ipynb. Once you have done so, you should see that it only has one cell. Run it. Then, scroll down to the end of the document, and you'll see the GUI there.

Running the ifacialmocap_puppeteer

First, run iFacialMocap on your iOS device. It should show you the device's IP address. Jot it down. Keep the app open.

IP address in iFacialMocap screen

Then, run the companion desktop application.

iFacialMocap desktop application

Click "Open Advanced Setting >>". The application should expand.

Click the 'Open Advanced Setting >>' button.

Click the button that says "Maya" on the right side.

Click the 'Maya' button.

Then, click "Blender."

Select 'Blender' mode in the desktop application

Next, replace the IP address on the left side with your iOS device's IP address.

Replace IP address with device's IP address.

Click "Connect to Blender."

Click 'Connect to Blender.'

Open a shell. Activate the environment. Change your working directory to the repository's root directory. Then, run:

> python tha2/app/ifacialmocap_puppeteer.py

If the programs are connected properly, the many progress bars at the bottom of the ifacialmocap_puppeteer window should move when you move your face in front of the iOS device's front-facing camera.

You should see the progress bars moving.

If all is well, load a character image, and it should follow your facial movement.

Constraints on Input Images

In order for the model to work well, the input image must obey the following constraints:

  • It must be of size 256 x 256.
  • It must be of PNG format.
  • It must have an alpha channel.
  • It must contain only one humanoid anime character.
  • The character must be looking straight ahead.
  • The head of the character should be roughly contained in the middle 128 x 128 box.
  • All pixels that do not belong to the character (i.e., background pixels) should have RGBA = (0,0,0,0).

Image specification
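The first three constraints (size, format, and alpha channel) can be checked mechanically by reading the PNG header. The sketch below uses only the Python standard library and relies on the standard PNG layout: an 8-byte signature followed by the IHDR chunk. The function names are hypothetical and not part of the repository. It does not check the constraints on the image's content, such as where the head is or what the background pixels contain.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_header(data):
    """Return (width, height, color_type) from the bytes of a PNG file."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    if len(data) < 26:
        raise ValueError("malformed PNG: file too short")
    # The first chunk must be IHDR: 4-byte length, 4-byte type, 13 data bytes.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("malformed PNG: IHDR chunk not found")
    width, height, _bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return width, height, color_type


def check_input_image(data):
    """Return a list of violated constraints (empty if the header looks fine)."""
    try:
        width, height, color_type = read_png_header(data)
    except ValueError as error:
        return [str(error)]
    problems = []
    if (width, height) != (256, 256):
        problems.append(f"size is {width} x {height}, expected 256 x 256")
    # PNG color types 4 (gray + alpha) and 6 (RGB + alpha) carry an alpha channel.
    if color_type not in (4, 6):
        problems.append("no alpha channel")
    return problems
```

To check a file, read it as bytes first, for example with pathlib.Path("data/illust/waifu_00.png").read_bytes(), and pass the result to check_input_image. An empty list means the header satisfies the size, format, and alpha-channel constraints.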

FAQ: I prepared an image just like you said, why is my output so ugly?!?

This is most likely because your image does not obey the "background RGBA = (0,0,0,0)" constraint. In other words, your background pixels are (RRR,GGG,BBB,0) for some RRR, GGG, BBB > 0 rather than (0,0,0,0). This happens when you use Photoshop because it does not clear the RGB channels of transparent pixels.

Let's see an example. When I tried to use the manual_poser with data/illust/waifu_06_buggy.png, here's what I got.

A failure case

When you look at the image, there seems to be nothing wrong with it.

waifu_06_buggy.png

However, if you inspect it with GIMP, you will see that the RGB channels have white backgrounds, which means that those pixels have non-zero RGB values.

In the buggy image, background pixels have colors in the RGB channels.

What you want, instead, is something like the non-buggy version: data/illust/waifu_06.png, which looks exactly the same as the buggy one to the naked eye.

waifu_06.png

However, in GIMP, all channels have black backgrounds.

In the good image, background pixels do not have colors in any channels.

Because of this, the output was clean.

A success case

A way to make sure that your image works well with the model is to prepare it with GIMP. When exporting your image to the PNG format, make sure to uncheck "Save color values from transparent pixels" before you hit "Export."

Make sure to uncheck 'Save color values from transparent pixels' before exporting!
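If you would rather repair an image in code, the same cleanup amounts to setting every fully transparent pixel to (0,0,0,0). The sketch below shows that idea on a plain list of RGBA tuples. The function names are hypothetical and not part of the repository, and loading and saving the pixels is left to whatever image library you use.

```python
def count_dirty_pixels(pixels):
    """Count fully transparent pixels whose RGB values are not all zero."""
    return sum(1 for r, g, b, a in pixels if a == 0 and (r, g, b) != (0, 0, 0))


def clear_transparent_rgb(pixels):
    """Return a new list where every pixel with alpha 0 becomes (0, 0, 0, 0)."""
    return [(0, 0, 0, 0) if a == 0 else (r, g, b, a) for r, g, b, a in pixels]


# A tiny hand-made example: one opaque pixel, one "dirty" transparent pixel
# (like those in the buggy image), and one clean transparent pixel.
pixels = [(200, 150, 120, 255), (255, 255, 255, 0), (0, 0, 0, 0)]
print(count_dirty_pixels(pixels))                         # 1
print(count_dirty_pixels(clear_transparent_rgb(pixels)))  # 0
```

With an image library such as Pillow (an assumption; it is not in the requirements list above), the pixel list could come from something like list(image.convert("RGBA").getdata()) and be written back with image.putdata(...) before saving as PNG. Only pixels with alpha exactly 0 are changed, so partially transparent edge pixels keep their colors.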

Disclaimer

While the author is an employee of Google Japan, this software is not Google's product and is not supported by Google.

The copyright of this software belongs to me as I have requested it using the IARC process. However, Google might claim the rights to the intellectual property of this invention.

The code is released under the MIT license. The model is released under the Creative Commons Attribution 4.0 International License.

Owner
Pramook Khungurn
A software developer from Thailand, interested in computer graphics, machine learning, and algorithms.