Implementation of CVAE. Trained CVAE on faces from UTKFace Dataset to produce synthetic faces with a given degree of happiness/smileyness.

Last update: Jan 09, 2022

Related tags

Deep Learning SmileCVAE

Overview

Conditional Smiles! (SmileCVAE)

About

Implementation of AE, VAE and CVAE. Trained CVAE on faces from UTKFace Dataset. Using an encoding of the Smile-strength degree to produce conditional generation of synthetic faces with a given smile degree.

Installation

Clone the repository git clone https://github.com/raulorteg/SmileCVAE
Create virtual environment:

Update pip python -m pip install pip --upgrade
Install virtualenv using pip python -m pip install virtualenv
Create Virtual environment virtualenv SmileCVAE
Activate Virtual environment (Mac OS/Linux: source SmileCVAE/bin/activate, Windows: SmileCVAE\Scripts\activate)
(Note: to deactivate environemt run deactivate)

Install requirements on the Virtual environment python -m pip install -r requirements.txt

Results

Training

In the .gif below the reconstruction for a group of 32 faces from the dataset can be visualized for all epochs.

Below, the final reconstruction of the CVAE for 32 faces of the dataset side by side to those original 32 images, for comparison.

Conditional generation

Using synthetic.py, we can sample from the prior distribution of the CVAE, concatenate the vector with our desired ecnoding of the smile degree and let the CVAE decode this sampled noise into a synthetic face of the desired smile degree. The range of smile-degree encodings in the training set is [-1,+1], where +1 is most smiley, -1 is most non-smiley. Below side to side 64 synthetic images for encodings -0.5, +0.5 are shown produced with this method.

Forcing smiles

With the trained model, one can use the pictures from the training set and instead of feeding in the smile-degree encode of the corresponding picture we can fix an encoding or shift it by a factor to force the image a smile/non smile. Below this is done for 32 picture of the training set, on the op the original 32 images are shown, below the reconstruction with their actual encoding, and then we shift the encoding by +0.5, +0.7, -0.5, -0.7 to change the smile degree in the original picture (zoom in to see in detail!). Finally the same diagram is now shown for a single picture.

The Dataset

The images of the faces come from UTKFace Dataset. However the images do not have any encoding of a continuous degree of "smiley-ness". This "smile-strength" degree is produced by creating a slideshow of the images and exposing them to three subjects (me and a couple friends), by registering wheather the face was classified as smiley or non-smiley we encourage the subjects to answer as fast as possible so as to rely on first impression and the reaction time is registered.

Notes: Bias in the Dataset

Its interesting to see that the when generating synthetic images with encodings < 0 (non-happy) the faces look more male-like and when generating synthetic images with encodings > 0 (happy) they tend to be more female-like. This is more apparent at the extremes, see the Note below. The original dataset although doesnt contains a smile degree encode, it has information of the image encoded in the filename, namely "gender" and "smile" as boolean values. Using this information then I can go and see if there was a bias in the dataset. In the piechart below the distribution of gender, and smile are shown. From there we can see that that although there are equals amount of men and women in the dataset, there were more non-smiley men than smiley men, and the bias of the synthetic generation may come from this unbalance.

Notes: Extending the encoding of smile-degree over the range for synthetic faces

Altough the range of smile-strength in the training set is [-1,+1], when generating synthetic images we can ask the model to generate outside of the range. But notice that then the synthetic faces become much more homogeneus, more than 64 different people it looks like small variations of the same synthetic image. Below side to side 64 synthetic images for encodings -3 (super not happy), +3 (super happy) are shown produced with this method.

References:

Fagertun, J., Andersen, T., Hansen, T., & Paulsen, R. R. (2013). 3D gender recognition using cognitive modeling. In 2013 International Workshop on Biometrics and Forensics (IWBF) IEEE. https://doi.org/10.1109/IWBF.2013.6547324
Kingma, Diederik & Welling, Max. (2013). Auto-Encoding Variational Bayes. ICLR.
Learning Structured Output Representation using Deep Conditional Generative Models, Kihyuk Sohn, Xinchen Yan, Honglak Lee

Implementation of CVAE. Trained CVAE on faces from UTKFace Dataset to produce synthetic faces with a given degree of happiness/smileyness.

Related tags

Overview

Conditional Smiles! (SmileCVAE)

About

Installation

Results

Training

Conditional generation

Forcing smiles

The Dataset

Notes: Bias in the Dataset

Notes: Extending the encoding of smile-degree over the range for synthetic faces

References:

Owner

Raúl Ortega

Go from graph data to a secure and interactive visual graph app in 15 minutes. Batteries-included self-hosting of graph data apps with Streamlit, Graphistry, RAPIDS, and more!

Python codes for Lite Audio-Visual Speech Enhancement.

Convolutional Neural Network for 3D meshes in PyTorch

Time Dependent DFT in Tamm-Dancoff Approximation

Interpretable and Generalizable Person Re-Identification with Query-Adaptive Convolution and Temporal Lifting

[ICML 2020] "When Does Self-Supervision Help Graph Convolutional Networks?" by Yuning You, Tianlong Chen, Zhangyang Wang, Yang Shen

Official Pytorch Implementation of 'Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization' (ICCV-21 Oral)

TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning

Code and project page for ICCV 2021 paper "DisUnknown: Distilling Unknown Factors for Disentanglement Learning"

Code for "Human Pose Regression with Residual Log-likelihood Estimation", ICCV 2021 Oral

Train a deep learning net with OpenStreetMap features and satellite imagery.

Tensorflow implementation of "Learning Deconvolution Network for Semantic Segmentation"

Dynamic Capacity Networks using Tensorflow

Gradient representations in ReLU networks as similarity functions

The official implementation of paper "Finding the Task-Optimal Low-Bit Sub-Distribution in Deep Neural Networks" (IJCV under review).

shufflev2-yolov5：lighter, faster and easier to deploy

Does Oversizing Improve Prosumer Profitability in a Flexibility Market? - A Sensitivity Analysis using PV-battery System

[arXiv'22] Panoptic NeRF: 3D-to-2D Label Transfer for Panoptic Urban Scene Segmentation

Learning to Reach Goals via Iterated Supervised Learning

GraPE is a Rust/Python library for high-performance Graph Processing and Embedding.