PyTorch Implementation of Vector Quantized Variational AutoEncoders.

Last update: Oct 06, 2021

Related tags

Deep Learning VQVAE-PyTorch

Overview

PyTorch implementation of VQ-VAE.

The VQ-VAE paper combines two tricks:

Vector Quantization (check out this amazing blog for a better understanding.)
Straight-Through estimator (it solves the problem of back-propagating through discrete latent variables, which is otherwise intractable because the quantization step is not differentiable.)
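
The straight-through trick can be sketched in a few lines. This is a minimal illustration, not necessarily how this repository implements it: the helper name `straight_through` is hypothetical, and `torch.round` only stands in for a non-differentiable quantizer.

```python
import torch


def straight_through(z_e: torch.Tensor, z_q: torch.Tensor) -> torch.Tensor:
    """Forward pass yields z_q; backward pass copies gradients to z_e unchanged."""
    # (z_q - z_e) is detached, so autograd only sees the identity path through z_e.
    return z_e + (z_q - z_e).detach()


z_e = torch.randn(4, 8, requires_grad=True)
z_q = torch.round(z_e.detach())  # stand-in for a non-differentiable quantizer
out = straight_through(z_e, z_q)
out.sum().backward()             # gradient reaches z_e as if quantization were identity
```

The decoder sees the quantized values, while the encoder receives the decoder's gradient as if no quantization had happened.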

This model has a neural network encoder and decoder, and a prior, just like the vanilla Variational AutoEncoder (VAE). But this model also has a latent embedding space called the codebook (size: K x D). Here, K is the number of embeddings in the codebook (the size of the discrete latent space) and D is the dimension of each embedding e.
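
One way to hold such a codebook in PyTorch is an `nn.Embedding` with K rows of dimension D. The sizes and the uniform initialisation below are assumptions chosen for illustration, not values taken from this project.

```python
import torch
import torch.nn as nn

K, D = 512, 64  # hypothetical sizes, not taken from this project

codebook = nn.Embedding(K, D)
# Small uniform initialisation: a common choice in VQ-VAE code (assumption).
nn.init.uniform_(codebook.weight, -1.0 / K, 1.0 / K)

indices = torch.tensor([0, 3, 511])  # discrete codes, each in [0, K)
vectors = codebook(indices)          # one D-dimensional embedding per code
```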

In vanilla variational autoencoders, the encoder output z_e(x) is used to parameterize a Normal (Gaussian) distribution, and a latent representation z of the input x is sampled from it using the "reparameterization trick". This latent representation is then passed to the decoder. In VQ-VAEs, however, z_e(x) is used as a "key" for a nearest-neighbour lookup into the embedding codebook, which returns z_q(x), the closest embedding in the space. This is called the Vector Quantization (VQ) operation. Then z_q(x) is passed to the decoder, which reconstructs the input x. The decoder can either parameterize p(x|z) as the mean of a Normal distribution using transposed convolution layers, as in a vanilla VAE, or autoregressively generate a categorical distribution over the [0, 255] pixel values, as PixelCNN does. This project uses the first approach.
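
The nearest-neighbour lookup described above could look roughly like the sketch below. The function name `quantize`, the tensor shapes, and the use of `torch.cdist` are assumptions made for this example; the repository's own code may be organised differently.

```python
import torch


def quantize(z_e: torch.Tensor, codebook: torch.Tensor):
    """Map each encoder vector to its closest codebook entry.

    z_e: (B, D, H, W) encoder output; codebook: (K, D).
    Returns z_q with the same shape as z_e and the chosen indices (B, H, W).
    """
    B, D, H, W = z_e.shape
    flat = z_e.permute(0, 2, 3, 1).reshape(-1, D)  # (B*H*W, D)
    dists = torch.cdist(flat, codebook)            # (B*H*W, K) Euclidean distances
    indices = dists.argmin(dim=1)                  # nearest embedding per vector
    z_q = codebook[indices].reshape(B, H, W, D).permute(0, 3, 1, 2)
    return z_q, indices.reshape(B, H, W)


torch.manual_seed(0)
codebook = torch.randn(16, 8)   # K=16, D=8 (hypothetical sizes)
z_e = torch.randn(2, 8, 4, 4)
z_q, indices = quantize(z_e, codebook)
```

Every spatial position of `z_q` is an exact copy of one codebook row, so the decoder only ever sees K distinct vectors.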

The loss function combines three components:

Regular Reconstruction loss
Vector Quantization loss
Commitment loss

The Vector Quantization loss, ||sg[z_e(x)] - e||^2, encourages the items in the codebook to move closer to the encoder output. The Commitment loss, ||z_e(x) - sg[e]||^2, encourages the output of the encoder to stay close to the embedding it picked, so that it commits to its codebook embedding. The Commitment loss is multiplied by a constant beta, which is 1.0 for this project. Here, sg means "stop-gradient": the term inside it is treated as a constant, so no gradient is propagated through it.
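
The three terms can be written with `detach()` playing the role of sg. This is a sketch under assumptions: the function name `vqvae_loss` is hypothetical, mean squared error is used for the reconstruction term, and `F.mse_loss` averages over elements rather than summing the squared norm, which only changes the scale of each term.

```python
import torch
import torch.nn.functional as F


def vqvae_loss(x, x_recon, z_e, z_q, beta=1.0):
    """Return the total loss and its three components."""
    recon = F.mse_loss(x_recon, x)           # reconstruction loss (MSE is an assumption)
    vq = F.mse_loss(z_q, z_e.detach())       # ||sg[z_e(x)] - e||^2, gradients reach the codebook only
    commit = F.mse_loss(z_e, z_q.detach())   # ||z_e(x) - sg[e]||^2, gradients reach the encoder only
    return recon + vq + beta * commit, (recon, vq, commit)


# Random tensors standing in for real model outputs.
x = torch.rand(2, 1, 8, 8)
x_recon = torch.rand(2, 1, 8, 8, requires_grad=True)
z_e = torch.randn(2, 4, 2, 2, requires_grad=True)
z_q = torch.randn(2, 4, 2, 2, requires_grad=True)

total, (recon, vq, commit) = vqvae_loss(x, x_recon, z_e, z_q, beta=1.0)
total.backward()
```

The two quantization terms have the same value but different gradient targets, which is why both are needed.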

Results:

The model is trained on the MNIST and CIFAR10 datasets.

Target 👉 Reconstructed Image


Details:

Trained models for MNIST and CIFAR10 are in the Trained models directory.
The hidden size of the bottleneck (z) is 128 for MNIST and 256 for CIFAR10.

Owner

Vrushank Changawala
