Clockwork Variational Autoencoders (CW-VAE)

Vaibhav Saxena, Jimmy Ba, Danijar Hafner

If you find this code useful, please cite it in your paper:

@article{saxena2021clockworkvae,
  title={Clockwork Variational Autoencoders}, 
  author={Saxena, Vaibhav and Ba, Jimmy and Hafner, Danijar},
  journal={arXiv preprint arXiv:2102.09532},
  year={2021},
}

Method

Clockwork VAEs are deep generative models that learn long-term dependencies in video by leveraging hierarchies of representations that progress at different clock speeds. In contrast to prior video prediction methods, which typically focus on predicting sharp but short sequences into the future, Clockwork VAEs can accurately predict high-level content, such as object positions and identities, for 1000 frames.
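The idea of levels that progress at different clock speeds can be illustrated with a minimal sketch. It assumes, purely for illustration, that level l updates once every factor**l time steps, so higher levels tick more slowly. The function name, the number of levels, and the factor are hypothetical and are not taken from this repository.

```python
def active_levels(t, num_levels=3, factor=4):
    """Return the levels that update at time step t.

    Assumption for illustration: level l ticks every factor**l steps,
    so level 0 updates at every frame and higher levels update less often.
    """
    return [level for level in range(num_levels) if t % (factor ** level) == 0]


if __name__ == "__main__":
    # Print which levels are active for the first few frames.
    for t in range(17):
        print(t, active_levels(t))
```

With these hypothetical settings, the top level changes only once every 16 frames, which is the kind of slowly changing state that can carry high-level content over long horizons.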

Clockwork VAEs build upon the Recurrent State Space Model (RSSM), so each state contains a deterministic component for long-term memory and a stochastic component for sampling diverse plausible futures. Clockwork VAEs are trained end-to-end to optimize the evidence lower bound (ELBO) that consists of a reconstruction term for each image and a KL regularizer for each stochastic variable in the model.
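The structure of this objective can be sketched in a few lines of plain Python. The sketch assumes diagonal Gaussian posteriors and priors for the stochastic variables, which is an assumption made here for illustration; the helper names are hypothetical and this is not the repository's training code.

```python
import math


def gaussian_kl(mu_q, std_q, mu_p, std_p):
    """KL(q || p) between two diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, std_q, mu_p, std_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
    return total


def negative_elbo(recon_log_probs, kl_terms):
    """Negative ELBO as a loss to minimize.

    recon_log_probs: one log-likelihood per image (reconstruction terms).
    kl_terms: one KL value per stochastic variable (regularizer terms).
    """
    return -sum(recon_log_probs) + sum(kl_terms)


if __name__ == "__main__":
    kl = gaussian_kl([1.0], [1.0], [0.0], [1.0])
    print(negative_elbo([-1.0, -2.0], [kl]))
```

Minimizing this quantity is equivalent to maximizing the ELBO: the reconstruction terms reward accurate images, and the KL terms keep each posterior close to its prior.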

Instructions

This repository contains the code for training the Clockwork VAE model on the datasets minerl, mazes, and mmnist.

The datasets will automatically be downloaded into the --datadir directory.

python3 train.py --logdir /path/to/logdir --datadir /path/to/datasets --config configs/<dataset>.yml
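To launch training for one of the listed datasets from a script, the command above can be assembled programmatically. This is a small sketch that assumes the config files follow the configs/<dataset>.yml pattern with the dataset names given above; the helper name is hypothetical, and the paths are placeholders.

```python
DATASETS = ("minerl", "mazes", "mmnist")


def train_command(dataset, logdir, datadir):
    """Build the argument list for train.py (hypothetical helper).

    Assumes the config for each dataset lives at configs/<dataset>.yml.
    """
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return [
        "python3", "train.py",
        "--logdir", logdir,
        "--datadir", datadir,
        "--config", f"configs/{dataset}.yml",
    ]


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run(...) from the repository root to start training.
    print(" ".join(train_command("mmnist", "/path/to/logdir", "/path/to/datasets")))
```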

The evaluation script writes open-loop video predictions in both PNG and NPZ format and plots of PSNR and SSIM to the data directory.

python3 eval.py --logdir /path/to/logdir
