Implementation of experiments in the paper Clockwork Variational Autoencoders (project website) using JAX and Flax

Last update: Oct 05, 2022

Overview

Clockwork VAEs in JAX/Flax

Implementation of experiments in the paper Clockwork Variational Autoencoders (project website) using JAX and Flax, ported from the official TensorFlow implementation.

Running on a single TPU v3, training is 10x faster than reported in the paper (60h -> 6h on minerl).

Method

Clockwork VAEs are deep generative models that learn long-term dependencies in video by leveraging hierarchies of representations that progress at different clock speeds. In contrast to prior video prediction methods, which typically focus on predicting sharp but short sequences, Clockwork VAEs can accurately predict high-level content, such as object positions and identities, for 1000 frames.
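To make "different clock speeds" concrete, here is a minimal sketch of a tick schedule. It assumes that level `l` (0 is the fastest) updates its state every `factor ** l` time steps; the function name, the three levels, and the factor of 4 are illustrative choices, not values taken from this repository.

```python
from typing import List


def active_levels(t: int, num_levels: int = 3, factor: int = 4) -> List[int]:
    """Return the levels that update at time step t.

    Assumption: level l ticks every factor**l steps, so level 0 ticks at
    every step and higher levels tick exponentially less often.
    """
    return [level for level in range(num_levels) if t % (factor ** level) == 0]


if __name__ == "__main__":
    # Print which levels tick during the first few time steps.
    for t in range(9):
        print(t, active_levels(t))
```

With these illustrative settings, the top level only changes once every 16 frames, which is what lets it carry slowly varying content over long horizons.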

Clockwork VAEs build upon the Recurrent State Space Model (RSSM), so each state contains a deterministic component for long-term memory and a stochastic component for sampling diverse plausible futures. Clockwork VAEs are trained end-to-end to optimize the evidence lower bound (ELBO) that consists of a reconstruction term for each image and a KL regularizer for each stochastic variable in the model.
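The two kinds of ELBO terms described above can be sketched in JAX as follows. This is an illustration under stated assumptions, not the loss code of this repository: it assumes diagonal Gaussian priors and posteriors and a unit-variance Gaussian image likelihood (so the reconstruction term reduces to a squared error up to a constant), and the names `gaussian_kl` and `negative_elbo` are hypothetical.

```python
import jax.numpy as jnp


def gaussian_kl(mean_q, std_q, mean_p, std_p):
    """KL(q || p) between diagonal Gaussians, summed over the last axis."""
    var_q, var_p = std_q ** 2, std_p ** 2
    kl = (
        jnp.log(std_p / std_q)
        + (var_q + (mean_q - mean_p) ** 2) / (2.0 * var_p)
        - 0.5
    )
    return kl.sum(axis=-1)


def negative_elbo(images, recon, posteriors, priors):
    """Negative ELBO, up to an additive constant.

    images, recon: arrays of shape [T, H, W, C].
    posteriors, priors: lists of (mean, std) pairs, one per stochastic variable.
    """
    # Reconstruction term for each image (assumed unit-variance Gaussian).
    recon_loss = 0.5 * jnp.sum((images - recon) ** 2)
    # KL regularizer for each stochastic variable in the model.
    kl_loss = sum(
        gaussian_kl(mq, sq, mp, sp).sum()
        for (mq, sq), (mp, sp) in zip(posteriors, priors)
    )
    return recon_loss + kl_loss
```

Minimizing this quantity with respect to all parameters corresponds to the end-to-end training described above. Any weighting of the KL term or other details of the actual objective are not covered by this sketch.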

Instructions

This repository contains the code for training the Clockwork VAE model on the datasets minerl, mazes, and mmnist.

The datasets will automatically be downloaded into the --datadir directory.

python3 train.py --logdir /path/to/logdir --datadir /path/to/datasets --config configs/<dataset>.yml

The evaluation script writes open-loop video predictions in both PNG and NPZ formats, along with plots of PSNR and SSIM, to the data directory.

python3 eval.py --logdir /path/to/logdir
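For readers unfamiliar with the metrics, a per-frame PSNR can be sketched in JAX as below. It assumes pixel values in [0, 1] and a trailing `[H, W, C]` layout; the function name and signature are hypothetical, and the metric code in eval.py may differ. SSIM is more involved and is not shown.

```python
import jax.numpy as jnp


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB, one value per frame.

    pred, target: arrays of shape [..., H, W, C] with values in [0, max_val].
    """
    # Mean squared error over the height, width, and channel axes.
    mse = jnp.mean((pred - target) ** 2, axis=(-3, -2, -1))
    return 10.0 * jnp.log10(max_val ** 2 / mse)
```

Applied to a predicted and a ground-truth video of shape `[T, H, W, C]`, this returns `T` values that can be plotted against the frame index.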

Known differences from the original

Flax's default kernel initializer, layer precision, and GRU implementation (which avoids redundant biases) are used.
For some configuration parameters, only the defaults are implemented.
Training metrics and videos are logged with wandb.
The base configuration is in config.py.

Added features:

This implementation runs on TPU out-of-the-box.
In addition to the config file, options can be set on the command line and through wandb.
Reusing the seed of a previous run reproduces that run exactly.
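Seed-based reproducibility follows naturally from JAX's explicit random number handling: all randomness is derived from a key, so the same seed yields the same samples. The sketch below illustrates that mechanism only. `sample_noise` is a hypothetical helper and is not part of this repository.

```python
import jax


def sample_noise(seed: int, shape=(2, 3)):
    """Draw Gaussian noise deterministically from an integer seed."""
    key = jax.random.PRNGKey(seed)
    # Split before use so the original key is never consumed twice.
    key, subkey = jax.random.split(key)
    return jax.random.normal(subkey, shape)


a = sample_noise(0)
b = sample_noise(0)  # same seed: identical values
c = sample_noise(1)  # different seed: different values
```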

Things to watch out for

Replication of paper results for the mazes dataset has not been confirmed yet.

Computing evaluation metrics is the memory bottleneck during training because eval_seq_len is large. If you run out of device memory, consider lowering it during training, for example to 100. Remember to pass the original value to eval.py so that the evaluation results are unchanged.

Acknowledgements

Thanks to Vaibhav Saxena and Danijar Hafner for helpful discussions and to Jamie Townsend for reviewing code.

Owner

Julius Kunze
