Translation-equivariant Image Quantizer for Bi-directional Image-Text Generation

Last update: Sep 26, 2022

Related tags

Deep Learning TE-VQGAN

Overview


Woncheol Shin¹, Gyubok Lee¹, Jiyoung Lee¹, Joonseok Lee²,³, Edward Choi¹ | Paper

¹KAIST, ²Google Research, ³Seoul National University

Abstract

Recently, vector-quantized image modeling has demonstrated impressive performance on generation tasks such as text-to-image generation. However, we discover that current image quantizers do not satisfy translation equivariance in the quantized space due to aliasing, which degrades performance in downstream text-to-image and image-to-text generation, even in simple experimental setups. Instead of focusing on anti-aliasing, we take a direct approach to encourage translation equivariance in the quantized space. In particular, we explore a desirable property of image quantizers, called 'Translation Equivariance in the Quantized Space', and propose a simple but effective way to achieve it by regularizing orthogonality in the codebook embedding vectors. Using this method, we improve accuracy by +22% in text-to-image generation and +26% in image-to-text generation, outperforming VQGAN.
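The abstract only states that orthogonality is regularized in the codebook embedding vectors; it does not give the exact loss. The following is a minimal, framework-free sketch of one common way to express such a penalty, assuming it measures how far the Gram matrix of the L2-normalized codebook vectors is from the identity. The function names (l2_normalize, orthogonality_penalty) and the 1/n² scaling are illustrative assumptions, not the repository's API.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


def orthogonality_penalty(codebook):
    """Hypothetical orthogonality regularizer for a codebook.

    codebook: list of n embedding vectors, each a list of floats.
    Returns the squared Frobenius norm of (G - I), divided by n^2,
    where G is the Gram matrix of the L2-normalized vectors.
    The value is 0 when all codebook vectors are mutually orthogonal.
    """
    normed = [l2_normalize(v) for v in codebook]
    n = len(normed)
    total = 0.0
    for i in range(n):
        for j in range(n):
            gram_ij = sum(a * b for a, b in zip(normed[i], normed[j]))
            target = 1.0 if i == j else 0.0
            total += (gram_ij - target) ** 2
    return total / (n * n)


# Two toy codebooks (assumed data, for illustration only).
orthogonal = [[2.0, 0.0], [0.0, 3.0]]   # mutually orthogonal vectors
collinear = [[1.0, 0.0], [2.0, 0.0]]    # vectors pointing the same way

print(orthogonality_penalty(orthogonal))  # no penalty for orthogonal codes
print(orthogonality_penalty(collinear))   # larger penalty for collinear codes
```

In a training setup, a term like this would presumably be added to the quantizer's loss with a weighting coefficient; the weight and the exact form used by TE-VQGAN should be taken from the paper rather than from this sketch.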

Requirements

TBU

Download Dataset

TBU

Training TE-VQGAN (Stage 1)

TBU

Training Bi-directional Image-Text Generator (Stage 2)

TBU

Thanks to

The implementations of 'TE-VQGAN' and the 'Bi-directional Image-Text Generator' are based on VQGAN and DALLE-pytorch. Thanks to the authors of all related work!

Citation

@misc{shin2021translationequivariant,
      title={Translation-equivariant Image Quantizer for Bi-directional Image-Text Generation}, 
      author={Woncheol Shin and Gyubok Lee and Jiyoung Lee and Joonseok Lee and Edward Choi},
      year={2021},
      eprint={2112.00384},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Owner

Woncheol Shin
