CLOOB training (JAX) and inference (JAX and PyTorch)

Last update: Nov 27, 2022

Related tags

Overview

cloob-training

Pretrained models

There are two pretrained CLOOB models in this repo at the moment, a 16 epoch and a 32 epoch ViT-B/16 checkpoint trained on LAION 400M.

Zero-shot ImageNet validation set accuracy (using OpenCLIP's code):

Model name	Top 1	Top 5
cloob_laion_400m_vit_b_16_16_epochs	0.61238	0.8492
cloob_laion_400m_vit_b_16_32_epochs	0.62816	0.85964
OpenAI CLIP ViT-B/32	0.6327	0.88772
OpenAI CLIP ViT-B/16	0.68132	0.91768
OpenAI CLIP ViT-L/14	0.75388	0.9454
OpenAI CLIP ViT-L/14 @ 336 px	0.76564	0.9515
OpenAI CLIP RN50	0.59806	0.86498
OpenAI CLIP RN101	0.62296	0.88106
OpenAI CLIP RN50x4	0.66268	0.9046
OpenAI CLIP RN50x16	0.70754	0.92822
OpenAI CLIP RN50x64	0.74134	0.94146

PyTorch

from cloob_training import model_pt, pretrained

pretrained.list_configs()

returns:

['cloob_laion_400m_vit_b_16_16_epochs', 'cloob_laion_400m_vit_b_16_32_epochs']

The models can be used by:

config = pretrained.get_config('cloob_laion_400m_vit_b_16_16_epochs')
model = model_pt.get_pt_model(config)
checkpoint = pretrained.download_checkpoint(config)
model.load_state_dict(model_pt.get_pt_params(config, checkpoint))
model.eval().requires_grad_(False).to('cuda')

Model class attributes:

model.config: the model config dict.

model.image_encoder: the image encoder, which expects NCHW batches of normalized images (preprocessed by model.normalize), where C = model.config['image_encoder']['input_channels'] and H, W = model.config['image_encoder']['image_size'].

model.text_encoder: the text encoder, which expects text tokenized by model.tokenize.

model.normalize: the preprocessor for image tensors.

model.tokenize: the preprocessor for text.

JAX

Coming soon...

Training (JAX only)

Coming soon...

CLOOB training (JAX) and inference (JAX and PyTorch)

Related tags

Overview

cloob-training

Pretrained models

PyTorch

JAX

Training (JAX only)

Owner

Katherine Crowson

Exploiting Robust Unsupervised Video Person Re-identification

Auditing Black-Box Prediction Models for Data Minimization Compliance

Transformers based fully on MLPs

An implementation of "Optimal Textures: Fast and Robust Texture Synthesis and Style Transfer through Optimal Transport"

Sentinel-1 vessel detection model used in the xView3 challenge

Official PyTorch implementation of MX-Font (Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Experts)

[TIP 2021] SADRNet: Self-Aligned Dual Face Regression Networks for Robust 3D Dense Face Alignment and Reconstruction

The official implementation of A Unified Game-Theoretic Interpretation of Adversarial Robustness.

A python script to convert images to animated sus among us crewmate twerk jifs as seen on r/196

Various operations like path tracking, counting, etc by using yolov5

2021 CCF BDCI 全国信息检索挑战杯（CCIR-Cup）智能人机交互自然语言理解赛道第二名参赛解决方案

Jingju baseline - A baseline model of our project of Beijing opera script generation

Viperdb - A tiny log-structured key-value database written in pure Python

Implementation of the master's thesis "Temporal copying and local hallucination for video inpainting".

Vector Quantization, in Pytorch

StyleMapGAN - Official PyTorch Implementation

Official repository for "Deep Recurrent Neural Network with Multi-scale Bi-directional Propagation for Video Deblurring".

A general framework for deep learning experiments under PyTorch based on pytorch-lightning

On Uncertainty, Tempering, and Data Augmentation in Bayesian Classification

Implementation of Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning