This code provides a PyTorch implementation for OTTER (Optimal Transport distillation for Efficient zero-shot Recognition), as described in the paper.

Related tags

Deep LearningOTTER
Overview

Data Efficient Language-Supervised Zero-Shot Recognition with Optimal Transport Distillation

This repository contains PyTorch evaluation code, training code and pretrained models for OTTER (Optimal Transport distillation for Efficient zero-shot Recognition). Link to the paper.

Bichen Wu*, Ruizhe Cheng*, Peizhao Zhang, Tianren Gao, Joseph E. Gonzalez, Peter Vajda (* indicates equal contribution)

If you used this code for your experiments, please consider citing our paper:

@inproceedings{otter,
    Author = {Wu, Bichen and Cheng, Ruizhe and Zhang, Peizhao and Vajda, Peter and Gonzalez, Joseph E},
    Title = {Data Efficient Language-supervised Zero-shot Recognition with Optimal Transport Distillation},
    Journal = {arXiv:2112.09445},
    Year = {2021}
}

And our related work:

@inproceedings{cheng2021data,
  title={Data-Efficient Language-Supervised Zero-Shot Learning with Self-Distillation},
  author={Cheng, Ruizhe and Wu, Bichen and Zhang, Peizhao and Vajda, Peter and Gonzalez, Joseph E},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={3119--3124},
  year={2021}
}

Model Zoo

OTTER achieves good zero-shot image recognition results on multi-labeled Google Open Images V6 and ImageNet10K from Tencent Images.

Dataset Method Image Encoder Text Encoder GOI [email protected]=1 GOI [email protected]=5 GOI [email protected]=10 IN10K [email protected]=1 IN10K [email protected]=5 IN10K [email protected]=10 url
CC 3M InfoNCE RN50 DeCLUTR-Sci-base 26.8 55.1 66.4 10.9 29.4 40.5 model
CC 3M LS RN50 DeCLUTR-Sci-base 26.3 55.9 67.5 10.1 29.6 39.8 model
CC 3M KD RN50 DeCLUTR-Sci-base 26.7 55.3 67.1 10.0 27.5 38.5 model
CC 3M OTTER RN50 DeCLUTR-Sci-base 29.1 59.6 70.9 12.0 31.8 42.1 model

Usage

First, git clone the repository

git clone https://github.com/facebookresearch/OTTER.git

Then, install required packkages using pip

conda create --name otter python=3.8
conda activate otter
pip install -r requirements.txt

Try out classifying with a pretrained OTTER or one of its baseline models.

import torch
from PIL import Image
import otter

device = "cuda" if torch.cuda.is_available() else "cpu"
temperature = 60

model, preprocess = otter.load("OTTER") # KD, LS, InfoNCE
model = model.to(device)

image = Image.open("doge.jpg")
image = preprocess(image).unsqueeze(0).to(device)
texts = ['photo of a dog', 'photo of a sofa', 'photo of a flower']

with torch.no_grad():
    features = model.forward_features(image, texts)
    image_logits, text_logits = model.compute_logits(features)
    image_logits *= temperature

    probs = image_logits.softmax(dim=-1).cpu().numpy()

print("Probs:", probs)  # Probs: [[0.92657197 0.00180788 0.07162025]]

Evaluation

You can evaluate a pretrained model with launch_scripts/eval.sh.

Note that for faster evaluation, we used FAISS for knn lookup. The result however will be slightly different from using sklearn knn functions.

Data preparation

Download the Conceptual Caption or YFCC 15M (subset of YFCC100M) dataset for training. Download Google Open Images's or ImageNet 10K's test set for evaluation.

Conceptual Captions

First, download Train-GCC-training.tsv, which contains captions and image urls, from the official CC website. Then, follow the instructions in this repo to efficiently download Conceptual Captions. After the download completes, there should be a downloaded_training_report.tsv. Make sure it's in the same cc root folder as Train-GCC-training.tsv along with the training folder that contains all the images.

Run python data/cc_preprocess.py --cc_root /data/cc to generate a processed_labels.csv, which contains paired image paths and captions. This preprocessing step filters out invalid images that can't be opened by PIL. Note that not all images in the conceptual captions dataset are available. In our case, we had 2911810 valid images from the train set of conceptual captions.

YFCC 15M

Follow the instructions in here to download the 15 million images which were used in training CLIP.

After downloading all the zip files, convert the zip files to datadings format (with compression if necessary). In data/yfcc.py, the YFCC dataset takes in the datadings folder.

Google Open Images

Download the test set of Google Open Images V6 from here. We have provided the class names and label annotations in the dataset_meta_data folder.

ImageNet 10K (from Tencent ML-Images)

You can also evaluate on the validation set of multi-labeled ImageNet 10K from Tencent ML-Images. Download the ImageNet portion of Tencent ML-Images from here. We have also included the class names and label annotations in the dataset_meta_data folder.

The datasets should be placed in the following way:

DATA_ROOT/
  cc/
    processed_labels.csv
    training/
      ... (images)
  open-images/
    test/
      ... (images)
  tencent/
    images/
      ... (images)

Single node training

You can launch training on a single node with scripts in launch_scripts.

Dataset Analysis

You can analyze the prevalence of the noisy matching problem with python3 data_analysis.py --data_root <data_root> --datasets cc --batch 512 --stop 1000. The script uses a pretrained OpenAI CLIP model to estimate the the on-diagonal vs off-diagonal matching scores of an image-caption dataset.

License

This source code is licensed under the MIT license found in the LICENSE file in the root directory of this source tree.

Owner
Meta Research
Meta Research
The implementation for paper Joint t-SNE for Comparable Projections of Multiple High-Dimensional Datasets.

Joint t-sne This is the implementation for paper Joint t-SNE for Comparable Projections of Multiple High-Dimensional Datasets. abstract: We present Jo

IDEAS Lab 7 Dec 18, 2022
Source code for "FastBERT: a Self-distilling BERT with Adaptive Inference Time".

FastBERT Source code for "FastBERT: a Self-distilling BERT with Adaptive Inference Time". Good News 2021/10/29 - Code: Code of FastPLM is released on

Weijie Liu 584 Jan 02, 2023
A Gura parser implementation for Python

Gura Python parser This repository contains the implementation of a Gura (compliant with version 1.0.0) format parser in Python. Installation pip inst

Gura Config Lang 19 Jan 25, 2022
PyTorch code for the paper "FIERY: Future Instance Segmentation in Bird's-Eye view from Surround Monocular Cameras"

FIERY This is the PyTorch implementation for inference and training of the future prediction bird's-eye view network as described in: FIERY: Future In

Wayve 406 Dec 24, 2022
AAI supports interdisciplinary research to help better understand human, animal, and artificial cognition.

AnimalAI 3 AAI supports interdisciplinary research to help better understand human, animal, and artificial cognition. It aims to support AI research t

Matthew Crosby 58 Dec 12, 2022
Dealing With Misspecification In Fixed-Confidence Linear Top-m Identification

Dealing With Misspecification In Fixed-Confidence Linear Top-m Identification This repository is the official implementation of [Dealing With Misspeci

0 Oct 25, 2021
Code for the paper "Improved Techniques for Training GANs"

Status: Archive (code is provided as-is, no updates expected) improved-gan code for the paper "Improved Techniques for Training GANs" MNIST, SVHN, CIF

OpenAI 2.2k Jan 01, 2023
Codebase for Diffusion Models Beat GANS on Image Synthesis.

Codebase for Diffusion Models Beat GANS on Image Synthesis.

Katherine Crowson 128 Dec 02, 2022
Code for "Hierarchical Skills for Efficient Exploration" HSD-3 Algorithm and Baselines

Hierarchical Skills for Efficient Exploration This is the source code release for the paper Hierarchical Skills for Efficient Exploration. It contains

Facebook Research 38 Dec 06, 2022
A DNN inference latency prediction toolkit for accurately modeling and predicting the latency on diverse edge devices.

Note: This is an alpha (preview) version which is still under refining. nn-Meter is a novel and efficient system to accurately predict the inference l

Microsoft 244 Jan 06, 2023
A multi-entity Transformer for multi-agent spatiotemporal modeling.

baller2vec This is the repository for the paper: Michael A. Alcorn and Anh Nguyen. baller2vec: A Multi-Entity Transformer For Multi-Agent Spatiotempor

Michael A. Alcorn 56 Nov 15, 2022
NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.

NVIDIA Merlin NVIDIA Merlin is an open source library designed to accelerate recommender systems on NVIDIA’s GPUs. It enables data scientists, machine

419 Jan 03, 2023
DeepLM: Large-scale Nonlinear Least Squares on Deep Learning Frameworks using Stochastic Domain Decomposition (CVPR 2021)

DeepLM DeepLM: Large-scale Nonlinear Least Squares on Deep Learning Frameworks using Stochastic Domain Decomposition (CVPR 2021) Run Please install th

Jingwei Huang 130 Dec 02, 2022
Rainbow DQN implementation that outperforms the paper's results on 40% of games using 20x less data 🌈

Rainbow 🌈 An implementation of Rainbow DQN which outperforms the paper's (Hessel et al. 2017) results on 40% of tested games while using 20x less dat

Dominik Schmidt 31 Dec 21, 2022
A Simplied Framework of GAN Inversion

Framework of GAN Inversion Introcuction You can implement your own inversion idea using our repo. We offer a full range of tuning settings (in hparams

Kangneng Zhou 13 Sep 27, 2022
Code for the ICME 2021 paper "Exploring Driving-Aware Salient Object Detection via Knowledge Transfer"

TSOD Code for the ICME 2021 paper "Exploring Driving-Aware Salient Object Detection via Knowledge Transfer" Usage For training, open train_test, run p

Jinming Su 2 Dec 23, 2021
DCA - Official Python implementation of Delaunay Component Analysis algorithm

Delaunay Component Analysis (DCA) Official Python implementation of the Delaunay

Petra Poklukar 9 Sep 06, 2022
Semi-supevised Semantic Segmentation with High- and Low-level Consistency

Semi-supevised Semantic Segmentation with High- and Low-level Consistency This Pytorch repository contains the code for our work Semi-supervised Seman

123 Dec 30, 2022
State-Relabeling Adversarial Active Learning

State-Relabeling Adversarial Active Learning Code for SRAAL [2020 CVPR Oral] Requirements torch = 1.6.0 numpy = 1.19.1 tqdm = 4.31.1 AL Results The

10 Jul 14, 2022
Space robot - (Course Project) Using the space robot to capture the target satellite that is disabled and spinning, then stabilize and fix it up

Space robot - (Course Project) Using the space robot to capture the target satellite that is disabled and spinning, then stabilize and fix it up

Mingrui Yu 3 Jan 07, 2022