This code provides a PyTorch implementation for OTTER (Optimal Transport distillation for Efficient zero-shot Recognition), as described in the paper.

Related tags

Deep LearningOTTER
Overview

Data Efficient Language-Supervised Zero-Shot Recognition with Optimal Transport Distillation

This repository contains PyTorch evaluation code, training code and pretrained models for OTTER (Optimal Transport distillation for Efficient zero-shot Recognition). Link to the paper.

Bichen Wu*, Ruizhe Cheng*, Peizhao Zhang, Tianren Gao, Joseph E. Gonzalez, Peter Vajda (* indicates equal contribution)

If you used this code for your experiments, please consider citing our paper:

@inproceedings{otter,
    Author = {Wu, Bichen and Cheng, Ruizhe and Zhang, Peizhao and Vajda, Peter and Gonzalez, Joseph E},
    Title = {Data Efficient Language-supervised Zero-shot Recognition with Optimal Transport Distillation},
    Journal = {arXiv:2112.09445},
    Year = {2021}
}

And our related work:

@inproceedings{cheng2021data,
  title={Data-Efficient Language-Supervised Zero-Shot Learning with Self-Distillation},
  author={Cheng, Ruizhe and Wu, Bichen and Zhang, Peizhao and Vajda, Peter and Gonzalez, Joseph E},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={3119--3124},
  year={2021}
}

Model Zoo

OTTER achieves good zero-shot image recognition results on multi-labeled Google Open Images V6 and ImageNet10K from Tencent Images.

Dataset Method Image Encoder Text Encoder GOI [email protected]=1 GOI [email protected]=5 GOI [email protected]=10 IN10K [email protected]=1 IN10K [email protected]=5 IN10K [email protected]=10 url
CC 3M InfoNCE RN50 DeCLUTR-Sci-base 26.8 55.1 66.4 10.9 29.4 40.5 model
CC 3M LS RN50 DeCLUTR-Sci-base 26.3 55.9 67.5 10.1 29.6 39.8 model
CC 3M KD RN50 DeCLUTR-Sci-base 26.7 55.3 67.1 10.0 27.5 38.5 model
CC 3M OTTER RN50 DeCLUTR-Sci-base 29.1 59.6 70.9 12.0 31.8 42.1 model

Usage

First, git clone the repository

git clone https://github.com/facebookresearch/OTTER.git

Then, install required packkages using pip

conda create --name otter python=3.8
conda activate otter
pip install -r requirements.txt

Try out classifying with a pretrained OTTER or one of its baseline models.

import torch
from PIL import Image
import otter

device = "cuda" if torch.cuda.is_available() else "cpu"
temperature = 60

model, preprocess = otter.load("OTTER") # KD, LS, InfoNCE
model = model.to(device)

image = Image.open("doge.jpg")
image = preprocess(image).unsqueeze(0).to(device)
texts = ['photo of a dog', 'photo of a sofa', 'photo of a flower']

with torch.no_grad():
    features = model.forward_features(image, texts)
    image_logits, text_logits = model.compute_logits(features)
    image_logits *= temperature

    probs = image_logits.softmax(dim=-1).cpu().numpy()

print("Probs:", probs)  # Probs: [[0.92657197 0.00180788 0.07162025]]

Evaluation

You can evaluate a pretrained model with launch_scripts/eval.sh.

Note that for faster evaluation, we used FAISS for knn lookup. The result however will be slightly different from using sklearn knn functions.

Data preparation

Download the Conceptual Caption or YFCC 15M (subset of YFCC100M) dataset for training. Download Google Open Images's or ImageNet 10K's test set for evaluation.

Conceptual Captions

First, download Train-GCC-training.tsv, which contains captions and image urls, from the official CC website. Then, follow the instructions in this repo to efficiently download Conceptual Captions. After the download completes, there should be a downloaded_training_report.tsv. Make sure it's in the same cc root folder as Train-GCC-training.tsv along with the training folder that contains all the images.

Run python data/cc_preprocess.py --cc_root /data/cc to generate a processed_labels.csv, which contains paired image paths and captions. This preprocessing step filters out invalid images that can't be opened by PIL. Note that not all images in the conceptual captions dataset are available. In our case, we had 2911810 valid images from the train set of conceptual captions.

YFCC 15M

Follow the instructions in here to download the 15 million images which were used in training CLIP.

After downloading all the zip files, convert the zip files to datadings format (with compression if necessary). In data/yfcc.py, the YFCC dataset takes in the datadings folder.

Google Open Images

Download the test set of Google Open Images V6 from here. We have provided the class names and label annotations in the dataset_meta_data folder.

ImageNet 10K (from Tencent ML-Images)

You can also evaluate on the validation set of multi-labeled ImageNet 10K from Tencent ML-Images. Download the ImageNet portion of Tencent ML-Images from here. We have also included the class names and label annotations in the dataset_meta_data folder.

The datasets should be placed in the following way:

DATA_ROOT/
  cc/
    processed_labels.csv
    training/
      ... (images)
  open-images/
    test/
      ... (images)
  tencent/
    images/
      ... (images)

Single node training

You can launch training on a single node with scripts in launch_scripts.

Dataset Analysis

You can analyze the prevalence of the noisy matching problem with python3 data_analysis.py --data_root <data_root> --datasets cc --batch 512 --stop 1000. The script uses a pretrained OpenAI CLIP model to estimate the the on-diagonal vs off-diagonal matching scores of an image-caption dataset.

License

This source code is licensed under the MIT license found in the LICENSE file in the root directory of this source tree.

Owner
Meta Research
Meta Research
Fine-tune pretrained Convolutional Neural Networks with PyTorch

Fine-tune pretrained Convolutional Neural Networks with PyTorch. Features Gives access to the most popular CNN architectures pretrained on ImageNet. A

Alex Parinov 694 Nov 23, 2022
Weakly Supervised Text-to-SQL Parsing through Question Decomposition

Weakly Supervised Text-to-SQL Parsing through Question Decomposition The official repository for the paper "Weakly Supervised Text-to-SQL Parsing thro

14 Dec 19, 2022
Multi-Modal Machine Learning toolkit based on PaddlePaddle.

简体中文 | English PaddleMM 简介 飞桨多模态学习工具包 PaddleMM 旨在于提供模态联合学习和跨模态学习算法模型库,为处理图片文本等多模态数据提供高效的解决方案,助力多模态学习应用落地。 近期更新 2022.1.5 发布 PaddleMM 初始版本 v1.0 特性 丰富的任务

njustkmg 520 Dec 28, 2022
a generic C++ library for image analysis

VIGRA Computer Vision Library Copyright 1998-2013 by Ullrich Koethe This file is part of the VIGRA computer vision library. You may use,

Ullrich Koethe 378 Dec 30, 2022
Exploring Versatile Prior for Human Motion via Motion Frequency Guidance (3DV2021)

Exploring Versatile Prior for Human Motion via Motion Frequency Guidance This is the codebase for video-based human motion reconstruction in human-mot

Jiachen Xu 5 Jul 14, 2022
Code for the paper "Adversarially Regularized Autoencoders (ICML 2018)" by Zhao, Kim, Zhang, Rush and LeCun

ARAE Code for the paper "Adversarially Regularized Autoencoders (ICML 2018)" by Zhao, Kim, Zhang, Rush and LeCun https://arxiv.org/abs/1706.04223 Disc

Junbo (Jake) Zhao 399 Jan 02, 2023
The source code of CVPR17 'Generative Face Completion'.

GenerativeFaceCompletion Matcaffe implementation of our CVPR17 paper on face completion. In each panel from left to right: original face, masked input

Yijun Li 313 Oct 18, 2022
Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechanism

Period-alternatives-of-Softmax Experimental Demo for our paper 'Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechani

slwang9353 0 Sep 06, 2021
MoCoPnet - Deformable 3D Convolution for Video Super-Resolution

MoCoPnet: Exploring Local Motion and Contrast Priors for Infrared Small Target Super-Resolution Pytorch implementation of local motion and contrast pr

Xinyi Ying 28 Dec 15, 2022
Datasets, tools, and benchmarks for representation learning of code.

The CodeSearchNet challenge has been concluded We would like to thank all participants for their submissions and we hope that this challenge provided

GitHub 1.8k Dec 25, 2022
PyTorch implementation DRO: Deep Recurrent Optimizer for Structure-from-Motion

DRO: Deep Recurrent Optimizer for Structure-from-Motion This is the official PyTorch implementation code for DRO-sfm. For technical details, please re

Alibaba Cloud 56 Dec 12, 2022
Pytorch implementation for A-NeRF: Articulated Neural Radiance Fields for Learning Human Shape, Appearance, and Pose

A-NeRF: Articulated Neural Radiance Fields for Learning Human Shape, Appearance, and Pose Paper | Website | Data A-NeRF: Articulated Neural Radiance F

Shih-Yang Su 172 Dec 22, 2022
SpecAugmentPyTorch - A Pytorch (support batch and channel) implementation of GoogleBrain's SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

SpecAugment An implementation of SpecAugment for Pytorch How to use Install pytorch, version=1.9.0 (new feature (torch.Tensor.take_along_dim) is used

IMLHF 3 Oct 11, 2022
Code for the paper: Adversarial Training Against Location-Optimized Adversarial Patches. ECCV-W 2020.

Adversarial Training Against Location-Optimized Adversarial Patches arXiv | Paper | Code | Video | Slides Code for the paper: Sukrut Rao, David Stutz,

Sukrut Rao 32 Dec 13, 2022
Dynamica causal Bayesian optimisation

Dynamic Causal Bayesian Optimization This is a Python implementation of Dynamic Causal Bayesian Optimization as presented at NeurIPS 2021. Abstract Th

nd308 18 Nov 22, 2022
Multi-Template Mouse Brain MRI Atlas (MBMA): both in-vivo and ex-vivo

Multi-template MRI mouse brain atlas (both in vivo and ex vivo) Mouse Brain MRI atlas (both in-vivo and ex-vivo) (repository relocated from the origin

8 Nov 18, 2022
Implementation of the bachelor's thesis "Real-time stock predictions with deep learning and news scraping".

Real-time stock predictions with deep learning and news scraping This repository contains a partial implementation of my bachelor's thesis "Real-time

David Álvarez de la Torre 0 Feb 09, 2022
Deepface is a lightweight face recognition and facial attribute analysis (age, gender, emotion and race) framework for python

deepface Deepface is a lightweight face recognition and facial attribute analysis (age, gender, emotion and race) framework for python. It is a hybrid

Kushal Shingote 2 Feb 10, 2022
Deep Learning to Improve Breast Cancer Detection on Screening Mammography

Shield: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Deep Learning to Improve Breast

Li Shen 305 Jan 03, 2023