CLOOB: Modern Hopfield Networks with InfoLOOB Outperform CLIP

Related tags

Deep Learningcloob
Overview

CLOOB: Modern Hopfield Networks with InfoLOOB Outperform CLIP

Andreas Fürst* 1, Elisabeth Rumetshofer* 1, Viet Tran1, Hubert Ramsauer1, Fei Tang3, Johannes Lehner1, David Kreil2, Michael Kopp2, Günter Klambauer1, Angela Bitto-Nemling1, Sepp Hochreiter1 2

1 ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria
2 Institute of Advanced Research in Artificial Intelligence (IARAI)
3 HERE Technologies
* Equal contribution


Detailed blog post on this paper at this link.

The full paper is available here.


Implementation of CLOOB

This repository contains the implemenation of CLOOB used to obtain the results reported in the paper. The implementation is based on OpenCLIP, an open source implementation of OpenAI's CLIP.

Setup

We provide an 'environment.yml' file to set up a conda environment with all required packages. Run the following command to clone the repository and create the environment.

# Clone repository and swtich into the directory
git clone https://github.com/ml-jku/cloob
cd cloob

# Create the environment and activate it
conda env create --file environment.yml
conda activate cloob

# Additionally, webdataset needs to be installed from git repo for pre-training on YFCC 
pip install git+https://github.com/tmbdev/webdataset.git

# Add the directory to the PYTHONPATH environment variable
export PYTHONPATH="$PYTHONPATH:$PWD/src"

Data

For pre-training we use the two datasets supported by OpenCLIP, namely Conceptual Captions and YFCC.

Conceptual Captions

OpenCLIP already provides a script to download and prepare the Conceptual Captions dataset, which contains 2.89M training images and 13k validation images. First, download the Conceptual Captions URLs and then run the script gather_cc.py.

python3 src/data/gather_cc.py path/to/Train_GCC-training.tsv path/to/Validation_GCC-1.1.0-Validation.tsv

YFCC

We use the same subset of ~15M images from the YFCC100M dataset as CLIP. They provide a list of (line number, photo identifier, photo hash) of each image contained in this subset here.

For more information see YFCC100m Subset on OpenAI's github.

Downstream Tasks

In the paper we report results on several downstream tasks. Except for ImageNet we provide links to already pre-processed versions (where necessary) of the respective test set.

Dataset Description Official Processed
Birdsnap This dataset contains images of North American bird species, however
our dataset is smaller than reported in CLIP as some samples are no longer available.
Link Link
Country211 This dataset was published in CLIP and is a small subset of the YFCC100m dataset.
It consists of photos that can be assigned to 211 countries via GPS coordinates.
For each country 200 photos are sampled for the training set and 100 for testing.
Link Link
Flowers102 Images of 102 flower categories commonly occuring in the United Kingdom were collected.
Several classes are very similar and there is a large variation in scale, pose and lighting.
Link Link
GTSRB This dataset was released for a challenge held at the IJCNN 2011.
The dataset contains images of german traffic signs from more than 40 classes.
Link Link
Stanford Cars This dataset contains images of 196 car models at the level of make,
model and year (e.g. Tesla Model S Sedan 2012).
Link Link
UCF101 The dataset has been created by extracting the middle frame from each video. Link Link
ImageNet This dataset spans 1000 object classes and contains 1,281,167 training images,
50,000 validation images and 100,000 test images.
Link -
ImageNet v2 The ImageNetV2 dataset contains new test data for the ImageNet benchmark. Link -

Usage

In the following there is an example command for pretraining on CC with an effective batch size of 512 when used on 4 GPUs.

/conceptual_captions/Train-GCC-training_output.csv" \ --val-data=" /conceptual_captions/Validation_GCC-1.1.0-Validation_output.csv" \ --path-data=" /conceptual_captions" \ --imagenet-val=" /imagenet/val" \ --warmup 20000 \ --batch-size=128 \ --lr=1e-3 \ --wd=0.1 \ --lr-scheduler="cosine-restarts" \ --restart-cycles=10 \ --epochs=70 \ --method="cloob" \ --init-inv-tau=30 \ --init-scale-hopfield=8 \ --workers=8 \ --model="RN50" \ --dist-url="tcp://127.0.0.1:6100" \ --batch-size-eval=512 ">
python -u src/training/main.py \
--train-data="
       
        /conceptual_captions/Train-GCC-training_output.csv
        "
        \
--val-data="
       
        /conceptual_captions/Validation_GCC-1.1.0-Validation_output.csv
        "
        \
--path-data="
       
        /conceptual_captions
        "
        \
--imagenet-val="
       
        /imagenet/val
        "
        \
--warmup 20000 \
--batch-size=128 \
--lr=1e-3 \
--wd=0.1 \
--lr-scheduler="cosine-restarts" \
--restart-cycles=10 \
--epochs=70 \
--method="cloob" \
--init-inv-tau=30 \
--init-scale-hopfield=8 \
--workers=8 \
--model="RN50" \
--dist-url="tcp://127.0.0.1:6100" \
--batch-size-eval=512

Zeroshot evaluation of downstream tasks

We provide a Jupyter notebook to perform zeroshot evaluation with a trained model.

LICENSE

MIT LICENSE

Owner
Institute for Machine Learning, Johannes Kepler University Linz
Software of the Institute for Machine Learning, JKU Linz
Institute for Machine Learning, Johannes Kepler University Linz
potpourri3d - An invigorating blend of 3D geometry tools in Python.

A Python library of various algorithms and utilities for 3D triangle meshes and point clouds. Managed by Nicholas Sharp, with new tools added lazily as needed. Currently, mainly bindings to C++ tools

Nicholas Sharp 295 Jan 05, 2023
existing and custom freqtrade strategies supporting the new hyperstrategy format.

freqtrade-strategies Description Existing and self-developed strategies, rewritten to support the new HyperStrategy format from the freqtrade-develop

39 Aug 20, 2021
MultiTaskLearning - Multi Task Learning for 3D segmentation

Multi Task Learning for 3D segmentation Perception stack of an Autonomous Drivin

2 Sep 22, 2022
Feature board for ERPNext

ERPNext Feature Board Feature board for ERPNext Development Prerequisites k3d kubectl helm bench Install K3d Cluster # export K3D_FIX_CGROUPV2=1 # use

Revant Nandgaonkar 16 Nov 09, 2022
Fuzzing tool (TFuzz): a fuzzing tool based on program transformation

T-Fuzz T-Fuzz consists of 2 components: Fuzzing tool (TFuzz): a fuzzing tool based on program transformation Crash Analyzer (CrashAnalyzer): a tool th

HexHive 244 Nov 09, 2022
Tensor-Based Quantum Machine Learning

TensorLy_Quantum TensorLy-Quantum is a Python library for Tensor-Based Quantum Machine Learning that builds on top of TensorLy and PyTorch. Website: h

TensorLy 85 Dec 03, 2022
The codes for the work "Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation"

Swin-Unet The codes for the work "Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation"(https://arxiv.org/abs/2105.05537). A validatio

869 Jan 07, 2023
PyTorch implementation of Tacotron speech synthesis model.

tacotron_pytorch PyTorch implementation of Tacotron speech synthesis model. Inspired from keithito/tacotron. Currently not as much good speech quality

Ryuichi Yamamoto 279 Dec 09, 2022
Machine Learning automation and tracking

The Open-Source MLOps Orchestration Framework MLRun is an open-source MLOps framework that offers an integrative approach to managing your machine-lea

873 Jan 04, 2023
PyTorch implementation of paper: HPNet: Deep Primitive Segmentation Using Hybrid Representations.

HPNet This repository contains the PyTorch implementation of paper: HPNet: Deep Primitive Segmentation Using Hybrid Representations. Installation The

Siming Yan 42 Dec 07, 2022
This a classic fintech problem that introduces real life difficulties such as data imbalance. Check out the notebook to find out more!

Credit Card Fraud Detection Introduction Online transactions have become a crucial part of any business over the years. Many of those transactions use

Jonathan Hasbani 0 Jan 20, 2022
Official Pytorch implementation of Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations

Scene Representation Networks This is the official implementation of the NeurIPS submission "Scene Representation Networks: Continuous 3D-Structure-Aw

Vincent Sitzmann 365 Jan 06, 2023
Unified unsupervised and semi-supervised domain adaptation network for cross-scenario face anti-spoofing, Pattern Recognition

USDAN The implementation of Unified unsupervised and semi-supervised domain adaptation network for cross-scenario face anti-spoofing, which is accepte

11 Nov 03, 2022
Graph Attention Networks

GAT Graph Attention Networks (Veličković et al., ICLR 2018): https://arxiv.org/abs/1710.10903 GAT layer t-SNE + Attention coefficients on Cora Overvie

Petar Veličković 2.6k Jan 05, 2023
A library for differentiable nonlinear optimization.

Theseus A library for differentiable nonlinear optimization built on PyTorch to support constructing various problems in robotics and vision as end-to

Meta Research 1.1k Dec 30, 2022
A Gura parser implementation for Python

Gura Python parser This repository contains the implementation of a Gura (compliant with version 1.0.0) format parser in Python. Installation pip inst

Gura Config Lang 19 Jan 25, 2022
This project aims at providing a concise, easy-to-use, modifiable reference implementation for semantic segmentation models using PyTorch.

Semantic Segmentation on PyTorch (include FCN, PSPNet, Deeplabv3, Deeplabv3+, DANet, DenseASPP, BiSeNet, EncNet, DUNet, ICNet, ENet, OCNet, CCNet, PSANet, CGNet, ESPNet, LEDNet, DFANet)

2.4k Jan 08, 2023
This is the official implementation of VaxNeRF (Voxel-Accelearated NeRF).

VaxNeRF Paper | Google Colab This is the official implementation of VaxNeRF (Voxel-Accelearated NeRF). This codebase is implemented using JAX, buildin

naruya 132 Nov 21, 2022
Vehicle speed detection with python

Vehicle-speed-detection In the project simulate the tracker.py first then simulate the SpeedDetector.py. Finally, a new window pops up and the output

3 Dec 15, 2022
Code and data form the paper BERT Got a Date: Introducing Transformers to Temporal Tagging

BERT Got a Date: Introducing Transformers to Temporal Tagging Satya Almasian*, Dennis Aumiller*, and Michael Gertz Heidelberg University Contact us vi

54 Dec 04, 2022