Code for Efficient Visual Pretraining with Contrastive Detection

Related tags

Deep Learningdetcon
Overview

Code for DetCon

This repository contains code for the ICCV 2021 paper "Efficient Visual Pretraining with Contrastive Detection" by Olivier J. Hénaff, Skanda Koppula, Jean-Baptiste Alayrac, Aaron van den Oord, Oriol Vinyals, João Carreira.

This repository includes sample code to run pretraining with DetCon. In particular, we're providing a sample script for generating the Felzenzwalb segmentations for ImageNet images (using skimage) and a pre-training experiment setup (dataloader, augmentation pipeline, optimization config, and loss definition) that describes the DetCon-B(YOL) model described in the paper. The original code uses a large grid of TPUs and internal infrastructure for training, but we've extracted the key DetCon loss+experiment in this folder so that external groups can have a reference should they want to explore a similar approaches.

This repository builds heavily from the BYOL open source release, so speed-up tricks and features in that setup may likely translate to the code here.

Running this code

Running ./setup.sh will create and activate a virtualenv and install all necessary dependencies. To enter the environment after running setup.sh, run source /tmp/detcon_venv/bin/activate.

Running bash test.sh will run a single training step on a mock image/Felzenszwalb mask as a simple validation that all dependencies are set up correctly and the DetCon pre-training can run smoothly. On our 16-core machine, running on CPU, we find this takes around 3-4 minutes.

A TFRecord dataset containing each ImageNet image, label, and its corresponding Felzenszwalb segmentation/mask can then be generated using the generate_fh_masks Python script. You will first have to download two pieces of ImageNet metadata into the same directory as the script:

wget https://raw.githubusercontent.com/tensorflow/models/master/research/slim/datasets/imagenet_metadata.txt wget https://raw.githubusercontent.com/tensorflow/models/master/research/slim/datasets/imagenet_lsvrc_2015_synsets.txt

And to run the multi-threaded mask generation script:

python generate_fh_masks_for_imagenet.py -- \
--train_directory=imagenet-train \
--output_directory=imagenet-train-fh

This single-machine, multi-threaded version of the mask generation script takes 2-3 days on a 16-core CPU machine to complete CPU-based processing of the ImageNet training and validation set. The script assumes the same ImageNet directory structure as github.com/tensorflow/models/blob/master/research/slim/datasets/build_imagenet_data.py (more details in the link).

You can then run the main training loop and execute multiple DetCon-B training steps by running from the parent directory the command:

python -m detcon.main_loop \
  --dataset_directory='/tmp/imagenet-fh-train' \
  --pretrain_epochs=100`

Note that you will need to update the dataset_directory flag, to point to the generated Felzenzwalb/image TFRecord dataset previously generated. Additionally, to use accelerators, users will need to install the correct version of jaxlib with CUDA support.

Citing this work

If you use this code in your work, please consider referencing our work:

@article{henaff2021efficient,
  title={{Efficient Visual Pretraining with Contrastive Detection}},
  author={H{\'e}naff, Olivier J and Koppula, Skanda and Alayrac, Jean-Baptiste and Oord, Aaron van den and Vinyals, Oriol and Carreira, Jo{\~a}o},
  journal={International Conference on Computer Vision},
  year={2021}
}

Disclaimer

This is not an officially supported Google product.

Owner
DeepMind
DeepMind
Official Pytorch implementation for video neural representation (NeRV)

NeRV: Neural Representations for Videos (NeurIPS 2021) Project Page | Paper | UVG Data Hao Chen, Bo He, Hanyu Wang, Yixuan Ren, Ser-Nam Lim, Abhinav S

hao 214 Dec 28, 2022
AI Toolkit for Healthcare Imaging

Medical Open Network for AI MONAI is a PyTorch-based, open-source framework for deep learning in healthcare imaging, part of PyTorch Ecosystem. Its am

Project MONAI 3.7k Jan 07, 2023
Official code repository for the EMNLP 2021 paper

Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization PyTorch code for the EMNLP 2021 paper "Integrating Visuospatia

Adyasha Maharana 23 Dec 19, 2022
Continuum Learning with GEM: Gradient Episodic Memory

Gradient Episodic Memory for Continual Learning Source code for the paper: @inproceedings{GradientEpisodicMemory, title={Gradient Episodic Memory

Facebook Research 360 Dec 27, 2022
Net2net - Network-to-Network Translation with Conditional Invertible Neural Networks

Net2Net Code accompanying the NeurIPS 2020 oral paper Network-to-Network Translation with Conditional Invertible Neural Networks Robin Rombach*, Patri

CompVis Heidelberg 206 Dec 20, 2022
Reference models and tools for Cloud TPUs.

Cloud TPUs This repository is a collection of reference models and tools used with Cloud TPUs. The fastest way to get started training a model on a Cl

5k Jan 05, 2023
Learning trajectory representations using self-supervision and programmatic supervision.

Trajectory Embedding for Behavior Analysis (TREBA) Implementation from the paper: Jennifer J. Sun, Ann Kennedy, Eric Zhan, David J. Anderson, Yisong Y

58 Jan 06, 2023
Implementation of ConvMixer-Patches Are All You Need? in TensorFlow and Keras

Patches Are All You Need? - ConvMixer ConvMixer, an extremely simple model that is similar in spirit to the ViT and the even-more-basic MLP-Mixer in t

Sayan Nath 8 Oct 03, 2022
A Real-World Benchmark for Reinforcement Learning based Recommender System

RL4RS: A Real-World Benchmark for Reinforcement Learning based Recommender System RL4RS is a real-world deep reinforcement learning recommender system

121 Dec 01, 2022
Code for Fold2Seq paper from ICML 2021

[ICML2021] Fold2Seq: A Joint Sequence(1D)-Fold(3D) Embedding-based Generative Model for Protein Design Environment file: environment.yml Data and Feat

International Business Machines 43 Dec 04, 2022
Code accompanying paper: Meta-Learning to Improve Pre-Training

Meta-Learning to Improve Pre-Training This folder contains code to run experiments in the paper Meta-Learning to Improve Pre-Training, NeurIPS 2021. P

28 Dec 31, 2022
This repo holds codes of the ICCV21 paper: Visual Alignment Constraint for Continuous Sign Language Recognition.

VAC_CSLR This repo holds codes of the paper: Visual Alignment Constraint for Continuous Sign Language Recognition.(ICCV 2021) [paper] Prerequisites Th

Yuecong Min 64 Dec 19, 2022
Code for Recurrent Mask Refinement for Few-Shot Medical Image Segmentation (ICCV 2021).

Recurrent Mask Refinement for Few-Shot Medical Image Segmentation Steps Install any missing packages using pip or conda Preprocess each dataset using

XIE LAB @ UCI 39 Dec 08, 2022
A curated list of the top 10 computer vision papers in 2021 with video demos, articles, code and paper reference.

The Top 10 Computer Vision Papers of 2021 The top 10 computer vision papers in 2021 with video demos, articles, code, and paper reference. While the w

Louis-François Bouchard 118 Dec 21, 2022
CAMoE + Dual SoftMax Loss (DSL): Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss

CAMoE + Dual SoftMax Loss (DSL): Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss This is official implement of "

程星 87 Dec 24, 2022
Entity-Based Knowledge Conflicts in Question Answering.

Entity-Based Knowledge Conflicts in Question Answering Run Instructions | Paper | Citation | License This repository provides the Substitution Framewo

Apple 35 Oct 19, 2022
PyTorch source code for Distilling Knowledge by Mimicking Features

LSHFM.detection This is the PyTorch source code for Distilling Knowledge by Mimicking Features. And this project contains code for object detection wi

Guo-Hua Wang 4 Dec 17, 2022
Codes for building and training the neural network model described in Domain-informed neural networks for interaction localization within astroparticle experiments.

Domain-informed Neural Networks Codes for building and training the neural network model described in Domain-informed neural networks for interaction

DIDACTS 0 Dec 13, 2021