Official codebase used to develop Vision Transformer, MLP-Mixer, LiT and more.

Overview

Big Vision

This codebase is designed for training large-scale vision models on Cloud TPU VMs. It is based on Jax/Flax libraries, and uses tf.data and TensorFlow Datasets for scalable input pipelines in the Cloud.

The open-sourcing of this codebase has two main purposes:

  1. Publishing the code of research projects developed in this codebase (see a list below).
  2. Providing a strong starting point for running large-scale vision experiments on Google Cloud TPUs, which should scale seamlessly and out-of-the box from a single TPU core to a distributed setup with up to 2048 TPU cores.

Note, that despite being TPU-centric, our codebase should in general support CPU, GPU and single-host multi-GPU training, thanks to JAX' well-executed and transparent support for multiple backends.

big_vision aims to support research projects at Google. We are unlikely to work on feature requests or accept external contributions, unless they were pre-approved (ask in an issue first). For a well-supported transfer-only codebase, see also vision_transformer.

The following research projects were originally conducted in the big_vision codebase:

Architecture research

Multimodal research

Knowledge distillation

Misc

  • Are we done with ImageNet?, by Lucas Beyer*, Olivier J. Hénaff*, Alexander Kolesnikov*, Xiaohua Zhai*, and Aäron van den Oord*

Codebase high-level organization and principles in a nutshell

The main entry point is a trainer module, which typically does all the boilerplate related to creating a model and an optimizer, loading the data, checkpointing and training/evaluating the model inside a loop. We provide the canonical trainer train.py in the root folder. Normally, individual projects within big_vision fork and customize this trainer.

All models, evaluators and preprocessing operations live in the corresponding subdirectories and can often be reused between different projects. We encourage compatible APIs within these directories to facilitate reusability, but it is not strictly enforced, as individual projects may need to introduce their custom APIs.

We have a powerful configuration system, with the configs living in the configs/ directory. Custom trainers and modules can seamlessly extend/modify the configuration options.

Training jobs are robust to interruptions and will resume seamlessly from the last saved checkpoint (assuming user provides the correct --workdir path).

Each configuration file contains a comment at the top with a COMMAND snippet to run it, and some hint of expected runtime and results. See below for more details, but generally speaking, running on a GPU machine involves calling python -m COMMAND while running on TPUs, including multi-host, involves

gcloud alpha compute tpus tpu-vm ssh $NAME --zone=$ZONE --worker=all
  --command "bash big_vision/run_tpu.sh COMMAND"

See instructions below for more details on how to use Google Cloud TPUs.

Current and future contents

The first release contains the core part of pre-training, transferring, and evaluating classification models at scale on Cloud TPU VMs.

Features and projects we plan to release in the near future, in no particular order:

  • ImageNet-21k in TFDS.
  • MLP-Mixer.
  • Loading misc public models used in our publications (NFNet, MoCov3, DINO).
  • Contrastive Image-Text model training and evaluation as in LiT and CLIP.
  • "Patient and consistent" distillation.
  • Memory-efficient Polyak-averaging implementation.
  • Advanced JAX compute and memory profiling. We are using internal tools for this, but may eventually add support for the publicly available ones.

We will continue releasing code of our future publications developed within big_vision here.

Non-content

The following exist in the internal variant of this codebase, and there is no plan for their release:

  • Regular regression tests for both quality and speed. They rely heavily on internal infrastructure.
  • Advanced logging, monitoring, and plotting of experiments. This also relies heavily on internal infrastructure. However, we are open to ideas on this and may add some in the future, especially if implemented in a self-contained manner.
  • Not yet published, ongoing research projects.

Running on Cloud TPU VMs

Create TPU VMs

To create a single machine with 8 TPU cores, follow the following Cloud TPU JAX document: https://cloud.google.com/tpu/docs/run-calculation-jax

To support large-scale vision research, more cores with multiple hosts are recommended. Below we provide instructions on how to do it.

First, create some useful variables, which we be reused:

export NAME="a name of the TPU deployment, e.g. my-tpu-machine"
export ZONE="GCP geographical zone, e.g. europe-west4-a"
export GS_BUCKET_NAME="Name of the storage bucket, e.g. my_bucket"

The following command line will create TPU VMs with 32 cores, 4 hosts.

gcloud alpha compute tpus tpu-vm create $NAME --zone $ZONE --accelerator-type v3-32 --version tpu-vm-tf-2.8.0

Install big_vision on TPU VMs

Fetch the big_vision repository, copy it to all TPU VM hosts, and install dependencies.

git clone --branch=master https://github.com/google-research/big_vision
gcloud alpha compute tpus tpu-vm scp --recurse big_vision/big_vision $NAME: --worker=all --zone=$ZONE
gcloud alpha compute tpus tpu-vm ssh $NAME --zone=$ZONE --worker=all --command "bash big_vision/run_tpu.sh"

Download and prepare TFDS datasets

Everything in this section you need to do only once, and, alternatively, you can also do it on your local machine and copy the result to the cloud bucket. For convenience, we provide instructions on how to prepare data using Cloud TPUs.

Download and prepare TFDS datasets using a single worker. Seven TFDS datasets used during evaluations will be generated under ~/tensorflow_datasets/ (should take 10-15 minutes in total).

gcloud alpha compute tpus tpu-vm ssh $NAME --zone=$ZONE --worker=0 --command "bash big_vision/run_tpu.sh big_vision.tools.download_tfds_datasets cifar10 cifar100 oxford_iiit_pet oxford_flowers102 cars196 dtd uc_merced"

Copy the datasets to GS bucket, to make them accessible to all TPU workers.

gcloud alpha compute tpus tpu-vm ssh $NAME --zone=$ZONE --worker=0 --command "rm -r ~/tensorflow_datasets/downloads && gsutil cp -r ~/tensorflow_datasets gs://$GS_BUCKET_NAME"

If you want to integrate other public or custom datasets, i.e. imagenet2012, please follow the official guideline.

Pre-trained models

For the full list of pre-trained models check out the load function defined in the same module as the model code. And for example config on how to use these models, see configs/transfer.py.

Run the transfer script on TPU VMs

The following command line fine-tunes a pre-trained vit-i21k-augreg-b/32 model on cifar10 dataset.

gcloud alpha compute tpus tpu-vm ssh $NAME --zone=$ZONE --worker=all --command "TFDS_DATA_DIR=gs://$GS_BUCKET_NAME/tensorflow_datasets bash big_vision/run_tpu.sh big_vision.train --config big_vision/configs/transfer.py:model=vit-i21k-augreg-b/32,dataset=cifar10,crop=resmall_crop --workdir gs://$GS_BUCKET_NAME/big_vision/workdir/`date '+%m-%d_%H%M'` --config.lr=0.03"

Run the train script on TPU VMs

To train your own big_vision models on a large dataset, e.g. imagenet2012 (prepare the TFDS dataset), run the following command line.

gcloud alpha compute tpus tpu-vm ssh $NAME --zone=$ZONE --worker=all --command "TFDS_DATA_DIR=gs://$GS_BUCKET_NAME/tensorflow_datasets bash big_vision/run_tpu.sh big_vision.train --config big_vision/configs/bit_i1k.py  --workdir gs://$GS_BUCKET_NAME/big_vision/workdir/`date '+%m-%d_%H%M'`"

ViT baseline

We provide a well-tuned ViT-S/16 baseline in the config file named vit_s16_i1k.py. It achieves 76.5% accuracy on ImageNet validation split in 90 epochs of training, being a strong and simple starting point for research on the ViT models.

Please see our arXiv note for more details and if this baseline happens to by useful for your research, consider citing

@article{vit_baseline,
  url = {https://arxiv.org/abs/2205.01580},
  author = {Beyer, Lucas and Zhai, Xiaohua and Kolesnikov, Alexander},
  title = {Better plain ViT baselines for ImageNet-1k},
  journal={arXiv preprint arXiv:2205.01580},
  year = {2022},
}

Citing the codebase

If you found this codebase useful for your research, please consider using the following BibTEX to cite it:

@misc{big_vision,
  author = {Beyer, Lucas and Zhai, Xiaohua and Kolesnikov, Alexander},
  title = {Big Vision},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/google-research/big_vision}}
}

Disclaimer

This is not an official Google Product.

Owner
Google Research
Google Research
Introduction to AI assignment 1 HCM University of Technology, term 211

Sokoban Bot Introduction to AI assignment 1 HCM University of Technology, term 211 Abstract This is basically a solver for Sokoban game using Breadth-

Quang Minh 4 Dec 12, 2022
Repository for "Toward Practical Monocular Indoor Depth Estimation" (CVPR 2022)

Toward Practical Monocular Indoor Depth Estimation Cho-Ying Wu, Jialiang Wang, Michael Hall, Ulrich Neumann, Shuochen Su [arXiv] [project site] DistDe

Meta Research 122 Dec 13, 2022
Pytorch implementation of paper Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data

Pytorch implementation of paper Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data

Hrishikesh Kamath 31 Nov 20, 2022
RANZCR-CLiP 7th Place Solution

RANZCR-CLiP 7th Place Solution This repository is WIP. (18 Mar 2021) Installation git clone https://github.com/analokmaus/kaggle-ranzcr-clip-public.gi

Hiroshechka Y 21 Oct 22, 2022
Anti-UAV base on PaddleDetection

Paddle-Anti-UAV Anti-UAV base on PaddleDetection Background UAVs are very popular and we can see them in many public spaces, such as parks and playgro

Qingzhong Wang 2 Apr 20, 2022
Yolact-keras实例分割模型在keras当中的实现

Yolact-keras实例分割模型在keras当中的实现 目录 性能情况 Performance 所需环境 Environment 文件下载 Download 训练步骤 How2train 预测步骤 How2predict 评估步骤 How2eval 参考资料 Reference 性能情况 训练数

Bubbliiiing 11 Dec 26, 2022
A unified framework for machine learning with time series

Welcome to sktime A unified framework for machine learning with time series We provide specialized time series algorithms and scikit-learn compatible

The Alan Turing Institute 6k Jan 08, 2023
[ICML 2020] Prediction-Guided Multi-Objective Reinforcement Learning for Continuous Robot Control

PG-MORL This repository contains the implementation for the paper Prediction-Guided Multi-Objective Reinforcement Learning for Continuous Robot Contro

MIT Graphics Group 65 Jan 07, 2023
AMTML-KD: Adaptive Multi-teacher Multi-level Knowledge Distillation

AMTML-KD: Adaptive Multi-teacher Multi-level Knowledge Distillation

Frank Liu 26 Oct 13, 2022
DEEPAGÉ: Answering Questions in Portuguese about the Brazilian Environment

DEEPAGÉ: Answering Questions in Portuguese about the Brazilian Environment This repository is related to the paper DEEPAGÉ: Answering Questions in Por

0 Dec 10, 2021
An 16kHz implementation of HiFi-GAN for soft-vc.

HiFi-GAN An 16kHz implementation of HiFi-GAN for soft-vc. Relevant links: Official HiFi-GAN repo HiFi-GAN paper Soft-VC repo Soft-VC paper Example Usa

Benjamin van Niekerk 42 Dec 27, 2022
A Framework for Encrypted Machine Learning in TensorFlow

TF Encrypted is a framework for encrypted machine learning in TensorFlow. It looks and feels like TensorFlow, taking advantage of the ease-of-use of t

TF Encrypted 0 Jul 06, 2022
Flaxformer: transformer architectures in JAX/Flax

Flaxformer is a transformer library for primarily NLP and multimodal research at Google.

Google 116 Jan 05, 2023
Caffe-like explicit model constructor. C(onfig)Model

cmodel Caffe-like explicit model constructor. C(onfig)Model Installation pip install git+https://github.com/bonlime/cmodel Usage In order to allow usi

1 Feb 18, 2022
A public available dataset for road boundary detection in aerial images

Topo-boundary This is the official github repo of paper Topo-boundary: A Benchmark Dataset on Topological Road-boundary Detection Using Aerial Images

Zhenhua Xu 79 Jan 04, 2023
ICS 4u HD project, start before-wards. A curtain shooting game using python.

Touhou-Star-Salvation HDCH ICS 4u HD project, start before-wards. A curtain shooting game using python and pygame. By Jason Li For arts and gameplay,

15 Dec 22, 2022
Implementation of Gans

GAN Generative Adverserial Networks are an approach to generative data modelling using Deep learning methods. I have currently implemented : DCGAN on

Sibam Parida 5 Sep 07, 2021
Vrcwatch - Supply the local time to VRChat as Avatar Parameters through OSC

English: README-EN.md VRCWatch VRCWatch は、VRChat 内のアバター向けに現在時刻を送信するためのプログラムです。 使

Kosaki Mezumona 17 Nov 30, 2022
you can add any codes in any language by creating its respective folder (if already not available).

HACKTOBERFEST-2021-WEB-DEV Beginner-Hacktoberfest Need Your first pr for hacktoberfest 2k21 ? come on in About This is repository of Responsive Portfo

Suman Sharma 8 Oct 17, 2022
A High-Level Fusion Scheme for Circular Quantities published at the 20th International Conference on Advanced Robotics

Monte Carlo Simulation to the Paper A High-Level Fusion Scheme for Circular Quantities published at the 20th International Conference on Advanced Robotics

Sören Kohnert 0 Dec 06, 2021