Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning. CVPR 2018

Last update: Oct 01, 2022

Overview

TensorFlow code and models for the paper:

Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning
Yin Cui, Yang Song, Chen Sun, Andrew Howard, Serge Belongie
CVPR 2018

This repository contains the code and pre-trained models used in the paper, plus two demos that show: 1) the importance of pre-training data for transfer learning; 2) how to calculate the domain similarity between a source domain and a target domain.

Note that we used a mini validation set (./inat_minival.txt) containing 9,697 images randomly selected from the original iNaturalist 2017 validation set. The remaining validation images were combined with the original training set to train the models in the paper, giving 665,473 training images in total (579,184 original training images plus 86,289 remaining validation images).

Dependencies:

Python (3.5)
TensorFlow (1.11)
pyemd
scikit-learn
scikit-image

Preparation:

Clone the repo recursively, so that submodules are fetched as well:

git clone --recursive https://github.com/richardaecn/cvpr18-inaturalist-transfer.git

Install dependencies. Please refer to the official TensorFlow, pyemd, scikit-learn and scikit-image websites for installation guides.
Download the data and features and unzip them into the same directory as the cloned repo. You should have two folders, './data' and './feature', in the repo's directory.

Datasets (optional):

In the paper, we used data from 9 publicly available datasets:

We provide a download link that includes the entire CUB-200-2011 dataset and the data splits for the other 8 datasets. The provided link contains sufficient data for this repo. If you would like to use the other 8 datasets, please download them from their official websites and put them in the corresponding subfolders under './data'.

Pre-trained Models (optional):

The models were trained using TensorFlow-Slim. We implemented Squeeze-and-Excitation Networks (SENet) under './slim'. The pre-trained models can be downloaded from the following links:

Network	Pre-trained Data	Input Size	Download Link
Inception-V3	ImageNet	299	link
Inception-V3	iNat2017	299	link
Inception-V3	iNat2017	448	link
Inception-V3	iNat2017	299 -> 560 FT¹	link
Inception-V3	ImageNet + iNat2017	299	link
Inception-V3 SE	ImageNet + iNat2017	299	link
Inception-V4	iNat2017	448	link
Inception-V4	iNat2017	448 -> 560 FT²	link
Inception-ResNet-V2	ImageNet + iNat2017	299	link
Inception-ResNet-V2 SE	ImageNet + iNat2017	299	link
ResNet-V2 50	ImageNet + iNat2017	299	link
ResNet-V2 101	ImageNet + iNat2017	299	link
ResNet-V2 152	ImageNet + iNat2017	299	link

¹ This model was trained with 299 input size on train + 90% val and then fine-tuned with 560 input size on 90% val.

² This model was trained with 448 input size on train + 90% val and then fine-tuned with 560 input size on 90% val.

TensorFlow Hub also provides an Inception-V3 (299 input size) pre-trained on the original iNat2017 training set.

Feature Extraction (optional):

Run the following Python script to extract features:

python feature_extraction.py

To run this script, you need to download the checkpoint of Inception-V3 299 trained on iNat2017. The dataset and pre-trained model can be modified in the script.

We provide a download link that includes the features used in the demos of this repo.

Demos

Linear logistic regression on extracted features:

This demo shows the importance of pre-training data for transfer learning. Using features extracted from an Inception-V3 pre-trained on iNat2017, we achieve 89.9% classification accuracy on CUB-200-2011 with simple logistic regression, outperforming most state-of-the-art methods.

LinearClassifierDemo.ipynb
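The idea behind this demo can be sketched in a few lines of scikit-learn. This is a minimal illustration, not the notebook's code: the helper name fit_linear_classifier, the regularization setting, and the synthetic stand-in features are all assumptions. In practice the inputs would be the features extracted from the pre-trained network.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_linear_classifier(train_x, train_y, test_x, test_y, C=1.0):
    """Fit a linear logistic regression on fixed features (hypothetical helper)."""
    clf = LogisticRegression(C=C, max_iter=1000)
    clf.fit(train_x, train_y)
    # score() returns mean classification accuracy on the held-out features.
    return clf, clf.score(test_x, test_y)


# Synthetic stand-in for extracted features: one Gaussian cluster per class.
rng = np.random.default_rng(0)
num_classes, dim = 5, 32
centers = rng.normal(scale=5.0, size=(num_classes, dim))


def sample(per_class):
    labels = np.repeat(np.arange(num_classes), per_class)
    feats = centers[labels] + rng.normal(size=(labels.size, dim))
    return feats, labels


train_x, train_y = sample(40)
test_x, test_y = sample(10)
clf, acc = fit_linear_classifier(train_x, train_y, test_x, test_y)
print("test accuracy: {:.3f}".format(acc))
```

Because the network is frozen and only the linear layer is trained, the accuracy of such a classifier reflects how well the pre-training data suits the target dataset.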

Calculating domain similarity by Earth Mover's Distance (EMD):

This demo gives an example of calculating the domain similarity proposed in the paper. The results correspond to part of Fig. 5 in the original paper.

DomainSimilarityDemo.ipynb
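As a rough sketch of the computation, following our reading of the paper's definition: each class is represented by its mean feature, weighted by its share of the domain's images, the cost between two classes is the Euclidean distance between their mean features, and the similarity is exp(-gamma * EMD). The code below is not the notebook's code. It solves the transport problem with scipy.optimize.linprog instead of pyemd, and the function names, the toy features, and the default gamma=0.01 are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def emd(w_src, w_tgt, cost):
    """Earth Mover's Distance between two weight vectors that each sum to 1."""
    m, n = cost.shape
    # One equality constraint per source class (row sums of the flow matrix)
    # and one per target class (column sums), over the flattened m*n flows.
    a_eq = np.zeros((m + n, m * n))
    for i in range(m):
        a_eq[i, i * n:(i + 1) * n] = 1.0
    for j in range(n):
        a_eq[m + j, j::n] = 1.0
    b_eq = np.concatenate([w_src, w_tgt])
    res = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun  # total flow is 1, so no further normalization is needed


def domain_similarity(src_means, src_counts, tgt_means, tgt_counts, gamma=0.01):
    """exp(-gamma * EMD) over per-class mean features (hypothetical helper)."""
    src_means = np.asarray(src_means, dtype=float)
    tgt_means = np.asarray(tgt_means, dtype=float)
    w_src = np.asarray(src_counts, dtype=float)
    w_tgt = np.asarray(tgt_counts, dtype=float)
    w_src /= w_src.sum()
    w_tgt /= w_tgt.sum()
    # Pairwise Euclidean distances between class-mean features.
    cost = np.linalg.norm(src_means[:, None, :] - tgt_means[None, :, :], axis=2)
    return float(np.exp(-gamma * emd(w_src, w_tgt, cost)))


# Toy example: two source classes, one target class.
sim = domain_similarity([[0.0, 0.0], [3.0, 4.0]], [1, 1], [[0.0, 0.0]], [2])
sim_same = domain_similarity([[0.0, 0.0], [3.0, 4.0]], [1, 1],
                             [[0.0, 0.0], [3.0, 4.0]], [1, 1])
print(sim, sim_same)
```

In the toy example, half of the source mass has to move a distance of 5, so the EMD is 2.5, while comparing a domain with itself gives a distance of 0 and a similarity of 1.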

Training and Evaluation

Convert dataset into '.tfrecord':

python convert_dataset.py --dataset_name=cub_200 --num_shards=10

Train (fine-tune) the model on 1 GPU:

CUDA_VISIBLE_DEVICES=0 ./train.sh

Evaluate the model on another GPU simultaneously:

CUDA_VISIBLE_DEVICES=1 ./eval.sh

Run TensorBoard for visualization:

tensorboard --logdir=./checkpoints/cub_200/ --port=6006

Citation

If you find our work helpful in your research, please cite it as:

@inproceedings{Cui2018iNatTransfer,
  title = {Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning},
  author = {Cui, Yin and Song, Yang and Sun, Chen and Howard, Andrew and Belongie, Serge},
  booktitle={CVPR},
  year={2018}
}
