DeepLab-ResNet rebuilt in TensorFlow

Overview

DeepLab-ResNet-TensorFlow

Build Status

This is an (re-)implementation of DeepLab-ResNet in TensorFlow for semantic image segmentation on the PASCAL VOC dataset.

Frequently Asked Questions

If you encounter some problems and would like to create an issue, please read this first. If the guide below does not cover your question, please use search to see if a similar issue has already been solved before. Finally, if you are unable to find an answer, please fill in the issue with details of your problem provided.

Which python version should I use?

All the experiments are been done using python2.7. python3 will likely require some minor modifications.

After training, I have multiple files that look like model.ckpt-xxxx.index, model.ckpt-xxxx.dataxxxx and model.ckpt-xxxx.meta. Which one of them should I use to restore the model for inference?

Instead of providing a path to one of those files, you must provide just model.ckpt-xxxx. It will fetch other files.

My model is not learning anything. What should I do?

First, check that your images are being read correctly. The setup implies that segmentation masks are saved without a colour map, i.e., each pixel contains a class index, not an RGB value. Second, tune your hyperparameters. As there are no general strategies that work for each case, the design of this procedure is up to you.

I want to use my own dataset. What should I do?

Please refer to this topic.

Updates

29 Jan, 2017:

  • Fixed the implementation of the batch normalisation layer: it now supports both the training and inference steps. If the flag --is-training is provided, the running means and variances will be updated; otherwise, they will be kept intact. The .ckpt files have been updated accordingly - to download please refer to the new link provided below.
  • Image summaries during the training process can now be seen using TensorBoard.
  • Fixed the evaluation procedure: the 'void' label (255) is now correctly ignored. As a result, the performance score on the validation set has increased to 80.1%.

11 Feb, 2017:

  • The training script train.py has been re-written following the original optimisation setup: SGD with momentum, weight decay, learning rate with polynomial decay, different learning rates for different layers, ignoring the 'void' label (255).
  • The training script with multi-scale inputs train_msc.py has been added: the input is resized to 0.5 and 0.75 of the original resolution, and 4 losses are aggregated: loss on the original resolution, on the 0.75 resolution, on the 0.5 resolution, and loss on the all fused outputs.
  • Evaluation of a single-scale converted pre-trained model on the PASCAL VOC validation dataset (using 'SegmentationClassAug') leads to 86.9% mIoU (as trainval was likely to be used for final training). This is confirmed by the official PASCAL VOC server. The score on the test dataset is 75.8%.

22 Feb, 2017:

  • The training script with multi-scale inputs train_msc.py now supports gradients accumulation: the relevant parameter --grad-update-every effectively mimics the behaviour of iter_size of Caffe. This allows to use batches of bigger sizes with less GPU memory being consumed. (Thanks to @arslan-chaudhry for this contribution!)
  • The random mirror and random crop options have been added. (Again big thanks to @arslan-chaudhry !)

23 Apr, 2017:

  • TensorFlow 1.1.0 is now supported.
  • Three new flags --num-classes, --ignore-label and --not-restore-last are added to ease the usability of the scripts on new datasets. Check out these instructions on how to set up the training process on your dataset.

Model Description

The DeepLab-ResNet is built on a fully convolutional variant of ResNet-101 with atrous (dilated) convolutions, atrous spatial pyramid pooling, and multi-scale inputs (not implemented here).

The model is trained on a mini-batch of images and corresponding ground truth masks with the softmax classifier at the top. During training, the masks are downsampled to match the size of the output from the network; during inference, to acquire the output of the same size as the input, bilinear upsampling is applied. The final segmentation mask is computed using argmax over the logits. Optionally, a fully-connected probabilistic graphical model, namely, CRF, can be applied to refine the final predictions. On the test set of PASCAL VOC, the model achieves 79.7% of mean intersection-over-union.

For more details on the underlying model please refer to the following paper:

@article{CP2016Deeplab,
  title={DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs},
  author={Liang-Chieh Chen and George Papandreou and Iasonas Kokkinos and Kevin Murphy and Alan L Yuille},
  journal={arXiv:1606.00915},
  year={2016}
}

Requirements

TensorFlow needs to be installed before running the scripts. TensorFlow v1.1.0 is supported; for TensorFlow v0.12 please refer to this branch; for TensorFlow v0.11 please refer to this branch. Note that those branches may not have the same functional as the current master.

To install the required python packages (except TensorFlow), run

pip install -r requirements.txt

or for a local installation

pip install --user -r requirements.txt

Caffe to TensorFlow conversion

To imitate the structure of the model, we have used .caffemodel files provided by the authors. The conversion has been performed using Caffe to TensorFlow with an additional configuration for atrous convolution and batch normalisation (since the batch normalisation provided by Caffe-tensorflow only supports inference). There is no need to perform the conversion yourself as you can download the already converted models - deeplab_resnet.ckpt (pre-trained) and deeplab_resnet_init.ckpt (the last layers are randomly initialised) - here.

Nevertheless, it is easy to perform the conversion manually, given that the appropriate .caffemodel file has been downloaded, and Caffe to TensorFlow dependencies have been installed. The Caffe model definition is provided in misc/deploy.prototxt. To extract weights from .caffemodel, run the following:

python convert.py /path/to/deploy/prototxt --caffemodel /path/to/caffemodel --data-output-path /where/to/save/numpy/weights

As a result of running the command above, the model weights will be stored in /where/to/save/numpy/weights. To convert them to the native TensorFlow format (.ckpt), simply execute:

python npy2ckpt.py /where/to/save/numpy/weights --save-dir=/where/to/save/ckpt/weights

Dataset and Training

To train the network, one can use the augmented PASCAL VOC 2012 dataset with 10582 images for training and 1449 images for validation.

The training script allows to monitor the progress in the optimisation process using TensorBoard's image summary. Besides that, one can also exploit random scaling and mirroring of the inputs during training as a means for data augmentation. For example, to train the model from scratch with random scale and mirroring turned on, simply run:

python train.py --random-mirror --random-scale

To see the documentation on each of the training settings run the following:

python train.py --help

An additional script, fine_tune.py, demonstrates how to train only the last layers of the network. The script train_msc.py with multi-scale inputs fully resembles the training setup of the original model.

Evaluation

The single-scale model shows 86.9% mIoU on the Pascal VOC 2012 validation dataset ('SegmentationClassAug'). No post-processing step with CRF is applied.

The following command provides the description of each of the evaluation settings:

python evaluate.py --help

Inference

To perform inference over your own images, use the following command:

python inference.py /path/to/your/image /path/to/ckpt/file

This will run the forward pass and save the resulted mask with this colour map:

Using your dataset

In order to apply the same scripts using your own dataset, you would need to follow the next steps:

  1. Make sure that your segmentation masks are in the same format as the ones in the DeepLab setup (i.e., without a colour map). This means that if your segmentation masks are RGB images, you would need to convert each 3-D RGB vector into a 1-D label. For example, take a look here;
  2. Create a file with instances of your dataset in the same format as in files here;
  3. Change the flags data-dir and data-list accordingly in thehttps://gist.github.com/DrSleep/4bce37254c5900545e6b65f6a0858b9c); script file that you will be using (e.g., python train.py --data-dir /my/data/dir --data-list /my/data/list);
  4. Change the IMG_MEAN vector accordingly in the script file that you will be using;
  5. For visualisation purposes, you will also need to change the colour map here;
  6. Change the flags num-classes and ignore-label accordingly in the script that you will be using (e.g., python train.py --ignore-label 255 --num-classes 21).
  7. If restoring weights from the PASCAL models for your dataset with a different number of classes, you will also need to pass the --not-restore-last flag, which will prevent the last layers of size 21 from being restored.

Missing features

The post-processing step with CRF is currently being implemented here.

Other implementations

Owner
Vladimir
ML/CV enthusiast
Vladimir
Python3 / PyTorch implementation of the following paper: Fine-grained Semantics-aware Representation Enhancement for Self-supervisedMonocular Depth Estimation. ICCV 2021 (oral)

FSRE-Depth This is a Python3 / PyTorch implementation of FSRE-Depth, as described in the following paper: Fine-grained Semantics-aware Representation

77 Dec 28, 2022
Official implementation of "Generating 3D Molecules for Target Protein Binding"

Generating 3D Molecules for Target Protein Binding This is the official implementation of the GraphBP method proposed in the following paper. Meng Liu

DIVE Lab, Texas A&M University 74 Dec 07, 2022
This is the repository for the AAAI 21 paper [Contrastive and Generative Graph Convolutional Networks for Graph-based Semi-Supervised Learning].

CG3 This is the repository for the AAAI 21 paper [Contrastive and Generative Graph Convolutional Networks for Graph-based Semi-Supervised Learning]. R

12 Oct 28, 2022
Supplementary code for TISMIR paper "Sliding-Window Pitch-Class Histograms as a Means of Modeling Musical Form"

Sliding-Window Pitch-Class Histograms as a Means of Modeling Musical Form This is supplementary code for the TISMIR paper Sliding-Window Pitch-Class H

1 Nov 27, 2021
A high-level Python library for Quantum Natural Language Processing

lambeq About lambeq is a toolkit for quantum natural language processing (QNLP). Documentation: https://cqcl.github.io/lambeq/ Getting started Prerequ

Cambridge Quantum 315 Jan 01, 2023
Medical image analysis framework merging ANTsPy and deep learning

ANTsPyNet A collection of deep learning architectures and applications ported to the python language and tools for basic medical image processing. Bas

Advanced Normalization Tools Ecosystem 118 Dec 24, 2022
Code examples and benchmarks from the paper "Understanding Entropy Coding With Asymmetric Numeral Systems (ANS): a Statistician's Perspective"

Code For the Paper "Understanding Entropy Coding With Asymmetric Numeral Systems (ANS): a Statistician's Perspective" Author: Robert Bamler Date: 22 D

4 Nov 02, 2022
Banglore House Prediction Using Flask Server (Python)

Banglore House Prediction Using Flask Server (Python) ๐ŸŒ Links ๐ŸŒ ๐Ÿ“‚ Repo In this repository, I've implemented a Machine Learning-based Bangalore Hous

Dhyan Shah 1 Jan 24, 2022
้ ˜ๅŸŸใ‚’ๆŒ‡ๅฎšใ—ใ€ใ‚ญใƒผใ‚’ๅ…ฅๅŠ›ใ™ใ‚‹ใ“ใจใง็”ปๅƒใ‚’ไฟๅญ˜ใ™ใ‚‹ใƒ„ใƒผใƒซใงใ™ใ€‚ใ‚ฏใƒฉใ‚นๅˆ†้กž็”จใฎใƒ‡ใƒผใ‚ฟใ‚ปใƒƒใƒˆไฝœๆˆใ‚’ๆƒณๅฎšใ—ใฆใ„ใพใ™ใ€‚

image-capture-class-annotation ้ ˜ๅŸŸใ‚’ๆŒ‡ๅฎšใ—ใ€ใ‚ญใƒผใ‚’ๅ…ฅๅŠ›ใ™ใ‚‹ใ“ใจใง็”ปๅƒใ‚’ไฟๅญ˜ใ™ใ‚‹ใƒ„ใƒผใƒซใงใ™ใ€‚ ใ‚ฏใƒฉใ‚นๅˆ†้กž็”จใฎใƒ‡ใƒผใ‚ฟใ‚ปใƒƒใƒˆไฝœๆˆใ‚’ๆƒณๅฎšใ—ใฆใ„ใพใ™ใ€‚ Requirement OpenCV 3.4.2 or later Usage ๅฎŸ่กŒๆ–นๆณ•ใฏไปฅไธ‹ใงใ™ใ€‚ ่ตทๅ‹•ๅพŒใฏใƒžใ‚ฆใ‚นใ‚ฏใƒชใƒƒใ‚ฏ4

KazuhitoTakahashi 5 May 28, 2021
Code to produce syntactic representations that can be used to study syntax processing in the human brain

Can fMRI reveal the representation of syntactic structure in the brain? The code base for our paper on understanding syntactic representations in the

Aniketh Janardhan Reddy 4 Dec 18, 2022
Tutorials and implementations for "Self-normalizing networks"

Self-Normalizing Networks Tutorials and implementations for "Self-normalizing networks"(SNNs) as suggested by Klambauer et al. (arXiv pre-print). Vers

Institute of Bioinformatics, Johannes Kepler University Linz 1.6k Jan 07, 2023
Spatial-Temporal Transformer for Dynamic Scene Graph Generation, ICCV2021

Spatial-Temporal Transformer for Dynamic Scene Graph Generation Pytorch Implementation of our paper Spatial-Temporal Transformer for Dynamic Scene Gra

Yuren Cong 119 Jan 01, 2023
MusicYOLO framework uses the object detection model, YOLOx, to locate notes in the spectrogram.

MusicYOLO MusicYOLO framework uses the object detection model, YOLOX, to locate notes in the spectrogram. Its performance on the ISMIR2014 dataset, MI

Xianke Wang 2 Aug 02, 2022
Official PyTorch implementation of "Proxy Synthesis: Learning with Synthetic Classes for Deep Metric Learning" (AAAI 2021)

Proxy Synthesis: Learning with Synthetic Classes for Deep Metric Learning Official PyTorch implementation of "Proxy Synthesis: Learning with Synthetic

NAVER/LINE Vision 30 Dec 06, 2022
StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation

StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation Demo video: CVPR 2021 Oral: Single Channel Manipulation: Localized or attribu

Zongze Wu 267 Dec 30, 2022
A simple AI that will give you si ple task and this is made with python

Crystal-AI A simple AI that will give you si ple task and this is made with python Prerequsites: Python3.6.2 pyttsx3 pip install pyttsx3 pyaudio pip i

CrystalAnd 1 Dec 25, 2021
Jigsaw Rate Severity of Toxic Comments

Jigsaw Rate Severity of Toxic Comments

Guanshuo Xu 66 Nov 30, 2022
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

ALBERT ***************New March 28, 2020 *************** Add a colab tutorial to run fine-tuning for GLUE datasets. ***************New January 7, 2020

Google Research 3k Jan 01, 2023
[AAAI-2022] Official implementations of MCL: Mutual Contrastive Learning for Visual Representation Learning

Mutual Contrastive Learning for Visual Representation Learning This project provides source code for our Mutual Contrastive Learning for Visual Repres

winycg 48 Jan 02, 2023
Source code for The Power of Many: A Physarum Swarm Steiner Tree Algorithm

Physarum-Swarm-Steiner-Algo Source code for The Power of Many: A Physarum Steiner Tree Algorithm Code implements ideas from the following papers: Sher

Sheryl Hsu 2 Mar 28, 2022