DeepLab resnet v2 model in pytorch

Overview

pytorch-deeplab-resnet

DeepLab resnet v2 model implementation in pytorch.

The architecture of deepLab-ResNet has been replicated exactly as it is from the caffe implementation. This architecture calculates losses on input images over multiple scales ( 1x, 0.75x, 0.5x ). Losses are calculated individually over these 3 scales. In addition to these 3 losses, one more loss is calculated after merging the output score maps on the 3 scales. These 4 losses are added to calculate the total loss.

Updates

18 July 2017

  • One more evaluation script is added, evalpyt2.py. The old evaluation script evalpyt.py uses a different methodoloy to take mean of IOUs than the one used by authors. Results section has been updated to incorporate this change.

24 June 2017

  • Now, weights over the 3 scales ( 1x, 0.75x, 0.5x ) are shared as in the caffe implementation. Previously, each of the 3 scales had seperate weights. Results are almost same after making this change (more in the results section). However, the size of the trained .pth model has reduced significantly. Memory occupied on GPU(11.9 GB) and time taken (~3.5 hours) during training are same as before. Links to corresponding .pth files have been updated.
  • Custom data can be used to train pytorch-deeplab-resnet using train.py, flag --NoLabels (total number of labels in training data) has been added to train.py and evalpyt.py for this purpose. Please note that labels should be denoted by contiguous values (starting from 0) in the ground truth images. For eg. if there are 7 (no_labels) different labels, then each ground truth image must have these labels as 0,1,2,3,...6 (no_labels-1).

The older version (prior to 24 June 2017) is available here.

Usage

Note that this repository has been tested with python 2.7 only.

Converting released caffemodel to pytorch model

To convert the caffemodel released by authors, download the deeplab-resnet caffemodel (train_iter_20000.caffemodel) pretrained on VOC into the data folder. After that, run

python convert_deeplab_resnet.py

to generate the corresponding pytorch model file (.pth). The generated .pth snapshot file can be used to get the exsct same test performace as offered by using the caffemodel in caffe (as shown by numbers in results section). If you do not want to generate the .pth file yourself, you can download it here.

To run convert_deeplab_resnet.py, deeplab v2 caffe and pytorch (python 2.7) are required.

If you want to train your model in pytorch, move to the next section.

Training

Step 1: Convert init.caffemodel to a .pth file: init.caffemodel contains MS COCO trained weights. We use these weights as initilization for all but the final layer of our model. For the last layer, we use random gaussian with a standard deviation of 0.01 as the initialization. To convert init.caffemodel to a .pth file, run (or download the converted .pth here)

python init_net_surgery.py

To run init_net_surgery .py, deeplab v2 caffe and pytorch (python 2.7) are required.

Step 2: Now that we have our initialization, we can train deeplab-resnet by running,

python train.py

To get a description of each command-line arguments, run

python train.py -h

To run train.py, pytorch (python 2.7) is required.

By default, snapshots are saved in every 1000 iterations in the data/snapshots. The following features have been implemented in this repository -

  • Training regime is the same as that of the caffe implementation - SGD with momentum is used, along with the poly lr decay policy. A weight decay has been used. The last layer has 10 times the learning rate of other layers.
  • The iter_size parameter of caffe has been implemented, effectively increasing the batch_size to batch_size times iter_size
  • Random flipping and random scaling of input has been used as data augmentation. The caffe implementation uses 4 fixed scales (0.5,0.75,1,1.25,1.5) while in the pytorch implementation, for each iteration scale is randomly picked in the range - [0.5,1.3].
  • The boundary label (255 in ground truth labels) has not been ignored in the loss function in the current version, instead it has been merged with the background. The ignore_label caffe parameter would be implemented in the future versions. Post processing using CRF has not been implemented.
  • Batchnorm parameters are kept fixed during training. Also, caffe setting use_global_stats = True is reproduced during training. Running mean and variance are not calculated during training.

When run on a Nvidia Titan X GPU, train.py occupies about 11.9 GB of memory.

Evaluation

Evaluation of the saved models can be done by running

python evalpyt.py

To get a description of each command-line arguments, run

python evalpyt.py -h

Results

When trained on VOC augmented training set (with 10582 images) using MS COCO pretrained initialization in pytorch, we get a validation performance of 72.40%(evalpyt2.py, on VOC). The corresponding .pth file can be downloaded here. This is in comparision to 75.54% that is acheived by using train_iter_20000.caffemodel released by authors, which can be replicated by running this file . The .pth model converted from .caffemodel using the first section also gives 75.54% mean IOU. A previous version of this file reported mean IOU of 78.48% on the pytorch trained model which is caclulated in a different way (evalpyt.py, Mean IOU is calculated for each image and these values are averaged together. This way of calculating mean IOU is different than the one used by authors).

To replicate this performance, run

train.py --lr 0.00025 --wtDecay 0.0005 --maxIter 20000 --GTpath <train gt images path here> --IMpath <train images path here> --LISTpath data/list/train_aug.txt

Dataset

The model presented in the results section was trained using the augmented VOC train set which was released by this paper. You may download this augmented data directly from here.

Note that this code can be used to train pytorch-deeplab-resnet model for other datasets also.

Acknowledgement

A part of the code has been borrowed from https://github.com/ry/tensorflow-resnet.

Owner
Isht Dwivedi
Isht Dwivedi
Implementation of paper "DeepTag: A General Framework for Fiducial Marker Design and Detection"

Implementation of paper DeepTag: A General Framework for Fiducial Marker Design and Detection. Project page: https://herohuyongtao.github.io/research/

Yongtao Hu 46 Dec 12, 2022
True Few-Shot Learning with Language Models

This codebase supports using language models (LMs) for true few-shot learning: learning to perform a task using a limited number of examples from a single task distribution.

Ethan Perez 124 Jan 04, 2023
Database Reasoning Over Text project for ACL paper

Database Reasoning over Text This repository contains the code for the Database Reasoning Over Text paper, to appear at ACL2021. Work is performed in

Facebook Research 320 Dec 12, 2022
The source code of "SIDE: Center-based Stereo 3D Detector with Structure-aware Instance Depth Estimation", accepted to WACV 2022.

SIDE: Center-based Stereo 3D Detector with Structure-aware Instance Depth Estimation The source code of our work "SIDE: Center-based Stereo 3D Detecto

10 Dec 18, 2022
RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition

RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition (PyTorch) Paper: https://arxiv.org/abs/2105.01883 Citation: @

260 Jan 03, 2023
Normalizing Flows with a resampled base distribution

Resampling Base Distributions of Normalizing Flows Normalizing flows are a popular class of models for approximating probability distributions. Howeve

Vincent Stimper 24 Nov 03, 2022
Transformer model implemented with Pytorch

transformer-pytorch Transformer model implemented with Pytorch Attention is all you need-[Paper] Architecture Self-Attention self_attention.py class

Mingu Kang 12 Sep 03, 2022
Reimplementation of the paper `Human Attention Maps for Text Classification: Do Humans and Neural Networks Focus on the Same Words? (ACL2020)`

Human Attention for Text Classification Re-implementation of the paper Human Attention Maps for Text Classification: Do Humans and Neural Networks Foc

Shunsuke KITADA 15 Dec 13, 2021
This code is for our paper "VTGAN: Semi-supervised Retinal Image Synthesis and Disease Prediction using Vision Transformers"

ICCV Workshop 2021 VTGAN This code is for our paper "VTGAN: Semi-supervised Retinal Image Synthesis and Disease Prediction using Vision Transformers"

Sharif Amit Kamran 25 Dec 08, 2022
Back to Basics: Efficient Network Compression via IMP

Back to Basics: Efficient Network Compression via IMP Authors: Max Zimmer, Christoph Spiegel, Sebastian Pokutta This repository contains the code to r

IOL Lab @ ZIB 1 Nov 19, 2021
[CVPR 2021] Official PyTorch Implementation for "Iterative Filter Adaptive Network for Single Image Defocus Deblurring"

IFAN: Iterative Filter Adaptive Network for Single Image Defocus Deblurring Checkout for the demo (GUI/Google Colab)! The GUI version might occasional

Junyong Lee 173 Dec 30, 2022
Learning Confidence for Out-of-Distribution Detection in Neural Networks

Learning Confidence Estimates for Neural Networks This repository contains the code for the paper Learning Confidence for Out-of-Distribution Detectio

235 Jan 05, 2023
Code release for NeurIPS 2020 paper "Co-Tuning for Transfer Learning"

CoTuning Official implementation for NeurIPS 2020 paper Co-Tuning for Transfer Learning. [News] 2021/01/13 The COCO 70 dataset used in the paper is av

THUML @ Tsinghua University 35 Sep 23, 2022
The official PyTorch code for NeurIPS 2021 ML4AD Paper, "Does Thermal data make the detection systems more reliable?"

MultiModal-Collaborative (MMC) Learning Framework for integrating RGB and Thermal spectral modalities This is the official code for NeurIPS 2021 Machi

NeurAI 12 Nov 02, 2022
JudeasRx - graphical app for doing personalized causal medicine using the methods invented by Judea Pearl et al.

JudeasRX Instructions Read the references given in the Theory and Notation section below Fire up the Jupyter Notebook judeas-rx.ipynb The notebook dra

Robert R. Tucci 19 Nov 07, 2022
Deep Learning Slide Captcha

滑动验证码深度学习识别 本项目使用深度学习 YOLOV3 模型来识别滑动验证码缺口,基于 https://github.com/eriklindernoren/PyTorch-YOLOv3 修改。 只需要几百张缺口标注图片即可训练出精度高的识别模型,识别效果样例: 克隆项目 运行命令: git cl

Python3WebSpider 55 Jan 02, 2023
Toontown: Galaxy, a new Toontown game based on Disney's Toontown Online

Toontown: Galaxy The official archive repo for Toontown: Galaxy, a new Toontown

1 Feb 15, 2022
Pytorch based library to rank predicted bounding boxes using text/image user's prompts.

pytorch_clip_bbox: Implementation of the CLIP guided bbox ranking for Object Detection. Pytorch based library to rank predicted bounding boxes using t

Sergei Belousov 50 Nov 27, 2022