Sematic-Segmantation - Semantic Segmentation on MIT ADE20K dataset in PyTorch

Overview

Semantic Segmentation on MIT ADE20K dataset in PyTorch

This is a PyTorch implementation of semantic segmentation models on MIT ADE20K scene parsing dataset (http://sceneparsing.csail.mit.edu/).

ADE20K is the largest open source dataset for semantic segmentation and scene parsing, released by MIT Computer Vision team. Follow the link below to find the repository for our dataset and implementations on Caffe and Torch7: https://github.com/CSAILVision/sceneparsing

If you simply want to play with our demo, please try this link: http://scenesegmentation.csail.mit.edu You can upload your own photo and parse it!

You can also use this colab notebook playground here to tinker with the code for segmenting an image.

You can reach the dataset here.

All pretrained models can be found at: http://sceneparsing.csail.mit.edu/model/pytorch

[From left to right: Test Image, Ground Truth, Predicted Result]

Color encoding of semantic categories can be found here: https://docs.google.com/spreadsheets/d/1se8YEtb2detS7OuPE86fXGyD269pMycAWe2mtKUj2W8/edit?usp=sharing

Updates

  • HRNet model is now supported.
  • We use configuration files to store most options which were in argument parser. The definitions of options are detailed in config/defaults.py.
  • We conform to Pytorch practice in data preprocessing (RGB [0, 1], substract mean, divide std).

Highlights

Syncronized Batch Normalization on PyTorch

This module computes the mean and standard-deviation across all devices during training. We empirically find that a reasonable large batch size is important for segmentation. We thank Jiayuan Mao for his kind contributions, please refer to Synchronized-BatchNorm-PyTorch for details.

The implementation is easy to use as:

  • It is pure-python, no C++ extra extension libs.
  • It is completely compatible with PyTorch's implementation. Specifically, it uses unbiased variance to update the moving average, and use sqrt(max(var, eps)) instead of sqrt(var + eps).
  • It is efficient, only 20% to 30% slower than UnsyncBN.

Dynamic scales of input for training with multiple GPUs

For the task of semantic segmentation, it is good to keep aspect ratio of images during training. So we re-implement the DataParallel module, and make it support distributing data to multiple GPUs in python dict, so that each gpu can process images of different sizes. At the same time, the dataloader also operates differently.

Now the batch size of a dataloader always equals to the number of GPUs, each element will be sent to a GPU. It is also compatible with multi-processing. Note that the file index for the multi-processing dataloader is stored on the master process, which is in contradict to our goal that each worker maintains its own file list. So we use a trick that although the master process still gives dataloader an index for __getitem__ function, we just ignore such request and send a random batch dict. Also, the multiple workers forked by the dataloader all have the same seed, you will find that multiple workers will yield exactly the same data, if we use the above-mentioned trick directly. Therefore, we add one line of code which sets the defaut seed for numpy.random before activating multiple worker in dataloader.

State-of-the-Art models

  • PSPNet is scene parsing network that aggregates global representation with Pyramid Pooling Module (PPM). It is the winner model of ILSVRC'16 MIT Scene Parsing Challenge. Please refer to https://arxiv.org/abs/1612.01105 for details.
  • UPerNet is a model based on Feature Pyramid Network (FPN) and Pyramid Pooling Module (PPM). It doesn't need dilated convolution, an operator that is time-and-memory consuming. Without bells and whistles, it is comparable or even better compared with PSPNet, while requiring much shorter training time and less GPU memory. Please refer to https://arxiv.org/abs/1807.10221 for details.
  • HRNet is a recently proposed model that retains high resolution representations throughout the model, without the traditional bottleneck design. It achieves the SOTA performance on a series of pixel labeling tasks. Please refer to https://arxiv.org/abs/1904.04514 for details.

Supported models

We split our models into encoder and decoder, where encoders are usually modified directly from classification networks, and decoders consist of final convolutions and upsampling. We have provided some pre-configured models in the config folder.

Encoder:

  • MobileNetV2dilated
  • ResNet18/ResNet18dilated
  • ResNet50/ResNet50dilated
  • ResNet101/ResNet101dilated
  • HRNetV2 (W48)

Decoder:

  • C1 (one convolution module)
  • C1_deepsup (C1 + deep supervision trick)
  • PPM (Pyramid Pooling Module, see PSPNet paper for details.)
  • PPM_deepsup (PPM + deep supervision trick)
  • UPerNet (Pyramid Pooling + FPN head, see UperNet for details.)

Performance:

IMPORTANT: The base ResNet in our repository is a customized (different from the one in torchvision). The base models will be automatically downloaded when needed.

Architecture MultiScale Testing Mean IoU Pixel Accuracy(%) Overall Score Inference Speed(fps)
MobileNetV2dilated + C1_deepsup No 34.84 75.75 54.07 17.2
Yes 33.84 76.80 55.32 10.3
MobileNetV2dilated + PPM_deepsup No 35.76 77.77 56.27 14.9
Yes 36.28 78.26 57.27 6.7
ResNet18dilated + C1_deepsup No 33.82 76.05 54.94 13.9
Yes 35.34 77.41 56.38 5.8
ResNet18dilated + PPM_deepsup No 38.00 78.64 58.32 11.7
Yes 38.81 79.29 59.05 4.2
ResNet50dilated + PPM_deepsup No 41.26 79.73 60.50 8.3
Yes 42.14 80.13 61.14 2.6
ResNet101dilated + PPM_deepsup No 42.19 80.59 61.39 6.8
Yes 42.53 80.91 61.72 2.0
UperNet50 No 40.44 79.80 60.12 8.4
Yes 41.55 80.23 60.89 2.9
UperNet101 No 42.00 80.79 61.40 7.8
Yes 42.66 81.01 61.84 2.3
HRNetV2 No 42.03 80.77 61.40 5.8
Yes 43.20 81.47 62.34 1.9

The training is benchmarked on a server with 8 NVIDIA Pascal Titan Xp GPUs (12GB GPU memory), the inference speed is benchmarked a single NVIDIA Pascal Titan Xp GPU, without visualization.

Environment

The code is developed under the following configurations.

  • Hardware: >=4 GPUs for training, >=1 GPU for testing (set [--gpus GPUS] accordingly)
  • Software: Ubuntu 16.04.3 LTS, CUDA>=8.0, Python>=3.5, PyTorch>=0.4.0
  • Dependencies: numpy, scipy, opencv, yacs, tqdm

Quick start: Test on an image using our trained model

  1. Here is a simple demo to do inference on a single image:
chmod +x demo_test.sh
./demo_test.sh

This script downloads a trained model (ResNet50dilated + PPM_deepsup) and a test image, runs the test script, and saves predicted segmentation (.png) to the working directory.

  1. To test on an image or a folder of images ($PATH_IMG), you can simply do the following:
python3 -u test.py --imgs $PATH_IMG --gpu $GPU --cfg $CFG

Training

  1. Download the ADE20K scene parsing dataset:
chmod +x download_ADE20K.sh
./download_ADE20K.sh
  1. Train a model by selecting the GPUs ($GPUS) and configuration file ($CFG) to use. During training, checkpoints by default are saved in folder ckpt.
python3 train.py --gpus $GPUS --cfg $CFG 
  • To choose which gpus to use, you can either do --gpus 0-7, or --gpus 0,2,4,6.

For example, you can start with our provided configurations:

  • Train MobileNetV2dilated + C1_deepsup
python3 train.py --gpus GPUS --cfg config/ade20k-mobilenetv2dilated-c1_deepsup.yaml
  • Train ResNet50dilated + PPM_deepsup
python3 train.py --gpus GPUS --cfg config/ade20k-resnet50dilated-ppm_deepsup.yaml
  • Train UPerNet101
python3 train.py --gpus GPUS --cfg config/ade20k-resnet101-upernet.yaml
  1. You can also override options in commandline, for example python3 train.py TRAIN.num_epoch 10 .

Evaluation

  1. Evaluate a trained model on the validation set. Add VAL.visualize True in argument to output visualizations as shown in teaser.

For example:

  • Evaluate MobileNetV2dilated + C1_deepsup
python3 eval_multipro.py --gpus GPUS --cfg config/ade20k-mobilenetv2dilated-c1_deepsup.yaml
  • Evaluate ResNet50dilated + PPM_deepsup
python3 eval_multipro.py --gpus GPUS --cfg config/ade20k-resnet50dilated-ppm_deepsup.yaml
  • Evaluate UPerNet101
python3 eval_multipro.py --gpus GPUS --cfg config/ade20k-resnet101-upernet.yaml

Integration with other projects

This library can be installed via pip to easily integrate with another codebase

pip install git+https://github.com/CSAILVision/[email protected]

Now this library can easily be consumed programmatically. For example

from mit_semseg.config import cfg
from mit_semseg.dataset import TestDataset
from mit_semseg.models import ModelBuilder, SegmentationModule

Reference

If you find the code or pre-trained models useful, please cite the following papers:

Semantic Understanding of Scenes through ADE20K Dataset. B. Zhou, H. Zhao, X. Puig, T. Xiao, S. Fidler, A. Barriuso and A. Torralba. International Journal on Computer Vision (IJCV), 2018. (https://arxiv.org/pdf/1608.05442.pdf)

@article{zhou2018semantic,
  title={Semantic understanding of scenes through the ade20k dataset},
  author={Zhou, Bolei and Zhao, Hang and Puig, Xavier and Xiao, Tete and Fidler, Sanja and Barriuso, Adela and Torralba, Antonio},
  journal={International Journal on Computer Vision},
  year={2018}
}

Scene Parsing through ADE20K Dataset. B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso and A. Torralba. Computer Vision and Pattern Recognition (CVPR), 2017. (http://people.csail.mit.edu/bzhou/publication/scene-parse-camera-ready.pdf)

@inproceedings{zhou2017scene,
    title={Scene Parsing through ADE20K Dataset},
    author={Zhou, Bolei and Zhao, Hang and Puig, Xavier and Fidler, Sanja and Barriuso, Adela and Torralba, Antonio},
    booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
    year={2017}
}
Owner
Berat Eren Terzioğlu
AI & Computer Vision Engineer
Berat Eren Terzioğlu
Image super-resolution through deep learning

srez Image super-resolution through deep learning. This project uses deep learning to upscale 16x16 images by a 4x factor. The resulting 64x64 images

David Garcia 5.3k Dec 28, 2022
labelpix is a graphical image labeling interface for drawing bounding boxes

Welcome to labelpix 👋 labelpix is a graphical image labeling interface for drawing bounding boxes. 🏠 Homepage Install pip install -r requirements.tx

schissmantics 26 May 24, 2022
以孤立语假设和宽度优先搜索为基础,构建了一种多通道堆叠注意力Transformer结构的斗地主ai

ddz-ai 介绍 斗地主是一种扑克游戏。游戏最少由3个玩家进行,用一副54张牌(连鬼牌),其中一方为地主,其余两家为另一方,双方对战,先出完牌的一方获胜。 ddz-ai以孤立语假设和宽度优先搜索为基础,构建了一种多通道堆叠注意力Transformer结构的系统,使其经过大量训练后,能在实际游戏中获

freefuiiismyname 88 May 15, 2022
Implementation for Stankevičiūtė et al. "Conformal time-series forecasting", NeurIPS 2021.

Conformal time-series forecasting Implementation for Stankevičiūtė et al. "Conformal time-series forecasting", NeurIPS 2021. If you use our code in yo

Kamilė Stankevičiūtė 36 Nov 21, 2022
LocUNet is a deep learning method to localize a UE based solely on the reported signal strengths from a set of BSs.

LocUNet LocUNet is a deep learning method to localize a UE based solely on the reported signal strengths from a set of BSs. The method utilizes accura

4 Oct 05, 2022
Benchmark for evaluating open-ended generation

OpenMEVA Contributed by Jian Guan, Zhexin Zhang. Thank Jiaxin Wen for DeBugging. OpenMEVA is a benchmark for evaluating open-ended story generation me

25 Nov 15, 2022
Transferable Unrestricted Attacks, which won 1st place in CVPR’21 Security AI Challenger: Unrestricted Adversarial Attacks on ImageNet.

Transferable Unrestricted Adversarial Examples This is the PyTorch implementation of the Arxiv paper: Towards Transferable Unrestricted Adversarial Ex

equation 16 Dec 29, 2022
codes for Self-paced Deep Regression Forests with Consideration on Ranking Fairness

Self-paced Deep Regression Forests with Consideration on Ranking Fairness This is official codes for paper Self-paced Deep Regression Forests with Con

Learning in Vision 4 Sep 11, 2022
Ensemble Visual-Inertial Odometry (EnVIO)

Ensemble Visual-Inertial Odometry (EnVIO) Authors : Jae Hyung Jung, Yeongkwon Choe, and Chan Gook Park 1. Overview This is a ROS package of Ensemble V

Jae Hyung Jung 95 Jan 03, 2023
ScaleNet: A Shallow Architecture for Scale Estimation

ScaleNet: A Shallow Architecture for Scale Estimation Repository for the code of ScaleNet paper: "ScaleNet: A Shallow Architecture for Scale Estimatio

Axel Barroso 34 Nov 09, 2022
PyTorch implementation of Advantage async actor-critic Algorithms (A3C) in PyTorch

Advantage async actor-critic Algorithms (A3C) in PyTorch @inproceedings{mnih2016asynchronous, title={Asynchronous methods for deep reinforcement lea

LEI TAI 111 Dec 08, 2022
2021-AIAC-QQ-Browser-Hyperparameter-Optimization-Rank6

2021-AIAC-QQ-Browser-Hyperparameter-Optimization-Rank6

Aigege 8 Mar 31, 2022
Neighbor2Seq: Deep Learning on Massive Graphs by Transforming Neighbors to Sequences

Neighbor2Seq: Deep Learning on Massive Graphs by Transforming Neighbors to Sequences This repository is an official PyTorch implementation of Neighbor

DIVE Lab, Texas A&M University 8 Jun 12, 2022
Official Code For TDEER: An Efficient Translating Decoding Schema for Joint Extraction of Entities and Relations (EMNLP2021)

TDEER 🦌 🦒 Official Code For TDEER: An Efficient Translating Decoding Schema for Joint Extraction of Entities and Relations (EMNLP2021) Overview TDEE

33 Dec 23, 2022
Implementation of "A Deep Learning Loss Function based on Auditory Power Compression for Speech Enhancement" by pytorch

This repository is used to suspend the results of our paper "A Deep Learning Loss Function based on Auditory Power Compression for Speech Enhancement"

ScorpioMiku 19 Sep 30, 2022
YOLOX + ROS(1, 2) object detection package

YOLOX + ROS(1, 2) object detection package

Ar-Ray 158 Dec 21, 2022
Deep Hedging Demo - An Example of Using Machine Learning for Derivative Pricing.

Deep Hedging Demo Pricing Derivatives using Machine Learning 1) Jupyter version: Run ./colab/deep_hedging_colab.ipynb on Colab. 2) Gui version: Run py

Yu Man Tam 102 Jan 06, 2023
Tensorflow implementation and notebooks for Implicit Maximum Likelihood Estimation

tf-imle Tensorflow 2 and PyTorch implementation and Jupyter notebooks for Implicit Maximum Likelihood Estimation (I-MLE) proposed in the NeurIPS 2021

NEC Laboratories Europe 69 Dec 13, 2022
Multi-robot collaborative exploration and mapping through Voronoi partition and DRL in unknown environment

Voronoi Multi_Robot Collaborate Exploration Introduction In the unknown environment, the cooperative exploration of multiple robots is completed by Vo

PeaceWord 6 Nov 22, 2022
SciPy fixes and extensions

scipyx SciPy is large library used everywhere in scientific computing. That's why breaking backwards-compatibility comes as a significant cost and is

Nico Schlömer 16 Jul 17, 2022