RTSeg: Real-time Semantic Segmentation Comparative Study

Overview

This repository contains the official TensorFlow code used in our papers on RTSeg and ShuffleSeg (see Citation below).

Description

Semantic segmentation benefits robotics-related applications, especially autonomous driving. Most research on semantic segmentation focuses only on increasing the accuracy of segmentation models, with little attention to computationally efficient solutions. The few works in this direction do not provide principled methods to evaluate the different design choices for segmentation. In RTSeg, we address this gap by presenting a real-time semantic segmentation benchmarking framework with a decoupled design for feature extraction and decoding methods. The code and the experimental results are presented on the Cityscapes dataset of urban scenes.



Models

Encoder      Skip   U-Net   DilationV1   DilationV2
VGG-16       Yes    Yes     Yes          No
ResNet-18    Yes    Yes     Yes          No
MobileNet    Yes    Yes     Yes          Yes
ShuffleNet   Yes    Yes     Yes          Yes
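Because feature extraction and decoding are decoupled, an encoder can in principle be paired with any decoder. The table above records which pairings are implemented.

Below is a minimal plain-Python sketch that encodes that support matrix and validates a requested pairing before a model is built. The names `SUPPORTED`, `check_pairing` and `supported_pairs` are hypothetical illustrations and are not part of this repository's API.

```python
# Hypothetical sketch: the Models table above as a lookup structure.
# None of these names come from the RTSeg code base.
DECODERS = ("Skip", "U-Net", "DilationV1", "DilationV2")

SUPPORTED = {
    "VGG-16":     {"Skip", "U-Net", "DilationV1"},
    "ResNet-18":  {"Skip", "U-Net", "DilationV1"},
    "MobileNet":  {"Skip", "U-Net", "DilationV1", "DilationV2"},
    "ShuffleNet": {"Skip", "U-Net", "DilationV1", "DilationV2"},
}


def check_pairing(encoder: str, decoder: str) -> None:
    """Raise ValueError if the encoder/decoder pair is not marked 'Yes'."""
    if encoder not in SUPPORTED:
        raise ValueError(f"unknown encoder {encoder!r}; choose from {sorted(SUPPORTED)}")
    if decoder not in DECODERS:
        raise ValueError(f"unknown decoder {decoder!r}; choose from {list(DECODERS)}")
    if decoder not in SUPPORTED[encoder]:
        raise ValueError(f"{decoder} is not implemented for {encoder}")


def supported_pairs():
    """List every implemented (encoder, decoder) pair in table order."""
    return [(enc, dec) for enc, decs in SUPPORTED.items()
            for dec in DECODERS if dec in decs]
```

Keeping the matrix as data rather than scattered `if` statements means a new encoder or decoder only requires one new table entry.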

NOTE: The remaining pretrained weights for all implemented models will be released soon. Stay tuned for updates.

Reported Results

Test Set

Model               GFLOPs   Class IoU   Class iIoU   Category IoU   Category iIoU
SegNet              286.03   56.1        34.2         79.8           66.4
ENet                3.83     58.3        24.4         80.4           64.0
DeepLab             -        70.4        42.6         86.4           67.7
SkipNet-VGG16       -        65.3        41.7         85.7           70.1
ShuffleSeg          2.0      58.3        32.4         80.2           62.2
SkipNet-MobileNet   6.2      61.5        35.2         82.0           63.0

Validation Set

Encoder      Decoder     Coarse   mIoU
MobileNet    SkipNet     No       61.3
ShuffleNet   SkipNet     No       55.5
ResNet-18    UNet        No       57.9
MobileNet    UNet        No       61.0
ShuffleNet   UNet        No       57.0
MobileNet    Dilation    No       57.8
ShuffleNet   Dilation    No       53.9
MobileNet    SkipNet     Yes      62.4
ShuffleNet   SkipNet     Yes      59.3

NOTE: GFLOPs are computed at an image resolution of 360x640, whereas the mIoU values are computed at 1024x2048, the official resolution required by the Cityscapes evaluation script.
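The cost and accuracy figures use different resolutions, so it can help to estimate the cost at the evaluation resolution. For a fully convolutional network, the operation count grows roughly in proportion to the number of input pixels. A figure measured at 360x640 can therefore be scaled by the pixel ratio.

This is a back-of-the-envelope approximation, not a measured value. It ignores any resolution-independent layers and border effects. The helper names are hypothetical.

```python
def pixel_ratio(src_hw, dst_hw):
    """Ratio of pixel counts between two (height, width) resolutions."""
    return (dst_hw[0] * dst_hw[1]) / (src_hw[0] * src_hw[1])


def scale_gflops(gflops, src_hw=(360, 640), dst_hw=(1024, 2048)):
    """Approximate GFLOPs at dst_hw, ASSUMING cost is linear in pixel count."""
    return gflops * pixel_ratio(src_hw, dst_hw)


if __name__ == "__main__":
    print(f"pixel ratio: {pixel_ratio((360, 640), (1024, 2048)):.2f}")  # 9.10
    # Rough estimate from the reported ShuffleSeg figure, not a measurement.
    print(f"ShuffleSeg at 1024x2048: ~{scale_gflops(2.0):.1f} GFLOPs")  # ~18.2
```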

NOTE: Regarding inference time, an issue is reported here. We were not able to outperform the inference time reported for the ENet architecture; this could be due to discrepancies in the optimizations we apply. Contributions that improve our optimization method are welcome.
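When comparing inference times with published numbers such as ENet's, the measurement protocol matters as much as the model. Below is a minimal, framework-agnostic sketch of a common protocol:

- Run a few untimed warm-up iterations to absorb one-off costs such as graph setup and memory allocation.
- Time many iterations with a monotonic high-resolution clock.
- Report the mean and spread.

This is not the timing code used in this repository, and `benchmark` and `infer_fn` are hypothetical names. With a GPU backend, `infer_fn` must return only after the result has actually been computed (for example, by fetching the output to host memory). Otherwise the timings will be misleading.

```python
import statistics
import time


def benchmark(infer_fn, warmup=10, iters=100):
    """Time a zero-argument inference callable.

    Returns (mean_ms, stdev_ms) over `iters` timed runs, after `warmup`
    untimed runs that absorb one-off setup costs.
    """
    if iters < 2:
        raise ValueError("need at least 2 timed iterations for a stdev")
    for _ in range(warmup):
        infer_fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        infer_fn()  # must return only after the computation has finished
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    # Stand-in workload; in TF 1.x this could wrap a session.run on one batch.
    mean_ms, std_ms = benchmark(lambda: sum(i * i for i in range(10_000)))
    print(f"{mean_ms:.2f} ms +/- {std_ms:.2f} ms")
```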

Usage

  1. Download the weights, processed data, and trained meta graphs from here
  2. Extract pretrained_weights.zip
  3. Extract full_cityscapes_res.zip under data/
  4. Extract unet_resnet18.zip under experiments/
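Steps 2 to 4 can be scripted with Python's standard `zipfile` module. The minimal sketch below assumes the three archives were downloaded into the repository root.

The step for pretrained_weights.zip does not name a target directory, so extracting it into the repository root is an assumption. The names `ARCHIVES` and `extract_all` are hypothetical.

```python
import zipfile
from pathlib import Path

# Archive -> destination directory, following the Usage steps.
# ASSUMPTION: pretrained_weights.zip goes into the repo root ("."),
# because step 2 does not specify a target directory.
ARCHIVES = {
    "pretrained_weights.zip": ".",
    "full_cityscapes_res.zip": "data",
    "unet_resnet18.zip": "experiments",
}


def extract_all(root="."):
    """Extract each archive found under `root` into its destination."""
    root = Path(root)
    for archive, dest in ARCHIVES.items():
        src = root / archive
        if not src.is_file():
            raise FileNotFoundError(f"missing {src}; download it first (step 1)")
        target = root / dest
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(src) as zf:
            zf.extractall(target)
        print(f"extracted {archive} -> {target}")
```

After step 1, call `extract_all()` from the repository root, or pass the repository path as `root`.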

Run

The file run.sh provides good examples of running different architectures; have a look at it.

The commands in run.sh follow this pattern:

python3 main.py --load_config=[config_file_name].yaml [train/test] [Trainer Class Name] [Model Class Name]
  • Uncomment the relevant line in run.sh to run fcn8s_mobilenet on the Cityscapes validation set and obtain its mIoU. Our framework's evaluation produces results slightly lower than the Cityscapes evaluation script, so for the final evaluation we use the Cityscapes script. For example, UNet-ResNet18 should reach 56% mIoU on the validation set with our evaluation, versus 57.9% with the Cityscapes script. The test-set results for SkipNet-MobileNet and SkipNet-ShuffleNet are publicly available on the Cityscapes Benchmark.
python3 main.py --load_config=unet_resnet18_test.yaml test Train LinkNET
  • To measure running time, run in inference mode.
python3 main.py --load_config=unet_resnet18_test.yaml inference Train LinkNET
  • To run on a different dataset or model, take one of the configuration files, such as config/experiments_config/unet_resnet18_test.yaml, and modify it, or create a new .yaml configuration file for your needs.
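A new configuration can also be derived programmatically with PyYAML, which is already a dependency. The minimal sketch below loads an existing file, overrides selected top-level keys, and writes the result under a new name.

Which keys to override depends on the actual contents of the config files, and those are not documented here. The helper name `derive_config` is hypothetical, and any keys in the usage line are placeholders, not real option names.

```python
from pathlib import Path

import yaml  # PyYAML


def derive_config(base_path, out_path, overrides):
    """Copy a YAML config, replacing the top-level keys given in `overrides`."""
    with open(base_path) as f:
        config = yaml.safe_load(f) or {}
    if not isinstance(config, dict):
        raise ValueError(f"{base_path} does not contain a YAML mapping")
    config.update(overrides)  # shallow merge: nested sections are replaced whole
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "w") as f:
        yaml.safe_dump(config, f, default_flow_style=False)
    return config
```

For example, `derive_config("config/experiments_config/unet_resnet18_test.yaml", "config/experiments_config/my_experiment.yaml", {"some_key": "new_value"})` writes the new file. Here `some_key` and `my_experiment.yaml` are hypothetical; inspect the base file for the real option names.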

NOTE: The current code does not include the optimized code for measuring inference time; the final code will be released soon.
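The reported mIoU values come from the official Cityscapes evaluation script, while the framework's own evaluation gives slightly lower numbers. For reference, below is a minimal NumPy sketch of class mean IoU. It accumulates a confusion matrix, computes per-class IoU = TP / (TP + FP + FN), and averages over the classes that occur.

It assumes integer label maps, 19 evaluation classes, predictions already in [0, 19), and an ignore label of 255, following the common Cityscapes trainId convention. It is not the official script, which also reports instance-level iIoU.

```python
import numpy as np


def confusion_matrix(gt, pred, num_classes=19, ignore_label=255):
    """Accumulate a num_classes x num_classes matrix (rows: ground truth).

    ASSUMES every prediction is a valid class index in [0, num_classes).
    """
    gt = np.asarray(gt).ravel()
    pred = np.asarray(pred).ravel()
    valid = gt != ignore_label  # drop pixels whose ground truth is ignored
    gt, pred = gt[valid].astype(np.int64), pred[valid].astype(np.int64)
    idx = num_classes * gt + pred
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mean_iou(conf):
    """Per-class IoU = TP / (TP + FP + FN) and its mean over present classes."""
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp  # predicted as class c, labeled otherwise
    fn = conf.sum(axis=1) - tp  # labeled class c, predicted otherwise
    denom = tp + fp + fn
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = np.where(denom > 0, tp / denom, np.nan)
    return iou, float(np.nanmean(iou))
```

Over a whole validation set, sum the per-image matrices (`conf += confusion_matrix(gt, pred)`) before calling `mean_iou`. This pools pixels across the dataset instead of averaging per-image scores.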

Main Dependencies

Python 3 and above
tensorflow 1.3.0/1.4.0
numpy 1.13.1
tqdm 4.15.0
matplotlib 2.0.2
pillow 4.2.1
PyYAML 3.12

All Dependencies

Use requirements_gpu.txt for a GPU setup, or requirements.txt otherwise:

pip install -r requirements_gpu.txt
pip install -r requirements.txt

Citation

If you find RTSeg useful in your research, please consider citing our work:

@ARTICLE{2018arXiv180302758S,
   author = {{Siam}, M. and {Gamal}, M. and {Abdel-Razek}, M. and {Yogamani}, S. and
    {Jagersand}, M.},
    title = "{RTSeg: Real-time Semantic Segmentation Comparative Study}",
  journal = {ArXiv e-prints},
archivePrefix = "arXiv",
   eprint = {1803.02758},
 primaryClass = "cs.CV",
 keywords = {Computer Science - Computer Vision and Pattern Recognition},
     year = 2018,
    month = mar,
   adsurl = {http://adsabs.harvard.edu/abs/2018arXiv180302758S},
  adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}

If you find ShuffleSeg useful in your research, please consider citing it as well:

@ARTICLE{2018arXiv180303816G,
   author = {{Gamal}, M. and {Siam}, M. and {Abdel-Razek}, M.},
    title = "{ShuffleSeg: Real-time Semantic Segmentation Network}",
  journal = {ArXiv e-prints},
archivePrefix = "arXiv",
   eprint = {1803.03816},
 primaryClass = "cs.CV",
 keywords = {Computer Science - Computer Vision and Pattern Recognition},
     year = 2018,
    month = mar,
   adsurl = {http://adsabs.harvard.edu/abs/2018arXiv180303816G},
  adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Related Project

Real-time Motion Segmentation using 2-stream ShuffleSeg (Code)

Owner
Mennatullah Siam
PhD Student