The official implementation of "Rethink Dilated Convolution for Real-time Semantic Segmentation"

Related tags

Deep LearningRegSeg
Overview

RegSeg

The official implementation of "Rethink Dilated Convolution for Real-time Semantic Segmentation"

Paper: arxiv

params

D block

DBlock

Decoder

Decoder

Setup

Install the dependencies in requirements.txt by using pip and virtualenv.

Download Cityscapes

go to https://www.cityscapes-dataset.com, create an account, and download gtFine_trainvaltest.zip and leftImg8bit_trainvaltest.zip. You can delete the test images to save some space if you don't want to submit to the competition. Name the directory cityscapes_dataset. Make sure that you have downloaded the required python packages and run

CITYSCAPES_DATASET=cityscapes_dataset csCreateTrainIdLabelImgs

There are 19 classes.

Results from paper

To see the ablation studies results from the paper, go here.

Usage

To visualize your model, go to show.py. To train, validate, benchmark, and save the results of your model, go to train.py.

Results on Cityscapes server

RegSeg (exp48_decoder26, 30FPS): 78.3

Larger RegSeg (exp53_decoder29, 20 FPS): 79.5

Citation

If you find our work helpful, please consider citing our paper.

@article{gao2021rethink,
  title={Rethink Dilated Convolution for Real-time Semantic Segmentation},
  author={Gao, Roland},
  journal={arXiv preprint arXiv:2111.09957},
  year={2021}
}
Comments
  • question about STDC2-Seg75

    question about STDC2-Seg75

    Hi, I note that you benchmark the computation of STDC2-Seg75 which is not reported in the CVPR2021 paper. Did you test the speed of STDC-Seg on your own platform? How about the results?

    opened by ydhongHIT 2
  • Can not show.py

    Can not show.py

    I try show.py. But I can not.

    $ python3 show.py
    name= cityscapes
    train size: 2975
    val size: 500
    Traceback (most recent call last):
      File "show.py", line 358, in <module>
        show_cityscapes_model()
      File "show.py", line 337, in show_cityscapes_model
        show(model,val_loader,device,show_cityscapes_mask,num_images=num_images,skip=skip,images_per_line=images_per_line)
      File "show.py", line 134, in show
        outputs = model(images)
      File "/home/sounansu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/sounansu/RegSeg/model.py", line 76, in forward
        x=self.stem(x)
      File "/home/sounansu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/sounansu/RegSeg/blocks.py", line 22, in forward
        x = self.conv(x)
      File "/home/sounansu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/sounansu/.local/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 446, in forward
        return self._conv_forward(input, self.weight, self.bias)
      File "/home/sounansu/.local/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 442, in _conv_forward
        return F.conv2d(input, weight, bias, self.stride,
    RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
    
    opened by sounansu 2
  • The pretrained model link

    The pretrained model link

    Hi, thank you for sharing the code. Can you provide download link about the pretrained model(exp48_decoder26 and exp53_decoder29) in Cityscapes dataset, Thank you very much!

    opened by gaowq2017 1
  • About train bug

    About train bug

    When using seg_transforms.py through your scripts 'camvid_efficientnet_b1_hyperseg-s', there always exsist 'TypeError: resize() got an unexpected keyword argument 'interpolation'' in 174 line. Does this bug only appear in this scripts and should I modify the code when using this scripts?

    opened by 870572761 0
  • CVE-2007-4559 Patch

    CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have began a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15 year old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsantized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks to see if all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions you may contact us through this projects lead researcher Kasimir Schulz.

    opened by TrellixVulnTeam 0
  • About train code

    About train code

    When training, how did the miou and accuracy calculate? On train dataset or validate dataset? I think it's calculated on val dataset due to https://github.com/RolandGao/RegSeg/blob/main/train.py#L238. I trained the base regseg model with config cityscapes_trainval_1000epochs.yam on Cityscapes and got the unbelievable results. 840794c66f23deb33666dcffc4af5b5

    opened by Asthestarsfalll 6
  • confusion on field of view  and model inference time

    confusion on field of view and model inference time

    Hi, RolandGao, nice to see a good job! I see you've done a lot of experiments on the backbone setting, but I still have some confusion after reading your published paper.

    • First, You calculate the fov of 4095 to see the bottom-right pixel when training cityscape (1024x2048), so you have verify the backbone should be exp48 [ (1,1) + (1,2) + 4 * (1, 4) + 7 *(1, 14) ] with fov (3807). But I also find the same backbone when training the CamVid (720x960). Why not use a shallow backbone? I am training my own dataset with image resolution (512 x 512), do I need to modify the backbone architecture? Can you give some advice?
    • Second, I test inference time of regseg. I notice that the speed is not better than other real-time archs due to split and dilated conv even if model costs low GFLOPs. In the application, what we are concerned about is the speed, so is there any strategy to improve the speed?
    opened by LinaShanghaitech 5
  • Why not pretrain on ImageNet?

    Why not pretrain on ImageNet?

    Hi, Thanks for your excellent work ! I notice that RegSeg can achieve a high accuracy on Cityscapes without pretraining. I also did a lot of ablation studies and I think DDRNet will drop around 3% miou if they do not use ImageNet pretraining. How about trying to train your encoder on ImageNet and see what will happen? I really look forward to your result ! Thanks !

    opened by RobinhoodKi 1
Owner
Roland
University of Toronto CS 2023
Roland
Author: Wenhao Yu ([email protected]). ACL 2022. Commonsense Reasoning on Knowledge Graph for Text Generation

Diversifying Commonsense Reasoning Generation on Knowledge Graph Introduction -- This is the pytorch implementation of our ACL 2022 paper "Diversifyin

DM2 Lab @ ND 61 Dec 30, 2022
Official implementation of Self-supervised Graph Attention Networks (SuperGAT), ICLR 2021.

SuperGAT Official implementation of Self-supervised Graph Attention Networks (SuperGAT). This model is presented at How to Find Your Friendly Neighbor

Dongkwan Kim 127 Dec 28, 2022
Official PyTorch(Geometric) implementation of DPGNN(DPGCN) in "Distance-wise Prototypical Graph Neural Network for Node Imbalance Classification"

DPGNN This repository is an official PyTorch(Geometric) implementation of DPGNN(DPGCN) in "Distance-wise Prototypical Graph Neural Network for Node Im

Yu Wang (Jack) 18 Oct 12, 2022
Implementation of Shape Generation and Completion Through Point-Voxel Diffusion

Shape Generation and Completion Through Point-Voxel Diffusion Project | Paper Implementation of Shape Generation and Completion Through Point-Voxel Di

Linqi Zhou 103 Dec 29, 2022
The audio-video synchronization of MKV Container Format is exploited to achieve data hiding

The audio-video synchronization of MKV Container Format is exploited to achieve data hiding, where the hidden data can be utilized for various management purposes, including hyper-linking, annotation

Maxim Zaika 1 Nov 17, 2021
Inhomogeneous Social Recommendation with Hypergraph Convolutional Networks

Inhomogeneous Social Recommendation with Hypergraph Convolutional Networks This is our Pytorch implementation for the paper: Zirui Zhu, Chen Gao, Xu C

Zirui Zhu 3 Dec 30, 2022
Get 2D point positions (e.g., facial landmarks) projected on 3D mesh

points2d_projection_mesh Input 2D points (e.g. facial landmarks) on an image Camera parameters (extrinsic and intrinsic) of the image Aligned 3D mesh

5 Dec 08, 2022
Pose estimation for iOS and android using TensorFlow 2.0

💃 Mobile 2D Single Person (Or Your Own Object) Pose Estimation for TensorFlow 2.0 This repository is forked from edvardHua/PoseEstimationForMobile wh

tucan9389 165 Nov 16, 2022
GeoTransformer - Geometric Transformer for Fast and Robust Point Cloud Registration

Geometric Transformer for Fast and Robust Point Cloud Registration PyTorch imple

Zheng Qin 220 Jan 05, 2023
Repo for flood prediction using LSTMs and HAND

Abstract Every year, floods cause billions of dollars’ worth of damages to life, crops, and property. With a proper early flood warning system in plac

1 Oct 27, 2021
Research code of ICCV 2021 paper "Mesh Graphormer"

MeshGraphormer ✨ ✨ This is our research code of Mesh Graphormer. Mesh Graphormer is a new transformer-based method for human pose and mesh reconsructi

Microsoft 251 Jan 08, 2023
Model-based 3D Hand Reconstruction via Self-Supervised Learning, CVPR2021

S2HAND: Model-based 3D Hand Reconstruction via Self-Supervised Learning S2HAND presents a self-supervised 3D hand reconstruction network that can join

Yujin Chen 72 Dec 12, 2022
An open source Jetson Nano baseboard and tools to design your own.

My Jetson Nano Baseboard This basic baseboard gives the user the foundation and the flexibility to design their own baseboard for the Jetson Nano. It

NVIDIA AI IOT 57 Dec 29, 2022
Learning based AI for playing multi-round Koi-Koi hanafuda card games. Have fun.

Koi-Koi AI Learning based AI for playing multi-round Koi-Koi hanafuda card games. Platform Python PyTorch PySimpleGUI (for the interface playing vs AI

Sanghai Guan 10 Nov 20, 2022
Using Clinical Drug Representations for Improving Mortality and Length of Stay Predictions

Using Clinical Drug Representations for Improving Mortality and Length of Stay Predictions Usage Clone the code to local. https://github.com/tanlab/MI

Computational Biology and Machine Learning lab @ TOBB ETU 3 Oct 18, 2022
🔎 Super-scale your images and run experiments with Residual Dense and Adversarial Networks.

Image Super-Resolution (ISR) The goal of this project is to upscale and improve the quality of low resolution images. This project contains Keras impl

idealo 4k Jan 08, 2023
Neural HMMs are all you need (for high-quality attention-free TTS)

Neural HMMs are all you need (for high-quality attention-free TTS) Shivam Mehta, Éva Székely, Jonas Beskow, and Gustav Eje Henter This is the official

Shivam Mehta 0 Oct 28, 2022
A collection of Google research projects related to Federated Learning and Federated Analytics.

Federated Research Federated Research is a collection of research projects related to Federated Learning and Federated Analytics. Federated learning i

Google Research 483 Jan 05, 2023
Deep Structured Instance Graph for Distilling Object Detectors (ICCV 2021)

DSIG Deep Structured Instance Graph for Distilling Object Detectors Authors: Yixin Chen, Pengguang Chen, Shu Liu, Liwei Wang, Jiaya Jia. [pdf] [slide]

DV Lab 31 Nov 17, 2022
YOLOv5 in PyTorch > ONNX > CoreML > TFLite

This repository represents Ultralytics open-source research into future object detection methods, and incorporates lessons learned and best practices evolved over thousands of hours of training and e

Ultralytics 34.1k Dec 31, 2022