SeMask: Semantically Masked Transformers for Semantic Segmentation.

Last update: Dec 30, 2022

Overview

SeMask: Semantically Masked Transformers

Jitesh Jain, Anukriti Singh, Nikita Orlov, Zilong Huang, Jiachen Li, Steven Walton, Humphrey Shi

This repo contains the code for our paper SeMask: Semantically Masked Transformers for Semantic Segmentation.

Results
Setup Instructions
Citing SeMask

1. Results

Note: † denotes backbones pretrained on ImageNet-22k with 384x384 resolution images.

ADE20K

Method	Backbone	Crop Size	mIoU	mIoU (ms+flip)	#Params	Config	Checkpoint
SeMask-T FPN	SeMask Swin-T	512x512	42.11	43.16	35M	config	TBD
SeMask-S FPN	SeMask Swin-S	512x512	45.92	47.63	56M	config	TBD
SeMask-B FPN	SeMask Swin-B†	512x512	49.35	50.98	96M	config	TBD
SeMask-L FPN	SeMask Swin-L†	640x640	51.89	53.52	211M	config	TBD
SeMask-L MaskFormer	SeMask Swin-L†	640x640	54.75	56.15	219M	config	TBD
SeMask-L Mask2Former	SeMask Swin-L†	640x640	56.41	57.52	222M	config	TBD
SeMask-L Mask2Former FAPN	SeMask Swin-L†	640x640	56.68	58.00	227M	config	TBD
SeMask-L Mask2Former MSFAPN	SeMask Swin-L†	640x640	56.54	58.22	224M	config	TBD

Cityscapes

Method	Backbone	Crop Size	mIoU	mIoU (ms+flip)	#Params	Config	Checkpoint
SeMask-T FPN	SeMask Swin-T	768x768	74.92	76.56	34M	config	TBD
SeMask-S FPN	SeMask Swin-S	768x768	77.13	79.14	56M	config	TBD
SeMask-B FPN	SeMask Swin-B†	768x768	77.70	79.73	96M	config	TBD
SeMask-L FPN	SeMask Swin-L†	768x768	78.53	80.39	211M	config	TBD
SeMask-L Mask2Former	SeMask Swin-L†	512x1024	83.97	84.98	222M	config	TBD

COCO-Stuff 10k

Method	Backbone	Crop Size	mIoU	mIoU (ms+flip)	#Params	Config	Checkpoint
SeMask-T FPN	SeMask Swin-T	512x512	37.53	38.88	35M	config	TBD
SeMask-S FPN	SeMask Swin-S	512x512	40.72	42.27	56M	config	TBD
SeMask-B FPN	SeMask Swin-B†	512x512	44.63	46.30	96M	config	TBD
SeMask-L FPN	SeMask Swin-L†	640x640	47.47	48.54	211M	config	TBD
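The tables report mIoU in two settings: single-scale inference, and multi-scale inference with horizontal flipping (ms+flip). This README does not include the evaluation code. As a hedged illustration only, the NumPy sketch below shows the usual computation. mIoU comes from a confusion matrix accumulated over the whole dataset. ms+flip sums per-pixel class scores over several resized and flipped copies of the input before taking the argmax.

Several details are assumptions rather than the repository's actual settings: the `model_fn` callable, the scale list, `ignore_index=255`, and nearest-neighbour resizing. Real implementations typically resize logits with bilinear interpolation.

```python
import numpy as np


def confusion_matrix(pred, gt, num_classes, ignore_index=255):
    """Count (ground truth, prediction) pairs for one image, skipping ignored pixels."""
    mask = gt != ignore_index
    # Cast to int64 so g * num_classes cannot overflow small label dtypes such as uint8.
    g = gt[mask].astype(np.int64)
    p = pred[mask].astype(np.int64)
    counts = np.bincount(g * num_classes + p, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)  # rows: ground truth, cols: prediction


def mean_iou(conf):
    """mIoU from a confusion matrix summed over every image in the dataset."""
    inter = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    valid = union > 0  # skip classes absent from both predictions and labels
    return float((inter[valid] / union[valid]).mean())


def resize_nearest(x, out_h, out_w):
    """Nearest-neighbour resize of a (C, H, W) array to (C, out_h, out_w)."""
    _, h, w = x.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return x[:, rows][:, :, cols]


def ms_flip_predict(model_fn, image, scales=(0.5, 0.75, 1.0, 1.25, 1.5, 1.75)):
    """Label map from class scores summed over input scales and horizontal flips.

    model_fn is a hypothetical callable mapping a (3, h, w) image to
    (num_classes, h, w) scores; image has shape (3, H, W).
    """
    _, h, w = image.shape
    total = None
    for s in scales:
        scaled = resize_nearest(image, max(1, round(h * s)), max(1, round(w * s)))
        for flip in (False, True):
            inp = scaled[:, :, ::-1] if flip else scaled
            scores = model_fn(inp)
            if flip:
                scores = scores[:, :, ::-1]  # undo the flip so pixels line up again
            scores = resize_nearest(scores, h, w)  # back to the original resolution
            total = scores if total is None else total + scores
    # Summing instead of averaging leaves the argmax unchanged.
    return total.argmax(axis=0)
```

To evaluate a dataset, call `confusion_matrix` on each (prediction, label) pair and add the results. Then call `mean_iou` once on the total. Averaging per-image mIoU scores instead would weight small images and rare classes differently from the dataset-level metric.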

2. Setup Instructions

We provide codebases with SeMask incorporated into several segmentation frameworks. Please follow the setup instructions in the corresponding folders:

SeMask-FPN: Setup Instructions
SeMask-MaskFormer: Setup Instructions
SeMask-Mask2Former: Setup Instructions
SeMask-FAPN: Setup Instructions

3. Citing SeMask

@article{jain2022semask,
  title={SeMask: Semantically Masking Transformer Backbones for Effective Semantic Segmentation},
  author={Jitesh Jain and Anukriti Singh and Nikita Orlov and Zilong Huang and Jiachen Li and Steven Walton and Humphrey Shi},
  journal={arXiv preprint arXiv:...},
  year={2022}
}

Acknowledgements

The code is heavily based on the following repositories: Swin-Transformer-Semantic-Segmentation, Mask2Former, MaskFormer, and FaPN-full.

Owner

Picsart AI Research (PAIR)