SeMask: Semantically Masked Transformers for Semantic Segmentation.

Last update: Dec 30, 2022

SeMask: Semantically Masked Transformers

Jitesh Jain, Anukriti Singh, Nikita Orlov, Zilong Huang, Jiachen Li, Steven Walton, Humphrey Shi

This repo contains the code for our paper SeMask: Semantically Masked Transformers for Semantic Segmentation.

Results
Setup Instructions
Citing SeMask

1. Results

Note: † denotes backbones pretrained on ImageNet-22k at 384x384 resolution.

ADE20K

Method	Backbone	Crop Size	mIoU	mIoU (ms+flip)	#params	config	Checkpoint
SeMask-T FPN	SeMask Swin-T	512x512	42.11	43.16	35M	config	TBD
SeMask-S FPN	SeMask Swin-S	512x512	45.92	47.63	56M	config	TBD
SeMask-B FPN	SeMask Swin-B†	512x512	49.35	50.98	96M	config	TBD
SeMask-L FPN	SeMask Swin-L†	640x640	51.89	53.52	211M	config	TBD
SeMask-L MaskFormer	SeMask Swin-L†	640x640	54.75	56.15	219M	config	TBD
SeMask-L Mask2Former	SeMask Swin-L†	640x640	56.41	57.52	222M	config	TBD
SeMask-L Mask2Former FAPN	SeMask Swin-L†	640x640	56.68	58.00	227M	config	TBD
SeMask-L Mask2Former MSFAPN	SeMask Swin-L†	640x640	56.54	58.22	224M	config	TBD
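
The mIoU column is the mean intersection-over-union across classes, reported as a percentage. As a reading aid, here is a minimal sketch of how such a score can be computed from flat lists of predicted and ground-truth labels. It is not this repo's evaluation code: the function name `mean_iou` and the `ignore_index=255` default are assumptions made for illustration, and benchmark tooling typically accumulates the per-class counts over the whole validation set before dividing.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU (in percent) over classes that occur in pred or target.

    pred, target: flat sequences of integer class labels of equal length.
    ignore_index: hypothetical label value marking pixels to skip.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixel, excluded from the score
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            # A wrong pixel enlarges the union of both classes involved.
            union[t] += 1
            union[p] += 1
    # Classes that never occur are left out of the average.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    if not ious:
        return float("nan")
    return 100.0 * sum(ious) / len(ious)


if __name__ == "__main__":
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 255]
    # Per-class IoU: 1/2, 2/3 and 1/1.
    print(round(mean_iou(pred, target, num_classes=3), 2))
```

In the toy example the last pixel is ignored, so the three per-class IoUs are 1/2, 2/3 and 1, which average to about 72.22.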

Cityscapes

Method	Backbone	Crop Size	mIoU	mIoU (ms+flip)	#params	config	Checkpoint
SeMask-T FPN	SeMask Swin-T	768x768	74.92	76.56	34M	config	TBD
SeMask-S FPN	SeMask Swin-S	768x768	77.13	79.14	56M	config	TBD
SeMask-B FPN	SeMask Swin-B†	768x768	77.70	79.73	96M	config	TBD
SeMask-L FPN	SeMask Swin-L†	768x768	78.53	80.39	211M	config	TBD
SeMask-L Mask2Former	SeMask Swin-L†	512x1024	83.97	84.98	222M	config	TBD

COCO-Stuff 10k

Method	Backbone	Crop Size	mIoU	mIoU (ms+flip)	#params	config	Checkpoint
SeMask-T FPN	SeMask Swin-T	512x512	37.53	38.88	35M	config	TBD
SeMask-S FPN	SeMask Swin-S	512x512	40.72	42.27	56M	config	TBD
SeMask-B FPN	SeMask Swin-B†	512x512	44.63	46.30	96M	config	TBD
SeMask-L FPN	SeMask Swin-L†	640x640	47.47	48.54	211M	config	TBD
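
The "mIoU (ms+flip)" column refers to multi-scale and horizontal-flip testing, where predictions from several resized and mirrored copies of an image are averaged before taking the per-pixel argmax. The sketch below illustrates the general idea in plain Python, assuming nearest-neighbour resizing and a caller-supplied `predict_scores` function. Both are hypothetical stand-ins, and the scales actually used for the tables are set in the corresponding configs, not here.

```python
def hflip(grid):
    """Mirror a 2D list left to right."""
    return [row[::-1] for row in grid]


def resize_nearest(grid, out_h, out_w):
    """Nearest-neighbour resize of a 2D list (illustrative only)."""
    in_h, in_w = len(grid), len(grid[0])
    return [[grid[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)] for r in range(out_h)]


def ms_flip_predict(image, predict_scores, scales=(1.0, 2.0)):
    """Average class scores over scales and horizontal flips.

    image: 2D list of pixel values.
    predict_scores: hypothetical model callable; takes a 2D list and
        returns one 2D score map per class, same size as its input.
    Returns a 2D list of predicted class indices at the original size.
    """
    h, w = len(image), len(image[0])
    total, runs = None, 0
    for s in scales:
        sh, sw = max(1, round(h * s)), max(1, round(w * s))
        scaled = resize_nearest(image, sh, sw)
        for flip in (False, True):
            scores = predict_scores(hflip(scaled) if flip else scaled)
            if flip:
                scores = [hflip(m) for m in scores]  # undo the mirror
            # Bring every score map back to the original resolution.
            scores = [resize_nearest(m, h, w) for m in scores]
            if total is None:
                total = scores
            else:
                total = [[[a + b for a, b in zip(ra, rb)]
                          for ra, rb in zip(ma, mb)]
                         for ma, mb in zip(total, scores)]
            runs += 1
    avg = [[[v / runs for v in row] for row in m] for m in total]
    return [[max(range(len(avg)), key=lambda k: avg[k][r][c])
             for c in range(w)] for r in range(h)]


def toy_scores(img):
    """Toy two-class 'model': class 1 score equals the pixel value."""
    return [[[1 - v for v in row] for row in img],
            [[v for v in row] for row in img]]


if __name__ == "__main__":
    print(ms_flip_predict([[0, 1, 1], [1, 0, 0]], toy_scores))
```

With the pixel-wise toy model, every augmented pass agrees, so the averaged prediction reproduces the input labels. A real network generally gives slightly different scores per pass, and that variation is what the averaging smooths out.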

2. Setup Instructions

We provide the codebase with SeMask incorporated into various models. Please check the setup instructions inside the corresponding folders:

SeMask-FPN: Setup Instructions
SeMask-MaskFormer: Setup Instructions
SeMask-Mask2Former: Setup Instructions
SeMask-FAPN: Setup Instructions

3. Citing SeMask

@article{jain2022semask,
  title={SeMask: Semantically Masking Transformer Backbones for Effective Semantic Segmentation},
  author={Jitesh Jain and Anukriti Singh and Nikita Orlov and Zilong Huang and Jiachen Li and Steven Walton and Humphrey Shi},
  journal={arXiv preprint arXiv:...},
  year={2022}
}

Acknowledgements

The code is heavily based on the following repositories: Swin-Transformer-Semantic-Segmentation, Mask2Former, MaskFormer, and FaPN-full.

Owner

Picsart AI Research (PAIR)
