TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation (CVPR 2022)

Overview

Paper link: TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation (CVPR 2022)

by Wenqiang Zhang*, Zilong Huang*, Guozhong Luo, Tao Chen, Xinggang Wang, Wenyu Liu, Gang Yu, Chunhua Shen.

(*) equal contribution.

Introduction

Although vision transformers (ViTs) have achieved great success in computer vision, their heavy computational cost makes them unsuitable for dense prediction tasks such as semantic segmentation on mobile devices. In this paper, we present a mobile-friendly architecture named Token Pyramid Vision Transformer (TopFormer). TopFormer takes tokens from various scales as input to produce scale-aware semantic features, which are then injected into the corresponding tokens to augment their representations. Experimental results demonstrate that our method significantly outperforms CNN- and ViT-based networks across several semantic segmentation datasets and achieves a good trade-off between accuracy and latency.
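
At the heart of TopFormer is the injection of global, scale-aware semantics into the local tokens. Below is a minimal PyTorch sketch of this gated fusion, written from the paper's description; it is not the official implementation, and the module and dimension names are illustrative.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SemanticsInjection(nn.Module):
        """Sketch of a semantics injection step: local tokens are modulated by
        a sigmoid gate computed from the global semantics, and the projected
        semantics are then added back."""

        def __init__(self, local_dim, global_dim, out_dim):
            super().__init__()
            self.local_proj = nn.Sequential(
                nn.Conv2d(local_dim, out_dim, 1, bias=False), nn.BatchNorm2d(out_dim))
            self.global_gate = nn.Sequential(
                nn.Conv2d(global_dim, out_dim, 1, bias=False), nn.BatchNorm2d(out_dim))
            self.global_proj = nn.Sequential(
                nn.Conv2d(global_dim, out_dim, 1, bias=False), nn.BatchNorm2d(out_dim))

        def forward(self, local_tokens, global_semantics):
            # Upsample the low-resolution global semantics to the local token scale.
            g = F.interpolate(global_semantics, size=local_tokens.shape[2:],
                              mode='bilinear', align_corners=False)
            gate = torch.sigmoid(self.global_gate(g))
            return self.local_proj(local_tokens) * gate + self.global_proj(g)

    # Usage with illustrative shapes:
    # sim = SemanticsInjection(32, 128, 128)
    # out = sim(torch.randn(1, 32, 64, 64), torch.randn(1, 128, 16, 16))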

Latency is measured on a single Qualcomm Snapdragon 865 with input size 512×512×3; only an ARM CPU core is used for speed testing. * indicates an input size of 448×448×3.

Updates

  • 04/23/2022: The TopFormer backbone has been integrated into PaddleViT; check out here for a third-party implementation on the Paddle framework!

Requirements

  • pytorch 1.5+
  • mmcv-full==1.3.14
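
A typical environment setup might look as follows. The find-links URL follows the standard OpenMMLab wheel-index pattern; the cu102/torch1.8.0 parts are an assumption and must be replaced to match your local CUDA and PyTorch builds:

pip install "torch>=1.5" torchvision

pip install mmcv-full==1.3.14 -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.8.0/index.html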

Main results

The classification models pretrained on ImageNet can be downloaded from Baidu Drive/Google Drive.

ADE20K

| Model | Params (M) | FLOPs (G) | mIoU (ss) | Link |
| --- | --- | --- | --- | --- |
| TopFormer-T_448x448_2x8_160k | 1.4 | 0.5 | 32.5 | Baidu Drive, Google Drive |
| TopFormer-T_448x448_4x8_160k | 1.4 | 0.5 | 33.4 | Baidu Drive, Google Drive |
| TopFormer-T_512x512_2x8_160k | 1.4 | 0.6 | 33.6 | Baidu Drive, Google Drive |
| TopFormer-T_512x512_4x8_160k | 1.4 | 0.6 | 34.6 | Baidu Drive, Google Drive |
| TopFormer-S_512x512_2x8_160k | 3.1 | 1.2 | 36.5 | Baidu Drive, Google Drive |
| TopFormer-S_512x512_4x8_160k | 3.1 | 1.2 | 37.0 | Baidu Drive, Google Drive |
| TopFormer-B_512x512_2x8_160k | 5.1 | 1.8 | 38.3 | Baidu Drive, Google Drive |
| TopFormer-B_512x512_4x8_160k | 5.1 | 1.8 | 39.2 | Baidu Drive, Google Drive |
  • ss indicates single-scale testing.
  • The password for the Baidu Drive links is topf.

Usage

Please refer to MMSegmentation for dataset preparation.

For training, run:

sh tools/dist_train.sh local_configs/topformer/<config-file> <num-of-gpus-to-use> --work-dir /path/to/save/checkpoint

To evaluate, run:

sh tools/dist_test.sh local_configs/topformer/<config-file> <checkpoint-path> <num-of-gpus-to-use>
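
For example, to train the tiny ADE20K model on 8 GPUs and then evaluate the resulting checkpoint (the config name is taken from the Main results table; the work directory is illustrative, and --eval mIoU is the standard MMSegmentation flag for reporting mIoU):

sh tools/dist_train.sh local_configs/topformer/topformer_tiny_448x448_160k_2x8_ade20k.py 8 --work-dir work_dirs/topformer_tiny

sh tools/dist_test.sh local_configs/topformer/topformer_tiny_448x448_160k_2x8_ade20k.py work_dirs/topformer_tiny/latest.pth 8 --eval mIoU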

To test inference speed on a mobile device, please refer to tnn_runtime.

Acknowledgement

The implementation is based on MMSegmentation.

Citation

If you find our work helpful to your experiments, please cite:

@inproceedings{zhang2022topformer,
  title     = {TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation},
  author    = {Zhang, Wenqiang and Huang, Zilong and Luo, Guozhong and Chen, Tao and Wang, Xinggang and Liu, Wenyu and Yu, Gang and Shen, Chunhua},
  booktitle = {Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR)},
  year      = {2022}
}

Issues
  • FileNotFoundError

    Hello, when I try to run in the mmsegmentation environment, I get the following error: FileNotFoundError: modelzoos/classification/topformer-T-224-66.2.pth can not be found. Where should I get this file (topformer-T-224-66.2.pth)?

    opened by ke-dev 3
  • how to modify the batch size during inference?

    I tried changing samples_per_gpu from 2 to 1 in the config file, but the elapsed time in the inference log does not seem to change much. Am I doing something wrong?
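
    (A hedged note: samples_per_gpu in the data dict is the per-GPU batch size, as sketched below; however, MMSegmentation's test script typically builds its data loader with a single sample per GPU regardless, which would explain why the per-image inference time barely changes.)

    # Sketch of the relevant MMSegmentation-style config section; the keys are
    # standard mmseg config keys, the values here are illustrative.
    data = dict(
        samples_per_gpu=1,   # per-GPU batch size used for training data loaders
        workers_per_gpu=2)   # data-loading worker processes per GPU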

    opened by wmkai 2
  • AttributeError: 'ConfigDict' object has no attribute 'dist_params'

    Hello, thanks for providing such a great repo, but I encountered an error when running bash tools/dist_train.sh local_configs/topformer/topformer_base.py 8 --work-dir /path/to/save/checkpoint. Details are in the attached screenshot.
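
    (A hedged note: this usually means the config does not define dist_params, which MMSegmentation configs normally inherit from _base_/default_runtime.py; adding the line below to the config, or inheriting that base file, is the typical fix.)

    dist_params = dict(backend='nccl')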

    opened by wmkai 2
  • The validation set result is 0

    Hi, thanks for your excellent work. When I trained TopFormer on the ADE20K dataset, the validation result came out as 0. Have you ever encountered this situation? (screenshot attached) This is the command I used: bash tools/dist_train.sh local_configs/topformer/topformer_tiny_448x448_160k_2x8_ade20k.py 1. Looking forward to your reply.

    opened by ke-dev 1
  • size of output

    Hello, as shown in Figure 2, the output is at 1/4 of the input scale, so for a 448×448 input there is only a 56×56 map in the output! What am I missing?
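
    (For context, a minimal sketch assuming the standard MMSegmentation behavior: the low-resolution logits are bilinearly upsampled back to the input resolution before the loss and the final argmax, so the 56×56 map is not the final prediction. The shapes below are illustrative.)

    import torch
    import torch.nn.functional as F

    logits = torch.randn(1, 150, 56, 56)   # e.g. 150 ADE20K classes at low resolution
    logits = F.interpolate(logits, size=(448, 448),
                           mode='bilinear', align_corners=False)
    pred = logits.argmax(dim=1)            # (1, 448, 448) full-resolution label map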

    opened by javadmozaffari 1
  • Deploy test error with onnx backend

    Hello,

    Thanks for sharing this great repo. I trained and tested on my own dataset, and the prediction results are quite good.

    But when I exported the ONNX file with the command:

    python3 tools/convert2onnx.py  local_configs/topformer/topformer_tiny_512x512_80k_2x8_drive_inviol1k.py --input-img results/0109.png --shape 512 512 --checkpoint results/tiny_20k/latest.pth --output-file results/tiny_80k/tiny_512_512.onnx --show
    

    and then ran the deploy script with the command:

    python tools/deploy_test.py local_configs/topformer/topformer_tiny_512x512_80k_2x8_drive_inviol1k.py results/tiny_80k/tiny_512_512.onnx --backend=onnxruntime --show

    it showed this error:

    File "tools/deploy_test.py", line 297, in main() File "tools/deploy_test.py", line 268, in main results = single_gpu_test( File "/home/yidong/anaconda3/envs/mask2former/lib/python3.8/site-packages/mmsegmentation-0.19.0-py3.8.egg/mmseg/apis/test.py", line 91, in single_gpu_test result = model(return_loss=False, **data) File "/home/yidong/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl result = self.forward(*input, **kwargs) File "/home/yidong/anaconda3/envs/mask2former/lib/python3.8/site-packages/mmcv/parallel/data_parallel.py", line 42, in forward return super().forward(*inputs, **kwargs) File "/home/yidong/.local/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 165, in forward return self.module(*inputs[0], **kwargs[0]) File "/home/yidong/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl result = self.forward(*input, **kwargs) File "/home/yidong/anaconda3/envs/mask2former/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 98, in new_func return old_func(*args, **kwargs) File "/home/yidong/anaconda3/envs/mask2former/lib/python3.8/site-packages/mmsegmentation-0.19.0-py3.8.egg/mmseg/models/segmentors/base.py", line 110, in forward return self.forward_test(img, img_metas, **kwargs) File "/home/yidong/anaconda3/envs/mask2former/lib/python3.8/site-packages/mmsegmentation-0.19.0-py3.8.egg/mmseg/models/segmentors/base.py", line 92, in forward_test return self.simple_test(imgs[0], img_metas[0], **kwargs) File "tools/deploy_test.py", line 84, in simple_test self.sess.run_with_iobinding(self.io_binding) File "/home/yidong/anaconda3/envs/mask2former/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 276, in run_with_iobinding self._sess.run_with_iobinding(iobinding._iobinding, run_options) RuntimeError: Error in execution: Got invalid dimensions for input: input for the following indices index: 2 Got: 1080 Expected: 512 index: 3 Got: 1878 Expected: 512 Please fix either the inputs or the model.

    Are there any requirements on the input image or shape when exporting the ONNX file? The config settings are: img_scale = (1920, 1080), crop_size = (512, 512).

    I also tried some other shapes, but I am still confused by this error; any help would be much appreciated.
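
    (A hedged sketch of a workaround: the exported graph has a static 512×512 input, so the image must be resized to that shape before calling onnxruntime directly. The paths below reuse those from the commands above, and the training-pipeline mean/std normalization is omitted for brevity.)

    import cv2
    import numpy as np
    import onnxruntime as ort

    sess = ort.InferenceSession('results/tiny_80k/tiny_512_512.onnx')
    img = cv2.imread('results/0109.png').astype(np.float32)
    img = cv2.resize(img, (512, 512))      # match the static export shape
    x = img.transpose(2, 0, 1)[None]       # HWC -> NCHW, batch of 1
    out = sess.run(None, {sess.get_inputs()[0].name: x})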

    opened by wangyidong3 1
  • What's SASE

    Excellent work. In your paper you compare SASE with FFN and ASPP, and I have read the code as well, but I cannot find out what SASE is. Could you please tell me what SASE is, or point me to the related paper?

    opened by Liupengshuaige 1
  • onnx to tensorrt

    Hi, I used this command to get tmp.onnx:

    python3 tools/pytorch2onnx.py \
        ./local_configs/topformer/topformer_tiny_448x448_160k_2x8_ade20k.py \
        --input-img ./imgs/999438_0_B_0007_7680-3840-11776-7936.jpg \
        --shape 2048 2048 \
        --checkpoint ./work_dirs/topformer_tiny_448x448_160k_2x8_ade20k/latest.pth

    Then I got the error below when trying to convert tmp.onnx to TensorRT:

    assert is_tensorrt_plugin_loaded(), 'TensorRT plugin should be compiled.'
    AssertionError: TensorRT plugin should be compiled.

    My environment is CUDA 10.1 with TensorRT 8.4.0.6; I also tried CUDA 11.3 but got the same error. Have you tried converting to TensorRT? What is the difference between this and the procedure in the MMSegmentation docs? https://github.com/open-mmlab/mmsegmentation/blob/master/docs/en/useful_tools.md

    Looking forward to your reply.
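
    (A hedged note: this assertion comes from mmcv, whose TensorRT plugins are not included in the prebuilt mmcv-full wheels. Per the MMCV deployment docs of that generation, mmcv has to be built from source with the plugin enabled, roughly as below; the checkout tag matches the version pinned in Requirements.)

    git clone https://github.com/open-mmlab/mmcv.git
    cd mmcv && git checkout v1.3.14
    MMCV_WITH_OPS=1 MMCV_WITH_TRT=1 pip install -e .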

    opened by ke-dev 0
  • Resolving the missing libGL.so.1 error

    Fixed the following error, which occurs when importing OpenCV in a Docker environment; for example, the same error occurs when running convert2onnx.py in the tools folder.

    ImportError: libGL.so.1: cannot open shared object file: No such file or directory
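
    (For reference, the usual fix on Debian/Ubuntu-based images is to install the missing GL library, or to switch to the headless OpenCV wheel:)

    apt-get update && apt-get install -y libgl1-mesa-glx
    # or: pip install opencv-python-headless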
    
    opened by PINTO0309 0
  • Is there any ImageNet pre-training config for TopFormer?

    I want to change some modules when using TopFormer for classification on ImageNet. Could you please provide the ImageNet pre-training config? Thanks!

    opened by wmkai 1
  • Have you compared the effects of topformer and swin transformer ?

    Hi! Have you compared the speed and accuracy of TopFormer and Swin Transformer under the same computational budget? Looking forward to your reply. Good luck!

    opened by tensorflowt 1
  • Is batchsize>1 supported in convert2onnx.py file?

    Hi! Thank you so much for your open-source project, it's great! I encountered a problem when verifying its performance: I want to test with batchsize=2, so I converted the .pth model to an ONNX model as shown in the attached screenshots, but the generated model still has batchsize=1. What do I need to do to generate an ONNX model with batchsize=2? Thank you very much, looking forward to your reply!
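
    (A hedged sketch of how a batch dimension is usually made dynamic with torch.onnx.export; the stand-in model and all names are illustrative, and the repo's conversion script would need an equivalent change.)

    import torch
    import torch.nn as nn

    model = nn.Conv2d(3, 150, 1)           # stand-in for the segmentor wrapped for export
    dummy = torch.randn(2, 3, 512, 512)    # trace with a batch of 2
    torch.onnx.export(
        model, dummy, 'model_bs2.onnx',
        input_names=['input'], output_names=['output'],
        # mark the batch dimension as dynamic so any batch size is accepted
        dynamic_axes={'input': {0: 'batch'}, 'output': {0: 'batch'}},
        opset_version=11)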

    opened by tensorflowt 1
  • add web demo/model to Huggingface

    Hi, would you be interested in adding TopFormer to Hugging Face? The Hub offers free hosting, and it would make your work more accessible and visible to the rest of the ML community.

    Examples from other organizations:

      • Keras: https://huggingface.co/keras-io
      • Microsoft: https://huggingface.co/microsoft
      • Facebook: https://huggingface.co/facebook

    Example Spaces with repos:

      • GitHub: https://github.com/salesforce/BLIP; Space: https://huggingface.co/spaces/salesforce/BLIP
      • GitHub: https://github.com/facebookresearch/omnivore; Space: https://huggingface.co/spaces/akhaliq/omnivore

    Guides for adding Spaces, models, and datasets to your org:

      • How to add a Space: https://huggingface.co/blog/gradio-spaces
      • How to add models: https://huggingface.co/docs/hub/adding-a-model
      • Uploading a dataset: https://huggingface.co/docs/datasets/upload_dataset.html

    Please let us know if you would be interested and if you have any questions, we can also help with the technical implementation.

    opened by AK391 1