Overview

Swin Transformer

This project explores deploying Swin Transformer with TensorRT, including FP16 and INT8 test results.

Introduction (Quoted from the Original Project)

Swin Transformer (original GitHub repo; the name Swin stands for Shifted window) was initially described in a paper on arXiv and capably serves as a general-purpose backbone for computer vision. It is basically a hierarchical Transformer whose representation is computed with shifted windows. The shifted windowing scheme brings greater efficiency by limiting self-attention computation to non-overlapping local windows while also allowing for cross-window connection.
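The non-overlapping windowing described above can be sketched in NumPy. This is modeled on the window_partition helper in the original Swin code but is an illustrative rewrite, not this project's implementation; the (B, H, W, C) layout and the requirement that H and W be divisible by the window size are assumptions of the sketch.

```python
import numpy as np


def window_partition(x, window_size):
    """Split a (B, H, W, C) feature map into non-overlapping windows.

    Returns an array of shape (B * nW, window_size, window_size, C), where
    nW = (H // window_size) * (W // window_size).
    """
    B, H, W, C = x.shape
    if H % window_size or W % window_size:
        raise ValueError("H and W must be divisible by window_size")
    x = x.reshape(B, H // window_size, window_size, W // window_size, window_size, C)
    # Group the window-grid axes (rows, cols) ahead of the in-window axes,
    # then fold every window into the leading batch dimension.
    return x.transpose(0, 1, 3, 2, 4, 5).reshape(-1, window_size, window_size, C)


if __name__ == "__main__":
    # Swin-T stage 1 after patch embedding: 56x56 tokens, 96 channels, window 7.
    feats = np.zeros((2, 56, 56, 96), dtype=np.float32)
    print(window_partition(feats, 7).shape)  # (128, 7, 7, 96): 2 images x 64 windows
```

Self-attention is then computed independently inside each 7x7 window, which is what keeps the cost local.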

Setup

  1. Please refer to the Install section for the conda environment setup.
  2. Please refer to the Data preparation section to prepare ImageNet-1K.
  3. Install TensorRT. TensorRT 8.2 GA (8.2.1.8) is used as the test version.

Code Structure

The tree below focuses on the modified and added files.

.
├── export.py                  # Export the PyTorch model to ONNX format
├── get_started.md            
├── main.py
├── models
│   ├── build.py
│   ├── __init__.py
│   ├── swin_mlp.py
│   └── swin_transformer.py    # Model definition, modified for ONNX export and TensorRT engine building
├── README.md
├── trt                        # TensorRT engine evaluation and visualization
│   ├── engine.py
│   ├── eval_trt.py            # Evaluate the TensorRT engine's accuracy
│   └── onnxrt_eval.py         # Run the ONNX model and generate results (debugging only)
├── utils.py
└── weights

Export to ONNX and Build TensorRT Engine

Pay attention to the following two modifications.

  1. Exporting the operator roll to ONNX opset version 9 is not supported.
    A: Refer to torch/onnx/symbolic_opset9.py and add support for exporting torch.roll.

  2. Node (Concat_264) Op (Concat) [ShapeInferenceError] All inputs to Concat must have same rank.
    A: Refer to the modifications in models/swin_transformer.py. Instead of reading nW (the number of windows) from mask.shape[0], we compute it from input_resolution and window_size.

       if mask is not None:
         # Compute nW statically from the stage resolution and window size;
         # the original nW = mask.shape[0] triggers the Concat shape-inference error above.
         nW = int(self.input_resolution[0]*self.input_resolution[1]/self.window_size[0]/self.window_size[1])
         attn = attn.view(B_ // nW, nW, self.num_heads, N, N) + mask.unsqueeze(1).unsqueeze(0)
         attn = attn.view(-1, self.num_heads, N, N)
         attn = self.softmax(attn)
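For item 1, one common way to express torch.roll with ONNX operators is: rolling by s along a dimension equals concatenating the last s entries with the remaining ones, i.e. two Slice ops and a Concat per rolled dimension. The NumPy sketch below checks that decomposition against np.roll for the shifted-window case (shift_size = 3 for window size 7, as in Swin-T). It illustrates the idea only; it is not the symbolic function added to symbolic_opset9.py, and the helper names are hypothetical.

```python
import numpy as np


def _slice(x, dim, start, stop):
    """Take x[start:stop] along one axis, like a single-axis ONNX Slice."""
    index = [slice(None)] * x.ndim
    index[dim] = slice(start, stop)
    return x[tuple(index)]


def roll_via_slice_concat(x, shifts, dims):
    """Emulate torch.roll(x, shifts, dims) using only slicing and concatenation."""
    out = x
    for shift, dim in zip(shifts, dims):
        n = out.shape[dim]
        s = shift % n                      # normalize negative or oversized shifts into [0, n)
        tail = _slice(out, dim, n - s, n)  # the last s entries wrap to the front
        head = _slice(out, dim, 0, n - s)
        out = np.concatenate([tail, head], axis=dim)
    return out


if __name__ == "__main__":
    # (B, H, W, C) feature map, cyclically shifted the way a shifted-window block does.
    x = np.arange(2 * 56 * 56 * 3).reshape(2, 56, 56, 3)
    shift = 3  # window_size // 2 for window size 7
    shifted = roll_via_slice_concat(x, (-shift, -shift), (1, 2))
    assert np.array_equal(shifted, np.roll(x, (-shift, -shift), axis=(1, 2)))
    print("slice+concat roll matches np.roll")
```

Because every Slice bound here is a constant derived from the shift and the dimension size, the decomposition needs no dynamic shape ops when the feature-map size is fixed.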
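The static nW in the snippet above depends only on configuration values, so it is a plain Python int when the model is traced. A minimal sketch of that arithmetic, assuming the swin_tiny_patch4_window7_224 configuration (224x224 input, patch size 4, window size 7, 3 heads in the first stage); the helper names are hypothetical.

```python
def num_windows(input_resolution, window_size):
    """nW as in the modified WindowAttention: computed from static config values."""
    h, w = input_resolution
    wh, ww = window_size
    return int(h * w / wh / ww)


def attn_view_shape(batch, input_resolution, window_size, num_heads):
    """Shape passed to attn.view(B_ // nW, nW, num_heads, N, N) in a shifted block."""
    nW = num_windows(input_resolution, window_size)
    n = window_size[0] * window_size[1]  # N: tokens per window
    b_ = batch * nW                      # B_: windows across the whole batch
    return (b_ // nW, nW, num_heads, n, n)


# Assumed Swin-T stage resolutions for a 224x224 input with patch size 4.
SWIN_T_STAGES = [(56, 56), (28, 28), (14, 14), (7, 7)]

if __name__ == "__main__":
    for res in SWIN_T_STAGES:
        print(res, num_windows(res, (7, 7)))
    print(attn_view_shape(16, (56, 56), (7, 7), num_heads=3))  # (16, 64, 3, 49, 49)
```

Since none of these values come from a tensor's runtime shape, the exporter can record them as constants; the trade-off is that the exported graph is tied to the input resolution the model was built with.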

Accuracy Test Results on ImageNet-1K Validation Dataset

  1. Download the Swin-T pretrained model from the Model Zoo and evaluate the accuracy of the PyTorch pretrained model.

    $ python -m torch.distributed.launch --nproc_per_node 1 --master_port 12345 main.py --eval --cfg configs/swin_tiny_patch4_window7_224.yaml --resume ./weights/swin_tiny_patch4_window7_224.pth --data-path ../imagenet_1k
  2. Export the PyTorch model to ONNX format with export.py.

    $ python export.py --eval --cfg configs/swin_tiny_patch4_window7_224.yaml --resume ./weights/swin_tiny_patch4_window7_224.pth --data-path ../imagenet_1k --batch-size 16
  3. Build the TensorRT engine using trtexec.

    $ trtexec --onnx=./weights/swin_tiny_patch4_window7_224.onnx --buildOnly --verbose --saveEngine=./weights/swin_tiny_patch4_window7_224_batch16.engine --workspace=4096

    Add the --fp16 or --best flag to build the corresponding FP16 or INT8 engine (--best enables all precisions, including INT8). Taking FP16 as an example:

    $ trtexec --onnx=./weights/swin_tiny_patch4_window7_224.onnx --buildOnly --verbose --fp16 --saveEngine=./weights/swin_tiny_patch4_window7_224_batch16_fp16.engine --workspace=4096

    trtexec can also measure the throughput of the TensorRT engine:

    $ trtexec --loadEngine=./weights/swin_tiny_patch4_window7_224_batch16.engine
  4. Evaluate the accuracy of the TensorRT engine with trt/eval_trt.py.

    $ python trt/eval_trt.py --eval --cfg configs/swin_tiny_patch4_window7_224.yaml --resume ./weights/swin_tiny_patch4_window7_224_batch16.engine --data-path ../imagenet_1k --batch-size 16
  5. (Debugging only) Evaluate the accuracy of the ONNX model with trt/onnxrt_eval.py.

    $ python trt/onnxrt_eval.py --eval --cfg configs/swin_tiny_patch4_window7_224.yaml --resume ./weights/swin_tiny_patch4_window7_224.onnx --data-path ../imagenet_1k --batch-size 16
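The build commands in step 3 differ only in the precision flag and the engine name, so they can be generated programmatically. A minimal sketch, assuming Python 3.8+ and trtexec on PATH; the helper name is hypothetical, and the `_int8` engine suffix is an assumption because the steps above do not name the INT8 engine. The batch value only affects the file name: the engine's batch size comes from the ONNX model, which is presumably exported with a fixed batch via `export.py --batch-size 16`.

```python
import os
import shlex

# Extra trtexec flags per precision, as used in step 3 (--best also enables INT8).
PRECISION_FLAGS = {"fp32": [], "fp16": ["--fp16"], "int8": ["--best"]}


def trtexec_build_cmd(onnx_path, batch, precision="fp32", workspace_mib=4096):
    """Return the trtexec argv that builds and saves an engine for one precision."""
    if precision not in PRECISION_FLAGS:
        raise ValueError(f"unsupported precision: {precision!r}")
    root, _ = os.path.splitext(onnx_path)
    suffix = "" if precision == "fp32" else f"_{precision}"  # "_int8" is an assumed name
    engine_path = f"{root}_batch{batch}{suffix}.engine"
    return [
        "trtexec",
        f"--onnx={onnx_path}",
        "--buildOnly",
        "--verbose",
        *PRECISION_FLAGS[precision],
        f"--saveEngine={engine_path}",
        f"--workspace={workspace_mib}",
    ]


if __name__ == "__main__":
    onnx = "./weights/swin_tiny_patch4_window7_224.onnx"
    for precision in PRECISION_FLAGS:
        # Print only; run with subprocess.run(cmd, check=True) where TensorRT is installed.
        print(shlex.join(trtexec_build_cmd(onnx, 16, precision)))
```

Returning an argument list rather than a shell string keeps it safe to pass straight to subprocess.run without shell=True.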

SwinTransformer (T4)        Acc@1     Notes
PyTorch Pretrained Model    81.160
TensorRT Engine (FP32)      81.156
TensorRT Engine (FP16)      -         TensorRT 8.0.3.4: 81.156% vs. TensorRT 8.2.1.8: 72.768%

Note: The FP16 accuracy drop with TensorRT 8.2.1.8 has been reported as an NVIDIA bug; please refer to nvbug 3464358.
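Acc@1 (top-1 accuracy) is the percentage of validation images whose highest-scoring class matches the label. When tracking down a precision regression like the FP16 one above, it can also help to compare the arg-max predictions of two backends directly, for example ONNX Runtime outputs against TensorRT FP16 outputs. A generic NumPy sketch, assuming logits of shape (num_images, num_classes) have already been collected from each backend; it is not the code in trt/eval_trt.py.

```python
import numpy as np


def top1_accuracy(logits, labels):
    """Acc@1 in percent: share of rows whose arg-max class equals the label."""
    preds = np.argmax(logits, axis=1)
    return 100.0 * float(np.mean(preds == labels))


def top1_agreement(logits_a, logits_b):
    """Percent of samples on which two backends predict the same top-1 class."""
    same = np.argmax(logits_a, axis=1) == np.argmax(logits_b, axis=1)
    return 100.0 * float(np.mean(same))


if __name__ == "__main__":
    # Toy logits for 4 images and 3 classes (real runs use 1000 ImageNet classes).
    ref = np.array([[0.1, 2.0, -1.0], [3.0, 0.0, 0.5], [0.2, 0.1, 0.9], [1.0, 1.5, 0.3]])
    labels = np.array([1, 0, 0, 1])
    other = ref.copy()
    other[2] = [1.0, 0.0, 0.0]  # pretend a second backend flips one prediction
    print(top1_accuracy(ref, labels), top1_accuracy(other, labels), top1_agreement(ref, other))
    # 75.0 100.0 75.0
```

Agreement below 100% on identical inputs points at the numerics of one backend rather than at the data pipeline.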

Speed Test of TensorRT Engine (T4)

SwinTransformer (T4)   FP32 (qps)   FP16 (qps)   INT8 (qps)
batchsize=1            245.388      510.072      514.707
batchsize=16           316.8624     804.112      804.1072
batchsize=64           329.13984    833.4208     849.5168
batchsize=256          331.9808     844.10752    840.33024

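Analysis: INT8 currently gives no speedup over FP16; the two stay within about 2% of each other at every batch size. The main reason is that, for the Transformer structure, most of the computation is handled by Myelin, and Myelin does not yet support the PTQ (post-training quantization) path, so these results are expected. In the INT8 engine below, only Conv_0 runs in Int8, while the Myelin ForeignNode, which covers most of the network, runs in Half.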
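The analysis above can be checked against the speed table by taking throughput ratios, which do not depend on how qps is counted. The values below are copied from the table; the script is just a convenience for re-running the comparison on new measurements.

```python
# Throughput values copied from the speed table (T4), keyed by batch size.
THROUGHPUT = {
    1: {"fp32": 245.388, "fp16": 510.072, "int8": 514.707},
    16: {"fp32": 316.8624, "fp16": 804.112, "int8": 804.1072},
    64: {"fp32": 329.13984, "fp16": 833.4208, "int8": 849.5168},
    256: {"fp32": 331.9808, "fp16": 844.10752, "int8": 840.33024},
}


def speedups(table):
    """Map batch size -> (FP16/FP32 ratio, INT8/FP16 ratio) of throughput."""
    return {
        batch: (row["fp16"] / row["fp32"], row["int8"] / row["fp16"])
        for batch, row in sorted(table.items())
    }


if __name__ == "__main__":
    for batch, (fp16_gain, int8_gain) in speedups(THROUGHPUT).items():
        print(f"batch={batch:>3}  FP16 vs FP32: {fp16_gain:.2f}x  INT8 vs FP16: {int8_gain:.3f}x")
```

On these numbers FP16 is about 2.1x faster than FP32 at batch size 1 and about 2.5x at batch sizes 16 to 256, while INT8 stays within 2% of FP16 everywhere.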
The INT8 and FP16 engine layer information (batch size 128, T4) is attached below.

Build with int8 precision:

[12/04/2021-06:34:17] [V] [TRT] Engine Layer Information:
Layer(Reformat): Reformatting CopyNode for Input Tensor 0 to Conv_0, Tactic: 0, input_0[Float(128,3,224,224)] -> Reformatted Input Tensor 0 to Conv_0[Int8(128,3,224,224)]
Layer(CaskConvolution): Conv_0, Tactic: 1025026069226666066, Reformatted Input Tensor 0 to Conv_0[Int8(128,3,224,224)] -> 191[Int8(128,96,56,56)]
Layer(Reformat): Reformatting CopyNode for Input Tensor 0 to {ForeignNode[318...Transpose_2125 + Flatten_2127 + (Unnamed Layer* 4178) [Shuffle]]}, Tactic: 0, 191[Int8(128,96,56,56)] -> Reformatted Input Tensor 0 to {ForeignNode[318...Transpose_2125 + Flatten_2127 + (Unnamed Layer* 4178) [Shuffle]]}[Half(128,96,56,56)]
Layer(Myelin): {ForeignNode[318...Transpose_2125 + Flatten_2127 + (Unnamed Layer* 4178) [Shuffle]]}, Tactic: 0, Reformatted Input Tensor 0 to {ForeignNode[318...Transpose_2125 + Flatten_2127 + (Unnamed Layer* 4178) [Shuffle]]}[Half(128,96,56,56)] -> (Unnamed Layer* 4178) [Shuffle]_output[Half(128,768,1,1)]
Layer(CaskConvolution): Gemm_2128, Tactic: -1838109259315759592, (Unnamed Layer* 4178) [Shuffle]_output[Half(128,768,1,1)] -> (Unnamed Layer* 4179) [Fully Connected]_output[Half(128,1000,1,1)]
Layer(Reformat): Reformatting CopyNode for Input Tensor 0 to (Unnamed Layer* 4183) [Shuffle], Tactic: 0, (Unnamed Layer* 4179) [Fully Connected]_output[Half(128,1000,1,1)] -> Reformatted Input Tensor 0 to (Unnamed Layer* 4183) [Shuffle][Float(128,1000,1,1)]
Layer(NoOp): (Unnamed Layer* 4183) [Shuffle], Tactic: 0, Reformatted Input Tensor 0 to (Unnamed Layer* 4183) [Shuffle][Float(128,1000,1,1)] -> output_0[Float(128,1000)]

Build with fp16 precision:

[12/04/2021-06:44:31] [V] [TRT] Engine Layer Information:
Layer(Reformat): Reformatting CopyNode for Input Tensor 0 to Conv_0, Tactic: 0, input_0[Float(128,3,224,224)] -> Reformatted Input Tensor 0 to Conv_0[Half(128,3,224,224)]
Layer(CaskConvolution): Conv_0, Tactic: 1579845938601132607, Reformatted Input Tensor 0 to Conv_0[Half(128,3,224,224)] -> 191[Half(128,96,56,56)]
Layer(Myelin): {ForeignNode[318...(Unnamed Layer* 4183) [Shuffle]]}, Tactic: 0, 191[Half(128,96,56,56)] -> Reformatted Output Tensor 0 to {ForeignNode[318...(Unnamed Layer* 4183) [Shuffle]]}[Half(128,1000)]
Layer(Reformat): Reformatting CopyNode for Output Tensor 0 to {ForeignNode[318...(Unnamed Layer* 4183) [Shuffle]]}, Tactic: 0, Reformatted Output Tensor 0 to {ForeignNode[318...(Unnamed Layer* 4183) [Shuffle]]}[Half(128,1000)] -> output_0[Float(128,1000)]
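Reading the layer information by hand is error-prone, so a small parser can list each engine layer with its input and output precision. This sketch assumes the `Layer(Type): name, Tactic: N, input[Prec(shape)] -> output[Prec(shape)]` format shown above from TensorRT 8.2 verbose builds, with one input and one output per layer; other TensorRT versions may format these lines differently, and the helper names are hypothetical.

```python
import re
from collections import Counter

# Matches the "Layer(Type):" prefix of an engine-layer line.
LAYER_RE = re.compile(r"^Layer\((\w+)\):")
# Matches a tensor annotation such as "[Half(128,96,56,56)]".
TENSOR_RE = re.compile(r"\[(\w+)\(([\d,]+)\)\]")


def parse_layer(line):
    """Return (layer_type, input_precision, output_precision), or None for other lines."""
    match = LAYER_RE.match(line.strip())
    if match is None or " -> " not in line:
        return None
    lhs, rhs = line.split(" -> ", 1)
    inputs, outputs = TENSOR_RE.findall(lhs), TENSOR_RE.findall(rhs)
    if not inputs or not outputs:
        return None
    # The last annotation on each side is taken as the layer's input/output tensor.
    return match.group(1), inputs[-1][0], outputs[-1][0]


def precision_summary(log_text):
    """Count (layer_type, output_precision) pairs over a whole log."""
    counts = Counter()
    for line in log_text.splitlines():
        parsed = parse_layer(line)
        if parsed is not None:
            counts[(parsed[0], parsed[2])] += 1
    return counts
```

Passing the INT8 log above to precision_summary shows Int8 outputs only for the first reformat and Conv_0, with the Myelin node producing Half, which matches the analysis of why INT8 brings no speedup.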

Todo

Once the FP16 issue (nvbug 3464358) is solved, QAT (quantization-aware training) optimization will be done.

Owner
maggiez