Vision Transformer Segmentation Network

This PyTorch implementation of ViT produces an output of the same size as the input by applying the inverse rearrange operation to the predicted patch outputs. This enables convolution-free multi-class segmentation.
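The inverse rearrange idea can be illustrated without the model itself. The following is a minimal sketch, not the repository's code: it uses NumPy reshapes instead of PyTorch, and the function names `patchify` and `unpatchify` are hypothetical. It assumes square, non-overlapping patches laid out in the order used by the referenced vit-pytorch code (patch rows, patch columns, then channels).

```python
import numpy as np


def patchify(images, patch_size):
    """Split (batch, channels, H, W) images into flat, non-overlapping patches."""
    b, c, height, width = images.shape
    p = patch_size
    if height % p or width % p:
        raise ValueError("image dimensions must be divisible by the patch size")
    h, w = height // p, width // p
    x = images.reshape(b, c, h, p, w, p)
    x = x.transpose(0, 2, 4, 3, 5, 1)  # -> (b, h, w, p, p, c)
    return x.reshape(b, h * w, p * p * c)


def unpatchify(tokens, patch_size, grid, channels):
    """Inverse of patchify: fold (batch, h*w, p*p*channels) back into images."""
    b = tokens.shape[0]
    h, w = grid
    p = patch_size
    x = tokens.reshape(b, h, w, p, p, channels)
    x = x.transpose(0, 5, 1, 3, 2, 4)  # -> (b, c, h, p, w, p)
    return x.reshape(b, channels, h * p, w * p)


# Round trip with the default sizes: 112x112 input, 7x7 patches, 1 channel.
images = np.arange(2 * 1 * 112 * 112, dtype=np.float32).reshape(2, 1, 112, 112)
tokens = patchify(images, 7)
restored = unpatchify(tokens, 7, (16, 16), 1)

print(tokens.shape)                      # (2, 256, 49)
print(np.array_equal(images, restored))  # True
```

In a segmentation setting, the tokens passed to `unpatchify` would be the per-patch predictions, with `channels` set to the number of output classes, so each token carries one value per pixel per class.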

Most of the code is taken from https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/vit.py

Default Architecture Parameters:

model = ViTSeg( image_size=112, 
                channels=1,
                patch_size=7, 
                num_classes=1, 
                dim=768, 
                depth=6, 
                heads=12, 
                mlp_dim=2048, 
                learned_pos=False, 
                use_token=False)

image_size: An integer or a tuple defining the size of the input image (with some code changes, any image size could be supported)
channels: An integer defining the number of channels in the input image
patch_size: An integer or a tuple defining the size of the patches
num_classes: An integer defining the number of channels in the output
dim: An integer defining the size of the embedding dimension
depth: An integer defining the number of transformer layers
heads: An integer defining the number of attention heads in each transformer layer
mlp_dim: An integer defining the hidden size of the MLP in each transformer layer
learned_pos: A boolean which, if true, switches from fixed positional encodings to learned positional encodings
use_token: A boolean which, if true, adds a CLS token to the input and output
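To see how these parameters interact, the arithmetic for the default configuration can be worked out in a few lines. This is a sketch under stated assumptions, not part of the repository: the helper names `to_pair` and `token_layout` are hypothetical, patches are assumed to be non-overlapping (so the image size must be divisible by the patch size), and the CLS token is assumed to add exactly one element to the sequence.

```python
def to_pair(value):
    """Accept an integer or a (height, width) tuple, as the parameters allow."""
    if isinstance(value, (tuple, list)):
        return tuple(value)
    return (value, value)


def token_layout(image_size=112, patch_size=7, channels=1,
                 num_classes=1, use_token=False):
    """Compute the patch grid and per-patch sizes implied by the parameters."""
    img_h, img_w = to_pair(image_size)
    p_h, p_w = to_pair(patch_size)
    if img_h % p_h or img_w % p_w:
        raise ValueError("image size must be divisible by the patch size")
    grid = (img_h // p_h, img_w // p_w)
    num_patches = grid[0] * grid[1]
    return {
        "grid": grid,
        "num_patches": num_patches,
        # Values in one flattened input patch.
        "input_patch_dim": p_h * p_w * channels,
        # Values each token must predict so the inverse rearrange
        # can rebuild a full-resolution output.
        "output_patch_dim": p_h * p_w * num_classes,
        # Assumption: the CLS token adds one element to the sequence.
        "sequence_length": num_patches + (1 if use_token else 0),
    }


layout = token_layout()
print(layout["grid"])             # (16, 16)
print(layout["num_patches"])      # 256
print(layout["input_patch_dim"])  # 49
```

With the defaults, a 112x112 single-channel image becomes 256 tokens of 49 values each, and each token is projected to the embedding size given by dim before entering the transformer layers.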

Citation

If you find this repository useful, please consider citing it:

@article{reynaud2021vitseg,
  title={ViTSeg-https://github.com/HReynaud/ViTSeg}, 
  url={https://github.com/HReynaud/ViTSeg},  
  Author={Reynaud, Hadrien}, 
  Year={2021}
}

A simple approach to enable dense segmentation with ViT.

Owner

HReynaud
