Vision Transformer Segmentation Network

This PyTorch implementation of ViT (Vision Transformer) uses a simple, straightforward way to produce an output with the same spatial size as the input: the inverse of the patch-extraction rearrange operation is applied to all the predicted output tokens, reassembling them into a full-resolution map. This enables convolution-free, multi-class segmentation.
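To make the inverse rearrange concrete, here is a minimal sketch. It uses NumPy rather than PyTorch so that it stays self-contained. `patchify` and `unpatchify` are hypothetical helpers that reproduce, with reshape and transpose, the einops-style patch layout used by the vit-pytorch patch embedding. The random linear "head" only stands in for the trained transformer and output projection. The sketch assumes that each output token is projected to `patch_size * patch_size * num_classes` values before being reassembled.

```python
import numpy as np


def patchify(img, p):
    """Split a (C, H, W) image into flattened p x p patches of shape (N, p*p*C).

    Same layout as the einops pattern 'c (h p1) (w p2) -> (h w) (p1 p2 c)'.
    """
    c, H, W = img.shape
    if H % p or W % p:
        raise ValueError("image size must be divisible by the patch size")
    h, w = H // p, W // p
    x = img.reshape(c, h, p, w, p)   # axes: c, h, p1, w, p2
    x = x.transpose(1, 3, 2, 4, 0)   # axes: h, w, p1, p2, c
    return x.reshape(h * w, p * p * c)


def unpatchify(tokens, p, h, w, c):
    """Inverse rearrange: (h*w, p*p*c) tokens -> (c, h*p, w*p) map.

    Same layout as '(h w) (p1 p2 c) -> c (h p1) (w p2)'.
    """
    x = tokens.reshape(h, w, p, p, c)  # axes: h, w, p1, p2, c
    x = x.transpose(4, 0, 2, 1, 3)     # axes: c, h, p1, w, p2
    return x.reshape(c, h * p, w * p)


rng = np.random.default_rng(0)
img = rng.standard_normal((1, 112, 112))       # channels=1, image_size=112
tokens = patchify(img, 7)                      # (256, 49): a 16 x 16 grid of patches
restored = unpatchify(tokens, 7, 16, 16, 1)    # exact inverse of patchify

# Hypothetical segmentation head: a random linear map stands in for the
# transformer, projecting each token to p*p*num_classes values.
num_classes = 3
features = rng.standard_normal((256, 768))     # stand-in for dim=768 token outputs
head = rng.standard_normal((768, 7 * 7 * num_classes))
seg_map = unpatchify(features @ head, 7, 16, 16, num_classes)  # (3, 112, 112)
```

`unpatchify` exactly inverts `patchify`, so every token writes back to the pixel locations its patch came from. This is what lets the network predict a full-resolution map without any convolution or upsampling layer.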

Most of the code is taken from https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/vit.py

Default Architecture Parameters:

```python
# ViTSeg is the model class provided by this repository
model = ViTSeg(image_size=112,
               channels=1,
               patch_size=7,
               num_classes=1,
               dim=768,
               depth=6,
               heads=12,
               mlp_dim=2048,
               learned_pos=False,
               use_token=False)
```
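The default configuration fixes the sequence length and the output size. The sketch below is a hypothetical helper, not part of the repository, that does this arithmetic. It makes two assumptions: each output token is decoded into one patch with `num_classes` channels, and `use_token=True` adds exactly one position to the sequence.

```python
def vitseg_shapes(image_size=112, channels=1, patch_size=7,
                  num_classes=1, use_token=False):
    """Hypothetical helper: sizes implied by a ViTSeg-style configuration."""
    # Accept either an int or an (H, W) tuple, as the parameter list allows.
    H, W = (image_size, image_size) if isinstance(image_size, int) else image_size
    ph, pw = (patch_size, patch_size) if isinstance(patch_size, int) else patch_size
    if H % ph or W % pw:
        raise ValueError("image_size must be divisible by patch_size")
    grid = (H // ph, W // pw)                    # patches along height, width
    num_patches = grid[0] * grid[1]
    return {
        "grid": grid,
        "num_patches": num_patches,
        # Assumption: the CLS token adds one position to the sequence.
        "seq_len": num_patches + (1 if use_token else 0),
        "patch_in_dim": ph * pw * channels,      # values per input patch
        "patch_out_dim": ph * pw * num_classes,  # values each token must predict
        "output_shape": (num_classes, H, W),
    }


shapes = vitseg_shapes()
# shapes["num_patches"] == 256, shapes["patch_out_dim"] == 49,
# shapes["output_shape"] == (1, 112, 112)
```

With the defaults, 112 / 7 = 16 patches per side gives 256 tokens, each carrying 49 input values and predicting 49 output values.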

image_size: An integer or a tuple defining the size of the input image (supporting arbitrary input sizes would require some changes to the code)
channels: An integer defining the number of channels in the input image
patch_size: An integer or a tuple defining the size of the patches; the image size must be divisible by the patch size
num_classes: An integer defining the number of channels in the output (one per segmentation class)
dim: An integer defining the size of the embedding dimension
depth: An integer defining the number of transformer layers
heads: An integer defining the number of attention heads in each transformer layer
mlp_dim: An integer defining the hidden size of the MLP in the transformer layers
learned_pos: A boolean which, if True, switches from fixed positional encodings to learned positional encodings
use_token: A boolean which, if True, adds a CLS token to the input and output sequences
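The README does not say which fixed encoding is used when `learned_pos=False`. As a sketch, assuming the standard 1D sinusoidal encoding from the original Transformer, the two options could look like this in NumPy. `sinusoidal_positions` and `positional_table` are hypothetical names. The learned variant only shows a random initialisation, since training is outside the scope of this sketch.

```python
import numpy as np


def sinusoidal_positions(num_positions, dim):
    """Fixed 1D sinusoidal encodings (Vaswani et al., 2017), shape (num_positions, dim)."""
    if dim % 2:
        raise ValueError("dim must be even")
    pos = np.arange(num_positions)[:, None]        # (N, 1)
    i = np.arange(dim // 2)[None, :]               # (1, dim/2)
    angles = pos / np.power(10000.0, 2 * i / dim)  # (N, dim/2)
    pe = np.empty((num_positions, dim))
    pe[:, 0::2] = np.sin(angles)                   # even channels
    pe[:, 1::2] = np.cos(angles)                   # odd channels
    return pe


def positional_table(num_positions, dim, learned_pos=False, seed=0):
    """Hypothetical switch mirroring the learned_pos flag."""
    if learned_pos:
        # A learned table starts random and would be updated by the optimizer.
        return np.random.default_rng(seed).standard_normal((num_positions, dim))
    return sinusoidal_positions(num_positions, dim)


tokens = np.zeros((256, 768))                 # 256 patch embeddings of size dim=768
tokens = tokens + positional_table(256, 768)  # fixed encodings are simply added
```

With `use_token=True`, the table would presumably need one extra row for the CLS token (257 rows for the default 112 x 112 input).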

Citation

If you find this repository useful, please consider citing it:

@article{reynaud2021vitseg,
  title={ViTSeg-https://github.com/HReynaud/ViTSeg}, 
  url={https://github.com/HReynaud/ViTSeg},  
  Author={Reynaud, Hadrien}, 
  Year={2021}
}

