Transformers based fully on MLPs

Overview

Awesome MLP-based Transformers papers

An up-to-date list of Transformers based fully on MLPs without attention!

Why this repo?

Since 2019, transformers and fully attention-based models have taken over most of the deep learning world. It now appears, however, that their power does not come from attention: replacing the feed-forward network in a transformer with attention performs horribly (~30% top-1 accuracy on ImageNet). It appears that attention is not all we need. After all, we no longer need inductively biased models such as CNNs, and we can fall back on MLPs, since (1) we have enough data and (2) we have powerful optimization, regularization, and data augmentation techniques. Just as we saw a big hype around transformers (see the awesome vision transformer and BERT-related paper lists), we expect to see a big hype around fully MLP-based networks without attention, and the research focus has now shifted to finding efficient ways of mixing tokens without attention mechanisms. This repository aims to collect all papers of this kind.
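As a rough illustration of what "mixing tokens without attention" can mean, here is a minimal sketch of a token-mixing MLP in the spirit of MLP-Mixer. It is not taken from any of the papers or repositories listed below: the function names, the sizes, and the random weights are assumptions made for illustration, and layer normalization and biases are left out to keep it short.

```python
import math
import random


def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def matmul(a, b):
    # plain nested-list matrix product: (n x k) @ (k x m) -> (n x m)
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def mlp(x, w1, w2):
    # two-layer MLP applied independently to each row of x (biases omitted)
    hidden = [[gelu(v) for v in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)


def token_mixing(x, w1, w2):
    # x has shape (tokens x channels). Transposing makes each row one channel
    # across all tokens, so the MLP mixes information along the token axis.
    mixed = transpose(mlp(transpose(x), w1, w2))
    # residual connection around the mixing step
    return [[a + b for a, b in zip(row_x, row_m)] for row_x, row_m in zip(x, mixed)]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes chosen only for the demo.
rng = random.Random(0)
num_tokens, channels, hidden_dim = 4, 3, 8
x = random_matrix(num_tokens, channels, rng)
w1 = random_matrix(num_tokens, hidden_dim, rng)   # tokens -> hidden
w2 = random_matrix(hidden_dim, num_tokens, rng)   # hidden -> tokens
y = token_mixing(x, w1, w2)                       # same shape as x
```

The weight matrices act on the token axis and are shared across channels, so every output token depends on every input token without any attention scores being computed. Many of the papers below can be read as alternative, often cheaper or more local, ways of doing this mixing step.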

Contributing

Please help by contributing to this list: submit an issue or a pull request using the following format:

- Paper Name [[pdf]](link) [[code]](link)

Papers

  • MLP-Mixer: An all-MLP Architecture for Vision [pdf] [official code] [code] [code] [code] [Yannic Kilcher Video]
  • Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet [pdf] [code]
  • ResMLP: Feedforward networks for image classification with data-efficient training [pdf] [code] [code] [code]
  • Pay Attention to MLPs [pdf] [code] [code] [code]
  • FNet: Mixing Tokens with Fourier Transforms [pdf] [code] [Yannic Kilcher Video]
  • Can Attention Enable MLPs To Catch Up With CNNs? [pdf]
  • MixerGAN: An MLP-Based Architecture for Unpaired Image-to-Image Translation [pdf]
  • On the Bias Against Inductive Biases [pdf]
  • S²-MLP: Spatial-Shift MLP Architecture for Vision [pdf]
  • Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition [pdf] [code]
  • Rethinking Token-Mixing MLP for MLP-based Vision Backbone [pdf]
  • Global Filter Networks for Image Classification [pdf] [code]
  • What Makes for Hierarchical Vision Transformer? [pdf]
  • AS-MLP: An Axial Shifted MLP Architecture for Vision [pdf] [code]
  • CycleMLP: A MLP-like Architecture for Dense Prediction [pdf] [code]
  • S²-MLPv2: Improved Spatial-Shift MLP Architecture for Vision [pdf]
  • RaftMLP: Do MLP-based Models Dream of Winning Over Computer Vision? [pdf] [code]
  • Hire-MLP: Vision MLP via Hierarchical Rearrangement [pdf]
  • Sparse-MLP: A Fully-MLP Architecture with Conditional Computation [pdf]
  • Sparse MLP for Image Recognition: Is Self-Attention Really Necessary? [pdf]
  • Patches Are All You Need? [pdf] [code]
  • Exploring the Limits of Large Scale Pre-training [pdf]
  • Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs [pdf] [code]
  • Cascaded Cross MLP-Mixer GANs for Cross-View Image Translation [pdf] [code]
  • Are We Ready for a New Paradigm Shift? A Survey on Visual Deep MLP [pdf]
  • MetaFormer is Actually What You Need for Vision [pdf] [code]
  • An Image Patch is a Wave: Phase-Aware Vision MLP [pdf]
  • MorphMLP: A Self-Attention Free, MLP-Like Backbone for Image and Video [pdf]
  • SWAT: Spatial Structure Within and Among Tokens [pdf]
  • MLP Architectures for Vision-and-Language Modeling: An Empirical Study [pdf] [code]
  • RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality [pdf] [code]
Owner
Fawaz Sammani