This repository contains an implementation of ConvMixer for the ICLR 2022 submission "Patches Are All You Need?".

Last update: Dec 30, 2022

Overview

Patches Are All You Need? 🤷

Code overview

The most important code is in convmixer.py. We trained ConvMixers using the timm framework, which we copied from here.

Update: ConvMixer is now integrated into the timm framework itself. You can see the PR here.

Inside pytorch-image-models, we have made the following modifications. (Though one could look at the diff, we think it is convenient to summarize them here.)

- Added ConvMixers
  - added timm/models/convmixer.py
  - modified timm/models/__init__.py
- Added "OneCycle" LR Schedule
  - added timm/scheduler/onecycle_lr.py
  - modified timm/scheduler/scheduler.py
  - modified timm/scheduler/scheduler_factory.py
  - modified timm/scheduler/__init__.py
  - modified train.py (added two lines to support this LR schedule)

We are confident that the use of the OneCycle schedule here is not critical, and one could likely just as well train ConvMixers with the built-in cosine schedule.
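To make the comparison concrete, here is a minimal sketch of the two schedule shapes as plain functions of the epoch. It is not the code in timm/scheduler/onecycle_lr.py: the triangular shape, the peak position (`peak_frac`), and both function names are assumptions chosen for illustration. Only the peak learning rate of 0.01 and the 150 epochs are taken from the training command in this README.

```python
import math


def one_cycle_lr(epoch, total_epochs, peak_lr, peak_frac=0.4, final_lr=0.0):
    """Assumed triangular one-cycle shape: linear ramp from 0 to peak_lr,
    then linear decay to final_lr. Requires 0 < peak_frac < 1."""
    peak_epoch = peak_frac * total_epochs
    if epoch <= peak_epoch:
        return peak_lr * epoch / peak_epoch
    remaining = (total_epochs - epoch) / (total_epochs - peak_epoch)
    return final_lr + (peak_lr - final_lr) * remaining


def cosine_lr(epoch, total_epochs, peak_lr, final_lr=0.0):
    """Cosine decay from peak_lr at epoch 0 to final_lr at total_epochs."""
    progress = epoch / total_epochs
    return final_lr + 0.5 * (peak_lr - final_lr) * (1.0 + math.cos(math.pi * progress))


# Print both schedules at a few epochs (illustrative values only)
for e in (0, 30, 60, 105, 150):
    print(e, round(one_cycle_lr(e, 150, 0.01), 5), round(cosine_lr(e, 150, 0.01), 5))
```

Both curves end near zero. The main difference is that the assumed one-cycle shape starts from zero and ramps up, whereas the cosine shape starts at the peak value.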

Evaluation

We provide some model weights below:

Model Name	Kernel Size	Patch Size	File Size
ConvMixer-1536/20	9	7	207MB
ConvMixer-768/32*	7	7	85MB
ConvMixer-1024/20	9	14	98MB

* Important: ConvMixer-768/32 here uses ReLU instead of GELU, so you would have to change convmixer.py accordingly (we will fix this later).
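Until convmixer.py is updated, one way to avoid editing it by hand is to swap the activation modules after the model has been constructed. The helper below is a hypothetical sketch: `swap_activation` is not part of this repository or of timm, and the toy network only stands in for a real ConvMixer. Because GELU and ReLU have no parameters, the state-dict keys stay the same after the swap.

```python
import torch.nn as nn


def swap_activation(module, old=nn.GELU, new=nn.ReLU):
    """Hypothetical helper: recursively replace every `old` activation
    inside `module` with a fresh `new()` instance, in place."""
    for name, child in list(module.named_children()):
        if isinstance(child, old):
            setattr(module, name, new())
        else:
            swap_activation(child, old, new)
    return module


# Toy stand-in for a model built with GELU activations
toy = nn.Sequential(
    nn.Conv2d(3, 8, 1),
    nn.GELU(),
    nn.Sequential(nn.Conv2d(8, 8, 1), nn.GELU()),
)
keys_before = list(toy.state_dict().keys())
swap_activation(toy)
keys_after = list(toy.state_dict().keys())
```

The swap would be applied before loading the ReLU checkpoint, so that the forward pass matches the activation the weights were trained with.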

You can evaluate ConvMixer-1536/20 as follows:

python validate.py --model convmixer_1536_20 --b 64 --num-classes 1000 --checkpoint [/path/to/convmixer_1536_20_ks9_p7.pth.tar] [/path/to/ImageNet1k-val]

You should get 81.37% accuracy.

Training

If you had a node with 10 GPUs, you could train a ConvMixer-1536/20 as follows (these are exactly the settings we used):

sh distributed_train.sh 10 [/path/to/ImageNet1k] \
    --train-split [your_train_dir] \
    --val-split [your_val_dir] \
    --model convmixer_1536_20 \
    -b 64 \
    -j 10 \
    --opt adamw \
    --epochs 150 \
    --sched onecycle \
    --amp \
    --input-size 3 224 224 \
    --lr 0.01 \
    --aa rand-m9-mstd0.5-inc1 \
    --cutmix 0.5 \
    --mixup 0.5 \
    --reprob 0.25 \
    --remode pixel \
    --num-classes 1000 \
    --warmup-epochs 0 \
    --opt-eps=1e-3 \
    --clip-grad 1.0

We also included a ConvMixer-768/32 in timm/models/convmixer.py (though it is simple to add more ConvMixers). We trained that one with the above settings but with 300 epochs instead of 150 epochs.

In the near future, we will upload weights.

The tweetable version of ConvMixer, which requires from torch.nn import *:

def ConvMixr(h,d,k,p,n):
 S,C,A=Sequential,Conv2d,lambda x:S(x,GELU(),BatchNorm2d(h))
 R=type('',(S,),{'forward':lambda s,x:s[0](x)+x})
 return S(A(C(3,h,p,p)),*[S(R(A(C(h,h,k,groups=h,padding=k//2))),A(C(h,h,1))) for i in range(d)],AdaptiveAvgPool2d((1,1)),Flatten(),Linear(h,n))
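For readers who prefer a longer form, the tweetable definition above can be unpacked as follows. This is a readability sketch derived from that snippet, not a copy of convmixer.py. The names `Residual`, `conv_mixer`, and `act_bn`, and the small sizes used in the smoke test, are chosen here for illustration, and the module naming is not meant to match the released checkpoints.

```python
import torch
import torch.nn as nn


class Residual(nn.Module):
    """Adds the input to the output of the wrapped module (the role of R above)."""

    def __init__(self, fn):
        super().__init__()
        self.fn = fn

    def forward(self, x):
        return self.fn(x) + x


def conv_mixer(dim, depth, kernel_size, patch_size, n_classes):
    def act_bn(layer):
        # Mirrors the lambda A above: layer -> GELU -> BatchNorm
        return nn.Sequential(layer, nn.GELU(), nn.BatchNorm2d(dim))

    blocks = [
        nn.Sequential(
            # Depthwise conv mixes spatial locations; with an odd kernel size,
            # padding = kernel_size // 2 keeps height and width unchanged
            Residual(act_bn(nn.Conv2d(dim, dim, kernel_size, groups=dim,
                                      padding=kernel_size // 2))),
            # Pointwise 1x1 conv mixes channels
            act_bn(nn.Conv2d(dim, dim, 1)),
        )
        for _ in range(depth)
    ]
    return nn.Sequential(
        # Patch embedding: kernel size and stride are both patch_size
        act_bn(nn.Conv2d(3, dim, patch_size, patch_size)),
        *blocks,
        nn.AdaptiveAvgPool2d((1, 1)),
        nn.Flatten(),
        nn.Linear(dim, n_classes),
    )


# Smoke test on a deliberately tiny configuration (illustrative sizes)
model = conv_mixer(dim=32, depth=2, kernel_size=5, patch_size=4, n_classes=10)
logits = model(torch.randn(2, 3, 32, 32))
```

With these toy sizes, the 32x32 input becomes an 8x8 grid of patch embeddings, and the output has one row of class scores per image.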

Comments

Cifar10 baseline doesn't reach 95%

Hello, I tried convmixer256 on CIFAR-10 with the same timm options specified for ImageNet (except num_classes) and it doesn't go beyond 90% accuracy. Could you please specify the options used for the CIFAR-10 experiment?

opened by K-H-Ismail 13

What's new about this model?

Why are “patches” all you need? The patch embedding is a Conv7x7 stem, and the body is simply repeated Conv9x9 + Conv1x1. I'm not challenging your work (it's indeed very interesting), just wondering what's new about this model.

opened by vztu 5

Training scheme modifications for small GPUs

Hi authors. Your paper has demonstrated a quite intriguing observation. I wish you luck with your submission. Thanks for sharing the code of the submission. When running the code, I got an issue regarding OOM when using the default batch size of 64. In the end I can only run with 8 samples per batch per GPU as my GPUs have only 11GB. I would like to know if you have tried smaller GPUs and achieved the same results. So far, besides learning rate modified according to the linear rule, I haven't made any change yet. If you tried training using smaller GPUs before, could you please share your experience? Thank you very much!

opened by justanhduc 4
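For reference, the linear scaling rule mentioned in this issue can be written as a one-line helper. This is only a sketch of the arithmetic, assuming the rule scales the learning rate in proportion to the total batch size. The function name and the choice of keeping 10 GPUs are illustrative, and nothing here confirms that the scaled settings reproduce the reported accuracy.

```python
def scaled_lr(base_lr, base_batch, new_batch):
    """Linear scaling rule: learning rate proportional to total batch size."""
    return base_lr * new_batch / base_batch


# Reference settings from the training command: 10 GPUs x 64 images, lr 0.01
base_total = 10 * 64
# Hypothetical smaller setup: the same 10 GPUs with 8 images per GPU
small_total = 10 * 8

lr_small = scaled_lr(0.01, base_total, small_total)
print(lr_small)  # approximately 0.00125
```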
Experiments with full convolutional layers instead of patch embedding?

Have the authors tried replacing the patch embedding with a plain convolution? That is, using stride 1 instead of p?

With this setting, the model is a standard convolutional network like MobileNet. I wonder what the performance would be. Is the performance gain of ConvMixer due to the patch embedding or to the depthwise conv layers?

Very interested in this work, thanks.

opened by forjiuzhou 2

Training time

Hi, first of all thanks for a very interesting paper.

I would like to know how long it took you to train the models. I'm trying to train ConvMixer-768/32 using 2xV100 and one epoch takes ~3 hours, so I estimate that full training would take ~2 * 3 * 300 = 1800 GPU hours, which is insane. Even if you trained with 10 GPUs, it would take ~1 week for one experiment to finish. Are my calculations correct?

opened by bonlime 1
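The estimate in this issue can be checked with a few lines of arithmetic. The sketch below only reproduces the commenter's numbers (2 GPUs, about 3 hours per epoch, 300 epochs) and assumes ideal linear scaling to 10 GPUs, which real multi-GPU training rarely achieves. It says nothing about the authors' actual training time.

```python
gpus = 2
hours_per_epoch = 3   # wall-clock hours per epoch reported in the issue
epochs = 300

gpu_hours = gpus * hours_per_epoch * epochs   # total GPU hours
wall_hours_10_gpus = gpu_hours / 10           # assumes ideal scaling to 10 GPUs
wall_days_10_gpus = wall_hours_10_gpus / 24

print(gpu_hours, wall_hours_10_gpus, wall_days_10_gpus)  # 1800 180.0 7.5
```

Under these assumptions, the "~1 week" figure in the issue corresponds to 7.5 days of wall-clock time.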
padding=same?

https://github.com/tmp-iclr/convmixer/blob/1cefd860a1a6a85369887d1a633425cedc2afd0a/convmixer.py#L18 There is an error: TypeError: conv2d(): argument 'padding' (position 5) must be tuple of ints, not str.

opened by linhaoqi027 1
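String padding modes such as "same" are only accepted by newer PyTorch releases, which is consistent with the TypeError quoted in this issue. A numeric alternative for stride 1 and odd kernel sizes is padding=k//2, as used in the tweetable version in this README. The sketch below checks that claim with the standard convolution output-size formula, assuming dilation 1. The helper name `conv_out_size` is hypothetical.

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    """Output length of a convolution along one axis, assuming dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1


for k in (3, 5, 7, 9):
    # With stride 1 and odd k, padding k // 2 leaves the size unchanged
    assert conv_out_size(32, k, stride=1, padding=k // 2) == 32

# Patch embedding: kernel size = stride = patch size p, no padding
print(conv_out_size(224, 7, stride=7))  # 32
```

The same formula shows why the patch embedding shrinks the feature map: a 224-pixel side with patch size 7 becomes 32 positions, while a stride-1 convolution with matching padding would keep all 224.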
Add Docker environment & web demo

Hey @ashertrockman, @tmp-iclr ! 👋

This pull request makes it possible to run your model inside a Docker environment, which makes it easier for other people to run it. We're using an open source tool called Cog to make this process easier.

This also means we can make a web page where other people can try out your model! View it here: https://replicate.com/locuslab/convmixer and have a look at some Image classification examples we already uploaded.

By clicking "Claim this model", you'll be able to edit everything, and we'll feature it on our website and tweet about it too.

In case you're wondering who I am, I'm from Replicate, where we're trying to make machine learning reproducible. We got frustrated that we couldn't run all the really interesting ML work being done. So, we're going round implementing models we like. 😊

opened by ariel415el 0

Fix notebooks

Hi.

Fixed errors in the pytorch-image-models/notebooks/{EffResNetComparison,GeneralizationToImageNetV2}.ipynb notebooks:

- added the missing pynvml installation;
- resolved missing imports;
- resolved errors due to outdated calls to the timm library.

Tested in a Colab environment: "Run all" completes without any errors.

opened by amrzv 0

CIFAR-10 training settings

First of all, thank you for the interesting work. I was experimenting with the patch size 1, kernel size 9 model on CIFAR-10 with the following training settings:

--model tiny_convmixer
-b 64
-j 8
--opt adamw
--epochs 200 
--sched onecycle 
--amp 
--input-size 3 32 32 
--lr 0.01 
--aa rand-m9-mstd0.5-inc1 
--cutmix 0.5 
--mixup 0.5 
--reprob 0.25 
--remode pixel 
--num-classes 10
--warmup-epochs 0
--opt-eps 1e-3
--clip-grad 1.0
--scale 0.75 1.0
--weight-decay 0.01
--mean 0.4914 0.4822 0.4465
--std 0.2471 0.2435 0.2616

I could get only 95.89%. I am supposed to get 96.03% according to Table 4 in the paper. Can you please let me know any setting I missed? Thank you again.

opened by fugokidi 0

Segmentation ConvMixer architecture?

I was trying to figure out what a segmentation ConvMixer would look like, and came up with this (residual connections inspired by MultiResUNet). Does it make sense to you?

opened by divideconcept 0

Request more experiment results to compare to other architecture.

Hi! This work is pretty interesting, but I think there should be more results, like those in "Demystifying Local Vision Transformer: Sparse Connectivity, Weight Sharing, and Dynamic Weight", where local self-attention in Swin Transformer is replaced with depth-wise convolution. Since your model goes further with a simpler architecture than Swin Transformer, I wonder if ConvMixer can reach similar performance on object detection and semantic segmentation.

opened by LuoXin-s 1

Releases (timm-v1.0)

timm-v1.0 (Oct 10, 2021)

These weights have slightly different parameter names and aren't compatible with this codebase.

- convmixer_1024_20_ks9_p14.pth.tar (93.42 MB)
- convmixer_1536_20_ks9_p7.pth.tar (197.50 MB)
- convmixer_768_32_ks7_p7_relu.pth.tar (81.04 MB)

v1.0 (Oct 9, 2021)

We provide weights for:

- ConvMixer-1536/20 (k = 9, p = 7)
- ConvMixer-768/32 (k = 7, p = 7). IMPORTANT: This model used ReLU instead of GELU. Currently, you would need to change nn.GELU() to nn.ReLU() in convmixer.py to use these weights; we will fix this later.
- ConvMixer-1024/20 (k = 9, p = 14)

Files:

- convmixer_1024_20_ks9_p14.pth.tar (93.41 MB)
- convmixer_1536_20_ks9_p7.pth.tar (197.50 MB)
- convmixer_768_32_ks7_p7_relu.pth.tar (81.04 MB)

Owner

ICLR 2022 Author

Patches Are All You Need? 🤷

