CMT: Convolutional Neural Networks Meet Vision Transformers

Last update: Dec 30, 2022

1. Introduction

This repo is an unofficial PyTorch implementation of the CMT model. No reference source code was available, so it was written without one.

2. Environment

python 3.7+
pytorch 1.7.1
pillow
apex
opencv-python

See the apex repository for instructions on installing apex.

3. Dataset

Training

/data/home/imagenet/train/xxx.jpeg, 0
/data/home/imagenet/train/xxx.jpeg, 1
...
/data/home/imagenet/train/xxx.jpeg, 999

Testing

/data/home/imagenet/test/xxx.jpeg, 0
/data/home/imagenet/test/xxx.jpeg, 1
...
/data/home/imagenet/test/xxx.jpeg, 999
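
The listings above suggest that each line holds an image path and an integer class label separated by a comma. As a minimal sketch of reading that format (the helper name `parse_annotation_lines` is hypothetical and is not part of this repo; the repo's own data loader may parse the files differently):

```python
from typing import List, Tuple


def parse_annotation_lines(lines: List[str]) -> List[Tuple[str, int]]:
    """Parse 'image_path, label' lines into (path, label) pairs.

    Hypothetical helper for illustration; not taken from the repo.
    """
    samples = []
    for raw in lines:
        line = raw.strip()
        if not line or line == "...":
            continue  # skip blank lines and elision markers
        # Split on the last comma so a comma inside the path is preserved.
        path, label = line.rsplit(",", 1)
        samples.append((path.strip(), int(label)))
    return samples


example = [
    "/data/home/imagenet/train/xxx.jpeg, 0",
    "/data/home/imagenet/train/xxx.jpeg, 999",
]
samples = parse_annotation_lines(example)
print(samples)
```

Splitting on the last comma rather than the first is a defensive choice, so that a directory name containing a comma does not break parsing.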

4. Training & Inference

Training

CMT-Tiny

#!/bin/bash
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
cd CMT-pytorch;
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -W ignore -m torch.distributed.launch --nproc_per_node 8 train.py --batch_size 512 --num_workers 48 --lr 6e-3 --optimizer_name "adamw" --tf_optimizer 1 --cosine 1 --model_name cmtti --max_epochs 300 \
--warmup_epochs 5 --num-classes 1000 --input_size 184 --crop_size 160 --weight_decay 1e-1 --grad_clip 0 --repeated-aug 0 --max_grad_norm 5.0 \
--drop_path_rate 0.1 --FP16 0 --qkv_bias 1 \
--ape 0 --rpe 1 --pe_nd 0 --mode O2 --amp 1 --apex 0 \
--train_file $file_folder$/train.txt \
--val_file $file_folder$/val.txt \
--log-dir $save_folder$/log_dir \
--checkpoints-path $save_folder$/checkpoints

Note: A batch size of 128 * 8 may give higher accuracy; choose the batch size to balance accuracy and speed.
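
The training command sets `--lr 6e-3`, `--warmup_epochs 5`, `--cosine 1` and `--max_epochs 300`. As a rough illustration of what a warmup plus cosine schedule with those values looks like, here is a minimal sketch. It assumes linear warmup from zero and decay to a minimum learning rate of zero; the function name `lr_at_epoch` is hypothetical, and the schedule implemented in `train.py` may differ (for example, it may step per iteration rather than per epoch).

```python
import math


def lr_at_epoch(epoch: float, base_lr: float = 6e-3, warmup_epochs: int = 5,
                max_epochs: int = 300, min_lr: float = 0.0) -> float:
    """Linear warmup followed by cosine decay (illustrative assumption)."""
    if epoch < warmup_epochs:
        # Ramp linearly from 0 up to base_lr over the warmup period.
        return base_lr * epoch / warmup_epochs
    # Fraction of the post-warmup schedule that has elapsed, in [0, 1].
    progress = (epoch - warmup_epochs) / (max_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for e in (0, 5, 100, 200, 300):
    print(e, lr_at_epoch(e))
```

Under these assumptions the learning rate peaks at the end of warmup and falls to half its peak midway through the remaining epochs.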

Inference

#!/bin/bash
cd CMT-pytorch;
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -W ignore test.py \
--dist-url 'tcp://127.0.0.1:9966' --dist-backend 'nccl' --multiprocessing-distributed=1 --world-size=1 --rank=0 \
--batch-size 128 --num-workers 48 --num-classes 1000 --input_size 184 --crop_size 160 \
--ape 0 --rpe 1 --pe_nd 0 --qkv_bias 1 --swin 0 --model_name cmtti --dropout 0.1 --emb_dropout 0.1 \
--test_file $file_folder$/val.txt \
--checkpoints-path $save_folder$/checkpoints/xxx.pth.tar \
--save_folder $save_folder$/acc_logits/

Calculate accuracy

python utils/calculate_acc.py --logits_file $save_folder$/acc_logits/
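
The format of the files written to `acc_logits/` is not described here, so the following is only a sketch of the metric itself: top-1 accuracy computed from per-image logits and ground-truth labels. The function name `top1_accuracy` and the in-memory list inputs are assumptions for illustration; `utils/calculate_acc.py` may load and aggregate its inputs differently.

```python
from typing import Sequence


def top1_accuracy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of samples whose highest-scoring class equals the label."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if not labels:
        return 0.0
    correct = 0
    for row, label in zip(logits, labels):
        # Index of the largest logit is the predicted class.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


example_logits = [
    [0.1, 2.0, -1.0],
    [1.5, 0.2, 0.3],
    [0.0, 0.1, 0.9],
    [0.3, 0.2, 0.1],
]
example_labels = [1, 0, 2, 1]
accuracy = top1_accuracy(example_logits, example_labels)
print(accuracy)
```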

5. ImageNet Results

model-name	input_size	FLOPs	Params	acc@one_crop(ours)	acc(paper)	weights
CMT-T	160x160	516M	11.3M	75.124%	79.2%	weights
CMT-T	224x224	1.01G	11.3M	78.4%	-	weights
CMT-XS	192x192	-	-	-	81.8%	-
CMT-S	224x224	-	-	-	83.5%	-
CMT-L	256x256	-	-	-	84.5%	-

6. TODO

Other results may be released soon if there is demand.
Release the CMT-XS results on ImageNet.
Check the differences from the paper; the author gave the hyperparameters in an issue.
Find the best hyperparameters for CMT and other transformers.

Supplementary

For more detail, I have written up an explanation of CMT, along with the tuning and training process.

Owner

FlyEgle
