Implementation of "A MLP-like Architecture for Dense Prediction"

Last update: Dec 27, 2022

Overview

A MLP-like Architecture for Dense Prediction (arXiv)

Updates

(22/07/2021) Initial release.

Model Zoo

We provide CycleMLP models pretrained on ImageNet 2012.

Model	Parameters	FLOPs	Top 1 Acc.	Download
CycleMLP-B1	15M	2.1G	78.9%	model
CycleMLP-B2	27M	3.9G	81.6%	model
CycleMLP-B3	38M	6.9G	82.4%	model
CycleMLP-B4	52M	10.1G	83.0%	model
CycleMLP-B5	76M	12.3G	83.2%	model

Usage

Install

PyTorch 1.7.0+ and torchvision 0.8.1+
timm:

pip install 'git+https://github.com/rwightman/pytorch-image-models@c2ba229d995c33aaaf20e00a5686b4dc857044be'

or

git clone https://github.com/rwightman/pytorch-image-models
cd pytorch-image-models
git checkout c2ba229d995c33aaaf20e00a5686b4dc857044be
pip install -e .

fvcore (optional, for FLOPs calculation)
mmcv, mmdetection, mmsegmentation (optional)

Data preparation

Download and extract ImageNet train and val images from http://image-net.org/. The directory structure is:

│path/to/imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......
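
Before launching evaluation or training, it can help to confirm that the extracted data matches this layout. The following is a minimal sketch and not part of the repository: the helper names `summarize_split` and `check_layout` are hypothetical, and the code assumes one subdirectory per class ID holding `.JPEG` files, as in the tree above.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpeg", ".jpg"}  # compared case-insensitively


def summarize_split(root, split):
    """Return {class_id: image_count} for root/split (hypothetical helper)."""
    split_dir = Path(root) / split
    if not split_dir.is_dir():
        raise FileNotFoundError(f"missing split directory: {split_dir}")
    counts = {}
    for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


def check_layout(root):
    """Compare the class folders found under train/ and val/."""
    train = summarize_split(root, "train")
    val = summarize_split(root, "val")
    return {
        "num_classes": len(train),
        "train_images": sum(train.values()),
        "val_images": sum(val.values()),
        "classes_match": set(train) == set(val),
    }

# Example call (path is a placeholder):
#   check_layout("/path/to/imagenet")
```

On a complete ImageNet 2012 copy, `num_classes` should come out as 1000 and `classes_match` as true; anything else suggests an incomplete extraction.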

Evaluation

To evaluate a pre-trained CycleMLP-B5 on the ImageNet val set with a single GPU, run:

python main.py --eval --model CycleMLP_B5 --resume path/to/CycleMLP_B5.pth --data-path /path/to/imagenet
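
For reference, the Top-1 accuracy listed in the Model Zoo is the percentage of validation images whose highest-scoring class equals the ground-truth label. The sketch below shows that computation on plain Python lists; the function name `top1_accuracy` is illustrative only, and the repository's own evaluation code may be organized differently.

```python
def top1_accuracy(logits, labels):
    """Percentage of samples whose arg-max score matches the label.

    logits: list of per-class score lists, one per sample.
    labels: list of integer class indices, one per sample.
    """
    if len(logits) != len(labels) or not labels:
        raise ValueError("logits and labels must be non-empty and equal length")
    correct = 0
    for scores, label in zip(logits, labels):
        # Index of the largest score is the predicted class.
        pred = max(range(len(scores)), key=lambda i: scores[i])
        correct += int(pred == label)
    return 100.0 * correct / len(labels)
```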

Training

To train CycleMLP-B5 on ImageNet on a single node with 8 GPUs for 300 epochs, run:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --model CycleMLP_B5 --batch-size 128 --data-path /path/to/imagenet --output_dir /path/to/save
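
With `--use_env`, `torch.distributed.launch` passes each process its local rank through the `LOCAL_RANK` environment variable instead of a `--local_rank` argument, alongside `RANK` and `WORLD_SIZE`. The sketch below shows one way a training script can read these values; `read_dist_env` and `effective_batch_size` are hypothetical names, and the assumption that `--batch-size` is per process is carried over from DeiT rather than confirmed for this repository.

```python
import os


def read_dist_env(env=None):
    """Read launcher-provided variables; fall back to single-process mode."""
    env = os.environ if env is None else env
    if "RANK" in env and "WORLD_SIZE" in env:
        return {
            "distributed": True,
            "rank": int(env["RANK"]),
            "world_size": int(env["WORLD_SIZE"]),
            "local_rank": int(env.get("LOCAL_RANK", 0)),
        }
    return {"distributed": False, "rank": 0, "world_size": 1, "local_rank": 0}


def effective_batch_size(per_process_batch, world_size):
    """Total samples per optimizer step, ASSUMING --batch-size is per process."""
    return per_process_batch * world_size
```

Under that assumption, the command above would process 128 x 8 = 1024 images per step.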

Acknowledgement

This code is based on DeiT and pytorch-image-models. Thanks for their wonderful work.

Citing

@article{chen2021cyclemlp,
  title={CycleMLP: A MLP-like Architecture for Dense Prediction},
  author={Chen, Shoufa and Xie, Enze and Ge, Chongjian and Liang, Ding and Luo, Ping},
  journal={arXiv preprint arXiv:2107.10224},
  year={2021}
}

License

CycleMLP is released under MIT License.

Comments

detection result

Using the PVT detection framework, I tried a CycleMLP-B1 based detector with RetinaNet 1x. I got AP=27.1, well below the reported 38.6. Could you give some advice on reproducing the reported result?

The specific config is as follows:

_base_ = [
    '_base_/models/retinanet_r50_fpn.py',
    '_base_/datasets/coco_detection.py',
    '_base_/schedules/schedule_1x.py',
    '_base_/default_runtime.py',
]
# optimizer
model = dict(
    pretrained='./pretrained/CycleMLP_B1.pth',
    backbone=dict(type='CycleMLP_B1_feat', style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[64, 128, 320, 512],
        out_channels=256,
        start_level=1,
        add_extra_convs='on_input',
        num_outs=5))
# optimizer
optimizer = dict(_delete_=True, type='AdamW', lr=0.0001, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)

find_unused_parameters = True

opened by mountain111 6
Compiling CycleMLP

Thank you for this great repo and interesting paper.

I tried compiling CycleMLP to ONNX and, not surprisingly, the process failed, since CycleMLP includes dynamic offset creation in https://github.com/ShoufaChen/CycleMLP/blob/main/cycle_mlp.py#L132 and as such cannot be converted to a frozen graph. Were you able to convert CycleMLP to ONNX or any other frozen-graph framework?

Thanks in advance.

opened by shairoz-deci 6
Questions about offset calculation

Hi, thanks for your wonderful work.

I'm currently studying your work and have some questions about the offset calculation.

I understand the offset calculation described in the paper, but I can't see how the generated offset is used in the code.

For example, if $S_H \times S_W = 3 \times 1$, I understand how the offset is applied in this figure

by calculating it like this:

However, when I run the offset-generating code, I can't figure out how this offset is used in deform_conv2d.

Can you provide more detailed information about this?

Also, the paper describes how $S_H \times S_W = 3 \times 3$ works, but in the code it seems that either kernel_size[0] or kernel_size[1] has to be 1. So, if I want to use $S_H \times S_W = 3 \times 3$, do I have to generate $3 \times 1$ and $1 \times 3$ offsets and add them together?

Thank you again for your work. I really learned a lot.

opened by tae-mo 5
Example of CycleMLP Configuration for Dense Prediction

Hello.

First of all, thank you for curating this interesting work. I was wondering, are there any working examples of how I can use CycleMLP for dense prediction while maintaining the original input size (e.g., predict a 0 or 1 value for each pixel in an input image)? In addition, I am interested in only a single ("annotated") output image, although I noticed the model definitions given in this repository output multiple downsampled versions of the original input image. Any thoughts on this?

Thank you in advance for your time.

opened by amorehead 2
Swin-B vs CycleMLP-B on image classification

For classification on ImageNet-1k, the accuracy of Swin-B is 83.5, which is 0.1 higher than the proposed CycleMLP-B. But in this paper, the authors report that the accuracy of Swin-B is 83.3, which is 0.1 lower than the proposed CycleMLP-B. Why are these accuracies different?

opened by hkzhang91 1

question about the offset

Thanks for your work!

The implementation of this code inspired me. But the calculation of offset here is confusing. Although this issue (https://github.com/ShoufaChen/CycleMLP/issues/10) has asked similar questions, I haven't found a reasonable explanation.

https://github.com/ShoufaChen/CycleMLP/blob/2f76a1f6e3cc6672143fdac46e3db5f9a7341253/cycle_mlp.py#L127-L136

kernel_size = (1, 3)
start_idx = (kernel_size[0] * kernel_size[1]) // 2
for i in range(num_channels):
    offset[0, 2 * i + 0, 0, 0] = 0
    # relative offset
    offset[0, 2 * i + 1, 0, 0] = (i + start_idx) % kernel_size[1] - (kernel_size[1] // 2)
offset.reshape(num_channels, 2)

tensor([[ 0.,  0.],
        [ 0.,  1.],
        [ 0., -1.],
        [ 0.,  0.],
        [ 0.,  1.],
        [ 0., -1.]])

The results differ from the figure in the paper:

Some code for verification:

import torch
from torchvision.ops import deform_conv2d

num_channels = 6

data = torch.arange(1, 6).reshape(1, 1, 1, 5).expand(-1, num_channels, -1, -1)
data
"""
tensor([[[[1, 2, 3, 4, 5]],
         [[1, 2, 3, 4, 5]],
         [[1, 2, 3, 4, 5]],
         [[1, 2, 3, 4, 5]],
         [[1, 2, 3, 4, 5]],
         [[1, 2, 3, 4, 5]]]])
"""

weight = torch.eye(num_channels).reshape(num_channels, num_channels, 1, 1)
weight.reshape(num_channels, num_channels)
"""
tensor([[1., 0., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0.],
        [0., 0., 0., 1., 0., 0.],
        [0., 0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 0., 1.]])
"""

offset = torch.empty(1, 2 * num_channels * 1 * 1, 1, 1)
kernel_size = (1, 3)
start_idx = (kernel_size[0] * kernel_size[1]) // 2
for i in range(num_channels):
    offset[0, 2 * i + 0, 0, 0] = 0
    # relative offset
    offset[0, 2 * i + 1, 0, 0] = (
        (i + start_idx) % kernel_size[1] - (kernel_size[1] // 2)
    )
offset.reshape(num_channels, 2)
"""
tensor([[ 0.,  0.],
        [ 0.,  1.],
        [ 0., -1.],
        [ 0.,  0.],
        [ 0.,  1.],
        [ 0., -1.]])
"""

deform_conv2d(
    data.float(), 
    offset=offset.expand(-1, -1, -1, 5).float(), 
    weight=weight.float(), 
    bias=None,
)
"""
tensor([[[[1., 2., 3., 4., 5.]],
         [[2., 3., 4., 5., 0.]],
         [[0., 1., 2., 3., 4.]],
         [[1., 2., 3., 4., 5.]],
         [[2., 3., 4., 5., 0.]],
         [[0., 1., 2., 3., 4.]]]])
"""

opened by lartpang 1
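
The verification code above can also be followed without PyTorch. The sketch below reproduces the same offset pattern and emulates the integer horizontal shift with zero padding that the `deform_conv2d` output in that issue displays (output position x reads input position x + dx). The names `gen_offsets` and `shift_row` are illustrative and not taken from the repository, and the emulation covers only this 1x1-weight, integer-offset case.

```python
def gen_offsets(num_channels, kernel_size=(1, 3)):
    """Per-channel (dy, dx) pairs using the formula quoted in the issue."""
    start_idx = (kernel_size[0] * kernel_size[1]) // 2
    offsets = []
    for i in range(num_channels):
        dx = (i + start_idx) % kernel_size[1] - (kernel_size[1] // 2)
        offsets.append((0, dx))  # vertical offset is always 0 for a 1x3 kernel
    return offsets


def shift_row(row, dx):
    """Sample row[x + dx] at each x; out-of-range positions read as 0."""
    n = len(row)
    return [row[x + dx] if 0 <= x + dx < n else 0 for x in range(n)]


def apply_offsets(row, num_channels):
    """Apply each channel's horizontal offset to a copy of the same row."""
    return [shift_row(row, dx) for _, dx in gen_offsets(num_channels)]
```

For the row `[1, 2, 3, 4, 5]` and six channels, `apply_offsets` yields the same three repeating patterns as the tensor printed in the issue: unshifted, shifted by +1, and shifted by -1.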

question about the offset

Hi, thank you very much for your excellent work. Fig. 4 of your paper shows the pseudo-kernel when the kernel size is 1x3, but I find that the function "gen_offset" does not generate the same offset as Fig. 4. The offset it generates is "0,1,0,-1,0,0,0,1..." instead of the "0,1,0,-1,0,1,0,-1" shown in Fig. 4. Could you please explain the reason?

opened by linjing7 1
About "crop_pct"

Hi, thanks for your great work and code. I wonder in which part of the code the parameter crop_pct actually takes effect. When I go through timm, I can't find how this crop_pct is loaded.
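
For context, in timm's ImageNet evaluation transforms `crop_pct` is conventionally the ratio between the center-crop size and the size the image is first resized to. The sketch below illustrates that convention only; `eval_resize_and_crop` is a hypothetical helper, and whether this repository's data pipeline reads `crop_pct` at all is the open question here, not something the sketch settles.

```python
import math


def eval_resize_and_crop(img_size=224, crop_pct=0.875):
    """Return (resize_size, crop_size) under the usual crop_pct convention.

    The image is resized so its shorter side is floor(img_size / crop_pct),
    then center-cropped to img_size.
    """
    if not 0.0 < crop_pct <= 1.0:
        raise ValueError("crop_pct must be in (0, 1]")
    resize_size = int(math.floor(img_size / crop_pct))
    return resize_size, img_size
```

With `img_size=224` and `crop_pct=0.875`, this gives the familiar resize-to-256, crop-to-224 evaluation setting.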

opened by ggjy 1
How to deploy CycleMLP-T for training?

Thank you very much for such a wonderful work!

After studying the cycle_mlp source code in the repository, I am still confused about how to build a CycleMLP block on top of Swin Transformer. Would it be possible for you to release the Swin-based CycleMLP? Looking forward to your reply. Thanks!

opened by Pak287 0

Releases(v0.1)

v0.1(Jul 21, 2021)

Source code(tar.gz)
Source code(zip)
CycleMLP_B1.pth(57.92 MB)
CycleMLP_B2.pth(102.45 MB)
CycleMLP_B3.pth(146.75 MB)
CycleMLP_B4.pth(198.05 MB)
CycleMLP_B5.pth(289.38 MB)
CycleMLP_base.pth(335.13 MB)
CycleMLP_small.pth(189.52 MB)
CycleMLP_tiny.pth(108.05 MB)
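
As a rough sanity check, the checkpoint sizes above can be compared with the parameter counts in the Model Zoo. The sketch assumes that each file stores essentially only float32 weights (4 bytes per parameter) and that "MB" here means 2^20 bytes; both are assumptions, and any extra state in a checkpoint would make the estimate run high.

```python
BYTES_PER_FLOAT32 = 4
BYTES_PER_MB = 1024 ** 2  # assumption: sizes are reported in units of 2^20 bytes

# File sizes from the v0.1 release list and parameter counts from the Model Zoo.
CHECKPOINT_MB = {"B1": 57.92, "B2": 102.45, "B3": 146.75, "B4": 198.05, "B5": 289.38}
REPORTED_PARAMS_M = {"B1": 15, "B2": 27, "B3": 38, "B4": 52, "B5": 76}


def approx_params_millions(size_mb):
    """Estimate parameter count (in millions) from a float32 checkpoint size."""
    return size_mb * BYTES_PER_MB / BYTES_PER_FLOAT32 / 1e6


def compare():
    """Return {model: (estimated_millions, reported_millions)}."""
    return {
        name: (round(approx_params_millions(mb), 1), REPORTED_PARAMS_M[name])
        for name, mb in CHECKPOINT_MB.items()
    }
```

Under these assumptions the estimates land within about half a million parameters of the reported figures for B1 through B5.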

Owner

Shoufa Chen

GitHub Repository
