[Preprint] ConvMLP: Hierarchical Convolutional MLPs for Vision, 2021

Overview

Convolutional MLP

ConvMLP: Hierarchical Convolutional MLPs for Vision

Preprint link: ConvMLP: Hierarchical Convolutional MLPs for Vision

By Jiachen Li[1,2], Ali Hassani[1]*, Steven Walton[1]*, and Humphrey Shi[1,2,3]

In association with SHI Lab @ University of Oregon[1] and University of Illinois Urbana-Champaign[2], and Picsart AI Research (PAIR)[3]

Comparison

Abstract

MLP-based architectures, which consist of a sequence of consecutive multi-layer perceptron blocks, have recently been found to reach comparable results to convolutional and transformer-based methods. However, most adopt spatial MLPs which take fixed dimension inputs, therefore making it difficult to apply them to downstream tasks, such as object detection and semantic segmentation. Moreover, single-stage designs further limit performance in other computer vision tasks and fully connected layers bear heavy computation. To tackle these problems, we propose ConvMLP: a hierarchical Convolutional MLP for visual recognition, which is a light-weight, stage-wise, co-design of convolution layers, and MLPs. In particular, ConvMLP-S achieves 76.8% top-1 accuracy on ImageNet-1k with 9M parameters and 2.4 GMACs (15% and 19% of MLP-Mixer-B/16, respectively). Experiments on object detection and semantic segmentation further show that visual representation learned by ConvMLP can be seamlessly transferred and achieve competitive results with fewer parameters.

Model

How to run

Getting Started

Our base model is in pure PyTorch and Torchvision. No extra packages are required. Please refer to PyTorch's Getting Started page for detailed instructions.

You can start off with src.convmlp, which contains the three variants: convmlp_s, convmlp_m, convmlp_l:

from src.convmlp import convmlp_l, convmlp_s

model = convmlp_l(pretrained=True, progress=True)
model_sm = convmlp_s(num_classes=10)

Image Classification

timm is recommended for image classification training and required for the training script provided in this repository:

./dist_classification.sh $NUM_GPUS -c $CONFIG_FILE /path/to/dataset

You can use our training configurations provided in configs/classification:

./dist_classification.sh 8 -c configs/classification/convmlp_s_imagenet.yml /path/to/ImageNet
./dist_classification.sh 8 -c configs/classification/convmlp_m_imagenet.yml /path/to/ImageNet
./dist_classification.sh 8 -c configs/classification/convmlp_l_imagenet.yml /path/to/ImageNet

Object Detection

mmdetection is recommended for object detection training and required for the training script provided in this repository:

./dist_detection.sh $CONFIG_FILE $NUM_GPUS /path/to/dataset

You can use our training configurations provided in configs/detection:

./dist_detection.sh configs/detection/retinanet_convmlp_s_fpn_1x_coco.py 8 /path/to/COCO
./dist_detection.sh configs/detection/retinanet_convmlp_m_fpn_1x_coco.py 8 /path/to/COCO
./dist_detection.sh configs/detection/retinanet_convmlp_l_fpn_1x_coco.py 8 /path/to/COCO

Object Detection & Instance Segmentation

mmdetection is recommended for training Mask R-CNN and required for the training script provided in this repository (same as above).

You can use our training configurations provided in configs/detection:

./dist_detection.sh configs/detection/maskrcnn_convmlp_s_fpn_1x_coco.py 8 /path/to/COCO
./dist_detection.sh configs/detection/maskrcnn_convmlp_m_fpn_1x_coco.py 8 /path/to/COCO
./dist_detection.sh configs/detection/maskrcnn_convmlp_l_fpn_1x_coco.py 8 /path/to/COCO

Semantic Segmentation

mmsegmentation is recommended for semantic segmentation training and required for the training script provided in this repository:

./dist_segmentation.sh $CONFIG_FILE $NUM_GPUS /path/to/dataset

You can use our training configurations provided in configs/segmentation:

./dist_segmentation.sh configs/segmentation/fpn_convmlp_s_512x512_40k_ade20k.py 8 /path/to/ADE20k
./dist_segmentation.sh configs/segmentation/fpn_convmlp_m_512x512_40k_ade20k.py 8 /path/to/ADE20k
./dist_segmentation.sh configs/segmentation/fpn_convmlp_l_512x512_40k_ade20k.py 8 /path/to/ADE20k

Results

Image Classification

Feature maps from ResNet50, MLP-Mixer-B/16, our Pure-MLP Baseline and ConvMLP-M are presented in the image below. It can be observed that representations learned by ConvMLP involve more low-level features like edges or textures compared to the rest. Feature map visualization

Dataset Model Top-1 Accuracy # Params MACs
ImageNet ConvMLP-S 76.8% 9.0M 2.4G
ConvMLP-M 79.0% 17.4M 3.9G
ConvMLP-L 80.2% 42.7M 9.9G

If importing the classification models, you can pass pretrained=True to download and set these checkpoints. The same holds for the training script (classification.py and dist_classification.sh): pass --pretrained. The segmentation/detection training scripts also download the pretrained backbone if you pass the correct config files.

Downstream tasks

You can observe the summarized results from applying our model to object detection, instance and semantic segmentation, compared to ResNet, in the image below.

Object Detection

Dataset Model Backbone # Params APb APb50 APb75 Checkpoint
MS COCO Mask R-CNN ConvMLP-S 28.7M 38.4 59.8 41.8 Download
ConvMLP-M 37.1M 40.6 61.7 44.5 Download
ConvMLP-L 62.2M 41.7 62.8 45.5 Download
RetinaNet ConvMLP-S 18.7M 37.2 56.4 39.8 Download
ConvMLP-M 27.1M 39.4 58.7 42.0 Download
ConvMLP-L 52.9M 40.2 59.3 43.3 Download

Instance Segmentation

Dataset Model Backbone # Params APm APm50 APm75 Checkpoint
MS COCO Mask R-CNN ConvMLP-S 28.7M 35.7 56.7 38.2 Download
ConvMLP-M 37.1M 37.2 58.8 39.8 Download
ConvMLP-L 62.2M 38.2 59.9 41.1 Download

Semantic Segmentation

Dataset Model Backbone # Params mIoU Checkpoint
ADE20k Semantic FPN ConvMLP-S 12.8M 35.8 Download
ConvMLP-M 21.1M 38.6 Download
ConvMLP-L 46.3M 40.0 Download

Transfer

Dataset Model Top-1 Accuracy # Params
CIFAR-10 ConvMLP-S 98.0% 8.51M
ConvMLP-M 98.6% 16.90M
ConvMLP-L 98.6% 41.97M
CIFAR-100 ConvMLP-S 87.4% 8.56M
ConvMLP-M 89.1% 16.95M
ConvMLP-L 88.6% 42.04M
Flowers-102 ConvMLP-S 99.5% 8.56M
ConvMLP-M 99.5% 16.95M
ConvMLP-L 99.5% 42.04M

Citation

@article{li2021convmlp,
      title={ConvMLP: Hierarchical Convolutional MLPs for Vision}, 
      author={Jiachen Li and Ali Hassani and Steven Walton and Humphrey Shi},
      year={2021},
      eprint={2109.04454},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
Owner
SHI Lab
Research in Synergetic & Holistic Intelligence, with current focus on Computer Vision, Machine Learning, and AI Systems & Applications
SHI Lab
Securetar - A streaming wrapper around python tarfile and allow secure handling files and support encryption

Secure Tar Secure Tarfile library It's a streaming wrapper around python tarfile

Pascal Vizeli 2 Dec 09, 2022
[CVPR 2022] Unsupervised Image-to-Image Translation with Generative Prior

GP-UNIT - Official PyTorch Implementation This repository provides the official PyTorch implementation for the following paper: Unsupervised Image-to-

Shuai Yang 125 Jan 03, 2023
N-Omniglot is a large neuromorphic few-shot learning dataset

N-Omniglot [Paper] || [Dataset] N-Omniglot is a large neuromorphic few-shot learning dataset. It reconstructs strokes of Omniglot as videos and uses D

11 Dec 05, 2022
PyTorch implementation of CVPR'18 - Perturbative Neural Networks

This is an attempt to reproduce results in Perturbative Neural Networks paper. See original repo for details.

Michael Klachko 57 May 14, 2021
BEAMetrics: Benchmark to Evaluate Automatic Metrics in Natural Language Generation

BEAMetrics: Benchmark to Evaluate Automatic Metrics in Natural Language Generation Installing The Dependencies $ conda create --name beametrics python

7 Jul 04, 2022
This is the official PyTorch implementation for "Mesa: A Memory-saving Training Framework for Transformers".

Mesa: A Memory-saving Training Framework for Transformers This is the official PyTorch implementation for Mesa: A Memory-saving Training Framework for

Zhuang AI Group 105 Dec 06, 2022
Code for Towards Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games

Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games How to run our algorithm? Create the new environment using: conda

MARL @ SJTU 8 Dec 27, 2022
RTSeg: Real-time Semantic Segmentation Comparative Study

Real-time Semantic Segmentation Comparative Study The repository contains the official TensorFlow code used in our papers: RTSEG: REAL-TIME SEMANTIC S

Mennatullah Siam 592 Nov 18, 2022
Python library for loading and using triangular meshes.

Trimesh is a pure Python (2.7-3.4+) library for loading and using triangular meshes with an emphasis on watertight surfaces. The goal of the library i

Michael Dawson-Haggerty 2.2k Jan 07, 2023
Using PyTorch Perform intent classification using three different models to see which one is better for this task

Using PyTorch Perform intent classification using three different models to see which one is better for this task

Yoel Graumann 1 Feb 14, 2022
The codebase for our paper "Generative Occupancy Fields for 3D Surface-Aware Image Synthesis" (NeurIPS 2021)

Generative Occupancy Fields for 3D Surface-Aware Image Synthesis (NeurIPS 2021) Project Page | Paper Xudong Xu, Xingang Pan, Dahua Lin and Bo Dai GOF

xuxudong 97 Nov 10, 2022
Defending graph neural networks against adversarial attacks (NeurIPS 2020)

GNNGuard: Defending Graph Neural Networks against Adversarial Attacks Authors: Xiang Zhang ( Zitnik Lab @ Harvard 44 Dec 07, 2022

Protect against subdomain takeover

domain-protect scans Amazon Route53 across an AWS Organization for domain records vulnerable to takeover deploy to security audit account scan your en

OVO Technology 0 Nov 17, 2022
Faster RCNN with PyTorch

Faster RCNN with PyTorch Note: I re-implemented faster rcnn in this project when I started learning PyTorch. Then I use PyTorch in all of my projects.

Long Chen 1.6k Dec 23, 2022
Python scripts for performing stereo depth estimation using the MobileStereoNet model in Tensorflow Lite.

TFLite-MobileStereoNet Python scripts for performing stereo depth estimation using the MobileStereoNet model in Tensorflow Lite. Stereo depth estimati

Ibai Gorordo 4 Feb 14, 2022
Multi-Glimpse Network With Python

Multi-Glimpse Network Multi-Glimpse Network: A Robust and Efficient Classification Architecture based on Recurrent Downsampled Attention arXiv Require

9 May 10, 2022
Using pretrained GROVER to extract the atomic fingerprints from molecule

Extracting atomic fingerprints from molecules using pretrained Graph Neural Network models (GROVER).

Xuan Vu Nguyen 1 Jan 28, 2022
CV backbones including GhostNet, TinyNet and TNT, developed by Huawei Noah's Ark Lab.

CV Backbones including GhostNet, TinyNet, TNT (Transformer in Transformer) developed by Huawei Noah's Ark Lab. GhostNet Code TinyNet Code TNT Code Pyr

HUAWEI Noah's Ark Lab 3k Jan 08, 2023
Pretraining Representations For Data-Efficient Reinforcement Learning

Pretraining Representations For Data-Efficient Reinforcement Learning Max Schwarzer, Nitarshan Rajkumar, Michael Noukhovitch, Ankesh Anand, Laurent Ch

Mila 40 Dec 11, 2022
《Dual-Resolution Correspondence Network》(NeurIPS 2020)

Dual-Resolution Correspondence Network Dual-Resolution Correspondence Network, NeurIPS 2020 Dependency All dependencies are included in asset/dualrcne

Active Vision Laboratory 45 Nov 21, 2022