[CVPR 2021] Involution: Inverting the Inherence of Convolution for Visual Recognition, a brand new neural operator

Last update: Dec 28, 2022

Overview

involution

Official implementation of a neural operator as described in Involution: Inverting the Inherence of Convolution for Visual Recognition (CVPR'21)

By Duo Li, Jie Hu, Changhu Wang, Xiangtai Li, Qi She, Lei Zhu, Tong Zhang, and Qifeng Chen

TL;DR. involution is a general-purpose neural primitive that is versatile across a spectrum of deep learning models for different vision tasks. In design, involution bridges convolution and self-attention, while being more efficient and effective than convolution and simpler in form than self-attention.
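To make the idea concrete, here is a minimal pure-Python sketch of the operator, written for readability and not taken from this repository. It assumes the formulation in the paper: a K x K kernel is generated at every spatial position from the feature vector at that position, and that kernel is shared by all channels within a group. The names `involution` and `kernel_fn` are illustrative; `kernel_fn` stands in for the learned kernel-generation function.

```python
from typing import Callable, List, Sequence

Tensor3D = List[List[List[float]]]  # indexed as [channel][row][col]


def involution(
    x: Tensor3D,
    kernel_fn: Callable[[Sequence[float]], Sequence[float]],
    kernel_size: int = 3,
    groups: int = 1,
) -> Tensor3D:
    """Apply a position-specific, channel-shared kernel to x (zero padding).

    kernel_fn maps the channel vector at one pixel to groups * K * K weights.
    It is a stand-in (assumption) for the learned kernel-generation function.
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    if channels % groups != 0:
        raise ValueError("channels must be divisible by groups")
    pad = kernel_size // 2
    per_group = channels // groups
    taps = kernel_size * kernel_size
    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]

    for i in range(height):
        for j in range(width):
            # The kernel depends on the pixel, unlike a convolution kernel.
            weights = kernel_fn([x[c][i][j] for c in range(channels)])
            if len(weights) != groups * taps:
                raise ValueError("kernel_fn must return groups * K * K weights")
            for c in range(channels):
                # Channels in the same group reuse the same K x K kernel.
                base = (c // per_group) * taps
                acc = 0.0
                for u in range(kernel_size):
                    for v in range(kernel_size):
                        ii, jj = i + u - pad, j + v - pad
                        if 0 <= ii < height and 0 <= jj < width:
                            acc += weights[base + u * kernel_size + v] * x[c][ii][jj]
                out[c][i][j] = acc
    return out
```

The nested loops are only meant to show where the spatial specificity and channel sharing come from; they are far too slow for real feature maps.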

Getting Started

This repository is fully built upon the OpenMMLab toolkits. For each individual task, the config and model files follow the same directory organization as mmcls, mmdet, and mmseg respectively, so just copy-and-paste them to the corresponding locations to get started.

For example, to evaluate detectors:

git clone https://github.com/open-mmlab/mmdetection # and install

# copy model files
cp det/mmdet/models/backbones/* mmdetection/mmdet/models/backbones
cp det/mmdet/models/necks/* mmdetection/mmdet/models/necks
cp det/mmdet/models/utils/* mmdetection/mmdet/models/utils

# copy config files
cp det/configs/_base_/models/* mmdetection/mmdet/configs/_base_/models
cp det/configs/_base_/schedules/* mmdetection/mmdet/configs/_base_/schedules
cp -r det/configs/involution mmdetection/mmdet/configs

cd mmdetection
# evaluate checkpoints
bash tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]

For more detailed guidance, please refer to the original mmcls, mmdet, and mmseg tutorials.

Currently, we provide a memory-efficient implementation of the involution operator based on CuPy. Please install this library in advance. A customized CUDA kernel would bring further acceleration on the hardware. Any contribution from the community on this front is welcome!
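Because the operator depends on CuPy, it can help to check that the package is installed before launching a job. The helper below is a small sketch, not part of this repository; the name `cupy_available` is made up for illustration.

```python
import importlib.util


def cupy_available() -> bool:
    """Return True if the `cupy` package can be located, without importing it."""
    return importlib.util.find_spec("cupy") is not None
```

If `cupy_available()` returns `False`, install a CuPy build that matches your CUDA version before training or evaluating.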

Model Zoo

The changes in parameters/FLOPs (↓) and performance (↑) relative to the convolution baselines are shown in parentheses. Some of these checkpoints come from our reimplementation runs, so their performance may differ slightly from the numbers reported in our paper. Models are trained with 64 GPUs on ImageNet, 8 GPUs on COCO, and 4 GPUs on Cityscapes.
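As a reading aid for the parenthesized percentages, the sketch below assumes each one is a relative reduction, that is, (baseline - model) / baseline. Under that assumption the baseline figure can be backed out from a table entry. Both function names are illustrative and are not part of this repository.

```python
def relative_reduction(baseline: float, value: float) -> float:
    """Percentage by which `value` is smaller than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - value) / baseline


def implied_baseline(value: float, reduction_pct: float) -> float:
    """Baseline figure implied by a reduced value and its relative reduction."""
    if not 0.0 <= reduction_pct < 100.0:
        raise ValueError("reduction_pct must be in [0, 100)")
    return value / (1.0 - reduction_pct / 100.0)
```

For instance, `implied_baseline(15.54, 39.5)` gives roughly 25.7, so the RedNet-50 entry of 15.54M parameters with a 39.5% reduction corresponds to a baseline of about 25.7M parameters, up to rounding in the table.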

Image Classification on ImageNet

Model	Params(M)	FLOPs(G)	Top-1 (%)	Top-5 (%)	Config	Download
RedNet-26	9.23 (32.8%↓)	1.73 (29.2%↓)	75.96	93.19	config	model | log
RedNet-38	12.39 (36.7%↓)	2.22 (31.3%↓)	77.48	93.57	config	model | log
RedNet-50	15.54 (39.5%↓)	2.71 (34.1%↓)	78.35	94.13	config	model | log
RedNet-101	25.65 (42.6%↓)	4.74 (40.5%↓)	78.92	94.35	config	model | log
RedNet-152	33.99 (43.5%↓)	6.79 (41.4%↓)	79.12	94.38	config	model | log

Before fine-tuning on the following downstream tasks, download the ImageNet pre-trained RedNet-50 weights and set the pretrained argument in det/configs/_base_/models/*.py or seg/configs/_base_/models/*.py to your local checkpoint path.
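As a rough illustration of that edit, the fragment below shows what the relevant line in one of those base model configs might look like. Only the `pretrained` key comes from the text above; the path is a placeholder, and every other key of the real config is omitted here.

```python
# Hypothetical excerpt of a base model config (OpenMMLab-style Python config).
# Only `pretrained` is shown; keep the other keys of the real file unchanged.
model = dict(
    pretrained='/path/to/rednet50.pth',  # placeholder: your local checkpoint
)
```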

Object Detection and Instance Segmentation on COCO

Faster R-CNN

Backbone	Neck	Style	Lr schd	Params(M)	FLOPs(G)	box AP	Config	Download
RedNet-50-FPN	convolution	pytorch	1x	31.6 (23.9%↓)	177.9 (14.1%↓)	39.5 (1.8↑)	config	model | log
RedNet-50-FPN	involution	pytorch	1x	29.5 (28.9%↓)	135.0 (34.8%↓)	40.2 (2.5↑)	config	model | log

Mask R-CNN

Backbone	Neck	Style	Lr schd	Params(M)	FLOPs(G)	box AP	mask AP	Config	Download
RedNet-50-FPN	convolution	pytorch	1x	34.2 (22.6%↓)	224.2 (11.5%↓)	39.9 (1.5↑)	35.7 (0.8↑)	config	model | log
RedNet-50-FPN	involution	pytorch	1x	32.2 (27.1%↓)	181.3 (28.5%↓)	40.8 (2.4↑)	36.4 (1.3↑)	config	model | log

RetinaNet

Backbone	Neck	Style	Lr schd	Params(M)	FLOPs(G)	box AP	Config	Download
RedNet-50-FPN	convolution	pytorch	1x	27.8 (26.3%↓)	210.1 (12.2%↓)	38.2 (1.6↑)	config	model | log
RedNet-50-FPN	involution	pytorch	1x	26.3 (30.2%↓)	199.9 (16.5%↓)	38.2 (1.6↑)	config	model | log

Semantic Segmentation on Cityscapes

Method	Backbone	Neck	Crop Size	Lr schd	Params(M)	FLOPs(G)	mIoU	Config	Download
FPN	RedNet-50	convolution	512x1024	80000	18.5 (35.1%↓)	293.9 (19.0%↓)	78.0 (3.6↑)	config	model | log
FPN	RedNet-50	involution	512x1024	80000	16.4 (42.5%↓)	205.2 (43.4%↓)	79.1 (4.7↑)	config	model | log
UPerNet	RedNet-50	convolution	512x1024	80000	56.4 (15.1%↓)	1825.6 (3.6%↓)	80.6 (2.4↑)	config	model | log

Citation

If you find our work useful in your research, please cite:

@InProceedings{Li_2021_CVPR,
author = {Li, Duo and Hu, Jie and Wang, Changhu and Li, Xiangtai and She, Qi and Zhu, Lei and Zhang, Tong and Chen, Qifeng},
title = {Involution: Inverting the Inherence of Convolution for Visual Recognition},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2021}
}
