Code for our CVPR 2021 paper, Coordinate Attention

Last update: Jan 05, 2023

Overview

Coordinate Attention for Efficient Mobile Network Design (preprint)

This repository is a PyTorch implementation of our coordinate attention (to appear in CVPR 2021).

Our coordinate attention can be easily plugged into any classic building block as a feature representation augmentation tool. If you want to train a classification model on ImageNet, pytorch-image-models is a code base you might use.

Note that the results reported in the paper are based on a regular training setting (200 training epochs, random crop, and a cosine learning rate schedule) without extra label smoothing, random augmentation, random erasing, or mixup. For specific numbers on ImageNet classification, COCO object detection, and semantic segmentation, please refer to our paper.
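For readers unfamiliar with the cosine learning schedule mentioned above, here is a minimal sketch of how such a schedule is commonly computed over 200 epochs. The function name `cosine_lr`, the base learning rate of 0.05, and the minimum of 0 are assumptions for illustration only; the values actually used are in the paper and the training code base.

```python
import math


def cosine_lr(epoch: int, total_epochs: int = 200,
              base_lr: float = 0.05, min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate for a 0-indexed epoch.

    base_lr and min_lr are illustrative assumptions, not values
    taken from the paper.
    """
    progress = epoch / total_epochs
    # Decays smoothly from base_lr (progress = 0) to min_lr (progress = 1).
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for epoch in (0, 50, 100, 150, 199):
        print(epoch, cosine_lr(epoch))
```

The rate starts at `base_lr`, reaches half of it at the midpoint of training, and approaches `min_lr` at the end.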

Comparison to Squeeze-and-Excitation block and CBAM

(a) Squeeze-and-Excitation block (b) CBAM (c) Coordinate attention block
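Since the comparison figure is not reproduced here, the following is a minimal PyTorch sketch of a coordinate attention block as described in the paper: features are pooled along each spatial direction separately, encoded jointly, then split into two direction-aware attention maps. The class name, the default `reduction=32`, the minimum of 8 intermediate channels, and the use of `nn.Hardswish` are assumptions of this sketch; the code in this repository is the authoritative version.

```python
import torch
import torch.nn as nn


class CoordAtt(nn.Module):
    """Sketch of a coordinate attention block (hyperparameters are assumptions)."""

    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # average over width
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # average over height
        # Shared 1x1 transform applied to both directional descriptors.
        self.reduce = nn.Sequential(
            nn.Conv2d(channels, mid, kernel_size=1),
            nn.BatchNorm2d(mid),
            nn.Hardswish(),
        )
        self.attn_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.attn_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, _, h, w = x.shape
        x_h = self.pool_h(x)                           # (N, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)       # (N, C, W, 1)
        y = self.reduce(torch.cat([x_h, x_w], dim=2))  # (N, mid, H + W, 1)
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.attn_h(y_h))                      # (N, C, H, 1)
        a_w = torch.sigmoid(self.attn_w(y_w.permute(0, 1, 3, 2)))  # (N, C, 1, W)
        # Broadcasting applies the two maps along height and width.
        return x * a_h * a_w


if __name__ == "__main__":
    block = CoordAtt(64).eval()
    with torch.no_grad():
        out = block(torch.randn(2, 64, 7, 9))
    print(out.shape)
```

Unlike SE, which collapses each channel to a single scalar, the two attention maps here keep positional information along one axis each, so the output has the same shape as the input and is reweighted per row and per column.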

How to plug the proposed CA block into the inverted residual block and the sandglass block

(a) MobileNetV2 (b) MobileNeXt
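As a rough illustration of the MobileNetV2 case, the sketch below shows an inverted residual block with a slot for an attention module. The placement after the depthwise 3x3 convolution and before the linear 1x1 projection is an assumption of this sketch, since the figure is not reproduced here; the class name and the `attention` factory argument are hypothetical. Any module that maps a tensor to one of the same shape (such as a coordinate attention block) can be passed in.

```python
from typing import Callable, Optional

import torch
import torch.nn as nn


class InvertedResidualWithAttention(nn.Module):
    """Sketch of a MobileNetV2-style block with an optional attention slot."""

    def __init__(self, inp: int, oup: int, stride: int = 1, expand_ratio: int = 6,
                 attention: Optional[Callable[[int], nn.Module]] = None):
        super().__init__()
        hidden = inp * expand_ratio
        self.use_residual = stride == 1 and inp == oup
        # `attention` is a hypothetical factory: channel count -> module.
        attn = attention(hidden) if attention is not None else nn.Identity()
        self.block = nn.Sequential(
            # pointwise expansion
            nn.Conv2d(inp, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # depthwise 3x3
            nn.Conv2d(hidden, hidden, 3, stride, 1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # attention slot (placement is an assumption of this sketch)
            attn,
            # linear pointwise projection
            nn.Conv2d(hidden, oup, 1, bias=False),
            nn.BatchNorm2d(oup),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.block(x)
        return x + out if self.use_residual else out


if __name__ == "__main__":
    blk = InvertedResidualWithAttention(16, 24, stride=2).eval()
    with torch.no_grad():
        print(blk(torch.randn(1, 16, 8, 8)).shape)
```

The sketch omits the special case in which an expansion ratio of 1 skips the first pointwise convolution, and it does not cover the sandglass block of MobileNeXt.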

Some tips for designing lightweight attention blocks

SiLU activation (h_swish in the code) works better than ReLU6
Attention along only the horizontal or only the vertical direction performs about the same as SE attention
When applied to MobileNeXt, adding the attention block after the first depthwise 3x3 convolution works better
Not sure whether the results would be better if a softmax were applied between the horizontal and vertical features
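To make the first tip concrete, here is a small pure-Python comparison of SiLU and a hard-swish function. It assumes that `h_swish` follows the usual hard-swish definition, `x * ReLU6(x + 3) / 6`, which is a piecewise approximation of SiLU; check the repository code for the exact form used.

```python
import math


def relu6(x: float) -> float:
    """ReLU clipped to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)


def h_sigmoid(x: float) -> float:
    """Piecewise-linear approximation of the sigmoid."""
    return relu6(x + 3.0) / 6.0


def h_swish(x: float) -> float:
    """Hard swish: x * h_sigmoid(x) (assumed definition)."""
    return x * h_sigmoid(x)


def silu(x: float) -> float:
    """SiLU (swish): x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


if __name__ == "__main__":
    for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
        print(x, h_swish(x), silu(x))
```

The two functions stay close over the whole input range, while the hard version avoids the exponential, which is why it is common in mobile networks.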

Object detection

We use this repo (ssdlite-pytorch-mobilenext).

Semantic segmentation

We use this repo. Alternatively, you can refer to mmsegmentation.

Citation

You may want to cite:

@inproceedings{hou2021coordinate,
  title={Coordinate Attention for Efficient Mobile Network Design},
  author={Hou, Qibin and Zhou, Daquan and Feng, Jiashi},
  booktitle={CVPR},
  year={2021}
}

@inproceedings{sandler2018mobilenetv2,
  title={Mobilenetv2: Inverted residuals and linear bottlenecks},
  author={Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={4510--4520},
  year={2018}
}

@inproceedings{zhou2020rethinking,
  title={Rethinking bottleneck structure for efficient mobile network design},
  author={Zhou, Daquan and Hou, Qibin and Chen, Yunpeng and Feng, Jiashi and Yan, Shuicheng},
  booktitle={ECCV},
  year={2020}
}

@inproceedings{hu2018squeeze,
  title={Squeeze-and-excitation networks},
  author={Hu, Jie and Shen, Li and Sun, Gang},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={7132--7141},
  year={2018}
}

@inproceedings{woo2018cbam,
  title={Cbam: Convolutional block attention module},
  author={Woo, Sanghyun and Park, Jongchan and Lee, Joon-Young and Kweon, In So},
  booktitle={Proceedings of the European conference on computer vision (ECCV)},
  pages={3--19},
  year={2018}
}
