The code for Expectation-Maximization Attention Networks for Semantic Segmentation (ICCV'2019 Oral)

Related tags

Deep LearningEMANet
Overview

EMANet

News

  • The bug in loading the pretrained model is now fixed. I have updated the .pth. To use it, download it again.
  • EMANet-101 gets 80.99 on the PASCAL VOC dataset (Thanks for Sensetimes' server). So, with a classic backbone(ResNet) instead of some newest ones(WideResNet, HRNet), EMANet still achieves the top performance.
  • EMANet-101 (OHEM) gets 81.14 in mIoU on Cityscapes val using single-scale inference, and 81.9 on test server with multi-scale inference.

Background

This repository is for Expectation-Maximization Attention Networks for Semantic Segmentation (to appear in ICCV 2019, Oral presentation),

by Xia Li, Zhisheng Zhong, Jianlong Wu, Yibo Yang, Zhouchen Lin and Hong Liu from Peking University.

The source code is now available!

citation

If you find EMANet useful in your research, please consider citing:

@inproceedings{li19,
    author={Xia Li and Zhisheng Zhong and Jianlong Wu and Yibo Yang and Zhouchen Lin and Hong Liu},
    title={Expectation-Maximization Attention Networks for Semantic Segmentation},
    booktitle={International Conference on Computer Vision},   
    year={2019},   
}

table of contents

Introduction

Self-attention mechanism has been widely used for various tasks. It is designed to compute the representation of each position by a weighted sum of the features at all positions. Thus, it can capture long-range relations for computer vision tasks. However, it is computationally consuming. Since the attention maps are computed w.r.t all other positions. In this paper, we formulate the attention mechanism into an expectation-maximization manner and iteratively estimate a much more compact set of bases upon which the attention maps are computed. By a weighted summation upon these bases, the resulting representation is low-rank and deprecates noisy information from the input. The proposed Expectation-Maximization Attention (EMA) module is robust to the variance of input and is also friendly in memory and computation. Moreover, we set up the bases maintenance and normalization methods to stabilize its training procedure. We conduct extensive experiments on popular semantic segmentation benchmarks including PASCAL VOC, PASCAL Context, and COCO Stuff, on which we set new records. EMA Unit

Design

As so many peers have starred at this repo, I feel the great pressure, and try to release the code with high quality. That's why I didn't release it until today (Aug, 22, 2018). It's known that the design of the code structure is not an easy thing. Different designs are suitable for different usage. Here, I aim at making research on Semantic Segmentation, especially on PASCAL VOC, more easier. So, I delete necessary encapsulation as much as possible, and leave over less than 10 python files. To be honest, the global variables in settings are not a good design for large project. But for research, it offers great flexibility. So, hope you can understand that

For research, I recommand seperatting each experiment with a folder. Each folder contains the whole project, and should be named as the experiment settings, such as 'EMANet101.moving_avg.l2norm.3stages'. Through this, you can keep tracks of all the experiments, and find their differences just by the 'diff' command.

Usage

  1. Install the libraries listed in the 'requirements.txt'
  2. Downloads images and labels of PASCAL VOC and SBD, decompress them together.
  3. Downloads the pretrained ResNet50 and ResNet101, unzip them, and put into the 'models' folder.
  4. Change the 'DATA_ROOT' in settings.py to where you place the dataset.
  5. Run sh clean.sh to clear the models and logs from the last experiment.
  6. Run python train.py for training and sh tensorboard.sh for visualization on your browser.
  7. Or you can download the pretraind model, put into the 'models' folder, and skip step 6.
  8. Run python eval.py for validation

Ablation Studies

The following results are referred from the paper. For this repo, it's not strange to get even higer performance. If so, I'd like you share it in the issue. By now, this repo only provides the SS inference. I may release the code for MS and Flip latter.

Tab 1. Detailed comparisons with Deeplabs. All results are achieved with the backbone ResNet-101 and output stride 8. The FLOPs and memory are computed with the input size 513×513. SS: Single scale input during test. MS: Multi-scale input. Flip: Adding left-right flipped input. EMANet (256) and EMANet (512) represent EMANet withthe number of input channels for EMA as 256 and 512, respectively.

Method SS MS+Flip FLOPs Memory Params
ResNet-101 - - 190.6G 2.603G 42.6M
DeeplabV3 78.51 79.77 +63.4G +66.0M +15.5M
DeeplabV3+ 79.35 80.57 +84.1G +99.3M +16.3M
PSANet 78.51 79.77 +56.3G +59.4M +18.5M
EMANet(256) 79.73 80.94 +21.1G +12.3M +4.87M
EMANet(512) 80.05 81.32 +43.1G +22.1M +10.0M

To be note, the majority overheads of EMANets come from the 3x3 convs before and after the EMA Module. As for the EMA Module itself, its computation is only 1/3 of a 3x3 conv's, and its parameter number is even smaller than a 1x1 conv.

Comparisons with SOTAs

Note that, for validation on the 'val' set, you just have to train 30k on the 'trainaug' set. But for test on the evaluation server, you should first pretrain on COCO, and then 30k on 'trainaug', and another 30k on the 'trainval' set.

Tab 2. Comparisons on the PASCAL VOC test dataset.

Method Backbone mIoU(%)
GCN ResNet-152 83.6
RefineNet ResNet-152 84.2
Wide ResNet WideResNet-38 84.9
PSPNet ResNet-101 85.4
DeeplabV3 ResNet-101 85.7
PSANet ResNet-101 85.7
EncNet ResNet-101 85.9
DFN ResNet-101 86.2
Exfuse ResNet-101 86.2
IDW-CNN ResNet-101 86.3
SDN DenseNet-161 86.6
DIS ResNet-101 86.8
EMANet101 ResNet-101 87.7
DeeplabV3+ Xception-65 87.8
Exfuse ResNeXt-131 87.9
MSCI ResNet-152 88.0
EMANet152 ResNet-152 88.2

Code Borrowed From

RESCAN

Pytorch-Encoding

Synchronized-BN

Pytorch Implementation of Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations

NANSY: Unofficial Pytorch Implementation of Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations Notice Papers' D

Dongho Choi 최동호 104 Dec 23, 2022
blind SQLIpy sebuah alat injeksi sql yang menggunakan waktu sql untuk mendapatkan sebuah server database.

blind SQLIpy Alat blind SQLIpy ini merupakan alat injeksi sql yang menggunakan metode time based blind sql injection metode tersebut membutuhkan waktu

Galih Anggoro Prasetya 4 Feb 24, 2022
TextureGAN in Pytorch

TextureGAN This code is our PyTorch implementation of TextureGAN [Project] [Arxiv] TextureGAN is a generative adversarial network conditioned on sketc

Patsorn 147 Dec 14, 2022
Implementation of Hierarchical Transformer Memory (HTM) for Pytorch

Hierarchical Transformer Memory (HTM) - Pytorch Implementation of Hierarchical Transformer Memory (HTM) for Pytorch. This Deepmind paper proposes a si

Phil Wang 63 Dec 29, 2022
Improved Fitness Optimization Landscapes for Sequence Design

ReLSO Improved Fitness Optimization Landscapes for Sequence Design Description Citation How to run Training models Original data source Description In

Krishnaswamy Lab 44 Dec 20, 2022
TensorFlow ROCm port

Documentation TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, a

ROCm Software Platform 622 Jan 09, 2023
Official PyTorch code for the paper: "Point-Based Modeling of Human Clothing" (ICCV 2021)

Point-Based Modeling of Human Clothing Paper | Project page | Video This is an official PyTorch code repository of the paper "Point-Based Modeling of

Visual Understanding Lab @ Samsung AI Center Moscow 64 Nov 22, 2022
Code for Multimodal Neural SLAM for Interactive Instruction Following

Code for Multimodal Neural SLAM for Interactive Instruction Following Code structure The code is adapted from E.T. and most training as well as data p

7 Dec 07, 2022
PyTorch implementation of Trust Region Policy Optimization

PyTorch implementation of TRPO Try my implementation of PPO (aka newer better variant of TRPO), unless you need to you TRPO for some specific reasons.

Ilya Kostrikov 366 Nov 15, 2022
Time-series-deep-learning - Developing Deep learning LSTM, BiLSTM models, and NeuralProphet for multi-step time-series forecasting of stock price.

Stock Price Prediction Using Deep Learning Univariate Time Series Predicting stock price using historical data of a company using Neural networks for

Abdultawwab Safarji 7 Nov 27, 2022
Repository relating to the CVPR21 paper TimeLens: Event-based Video Frame Interpolation

TimeLens: Event-based Video Frame Interpolation This repository is about the High Speed Event and RGB (HS-ERGB) dataset, used in the 2021 CVPR paper T

Robotics and Perception Group 544 Dec 19, 2022
Training Very Deep Neural Networks Without Skip-Connections

DiracNets v2 update (January 2018): The code was updated for DiracNets-v2 in which we removed NCReLU by adding per-channel a and b multipliers without

Sergey Zagoruyko 585 Oct 12, 2022
A computer vision pipeline to identify the "icons" in Christian paintings

Christian-Iconography A computer vision pipeline to identify the "icons" in Christian paintings. A bit about iconography. Iconography is related to id

Rishab Mudliar 3 Jul 30, 2022
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Semantic Segmentation.

Swin Transformer for Semantic Segmentation of satellite images This repo contains the supported code and configuration files to reproduce semantic seg

23 Oct 10, 2022
68 keypoint annotations for COFW test data

68 keypoint annotations for COFW test data This repository contains manually annotated 68 keypoints for COFW test data (original annotation of CFOW da

31 Dec 06, 2022
Simple-System-Convert--C--F - Simple System Convert With Python

Simple-System-Convert--C--F REQUIREMENTS Python version : 3 HOW TO USE Run the c

Jonathan Santos 2 Feb 16, 2022
Spatio-Temporal Entropy Model (STEM) for end-to-end leaned video compression.

Spatio-Temporal Entropy Model A Pytorch Reproduction of Spatio-Temporal Entropy Model (STEM) for end-to-end leaned video compression. More details can

16 Nov 28, 2022
Official implementation for ICDAR 2021 paper "Handwritten Mathematical Expression Recognition with Bidirectionally Trained Transformer"

Handwritten Mathematical Expression Recognition with Bidirectionally Trained Transformer Description Convert offline handwritten mathematical expressi

Wenqi Zhao 87 Dec 27, 2022
Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

Annoy Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings to search for points in space that are close to a given quer

Spotify 10.6k Jan 04, 2023
Curated list of awesome GAN applications and demo

gans-awesome-applications Curated list of awesome GAN applications and demonstrations. Note: General GAN papers targeting simple image generation such

Minchul Shin 4.5k Jan 07, 2023