Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation, CVPR 2018

Last update: Dec 15, 2022

Overview

Learning Pixel-level Semantic Affinity with Image-level Supervision

This code is deprecated. Please see https://github.com/jiwoon-ahn/irn instead.

Introduction

The code and trained models of:

Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation, Jiwoon Ahn and Suha Kwak, CVPR 2018 [Paper]

We have developed a framework based on AffinityNet that generates accurate segmentation labels for training images given only their image-level class labels. A segmentation network trained with our synthesized labels outperforms previous state-of-the-art methods by large margins on the PASCAL VOC 2012 benchmark.

*Our code was first implemented in TensorFlow at the time of the CVPR 2018 submission, and we later migrated it to PyTorch. Some minor details (optimizer, channel size, etc.) have been changed.

Citation

If you find the code useful, please consider citing our paper using the following BibTeX entry.

@InProceedings{Ahn_2018_CVPR,
author = {Ahn, Jiwoon and Kwak, Suha},
title = {Learning Pixel-Level Semantic Affinity With Image-Level Supervision for Weakly Supervised Semantic Segmentation},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}
}

Prerequisite

Tested on Ubuntu 16.04, with Python 3.5, PyTorch 0.4, Torchvision 0.2.1, CUDA 9.0, and 1x NVIDIA TITAN X (Pascal).
The PASCAL VOC 2012 development kit. You also need to specify the path ('voc12_root') of your downloaded dev kit.
(Optional) To try the VGG-16 based network, you need PyCaffe and the VGG-16 ImageNet pretrained weights [vgg16_20M.caffemodel].
(Optional) To try the ResNet-38 based network, you need MXNet and the ResNet-38 pretrained weights [ilsvrc-cls_rna-a1_cls1000_ep-0001.params].

Usage

1. Train a classification network to get CAMs.

python3 train_cls.py --lr 0.1 --batch_size 16 --max_epoches 15 --crop_size 448 --network [network.vgg16_cls | network.resnet38_cls] --voc12_root [your_voc12_root_folder] --weights [your_weights_file] --wt_dec 5e-4

2. Generate labels for AffinityNet by applying dCRF on CAMs.

python3 infer_cls.py --infer_list voc12/train_aug.txt --voc12_root [your_voc12_root_folder] --network [network.vgg16_cls | network.resnet38_cls] --weights [your_weights_file] --out_cam [desired_folder] --out_la_crf [desired_folder] --out_ha_crf [desired_folder]
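As a rough illustration of how a CAM can be turned into a label map before dCRF refinement, the sketch below assumes the background score takes the form (1 - max_c M_c)^alpha described in the paper, with CAMs already normalized to [0, 1]. The function name `cam_to_labels` is hypothetical and is not part of this repository, and the dCRF step itself is omitted.

```python
import numpy as np

def cam_to_labels(cam, alpha):
    """Convert normalized CAMs to a label map (hypothetical helper).

    cam:   array of shape (C, H, W) with values in [0, 1], one map per class.
    alpha: exponent controlling how strongly the background score is damped.
    Returns an (H, W) integer map where 0 is background and k is class k-1.
    """
    # Assumed background score: (1 - max_c M_c)^alpha.
    bg = np.power(1.0 - cam.max(axis=0, keepdims=True), alpha)
    # Stack background in front of the class maps and take the best score.
    scores = np.concatenate([bg, cam], axis=0)
    return scores.argmax(axis=0)

# One class, two pixels: a confident pixel (0.9) and a weak one (0.2).
cam = np.array([[[0.9, 0.2]]])
low = cam_to_labels(cam, alpha=4)    # weak pixel falls to background
high = cam_to_labels(cam, alpha=32)  # weak pixel stays foreground
```

Under this assumption, a low alpha keeps only confident foreground, while a high alpha keeps only confident background.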

(Optional) Check the accuracy of CAMs.

python3 infer_cls.py --infer_list voc12/val.txt --voc12_root [your_voc12_root_folder] --network network.resnet38_cls --weights res38_cls.pth --out_cam_pred [desired_folder]
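For reference, mIoU over a set of predictions can be computed from a confusion matrix. The following is a minimal sketch and not the evaluation script used for the reported numbers. It assumes 21 classes (20 object classes plus background), an ignore label of 255 as in the PASCAL VOC annotations, and averages only over classes that actually occur. The function name `mean_iou` is hypothetical.

```python
import numpy as np

def mean_iou(preds, gts, num_classes=21, ignore_label=255):
    """Mean intersection-over-union over a list of (H, W) label maps."""
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    for pred, gt in zip(preds, gts):
        valid = gt != ignore_label  # skip pixels marked as "ignore"
        idx = gt[valid].astype(np.int64) * num_classes + pred[valid].astype(np.int64)
        conf += np.bincount(idx, minlength=num_classes ** 2).reshape(
            num_classes, num_classes
        )
    inter = np.diag(conf)                           # correctly labeled pixels
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    present = union > 0                             # classes that occur at all
    return float((inter[present] / union[present]).mean())

pred = np.array([[0, 1], [1, 1]])
gt = np.array([[0, 1], [0, 255]])
score = mean_iou([pred], [gt], num_classes=2)
```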

3. Train AffinityNet with the labels.

python3 train_aff.py --lr 0.1 --batch_size 8 --max_epoches 8 --crop_size 448 --voc12_root [your_voc12_root_folder] --network [network.vgg16_aff | network.resnet38_aff] --weights [your_weights_file] --wt_dec 5e-4 --la_crf_dir [your_output_folder] --ha_crf_dir [your_output_folder]

4. Perform Random Walks on CAMs.

python3 infer_aff.py --infer_list [voc12/val.txt | voc12/train.txt] --voc12_root [your_voc12_root_folder] --network [network.vgg16_aff | network.resnet38_aff] --weights [your_weights_file] --cam_dir [your_output_folder] --out_rw [desired_folder]
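The random walk in step 4 can be pictured with the small NumPy sketch below. It assumes the form described in the paper: affinities are raised element-wise to the power beta, row-normalized into a transition matrix, and applied t times to the flattened CAMs. The function name `random_walk` and its arguments are hypothetical, and this is not the code in `infer_aff.py`. The matrix power is computed by repeated squaring, so `log_t=8` corresponds to t=256.

```python
import numpy as np

def random_walk(cam, affinity, beta=8, log_t=8):
    """Propagate CAM scores with a random walk (hypothetical helper).

    cam:      (C, N) class scores over N flattened pixels.
    affinity: (N, N) non-negative affinities, each row having a positive sum.
    Returns the (C, N) scores after t = 2**log_t steps.
    """
    w = np.power(affinity, beta)               # element-wise (Hadamard) power
    trans = w / w.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    for _ in range(log_t):                     # repeated squaring: T^(2**log_t)
        trans = trans @ trans
    return cam @ trans.T                       # row c equals T^t applied to cam[c]

# Two fully connected pixels end up sharing the activation equally.
cam = np.array([[1.0, 0.0]])
smoothed = random_walk(cam, np.ones((2, 2)))
# With no affinity between pixels, the CAM is left unchanged.
unchanged = random_walk(cam, np.eye(2))
```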

Results and Trained Models

Class Activation Map

Model	Train (mIoU)	Val (mIoU)	Weights / Note
VGG-16	48.9	46.6	[Weights]
ResNet-38	47.7	47.2	[Weights]
ResNet-38	48.0	46.8	CVPR submission

Random Walk with AffinityNet

Model	alpha	Train (mIoU)	Val (mIoU)	Weights / Note
VGG-16	4/16/32	59.6	54.0	[Weights]
ResNet-38	4/16/32	61.0	60.2	[Weights]
ResNet-38	4/16/24	58.1	57.0	CVPR submission

*beta=8, gamma=5, t=256 for all settings


Owner

Jiwoon Ahn
