Official implementation of TMANet.

Related tags

Deep LearningTMANet
Overview

Temporal Memory Attention for Video Semantic Segmentation, arxiv

PWC PWC

Introduction

We propose a Temporal Memory Attention Network (TMANet) to adaptively integrate the long-range temporal relations over the video sequence based on the self-attention mechanism without exhaustive optical flow prediction. Our method achieves new state-of-the-art performances on two challenging video semantic segmentation datasets, particularly 80.3% mIoU on Cityscapes and 76.5% mIoU on CamVid with ResNet-50. (Accepted by ICIP2021)

If this codebase is helpful for you, please consider give me a star ⭐ 😊 .

image

Updates

2021/1: TMANet training and evaluation code released.

2021/6: Update README.md:

  • adding some Camvid dataset download links;
  • update 'camvid_video_process.py' script.

Usage

  • Install mmseg

    • Please refer to mmsegmentation to get installation guide.
    • This repository is based on mmseg-0.7.0 and pytorch 1.6.0.
  • Clone the repository

    git clone https://github.com/wanghao9610/TMANet.git
    cd TMANet
    pip install -e .
  • Prepare the datasets

    • Download Cityscapes dataset and Camvid dataset.

    • For Camvid dataset, we need to extract frames from downloaded videos according to the following steps:

      • Download the raw video from here, in which I provide a google drive link to download.
      • Put the downloaded raw video(e.g. 0016E5.MXF, 0006R0.MXF, 0005VD.MXF, 01TP_extract.avi) to ./data/camvid/raw .
      • Download the extracted images and labels from here and split.txt file from here, untar the tar.gz file to ./data/camvid , and we will get two subdirs "./data/camvid/images" (stores the images with annotations), and "./data/camvid/labels" (stores the ground truth for semantic segmentation). Reference the following shell command:
        cd TMANet
        cd ./data/camvid
        wget https://drive.google.com/file/d/1FcVdteDSx0iJfQYX2bxov0w_j-6J7plz/view?usp=sharing
        # or first download on your PC then upload to your server.
        tar -xf camvid.tar.gz 
      • Generate image_sequence dir frame by frame from the raw videos. Reference the following shell command:
        cd TMANet
        python tools/convert_datasets/camvid_video_process.py
    • For Cityscapes dataset, we need to request the download link of 'leftImg8bit_sequence_trainvaltest.zip' from Cityscapes dataset official webpage.

    • The converted/downloaded datasets store on ./data/camvid and ./data/cityscapes path.

      File structure of video semantic segmentation dataset is as followed.

      β”œβ”€β”€ data                                              β”œβ”€β”€ data                              
      β”‚   β”œβ”€β”€ cityscapes                                    β”‚   β”œβ”€β”€ camvid                        
      β”‚   β”‚   β”œβ”€β”€ gtFine                                    β”‚   β”‚   β”œβ”€β”€ images                    
      β”‚   β”‚   β”‚   β”œβ”€β”€ train                                 β”‚   β”‚   β”‚   β”œβ”€β”€ xxx{img_suffix}       
      β”‚   β”‚   β”‚   β”‚   β”œβ”€β”€ xxx{img_suffix}                   β”‚   β”‚   β”‚   β”œβ”€β”€ yyy{img_suffix}       
      β”‚   β”‚   β”‚   β”‚   β”œβ”€β”€ yyy{img_suffix}                   β”‚   β”‚   β”‚   β”œβ”€β”€ zzz{img_suffix}       
      β”‚   β”‚   β”‚   β”‚   β”œβ”€β”€ zzz{img_suffix}                   β”‚   β”‚   β”œβ”€β”€ annotations               
      β”‚   β”‚   β”‚   β”œβ”€β”€ val                                   β”‚   β”‚   β”‚   β”œβ”€β”€ train.txt             
      β”‚   β”‚   β”œβ”€β”€ leftImg8bit                               β”‚   β”‚   β”‚   β”œβ”€β”€ val.txt               
      β”‚   β”‚   β”‚   β”œβ”€β”€ train                                 β”‚   β”‚   β”‚   β”œβ”€β”€ test.txt              
      β”‚   β”‚   β”‚   β”‚   β”œβ”€β”€ xxx{seg_map_suffix}               β”‚   β”‚   β”œβ”€β”€ labels                    
      β”‚   β”‚   β”‚   β”‚   β”œβ”€β”€ yyy{seg_map_suffix}               β”‚   β”‚   β”‚   β”œβ”€β”€ xxx{seg_map_suffix}   
      β”‚   β”‚   β”‚   β”‚   β”œβ”€β”€ zzz{seg_map_suffix}               β”‚   β”‚   β”‚   β”œβ”€β”€ yyy{seg_map_suffix}   
      β”‚   β”‚   β”‚   β”œβ”€β”€ val                                   β”‚   β”‚   β”‚   β”œβ”€β”€ zzz{seg_map_suffix}   
      β”‚   β”‚   β”œβ”€β”€ leftImg8bit_sequence                      β”‚   β”‚   β”œβ”€β”€ image_sequence            
      β”‚   β”‚   β”‚   β”œβ”€β”€ train                                 β”‚   β”‚   β”‚   β”œβ”€β”€ xxx{sequence_suffix}  
      β”‚   β”‚   β”‚   β”‚   β”œβ”€β”€ xxx{sequence_suffix}              β”‚   β”‚   β”‚   β”œβ”€β”€ yyy{sequence_suffix}  
      β”‚   β”‚   β”‚   β”‚   β”œβ”€β”€ yyy{sequence_suffix}              β”‚   β”‚   β”‚   β”œβ”€β”€ zzz{sequence_suffix}  
      β”‚   β”‚   β”‚   β”‚   β”œβ”€β”€ zzz{sequence_suffix}              
      β”‚   β”‚   β”‚   β”œβ”€β”€ val                                   
      
  • Evaluation

    • Download the trained models for Cityscapes and Camvid. And put them on ./work_dirs/{config_file}
    • Run the following command(on Cityscapes):
    sh eval.sh configs/video/cityscapes/tmanet_r50-d8_769x769_80k_cityscapes_video.py
  • Training

    • Please download the pretrained ResNet-50 model, and put it on ./init_models .
    • Run the following command(on Cityscapes):
    sh train.sh configs/video/cityscapes/tmanet_r50-d8_769x769_80k_cityscapes_video.py

    Note: the above evaluation and training shell commands execute on Cityscapes, if you want to execute evaluation or training on Camvid, please replace the config file on the shell command with the config file of Camvid.

Citation

If you find TMANet is useful in your research, please consider citing:

@misc{wang2021temporal,
    title={Temporal Memory Attention for Video Semantic Segmentation}, 
    author={Hao Wang and Weining Wang and Jing Liu},
    year={2021},
    eprint={2102.08643},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Acknowledgement

Thanks mmsegmentation contribution to the community!

Owner
wanghao
wanghao
Code for: https://berkeleyautomation.github.io/bags/

DeformableRavens Code for the paper Learning to Rearrange Deformable Cables, Fabrics, and Bags with Goal-Conditioned Transporter Networks. Here is the

Daniel Seita 121 Dec 30, 2022
Official implementation for "Low-light Image Enhancement via Breaking Down the Darkness"

Low-light Image Enhancement via Breaking Down the Darkness by Qiming Hu, Xiaojie Guo. 1. Dependencies Python3 PyTorch=1.0 OpenCV-Python, TensorboardX

Qiming Hu 30 Jan 01, 2023
Deep universal probabilistic programming with Python and PyTorch

Getting Started | Documentation | Community | Contributing Pyro is a flexible, scalable deep probabilistic programming library built on PyTorch. Notab

7.7k Dec 30, 2022
Fre-GAN: Adversarial Frequency-consistent Audio Synthesis

Fre-GAN Vocoder Fre-GAN: Adversarial Frequency-consistent Audio Synthesis Training: python train.py --config config.json Citation: @misc{kim2021frega

Rishikesh (ΰ€‹ΰ€·ΰ€Ώΰ€•ΰ₯‡ΰ€Ά) 93 Dec 17, 2022
ICLR2021 (Under Review)

Self-Supervised Time Series Representation Learning by Inter-Intra Relational Reasoning This repository contains the official PyTorch implementation o

Haoyi Fan 58 Dec 30, 2022
Full Resolution Residual Networks for Semantic Image Segmentation

Full-Resolution Residual Networks (FRRN) This repository contains code to train and qualitatively evaluate Full-Resolution Residual Networks (FRRNs) a

Toby Pohlen 274 Oct 27, 2022
How to Become More Salient? Surfacing Representation Biases of the Saliency Prediction Model

How to Become More Salient? Surfacing Representation Biases of the Saliency Prediction Model

Bogdan Kulynych 49 Nov 05, 2022
A simple implementation of Kalman filter in single object tracking

kalman-filter-in-single-object-tracking A simple implementation of Kalman filter in single object tracking https://www.bilibili.com/video/BV1Qf4y1J7D4

130 Dec 26, 2022
GNEE - GAT Neural Event Embeddings

GNEE - GAT Neural Event Embeddings This repository contains source code for the GNEE (GAT Neural Event Embeddings) method introduced in the paper: "Se

JoΓ£o Pedro Rodrigues Mattos 0 Sep 15, 2021
Semi-Supervised Semantic Segmentation via Adaptive Equalization Learning, NeurIPS 2021 (Spotlight)

Semi-Supervised Semantic Segmentation via Adaptive Equalization Learning, NeurIPS 2021 (Spotlight) Abstract Due to the limited and even imbalanced dat

Hanzhe Hu 99 Dec 12, 2022
This is the official code for the paper "Ad2Attack: Adaptive Adversarial Attack for Real-Time UAV Tracking".

Ad^2Attack:Adaptive Adversarial Attack on Real-Time UAV Tracking Demo video πŸ“Ή Our video on bilibili demonstrates the test results of Ad^2Attack on se

Intelligent Vision for Robotics in Complex Environment 10 Nov 07, 2022
[NeurIPS 2021] "Drawing Robust Scratch Tickets: Subnetworks with Inborn Robustness Are Found within Randomly Initialized Networks" by Yonggan Fu, Qixuan Yu, Yang Zhang, Shang Wu, Xu Ouyang, David Cox, Yingyan Lin

Drawing Robust Scratch Tickets: Subnetworks with Inborn Robustness Are Found within Randomly Initialized Networks Yonggan Fu, Qixuan Yu, Yang Zhang, S

12 Dec 11, 2022
Official Pytorch Implementation of 3DV2021 paper: SAFA: Structure Aware Face Animation.

SAFA: Structure Aware Face Animation (3DV2021) Official Pytorch Implementation of 3DV2021 paper: SAFA: Structure Aware Face Animation. Getting Started

QiulinW 122 Dec 23, 2022
Discord-Protect is a simple discord bot allowing you to have some security on your discord server by ordering a captcha to the user who joins your server.

Discord-Protect Discord-Protect is a simple discord bot allowing you to have some security on your discord server by ordering a captcha to the user wh

Tir Omar 2 Oct 28, 2021
Tracking Progress in Question Answering over Knowledge Graphs

Tracking Progress in Question Answering over Knowledge Graphs Table of contents Question Answering Systems with Descriptions The QA Systems Table cont

Knowledge Graph Question Answering 47 Jan 02, 2023
A dual benchmarking study of visual forgery and visual forensics techniques

A dual benchmarking study of facial forgery and facial forensics In recent years, visual forgery has reached a level of sophistication that humans can

8 Jul 06, 2022
DecoupledNet is semantic segmentation system which using heterogeneous annotations

DecoupledNet: Decoupled Deep Neural Network for Semi-supervised Semantic Segmentation Created by Seunghoon Hong, Hyeonwoo Noh and Bohyung Han at POSTE

Hyeonwoo Noh 74 Sep 22, 2021
PCAM: Product of Cross-Attention Matrices for Rigid Registration of Point Clouds

PCAM: Product of Cross-Attention Matrices for Rigid Registration of Point Clouds PCAM: Product of Cross-Attention Matrices for Rigid Registration of P

valeo.ai 24 May 31, 2022
The official code for paper "R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Modeling".

R2D2 This is the official code for paper titled "R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Mode

Alipay 49 Dec 17, 2022
TResNet: High Performance GPU-Dedicated Architecture

TResNet: High Performance GPU-Dedicated Architecture paperV2 | pretrained models Official PyTorch Implementation Tal Ridnik, Hussam Lawen, Asaf Noy, I

426 Dec 28, 2022