[ACM MM 2021] Multiview Detection with Shadow Transformer (and View-Coherent Data Augmentation)

Last update: Dec 13, 2022

Multiview Detection with Shadow Transformer (and View-Coherent Data Augmentation) [arXiv] [paper]

@inproceedings{hou2021multiview,
  title={Multiview Detection with Shadow Transformer (and View-Coherent Data Augmentation)},
  author={Hou, Yunzhong and Zheng, Liang},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia (MM ’21)},
  year={2021}
}

Overview

We release the PyTorch code for MVDeTr, a state-of-the-art multiview pedestrian detector. Its performance gains come from the transformer architecture, updated loss terms, and view-coherent data augmentations. MVDeTr is also efficient and can be trained on a single RTX 2080 Ti. This repo also includes a simplified version of MVDet, which likewise runs on a single RTX 2080 Ti.

MVDeTr Code

This repo is dedicated to the code for MVDeTr.

Dependencies

This code uses the following libraries:

python
pytorch & torchvision
numpy
matplotlib
pillow
opencv-python
kornia
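
Before installing anything, it can help to see which of these libraries are already importable. The snippet below is a minimal sketch, not part of this repo: the helper name `missing_packages` is hypothetical, and the mapping from package names to import names (for example, pillow is imported as `PIL` and opencv-python as `cv2`) is written out by hand.

```python
from importlib.util import find_spec

# Package name (as listed above) -> module name used in `import ...`.
REQUIRED_MODULES = {
    "pytorch": "torch",
    "torchvision": "torchvision",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "pillow": "PIL",
    "opencv-python": "cv2",
    "kornia": "kornia",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the package names whose import module cannot be found."""
    return [pkg for pkg, mod in modules.items() if find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

This only checks that the modules can be located; it does not verify versions or GPU support.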

Data Preparation

By default, all datasets are in ~/Data/. We use MultiviewX and Wildtrack in this project.

Your ~/Data/ folder should look like this:

Data
├── MultiviewX/
│   └── ...
└── Wildtrack/ 
    └── ...
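
A quick way to confirm this layout before training is sketched below. It is an illustrative helper, not code from this repo: the function name `missing_datasets` is hypothetical, and it only checks that the two top-level folders exist, not that their contents are complete.

```python
from pathlib import Path

# Top-level dataset folders expected under the data root.
EXPECTED_DATASETS = ("MultiviewX", "Wildtrack")


def missing_datasets(data_root="~/Data"):
    """Return the expected dataset folders that are absent under data_root."""
    root = Path(data_root).expanduser()
    return [name for name in EXPECTED_DATASETS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_datasets()
    if missing:
        print("Missing dataset folders:", ", ".join(missing))
    else:
        print("Both dataset folders were found.")
```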

Code Preparation

Before running the code, go to multiview_detector/models/ops and run bash make.sh to build the deformable transformer ops (forked from Deformable DETR).

Training

To train the detectors, run the following:

python main.py -d wildtrack
python main.py -d multiviewx

Each command should automatically return evaluation results close to the reported 91.5% MODA on the Wildtrack dataset and 93.7% MODA on the MultiviewX dataset.
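
For readers unfamiliar with the metric, MODA (multiple object detection accuracy) penalizes both missed pedestrians and false positives relative to the number of ground-truth pedestrians: MODA = 1 - (FP + FN) / N_gt, with unit costs for both error types. The sketch below only illustrates this arithmetic. The function name `moda` is hypothetical, the counts are made up, and the matching of detections to ground truth that produces FP and FN is not shown.

```python
def moda(false_positives, false_negatives, num_ground_truth):
    """MODA = 1 - (FP + FN) / N_gt, summed over all evaluated frames."""
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    return 1.0 - (false_positives + false_negatives) / num_ground_truth


if __name__ == "__main__":
    # Made-up counts for illustration only (not results from this repo).
    score = moda(false_positives=30, false_negatives=50, num_ground_truth=1000)
    print(f"MODA: {score:.1%}")  # prints "MODA: 92.0%"
```

Because both error types are subtracted, MODA can become negative when a detector produces more errors than there are ground-truth pedestrians.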

Architectures

This repo supports multiple architecture variants. For MVDeTr, please specify --world_feat deform_trans; for a similar fully convolutional architecture like MVDet, please specify --world_feat conv.
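
When launching several runs, the training command can be assembled programmatically. The following is a small sketch under stated assumptions: the helper `build_train_command` is hypothetical, and it accepts only the dataset names and --world_feat values mentioned in this README (the script may support others).

```python
# Values taken from this README; main.py may accept more options.
DATASETS = ("wildtrack", "multiviewx")
WORLD_FEATS = ("deform_trans", "conv")


def build_train_command(dataset, world_feat="deform_trans"):
    """Build the argument list for one training run."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if world_feat not in WORLD_FEATS:
        raise ValueError(f"unknown world_feat: {world_feat!r}")
    return ["python", "main.py", "-d", dataset, "--world_feat", world_feat]


if __name__ == "__main__":
    cmd = build_train_command("wildtrack", "deform_trans")
    print(" ".join(cmd))  # python main.py -d wildtrack --world_feat deform_trans
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.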

Loss terms

This repo supports multiple loss terms. For the focal loss variant as in MVDeTr, please specify --use_mse 0; for the MSE loss as in MVDet, please specify --use_mse 1.
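
To make the difference between the two options concrete, the sketch below compares a plain MSE loss with a heatmap focal loss on a tiny flattened heatmap. It is an illustration only, written in pure Python rather than PyTorch. It assumes the focal loss is the CenterNet-style penalty-reduced variant for Gaussian target heatmaps (with alpha = 2 and beta = 4), which may differ from the implementation in this repo. The function names and the example values are hypothetical.

```python
import math


def mse_loss(pred, target):
    """Mean squared error over all heatmap cells."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def heatmap_focal_loss(pred, target, alpha=2.0, beta=4.0, eps=1e-6):
    """CenterNet-style focal loss (assumed variant), normalized by #positives.

    Cells with target == 1 are positives. All other cells are negatives whose
    penalty is reduced by (1 - target) ** beta near a Gaussian peak.
    """
    pos_loss, neg_loss, num_pos = 0.0, 0.0, 0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        if t == 1.0:
            pos_loss += (1.0 - p) ** alpha * math.log(p)
            num_pos += 1
        else:
            neg_loss += (1.0 - t) ** beta * p ** alpha * math.log(1.0 - p)
    return -(pos_loss + neg_loss) / max(num_pos, 1)


if __name__ == "__main__":
    # Made-up 1D slice of a ground-plane heatmap with one pedestrian peak.
    target = [0.0, 0.5, 1.0, 0.5, 0.0]
    good = [0.01, 0.40, 0.95, 0.40, 0.01]
    bad = [0.50, 0.50, 0.20, 0.50, 0.50]
    for name, pred in (("good", good), ("bad", bad)):
        print(name, mse_loss(pred, target), heatmap_focal_loss(pred, target))
```

Under this assumed formulation, the focal loss down-weights easy cells and concentrates the gradient on the peak and on confident false positives, whereas MSE weights every cell equally.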

Augmentations

This repo includes support for view-coherent data augmentation, which applies affine transformations to the per-view inputs and then applies the inverse transformations to the per-view feature maps to maintain multiview coherency.
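
The idea of undoing the augmentation can be illustrated at the level of pixel coordinates. The sketch below is not the repo's implementation (which operates on images and feature maps); it only shows, under that simplification, that applying an affine transform and then its inverse returns a point to its original location, so the original camera calibration remains valid for projection to the ground plane. All function names and parameter values are hypothetical.

```python
import math


def make_affine(scale, angle_deg, tx, ty):
    """2x3 affine matrix: rotation by angle_deg, uniform scale, translation."""
    c = scale * math.cos(math.radians(angle_deg))
    s = scale * math.sin(math.radians(angle_deg))
    return [[c, -s, tx], [s, c, ty]]


def apply_affine(m, point):
    """Map an (x, y) point through a 2x3 affine matrix."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


def invert_affine(m):
    """Inverse of a 2x3 affine matrix (raises if it is singular)."""
    (a, b, tx), (c, d, ty) = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("affine matrix is not invertible")
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    return [[ia, ib, -(ia * tx + ib * ty)],
            [ic, id_, -(ic * tx + id_ * ty)]]


if __name__ == "__main__":
    augment = make_affine(scale=1.2, angle_deg=10.0, tx=15.0, ty=-8.0)
    foot_point = (100.0, 50.0)  # made-up pixel location in one camera view
    moved = apply_affine(augment, foot_point)
    restored = apply_affine(invert_affine(augment), moved)
    print("augmented:", moved)
    print("restored: ", restored)
```

In practice each view would draw its own random transform, and the inverse would be applied per view before the features are projected to the shared ground plane.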

Pre-trained models

You can download the checkpoints at this link.

Owner

Yunzhong Hou
