Drone-based Joint Density Map Estimation, Localization and Tracking with Space-Time Multi-Scale Attention Network

Last update: Nov 16, 2022

DroneCrowd

Paper: Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark.

Introduction

This paper proposes a space-time multi-scale attention network (STANet) for density map estimation, localization, and tracking in dense crowds, using video clips captured by drones with arbitrary crowd density, perspective, and flight altitude. STANet aggregates multi-scale feature maps across sequential frames to exploit temporal coherency, and then simultaneously predicts the density maps, localizes the targets, and associates them in crowds. A coarse-to-fine process gradually applies the attention module to the aggregated multi-scale feature maps, pushing the network to exploit discriminative space-time features for better performance. The whole network is trained end to end with a multi-task loss formed by three terms: the density map loss, the localization loss, and the association loss. Non-maximum suppression followed by a min-cost flow framework is used to generate the trajectories of targets.

Since existing crowd counting datasets focus on crowd counting with static cameras rather than density map estimation, counting, and tracking in crowds from drones, we have collected a new large-scale drone-based dataset, DroneCrowd, formed by 112 video clips with 33,600 high-resolution frames (1920x1080) captured in 70 different scenarios. With an intensive amount of effort, our dataset provides 20,800 people trajectories with 4.8 million head annotations and several video-level attributes for the sequences. Extensive experiments on two challenging public datasets, Shanghaitech and UCF-QNRF, and on our DroneCrowd demonstrate that STANet performs favorably against state-of-the-art methods.
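The non-maximum suppression step mentioned above can be illustrated with a minimal sketch. It is not the STANet implementation: the function name `local_maxima_nms`, the `threshold` and `radius` parameters, and the toy score map are all assumptions made for illustration. The sketch keeps a cell as a head candidate only if its score passes the threshold and no neighbor in the surrounding window scores higher.

```python
from typing import List, Tuple


def local_maxima_nms(
    score_map: List[List[float]],
    threshold: float = 0.5,   # hypothetical confidence cutoff
    radius: int = 1,          # hypothetical suppression window half-size
) -> List[Tuple[int, int, float]]:
    """Return (row, col, score) for every cell that scores at least
    `threshold` and has no strictly larger score inside its
    (2 * radius + 1) x (2 * radius + 1) neighborhood.
    Exact ties between neighbors are all kept."""
    rows = len(score_map)
    cols = len(score_map[0]) if rows else 0
    peaks = []
    for r in range(rows):
        for c in range(cols):
            score = score_map[r][c]
            if score < threshold:
                continue
            # Compare against every in-bounds neighbor in the window.
            window = (
                score_map[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
                if (rr, cc) != (r, c)
            )
            if all(neighbor <= score for neighbor in window):
                peaks.append((r, c, score))
    return peaks


# Toy localization map (made-up values, not model output).
toy_map = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.0],
    [0.1, 0.3, 0.2, 0.6],
    [0.0, 0.0, 0.4, 0.7],
]
print(local_maxima_nms(toy_map))  # [(1, 1, 0.9), (3, 3, 0.7)]
```

In a tracking pipeline, the surviving peaks from each frame would then be the nodes that a min-cost flow solver links into trajectories. That association step is not shown here.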

Dataset

ECCV2020 Challenge

The VisDrone 2020 Crowd Counting Challenge requires participating algorithms to count the persons in each frame. The challenge provides 112 challenging sequences: 82 video sequences for training (2,420 frames in total) and 30 sequences for testing (900 frames in total), which are available on the download page. Persons are manually annotated with points in each video frame.
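Point annotations are commonly turned into a density map whose sum equals the person count. The sketch below shows that idea under stated assumptions: the fixed Gaussian width `sigma`, the integer `(x, y)` pixel format for points, and the function names are illustrative choices, not the preprocessing used by the challenge or by the released code.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_kernel(sigma: float, radius: int) -> List[List[float]]:
    """Build a (2 * radius + 1)^2 Gaussian kernel normalized to sum to 1."""
    kernel = [
        [math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
         for dx in range(-radius, radius + 1)]
        for dy in range(-radius, radius + 1)
    ]
    total = sum(sum(row) for row in kernel)
    return [[value / total for value in row] for row in kernel]


def density_map(
    points: Sequence[Tuple[int, int]],  # assumed (x, y) integer pixel coordinates
    height: int,
    width: int,
    sigma: float = 2.0,                 # hypothetical kernel width
) -> List[List[float]]:
    """Spread one unit of mass around each annotated point."""
    radius = int(3 * sigma)
    kernel = gaussian_kernel(sigma, radius)
    dmap = [[0.0] * width for _ in range(height)]
    for x, y in points:
        # Gather the in-bounds part of the kernel around this point.
        cells = []
        for ky in range(-radius, radius + 1):
            for kx in range(-radius, radius + 1):
                py, px = y + ky, x + kx
                if 0 <= py < height and 0 <= px < width:
                    cells.append((py, px, kernel[ky + radius][kx + radius]))
        if not cells:
            continue  # point lies too far outside the frame to contribute
        # Renormalize so a point near the border still contributes exactly 1.
        mass = sum(weight for _, _, weight in cells)
        for py, px, weight in cells:
            dmap[py][px] += weight / mass
    return dmap


# Three made-up head positions in a small 32x48 frame.
heads = [(5, 5), (20, 10), (47, 31)]
dmap = density_map(heads, height=32, width=48)
estimated_count = sum(sum(row) for row in dmap)
print(round(estimated_count, 6))  # 3.0
```

Summing the map recovers the number of annotated persons, which is why a network can be trained to regress the map and still report a per-frame count.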

DroneCrowd (1.03 GB): BaiduYun (code: h0j8) | GoogleDrive

DroneCrowd (Full Version)

The full version consists of 112 video clips with 33,600 high-resolution frames (1920x1080) captured in 70 different scenarios. With an intensive amount of effort, our dataset provides 20,800 people trajectories with 4.8 million head annotations and several video-level attributes for the sequences.
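The headline numbers above imply some useful averages. The short calculation below derives them directly from the stated figures. The "4.8 million" value is rounded in the source, so the per-frame and per-trajectory results are approximate, and the function name `dataset_averages` is only illustrative.

```python
def dataset_averages() -> dict:
    """Derive average statistics from the figures stated for DroneCrowd."""
    clips = 112
    frames = 33_600
    trajectories = 20_800
    head_annotations = 4_800_000  # "4.8 million" in the text, a rounded figure

    return {
        "frames_per_clip": frames / clips,
        "heads_per_frame": head_annotations / frames,
        "annotations_per_trajectory": head_annotations / trajectories,
    }


for name, value in dataset_averages().items():
    print(f"{name}: {value:.1f}")
# frames_per_clip: 300.0
# heads_per_frame: 142.9
# annotations_per_trajectory: 230.8
```

So each clip averages 300 frames, each frame carries roughly 143 annotated heads, and each trajectory spans roughly 231 annotated frames.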

DroneCrowd: BaiduYun (code: ml1u) | GoogleDrive

Code

Space-Time Neighbor-Aware Network (STNNet-pytorch)

Space-Time Multi-Scale Attention Network (STANet-pytorch)

Citation

Please cite this paper if you use it in your work.

@inproceedings{dronecrowd_cvpr2021,
  author    = {Longyin Wen and
               Dawei Du and
               Pengfei Zhu and
               Qinghua Hu and
               Qilong Wang and
               Liefeng Bo and
               Siwei Lyu},
  title     = {Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark},
  booktitle = {CVPR},
  year      = {2021}
}

Owner

VisDrone