Official implementation of the CVPR 2021 paper “Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting”

Last update: Dec 08, 2022

Overview

RGBT Crowd Counting

Lingbo Liu, Jiaqi Chen, Hefeng Wu, Guanbin Li, Chenglong Li, Liang Lin. "Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting." IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021. [PDF]

Download RGBT-CC Dataset & Models: [Dropbox][BaiduYun (PW: RGBT)]

Our framework can be implemented with various backbone networks. You can refer to this page for implementing BL+IADM. Moreover, the proposed framework can also be applied to RGBD crowd counting and the implementation of CSRNet+IADM is available.

If you use this code and benchmark for your research, please cite our work:

@inproceedings{liu2021cross,
  title={Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting},
  author={Liu, Lingbo and Chen, Jiaqi and Wu, Hefeng and Li, Guanbin and Li, Chenglong and Lin, Liang},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
  year={2021}
}

Introduction

Crowd counting is a fundamental yet challenging task that requires rich information to generate pixel-wise crowd density maps. In this work, we find that incorporating optical and thermal information can greatly help to recognize pedestrians. To promote future research in this field, we introduce a large-scale RGBT Crowd Counting (RGBT-CC) benchmark, which contains 2,030 pairs of RGB-thermal images with 138,389 annotated people. Furthermore, to facilitate multimodal crowd counting, we propose a cross-modal collaborative representation learning framework, which consists of multiple modality-specific branches, a modality-shared branch, and an Information Aggregation-Distribution Module (IADM) that fully captures the complementary information of different modalities. Specifically, our IADM incorporates two collaborative information transfers to dynamically enhance the modality-shared and modality-specific representations with a dual information propagation mechanism. Extensive experiments conducted on the RGBT-CC benchmark demonstrate the effectiveness of our framework for RGBT crowd counting.
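As an illustration of what a pixel-wise crowd density map is, the sketch below renders point annotations as normalized Gaussian blobs, so that the map sums to the number of annotated people. This is a generic construction, not necessarily the one used in this repository; the function name `density_map`, the fixed `sigma`, and the toy image size are assumptions made for the example.

```python
import math

def density_map(points, height, width, sigma=4.0):
    """Render (x, y) point annotations as a density map.

    Each point contributes a Gaussian blob normalized to sum to 1,
    so the whole map sums to the number of in-image points.
    sigma is a hypothetical fixed kernel width (assumption).
    """
    dmap = [[0.0] * width for _ in range(height)]
    radius = int(3 * sigma)
    for px, py in points:
        # Gaussian weights inside a window clipped to the image borders.
        weights = {}
        for y in range(max(0, py - radius), min(height, py + radius + 1)):
            for x in range(max(0, px - radius), min(width, px + radius + 1)):
                d2 = (x - px) ** 2 + (y - py) ** 2
                weights[(y, x)] = math.exp(-d2 / (2 * sigma ** 2))
        if not weights:
            continue  # point lies too far outside the image to contribute
        total = sum(weights.values())
        for (y, x), w in weights.items():
            dmap[y][x] += w / total
    return dmap

# Toy example: three annotated heads in a 32x24 image.
heads = [(10, 10), (0, 0), (30, 20)]
dmap = density_map(heads, height=24, width=32)
estimated_count = sum(sum(row) for row in dmap)
print(estimated_count)
```

Normalizing each blob after clipping keeps the total count exact even for heads near the image border, which is one common design choice among several.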

RGBT-CC Benchmark

To promote future research on this task, we propose a large-scale RGBT Crowd Counting (RGBT-CC) benchmark. Specifically, this benchmark consists of 2,030 pairs of 640x480 RGB-thermal images captured in various scenarios (e.g., malls, streets, playgrounds, train stations, and metro stations). Among these samples, 1,013 pairs were captured in light conditions and 1,017 pairs in darkness. A total of 138,389 pedestrians are marked with point annotations, an average of 68 people per image. Finally, the RGBT-CC benchmark is randomly divided into three parts: 1,030 pairs for training, 200 pairs for validation, and 800 pairs for testing. Compared with Internet-based datasets, which are seriously biased, our RGBT-CC dataset has a crowd density distribution closer to that of real cities, since our images are captured in urban scenes with various densities. Therefore, our dataset has wider applications for urban crowd analysis.
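The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below only uses numbers stated in this section; the variable names are illustrative.

```python
# Dataset statistics as stated in this README.
total_pairs = 2030
light_pairs, dark_pairs = 1013, 1017
train_pairs, val_pairs, test_pairs = 1030, 200, 800
annotated_people = 138389

# The illumination groups and the train/val/test split each cover all pairs.
assert light_pairs + dark_pairs == total_pairs
assert train_pairs + val_pairs + test_pairs == total_pairs

# Average number of annotated people per image pair.
mean_per_pair = annotated_people / total_pairs
print(round(mean_per_pair, 2))  # 68.17
```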

Method

The proposed RGBT crowd counting framework is composed of three parallel backbones and an Information Aggregation-Distribution Module (IADM). Specifically, the top and bottom backbones are developed for modality-specific (i.e. RGB images and thermal images) representation learning, while the middle backbone is designed for modality-shared representation learning. To fully exploit the multimodal complementarities, our IADM dynamically transfers the specific-shared information to collaboratively enhance the modality-specific and modality-shared representations. Consequently, the final modality-shared feature contains comprehensive information and facilitates generating high-quality crowd density maps.
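The aggregation-then-distribution idea can be sketched on toy feature vectors. This is only a simplified illustration under stated assumptions, not the repository's implementation: real features are convolutional feature maps, and the gate here is a plain sigmoid of the residual standing in for a learned gating function. The names `gated_residual` and `iadm_step` are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_residual(target, source):
    """Gated difference (source - target), element-wise.

    Assumption: the sigmoid of the residual stands in for a learned gate.
    """
    return [sigmoid(s - t) * (s - t) for t, s in zip(target, source)]

def iadm_step(rgb, shared, thermal):
    """One toy aggregation-distribution step over three feature vectors."""
    # Aggregation: the shared feature absorbs gated residuals from both
    # modality-specific features.
    from_rgb = gated_residual(shared, rgb)
    from_thermal = gated_residual(shared, thermal)
    shared_enh = [s + a + b for s, a, b in zip(shared, from_rgb, from_thermal)]

    # Distribution: the enhanced shared feature refines each specific feature.
    to_rgb = gated_residual(rgb, shared_enh)
    to_thermal = gated_residual(thermal, shared_enh)
    rgb_enh = [r + d for r, d in zip(rgb, to_rgb)]
    thermal_enh = [t + d for t, d in zip(thermal, to_thermal)]
    return rgb_enh, shared_enh, thermal_enh

# Toy example: each modality carries information the other lacks.
rgb_feat, shared_feat, thermal_feat = [1.0, 0.0], [0.0, 0.0], [0.0, 1.0]
print(iadm_step(rgb_feat, shared_feat, thermal_feat))
```

In this toy setting the shared vector ends up non-zero in both positions, which mirrors the claim that the final modality-shared feature gathers information from both modalities.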

Experiments

More References

Crowd Counting with Deep Structured Scale Integration Network, ICCV 2019 [PDF]

Crowd Counting using Deep Recurrent Spatial-Aware Network, IJCAI 2018 [PDF]

Efficient Crowd Counting via Structured Knowledge Transfer, ACM MM 2020 [PDF]
