Joint detection and tracking model named DEFT, or "Detection Embeddings for Tracking."

Overview

DEFT: Detection Embeddings for Tracking

DEFT: Detection Embeddings for Tracking,
Mohamed Chaabane, Peter Zhang, J. Ross Beveridge, Stephen O'Hara
arXiv technical report (arXiv:2102.02267)

@article{Chaabane2021deft,
  title={DEFT: Detection Embeddings for Tracking},
  author={Chaabane, Mohamed and Zhang, Peter and Beveridge, J. Ross and O'Hara, Stephen},
  journal={arXiv preprint arXiv:2102.02267},
  year={2021}
}

Contact: [email protected]. Questions and discussion are welcome!

Abstract

Most modern multiple object tracking (MOT) systems follow the tracking-by-detection paradigm, consisting of a detector followed by a method for associating detections into tracks. There is a long history in tracking of combining motion and appearance features to provide robustness to occlusions and other challenges, but typically this comes with the trade-off of a more complex and slower implementation. Recent successes on popular 2D tracking benchmarks indicate that top-scores can be achieved using a state-of-the-art detector and relatively simple associations relying on single-frame spatial offsets -- notably outperforming contemporary methods that leverage learned appearance features to help re-identify lost tracks. In this paper, we propose an efficient joint detection and tracking model named DEFT, or Detection Embeddings for Tracking. Our approach relies on an appearance-based object matching network jointly-learned with an underlying object detection network. An LSTM is also added to capture motion constraints. DEFT has comparable accuracy and speed to the top methods on 2D online tracking leaderboards while having significant advantages in robustness when applied to more challenging tracking data. DEFT raises the bar on the nuScenes monocular 3D tracking challenge, more than doubling the performance of the previous top method.
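To make the architecture described above concrete, here is a minimal, illustrative PyTorch sketch of an appearance-matching head that scores pairs of detection embeddings across two frames. The module name, embedding size, and MLP widths are assumptions for illustration only, not the repo's actual implementation.

import torch
import torch.nn as nn

class MatchingHead(nn.Module):
    """Illustrative only: scores appearance affinity between detection
    embeddings from two frames (higher = more likely the same object)."""
    def __init__(self, emb_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * emb_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, emb_t, emb_prev):
        # emb_t: (N, D) embeddings at frame t; emb_prev: (M, D) at a past frame.
        n, m = emb_t.size(0), emb_prev.size(0)
        pairs = torch.cat([emb_t.unsqueeze(1).expand(n, m, -1),
                           emb_prev.unsqueeze(0).expand(n, m, -1)], dim=-1)
        return self.mlp(pairs).squeeze(-1)  # (N, M) affinity matrix

affinity = MatchingHead()(torch.randn(5, 128), torch.randn(7, 128))
print(affinity.shape)  # torch.Size([5, 7])

In DEFT this kind of affinity matrix is what drives the association step, with the LSTM-based motion model pruning physically implausible matches.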

Video examples on benchmark test sets

Tracking performance

Results on MOT challenge test set

Dataset MOTA MOTP IDF1 IDS
MOT16 (Public) 61.7 78.3 60.2 768
MOT16 (Private) 68.03 78.71 66.39 925
MOT17 (Public) 60.4 78.1 59.7 2581
MOT17 (Private) 66.6 78.83 65.42 2823

The results are obtained on the MOT challenge evaluation server.
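For reference across these tables, MOTA is the standard CLEAR MOT accuracy, which penalizes false negatives, false positives, and identity switches relative to the number of ground-truth objects:

MOTA = 1 - (FN + FP + IDS) / GT

where FN, FP, IDS, and GT are the total false negatives, false positives, identity switches, and ground-truth objects summed over all frames.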

Results on 2D Vehicle Tracking on KITTI test set

Dataset MOTA MOTP MT (%) ML (%) IDS
KITTI 88.95 84.55 84.77 1.85 343

The results are obtained on the KITTI challenge evaluation server.

Results on 3D Tracking on nuScenes test set

Dataset AMOTA MOTAR MOTA
nuScenes 17.7 48.4 15.6

The results are obtained on the nuScenes challenge evaluation server.

Installation

  • Clone this repo, and run the following commands.
  • Create a new conda environment and activate it.
git clone [email protected]:MedChaabane/DEFT.git
cd DEFT
conda create -y -n DEFT python=3.7
conda activate DEFT
  • Install PyTorch and the dependencies.
conda install -y pytorch==1.2.0 torchvision==0.4.0 cudatoolkit=10.0 -c pytorch
pip install -r requirements.txt  
pip install cython; pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
cd src/lib/model/networks/
git clone https://github.com/CharlesShang/DCNv2
cd DCNv2
./make.sh
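As a quick sanity check that the environment and the DCNv2 build succeeded, the following can be run from the DCNv2 directory (the dcn_v2 module name comes from that repo; adjust sys.path if you installed it elsewhere):

import torch
from dcn_v2 import DCN  # compiled by ./make.sh; this import fails if the build broke

print(torch.__version__, torch.cuda.is_available())  # expect 1.2.0 and True on a CUDA 10.0 machine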

Datasets Preparation

We follow the same dataset preparation as the CenterTrack framework.

MOT 2017

  • Run the dataset preprocessing script.
cd src/tools/
sh get_mot_17.sh
  • The output data structure should be:
  ${DEFT_ROOT}
  |-- data
  `-- |-- mot17
      `-- |--- train
          |   |--- MOT17-02-FRCNN
          |   |    |--- img1
          |   |    |--- gt
          |   |    |   |--- gt.txt
          |   |    |   |--- gt_train_half.txt
          |   |    |   |--- gt_val_half.txt
          |   |    |--- det
          |   |    |   |--- det.txt
          |   |    |   |--- det_train_half.txt
          |   |    |   |--- det_val_half.txt
          |   |--- ...
          |--- test
          |   |--- MOT17-01-FRCNN
          |---|--- ...
          `---| annotations
              |--- train_half.json
              |--- val_half.json
              |--- train.json
              `--- test.json
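As a sanity check on the prepared data, here is a small sketch that parses a MOT-format ground-truth file. The comma-separated fields per line (frame, id, bb_left, bb_top, bb_width, bb_height, conf, class, visibility) follow the MOTChallenge convention; the path below is just an example from the tree above, run from ${DEFT_ROOT} after preprocessing.

import csv
from collections import defaultdict

def load_mot_gt(path):
    # Maps track id -> list of (frame, x, y, w, h) boxes.
    tracks = defaultdict(list)
    with open(path) as f:
        for row in csv.reader(f):
            frame, track_id = int(row[0]), int(row[1])
            x, y, w, h = (float(v) for v in row[2:6])
            tracks[track_id].append((frame, x, y, w, h))
    return tracks

gt = load_mot_gt("data/mot17/train/MOT17-02-FRCNN/gt/gt.txt")
print(len(gt), "tracks")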

KITTI Tracking

  • Download images, annotations, and calibration information from the KITTI Tracking website, unzip, and place the data as below:
  ${DEFT_ROOT}
  |-- data
  `-- |-- kitti_tracking
      `-- |-- data_tracking_image_2
          |   |-- training
          |   |   |-- image_02
          |   |   |   |-- 0000
          |   |   |   |-- ...
          |   |-- testing
          |-- label_02
          |   |-- 0000.txt
          |   |-- ...
          `-- data_tracking_calib
  • Run the dataset preprocessing script.
cd src/tools/
sh get_kitti_tracking.sh
  • The resulting data structure should look like:
  ${DEFT_ROOT}
  |-- data
  `-- |-- kitti_tracking
      `-- |-- data_tracking_image_2
          |   |-- training
          |   |   |-- image_02
          |   |   |   |-- 0000
          |   |   |   |-- ...
          |   |-- testing
          |-- label_02
          |   |-- 0000.txt
          |   |-- ...
          |-- data_tracking_calib
          |-- label_02_val_half
          |   |-- 0000.txt
          |   |-- ...
          |-- label_02_train_half
          |   |-- 0000.txt
          |   |-- ...
          `-- annotations
              |-- tracking_train.json
              |-- tracking_test.json
              |-- tracking_train_half.json
              `-- tracking_val_half.json
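Similarly, a minimal reader for the KITTI tracking labels can help verify the split files. The whitespace-separated fields follow the KITTI tracking devkit format (frame, track id, object type, truncation, occlusion, alpha, 2D box, 3D dimensions, location, rotation_y); the path below is illustrative.

def load_kitti_labels(path):
    # Returns one dict per annotated object; DontCare rows are kept as-is.
    objects = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            objects.append({
                "frame": int(fields[0]),
                "track_id": int(fields[1]),
                "type": fields[2],
                "bbox": [float(v) for v in fields[6:10]],  # left, top, right, bottom
            })
    return objects

labels = load_kitti_labels("data/kitti_tracking/label_02/0000.txt")
print(len(labels), "boxes in sequence 0000")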

nuScenes Tracking

  • Download the dataset from the nuScenes website. You only need the "Keyframe blobs", and only the image data from them. You also need to download the maps and all metadata.
  • Unzip, rename, and place the data as below. You will need to merge folders from different zip files.
  ${DEFT_ROOT}
  |-- data
  `-- |-- nuscenes
      `-- |-- v1.0-trainval
          |   |-- samples
          |   |   |-- CAM_BACK
          |   |   |   |-- xxx.jpg
          |   |   |-- CAM_BACK_LEFT
          |   |   |-- CAM_BACK_RIGHT
          |   |   |-- CAM_FRONT
          |   |   |-- CAM_FRONT_LEFT
          |   |   |-- CAM_FRONT_RIGHT
          |   |-- maps
          `-- |-- v1.0-trainval_meta
  • Run the dataset preprocessing script.
cd src/tools/
python convert_nuScenes.py
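The conversion script writes COCO-style annotation json files under data/nuscenes/annotations. A quick way to sanity-check them is to load one and count entries; the filename below is an assumption, so use whichever files the script actually produced.

import json

with open("data/nuscenes/annotations/train.json") as f:  # assumed output name
    coco = json.load(f)
print(len(coco["images"]), "images;", len(coco["annotations"]), "annotations")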

References

Please cite the corresponding references if you use these datasets.

  @article{MOT16,
    title = {{MOT}16: {A} Benchmark for Multi-Object Tracking},
    shorttitle = {MOT16},
    url = {http://arxiv.org/abs/1603.00831},
    journal = {arXiv:1603.00831 [cs]},
    author = {Milan, A. and Leal-Taix\'{e}, L. and Reid, I. and Roth, S. and Schindler, K.},
    month = mar,
    year = {2016},
    note = {arXiv: 1603.00831},
    keywords = {Computer Science - Computer Vision and Pattern Recognition}
  }


  @INPROCEEDINGS{Geiger2012CVPR,
    author = {Andreas Geiger and Philip Lenz and Raquel Urtasun},
    title = {Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite},
    booktitle = {CVPR},
    year = {2012}
  }


  @inproceedings{nuscenes2019,
    title = {{nuScenes}: A multimodal dataset for autonomous driving},
    author = {Holger Caesar and Varun Bankiti and Alex H. Lang and Sourabh Vora and Venice Erin Liong and Qiang Xu and Anush Krishnan and Yu Pan and Giancarlo Baldan and Oscar Beijbom},
    booktitle = {CVPR},
    year = {2020}
  }

Training and Evaluation Experiments

Scripts for training and evaluating DEFT on MOT, KITTI, and nuScenes are available in the experiments folder. The output videos and results (in the challenge submission format) are written to the folders $dataset_name$_videos and $dataset_name$_results.

Acknowledgement

A large portion of the code is borrowed from xingyizhou/CenterTrack, shijieS/SST, and Zhongdao/Towards-Realtime-MOT. Many thanks for their wonderful work!
