Learning Spatio-Temporal Transformer for Visual Tracking

Last update: Dec 29, 2022

Overview

STARK

The official implementation of the paper Learning Spatio-Temporal Transformer for Visual Tracking

Hiring research interns for visual transformer projects: [email protected]

Highlights

End-to-End, Post-processing Free

STARK is an end-to-end tracking approach that directly predicts a single accurate bounding box as the tracking result.
In addition, STARK does not rely on any hyperparameter-sensitive post-processing, which leads to stable performance.

Real-Time Speed

STARK-ST50 and STARK-ST101 run at 40 FPS and 30 FPS, respectively, on a Tesla V100 GPU.
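To put the frame rates above in perspective, they can be converted into a per-frame time budget. The snippet below is only an illustrative sketch: `frame_budget_ms` is a hypothetical helper and not part of the STARK code base, and the numbers are simply the frame rates quoted above.

```python
def frame_budget_ms(fps):
    """Return the time available per frame, in milliseconds, at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Frame rates quoted above for a Tesla V100 GPU.
reported_fps = {"STARK-ST50": 40, "STARK-ST101": 30}

# Per-frame budget implied by each frame rate.
budgets = {name: frame_budget_ms(fps) for name, fps in reported_fps.items()}

for name, ms in budgets.items():
    print("{}: {:.1f} ms per frame".format(name, ms))
```

At 40 FPS the whole pipeline has 25 ms per frame, and at 30 FPS roughly 33 ms.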

Strong performance

Tracker	LaSOT (AUC)	GOT-10K (AO)	TrackingNet (AUC)
STARK	67.1	68.8	82.0
TransT	64.9	67.1	81.4
TrDiMP	63.7	67.1	78.4
Siam R-CNN	64.8	64.9	81.2

Purely PyTorch-based Code

STARK is implemented purely in PyTorch.

Install the environment

Option 1: Use Anaconda

conda create -n stark python=3.6
conda activate stark
bash install.sh

Option 2: Use the Docker file

We provide the complete Docker file here

Data Preparation

Put the tracking datasets in ./data. It should look like:

${STARK_ROOT}
 -- data
     -- lasot
         |-- airplane
         |-- basketball
         |-- bear
         ...
     -- got10k
         |-- test
         |-- train
         |-- val
     -- coco
         |-- annotations
         |-- images
     -- trackingnet
         |-- TRAIN_0
         |-- TRAIN_1
         ...
         |-- TRAIN_11
         |-- TEST
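
Before running the path setup, it can help to confirm that the datasets are laid out as shown above. The following is a minimal sketch and not part of the STARK repository: `missing_dirs` and `EXPECTED_LAYOUT` are hypothetical names, and only the directories written out explicitly in the listing are checked (the entries elided with `...` are left out).

```python
from pathlib import Path

# Directories named explicitly in the layout above. Entries hidden behind "..."
# (further LaSOT classes, remaining TrackingNet chunks) are deliberately not listed.
EXPECTED_LAYOUT = {
    "lasot": ["airplane", "basketball", "bear"],
    "got10k": ["test", "train", "val"],
    "coco": ["annotations", "images"],
    "trackingnet": ["TRAIN_0", "TRAIN_1", "TRAIN_11", "TEST"],
}


def missing_dirs(data_root):
    """Return the expected directories that do not exist under data_root."""
    root = Path(data_root)
    missing = []
    for dataset, subdirs in EXPECTED_LAYOUT.items():
        # Check the dataset folder itself, then each listed subfolder.
        candidates = [Path(dataset)] + [Path(dataset) / sub for sub in subdirs]
        for rel in candidates:
            if not (root / rel).is_dir():
                missing.append(rel.as_posix())
    return missing


if __name__ == "__main__":
    for rel in missing_dirs("./data"):
        print("missing: {}".format(rel))
```

An empty result means every directory listed above was found under `./data`; it does not validate the contents of those directories.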

Run the following command to set the paths for this project:

python tracking/create_default_local_file.py --workspace_dir . --data_dir ./data --save_dir .

After running this command, you can also modify the paths by editing these two files:

lib/train/admin/local.py  # paths used for training
lib/test/evaluation/local.py  # paths used for testing

Train STARK

Training with multiple GPUs using DDP

# STARK-S50
python tracking/train.py --script stark_s --config baseline --save_dir . --mode multiple --nproc_per_node 8  # STARK-S50
# STARK-ST50
python tracking/train.py --script stark_st1 --config baseline --save_dir . --mode multiple --nproc_per_node 8  # STARK-ST50 Stage1
python tracking/train.py --script stark_st2 --config baseline --save_dir . --mode multiple --nproc_per_node 8 --script_prv stark_st1 --config_prv baseline  # STARK-ST50 Stage2
# STARK-ST101
python tracking/train.py --script stark_st1 --config baseline_R101 --save_dir . --mode multiple --nproc_per_node 8  # STARK-ST101 Stage1
python tracking/train.py --script stark_st2 --config baseline_R101 --save_dir . --mode multiple --nproc_per_node 8 --script_prv stark_st1 --config_prv baseline_R101  # STARK-ST101 Stage2
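
The ST variants are trained in two stages, and the Stage 2 command points back to Stage 1 through `--script_prv` and `--config_prv`. The sketch below only assembles the command lines shown above from Python so that both stages stay consistent; `build_train_cmd` and `two_stage_cmds` are hypothetical helpers, not part of the repository, and they use only the flags that appear in the commands above.

```python
import shlex


def build_train_cmd(script, config, nproc_per_node=8, save_dir=".",
                    prev_script=None, prev_config=None):
    """Assemble the argument list for tracking/train.py in multi-GPU (DDP) mode."""
    cmd = [
        "python", "tracking/train.py",
        "--script", script,
        "--config", config,
        "--save_dir", save_dir,
        "--mode", "multiple",
        "--nproc_per_node", str(nproc_per_node),
    ]
    if prev_script is not None:
        # Stage 2 refers to the earlier stage via these two flags.
        cmd += ["--script_prv", prev_script, "--config_prv", prev_config or config]
    return cmd


def two_stage_cmds(config):
    """Return the Stage 1 and Stage 2 commands for a STARK-ST config."""
    stage1 = build_train_cmd("stark_st1", config)
    stage2 = build_train_cmd("stark_st2", config,
                             prev_script="stark_st1", prev_config=config)
    return [stage1, stage2]


for cmd in two_stage_cmds("baseline_R101"):
    # Print a shell-safe version of each command instead of running it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The printed lines match the STARK-ST101 commands above (without the trailing comments). To launch them, the lists could be passed to `subprocess.run(cmd, check=True)` one after the other, so that Stage 2 starts only if Stage 1 succeeds.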

(Optional) Debug training with a single GPU

python tracking/train.py --script stark_s --config baseline --save_dir . --mode single

Test and evaluate STARK on benchmarks

LaSOT

python tracking/test.py stark_st baseline --dataset lasot --threads 32
python tracking/analysis_results.py # need to modify tracker configs and names

GOT10K-test

python tracking/test.py stark_st baseline_got10k_only --dataset got10k_test --threads 32
python lib/test/utils/transform_got10k.py --tracker_name stark_st --cfg_name baseline_got10k_only

TrackingNet

python tracking/test.py stark_st baseline --dataset trackingnet --threads 32
python lib/test/utils/transform_trackingnet.py --tracker_name stark_st --cfg_name baseline

VOT2020
Before evaluating "STARK+AR" on VOT2020, please install the extra packages described in external/AR/README.md.

cd external/vot20/<workspace_dir>
export PYTHONPATH=<path to the stark project>:$PYTHONPATH
bash exp.sh

VOT2020-LT

cd external/vot20_lt/<workspace_dir>
export PYTHONPATH=<path to the stark project>:$PYTHONPATH
bash exp.sh

Test FLOPs, Params, and Speed

# Profiling STARK-S50 model
python tracking/profile_model.py --script stark_s --config baseline
# Profiling STARK-ST50 model
python tracking/profile_model.py --script stark_st2 --config baseline
# Profiling STARK-ST101 model
python tracking/profile_model.py --script stark_st2 --config baseline_R101

Model Zoo

The trained models, training logs, and raw tracking results are provided in the model zoo.

Acknowledgments

Thanks to the great PyTracking library, which helped us implement our ideas quickly.
We use the DETR implementation from the official repo https://github.com/facebookresearch/detr.

Owner

Multimedia Research