Learning Tracking Representations via Dual-Branch Fully Transformer Networks

Related tags

Deep LearningDualTFR
Overview

Learning Tracking Representations via Dual-Branch Fully Transformer Networks

DualTFR

We achieves the runner-ups for both VOT2021ST (short-term) and RT(real-time). The variants of DualTFR take 3rd/4th places of VOT2020RT and 4th places of VOT2020ST

For VOT21 challenge model weight download:

We provide the models of Five trackers SAMN, SAMN_DiMP, DualTFR, DualTFRst, DualTFRon here.

Note that the AlphaRefine (https://github.com/MasterBin-IIAU/AlphaRefine) model and SuperDiMP (https://github.com/visionml/pytracking) model are the same with the original author.

Tracker model quantity model name
SAMN 1 SAMN.tar
SAMN_DiMP 2 super_dimp.pth.tar, SAMN.tar
DualTFR 2 DualTFR.tar, ar.pth.tar
DualTFRst 2 DualTFRst.tar, ar.pth.tar
DualTFRon 2 DualTFRon.tar, ar.pth.tar

Models can be downloaded from BaiduNetDisk or GoogleDrive:

BaiduNetDisk:

https://pan.baidu.com/s/1RHA7HVlXtNEzYPGIjJbQ-g (sruh)

GoogleDrive:

https://drive.google.com/drive/folders/1Z61_mfh2vwzqDxejt5idBOgYhWOCZOr5?usp=sharing

Code will be released soon.

We present a simple Siamese-like Dual-branch network based on solely Transformer networks to learn about tracking features. Given a template and a search image, we divide them into non-overlapping image patches and extract a feature vector for each based on its matching results with others within an attention window. Then for each token, we estimate whether it contains the target object and the corresponding size. The prominent advantage of the approach is that the features are learned from matching, and ultimately, for matching. So the features are aligned with the subsequent object tracking task. The method achieves comparable results comparing to the best-performing methods which first use CNN to extract features and then use Transformer to fuse them. Without bells and whistles, it outperforms the state-of-the-art methods on GOT-10k and VOT2020 benchmarks. In addition, the method achieves real-time inference speed (about 40 fps).

Acknowledgments

Contacts

  • Fei Xie, School of Automation, Southeast University, China, [email protected], wechat: 372998044
Owner
phiphi
phiphi
CS583: Deep Learning

CS583: Deep Learning

Shusen Wang 2.6k Dec 30, 2022
Project for tracking occupancy in Tel-Aviv parking lots.

Ahuzat Dibuk - Tracking occupancy in Tel-Aviv parking lots main.py This module was set-up to be executed on Google Cloud Platform. I run it every 15 m

Geva Kipper 35 Nov 22, 2022
A simple baseline for 3d human pose estimation in PyTorch.

3d_pose_baseline_pytorch A PyTorch implementation of a simple baseline for 3d human pose estimation. You can check the original Tensorflow implementat

weigq 312 Jan 06, 2023
Image Fusion Transformer

Image-Fusion-Transformer Platform Python 3.7 Pytorch =1.0 Training Dataset MS-COCO 2014 (T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ram

Vibashan VS 68 Dec 23, 2022
This repository consists of Blender python scripts and corresponding assets to generate variants of the CANDLE dataset

candle-simulator This repository consists of Blender python scripts and corresponding assets to generate variants of the IITH-CANDLE dataset. The rend

1 Dec 15, 2021
Tackling data scarcity in Speech Translation using zero-shot multilingual Machine Translation techniques

Tackling data scarcity in Speech Translation using zero-shot multilingual Machine Translation techniques This repository is derived from the NMTGMinor

Tu Anh Dinh 1 Sep 07, 2022
Human Activity Recognition example using TensorFlow on smartphone sensors dataset and an LSTM RNN. Classifying the type of movement amongst six activity categories - Guillaume Chevalier

LSTMs for Human Activity Recognition Human Activity Recognition (HAR) using smartphones dataset and an LSTM RNN. Classifying the type of movement amon

Guillaume Chevalier 3.1k Dec 30, 2022
An implementation of quantum convolutional neural network with MindQuantum. Huawei, classifying MNIST dataset

关于实现的一点说明 山东大学 2020级 苏博南 www.subonan.com 文件说明 tools.py 这里面主要有两个函数: resize(a, lenb) 这其实是我找同学写的一个小算法hhh。给出一个$28\times 28$的方阵a,返回一个$lenb\times lenb$的方阵。因

ぼっけなす 2 Aug 29, 2022
Code and Experiments for ACL-IJCNLP 2021 Paper Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering.

Code and Experiments for ACL-IJCNLP 2021 Paper Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering.

Sidd Karamcheti 50 Nov 16, 2022
Auditing Black-Box Prediction Models for Data Minimization Compliance

Data-Minimization-Auditor An auditing tool for model-instability based data minimization that is introduced in "Auditing Black-Box Prediction Models f

Bashir Rastegarpanah 2 Mar 24, 2022
improvement of CLIP features over the traditional resnet features on the visual question answering, image captioning, navigation and visual entailment tasks.

CLIP-ViL In our paper "How Much Can CLIP Benefit Vision-and-Language Tasks?", we show the improvement of CLIP features over the traditional resnet fea

310 Dec 28, 2022
Generalized and Efficient Blackbox Optimization System.

OpenBox Doc | OpenBox中文文档 OpenBox: Generalized and Efficient Blackbox Optimization System OpenBox is an efficient and generalized blackbox optimizatio

DAIR Lab 238 Dec 29, 2022
Red Team tool for exfiltrating files from a target's Google Drive that you have access to, via Google's API.

GD-Thief Red Team tool for exfiltrating files from a target's Google Drive that you(the attacker) has access to, via the Google Drive API. This includ

Antonio Piazza 39 Dec 27, 2022
LSTC: Boosting Atomic Action Detection with Long-Short-Term Context

LSTC: Boosting Atomic Action Detection with Long-Short-Term Context This Repository contains the code on AVA of our ACM MM 2021 paper: LSTC: Boosting

Tencent YouTu Research 9 Oct 11, 2022
A collection of educational notebooks on multi-view geometry and computer vision.

Multiview notebooks This is a collection of educational notebooks on multi-view geometry and computer vision. Subjects covered in these notebooks incl

Max 65 Dec 09, 2022
Tom-the-AI - A compound artificial intelligence software for Linux systems.

Tom the AI (version 0.82) WARNING: This software is not yet ready to use, I'm still setting up the GitHub repository. Should be ready in a few days. T

2 Apr 28, 2022
MRI reconstruction (e.g., QSM) using deep learning methods

deepMRI: Deep learning methods for MRI Authors: Yang Gao, Hongfu Sun This repo is devloped based on Pytorch (1.8 or later) and matlab (R2019a or later

Hongfu Sun 17 Dec 18, 2022
A curated list of neural rendering resources.

Awesome-of-Neural-Rendering A curated list of neural rendering and related resources. Please feel free to pull requests or open an issue to add papers

Zhiwei ZHANG 43 Dec 09, 2022
Model serving at scale

Run inference at scale Cortex is an open source platform for large-scale machine learning inference workloads. Workloads Realtime APIs - respond to pr

Cortex Labs 7.9k Jan 06, 2023
使用OpenCV部署全景驾驶感知网络YOLOP,可同时处理交通目标检测、可驾驶区域分割、车道线检测,三项视觉感知任务,包含C++和Python两种版本的程序实现。本套程序只依赖opencv库就可以运行, 从而彻底摆脱对任何深度学习框架的依赖。

YOLOP-opencv-dnn 使用OpenCV部署全景驾驶感知网络YOLOP,可同时处理交通目标检测、可驾驶区域分割、车道线检测,三项视觉感知任务,依然是包含C++和Python两种版本的程序实现 onnx文件从百度云盘下载,链接:https://pan.baidu.com/s/1A_9cldU

178 Jan 07, 2023