Global Tracking Transformers, CVPR 2022

Overview

Global Tracking Transformers,
Xingyi Zhou, Tianwei Yin, Vladlen Koltun, Philipp Krähenbühl,
CVPR 2022 (arXiv 2203.13250)

Features

  • Object association within a long temporal window (32 frames).

  • Classification after tracking for long-tail recognition.

  • "Detector" of global trajectories.

Installation

See installation instructions.

Demo

Run our demo using Colab (no GPU needed).

We use the default detectron2 demo interface. For example, to run the TAO model on an example video (video source: TAO/YFCC100M dataset), download the model and run:

python demo.py --config-file configs/GTR_TAO_DR2101.yaml --video-input docs/yfcc_v_acef1cb6d38c2beab6e69e266e234f.mp4 --output output/demo_yfcc.mp4 --opts MODEL.WEIGHTS models/GTR_TAO_DR2101.pth

If everything is set up correctly, the tracking result will be written to output/demo_yfcc.mp4.

Benchmark evaluation and training

Please first prepare datasets, then check our MODEL ZOO to reproduce results in our paper. We highlight key results below:

  • MOT17 test set

    MOTA  IDF1  HOTA  DetA  AssA  FPS
    75.3  71.5  59.1  61.6  57.0  19.6

  • TAO test set

    Track mAP  FPS
    20.1       11.2

License

The majority of GTR is licensed under the Apache 2.0 license; however, portions of the project are available under separate license terms: trackeval (in gtr/tracking/trackeval/) is licensed under the MIT license, and FairMOT (in gtr/tracking/local_tracker) is also under the MIT license. Please see NOTICE for license details. The demo video is from the TAO dataset, which originally comes from the YFCC100M dataset. Please be aware of the original dataset licenses.

Citation

If you find this project useful for your research, please use the following BibTeX entry.

@inproceedings{zhou2022global,
  title={Global Tracking Transformers},
  author={Zhou, Xingyi and Yin, Tianwei and Koltun, Vladlen and Kr{\"a}henb{\"u}hl, Philipp},
  booktitle={CVPR},
  year={2022}
}
Issues
  • Training memory issue & missing file

    Training memory issue & missing file

    Hello, thanks for sharing the source code of this nice work!

    I have tried the TAO training code (GTR_TAO_DR2101.yaml), but full training failed with an out-of-memory error. Memory usage seems to increase gradually during training until it hits the limit. I am currently using an A6000 with 48 GB of GPU memory, which should be enough given your training spec (4x 32 GB V100 GPUs). Could you give any ideas? My initial workaround is to reduce the video length from 8 to 2.
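
    For what it's worth, here is the small diagnostic I plan to add to the training loop (standard PyTorch APIs, nothing GTR-specific) to confirm whether allocated memory really grows across iterations:

    import torch

    def log_gpu_mem(step):
        # log allocated/reserved GPU memory once per iteration to spot a gradual leak
        alloc = torch.cuda.memory_allocated() / 2**30
        reserved = torch.cuda.memory_reserved() / 2**30
        print(f"step {step}: allocated {alloc:.2f} GiB, reserved {reserved:.2f} GiB")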

    Moreover, I cannot find the move_tao_keyframes.py file. Could you please provide this file?

    Thanks,

    opened by tkdtks123 4
  • Error Running Demo

    Error Running Demo

    Hello, I'm having trouble running the inference (the "Demo" section in the README). Below is a notebook link showing the setup and error.

    Here is the link to the notebook.

    Let me know if anything else needs to be provided.

    Much appreciated!

    opened by alckasoc 3
  • Not able to run in x86 in CPU

    Not able to run in x86 in CPU

    Hi @xingyizhou @noahcao, thank you for sharing this work. When I try to run the script on my x86 machine on CPU with python demo.py --config-file configs/GTR_TAO_DR2101.yaml --video-input docs/yfcc_v_acef1cb6d38c2beab6e69e266e234f.mp4 --output output/demo_yfcc.mp4 --opts MODEL.WEIGHTS GTR_TAO_DR2101.pth, I get the following error:

    Traceback (most recent call last):
      File "/home/sravan/SAT/Tracker/GTR/demo.py", line 161, in <module>
        for vis_frame in demo.run_on_video(video):
      File "/home/sravan/SAT/Tracker/GTR/gtr/predictor.py", line 147, in run_on_video
        outputs = self.video_predictor(frames)
      File "/home/sravan/anaconda3/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
        return func(*args, **kwargs)
      File "/home/sravan/SAT/Tracker/GTR/gtr/predictor.py", line 103, in __call__
        predictions = self.model(inputs)
      File "/home/sravan/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/sravan/SAT/Tracker/GTR/gtr/modeling/meta_arch/gtr_rcnn.py", line 61, in forward
        return self.sliding_inference(batched_inputs)
      File "/home/sravan/SAT/Tracker/GTR/gtr/modeling/meta_arch/gtr_rcnn.py", line 81, in sliding_inference
        instances_wo_id = self.inference(
      File "/home/sravan/SAT/Tracker/GTR/gtr/modeling/meta_arch/custom_rcnn.py", line 107, in inference
        features = self.backbone(images.tensor)
      File "/home/sravan/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/sravan/anaconda3/lib/python3.9/site-packages/detectron2/modeling/backbone/fpn.py", line 126, in forward
        bottom_up_features = self.bottom_up(x)
      File "/home/sravan/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/sravan/SAT/Tracker/GTR/third_party/CenterNet2/centernet/modeling/backbone/res2net.py", line 630, in forward
        x = stage(x)
      File "/home/sravan/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/sravan/anaconda3/lib/python3.9/site-packages/torch/nn/modules/container.py", line 141, in forward
        input = module(input)
      File "/home/sravan/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/sravan/SAT/Tracker/GTR/third_party/CenterNet2/centernet/modeling/backbone/res2net.py", line 457, in forward
        sp = self.convs[i](sp, offset, mask)
      File "/home/sravan/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/sravan/anaconda3/lib/python3.9/site-packages/detectron2/layers/deform_conv.py", line 474, in forward
        x = modulated_deform_conv(
      File "/home/sravan/anaconda3/lib/python3.9/site-packages/detectron2/layers/deform_conv.py", line 211, in forward
        raise NotImplementedError("Deformable Conv is not supported on CPUs!")
    NotImplementedError: Deformable Conv is not supported on CPUs!

    How can I solve this?
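
    For anyone else hitting this: the traceback comes from detectron2's modulated deformable convolution, which has no CPU kernel, so a quick pre-flight check like the following (plain PyTorch) avoids the confusing failure:

    import torch

    # detectron2 implements modulated deformable conv only for CUDA, so the
    # Res2Net-DCN backbone in GTR_TAO_DR2101 cannot run on CPU; fail early:
    if not torch.cuda.is_available():
        raise SystemExit("No CUDA device found: this model's deformable convs require a GPU.")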

    opened by navaravan 2
  • A question about the speed

    A question about the speed

    Thanks for releasing this great work. May I ask for more details about the speed evaluation?

    For TAO, since you use the default detectron2 detector, does the reported 11.2 FPS include the detectron2 detection time, or only the GTR inference time? Also, since the TAO videos may not be sampled at 30 FPS, should that factor be taken into account when converting the inference speed?
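
    Concretely, the conversion I have in mind is something like this (the 30 FPS is an assumption on my side, not a number from the paper):

    inference_fps = 11.2                            # reported processing speed
    video_fps = 30.0                                # assumed source sampling rate
    real_time_factor = inference_fps / video_fps    # > 1 would mean faster than real time
    print(f"real-time factor: {real_time_factor:.2f}")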

    Thanks.

    opened by fandulu 2
  • can't evaluate on MOT17

    can't evaluate on MOT17

    Hi Xingyi,

    I believe the guidelines in the doc have some issues. To be precise, when directly evaluating on MOT17 with:

    python train_net.py --config-file configs/GTR_MOT_FPN.yaml --eval-only MODEL.WEIGHTS  output/GTR_MOT/GTR_MOT_FPN/model_0004999.pth
    

    we get the following error: gtr.tracking.trackeval.utils.TrackEvalException: GT file not found for sequence: MOT17-02-FRCNN

    Besides, to evaluate on the self-split half-val, I assume we need a "gt_val_half.txt" file under each sequence's directory?

    Could you double-check that the guidelines work with the current version and meet the requirements of the TrackEval library you adopted? It seems some guidelines about data splitting and preparation may be missing.
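
    In case it helps, this is the kind of half-split helper I assume is needed, following the common first-half/second-half convention (a sketch; the file names and the frame-index-in-column-1 MOT layout are my assumptions):

    import os

    def split_gt(seq_dir, num_frames):
        """Split gt/gt.txt into gt_train_half.txt / gt_val_half.txt by frame index."""
        half = num_frames // 2
        with open(os.path.join(seq_dir, "gt", "gt.txt")) as f:
            lines = f.readlines()
        # MOT annotation rows start with the frame index
        train = [ln for ln in lines if int(ln.split(",")[0]) <= half]
        val = [ln for ln in lines if int(ln.split(",")[0]) > half]
        with open(os.path.join(seq_dir, "gt", "gt_train_half.txt"), "w") as f:
            f.writelines(train)
        with open(os.path.join(seq_dir, "gt", "gt_val_half.txt"), "w") as f:
            f.writelines(val)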

    opened by noahcao 2
  • Typos in training guidelines?

    Typos in training guidelines?

    Hi Xingyi,

    Thanks for the wonderful work. I tried to run the training on MOT17 following the guidelines, but I ran into what look like typos that prevent it from working:

    1. Should we rename the MOT17 train folder to trainval? This is not explained in the prepare-datasets doc.
    2. Should the training datasets be ("mot17_halftrain", "crowdhuman_train") instead of ("mot17_halftrain", "crowdhuman_amodal_train") in the config file? The latter raises an unregistered-dataset error (see the sketch after this list).
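
    For reference, the override I expect to be correct, expressed with detectron2's config API (a sketch using the plain get_cfg(); GTR's extra config keys are omitted here):

    from detectron2.config import get_cfg

    cfg = get_cfg()
    # replace the unregistered crowdhuman_amodal_train with crowdhuman_train
    cfg.DATASETS.TRAIN = ("mot17_halftrain", "crowdhuman_train")
    print(cfg.DATASETS.TRAIN)
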
    opened by noahcao 2
  • Joint or separate training

    Joint or separate training

    Nice work! Thank you for sharing the code.

    Is the training of the detector and the tracker joint or separate? From the paper (Section 5.2), it seems that the detector is trained first, then frozen, and the tracker is fine-tuned afterwards. Is that the right reading?
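
    To make sure I'm reading it right, here is my understanding of Section 5.2 as a sketch (stand-in module names, not the repo's actual classes):

    import torch
    import torch.nn as nn

    # stand-ins for the real detector / association head, just to show the recipe
    model = nn.ModuleDict({
        "detector": nn.Linear(256, 256),     # trained first, then frozen
        "assoc_head": nn.Linear(256, 256),   # fine-tuned afterwards
    })
    for p in model["detector"].parameters():
        p.requires_grad = False
    optimizer = torch.optim.AdamW(model["assoc_head"].parameters(), lr=1e-4)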

    Thanks, Gurkirt

    opened by gurkirt 2
  • Reproducing Transformer Fine Tuning - TAO

    Reproducing Transformer Fine Tuning - TAO

    I'm following the instructions to reproduce the transformer-head fine-tuning on TAO here: https://github.com/xingyizhou/GTR/blob/master/docs/MODEL_ZOO.md#tao, and I can't seem to get the results reported in the MODEL_ZOO or the paper.

    Here are the steps I'm following:

    1. Download and set up the datasets as described here: https://github.com/xingyizhou/GTR/tree/master/datasets
    2. Download the trained detection model C2_LVISCOCO_DR2101_4x.pth from the link in the third bullet point under the note section in TAO and place it in a models/ directory. The config link in that bullet point is broken, so I'm using the C2_LVISCOCO_DR2101_4x.yaml in the configs/ folder.
    3. Run python train_net.py --num-gpus 8 --config-file configs/C2_LVISCOCO_DR2101_4x.yaml MODEL.WEIGHTS models/C2_LVISCOCO_DR2101_4x.pth. This took about 6 days on 8 Titan X GPUs.

    The reason I believe it didn't train properly: when I run TAO validation on the output model of the training with python train_net.py --config-file configs/GTR_TAO_DR2101.yaml --eval-only MODEL.WEIGHTS output/GTR_TAO_first_train/C2_LVISCOCO_DR2101_4x/model_final.pth, the mAP is 10.6; but when I run TAO validation on the pretrained model GTR_TAO_DR2101.pth downloaded from the MODEL_ZOO with python train_net.py --config-file configs/GTR_TAO_DR2101.yaml --eval-only MODEL.WEIGHTS models/GTR_TAO_DR2101.pth, the output is the correct 22.5 mAP as reported.

    Any ideas why the model training isn't working correctly? Am I using the wrong configuration, or missing something?

    opened by abhik-nd 0
  • Question: What is `data` in gtr/tracking/trackeval/metrics `eval_sequence` methods?

    Question: What is `data` in gtr/tracking/trackeval/metrics `eval_sequence` methods?

    I don't have an issue. I'm just curious what the data variable contains in the eval_sequence methods under https://github.com/xingyizhou/GTR/tree/master/gtr/tracking/trackeval/metrics.

    From looking at clear.py, I'm guessing:

    data = {
        "num_gt_ids": Total number of unique gt ids in the entire sequence,
        "num_tracker_dets": Total number of tracker detections in the entire sequence,
        "num_gt_dets": Total number of gt detections in the entire sequence,
        "gt_ids": [
            [0, 1],  # Zeroth frame.
            [0, 2, 3],
            ...
        ],
         "tracker_ids": [
            [0, 1, 2],  # Zeroth frame.
            [0, 2],
            ... 
        ],
        "similarity_scores": [  # Length is len(video) - 1
             # What is this?
        ]
    }
    

    Are the similarity scores just a list of these: traj_score = torch.mm(asso_nonk, id_inds) # n_k x M?

    It looks like the similarity-score matrix has shape (number of ground-truth instances at frame t, number of tracked instances at frame t). I'm still confused about what exactly it is.
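
    For reference, my working guess is that each per-frame entry is a pairwise box-similarity (e.g. IoU) matrix between ground-truth and tracker boxes, along these lines (my own sketch, not code from the repo):

    import numpy as np

    def iou_matrix(gt_boxes, trk_boxes):
        """Pairwise IoU between (N, 4) gt boxes and (M, 4) tracker boxes, xyxy format."""
        x1 = np.maximum(gt_boxes[:, None, 0], trk_boxes[None, :, 0])
        y1 = np.maximum(gt_boxes[:, None, 1], trk_boxes[None, :, 1])
        x2 = np.minimum(gt_boxes[:, None, 2], trk_boxes[None, :, 2])
        y2 = np.minimum(gt_boxes[:, None, 3], trk_boxes[None, :, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_gt = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
        area_trk = (trk_boxes[:, 2] - trk_boxes[:, 0]) * (trk_boxes[:, 3] - trk_boxes[:, 1])
        union = area_gt[:, None] + area_trk[None, :] - inter
        return inter / np.maximum(union, 1e-9)

    Under that reading, data["similarity_scores"][t] would be iou_matrix(gt_boxes_t, tracker_boxes_t).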

    Is there code provided for producing this data format (i.e., from the instances returned by the sliding_inference method here)?

    Thank you!

    opened by alckasoc 0
  • About gtr_roi_heads.py

    About gtr_roi_heads.py

    In line 242, an exception is raised when len(ind) >= 1, but the relevant case seems to be handled later. How should I understand this code, and what is the likely cause if I keep hitting the exception? Thanks!

    opened by isyangshu 0
  • EfficientDetResizeCrop

    EfficientDetResizeCrop

    Hello, I want to understand some operations in custom_augmentation_impl.py. Specifically, in line 56 the smallest scale factor is chosen, which does not seem to match "Scale the shorter edge to the given size". I'm wondering whether this is a misunderstanding on my part, and what exactly the impact would be.
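
    To make my confusion concrete, here is a small illustration of the two conventions as I understand them (my own sketch with made-up numbers, not the repo's code):

    # For an image of height h and width w and a target size:
    h, w, target = 480, 640, 800

    # "scale the shorter edge to the given size" picks the LARGER per-axis factor:
    scale_short = target / min(h, w)           # 800 / 480 ≈ 1.67 -> output ~(800, 1067)

    # picking the SMALLEST factor instead scales the LONGER edge to the target:
    scale_small = min(target / h, target / w)  # 800 / 640 = 1.25 -> output (600, 800)

    print(scale_short, scale_small)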

    opened by isyangshu 0
  • any tips to real-world det&track scenario?

    any tips to real-world det&track scenario?

    Great work! It works pretty well on TAO. For applying the method to a real-world multi-category scenario, which approach is more beneficial: training on single images with interpolation, or training on labeled video sequences? Thank you.

    opened by dragen1860 0
  • Add Web Demo & Docker environment

    Add Web Demo & Docker environment

    Hey @xingyizhou ! 👋

    Nice work on the global tracking transformer!

    I'm from Replicate, where we're trying to make machine learning reproducible. We noticed you have registered an account with us, and this pull request makes it possible to run your model inside a Docker environment, which makes it easier for other people to run it. We're using an open source tool called Cog to make this process easier.

    This also means we can make a web page where other people can try out your model! View it here: https://replicate.com/xingyizhou/gtr. The Docker file can be found under the 'run model with docker' tab. The demo makes it easy for anyone to upload a customised video and see the result effortlessly.

    We usually add some examples to the page for unregistered users, but we'd like to invite you to claim the page so you can own it, customise the example gallery as you like, and push any future updates to the web demo; we'll feature it on our website and tweet about it too. You can find the 'Claim this model' button at the top of the page.

    Thank you!


    opened by chenxwh 0