AOT (Associating Objects with Transformers) in PyTorch

Overview

AOT (Associating Objects with Transformers) in PyTorch

A modular reference PyTorch implementation of Associating Objects with Transformers for Video Object Segmentation (NIPS 2021). [paper]

alt text

alt text

Highlights

  • High performance: up to 85.5% (R50-AOTL) on YouTube-VOS 2018 and 82.1% (SwinB-AOTL) on DAVIS-2017 Test-dev under standard settings.
  • High efficiency: up to 51fps (AOTT) on DAVIS-2017 (480p) even with 10 objects and 41fps on YouTube-VOS (1.3x480p). AOT can process multiple objects (less than a pre-defined number, 10 in default) as efficiently as processing a single object. This project also supports inferring any number of objects together within a video by automatic separation and aggregation.
  • Multi-GPU training and inference
  • Mixed precision training and inference
  • Test-time augmentation: multi-scale and flipping augmentations are supported.

TODO

  • Code documentation
  • Demo tool
  • Adding your own dataset

Requirements

  • Python3
  • pytorch >= 1.7.0 and torchvision
  • opencv-python
  • Pillow

Optional (for better efficiency):

  • Pytorch Correlation (recommend to install from source instead of using pip)

Demo

Coming

Model Zoo and Results

Pre-trained models and corresponding results reproduced by this project can be found in MODEL_ZOO.md.

Getting Started

  1. Prepare datasets:

    Please follow the below instruction to prepare datasets in each correspondding folder.

    • Static

      datasets/Static: pre-training dataset with static images. A guidance can be found in AFB-URR.

    • YouTube-VOS

      A commonly-used large-scale VOS dataset.

      datasets/YTB/2019: version 2019, download link. train is required for training. valid (6fps) and valid_all_frames (30fps, optional) are used for evaluation.

      datasets/YTB/2018: version 2018, download link. Only valid (6fps) and valid_all_frames (30fps, optional) are required for this project and used for evaluation.

    • DAVIS

      A commonly-used small-scale VOS dataset.

      datasets/DAVIS: TrainVal (480p) contains both the training and validation split. Test-Dev (480p) contains the Test-dev split. The full-resolution version is also supported for training and evluation but not required.

  2. Prepare ImageNet pre-trained encoders

    Select and download below checkpoints into pretrain_models:

    The current default training configs are not optimized for encoders larger than ResNet-50. If you want to use larger encoders, we recommond to early stop the main-training stage at 80,000 iteration (100,000 in default) to avoid over-fitting on the seen classes of YouTube-VOS.

  3. Training and Evaluation

    The example script will train AOTT with 2 stages using 4 GPUs and auto-mixed precision (--amp). The first stage is a pre-training stage using Static dataset, and the second stage is main-training stage, which uses both YouTube-VOS 2019 train and DAVIS-2017 train for training, resulting in a model can generalize to different domains (YouTube-VOS and DAVIS) and different frame rates (6fps, 24fps, and 30fps).

    Notably, you can use only the YouTube-VOS 2019 train split in the second stage by changing pre_ytb_dav to pre_ytb, which leads to better YouTube-VOS performance on unseen classes. Besides, if you don't want to do the first stage, you can start the training from stage ytb, but the performance will drop about 1~2% absolutely.

    After the training is finished, the example script will evaluate the model on YouTube-VOS and DAVIS, and the results will be packed into Zip files. For calculating scores, please use offical YouTube-VOS servers (2018 server and 2019 server) and offical DAVIS toolkit.

Adding your own dataset

Coming

Troubleshooting

Waiting

Citations

Please consider citing the related paper(s) in your publications if it helps your research.

@inproceedings{yang2021aot,
  title={Associating Objects with Transformers for Video Object Segmentation},
  author={Yang, Zongxin and Wei, Yunchao and Yang, Yi},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2021}
}

License

This project is released under the BSD-3-Clause license. See LICENSE for additional details.

Owner
CS graduate student, Zhejiang University.
This is a pytorch implementation of the NeurIPS paper GAN Memory with No Forgetting.

GAN Memory for Lifelong learning This is a pytorch implementation of the NeurIPS paper GAN Memory with No Forgetting. Please consider citing our paper

Miaoyun Zhao 43 Dec 27, 2022
Probabilistic Programming and Statistical Inference in PyTorch

PtStat Probabilistic Programming and Statistical Inference in PyTorch. Introduction This project is being developed during my time at Cogent Labs. The

Stefano Peluchetti 109 Nov 26, 2022
Code for "ATISS: Autoregressive Transformers for Indoor Scene Synthesis", NeurIPS 2021

ATISS: Autoregressive Transformers for Indoor Scene Synthesis This repository contains the code that accompanies our paper ATISS: Autoregressive Trans

138 Dec 22, 2022
Python based framework for Automatic AI for Regression and Classification over numerical data.

Python based framework for Automatic AI for Regression and Classification over numerical data. Performs model search, hyper-parameter tuning, and high-quality Jupyter Notebook code generation.

BlobCity, Inc 141 Dec 21, 2022
Fast Differentiable Matrix Sqrt Root

Official Pytorch implementation of ICLR 22 paper Fast Differentiable Matrix Square Root

YueSong 42 Dec 30, 2022
(CVPR2021) ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic

ClassSR (CVPR2021) ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic Paper Authors: Xiangtao Kong, Hengyuan

Xiangtao Kong 308 Jan 05, 2023
Meta Self-learning for Multi-Source Domain Adaptation: A Benchmark

Meta Self-Learning for Multi-Source Domain Adaptation: A Benchmark Project | Arxiv | YouTube | | Abstract In recent years, deep learning-based methods

CVSM Group - email: <a href=[email protected]"> 188 Dec 12, 2022
A Repository of Community-Driven Natural Instructions

A Repository of Community-Driven Natural Instructions TLDR; this repository maintains a community effort to create a large collection of tasks and the

AI2 244 Jan 04, 2023
TensorFlow, PyTorch and Numpy layers for generating Orthogonal Polynomials

OrthNet TensorFlow, PyTorch and Numpy layers for generating multi-dimensional Orthogonal Polynomials 1. Installation 2. Usage 3. Polynomials 4. Base C

Chuan 29 May 25, 2022
Model Serving Made Easy

The easiest way to build Machine Learning APIs BentoML makes moving trained ML models to production easy: Package models trained with any ML framework

BentoML 4.4k Jan 08, 2023
[CVPR 2021] Official PyTorch Implementation for "Iterative Filter Adaptive Network for Single Image Defocus Deblurring"

IFAN: Iterative Filter Adaptive Network for Single Image Defocus Deblurring Checkout for the demo (GUI/Google Colab)! The GUI version might occasional

Junyong Lee 173 Dec 30, 2022
Official repository of "DeepMIH: Deep Invertible Network for Multiple Image Hiding", TPAMI 2022.

DeepMIH: Deep Invertible Network for Multiple Image Hiding (TPAMI 2022) This repo is the official code for DeepMIH: Deep Invertible Network for Multip

Junpeng Jing 67 Nov 22, 2022
Real-Time Semantic Segmentation in Mobile device

Real-Time Semantic Segmentation in Mobile device This project is an example project of semantic segmentation for mobile real-time app. The architectur

708 Jan 01, 2023
MolRep: A Deep Representation Learning Library for Molecular Property Prediction

MolRep: A Deep Representation Learning Library for Molecular Property Prediction Summary MolRep is a Python package for fairly measuring algorithmic p

AI-Health @NSCC-gz 83 Dec 24, 2022
RAMA: Rapid algorithm for multicut problem

RAMA: Rapid algorithm for multicut problem Solves multicut (correlation clustering) problems orders of magnitude faster than CPU based solvers without

Paul Swoboda 60 Dec 13, 2022
PyTorch implementation of DeepDream algorithm

neural-dream This is a PyTorch implementation of DeepDream. The code is based on neural-style-pt. Here we DeepDream a photograph of the Golden Gate Br

121 Nov 05, 2022
Group Fisher Pruning for Practical Network Compression(ICML2021)

Group Fisher Pruning for Practical Network Compression (ICML2021) By Liyang Liu*, Shilong Zhang*, Zhanghui Kuang, Jing-Hao Xue, Aojun Zhou, Xinjiang W

Shilong Zhang 129 Dec 13, 2022
Rule based classification A hotel s customers dataset

Rule-based-classification-A-hotel-s-customers-dataset- Aim: Categorize new customers by segment and predict how much revenue they can generate This re

Şebnem 4 Jan 02, 2022
This repository contains the code and models for the following paper.

DC-ShadowNet Introduction This is an implementation of the following paper DC-ShadowNet: Single-Image Hard and Soft Shadow Removal Using Unsupervised

AuAgCu 65 Dec 27, 2022