Dense Unsupervised Learning for Video Segmentation (NeurIPS*2021)

Overview

Dense Unsupervised Learning for Video Segmentation

License Framework

This repository contains the official implementation of our paper:

Dense Unsupervised Learning for Video Segmentation
Nikita Araslanov, Simone Schaub-Mayer and Stefan Roth
To appear at NeurIPS*2021. [paper] [supp] [talk] [example results] [arXiv]

drawing

We efficiently learn spatio-temporal correspondences
without any supervision, and achieve state-of-the-art
accuracy of video object segmentation.

Contact: Nikita Araslanov fname.lname (at) visinf.tu-darmstadt.de


Installation

Requirements. To reproduce our results, we recommend Python >=3.6, PyTorch >=1.4, CUDA >=10.0. At least one Titan X GPUs (12GB) or equivalent is required. The code was primarily developed under PyTorch 1.8 on a single A100 GPU.

The following steps will set up a local copy of the repository.

  1. Create conda environment:
conda create --name dense-ulearn-vos
source activate dense-ulearn-vos
  1. Install PyTorch >=1.4 (see PyTorch instructions). For example on Linux, run:
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
  1. Install the dependencies:
pip install -r requirements.txt
  1. Download the data:
Dataset Website Target directory with video sequences
YouTube-VOS Link data/ytvos/train/JPEGImages/
OxUvA Link data/OxUvA/images/dev/
TrackingNet Link data/tracking/train/jpegs/
Kinetics-400 Link data/kinetics400/video_jpeg/train/

The last column in this table specifies a path to subdirectories (relative to the project root) containing images of video frames. You can obviously use a different path structure. In this case, you will need to adjust the paths in data/filelists/ for every dataset accordingly.

  1. Download filelists:
cd data/filelists
bash download.sh

This will download lists of training and validation paths for all datasets.

Training

We following bash script will train a ResNet-18 model from scratch on one of the four supported datasets (see above):

bash ./launch/train.sh [ytvos|oxuva|track|kinetics]

We also provide our final models for download.

Dataset Mean J&F (DAVIS-2017) Link MD5
OxUvA 65.3 oxuva_e430_res4.pth (132M) af541[...]d09b3
YouTube-VOS 69.3 ytvos_e060_res4.pth (132M) c3ae3[...]55faf
TrackingNet 69.4 trackingnet_e088_res4.pth (88M) 3e7e9[...]95fa9
Kinetics-400 68.7 kinetics_e026_res4.pth (88M) 086db[...]a7d98

Inference and evaluation

Inference

To run the inference use launch/infer_vos.sh:

bash ./launch/infer_vos.sh [davis|ytvos]

The first argument selects the validation dataset to use (davis for DAVIS-2017; ytvos for YouTube-VOS). The bash variables declared in the script further help to set up the paths for reading the data and the pre-trained models as well as the output directory:

  • EXP, RUN_ID and SNAPSHOT determine the pre-trained model to load.
  • VER specifies a suffix for the output directory (in case you would like to experiment with different configurations for label propagation). Please, refer to launch/infer_vos.sh for their usage.

The inference script will create two directories with the result: [res3|res4|key]_vos and [res3|res4|key]_vis, where the prefix corresponds to the codename of the output CNN layer used in the evaluation (selected in infer_vos.sh using KEY variable). The vos-directory contains the segmentation result ready for evaluation; the vis-directory produces the results for visualisation purposes. You can optionally disable generating the visualisation by setting VERBOSE=False in infer_vos.py.

Evaluation: DAVIS-2017

Please use the official evaluation package. Install the repository, then simply run:

python evaluation_method.py --task semi-supervised --davis_path data/davis2017 --results_path <path-to-vos-directory>

Evaluation: YouTube-VOS 2018

Please use the official CodaLab evaluation server. To create the submission, rename the vos-directory to Annotations and compress it to Annotations.zip for uploading.

Acknowledgements

We thank PyTorch contributors and Allan Jabri for releasing their implementation of the label propagation.

Citation

We hope you find our work useful. If you would like to acknowledge it in your project, please use the following citation:

@inproceedings{Araslanov:2021:DUL,
  author    = {Araslanov, Nikita and Simone Schaub-Mayer and Roth, Stefan},
  title     = {Dense Unsupervised Learning for Video Segmentation},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  volume    = {34},
  year = {2021}
}
Owner
Visual Inference Lab @TU Darmstadt
Visual Inference Lab @TU Darmstadt
Chinese clinical named entity recognition using pre-trained BERT model

Chinese clinical named entity recognition (CNER) using pre-trained BERT model Introduction Code for paper Chinese clinical named entity recognition wi

Xiangyang Li 109 Dec 14, 2022
Attention-based Transformation from Latent Features to Point Clouds (AAAI 2022)

Attention-based Transformation from Latent Features to Point Clouds This repository contains a PyTorch implementation of the paper: Attention-based Tr

12 Nov 11, 2022
RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation

Multipath RefineNet A MATLAB based framework for semantic image segmentation and general dense prediction tasks on images. This is the source code for

Guosheng Lin 575 Dec 06, 2022
Double pendulum simulator using a symplectic Euler's method and Hamiltonian mechanics

Symplectic Double Pendulum Simulator Double pendulum simulator using a symplectic Euler's method. The program calculates the momentum and position of

Scott Marino 1 Jan 12, 2022
StyleGAN2 Webtoon / Anime Style Toonify

StyleGAN2 Webtoon / Anime Style Toonify Korea Webtoon or Japanese Anime Character Stylegan2 base high Quality 1024x1024 / 512x512 Generate and Transfe

121 Dec 21, 2022
N-RPG - Novel role playing game da turfu

N-RPG Ce README sera la page de garde du projet. Contenu Il contiendra la présen

4 Mar 15, 2022
Official implementation of the MM'21 paper Constrained Graphic Layout Generation via Latent Optimization

[MM'21] Constrained Graphic Layout Generation via Latent Optimization This repository provides the official code for the paper "Constrained Graphic La

Kotaro Kikuchi 73 Dec 27, 2022
Uses Open AI Gym environment to create autonomous cryptocurrency bot to trade cryptocurrencies.

Crypto_Bot Uses Open AI Gym environment to create autonomous cryptocurrency bot to trade cryptocurrencies. Steps to get started using the bot: Sign up

21 Oct 03, 2022
Reproduction of Vision Transformer in Tensorflow2. Train from scratch and Finetune.

Vision Transformer(ViT) in Tensorflow2 Tensorflow2 implementation of the Vision Transformer(ViT). This repository is for An image is worth 16x16 words

sungjun lee 42 Dec 27, 2022
A set of examples around hub for creating and processing datasets

Examples for Hub - Dataset Format for AI A repository showcasing examples of using Hub Uploading Dataset Places365 Colab Tutorials Notebook Link Getti

Activeloop 11 Dec 14, 2022
Convolutional Neural Network to detect deforestation in the Amazon Rainforest

Convolutional Neural Network to detect deforestation in the Amazon Rainforest This project is part of my final work as an Aerospace Engineering studen

5 Feb 17, 2022
A python script to dump all the challenges locally of a CTFd-based Capture the Flag.

A python script to dump all the challenges locally of a CTFd-based Capture the Flag. Features Connects and logins to a remote CTFd instance. Dumps all

Podalirius 77 Dec 07, 2022
Code for "Neural 3D Scene Reconstruction with the Manhattan-world Assumption" CVPR 2022 Oral

News 05/10/2022 To make the comparison on ScanNet easier, we provide all quantitative and qualitative results of baselines here, including COLMAP, COL

ZJU3DV 365 Dec 30, 2022
A diff tool for language models

LMdiff Qualitative comparison of large language models. Demo & Paper: http://lmdiff.net LMdiff is a MIT-IBM Watson AI Lab collaboration between: Hendr

Hendrik Strobelt 27 Dec 29, 2022
Official implementation of "Not only Look, but also Listen: Learning Multimodal Violence Detection under Weak Supervision" ECCV2020

XDVioDet Official implementation of "Not only Look, but also Listen: Learning Multimodal Violence Detection under Weak Supervision" ECCV2020. The proj

peng 64 Dec 12, 2022
Explainability for Vision Transformers (in PyTorch)

Explainability for Vision Transformers (in PyTorch) This repository implements methods for explainability in Vision Transformers

Jacob Gildenblat 442 Jan 04, 2023
This is a re-implementation of TransGAN: Two Pure Transformers Can Make One Strong GAN (CVPR 2021) in PyTorch.

TransGAN: Two Transformers Can Make One Strong GAN [YouTube Video] Paper Authors: Yifan Jiang, Shiyu Chang, Zhangyang Wang CVPR 2021 This is re-implem

Ahmet Sarigun 79 Jan 05, 2023
一套完整的微博舆情分析流程代码,包括微博爬虫、LDA主题分析和情感分析。

已经将项目的关键文件上传,包含微博爬虫、LDA主题分析和情感分析三个部分。 1.微博爬虫 实现微博评论爬取和微博用户信息爬取,一天大概十万条。 2.LDA主题分析 实现文档主题抽取,包括数据清洗及分词、主题数的确定(主题一致性和困惑度)和最优主题模型的选择(暴力搜索)。 3.情感分析 实现评论文本的

182 Jan 02, 2023
MoveNet Single Pose on OpenVINO

MoveNet Single Pose tracking on OpenVINO Running Google MoveNet Single Pose models on OpenVINO. A convolutional neural network model that runs on RGB

35 Nov 11, 2022
Tom-the-AI - A compound artificial intelligence software for Linux systems.

Tom the AI (version 0.82) WARNING: This software is not yet ready to use, I'm still setting up the GitHub repository. Should be ready in a few days. T

2 Apr 28, 2022