Referring Video Object Segmentation

Overview

Awesome-Referring-Video-Object-Segmentation Awesome

Welcome to starts ⭐ & comments πŸ’Ή & sharing πŸ˜€ !!

- 2021.12.12: Recent papers (from 2021) 
- welcome to add if any information misses. 😎

Introduction

image

Referring video object segmentation aims at segmenting an object in video with language expressions.

Unlike the previous video object segmentation, the task exploits a different type of supervision, language expressions, to identify and segment an object referred by the given language expressions in a video. A detailed explanation of the new task can be found in the following paper.

Seonguk Seo, Joon-Young Lee, Bohyung Han, β€œURVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark”, European Conference on Computer Vision (ECCV), 2020:https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123600205.pdf

Impressive Works Related to Referring Video Object Segmentation (RVOS)

Cross-modal progressive comprehension for referring segmentation:https://arxiv.org/abs/2105.07175 image

Benchmark

The 3rd Large-scale Video Object Segmentation - Track 3: Referring Video Object Segmentation

Datasets

image

Refer-YouTube-VOS-datasets

  • YouTube-VOS:
wget https://github.com/JerryX1110/awesome-rvos/blob/main/down_YTVOS_w_refer.py
python down_YTVOS_w_refer.py

Folder structure:

${current_path}/
└── refer_youtube_vos/ 
    β”œβ”€β”€ train/
    β”‚   β”œβ”€β”€ JPEGImages/
    β”‚   β”‚   └── */ (video folders)
    β”‚   β”‚       └── *.jpg (frame image files) 
    β”‚   └── Annotations/
    β”‚       └── */ (video folders)
    β”‚           └── *.png (mask annotation files) 
    β”œβ”€β”€ valid/
    β”‚   └── JPEGImages/
    β”‚       └── */ (video folders)
    β”‚           └── *.jpg (frame image files) 
    └── meta_expressions/
        β”œβ”€β”€ train/
        β”‚   └── meta_expressions.json  (text annotations)
        └── valid/
            └── meta_expressions.json  (text annotations)
  • A2D-Sentences:

REPO:https://web.eecs.umich.edu/~jjcorso/r/a2d/

paper:https://arxiv.org/abs/1803.07485

image

Citation:

@misc{gavrilyuk2018actor,
      title={Actor and Action Video Segmentation from a Sentence}, 
      author={Kirill Gavrilyuk and Amir Ghodrati and Zhenyang Li and Cees G. M. Snoek},
      year={2018},
      eprint={1803.07485},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License: The dataset may not be republished in any form without the written consent of the authors.

README Dataset and Annotation (version 1.0, 1.9GB, tar.bz) Evaluation Toolkit (version 1.0, tar.bz)

mkdir a2d_sentences
cd a2d_sentences
wget https://web.eecs.umich.edu/~jjcorso/bigshare/A2D_main_1_0.tar.bz
tar jxvf A2D_main_1_0.tar.bz
mkdir text_annotations

cd text_annotations
wget https://kgavrilyuk.github.io/actor_action/a2d_annotation.txt
wget https://kgavrilyuk.github.io/actor_action/a2d_missed_videos.txt
wget https://github.com/JerryX1110/awesome-rvos/blob/main/down_a2d_annotation_with_instances.py
python down_a2d_annotation_with_instances.py
unzip a2d_annotation_with_instances.zip
#rm a2d_annotation_with_instances.zip
cd ..

cd ..

Folder structure:

${current_path}/
└── a2d_sentences/ 
    β”œβ”€β”€ Release/
    β”‚   β”œβ”€β”€ videoset.csv  (videos metadata file)
    β”‚   └── CLIPS320/
    β”‚       └── *.mp4     (video files)
    └── text_annotations/
        β”œβ”€β”€ a2d_annotation.txt  (actual text annotations)
        β”œβ”€β”€ a2d_missed_videos.txt
        └── a2d_annotation_with_instances/ 
            └── */ (video folders)
                └── *.h5 (annotations files) 

Citation:

@inproceedings{YaXuCaCVPR2017,
  author = {Yan, Y. and Xu, C. and Cai, D. and {\bf Corso}, {\bf J. J.}},
  booktitle = {{Proceedings of IEEE Conference on Computer Vision and Pattern Recognition}},
  tags = {computer vision, activity recognition, video understanding, semantic segmentation},
  title = {Weakly Supervised Actor-Action Segmentation via Robust Multi-Task Ranking},
  year = {2017}
}
@inproceedings{XuCoCVPR2016,
  author = {Xu, C. and {\bf Corso}, {\bf J. J.}},
  booktitle = {{Proceedings of IEEE Conference on Computer Vision and Pattern Recognition}},
  datadownload = {http://web.eecs.umich.edu/~jjcorso/r/a2d},
  tags = {computer vision, activity recognition, video understanding, semantic segmentation},
  title = {Actor-Action Semantic Segmentation with Grouping-Process Models},
  year = {2016}
}
@inproceedings{XuHsXiCVPR2015,
  author = {Xu, C. and Hsieh, S.-H. and Xiong, C. and {\bf Corso}, {\bf J. J.}},
  booktitle = {{Proceedings of IEEE Conference on Computer Vision and Pattern Recognition}},
  datadownload = {http://web.eecs.umich.edu/~jjcorso/r/a2d},
  poster = {http://web.eecs.umich.edu/~jjcorso/pubs/xu_corso_CVPR2015_A2D_poster.pdf},
  tags = {computer vision, activity recognition, video understanding, semantic segmentation},
  title = {Can Humans Fly? {Action} Understanding with Multiple Classes of Actors},
  url = {http://web.eecs.umich.edu/~jjcorso/pubs/xu_corso_CVPR2015_A2D.pdf},
  year = {2015}
}

image

downloading_script

mkdir jhmdb_sentences
cd jhmdb_sentences
wget http://files.is.tue.mpg.de/jhmdb/Rename_Images.tar.gz
wget https://kgavrilyuk.github.io/actor_action/jhmdb_annotation.txt
wget http://files.is.tue.mpg.de/jhmdb/puppet_mask.zip
tar -xzvf  Rename_Images.tar.gz
unzip puppet_mask.zip
cd ..

Folder structure:

${current_path}/
└── jhmdb_sentences/ 
    β”œβ”€β”€ Rename_Images/  (frame images)
    β”‚   └── */ (action dirs)
    β”œβ”€β”€ puppet_mask/  (mask annotations)
    β”‚   └── */ (action dirs)
    └── jhmdb_annotation.txt  (text annotations)

Citation:

@inproceedings{Jhuang:ICCV:2013,
title = {Towards understanding action recognition},
author = {H. Jhuang and J. Gall and S. Zuffi and C. Schmid and M. J. Black},
booktitle = {International Conf. on Computer Vision (ICCV)},
month = Dec,
pages = {3192-3199},
year = {2013}
}

image image image

Owner
Explorer
Explorer
Scripts of Machine Learning Algorithms from Scratch. Implementations of machine learning models and algorithms using nothing but NumPy with a focus on accessibility. Aims to cover everything from basic to advance.

Algo-ScriptML Python implementations of some of the fundamental Machine Learning models and algorithms from scratch. The goal of this project is not t

Algo Phantoms 81 Nov 26, 2022
Top #1 Submission code for the first https://alphamev.ai MEV competition with best AUC (0.9893) and MSE (0.0982).

alphamev-winning-submission Top #1 Submission code for the first alphamev MEV competition with best AUC (0.9893) and MSE (0.0982). The code won't run

70 Oct 29, 2022
PyTorch implementation for COMPLETER: Incomplete Multi-view Clustering via Contrastive Prediction (CVPR 2021)

Completer: Incomplete Multi-view Clustering via Contrastive Prediction This repo contains the code and data of the following paper accepted by CVPR 20

XLearning Group 72 Dec 07, 2022
Classification of Long Sequential Data using Circular Dilated Convolutional Neural Networks

Classification of Long Sequential Data using Circular Dilated Convolutional Neural Networks arXiv preprint: https://arxiv.org/abs/2201.02143. Architec

19 Nov 30, 2022
Code for ICML 2021 paper: How could Neural Networks understand Programs?

OSCAR This repository contains the source code of our ICML 2021 paper How could Neural Networks understand Programs?. Environment Run following comman

Dinglan Peng 115 Dec 17, 2022
Official code for the CVPR 2021 paper "How Well Do Self-Supervised Models Transfer?"

How Well Do Self-Supervised Models Transfer? This repository hosts the code for the experiments in the CVPR 2021 paper How Well Do Self-Supervised Mod

Linus Ericsson 157 Dec 16, 2022
A simple Rock-Paper-Scissors game using CV in python

ML18_Rock-Paper-Scissors-using-CV A simple Rock-Paper-Scissors game using CV in python For IITISOC-21 Rules and procedure to play the interactive game

Anirudha Bhagwat 3 Aug 08, 2021
Code release for ConvNeXt model

A ConvNet for the 2020s Official PyTorch implementation of ConvNeXt, from the following paper: A ConvNet for the 2020s. arXiv 2022. Zhuang Liu, Hanzi

Meta Research 4.6k Jan 08, 2023
End-to-end beat and downbeat tracking in the time domain.

WaveBeat End-to-end beat and downbeat tracking in the time domain. | Paper | Code | Video | Slides | Setup First clone the repo. git clone https://git

Christian J. Steinmetz 60 Dec 24, 2022
πŸ§‘β€πŸ”¬ verify your TEAL program by experiment and observation

Graviton - Testing TEAL with Dry Runs Tutorial Local Installation The following instructions assume that you have make available in your local environ

Algorand 18 Jan 03, 2023
Evaluation Pipeline for our ECCV2020: Journey Towards Tiny Perceptual Super-Resolution.

Journey Towards Tiny Perceptual Super-Resolution Test code for our ECCV2020 paper: https://arxiv.org/abs/2007.04356 Our x4 upscaling pre-trained model

Royson 6 Mar 30, 2022
Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach

This repository holds the implementation for paper Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach Download our preproc

Qitian Wu 42 Dec 27, 2022
A collection of models for image<->text generation in ACM MM 2021.

Bi-directional Image and Text Generation UMT-BITG (image & text generator) Unifying Multimodal Transformer for Bi-directional Image and Text Generatio

Multimedia Research 63 Oct 30, 2022
Code for the prototype tool in our paper "CoProtector: Protect Open-Source Code against Unauthorized Training Usage with Data Poisoning".

CoProtector Code for the prototype tool in our paper "CoProtector: Protect Open-Source Code against Unauthorized Training Usage with Data Poisoning".

Zhensu Sun 1 Oct 26, 2021
Code for the bachelors-thesis flaky fault localization

Flaky_Fault_Localization Scripts for the Bachelors-Thesis: "Flaky Fault Localization" by Christian Kasberger. The thesis examines the usefulness of sp

Christian Kasberger 1 Oct 26, 2021
TrackTech: Real-time tracking of subjects and objects on multiple cameras

TrackTech: Real-time tracking of subjects and objects on multiple cameras This project is part of the 2021 spring bachelor final project of the Bachel

5 Jun 17, 2022
Code for the ECIR'22 paper "Evaluating the Robustness of Retrieval Pipelines with Query Variation Generators"

Query Variation Generators This repository contains the code and annotation data for the ECIR'22 paper "Evaluating the Robustness of Retrieval Pipelin

Gustavo Penha 12 Nov 20, 2022
A PyTorch implementation of "Semi-Supervised Graph Classification: A Hierarchical Graph Perspective" (WWW 2019)

SEAL β €β €β € A PyTorch implementation of Semi-Supervised Graph Classification: A Hierarchical Graph Perspective (WWW 2019) Abstract Node classification an

Benedek Rozemberczki 202 Dec 27, 2022
Codebase for arXiv preprint "NeRF++: Analyzing and Improving Neural Radiance Fields"

NeRF++ Codebase for arXiv preprint "NeRF++: Analyzing and Improving Neural Radiance Fields" Work with 360 capture of large-scale unbounded scenes. Sup

Kai Zhang 722 Dec 28, 2022
GLODISMO: Gradient-Based Learning of Discrete Structured Measurement Operators for Signal Recovery

GLODISMO: Gradient-Based Learning of Discrete Structured Measurement Operators for Signal Recovery This is the code to the paper: Gradient-Based Learn

3 Feb 15, 2022