Referring Video Object Segmentation

Overview

Awesome-Referring-Video-Object-Segmentation

Stars, comments, and sharing are welcome!

- 2021.12.12: recent papers (from 2021) added
- Contributions are welcome if any information is missing.

Introduction


Referring video object segmentation (RVOS) aims to segment an object in a video given a natural-language expression.

Unlike conventional video object segmentation, the task exploits a different type of supervision, language expressions, to identify and segment the object referred to by the given expression in a video. A detailed explanation of the task can be found in the following paper.

Seonguk Seo, Joon-Young Lee, Bohyung Han, "URVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark", European Conference on Computer Vision (ECCV), 2020: https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123600205.pdf

Impressive Works Related to Referring Video Object Segmentation (RVOS)

Cross-modal progressive comprehension for referring segmentation: https://arxiv.org/abs/2105.07175

Benchmark

The 3rd Large-scale Video Object Segmentation - Track 3: Referring Video Object Segmentation

Datasets


Refer-YouTube-VOS-datasets

  • YouTube-VOS:

# fetch the script via the raw URL (a GitHub blob URL returns an HTML page, not the script)
wget https://raw.githubusercontent.com/JerryX1110/awesome-rvos/main/down_YTVOS_w_refer.py
python down_YTVOS_w_refer.py

Folder structure:

${current_path}/
└── refer_youtube_vos/
    ├── train/
    │   ├── JPEGImages/
    │   │   └── */ (video folders)
    │   │       └── *.jpg (frame image files)
    │   └── Annotations/
    │       └── */ (video folders)
    │           └── *.png (mask annotation files)
    ├── valid/
    │   └── JPEGImages/
    │       └── */ (video folders)
    │           └── *.jpg (frame image files)
    └── meta_expressions/
        ├── train/
        │   └── meta_expressions.json  (text annotations)
        └── valid/
            └── meta_expressions.json  (text annotations)
  • A2D-Sentences:

Repo: https://web.eecs.umich.edu/~jjcorso/r/a2d/

Paper: https://arxiv.org/abs/1803.07485


Citation:

@misc{gavrilyuk2018actor,
      title={Actor and Action Video Segmentation from a Sentence}, 
      author={Kirill Gavrilyuk and Amir Ghodrati and Zhenyang Li and Cees G. M. Snoek},
      year={2018},
      eprint={1803.07485},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License: The dataset may not be republished in any form without the written consent of the authors.

Downloads (from the A2D project page):
  • README
  • Dataset and Annotation (version 1.0, 1.9 GB, tar.bz)
  • Evaluation Toolkit (version 1.0, tar.bz)

mkdir a2d_sentences
cd a2d_sentences
wget https://web.eecs.umich.edu/~jjcorso/bigshare/A2D_main_1_0.tar.bz
tar jxvf A2D_main_1_0.tar.bz
mkdir text_annotations

cd text_annotations
wget https://kgavrilyuk.github.io/actor_action/a2d_annotation.txt
wget https://kgavrilyuk.github.io/actor_action/a2d_missed_videos.txt
wget https://raw.githubusercontent.com/JerryX1110/awesome-rvos/main/down_a2d_annotation_with_instances.py
python down_a2d_annotation_with_instances.py
unzip a2d_annotation_with_instances.zip
#rm a2d_annotation_with_instances.zip
cd ..

cd ..

Folder structure:

${current_path}/
└── a2d_sentences/
    ├── Release/
    │   ├── videoset.csv  (videos metadata file)
    │   └── CLIPS320/
    │       └── *.mp4     (video files)
    └── text_annotations/
        ├── a2d_annotation.txt  (actual text annotations)
        ├── a2d_missed_videos.txt
        └── a2d_annotation_with_instances/
            └── */ (video folders)
                └── *.h5 (annotation files)

Citation:

@inproceedings{YaXuCaCVPR2017,
  author = {Yan, Y. and Xu, C. and Cai, D. and {\bf Corso}, {\bf J. J.}},
  booktitle = {{Proceedings of IEEE Conference on Computer Vision and Pattern Recognition}},
  tags = {computer vision, activity recognition, video understanding, semantic segmentation},
  title = {Weakly Supervised Actor-Action Segmentation via Robust Multi-Task Ranking},
  year = {2017}
}
@inproceedings{XuCoCVPR2016,
  author = {Xu, C. and {\bf Corso}, {\bf J. J.}},
  booktitle = {{Proceedings of IEEE Conference on Computer Vision and Pattern Recognition}},
  datadownload = {http://web.eecs.umich.edu/~jjcorso/r/a2d},
  tags = {computer vision, activity recognition, video understanding, semantic segmentation},
  title = {Actor-Action Semantic Segmentation with Grouping-Process Models},
  year = {2016}
}
@inproceedings{XuHsXiCVPR2015,
  author = {Xu, C. and Hsieh, S.-H. and Xiong, C. and {\bf Corso}, {\bf J. J.}},
  booktitle = {{Proceedings of IEEE Conference on Computer Vision and Pattern Recognition}},
  datadownload = {http://web.eecs.umich.edu/~jjcorso/r/a2d},
  poster = {http://web.eecs.umich.edu/~jjcorso/pubs/xu_corso_CVPR2015_A2D_poster.pdf},
  tags = {computer vision, activity recognition, video understanding, semantic segmentation},
  title = {Can Humans Fly? {Action} Understanding with Multiple Classes of Actors},
  url = {http://web.eecs.umich.edu/~jjcorso/pubs/xu_corso_CVPR2015_A2D.pdf},
  year = {2015}
}

  • JHMDB-Sentences:

Downloading script:

mkdir jhmdb_sentences
cd jhmdb_sentences
wget http://files.is.tue.mpg.de/jhmdb/Rename_Images.tar.gz
wget https://kgavrilyuk.github.io/actor_action/jhmdb_annotation.txt
wget http://files.is.tue.mpg.de/jhmdb/puppet_mask.zip
tar -xzvf  Rename_Images.tar.gz
unzip puppet_mask.zip
cd ..

Folder structure:

${current_path}/
└── jhmdb_sentences/
    ├── Rename_Images/  (frame images)
    │   └── */ (action dirs)
    ├── puppet_mask/  (mask annotations)
    │   └── */ (action dirs)
    └── jhmdb_annotation.txt  (text annotations)

Citation:

@inproceedings{Jhuang:ICCV:2013,
  title = {Towards understanding action recognition},
  author = {H. Jhuang and J. Gall and S. Zuffi and C. Schmid and M. J. Black},
  booktitle = {International Conf. on Computer Vision (ICCV)},
  month = dec,
  pages = {3192-3199},
  year = {2013}
}
