(CVPR 2022) Pytorch implementation of "Self-supervised transformers for unsupervised object discovery using normalized cut"

Overview

(CVPR 2022) TokenCut

Pytorch implementation of Tokencut:

Self-supervised Transformers for Unsupervised Object Discovery using Normalized Cut

Yangtao Wang, Xi Shen, Shell Xu Hu, Yuan Yuan, James L. Crowley, Dominique Vaufreydaz

[Project page] [Paper] Colab demo Hugging Face Spaces

TokenCut teaser

If our project is helpful for your research, please consider citing :

@inproceedings{wang2022tokencut,
          title={Self-supervised Transformers for Unsupervised Object Discovery using Normalized Cut},
          author={Wang, Yangtao and Shen, Xi and Hu, Shell Xu and Yuan, Yuan and Crowley, James L. and Vaufreydaz, Dominique},
          booktitle={Conference on Computer Vision and Pattern Recognition}
          year={2022}
        }

Table of Content

1. Updates

03/10/2022 Creating a 480p Demo using Gradio. Try out the Web Demo: Hugging Face Spaces

Internet image results:

TokenCut visualizations TokenCut visualizations TokenCut visualizations TokenCut visualizations

02/26/2022 Integrated into Huggingface Spaces 🤗 using Gradio. Try out the Web Demo: Hugging Face Spaces

02/26/2022 A simple TokenCut Colab Demo is available.

02/21/2022 Initial commit: Code of TokenCut is released, including evaluation of unsupervised object discovery, unsupervised saliency object detection, weakly supervised object locolization.

2. Installation

2.1 Dependencies

This code was implemented with Python 3.7, PyTorch 1.7.1 and CUDA 11.2. Please refer to the official installation. If CUDA 10.2 has been properly installed :

pip install torch==1.7.1 torchvision==0.8.2

In order to install the additionnal dependencies, please launch the following command:

pip install -r requirements.txt

2.2 Data

We provide quick download commands in DOWNLOAD_DATA.md for VOC2007, VOC2012, COCO, CUB, ImageNet, ECSSD, DUTS and DUT-OMRON as well as DINO checkpoints.

3. Quick Start

3.1 Detecting an object in one image

We provide TokenCut visualization for bounding box prediction and attention map. Using all for all visualization results.

python main_tokencut.py --image_path examples/VOC07_000036.jpg --visualize pred
python main_tokencut.py --image_path examples/VOC07_000036.jpg --visualize attn
python main_tokencut.py --image_path examples/VOC07_000036.jpg --visualize all 

3.2 Segmenting a salient region in one image

We provide TokenCut segmentation results as follows:

cd unsupervised_saliency_detection 
python get_saliency.py --sigma-spatial 16 --sigma-luma 16 --sigma-chroma 8 --vit-arch small --patch-size 16 --img-path ../examples/VOC07_000036.jpg --out-dir ./output

4. Evaluation

Following are the different steps to reproduce the results of TokenCut presented in the paper.

4.1 Unsupervised object discovery

TokenCut visualizations TokenCut visualizations TokenCut visualizations

PASCAL-VOC

In order to apply TokenCut and compute corloc results (VOC07 68.8, VOC12 72.1), please launch:

python main_tokencut.py --dataset VOC07 --set trainval
python main_tokencut.py --dataset VOC12 --set trainval

If you want to extract Dino features, which corresponds to the KEY features in DINO:

mkdir features
python main_lost.py --dataset VOC07 --set trainval --save-feat-dir features/VOC2007

COCO

Results are provided given the 2014 annotations following previous works. The following command line allows you to get results on the subset of 20k images of the COCO dataset (corloc 58.8), following previous litterature. To be noted that the 20k images are a subset of the train set.

python main_tokencut.py --dataset COCO20k --set train

Different models

We have tested the method on different setups of the VIT model, corloc results are presented in the following table (more can be found in the paper).

arch pre-training dataset
VOC07 VOC12 COCO20k
ViT-S/16 DINO 68.8 72.1 58.8
ViT-S/8 DINO 67.3 71.6 60.7
ViT-B/16 DINO 68.8 72.4 59.0

Previous results on the dataset VOC07 can be obtained by launching:

python main_tokencut.py --dataset VOC07 --set trainval #VIT-S/16
python main_tokencut.py --dataset VOC07 --set trainval --patch_size 8 #VIT-S/8
python main_tokencut.py --dataset VOC07 --set trainval --arch vit_base #VIT-B/16

4.2 Unsupervised saliency detection

TokenCut visualizations TokenCut visualizations TokenCut visualizations

To evaluate on ECSSD, DUTS, DUT_OMRON dataset:

python get_saliency.py --out-dir ECSSD --sigma-spatial 16 --sigma-luma 16 --sigma-chroma 8 --nb-vis 1 --vit-arch small --patch-size 16 --dataset ECSSD

python get_saliency.py --out-dir DUTS --sigma-spatial 16 --sigma-luma 16 --sigma-chroma 8 --nb-vis 1 --vit-arch small --patch-size 16 --dataset DUTS

python get_saliency.py --out-dir DUT --sigma-spatial 16 --sigma-luma 16 --sigma-chroma 8 --nb-vis 1 --vit-arch small --patch-size 16 --dataset DUT

This should give:

Method ECSSD DUTS DUT-OMRON
maxF IoU Acc maxF IoU Acc maxF IoU Acc
TokenCut 80.3 71.2 91.8 67.2 57.6 90.3 60.0 53.3 88.0
TokenCut + BS 87.4 77.2 93.4 75.5 62,4 91.4 69.7 61.8 89.7

4.3 Weakly supervised object detection

TokenCut visualizations TokenCut visualizations TokenCut visualizations

Fintune DINO small on CUB

To finetune ViT-S/16 on CUB on a single node with 4 gpus for 1000 epochs run:

python -m torch.distributed.launch --nproc_per_node=4 main.py --data_path /path/to/data --batch_size_per_gpu 256 --dataset cub --weight_decay 0.005 --pretrained_weights ./dino_deitsmall16_pretrain.pth --epoch 1000 --output_dir ./path/to/checkpoin --lr 2e-4 --warmup-epochs 50 --num_labels 200 --num_workers 16 --n_last_blocks 1 --avgpool_patchtokens true --arch vit_small --patch_size 16

Evaluation on CUB

To evaluate a fine-tuned ViT-S/16 on CUB val with a single GPU run:

python eval.py --pretrained_weights ./path/to/checkpoint --dataset cub --data_path ./path/to/data --batch_size_per_gpu 1 --no_center_crop

This should give:

Top1_cls: 79.12, top5_cls94.80, gt_loc: 0.914, top1_loc:0.723

Evaluate on Imagenet

To Evaluate ViT-S/16 finetuned on ImageNet val with a single GPU run:

python eval.py --pretrained_weights /path/to/checkpoint --classifier_weights /path/to/linear_weights--dataset imagenet --data_path ./path/to/data --batch_size_per_gpu 1 --num_labels 1000 --batch_size_per_gpu 1 --no_center_crop --input_size 256 --tau 0.2 --patch_size 16 --arch vit_small

5. Acknowledgement

TokenCut code is built on top of LOST, DINO, Segswap, and Bilateral_Sovlver. We would like to sincerely thanks those authors for their great works.

Owner
YANGTAO WANG
PhD, Computer Vision, Deep Learning
YANGTAO WANG
CvT2DistilGPT2 is an encoder-to-decoder model that was developed for chest X-ray report generation.

CvT2DistilGPT2 Improving Chest X-Ray Report Generation by Leveraging Warm-Starting This repository houses the implementation of CvT2DistilGPT2 from [1

The Australian e-Health Research Centre 21 Dec 28, 2022
Official implementation of "Articulation Aware Canonical Surface Mapping"

Articulation-Aware Canonical Surface Mapping Nilesh Kulkarni, Abhinav Gupta, David F. Fouhey, Shubham Tulsiani Paper Project Page Requirements Python

Nilesh Kulkarni 56 Dec 16, 2022
CCPD: a diverse and well-annotated dataset for license plate detection and recognition

CCPD (Chinese City Parking Dataset, ECCV) UPdate on 10/03/2019. CCPD Dataset is now updated. We are confident that images in subsets of CCPD is much m

detectRecog 1.8k Dec 30, 2022
Joint Discriminative and Generative Learning for Person Re-identification. CVPR'19 (Oral)

Joint Discriminative and Generative Learning for Person Re-identification [Project] [Paper] [YouTube] [Bilibili] [Poster] [Supp] Joint Discriminative

NVIDIA Research Projects 1.2k Dec 30, 2022
Large scale and asynchronous Hyperparameter Optimization at your fingertip.

Syne Tune This package provides state-of-the-art distributed hyperparameter optimizers (HPO) where trials can be evaluated with several backend option

Amazon Web Services - Labs 236 Jan 01, 2023
Detecting Potentially Harmful and Protective Suicide-related Content on Twitter

TwitterSuicideML Scripts for reproducing the Machine Learning analysis of the paper: Detecting Potentially Harmful and Protective Suicide-related Cont

3 Oct 17, 2022
Code for the paper "Improving Vision-and-Language Navigation with Image-Text Pairs from the Web" (ECCV 2020)

Improving Vision-and-Language Navigation with Image-Text Pairs from the Web Arjun Majumdar, Ayush Shrivastava, Stefan Lee, Peter Anderson, Devi Parikh

Arjun Majumdar 44 Dec 14, 2022
Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation, NeurIPS 2021 Spotlight

PCAN for Multiple Object Tracking and Segmentation This is the offical implementation of paper PCAN for MOTS. We also present a trailer that consists

ETH VIS Group 328 Dec 29, 2022
Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language (NeurIPS 2021)

VRDP (NeurIPS 2021) Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language Mingyu Ding, Zhenfang Chen, Tao Du, Pin

Mingyu Ding 36 Sep 20, 2022
An Straight Dilated Network with Wavelet for image Deblurring

SDWNet: A Straight Dilated Network with Wavelet Transformation for Image Deblurring(offical) 1. Introduction This repo is not only used for our paper(

FlyEgle 41 Jan 04, 2023
Python port of R's Comprehensive Dynamic Time Warp algorithm package

Welcome to the dtw-python package Comprehensive implementation of Dynamic Time Warping algorithms. DTW is a family of algorithms which compute the loc

Dynamic Time Warping algorithms 154 Dec 26, 2022
Source code of the paper "Deep Learning of Latent Variable Models for Industrial Process Monitoring".

Source code of the paper "Deep Learning of Latent Variable Models for Industrial Process Monitoring".

Xiangyin Kong 7 Nov 08, 2022
Real-time Neural Representation Fusion for Robust Volumetric Mapping

NeuralBlox: Real-Time Neural Representation Fusion for Robust Volumetric Mapping Paper | Supplementary This repository contains the implementation of

ETHZ ASL 106 Dec 24, 2022
TensorFlow (Python) implementation of DeepTCN model for multivariate time series forecasting.

DeepTCN TensorFlow TensorFlow (Python) implementation of multivariate time series forecasting model introduced in Chen, Y., Kang, Y., Chen, Y., & Wang

Flavia Giammarino 21 Dec 19, 2022
Mixed Neural Likelihood Estimation for models of decision-making

Mixed neural likelihood estimation for models of decision-making Mixed neural likelihood estimation (MNLE) enables Bayesian parameter inference for mo

mackelab 9 Dec 22, 2022
Facial detection, landmark tracking and expression transfer library for Windows, Linux and Mac

Welcome to the CSIRO Face Analysis SDK. Documentation for the SDK can be found in doc/documentation.html. All code in this SDK is provided according t

Luiz Carlos Vieira 7 Jul 16, 2020
RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching

RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching This repository contains the source code for our paper: RAFT-Stereo: Multilevel

Princeton Vision & Learning Lab 328 Jan 09, 2023
Video Corpus Moment Retrieval with Contrastive Learning (SIGIR 2021)

Video Corpus Moment Retrieval with Contrastive Learning PyTorch implementation for the paper "Video Corpus Moment Retrieval with Contrastive Learning"

ZHANG HAO 42 Dec 29, 2022
Code release for the ICML 2021 paper "PixelTransformer: Sample Conditioned Signal Generation".

PixelTransformer Code release for the ICML 2021 paper "PixelTransformer: Sample Conditioned Signal Generation". Project Page Installation Please insta

Shubham Tulsiani 24 Dec 17, 2022
Sequence to Sequence Models with PyTorch

Sequence to Sequence models with PyTorch This repository contains implementations of Sequence to Sequence (Seq2Seq) models in PyTorch At present it ha

Sandeep Subramanian 708 Dec 19, 2022