Official implementation of CATs: Cost Aggregation Transformers for Visual Correspondence NeurIPS'21

Last update: Jan 04, 2023

Overview

CATs: Cost Aggregation Transformers for Visual Correspondence NeurIPS'21

For more information, check out the paper on [arXiv].

Training with different backbones and evaluations of them are to be updated soon..

Check out our new paper! [arXiv]

Network

Our model CATs is illustrated below:

Environment Settings

git clone https://github.com/SunghwanHong/CATs
cd CATs

conda create -n CATs python=3.6
conda activate CATs

pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install -U scikit-image
pip install git+https://github.com/albumentations-team/albumentations
pip install tensorboardX termcolor timm tqdm requests pandas

Evaluation

Download pre-trained weights on Link
All datasets are automatically downloaded into directory specified by argument datapath

Result on SPair-71k: (PCK 49.9%)

  python test.py --pretrained "/path_to_pretrained_model/spair" --benchmark spair

Result on SPair-71k, feature backbone frozen: (PCK 42.4%)

  python test.py --pretrained "/path_to_pretrained_model/spair_frozen" --benchmark spair

Results on PF-PASCAL: (PCK 75.4%, 92.6%, 96.4%)

  python test.py --pretrained "/path_to_pretrained_model/pfpascal" --benchmark pfpascal

Results on PF-PACAL, feature backbone frozen: (PCK 67.5%, 89.1%, 94.9%)

  python test.py --pretrained "/path_to_pretrained_model/pfpascal_frozen" --benchmark pfpascal

Acknowledgement

We borrow code from public projects (huge thanks to all the projects). We mainly borrow code from DHPF and GLU-Net.

BibTeX

If you find this research useful, please consider citing:

@inproceedings{cho2021cats,
  title={CATs: Cost Aggregation Transformers for Visual Correspondence},
  author={Cho, Seokju and Hong, Sunghwan and Jeon, Sangryul and Lee, Yunsung and Sohn, Kwanghoon and Kim, Seungryong},
  booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
  year={2021}
}

Official implementation of CATs: Cost Aggregation Transformers for Visual Correspondence NeurIPS'21

Related tags

Overview

CATs: Cost Aggregation Transformers for Visual Correspondence NeurIPS'21

Network

Environment Settings

Evaluation

Acknowledgement

BibTeX

Owner

Sunghwan Hong

Focal and Global Knowledge Distillation for Detectors

This is a Python wrapper for TA-LIB based on Cython instead of SWIG.

Libraries, tools and tasks created and used at DeepMind Robotics.

This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages" published in Findings of the Association for Computational Linguistics: ACL 2021.

RodoSol-ALPR Dataset

Implementation of "The Power of Scale for Parameter-Efficient Prompt Tuning"

Code for BMVC2021 "MOS: A Low Latency and Lightweight Framework for Face Detection, Landmark Localization, and Head Pose Estimation"

Safe Bayesian Optimization

Object DGCNN and DETR3D, Our implementations are built on top of MMdetection3D.

CS50x-AI - Artificial Intelligence with Python from Harvard University

Official repo of the paper "Surface Form Competition: Why the Highest Probability Answer Isn't Always Right"

Pytorch implementation for A-NeRF: Articulated Neural Radiance Fields for Learning Human Shape, Appearance, and Pose

Implementation for "Exploiting Aliasing for Manga Restoration" (CVPR 2021)

FAST Aiming at the problems of cumbersome steps and slow download speed of GNSS data

Task Transformer Network for Joint MRI Reconstruction and Super-Resolution (MICCAI 2021)

Code release for ConvNeXt model

"Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback"

Utilities to bridge Canvas-generated course rosters with GitLab's API.

Leaderboard and Visualization for RLCard

Official repository accompanying a CVPR 2022 paper EMOCA: Emotion Driven Monocular Face Capture And Animation. EMOCA takes a single image of a face as input and produces a 3D reconstruction. EMOCA sets the new standard on reconstructing highly emotional images in-the-wild