Custom implementation of Corrleation Module

Overview

PyPI

Pytorch Correlation module

this is a custom C++/Cuda implementation of Correlation module, used e.g. in FlowNetC

This tutorial was used as a basis for implementation, as well as NVIDIA's cuda code

  • Build and Install C++ and CUDA extensions by executing python setup.py install,
  • Benchmark C++ vs. CUDA by running python benchmark.py {cpu, cuda},
  • Run gradient checks on the code by running python grad_check.py --backend {cpu, cuda}.

Requirements

This module is expected to compile for Pytorch 1.6.

Installation

this module is available on pip

pip install spatial-correlation-sampler

For a cpu-only version, you can install from source with

python setup_cpu.py install

Known Problems

This module needs compatible gcc version and CUDA to be compiled. Namely, CUDA 9.1 and below will need gcc5, while CUDA 9.2 and 10.0 will need gcc7 See this issue for more information

Usage

API has a few difference with NVIDIA's module

  • output is now a 5D tensor, which reflects the shifts horizontal and vertical.
input (B x C x H x W) -> output (B x PatchH x PatchW x oH x oW)
  • Output sizes oH and oW are no longer dependant of patch size, but only of kernel size and padding
  • Patch size patch_size is now the whole patch, and not only the radii.
  • stride1 is now stride andstride2 is dilation_patch, which behave like dilated convolutions
  • equivalent max_displacement is then dilation_patch * (patch_size - 1) / 2.
  • dilation is a new parameter, it acts the same way as dilated convolution regarding the correlation kernel
  • to get the right parameters for FlowNetC, you would have
kernel_size=1
patch_size=21,
stride=1,
padding=0,
dilation=1
dilation_patch=2

Example

import torch
from spatial_correlation_sampler import SpatialCorrelationSampler, 

device = "cuda"
batch_size = 1
channel = 1
H = 10
W = 10
dtype = torch.float32

input1 = torch.randint(1, 4, (batch_size, channel, H, W), dtype=dtype, device=device, requires_grad=True)
input2 = torch.randint_like(input1, 1, 4).requires_grad_(True)

#You can either use the function or the module. Note that the module doesn't contain any parameter tensor.

#function

out = spatial_correlation_sample(input1,
	                         input2,
                                 kernel_size=3,
                                 patch_size=1,
                                 stride=2,
                                 padding=0,
                                 dilation=2,
                                 dilation_patch=1)

#module

correlation_sampler = SpatialCorrelationSampler(
    kernel_size=3,
    patch_size=1,
    stride=2,
    padding=0,
    dilation=2,
    dilation_patch=1)
out = correlation_sampler(input1, input2)

Benchmark

  • default parameters are from benchmark.py, FlowNetC parameters are same as use in FlowNetC with a batch size of 4, described in this paper, implemented here and here.
  • Feel free to file an issue to add entries to this with your hardware !

CUDA Benchmark

  • See here for a benchmark script working with NVIDIA's code, and Pytorch.
  • Benchmark are launched with environment variable CUDA_LAUNCH_BLOCKING set to 1.
  • Only float32 is benchmarked.
  • FlowNetC correlation parameters where launched with the following command:
CUDA_LAUNCH_BLOCKING=1 python benchmark.py --scale ms -k1 --patch 21 -s1 -p0 --patch_dilation 2 -b4 --height 48 --width 64 -c256 cuda -d float

CUDA_LAUNCH_BLOCKING=1 python NV_correlation_benchmark.py --scale ms -k1 --patch 21 -s1 -p0 --patch_dilation 2 -b4 --height 48 --width 64 -c256
implementation Correlation parameters device pass min time avg time
ours default 980 GTX forward 5.745 ms 5.851 ms
ours default 980 GTX backward 77.694 ms 77.957 ms
NVIDIA default 980 GTX forward 13.779 ms 13.853 ms
NVIDIA default 980 GTX backward 73.383 ms 73.708 ms
ours FlowNetC 980 GTX forward 26.102 ms 26.179 ms
ours FlowNetC 980 GTX backward 208.091 ms 208.510 ms
NVIDIA FlowNetC 980 GTX forward 35.363 ms 35.550 ms
NVIDIA FlowNetC 980 GTX backward 283.748 ms 284.346 ms

Notes

  • The overhead of our implementation regarding kernel_size > 1 during backward needs some investigation, feel free to dive in the code to improve it !
  • The backward pass of NVIDIA is not entirely correct when stride1 > 1 and kernel_size > 1, because not everything is computed, see here.

CPU Benchmark

  • No other implementation is avalaible on CPU.
  • It is obviously not recommended to run it on CPU if you have a GPU.
Correlation parameters device pass min time avg time
default E5-2630 v3 @ 2.40GHz forward 159.616 ms 188.727 ms
default E5-2630 v3 @ 2.40GHz backward 282.641 ms 294.194 ms
FlowNetC E5-2630 v3 @ 2.40GHz forward 2.138 s 2.144 s
FlowNetC E5-2630 v3 @ 2.40GHz backward 7.006 s 7.075 s
Owner
Clément Pinard
PhD ENSTA Paris, Deep Learning Engineer @ ContentSquare
Clément Pinard
An addernet CUDA version

Training addernet accelerated by CUDA Usage cd adder_cuda python setup.py install cd .. python main.py Environment pytorch 1.10.0 CUDA 11.3 benchmark

LingXY 4 Jun 20, 2022
Official Repository for our ICCV2021 paper: Continual Learning on Noisy Data Streams via Self-Purified Replay

Continual Learning on Noisy Data Streams via Self-Purified Replay This repository contains the official PyTorch implementation for our ICCV2021 paper.

Jinseo Jeong 22 Nov 23, 2022
Semi-Supervised Semantic Segmentation with Pixel-Level Contrastive Learning from a Class-wise Memory Bank

This repository provides the official code for replicating experiments from the paper: Semi-Supervised Semantic Segmentation with Pixel-Level Contrast

Iñigo Alonso Ruiz 58 Dec 15, 2022
⚾🤖⚾ Automatic baseball pitching overlay in realtime

⚾ Automatically overlaying pitch motion and trajectory with machine learning! This project takes your baseball pitching clips and automatically genera

Tony Chou 240 Dec 05, 2022
DeepLab resnet v2 model in pytorch

pytorch-deeplab-resnet DeepLab resnet v2 model implementation in pytorch. The architecture of deepLab-ResNet has been replicated exactly as it is from

Isht Dwivedi 601 Dec 22, 2022
PyTorch implementation of DD3D: Is Pseudo-Lidar needed for Monocular 3D Object detection?

PyTorch implementation of DD3D: Is Pseudo-Lidar needed for Monocular 3D Object detection? (ICCV 2021), Dennis Park*, Rares Ambrus*, Vitor Guizilini, Jie Li, and Adrien Gaidon.

Toyota Research Institute - Machine Learning 364 Dec 27, 2022
Image transformations designed for Scene Text Recognition (STR) data augmentation. Published at ICCV 2021 Workshop on Interactive Labeling and Data Augmentation for Vision.

Data Augmentation for Scene Text Recognition (ICCV 2021 Workshop) (Pronounced as "strog") Paper Arxiv Why it matters? Scene Text Recognition (STR) req

Rowel Atienza 152 Dec 28, 2022
minimizer-space de Bruijn graphs (mdBG) for whole genome assembly

rust-mdbg: Minimizer-space de Bruijn graphs (mdBG) for whole-genome assembly rust-mdbg is an ultra-fast minimizer-space de Bruijn graph (mdBG) impleme

Barış Ekim 148 Dec 01, 2022
MassiveSumm: a very large-scale, very multilingual, news summarisation dataset

MassiveSumm: a very large-scale, very multilingual, news summarisation dataset This repository contains links to data and code to fetch and reproduce

Daniel Varab 19 Dec 16, 2022
Variational autoencoder for anime face reconstruction

VAE animeface Variational autoencoder for anime face reconstruction Introduction This repository is an exploratory example to train a variational auto

Minzhe Zhang 2 Dec 11, 2021
Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection

Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection

61 Jan 07, 2023
Official implementation of SynthTIGER (Synthetic Text Image GEneratoR) ICDAR 2021

🐯 SynthTIGER: Synthetic Text Image GEneratoR Official implementation of SynthTIGER | Paper | Datasets Moonbin Yim1, Yoonsik Kim1, Han-cheol Cho1, Sun

Clova AI Research 256 Jan 05, 2023
Decensoring Hentai with Deep Neural Networks. Formerly named DeepMindBreak.

DeepCreamPy Decensoring Hentai with Deep Neural Networks. Formerly named DeepMindBreak. A deep learning-based tool to automatically replace censored a

616 Jan 06, 2023
DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models

DSEE Codes for [Preprint] DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models Xuxi Chen, Tianlong Chen, Yu Cheng, Weizhu Ch

VITA 4 Dec 27, 2021
SpineAI Bilsky Grading With Python

SpineAI-Bilsky-Grading SpineAI Paper with Code 📫 Contact Address correspondence to J.T.P.D.H. (e-mail: james_hallinan AT nuhs.edu.sg) Disclaimer This

<a href=[email protected]"> 2 Dec 16, 2021
Adaptive Pyramid Context Network for Semantic Segmentation (APCNet CVPR'2019)

Adaptive Pyramid Context Network for Semantic Segmentation (APCNet CVPR'2019) Introduction Official implementation of Adaptive Pyramid Context Network

21 Nov 09, 2022
We have made you a wrapper you can't refuse

We have made you a wrapper you can't refuse We have a vibrant community of developers helping each other in our Telegram group. Join us! Stay tuned fo

20.6k Jan 09, 2023
Repository for the COLING 2020 paper "Explainable Automated Fact-Checking: A Survey."

Explainable Fact Checking: A Survey This repository and the accompanying webpage contain resources for the paper "Explainable Fact Checking: A Survey"

Neema Kotonya 42 Nov 17, 2022
Free Book about Deep-Learning approaches for Chess (like AlphaZero, Leela Chess Zero and Stockfish NNUE)

Free Book about Deep-Learning approaches for Chess (like AlphaZero, Leela Chess Zero and Stockfish NNUE)

Dominik Klein 189 Dec 21, 2022
Lunar is a neural network aimbot that uses real-time object detection accelerated with CUDA on Nvidia GPUs.

Lunar Lunar is a neural network aimbot that uses real-time object detection accelerated with CUDA on Nvidia GPUs. About Lunar can be modified to work

Zeyad Mansour 276 Jan 07, 2023