Custom implementation of Corrleation Module

Overview

PyPI

Pytorch Correlation module

this is a custom C++/Cuda implementation of Correlation module, used e.g. in FlowNetC

This tutorial was used as a basis for implementation, as well as NVIDIA's cuda code

  • Build and Install C++ and CUDA extensions by executing python setup.py install,
  • Benchmark C++ vs. CUDA by running python benchmark.py {cpu, cuda},
  • Run gradient checks on the code by running python grad_check.py --backend {cpu, cuda}.

Requirements

This module is expected to compile for Pytorch 1.6.

Installation

this module is available on pip

pip install spatial-correlation-sampler

For a cpu-only version, you can install from source with

python setup_cpu.py install

Known Problems

This module needs compatible gcc version and CUDA to be compiled. Namely, CUDA 9.1 and below will need gcc5, while CUDA 9.2 and 10.0 will need gcc7 See this issue for more information

Usage

API has a few difference with NVIDIA's module

  • output is now a 5D tensor, which reflects the shifts horizontal and vertical.
input (B x C x H x W) -> output (B x PatchH x PatchW x oH x oW)
  • Output sizes oH and oW are no longer dependant of patch size, but only of kernel size and padding
  • Patch size patch_size is now the whole patch, and not only the radii.
  • stride1 is now stride andstride2 is dilation_patch, which behave like dilated convolutions
  • equivalent max_displacement is then dilation_patch * (patch_size - 1) / 2.
  • dilation is a new parameter, it acts the same way as dilated convolution regarding the correlation kernel
  • to get the right parameters for FlowNetC, you would have
kernel_size=1
patch_size=21,
stride=1,
padding=0,
dilation=1
dilation_patch=2

Example

import torch
from spatial_correlation_sampler import SpatialCorrelationSampler, 

device = "cuda"
batch_size = 1
channel = 1
H = 10
W = 10
dtype = torch.float32

input1 = torch.randint(1, 4, (batch_size, channel, H, W), dtype=dtype, device=device, requires_grad=True)
input2 = torch.randint_like(input1, 1, 4).requires_grad_(True)

#You can either use the function or the module. Note that the module doesn't contain any parameter tensor.

#function

out = spatial_correlation_sample(input1,
	                         input2,
                                 kernel_size=3,
                                 patch_size=1,
                                 stride=2,
                                 padding=0,
                                 dilation=2,
                                 dilation_patch=1)

#module

correlation_sampler = SpatialCorrelationSampler(
    kernel_size=3,
    patch_size=1,
    stride=2,
    padding=0,
    dilation=2,
    dilation_patch=1)
out = correlation_sampler(input1, input2)

Benchmark

  • default parameters are from benchmark.py, FlowNetC parameters are same as use in FlowNetC with a batch size of 4, described in this paper, implemented here and here.
  • Feel free to file an issue to add entries to this with your hardware !

CUDA Benchmark

  • See here for a benchmark script working with NVIDIA's code, and Pytorch.
  • Benchmark are launched with environment variable CUDA_LAUNCH_BLOCKING set to 1.
  • Only float32 is benchmarked.
  • FlowNetC correlation parameters where launched with the following command:
CUDA_LAUNCH_BLOCKING=1 python benchmark.py --scale ms -k1 --patch 21 -s1 -p0 --patch_dilation 2 -b4 --height 48 --width 64 -c256 cuda -d float

CUDA_LAUNCH_BLOCKING=1 python NV_correlation_benchmark.py --scale ms -k1 --patch 21 -s1 -p0 --patch_dilation 2 -b4 --height 48 --width 64 -c256
implementation Correlation parameters device pass min time avg time
ours default 980 GTX forward 5.745 ms 5.851 ms
ours default 980 GTX backward 77.694 ms 77.957 ms
NVIDIA default 980 GTX forward 13.779 ms 13.853 ms
NVIDIA default 980 GTX backward 73.383 ms 73.708 ms
ours FlowNetC 980 GTX forward 26.102 ms 26.179 ms
ours FlowNetC 980 GTX backward 208.091 ms 208.510 ms
NVIDIA FlowNetC 980 GTX forward 35.363 ms 35.550 ms
NVIDIA FlowNetC 980 GTX backward 283.748 ms 284.346 ms

Notes

  • The overhead of our implementation regarding kernel_size > 1 during backward needs some investigation, feel free to dive in the code to improve it !
  • The backward pass of NVIDIA is not entirely correct when stride1 > 1 and kernel_size > 1, because not everything is computed, see here.

CPU Benchmark

  • No other implementation is avalaible on CPU.
  • It is obviously not recommended to run it on CPU if you have a GPU.
Correlation parameters device pass min time avg time
default E5-2630 v3 @ 2.40GHz forward 159.616 ms 188.727 ms
default E5-2630 v3 @ 2.40GHz backward 282.641 ms 294.194 ms
FlowNetC E5-2630 v3 @ 2.40GHz forward 2.138 s 2.144 s
FlowNetC E5-2630 v3 @ 2.40GHz backward 7.006 s 7.075 s
Owner
Clément Pinard
PhD ENSTA Paris, Deep Learning Engineer @ ContentSquare
Clément Pinard
Official Pytorch Implementation of 'Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization' (ICCV-21 Oral)

Learning-Action-Completeness-from-Points Official Pytorch Implementation of 'Learning Action Completeness from Points for Weakly-supervised Temporal A

Pilhyeon Lee 67 Jan 03, 2023
RMTD: Robust Moving Target Defence Against False Data Injection Attacks in Power Grids

RMTD: Robust Moving Target Defence Against False Data Injection Attacks in Power Grids Real-time detection performance. This repo contains the code an

0 Nov 10, 2021
Mixed Neural Likelihood Estimation for models of decision-making

Mixed neural likelihood estimation for models of decision-making Mixed neural likelihood estimation (MNLE) enables Bayesian parameter inference for mo

mackelab 9 Dec 22, 2022
Pytorch codes for Feature Transfer Learning for Face Recognition with Under-Represented Data

FTLNet_Pytorch Pytorch codes for Feature Transfer Learning for Face Recognition with Under-Represented Data 1. Introduction This repo is an unofficial

1 Nov 04, 2020
ByteTrack超详细教程!训练自己的数据集&&摄像头实时检测跟踪

ByteTrack超详细教程!训练自己的数据集&&摄像头实时检测跟踪

Double-zh 45 Dec 19, 2022
Worktory is a python library created with the single purpose of simplifying the inventory management of network automation scripts.

Worktory is a python library created with the single purpose of simplifying the inventory management of network automation scripts.

Renato Almeida de Oliveira 18 Aug 31, 2022
Converts geometry node attributes to built-in attributes

Attribute Converter Simplifies converting attributes created by geometry nodes to built-in attributes like UVs or vertex colors, as a single click ope

Ivan Notaros 12 Dec 22, 2022
SuperSDR: multiplatform KiwiSDR + CAT transceiver integrator

SuperSDR SuperSDR integrates a realtime spectrum waterfall and audio receive from any KiwiSDR around the world, together with a local (or remote) cont

Marco Cogoni 30 Nov 29, 2022
A graph neural network (GNN) model to predict protein-protein interactions (PPI) with no sample features

A graph neural network (GNN) model to predict protein-protein interactions (PPI) with no sample features

2 Jul 25, 2022
Laplace Redux -- Effortless Bayesian Deep Learning

Laplace Redux - Effortless Bayesian Deep Learning This repository contains the code to run the experiments for the paper Laplace Redux - Effortless Ba

Runa Eschenhagen 28 Dec 07, 2022
An unofficial personal implementation of UM-Adapt, specifically to tackle joint estimation of panoptic segmentation and depth prediction for autonomous driving datasets.

Semisupervised Multitask Learning This repository is an unofficial and slightly modified implementation of UM-Adapt[1] using PyTorch. This code primar

Abhinav Atrishi 11 Nov 25, 2022
Pytorch implementation of Learning with Opponent-Learning Awareness

Pytorch implementation of Learning with Opponent-Learning Awareness using DiCE

Alexis David Jacq 82 Sep 15, 2022
Implementation of CVPR'2022:Reconstructing Surfaces for Sparse Point Clouds with On-Surface Priors

Reconstructing Surfaces for Sparse Point Clouds with On-Surface Priors (CVPR 2022) Personal Web Pages | Paper | Project Page This repository contains

151 Dec 26, 2022
Use unsupervised and supervised learning to predict stocks

AIAlpha: Multilayer neural network architecture for stock return prediction This project is meant to be an advanced implementation of stacked neural n

Vivek Palaniappan 1.5k Jan 06, 2023
Notspot robot simulation - Python version

Notspot robot simulation - Python version This repository contains all the files and code needed to simulate the notspot quadrupedal robot using Gazeb

50 Sep 26, 2022
Emotional conditioned music generation using transformer-based model.

This is the official repository of EMOPIA: A Multi-Modal Pop Piano Dataset For Emotion Recognition and Emotion-based Music Generation. The paper has b

hung anna 96 Nov 09, 2022
(CVPR2021) Kaleido-BERT: Vision-Language Pre-training on Fashion Domain

Kaleido-BERT: Vision-Language Pre-training on Fashion Domain Mingchen Zhuge*, Dehong Gao*, Deng-Ping Fan#, Linbo Jin, Ben Chen, Haoming Zhou, Minghui

250 Jan 08, 2023
Code samples for my book "Neural Networks and Deep Learning"

Code samples for "Neural Networks and Deep Learning" This repository contains code samples for my book on "Neural Networks and Deep Learning". The cod

Michael Nielsen 13.9k Dec 26, 2022
OpenVINO黑客松比赛项目

Window_Guard OpenVINO黑客松比赛项目 英文名称:Window_Guard 中文名称:窗口卫士 硬件 树莓派4B 8G版本 一个磁石开关 USB摄像头(MP4视频文件也可以) 软件(库) OpenVINO RPi 使用方法 本项目使用的OPenVINO是是2021.3版本,并使用了

Tango 6 Jul 04, 2021
Open Source Light Field Toolbox for Super-Resolution

BasicLFSR BasicLFSR is an open-source and easy-to-use Light Field (LF) image Super-Ressolution (SR) toolbox based on PyTorch, including a collection o

Squidward 50 Nov 18, 2022