A CUDA version of AdderNet

Last update: Jun 20, 2022

Overview

AdderNet training, accelerated with a custom CUDA kernel.

Usage

cd adder_cuda
python setup.py install
cd ..
python main.py

Environment

PyTorch 1.10.0, CUDA 11.3
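
Before building the extension, it can help to confirm that the local setup matches the versions listed above. The following is a minimal sketch, not part of this repository: `check_environment` is a hypothetical helper name, and it only uses `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`. Other versions may also work; the check only reports whether the installed ones match the tested pair.

```python
import importlib.util


def check_environment(expected_torch="1.10.0", expected_cuda="11.3"):
    """Report the installed PyTorch/CUDA versions (hypothetical helper).

    The expected values default to the versions listed in this README.
    """
    report = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not report["torch_installed"]:
        # Nothing more to check without PyTorch.
        return report

    import torch

    report["torch_version"] = str(torch.__version__)
    # torch.version.cuda is None for CPU-only builds.
    report["cuda_build"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    report["matches_readme"] = (
        report["torch_version"].startswith(expected_torch)
        and report["cuda_build"] == expected_cuda
    )
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as false, the extension in `adder_cuda` is unlikely to build or run, so it is worth fixing the driver or PyTorch install first.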

Benchmark

Version	Training time per batch (s)
raw	1.61
torch.cdist	1.49
cuda_unoptimized	0.4508
this work	0.3158

The CUDA version of AdderNet is roughly 5× faster than the original version (0.3158 s versus 1.61 s per batch). There seem to be bugs in the cuda_unoptimized version that cause the model to fail to converge; its speed is listed only for comparison. The experiments trained ResNet-20 on CIFAR-10 on an RTX 2080 Ti.
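
The speedup figures follow directly from the per-batch times in the benchmark table. As a small sketch of that arithmetic (the `speedups` helper is a hypothetical name; the numbers are copied from the table):

```python
# Per-batch training times in seconds, copied from the benchmark table.
times = {
    "raw": 1.61,
    "torch.cdist": 1.49,
    "cuda_unoptimized": 0.4508,
    "this work": 0.3158,
}


def speedups(times, baseline="raw"):
    """Return how many times faster each version is than the baseline."""
    base = times[baseline]
    return {name: base / t for name, t in times.items()}


if __name__ == "__main__":
    for name, factor in speedups(times).items():
        print(f"{name:18s} {factor:.2f}x vs raw")
    # "this work" comes out at about 5.10x relative to "raw".
```

Passing `baseline="torch.cdist"` or `baseline="cuda_unoptimized"` gives the relative gain over those versions instead.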

Time(%)	Time	Calls	Avg	Min	Max	Name
48.57	30.4752s	3920	7.7743ms	162.70us	12.271ms	CONV_BACKWARD
34.85	21.8686s	19680	1.1112ms	5.3770us	11.827ms	_ZN2at6native27unrolled_elementwise_kernel...
7.46	4.67901s	5920	790.37us	26.529us	1.5841ms	CONV
2.24	1.40372s	3920	358.09us	31.298us	845.80us	col2im_kernel
2.10	1.31882s	36862	35.777us	1.4720us	276.24us	vectorized_elementwise_kernel
1.43	900.03ms	5920	152.03us	7.9040us	372.40us	im2col_kernel

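The table above shows how GPU time is distributed across kernels when training for one epoch. CONV_BACKWARD is the largest single entry at 48.57% of the time, so it is the most promising target for further work. If you are interested, you can continue to optimize the CUDA kernels.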
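
To judge how much further kernel optimization could pay off, Amdahl's law gives a rough upper bound. The sketch below assumes the profiler percentages approximate each kernel's share of total epoch time (they cover GPU kernel time only, so wall-clock gains may be smaller); `overall_speedup` and the kernel speedup factors are hypothetical, chosen only for illustration.

```python
def overall_speedup(fraction, kernel_speedup):
    """Amdahl's law: overall speedup when `fraction` of the total time
    is made `kernel_speedup` times faster and the rest is unchanged."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)


if __name__ == "__main__":
    conv_backward_share = 48.57 / 100  # CONV_BACKWARD share from the profile
    for k in (1.5, 2.0, 4.0):  # hypothetical kernel speedups
        gain = overall_speedup(conv_backward_share, k)
        print(f"CONV_BACKWARD {k:.1f}x faster -> epoch about {gain:.2f}x faster")
```

Even an arbitrarily fast CONV_BACKWARD would leave the other 51.43% of the time untouched, which caps the overall gain at a little under 2×; the elementwise kernels would need attention to go beyond that.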

Owner

LingXY
