Memory-efficient optimum einsum using opt_einsum planning and PyTorch kernels.

Last update: Nov 18, 2022

Overview

opt-einsum-torch

Einstein summation has many implementations. NumPy's numpy.einsum is typically the least efficient, since it runs on the CPU, by default in a single thread. PyTorch's torch.einsum works for both CPU and CUDA tensors. However, because CUDA tensors have no virtual memory to fall back on, torch.einsum will run out of CUDA memory for large tensors.
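One common way around this memory limit is to evaluate the contraction in slices along a free (non-contracted) index, so that only one slice of the large operand sits on the device at a time. The sketch below illustrates that idea only; it is not necessarily how this package works internally. The names `chunked_einsum` and `chunk_size` are hypothetical, and the code assumes the first axis of the first operand is a free index that also leads the output.

```python
import torch


def chunked_einsum(equation, a, b, chunk_size, device):
    """Evaluate einsum in slices along the first axis of `a`.

    Assumption: the first index of `a` is a free index that does not appear
    in `b` and is the first index of the output (true for 'ijk,jkl->il').
    """
    b_dev = b.to(device)  # the smaller operand is moved to the device once
    pieces = []
    for start in range(0, a.shape[0], chunk_size):
        # Only one slice of `a` is resident on the device at a time.
        a_chunk = a[start:start + chunk_size].to(device)
        partial = torch.einsum(equation, a_chunk, b_dev)
        # Copy each partial result back to host memory before the next slice.
        pieces.append(partial.cpu())
    return torch.cat(pieces, dim=0)


# Fall back to the CPU when no GPU is present so the sketch still runs.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
arr1 = torch.randn(8, 3, 4)
arr2 = torch.randn(3, 4, 5)

result = chunked_einsum("ijk,jkl->il", arr1, arr2, chunk_size=3, device=device)
expected = torch.einsum("ijk,jkl->il", arr1, arr2)
```

Smaller values of `chunk_size` lower peak device memory at the cost of more host-to-device transfers.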

This package aims to provide a memory-efficient einsum function with PyTorch as the backend. It also uses the opt_einsum package to optimize the contraction path and minimize the FLOP count.
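To see what contraction-path optimization means in practice, opt_einsum can be asked for a plan directly. The following is a minimal sketch that calls opt_einsum on its own, independent of this package; the equation and shapes are made up for illustration.

```python
import opt_einsum as oe

# Hypothetical shapes for a three-operand chain contraction.
shapes = [(10, 1000), (1000, 1000), (1000, 10)]

# shapes=True plans from the shapes alone, so no arrays are allocated.
path, info = oe.contract_path("ij,jk,kl->il", *shapes, shapes=True)

print(path)                       # the sequence of pairwise contractions
print(info.naive_cost)            # FLOP estimate for one single-step contraction
print(info.opt_cost)              # FLOP estimate along the chosen path
print(info.largest_intermediate)  # element count of the largest intermediate
```

The `largest_intermediate` figure is the one that matters for memory planning, since it bounds the size of the temporary tensors a path produces.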

Usage

from opt_einsum_torch import EinsumPlanner
import torch

# Some huge tensors
arr1, arr2 = ..., ...
ee = EinsumPlanner(torch.device('cuda:0'), cuda_mem_limit=0.9)
result = ee.einsum('ijk,jkl->il', arr1, arr2)

The resulting tensor, result, is a PyTorch CPU tensor. You can convert it to a NumPy array by calling result.numpy().
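As a small illustration of that conversion, the sketch below uses plain torch.einsum on CPU tensors as a stand-in for the planner's output. For a CPU tensor, `numpy()` returns an array that shares memory with the tensor rather than copying it.

```python
import numpy as np
import torch

# Stand-in for the planner's output: a CPU tensor produced by torch.einsum.
result = torch.einsum("ijk,jkl->il", torch.ones(2, 3, 4), torch.ones(3, 4, 5))

# No copy is made: the array and the tensor share the same memory.
as_numpy = result.numpy()

# Writing through the tensor is therefore visible in the array.
result[0, 0] = 0.0
```

Call `as_numpy.copy()` if an independent array is needed.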

Future work

Support multiple GPUs.
Memory-efficient einsum kernels.
CUDA data transfer profilers.

Releases (0.1.0)

0.1.0 (Dec 30, 2021)

Initial release of the package.

Owner

Haoyan Huo
