Prevent `CUDA error: out of memory` in just 1 line of code.

Overview

🐨 Koila

Koila solves CUDA error: out of memory error painlessly. Fix it with just one line of code, and forget it.

Type Checking Formatting Unit testing License: MIT Tweet

Koila

πŸš€ Features

  • πŸ™… Prevents CUDA error: out of memory error with one single line of code.

  • πŸ¦₯ Lazily evaluates pytorch code to save computing power.

  • βœ‚οΈ Automatically splits along the batch dimension to more GPU friendly numbers (2's powers) to speed up the execution.

  • 🀏 Minimal API (wrapping all inputs will be enough).

πŸ€” Why Koila?

Ever encountered RuntimeError: CUDA error: out of memory? We all love PyTorch because of its speed, efficiency, and transparency, but that means it doesn't do extra things. Things like preventing a very common error that has been bothering many users since 2017.

This library aims to prevent that by being a light-weight wrapper over native PyTorch. When a tensor is wrapped, the library automatically computes the amount of remaining GPU memory and uses the right batch size, saving everyone from having to manually finetune the batch size whenever a model is used.

Also, the library automatically uses the right batch size to GPU. Did you know that using bigger batches doesn't always speed up processing? It's handled automatically in this library too.

Because Koila code is PyTorch code, as it runs PyTorch under the hood, you can use both together without worrying compatibility.

Oh, and all that in 1 line of code! 😊

⬇️ Installation

Koila is available on PyPI. To install, run the following command.

pip install koila

πŸƒ Getting started

The usage is dead simple. For example, you have the following PyTorch code (copied from PyTorch's tutorial)

Define the input, label, and model:

# A batch of MNIST image
input = torch.randn(8, 28, 28)

# A batch of labels
label = torch.randn(0, 10, [8])

class NeuralNetwork(Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = Flatten()
        self.linear_relu_stack = Sequential(
            Linear(28 * 28, 512),
            ReLU(),
            Linear(512, 512),
            ReLU(),
            Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

Define the loss function, calculate output and losses.

loss_fn = CrossEntropyLoss()

# Calculate losses
out = nn(t)
loss = loss_fn(out, label)

# Backward pass
nn.zero_grad()
loss.backward()

Ok. How to adapt the code to use Koila's features?

You change this line of code:

# Wrap the input tensor.
# If a batch argument is provided, that dimension of the tensor would be treated as the batch.
# In this case, the first dimension (dim=0) is used as batch's dimension.
input = lazy(torch.randn(8, 28, 28), batch=0)

Done. You will not run out of memory again.

See examples/getting-started.py for the full example.

πŸ‹οΈ How does it work under the hood?

CUDA error: out of memory generally happens in forward pass, because temporary variables will need to be saved in memory.

Koila is a thin wrapper around PyTorch. It is inspired by TensorFlow's static/lazy evaluation. By building the graph first, and run the model only when necessarily, the model has access to all the information necessarily to determine how much resources is really need to compute the model.

In terms of memory usage, only shapes of temporary variables are required to calculate the memory usage of those variables used in the model. For example, + takes in two tensors with equal sizes, and outputs a tensor with a size equal to the input size, and log takes in one tensor, and outputs another tensor with the same shape. Broadcasting makes it a little more complicated than that, but the general ideas are the same. By tracking all these shapes, one could easily tell how much memory is used in a forward pass. And select the optimal batch size accordingly.

🐌 It sounds slow. Is it?

NO. Indeed, calculating shapes and computing the size and memory usage sound like a lot of work. However, keep in mind that even a gigantic model like GPT-3, which has 96 layers, has only a few hundred nodes in its computing graph. Because Koila's algorithms run in linear time, any modern computer will be able to handle a graph like this instantly.

Most of the computing is spent on computing individual tensors, and transferring tensors across devices. And bear in mind that those checks happen in vanilla PyTorch anyways. So no, not slow at all.

πŸ”Š How to pronounce koila?

This project was originally named koala, the laziest species in the world, and this project is about lazy evaluation of tensors. However, as that name is taken on PyPI, I had no choice but to use another name. Koila is a word made up by me, pronounced similarly to voila (It's a French word), so sounds like koala.

⭐ Give me a star!

If you like what you see, please consider giving this a star (β˜…)!

πŸ—οΈ Why did I build this?

Batch size search is not new. In fact, the mighty popular PyTorch Lightning has it. So why did I go through the trouble and build this project?

PyTorch Lightning's batch size search is deeply integrated in its own ecosystem. You have to use its DataLoader, subclass from their models, and train your models accordingly. While it works well with supervised learning tasks, it's really painful to use in a reinforcement learning task, where interacting with the environment is a must.

In comparison, because Koila is a super lightweight PyTorch wrapper, it works when PyTorch works, thus providing maximum flexibility and minimal changes to existing code.

πŸ“ Todos

  • 🧩 Provide an extensible API to write custom functions for the users.
  • 😌 Simplify internal workings even further. (Especially interaction between Tensors and LazyTensors).
  • πŸͺ Work with multiple GPUs.

🚧 Warning

The code works on many cases, but it's still a work in progress. This is not (yet) a fully PyTorch compatible library due to limited time.

πŸ₯° Contributing

We take openness and inclusiveness very seriously. We have adopted the following Code of Conduct.

Pytorch code for our paper Beyond ImageNet Attack: Towards Crafting Adversarial Examples for Black-box Domains)

Beyond ImageNet Attack: Towards Crafting Adversarial Examples for Black-box Domains (ICLR'2022) This is the Pytorch code for our paper Beyond ImageNet

Alibaba-AAIG 37 Nov 23, 2022
These are the materials for the paper "Few-Shot Out-of-Domain Transfer Learning of Natural Language Explanations"

Few-shot-NLEs These are the materials for the paper "Few-Shot Out-of-Domain Transfer Learning of Natural Language Explanations". You can find the smal

Yordan Yordanov 0 Oct 21, 2022
Code for "Layered Neural Rendering for Retiming People in Video."

Layered Neural Rendering in PyTorch This repository contains training code for the examples in the SIGGRAPH Asia 2020 paper "Layered Neural Rendering

Google 154 Dec 16, 2022
From Canonical Correlation Analysis to Self-supervised Graph Neural Networks

Code for CCA-SSG model proposed in the NeurIPS 2021 paper From Canonical Correlation Analysis to Self-supervised Graph Neural Networks.

Hengrui Zhang 44 Nov 27, 2022
Code for Paper "Evidential Softmax for Sparse MultimodalDistributions in Deep Generative Models"

Evidential Softmax for Sparse Multimodal Distributions in Deep Generative Models Abstract Many applications of generative models rely on the marginali

Stanford Intelligent Systems Laboratory 9 Jun 06, 2022
Deep Watershed Transform for Instance Segmentation

Deep Watershed Transform Performs instance level segmentation detailed in the following paper: Min Bai and Raquel Urtasun, Deep Watershed Transformati

193 Nov 20, 2022
Chunkmogrify: Real image inversion via Segments

Chunkmogrify: Real image inversion via Segments Teaser video with live editing sessions can be found here This code demonstrates the ideas discussed i

David Futschik 112 Jan 04, 2023
Tensorflow implementation for "Improved Transformer for High-Resolution GANs" (NeurIPS 2021).

HiT-GAN Official TensorFlow Implementation HiT-GAN presents a Transformer-based generator that is trained based on Generative Adversarial Networks (GA

Google Research 78 Oct 31, 2022
Learned Token Pruning for Transformers

LTP: Learned Token Pruning for Transformers Check our paper for more details. Installation We follow the same installation procedure as the original H

Sehoon Kim 52 Dec 29, 2022
Pytorch Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic

Pytorch Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic [Paper] [Colab is coming soon] Approach Example Usage To r

170 Jan 03, 2023
Pytorch implementation of Nueral Style transfer

Nueral Style Transfer Pytorch implementation of Nueral style transfer algorithm , it is used to apply artistic styles to content images . Content is t

Abhinav 9 Oct 15, 2022
Predicting the duration of arrival delays for commercial flights.

Flight Delay Prediction Our objective is to predict arrival delays of commercial flights. According to the US Department of Transportation, about 21%

Jordan Silke 1 Jan 11, 2022
NeRD: Neural Reflectance Decomposition from Image Collections

NeRD: Neural Reflectance Decomposition from Image Collections Project Page | Video | Paper | Dataset Implementation for NeRD. A novel method which dec

Computergraphics (University of TΓΌbingen) 195 Dec 29, 2022
Bare bones use-case for deploying a containerized web app (built in streamlit) on AWS.

Containerized Streamlit web app This repository is featured in a 3-part series on Deploying web apps with Streamlit, Docker, and AWS. Checkout the blo

Collin Prather 62 Jan 02, 2023
Two-Stage Peer-Regularized Feature Recombination for Arbitrary Image Style Transfer

Two-Stage Peer-Regularized Feature Recombination for Arbitrary Image Style Transfer Paper on arXiv Public PyTorch implementation of two-stage peer-reg

NNAISENSE 38 Oct 14, 2022
Live training loss plot in Jupyter Notebook for Keras, PyTorch and others

livelossplot Don't train deep learning models blindfolded! Be impatient and look at each epoch of your training! (RECENT CHANGES, EXAMPLES IN COLAB, A

Piotr MigdaΕ‚ 1.2k Jan 08, 2023
A CNN model to detect hand gestures.

Software Used python - programming language used, tested on v3.8 miniconda - for managing virtual environment Libraries Used opencv - pip install open

Shivanshu 6 Jul 14, 2022
Deep Learning Models for Causal Inference

Extensive tutorials for learning how to build deep learning models for causal inference using selection on observables in Tensorflow 2.

Bernard J Koch 151 Dec 31, 2022
StyleGAN2-ADA - Official PyTorch implementation

Abstract: Training generative adversarial networks (GAN) using too little data typically leads to discriminator overfitting, causing training to diverge. We propose an adaptive discriminator augmenta

NVIDIA Research Projects 3.2k Dec 30, 2022
Node Dependent Local Smoothing for Scalable Graph Learning

Node Dependent Local Smoothing for Scalable Graph Learning Requirements Environments: Xeon Gold 5120 (CPU), 384GB(RAM), TITAN RTX (GPU), Ubuntu 16.04

Wentao Zhang 15 Nov 28, 2022