CUda Matrix Multiply library.

Last update: Dec 27, 2022

Overview

cumm

CUda Matrix Multiply library.

cumm was developed while learning CUTLASS, which relies so heavily on C++ templates that the code becomes hard to maintain. So I developed pccm, which uses Python as the meta programming language in place of C++ template meta programming. pccm is now the foundational framework of cumm and of my other C++ projects such as spconv. cumm also contains a Python asyncio-based GEMM simulator that shares the same meta program with the CUDA code, which enables GEMM visualization and an easier debugging experience.
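To illustrate the idea of using Python where C++ would use template meta programming, here is a minimal sketch in plain Python. It does not use pccm's actual API: the helper name emit_tile_struct and the generated struct are hypothetical, chosen only to show tile sizes being resolved in Python before any C++ is compiled.

```python
def emit_tile_struct(name, tile_m, tile_n, tile_k):
    """Return C++ source for a struct with the tile shape baked in as constants.

    Hypothetical helper for illustration only; this is not pccm's API.
    """
    if min(tile_m, tile_n, tile_k) <= 0:
        raise ValueError("tile sizes must be positive")
    lines = [
        f"struct {name} {{",
        f"  static constexpr int kM = {tile_m};",
        f"  static constexpr int kN = {tile_n};",
        f"  static constexpr int kK = {tile_k};",
        # Derived values are computed in Python instead of by the C++ compiler.
        f"  static constexpr int kCount = {tile_m * tile_n};",
        "};",
    ]
    return "\n".join(lines) + "\n"


# One generated struct per tile shape, the job a C++ template would otherwise do.
shapes = [(128, 128, 8), (64, 32, 16)]
sources = [emit_tile_struct(f"Tile{m}x{n}x{k}", m, n, k) for m, n, k in shapes]
print(sources[0])
```

Because the shapes are ordinary Python values, the same list could in principle drive both code generation and a Python-side simulation, which is the kind of sharing the overview describes.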

Install

Prebuilt

We offer prebuilt binaries for Python 3.6-3.10 and CUDA 10.2/11.1/11.3/11.4 on Linux (manylinux).

We offer prebuilt binaries for Python 3.7-3.10 and CUDA 10.2/11.1/11.3/11.4 on Windows 10/11.

We will offer prebuilt binaries for the CUDA versions supported by the latest PyTorch release. For example, PyTorch 1.9 supports CUDA 10.2 and 11.1, so we support them too.

pip install cumm for CPU-only

pip install cumm-cu102 for CUDA 10.2

pip install cumm-cu111 for CUDA 11.1

pip install cumm-cu113 for CUDA 11.3

pip install cumm-cu114 for CUDA 11.4

Build from source for development (JIT, recommended for development)

WARNING: Use the code in tags! Code in the main branch may contain bugs.

The C++ code is rebuilt automatically when you change C++ code in the project.

Linux

Uninstall any cumm installed by pip. Make sure that pip list | grep cumm prints nothing.
Install build-essential and CUDA.
Run git clone https://github.com/FindDefinition/cumm, cd ./cumm, pip install -e .
In Python, import cumm and wait for the build to finish.
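The first step above (making sure no pip-installed cumm is left over) can also be checked from Python. This is a minimal sketch using the standard importlib.metadata module, which requires Python 3.8 or newer; the function name find_cumm_distributions is hypothetical and not part of cumm.

```python
from importlib import metadata


def find_cumm_distributions(names=None):
    """Return sorted installed distribution names equal to 'cumm' or starting with 'cumm-'.

    `names` can be passed explicitly for testing; by default the installed
    distributions of the current interpreter are scanned.
    """
    if names is None:
        names = [dist.metadata["Name"] for dist in metadata.distributions()]
    found = set()
    for name in names:
        if not name:
            # Broken installs can report no name at all; skip them.
            continue
        normalized = name.lower().replace("_", "-")
        if normalized == "cumm" or normalized.startswith("cumm-"):
            found.add(normalized)
    return sorted(found)


leftovers = find_cumm_distributions()
if leftovers:
    print("uninstall these first:", ", ".join(leftovers))
else:
    print("no pip-installed cumm found")
```

Run it before pip install -e ., since the editable install itself is expected to show up as cumm afterwards.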

Windows

Uninstall any spconv and cumm installed by pip. Make sure that pip list | grep cumm prints nothing.
Install Visual Studio 2019 or newer and make sure the C++ development component is installed. Install CUDA.
Set the PowerShell script execution policy.
Start a new PowerShell and run tools/msvc_setup.ps1.
Run git clone https://github.com/FindDefinition/cumm, cd ./cumm, pip install -e .
In Python, import cumm and wait for the build to finish.

Build wheel from source

WARNING: Use the code in tags! Code in the main branch may contain bugs.

WARNING: If CUMM_CUDA_VERSION is set to a CUDA version, the following steps create a wheel named "cumm-cuxxx", not "cumm". Any project that depends on cumm must then list cumm-cuxxx, not cumm, as its dependency. If CUMM_CUDA_VERSION isn't set, cumm is always built with CUDA, so CUDA must exist on your system, and the wheel is named cumm even though it is built with CUDA.
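The naming rule in this warning can be written down as a small Python sketch. The function describe_build is hypothetical and is not cumm's build code. It accepts both "11.4" and "114" because both forms appear in this README, and it assumes that a CPU-only build (empty CUMM_CUDA_VERSION) produces a wheel named cumm, matching the CPU-only pip package name.

```python
import os


def describe_build(env=None):
    """Map CUMM_CUDA_VERSION to (wheel name, built with CUDA?).

    Hypothetical helper that only restates the rule from the README.
    """
    if env is None:
        env = os.environ
    version = env.get("CUMM_CUDA_VERSION")
    if version is None:
        # Unset: built with CUDA, but the wheel is still named "cumm".
        return "cumm", True
    version = version.strip()
    if version == "":
        # Empty: CPU-only build. The wheel name "cumm" is an assumption here.
        return "cumm", False
    digits = version.replace(".", "")
    if not digits.isdigit():
        raise ValueError(f"unexpected CUMM_CUDA_VERSION: {version!r}")
    # Set to a CUDA version: the wheel name carries the version suffix.
    return f"cumm-cu{digits}", True


for example in ({}, {"CUMM_CUDA_VERSION": ""}, {"CUMM_CUDA_VERSION": "11.4"}):
    print(example, "->", describe_build(example))
```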

Linux

It's recommended to build Linux packages in the official build Docker image. Building with CUDA support doesn't need a real GPU.

Build in Official Docker

Select a CUDA version. Available: CUDA 10.2, 11.1, 11.3, 11.4, 11.5.
(Example for CUDA 11.4) git clone https://github.com/FindDefinition/cumm, cd ./cumm, docker run --rm -e PLAT=manylinux2014_x86_64 -e CUMM_CUDA_VERSION=114 -v `pwd`:/io scrin/manylinux2014-cuda:cu114-devel-1.0.0 bash -c "source /etc/bashrc && /io/tools/build-wheels.sh"

Build in your environment

Install build-essential and CUDA.
Set the environment variable for the installed CUDA version, for example export CUMM_CUDA_VERSION="11.4". To build CPU-only, run export CUMM_CUDA_VERSION="". If CUMM_CUDA_VERSION isn't set, you need to ensure the CUDA libraries are on the OS search path, and the built wheel will be named cumm; otherwise it is named cumm-cuxxx.
Run export CUMM_DISABLE_JIT="1".
Run python setup.py bdist_wheel, then pip install dist/xxx.whl.
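For scripted builds, the same steps can be driven from Python. This is a sketch under assumptions: the helper names wheel_build_env and build_wheel are hypothetical, and the only facts taken from the steps above are the two environment variables and the python setup.py bdist_wheel command.

```python
import os
import subprocess
import sys


def wheel_build_env(cuda_version, base_env=None):
    """Return a copy of the environment with the two variables the steps set.

    Pass cuda_version="" for a CPU-only build, as described above.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUMM_CUDA_VERSION"] = cuda_version
    env["CUMM_DISABLE_JIT"] = "1"
    return env


def build_wheel(source_dir, cuda_version):
    """Run `python setup.py bdist_wheel` inside source_dir (the cloned repo)."""
    cmd = [sys.executable, "setup.py", "bdist_wheel"]
    subprocess.run(cmd, cwd=source_dir, env=wheel_build_env(cuda_version), check=True)


# Only show the variables here; call build_wheel("./cumm", "11.4") to build.
env = wheel_build_env("11.4", base_env={})
print(env)
```

Passing the variables through env rather than mutating os.environ keeps the settings local to the build process.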

Windows 10/11

Install Visual Studio 2019 or newer and make sure the C++ development package is installed. Install CUDA.
Set the PowerShell script execution policy.
Start a new PowerShell and run tools/msvc_setup.ps1.
Set the environment variable for the installed CUDA version, for example $Env:CUMM_CUDA_VERSION = "11.4". To build CPU-only, run $Env:CUMM_CUDA_VERSION = "". If CUMM_CUDA_VERSION isn't set, you need to ensure the CUDA libraries are on the OS search path, and the built wheel will be named cumm; otherwise it is named cumm-cuxxx.
Run $Env:CUMM_DISABLE_JIT = "1".
Run python setup.py bdist_wheel, then pip install dist/xxx.whl.

Note

This work was done while the author was an employee at TuSimple.

LICENSE

Apache 2.0
