It is a simple library to speed up CLIP inference up to 3x (K80 GPU)

Last update: Dec 20, 2022

Overview

CLIP-ONNX

CLIP-ONNX is a simple library that speeds up CLIP inference by up to 3x (measured on a K80 GPU).

Usage

Install the clip-onnx module and its requirements first, directly from GitHub. The leading ! is notebook shell syntax; drop it when running in a regular terminal.

!pip install git+https://github.com/Lednik7/CLIP-ONNX.git

Example in 3 steps

Download the CLIP image from the repo

!wget -c -O CLIP.png https://github.com/openai/CLIP/blob/main/CLIP.png?raw=true

Load the standard CLIP model, image, and text on the CPU

import clip
from PIL import Image

# ONNX conversion cannot work with CUDA, so load the model on the CPU
model, preprocess = clip.load("ViT-B/32", device="cpu", jit=False)
# batch first
image = preprocess(Image.open("CLIP.png")).unsqueeze(0) # [1, 3, 224, 224]
text = clip.tokenize(["a diagram", "a dog", "a cat"]) # [3, 77]

Create a CLIP-ONNX object to convert the model to ONNX

from clip_onnx import clip_onnx, attention
clip.model.ResidualAttentionBlock.attention = attention

visual_path = "clip_visual.onnx"
textual_path = "clip_textual.onnx"

# possible providers: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
onnx_model = clip_onnx(model, providers=["CPUExecutionProvider"], # cpu mode
                       visual_path=visual_path, textual_path=textual_path)
onnx_model.convert2onnx(image, text, verbose=True)
onnx_model.start_sessions()

Use it through the standard CLIP API. Batch inference is supported.

image_features = onnx_model.encode_image(image)
text_features = onnx_model.encode_text(text)

logits_per_image, logits_per_text = onnx_model(image, text)
probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs)  # prints: [[0.41456965 0.29270944 0.29272085]]

Enjoy the speed
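To check the speed-up on your own hardware, one option is to time the original model and the ONNX model on the same inputs. The helper below is a minimal sketch that uses only the standard library; benchmark and dummy_workload are hypothetical names introduced for illustration and are not part of clip-onnx. The dummy workload stands in for a call such as model.encode_image(image) or onnx_model.encode_image(image).

```python
import time


def benchmark(fn, warmup=2, runs=10):
    """Return the mean wall-clock time of fn() in seconds over `runs` calls."""
    # Warm-up calls are not timed (lazy initialisation, caches).
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs


def dummy_workload():
    # Hypothetical stand-in for an inference call,
    # e.g. lambda: onnx_model.encode_image(image)
    return sum(i * i for i in range(10_000))


mean_seconds = benchmark(dummy_workload, warmup=1, runs=5)
print(f"mean time per call: {mean_seconds * 1000:.3f} ms")
```

Running benchmark once per model and dividing the two means gives a rough speed-up factor for a given machine and batch size.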

Examples

See the examples folder for more details.
Some parts of the code were taken from the post. Thank you, neverix, for this notebook.

Comments

Can't use CUDAExecutionProvider
Hey, I'm trying to use the code on a GPU and I ran into two problems:

1. When running pip install git+https://github.com/Lednik7/CLIP-ONNX.git I got the following error (tried on multiple machines): ERROR: Could not find a version that satisfies the requirement torch==1.10.0+cu111 (from clip-onnx)

I fixed it by installing that version of torch myself with pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html, and then running the rest of the installation.

2. After I installed the package, I tried to run the example in the readme with CPUExecutionProvider and it worked fine, but when I try to run it on a GPU with CUDAExecutionProvider I get the following warning twice (again on different machines):

2022-01-31 20:57:03.234399301 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:535 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.
2022-01-31 20:57:03.872349008 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:535 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.

I can't figure out what the problem is. Any help?
opened by YoadTew 13
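A possible first check for this kind of warning is to ask ONNX Runtime which execution providers the installed build exposes. The sketch below relies on onnxruntime.get_available_providers(), which is a real ONNX Runtime call; pick_providers is a hypothetical helper introduced here for illustration. Note that CUDAExecutionProvider ships with the onnxruntime-gpu package rather than the CPU-only onnxruntime package.

```python
def pick_providers(preferred, available):
    """Keep the preferred providers that are actually available, in order."""
    chosen = [p for p in preferred if p in available]
    # Always keep a CPU fallback so a session can still be created.
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


try:
    import onnxruntime as ort
    available = ort.get_available_providers()
except ImportError:
    # Assumption for illustration: pretend only the CPU provider exists.
    available = ["CPUExecutionProvider"]

providers = pick_providers(
    ["CUDAExecutionProvider", "CPUExecutionProvider"], available
)
print("available:", available)
print("using:", providers)
```

The resulting list could then be passed as the providers argument of clip_onnx, as in the README example. Being listed is necessary but may not be sufficient: the provider can still fail to load when the CUDA or cuDNN libraries are missing or do not match the build, which is what the requirements page linked in the warning covers.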
Performance is inconsistent with the original model
Hi, thanks for providing this useful tool! However, I found that the result produced by the generated ONNX model is inconsistent with the original CLIP model. Here is the code I used to test the original model:

model, preprocess = clip.load("ViT-B/32", device="cpu", jit=False)
image = preprocess(Image.open("CLIP.png")).unsqueeze(0).cpu() # [1, 3, 224, 224]
text = clip.tokenize(["a diagram", "a dog", "a cat"]).cpu() # [3, 77]
image_features = model.encode_image(image)
text_features = model.encode_text(text)
logits_per_image, logits_per_text = model(image, text)
probs = logits_per_image.softmax(dim=-1).detach().cpu().numpy()
print("Label probs:", probs)

The result is: Label probs: [[0.9927937 0.00421069 0.00299573]]

However, when using the onnx model, the result is: Label probs: [[0.41456965 0.29270944 0.29272085]].

Could you help me with this? Thanks!
opened by Cestlaviez 5
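One way to turn this kind of comparison into a repeatable check is to measure the largest element-wise difference between the two probability vectors. The sketch below uses only the values quoted in the issue; max_abs_diff and the tolerance are assumptions introduced for illustration, and the snippet does not diagnose the cause of the mismatch.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


# Values reported in the issue above.
probs_pytorch = [0.9927937, 0.00421069, 0.00299573]
probs_onnx = [0.41456965, 0.29270944, 0.29272085]

diff = max_abs_diff(probs_pytorch, probs_onnx)
tolerance = 1e-3  # assumed tolerance; tune it for your use case
print(f"max abs diff: {diff:.4f}, consistent: {diff <= tolerance}")
```

With the reported numbers the difference is about 0.58, far above any reasonable tolerance, so the two outputs are clearly not equivalent.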

Error on installing the torch version in requirements.txt

pip install git+https://github.com/Lednik7/CLIP-ONNX.git

ERROR: Could not find a version that satisfies the requirement torch==1.11.0+cu113 (from versions: 1.0.0, 1.0.1, 1.0.1.post2, 1.1.0, 1.2.0, 1.3.0, 1.3.1, 1.4.0, 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2, 1.11.0)
ERROR: No matching distribution found for torch==1.11.0+cu113

The Python version is 3.7.13.

opened by dingusagar 2

ERROR: No matching distribution found for onnxruntime==1.11

Hi, Thanks for the great work!

I am having this error when I try to install the package.

ERROR: No matching distribution found for onnxruntime==1.11

Maybe we can update the requirements.txt?

opened by wanliAlex 1
updated and added information

add info about export params

update GPU(K80) benchmarks

update GPU(T4) benchmarks

update CPU benchmarks

change opset_version to 12

updated readme according to the version

update branch link

update version

update packages

opened by Lednik7 0
Replace the operator of "torch.einsum"

q, k, v = (torch.einsum("tbh, oh -> tbo", x, self.attn.in_proj_weight) + self.attn.in_proj_bias).contiguous().chunk(3, dim=-1)

@Lednik7 Thanks for your great work on CLIP-ONNX. Regarding the PyTorch operator torch.einsum: if we don't want to use it, do you have alternative code to replace it? This operator is not friendly to some inference engines, such as NVIDIA TensorRT, so an einsum-free version would be better.

opened by zhangnju 2
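For reference, the einsum pattern "tbh, oh -> tbo" followed by a bias add is mathematically a plain linear projection, so one candidate replacement is torch.nn.functional.linear. The sketch below checks that equivalence on random tensors; the shapes and variable names are assumptions chosen for illustration, this is not the library's own replacement, and whether the exported graph then suits TensorRT is not verified here.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
seq_len, batch, hidden = 5, 2, 8                   # t, b, h in the einsum string
x = torch.randn(seq_len, batch, hidden)
in_proj_weight = torch.randn(3 * hidden, hidden)   # shape (o, h)
in_proj_bias = torch.randn(3 * hidden)

# Original formulation from the snippet above.
qkv_einsum = torch.einsum("tbh, oh -> tbo", x, in_proj_weight) + in_proj_bias

# Candidate replacement: linear projection, i.e. x @ W^T + b.
qkv_linear = F.linear(x, in_proj_weight, in_proj_bias)

q, k, v = qkv_linear.contiguous().chunk(3, dim=-1)
print(torch.allclose(qkv_einsum, qkv_linear, atol=1e-4))
print(q.shape, k.shape, v.shape)
```

In the attention function this would correspond to replacing the einsum expression with F.linear(x, self.attn.in_proj_weight, self.attn.in_proj_bias) before the chunk call.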

Releases(1.2)

1.2(May 3, 2022)

add info about export params

update GPU(K80) benchmarks

update GPU(T4) benchmarks

update CPU benchmarks

change opset_version to 12

updated readme according to the version

update branch link

update version

update packages
1.0(May 3, 2022)

Works, but with workarounds

Owner

Gerasimov Maxim

16-year-old Data Scientist. Graduate of Yandex Lyceum and Tinkoff Education.
