It is a simple library that speeds up CLIP inference by up to 3x (on a K80 GPU)

Last update: Dec 20, 2022

Overview

CLIP-ONNX


Usage

Install the clip-onnx module and its requirements first (the command below is written for a notebook cell):

!pip install git+https://github.com/Lednik7/CLIP-ONNX.git

Example in 3 steps

Download the CLIP image from the OpenAI CLIP repository

!wget -c -O CLIP.png https://github.com/openai/CLIP/blob/main/CLIP.png?raw=true

Load the standard CLIP model, image, and text on the CPU

import clip
from PIL import Image

# load on the CPU; the ONNX conversion below runs on CPU tensors
model, preprocess = clip.load("ViT-B/32", device="cpu", jit=False)
# add the batch dimension first
image = preprocess(Image.open("CLIP.png")).unsqueeze(0) # [1, 3, 224, 224]
text = clip.tokenize(["a diagram", "a dog", "a cat"]) # [3, 77]

Create a CLIP-ONNX object to convert the model to ONNX

from clip_onnx import clip_onnx, attention
# patch CLIP's attention with the clip_onnx version before exporting
clip.model.ResidualAttentionBlock.attention = attention

visual_path = "clip_visual.onnx"
textual_path = "clip_textual.onnx"

# possible providers: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
onnx_model = clip_onnx(model, providers=["CPUExecutionProvider"],  # cpu mode
                       visual_path=visual_path, textual_path=textual_path)
# export the visual and textual encoders to the .onnx files above
onnx_model.convert2onnx(image, text, verbose=True)
# create the ONNX Runtime inference sessions
onnx_model.start_sessions()

Use it with the standard CLIP API (batch inference):

image_features = onnx_model.encode_image(image)
text_features = onnx_model.encode_text(text)

logits_per_image, logits_per_text = onnx_model(image, text)
probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs)  # prints: [[0.41456965 0.29270944 0.29272085]]
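For reference, label probabilities like the ones above follow the standard CLIP recipe: L2-normalise both embeddings, take the scaled cosine similarities as logits, and apply a softmax over the text prompts. The NumPy sketch below reproduces only that arithmetic on plain arrays. The function name clip_label_probs is hypothetical (not part of clip-onnx), and the default logit_scale=100.0 is an assumption (the exponentiated temperature of the released OpenAI checkpoints); with a loaded model you would read model.logit_scale.exp() instead.

```python
import numpy as np


def clip_label_probs(image_features, text_features, logit_scale=100.0):
    """Turn CLIP image/text embeddings into per-image label probabilities.

    image_features: array of shape [n_images, d]
    text_features:  array of shape [n_texts, d]
    logit_scale:    the already-exponentiated temperature (100.0 is an assumption)
    """
    img = np.asarray(image_features, dtype=np.float64)
    txt = np.asarray(text_features, dtype=np.float64)
    # L2-normalise each embedding so that dot products become cosine similarities
    img = img / np.linalg.norm(img, axis=-1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=-1, keepdims=True)
    # scaled cosine similarities, shape [n_images, n_texts]
    logits_per_image = logit_scale * img @ txt.T
    # numerically stable softmax over the text prompts (last axis)
    shifted = logits_per_image - logits_per_image.max(axis=-1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=-1, keepdims=True)


# toy usage: one image embedding that points exactly at the first prompt
probs = clip_label_probs([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
print("Label probs:", probs)
```

With real data you would pass the outputs of encode_image and encode_text, converted to NumPy arrays if they are tensors.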

Enjoy the speed

Examples

See the examples folder for more details.
Some parts of the code were taken from the post. Thank you neverix for this notebook.

Comments

Can't use CUDAExecutionProvider
Hey, I'm trying to use the code on a GPU and ran into two problems:

When running pip install git+https://github.com/Lednik7/CLIP-ONNX.git I got the following error (tried on multiple machines):

ERROR: Could not find a version that satisfies the requirement torch==1.10.0+cu111 (from clip-onnx)

I fixed it by installing that version of torch myself with pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html and then running the rest of the installation.

After installing the package, I ran the example from the README with CPUExecutionProvider and it worked fine, but when I try to run it on the GPU with CUDAExecutionProvider I get the following error message (again on different machines):

2022-01-31 20:57:03.234399301 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:535 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.
2022-01-31 20:57:03.872349008 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:535 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.

I can't figure out what the problem is. Any help?
opened by YoadTew 13
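A common source of this warning is that CUDAExecutionProvider is only usable when the GPU build of ONNX Runtime (the onnxruntime-gpu package) and matching CUDA/cuDNN libraries are installed; otherwise the session falls back to the remaining providers. The sketch below is a hypothetical helper, not part of clip-onnx, that keeps only the preferred providers reported as available and always includes CPUExecutionProvider as a fallback.

```python
DEFAULT_PREFERENCE = (
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
)


def choose_providers(available, preferred=DEFAULT_PREFERENCE):
    """Keep the preferred ONNX Runtime providers that are actually available.

    The result preserves the order of `preferred`; CPUExecutionProvider is
    appended last if it was not already chosen, so a CPU session is always possible.
    """
    chosen = [name for name in preferred if name in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


# e.g. a CPU-only onnxruntime build reports just the CPU provider
print(choose_providers(["CPUExecutionProvider"]))
```

In practice, available would come from onnxruntime.get_available_providers(). A provider can be listed there and still fail to initialise when the CUDA libraries do not match, so calling get_providers() on the created InferenceSession is a way to see which providers were actually used.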
Performance is inconsistent with the original model
Hi, thanks for providing this useful tool! However, I found that the result produced by the generated ONNX model is inconsistent with the original CLIP model. Here is the code I used to test the original model:

import clip
from PIL import Image

model, preprocess = clip.load("ViT-B/32", device="cpu", jit=False)
image = preprocess(Image.open("CLIP.png")).unsqueeze(0).cpu() # [1, 3, 224, 224]
text = clip.tokenize(["a diagram", "a dog", "a cat"]).cpu() # [3, 77]
image_features = model.encode_image(image)
text_features = model.encode_text(text)
logits_per_image, logits_per_text = model(image, text)
probs = logits_per_image.softmax(dim=-1).detach().cpu().numpy()
print("Label probs:", probs)

The result is: Label probs: [[0.9927937 0.00421069 0.00299573]]

However, when using the onnx model, the result is: Label probs: [[0.41456965 0.29270944 0.29272085]].

Could you help me with this? Thanks!
opened by Cestlaviez 5
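When an exported model disagrees with the original, one way to narrow the problem down is to compare the encoder outputs before any post-processing: if encode_image and encode_text agree closely, the difference lies in how the logits are computed afterwards (normalisation, logit scale); if they do not, the exported encoders themselves differ. The helper below is a hypothetical NumPy sketch of such a check, and the tolerance atol=1e-3 is an arbitrary assumption.

```python
import numpy as np


def compare_embeddings(reference, candidate, atol=1e-3):
    """Compare two [n, d] embedding batches, e.g. PyTorch vs. ONNX encoder outputs.

    Returns (max_abs_diff, per_row_cosine, within_tolerance).
    """
    ref = np.asarray(reference, dtype=np.float64)
    cand = np.asarray(candidate, dtype=np.float64)
    if ref.shape != cand.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {cand.shape}")
    # largest element-wise deviation between the two outputs
    max_abs_diff = float(np.max(np.abs(ref - cand)))
    # cosine similarity per row: 1.0 means the embeddings point the same way
    cosine = np.sum(ref * cand, axis=-1) / (
        np.linalg.norm(ref, axis=-1) * np.linalg.norm(cand, axis=-1)
    )
    return max_abs_diff, cosine, bool(np.allclose(ref, cand, rtol=0.0, atol=atol))
```

For example, compare_embeddings(model.encode_image(image).detach().numpy(), onnx_model.encode_image(image)), assuming the ONNX encoder returns a NumPy array or a CPU tensor.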

Error on installing the torch version in requirements.txt

pip install git+https://github.com/Lednik7/CLIP-ONNX.git

ERROR: Could not find a version that satisfies the requirement torch==1.11.0+cu113 (from versions: 1.0.0, 1.0.1, 1.0.1.post2, 1.1.0, 1.2.0, 1.3.0, 1.3.1, 1.4.0, 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2, 1.11.0)
ERROR: No matching distribution found for torch==1.11.0+cu113

Python version: 3.7.13

opened by dingusagar 2
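The version list in this error contains only public versions such as 1.11.0. The +cu113 suffix is a PEP 440 local version label, and PyPI does not host packages with local labels, so CUDA-tagged torch wheels have to come from PyTorch's own index, which is what the -f https://download.pytorch.org/whl/torch_stable.html option in the first issue provides. The sketch below uses hypothetical helpers, not anything in clip-onnx, to split an exact pin into name, public version, and local label so that such pins are easy to spot.

```python
import re

# name==public_version, optionally followed by "+local_label" (PEP 440)
_EXACT_PIN = re.compile(
    r"^(?P<name>[A-Za-z0-9_.\-]+)==(?P<public>[^+\s]+)(?:\+(?P<local>\S+))?$"
)


def split_pin(requirement):
    """Split an exact pin such as 'torch==1.11.0+cu113' into
    (name, public_version, local_label), with local_label None if absent."""
    match = _EXACT_PIN.match(requirement.strip())
    if match is None:
        raise ValueError(f"not an exact '==' pin: {requirement!r}")
    return match.group("name"), match.group("public"), match.group("local")


def pins_with_local_label(requirements):
    """Return the pins that carry a local label (e.g. +cu113).

    Every entry must be an exact '==' pin; comments and other specifiers are not handled.
    """
    return [req for req in requirements if split_pin(req)[2] is not None]


print(pins_with_local_label(["torch==1.11.0+cu113", "onnxruntime==1.11"]))
```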

ERROR: No matching distribution found for onnxruntime==1.11

Hi, thanks for the great work!

I get this error when I try to install the package:

ERROR: No matching distribution found for onnxruntime==1.11

Maybe we could update requirements.txt?

opened by wanliAlex 1
updated and added information

add info about export params

update GPU(K80) benchmarks

update GPU(T4) benchmarks

update CPU benchmarks

change opset_version to 12

updated readme according to the version

update branch link

update version

update packages

opened by Lednik7 0
Replace the operator of "torch.einsum"

q, k, v = (torch.einsum("tbh, oh -> tbo", x, self.attn.in_proj_weight) + self.attn.in_proj_bias).contiguous().chunk(3, dim=-1)

@Lednik7 Thanks for your great work on CLIP-ONNX. If we don't want to use the torch.einsum operator, do you have other code to replace it? This operator is not friendly to some inference engines, such as NVIDIA TensorRT, so an einsum-free version would be better.

opened by zhangnju 2
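The einsum in this line contracts the hidden dimension h of x (shape [t, b, h]) with in_proj_weight (shape [o, h]), which is the same as a matrix multiplication against the transposed weight. The NumPy sketch below checks that equivalence on random arrays; the function names and toy sizes are hypothetical, and it has not been tested with TensorRT.

```python
import numpy as np


def in_proj_einsum(x, weight, bias):
    # original formulation: contract hidden dim h of x [t, b, h] with weight [o, h]
    return np.einsum("tbh,oh->tbo", x, weight) + bias


def in_proj_matmul(x, weight, bias):
    # einsum-free formulation: plain matmul against the transposed weight
    return x @ weight.T + bias


def split_qkv(projected):
    # counterpart of .chunk(3, dim=-1): three equal slices along the last axis
    return np.split(projected, 3, axis=-1)


rng = np.random.default_rng(0)
t, b, h = 5, 2, 8  # sequence length, batch size, hidden width (toy sizes)
x = rng.standard_normal((t, b, h))
weight = rng.standard_normal((3 * h, h))  # stacked q/k/v projection, [3h, h]
bias = rng.standard_normal(3 * h)

q, k, v = split_qkv(in_proj_matmul(x, weight, bias))
print(np.allclose(in_proj_einsum(x, weight, bias), in_proj_matmul(x, weight, bias)))
print(q.shape, k.shape, v.shape)
```

In PyTorch the corresponding line could be written as q, k, v = torch.nn.functional.linear(x, self.attn.in_proj_weight, self.attn.in_proj_bias).contiguous().chunk(3, dim=-1), since linear computes x @ weight.T + bias. Whether this exports to operators a given inference engine handles better is an assumption worth verifying.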

Releases (1.2)

1.2 (May 3, 2022)

add info about export params

update GPU(K80) benchmarks

update GPU(T4) benchmarks

update CPU benchmarks

change opset_version to 12

updated readme according to the version

update branch link

update version

update packages
1.0 (May 3, 2022)

Works, but with workarounds

Owner

Gerasimov Maxim

16-year-old data scientist. Graduated from Yandex Lyceum and Tinkoff Education.

GitHub Repository

MetaShift: A Dataset of Datasets for Evaluating Contextual Distribution Shifts and Training Conflicts (ICLR 2022)

MetaShift: A Dataset of Datasets for Evaluating Distribution Shifts and Training Conflicts This repo provides the PyTorch source code of our paper: Me

88 Jan 04, 2023

Official PyTorch implementation of Data-free Knowledge Distillation for Object Detection, WACV 2021.

Introduction This repository is the official PyTorch implementation of Data-free Knowledge Distillation for Object Detection, WACV 2021. Data-free Kno

50 Jan 05, 2023

An image classification app boilerplate to serve your deep learning models asap!

Image 🖼 Classification App Boilerplate Have you been puzzled by tons of videos, blogs and other resources on the internet and don't know where and ho

27 Oct 06, 2022

Plotting points that lie on the intersection of the given curves using gradient descent.

Plotting intersection of curves using gradient descent Webapp Link --- What's the app about Why this app Plotting functions and their intersection. A

2 Jan 09, 2022

The VeriNet toolkit for verification of neural networks

VeriNet The VeriNet toolkit is a state-of-the-art sound and complete symbolic interval propagation based toolkit for verification of neural networks.

9 Dec 21, 2022

hySLAM is a hybrid SLAM/SfM system designed for mapping

HySLAM Overview hySLAM is a hybrid SLAM/SfM system designed for mapping. The system is based on ORB-SLAM2 with some modifications and refactoring. Raú

15 Oct 10, 2022

A PyTorch implementation of the paper "Semantic Image Synthesis via Adversarial Learning" in ICCV 2017

Semantic Image Synthesis via Adversarial Learning This is a PyTorch implementation of the paper Semantic Image Synthesis via Adversarial Learning. Req

146 Nov 25, 2022

Campsite Reservation Finder

yellowstone-camping UPDATE: yellowstone-camping is being expanded and renamed to camply. The updated tool now interfaces with the Recreation.gov API a

233 Jan 08, 2023

This repo contains the code for paper Inverse Weighted Survival Games

Inverse-Weighted-Survival-Games This repo contains the code for paper Inverse Weighted Survival Games instructions general loss function (--lfn) can b

3 Jan 12, 2022

img2pose: Face Alignment and Detection via 6DoF, Face Pose Estimation

img2pose: Face Alignment and Detection via 6DoF, Face Pose Estimation Figure 1: We estimate the 6DoF rigid transformation of a 3D face (rendered in si

519 Dec 29, 2022

Approaches to modeling terrain and maps in python

topography 🌎 Contains different approaches to modeling terrain and topographic-style maps in python Features Inverse Distance Weighting (IDW) A given

1 Aug 10, 2022

Minecraft Hack Detection With Python

Minecraft Hack Detection An attempt to try and use crowd sourced replays to find

3 Mar 26, 2022

Zalo AI challenge 2021 task hum to song

Zalo AI challenge 2021 task Hum to Song pipeline: Preparing data for training: Edit the file paths in config/preprocess.yaml raw_path:

105 Dec 16, 2022

This is the official code for the paper "Ad2Attack: Adaptive Adversarial Attack for Real-Time UAV Tracking".

Ad^2Attack：Adaptive Adversarial Attack on Real-Time UAV Tracking Demo video 📹 Our video on bilibili demonstrates the test results of Ad^2Attack on se

10 Nov 07, 2022

NeuroFind - A solution to the Task given by the Oberseminar of Messtechnik Institute of TU Dresden in 2021

NeuroFind A solution to the Task given by the Oberseminar of Messtechnik

1 Jan 20, 2022

Official code release for: EditGAN: High-Precision Semantic Image Editing

565 Jan 05, 2023

The Hailo Model Zoo includes pre-trained models and a full building and evaluation environment

Hailo Model Zoo The Hailo Model Zoo provides pre-trained models for high-performance deep learning applications. Using the Hailo Model Zoo you can mea

50 Dec 07, 2022

A Transformer-Based Siamese Network for Change Detection

ChangeFormer: A Transformer-Based Siamese Network for Change Detection (Under review at IGARSS-2022) Wele Gedara Chaminda Bandara, Vishal M. Patel Her

214 Dec 29, 2022

Code for "Adversarial attack by dropping information." (ICCV 2021)

AdvDrop Code for "AdvDrop: Adversarial Attack to DNNs by Dropping Information(ICCV 2021)." Human can easily recognize visual objects with lost informa

52 Nov 10, 2022

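A demo that performs pose estimation with MediaPipe and displays Tokyo2020 Olympic-style pictograms

Tokyo2020-Pictogram-using-MediaPipe A demo that performs pose estimation with MediaPipe and displays Tokyo2020 Olympic-style pictograms. Tokyo2020Pictgram02.mp4 Requirement mediapipe 0.8.6 or later O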

295 Dec 26, 2022
