A containerized REST API around OpenAI's CLIP model.

Last update: Nov 06, 2022

Overview

OpenAI's CLIP — REST API

This is a container wrapping OpenAI's CLIP model in a RESTful interface.

Running the container locally

First, build the container:

docker build -t clip-container:latest .

Then, you can run it:

docker run -it -p 8080:8080 --name "clip-container" --rm clip-container:latest /opt/ml/code/serve

Sending requests:

The container exposes two different endpoints:

GET /ping: Returns a 200 status if the container is working properly.
POST /invocations: Takes a list of image URLs and a list of candidate classes, and returns, for each image, the labels with their corresponding probabilities.
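
As a rough illustration of how the health endpoint might be used, the sketch below polls GET /ping from Python using only the standard library. The helper name `is_healthy` and the base URL are assumptions made for this example and are not part of the project.

```python
import urllib.error
import urllib.request


def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET /ping answers with HTTP 200, False otherwise."""
    url = base_url.rstrip("/") + "/ping"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and error statuses
        # (urllib raises HTTPError, a URLError subclass, for 4xx/5xx).
        return False


# Example (assumes the container is running locally on port 8080):
# print(is_healthy("http://localhost:8080"))
```

A check like this can be useful right after `docker run`, since the model may take a moment to load before the container starts answering requests.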

Here is an example request, assuming the container is listening on port 8080:

curl --location --request POST 'http://localhost:8080/invocations' \
--header 'Content-Type: application/json' \
--data-raw '{
    "images": [
        "https://images.unsplash.com/photo-1597308680537-1ba44407ffc0?ixid=MXwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHw%3D&ixlib=rb-1.2.1&auto=format&fit=crop&w=1834&q=80",
        "https://images.unsplash.com/photo-1589270216117-7972b3082c7d?ixid=MXwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHw%3D&ixlib=rb-1.2.1&auto=format&fit=crop&w=1834&q=80"],
    "classes": ["person", "bag", "person with a bag", "woman riding a horse", "woman with a bag", "woman with black shirt and a bag"]
}'
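
The same request could be sent from Python. The following is a minimal sketch using only the standard library; the function names `build_request` and `classify` are illustrative, and the payload shape (`images` and `classes` keys) is taken from the curl example above.

```python
import json
import urllib.request


def build_request(base_url: str, images: list, classes: list) -> urllib.request.Request:
    """Build a POST /invocations request carrying the JSON payload."""
    body = json.dumps({"images": images, "classes": classes}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/invocations",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(base_url: str, images: list, classes: list, timeout: float = 60.0):
    """Send the request and return the decoded JSON response."""
    request = build_request(base_url, images, classes)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (assumes the container is running locally on port 8080;
# the image URL is a placeholder):
# results = classify(
#     "http://localhost:8080",
#     ["https://example.com/photo.jpg"],
#     ["person", "bag", "person with a bag"],
# )
```

Keeping request construction separate from the network call makes the payload easy to inspect without a running container.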

The response looks like this:

[
    {
        "url": "https://images.unsplash.com/photo-1597308680537-1ba44407ffc0?ixid=MXwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHw%3D&ixlib=rb-1.2.1&auto=format&fit=crop&w=1834&q=80", 
        "labels": [
            "woman with black shirt and a bag", 
            "woman with a bag", 
            "person with a bag", 
            "bag", 
            "person"
        ], 
        "probs": [1.0, 1.7488513970320696e-09, 1.1663764917350243e-19, 4.179975909038141e-30, 3.77612043676229e-30]
    }, 
    {
        "url": "https://images.unsplash.com/photo-1589270216117-7972b3082c7d?ixid=MXwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHw%3D&ixlib=rb-1.2.1&auto=format&fit=crop&w=1834&q=80", 
        "labels": [
            "person with a bag", 
            "woman with black shirt and a bag", 
            "bag", 
            "woman with a bag", 
            "person"
        ], 
        "probs": [1.0, 2.4879632576357835e-08, 2.065714813830402e-13, 7.658033346455602e-15, 1.1307645811408335e-23]
    }
]
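
In the sample response, `labels` and `probs` appear to be parallel lists, with one probability per label. The sketch below shows one way a client might pick the most likely label for each image under that assumption; the helper name `top_labels` and the shortened URLs are illustrative only.

```python
def top_labels(results: list) -> dict:
    """Map each image URL to its (label, probability) pair with the highest probability."""
    best = {}
    for entry in results:
        # Pair each label with its probability; using max() avoids
        # relying on the lists already being sorted.
        label, prob = max(zip(entry["labels"], entry["probs"]), key=lambda pair: pair[1])
        best[entry["url"]] = (label, prob)
    return best


# Abbreviated version of the response above, with placeholder URLs.
sample = [
    {
        "url": "https://example.com/first.jpg",
        "labels": ["woman with black shirt and a bag", "woman with a bag", "person"],
        "probs": [1.0, 1.7488513970320696e-09, 3.77612043676229e-30],
    },
    {
        "url": "https://example.com/second.jpg",
        "labels": ["person with a bag", "bag", "person"],
        "probs": [1.0, 2.065714813830402e-13, 1.1307645811408335e-23],
    },
]

for url, (label, prob) in top_labels(sample).items():
    print(f"{url}: {label} ({prob:.3f})")
```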

SageMaker Integration

This container is compatible with SageMaker, so you should be able to host it as a SageMaker endpoint without modifications. The code supports both GPU and CPU instances.
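
Once hosted, such an endpoint could be called through the SageMaker runtime API, which forwards the request body to the container's /invocations route. The sketch below assumes boto3 is installed, AWS credentials are configured, and an endpoint has already been deployed; the endpoint name `clip-endpoint` and the helper `invoke_clip_endpoint` are hypothetical. The client is passed in as a parameter so the helper itself has no AWS dependency.

```python
import json


def invoke_clip_endpoint(runtime_client, endpoint_name: str, images: list, classes: list):
    """Call a SageMaker endpoint with the same JSON payload used for POST /invocations."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"images": images, "classes": classes}),
    )
    # The response body is a stream; read it fully and decode the JSON.
    return json.loads(response["Body"].read())


# Example (requires boto3 and a deployed endpoint; the name is hypothetical):
# import boto3
# client = boto3.client("sagemaker-runtime")
# results = invoke_clip_endpoint(
#     client,
#     "clip-endpoint",
#     ["https://example.com/photo.jpg"],
#     ["person", "bag"],
# )
```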

Owner

Santiago Valdarrama
