Wagtail CLIP allows you to search your Wagtail images using natural language queries.

Related tags

Searchwagtail-clip
Overview

Wagtail CLIP

Wagtail CLIP allows you to search your Wagtail images using natural language queries.

Intro Video

This project was inspired by, and draws heavily from memery, by deepfates et al. It makes use of OpenAI's CLIP model (it was very nice of them to open source it, cheers).

An example project is available here

Installation

You can install this package as follows (requires Git):

pip install \
    [email protected]+https://github.com/MattSegal/wagtail-clip.git \
    -f https://download.pytorch.org/whl/torch_stable.html \
    torch==1.7.1+cpu \
    torchvision==0.8.2+cpu

You will find that this installs ~200MB of deep learning libraries (PyTorch). You will also need to update your Django project's settings:

INSTALLED_APPS = [
    # ... whatever ...
    "wagtailclip",
]

# A place to store ~330MB of downloaded model parameters
WAGTAIL_CLIP_DOWNLOAD_PATH = "/clip"
# Maximum number of search results
WAGTAIL_CLIP_MAX_IMAGE_SEARCH_RESULTS = 256
# A unique name for the search backend.
WAGTAIL_CLIP_SEARCH_BACKEND_NAME = "clip"
# Recommended model, or your can roll your own (read the source).
WAGTAILIMAGES_IMAGE_MODEL = "wagtailclip.NaturalSearchImage"
# Add the search backend.
WAGTAILSEARCH_BACKENDS = {
    # ... whatever ...
    WAGTAIL_CLIP_SEARCH_BACKEND_NAME: {
        "BACKEND": "wagtailclip.search.CLIPSearchBackend",
    },
}

That's enough to get started, however if you want pre-download the ~330MB of model parameters, you can run this management command:

./manage.py download_clip

How it works

This package wraps the CLIP model. which can be used for:

  • encoding text into 1x512 float vectors
  • encoding images into 1x512 float vectors

These vectors can be thought of as points in a 512 dimensional space, where the closer two points are to each other, the more "related" they are. Importantly, CLIP encodes both text and images into the same space, meaning that we can:

  • encode all Wagtail images into vectors and store them in the database
  • encode a user's search query text into a vector; and then
  • compare the search query vector with all the image vectors

This comparison is done using a dot product to get a similarity score for each image. The operation is performed in Python. Once we have a similarity score we pick the top N (say, 256) most similar images and return those as the results.

Will this scale?

Haha probably not. I've tested my naive implementation on up to 2048 images and it runs OK (~3s / query). There are specialized Postgres extensions and vector similarity databases that you can use if you want to do this for tens of thousands of images.

Contributing

If you want to help out, make a pull request and/or email me at [email protected] or DM me on Twitter. Probably better to talk to me first before writing a bunch of code.

Owner
Matt Segal
Matt Segal
Pythonic search engine based on PyLucene.

Lupyne is a search engine based on PyLucene, the Python extension for accessing Java Lucene. Lucene is a relatively low-level toolkit, and PyLucene wr

A. Coady 83 Jan 02, 2023
🔍 Messages Searcher is make for search custom message in all channels in guild and dm.

🔍 Messages Searcher is make for search custom message in all channels in guild and dm.

Kaneki 33 Dec 31, 2022
A Python web searcher library with different search engines

Robert A simple Python web searcher library with different search engines. Install pip install roberthelper Usage from robert import GoogleSearcher

1 Dec 23, 2021
Yuno is context based search engine for anime.

Yuno yuno.mp4 Table of Contents Introduction Power Of Yuno Try Yuno How Yuno was created? References Introduction Yuno is a context based search engin

IAmParadox 354 Dec 19, 2022
This is a Telegram Bot written in Python for searching data on Google Drive.

This is a Telegram Bot written in Python for searching data on Google Drive. Supports multiple Shared Drives (TDs). Manual Guide for deploying the bot

Levi 158 Dec 27, 2022
Senginta is All in one Search Engine Scrapper for used by API or Python Module. It's Free!

Senginta is All in one Search Engine Scrapper. With traditional scrapping, Senginta can be powerful to get result from any Search Engine, and convert to Json. Now support only for Google Product Sear

33 Nov 21, 2022
Super Simple Similarities Service

Super Simple Similarities Service

vincent d warmerdam 95 Dec 25, 2022
An image inline search telegram bot.

Image-Search-Bot An image inline search telegram bot. Note: Use Telegram picture bot. That is better. Not recommending to deploy this bot. Made with P

Fayas Noushad 24 Oct 21, 2022
PwnWiki Telegram database searching bot

pwtgbot PwnWiki Telegram database searching bot. Screenshots How it looks like in the terminal when running How it looks like in Telegram Run Directly

K4YT3X 3 Jan 25, 2022
Free and Open, Distributed, RESTful Search Engine

Elasticsearch Elasticsearch is the distributed, RESTful search and analytics engine at the heart of the Elastic Stack. You can use Elasticsearch to st

elastic 62.4k Jan 08, 2023
Whoosh indexing capabilities for Flask-SQLAlchemy, Python 3 compatibility fork.

Flask-WhooshAlchemy3 Whoosh indexing capabilities for Flask-SQLAlchemy, Python 3 compatibility fork. Performance improvements and suggestions are read

Blake VandeMerwe 27 Mar 10, 2022
TG-searcherBot - Search any channel/chat from keyword

TG-searcherBot Search any channel/chat from keyword. Commands /start - Starts th

TechiError 12 Nov 04, 2022
document organizer with tags and full-text-search, in a simple and clean sqlite3 schema

document organizer with tags and full-text-search, in a simple and clean sqlite3 schema

Manos Pitsidianakis 152 Oct 29, 2022
a Telegram bot writen in Python for searching files in Drive. Based on SearchX-bot

Drive Search Bot This is a Telegram bot writen in Python for searching files in Drive. Based on SearchX-bot How to deploy? Clone this repo: git clone

Hafitz Setya 25 Dec 09, 2022
A library for fast parse & import of Windows Prefetch into Elasticsearch.

prefetch2es Fast import of Windows Prefetch(.pf) into Elasticsearch. prefetch2es uses C library libscca. Usage When using from the commandline interfa

S.Nakano 5 Nov 24, 2022
A sphinx extension for designing beautiful, screen-size responsive web components.

sphinx-design A sphinx extension for designing beautiful, view size responsive web components. Created with inspiration from Bootstrap (v5), Material

Executable Books 109 Jan 01, 2023
Python Elasticsearch handler for the standard python logging framework

Python Elasticsearch Log handler This library provides an Elasticsearch logging appender compatible with the python standard logging library. This lib

Mohammed Mousa 0 Dec 08, 2021
An open source, non-profit search engine implemented in python

Mwmbl: No ads, no tracking, no cruft, no profit Mwmbl is a non-profit, ad-free, free-libre and free-lunch search engine with a focus on useability and

639 Jan 04, 2023
PwnWiki 数据库搜索命令行工具;该工具有点像 searchsploit 命令,只是搜索的不是 Exploit Database 而是 PwnWiki 条目

PWSearch PwnWiki 数据库搜索命令行工具。该工具有点像 searchsploit 命令,只是搜索的不是 Exploit Database 而是 PwnWiki 条目。

K4YT3X 72 Dec 20, 2022
A search engine to query social media insights with political theme

social-insights Social insights is an open source big data project that generates insights about various interesting topics happening every day. Curre

UMass GDSC 10 Feb 28, 2022