Wagtail CLIP allows you to search your Wagtail images using natural language queries.

Related tags

Searchwagtail-clip
Overview

Wagtail CLIP

Wagtail CLIP allows you to search your Wagtail images using natural language queries.

Intro Video

This project was inspired by, and draws heavily from memery, by deepfates et al. It makes use of OpenAI's CLIP model (it was very nice of them to open source it, cheers).

An example project is available here

Installation

You can install this package as follows (requires Git):

pip install \
    [email protected]+https://github.com/MattSegal/wagtail-clip.git \
    -f https://download.pytorch.org/whl/torch_stable.html \
    torch==1.7.1+cpu \
    torchvision==0.8.2+cpu

You will find that this installs ~200MB of deep learning libraries (PyTorch). You will also need to update your Django project's settings:

INSTALLED_APPS = [
    # ... whatever ...
    "wagtailclip",
]

# A place to store ~330MB of downloaded model parameters
WAGTAIL_CLIP_DOWNLOAD_PATH = "/clip"
# Maximum number of search results
WAGTAIL_CLIP_MAX_IMAGE_SEARCH_RESULTS = 256
# A unique name for the search backend.
WAGTAIL_CLIP_SEARCH_BACKEND_NAME = "clip"
# Recommended model, or your can roll your own (read the source).
WAGTAILIMAGES_IMAGE_MODEL = "wagtailclip.NaturalSearchImage"
# Add the search backend.
WAGTAILSEARCH_BACKENDS = {
    # ... whatever ...
    WAGTAIL_CLIP_SEARCH_BACKEND_NAME: {
        "BACKEND": "wagtailclip.search.CLIPSearchBackend",
    },
}

That's enough to get started, however if you want pre-download the ~330MB of model parameters, you can run this management command:

./manage.py download_clip

How it works

This package wraps the CLIP model. which can be used for:

  • encoding text into 1x512 float vectors
  • encoding images into 1x512 float vectors

These vectors can be thought of as points in a 512 dimensional space, where the closer two points are to each other, the more "related" they are. Importantly, CLIP encodes both text and images into the same space, meaning that we can:

  • encode all Wagtail images into vectors and store them in the database
  • encode a user's search query text into a vector; and then
  • compare the search query vector with all the image vectors

This comparison is done using a dot product to get a similarity score for each image. The operation is performed in Python. Once we have a similarity score we pick the top N (say, 256) most similar images and return those as the results.

Will this scale?

Haha probably not. I've tested my naive implementation on up to 2048 images and it runs OK (~3s / query). There are specialized Postgres extensions and vector similarity databases that you can use if you want to do this for tens of thousands of images.

Contributing

If you want to help out, make a pull request and/or email me at [email protected] or DM me on Twitter. Probably better to talk to me first before writing a bunch of code.

Owner
Matt Segal
Matt Segal
Senginta is All in one Search Engine Scrapper for used by API or Python Module. It's Free!

Senginta is All in one Search Engine Scrapper. With traditional scrapping, Senginta can be powerful to get result from any Search Engine, and convert to Json. Now support only for Google Product Sear

33 Nov 21, 2022
Searches for MAC addresses in a text file of a Cisco "show IP arp" in any address format

show-ip-arp-mac-lookup Searches for MAC addresses in a text file of a Cisco "show IP arp" in any address format What it does: Takes a text file with t

Stew Alexander 0 Dec 24, 2022
Pythonic Lucene - A simplified python impelementaiton of Apache Lucene

A simplified python impelementaiton of Apache Lucene, mabye helps to understand how an enterprise search engine really works.

Mahdi Sadeghzadeh Ghamsary 2 Sep 12, 2022
A play store search application programming interface ( API )

Play-Store-API A play store search application programming interface ( API ) Made with Python3

Fayas Noushad 8 Oct 21, 2022
Reverse-ikea-image-search - A simple image of ikea search using jina.ai

IKEA Reverse Image Search This is a demo project to fetch ikea product images(IK

SOUVIK GHOSH 4 Mar 08, 2022
Deep Image Search - AI-Based Image Search Engine

Deep Image Search is an AI-based image search engine that includes deep transfer learning features Extraction and tree-based vectorized search technique.

144 Jan 05, 2023
esguard provides a Python decorator that waits for processing while monitoring the load of Elasticsearch.

esguard esguard provides a Python decorator that waits for processing while monitoring the load of Elasticsearch. Quick Start You need to launch elast

po3rin 5 Dec 08, 2021
Yet another googlesearch - A Python library for executing intelligent, realistic-looking, and tunable Google searches.

yagooglesearch - Yet another googlesearch Overview yagooglesearch is a Python library for executing intelligent, realistic-looking, and tunable Google

115 Dec 29, 2022
Whoosh indexing capabilities for Flask-SQLAlchemy, Python 3 compatibility fork.

Flask-WhooshAlchemy3 Whoosh indexing capabilities for Flask-SQLAlchemy, Python 3 compatibility fork. Performance improvements and suggestions are read

Blake VandeMerwe 27 Mar 10, 2022
This project is a sample demo of Arxiv search related to AI/ML Papers built using Streamlit, sentence-transformers and Faiss.

This project is a sample demo of Arxiv search related to AI/ML Papers built using Streamlit, sentence-transformers and Faiss.

Karn Deb 49 Oct 30, 2022
Full-text multi-table search application for Django. Easy to install and use, with good performance.

django-watson django-watson is a fast multi-model full-text search plugin for Django. It is easy to install and use, and provides high quality search

Dave Hall 1.1k Jan 03, 2023
Home for Elasticsearch examples available to everyone. It's a great way to get started.

Introduction This is a collection of examples to help you get familiar with the Elastic Stack. Each example folder includes a README with detailed ins

elastic 2.5k Jan 03, 2023
solrpy is a Python client for Solr

solrpy solrpy is a Python client for Solr, an enterprise search server built on top of Lucene. solrpy allows you to add documents to a Solr instance,

Jiho Persy Lee 37 Jul 22, 2021
Search emails from a domain through search engines

EmailFinder - search emails through Search Engines

Josué Encinar 155 Dec 30, 2022
Yuno is context based search engine for anime.

Yuno yuno.mp4 Table of Contents Introduction Power Of Yuno Try Yuno How Yuno was created? References Introduction Yuno is a context based search engin

IAmParadox 354 Dec 19, 2022
A web search server for ParlAI, including Blenderbot2.

Description A web search server for ParlAI, including Blenderbot2. Querying the server: The server reacting correctly: Uses html2text to strip the mar

Jules Gagnon-Marchand 119 Jan 06, 2023
PwnWiki 数据库搜索命令行工具;该工具有点像 searchsploit 命令,只是搜索的不是 Exploit Database 而是 PwnWiki 条目

PWSearch PwnWiki 数据库搜索命令行工具。该工具有点像 searchsploit 命令,只是搜索的不是 Exploit Database 而是 PwnWiki 条目。

K4YT3X 72 Dec 20, 2022
Inverted index creation and query search mechanism on Wikipedia pages.

WikiPedia Search Engine Step 1 : Installing Requirements Install "stemming" module for python using pip. Step 2 : Parsing the Data To parse the data,

Piyush Atri 1 Nov 27, 2021
A fast, efficiency python package for searching and getting search results with many different search engines

search A fast, efficiency python package for searching and getting search results with many different search engines. Installation To install the pack

Neurs 0 Oct 06, 2022
Eland is a Python Elasticsearch client for exploring and analyzing data in Elasticsearch with a familiar Pandas-compatible API.

Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch

elastic 463 Dec 30, 2022