A basic duplicate image detection service using perceptual image hash functions and nearest neighbor search, implemented using faiss, fastapi, and imagehash

Last update: Nov 11, 2022

Overview

Duplicate Image Detection

Getting Started

Install dependencies: pip install -r requirements.txt
Run the service: python main.py

Testing

Test with pytest

How it Works

This system uses a perceptual hashing function, similar to Apple's CSAM Detection. Instead of generating image hashes using NeuralHash, it uses a difference hash (dHash), which is simpler and less computationally intensive as it doesn't require neural networks. Since we don't have the same privacy constraints as Apple, we will be using nearest neighbor searches to identify duplicate images.

Difference Hash

dHash is a perceptual hashing function that produces hash values that are resilient to image scaling, as well as changes in color, brightness, and aspect ratio [1]. There are 4 main steps for creating a difference hash for an image:

1. Convert to greyscale*
2. Resize image to (hash_size+1, hash_size)
3. Calculate horizontal gradient, reducing image size to (hash_size, hash_size)
4. Assign bits based on horizontal gradient values

*We convert the image to greyscale before resizing for optimal performance.
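The gradient and bit-assignment steps can be sketched in plain Python. This is a minimal illustration, not the service's own code: it assumes the image has already been converted to greyscale and resized, so the input is a grid of hash_size rows with hash_size+1 pixel values each. The function name `dhash_from_grid`, the example grid, and the bit convention (a bit is 1 when the right pixel is brighter than the left) are assumptions made for this sketch.

```python
def dhash_from_grid(grid):
    """Compute a difference hash from a greyscale grid.

    `grid` is a list of rows holding greyscale values, with hash_size rows
    and hash_size + 1 columns (the image after the greyscale and resize steps).
    Returns the hash as an integer of hash_size * hash_size bits.
    """
    hash_size = len(grid)
    if any(len(row) != hash_size + 1 for row in grid):
        raise ValueError("expected hash_size rows of hash_size + 1 pixels")

    value = 0
    for row in grid:
        # Compare each pixel with its right-hand neighbour (horizontal gradient).
        for left, right in zip(row, row[1:]):
            # Assumed convention: bit is 1 when brightness increases left to right.
            bit = 1 if right > left else 0
            value = (value << 1) | bit
    return value


# Hypothetical 2x3 grid (hash_size = 2), used only for illustration.
example = [
    [10, 20, 15],
    [30, 30, 40],
]
example_hash = dhash_from_grid(example)

# A uniform brightness shift leaves every left/right comparison unchanged.
brighter = [[pixel + 50 for pixel in row] for row in example]
brighter_hash = dhash_from_grid(brighter)
```

Because only the sign of each neighbouring difference is kept, the brightened grid produces the same hash as the original, which is the kind of resilience described above.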

Nearest Neighbors

Image hashes that we want to check for duplicates against are stored in a binary index for fast and efficient nearest neighbor searches. We use Hamming distance as the metric for similarity between image hashes. For dHash, distances less than 10 likely indicate similar or duplicate images [1]. The quoted 96.09% similarity is consistent with 10 differing bits out of a 256-bit hash (1 - 10/256).
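The distance check can be illustrated with a small sketch. It deliberately swaps the faiss binary index for a linear scan in plain Python so that it runs without extra dependencies. The function names, the stored hash values, and the 256-bit hash length (inferred from the 96.09% figure) are assumptions for illustration only.

```python
def hamming_distance(a, b):
    """Number of bit positions in which two integer hashes differ."""
    return bin(a ^ b).count("1")


def nearest(query, stored_hashes):
    """Return (distance, position) of the closest stored hash (linear scan)."""
    return min(
        (hamming_distance(query, h), i) for i, h in enumerate(stored_hashes)
    )


def similarity(distance, n_bits):
    """Fraction of matching bits between two hashes of n_bits bits."""
    return 1 - distance / n_bits


def is_likely_duplicate(query, stored_hashes, threshold=10):
    """Apply the 'distance less than 10' rule of thumb from the text."""
    distance, _ = nearest(query, stored_hashes)
    return distance < threshold


# Hypothetical stored hashes, written as small integers for readability.
stored = [0b11110000, 0b10101010, 0b00001111]
query = 0b11110001  # differs from stored[0] in a single bit

best_distance, best_position = nearest(query, stored)
threshold_similarity = similarity(10, 256)  # assumed 256-bit hashes
```

A linear scan costs one comparison per stored hash, which is why a dedicated binary index becomes worthwhile as the collection grows.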

References

[1] https://www.hackerfactor.com/blog/?/archives/529-Kind-of-Like-That.html


Owner

Matthew Podolak
