A basic duplicate image detection service using perceptual image hash functions and nearest neighbor search, implemented using faiss, fastapi, and imagehash

Last update: Nov 11, 2022

Overview

Duplicate Image Detection

Getting Started

Install dependencies: pip install -r requirements.txt
Run the service: python main.py

Testing

Test with pytest

How it Works

This system uses a perceptual hashing function, similar to the one in Apple's CSAM Detection. Instead of generating image hashes with NeuralHash, it uses a difference hash (dHash), which is simpler and less computationally intensive because it doesn't require a neural network. Since we don't have the same privacy constraints as Apple, we use a plain nearest neighbor search over the stored hashes to identify duplicate images.

Difference Hash

dHash is a perceptual hashing function that produces hash values that are resilient to image scaling, as well as to changes in color, brightness, and aspect ratio [1]. There are four main steps for creating a difference hash for an image:

Convert to greyscale*
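Resize the image to (hash_size+1, hash_size), i.e. one column wider than it is tall
Calculate the horizontal gradient by comparing each pixel with its neighbor to the right, which reduces the grid to (hash_size, hash_size)
Assign one bit per gradient value, based on whether brightness increases between the two neighboring pixels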

*We convert the image to greyscale before resizing so that the resize only has to process a single channel.
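The four steps above can be sketched from scratch with Pillow and NumPy. This is an illustration only, not the service's code (which relies on imagehash). The function name dhash_bits, the default hash_size=16 (which gives a 256-bit hash), and the choice of the LANCZOS resampling filter are assumptions.

```python
import numpy as np
from PIL import Image


def dhash_bits(image: Image.Image, hash_size: int = 16) -> np.ndarray:
    """Hypothetical helper: return a (hash_size, hash_size) boolean dHash grid."""
    # Step 1: convert to greyscale first, so the resize handles one channel.
    grey = image.convert("L")
    # Step 2: resize to (width, height) = (hash_size + 1, hash_size).
    small = grey.resize((hash_size + 1, hash_size), Image.LANCZOS)
    # NumPy arrays are (rows, cols) = (hash_size, hash_size + 1).
    pixels = np.asarray(small)
    # Steps 3 and 4: compare each pixel with its right-hand neighbor. This drops
    # one column and sets a bit wherever brightness increases left to right.
    return pixels[:, 1:] > pixels[:, :-1]
```

For example, dhash_bits(Image.open("photo.jpg")) (a hypothetical file) returns 256 booleans. A perfectly flat image yields all-False bits, since no pixel is brighter than its left neighbor.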

Nearest Neighbors

Image hashes that we want to check new images against are stored in a binary index for fast and efficient nearest neighbor searches. We use Hamming distance (the number of bits in which two hashes differ) to measure the similarity between image hashes. For dHash, distances less than 10 (96.09% similarity for a 256-bit hash, since 1 - 10/256 ≈ 0.9609) likely indicate similar or duplicate images [1].
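faiss's IndexBinaryFlat stores hashes packed into uint8 vectors (n_bits / 8 bytes each) and performs an exhaustive Hamming-distance search. The NumPy sketch below does the same brute-force search so the mechanics are visible without faiss; it is a stand-in, not the service's index. The helper names pack_hash, hamming_search, and similarity are hypothetical, and the cutoff of 10 follows the threshold above.

```python
import numpy as np

# Lookup table: number of set bits in each possible byte value (0..255).
_POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)


def pack_hash(bits: np.ndarray) -> np.ndarray:
    """Pack a boolean bit grid into uint8 bytes, 8 bits per byte (MSB first)."""
    return np.packbits(bits.reshape(-1))


def hamming_search(index: np.ndarray, query: np.ndarray, max_distance: int = 10):
    """Return (row, distance) pairs with distance < max_distance, closest first.

    index: (n, n_bytes) uint8 array of packed hashes; query: (n_bytes,) uint8.
    """
    # XOR leaves a 1 wherever two hashes disagree; count those bits per row.
    distances = _POPCOUNT[np.bitwise_xor(index, query)].sum(axis=1)
    matches = np.flatnonzero(distances < max_distance)
    order = matches[np.argsort(distances[matches], kind="stable")]
    return [(int(i), int(distances[i])) for i in order]


def similarity(distance: int, n_bits: int = 256) -> float:
    """Fraction of matching bits, e.g. 1 - 10/256 for the threshold above."""
    return 1 - distance / n_bits
```

Searching with a stored hash that has three flipped bits returns that row at distance 3, while unrelated 256-bit hashes typically sit near distance 128 and are filtered out.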

References

[1] https://www.hackerfactor.com/blog/?/archives/529-Kind-of-Like-That.html


Owner

Matthew Podolak
