Open source implementation of "A Self-Supervised Descriptor for Image Copy Detection" (SSCD).

Overview

A Self-Supervised Descriptor for Image Copy Detection (SSCD)

This is the open-source codebase for "A Self-Supervised Descriptor for Image Copy Detection", recently accepted to CVPR 2022.

This work uses self-supervised contrastive learning with strong differential entropy regularization to create a fingerprint for image copy detection.

SSCD diagram

About this codebase

This implementation is built on Pytorch Lightning, with some components from Classy Vision.

Our original experiments were conducted in a proprietary codebase using data files (fonts and emoji) that are not licensed for redistribution. This version uses Noto fonts and Twemoji emoji, via the AugLy project. As a result, models trained in this codebase perform slightly differently than our pretrained models.

Pretrained models

We provide trained models from our original experiments to allow others to reproduce our evaluation results.

For convenience, we provide equivalent model files in a few formats:

  • Files ending in .classy.pt are weight files using Classy Vision ResNe(X)t backbones, which is how these models were trained.
  • Files ending in .torchvision.pt are weight files using Torchvision ResNet backbones. These files may be easier to integrate in Torchvision-based codebases. See model.py for how we integrate GeM pooling and L2 normalization into these models.
  • Files ending in .torchscript.pt are standalone TorchScript models that can be used in any pytorch project without any SSCD code.

We provide the following models:

name dataset trunk augmentations dimensions classy vision torchvision torchscript
sscd_disc_blur DISC ResNet50 strong blur 512 link link link
sscd_disc_advanced DISC ResNet50 advanced 512 link link link
sscd_disc_mixup DISC ResNet50 advanced + mixup 512 link link link
sscd_disc_large DISC ResNeXt101 32x4 advanced + mixup 1024 link link
sscd_imagenet_blur ImageNet ResNet50 strong blur 512 link link link
sscd_imagenet_advanced ImageNet ResNet50 advanced 512 link link link
sscd_imagenet_mixup ImageNet ResNet50 advanced + mixup 512 link link link

We recommend sscd_disc_mixup (ResNet50) as a default SSCD model, especially when comparing to other standard ResNet50 models, and sscd_disc_large (ResNeXt101) as a higher accuracy alternative using a bit more compute.

Classy Vision and Torchvision use different default cardinality settings for ResNeXt101. We do not provide a Torchvision version of the sscd_disc_large model for this reason.

Installation

If you only plan to use torchscript models for inference, no installation steps are necessary, and any environment with a recent version of pytorch installed can run our torchscript models.

For all other uses, see installation steps below.

The code is written for pytorch-lightning 1.5 (the latest version at time of writing), and may need changes for future Lightning versions.

Option 1: Install dependencies using Conda

Install and activate conda, then create a conda environment for SSCD as follows:

# Create conda environment
conda create --name sscd -c pytorch -c conda-forge \
  pytorch torchvision cudatoolkit=11.3 \
  "pytorch-lightning>=1.5,<1.6" lightning-bolts \
  faiss python-magic pandas numpy

# Activate environment
conda activate sscd

# Install Classy Vision and AugLy from PIP:
python -m pip install classy_vision augly

You may need to select a cudatoolkit version that corresponds to the system CUDA library version you have installed. See PyTorch documentation for supported combinations of pytorch, torchvision and cudatoolkit versions.

For a non-CUDA (CPU only) installation, replace cudatoolkit=... with cpuonly.

Option 2: Install dependencies using PIP

# Create environment
python3 -m virtualenv ./venv

# Activate environment
source ./venv/bin/activate

# Install dependencies in this environment
python -m pip install -r ./requirements.txt --extra-index-url https://download.pytorch.org/whl/cu113

The --extra-index-url option selects a newer version of CUDA libraries, required for NVidia A100 GPUs. This can be omitted if A100 support is not needed.

Inference using SSCD models

This section describes how to use pretrained SSCD models for inference. To perform inference for DISC and Copydays evaluations, see Evaluation.

Preprocessing

We recommend preprocessing images for inference either resizing the small edge to 288 or resizing the image to a square tensor.

Using fixed-sized square tensors is more efficient on GPUs, to make better use of batching. Copy detection using square tensors benefits from directly resizing to the target tensor size. This skews the image, and does not preserve aspect ratio. This differs from the common practice for classification inference.

from torchvision import transforms

normalize = transforms.Normalize(
    mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225],
)
small_288 = transforms.Compose([
    transforms.Resize(288),
    transforms.ToTensor(),
    normalize,
])
skew_320 = transforms.Compose([
    transforms.Resize([320, 320]),
    transforms.ToTensor(),
    normalize,
])

Inference using Torchscript

Torchscript files can be loaded directly in other projects without any SSCD code or dependencies.

import torch
from PIL import Image

model = torch.jit.load("/path/to/sscd_disc_mixup.torchscript.pt")
img = Image.open("/path/to/image.png").convert('RGB')
batch = small_288(img).unsqueeze(0)
embedding = model(batch)[0, :]

These Torchscript models are prepared for inference. For other uses (eg. fine-tuning), use model weight files, as described below.

Load model weight files

To load model weight files, first construct the Model object, then load the weights using the standard torch.load and load_state_dict methods.

import torch
from sscd.models.model import Model

model = Model("CV_RESNET50", 512, 3.0)
weights = torch.load("/path/to/sscd_disc_mixup.classy.pt")
model.load_state_dict(weights)
model.eval()

Once loaded, these models can be used interchangeably with Torchscript models for inference.

Model backbone strings can be found in the Backbone enum in model.py. Classy Vision models start with the prefix CV_ and Torchvision models start with TV_.

Using SSCD descriptors

SSCD models produce 512 dimension (except the "large" model, which uses 1024 dimensions) L2 normalized descriptors for each input image. The similarity of two images with descriptors a and b can be measured by descriptor cosine similarity (a.dot(b); higher is more similar), or equivalently using euclidean distance ((a-b).norm(); lower is more similar).

For the sscd_disc_mixup model, DISC image pairs with embedding cosine similarity greater than 0.75 are copies with 90% precision, for example. This corresponds to a euclidean distance less than 0.7, or squared euclidean distance less than 0.5.

Descriptor post-processing

For best results, we recommend additional descriptor processing when sample images from the target distribution are available. Centering (subtracting the mean) followed by L2 normalization, or whitening followed by L2 normalization, can improve accuracy.

Score normalization can make similarity more consistent and improve global accuracy metrics (but has no effect on ranking metrics).

Other model formats

If pretrained models in another format (eg. ONYX) would be useful for you, let us know by filing a feature request.

Reproducing evaluation results

To reproduce evaluation results, see Evaluation.

Training SSCD models

For information on how to train SSCD models, see Training.

License

The SSCD codebase uses the CC-NC 4.0 International license.

Citation

If you find our codebase useful, please consider giving a star and cite as:

@article{pizzi2022self,
  title={A Self-Supervised Descriptor for Image Copy Detection},
  author={Pizzi, Ed and Roy, Sreya Dutta and Ravindra, Sugosh Nagavara and Goyal, Priya and Douze, Matthijs},
  journal={Proc. CVPR},
  year={2022}
}
Owner
Meta Research
Meta Research
"Learning Free Gait Transition for Quadruped Robots vis Phase-Guided Controller"

PhaseGuidedControl The current version is developed based on the old version of RaiSim series, and possibly requires further modification. It will be

X-Mechanics 12 Oct 21, 2022
EvoJAX is a scalable, general purpose, hardware-accelerated neuroevolution toolkit

EvoJAX: Hardware-Accelerated Neuroevolution EvoJAX is a scalable, general purpose, hardware-accelerated neuroevolution toolkit. Built on top of the JA

Google 598 Jan 07, 2023
A series of Python scripts to access measurements from Fluke 28X meters. Fluke IR Remote Interface required.

Fluke289_data_access A series of Python scripts to access measurements from Fluke 28X meters. Fluke IR Remote Interface required. Created from informa

3 Dec 08, 2022
A simple baseline for the 2022 IEEE GRSS Data Fusion Contest (DFC2022)

DFC2022 Baseline A simple baseline for the 2022 IEEE GRSS Data Fusion Contest (DFC2022) This repository uses TorchGeo, PyTorch Lightning, and Segmenta

isaac 24 Nov 28, 2022
Code for the paper Task Agnostic Morphology Evolution.

Task-Agnostic Morphology Optimization This repository contains code for the paper Task-Agnostic Morphology Evolution by Donald (Joey) Hejna, Pieter Ab

Joey Hejna 18 Aug 04, 2022
Veri Setinizi Yolov5 Formatına Dönüştürün

Veri Setinizi Yolov5 Formatına Dönüştürün! Bu Repo da Neler Var? Xml Formatındaki Veri Setini .Txt Formatına Çevirme Xml Formatındaki Dosyaları Silme

Kadir Nar 4 Aug 22, 2022
Code of paper "CDFI: Compression-Driven Network Design for Frame Interpolation", CVPR 2021

CDFI (Compression-Driven-Frame-Interpolation) [Paper] (Coming soon...) | [arXiv] Tianyu Ding*, Luming Liang*, Zhihui Zhu, Ilya Zharkov IEEE Conference

Tianyu Ding 95 Dec 04, 2022
A Shading-Guided Generative Implicit Model for Shape-Accurate 3D-Aware Image Synthesis

A Shading-Guided Generative Implicit Model for Shape-Accurate 3D-Aware Image Synthesis Project Page | Paper A Shading-Guided Generative Implicit Model

Xingang Pan 115 Dec 18, 2022
We simulate traveling back in time with a modern camera to rephotograph famous historical subjects.

[SIGGRAPH Asia 2021] Time-Travel Rephotography [Project Website] Many historical people were only ever captured by old, faded, black and white photos,

298 Jan 02, 2023
Using PyTorch Perform intent classification using three different models to see which one is better for this task

Using PyTorch Perform intent classification using three different models to see which one is better for this task

Yoel Graumann 1 Feb 14, 2022
Multilingual Image Captioning

Multilingual Image Captioning Authors: Bhavitvya Malik, Gunjan Chhablani Demo Link: https://huggingface.co/spaces/flax-community/multilingual-image-ca

Gunjan Chhablani 32 Nov 25, 2022
SAMO: Streaming Architecture Mapping Optimisation

SAMO: Streaming Architecture Mapping Optimiser The SAMO framework provides a method of optimising the mapping of a Convolutional Neural Network model

Alexander Montgomerie-Corcoran 20 Dec 10, 2022
(IEEE TIP 2021) Regularized Densely-connected Pyramid Network for Salient Instance Segmentation

RDPNet IEEE TIP 2021: Regularized Densely-connected Pyramid Network for Salient Instance Segmentation PyTorch training and testing code are available.

Yu-Huan Wu 41 Oct 21, 2022
Deep Image Matting implementation in PyTorch

Deep Image Matting Deep Image Matting paper implementation in PyTorch. Differences "fc6" is dropped. Indices pooling. "fc6" is clumpy, over 100 millio

Yang Liu 724 Dec 27, 2022
The code for replicating the experiments from the LFI in SSMs with Unknown Dynamics paper.

Likelihood-Free Inference in State-Space Models with Unknown Dynamics This package contains the codes required to run the experiments in the paper. Th

Alex Aushev 0 Dec 27, 2021
Implementation of paper "DeepTag: A General Framework for Fiducial Marker Design and Detection"

Implementation of paper DeepTag: A General Framework for Fiducial Marker Design and Detection. Project page: https://herohuyongtao.github.io/research/

Yongtao Hu 46 Dec 12, 2022
TransVTSpotter: End-to-end Video Text Spotter with Transformer

TransVTSpotter: End-to-end Video Text Spotter with Transformer Introduction A Multilingual, Open World Video Text Dataset and End-to-end Video Text Sp

weijiawu 66 Dec 26, 2022
Repository of 3D Object Detection with Pointformer (CVPR2021)

3D Object Detection with Pointformer This repository contains the code for the paper 3D Object Detection with Pointformer (CVPR 2021) [arXiv]. This wo

Zhuofan Xia 117 Jan 06, 2023
Python implementation of 3D facial mesh exaggeration using the techniques described in the paper: Computational Caricaturization of Surfaces.

Python implementation of 3D facial mesh exaggeration using the techniques described in the paper: Computational Caricaturization of Surfaces.

Wonjong Jang 8 Nov 01, 2022
SoK: Vehicle Orientation Representations for Deep Rotation Estimation

SoK: Vehicle Orientation Representations for Deep Rotation Estimation Raymond H. Tu, Siyuan Peng, Valdimir Leung, Richard Gao, Jerry Lan This is the o

FIRE Capital One Machine Learning of the University of Maryland 12 Oct 07, 2022