D2LV: A Data-Driven and Local-Verification Approach for Image Copy Detection

Overview

Facebook AI Image Similarity Challenge: Matching Track —— Team: imgFp

This is the source code of our 3rd place solution to the Matching Track of the Image Similarity Challenge (ISC) 2021 organized by Facebook AI. This repo will tell you how to reproduce our result step by step.

Method Overview

For the Matching Track task, we use a dual global-and-local retrieval method. The global recall model is EsViT, the same model we use in the Descriptor Track. The local recall uses SIFT point features. As shown in the figure below, our pipeline is divided into four modules. A query image is first put into the preprocessing module for overlay detection. Then the global and local features are extracted and retrieved in parallel, giving three recall branches: global recall, original-image local recall and cropped-image local recall. The last module computes a matching score for each branch and merges the three into the final result.

[Figure: method overview of the four-module pipeline]
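
In sketch form (function names here are illustrative, not actual APIs from this repo), the pipeline does roughly the following:

def match_query(query_image):
    crop = detect_overlay(query_image)                  # step1: yolov5 crop detection
    global_pairs = global_recall(crop or query_image)   # EsViT 256-dim descriptor search
    local_pairs = local_recall(query_image)             # SIFT points of the original image
    crop_pairs = local_recall(crop) if crop else []     # SIFT points of the cropped image
    candidates = global_pairs + local_pairs + crop_pairs
    scored = [(pair, match_score(pair)) for pair in candidates]
    return merge(scored)                                # step8: merge the three branches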

Installation

Please install Python 3.7, PyTorch 1.8 (or higher) and the packages listed in requirements.txt.

gcc version 7.3.1

We run on a machine with 8 GPUs (Tesla V100-SXM2-32GB, 32510.5MB), 48 CPUs and 300 GB of memory.

Get Result Demo

We now describe how to get our result, using the query image Q24789.jpg as input for the demo.

step1: preprocess the query images

We train a YOLOv5 model to detect the crop augmentation in query images. The details are in the README.md of Team AITechnology in the Descriptor Track. Because the parameters differ, we preprocess the images separately for local recall and global recall.

python preprocessing.py $origin_image_path $save_image_result_path

e.g.
cd preprocess
python preprocessing_global.py ../data/queryimages/ ../data/queryimages_crop_global/
python preprocessing_local.py ../data/queryimages/ ../data/queryimages_crop_local/

*note: If the Arial.ttf download fails, copy the local yolov5/Arial.ttf to the directory indicated by the command-line prompt: cp yolov5/Arial.ttf /root/.config/Ultralytics/Arial.ttf
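
A minimal sketch of what the preprocessing scripts do, assuming the trained detector is loaded through torch.hub (the weight file name and the confidence handling are assumptions):

import torch
from PIL import Image

# Assumed checkpoint name; load the custom-trained yolov5 crop detector.
model = torch.hub.load('ultralytics/yolov5', 'custom', path='crop_detector.pt')

def crop_query(image_path, save_path):
    boxes = model(image_path).xyxy[0]        # detections as (x1, y1, x2, y2, conf, cls)
    if len(boxes) == 0:
        return False                         # no overlay detected, keep the original image
    x1, y1, x2, y2 = boxes[0, :4].tolist()   # yolov5 returns detections sorted by confidence
    Image.open(image_path).crop((x1, y1, x2, y2)).save(save_path)
    return True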

step2: get the original image's local features

First, export the library path:

cd local_fea/feature_extract
export LD_LIBRARY_PATH=./extLib/ 

Run the executable localfea_extract_sift to extract the SIFT local point features and write them to a txt file.

Usage: ./localfea_extract_sift <image_id> <image_path> <output_txt_path>

e.g.
./localfea_extract_sift Q24789 ../../data/queryimages/Q24789.jpg ../feature_out/Q24789.txt


Alternatively, you can extract features for all query images from a list:

python multi_extract_sift.py ../../data/querylist_demo.txt ../../data/queryimages/ ../feature_out/

For example, two point features in an image's result txt file look like this:

Q24789_0_3.1348_65.589_1.76567_-1.09404||0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,16,13,0,0,0,0,0,0,16,28,7,5,0,0,0,0,0,0,0,0,20,12,0,0,23,5,0,0,29,29,7,12,56,29,5,0,0,11,7,20,38,45,10,0,0,0,0,14,0,0,0,0,39,56,36,8,39,14,0,0,46,56,21,24,56,22,0,0,5,8,8,39,38,11,0,0,0,0,19,47,0,0,0,0,8,56,56,7,37,0,0,0,10,52,56,56,52,0,0,0,0,0,35,56,11,0,0,0,0,0,54,45
Q24789_1_8.26344_431.038_1.75921_1.22328||42,27,0,4,11,12,9,14,49,28,0,6,17,25,18,14,45,37,4,0,12,45,8,9,8,17,9,0,27,50,6,0,41,24,0,0,10,14,19,20,50,34,0,6,20,22,17,21,36,22,4,4,43,50,15,12,26,32,8,0,17,50,17,6,28,12,0,0,0,21,31,21,50,14,0,0,17,31,23,38,19,10,9,17,50,50,14,15,17,23,13,10,19,45,26,8,11,11,0,0,0,6,6,0,28,13,0,0,8,20,12,15,11,9,0,0,24,47,12,9,18,38,22,6,13,28,10,8
...
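
Each line is a keypoint identifier (image id, keypoint index and keypoint geometry), then ||, then the 128-dim SIFT descriptor stored as uint8 values. A minimal parser sketch:

import numpy as np

def load_sift_txt(path):
    # Returns keypoint id strings and an (N, 128) uint8 descriptor matrix.
    ids, descs = [], []
    with open(path) as f:
        for line in f:
            key, vec = line.strip().split('||')
            ids.append(key)
            descs.append(np.array(vec.split(','), dtype=np.uint8))
    return ids, np.vstack(descs)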

step3: retrieval using the original image's local features

We use GPU Faiss for retrieval because there are about 600 million SIFT point features in the reference images; they need about 165 GB of GPU memory for float16 computation.

First, you need to extract all local features of the reference images with multi_extract_sift.py and store them as uint8 to save space (ref_sift_fea_300.pkl, 68 GB, and ref_sift_name_300.pkl, 25 GB).

Then get original image local recall result:

cd local_fea/faiss_search
python db_search.py ../feature_out/ ../faiss_out/local_pair_result.txt

For example, the result txt file ../faiss_out/local_pair_result.txt contains:

Q24789.jpg,R540735.jpg
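
A minimal sketch of the float16 brute-force search behind db_search.py (the flat L2 index is an assumption about the exact configuration; at this scale the reference descriptors also have to be sharded across the 8 GPUs):

import faiss
import numpy as np

def build_gpu_index(ref_descs):                # ref_descs: (N, 128) uint8
    res = faiss.StandardGpuResources()
    cfg = faiss.GpuIndexFlatConfig()
    cfg.useFloat16 = True                      # halves the GPU memory for stored vectors
    index = faiss.GpuIndexFlatL2(res, 128, cfg)
    index.add(ref_descs.astype(np.float32))    # faiss stores them as float16 internally
    return index

# distances, ids = index.search(query_descs.astype(np.float32), 1)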

step4: get the cropped image's local features (only for images that have a crop result)

Same as step2, but only for the cropped images listed in ../../preprocess/local_crop_list.txt.

cd local_fea/feature_extract
python multi_extract_sift.py ../../preprocess/local_crop_list.txt ../../data/queryimages_crop_local/ ../crop_feature_out/

step5: retrieval using the cropped image's local features (only for images that have a crop result)

Same as step3:

cd local_fea/faiss_search
python db_search.py ../crop_feature_out/ ../crop_faiss_out/crop_local_pair_result.txt

step6: get the image's global feature

We train an EsViT model (following the challenge rules closely) to extract 256-dim global features; the details are in the README.md of Team AITechnology in the Descriptor Track.

*note: for the global feature, if an image has a cropped version we extract the feature from the cropped image; otherwise we use the original image.

Generate h5 descriptors for all query and reference images in the submission style:

cd global_fea/feature_extract
python predict_FB_model.py --model checkpoints/EsViT_SwinB_finetune_bs8_lr0.0001_adjustlr_0_margin1.0_dataFB_epoch200.pth  --save_h5_name fb_descriptors_demo.h5  --model_type EsViT_SwinB_W14 --query ./query_list_demo.txt --total ./ref_list_demo.txt

*note: The --query and --total parameters take the query list and the reference list, respectively.

The h5 file will be saved in ./h5_descriptors/fb_descriptors.h5
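
The file follows the ISC2021 descriptor submission layout; a quick way to inspect it (the dataset names below are the ones the challenge format requires, but verify against your generated file):

import h5py

with h5py.File('./h5_descriptors/fb_descriptors_demo.h5', 'r') as f:
    print(list(f.keys()))        # expected: ['query', 'query_ids', 'reference', 'reference_ids']
    print(f['query'].shape)      # (num_query_images, 256) float32
    print(f['reference'].shape)  # (num_reference_images, 256) float32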

step7: retrieval using the image's global feature

We have already added our h5 file from phase 1. Use Faiss to get the top-1 pairs:

cd global_fea/faiss_search
python faiss_topk.py ../feature_extract/h5_descriptors/fb_descriptors.h5 ./global_pair_result.txt
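
In sketch form, faiss_topk.py amounts to a brute-force top-1 search over the global descriptors (the inner-product metric is an assumption; the script may use L2):

import faiss
import h5py
import numpy as np

with h5py.File('../feature_extract/h5_descriptors/fb_descriptors.h5', 'r') as f:
    query, reference = np.array(f['query']), np.array(f['reference'])
    qids = [s.decode() for s in f['query_ids']]   # ids may be stored as bytes
    rids = [s.decode() for s in f['reference_ids']]

index = faiss.IndexFlatIP(256)                    # 256-dim EsViT descriptors
index.add(reference)
_, I = index.search(query, 1)                     # top-1 reference per query
with open('./global_pair_result.txt', 'w') as out:
    for q, (r,) in zip(qids, I):
        out.write(f'{q},{rids[r]}\n')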

step8: compute the match score and the final result

We use SIFT features with KNN matching (K=2) and take the number of matched points as the score. We have already compiled this into an executable program.
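
The scoring amounts to counting SIFT matches that survive Lowe's ratio test; a sketch with OpenCV (the ratio threshold is an assumption, not necessarily what the shipped binary uses):

import cv2

sift = cv2.SIFT_create()
matcher = cv2.BFMatcher()

def match_score(query_path, ref_path, ratio=0.75):
    # Count keypoints whose nearest neighbour beats the second-nearest by the ratio test.
    _, dq = sift.detectAndCompute(cv2.imread(query_path, cv2.IMREAD_GRAYSCALE), None)
    _, dr = sift.detectAndCompute(cv2.imread(ref_path, cv2.IMREAD_GRAYSCALE), None)
    if dq is None or dr is None:
        return 0
    knn = matcher.knnMatch(dq, dr, k=2)    # K=2 nearest neighbours per query point
    return sum(1 for m in knn if len(m) == 2 and m[0].distance < ratio * m[1].distance)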

Usage: ./match_score <pair_result_txt> <query_image_dir> <reference_image_dir> <output_score_txt>

For example, to get the original-image local pair scores:

cd match_score
export LD_LIBRARY_PATH=../local_fea/feature_extract/extLib/
./match_score ../local_fea/faiss_out/local_pair_result.txt ../data/queryimages ../data/referenceimages/ ./local_pair_score.txt

The other two recall branches are scored the same way:

global: 
./match_score ../global_fea/faiss_search/global_pair_result.txt ../data/queryimages_crop_global ../data/referenceimages/ ./global_pair_score.txt

crop local:
./match_score ../local_fea/crop_faiss_out/crop_local_pair_result.txt ../data/queryimages_crop_local ../data/referenceimages/ ./crop_local_pair_score.txt

Finally, the three recall results are merged by:

python merge_score.py ./final_result.txt
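
A sketch of the merge logic, assuming each score file holds query,reference,score lines and keeping the best-scoring reference per query (the threshold is a placeholder, not our tuned value):

def merge_scores(score_files, out_path, threshold=10.0):
    best = {}                                # query id -> (reference id, score)
    for path in score_files:
        with open(path) as f:
            for line in f:
                q, r, s = line.strip().split(',')
                s = float(s)
                if s >= threshold and s > best.get(q, (None, -1.0))[1]:
                    best[q] = (r, s)
    with open(out_path, 'w') as out:
        for q, (r, s) in best.items():
            out.write(f'{q},{r},{s}\n')

merge_scores(['./local_pair_score.txt', './crop_local_pair_score.txt', './global_pair_score.txt'],
             './final_result.txt')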

Others

If you have any problems or errors while running the code, please email us.
