The 1st Place Solution of the Facebook AI Image Similarity Challenge (ISC21) : Descriptor Track.

Last update: Jan 08, 2023

Overview

ISC21-Descriptor-Track-1st

The 1st Place Solution of the Facebook AI Image Similarity Challenge (ISC21) : Descriptor Track.

You can check our solution tech report from: Contrastive Learning with Large Memory Bank and Negative Embedding Subtraction for Accurate Copy Detection

setup

OS

Ubuntu 18.04

CUDA Version

11.1

environment

Run this for python env

conda env create -f environment.yml

data download

mkdir -p input/{query,reference,train}_images
aws s3 cp s3://drivendata-competition-fb-isc-data/all/query_images/ input/query_images/ --recursive --no-sign-request
aws s3 cp s3://drivendata-competition-fb-isc-data/all/reference_images/ input/reference_images/ --recursive --no-sign-request
aws s3 cp s3://drivendata-competition-fb-isc-data/all/train_images/ input/train_images/ --recursive --no-sign-request
aws s3 cp s3://drivendata-competition-fb-isc-data/all/query_images_phase2/ input/query_images_phase2/ --recursive --no-sign-request

train

Run below lines step by step.

cd exp

CUDA_VISIBLE_DEVICES=0,1,2,3 python v83.py \
  -a tf_efficientnetv2_m_in21ft1k --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --seed 9 \
  --epochs 5 --lr 0.1 --wd 1e-6 --batch-size 128 --ncrops 2 \
  --gem-p 1.0 --pos-margin 0.0 --neg-margin 1.0 \
  --input-size 256 --sample-size 1000000 --memory-size 20000 \
  ../input/training_images/
CUDA_VISIBLE_DEVICES=0,1,2,3 python v83.py \
  -a tf_efficientnetv2_m_in21ft1k --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --seed 90 \
  --epochs 10 --lr 0.1 --wd 1e-6 --batch-size 128 --ncrops 2 \
  --gem-p 1.0 --pos-margin 0.0 --neg-margin 1.0 \
  --input-size 256 --sample-size 1000000 --memory-size 20000 \
  --resume ./v83/train/checkpoint_0004.pth.tar \
  ../input/training_images/

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python v86.py \
  -a tf_efficientnetv2_m_in21ft1k --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --seed 99 \
  --epochs 7 --lr 0.1 --wd 1e-6 --batch-size 128 --ncrops 2 \
  --gem-p 1.0 --pos-margin 0.0 --neg-margin 1.0 \
  --input-size 384 --sample-size 1000000 --memory-size 20000 --weight ./v83/train/checkpoint_0005.pth.tar \
  ../input/training_images/

python v98.py \
  -a tf_efficientnetv2_m_in21ft1k --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --seed 999 \
  --epochs 3 --lr 0.1 --wd 1e-6 --batch-size 64 --ncrops 2 \
  --gem-p 1.0 --pos-margin 0.0 --neg-margin 1.0 --weight ./v86/train/checkpoint_0005.pth.tar \
  --input-size 512 --sample-size 1000000 --memory-size 20000 \
  ../input/training_images/

python v107.py \
  -a tf_efficientnetv2_m_in21ft1k --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --seed 99999 \
  --epochs 10 --lr 0.5 --wd 1e-6 --batch-size 16 --ncrops 2 \
  --gem-p 1.0 --pos-margin 0.0 --neg-margin 1.1 --weight ./v98/train/checkpoint_0001.pth.tar \
  --input-size 512 --sample-size 1000000 --memory-size 1000 \
  ../input/training_images/

The final model weight can be downloaded from here: https://drive.google.com/file/d/1ySea-NJp_J0aWvma_WmVbc3Hnwf5LHUf/view?usp=sharing You can execute inference code without run training with this model weight. To locate the model weight to suitable location, run following commands after downloaded the model weight.

mkdir -p exp/v107/train
mv checkpoint_009.pth.tar exp/v107/train/

inference

Note that faiss doesn't work with A100, so I used 4x GTX 1080 Ti for post-process.

cd exp

python v107.py -a tf_efficientnetv2_m_in21ft1k --batch-size 128 --mode extract --gem-eval-p 1.0 --weight ./v107/train/checkpoint_0009.pth.tar --input-size 512 --target-set qrt ../input/

# this script generates final prediction result files
python ../scripts/postprocess.py

Submission files are outputted here:

exp/v107/extract/v107_iso.h5 # descriptor track
exp/v107/extract/v107_iso.csv # matching track

descriptor track local evaluation score:

{
  "average_precision": 0.9479039085717805,
  "recall_p90": 0.9192546583850931
}

Comments

Bugs?

Congratulations! We really appreciate the work. When I run the

python v107.py \
  -a tf_efficientnetv2_m_in21ft1k --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --seed 99999 \
  --epochs 10 --lr 0.5 --wd 1e-6 --batch-size 16 --ncrops 2 \
  --gem-p 1.0 --pos-margin 0.0 --neg-margin 1.1 --weight ./v98/train/checkpoint_0001.pth.tar \
  --input-size 512 --sample-size 1000000 --memory-size 1000 \
  ../input/training_images/

I come across

Traceback (most recent call last):                                              
  File "v107.py", line 774, in <module>
    train(args)
  File "v107.py", line 425, in train
    mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
  File "/home/wangwenhao/anaconda3/envs/ISC/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/wangwenhao/anaconda3/envs/ISC/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/home/wangwenhao/anaconda3/envs/ISC/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 5 terminated with the following error:
Traceback (most recent call last):
  File "/home/wangwenhao/anaconda3/envs/ISC/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/wangwenhao/fbisc-descriptor-1st/exp/v107.py", line 573, in main_worker
    train_one_epoch(train_loader, model, loss_fn, optimizer, scaler, epoch, args)
  File "/home/wangwenhao/fbisc-descriptor-1st/exp/v107.py", line 595, in train_one_epoch
    labels = torch.cat([torch.tile(i, dims=(args.ncrops,)), torch.tensor(j)])
ValueError: only one element tensors can be converted to Python scalars

Do you know how to fix it? Thanks.

opened by WangWenhao0716 14

data augment is wrong

train_dataset = ISCDataset(
    train_paths,
    NCropsTransform(
        transforms.Compose(aug_moderate),
        transforms.Compose(aug_hard),
        args.ncrops,
    ),
)

error log: apply_transform() takes from 2 to 3 positional arguments but 5 were given

opened by AItechnology 5

Cannot load state dict for model
Thanks for your amazing work. But I encounter a problem, when I use checkpoint_0009.pth.tar checkpoint,

When I don't remove model = nn.DataParallel(model), I encouter error:

size mismatch for module.backbone.bn1.weight: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for module.backbone.bn1.bias: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for module.backbone.bn1.running_mean: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for module.backbone.bn1.running_var: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for module.fc.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([256, 2048])

Then I remove line model = nn.DataParallel(model), the model seems to load checkpoint successfully, but I feed same input to model, the output feature vector if different for different time I run. I guess the model is not loaded successfully when load state dict, so model will use the weight initialized randomly.

Then I change strict=True in model.load_state_dict(state_dict=state_dict, strict=False), I encounter error RuntimeError: Error(s) in loading state_dict for ISCNet: Missing key(s) in state_dict:, I found that the key of state_dict in model and checkpoint totally diffrent even name pattern. Key of model state dict and checkpoint state dict I attached below. checkpoint.txt model.txt How can I solve the this problem?
opened by NguyenThanhAI 2
Unable to reproduce Stage 1 results
Hi, I attempted to reproduce the Stage 1 training using your provided code, but was unable to obtain the reported muAP of 0.5831. I instead obtained this result at epoch 9 (indexed from 0):

Average Precision: 0.49554 Recall at P90 : 0.32701 Threshold at P90 : -0.375733 Recall at rank 1: 0.62448 Recall at rank 10: 0.65961

I also saw that you continued training from epoch 5, but these are the results I obtained at epoch 5:

Average Precision: 0.47977 Recall at P90 : 0.32501 Threshold at P90 : -0.376619 Recall at rank 1: 0.61409 Recall at rank 10: 0.64903

Both sets of results were obtained on the private ground truth set of Phase 1, using image size 512. Is it possible to provide some insight as to what is happening here? Thank you.
opened by avrilwongaw 1

about the train output feature

sorry to bother you again. I want train the model with a small backbone such as resnet50. Because I only have three GPU and I run with command:

CUDA_VISIBLE_DEVICES=0,1,2 python v83.py  --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --seed 9 \
  --epochs 5 --lr 0.1 --wd 1e-6 --batch-size 96 --ncrops 2 \
  --gem-p 1.0 --pos-margin 0.0 --neg-margin 1.0 \
  --input-size 256 --sample-size 1000000 --memory-size 20000 \
/root/zhx3/data/fb_train_data/train

I find a strange problem. I test checkpoint_000{0..4}.pth.tar model. only the checkpoint_0002.pth.tar ouput different when the input is different. I mean other model will output same embedding no matter what different you input. thanks in advance. the loss log output such as:

epoch 5:   0%|          | 0/15873 [00:00<?, ?it/s]=> loading checkpoint './v83/train/checkpoint_0004.pth.tar'
=> loaded checkpoint './v83/train/checkpoint_0004.pth.tar' (epoch 5)
epoch 6:   0%|          | 0/15873 [00:00<?, ?it/s]epoch=5, loss=1.0154363534772417
epoch 7:   0%|          | 0/15873 [00:00<?, ?it/s]epoch=6, loss=1.012835873522891

opened by Usernamezhx 1

about the memory size

python v107.py \
  -a tf_efficientnetv2_m_in21ft1k --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --seed 99999 \
  --epochs 10 --lr 0.5 --wd 1e-6 \
  --gem-p 1.0 --pos-margin 0.0 --neg-margin 1.1 --weight ./v98/train/checkpoint_0001.pth.tar \
  --input-size 512 --sample-size 1000000 --memory-size 1000 \
  ../input/training_images/

why not set the --memory-size large such as 20000 ? thanks in advance

opened by Usernamezhx 1

will v107 overfit for phase2?

Congratulations and thanks for your sharing.

i find v107 only use the about 5k query-ref pair (i.e. gt in phase1) as positive. How to know whether it overfits for phase2 ?

opened by liangzimei 1

access denied for dataset on aws

Thanks for you work! I have problems downloading the dataset from the given aws buckets

$ aws s3 cp s3://drivendata-competition-fb-isc-data/all/query_images/ input/query_images/ --recursive --no-sign-request
fatal error: An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied

Do I need special permissions to download the data?

opened by sebastianlutter 0

Final optimizer state for the model

Hello @lyakaap

Thanks a lot for this work. I am trying to take this and finetune over a certain task. Is it possible you can provide the state of final optimizer after 4th stage of training. We want to try an experiment where it will be very useful.

Thank you.

opened by shubhamjain0594 11

Releases(v1.0.1)

v1.0.1(Nov 25, 2022)

Source code(tar.gz)
Source code(zip)
isc_ft_v107.pth.tar(400.94 MB)
isc_selfsup_v98.pth.tar(400.94 MB)

Owner

lyakaap

Computer Vision, Deep Learning

GitHub Repository

Construct a neural network frame by Numpy

本项目的CSDN博客链接：https://blog.csdn.net/weixin_41578567/article/details/111482022 1. 概览本项目主要用于神经网络的学习，通过基于numpy的实现，了解神经网络底层前向传播、反向传播以及各类优化器的原理。该项目目前已实现的功

24 Jan 22, 2022

A Tensorflow based library for Time Series Modelling with Gaussian Processes

Markovflow Documentation | Tutorials | API reference | Slack What does Markovflow do? Markovflow is a Python library for time-series analysis via prob

24 Dec 12, 2022

Source code for "Roto-translated Local Coordinate Framesfor Interacting Dynamical Systems"

Roto-translated Local Coordinate Frames for Interacting Dynamical Systems Source code for Roto-translated Local Coordinate Frames for Interacting Dyna

19 Nov 27, 2022

Code examples and benchmarks from the paper "Understanding Entropy Coding With Asymmetric Numeral Systems (ANS): a Statistician's Perspective"

Code For the Paper "Understanding Entropy Coding With Asymmetric Numeral Systems (ANS): a Statistician's Perspective" Author: Robert Bamler Date: 22 D

4 Nov 02, 2022

基于Paddle框架的arcface复现

arcface-Paddle 基于Paddle框架的arcface复现 ArcFace-Paddle 本项目基于paddlepaddle框架复现ArcFace，并参加百度第三届论文复现赛，将在2021年5月15日比赛完后提供AIStudio链接～敬请期待参考项目： InsightFace Padd

16 Dec 15, 2022

Pytorch Implementation for (STANet+ and STANet)

Pytorch Implementation for (STANet+ and STANet) V2-Weakly Supervised Visual-Auditory Saliency Detection with Multigranularity Perception (arxiv), pdf:

14 Nov 29, 2022

Code for Emergent Translation in Multi-Agent Communication

Emergent Translation in Multi-Agent Communication PyTorch implementation of the models described in the paper Emergent Translation in Multi-Agent Comm

75 Jul 15, 2022

A cross-document event and entity coreference resolution system, trained and evaluated on the ECB+ corpus.

A Comprehensive Comparison of Word Embeddings in Event & Entity Coreference Resolution. Introduction This repo contains experimental code derived from

2 May 09, 2022

Adversarial Adaptation with Distillation for BERT Unsupervised Domain Adaptation

Knowledge Distillation for BERT Unsupervised Domain Adaptation Official PyTorch implementation | Paper Abstract A pre-trained language model, BERT, ha

29 Nov 30, 2022

(ICONIP 2020) MobileHand: Real-time 3D Hand Shape and Pose Estimation from Color Image

MobileHand: Real-time 3D Hand Shape and Pose Estimation from Color Image This repo contains the source code for MobileHand, real-time estimation of 3D

90 Dec 12, 2022

Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT

CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT CheXbert is an accurate, automated dee

51 Dec 08, 2022

Picasso: A CUDA-based Library for Deep Learning over 3D Meshes

The Picasso Library is intended for complex real-world applications with large-scale surfaces, while it also performs impressively on the small-scale applications over synthetic shape manifolds. We h

97 Dec 01, 2022

PyTorch code for ICPR 2020 paper Future Urban Scene Generation Through Vehicle Synthesis

Future urban scene generation through vehicle synthesis This repository contains Pytorch code for the ICPR2020 paper "Future Urban Scene Generation Th

4 Oct 11, 2021

Learning 3D Part Assembly from a Single Image

Learning 3D Part Assembly from a Single Image This repository contains a PyTorch implementation of the paper: Learning 3D Part Assembly from A Single

18 Dec 21, 2022

NER for Indian languages

CL-NERIL: A Cross-Lingual Model for NER in Indian Languages Code for the paper - https://arxiv.org/abs/2111.11815 Setup Setup a virtual environment Th

0 Nov 24, 2021

The official implementation of NeMo: Neural Mesh Models of Contrastive Features for Robust 3D Pose Estimation [ICLR-2021]. https://arxiv.org/pdf/2101.12378.pdf

NeMo: Neural Mesh Models of Contrastive Features for Robust 3D Pose Estimation [ICLR-2021] Release Notes The offical PyTorch implementation of NeMo, p

76 Nov 23, 2022

The 1st Place Solution of the Facebook AI Image Similarity Challenge (ISC21) : Descriptor Track.

Related tags

Overview

ISC21-Descriptor-Track-1st

setup

OS

CUDA Version

environment

data download

train

inference

Comments

Releases(v1.0.1)

v1.0.1(Nov 25, 2022)

Owner

lyakaap

Construct a neural network frame by Numpy

A Tensorflow based library for Time Series Modelling with Gaussian Processes

Source code for "Roto-translated Local Coordinate Framesfor Interacting Dynamical Systems"

Code examples and benchmarks from the paper "Understanding Entropy Coding With Asymmetric Numeral Systems (ANS): a Statistician's Perspective"

基于Paddle框架的arcface复现

Pytorch Implementation for (STANet+ and STANet)

Code for Emergent Translation in Multi-Agent Communication

A cross-document event and entity coreference resolution system, trained and evaluated on the ECB+ corpus.

Adversarial Adaptation with Distillation for BERT Unsupervised Domain Adaptation

(ICONIP 2020) MobileHand: Real-time 3D Hand Shape and Pose Estimation from Color Image

Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT

Picasso: A CUDA-based Library for Deep Learning over 3D Meshes

PyTorch code for ICPR 2020 paper Future Urban Scene Generation Through Vehicle Synthesis

Learning 3D Part Assembly from a Single Image

NER for Indian languages

The official implementation of NeMo: Neural Mesh Models of Contrastive Features for Robust 3D Pose Estimation [ICLR-2021]. https://arxiv.org/pdf/2101.12378.pdf

PyGCL: A PyTorch Library for Graph Contrastive Learning

[CVPR 2019 Oral] Multi-Channel Attention Selection GAN with Cascaded Semantic Guidance for Cross-View Image Translation

AI Based Smart Exam Proctoring Package

Semi-SDP Semi-supervised parser for semantic dependency parsing.