Code for the ECIR'22 paper "Evaluating the Robustness of Retrieval Pipelines with Query Variation Generators"

Overview

Query Variation Generators

This repository contains the code and annotation data for the ECIR'22 paper "Evaluating the Robustness of Retrieval Pipelines with Query Variation Generators".

Setup

Install the requirements using

pip install -r requirements.txt

Steps to reproduce the results

First we need to generate_weak supervsion for the desired test sets. We can do that with the scripts/generate_weak_supervision.py. In the paper we test for TREC-DL ('msmarco-passage/trec-dl-2019/judged') and ANTIQUE ('antique/train/split200-valid'), but any IR-datasets (https://ir-datasets.com/index.html) can be used here (as TASK).

python ${REPO_DIR}/examples/generate_weak_supervision.py 
    --task $TASK \
    --output_dir $OUT_DIR 

This will generate one query variation for each method for the original queries. After this, we manually annotated the query variations generated, in order to keep only valid ones for analysis. For that we use analyze_weak_supervision.py (prepares data for manual anotation) and analyze_auto_query_generation_labeling.py (combines auto labels and anotations.).

However, for reproducing the results we can directly use the annotated query set to test neural ranking models robustness (RQ1):

python ${REPO_DIR}/disentangled_information_needs/evaluation/query_rewriting.py \
        --task 'irds:msmarco-passage/trec-dl-2019/judged' \
        --output_dir $OUT_DIR/ \
        --variations_file $OUT_DIR/$VARIATIONS_FILE_TREC_DL \
        --retrieval_model_name "BM25+KNRM" \
        --train_dataset "irds:msmarco-passage/train" \
        --max_iter $MAX_ITER

by using the annotated variations file directly here "$OUT_DIR/$VARIATIONS_FILE_TREC_DL". The same can be done to run rank fusion (RQ2) by replacing query_rewriting.py with rank_fusion.py.

The scripts evaluate_weak_supervision.sh and evaluate_rank_fusion.sh run all models and datasets for both research questions . The first generates the main table of results, Table 4 in the paper, and the second generates the tables for the rank fusion experiments (only available in the Arxiv version of the paper).

Modules and Folders

  • scripts: Contain most of the analysis scripts and also commands to run entire experiments.
  • examples: Contain an example on how to generate query variations.
  • disentangled_information_needs/evaluation: Scripts to evaluate robustness of models for query variations and also to evaluate rank fusion of query variations.
  • disentangled_information_needs/transformations: Methods to generate query variations.
Owner
Gustavo Penha
Researcher - IR - RecSys - ML - NLP. https://linktr.ee/guzpenha
Gustavo Penha
This application explain how we can easily integrate Deepface framework with Python Django application

deepface_suite This application explain how we can easily integrate Deepface framework with Python Django application install redis cache install requ

Mohamed Naji Aboo 3 Apr 18, 2022
Just playing with getting VQGAN+CLIP running locally, rather than having to use colab.

Just playing with getting VQGAN+CLIP running locally, rather than having to use colab.

Nerdy Rodent 2.3k Jan 04, 2023
Pytorch code for "DPFM: Deep Partial Functional Maps" - 3DV 2021 (Oral)

DPFM Code for "DPFM: Deep Partial Functional Maps" - 3DV 2021 (Oral) Installation This implementation runs on python = 3.7, use pip to install depend

Souhaib Attaiki 29 Oct 03, 2022
Simultaneous Detection and Segmentation

Simultaneous Detection and Segmentation This is code for the ECCV Paper: Simultaneous Detection and Segmentation Bharath Hariharan, Pablo Arbelaez,

Bharath Hariharan 96 Jul 20, 2022
Code for "Adversarial attack by dropping information." (ICCV 2021)

AdvDrop Code for "AdvDrop: Adversarial Attack to DNNs by Dropping Information(ICCV 2021)." Human can easily recognize visual objects with lost informa

Ranjie Duan 52 Nov 10, 2022
Supervised 3D Pre-training on Large-scale 2D Natural Image Datasets for 3D Medical Image Analysis

Introduction This is an implementation of our paper Supervised 3D Pre-training on Large-scale 2D Natural Image Datasets for 3D Medical Image Analysis.

24 Dec 06, 2022
Exploring Cross-Image Pixel Contrast for Semantic Segmentation

Exploring Cross-Image Pixel Contrast for Semantic Segmentation Exploring Cross-Image Pixel Contrast for Semantic Segmentation, Wenguan Wang, Tianfei Z

Tianfei Zhou 510 Jan 02, 2023
Time series annotation library.

CrowdCurio Time Series Annotator Library The CrowdCurio Time Series Annotation Library implements classification tasks for time series. Features Suppo

CrowdCurio 51 Sep 15, 2022
Simple improvement of VQVAE that allow to generate x2 sized images compared to baseline

vqvae_dwt_distiller.pytorch Simple improvement of VQVAE that allow to generate x2 sized images compared to baseline. It allows to generate 512x512 ima

Sergei Belousov 25 Jul 19, 2022
给yolov5加个gui界面,使用pyqt5,yolov5是5.0版本

博文地址 https://xugaoxiang.com/2021/06/30/yolov5-pyqt5 代码执行 项目中使用YOLOv5的v5.0版本,界面文件是project.ui pip install -r requirements.txt python main.py 图片检测 视频检测

Xu GaoXiang 215 Dec 30, 2022
The story of Chicken for Club Bing

Chicken Story tl;dr: The time when Microsoft banned my entire country for cheating at Club Bing. (A lot of the details are from memory so I've recreat

Eyal 142 May 16, 2022
Reinforcement learning algorithms in RLlib

raylab Reinforcement learning algorithms in RLlib and PyTorch. Installation pip install raylab Quickstart Raylab provides agents and environments to b

Ângelo 50 Sep 08, 2022
EmoTag helps you train emotion detection model for Chinese audios

emoTag emoTag helps you train emotion detection model for Chinese audios. Environment pip install -r requirement.txt Data We used Emotional Speech Dat

_zza 4 Sep 07, 2022
[ICLR 2021 Spotlight Oral] "Undistillable: Making A Nasty Teacher That CANNOT teach students", Haoyu Ma, Tianlong Chen, Ting-Kuei Hu, Chenyu You, Xiaohui Xie, Zhangyang Wang

Undistillable: Making A Nasty Teacher That CANNOT teach students "Undistillable: Making A Nasty Teacher That CANNOT teach students" Haoyu Ma, Tianlong

VITA 71 Dec 28, 2022
Official implementation of "UCTransNet: Rethinking the Skip Connections in U-Net from a Channel-wise Perspective with Transformer"

[AAAI2022] UCTransNet This repo is the official implementation of "UCTransNet: Rethinking the Skip Connections in U-Net from a Channel-wise Perspectiv

Haonan Wang 199 Jan 03, 2023
PyTorch implementation of Deep HDR Imaging via A Non-Local Network (TIP 2020).

NHDRRNet-PyTorch This is the PyTorch implementation of Deep HDR Imaging via A Non-Local Network (TIP 2020). 0. Differences between Original Paper and

Yutong Zhang 1 Mar 01, 2022
A minimal implementation of face-detection models using flask, gunicorn, nginx, docker, and docker-compose

Face-Detection-flask-gunicorn-nginx-docker This is a simple implementation of dockerized face-detection restful-API implemented with flask, Nginx, and

Pooya-Mohammadi 30 Dec 17, 2022
Official Implementation for "ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement" https://arxiv.org/abs/2104.02699

ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement Recently, the power of unconditional image synthesis has significantly advanced th

967 Jan 04, 2023
This repository includes the official project for the paper: TransMix: Attend to Mix for Vision Transformers.

TransMix: Attend to Mix for Vision Transformers This repository includes the official project for the paper: TransMix: Attend to Mix for Vision Transf

Jie-Neng Chen 130 Jan 01, 2023
Source code for our CVPR 2019 paper - PPGNet: Learning Point-Pair Graph for Line Segment Detection

PPGNet: Learning Point-Pair Graph for Line Segment Detection PyTorch implementation of our CVPR 2019 paper: PPGNet: Learning Point-Pair Graph for Line

SVIP Lab 170 Oct 25, 2022