The repo for reproducing Seed-driven Document Ranking for Systematic Reviews: A Reproducibility Study

Related tags

Deep Learningsdr
Overview

ECIR Reproducibility Paper: Seed-driven Document Ranking for Systematic Reviews: A Reproducibility Study

This code corresponds to the reproducibility paper: "Seed-driven Document Ranking for Systematic Reviews: A Reproducibility Study" and all results gathered from the paper are generated using the code.

Environment setup:

  • This project is implemented and tested only for python version 3.6.12, other python versions are not tested and can not ensure the full run of the results.

First please install the required packages:

pip3 install -r requirements.txt

Query&Eval generation:

First please clone the TAR repository using the command

git clone https://github.com/CLEF-TAR/tar.git

The data that's been used include the following files:

For 2017:
tar/tree/master/2017-TAR/training/qrels/qrel_content_train
tar/tree/master/2017-TAR/testing/qrels/qrel_content_test.txt
Please cat these two files together to make 2017_full.txt

For 2018:
tar/tree/master/2018-TAR/Task2/Training/qrels/full.train.content.2018.qrels
tar/tree/master/2018-TAR/Task2/Testing/qrels/full.test.content.2018.qrels
Please cat these two files together to make 2018_full.txt

For 2019:
tar/tree/master/2019-TAR/Task2/Training/Intervention/qrels/full.train.int.content.2019.qrels
tar/tree/master/2019-TAR/Task2/Testing/Intervention/qrels/full.test.int.content.2019.qrels
Please cat these two files together to make 2019_full.txt, and also 2019_test.txt (note for 2019 these two will be the same)

Then you can generate query and evaluation files by:

For snigle:
python3 topic_query_generation.py --input_qrel qrel_file_for_training+testing --input_test_qrel qrel_file_for_testing --DATA_DIR output_dir

For multiple:
python3 topic_query_generation_multiple.py --input_qrel qrel_file_for_training+testing --input_test_qrel qrel_file_for_testing --DATA_DIR output_dir

Please note: you need to generate for each year and put it in a separate folder, not the overall one.

Collection generation:

For BOW collection generation, the following command is needed

python3 gather_all_pids.py --filenames 2017_full.txt+2018_full.txt+2019_full.txt --output_dir collection/pid_dir --chunks n
python3 collection_gathering.py --filename yourpidsfile --email [email protected] --output output_collection
python3 collection_processing.py --input_collection acquired_collection_file --output_collection processed_file(default is weighted1_bow.jsonl)

Then for BOC collection generation:

  • First ensure to check Quickumls to gather umls data.
  • Second ensure to register on NCBO to get api keys, and fill in these keys in ncbo_request_word.py
  • For BOC collection then, run the following command to generation boc_collection:
python3 ncbo_request_word.py --input_collection your_generated_bow_collection --num_workers for_multi_procesing --generated_collection output_dir_ncbo
cat output_dir/* > ncbo.tsv
python3 processing_uml.py --input_collection your_bow_collection --input_umls_dir your_output_umls_dir --num_workers for_multi_procesing
python3 processing_umls_word.py --input_collection your_generated_bow_collection --input_umls_dir your_output_umls_dir_from_last_step --output_file umls.tsv
python3 boc_extraction.py --input_collection bow_collection --input_ncbo_collection ncbo.tsv --input_umls_collection umls.tsv --output_collection processed_file(default is weighted1_boc.jsonl)

RQ1: Does the effectiveness of SDR generalise beyond the CLEF TAR 2017 dataset?

For RQ1, single seed driven results are acquired for clef tar 2017, 2018, 2019, for this please run the following command.

bash search.sh 2017_single_data_dir all
bash search.sh 2018_single_data_dir test
bash search.sh 2019_single_data_dir test

to get the run_file of all three years single seed run_file with all methods.

Then evaluation by:

bash evaluation_full.sh 2017_single_data_dir all
bash evaluation_full.sh 2018_single_data_dir test
bash evaluation_full.sh 2019_single_data_dir test

to print out evaluation measures and also save evaluation measurement files in the corresponding eval folder

RQ2: What is the impact of using multiple seed studies collectively on the effectiveness of SDR?

For RQ2, multiple seed driven results are acquired for clef tar 2017, 2018, 2019, for this please run the following command.

bash search_multiple.sh 2017_multiple_data_dir all
bash search_multiple.sh 2018_multiple_data_dir test
bash search_multiple.sh 2019_multiple_data_dir test

to get the run_file of all three years multiple seed run_file with all methods.

Then evaluation by:

bash evaluation_full.sh 2017_multiple_data_dir all
bash evaluation_full.sh 2018_multiple_data_dir test
bash evaluation_full.sh 2019_multiple_data_dir test

to print out evaluation measures and also save evaluation measurement files in the corresponding eval folder

RQ3: To what extent do seed studies impact the ranking stability of single- and multi-SDR?

For this question, we need to use the results acquired from the last two steps, in which we can generate variability graphs by using the following command:

python3 graph_making/distribution_graph.py --year 2017 --type oracle 
python3 graph_making/distribution_graph.py --year 2018 --type oracle 
python3 graph_making/distribution_graph.py --year 2019 --type oracle 

to get distribution graphs of the three years.

Generated run files:

Run files are generated and stored in here, feel free to download for verification or futher research needs.

Example:
run_files/2017/all: 2017 single seed results file
run_files/2017/multiple: 2017 multiple seed results file

Owner
ielab
The Information Engineering Lab
ielab
This repository contains the code for the CVPR 2021 paper "GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields"

GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields Project Page | Paper | Supplementary | Video | Slides | Blog | Talk If

1.1k Dec 30, 2022
The first dataset of composite images with rationality score indicating whether the object placement in a composite image is reasonable.

Object-Placement-Assessment-Dataset-OPA Object-Placement-Assessment (OPA) is to verify whether a composite image is plausible in terms of the object p

BCMI 53 Nov 15, 2022
Supervised multi-SNE (S-multi-SNE): Multi-view visualisation and classification

S-multi-SNE Supervised multi-SNE (S-multi-SNE): Multi-view visualisation and classification A repository containing the code to reproduce the findings

Theodoulos Rodosthenous 3 Apr 15, 2022
Code for the paper "Query Embedding on Hyper-relational Knowledge Graphs"

Query Embedding on Hyper-Relational Knowledge Graphs This repository contains the code used for the experiments in the paper Query Embedding on Hyper-

DimitrisAlivas 19 Jul 26, 2022
Lightweight stereo matching network based on MobileNetV1 and MobileNetV2

MobileStereoNet: Towards Lightweight Deep Networks for Stereo Matching

Cognitive Systems Research Group 139 Nov 30, 2022
Code for the Convolutional Vision Transformer (ConViT)

ConViT : Vision Transformers with Convolutional Inductive Biases This repository contains PyTorch code for ConViT. It builds on code from the Data-Eff

Facebook Research 418 Jan 06, 2023
bio_inspired_min_nets_improve_the_performance_and_robustness_of_deep_networks

Code Submission for: Bio-inspired Min-Nets Improve the Performance and Robustness of Deep Networks Run with docker To build a docker environment, chan

0 Dec 09, 2021
The story of Chicken for Club Bing

Chicken Story tl;dr: The time when Microsoft banned my entire country for cheating at Club Bing. (A lot of the details are from memory so I've recreat

Eyal 142 May 16, 2022
A Python package for time series augmentation

tsaug tsaug is a Python package for time series augmentation. It offers a set of augmentation methods for time series, as well as a simple API to conn

Arundo Analytics 278 Jan 01, 2023
The offcial repository for 'CharacterBERT and Self-Teaching for Improving the Robustness of Dense Retrievers on Queries with Typos', SIGIR2022

CharacterBERT-DR The offcial repository for CharacterBERT and Self-Teaching for Improving the Robustness of Dense Retrievers on Queries with Typos, Sh

ielab 11 Nov 15, 2022
The official repo of the CVPR2021 oral paper: Representative Batch Normalization with Feature Calibration

Representative Batch Normalization (RBN) with Feature Calibration The official implementation of the CVPR2021 oral paper: Representative Batch Normali

Open source projects of ShangHua-Gao 76 Nov 09, 2022
Complete the code of prefix-tuning in low data setting

Prefix Tuning Note: 作者在论文中提到使用真实的word去初始化prefix的操作(Initializing the prefix with activations of real words,significantly improves generation)。我在使用作者提供的

Andrew Zeng 4 Jul 11, 2022
Official PyTorch implementation of "Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble" (NeurIPS'21)

Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble This is the code for reproducing the results of the paper Uncertainty-Bas

43 Nov 23, 2022
Code for Talking Face Generation by Adversarially Disentangled Audio-Visual Representation (AAAI 2019)

Talking Face Generation by Adversarially Disentangled Audio-Visual Representation (AAAI 2019) We propose Disentangled Audio-Visual System (DAVS) to ad

Hang_Zhou 750 Dec 23, 2022
Empowering journalists and whistleblowers

Onymochat Empowering journalists and whistleblowers Onymochat is an end-to-end encrypted, decentralized, anonymous chat application. You can also host

Samrat Dutta 19 Sep 02, 2022
Fine-grained Control of Image Caption Generation with Abstract Scene Graphs

Faster R-CNN pretrained on VisualGenome This repository modifies maskrcnn-benchmark for object detection and attribute prediction on VisualGenome data

Shizhe Chen 7 Apr 20, 2021
Rule Based Classification Project For Python

Rule-Based-Classification-Project (ENG) Business Problem: A game company wants to create new level-based customer definitions (personas) by using some

Deniz Can OĞUZ 4 Oct 29, 2022
This is the official repository of XVFI (eXtreme Video Frame Interpolation)

XVFI This is the official repository of XVFI (eXtreme Video Frame Interpolation), https://arxiv.org/abs/2103.16206 Last Update: 20210607 We provide th

Jihyong Oh 195 Dec 29, 2022
Computer Vision application in the web

Computer Vision application in the web Preview Usage Clone this repo git clone https://github.com/amineHY/WebApp-Computer-Vision-streamlit.git cd Web

Amine Hadj-Youcef. PhD 35 Dec 06, 2022
Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.

Core ML Tools Use coremltools to convert machine learning models from third-party libraries to the Core ML format. The Python package contains the sup

Apple 3k Jan 08, 2023