The repo for reproducing Seed-driven Document Ranking for Systematic Reviews: A Reproducibility Study

Related tags

Deep Learningsdr
Overview

ECIR Reproducibility Paper: Seed-driven Document Ranking for Systematic Reviews: A Reproducibility Study

This code corresponds to the reproducibility paper: "Seed-driven Document Ranking for Systematic Reviews: A Reproducibility Study" and all results gathered from the paper are generated using the code.

Environment setup:

  • This project is implemented and tested only for python version 3.6.12, other python versions are not tested and can not ensure the full run of the results.

First please install the required packages:

pip3 install -r requirements.txt

Query&Eval generation:

First please clone the TAR repository using the command

git clone https://github.com/CLEF-TAR/tar.git

The data that's been used include the following files:

For 2017:
tar/tree/master/2017-TAR/training/qrels/qrel_content_train
tar/tree/master/2017-TAR/testing/qrels/qrel_content_test.txt
Please cat these two files together to make 2017_full.txt

For 2018:
tar/tree/master/2018-TAR/Task2/Training/qrels/full.train.content.2018.qrels
tar/tree/master/2018-TAR/Task2/Testing/qrels/full.test.content.2018.qrels
Please cat these two files together to make 2018_full.txt

For 2019:
tar/tree/master/2019-TAR/Task2/Training/Intervention/qrels/full.train.int.content.2019.qrels
tar/tree/master/2019-TAR/Task2/Testing/Intervention/qrels/full.test.int.content.2019.qrels
Please cat these two files together to make 2019_full.txt, and also 2019_test.txt (note for 2019 these two will be the same)

Then you can generate query and evaluation files by:

For snigle:
python3 topic_query_generation.py --input_qrel qrel_file_for_training+testing --input_test_qrel qrel_file_for_testing --DATA_DIR output_dir

For multiple:
python3 topic_query_generation_multiple.py --input_qrel qrel_file_for_training+testing --input_test_qrel qrel_file_for_testing --DATA_DIR output_dir

Please note: you need to generate for each year and put it in a separate folder, not the overall one.

Collection generation:

For BOW collection generation, the following command is needed

python3 gather_all_pids.py --filenames 2017_full.txt+2018_full.txt+2019_full.txt --output_dir collection/pid_dir --chunks n
python3 collection_gathering.py --filename yourpidsfile --email [email protected] --output output_collection
python3 collection_processing.py --input_collection acquired_collection_file --output_collection processed_file(default is weighted1_bow.jsonl)

Then for BOC collection generation:

  • First ensure to check Quickumls to gather umls data.
  • Second ensure to register on NCBO to get api keys, and fill in these keys in ncbo_request_word.py
  • For BOC collection then, run the following command to generation boc_collection:
python3 ncbo_request_word.py --input_collection your_generated_bow_collection --num_workers for_multi_procesing --generated_collection output_dir_ncbo
cat output_dir/* > ncbo.tsv
python3 processing_uml.py --input_collection your_bow_collection --input_umls_dir your_output_umls_dir --num_workers for_multi_procesing
python3 processing_umls_word.py --input_collection your_generated_bow_collection --input_umls_dir your_output_umls_dir_from_last_step --output_file umls.tsv
python3 boc_extraction.py --input_collection bow_collection --input_ncbo_collection ncbo.tsv --input_umls_collection umls.tsv --output_collection processed_file(default is weighted1_boc.jsonl)

RQ1: Does the effectiveness of SDR generalise beyond the CLEF TAR 2017 dataset?

For RQ1, single seed driven results are acquired for clef tar 2017, 2018, 2019, for this please run the following command.

bash search.sh 2017_single_data_dir all
bash search.sh 2018_single_data_dir test
bash search.sh 2019_single_data_dir test

to get the run_file of all three years single seed run_file with all methods.

Then evaluation by:

bash evaluation_full.sh 2017_single_data_dir all
bash evaluation_full.sh 2018_single_data_dir test
bash evaluation_full.sh 2019_single_data_dir test

to print out evaluation measures and also save evaluation measurement files in the corresponding eval folder

RQ2: What is the impact of using multiple seed studies collectively on the effectiveness of SDR?

For RQ2, multiple seed driven results are acquired for clef tar 2017, 2018, 2019, for this please run the following command.

bash search_multiple.sh 2017_multiple_data_dir all
bash search_multiple.sh 2018_multiple_data_dir test
bash search_multiple.sh 2019_multiple_data_dir test

to get the run_file of all three years multiple seed run_file with all methods.

Then evaluation by:

bash evaluation_full.sh 2017_multiple_data_dir all
bash evaluation_full.sh 2018_multiple_data_dir test
bash evaluation_full.sh 2019_multiple_data_dir test

to print out evaluation measures and also save evaluation measurement files in the corresponding eval folder

RQ3: To what extent do seed studies impact the ranking stability of single- and multi-SDR?

For this question, we need to use the results acquired from the last two steps, in which we can generate variability graphs by using the following command:

python3 graph_making/distribution_graph.py --year 2017 --type oracle 
python3 graph_making/distribution_graph.py --year 2018 --type oracle 
python3 graph_making/distribution_graph.py --year 2019 --type oracle 

to get distribution graphs of the three years.

Generated run files:

Run files are generated and stored in here, feel free to download for verification or futher research needs.

Example:
run_files/2017/all: 2017 single seed results file
run_files/2017/multiple: 2017 multiple seed results file

Owner
ielab
The Information Engineering Lab
ielab
Simple PyTorch hierarchical models.

A python package adding basic hierarchal networks in pytorch for classification tasks. It implements a simple hierarchal network structure based on feed-backward outputs.

Rajiv Sarvepalli 5 Mar 06, 2022
Objective of the repository is to learn and build machine learning models using Pytorch. 30DaysofML Using Pytorch

30 Days Of Machine Learning Using Pytorch Objective of the repository is to learn and build machine learning models using Pytorch. List of Algorithms

Mayur 119 Nov 24, 2022
Offline Reinforcement Learning with Implicit Q-Learning

Offline Reinforcement Learning with Implicit Q-Learning This repository contains the official implementation of Offline Reinforcement Learning with Im

Ilya Kostrikov 126 Jan 06, 2023
This is the official implement of paper "ActionCLIP: A New Paradigm for Action Recognition"

This is an official pytorch implementation of ActionCLIP: A New Paradigm for Video Action Recognition [arXiv] Overview Content Prerequisites Data Prep

268 Jan 09, 2023
Pytorch implementation for "Density-aware Chamfer Distance as a Comprehensive Metric for Point Cloud Completion" (NeurIPS 2021)

Density-aware Chamfer Distance This repository contains the official PyTorch implementation of our paper: Density-aware Chamfer Distance as a Comprehe

Tong WU 93 Dec 15, 2022
Send text to girlfriend in the morning

Girlfriend Text Send text to girlfriend (or really anyone with a phone number) in the morning 1. Configure your settings in utils.py. phone_number = "

Paras Adhikary 199 Oct 25, 2022
🎁 3,000,000+ Unsplash images made available for research and machine learning

The Unsplash Dataset The Unsplash Dataset is made up of over 250,000+ contributing global photographers and data sourced from hundreds of millions of

Unsplash 2k Jan 03, 2023
G-NIA model from "Single Node Injection Attack against Graph Neural Networks" (CIKM 2021)

Single Node Injection Attack against Graph Neural Networks This repository is our Pytorch implementation of our paper: Single Node Injection Attack ag

Shuchang Tao 18 Nov 21, 2022
Reusable constraint types to use with typing.Annotated

annotated-types PEP-593 added typing.Annotated as a way of adding context-specific metadata to existing types, and specifies that Annotated[T, x] shou

125 Dec 26, 2022
This repository builds a basic vision transformer from scratch so that one beginner can understand the theory of vision transformer.

vision-transformer-from-scratch This repository includes several kinds of vision transformers from scratch so that one beginner can understand the the

1 Dec 24, 2021
This repository contains the code for the paper "Hierarchical Motion Understanding via Motion Programs"

Hierarchical Motion Understanding via Motion Programs (CVPR 2021) This repository contains the official implementation of: Hierarchical Motion Underst

Sumith Kulal 40 Dec 05, 2022
[Nature Machine Intelligence' 21] "Advancing COVID-19 Diagnosis with Privacy-Preserving Collaboration in Artificial Intelligence"

[UCADI] COVID-19 Diagnosis With Federated Learning Intro We developed a Federated Learning (FL) Framework for global researchers to collaboratively tr

HUST EIC AI-LAB 30 Dec 12, 2022
Hl classification bc - A Network-Based High-Level Data Classification Algorithm Using Betweenness Centrality

A Network-Based High-Level Data Classification Algorithm Using Betweenness Centr

Esteban Vilca 3 Dec 01, 2022
PyTorch DepthNet Training on Still Box dataset

DepthNet training on Still Box Project page This code can replicate the results of our paper that was published in UAVg-17. If you use this repo in yo

Clément Pinard 115 Nov 21, 2022
[CVPR 21] Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2021.

Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting, CVPR 2021. Ayan Kumar Bhunia, Pinaki nath Chowdhury, Yongxin Yan

Ayan Kumar Bhunia 44 Dec 12, 2022
Machine Learning Models were applied to predict the mass of the brain based on gender, age ranges, and head size.

Brain Weight in Humans Variations of head sizes and brain weights in humans Kaggle dataset obtained from this link by Anubhab Swain. Image obtained fr

Anne Livia 1 Feb 02, 2022
Self-supervised learning optimally robust representations for domain generalization.

OptDom: Learning Optimal Representations for Domain Generalization This repository contains the official implementation for Optimal Representations fo

Yangjun Ruan 18 Aug 25, 2022
Moving Object Segmentation in 3D LiDAR Data: A Learning-based Approach Exploiting Sequential Data

LiDAR-MOS: Moving Object Segmentation in 3D LiDAR Data This repo contains the code for our paper: Moving Object Segmentation in 3D LiDAR Data: A Learn

Photogrammetry & Robotics Bonn 394 Dec 29, 2022
Eff video representation - Efficient video representation through neural fields

Neural Residual Flow Fields for Efficient Video Representations 1. Download MPI

41 Jan 06, 2023
Gradient-free global optimization algorithm for multidimensional functions based on the low rank tensor train format

ttopt Description Gradient-free global optimization algorithm for multidimensional functions based on the low rank tensor train (TT) format and maximu

5 May 23, 2022