RCT-ART is an NLP pipeline built with spaCy for converting clinical trial result sentences into tables through jointly extracting intervention, outcome and outcome measure entities and their relations.

Related tags

Deep LearningRCT-ART
Overview

Randomised controlled trial abstract result tabulator

RCT-ART is an NLP pipeline built with spaCy for converting clinical trial result sentences into tables through jointly extracting intervention, outcome and outcome measure entities and their relations. The system is currently constrained to result sentences with specific measures of an outcome for a specific intervention and does not extract comparative relationship (e.g. a relative decrease between the study intervention and placebo).

This repository contains custom pipes and models developed, trained and run using the spaCy library. These are defined and initiated through configs and custom scripts.

In addition, we include all stages of our datasets from their raw format, gold-standard annotations, pre-processed spacy docs and output tables of the system, as well as the evaluation results of the system for its different NLP tasks across each pre-trained model.

Running the system from Python

After cloning this repository and pip installing its dependencies from requirements.txt, the system can be run in two steps:

1. Download and extract the trained models

In the primary study of RCT-ART, we explored a number of BERT-based models in the development of the system. Here, we make available the BioBERT-based named entity recognition (NER) and relation extraction (RE) models:

Download models from here.

The train_models folder of the compression file should be extracted into the root of the cloned directory for the system scripts to be able to access the models.

2a. Demo the system NLP tasks

Once the model folder has been extracted, a streamlit demo of the system NER, RE and tabulation tasks can be run locally on your browser with the following command:

streamlit run scripts/demo.py

2b. Process multiple RCT result sentences

Alternatively, multiple result sentences can be processed by the system using tabulate.py in the scripts directory. Input sentences should be in the Doc format, with the sentences from the study available within datasets/preprocessed.

Training new models for the system

The NER and RE models employed by RCT-ART were both trained using spaCy config files, where we defined their architectures and training hyper-parameters. These are included in the config directory, with a config for each model type and the different BERT-based language representation models we explored in the development of the system. The simplest way to initiate spaCy model training is with the library's inbuilt commands (https://spacy.io/usage/training), passing in the paths of the config file, training set and development set. Below are the commands we used to train the models made available with this repository:

spaCy cmd for training BioBERT-based NER model on all-domains dataset

python -m spacy train configs/ner_biobert.cfg --output ../trained_models/biobert/ner/all_domains --paths.train ../datasets/preprocessed/all_domains/results_only/train.spacy --paths.dev ../datasets/preprocessed/all_domains/results_only/dev.spacy -c ../scripts/custom_functions.py --gpu-id 0

spaCy cmd for training BioBERT-based RE model on all-domains dataset

python -m spacy train configs/rel_biobert.cfg --output ../trained_models/biobert/rel/all_domains  --paths.train ../datasets/preprocessed/all_domains/results_only/train.spacy --paths.dev ../datasets/preprocessed/all_domains/results_only/dev.spacy -c ../scripts/custom_functions.py --gpu-id 0

Repository contents breakdown

The following is a brief description of the assets available in this repository.

configs

Includes the spaCy config files for training NER and RE models of the RCT-ART system. These files define the model architectures, including the BERT-base language representations. Three of BERT language representations were experimented with for each model in the main study of this sytem: BioBERT, SciBERT and RoBERTa.

datasets

Includes all stages of the data used to train and test the RCT-ART models from raw to split gold-standard files in spaCy doc format.

Before filtering and result sentence extraction, abstracts were sourced from the EBM-NLP corpus and the annotated corpus from the Trenta et al. study, which explored automated information extraction from RCTs, and was a key reference for our study.

evaluation_results

Output txt files from the evaluate.py script, giving precision, recall and F1 scores for each of the system tasks across the various dataset cuts.

output_tables

Output csv files from the tabulate.py script, includes the predicted tables output by our system for each test result sentence.

scripts

Below is a contents list of the repository scripts with brief descriptions. Full descriptions can be found at the head of each script.

custom_functions.py -- helper functions for supporting key modules of system.

data_collection.py -- classes and functions for filtering the EBM-NLP corpus and result sentence preprocessing.

demo.py -- a browser-based demo of the RCT-ART system developed with spaCy and Streamlit (see above).

entity_ruler.py -- a script for rules-based entity recognition. Unused in final system, but made available for further development.

evaluate.py -- a set of function for evaluating the system across the NLP tasks: NER, RE, joint NER + RE and tabulation.

preprocessing.py -- a set of function for further data preprocessing after data collection and splitting data into train, test and dev sets.

rel_model.py -- defines the relation extraction model.

rel_pipe.py -- integrates the relation extraction model as a spaCy pipeline component.

tabulate.py -- run the full system by loading the NER and RE models and running their outputs through a tabulation function. Can be used on batches of RCT sentences to output batches of CSV files.

train_multiple_models.py -- iterates through spaCy train commands with different input parameters allowing batches of models to be trained.

Common issues

The transformer models of this system need a GPU with suitable video RAM -- in the primary study, they were trained and run on a GeForce RTX 3080 10GB.

There can be issues with the transformer library dependencies -- CUDA and pytorch. If an issue occurs, ensure CUDA 11.1 is installed on your system, and try reinstalling PyTorch with the following command:

pip3 install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio===0.9.1 -f https://download.pytorch.org/whl/torch_stable.html

References

  1. The relation extraction component was adapted from the following spaCy project tutorial.

  2. The EBM-NLP corpus is accessible from here and its publication can be found here.

  3. The glaucoma corpus can be found in the Trenta et al. study.

Training Confidence-Calibrated Classifier for Detecting Out-of-Distribution Samples / ICLR 2018

Training Confidence-Calibrated Classifier for Detecting Out-of-Distribution Samples This project is for the paper "Training Confidence-Calibrated Clas

168 Nov 29, 2022
PyTorch implementation DRO: Deep Recurrent Optimizer for Structure-from-Motion

DRO: Deep Recurrent Optimizer for Structure-from-Motion This is the official PyTorch implementation code for DRO-sfm. For technical details, please re

Alibaba Cloud 56 Dec 12, 2022
Official Implementation of Few-shot Visual Relationship Co-localization

VRC Official implementation of the Few-shot Visual Relationship Co-localization (ICCV 2021) paper project page | paper Requirements Use python = 3.8.

22 Oct 13, 2022
Official code for "Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer. ICCV2021".

Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer. ICCV2021. Introduction We proposed a novel model training paradi

Lucas 103 Dec 14, 2022
Automatically creates genre collections for your Plex media

Plex Auto Genres Plex Auto Genres is a simple script that will add genre collection tags to your media making it much easier to search for genre speci

Shane Israel 63 Dec 31, 2022
A keras implementation of ENet (abandoned for the foreseeable future)

ENet-keras This is an implementation of ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation, ported from ENet-training (lua-t

Pavlos 115 Nov 23, 2021
Official PyTorch implementation of the paper "Graph-based Generative Face Anonymisation with Pose Preservation" in ICIAP 2021

Contents AnonyGAN Installation Dataset Preparation Generating Images Using Pretrained Model Train and Test New Models Evaluation Acknowledgments Citat

Nicola Dall'Asen 10 May 24, 2022
Point cloud processing tool library.

Point Cloud ToolBox This point cloud processing tool library can be used to process point clouds, 3d meshes, and voxels. Environment python 3.7.5 Dep

ZhangXinyun 40 Dec 09, 2022
🐸STT integration examples

🐸 STT 0.9.x Examples These are various examples on how to use or integrate 🐸 STT using our packages. It is a good way to just try out 🐸 STT before

coqui 92 Dec 19, 2022
[ACL-IJCNLP 2021] Improving Named Entity Recognition by External Context Retrieving and Cooperative Learning

CLNER The code is for our ACL-IJCNLP 2021 paper: Improving Named Entity Recognition by External Context Retrieving and Cooperative Learning CLNER is a

71 Dec 08, 2022
Open-source Monocular Python HawkEye for Tennis

Tennis Tracking 🎾 Objectives Track the ball Detect court lines Detect the players To track the ball we used TrackNet - deep learning network for trac

ArtLabs 188 Jan 08, 2023
NER for Indian languages

CL-NERIL: A Cross-Lingual Model for NER in Indian Languages Code for the paper - https://arxiv.org/abs/2111.11815 Setup Setup a virtual environment Th

Akshara P 0 Nov 24, 2021
Official implementation of NeurIPS 2021 paper "Contextual Similarity Aggregation with Self-attention for Visual Re-ranking"

CSA: Contextual Similarity Aggregation with Self-attention for Visual Re-ranking PyTorch training code for CSA (Contextual Similarity Aggregation). We

Hui Wu 19 Oct 21, 2022
Public repository of the 3DV 2021 paper "Generative Zero-Shot Learning for Semantic Segmentation of 3D Point Clouds"

Generative Zero-Shot Learning for Semantic Segmentation of 3D Point Clouds Björn Michele1), Alexandre Boulch1), Gilles Puy1), Maxime Bucher1) and Rena

valeo.ai 15 Dec 22, 2022
Car Parking Tracker Using OpenCv

Car Parking Vacancy Tracker Using OpenCv I used basic image processing methods i

Adwait Kelkar 30 Dec 03, 2022
Simple PyTorch hierarchical models.

A python package adding basic hierarchal networks in pytorch for classification tasks. It implements a simple hierarchal network structure based on feed-backward outputs.

Rajiv Sarvepalli 5 Mar 06, 2022
Aligning Latent and Image Spaces to Connect the Unconnectable

About This repo contains the official implementation of the Aligning Latent and Image Spaces to Connect the Unconnectable paper. It is a GAN model whi

Ivan Skorokhodov 203 Jan 03, 2023
Minimal fastai code needed for working with pytorch

fastai_minima A mimal version of fastai with the barebones needed to work with Pytorch #all_slow Install pip install fastai_minima How to use This lib

Zachary Mueller 14 Oct 21, 2022
A High-Level Fusion Scheme for Circular Quantities published at the 20th International Conference on Advanced Robotics

Monte Carlo Simulation to the Paper A High-Level Fusion Scheme for Circular Quantities published at the 20th International Conference on Advanced Robotics

Sören Kohnert 0 Dec 06, 2021
dataset for ECCV 2020 "Motion Capture from Internet Videos"

Motion Capture from Internet Videos Motion Capture from Internet Videos Junting Dong*, Qing Shuai*, Yuanqing Zhang, Xian Liu, Xiaowei Zhou, Hujun Bao

ZJU3DV 98 Dec 07, 2022