RCT-ART
Overview

Randomised controlled trial abstract result tabulator

RCT-ART is an NLP pipeline built with spaCy that converts clinical trial result sentences into tables by jointly extracting intervention, outcome and outcome measure entities and their relations. The system is currently limited to result sentences that report specific measures of an outcome for a specific intervention, and it does not extract comparative relationships (e.g. a relative decrease between the study intervention and placebo).
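To make the tabulation idea concrete, here is a minimal sketch in plain Python. It is not the repository's implementation: it assumes entities have already been extracted as labelled text spans and relations as pairs of entity ids. The labels (INTV, OC, MEAS), the record layout and the example values are all illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical entity records: id -> (label, text).
# The real pipeline keeps this information on spaCy Doc objects.
entities = {
    0: ("INTV", "drug A"),
    1: ("INTV", "placebo"),
    2: ("OC", "mean IOP reduction"),
    3: ("MEAS", "5.2 mmHg"),
    4: ("MEAS", "1.1 mmHg"),
}
# Hypothetical relations: (measure id, id of the intervention or outcome it belongs to).
relations = [(3, 0), (3, 2), (4, 1), (4, 2)]


def tabulate(entities, relations):
    """Return one row per measure with its linked intervention and outcome."""
    rows = defaultdict(lambda: {"intervention": None, "outcome": None, "measure": None})
    for measure_id, target_id in relations:
        label, text = entities[target_id]
        row = rows[measure_id]
        row["measure"] = entities[measure_id][1]
        if label == "INTV":
            row["intervention"] = text
        elif label == "OC":
            row["outcome"] = text
    return [rows[key] for key in sorted(rows)]


table = tabulate(entities, relations)
for row in table:
    print(row)
```

Each measure becomes one table row, which mirrors the constraint described above: a specific measure of an outcome for a specific intervention.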

This repository contains custom pipes and models developed, trained and run using the spaCy library. These are defined and initialised through configs and custom scripts.

In addition, we include all stages of our datasets, from raw format through gold-standard annotations and pre-processed spaCy Docs to the system's output tables, as well as the evaluation results for each NLP task across each pre-trained model.

Running the system from Python

After cloning this repository and pip installing its dependencies from requirements.txt, the system can be run in two steps:

1. Download and extract the trained models

In the primary study of RCT-ART, we explored a number of BERT-based models in the development of the system. Here, we make available the BioBERT-based named entity recognition (NER) and relation extraction (RE) models:

Download models from here.

The train_models folder in the compressed archive should be extracted into the root of the cloned repository so that the system scripts can access the models.

2a. Demo the system NLP tasks

Once the model folder has been extracted, a Streamlit demo of the system's NER, RE and tabulation tasks can be run locally in your browser with the following command:

streamlit run scripts/demo.py

2b. Process multiple RCT result sentences

Alternatively, multiple result sentences can be processed using tabulate.py in the scripts directory. Input sentences should be in spaCy's Doc format; the sentences from the study are available in datasets/preprocessed.

Training new models for the system

The NER and RE models employed by RCT-ART were both trained using spaCy config files, which define their architectures and training hyper-parameters. These are included in the configs directory, with one config for each combination of model type and BERT-based language representation model we explored in the development of the system. The simplest way to start spaCy model training is with the library's built-in commands (https://spacy.io/usage/training), passing in the paths of the config file, training set and development set. Below are the commands we used to train the models made available with this repository:

spaCy command for training the BioBERT-based NER model on the all-domains dataset

python -m spacy train configs/ner_biobert.cfg --output ../trained_models/biobert/ner/all_domains --paths.train ../datasets/preprocessed/all_domains/results_only/train.spacy --paths.dev ../datasets/preprocessed/all_domains/results_only/dev.spacy -c ../scripts/custom_functions.py --gpu-id 0

spaCy command for training the BioBERT-based RE model on the all-domains dataset

python -m spacy train configs/rel_biobert.cfg --output ../trained_models/biobert/rel/all_domains --paths.train ../datasets/preprocessed/all_domains/results_only/train.spacy --paths.dev ../datasets/preprocessed/all_domains/results_only/dev.spacy -c ../scripts/custom_functions.py --gpu-id 0
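When several models follow the same pattern, the command can be assembled programmatically. The sketch below only builds and prints the argument lists for the two commands above; it does not launch training. The helper name build_train_command and its parameters are hypothetical, and the code is not taken from train_multiple_models.py.

```python
import shlex
import sys


def build_train_command(config, output_dir, train_path, dev_path, code_path, gpu_id=0):
    """Assemble a `python -m spacy train` call as an argument list."""
    return [
        sys.executable, "-m", "spacy", "train", config,
        "--output", output_dir,
        "--paths.train", train_path,
        "--paths.dev", dev_path,
        "-c", code_path,
        "--gpu-id", str(gpu_id),
    ]


# Paths copied from the commands above; adjust them to your own layout.
data_dir = "../datasets/preprocessed/all_domains/results_only"
commands = [
    build_train_command(
        config=f"configs/{task}_biobert.cfg",
        output_dir=f"../trained_models/biobert/{task}/all_domains",
        train_path=f"{data_dir}/train.spacy",
        dev_path=f"{data_dir}/dev.spacy",
        code_path="../scripts/custom_functions.py",
    )
    for task in ("ner", "rel")
]

for command in commands:
    # Print only; pass the list to subprocess.run(command, check=True) to start training.
    print(shlex.join(command))
```

Building the command as a list rather than a single string avoids quoting problems when paths contain spaces.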

Repository contents breakdown

The following is a brief description of the assets available in this repository.

configs

Includes the spaCy config files for training the NER and RE models of the RCT-ART system. These files define the model architectures, including the BERT-based language representations. Three language representation models were tried for each model in the main study of this system: BioBERT, SciBERT and RoBERTa.

datasets

Includes all stages of the data used to train and test the RCT-ART models, from raw files to split gold-standard files in spaCy Doc format.

Before filtering and result sentence extraction, abstracts were sourced from the EBM-NLP corpus and from the annotated corpus of the Trenta et al. study, which explored automated information extraction from RCTs and was a key reference for our study.

evaluation_results

Output txt files from the evaluate.py script, giving precision, recall and F1 scores for each of the system tasks across the various dataset cuts.
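For reference, the three scores relate as follows. This is a generic sketch of precision, recall and F1 over sets of predicted and gold items, not the code in evaluate.py; the function name and the span tuples are made-up examples.

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall and F1 for two collections of hashable items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical entity spans as (start_token, end_token, label).
gold_spans = {(0, 2, "INTV"), (5, 7, "OC"), (9, 10, "MEAS")}
predicted_spans = {(0, 2, "INTV"), (5, 7, "OC"), (8, 10, "MEAS"), (12, 13, "MEAS")}

p, r, f1 = precision_recall_f1(predicted_spans, gold_spans)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this sketch a predicted span counts as correct only if its boundaries and label both match a gold span exactly; whether the repository scores matches this strictly is not stated here.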

output_tables

Output CSV files from the tabulate.py script, containing the predicted tables output by our system for each test result sentence.

scripts

Below is a contents list of the repository scripts with brief descriptions. Full descriptions can be found at the head of each script.

custom_functions.py -- helper functions for supporting key modules of the system.

data_collection.py -- classes and functions for filtering the EBM-NLP corpus and result sentence preprocessing.

demo.py -- a browser-based demo of the RCT-ART system developed with spaCy and Streamlit (see above).

entity_ruler.py -- a script for rules-based entity recognition. Unused in final system, but made available for further development.

evaluate.py -- a set of functions for evaluating the system across the NLP tasks: NER, RE, joint NER + RE and tabulation.

preprocessing.py -- a set of functions for further data preprocessing after data collection and for splitting data into train, test and dev sets.

rel_model.py -- defines the relation extraction model.

rel_pipe.py -- integrates the relation extraction model as a spaCy pipeline component.

tabulate.py -- runs the full system by loading the NER and RE models and passing their outputs through a tabulation function. Can be used on batches of RCT sentences to output batches of CSV files.

train_multiple_models.py -- iterates through spaCy train commands with different input parameters allowing batches of models to be trained.
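As an illustration of the train, test and dev splitting that preprocessing.py is described as performing, here is a minimal sketch using only the standard library. The 80/10/10 ratios, the fixed seed and the function name split_dataset are assumptions chosen for the example, not values taken from the repository.

```python
import random


def split_dataset(items, dev_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle items reproducibly and split them into train, dev and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_fraction)
    n_test = int(len(shuffled) * test_fraction)
    dev = shuffled[:n_dev]
    test = shuffled[n_dev:n_dev + n_test]
    train = shuffled[n_dev + n_test:]
    return train, dev, test


# Placeholder sentence ids stand in for annotated result sentences.
sentences = [f"sentence_{i}" for i in range(50)]
train, dev, test = split_dataset(sentences)
print(len(train), len(dev), len(test))
```

Seeding a local random.Random instance keeps the split reproducible without touching the global random state.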

Common issues

The transformer models of this system need a GPU with suitable video RAM -- in the primary study, they were trained and run on a GeForce RTX 3080 10GB.

There can be issues with the transformer library dependencies -- CUDA and PyTorch. If an issue occurs, ensure CUDA 11.1 is installed on your system, and try reinstalling PyTorch with the following command:

pip3 install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio===0.9.1 -f https://download.pytorch.org/whl/torch_stable.html

References

  1. The relation extraction component was adapted from the following spaCy project tutorial.

  2. The EBM-NLP corpus is accessible from here and its publication can be found here.

  3. The glaucoma corpus can be found in the Trenta et al. study.
