Using pretrained language models for biomedical knowledge graph completion.

Overview

LMs for biomedical KG completion

This repository contains code to run the experiments described in:

Scientific Language Models for Biomedical Knowledge Base Completion: An Empirical Study (arXiv link)
Rahul Nadkarni, David Wadden, Iz Beltagy, Noah A. Smith, Hannaneh Hajishirzi, Tom Hope

Data

The edge splits we used for our experiments can be downloaded using the following links:

Link                           File size
RepoDB, transductive split     11 MB
RepoDB, inductive split        11 MB
Hetionet, transductive split   49 MB
Hetionet, inductive split      49 MB
MSI, transductive split        813 MB
MSI, inductive split           813 MB

Each of these files should be placed in the subgraph directory before running any of the experiment scripts. Please see the README.md file in the subgraph directory for more information on the edge split files. If you would like to recreate the edge splits yourself or construct new edge splits, use the script/create_*_dataset.py scripts.
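The two split types differ in what the test set holds out: a transductive split holds out edges between entities that still appear in the training graph, while an inductive split holds out entities entirely, so test edges involve entities unseen during training. The sketch below illustrates that distinction with a generic splitting function; it is not the procedure implemented in script/create_*_dataset.py, and the function name, fractions, and example edges are made up for illustration.

import random

def split_edges(triples, inductive=False, test_frac=0.1, seed=0):
    # Generic illustration of transductive vs. inductive edge splitting.
    rng = random.Random(seed)
    triples = list(triples)
    if not inductive:
        # Transductive: hold out edges; every entity may still appear in training.
        rng.shuffle(triples)
        n_test = int(len(triples) * test_frac)
        return triples[n_test:], triples[:n_test]
    # Inductive: hold out entities, so test edges touch entities unseen in training.
    entities = sorted({e for h, _, t in triples for e in (h, t)})
    held_out = set(rng.sample(entities, max(1, int(len(entities) * test_frac))))
    train = [tr for tr in triples if tr[0] not in held_out and tr[2] not in held_out]
    test = [tr for tr in triples if tr[0] in held_out or tr[2] in held_out]
    return train, test

edges = [("drugA", "treats", "diseaseX"), ("drugB", "treats", "diseaseX"),
         ("drugA", "interacts_with", "drugB"), ("drugC", "treats", "diseaseY")]
train_edges, test_edges = split_edges(edges, inductive=True)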

Environment

The environment.yml file specifies all of the packages needed to run this code. We recommend using Anaconda/Miniconda to set up an environment, which you can do with the command

conda env create -f environment.yml

Entity names and descriptions

The files containing entity names and descriptions for all of the datasets can be found in the data/processed directory. If you would like to recreate these files yourself, you will need to use the dataset-specific scripts found in data/script.

Pre-tokenization

The main LM training script, src/lm/run.py, can take pre-tokenized entity names and descriptions as input, and several of the training scripts in script/training are set up to do so. If you would like to pre-tokenize text before fine-tuning, follow the instructions in script/pretokenize.py. Alternatively, you can pass one of the .tsv files in data/processed to the --info_filename argument instead of a file with pre-tokenized text.
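As a rough illustration of what pre-tokenization involves, the sketch below tokenizes entity names and descriptions from a TSV file with the Hugging Face tokenizer API. The checkpoint, file path, and column names are placeholders; script/pretokenize.py defines the actual format that src/lm/run.py expects.

import pandas as pd
from transformers import AutoTokenizer

# Placeholder checkpoint; the experiments use scientific/biomedical LMs.
tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")

# Hypothetical path and column names -- see data/processed for the real files.
info = pd.read_csv("data/processed/entity_info.tsv", sep="\t")
texts = (info["name"].fillna("") + " " + info["description"].fillna("")).tolist()

# Tokenize every entity's text once up front so fine-tuning does not
# re-tokenize the same strings on every epoch.
token_ids = tokenizer(texts, truncation=True, max_length=128)["input_ids"]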

Training

All of the scripts for training models can be found in the src directory. The script for training all KGE models is src/kge/run.py, while the script for training LMs is src/lm/run.py. Our code for training KGE models is heavily based on this code from the Open Graph Benchmark GitHub repository. Examples of how to use each of these scripts, including training with Slurm, can be found in the script/training directory. This directory includes all of the scripts we used to run the experiments reported in the paper.
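For intuition about how the LM models score candidate edges, the sketch below shows a minimal cross-encoder: the head entity's text and the relation are paired with the tail entity's text and passed to a pretrained encoder with a single-output scoring head. This is a simplified sketch under assumed inputs, not the code in src/lm/run.py; the checkpoint and example triple are placeholders.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "allenai/scibert_scivocab_uncased"  # placeholder scientific LM
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=1)

def score_triple(head_text, relation, tail_text):
    # Encode "[CLS] head text + relation [SEP] tail text [SEP]" and read off
    # one logit. The scoring head is randomly initialized, so scores are only
    # meaningful after fine-tuning on a knowledge graph's training edges.
    inputs = tokenizer(f"{head_text} {relation}", tail_text,
                       return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        return model(**inputs).logits.squeeze().item()

print(score_triple("methotrexate", "treats", "rheumatoid arthritis"))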

Owner
Rahul Nadkarni
Computer Science Ph.D. student