《K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters》(2020)

Overview

K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters

This repository is the implementation of the paper "K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters".

In the K-adapter paper, we present a flexible approach that supports continual knowledge infusion into large pre-trained models (e.g. RoBERTa in this work). We infuse factual knowledge and linguistic knowledge, and show that adapters for both kinds of knowledge work well on downstream tasks.

For more details, please check the latest version of the paper: https://arxiv.org/abs/2002.01808

Prerequisites

  • Python 3.6
  • PyTorch 1.3.1
  • tensorboardX
  • transformers

We use huggingface/transformers framework, the environment can be installed with:

conda create -n kadapter python=3.6
pip install -r requirements.txt

Pre-training Adapters

In the pre-training procedure, we train each knowledge-specific adapter on different pre-training tasks individually.

1. Process Dataset

  • ./scripts/clean_T_REx.py: clean raw T-Rex dataset (32G), and save the cleaned T-Rex to JSON format
  • ./scripts/create_subdataset-relation-classification.ipynb: create the dataset from T-REx for pre-training factual adapter on relation classification task. This sub-dataset can be found here.
  • refer to this code to get the dependency parsing dataset : create the dataset from Book Corpus for pre-training the linguistic adapter on dependency parsing task.

2. Factual Adapter

To pre-train fac-adapter, run

bash run_pretrain_fac-adapter.sh

3. Linguistic Adapter

To pre-train lin-adapter, run

bash run_pretrain_lin-adapter.sh

The pre-trained fac-adapter and lin-adapter models can be found here.

Fine-tuning on Downstream Tasks

Adapter Structure

  • The fac-adapter (lin-adapter) consists of two transformer layers (L=2, H=768, A = 12)
  • The RoBERTa layers where adapters plug in: 0,11,23 or 0,11,22
  • For using only single adapter
    • Use the concatenation of the last hidden feature of RoBERTa and the last hidden feature of the adapter as the input representation for the task-specific layer.
  • For using combine adapter
    • For each adapter, first concat the last hidden feature of RoBERTa and the last hidden feature of every adapter and feed into a linear layer separately, then concat the representations as input for task-specific layer.

About how to load pretrained RoBERTa and pretrained adapter

  • The pre-trained adapters are in ./pretrained_models/fac-adapter/pytorch_model.bin and ./pretrained_models/lin-adapter/pytorch_model.bin. For using only single adapter, for example, fac-adapter, then you can set the argument meta_fac_adaptermodel= and set meta_lin_adaptermodel=””. For using both adapters, just set the arguments meta_fac_adaptermodel and meta_lin_adaptermodel as the path of adapters.
  • The pretrained RoBERTa will be downloaded automaticly when you run the pipeline.

1. Entity Typing

1.1 OpenEntity

One single 16G P100

(1) run the pipeline

bash run_finetune_openentity_adapter.sh

(2) result

  • with fac-adapter dev: (0.7967123287671233, 0.7580813347236705, 0.7769169115682607) test: (0.7929708951125755, 0.7584033613445378, 0.7753020134228187)
  • with lin-adapter dev: (0.8071672354948806, 0.7398331595411888, 0.7720348204570185) test:(0.8001135718341851, 0.7400210084033614, 0.7688949522510232)
  • with fac-adapter + lin-adapter dev: (0.8001101321585903, 0.7575599582898853, 0.7782538832351366) test: (0.7899568034557235, 0.7627737226277372, 0.7761273209549072)

the results may vary when running on different machines, but should not differ too much. I just search results from per_gpu_train_batch_sizeh: [4, 8] lr: [1e-5, 5e-6], warmup[0,200,500,1000,1200], maybe you can change other parameters and see the results. For w/fac-adapter, the best performance is achieved at gpu_num=1, per_gpu_train_batch_size=4, lr=5e-6, warmup=500(it takes about 2 hours to get the best result running on singe 16G P100) For w/lin-adapter, the best performance is achieved at gpu_num=1, per_gpu_train_batch_size=4, lr=5e-6, warmup=1000(it takes about 2 hours to get the best result running on singe 16G P100)

(3) Data format

Add special token "@" before and after a certain entity, then the first @ is adopted to perform classification. 9 entity categories: ['entity', 'location', 'time', 'organization', 'object', 'event', 'place', 'person', 'group'], each entity can be classified to several of them or none of them. The output is represented as [0,1,1,0,1,0,0,0,0], 0 represents the entity does not belong to the type, while 1 belongs to.

1.2 FIGER

(1) run the pipeline

bash run_finetune_figer_adapter.sh

The detailed hyperparamerters are listed in the running script.

2. Relation Classification

4*16G P100

(1) run the pipeline

bash run_finetune_tacred_adapter.sh

(2) result

  • with fac-adapter

    • 'dev': (0.6686945083853996, 0.7481604120676968, 0.7061989928807085)
    • 'test': (0.693900391717963, 0.7458646616541353, 0.7189447746050153)
  • with lin-adapter

    • 'dev': (0.6679165308118683, 0.7536791758646063, 0.7082108902333621),
    • 'test': (0.6884615384615385, 0.7536842105263157, 0.7195979899497488)
  • with fac-adapter + lin-adapter

    • 'dev': (0.6793893129770993, 0.7367549668874173, 0.7069102462271645)
    • 'test': (0.7014245014245014, 0.7404511278195489, 0.7204096561814192)
  • the results may vary when running on different machines, but should not differ too much.

  • I just search results from per_gpu_train_batch_sizeh: [4, 8] lr: [1e-5, 5e-6], warmup[0,200,1000,1200], maybe you can change other parameters and see the results.

  • The best performance is achieved at gpu_num=4, per_gpu_train_batch_size=8, lr=1e-5, warmup=200 (it takes about 7 hours to get the best result running on 4 16G P100)

  • The detailed hyperparamerters are listed in the running script.

(3) Data format

Add special token "@" before and after the first entity, add '#' before and after the second entity. Then the representations of @ and # are concatenated to perform relation classification.

3. Question Answering

3.1 CosmosQA

One single 16G P100

(1) run the pipeline

bash run_finetune_cosmosqa_adapter.sh

(2) result

CosmosQA dev accuracy: 80.9 CosmosQA test accuracy: 81.8

The best performance is achieved at gpu_num=1, per_gpu_train_batch_size=64, GRADIENT_ACC=32, lr=1e-5, warmup=0 (it takes about 8 hours to get the best result running on singe 16G P100) The detailed hyperparamerters are listed in the running script.

(3) Data format

For each answer, the input is contextquestionanswer, and will get a score for this answers. After getting four scores, we will select the answer with the highest score.

3.2 SearchQA and Quasar-T

The source codes for fine-tuning on SearchQA and Quasar-T dataset are modified based on the code of paper "Denoising Distantly Supervised Open-Domain Question Answering".

Use K-Adapter just like RoBERTa

  • You can use K-Adapter (RoBERTa with adapters) just like RoBERTa, which almost have the same inputs and outputs. Specifically, we add a class RobertawithAdapter in pytorch_transformers/my_modeling_roberta.py.
  • A demo code [run_example.sh and examples/run_example.py] about how to use “RobertawithAdapter”, do inference, save model and load model. You can leave the arguments of adapters as default.
  • Now it is very easy to use Roberta with adapters. If you only want to use single adapter, for example, fac-adapter, then you can set the argument meta_fac_adaptermodel='./pretrained_models/fac-adapter/pytorch_model.bin'' and set meta_lin_adaptermodel=””. If you want to use both adapters, just set the arguments meta_fac_adaptermodel and meta_lin_adaptermodel as the path of adapters.
bash run_example.sh

TODO

  • Remove and merge redundant codes
  • Support other pre-trained models, such as BERT...

Contact

Feel free to contact Ruize Wang ([email protected]) if you have any further questions.

Owner
Microsoft
Open source projects and samples from Microsoft
Microsoft
TensorFlow port of PyTorch Image Models (timm) - image models with pretrained weights.

TensorFlow-Image-Models Introduction Usage Models Profiling License Introduction TensorfFlow-Image-Models (tfimm) is a collection of image models with

Martins Bruveris 227 Dec 20, 2022
Out of Distribution Detection on Natural Adversarial Examples

OOD-on-NAE Research project on out of distribution detection for the Computer Vision course by Prof. Rob Fergus (CSCI-GA 2271) Paper out on arXiv - ht

Anugya 1 Jun 08, 2022
10th place solution for Google Smartphone Decimeter Challenge at kaggle.

Under refactoring 10th place solution for Google Smartphone Decimeter Challenge at kaggle. Google Smartphone Decimeter Challenge Global Navigation Sat

12 Oct 25, 2022
Minecraft Hack Detection With Python

Minecraft Hack Detection An attempt to try and use crowd sourced replays to find

Kuleen Sasse 3 Mar 26, 2022
Meta-Learning Sparse Implicit Neural Representations (NeurIPS 2021)

Meta-SparseINR Official PyTorch implementation of "Meta-learning Sparse Implicit Neural Representations" (NeurIPS 2021) by Jaeho Lee*, Jihoon Tack*, N

Jaeho Lee 41 Nov 10, 2022
Решения, подсказки, тесты и утилиты для тренировки по алгоритмам от Яндекса.

Решения и подсказки к тренировке по алгоритмам от Яндекса Что есть внутри Решения с подсказками и комментариями; рекомендую сначала смотреть md файл п

Yankovsky Andrey 50 Dec 26, 2022
Official code for "EagerMOT: 3D Multi-Object Tracking via Sensor Fusion" [ICRA 2021]

EagerMOT: 3D Multi-Object Tracking via Sensor Fusion Read our ICRA 2021 paper here. Check out the 3 minute video for the quick intro or the full prese

Aleksandr Kim 276 Dec 30, 2022
Bald-to-Hairy Translation Using CycleGAN

GANiry: Bald-to-Hairy Translation Using CycleGAN Official PyTorch implementation of GANiry. GANiry: Bald-to-Hairy Translation Using CycleGAN, Fidan Sa

Fidan Samet 10 Oct 27, 2022
KakaoBrain KoGPT (Korean Generative Pre-trained Transformer)

KoGPT KoGPT (Korean Generative Pre-trained Transformer) https://github.com/kakaobrain/kogpt https://huggingface.co/kakaobrain/kogpt Model Descriptions

Kakao Brain 799 Dec 28, 2022
Covid19-Forecasting - An interactive website that tracks, models and predicts COVID-19 Cases

Covid-Tracker This is an interactive website that tracks, models and predicts CO

Adam Lahmadi 1 Feb 01, 2022
PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners for self-supervised ViT.

MAE for Self-supervised ViT Introduction This is an unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners for self-sup

36 Oct 30, 2022
This repository contains the implementations related to the experiments of a set of publicly available datasets that are used in the time series forecasting research space.

TSForecasting This repository contains the implementations related to the experiments of a set of publicly available datasets that are used in the tim

Rakshitha Godahewa 80 Dec 30, 2022
PPO is a very popular Reinforcement Learning algorithm at present.

PPO is a very popular Reinforcement Learning algorithm at present. OpenAI takes PPO as the current baseline algorithm. We use the PPO algorithm to train a policy to give the best action in any situat

Rosefintech 11 Aug 23, 2021
This package proposes simplified exporting pytorch models to ONNX and TensorRT, and also gives some base interface for model inference.

PyTorch Infer Utils This package proposes simplified exporting pytorch models to ONNX and TensorRT, and also gives some base interface for model infer

Alex Gorodnitskiy 11 Mar 20, 2022
A bunch of random PyTorch models using PyTorch's C++ frontend

PyTorch Deep Learning Models using the C++ frontend Gettting started Clone the repo 1. https://github.com/mrdvince/pytorchcpp 2. cd fashionmnist or

Vince 0 Jul 13, 2021
Conversational text Analysis using various NLP techniques

PyConverse Let me try first Installation pip install pyconverse Usage Please try this notebook that demos the core functionalities: basic usage noteb

Rita Anjana 158 Dec 25, 2022
Command-line tool for downloading and extending the RedCaps dataset.

RedCaps Downloader This repository provides the official command-line tool for downloading and extending the RedCaps dataset. Users can seamlessly dow

RedCaps dataset 33 Dec 14, 2022
Implementation of Deformable Attention in Pytorch from the paper "Vision Transformer with Deformable Attention"

Deformable Attention Implementation of Deformable Attention from this paper in Pytorch, which appears to be an improvement to what was proposed in DET

Phil Wang 128 Dec 24, 2022
SimplEx - Explaining Latent Representations with a Corpus of Examples

SimplEx - Explaining Latent Representations with a Corpus of Examples Code Author: Jonathan Crabbé ( Jonathan Crabbé 14 Dec 15, 2022

This is the source code of the 1st place solution for segmentation task (with Dice 90.32%) in 2021 CCF BDCI challenge.

1st place solution in CCF BDCI 2021 ULSEG challenge This is the source code of the 1st place solution for ultrasound image angioma segmentation task (

Chenxu Peng 30 Nov 22, 2022