Towards Continual Knowledge Learning of Language Models

Overview

This is the official GitHub repository for Towards Continual Knowledge Learning of Language Models.

To reproduce our results, take the following steps:

1. Create a conda environment and install the requirements

conda create -n ckl python=3.8 && conda activate ckl
pip install -r requirements.txt

Also, make sure to install the version of PyTorch that matches your CUDA version and environment; refer to https://pytorch.org/

# For CUDA 10.x
pip3 install torch torchvision torchaudio
# For CUDA 11.x
pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
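
After installing, it may help to confirm that the build actually matches your CUDA setup; a minimal check using only the standard PyTorch API:

# check_torch.py: confirm the installed PyTorch build and CUDA visibility
import torch

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))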

2. Download the data used for the experiments.

To download only the CKL benchmark dataset:

python download_ckl_data.py

To download ALL of the data used for the experiments (required to reproduce results):

python download_all_data.py

To download the (continually pretrained) model checkpoints of the main experiment (required to reproduce results):

python download_model_checkpoints.py

For the other experimental settings, such as multiple CKL phases and GPT-2, we do not separately provide continually pretrained model checkpoints.

3. Reproducing Experimental Results

We provide all of the configs needed to reproduce the zero-shot results of our paper. Model checkpoints are provided only for the main experimental setting (full_setting); these can be downloaded with the command above.

configs
├── full_setting
│   ├── evaluation
│   │   ├── invariantLAMA
│   │   │   ├── t5_baseline.json
│   │   │   ├── t5_kadapters.json
│   │   │   ├── ...
│   │   ├── newLAMA
│   │   ├── newLAMA_easy
│   │   ├── updatedLAMA
│   ├── training
│   │   ├── t5_baseline.json
│   │   ├── t5_kadapters.json
│   │   ├── ...
├── GPT2
│   ├── ...
├── kilt
│   ├── ...
├── small_setting
│   ├── ...
├── split
│   ├── ...

Components of each configuration file (an illustrative config assembled from these fields is sketched after the list):

  • input_length (int) : the input sequence length
  • output_length (int) : the output sequence length
  • num_train_epochs (int) : the number of training epochs
  • output_dir (string) : the directory in which to save model checkpoints
  • dataset (string) : the dataset to use for zero-shot evaluation or continual pretraining
  • dataset_version (string) : the version of the dataset ['full', 'small', 'debug']
  • train_batch_size (int) : the batch size used for training
  • learning_rate (float) : the learning rate used for training
  • model (string) : the model name in huggingface models (https://huggingface.co/models)
  • method (string) : the method being used ['baseline', 'kadapter', 'lora', 'mixreview', 'modular_small', 'recadam']
  • freeze_level (int) : how much of the model to freeze during training (0 for none, 1 for freezing only the encoder, 2 for freezing all of the parameters)
  • gradient_accumulation_steps (int) : gradient accumulation steps, used to match the global training batch size across methods
  • ngpu (int) : the number of GPUs used for the run
  • num_workers (int) : the number of workers for the DataLoader
  • resume_from_checkpoint (string) : null by default; the path to a model checkpoint when resuming from one
  • accelerator (string) : 'ddp' by default; the PyTorch Lightning accelerator to use
  • use_deepspeed (bool) : false by default; currently not extensively tested
  • CUDA_VISIBLE_DEVICES (string) : the GPU devices made available for this run (e.g. "0,1,2,3", "0")
  • wandb_log (bool) : whether to log the experiment through wandb
  • wandb_project (string) : the wandb project name
  • wandb_run_name (string) : the name of this training run
  • mode (string) : 'pretrain' for all configs
  • use_lr_scheduling (bool) : true if using learning rate scheduling
  • check_validation (bool) : true for evaluation (no training)
  • checkpoint_path (string) : the path to the model checkpoint used for evaluation
  • output_log (string) : the directory to which evaluation results are logged
  • split_num (int) : 1 by default; more than 1 if there are multiple CKL phases
  • split (int) : which CKL phase this is
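
To make these fields concrete, below is a minimal sketch that assembles an illustrative evaluation config and writes it to disk. All values are placeholders, not the settings shipped in configs/; use the provided config files for actual reproduction.

# make_example_config.py: sketch of an evaluation config; all values are
# illustrative placeholders, not the settings shipped in configs/
import json

config = {
    "input_length": 50,
    "output_length": 10,
    "num_train_epochs": 0,            # zero-shot evaluation: no training
    "output_dir": "outputs/example",  # placeholder path
    "dataset": "invariantLAMA",
    "dataset_version": "full",        # 'full', 'small', or 'debug'
    "train_batch_size": 32,
    "learning_rate": 1e-3,
    "model": "t5-large",              # placeholder huggingface model name
    "method": "kadapter",
    "freeze_level": 1,                # 0: none, 1: encoder only, 2: all parameters
    "gradient_accumulation_steps": 1,
    "ngpu": 1,
    "num_workers": 4,
    "resume_from_checkpoint": None,
    "accelerator": "ddp",
    "use_deepspeed": False,
    "CUDA_VISIBLE_DEVICES": "0",
    "wandb_log": False,
    "mode": "pretrain",
    "use_lr_scheduling": False,
    "check_validation": True,                        # evaluation only
    "checkpoint_path": "outputs/example/last.ckpt",  # placeholder checkpoint
    "output_log": "log/example_eval.csv",            # placeholder log path
    "split_num": 1,
    "split": 0,
}

with open("configs/example.json", "w") as f:
    json.dump(config, f, indent=2)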

This is an example of running invariantLAMA zero-shot evaluation with the continually pretrained t5_kadapters model:

python run.py --config configs/full_setting/evaluation/invariantLAMA/t5_kadapters.json

This is an example of performing continual pretraining on CC-RecentNews (the main experiment) with t5_kadapters:

python run.py --config configs/full_setting/training/t5_kadapters.json
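
To evaluate a checkpoint you trained yourself rather than one of the downloaded ones, one option is to copy a provided evaluation config and repoint its checkpoint_path; a small sketch, where the checkpoint and log paths are illustrative:

# eval_own_checkpoint.py: derive an evaluation config from a provided one;
# the checkpoint and log paths below are illustrative placeholders
import json

src = "configs/full_setting/evaluation/invariantLAMA/t5_kadapters.json"
with open(src) as f:
    config = json.load(f)

config["check_validation"] = True                       # evaluation only, no training
config["checkpoint_path"] = "outputs/my_run/last.ckpt"  # your own checkpoint (placeholder)
config["output_log"] = "log/my_invariantLAMA_eval.csv"  # where to log results (placeholder)

dst = "configs/full_setting/evaluation/invariantLAMA/t5_kadapters_own.json"
with open(dst, "w") as f:
    json.dump(config, f, indent=2)

# Then: python run.py --config configs/full_setting/evaluation/invariantLAMA/t5_kadapters_own.json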

Reference

@article{jang2021towards,
  title={Towards Continual Knowledge Learning of Language Models},
  author={Jang, Joel and Ye, Seonghyeon and Yang, Sohee and Shin, Joongbo and Han, Janghoon and Kim, Gyeonghun and Choi, Stanley Jungkyu and Seo, Minjoon},
  journal={arXiv preprint arXiv:2110.03215},
  year={2021}
}