DEMix Layers for Modular Language Modeling

Related tags

Deep Learningdemix
Overview

DEMix

This repository contains modeling utilities for "DEMix Layers: Disentangling Domains for Modular Language Modeling" (Gururangan et. al, 2021).

This code is a fork of Fairseq. It is based on Python 3.8, CUDA 11 and includes PyTorch 1.8.0, NCCL 2.8.4 and apex.

Dataset

The multidomain dataset scripts are housed in another repository, located here. Clone that repository and follow instructions to setup data to train on.

Follow that tutorial to generate data-bins on eight (small) example domains.

Make sure to set the DATA_DIR accordingly.

Fairseq Installation

If you've already made an environment from the dataset creation phase, just use that. Otherwise:

conda create env --name demix
cd demix/
pip install --editable .

Additionally, please make sure you have the dependencies above installed (check Fairseq documentation for more information).

Tutorial

Here we will follow a tutorial to train on the example domains from the tutorial in the DEMix-data repository. Note that the model that results from this tutorial is pretty bad, because we're working with very small amounts of data and also a small LM. This tutorial is there to help you quickly understand the pipeline, and ensure that each script completes successfully.

To replicate the DEMix paper, with a GPT-3 model, follow the instructions here.

Basic Training

After setting up the example domains, run the following to train a small language model. Note that the scripts in this paper assume you are running on a multi-node GPU cluster with SLURM.

First, allocate some nodes, with GPUs with at least 32GB of RAM. Here we allocate 1 node with 8 volta32GB GPUs.

salloc --gpus-per-node 8 --nodes 1  -C 'volta32gb' --ntasks-per-node 8 --cpus-per-task 10 --mem 400G --time XXX --partition YYY

Then run:

export NUM_GPUS=8
export DISTRIBUTED_PORT=12345
export MODEL=transformer_lm
export EXPERIMENT=demix
# $DATA_DIR was set in DEMix-data tutorial.
export DATA_BIN=${DATA_DIR}/data-bin/
export EXPERIMENT_SUFFIX=tutorial
export SERIALIZATION_DIR=$(pwd)/demix_tutorial_model
bash tutorial/train.sh $NUM_GPUS \
                    $DISTRIBUTED_PORT \
                    $MODEL \
                    $EXPERIMENT \
                    $DATA_BIN \
                    $SERIALIZATION_DIR \
                    $EXPERIMENT_SUFFIX

This will output a trained language model in ${SERIALIZATION_DIR}

To train balanced dense LM, set export EXPERIMENT=dense, to train unbalanced dense LM, set export EXPERIMENT=unbalanced, to train "+Domain Token" LM , set export EXPERIMENT=domain_token.

We have provided a simple script demix/train.sh, with the same interface, with all hyperparameter preset to help replicate results in the paper.

Evaluation

We have two ways to evaluate the demix language model: with and without mixing experts.

Evaluating without mixing experts

To evaluate the language model without mixing experts, you can supply the checkpoint from a GPU on a particular rank (to specify the use of the domain expert that was trained on that GPU):

export DATA_BIN=${DATA_DIR}/data-bin/
export GPU_RANK=0
export PATH_TO_CHECKPOINT=${SERIALIZATION_DIR}/checkpoint_last-rank-${GPU_RANK}.pt
export OUTPUT_PATH=eval_output.jsonl
export SPLIT=valid
export DOMAIN=imdb
bash tutorial/eval_lm.sh $DATA_BIN $PATH_TO_CHECKPOINT $OUTPUT_PATH $SPLIT $DOMAIN

To evaluate on test data, set export SPLIT=test

The same script is used for the other baselines.

For the +domain token model, you can additionally supply a domain token to use at test time:

export DOMAIN_TOKEN=XXX
bash tutorial/eval_lm.sh $DATA_BIN $PATH_TO_CHECKPOINT $OUTPUT_PATH $SPLIT $DOMAIN $DOMAIN_TOKEN

Evaluating with mixing experts

First, we estimate the posterior distribution on 100 sequences of validation data of the domain using the following command:

export DATA_BIN=${DATA_DIR}/data-bin
export DOMAIN=imdb
export DEV_POSTERIOR_OUTPUT=dev_posteriors.jsonl
# set NUM_EVALUATION_GPUS equal to the number of experts you'd like to ensemble.
export NUM_EVALUATION_GPUS=8;
bash tutorial/mix_eval_lm.sh $NUM_EVALUATION_GPUS $DATA_BIN  ${SERIALIZATION_DIR}/checkpoint_last-rank-0.pt:${SERIALIZATION_DIR}/checkpoint_last-rank-1.pt:${SERIALIZATION_DIR}/checkpoint_last-rank-2.pt:${SERIALIZATION_DIR}/checkpoint_last-rank-3.pt:${SERIALIZATION_DIR}/checkpoint_last-rank-4.pt:${SERIALIZATION_DIR}/checkpoint_last-rank-5.pt:${SERIALIZATION_DIR}/checkpoint_last-rank-6.pt:${SERIALIZATION_DIR}/checkpoint_last-rank-7.pt $DOMAIN $DEV_POSTERIOR_OUTPUT estimate;

Then, we open $POSTERIOR_OUTPUT, extracting the exp_avg_posterior value of the last line in that file:

export POSTERIOR=$(tail -n 1 $DEV_POSTERIOR_OUTPUT | jq -rc '.exp_avg_posterior | join(",")')

We use this posterior as the domain prior (supplied as a string) when evaluating on test data, like so:

bash tutorial/mix_eval_lm.sh $NUM_EVALUATION_GPUS $DATA_BIN  ${SERIALIZATION_DIR}/checkpoint_last-rank-0.pt:${SERIALIZATION_DIR}/checkpoint_last-rank-1.pt:${SERIALIZATION_DIR}/checkpoint_last-rank-2.pt:${SERIALIZATION_DIR}/checkpoint_last-rank-3.pt:${SERIALIZATION_DIR}/checkpoint_last-rank-4.pt:${SERIALIZATION_DIR}/checkpoint_last-rank-5.pt:${SERIALIZATION_DIR}/checkpoint_last-rank-6.pt:${SERIALIZATION_DIR}/checkpoint_last-rank-7.pt $DOMAIN $DEV_POSTERIOR_OUTPUT eval $POSTERIOR cached_prior;

Adapting the Language Model

We additionally provide scripts to adapt the language model to a new domain.

DEMix DAPT

In this tutorial, we just adapt one of the existing experts to a new example domain in the demix-data project, located in /path/to/demix-data/new_example_domains.

First, we need to figure out which domain expert has the most affinity to the target domain we want to adapt to:

export NEW_DATA_BIN=/private/home/suching/demix-data/new_example_domains/data-bin/
export NEW_DOMAIN=acl_papers
export DEV_POSTERIOR_OUTPUT=${NEW_DOMAIN}_posterior.jsonl
# set NUM_EVALUATION_GPUS equal to the number of experts you'd like to ensemble.
export NUM_EVALUATION_GPUS=8;
bash tutorial/mix_eval_lm.sh $NUM_EVALUATION_GPUS $NEW_DATA_BIN  ${SERIALIZATION_DIR}/checkpoint_last-rank-0.pt:${SERIALIZATION_DIR}/checkpoint_last-rank-1.pt:${SERIALIZATION_DIR}/checkpoint_last-rank-2.pt:${SERIALIZATION_DIR}/checkpoint_last-rank-3.pt:${SERIALIZATION_DIR}/checkpoint_last-rank-4.pt:${SERIALIZATION_DIR}/checkpoint_last-rank-5.pt:${SERIALIZATION_DIR}/checkpoint_last-rank-6.pt:${SERIALIZATION_DIR}/checkpoint_last-rank-7.pt $NEW_DOMAIN $DEV_POSTERIOR_OUTPUT estimate;
export POSTERIOR=$(tail -n 1 $DEV_POSTERIOR_OUTPUT | jq -rc '.exp_avg_posterior | join(",")')

Here, we find that the most likely expert is expert number 5.

export POSTERIOR=$(tail -n 1 $DEV_POSTERIOR_OUTPUT | jq -rc '.exp_avg_posterior | join(",")')
echo $POSTERIOR

We then adapt expert 5 to the target domain using the tutorial/dapt.sh script, using DEMix DAPT:

export PATH_TO_CHECKPOINT=${SERIALIZATION_DIR}/checkpoint_last-rank-5.pt
export UNFREEZE_PARAMETERS=feedforward
export NEW_SERIALIZATION_DIR=$(pwd)/${NEW_DOMAIN}_demix_dapt
export EXPERIMENT_SUFFIX=test
bash tutorial/dapt.sh $NEW_DATA_BIN $NEW_DOMAIN $PATH_TO_CHECKPOINT $UNFREEZE_PARAMETERS $NEW_SERIALIZATION_DIR $EXPERIMENT_SUFFIX

Once this is trained, you can add that expert to your ensemble when evaluating on new data:

export NEW_DATA_BIN=/path/to/demix-data/new_example_domains/data-bin/
export NEW_DOMAIN=acl_papers
export DEV_POSTERIOR_OUTPUT=${NEW_DOMAIN}_posterior.jsonl
# set NUM_EVALUATION_GPUS equal to the number of experts you'd like to ensemble.
export NUM_EVALUATION_GPUS=8;
export PATH_TO_NEW_EXPERT=${NEW_SERIALIZATION_DIR}/checkpoint_last-rank-0.pt
bash tutorial/mix_eval_lm.sh $NUM_EVALUATION_GPUS $NEW_DATA_BIN  ${SERIALIZATION_DIR}/checkpoint_last-rank-0.pt:${SERIALIZATION_DIR}/checkpoint_last-rank-1.pt:${SERIALIZATION_DIR}/checkpoint_last-rank-2.pt:${SERIALIZATION_DIR}/checkpoint_last-rank-3.pt:${SERIALIZATION_DIR}/checkpoint_last-rank-4.pt:${SERIALIZATION_DIR}/checkpoint_last-rank-5.pt:${SERIALIZATION_DIR}/checkpoint_last-rank-6.pt:${PATH_TO_NEW_EXPERT} $NEW_DOMAIN $DEV_POSTERIOR_OUTPUT estimate;
export POSTERIOR=$(tail -n 1 $DEV_POSTERIOR_OUTPUT | jq -rc '.exp_avg_posterior | join(",")')

Dense DAPT

If you wanted to do Dense DAPT instead, just change the environment variables:

export PATH_TO_CHECKPOINT=/path/to/dense/model/checkpoint_last.pt
export FEEDFORWARD_OR_FULL=full
export SERIALIZATION_DIR=$(pwd)/${NEW_DOMAIN}_dense_dapt
export EXPERIMENT_SUFFIX=test
bash tutorial/dapt.sh $NEW_DATA_BIN $NEW_DOMAIN $PATH_TO_CHECKPOINT $FEEDFORWARD_OR_FULL $SERIALIZATION_DIR $EXPERIMENT_SUFFIX
Owner
Suchin
Allen Institute for AI / Facebook AI
Suchin
GANfolk: Using AI to create portraits of fictional people to sell as NFTs

GANfolk are AI-generated renderings of fictional people. Each image in the collection was created by a pair of Generative Adversarial Networks (GANs) with names and backstories also created with AI.

Robert A. Gonsalves 32 Dec 02, 2022
Pytorch implementation of NeurIPS 2021 paper: Geometry Processing with Neural Fields.

Geometry Processing with Neural Fields Pytorch implementation for the NeurIPS 2021 paper: Geometry Processing with Neural Fields Guandao Yang, Serge B

Guandao Yang 162 Dec 16, 2022
The Official Implementation of the ICCV-2021 Paper: Semantically Coherent Out-of-Distribution Detection.

SCOOD-UDG (ICCV 2021) This repository is the official implementation of the paper: Semantically Coherent Out-of-Distribution Detection Jingkang Yang,

Jake YANG 62 Nov 21, 2022
[CVPR'21] Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild

IVOS-W Paper Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild Zhaoyun Yin, Jia Zheng, Weixin Luo, Shenhan Qian, Hanli

SVIP Lab 38 Dec 12, 2022
Pytorch Performace Tuning, WandB, AMP, Multi-GPU, TensorRT, Triton

Plant Pathology 2020 FGVC7 Introduction A deep learning model pipeline for training, experimentaiton and deployment for the Kaggle Competition, Plant

Bharat Giddwani 0 Feb 25, 2022
This repository contains part of the code used to make the images visible in the article "How does an AI Imagine the Universe?" published on Towards Data Science.

Generative Adversarial Network - Generating Universe This repository contains part of the code used to make the images visible in the article "How doe

Davide Coccomini 9 Dec 18, 2022
Implementation for Curriculum DeepSDF

Curriculum-DeepSDF This repository is an implementation for Curriculum DeepSDF. Full paper is available here. Preparation Please follow original setti

Haidong Zhu 69 Dec 29, 2022
This is the pytorch implementation of the paper - Axiomatic Attribution for Deep Networks.

Integrated Gradients This is the pytorch implementation of "Axiomatic Attribution for Deep Networks". The original tensorflow version could be found h

Tianhong Dai 150 Dec 23, 2022
A Pytorch Implementation of ClariNet

ClariNet A Pytorch Implementation of ClariNet (Mel Spectrogram -- Waveform) Requirements PyTorch 0.4.1 & python 3.6 & Librosa Examples Step 1. Downlo

Sungwon Kim 286 Sep 15, 2022
When in Doubt: Improving Classification Performance with Alternating Normalization

When in Doubt: Improving Classification Performance with Alternating Normalization Findings of EMNLP 2021 Menglin Jia, Austin Reiter, Ser-Nam Lim, Yoa

Menglin Jia 13 Nov 06, 2022
TCNN Temporal convolutional neural network for real-time speech enhancement in the time domain

TCNN Pandey A, Wang D L. TCNN: Temporal convolutional neural network for real-time speech enhancement in the time domain[C]//ICASSP 2019-2019 IEEE Int

凌逆战 16 Dec 30, 2022
a baseline to practice

ccks2021_track3_baseline a baseline to practice 路径可能会有问题,自己改改 torch==1.7.1 pyhton==3.7.1 transformers==4.7.0 cuda==11.0 this is a baseline, you can fi

45 Nov 23, 2022
Image to Image translation, image generataton, few shot learning

Semi-supervised Learning for Few-shot Image-to-Image Translation [paper] Abstract: In the last few years, unpaired image-to-image translation has witn

yaxingwang 49 Nov 18, 2022
The code release of paper 'Domain Generalization for Medical Imaging Classification with Linear-Dependency Regularization' NIPS 2020.

Domain Generalization for Medical Imaging Classification with Linear Dependency Regularization The code release of paper 'Domain Generalization for Me

Yufei Wang 56 Dec 28, 2022
A Python toolbox to create adversarial examples that fool neural networks in PyTorch, TensorFlow, and JAX

Foolbox Native: Fast adversarial attacks to benchmark the robustness of machine learning models in PyTorch, TensorFlow, and JAX Foolbox is a Python li

Bethge Lab 2.4k Dec 25, 2022
P-Tuning v2: Prompt Tuning Can Be Comparable to Finetuning Universally Across Scales and Tasks

P-tuning v2 P-Tuning v2: Prompt Tuning Can Be Comparable to Finetuning Universally Across Scales and Tasks An optimized prompt tuning strategy achievi

THUDM 540 Dec 30, 2022
Advancing mathematics by guiding human intuition with AI

Advancing mathematics by guiding human intuition with AI This repo contains two colab notebooks which accompany the paper, available online at https:/

DeepMind 315 Dec 26, 2022
sense-py-AnishaBaishya created by GitHub Classroom

Compute Statistics Here we compute statistics for a bunch of numbers. This project uses the unittest framework to test functionality. Pass the tests T

1 Oct 21, 2021
This codebase proposes modular light python and pytorch implementations of several LiDAR Odometry methods

pyLiDAR-SLAM This codebase proposes modular light python and pytorch implementations of several LiDAR Odometry methods, which can easily be evaluated

Kitware, Inc. 208 Dec 16, 2022
FairMOT - A simple baseline for one-shot multi-object tracking

FairMOT - A simple baseline for one-shot multi-object tracking

Yifu Zhang 3.6k Jan 08, 2023