Codes for our paper The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders published to EMNLP 2021.

Last update: Jul 08, 2022

Related tags

Deep Learning stem-cell-hypothesis

Overview

The Stem Cell Hypothesis

Codes for our paper The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders published to EMNLP 2021.

Installation

Run the following setup script. Feel free to install a GPU-enabled PyTorch (torch>=1.6.0).

python3 -m venv env
source env/bin/activate
ln -sf "$(which python2)" env/bin/python
pip install -e .

Data Pre-processing

Download OntoNotes 5 (LDC2013T19.tgz) and put it into the following directory:

mkdir -p ~/.elit/thirdparty/catalog.ldc.upenn.edu/LDC2013T19/
cp LDC2013T19.tgz ~/.elit/thirdparty/catalog.ldc.upenn.edu/LDC2013T19/LDC2013T19.tgz

That's all. ELIT will automatically do the rest for you the first time you run the training script.

Experiments

Here we demonstrate how to experiment with BERT-base but feel free to replace the transformer and task name in the script path for other experiments. Our scripts are grouped by transformers and tasks with clear semantics.

Single Task Learning

The following script will train STL-POS with BERT-base and evaluate its performance on the test set:

python3 stem_cell_hypothesis/en_bert_base/single/pos.py

Multi-Task Learning

The following script will train MTL-5 with BERT-base and evaluate its performance on the test set:

python3 stem_cell_hypothesis/en_bert_base/joint/all.py

Pruning Experiments

The following script will train STL-POS-DP with BERT-base and evaluate its performance on the test set:

python3 stem_cell_hypothesis/en_bert_base/gate/pos.py

You can monitor the pruning process in real time via tensorboard:

tensorboard --logdir=data/model/mtl/ontonotes_bert_base_en/gated/pos/0/runs --samples_per_plugin images=1000

which will show how the heads gradually get claimed in http://localhost:6007/#images:

Once 3 runs are finished, you can visualize the overlap of head utilization across runs via:

python3 stem_cell_hypothesis/en_bert_base/gate/vis_gate_overlap_rgb.py

which will generate the following figure (1a):

Similarly, Figure 1g is generated with stem_cell_hypothesis/en_bert_base/gate/vis_gate_overlap_tasks_gray.py.

Probing Experiments

Once a model is trained, you can probe its representations via the scripts in stem_cell_hypothesis/en_bert_base/head. For example, to probe STL-POS performance, run:

python3 stem_cell_hypothesis/en_bert_base/head/pos.py
python3 stem_cell_hypothesis/en_bert_base/head/vis/pos.py

which generates Figure 4:

You may need to manually change the path and update new results in the scripts.

To probe the unsupervised BERT performance for a single task, e.g., SRL, run:

python3 stem_cell_hypothesis/en_bert_base/head/srl_dot.py

which generates Figure 3:

Although not included in the paper due to page limitation, experiments of Chinese, BERT-large, ALBERT, etc. are uploaded to stem_cell_hypothesis. Feel free to run them for your interest.

Citation

If you use this repository in your research, please kindly cite our EMNLP2021 paper:

@inproceedings{he-choi-2021-stem,
    title = "The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders",
    author = "He, Han and Choi, Jinho D.",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.451",
    pages = "5555--5577",
    abstract = "Multi-task learning with transformer encoders (MTL) has emerged as a powerful technique to improve performance on closely-related tasks for both accuracy and efficiency while a question still remains whether or not it would perform as well on tasks that are distinct in nature. We first present MTL results on five NLP tasks, POS, NER, DEP, CON, and SRL, and depict its deficiency over single-task learning. We then conduct an extensive pruning analysis to show that a certain set of attention heads get claimed by most tasks during MTL, who interfere with one another to fine-tune those heads for their own objectives. Based on this finding, we propose the Stem Cell Hypothesis to reveal the existence of attention heads naturally talented for many tasks that cannot be jointly trained to create adequate embeddings for all of those tasks. Finally, we design novel parameter-free probes to justify our hypothesis and demonstrate how attention heads are transformed across the five tasks during MTL through label analysis.",
}

Codes for our paper The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders published to EMNLP 2021.

Related tags

Overview

The Stem Cell Hypothesis

Installation

Data Pre-processing

Experiments

Single Task Learning

Multi-Task Learning

Pruning Experiments

Probing Experiments

Citation

Owner

Emory NLP

Multitask Learning Strengthens Adversarial Robustness

基于AlphaPose的TensorRT加速

Pytorch implementation of "Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling"

Repo for "Physion: Evaluating Physical Prediction from Vision in Humans and Machines" submission to NeurIPS 2021 (Datasets & Benchmarks track)

Code for "Adversarial Attack Generation Empowered by Min-Max Optimization", NeurIPS 2021

以孤立语假设和宽度优先搜索为基础，构建了一种多通道堆叠注意力Transformer结构的斗地主ai

Pairwise learning neural link prediction for ogb link prediction

A python library for time-series smoothing and outlier detection in a vectorized way.

Redash reset for python

The Turing Change Point Detection Benchmark: An Extensive Benchmark Evaluation of Change Point Detection Algorithms on real-world data

PyTorch implementation of UNet++ (Nested U-Net).

EMNLP 2021: Single-dataset Experts for Multi-dataset Question-Answering

TensorFlow (Python) implementation of DeepTCN model for multivariate time series forecasting.

The story of Chicken for Club Bing

🗺 General purpose U-Network implemented in Keras for image segmentation

SpeechNAS Better Trade off between Latency and Accuracy for Large Scale Speaker Verification

Simple codebase for flexible neural net training

Multi-label Co-regularization for Semi-supervised Facial Action Unit Recognition (NeurIPS 2019)

Official PyTorch implementation of "Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble" (NeurIPS'21)

UAV-Networks-Routing is a Python simulator for experimenting routing algorithms and mac protocols on unmanned aerial vehicle networks.