Multi-Task Deep Neural Networks for Natural Language Understanding

Overview

License: MIT Travis-CI

New Release
We released Adversarial training for both LM pre-training/finetuning and f-divergence.

Large-scale Adversarial training for LMs: ALUM code.
If you want to use the old version, please use following cmd to clone the code:
git clone -b v0.1 https://github.com/namisan/mt-dnn.git

Multi-Task Deep Neural Networks for Natural Language Understanding

This PyTorch package implements the Multi-Task Deep Neural Networks (MT-DNN) for Natural Language Understanding, as described in:

Xiaodong Liu*, Pengcheng He*, Weizhu Chen and Jianfeng Gao
Multi-Task Deep Neural Networks for Natural Language Understanding
ACL 2019
*: Equal contribution

Xiaodong Liu, Pengcheng He, Weizhu Chen and Jianfeng Gao
Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding
arXiv version

Pengcheng He, Xiaodong Liu, Weizhu Chen and Jianfeng Gao
Hybrid Neural Network Model for Commonsense Reasoning
arXiv version

Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao and Jiawei Han
On the Variance of the Adaptive Learning Rate and Beyond
arXiv version

Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao and Tuo Zhao
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization
arXiv version

Xiaodong Liu, Yu Wang, Jianshu Ji, Hao Cheng, Xueyun Zhu, Emmanuel Awa, Pengcheng He, Weizhu Chen, Hoifung Poon, Guihong Cao, Jianfeng Gao
The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding
arXiv version

Xiaodong Liu, Hao Cheng, Pengcheng He, Weizhu Chen, Yu Wang, Hoifung Poon and Jianfeng Gao
Adversarial Training for Large Neural Language Models
arXiv version

Hao Cheng and Xiaodong Liu and Lis Pereira and Yaoliang Yu and Jianfeng Gao
Posterior Differential Regularization with f-divergence for Improving Model Robustness
arXiv version

Quickstart

Setup Environment

Install via pip:

  1. python3.6
    Reference to download and install : https://www.python.org/downloads/release/python-360/

  2. install requirements
    > pip install -r requirements.txt

Use docker:

  1. Pull docker
    > docker pull allenlao/pytorch-mt-dnn:v0.5

  2. Run docker
    > docker run -it --rm --runtime nvidia allenlao/pytorch-mt-dnn:v1.2 bash
    Please refer to the following link if you first use docker: https://docs.docker.com/

Train a toy MT-DNN model

  1. Download data
    > sh download.sh
    Please refer to download GLUE dataset: https://gluebenchmark.com/

  2. Preprocess data
    > sh experiments/glue/prepro.sh

  3. Training
    > python train.py

Note that we ran experiments on 4 V100 GPUs for base MT-DNN models. You may need to reduce batch size for other GPUs.

GLUE Result reproduce

  1. MTL refinement: refine MT-DNN (shared layers), initialized with the pre-trained BERT model, via MTL using all GLUE tasks excluding WNLI to learn a new shared representation.
    Note that we ran this experiment on 8 V100 GPUs (32G) with a batch size of 32.

    • Preprocess GLUE data via the aforementioned script
    • Training:
      >scripts\run_mt_dnn.sh
  2. Finetuning: finetune MT-DNN to each of the GLUE tasks to get task-specific models.
    Here, we provide two examples, STS-B and RTE. You can use similar scripts to finetune all the GLUE tasks.

    • Finetune on the STS-B task
      > scripts\run_stsb.sh
      You should get about 90.5/90.4 on STS-B dev in terms of Pearson/Spearman correlation.
    • Finetune on the RTE task
      > scripts\run_rte.sh
      You should get about 83.8 on RTE dev in terms of accuracy.

SciTail & SNIL Result reproduce (Domain Adaptation)

  1. Domain Adaptation on SciTail
    >scripts\scitail_domain_adaptation_bash.sh

  2. Domain Adaptation on SNLI
    >scripts\snli_domain_adaptation_bash.sh

Sequence Labeling Task

  1. Preprocess data
    a) Download NER data to data/ner including: {train/valid/test}.txt
    b) Convert NER data to the canonical format: > python experiments\ner\prepro.py --data data\ner --output_dir data\canonical_data
    c) Preprocess the canonical data to the MT-DNN format: > python prepro_std.py --root_dir data\canonical_data --task_def experiments\ner\ner_task_def.yml --model bert-base-uncased

  2. Training
    > python train.py --data_dir <data-path> --init_checkpoint <bert-base-uncased> --train_dataset squad,squad-v2 --test_dataset squad,squad-v2 --task_def experiments\squad\squad_task_def.yml

Question Answer Task

  1. Preprocess data
    a) Download SQuAD data to data/squad including: {train/valid}.txt and then change file name to: {squad_train/squad_dev}.json
    b) Convert data to the MT-DNN format: > python experiments\squad\squad_prepro.py --root_dir data\canonical_data --task_def experiments\squad\squad_task_def.yml --model bert-base-uncased

  2. Training
    > python train.py --data_dir <data-path> --init_checkpoint <bert-model> --train_dataset ner --test_dataset ner --task_def experiments\ner\ner_task_def.yml

SMART

Adv training at the fine-tuning stages: > python train.py --data_dir <data-path> --init_checkpoint <bert/mt-dnn-model> --train_dataset mnli --test_dataset mnli_matched,mnli_mismatched --task_def experiments\glue\glue_task_def.yml --adv_train --adv_opt 1

HNN

The code to reproduce HNN is under hnn folder, to reproduce the results of HNN, run

> hnn/script/hnn_train_large.sh

Extract embeddings

  1. Extracting embeddings of a pair text example
    >python extractor.py --do_lower_case --finput input_examples\pair-input.txt --foutput input_examples\pair-output.json --bert_model bert-base-uncased --checkpoint mt_dnn_models\mt_dnn_base.pt
    Note that the pair of text is split by a special token |||. You may refer input_examples\pair-output.json as example.

  2. Extracting embeddings of a single sentence example
    >python extractor.py --do_lower_case --finput input_examples\single-input.txt --foutput input_examples\single-output.json --bert_model bert-base-uncased --checkpoint mt_dnn_models\mt_dnn_base.pt

Speed up Training

  1. Gradient Accumulation
    If you have small GPUs, you may need to use the gradient accumulation to make training stable.
    For example, if you use the flag: --grad_accumulation_step 4 during the training, the actual batch size will be batch_size * 4.

  2. FP16 The current version of MT-DNN also supports FP16 training, and please install apex.
    You just need to turn on the flag during the training: --fp16
    Please refer the script: scripts\run_mt_dnn_gc_fp16.sh

Convert Tensorflow BERT model to the MT-DNN format

Here, we go through how to convert a Chinese Tensorflow BERT model into mt-dnn format.

  1. Download BERT model from the Google bert web: https://github.com/google-research/bert

  2. Run the following script for MT-DNN format
    python scripts\convert_tf_to_pt.py --tf_checkpoint_root chinese_L-12_H-768_A-12\ --pytorch_checkpoint_path chinese_L-12_H-768_A-12\bert_base_chinese.pt

TODO

  • Publish pretrained Tensorflow checkpoints.

FAQ

Did you share the pretrained mt-dnn models?

Yes, we released the pretrained shared embedings via MTL which are aligned to BERT base/large models: mt_dnn_base.pt and mt_dnn_large.pt.
To obtain the similar models:

  1. run the >sh scripts\run_mt_dnn.sh, and then pick the best checkpoint based on the average dev preformance of MNLI/RTE.
  2. strip the task-specific layers via scritps\strip_model.py.

Why SciTail/SNLI do not enable SAN?

For SciTail/SNLI tasks, the purpose is to test generalization of the learned embedding and how easy it is adapted to a new domain instead of complicated model structures for a direct comparison with BERT. Thus, we use a linear projection on the all domain adaptation settings.

What is the difference between V1 and V2

The difference is in the QNLI dataset. Please refere to the GLUE official homepage for more details. If you want to formulate QNLI as pair-wise ranking task as our paper, make sure that you use the old QNLI data.
Then run the prepro script with flags: > sh experiments/glue/prepro.sh --old_glue
If you have issues to access the old version of the data, please contact the GLUE team.

Did you fine-tune single task for your GLUE leaderboard submission?

We can use the multi-task refinement model to run the prediction and produce a reasonable result. But to achieve a better result, it requires a fine-tuneing on each task. It is worthing noting the paper in arxiv is a littled out-dated and on the old GLUE dataset. We will update the paper as we mentioned below.

Notes and Acknowledgments

BERT pytorch is from: https://github.com/huggingface/pytorch-pretrained-BERT
BERT: https://github.com/google-research/bert
We also used some code from: https://github.com/kevinduh/san_mrc

Related Projects/Codebase

  1. Pretrained UniLM: https://github.com/microsoft/unilm
  2. Pretrained Response Generation Model: https://github.com/microsoft/DialoGPT
  3. Internal MT-DNN repo: https://github.com/microsoft/mt-dnn

How do I cite MT-DNN?

@inproceedings{liu2019mt-dnn,
    title = "Multi-Task Deep Neural Networks for Natural Language Understanding",
    author = "Liu, Xiaodong and He, Pengcheng and Chen, Weizhu and Gao, Jianfeng",
    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/P19-1441",
    pages = "4487--4496"
}


@article{liu2019mt-dnn-kd,
  title={Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding},
  author={Liu, Xiaodong and He, Pengcheng and Chen, Weizhu and Gao, Jianfeng},
  journal={arXiv preprint arXiv:1904.09482},
  year={2019}
}


@article{he2019hnn,
  title={A Hybrid Neural Network Model for Commonsense Reasoning},
  author={He, Pengcheng and Liu, Xiaodong and Chen, Weizhu and Gao, Jianfeng},
  journal={arXiv preprint arXiv:1907.11983},
  year={2019}
}


@article{liu2019radam,
  title={On the Variance of the Adaptive Learning Rate and Beyond},
  author={Liu, Liyuan and Jiang, Haoming and He, Pengcheng and Chen, Weizhu and Liu, Xiaodong and Gao, Jianfeng and Han, Jiawei},
  journal={arXiv preprint arXiv:1908.03265},
  year={2019}
}


@article{jiang2019smart,
  title={SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization},
  author={Jiang, Haoming and He, Pengcheng and Chen, Weizhu and Liu, Xiaodong and Gao, Jianfeng and Zhao, Tuo},
  journal={arXiv preprint arXiv:1911.03437},
  year={2019}
}


@article{liu2020mtmtdnn,
  title={The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding},
  author={Liu, Xiaodong and Wang, Yu and Ji, Jianshu and Cheng, Hao and Zhu, Xueyun and Awa, Emmanuel and He, Pengcheng and Chen, Weizhu and Poon, Hoifung and Cao, Guihong and Jianfeng Gao},
  journal={arXiv preprint arXiv:2002.07972},
  year={2020}
}


@article{liu2020alum,
  title={Adversarial Training for Large Neural Language Models},
  author={Liu, Xiaodong and Cheng, Hao and He, Pengcheng and Chen, Weizhu and Wang, Yu and Poon, Hoifung and Gao, Jianfeng},
  journal={arXiv preprint arXiv:2004.08994},
  year={2020}
}

@article{cheng2020posterior,
  title={Posterior Differential Regularization with f-divergence for Improving Model Robustness},
  author={Cheng, Hao and Liu, Xiaodong and Pereira, Lis and Yu, Yaoliang and Gao, Jianfeng},
  journal={arXiv preprint arXiv:2010.12638},
  year={2020}
}

Contact Information

For help or issues using MT-DNN, please submit a GitHub issue.

For personal communication related to this package, please contact Xiaodong Liu ([email protected]), Yu Wang ([email protected]), Pengcheng He ([email protected]), Weizhu Chen ([email protected]), Jianshu Ji ([email protected]), Hao Cheng ([email protected]) or Jianfeng Gao ([email protected]).

Owner
Xiaodong
And if you gaze long into an abyss, the abyss also gazes into you --Friedrich Nietzsche
Xiaodong
Code & Experiments for "LILA: Language-Informed Latent Actions" to be presented at the Conference on Robot Learning (CoRL) 2021.

LILA LILA: Language-Informed Latent Actions Code and Experiments for Language-Informed Latent Actions (LILA), for using natural language to guide assi

Sidd Karamcheti 11 Nov 25, 2022
Finding Biological Plausibility for Adversarially Robust Features via Metameric Tasks

Adversarially-Robust-Periphery Code + Data from the paper "Finding Biological Plausibility for Adversarially Robust Features via Metameric Tasks" by A

Anne Harrington 2 Feb 07, 2022
Meli Data Challenge 2021 - First Place Solution

My solution for the Meli Data Challenge 2021

Matias Moreyra 23 Mar 09, 2022
Extracting knowledge graphs from language models as a diagnostic benchmark of model performance.

Interpreting Language Models Through Knowledge Graph Extraction Idea: How do we interpret what a language model learns at various stages of training?

EPFL Machine Learning and Optimizationย Laboratory 9 Oct 25, 2022
An implementation of Fastformer: Additive Attention Can Be All You Need in TensorFlow

Fast Transformer This repo implements Fastformer: Additive Attention Can Be All You Need by Wu et al. in TensorFlow. Fast Transformer is a Transformer

Rishit Dagli 139 Dec 28, 2022
CausaLM: Causal Model Explanation Through Counterfactual Language Models

CausaLM: Causal Model Explanation Through Counterfactual Language Models Authors: Amir Feder, Nadav Oved, Uri Shalit, Roi Reichart Abstract: Understan

Amir Feder 39 Jul 10, 2022
KIND: an Italian Multi-Domain Dataset for Named Entity Recognition

KIND (Kessler Italian Named-entities Dataset) KIND is an Italian dataset for Named-Entity Recognition. It contains more than one million tokens with t

Digital Humanities 5 Jun 21, 2022
Easily Process a Batch of Cox Models

ezcox: Easily Process a Batch of Cox Models The goal of ezcox is to operate a batch of univariate or multivariate Cox models and return tidy result. โฌ

Shixiang Wang 15 May 23, 2022
Prevent `CUDA error: out of memory` in just 1 line of code.

๐Ÿจ Koila Koila solves CUDA error: out of memory error painlessly. Fix it with just one line of code, and forget it. ๐Ÿš€ Features ๐Ÿ™… Prevents CUDA error

RenChu Wang 1.7k Jan 02, 2023
A Benchmark For Measuring Systematic Generalization of Multi-Hierarchical Reasoning

Orchard Dataset This repository contains the code used for generating the Orchard Dataset, as seen in the Multi-Hierarchical Reasoning in Sequences: S

Bill Pung 1 Jun 05, 2022
[ICLR 2022] DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR

DAB-DETR This is the official pytorch implementation of our ICLR 2022 paper DAB-DETR. Authors: Shilong Liu, Feng Li, Hao Zhang, Xiao Yang, Xianbiao Qi

336 Dec 25, 2022
Who calls the shots? Rethinking Few-Shot Learning for Audio (WASPAA 2021)

rethink-audio-fsl This repo contains the source code for the paper "Who calls the shots? Rethinking Few-Shot Learning for Audio." (WASPAA 2021) Table

Yu Wang 34 Dec 24, 2022
This repository is an implementation of our NeurIPS 2021 paper (Stylized Dialogue Generation with Multi-Pass Dual Learning) in PyTorch.

MPDL---TODO This repository is an implementation of our NeurIPS 2021 paper (Stylized Dialogue Generation with Multi-Pass Dual Learning) in PyTorch. Ci

CodebaseLi 3 Nov 27, 2022
[ECCV2020] Content-Consistent Matching for Domain Adaptive Semantic Segmentation

[ECCV20] Content-Consistent Matching for Domain Adaptive Semantic Segmentation This is a PyTorch implementation of CCM. News: GTA-4K list is available

Guangrui Li 88 Aug 25, 2022
Imagededup - ๐Ÿ˜Ž Finding duplicate images made easy

imagededup is a python package that simplifies the task of finding exact and near duplicates in an image collection.

idealo 4.3k Jan 07, 2023
RefineGNN - Iterative refinement graph neural network for antibody sequence-structure co-design (RefineGNN)

Iterative refinement graph neural network for antibody sequence-structure co-des

Wengong Jin 83 Dec 31, 2022
Python Tensorflow 2 scripts for detecting objects of any class in an image without knowing their label.

Tensorflow-Mobile-Generic-Object-Localizer Python Tensorflow 2 scripts for detecting objects of any class in an image without knowing their label. Ori

Ibai Gorordo 11 Nov 15, 2022
The code for the CVPR 2021 paper Neural Deformation Graphs, a novel approach for globally-consistent deformation tracking and 3D reconstruction of non-rigid objects.

Neural Deformation Graphs Project Page | Paper | Video Neural Deformation Graphs for Globally-consistent Non-rigid Reconstruction Aljaลพ Boลพiฤ, Pablo P

Aljaz Bozic 134 Dec 16, 2022
Pmapper is a super-resolution and deconvolution toolkit for python 3.6+

pmapper pmapper is a super-resolution and deconvolution toolkit for python 3.6+. PMAP stands for Poisson Maximum A-Posteriori, a highly flexible and a

NASA Jet Propulsion Laboratory 8 Nov 06, 2022
Large-scale language modeling tutorials with PyTorch

Large-scale language modeling tutorials with PyTorch ์•ˆ๋…•ํ•˜์„ธ์š”. ์ €๋Š” TUNiB์—์„œ ๋จธ์‹ ๋Ÿฌ๋‹ ์—”์ง€๋‹ˆ์–ด๋กœ ๊ทผ๋ฌด ์ค‘์ธ ๊ณ ํ˜„์›…์ž…๋‹ˆ๋‹ค. ์ด ์ž๋ฃŒ๋Š” ๋Œ€๊ทœ๋ชจ ์–ธ์–ด๋ชจ๋ธ ๊ฐœ๋ฐœ์— ํ•„์š”ํ•œ ์—ฌ๋Ÿฌ๊ฐ€์ง€ ๊ธฐ์ˆ ๋“ค์„ ์†Œ๊ฐœ๋“œ๋ฆฌ๊ธฐ ์œ„ํ•ด ๋งˆ๋ จํ•˜์˜€์œผ๋ฉฐ ๊ธฐ๋ณธ์ ์œผ๋กœ

TUNiB 172 Dec 29, 2022