Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning (ICLR 2021)

Last update: Nov 25, 2022

Citation

Please cite as:

@inproceedings{liu2020understanding,
  title={Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning},
  author={Liu, Xuebo and Wang, Longyue and Wong, Derek F and Ding, Liang and Chao, Lidia S and Tu, Zhaopeng},
  booktitle={International Conference on Learning Representations},
  year={2021}
}

Requirements and Installation

This implementation is based on fairseq (v0.9.0) and requires:

PyTorch version >= 1.2.0
Python version >= 3.6

git clone https://github.com/SunbowLiu/SurfaceFusion
cd SurfaceFusion
pip install --editable .

Preprocess

Download WMT16 En-Ro Data (Original)

tar -zxvf wmt16.tar.gz
PATH_TO_RAW_DATA=wmt16/en-ro
PATH_TO_DATA=wmt16/en-ro/data-bin
python preprocess.py \
    --source-lang en --target-lang ro \
    --trainpref $PATH_TO_RAW_DATA/train/corpus.bpe \
    --validpref $PATH_TO_RAW_DATA/dev/dev.bpe \
    --testpref $PATH_TO_RAW_DATA/test/test.bpe \
    --destdir $PATH_TO_DATA \
    --joined-dictionary \
    --workers 20
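
Before running the preprocessing command above, it can help to confirm that the raw files are where the command expects them. fairseq treats `--trainpref`, `--validpref`, and `--testpref` as prefixes and appends the language code, so six files are needed. The sketch below is only an illustration: the helper names `expected_inputs` and `missing_inputs` are hypothetical and not part of this repository.

```python
from pathlib import Path

# Prefixes copied from the preprocess command; fairseq appends ".<lang>".
PREFIXES = ("train/corpus.bpe", "dev/dev.bpe", "test/test.bpe")


def expected_inputs(raw_dir, src="en", tgt="ro"):
    """Return the six raw files the preprocess command will look for."""
    raw = Path(raw_dir)
    return [raw / f"{prefix}.{lang}" for prefix in PREFIXES for lang in (src, tgt)]


def missing_inputs(raw_dir, src="en", tgt="ro"):
    """Return the expected files that do not exist on disk."""
    return [p for p in expected_inputs(raw_dir, src, tgt) if not p.is_file()]


if __name__ == "__main__":
    for path in missing_inputs("wmt16/en-ro"):
        print(f"missing: {path}")
```

If the script prints nothing, all six files were found under `wmt16/en-ro`.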

Train (8 GPUs)

OUTPUT=checkpoints
python train.py \
    $PATH_TO_DATA \
    --arch transformer_surface_fusion --share-all-embeddings \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 4000 \
    --lr 0.0005 --min-lr 1e-09 \
    --dropout 0.3  --weight-decay 0.0 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --save-dir $OUTPUT --seed 333 --ddp-backend=no_c10d --fp16 \
    --max-tokens 2048 --update-freq 1 --max-update 60000 --keep-last-epochs 1 \
    --surfacefusion att --sf-gate 0.8 --sf-mode hard
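
The learning-rate flags in the command above (`--lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 4000 --lr 0.0005`) describe a linear warmup followed by inverse-square-root decay. The following is a minimal sketch of that schedule as it is commonly implemented in fairseq, written as a standalone function for illustration. The function name `inverse_sqrt_lr` is hypothetical, and the sketch ignores `--min-lr`, which fairseq uses as a stopping threshold.

```python
def inverse_sqrt_lr(step, peak_lr=5e-4, warmup_init_lr=1e-7, warmup_updates=4000):
    """Learning rate after `step` updates under an inverse_sqrt schedule.

    Defaults mirror the training command: --lr, --warmup-init-lr,
    and --warmup-updates.
    """
    if step < warmup_updates:
        # Linear warmup from warmup_init_lr up to peak_lr.
        lr_step = (peak_lr - warmup_init_lr) / warmup_updates
        return warmup_init_lr + step * lr_step
    # After warmup, decay proportionally to 1 / sqrt(step).
    return peak_lr * (warmup_updates ** 0.5) * (step ** -0.5)


if __name__ == "__main__":
    for step in (0, 2000, 4000, 16000, 60000):
        print(step, inverse_sqrt_lr(step))
```

Under these settings the rate peaks at 0.0005 after 4,000 updates and has halved by update 16,000.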

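Note that we use an effective batch size of 16k tokens, i.e., max-tokens * update-freq * num_of_gpus = 2048 * 1 * 8 = 16,384. When training on fewer GPUs, increase --update-freq so that this product stays the same.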
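
The batch-size relation can be turned into a small calculation for choosing `--update-freq` on a different number of GPUs. This is a sketch under the assumption that the 16k-token target should be matched exactly. The helper name `update_freq_for` is hypothetical.

```python
def update_freq_for(target_tokens, max_tokens, num_gpus):
    """Return the --update-freq that gives the target effective batch size.

    Effective batch size = max_tokens * update_freq * num_gpus.
    """
    tokens_per_step = max_tokens * num_gpus
    if target_tokens % tokens_per_step != 0:
        raise ValueError(
            f"{target_tokens} is not a multiple of {tokens_per_step}; "
            "adjust max_tokens or the number of GPUs"
        )
    return target_tokens // tokens_per_step


if __name__ == "__main__":
    target = 2048 * 1 * 8  # the setting from the training command: 16384 tokens
    for gpus in (8, 4, 2, 1):
        print(f"{gpus} GPU(s): --update-freq {update_freq_for(target, 2048, gpus)}")
```

For example, with `--max-tokens 2048` on 2 GPUs, the sketch suggests `--update-freq 4`.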

Evaluation (1 GPU)

python generate.py \
    $PATH_TO_DATA \
    --path $OUTPUT/checkpoint_best.pt \
    --beam 4 --lenpen 1.0 --remove-bpe

The model should reach a BLEU score of approximately 35.1.
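
The `--remove-bpe` flag in the generation command rejoins subword units before scoring. The sketch below shows the idea, assuming the data uses the `@@ ` continuation marker, which is fairseq's default for this flag. The function name `remove_bpe` here is a standalone illustration, not an import from the library.

```python
def remove_bpe(line, bpe_symbol="@@ "):
    """Join BPE pieces back into words by deleting the continuation marker."""
    # A trailing space is added so a marker at the very end of the line
    # is also matched, then stripped again afterwards.
    return (line + " ").replace(bpe_symbol, "").rstrip()


if __name__ == "__main__":
    print(remove_bpe("un@@ der@@ stand@@ ing the mod@@ el"))
```

BLEU is then computed on the rejoined words, not on the subword pieces.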

Owner

Sunbow Liu
