Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning (ICLR 2021)

Last update: Nov 25, 2022

Related tags

Overview

Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning (ICLR 2021)

Citation

Please cite as:

@inproceedings{liu2020understanding,
  title={Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning},
  author={Liu, Xuebo and Wang, Longyue and Wong, Derek F and Ding, Liang and Chao, Lidia S and Tu, Zhaopeng},
  booktitle={International Conference on Learning Representations},
  year={2021}
}

Requirements and Installation

This implementation is based on fairseq(v0.9.0)

PyTorch version >= 1.2.0
Python version >= 3.6

git clone https://github.com/SunbowLiu/SurfaceFusion
cd SurfaceFusion
pip install --editable .

Preprocess

Download WMT16 En-Ro Data (Original)

tar -zxvf wmt16.tar.gz
PATH_TO_RAW_DATA=wmt16/en-ro
PATH_TO_DATA=wmt16/en-ro/data-bin
python preprocess.py \
    --source-lang en --target-lang ro \
    --trainpref $PATH_TO_RAW_DATA/train/corpus.bpe \
    --validpref $PATH_TO_RAW_DATA/dev/dev.bpe \
    --testpref $PATH_TO_RAW_DATA/test/test.bpe \
    --destdir $PATH_TO_DATA \
    --joined-dictionary \
    --workers 20

Train (8 gpus)

OUTPUT=checkpoints
python train.py \
    $PATH_TO_DATA \
    --arch transformer_surface_fusion --share-all-embeddings \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 4000 \
    --lr 0.0005 --min-lr 1e-09 \
    --dropout 0.3  --weight-decay 0.0 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --save-dir $OUTPUT --seed 333 --ddp-backend=no_c10d --fp16 \
    --max-tokens 2048 --update-freq 1 --max-update 60000 --keep-last-epochs 1 \
    --surfacefusion att --sf-gate 0.8 --sf-mode hard

It is noted that we use 16k batch size, i.e., max-tokens * update-freq * num_of_gpus = 16k.

Evaluation (1 gpu)

python generate.py \
    $PATH_TO_DATA \
    --path $OUTPUT/checkpoint_best.pt \
    --beam 4 --lenpen 1.0 --remove-bpe

The model can gain nearly 35.1 BLEU scores.

Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning (ICLR 2021)

Related tags

Overview

Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning (ICLR 2021)

Citation

Requirements and Installation

Preprocess

Train (8 gpus)

Evaluation (1 gpu)

Owner

Sunbow Liu

Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations. [2021]

Fortuitous Forgetting in Connectionist Networks

Computations and statistics on manifolds with geometric structures.

Deep Watershed Transform for Instance Segmentation

The repo contains the code of the ACL2020 paper `Dice Loss for Data-imbalanced NLP Tasks`

Poisson Surface Reconstruction for LiDAR Odometry and Mapping

Do Smart Glasses Dream of Sentimental Visions? Deep Emotionship Analysis for Eyewear Devices

PyTorch Implementation of NCSOFT's FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis

EMNLP 2021: Single-dataset Experts for Multi-dataset Question-Answering

Development Kit for the SoccerNet Challenge

P-Tuning v2: Prompt Tuning Can Be Comparable to Finetuning Universally Across Scales and Tasks

ParaGen is a PyTorch deep learning framework for parallel sequence generation

A CV toolkit for my papers.

Unofficial PyTorch implementation of MobileViT based on paper "MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer".

A free, multiplatform SDK for real-time facial motion capture using blendshapes, and rigid head pose in 3D space from any RGB camera, photo, or video.

🤖 Project template for your next awesome AI project. 🦾

A collection of models for image<->text generation in ACM MM 2021.

Tensorflow Implementation of SMU: SMOOTH ACTIVATION FUNCTION FOR DEEP NETWORKS USING SMOOTHING MAXIMUM TECHNIQUE

Notebooks em Python para Métodos Eletromagnéticos

Official git for "CTAB-GAN: Effective Table Data Synthesizing"