Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning (ICLR 2021)

Overview

Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning (ICLR 2021)

Citation

Please cite as:

@inproceedings{liu2020understanding,
  title={Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning},
  author={Liu, Xuebo and Wang, Longyue and Wong, Derek F and Ding, Liang and Chao, Lidia S and Tu, Zhaopeng},
  booktitle={International Conference on Learning Representations},
  year={2021}
}

Requirements and Installation

This implementation is based on fairseq(v0.9.0)

  • PyTorch version >= 1.2.0
  • Python version >= 3.6
git clone https://github.com/SunbowLiu/SurfaceFusion
cd SurfaceFusion
pip install --editable .

Preprocess

Download WMT16 En-Ro Data (Original)

tar -zxvf wmt16.tar.gz
PATH_TO_RAW_DATA=wmt16/en-ro
PATH_TO_DATA=wmt16/en-ro/data-bin
python preprocess.py \
    --source-lang en --target-lang ro \
    --trainpref $PATH_TO_RAW_DATA/train/corpus.bpe \
    --validpref $PATH_TO_RAW_DATA/dev/dev.bpe \
    --testpref $PATH_TO_RAW_DATA/test/test.bpe \
    --destdir $PATH_TO_DATA \
    --joined-dictionary \
    --workers 20

Train (8 gpus)

OUTPUT=checkpoints
python train.py \
    $PATH_TO_DATA \
    --arch transformer_surface_fusion --share-all-embeddings \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 4000 \
    --lr 0.0005 --min-lr 1e-09 \
    --dropout 0.3  --weight-decay 0.0 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --save-dir $OUTPUT --seed 333 --ddp-backend=no_c10d --fp16 \
    --max-tokens 2048 --update-freq 1 --max-update 60000 --keep-last-epochs 1 \
    --surfacefusion att --sf-gate 0.8 --sf-mode hard

It is noted that we use 16k batch size, i.e., max-tokens * update-freq * num_of_gpus = 16k.

Evaluation (1 gpu)

python generate.py \
    $PATH_TO_DATA \
    --path $OUTPUT/checkpoint_best.pt \
    --beam 4 --lenpen 1.0 --remove-bpe

The model can gain nearly 35.1 BLEU scores.

Owner
Sunbow Liu
NLP&MT PhD Student
Sunbow Liu
Learnable Multi-level Frequency Decomposition and Hierarchical Attention Mechanism for Generalized Face Presentation Attack Detection

LMFD-PAD Note This is the official repository of the paper: LMFD-PAD: Learnable Multi-level Frequency Decomposition and Hierarchical Attention Mechani

28 Dec 02, 2022
The PyTorch implementation of Directed Graph Contrastive Learning (DiGCL), NeurIPS-2021

Directed Graph Contrastive Learning The PyTorch implementation of Directed Graph Contrastive Learning (DiGCL). In this paper, we present the first con

Tong Zekun 28 Jan 08, 2023
Simple torch.nn.module implementation of Alias-Free-GAN style filter and resample

Alias-Free-Torch Simple torch module implementation of Alias-Free GAN. This repository including Alias-Free GAN style lowpass sinc filter @filter.py A

이준혁(Junhyeok Lee) 64 Dec 22, 2022
A parallel framework for population-based multi-agent reinforcement learning.

MALib: A parallel framework for population-based multi-agent reinforcement learning MALib is a parallel framework of population-based learning nested

MARL @ SJTU 348 Jan 08, 2023
EmoTag helps you train emotion detection model for Chinese audios

emoTag emoTag helps you train emotion detection model for Chinese audios. Environment pip install -r requirement.txt Data We used Emotional Speech Dat

_zza 4 Sep 07, 2022
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped

CSWin-Transformer This repo is the official implementation of "CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows". Th

Microsoft 409 Jan 06, 2023
A Simulation Environment to train Robots in Large Realistic Interactive Scenes

iGibson: A Simulation Environment to train Robots in Large Realistic Interactive Scenes iGibson is a simulation environment providing fast visual rend

Stanford Vision and Learning Lab 493 Jan 04, 2023
MolRep: A Deep Representation Learning Library for Molecular Property Prediction

MolRep: A Deep Representation Learning Library for Molecular Property Prediction Summary MolRep is a Python package for fairly measuring algorithmic p

AI-Health @NSCC-gz 83 Dec 24, 2022
Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes

Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized C

Sam Bond-Taylor 139 Jan 04, 2023
Code needed to reproduce the examples found in "The Temporal Robustness of Stochastic Signals"

The Temporal Robustness of Stochastic Signals Code needed to reproduce the examples found in "The Temporal Robustness of Stochastic Signals" Case stud

0 Oct 28, 2021
Composable transformations of Python+NumPy programsComposable transformations of Python+NumPy programs

Chex Chex is a library of utilities for helping to write reliable JAX code. This includes utils to help: Instrument your code (e.g. assertions) Debug

DeepMind 506 Jan 08, 2023
Code to reproduce the experiments in the paper "Transformer Based Multi-Source Domain Adaptation" (EMNLP 2020)

Transformer Based Multi-Source Domain Adaptation Dustin Wright and Isabelle Augenstein To appear in EMNLP 2020. Read the preprint: https://arxiv.org/a

CopeNLU 36 Dec 05, 2022
Machine learning library for fast and efficient Gaussian mixture models

This repository contains code which implements the Stochastic Gaussian Mixture Model (S-GMM) for event-based datasets Dependencies CMake Premake4 Blaz

Omar Oubari 1 Dec 19, 2022
Safe Policy Optimization with Local Features

Safe Policy Optimization with Local Feature (SPO-LF) This is the source-code for implementing the algorithms in the paper "Safe Policy Optimization wi

Akifumi Wachi 6 Jun 05, 2022
Code for training and evaluation of the model from "Language Generation with Recurrent Generative Adversarial Networks without Pre-training"

Language Generation with Recurrent Generative Adversarial Networks without Pre-training Code for training and evaluation of the model from "Language G

Amir Bar 253 Sep 14, 2022
This project is a loose implementation of paper "Algorithmic Financial Trading with Deep Convolutional Neural Networks: Time Series to Image Conversion Approach"

Stock Market Buy/Sell/Hold prediction Using convolutional Neural Network This repo is an attempt to implement the research paper titled "Algorithmic F

Asutosh Nayak 136 Dec 28, 2022
PyTorch-based framework for Deep Hedging

PFHedge: Deep Hedging in PyTorch PFHedge is a PyTorch-based framework for Deep Hedging. PFHedge Documentation Neural Network Architecture for Efficien

139 Dec 30, 2022
Gradient-free global optimization algorithm for multidimensional functions based on the low rank tensor train format

ttopt Description Gradient-free global optimization algorithm for multidimensional functions based on the low rank tensor train (TT) format and maximu

5 May 23, 2022
PyTorch deep learning projects made easy.

PyTorch Template Project PyTorch deep learning project made easy. PyTorch Template Project Requirements Features Folder Structure Usage Config file fo

Victor Huang 3.8k Jan 01, 2023
[SIGGRAPH 2022 Journal Track] AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars

AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars Fangzhou Hong1*  Mingyuan Zhang1*  Liang Pan1  Zhongang Cai1,2,3  Lei Yang2 

Fangzhou Hong 749 Jan 04, 2023