Code for the paper "Are Sixteen Heads Really Better than One?"

Overview

Are Sixteen Heads Really Better than One?

This repository contains code to reproduce the experiments in our paper Are Sixteen Heads Really Better than One?.

Prerequisite

First, you will need python >=3.6 with pytorch>=1.0. Then, clone our forks of fairseq (for MT experiments) and pytorch-pretrained-BERT (for BERT):

# Fairseq
git clone https://github.com/pmichel31415/fairseq
# Pytorch pretrained BERT
git clone https://github.com/pmichel31415/pytorch-pretrained-BERT
cd pytorch-pretrained-BERT
git checkout paul
cd ..

If you are running into issues with pytorch-pretrained-BERT (because you have another version installed globally for instance), check out this work around (thanks @insop).

You will also need sacrebleu to evaluate BLEU score (pip install sacrebleu).

Ablation experiments

BERT

Running

bash experiments/BERT/heads_ablation.sh MNLI

Will fine-tune a pretrained BERT on MNLI (stored in ./models/MNLI) and perform the individual head ablation experiment from Section 3.1 in the paper alternatively you can run the experiment with CoLA, MRCP or SST-2 as a task in place of MNLI.

MT

You can obtain the pretrained WMT model from this link from the fairseq repo now this link. Use the Moses tokenizer and subword-nmt in conjunction to the BPE codes provided with the pretrained model to prepair any input file you want. Then run:

bash experiments/MT/wmt_ablation.sh $BPE_SEGMENTED_SRC_FILE $DETOKENIZED_REF_FILE

Systematic Pruning Experiments

BERT

To iteratively prune 10% heads in order of increasing importance run

bash experiments/BERT/heads_pruning.sh MNLI --normalize_pruning_by_layer

This will reuse the BERT model fine-tuned if you have run the ablation experiment before (otherwise it'll just fine-tune it for you). The output of this is very verbose, but you can get the gist of the result by calling grep "strategy\|results" -A1 on the output.

WMT

Similarly, just run:

bash experiments/MT/prune_wmt.sh $BPE_SEGMENTED_SRC_FILE $DETOKENIZED_REF_FILE

You might want to change the paths in the experiment files to point to the binarized fairseq dataset on whic you want to estimate importance scores.

Owner
Paul Michel
Laplace Postdoctoral Chair in Data Science at École Normale Supérieure, Paris
Paul Michel
profile tools for pytorch nn models

nnprof Introduction nnprof is a profile tool for pytorch neural networks. Features multi profile mode: nnprof support 4 profile mode: Layer level, Ope

Feng Wang 42 Jul 09, 2022
Big Bird: Transformers for Longer Sequences

BigBird, is a sparse-attention based transformer which extends Transformer based models, such as BERT to much longer sequences. Moreover, BigBird comes along with a theoretical understanding of the c

Google Research 457 Dec 23, 2022
Chinese segmentation library

What is loso? loso is a Chinese segmentation system written in Python. It was developed by Victor Lin ( Fang-Pen Lin 82 Jun 28, 2022

Official code for "Parser-Free Virtual Try-on via Distilling Appearance Flows", CVPR 2021

Parser-Free Virtual Try-on via Distilling Appearance Flows, CVPR 2021 Official code for CVPR 2021 paper 'Parser-Free Virtual Try-on via Distilling App

395 Jan 03, 2023
SentimentArcs: a large ensemble of dozens of sentiment analysis models to analyze emotion in text over time

SentimentArcs - Emotion in Text An end-to-end pipeline based on Jupyter notebooks to detect, extract, process and anlayze emotion over time in text. E

jon_chun 14 Dec 19, 2022
DomainWordsDict, Chinese words dict that contains more than 68 domains, which can be used as text classification、knowledge enhance task

DomainWordsDict, Chinese words dict that contains more than 68 domains, which can be used as text classification、knowledge enhance task。涵盖68个领域、共计916万词的专业词典知识库,可用于文本分类、知识增强、领域词汇库扩充等自然语言处理应用。

liuhuanyong 357 Dec 24, 2022
Saptak Bhoumik 14 May 24, 2022
TextFlint is a multilingual robustness evaluation platform for natural language processing tasks,

TextFlint is a multilingual robustness evaluation platform for natural language processing tasks, which unifies general text transformation, task-specific transformation, adversarial attack, sub-popu

TextFlint 587 Dec 20, 2022
Conversational-AI-ChatBot - Intelligent ChatBot built with Microsoft's DialoGPT transformer to make conversations with human users!

Conversational AI ChatBot Intelligent ChatBot built with Microsoft's DialoGPT transformer to make conversations with human users! In this project? Thi

Rajkumar Lakshmanamoorthy 6 Nov 30, 2022
Reproducing the Linear Multihead Attention introduced in Linformer paper (Linformer: Self-Attention with Linear Complexity)

Linear Multihead Attention (Linformer) PyTorch Implementation of reproducing the Linear Multihead Attention introduced in Linformer paper (Linformer:

Kui Xu 58 Dec 23, 2022
Write Alphabet, Words and Sentences with your eyes.

The-Next-Gen-AI-Eye-Writer The Eye tracking Technique has become one of the most popular techniques within the human and computer interaction era, thi

Rohan Kasabe 2 Apr 05, 2022
Course project of [email protected]

NaiveMT Prepare Clone this repository git clone [email protected]:Poeroz/NaiveMT.git

Poeroz 2 Apr 24, 2022
KakaoBrain KoGPT (Korean Generative Pre-trained Transformer)

KoGPT KoGPT (Korean Generative Pre-trained Transformer) https://github.com/kakaobrain/kogpt https://huggingface.co/kakaobrain/kogpt Model Descriptions

Kakao Brain 797 Dec 26, 2022
Transcribing audio files using Hugging Face's implementation of Wav2Vec2 + "chain-linking" NLP tasks to combine speech-to-text with downstream tasks like translation and summarisation.

PART 2: CHAIN LINKING AUDIO-TO-TEXT NLP TASKS 2A: TRANSCRIBE-TRANSLATE-SENTIMENT-ANALYSIS In notebook3.0, I demo a simple workflow to: transcribe a lo

Chua Chin Hon 30 Jul 13, 2022
PyTorch Implementation of Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation

StyleSpeech - PyTorch Implementation PyTorch Implementation of Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation. Status (2021.06.09

Keon Lee 142 Jan 06, 2023
多语言降噪预训练模型MBart的中文生成任务

mbart-chinese 基于mbart-large-cc25 的中文生成任务 Input source input: text + /s + lang_code target input: lang_code + text + /s Usage token_ids_mapping.jso

11 Sep 19, 2022
Random-Word-Generator - Generates meaningful words from dictionary with given no. of letters and words.

Random Word Generator Generates meaningful words from dictionary with given no. of letters and words. This might be useful for generating short links

Mohammed Rabil 1 Jan 01, 2022
✨Rubrix is a production-ready Python framework for exploring, annotating, and managing data in NLP projects.

✨A Python framework to explore, label, and monitor data for NLP projects

Recognai 1.5k Jan 02, 2023
⚖️ A Statutory Article Retrieval Dataset in French.

A Statutory Article Retrieval Dataset in French This repository contains the Belgian Statutory Article Retrieval Dataset (BSARD), as well as the code

Maastricht Law & Tech Lab 19 Nov 17, 2022
An IVR Chatbot which can exponentially reduce the burden of companies as well as can improve the consumer/end user experience.

IVR-Chatbot Achievements 🏆 Team Uhtred won the Maverick 2.0 Bot-a-thon 2021 organized by AbInbev India. ❓ Problem Statement As we all know that, lot

ARYAMAAN PANDEY 9 Dec 08, 2022