(ACL-IJCNLP 2021) Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models.

Overview

BERT Convolutions

Code for the paper Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models. It contains experiments that integrate convolutions and self-attention in BERT models. The code is adapted from Hugging Face Transformers, and the model code is in src/transformers/modeling_bert.py. Experiments were run with Python 3.6.9 and PyTorch 1.7.1 (see requirements.txt).

Training

To train the SentencePiece tokenizer, use custom_scripts/train_spm_tokenizer.py. To pre-train BERT on a plain-text dataset:

python3 run_language_modeling.py \
--model_type=bert \
--tokenizer_name="./data/sentencepiece/spm.model" \
--config_name="./data/bert_base_config.json" \
--do_train --mlm --line_by_line \
--train_data_file="./data/training_text.txt" \
--per_device_train_batch_size=32 \
--save_steps=25000 \
--block_size=128 \
--max_steps=1000000 \
--warmup_steps=10000 \
--learning_rate=0.0001 --adam_epsilon=1e-6 --weight_decay=0.01 \
--output_dir="./bert-experiments/bert"

The command above produces a cached file of examples (a list of lists of token indices). Each example is a sentence pair that is neither truncated nor padded, but it does include the [CLS] and [SEP] tokens. Convert these lists to an iterable text file with custom_scripts/shuffle_cached_dataset.py. You can then pre-train BERT from the iterable dataset, which saves memory:

python3 run_language_modeling.py \
--model_type=bert \
--tokenizer_name="./data/sentencepiece/spm.model" \
--config_name="./data/bert_base_config.json" \
--do_train --mlm --train_iterable --line_by_line \
--train_data_file="./data/iterable_pairs_train.txt" \
--per_device_train_batch_size=32 \
--save_steps=25000 \
--block_size=128 \
--max_steps=1000000 \
--warmup_steps=10000 \
--learning_rate=0.0001 --adam_epsilon=1e-6 --weight_decay=0.01 \
--output_dir="./bert-experiments/bert"
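The conversion step can be pictured with a minimal stdlib-only sketch. It makes two assumptions. First, the cached examples are already loaded in memory as a list of lists of token indices. Second, the iterable file stores one example per line as space-separated indices. The on-disk format and the helper names write_iterable_dataset and read_iterable_dataset are hypothetical and may not match custom_scripts/shuffle_cached_dataset.py.

```python
import os
import random
import tempfile


def write_iterable_dataset(examples, out_path, seed=0):
    """Shuffle examples and write one per line as space-separated token ids.

    examples: list of lists of token indices (e.g. sentence pairs that already
    include [CLS] and [SEP] ids). The line format is an assumption of this sketch.
    """
    order = list(range(len(examples)))
    random.Random(seed).shuffle(order)  # reproducible shuffle
    with open(out_path, "w", encoding="utf-8") as f:
        for idx in order:
            f.write(" ".join(str(t) for t in examples[idx]) + "\n")


def read_iterable_dataset(path):
    """Yield examples lazily, one line at a time, as lists of ints."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield [int(t) for t in line.split()]


if __name__ == "__main__":
    cached = [[2, 15, 3, 27, 3], [2, 8, 3, 9, 11, 3]]  # toy token ids
    path = os.path.join(tempfile.mkdtemp(), "iterable_pairs_train.txt")
    write_iterable_dataset(cached, path)
    for example in read_iterable_dataset(path):
        print(example)
```

Reading one line at a time keeps memory use flat. Only the current example needs to be held in memory, not the whole corpus.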

Optional flags to change the BERT architecture when pre-training from scratch:
In the following, qk is query/key self-attention, convfixed is a fixed lightweight convolution, convq is a query-based dynamic lightweight convolution (relative embeddings), convk is a key-based dynamic lightweight convolution, and convolution is a fixed depthwise convolution.

--attention_kernel="qk_convfixed_convq_convk [num_positions_each_dir]"
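The following pure-Python sketch shows one way the four attention_kernel components could combine into a single attention score. It assumes that each component adds a term to the pre-softmax logit for query position i and key position j whenever |j - i| <= num_positions_each_dir:

- convfixed adds a learned scalar per offset.
- convq adds the query dotted with a relative embedding.
- convk adds the key dotted with a relative embedding.

The names beta, rel_q and rel_k, and the exact combination, are assumptions for illustration. They are not the code in modeling_bert.py.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def composite_attention(q, k, v, beta, rel_q, rel_k, window):
    """Single-head attention with relative-position terms (illustrative only).

    q, k, v: lists of n vectors; q and k vectors have length d.
    beta:  {offset: float}  fixed lightweight convolution weight ("convfixed").
    rel_q: {offset: vector} relative embedding dotted with the query ("convq").
    rel_k: {offset: vector} relative embedding dotted with the key ("convk").
    window: num_positions_each_dir; offsets j - i outside [-window, window]
            get only the query/key term (an assumption of this sketch).
    """
    n, d = len(q), len(q[0])
    outputs = []
    for i in range(n):
        logits = []
        for j in range(n):
            offset = j - i
            score = dot(q[i], k[j]) / math.sqrt(d)  # "qk" term
            if abs(offset) <= window:
                score += beta[offset]               # "convfixed" term
                score += dot(q[i], rel_q[offset])   # "convq" term
                score += dot(k[j], rel_k[offset])   # "convk" term
            logits.append(score)
        weights = softmax(logits)
        # Output at position i is the attention-weighted sum of value vectors.
        outputs.append([sum(w * v[j][c] for j, w in enumerate(weights))
                        for c in range(len(v[0]))])
    return outputs
```

If every learned term is zero, each position attends uniformly to all positions. A large beta[0] instead makes each position attend mostly to itself. This shows how a fixed lightweight convolution can bias attention by relative position, independently of token content.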

Remove absolute position embeddings:

--remove_position_embeddings
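As a rough illustration of what --remove_position_embeddings changes, the sketch below builds one input vector per token. It sums a token embedding and, optionally, an absolute position embedding. This is a simplification: real BERT inputs also add token type embeddings and apply LayerNorm and dropout, and the tables here are hypothetical plain lists.

Without the position term, a repeated token receives the same input vector wherever it appears. Any word-order information must then come from other components, such as the relative (convolutional) attention kernels listed above.

```python
def embed(token_ids, token_table, position_table, use_position_embeddings=True):
    """Return one input vector per token (simplified BERT input layer).

    token_table[t] and position_table[p] are equal-length float vectors.
    Token type embeddings, LayerNorm and dropout are deliberately omitted.
    """
    vectors = []
    for pos, tok in enumerate(token_ids):
        vec = list(token_table[tok])
        if use_position_embeddings:
            vec = [a + b for a, b in zip(vec, position_table[pos])]
        vectors.append(vec)
    return vectors


if __name__ == "__main__":
    token_table = [[0.0, 0.0], [1.0, 2.0]]      # toy vocabulary of two tokens
    position_table = [[0.5, 0.0], [0.25, 0.0]]  # toy absolute positions
    print(embed([1, 1], token_table, position_table))         # rows differ
    print(embed([1, 1], token_table, position_table, False))  # rows identical
```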

Convolutional values, using depthwise-separable (depth) convolutions for half of the heads (mixed) and no activation function (no_act) between the depthwise and pointwise convolutions:

--value_forward="convolution_depth_mixed_no_act [num_positions_each_dir] [num_convolution_groups]"
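The value_forward option builds on depthwise-separable convolutions. The stdlib-only sketch below applies one such convolution to a sequence of hidden vectors (positions x channels). It rests on three assumptions:

- The kernel width is 2 * num_positions_each_dir + 1.
- Positions past the sequence edges are treated as zeros.
- num_convolution_groups splits the channels of the pointwise step into independent groups.

That reading of the flag and all function names are hypothetical. The "mixed" split across heads is not modeled.

```python
def depthwise_conv1d(x, kernels, window):
    """Convolve each channel of x (n x c) with its own kernel of width 2*window+1.

    Positions beyond the sequence edges are treated as zeros.
    """
    n, c = len(x), len(x[0])
    out = [[0.0] * c for _ in range(n)]
    for t in range(n):
        for ch in range(c):
            acc = 0.0
            for offset in range(-window, window + 1):
                s = t + offset
                if 0 <= s < n:
                    acc += kernels[ch][offset + window] * x[s][ch]
            out[t][ch] = acc
    return out


def grouped_pointwise(x, weights, groups):
    """Kernel-size-1 convolution that mixes channels only within each group.

    weights[g] is a (c // groups) x (c // groups) matrix for group g.
    """
    c = len(x[0])
    assert c % groups == 0, "channels must divide evenly into groups"
    size = c // groups
    out = []
    for row in x:
        new_row = []
        for g in range(groups):
            chunk = row[g * size:(g + 1) * size]
            for r in range(size):
                new_row.append(sum(weights[g][r][j] * chunk[j] for j in range(size)))
        out.append(new_row)
    return out


def depthwise_separable_no_act(x, kernels, weights, window, groups):
    # "no_act": the depthwise output feeds the pointwise step directly,
    # with no nonlinearity in between.
    return grouped_pointwise(depthwise_conv1d(x, kernels, window), weights, groups)


if __name__ == "__main__":
    x = [[1.0], [2.0], [3.0]]  # 3 positions, 1 channel
    print(depthwise_separable_no_act(x, [[1.0, 1.0, 1.0]], [[[1.0]]], window=1, groups=1))
    # [[3.0], [6.0], [5.0]]
```

With no activation in between, the two steps compose into a single linear map. The result is a convolution whose weight is factorized into a per-channel kernel and a channel-mixing matrix.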

Convolutional queries/keys for half of the heads:

--qk="convolution_depth_mixed_no_act [num_positions_each_dir] [num_convolution_groups]"

Fine-tuning

Training and evaluation on the downstream GLUE tasks (note: --per_device_train_batch_size acts as a maximum batch size, because the batch size is adjusted for each task):

python3 run_glue.py \
--data_dir="./glue-data/data-tsv" \
--task_name=ALL \
--save_steps=9999999 \
--max_seq_length 128 \
--per_device_train_batch_size 99999 \
--tokenizer_name="./data/sentencepiece/spm.model" \
--model_name_or_path="./bert-experiments/bert" \
--output_dir="./bert-experiments/bert-glue" \
--hyperparams="electra_base" \
--do_eval \
--do_train

Prediction

Run the fine-tuned models on the GLUE test set. This adds a file with test set predictions to each GLUE task directory:

python3 run_glue.py \
--data_dir="./glue-data/data-tsv" \
--task_name=ALL \
--save_steps=9999999 \
--max_seq_length 128 \
--per_device_train_batch_size 99999 \
--tokenizer_name="./data/sentencepiece/spm.model" \
--model_name_or_path="./bert-experiments/placeholder" \
--output_dir="./bert-experiments/bert-glue" \
--hyperparams="electra_base" \
--do_predict

Then, compile the test results into one directory. The test_results directory will contain the test predictions from the fine-tuned model with the highest dev set score for each task. The files in test_results can be zipped and submitted to the GLUE benchmark site for evaluation.

python3 custom_scripts/parse_glue.py \
--input="./bert-experiments/bert-glue" \
--test_dir="./bert-experiments/bert-glue/test_results"
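The two steps described above can be sketched with the standard library:

1. Pick, for each task, the run with the best dev set score.
2. Zip the resulting prediction files for submission.

The {task: {run: score}} layout, the file names, and the helper names select_best_runs and zip_test_results are hypothetical. They do not describe how custom_scripts/parse_glue.py is actually implemented.

```python
import os
import tempfile
import zipfile


def select_best_runs(dev_scores):
    """Map each task to the run with the highest dev set score.

    dev_scores: {task_name: {run_name: dev_score}} (hypothetical layout).
    """
    return {task: max(runs, key=runs.get) for task, runs in dev_scores.items()}


def zip_test_results(test_dir, zip_path):
    """Zip the top-level files of test_dir into zip_path for submission."""
    zip_abs = os.path.abspath(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(os.listdir(test_dir)):
            path = os.path.join(test_dir, name)
            # Skip subdirectories and the archive itself if it lives in test_dir.
            if os.path.isfile(path) and os.path.abspath(path) != zip_abs:
                zf.write(path, arcname=name)


if __name__ == "__main__":
    print(select_best_runs({"CoLA": {"run_a": 0.55, "run_b": 0.61}}))  # toy scores
    # {'CoLA': 'run_b'}
    test_dir = tempfile.mkdtemp()
    with open(os.path.join(test_dir, "CoLA.tsv"), "w") as f:
        f.write("placeholder\n")
    archive = os.path.join(tempfile.mkdtemp(), "submission.zip")
    zip_test_results(test_dir, archive)
    with zipfile.ZipFile(archive) as zf:
        print(zf.namelist())  # ['CoLA.tsv']
```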

Citation

@inproceedings{chang-etal-2021-convolutions,
  title={Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models},
  author={Tyler Chang and Yifan Xu and Weijian Xu and Zhuowen Tu},
  booktitle={ACL-IJCNLP 2021},
  year={2021},
}
Owner
mlpc-ucsd
1 Jun 28, 2022
A Namuwiki dataset segmented into sentences. Download it from Releases or via tfds-korean.

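Namuwiki corpus: a Namuwiki corpus pre-segmented into sentences. Because the dataset is intended for use in language models and similar applications, links, images, tables, and so on have been removed. Sentence segmentation was done with kss. The license is CC BY-NC-SA 2.0, as stated on Namuwiki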

Jeong Ukjae 16 Apr 02, 2022
Bidirectional LSTM-CRF and ELMo for Named-Entity Recognition, Part-of-Speech Tagging and so on.

anaGo anaGo is a Python library for sequence labeling (NER, PoS tagging, ...), implemented in Keras. anaGo can solve sequence labeling tasks such as nam

Hiroki Nakayama 1.5k Dec 05, 2022
Paddle2.x version AI-Writer

Paddle2.x version of AI-Writer: uses a heavily modified GPT to generate web novels. Tuned GPT for novel generation.

yujun 74 Jan 04, 2023
TextFlint is a multilingual robustness evaluation platform for natural language processing tasks,

TextFlint is a multilingual robustness evaluation platform for natural language processing tasks, which unifies general text transformation, task-specific transformation, adversarial attack, sub-popu

TextFlint 587 Dec 20, 2022
[ICLR 2021 Spotlight] Pytorch implementation for "Long-tailed Recognition by Routing Diverse Distribution-Aware Experts."

RIDE: Long-tailed Recognition by Routing Diverse Distribution-Aware Experts. by Xudong Wang, Long Lian, Zhongqi Miao, Ziwei Liu and Stella X. Yu at UC

Xudong (Frank) Wang 205 Dec 16, 2022
A machine learning model for analyzing text for user sentiment and determining whether it's a positive, neutral, or negative review.

Sentiment Analysis on Yelp's Dataset Author: Roberto Sanchez, Talent Path: D1 Group Docker Deployment: Deployment of this application can be found her

Roberto Sanchez 0 Aug 04, 2021
Code-autocomplete, a code completion plugin for Python

Code AutoComplete code-autocomplete, a code completion plugin for Python.

xuming 13 Jan 07, 2023
Input English text, then translate it between languages n times using the Deep Translator Python Library.

mass-translator About Input English text, then translate it between languages n times using the Deep Translator Python Library. How to Use Install dep

2 Mar 04, 2022
NLP-SentimentAnalysis - Coursera Course ( Duration : 5 weeks ) offered by DeepLearning.AI

Coursera Natural Language Processing Specialization This repository contains material related to Coursera Natural Language Processing Specialization.

Nishant Sharma 1 Jun 05, 2022
Header-only C++ HNSW implementation with python bindings

Hnswlib - fast approximate nearest neighbor search Header-only C++ HNSW implementation with python bindings. NEWS: version 0.6 Thanks to (@dyashuni) h

2.3k Jan 05, 2023
GPT-2 Model for Leetcode Questions in python

Leetcode using AI 🤖 GPT-2 Model for Leetcode Questions in python New demo here: https://huggingface.co/spaces/gagan3012/project-code-py Note: the Ans

Gagan Bhatia 100 Dec 12, 2022
Community and sentiment analysis based on tweets

The project has set itself the goal of analyzing the thoughts and interaction of Italian users through the social posts expressed through the Twitter platform on the day of the entry into force of th

3 Nov 17, 2022
Almost State-of-the-art Text Generation library

Ps: we are adding transformer model soon Text Gen 🐐 Almost State-of-the-art Text Generation library Text gen is a python library that allow you build

Emeka boris ama 63 Jun 24, 2022
Auto_code_complete is an auto word-completion program that you can customize to your needs

auto_code_complete is an auto word-completion program that you can customize to your needs. The model for this program is one of the deep-learning NLP (Natural Language Processing) model struc

RUO 2 Feb 22, 2022
Simple python code to fix your combo list by removing any text after a separator or removing duplicate combos

Combo List Fixer A simple python code to fix your combo list by removing any text after a separator or removing duplicate combos Removing any text aft

Hamidreza Dehghan 3 Dec 05, 2022
Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": UNITER adversarial training part

VILLA: Vision-and-Language Adversarial Training This is the official repository of VILLA (NeurIPS 2020 Spotlight). This repository currently supports

Zhe Gan 109 Dec 31, 2022
BookNLP, a natural language processing pipeline for books

BookNLP BookNLP is a natural language processing pipeline that scales to books and other long documents (in English), including: Part-of-speech taggin

654 Jan 02, 2023
Code for text augmentation method leveraging large-scale language models

HyperMix Code for our paper GPT3Mix and conducting classification experiments using GPT-3 prompt-based data augmentation. Getting Started Installing P

NAVER AI 47 Dec 20, 2022