Code for "Finetuning Pretrained Transformers into Variational Autoencoders"

Last update: Nov 26, 2022

Overview

transformers-into-vaes

Code for Finetuning Pretrained Transformers into Variational Autoencoders (our submission to NLP Insights Workshop 2021).

Gathering data used in the paper:

Download all data (penn, snli, yahoo, yelp) from this repository.
Change data path in base_models.py accordingly.

Running experiments:

Install dependencies.

pip install -r requirements.txt

Run phase 1 (encoder only training):

./run_encoder_training snli

Run phase 2 (full training):

./run_training snli <path_to_checkpoint_from_phase_1>

Calculating metrics:

python evaluate_all.py -d snli -bs 256 -c <path_to_config_file> -ckpt <path_to_checkpoint_file>

Owner

Seongmin Park

NLP Researcher at actionpower.kr Maintainer @hamanlp

GitHub Repository

aMLP Transformer Model for Japanese

aMLP-japanese Japanese aMLP Pretrained Model aMLPとは、Liu, Daiらが提案する、Transformerモデルです。ざっくりというと、BERTの代わりに使えて、より性能の良いモデルです。詳しい解説は、こちらの記事などを参考にしてください。この

13 Aug 11, 2022

OpenAI CLIP text encoders for multiple languages!

Multilingual-CLIP OpenAI CLIP text encoders for any language Colab Notebook · Pre-trained Models · Report Bug Overview OpenAI recently released the pa

481 Dec 30, 2022

Protein Language Model

ProteinLM We pretrain protein language model based on Megatron-LM framework, and then evaluate the pretrained model results on TAPE (Tasks Assessing P

77 Dec 27, 2022

A Paper List for Speech Translation

Keyword: Speech Translation, Spoken Language Processing, Natural Language Processing

138 Dec 24, 2022

Meta learning algorithms to train cross-lingual NLI (multi-task) models

4 Nov 20, 2022

2021 AI CUP Competition on Traditional Chinese Scene Text Recognition - Intermediate Contest

繁體中文場景文字辨識程式碼說明組別：這就是我成員：蔣明憲唐碩謙黃玥菱林冠霆蕭靖騰目錄環境套件安裝方式資料夾布局前處理-製作偵測訓練註解檔前處理-製作分類訓練樣本 part.py ：從 json 裁切出分類訓練樣本 Class.py ：將切出來的樣本按照文字分類到各資料夾

3 Jan 14, 2022

Mysticbbs-rjam - rJAM splitscreen message reader for MysticBBS A46+

rJAM splitscreen message reader for MysticBBS A46+

4 Nov 22, 2022

Index different CKAN entities in Solr, not just datasets

ckanext-sitesearch Index different CKAN entities in Solr, not just datasets Requirements This extension requires CKAN 2.9 or higher and Python 3 Featu

3 Dec 02, 2022

A Non-Autoregressive Transformer based TTS, supporting a family of SOTA transformers with supervised and unsupervised duration modelings. This project grows with the research community, aiming to achieve the ultimate TTS.

A Non-Autoregressive Transformer based TTS, supporting a family of SOTA transformers with supervised and unsupervised duration modelings. This project grows with the research community, aiming to ach

237 Jan 02, 2023

Example code for "Real-World Natural Language Processing"

Real-World Natural Language Processing This repository contains example code for the book "Real-World Natural Language Processing." AllenNLP (2.5.0 or

303 Dec 17, 2022

SGMC: Spectral Graph Matrix Completion

SGMC: Spectral Graph Matrix Completion Code for AAAI21 paper "Scalable and Explainable 1-Bit Matrix Completion via Graph Signal Learning". Data Format

8 Dec 12, 2022

HAN2HAN : Hangul Font Generation

36 Dec 28, 2022

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Provides an implementation of today's most used tokenizers, with a focus on performance and versatility. Main features: Train new vocabularies and tok

6.2k Dec 31, 2022

A CSRankings-like index for speech researchers

Speech Rankings This project mimics CSRankings to generate an ordered list of researchers in speech/spoken language processing along with their possib

19 Nov 26, 2022

Sample data associated with the Aurora-BP study

The Aurora-BP Study and Dataset This repository contains sample code, sample data, and explanatory information for working with the Aurora-BP dataset

16 Dec 12, 2022

keras implement of transformers for humans

4.8k Jan 03, 2023

This python module is an easy-to-use port of the text normalization used in the paper "Not low-resource anymore: Aligner ensembling, batch filtering, and new datasets for Bengali-English machine translation". It is intended to be used for normalizing / cleaning Bengali and English text.

normalizer This python module is an easy-to-use port of the text normalization used in the paper "Not low-resource anymore: Aligner ensembling, batch

23 Nov 30, 2022

This repository is home to the Optimus data transformation plugins for various data processing needs.

Transformers Optimus's transformation plugins are implementations of Task and Hook interfaces that allows execution of arbitrary jobs in optimus. To i

37 Dec 14, 2022

MILES is a multilingual text simplifier inspired by LSBert - A BERT-based lexical simplification approach proposed in 2018. Unlike LSBert, MILES uses the bert-base-multilingual-uncased model, as well as simple language-agnostic approaches to complex word identification (CWI) and candidate ranking.

MILES Multilingual Lexical Simplifier Explore the docs » Read LSBert Paper · Report Bug · Request Feature About The Project MILES is a multilingual te

45 Oct 19, 2022

BERTAC (BERT-style transformer-based language model with Adversarially pretrained Convolutional neural network)

BERTAC (BERT-style transformer-based language model with Adversarially pretrained Convolutional neural network) BERTAC is a framework that combines a

6 Jan 24, 2022