Optimus: the first large-scale pre-trained VAE language model

Overview

Optimus: the first pre-trained Big VAE language model

This repository contains source code necessary to reproduce the results presented in the EMNLP 2020 paper Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space.

The network architecture of Optimus: encoder for representation learning and decoder for generation Sentences are organized and manipulated in a pre-trained compact and smooth latent space

For more on this project, see the Microsoft Research Blog post.

News

May 21, 2020: Releasing a demo for latent space manipulation, including sentence interpolation and analogy. Check out the website.

May 20, 2020: The latent space manipulation code is cleaned and released. See instructions at optimius_for_snli.md.

May 13, 2020: The fine-tuning code for langauge modeling is released. See instructions at optimus_finetune_language_models.md

Contents

There are four steps to use this codebase to reproduce the results in the paper.

  1. Dependencies
  2. Prepare datasets
  3. Model training
    1. Pre-training on setences in Wikipedia
    2. Languange Modeling
    3. Guided Language Generation
    4. Low-resource Language Understanding
  4. Collect and plot results

Dependencies

Pull docker from Docker Hub at: chunyl/pytorch-transformers:v2. Please see the instruction at doc/env.md

The project is organized into the following structures, with ensential files & folders visualized. output saves the models checkpoints.

├── Optimus
   └── code
       ├── examples
           ├── big_ae
               ├── modules
                   ├── vae.py
                   └── ...
               ├── run_lm_vae_pretraining_phdist_beta.py
               ├── run_lm_vae_training.py
               └── ...
	   ├── pytorch_transformers
               ├── modeling_bert.py
               ├── modeling_gpt2.py
               └── ...
       ├── scripts
           ├── scripts_docker
	   ├── scripts_local
	   ├── scripts_philly
   └── data
       └── datasets
           ├── wikipedia_json_64_filtered
               └── ...
	   ├── snli_data
           └── ...
   └── output
       ├── pretrain
       ├── LM
       └── ...       

Prepare Datasets

Please download or preparation the data via following the instructions at data/download_datasets.md.

Model Training

1. Pre-training on setences in Wikipedia

We pre-trained our models on Philly (a Microsoft internal compute cluster), the code is specialized for multi-node multi-GPU compute on this platform. The pre-training main python is run_lm_vae_pretraining_phdist_beta.py. You may need to adjust the distributed training scripts.

2. Languange Modeling

To have a fair comparison with existing VAE languange models, we consider a model with latent dimension 32. The pre-trained model is fine-tuned on four commonly datasets for one epoch. Please see the details at doc/optimus_finetune_language_models.md

3. Guided Language Generation

Latent Space Manipulation To ensure good performance, we consider a model with latent dimension 768. The pre-trained model is fine-tuned on SNLI dataset, where sentences show related patterns. Please see the details at Please see the details at doc/optimius_for_snli.md

4. Low-resource Language Understanding

Collect and Plot Results

Once the networks are trained and the results are saved, we extracted key results using Python script. The results can be plotted using the included IPython notebook plots/main_plots.ipynb. Start the IPython Notebook server:

$ cd plots
$ ipython notebook

Select the main_plots.ipynb notebook and execute the included code. Note that without modification, we have copyed our extracted results into the notebook, and script will output figures in the paper. If you've run your own training and wish to plot results, you'll have to organize your results in the same format instead.

Questions?

Please drop me (Chunyuan) a line if you have any questions.

@inproceedings{li2020_Optimus,
  title={Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space},
  author={Li, Chunyuan and Gao, Xiang and Li, Yuan and Li, Xiujun and Peng, Baolin and Zhang, Yizhe and Gao, Jianfeng},
  booktitle={EMNLP},
  year={2020}
}
Owner
Researcher @ Microsoft Research
Code for ICML 2021 paper: How could Neural Networks understand Programs?

OSCAR This repository contains the source code of our ICML 2021 paper How could Neural Networks understand Programs?. Environment Run following comman

Dinglan Peng 115 Dec 17, 2022
Code for Mesh Convolution Using a Learned Kernel Basis

Mesh Convolution This repository contains the implementation (in PyTorch) of the paper FULLY CONVOLUTIONAL MESH AUTOENCODER USING EFFICIENT SPATIALLY

Yi_Zhou 35 Jan 03, 2023
This project helps to colorize grayscale images using multiple exemplars.

Multiple Exemplar-based Deep Colorization (Pytorch Implementation) Pretrained Model [Jitendra Chautharia](IIT Jodhpur)1,3, Prerequisites Python 3.6+ N

jitendra chautharia 3 Aug 05, 2022
ICCV2021: Code for 'Spatial Uncertainty-Aware Semi-Supervised Crowd Counting'

ICCV2021: Code for 'Spatial Uncertainty-Aware Semi-Supervised Crowd Counting'

Yanda Meng 14 May 13, 2022
This repository contains PyTorch code for Robust Vision Transformers.

This repository contains PyTorch code for Robust Vision Transformers.

117 Dec 07, 2022
TransReID: Transformer-based Object Re-Identification

TransReID: Transformer-based Object Re-Identification [arxiv] The official repository for TransReID: Transformer-based Object Re-Identification achiev

569 Dec 30, 2022
Code of paper: "DropAttack: A Masked Weight Adversarial Training Method to Improve Generalization of Neural Networks"

DropAttack: A Masked Weight Adversarial Training Method to Improve Generalization of Neural Networks Abstract: Adversarial training has been proven to

倪仕文 (Shiwen Ni) 58 Nov 10, 2022
Scheme for training and applying a label propagation framework

Factorisation-based Image Labelling Overview This is a scheme for training and applying the factorisation-based image labelling (FIL) framework. Some

Wellcome Centre for Human Neuroimaging 2 Dec 17, 2021
Neural Radiance Fields Using PyTorch

This project is a PyTorch implementation of Neural Radiance Fields (NeRF) for reproduction of results whilst running at a faster speed.

Vedant Ghodke 1 Feb 11, 2022
Physical Anomalous Trajectory or Motion (PHANTOM) Dataset

Physical Anomalous Trajectory or Motion (PHANTOM) Dataset Description This dataset contains the six different classes as described in our paper[]. The

0 Dec 16, 2021
User-friendly bulk RNAseq deconvolution using simulated annealing

Welcome to cellanneal - The user-friendly application for deconvolving omics data sets. cellanneal is an application for deconvolving biological mixtu

11 Dec 16, 2022
Course content and resources for the AIAIART course.

AIAIART course This repo will house the notebooks used for the AIAIART course. Part 1 (first four lessons) ran via Discord in September/October 2021.

Jonathan Whitaker 492 Jan 06, 2023
Learning nonlinear operators via DeepONet

DeepONet: Learning nonlinear operators The source code for the paper Learning nonlinear operators via DeepONet based on the universal approximation th

Lu Lu 239 Jan 02, 2023
A python implementation of Yolov5 to detect fire or smoke in the wild in Jetson Xavier nx and Jetson nano

yolov5-fire-smoke-detect-python A python implementation of Yolov5 to detect fire or smoke in the wild in Jetson Xavier nx and Jetson nano You can see

20 Dec 15, 2022
RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching

RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching This repository contains the source code for our paper: RAFT-Stereo: Multilevel

Princeton Vision & Learning Lab 328 Jan 09, 2023
PCACE: A Statistical Approach to Ranking Neurons for CNN Interpretability

PCACE: A Statistical Approach to Ranking Neurons for CNN Interpretability PCACE is a new algorithm for ranking neurons in a CNN architecture in order

4 Jan 04, 2022
Plover-tapey-tape: an alternative to Plover’s built-in paper tape

plover-tapey-tape plover-tapey-tape is an alternative to Plover’s built-in paper

7 May 29, 2022
social humanoid robots with GPGPU and IoT

Social humanoid robots with GPGPU and IoT Social humanoid robots with GPGPU and IoT Paper Authors Mohsen Jafarzadeh, Stephen Brooks, Shimeng Yu, Balak

0 Jan 07, 2022
Ascend your Jupyter Notebook usage

Jupyter Ascending Sync Jupyter Notebooks from any editor About Jupyter Ascending lets you edit Jupyter notebooks from your favorite editor, then insta

Untitled AI 254 Jan 08, 2023
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Segmentation Transformer Implementation of Segmentation Transformer in PyTorch, a new model to achieve SOTA in semantic segmentation while using trans

Abhay Gupta 161 Dec 08, 2022