Overview

This is the repository for the paper: On the Effect of Isotropy on VAE Representations of Text.

Dataset

We provide the datasets used in this paper via Google Drive: https://drive.google.com/file/d/1Vh5C1A74DosCpX4Wdnjr5t5sye01FjCb/view?usp=sharing.

LSTM-VAE Implementation and Relevant Evaluations

Before using any file in this repository, please create two directories under the root directory, named "Dataset" and "model". The Dataset directory is used to store datasets. The model directory is used to store models and the relevant evaluation results.
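The two directories can be created from Python as follows. This is a minimal sketch run from the repository root; exist_ok=True makes it safe to re-run. On a POSIX shell, "mkdir Dataset model" has the same effect.

```python
import os

# Create the working directories expected at the repository root.
# exist_ok=True avoids an error if they already exist.
for name in ("Dataset", "model"):
    os.makedirs(name, exist_ok=True)
```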

External Packages Required

TensorFlow 2, NumPy, pandas, scikit-learn, NLTK, Matplotlib.

Python File Usage

lstm_vae.py

VAE training. Type "python lstm_vae.py -h" for help with the training configuration. The dataset path is given relative to the Dataset directory, and the trained model path is given relative to the model directory.
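The path convention can be illustrated with a small helper. This is a sketch that assumes the scripts simply join the --datapath and -m arguments onto the two directory names; resolve_paths is a hypothetical name, not a function from this repository.

```python
import os

# Hypothetical helper (not part of the repository) illustrating the convention:
# --datapath is resolved under Dataset/ and the -m model name under model/.
def resolve_paths(datapath: str, model_name: str, root: str = "."):
    data_dir = os.path.join(root, "Dataset", datapath)
    model_dir = os.path.join(root, "model", model_name)
    return data_dir, model_dir

data_dir, model_dir = resolve_paths("DBpedia", "DBpedia_C_5_po_diag_0")
print(data_dir)   # ./Dataset/DBpedia on POSIX systems
print(model_dir)  # ./model/DBpedia_C_5_po_diag_0 on POSIX systems
```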

lstm_ae.py

AE training. Type "python lstm_ae.py -h" for help with the training configuration.

quality.py

Qualitative evaluation of VAE models, including word imputation, homotopy, and generation.
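Homotopy in text VAEs usually means decoding evenly spaced points on the straight line between two latent codes. The sketch below shows only that standard linear-interpolation step in NumPy; the 32-dimensional latent size is an assumption, and quality.py may construct the path differently.

```python
import numpy as np

def homotopy(z_start: np.ndarray, z_end: np.ndarray, steps: int = 5) -> np.ndarray:
    """Return `steps` latent codes evenly spaced on the line from z_start to z_end."""
    alphas = np.linspace(0.0, 1.0, steps)[:, None]      # shape (steps, 1)
    return (1.0 - alphas) * z_start + alphas * z_end    # shape (steps, dim)

rng = np.random.default_rng(0)
z1, z2 = rng.standard_normal(32), rng.standard_normal(32)  # assumed latent size
path = homotopy(z1, z2, steps=5)
# Each row of `path` would then be passed to the decoder to produce a sentence.
```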

reconstruction.py

Uses the mean representation to reconstruct the test set and calculates BLEU and ROUGE scores.
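Reconstructing with the mean representation presumably means decoding the posterior mean rather than a random sample. For intuition about the ROUGE side of the scoring, here is an illustrative unigram ROUGE-1 F1 in plain Python. It is not a reproduction of the repository's rouge.py, and BLEU is more commonly computed with NLTK's corpus_bleu.

```python
from collections import Counter

def rouge1_f1(reference: str, hypothesis: str) -> float:
    """Unigram-overlap ROUGE-1 F1 between two whitespace-tokenised sentences."""
    ref, hyp = Counter(reference.split()), Counter(hypothesis.split())
    overlap = sum((ref & hyp).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f1("the cat sat on the mat", "the cat sat on a mat"))  # 5/6 ≈ 0.833
```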

agreement.py

Trains a text classifier and evaluates it on the reconstructed sentences.
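One plausible reading of an agreement metric, assumed here rather than taken from agreement.py, is the fraction of test sentences whose predicted label stays the same when the classifier is applied to the reconstruction instead of the original.

```python
import numpy as np

def agreement(pred_original, pred_reconstructed) -> float:
    """Fraction of sentences whose predicted label is unchanged after reconstruction."""
    a = np.asarray(pred_original)
    b = np.asarray(pred_reconstructed)
    if a.shape != b.shape:
        raise ValueError("prediction arrays must have the same shape")
    return float(np.mean(a == b))

print(agreement([0, 1, 2, 1], [0, 1, 1, 1]))  # 0.75
```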

classification.py

Uses a 2-hidden-layer MLP with 128 neurons and ReLU activation for the classification task.
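A comparable classifier can be configured with scikit-learn's MLPClassifier. This is only a stand-in for the repository's TensorFlow code and assumes 128 units in each hidden layer; the toy "latent codes" are two synthetic Gaussian blobs.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Two hidden layers of 128 ReLU units (assumed to be 128 per layer).
clf = MLPClassifier(hidden_layer_sizes=(128, 128), activation="relu",
                    max_iter=500, random_state=0)

# Toy representations: two well-separated Gaussian blobs in 32 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (100, 32)), rng.normal(2.0, 1.0, (100, 32))])
y = np.array([0] * 100 + [1] * 100)

clf.fit(X, y)
print(clf.score(X, y))  # training accuracy on the toy data
```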

perplexity.py

Calculates forward and reverse perplexity on generated sentences.
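Under the usual definitions, forward perplexity scores generated sentences with a language model trained on real data, while reverse perplexity scores real test data with a language model trained on generated sentences. Both reduce to the same computation once per-token log-probabilities are available; the language model itself is outside this sketch.

```python
import math

def perplexity(token_log_probs) -> float:
    """exp of the average negative log-likelihood per token (natural log).

    Pass the log-probabilities of all tokens in the evaluated corpus to get
    a corpus-level perplexity.
    """
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# A model assigning probability 1/4 to every token has perplexity 4.
print(perplexity([math.log(0.25)] * 10))  # ≈ 4.0
```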

mnist.py

Trains and evaluates models on image datasets.

ablation.py

Ablation study.

aggregated.py

Estimates related to the aggregated posterior.
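One estimate that can be computed for the aggregated posterior q(z) = (1/N) Σ_i N(μ_i, diag(σ_i²)) is its mean and covariance, via the law of total covariance. This is a generic sketch, not necessarily what aggregated.py computes; the eigenvalue spread of the covariance is one simple way to probe how isotropic q(z) is.

```python
import numpy as np

def aggregated_posterior_moments(mu: np.ndarray, sigma: np.ndarray):
    """Mean and covariance of the mixture (1/N) * sum_i N(mu_i, diag(sigma_i^2)).

    mu, sigma: arrays of shape (N, d) holding each sentence's posterior parameters.
    """
    mean = mu.mean(axis=0)
    # Average within-component covariance (diagonal) ...
    within = np.diag((sigma ** 2).mean(axis=0))
    # ... plus the population covariance of the component means.
    centered = mu - mean
    between = centered.T @ centered / mu.shape[0]
    return mean, within + between

rng = np.random.default_rng(0)
mu = rng.normal(size=(1000, 32))
sigma = np.full((1000, 32), 0.5)
m, cov = aggregated_posterior_moments(mu, sigma)
eigvals = np.linalg.eigvalsh(cov)  # similar eigenvalues suggest a more isotropic q(z)
```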

robustness.py

Randomly deletes 30% of words to evaluate robustness.
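The perturbation can be sketched as below. It assumes exactly round(0.3 × n) words are removed per sentence with the remaining words kept in order; robustness.py might instead drop each word independently with probability 0.3.

```python
import random

def delete_words(sentence: str, ratio: float = 0.3, seed=None) -> str:
    """Drop round(ratio * n) randomly chosen words, keeping the rest in order."""
    words = sentence.split()
    n_delete = round(ratio * len(words))
    rng = random.Random(seed)
    dropped = set(rng.sample(range(len(words)), n_delete))
    return " ".join(w for i, w in enumerate(words) if i not in dropped)

out = delete_words("a b c d e f g h i j", seed=0)
print(out)  # 7 of the 10 words remain
```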

utils.py

Commonly used functions.

Example of Usage

This is an example of training a VAE on a dataset and then evaluating it.

First: "python lstm_vae.py -e 200 -r 512 -z 32 -b 128 -lr 0.0005 --epochs 20 --datapath DBpedia -C 5 -s 0 -po diag -m DBpedia_C_5_po_diag_0"

This will create a directory named DBpedia_C_5_po_diag_0 under the model directory. The model will be stored in this directory, along with an epoch_loss.txt file that records the losses during training.
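Judging from the model name and the repository's focus on isotropy, -po appears to select the form of the Gaussian posterior (diag for a diagonal covariance); this is an inference, so check "python lstm_vae.py -h". The sketch shows the standard reparameterised sample and the closed-form KL to N(0, I) for a diagonal posterior, and the isotropic case in which one log-variance is shared by all dimensions.

```python
import numpy as np

def sample_and_kl(mu, log_var, rng):
    """Reparameterised sample z = mu + sigma * eps and KL(q(z|x) || N(0, I)).

    mu: shape (d,). log_var: shape (d,) for a diagonal posterior, or a scalar
    (broadcast to every dimension) for an isotropic posterior.
    """
    mu = np.asarray(mu, dtype=float)
    log_var = np.broadcast_to(np.asarray(log_var, dtype=float), mu.shape)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps
    kl = 0.5 * np.sum(mu ** 2 + np.exp(log_var) - 1.0 - log_var)
    return z, kl

rng = np.random.default_rng(0)
z_diag, kl_diag = sample_and_kl(np.zeros(32), np.zeros(32), rng)  # one log-variance per dimension
z_iso, kl_iso = sample_and_kl(np.zeros(32), 0.0, rng)             # one shared log-variance
```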

Second: "python quality.py -tm 2 -m DBpedia_C_5_po_diag_0"

This will generate 100K sentences using the prior.
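Generating from the prior typically means sampling z from N(0, I) and decoding it. The sketch below uses greedy decoding and a hypothetical decode_step callback standing in for the trained LSTM decoder; quality.py may use a different decoding strategy. Repeating the call 100K times would yield a generated corpus.

```python
import numpy as np

def generate_from_prior(decode_step, latent_dim, bos_id, eos_id, max_len, rng):
    """Sample z ~ N(0, I) and greedily decode a token-id sequence.

    decode_step(z, prev_tokens) -> 1-D array of next-token scores
    (hypothetical interface, not the repository's API).
    """
    z = rng.standard_normal(latent_dim)
    tokens = [bos_id]
    for _ in range(max_len):
        next_id = int(np.argmax(decode_step(z, tokens)))
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens[1:]  # drop the BOS marker

# Toy decoder stub: emits token 3 twice, then EOS (id 2), regardless of z.
def toy_step(z, prev):
    scores = np.zeros(5)
    scores[3 if len(prev) < 3 else 2] = 1.0
    return scores

tokens = generate_from_prior(toy_step, 32, bos_id=1, eos_id=2, max_len=10,
                             rng=np.random.default_rng(0))
print(tokens)  # [3, 3]
```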

Third: "python reconstruction.py -m DBpedia_C_5_po_diag_0"

This will reconstruct the sentences in the test set, write them to mean.txt, and record the BLEU and ROUGE scores of the reconstructions.

lanzhang128/IGPVAE
