Code to train models from "Paraphrastic Representations at Scale".

Overview

Paraphrastic Representations at Scale

Code to train models from "Paraphrastic Representations at Scale".

The code is written in Python 3.7 and requires H5py, jieba, numpy, scipy, sentencepiece, sacremoses, and PyTorch >= 1.0 libraries. These can be insalled with the following command:

pip install -r requirements.txt

To get started, download the data files used for training from http://www.cs.cmu.edu/~jwieting and download the STS evaluation data:

wget http://phontron.com/data/paraphrase-at-scale.zip
unzip paraphrase-at-scale.zip
rm paraphrase-at-scale.zip
wget http://www.cs.cmu.edu/~jwieting/STS.zip .
unzip STS.zip
rm STS.zip

If you use our code, models, or data for your work please cite:

@article{wieting2021paraphrastic,
    title={Paraphrastic Representations at Scale},
    author={Wieting, John and Gimpel, Kevin and Neubig, Graham and Berg-Kirkpatrick, Taylor},
    journal={arXiv preprint arXiv:2104.15114},
    year={2021}
}

@inproceedings{wieting19simple,
    title={Simple and Effective Paraphrastic Similarity from Parallel Translations},
    author={Wieting, John and Gimpel, Kevin and Neubig, Graham and Berg-Kirkpatrick, Taylor},
    booktitle={Proceedings of the Association for Computational Linguistics},
    url={https://arxiv.org/abs/1909.13872},
    year={2019}
}

To embed a list of sentences:

python -u embed_sentences.py --sentence-file paraphrase-at-scale/example-sentences.txt --load-file paraphrase-at-scale/model.para.lc.100.pt  --sp-model paraphrase-at-scale/paranmt.model --output-file sentence_embeds.np --gpu 0

To score a list of sentence pairs:

python -u score_sentence_pairs.py --sentence-pair-file paraphrase-at-scale/example-sentences-pairs.txt --load-file paraphrase-at-scale/model.para.lc.100.pt  --sp-model paraphrase-at-scale/paranmt.model --gpu 0

To train a model (for example, on ParaNMT):

python -u main.py --outfile model.para.out --lower-case 1 --tokenize 0 --data-file paraphrase-at-scale/paranmt.sim-low=0.4-sim-high=1.0-ovl=0.7.final.h5 \
       --model avg --dim 1024 --epochs 25 --dropout 0.0 --sp-model paraphrase-at-scale/paranmt.model --megabatch-size 100 --save-every-epoch 1 --gpu 0 --vocab-file paraphrase-at-scale/paranmt.sim-low=0.4-sim-high=1.0-ovl=0.7.final.vocab

To download and preprocess raw data for training models (both bilingual and ParaNMT), see preprocess/bilingual and preprocess/paranmt.

Owner
John Wieting
John Wieting
GAN-based Matrix Factorization for Recommender Systems

GAN-based Matrix Factorization for Recommender Systems This repository contains the datasets' splits, the source code of the experiments and their res

Ervin Dervishaj 9 Nov 06, 2022
Semantic Bottleneck Scene Generation

SB-GAN Semantic Bottleneck Scene Generation Coupling the high-fidelity generation capabilities of label-conditional image synthesis methods with the f

Samaneh Azadi 41 Nov 28, 2022
A machine learning malware analysis framework for Android apps.

🕵️ A machine learning malware analysis framework for Android apps. ☢️ DroidDetective is a Python tool for analysing Android applications (APKs) for p

James Stevenson 77 Dec 27, 2022
Taming Transformers for High-Resolution Image Synthesis

Taming Transformers for High-Resolution Image Synthesis CVPR 2021 (Oral) Taming Transformers for High-Resolution Image Synthesis Patrick Esser*, Robin

CompVis Heidelberg 3.5k Jan 03, 2023
Code for "Unsupervised State Representation Learning in Atari"

Unsupervised State Representation Learning in Atari Ankesh Anand*, Evan Racah*, Sherjil Ozair*, Yoshua Bengio, Marc-Alexandre Côté, R Devon Hjelm This

Mila 217 Jan 03, 2023
[CVPR 2022] Pytorch implementation of "Templates for 3D Object Pose Estimation Revisited: Generalization to New objects and Robustness to Occlusions" paper

template-pose Pytorch implementation of "Templates for 3D Object Pose Estimation Revisited: Generalization to New objects and Robustness to Occlusions

Van Nguyen Nguyen 92 Dec 28, 2022
Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search

CLIP-GLaSS Repository for the paper Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search An in-browser demo is

Federico Galatolo 172 Dec 22, 2022
CenterPoint 3D Object Detection and Tracking using center points in the bird-eye view.

CenterPoint 3D Object Detection and Tracking using center points in the bird-eye view. Center-based 3D Object Detection and Tracking, Tianwei Yin, Xin

Tianwei Yin 134 Dec 23, 2022
Official repository of the paper Privacy-friendly Synthetic Data for the Development of Face Morphing Attack Detectors

SMDD-Synthetic-Face-Morphing-Attack-Detection-Development-dataset Official repository of the paper Privacy-friendly Synthetic Data for the Development

10 Dec 12, 2022
Rank 1st in the public leaderboard of ScanRefer (2021-03-18)

InstanceRefer InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring

63 Dec 07, 2022
Reusable constraint types to use with typing.Annotated

annotated-types PEP-593 added typing.Annotated as a way of adding context-specific metadata to existing types, and specifies that Annotated[T, x] shou

125 Dec 26, 2022
PyTorch code of my WACV 2022 paper Improving Model Generalization by Agreement of Learned Representations from Data Augmentation

Improving Model Generalization by Agreement of Learned Representations from Data Augmentation (WACV 2022) Paper ArXiv Why it matters? When data augmen

Rowel Atienza 5 Mar 04, 2022
MG-GCN: Scalable Multi-GPU GCN Training Framework

MG-GCN MG-GCN: multi-GPU GCN training framework. For more information, please read our paper. After cloning our repository, run git submodule update -

Translational Data Analytics (TDA) Lab @GaTech 6 Oct 24, 2022
ScaleNet: A Shallow Architecture for Scale Estimation

ScaleNet: A Shallow Architecture for Scale Estimation Repository for the code of ScaleNet paper: "ScaleNet: A Shallow Architecture for Scale Estimatio

Axel Barroso 34 Nov 09, 2022
Ejemplo Algoritmo Viterbi - Example of a Viterbi algorithm applied to a hidden Markov model on DNA sequence

Ejemplo Algoritmo Viterbi Ejemplo de un algoritmo Viterbi aplicado a modelo ocul

Mateo Velásquez Molina 1 Jan 10, 2022
Learning Energy-Based Models by Diffusion Recovery Likelihood

Learning Energy-Based Models by Diffusion Recovery Likelihood Ruiqi Gao, Yang Song, Ben Poole, Ying Nian Wu, Diederik P. Kingma Paper: https://arxiv.o

Ruiqi Gao 41 Nov 22, 2022
Self-Supervised Pre-Training for Transformer-Based Person Re-Identification

Self-Supervised Pre-Training for Transformer-Based Person Re-Identification [pdf] The official repository for Self-Supervised Pre-Training for Transfo

Hao Luo 116 Jan 04, 2023
Visual Question Answering in Pytorch

Visual Question Answering in pytorch /!\ New version of pytorch for VQA available here: https://github.com/Cadene/block.bootstrap.pytorch This repo wa

Remi 672 Jan 01, 2023
A Closer Look at Structured Pruning for Neural Network Compression

A Closer Look at Structured Pruning for Neural Network Compression Code used to reproduce experiments in https://arxiv.org/abs/1810.04622. To prune, w

Bayesian and Neural Systems Group 140 Dec 05, 2022
Python implementation of MULTIseq barcode alignment using fuzzy string matching and GMM barcode assignment

Python implementation of MULTIseq barcode alignment using fuzzy string matching and GMM barcode assignment.

MT Schmitz 2 Feb 11, 2022