An open source framework for seq2seq models in PyTorch.

Last update: Jan 02, 2023

Overview

pytorch-seq2seq

This is a framework for sequence-to-sequence (seq2seq) models implemented in PyTorch. The framework has modularized and extensible components for seq2seq models, training and inference, checkpoints, etc. This is an alpha release. We appreciate any kind of feedback or contribution.

What's New in 0.1.6

Compatible with PyTorch 0.4
Added support for pre-trained word embedding

Roadmap

Seq2seq is a fast evolving field with new techniques and architectures being published frequently. The goal of this library is facilitating the development of such techniques and applications. While constantly improving the quality of code and documentation, we will focus on the following items:

Evaluation with benchmarks such as WMT machine translation, COCO image captioning, conversational models, etc;
Provide more flexible model options, improving the usability of the library;
Adding latest architectures such as the CNN based model proposed by Convolutional Sequence to Sequence Learning and the transformer model proposed by Attention Is All You Need;
Support features in the new versions of PyTorch.

Installation

This package requires Python 2.7 or 3.6. We recommend creating a new virtual environment for this project (using virtualenv or conda).

Prerequisites

Numpy: pip install numpy (Refer here for problem installing Numpy).
PyTorch: Refer to PyTorch website to install the version w.r.t. your environment.

Install from source

Currently we only support installation from source code using setuptools. Checkout the source code and run the following commands:

pip install -r requirements.txt
python setup.py install

If you already had a version of PyTorch installed on your system, please verify that the active torch package is at least version 0.1.11.

Get Started

Prepare toy dataset

# Run script to generate the reverse toy dataset
# The generated data is stored in data/toy_reverse by default
scripts/toy.sh

Train and play

TRAIN_PATH=data/toy_reverse/train/data.txt
DEV_PATH=data/toy_reverse/dev/data.txt
# Start training
python examples/sample.py --train_path $TRAIN_PATH --dev_path $DEV_PATH

It will take about 3 minutes to train on CPU and less than 1 minute with a Tesla K80. Once training is complete, you will be prompted to enter a new sequence to translate and the model will print out its prediction (use ctrl-C to terminate). Try the example below!

Input:  1 3 5 7 9
Expected output: 9 7 5 3 1 EOS

Checkpoints

Checkpoints are organized by experiments and timestamps as shown in the following file structure

experiment_dir
+-- input_vocab
+-- output_vocab
+-- checkpoints
|  +-- YYYY_mm_dd_HH_MM_SS
   |  +-- decoder
   |  +-- encoder
   |  +-- model_checkpoint

The sample script by default saves checkpoints in the experiment folder of the root directory. Look at the usages of the sample code for more options, including resuming and loading from checkpoints.

Benchmarks

WMT Machine Translation (Coming soon)

Troubleshoots and Contributing

If you have any questions, bug reports, and feature requests, please open an issue on Github. For live discussions, please go to our Gitter lobby.

We appreciate any kind of feedback or contribution. Feel free to proceed with small issues like bug fixes, documentation improvement. For major contributions and new features, please discuss with the collaborators in corresponding issues.

Development Cycle

We are using 4-week release cycles, where during each cycle changes will be pushed to the develop branch and finally merge to the master branch at the end of each cycle.

Development Environment

We setup the development environment using Vagrant. Run vagrant up with our 'Vagrantfile' to get started.

The following tools are needed and installed in the development environment by default:

Git
Python
Python packages: nose, mock, coverage, flake8

Test

The quality and the maintainability of the project is ensured by comprehensive tests. We encourage writing unit tests and integration tests when contributing new codes.

Locally please run nosetests in the package root directory to run unit tests. We use TravisCI to require that a pull request has to pass all unit tests to be eligible to merge. See travis configuration for more information.

Code Style

We follow PEP8 for code style. Especially the style of docstrings is important to generate documentation.

Local: Run the following commands in the package root directory

# Python syntax errors or undefined names
flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics
# Style checks
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics

Github: We use Codacy to check styles on pull requests and branches.

Comments

pytorch-seq2seq slower than OpenNMT-py

Benchmarked the two implementations using WMT's newstest2013 from German to English. See training logs in the gist. Despite accuracy differences, pytorch-seq2seq is 10 times slower than OpenNMT.py.
enhancement high priority

opened by kylegao91 7
Add a predictor method to return more than one possible sequence

Would be possible to add to this library a predictor_n (or to modify the current predictor) method to return more than one sequence as result? I think it would be a great tool to have when using beam search (with TopKDecoder).

I coded a first attempt to do that (it seems to work, https://github.com/juan-cb/pytorch-seq2seq/commit/442431001b122fa15c4b6476a9d7411570f53f20), but I'm not sure if it is the best way to implement that or is completely correct. The desired behavior is to return the n most probable sequences given an src_seq.

Thanks in advance

opened by cbjuan 6
ValueError: lengths array has to be sorted in decreasing order
Took me a while to track this down, but there is an error if you run the sample code with the git version of torchtext.

File "torch/nn/utils/rnn.py", line 79, in pack_padded_sequence raise ValueError("lengths array has to be sorted in decreasing order")

The reason is this commit introduced a month ago in torchtext: https://github.com/pytorch/text/commit/a5049b9d70a699986ae839aca178c33376717cde

This conflicts with this line in the supervised trainer: https://github.com/IBM/pytorch-seq2seq/blob/9e9fefb9dea882958c88e9c29cfbe9ea6d5408fc/seq2seq/trainer/supervised_trainer.py#L85

Simply removing the negative sign fixes the issue, however this will break code if the pypi version of torchtext is used.

Few fixes:

Ask torchtext maintainers to revert this upstream change. See PR https://github.com/pytorch/text/pull/95

Detect undesired sorting and reverse batch.

Detect version of torchtext and sort accordingly.

Add in option for sort direction into the supervised trainer.

enhancement help wanted medium priority
opened by kyteague 6
fail to get meaningful response using pytorch-seq2seq for chatbot

I'm using pytorch-seq2seq for chatbot. I used two dataset ubuntu and twitter. I 've formatted the datasets, modified data path in "example.py" and tuned some hyper-parameters(e.g. hidden_size batch_size epoches).

While I fail to get meaningful response after model finished training. When I typed in some sentences like hello how are you, it often gave me ['EOS'] or ['i', 'i', 'EOS']. Is there any suggestion to handle this issue?

opened by DataTerminatorX 6
Creating pull request of hacks I needed to run sample.py on cuda

When I run even the basic sample.py script under examples on a cuda enabled machine, I still get errors that not all the vectors are cuda vectors (some are still cpu). You can ignore my edits in sample.py, but I did annotate in each place where I needed to add vector = vector.cuda() to make sample.py run. This occurs even when torch.device('cuda') is called, which should not be the case in PyTorch 0.4.0+.

Thank you, and feel free to follow up with any questions.

opened by DavidLKing 5
Doubt on "pytorch-seq2seq/seq2seq/models/EncoderRNN.py"

if self.variable_lengths: embedded = nn.utils.rnn.pack_padded_sequence(embedded, input_lengths, batch_first=True) output, hidden = self.rnn(embedded) if self.variable_lengths: output, _ = nn.utils.rnn.pad_packed_sequence(output, batch_first=True) Hi, Why here goes two if self.variable_lengths?

And this code doesn't identify the h0, c0, is that means they default to zero?

Looking forward to your response. @kylegao91

opened by caozhen-alex 5
Cuda.LongTensor instead of LongTensor on GPU

I found this bug when running the basic script, example/sample.py:

return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse) RuntimeError: Expected object of type torch.cuda.LongTensor but found type torch.LongTensor for argument #3 'index'

It seems to be a wrong type of tensor on GPU.
fixed in develop

opened by ShilinHe 5
Compatibility of TopKDecoder with DecoderRNN

The codebase contains a TopKDecoder which can be used to do beam search while generating sentences. According to the docstring, the __init__ method takes as input a DecoderRNN object but the code is accessing attributes like .lang and .SOS_token_id which are not present in the DecoderRNN class.

Also my understanding is that the TopKDecoder can be used to generate sentences after the DecoderRNN has been trained. Is this understanding correct.
high priority fixed in develop

opened by abhiskk 5
GPU error when run sample code

When I run the sample code, python examples/sample.py --train_path $TRAIN_PATH --dev_path $DEV_PATH

GPU errors appear as below, It seems data don't satisfy a gpu tensor, I failed to solve it. Has anyone meet the error, too?

/home/Vachel/env3/lib/python3.5/site-packages/torch/nn/functional.py:52: UserWarning: size_average and reduce args will be deprecated, please use reduction='elementwise_mean' instead. warnings.warn(warning.format(ret)) 2018-11-20 23:33:48,774 root INFO Namespace(dev_path='data/toy_reverse/dev/data.txt', expt_dir='./experiment', load_checkpoint=None, log_level='info', resume=False, train_path='data/toy_reverse/train/data.txt') /home/Vachel/env3/lib/python3.5/site-packages/torch/nn/functional.py:52: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead. warnings.warn(warning.format(ret)) /home/Vachel/env3/lib/python3.5/site-packages/torch/nn/modules/rnn.py:38: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.2 and num_layers=1 "num_layers={}".format(dropout, num_layers)) 2018-11-20 23:33:51,817 seq2seq.trainer.supervised_trainer INFO Optimizer: Adam ( Parameter Group 0 amsgrad: False betas: (0.9, 0.999) eps: 1e-08 lr: 0.001 weight_decay: 0 ), Scheduler: None Traceback (most recent call last): File "examples/sample.py", line 129, in resume=opt.resume) File "/home/Vachel/SDML/hw3-0/pytorch-seq2seq/seq2seq/trainer/supervised_trainer.py", line 186, in train teacher_forcing_ratio=teacher_forcing_ratio) File "/home/Vachel/SDML/hw3-0/pytorch-seq2seq/seq2seq/trainer/supervised_trainer.py", line 103, in _train_epoches loss = self._train_batch(input_variables, input_lengths.tolist(), target_variables, model, teacher_forcing_ratio) File "/home/Vachel/SDML/hw3-0/pytorch-seq2seq/seq2seq/trainer/supervised_trainer.py", line 55, in _train_batch teacher_forcing_ratio=teacher_forcing_ratio) File "/home/Vachel/env3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in call result = self.forward(*input, **kwargs) File "/home/Vachel/SDML/hw3-0/pytorch-seq2seq/seq2seq/models/seq2seq.py", line 48, in forward encoder_outputs, encoder_hidden = self.encoder(input_variable, input_lengths) File "/home/Vachel/env3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in call result = self.forward(*input, **kwargs) File "/home/Vachel/SDML/hw3-0/pytorch-seq2seq/seq2seq/models/EncoderRNN.py", line 68, in forward embedded = self.embedding(input_var) File "/home/Vachel/env3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in call result = self.forward(*input, **kwargs) File "/home/Vachel/env3/lib/python3.5/site-packages/torch/nn/modules/sparse.py", line 110, in forward self.norm_type, self.scale_grad_by_freq, self.sparse) File "/home/Vachel/env3/lib/python3.5/site-packages/torch/nn/functional.py", line 1110, in embedding return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse) RuntimeError: Expected object of type torch.cuda.LongTensor but found type torch.LongTensor for argument #3 'index'
duplicate fixed in develop

opened by vachelch 4

Adding python logger

#7 Changing output logs to use the python logger rather than print statements.

Earlier output :

Namespace(dev_path='../tests/data/eng-fra.txt', expt_dir='./experiment', load_checkpoint=None, resume=False, train_path='../tests/data/eng-fra.txt')
Reading lines...
Read 100 lines
Number of pairs: 100
Reading lines...
Read 100 lines
Number of pairs: 100
Finished epoch 1, Dev Perplexity: 143.2751
Finished epoch 2, Dev Perplexity: 133.0537
Time elapsed: 3s, Progress: 62%, Train Perplexity: 139.8005

Current output :

INFO:__main__:Namespace(dev_path='../tests/data/eng-fra.txt', expt_dir='./experiment', load_checkpoint=None, resume=False, train_path='../tests/data/eng-fra.txt')
INFO:seq2seq.dataset.utils:Reading Lines form ../tests/data/eng-fra.txt
Read 100 lines
INFO:seq2seq.dataset.utils:
Number of pairs: 100
INFO:seq2seq.dataset.utils:Reading Lines form ../tests/data/eng-fra.txt
Read 100 lines
INFO:seq2seq.dataset.utils:
Number of pairs: 100
INFO:seq2seq.trainer.supervised_trainer:Finished epoch 1, Dev Perplexity: 136.4548
INFO:seq2seq.trainer.supervised_trainer:Finished epoch 2, Dev Perplexity: 125.4301
INFO:seq2seq.trainer.supervised_trainer:Time elapsed: 3s, Progress: 62%, Train Perplexity: 134.4973

opened by avinash2692 4

GPU Tesla P100 vs Intel i7 CPU. GPU is only 2x faster.

Only a 2x speed up on a P100 Tesla vs a Intel i7 CPU

GPU: Time elapsed: 4m 36s, Progress: 8%, Train Perplexity: 1.1057

CPU: Time elapsed: 4m 1s, Progress: 3%, Train Perplexity: 1.1451

Running the on SimpleQuestion dataset.

opened by PetrochukM 4
Teacher forcing per timestep?

Hi,

I don't understand why the teacher forcing is being done per the whole sequence. The definition of the teacher forcing claims that at each timestep, a predicted or the ground truth token should be fed from the previous timestep. The implementation here, on the other hand, will first make a decision on whether generate the whole sequence with teacher forcing, and then continues decoding with teacher forcing set to True or False (for the whole sequence), which I believe is not correct.

I really appreciate the feedback on this issue, Thanks!

opened by aligholami 1
Out of memory for NLLLoss even the batch size is small

Hi I'm using this framework on my dataset, everything works fine on CPU, but when I moved them to gpu, it had the error as following: File "/home/ibm_decoder/DecoderRNN.py", line 107, in forward_step predicted_softmax = function(self.out(output.contiguous().view(-1, self.hidden_size)), dim=1).view(batch_size, output_size, -1) File "/home/anaconda2/envs/lib/python3.6/site-packages/torch/nn/functional.py", line 1317, in log_softmax ret = input.log_softmax(dim) RuntimeError: CUDA out of memory. Tried to allocate 2.77 GiB (GPU 0; 10.76 GiB total capacity; 8.66 GiB already allocated; 943.56 MiB free; 9.06 GiB reserved in total by PyTorch) The batch size is only 32, so I don't know what was wrong and what caused such big memory allocation.

opened by serenayj 0
The dimension of predicted_softmax in DecoderRNN.py

https://github.com/IBM/pytorch-seq2seq/blob/f146087a9a271e9b50f46561e090324764b081fb/seq2seq/models/DecoderRNN.py#L105 I think .view(batch_size, output_size, -1) should be .view(batch_size, -1, output_size) or this line just makes no sense.

opened by tk1363704 0
Teacher forcing during beam decoding

https://github.com/IBM/pytorch-seq2seq/blob/f146087a9a271e9b50f46561e090324764b081fb/seq2seq/models/TopKDecoder.py#L83 .

I think teacher_forcing should not be present in beam decoding, since ground truth tokens are not known during inference.

opened by iamsimha 0

Releases(0.1.6)

0.1.6(May 4, 2018)
What's New in 0.1.6

Compatible with PyTorch 0.4

Added support for pre-trained word embedding

Source code(tar.gz)
Source code(zip)

Owner

International Business Machines

GitHub Repository https://ibm.github.io/pytorch-seq2seq/public/index.html

🎐 a python library for doing approximate and phonetic matching of strings.

jellyfish Jellyfish is a python library for doing approximate and phonetic matching of strings. Written by James Turk 1.8k Dec 21, 2022

Toward Model Interpretability in Medical NLP

Toward Model Interpretability in Medical NLP LING380: Topics in Computational Linguistics Final Project James Cross ( 1 Mar 04, 2022

StarGAN - Official PyTorch Implementation

StarGAN - Official PyTorch Implementation ***** New: StarGAN v2 is available at https://github.com/clovaai/stargan-v2 ***** This repository provides t

5.1k Dec 30, 2022

Study German declensions (dER nettE Mann, ein nettER Mann, mit dEM nettEN Mann, ohne dEN nettEN Mann ...) Generate as many exercises as you want using the incredible power of SPACY!

4 Jul 20, 2022

The projects lets you extract glossary words and their definitions from a given piece of text automatically using NLP techniques

Unsupervised technique to Glossary and Definition Extraction Code Files GPT2-DefinitionModel.ipynb - GPT-2 model for definition generation. Data_Gener

28 May 25, 2021

AI Assistant for Building Reliable, High-performing and Fair Multilingual NLP Systems

37 Nov 29, 2022

pysentimiento: A Python toolkit for Sentiment Analysis and Social NLP tasks

A Python multilingual toolkit for Sentiment Analysis and Social NLP tasks

297 Dec 29, 2022

HiFi DeepVariant + WhatsHap workflowHiFi DeepVariant + WhatsHap workflow

HiFi DeepVariant + WhatsHap workflow Workflow steps align HiFi reads to reference with pbmm2 call small variants with DeepVariant, using two-pass meth

2 May 14, 2022

Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning

GenSen Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning Sandeep Subramanian, Adam Trischler, Yoshua B

309 Oct 19, 2022

AI and Machine Learning workflows on Anthos Bare Metal.

Hybrid and Sovereign AI on Anthos Bare Metal Table of Contents Overview Terraform as IaC Substrate ABM Cluster on GCE using Terraform TensorFlow ResNe

8 Nov 26, 2022

Fake Shakespearean Text Generator

Fake Shakespearean Text Generator This project contains an impelementation of stateful Char-RNN model to generate fake shakespearean texts. Files and

1 Feb 15, 2022

2021语言与智能技术竞赛：机器阅读理解任务

LICS2021 MRC 1. 项目&任务介绍本项目基于官方给定的baseline（DuReader-Checklist-BASELINE）进行二次改造，对整个代码框架做了简单的重构，对核心网络结构添加了注释，解耦了数据读取的模块，并添加了阈值确认的功能，一些小的细节也做了改进。本次任务为202

29 Dec 05, 2022

Production First and Production Ready End-to-End Keyword Spotting Toolkit

223 Jan 02, 2023

Use AutoModelForSeq2SeqLM in Huggingface Transformers to train COMET

Training COMET using seq2seq setting Use AutoModelForSeq2SeqLM in Huggingface Transformers to train COMET. The codes are modified from run_summarizati

9 Dec 17, 2022

Knowledge Management for Humans using Machine Learning & Tags

HyperTag helps humans intuitively express how they think about their files using tags and machine learning. Represent how you think using tags. Find what you look for using semantic search for your t

166 Jan 07, 2023

Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms

FNet: Mixing Tokens with Fourier Transforms Pytorch implementation of Fnet : Mixing Tokens with Fourier Transforms. Citation: @misc{leethorp2021fnet,

217 Dec 05, 2022

Code repository for "It's About Time: Analog clock Reading in the Wild"

it's about time Code repository for "It's About Time: Analog clock Reading in the Wild" Packages required: pytorch (used 1.9, any reasonable version s

52 Nov 10, 2022

:hot_pepper: R²SQL: "Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing." (AAAI 2021)

R²SQL The PyTorch implementation of paper Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing. (AAAI 2021) Requirement

60 Dec 31, 2022

A framework for evaluating Knowledge Graph Embedding Models in a fine-grained manner.

13 Sep 08, 2022

Bu Chatbot, Konya Bilim Merkezi Yen için tasarlanmış olan bir projedir.

chatbot Bu Chatbot, Konya Bilim Merkezi Yeni Ufuklar Sergisi için 2021 Yılında tasarlanmış olan bir projedir. Chatbot Python ortamında yazılmıştır. Sö

1 Feb 23, 2022

An open source framework for seq2seq models in PyTorch.

Related tags

Overview

pytorch-seq2seq

What's New in 0.1.6

Roadmap

Installation

Prerequisites

Install from source

Get Started

Prepare toy dataset

Train and play

Checkpoints

Benchmarks

Troubleshoots and Contributing

Development Cycle

Development Environment

Test

Code Style

Comments

Releases(0.1.6)

0.1.6(May 4, 2018)

What's New in 0.1.6

Owner

International Business Machines

🎐 a python library for doing approximate and phonetic matching of strings.

Toward Model Interpretability in Medical NLP

StarGAN - Official PyTorch Implementation

Study German declensions (dER nettE Mann, ein nettER Mann, mit dEM nettEN Mann, ohne dEN nettEN Mann ...) Generate as many exercises as you want using the incredible power of SPACY!

The projects lets you extract glossary words and their definitions from a given piece of text automatically using NLP techniques

AI Assistant for Building Reliable, High-performing and Fair Multilingual NLP Systems

pysentimiento: A Python toolkit for Sentiment Analysis and Social NLP tasks

HiFi DeepVariant + WhatsHap workflowHiFi DeepVariant + WhatsHap workflow

Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning

AI and Machine Learning workflows on Anthos Bare Metal.

Fake Shakespearean Text Generator

2021语言与智能技术竞赛：机器阅读理解任务

Production First and Production Ready End-to-End Keyword Spotting Toolkit

Use AutoModelForSeq2SeqLM in Huggingface Transformers to train COMET

Knowledge Management for Humans using Machine Learning & Tags

Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms

Code repository for "It's About Time: Analog clock Reading in the Wild"

:hot_pepper: R²SQL: "Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing." (AAAI 2021)

A framework for evaluating Knowledge Graph Embedding Models in a fine-grained manner.

Bu Chatbot, Konya Bilim Merkezi Yen için tasarlanmış olan bir projedir.