Neural machine translation between the writings of Shakespeare and modern English using TensorFlow

Overview

Shakespeare translations using TensorFlow

This is an example of using the new Google's TensorFlow library on monolingual translation going from modern English to Shakespeare based on research from Wei Xu.

Prepare

First download the TensorFlow library depending on your platform:

pip install https://storage.googleapis.com/tensorflow/mac/tensorflow-0.5.0-py2-none-any.whl # for mac
pip install https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.5.0-cp27-none-linux_x86_64.whl # for ubuntu
  1. Grabs parallel data.
  2. Gets train, dev split.
  3. Builds vocabulary
  4. Converts parallel data into ids

From the root directory:

python -m tensorshake.get_data
python -m tensorshake.prepare_corpus

Delete /cache to start anew.

Train

Use the example BASH script to train the model. This saves the check points in the --train_dir directory. If you run it again, the training process continues from the check point. To restart with fresh parameters, simply delete/rename the check points.

./run.sh

Results

Benchmarks from original paper. (Shakespeare -> Modern English)

Input Output
i will bite thee by the ear for that jest . i ’ ll bite you by the ear for that joke .
what further woe conspires against mine age ? what ’ s true despair conspires against my old age ?
how doth my lady ? how is my lady ?
hast thou slain tybalt ? have you killed tybalt ?
an i might live to see thee married once , i have my wish . if i could live to see you married, i ’ ve my wish .
benvolio , who began this bloody fray ? benvolio , who started this bloody fight itself ?
what is your will ? what do you want ?
call her forth to me . bring her out to me .

Cherrypicked examples from this repo (Modern English -> Shakespeare)

Input Output
but you’re not listening to me. but you do not hear me .
Gregory, on my word, we will not be humiliated, like carrying coal. regory , we 'll not carry coals .
but he got the promotion. he is the friend .
i can hit quickly, if i'm motivated. i strike , i am moved .
Did you just give us the finger, sir? have you leave the thumb , sir ?
You don’t know what you’re doing! you do not what you know you .
have you killed Tybalt? hast thou slain tybalt ?
Why, Romeo, are you crazy? why , art thou mad , mad ?

Pre-Trained Models

Here is a link for an example model: https://s3-us-west-2.amazonaws.com/foxtype-nlp/tensorshake/model_cache.zip

Possible improvements

  • word embeddings
  • beam search
  • language model reranking
Owner
Motoki Wu
AI @cresta. Formerly @chegg, @writelab, @foxtype, @lingonautics
Motoki Wu
PHOTONAI is a high level python API for designing and optimizing machine learning pipelines.

PHOTONAI is a high level python API for designing and optimizing machine learning pipelines. We've created a system in which you can easily select and

Medical Machine Learning Lab - University of Münster 57 Nov 12, 2022
C3DPO - Canonical 3D Pose Networks for Non-rigid Structure From Motion.

C3DPO: Canonical 3D Pose Networks for Non-Rigid Structure From Motion By: David Novotny, Nikhila Ravi, Benjamin Graham, Natalia Neverova, Andrea Vedal

Meta Research 309 Dec 16, 2022
ShuttleNet: Position-aware Fusion of Rally Progress and Player Styles for Stroke Forecasting in Badminton (AAAI 2022)

ShuttleNet: Position-aware Rally Progress and Player Styles Fusion for Stroke Forecasting in Badminton (AAAI 2022) Official code of the paper ShuttleN

Wei-Yao Wang 11 Nov 30, 2022
retweet 4 satoshi ⚡️

rt4sat retweet 4 satoshi This bot is the codebase for https://twitter.com/rt4sat please feel free to create an issue if you saw any bugs basically thi

6 Sep 30, 2022
Fast and Context-Aware Framework for Space-Time Video Super-Resolution (VCIP 2021)

Fast and Context-Aware Framework for Space-Time Video Super-Resolution Preparation Dependencies PyTorch 1.2.0 CUDA 10.0 DCNv2 cd model/DCNv2 bash make

Xueheng Zhang 1 Mar 29, 2022
GitHub repository for the ICLR Computational Geometry & Topology Challenge 2021

ICLR Computational Geometry & Topology Challenge 2022 Welcome to the ICLR 2022 Computational Geometry & Topology challenge 2022 --- by the ICLR 2022 W

42 Dec 13, 2022
🦕 NanoSaur is a little tracked robot ROS2 enabled, made for an NVIDIA Jetson Nano

🦕 nanosaur NanoSaur is a little tracked robot ROS2 enabled, made for an NVIDIA Jetson Nano Website: nanosaur.ai Do you need an help? Discord For tech

NanoSaur 162 Dec 09, 2022
Only valid pull requests will be allowed. Use python only and readme changes will not be accepted.

❌ This repo is excluded from hacktoberfest This repo is for python beginners and contains lot of beginner python projects for practice. You can also s

Prajjwal Pathak 50 Dec 28, 2022
Voxel Transformer for 3D object detection

Voxel Transformer This is a reproduced repo of Voxel Transformer for 3D object detection. The code is mainly based on OpenPCDet. Introduction We provi

173 Dec 25, 2022
ParaGen is a PyTorch deep learning framework for parallel sequence generation

ParaGen is a PyTorch deep learning framework for parallel sequence generation. Apart from sequence generation, ParaGen also enhances various NLP tasks, including sequence-level classification, extrac

Bytedance Inc. 169 Dec 22, 2022
Dataset VSD4K includes 6 popular categories: game, sport, dance, vlog, interview and city.

CaFM-pytorch ICCV ACCEPT Introduction of dataset VSD4K Our dataset VSD4K includes 6 popular categories: game, sport, dance, vlog, interview and city.

96 Jul 05, 2022
Real-time Neural Representation Fusion for Robust Volumetric Mapping

NeuralBlox: Real-Time Neural Representation Fusion for Robust Volumetric Mapping Paper | Supplementary This repository contains the implementation of

ETHZ ASL 106 Dec 24, 2022
Python Single Object Tracking Evaluation

pysot-toolkit The purpose of this repo is to provide evaluation API of Current Single Object Tracking Dataset, including VOT2016 VOT2018 VOT2018-LT OT

348 Dec 22, 2022
Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners

DART Implementation for ICLR2022 paper Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners. Environment

ZJUNLP 83 Dec 27, 2022
MarcoPolo is a clustering-free approach to the exploration of bimodally expressed genes along with group information in single-cell RNA-seq data

MarcoPolo is a method to discover differentially expressed genes in single-cell RNA-seq data without depending on prior clustering Overview MarcoPolo

Chanwoo Kim 13 Dec 18, 2022
A transformer which can randomly augment VOC format dataset (both image and bbox) online.

VocAug It is difficult to find a script which can augment VOC-format dataset, especially the bbox. Or find a script needs complex requirements so it i

Coder.AN 1 Mar 05, 2022
Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory"

Memory Efficient Attention Pytorch Implementation of a memory efficient multi-head attention as proposed in the paper, Self-attention Does Not Need O(

Phil Wang 180 Jan 05, 2023
Graph WaveNet apdapted for brain connectivity analysis.

Graph WaveNet for brain network analysis This is the implementation of the Graph WaveNet model used in our manuscript: S. Wein , A. Schüller, A. M. To

4 Dec 17, 2022
[CVPR 2021] Anycost GANs for Interactive Image Synthesis and Editing

Anycost GAN video | paper | website Anycost GANs for Interactive Image Synthesis and Editing Ji Lin, Richard Zhang, Frieder Ganz, Song Han, Jun-Yan Zh

MIT HAN Lab 726 Dec 28, 2022
Python wrappers to the C++ library SymEngine, a fast C++ symbolic manipulation library.

SymEngine Python Wrappers Python wrappers to the C++ library SymEngine, a fast C++ symbolic manipulation library. Installation Pip See License section

136 Dec 28, 2022