debajyotidatta/RecurrentArchitectures

Why this? What is the goal?

The goal of this repository is to write all the recurrent architectures from scratch in TensorFlow for learning purposes. This is a work in progress: I plan to implement more architectures and publish the results and performance for all of them. The inspiration was the last paragraph of the post Understanding LSTMs, where Chris Olah mentions two papers that studied recurrent architectures extensively. I wanted to implement all the architectures in those two papers. A short Google search showed that Jim Fleming had already done half the work here, so I decided to implement the remaining architectures from Jozefowicz's paper. (I also updated parts of his code so that all the architectures work in the newest version of TensorFlow.) Both papers are fantastic and worth a read. Feel free to send me a pull request if you spot an error or find other papers with recurrent architecture variants; I will implement them as time permits. All the implementations are in TensorFlow (0.12).

Deep Learning Recurrent Architectures

I started this mainly because I wanted to learn how various recurrent neural network architectures are actually implemented, and to implement them from scratch without using the predefined LSTM, GRU, etc. cells. This repository is a direct fork of LSTM Network Variants, with code changes so that it runs on the most recent version of TensorFlow (0.12.0 as of this writing). I will keep this repository up to date with new changes.

This repo also includes more network architectures from Empirical Exploration of Recurrent Network Architectures.

The implementations are not optimal. The actual implementations of the LSTM, GRU and RNN cells concatenate the state and the input before multiplying, which reduces the number of matrix multiplications, whereas the code here implements each network directly, the way you would see it written in a textbook.
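To make the difference concrete, here is a minimal sketch of one LSTM step written both ways. It is not code from this repository: it uses NumPy rather than TensorFlow so that it runs on its own, and the function names, the gate labels (`i`, `f`, `o`, `g`) and the weight layout are assumptions chosen for illustration. The textbook version does eight small matrix multiplications per step, while the fused version concatenates the input and the state and does one.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step_textbook(x, h, c, W, U, b):
    """One LSTM step with a separate input and recurrent matmul per gate.

    Hypothetical layout: W[k] is (input_dim, hidden_dim),
    U[k] is (hidden_dim, hidden_dim), b[k] is (hidden_dim,).
    """
    i = sigmoid(x @ W["i"] + h @ U["i"] + b["i"])  # input gate
    f = sigmoid(x @ W["f"] + h @ U["f"] + b["f"])  # forget gate
    o = sigmoid(x @ W["o"] + h @ U["o"] + b["o"])  # output gate
    g = np.tanh(x @ W["g"] + h @ U["g"] + b["g"])  # candidate cell state
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new


def lstm_step_fused(x, h, c, W, U, b):
    """Same step, but [x, h] is concatenated and multiplied once."""
    gates = "ifog"
    # Stack W over U for each gate, then place the four gates side by side.
    # A real implementation would store this kernel as a single variable
    # instead of rebuilding it on every step.
    kernel = np.concatenate(
        [np.concatenate([W[k], U[k]], axis=0) for k in gates], axis=1
    )
    bias = np.concatenate([b[k] for k in gates])
    z = np.concatenate([x, h], axis=1) @ kernel + bias  # single matmul
    i, f, o, g = np.split(z, 4, axis=1)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new


# Small random example (sizes are arbitrary).
rng = np.random.default_rng(0)
batch, input_dim, hidden_dim = 3, 5, 4
W = {k: rng.normal(size=(input_dim, hidden_dim)) for k in "ifog"}
U = {k: rng.normal(size=(hidden_dim, hidden_dim)) for k in "ifog"}
b = {k: rng.normal(size=hidden_dim) for k in "ifog"}
x = rng.normal(size=(batch, input_dim))
h = rng.normal(size=(batch, hidden_dim))
c = rng.normal(size=(batch, hidden_dim))

h1, c1 = lstm_step_textbook(x, h, c, W, U, b)
h2, c2 = lstm_step_fused(x, h, c, W, U, b)
print("same hidden state:", np.allclose(h1, h2))
print("same cell state:", np.allclose(c1, c2))
```

Both functions compute the same update; the fused form only rearranges the weights so the work is done in one larger multiplication, which is usually faster on a GPU. The textbook form is easier to read against the equations, which is the trade-off this repository makes.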

Other tutorials that are also helpful

Recurrent Architectures Implemented

Architectures marked with a (*) were implemented in LSTM Network Variants; the rest were implemented by me based on Empirical Exploration of Recurrent Network Architectures. The architectures that I implemented follow the conventions and syntax of that paper.

Instructions

See the Jupyter notebook here: https://github.com/debajyotidatta/RecurrentArchitectures/blob/master/Empirical%20Exploration%20of%20Recurrent%20Network%20Architectures.ipynb