Policy Gradient Algorithms From Scratch (NumPy)

This repository showcases two policy gradient algorithms, One Step Actor Critic and Proximal Policy Optimization (PPO), applied to two MDPs. The algorithms are implemented from scratch with NumPy and use a linear model for the value function and a single-layer softmax for the policy. The two MDPs are Gridworld and Mountain Car.
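As a rough illustration of what those two models look like, here is a minimal sketch of a linear value function and a single-layer softmax policy over a state feature vector. The class names and interfaces are assumptions for illustration; the actual code lives in models.py and may differ.

```python
import numpy as np

class LinearValueFunction:
    """Sketch: V(s) = w . phi(s), updated toward a supplied target."""
    def __init__(self, num_features, lr=0.01):
        self.w = np.zeros(num_features)
        self.lr = lr

    def value(self, features):
        return self.w @ features

    def update(self, features, target):
        # Move w toward the target value for this state (semi-gradient step).
        error = target - self.value(features)
        self.w += self.lr * error * features

class SoftmaxPolicy:
    """Sketch: pi(a|s) = softmax(theta @ phi(s)), one weight vector per action."""
    def __init__(self, num_features, num_actions, lr=0.01):
        self.theta = np.zeros((num_actions, num_features))
        self.lr = lr

    def probs(self, features):
        logits = self.theta @ features
        logits -= logits.max()          # subtract max for numerical stability
        exp = np.exp(logits)
        return exp / exp.sum()

    def sample(self, features):
        return np.random.choice(len(self.theta), p=self.probs(features))
```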

Run Instructions

Packages:

numpy and matplotlib

Create a virtual environment, install the requirements, and run (Windows instructions):

  1. Run python -m venv venv
  2. Run .\venv\Scripts\activate
  3. Run pip install -r requirements.txt
  4. Run python .\experiments.py. Be wary of long compute times, and note that plots will pop up and must be closed for execution to continue.

Some Sample Plots

Files

  • experiments.py - Runs pre-programmed experiments that display plots and save them as .png files.
  • mdp.py - Contains the two MDP domains, Gridworld and Mountain Car, that the experiments are run on.
  • models.py - Contains ValueFunction and Policy, the two linear models the algorithms use for function approximation.
  • policy_gradient_algorithms.py - Contains the policy gradient algorithms, One Step Actor Critic and Proximal Policy Optimization (PPO); see the sketch after this list.
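For orientation, the core updates these two algorithms perform look roughly like the sketch below. This is a minimal illustration, not the repository's actual implementation: the function names, hyperparameters (gamma, epsilon), and the policy/value interfaces are assumptions that follow the toy classes sketched earlier.

```python
import numpy as np

def actor_critic_step(policy, value_fn, phi_s, action, reward, phi_next, done, gamma=0.99):
    """One-step actor-critic sketch: the TD error drives both critic and actor."""
    td_target = reward + (0.0 if done else gamma * value_fn.value(phi_next))
    td_error = td_target - value_fn.value(phi_s)
    value_fn.update(phi_s, td_target)                     # critic update
    probs = policy.probs(phi_s)
    # grad log pi(a|s) for a linear softmax policy:
    # row b of the gradient is (1[b == a] - pi(b|s)) * phi(s)
    grad_log_pi = -np.outer(probs, phi_s)
    grad_log_pi[action] += phi_s
    policy.theta += policy.lr * td_error * grad_log_pi    # actor update

def ppo_clipped_objective(ratio, advantage, epsilon=0.2):
    """PPO clipped surrogate sketch: ratio = pi_new(a|s) / pi_old(a|s)."""
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon)
    return np.minimum(ratio * advantage, clipped * advantage).mean()
```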

MIT License
