Policy Gradient Algorithms (One Step Actor Critic & PPO) from scratch using NumPy

Last update: Jan 17, 2022

Overview

Policy Gradient Algorithms From Scratch (NumPy)

This repository showcases two policy gradient algorithms, One Step Actor Critic and Proximal Policy Optimization (PPO), applied to two MDPs: Gridworld and Mountain Car. Both algorithms are implemented from scratch in NumPy, using linear regression for the value function and a single-layer softmax for the policy.
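As a rough illustration of the setup described above, the following sketch pairs a linear value function with a single-layer softmax policy and applies one step of the textbook one-step actor-critic update. It is not the repository's code: the names `LinearValue`, `SoftmaxPolicy`, and `actor_critic_step`, the feature vectors, and the step sizes are all hypothetical and chosen only for this example.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of action preferences."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


class LinearValue:
    """Hypothetical linear value function: v(x) = w . x."""

    def __init__(self, n_features):
        self.w = np.zeros(n_features)

    def predict(self, x):
        return float(self.w @ x)


class SoftmaxPolicy:
    """Hypothetical single-layer softmax policy: pi(.|x) = softmax(theta @ x)."""

    def __init__(self, n_features, n_actions):
        self.theta = np.zeros((n_actions, n_features))

    def probs(self, x):
        return softmax(self.theta @ x)

    def grad_log_prob(self, x, a):
        # Gradient of log pi(a|x) w.r.t. theta: (one_hot(a) - pi(.|x)) outer x
        p = self.probs(x)
        one_hot = np.zeros_like(p)
        one_hot[a] = 1.0
        return np.outer(one_hot - p, x)


def actor_critic_step(policy, value, x, a, r, x_next, done,
                      gamma=0.99, alpha_w=0.1, alpha_theta=0.1):
    """One transition of one-step actor-critic (step sizes are assumptions)."""
    # TD error: bootstrap from the next state unless the episode has ended
    target = r + (0.0 if done else gamma * value.predict(x_next))
    delta = target - value.predict(x)
    # Policy gradient is evaluated at the current parameters
    grad = policy.grad_log_prob(x, a)
    value.w += alpha_w * delta * x          # critic update
    policy.theta += alpha_theta * delta * grad  # actor update
    return delta


# Usage: a single terminal transition with reward 1 for action 0
policy = SoftmaxPolicy(n_features=2, n_actions=2)
value = LinearValue(n_features=2)
x = np.array([1.0, 0.0])
delta = actor_critic_step(policy, value, x, a=0, r=1.0, x_next=x, done=True)
```

After this single update the rewarded action becomes slightly more probable in state `x`, and the value estimate for `x` moves toward the observed reward.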

Run Instructions

Packages:

numpy and matplotlib

Create a virtual environment, install the requirements, and run the experiments (Windows instructions):

Run python -m venv venv
Run .\venv\Scripts\activate
Run pip install -r requirements.txt
Run python .\experiments.py. Be wary of long compute times; plots will pop up during the run, and each must be closed for the script to continue.

Some Sample Plots

Files

experiments.py - Runs the preprogrammed experiments, which display various plots and also save them to .png files.
mdp.py - Contains the two MDP domains that the experiments are run on: Gridworld and Mountain Car.
models.py - Contains ValueFunction and Policy, the two models (linear layers) that the algorithms use for function approximation.
policy_gradient_algorithms.py - Contains the two policy gradient algorithms: One Step Actor Critic and Proximal Policy Optimization (PPO).
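For readers unfamiliar with PPO, the sketch below computes the standard clipped surrogate objective in NumPy. It only illustrates the general technique and is not taken from policy_gradient_algorithms.py; the function name `ppo_clip_objective`, its arguments, and the clipping value `eps=0.2` are assumptions made for this example.

```python
import numpy as np


def ppo_clip_objective(new_probs, old_probs, advantages, eps=0.2):
    """Mean clipped surrogate objective over a batch (hypothetical helper).

    new_probs / old_probs: probabilities of the taken actions under the
    current policy and the policy that collected the data.
    advantages: advantage estimates for the same transitions.
    """
    ratio = new_probs / old_probs
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Taking the element-wise minimum removes any incentive to move the
    # ratio outside [1 - eps, 1 + eps] in the direction that helps.
    return float(np.mean(np.minimum(unclipped, clipped)))


# Usage: two transitions, one with positive and one with negative advantage
objective = ppo_clip_objective(
    new_probs=np.array([0.6, 0.2]),
    old_probs=np.array([0.4, 0.4]),
    advantages=np.array([1.0, -1.0]),
)
```

In this example both ratios (1.5 and 0.5) fall outside the clipping range, so the clipped terms are the ones that contribute to the objective.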

MIT License
