PyTorch implementations of deep reinforcement learning algorithms and environments

Last update: Jan 04, 2023

Overview

Deep Reinforcement Learning Algorithms with PyTorch

This repository contains PyTorch implementations of deep reinforcement learning algorithms and environments.

(To help you remember things you learn about machine learning in general, write them in Save All and try out the public deck there about Fast AI's machine learning textbook.)

Algorithms Implemented

Deep Q Learning (DQN) (Mnih et al. 2013)
DQN with Fixed Q Targets (Mnih et al. 2013)
Double DQN (DDQN) (Hado van Hasselt et al. 2015)
DDQN with Prioritised Experience Replay (Schaul et al. 2016)
Dueling DDQN (Wang et al. 2016)
REINFORCE (Williams 1992)
Deep Deterministic Policy Gradients (DDPG) (Lillicrap et al. 2016)
Twin Delayed Deep Deterministic Policy Gradients (TD3) (Fujimoto et al. 2018)
Soft Actor-Critic (SAC) (Haarnoja et al. 2018)
Soft Actor-Critic for Discrete Actions (SAC-Discrete) (Christodoulou 2019)
Asynchronous Advantage Actor Critic (A3C) (Mnih et al. 2016)
Synchronous Advantage Actor Critic (A2C)
Proximal Policy Optimisation (PPO) (Schulman et al. 2017)
DQN with Hindsight Experience Replay (DQN-HER) (Andrychowicz et al. 2018)
DDPG with Hindsight Experience Replay (DDPG-HER) (Andrychowicz et al. 2018)
Hierarchical-DQN (h-DQN) (Kulkarni et al. 2016)
Stochastic NNs for Hierarchical Reinforcement Learning (SNN-HRL) (Florensa et al. 2017)
Diversity Is All You Need (DIAYN) (Eysenbach et al. 2018)

Each implementation can quickly solve at least one of Cart Pole (discrete actions), Mountain Car Continuous (continuous actions), Bit Flipping (discrete actions with dynamic goals), or Fetch Reach (continuous actions with dynamic goals). I plan to add more hierarchical RL algorithms soon.

Environments Implemented

Bit Flipping Game (as described in Andrychowicz et al. 2018)
Four Rooms Game (as described in Sutton et al. 1998)
Long Corridor Game (as described in Kulkarni et al. 2016)
Ant-{Maze, Push, Fall} (as described in Nachum et al. 2018 and their accompanying code)
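
To make the first of these environments concrete, here is a minimal sketch of a bit-flipping game in plain Python. It is not the repository's implementation: the class name BitFlippingSketch, its method signatures, and the seed handling are all hypothetical, and it assumes the sparse reward from the paper (-1 on every step that does not reach the goal, 0 when it does).

```python
import random


class BitFlippingSketch:
    """Hypothetical toy version: flip one bit per step until state equals goal."""

    def __init__(self, num_bits=4, seed=0):
        self.num_bits = num_bits
        self.rng = random.Random(seed)
        self.state = [0] * num_bits
        self.goal = [0] * num_bits

    def reset(self):
        # Sample a random start state and a random goal of the same length.
        self.state = [self.rng.randint(0, 1) for _ in range(self.num_bits)]
        self.goal = [self.rng.randint(0, 1) for _ in range(self.num_bits)]
        return list(self.state), list(self.goal)

    def step(self, action):
        # The action is the index of the bit to flip.
        self.state[action] ^= 1
        done = self.state == self.goal
        # Sparse reward (assumed): -1 unless the goal has been reached.
        reward = 0 if done else -1
        return list(self.state), reward, done


env = BitFlippingSketch(num_bits=4)
state, goal = env.reset()
# An oracle policy: flip every bit that differs from the goal.
for i in range(env.num_bits):
    if state[i] != goal[i]:
        state, reward, done = env.step(i)
```

With many bits a random policy almost never sees a reward of 0, which is what makes the goal "dynamic" setting hard without hindsight.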

Results

1. Cart Pole and Mountain Car

The results below show various RL algorithms successfully learning the discrete-action game Cart Pole or the continuous-action game Mountain Car. Each curve is the mean over 3 random seeds, and the shaded area represents plus and minus 1 standard deviation. The hyperparameters used can be found in results/Cart_Pole.py and results/Mountain_Car.py.
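
As a rough illustration of how a mean curve and a one-standard-deviation band can be computed from several seeds, here is a small sketch using only the standard library. The function name mean_and_band and the sample scores are made up for illustration, and the use of the population standard deviation is an assumption; the repository's plotting code may differ.

```python
from statistics import mean, pstdev


def mean_and_band(runs):
    """runs: one list of per-episode scores per seed, all the same length.

    Returns three lists: the mean, mean - 1 std, and mean + 1 std per episode.
    """
    means, lower, upper = [], [], []
    # zip(*runs) groups the scores of all seeds for the same episode.
    for scores_at_episode in zip(*runs):
        m = mean(scores_at_episode)
        s = pstdev(scores_at_episode)  # population std (an assumption)
        means.append(m)
        lower.append(m - s)
        upper.append(m + s)
    return means, lower, upper


# Made-up scores for 3 seeds over 2 episodes.
runs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
means, lower, upper = mean_and_band(runs)
```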

2. Hindsight Experience Replay (HER) Experiments

The results below show the performance of DQN and DDPG with and without Hindsight Experience Replay (HER) in the Bit Flipping (14 bits) and Fetch Reach environments described in the papers Hindsight Experience Replay 2018 and Multi-Goal Reinforcement Learning 2018. They replicate the results found in the papers and show how adding HER can allow an agent to solve problems that it otherwise could not solve at all. Note that the same hyperparameters were used within each pair of agents, so the only difference between them is whether hindsight was used.
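
The core idea of hindsight can be sketched in a few lines: after an episode, replay its transitions as if the state the agent actually ended in had been the goal all along. The sketch below uses the simplest relabelling strategy (final state as the new goal) and assumes the sparse -1/0 reward; the function name and the transition layout are hypothetical, not the repository's replay buffer API.

```python
def relabel_with_final_state(episode):
    """episode: list of (state, action, reward, next_state, goal) tuples.

    Returns a copy in which the goal is replaced by the final achieved state
    and the reward is recomputed against that new goal.
    """
    achieved_goal = episode[-1][3]  # next_state of the last transition
    relabelled = []
    for state, action, _, next_state, _ in episode:
        # Assumed sparse reward: 0 if the new goal is reached, otherwise -1.
        reward = 0 if next_state == achieved_goal else -1
        relabelled.append((state, action, reward, next_state, achieved_goal))
    return relabelled


# A failed 3-bit episode: the goal (1, 1, 1) was never reached.
goal = (1, 1, 1)
episode = [
    ((0, 0, 0), 0, -1, (1, 0, 0), goal),
    ((1, 0, 0), 1, -1, (1, 1, 0), goal),
]
hindsight_episode = relabel_with_final_state(episode)
```

Storing both the original and the relabelled transitions gives the agent at least one rewarded example per episode, even when it never reaches the intended goal.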

3. Hierarchical Reinforcement Learning Experiments

The results on the left below show the performance of DQN and the hierarchical-DQN algorithm from Kulkarni et al. 2016 on the Long Corridor environment, which is also described in Kulkarni et al. 2016. The environment requires the agent to go to the end of a corridor before coming back in order to receive a larger reward. This delayed gratification and the aliasing of states make it a near-impossible game for DQN to learn, but if we introduce a meta-controller (as in h-DQN) that directs how a lower-level controller behaves, we are able to make more progress. This aligns with the results found in the paper.

The results on the right show the performance of DDQN and the algorithm Stochastic NNs for Hierarchical Reinforcement Learning (SNN-HRL) from Florensa et al. 2017. DDQN is used as the comparison because the implementation of SNN-HRL uses 2 DDQN algorithms within it. Note that the first 300 episodes of training for SNN-HRL were used for pre-training, which is why there is no reward for those episodes.
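
To show why the Long Corridor task involves delayed gratification, here is a rough sketch of such an environment. The specifics are assumptions of this sketch and should be checked against Kulkarni et al. 2016 and the repository's own environment: six states, a start in the second state, termination at the leftmost state, a reward of 1 if the far end was visited first and 0.01 otherwise, and a "right" action that only succeeds with some probability. The class name LongCorridorSketch is hypothetical.

```python
import random


class LongCorridorSketch:
    """Hypothetical corridor: reach the far end, then return to the start."""

    def __init__(self, num_states=6, p_right_success=0.5, seed=0):
        self.num_states = num_states
        self.p_right_success = p_right_success  # assumed stochasticity
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.position = 1  # second state, 0-indexed (assumed start)
        self.visited_far_end = False
        return self.position

    def step(self, action):
        # action 0 = left, action 1 = right; a failed "right" moves left.
        if action == 1 and self.rng.random() < self.p_right_success:
            self.position = min(self.position + 1, self.num_states - 1)
        else:
            self.position = max(self.position - 1, 0)
        if self.position == self.num_states - 1:
            self.visited_far_end = True
        done = self.position == 0
        reward = 0.0
        if done:
            # The large reward is only paid after the detour to the far end.
            reward = 1.0 if self.visited_far_end else 0.01
        return self.position, reward, done


env = LongCorridorSketch()
position = env.reset()
position, reward, done = env.step(0)  # going straight left ends the episode
```

The observation is only the current position, so the agent cannot tell from it whether the far end has already been visited; that is the state aliasing mentioned above.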

Usage

The repository's high-level structure is:

├── agents                    
    ├── actor_critic_agents   
    ├── DQN_agents         
    ├── policy_gradient_agents
    └── stochastic_policy_search_agents 
├── environments   
├── results             
    └── data_and_graphs        
├── tests
├── utilities             
    └── data structures

i) To watch the agents learn the above games

To watch all the different agents learn Cart Pole follow these steps:

git clone https://github.com/p-christ/Deep_RL_Implementations.git
cd Deep_RL_Implementations

conda create --name myenvname --yes
conda activate myenvname

pip3 install -r requirements.txt

python results/Cart_Pole.py

For other games, change the last line to one of the other files in the results folder.

ii) To train the agents on another game

Most OpenAI Gym environments should work. All you need to do is change the config.environment field (see results/Cart_Pole.py for an example).

You can also play with your own custom game if you create a separate class that inherits from gym.Env. See environments/Four_Rooms_Environment.py for an example of a custom environment, and then see the script results/Four_Rooms.py for how to have agents play the environment.
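
As a rough idea of what such a class looks like, here is a minimal sketch of a custom environment. The class name LineWalkEnv and its game are hypothetical and not part of the repository. It assumes the older gym interface in which reset() returns an observation and step() returns (observation, reward, done, info); newer gym releases use a different return signature. The import is wrapped so that the sketch still runs where gym is not installed.

```python
try:
    import gym
    from gym import spaces
except ImportError:  # fallback so the sketch runs without gym installed
    gym = None

_Base = gym.Env if gym is not None else object


class LineWalkEnv(_Base):
    """Hypothetical game: walk right along a line until the last cell."""

    def __init__(self, length=5):
        super().__init__()
        self.length = length
        self.position = 0
        if gym is not None:
            # Two actions (left, right); the observation is the cell index.
            self.action_space = spaces.Discrete(2)
            self.observation_space = spaces.Discrete(length)

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # action 1 moves right, any other action moves left.
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done, {}


env = LineWalkEnv(length=3)
observation = env.reset()
observation, reward, done, info = env.step(1)
```

An instance of a class like this would then be assigned to config.environment in place of a built-in gym environment.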

Owner

Petros Christodoulou
