Independent and minimal implementations of some reinforcement learning algorithms using PyTorch (including PPO, A3C, A2C, ...).

Last update: Dec 31, 2022

Overview

PyTorch RL Minimal Implementations

This repository contains implementations of several reinforcement learning algorithms, with the following characteristics:

Few dependencies: Only PyTorch and Gym need to be installed, for building neural networks and for testing the algorithms' performance, respectively.
Independent implementation: Each RL algorithm is implemented in its own file, which makes it easier to understand how it works and to modify it for other tasks.
Various extension options: It is convenient to configure parameters and tools such as reward normalization, advantage normalization, TensorBoard, and tqdm.
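
As an illustration of the reward normalization option mentioned above, here is a minimal sketch that keeps running statistics with Welford's algorithm. The class name `RunningNormalizer` and its methods are hypothetical and are not taken from the repository, which may normalize rewards differently.

```python
import math


class RunningNormalizer:
    """Hypothetical helper: tracks a running mean and variance of rewards."""

    def __init__(self, eps: float = 1e-8) -> None:
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.eps = eps

    def update(self, x: float) -> None:
        # Welford's online update of the mean and the squared deviations.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def normalize(self, x: float) -> float:
        # With fewer than two samples the variance is undefined, so only center.
        if self.count < 2:
            return x - self.mean
        std = math.sqrt(self.m2 / self.count)  # population standard deviation
        return (x - self.mean) / (std + self.eps)
```

A typical use would be to call `update(reward)` at every environment step and feed `normalize(reward)` to the learner instead of the raw reward.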

RL Algorithms List

Name	Type	Estimator	Paper	File
Q-Learning	Value-based / Off policy	TD	Watkins et al. Q-Learning. Machine Learning, 1992.	q_learning.py
REINFORCE	Policy-based / On policy	MC	Sutton et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation. In NeurIPS, 2000.	reinforce.py
DQN	Value-based / Off policy	TD	Mnih et al. Human-level control through deep reinforcement learning. Nature, 2015.	in progress
A2C	Actor-Critic / On policy	n-step TD	Mnih et al. Asynchronous Methods for Deep Reinforcement Learning. In ICML, 2016.	a2c.py
A3C	Actor-Critic / On policy	n-step TD	Mnih et al. Asynchronous Methods for Deep Reinforcement Learning. In ICML, 2016.	a3c.py
ACER	Actor-Critic / Off policy	GAE	Wang et al. Sample Efficient Actor-Critic with Experience Replay. In ICLR, 2017.	in progress
ACKTR	Actor-Critic / On policy	GAE	Wu et al. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation. In NeurIPS, 2017.	in progress
PPO	Actor-Critic / On policy	GAE	Schulman et al. Proximal Policy Optimization Algorithms. arXiv, 2017.	ppo.py
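
To make the Estimator column concrete, the following sketch computes Monte Carlo returns (MC), a one-step TD target (TD), and generalized advantage estimates (GAE) for a single trajectory. It uses plain Python lists to stay dependency-free; the function names and the `lam` and `last_value` parameters are assumptions for illustration and do not come from the repository. It also assumes the trajectory does not cross an episode boundary.

```python
from typing import List


def mc_returns(rewards: List[float], gamma: float) -> List[float]:
    """MC estimator: full discounted return G_t for every step."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def td_target(reward: float, next_value: float, gamma: float, done: bool) -> float:
    """One-step TD estimator: r + gamma * V(s'), without bootstrapping at episode end."""
    return reward + gamma * next_value * (0.0 if done else 1.0)


def gae_advantages(
    rewards: List[float],
    values: List[float],
    last_value: float,
    gamma: float,
    lam: float,
) -> List[float]:
    """GAE: exponentially weighted sum of one-step TD errors.

    values[t] is V(s_t); last_value is V(s_T), used to bootstrap
    (pass 0.0 if the episode terminated at step T).
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # TD error at step t
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages
```

Setting `lam=0` reduces GAE to the one-step TD error, while `lam=1` recovers the Monte Carlo return minus the value baseline, which is why GAE is often described as interpolating between the two.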

Quick Start

Requirements

pytorch
gym

tensorboard  # for summary writer
tqdm         # for progress bar

Abstract Agent

Components / Parameters

Component	Description
policy	neural network model
gamma	discount factor for the cumulative reward
lr	learning rate, e.g. `lr_actor`, `lr_critic`
lr_decay	decay rate used to schedule the learning rate
lr_scheduler	scheduler for the learning rate
coef_critic_loss	coefficient of critic loss
coef_entropy_loss	coefficient of entropy loss
writer	summary writer to record information
buffer	replay buffer to store historical trajectories
use_cuda	run on the GPU
clip_grad	enable gradient clipping
max_grad_norm	maximum gradient norm used for clipping
norm_advantage	enable advantage normalization
open_tb	enable the summary writer (TensorBoard)
open_tqdm	enable the progress bar (tqdm)
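
The sketch below shows what `norm_advantage` and the `clip_grad` / `max_grad_norm` pair typically do, using plain Python lists in place of tensors. The function names are hypothetical and the repository's code may differ. In PyTorch, clipping by global norm is available in place as `torch.nn.utils.clip_grad_norm_`.

```python
import math
from typing import List


def normalize_advantages(advantages: List[float], eps: float = 1e-8) -> List[float]:
    """Shift advantages to zero mean and scale them to unit standard deviation."""
    n = len(advantages)
    mean = sum(advantages) / n
    var = sum((a - mean) ** 2 for a in advantages) / n  # population variance
    std = math.sqrt(var)
    return [(a - mean) / (std + eps) for a in advantages]


def clip_by_global_norm(grads: List[float], max_grad_norm: float) -> List[float]:
    """Rescale gradients so that their L2 norm does not exceed max_grad_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_grad_norm:
        return list(grads)  # already within the limit, return an unchanged copy
    scale = max_grad_norm / total_norm
    return [g * scale for g in grads]
```

Note that this sketch uses the population standard deviation; a tensor-based version may use the sample standard deviation instead, which gives slightly different values for small batches.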

Methods

Method	Description
preprocess_obs()	preprocess the observation before feeding it into the neural network
select_action()	use the actor network to select an action based on the policy distribution
estimate_obs()	use the critic network to estimate the value of the observation
update()	update the parameters by calculating losses and gradients
train()	set the neural network to training mode
eval()	set the neural network to evaluation mode
save()	save the model parameters
load()	load the model parameters
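
The method table can be read as an interface. The following is a minimal, dependency-free sketch of such an interface with a toy subclass. The class names, constructor arguments, method signatures, and the JSON-based `save()` / `load()` are all assumptions for illustration; the repository's agents are built on PyTorch networks and will look different.

```python
import json
import random
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence


class AbstractAgent(ABC):
    """Hypothetical base class mirroring the methods listed above."""

    def __init__(self, gamma: float = 0.99, lr: float = 0.1) -> None:
        self.gamma = gamma
        self.lr = lr
        self.training = True
        self.params: Dict[str, float] = {}

    def preprocess_obs(self, obs: Sequence[float]) -> List[float]:
        # Convert the raw observation into a list of floats.
        return [float(x) for x in obs]

    @abstractmethod
    def select_action(self, obs: Sequence[float]) -> int:
        """Choose an action for the given observation."""

    @abstractmethod
    def estimate_obs(self, obs: Sequence[float]) -> float:
        """Estimate the value of the given observation."""

    @abstractmethod
    def update(self, returns: Sequence[float]) -> float:
        """Update the parameters and return the loss."""

    def train(self) -> None:
        self.training = True

    def eval(self) -> None:
        self.training = False

    def save(self, path: str) -> None:
        with open(path, "w") as f:
            json.dump(self.params, f)

    def load(self, path: str) -> None:
        with open(path) as f:
            self.params = json.load(f)


class RandomBaselineAgent(AbstractAgent):
    """Toy agent: random actions while training, a scalar baseline as 'critic'."""

    def __init__(self, n_actions: int, **kwargs) -> None:
        super().__init__(**kwargs)
        self.n_actions = n_actions
        self.params = {"baseline": 0.0}

    def select_action(self, obs: Sequence[float]) -> int:
        # Explore uniformly in training mode; act deterministically otherwise.
        return random.randrange(self.n_actions) if self.training else 0

    def estimate_obs(self, obs: Sequence[float]) -> float:
        return self.params["baseline"]

    def update(self, returns: Sequence[float]) -> float:
        # Move the baseline toward the mean return; the loss is the squared error.
        error = sum(returns) / len(returns) - self.params["baseline"]
        self.params["baseline"] += self.lr * error
        return error ** 2
```

Keeping `train()`, `eval()`, `save()`, and `load()` in the base class means a new algorithm only has to implement action selection, value estimation, and the update rule.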

Update & To-do & Limitations

Update History

2021-12-09 ADD TRICK: norm_critic_loss in PPO
2021-12-09 ADD PARAM: coef_critic_loss, coef_entropy_loss, log_step
2021-12-07 ADD ALGO: A3C
2021-12-05 ADD ALGO: PPO
2021-11-28 ADD ALGO: A2C
2021-11-20 ADD ALGO: Q-Learning, REINFORCE

Owner

Gemini Light