
PBRL: A Population Based Reinforcement Learning Library based on PyTorch


Introduction

A small, fast, and reproducible implementation of reinforcement learning algorithms.
Supports OpenAI Gym (Atari, MuJoCo, and Box2D) as well as custom tasks.

In general, the default hyperparameters of each algorithm are consistent with those in the original paper. PBRL provides default training scripts in the ./examples folder; these scripts let you override an algorithm's hyperparameters via command-line arguments.

PBRL also provides a base class for PBT, so that developers can quickly implement asynchronous training by overriding the parent-process and child-process worker functions.

Installation

Ubuntu is recommended.

Make sure your Conda environment is activated before installing the following requirements:
PyTorch

git clone https://github.com/jjccero/pbrl.git
cd pbrl
pip install -e .

Examples

Train and evaluate CartPole agent:

python examples/quick_start.py

Try replacing CartPole above with MountainCar-v0 (hyperparameters taken from rl-baselines3-zoo):

cd examples/ppo
python train.py --obs_norm --reward_norm --adv_norm --gae_lambda 0.98 --repeat 4 --buffer_size 256 --env MountainCar-v0

MuJoCo:

PPG for Humanoid-v3

cd examples/ppg
python train.py --obs_norm --reward_norm

Use Population Based Training:
python pbt_train.py

PPO for Walker2d-v3

cd examples/ppo
python train.py --obs_norm --reward_norm --recompute_adv --lr_decay --subproc

TD3 and SAC for HalfCheetah-v3

cd examples/td3
python train.py
cd examples/sac
python train_sac2.py

Open a new terminal (./result will be automatically created when the training starts):
tensorboard --logdir result

Then you can view the training information by visiting http://localhost:6006/ in your browser.

Structure

Algorithm

PPO's Tricks

  • Orthogonal initialization
  • Learning rate decay
  • Generalized Advantage Estimation (GAE)
  • Observation Normalization and Reward Scaling (RunningMeanStd; sketched below)
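
For reference, a minimal sketch of a RunningMeanStd-style tracker, assuming the standard parallel-variance update (this illustrates the idea rather than reproducing the exact class in pbrl):

import numpy as np

class RunningMeanStd:
    """Tracks a running mean and variance, e.g. for observation normalization."""

    def __init__(self, shape=(), epsilon=1e-8):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = 0
        self.epsilon = epsilon

    def update(self, batch):
        # Merge batch statistics into the running statistics (parallel-variance formula).
        batch = np.asarray(batch, dtype=np.float64)
        batch_mean, batch_var, batch_count = batch.mean(0), batch.var(0), batch.shape[0]
        delta = batch_mean - self.mean
        total = self.count + batch_count
        new_var = (self.var * self.count + batch_var * batch_count
                   + delta ** 2 * self.count * batch_count / total) / total
        self.mean = self.mean + delta * batch_count / total
        self.var, self.count = new_var, total

    def normalize(self, x):
        return (x - self.mean) / np.sqrt(self.var + self.epsilon)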

Off-policy algorithms' Tricks

  • Infinite MDPs (done_real = done & (episode_steps < max_episode_steps); see the sketch after this list)
  • DistributionalReplayBuffer (a distributed experience replay buffer implemented in the pbrl.algorithms.dqn.buffer module, which allows some off-policy algorithms to collect samples through subprocesses)
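
A minimal sketch of the time-limit handling above (the function name and target computation are illustrative, not the exact pbrl code):

import numpy as np

def td_target(reward, next_q, done, episode_steps, max_episode_steps, gamma=0.99):
    # A 'done' raised only because the step limit was reached is a truncation, not a
    # true terminal state, so the bootstrap term from the next Q-value is kept.
    done_real = done & (episode_steps < max_episode_steps)
    return reward + gamma * np.where(done_real, 0.0, next_q)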

Population Based Training (PBT)

PBT implements communication between multiple processes by creating a pipe between the parent process and each child process. The child process's worker function is passed to the PBT constructor, along with any arguments the worker needs; the run() method of PBT then listens for commands from the child processes. It is strongly recommended to call seed() after constructing PBT().

PBT can be subclassed, with run() overridden to handle the logic of the corresponding worker functions; this means that some methods of PBT can simply be left unused if they are not needed. The two built-in population steps are listed below (see the sketch after the list):

  • select() Agents are ranked by mean episodic reward; agents at the bottom copy those at the top.
  • explore() Each hyperparameter is randomly perturbed by a factor of 1.2 or 0.8.
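
A minimal sketch of these two steps (the worker bookkeeping and the copy fraction are illustrative assumptions, not the exact pbrl implementation):

import random

def select(population):
    # Rank workers by mean episodic reward; workers at the bottom copy those at the top.
    ranked = sorted(population, key=lambda w: w['mean_reward'], reverse=True)
    k = max(1, len(ranked) // 4)  # illustrative choice: bottom quartile copies the top quartile
    for bottom, top in zip(ranked[-k:], ranked[:k]):
        bottom['weights'] = top['weights']
        bottom['hyperparams'] = dict(top['hyperparams'])

def explore(hyperparams):
    # Perturb each hyperparameter by a random factor of 1.2 or 0.8.
    return {name: value * random.choice((1.2, 0.8)) for name, value in hyperparams.items()}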

Custom Tasks

Each custom environment needs to implement the step() and reset() methods and define the corresponding action_space and observation_space. Before creating an environment through gym.make(), you need to register it in a module that can be imported.
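
As a structural illustration (a hypothetical toy environment; the module path my_envs and all names here are made up, see rnn.py below for the real example), a custom environment and its registration might look like this:

import gym
import numpy as np
from gym import spaces
from gym.envs.registration import register

class EchoEnv(gym.Env):
    """Toy environment: the agent is rewarded for repeating the observation it just saw."""

    def __init__(self):
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(1,), dtype=np.float32)
        self.action_space = spaces.Discrete(2)
        self._target = 0
        self._steps = 0

    def reset(self):
        self._target = np.random.randint(2)
        self._steps = 0
        return np.array([self._target], dtype=np.float32)

    def step(self, action):
        reward = float(action == self._target)
        self._target = np.random.randint(2)
        self._steps += 1
        done = self._steps >= 100
        return np.array([self._target], dtype=np.float32), reward, done, {}

# Register the environment in an importable module so gym.make('EchoTest-v0') can find it.
register(id='EchoTest-v0', entry_point='my_envs:EchoEnv')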

Refer to rnn.py for an example of customizing your own environment.

cd examples/ppo
python train.py --env RnnTest-v0 --chunk_len 8 --rnn gru --gamma 0.0 --lr 1e-3 --log_interval 2048
  • General (memoryless) RL algorithms achieve an average reward of 55.5.
  • Thanks to their internal state memory, RNN-based RL algorithms can reach the goal of 100.0.

2021, ICCD Lab, Dalian University of Technology. Author: Jingcheng Jiang.
