Assessing the Influence of Models on the Performance of Reinforcement Learning Algorithms applied on Continuous Control Tasks

Overview

Assessing the Influence of Models on the Performance of Reinforcement Learning Algorithms applied on Continuous Control Tasks

This is the master thesis project by Giacomo Arcieri, written at the FZI Research Center for Information Technology (Karlsruhe, Germany).

Introduction

Model-Based Reinforcement Learning (MBRL) has recently become popular as it is expected to solve RL problems with fewer trials (i.e. higher sample efficiency) than model-free methods. However, it is not clear how much of the recent MBRL progress is due to improved algorithms or due to improved models. Hence, this work compares a set of mathematical methods that are commonly used as models for MBRL. This thesis aims to provide a benchmark to assess the model influence on RL algorithms. The evaluated models will be (deterministic) Neural Networks (NNs), ensembles of (deterministic) NNs, Bayesian Neural Networks (BNNs), and Gaussian Processes (GPs). Two different and innovative BNNs are applied: the Concrete Dropout NN and the Anchored Ensembling. The model performance is assessed on a large suite of different benchmarking environments, namely one OpenAI Gym Classic Control problem (Pendulum) and seven PyBullet-Gym tasks (MuJoCo implementation). The RL algorithm the model performance is assessed on is Model Predictive Control (MPC) combined with Random Shooting (RS).

Requirements

This project is tested on Python 3.6.

First, you can perform a minimal installation of OpenAI Gym with

git clone https://github.com/openai/gym.git
cd gym
pip install -e .

Then, you can install Pybullet-Gym with

git clone https://github.com/benelot/pybullet-gym.git
cd pybullet-gym
pip install -e .

Important: Do not use python setup.py install or other Pybullet-Gym installation methods.

Finally, you can install all the dependencies with

pip install -r requirements.txt

Important: There are a couple of changes to make in two Pybullet-Gym envs:

  1. There is currently a mistake in Hopper. This project uses HopperMuJoCoEnv-v0, but this env imports the Roboschool locomotor instead of the MuJoCo locomotor. Open the file
pybullet-gym/pybulletgym/envs/mujoco/envs/locomotion/hopper_env.py

and change

from pybulletgym.envs.roboschool.robots.locomotors import Hopper

with

from pybulletgym.envs.mujoco.robots.locomotors.hopper import Hopper
  1. Ant has obs_dim=111 but only the first 27 obs are important, the others are only zeros. If it is true that these zeros do not affect performance, it is also true they slow down the training, especially for the Gaussian Process. Therefore, it is better to delete these unimportant obs. Open the file
pybullet-gym/pybulletgym/envs/mujoco/robots/locomotors/ant.py

and set obs_dim=27 and comment or delete line 25

np.clip(cfrc_ext, -1, 1).flat

Project Description

Models

The models are defined in the folder models:

  • deterministicNN.py: it includes the deterministic NN (NN) and the deterministic ensemble (ens_NNs).

  • PNN.py: here the Anchored Ensembling is defined following this example. PNN defines one NN of the Anchored Ensembling. This is needed to define ens_PNNs which is the Anchored Ensembling as well as the model applied in the evaluation.

  • ConcreteDropout.py: it defines the Concrete Dropout NN, mainly based on the Yarin Gal's notebook, but also on this other project. First, the ConcreteDropout Layer is defined. Then, the Concrete Dropout NN is designed (BNN). Finally, also an ensemble of Concrete Dropout NNs is defined (ens_BNN), but I did not use it in the model comparison (ens_BNN is extremely slow and BNN is already like an ensemble).

  • GP.py: it defines the Gaussian Process model based on gpflow. Two different versions are applied: the GPR and the SVGP (choose by setting the parameter gp_model). Only the GPR performance is reported in the evaluation because the SVGP has not even solved the Pendulum environment.

RL algorithm

The model performance is evaluated in the following files:

  1. main.py: it is defined the function main which takes all the params that are passed to MB_trainer. Five MB_trainer are initialized, each with a different seed, which are run in parallel. It is also possible to run two models in parallel by setting the param model2 as well.

  2. MB_trainer.py: it includes the initialization of the env and the model as well as the RL training loop. The function play_one_step computes one step of the loop. The model is trained with the function training_step. At the end of the loop, a pickle file is saved, wich includes all the rewards achieved by the model in all the episodes of the env.

  3. play_one_step.py: it includes all the functions to compute one step (i.e. to choose one action): the epsilon greedy policy for the exploration, the Information Gain exploration, and the exploitation of the model with MPC+RS (function get_action). The rewards as well as the RS trajectories are computed with the cost functions in cost_functions.py.

  4. training_step.py: first the relevant information is prepared by the function data_training, then the model is trained with the function training_step.

  5. cost_functions.py: it includes all the cost functions of the envs.

Other two files are contained in the folder rewards:

  • plot_rewards.ipynb: it is the notebook where the model performance is plotted. First, the 5 pickles associated with the 5 seeds are combined in only one pickle. Then, the performance is evaluated with various plots.

  • distribution.ipynb: this notebook inspects the distribution of the seeds in InvertedDoublePendulum (Section 6.9 of the thesis).

Results

Our results show significant differences among models performance do exist.

It is the Concrete Dropout NN the clear winner of the model comparison. It reported higher sample efficiency, overall performance and robustness across different seeds in Pendulum, InvertedPendulum, InvertedDoublePendulum, ReacherPyBullet, HalfCheetah, and Hopper. In Walker2D and Ant it was no worse than the others either.

Authors should be aware of the differences found and distinguish between improvements due to better algorithms or due to better models when they present novel methods.

The figures of the evaluation are reported in the folder rewards/images.

Acknowledgment

Special thanks go to the supervisor of this project David Woelfle.

Owner
Giacomo Arcieri
Giacomo Arcieri
OpenDILab RL Kubernetes Custom Resource and Operator Lib

DI Orchestrator DI Orchestrator is designed to manage DI (Decision Intelligence) jobs using Kubernetes Custom Resource and Operator. Prerequisites A w

OpenDILab 205 Dec 29, 2022
Official code for our ICCV paper: "From Continuity to Editability: Inverting GANs with Consecutive Images"

GANInversion_with_ConsecutiveImgs Official code for our ICCV paper: "From Continuity to Editability: Inverting GANs with Consecutive Images" https://a

QingyangXu 38 Dec 07, 2022
Self-training with Weak Supervision (NAACL 2021)

This repo holds the code for our weak supervision framework, ASTRA, described in our NAACL 2021 paper: "Self-Training with Weak Supervision"

Microsoft 148 Nov 20, 2022
EZ graph is an easy to use AI solution that allows you to make and train your neural networks without a single line of code.

EZ-Graph EZ Graph is a GUI that allows users to make and train neural networks without writing a single line of code. Requirements python 3 pandas num

1 Jul 03, 2022
Earth Vision Foundation

EVer - A Library for Earth Vision Researcher EVer is a Pytorch-based Python library to simplify the training and inference of the deep learning model.

Zhuo Zheng 34 Nov 26, 2022
DeepMind's software stack for physics-based simulation and Reinforcement Learning environments, using MuJoCo.

dm_control: DeepMind Infrastructure for Physics-Based Simulation. DeepMind's software stack for physics-based simulation and Reinforcement Learning en

DeepMind 3k Dec 31, 2022
Utilities and information for the signals.numer.ai tournament

dsignals Utilities and information for the signals.numer.ai tournament using eodhistoricaldata.com eodhistoricaldata.com provides excellent historical

Degerhan Usluel 23 Dec 18, 2022
[SIGIR22] Official PyTorch implementation for "CORE: Simple and Effective Session-based Recommendation within Consistent Representation Space".

CORE This is the official PyTorch implementation for the paper: Yupeng Hou, Binbin Hu, Zhiqiang Zhang, Wayne Xin Zhao. CORE: Simple and Effective Sess

RUCAIBox 26 Dec 19, 2022
Optimized Gillespie algorithm for simulating Stochastic sPAtial models of Cancer Evolution (OG-SPACE)

OG-SPACE Introduction Optimized Gillespie algorithm for simulating Stochastic sPAtial models of Cancer Evolution (OG-SPACE) is a computational framewo

Data and Computational Biology Group UNIMIB (was BI*oinformatics MI*lan B*icocca) 0 Nov 17, 2021
My course projects for the 2021 Spring Machine Learning course at the National Taiwan University (NTU)

ML2021Spring There are my projects for the 2021 Spring Machine Learning course at the National Taiwan University (NTU) Course Web : https://speech.ee.

Ding-Li Chen 15 Aug 29, 2022
Pytorch implementation of Straight Sampling Network For Point Cloud Learning (ICIP2021).

Pytorch code for SS-Net This is a pytorch implementation of Straight Sampling Network For Point Cloud Learning (ICIP2021). Environment Code is tested

Sun Ran 1 May 18, 2022
Accelerated NLP pipelines for fast inference on CPU and GPU. Built with Transformers, Optimum and ONNX Runtime.

Optimum Transformers Accelerated NLP pipelines for fast inference 🚀 on CPU and GPU. Built with 🤗 Transformers, Optimum and ONNX runtime. Installatio

Aleksey Korshuk 115 Dec 16, 2022
NovelD: A Simple yet Effective Exploration Criterion

NovelD: A Simple yet Effective Exploration Criterion Intro This is an implementation of the method proposed in NovelD: A Simple yet Effective Explorat

29 Dec 05, 2022
GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training @ KDD 2020

GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training Original implementation for paper GCC: Graph Contrastive Coding for Graph Neural N

THUDM 274 Dec 27, 2022
Statistical-Rethinking-with-Python-and-PyMC3 - Python/PyMC3 port of the examples in " Statistical Rethinking A Bayesian Course with Examples in R and Stan" by Richard McElreath

Statistical Rethinking with Python and PyMC3 This repository has been deprecated in favour of this one, please check that repository for updates, for

Osvaldo Martin 786 Dec 29, 2022
Official code repository for Continual Learning In Environments With Polynomial Mixing Times

Official code for Continual Learning In Environments With Polynomial Mixing Times Continual Learning in Environments with Polynomial Mixing Times This

Sharath Raparthy 1 Dec 19, 2021
High level network definitions with pre-trained weights in TensorFlow

TensorNets High level network definitions with pre-trained weights in TensorFlow (tested with 2.1.0 = TF = 1.4.0). Guiding principles Applicability.

Taehoon Lee 1k Dec 13, 2022
A python library for implementing a recommender system

python-recsys A python library for implementing a recommender system. Installation Dependencies python-recsys is build on top of Divisi2, with csc-pys

Oscar Celma 1.5k Dec 17, 2022
This is an (re-)implementation of DeepLab-ResNet in TensorFlow for semantic image segmentation on the PASCAL VOC dataset.

DeepLab-ResNet-TensorFlow This is an (re-)implementation of DeepLab-ResNet in TensorFlow for semantic image segmentation on the PASCAL VOC dataset. Up

19 Jan 16, 2022
MogFace: Towards a Deeper Appreciation on Face Detection

MogFace: Towards a Deeper Appreciation on Face Detection Introduction In this repo, we propose a promising face detector, termed as MogFace. Our MogFa

48 Dec 20, 2022