Assessing the Influence of Models on the Performance of Reinforcement Learning Algorithms applied on Continuous Control Tasks

Overview

This is the master's thesis project of Giacomo Arcieri, written at the FZI Research Center for Information Technology (Karlsruhe, Germany).

Introduction

Model-Based Reinforcement Learning (MBRL) has recently become popular because it is expected to solve RL problems with fewer trials (i.e. with higher sample efficiency) than model-free methods. However, it is not clear how much of the recent MBRL progress is due to improved algorithms and how much is due to improved models. Hence, this work compares a set of mathematical methods that are commonly used as models for MBRL, with the aim of providing a benchmark to assess the influence of the model on RL algorithms. The evaluated models are (deterministic) Neural Networks (NNs), ensembles of (deterministic) NNs, Bayesian Neural Networks (BNNs), and Gaussian Processes (GPs). Two different and innovative BNNs are applied: the Concrete Dropout NN and Anchored Ensembling. Model performance is assessed on a suite of eight benchmarking environments, namely one OpenAI Gym Classic Control problem (Pendulum) and seven PyBullet-Gym tasks (MuJoCo implementation). The RL algorithm used to assess the models is Model Predictive Control (MPC) combined with Random Shooting (RS).

Requirements

This project is tested on Python 3.6.

First, you can perform a minimal installation of OpenAI Gym with

git clone https://github.com/openai/gym.git
cd gym
pip install -e .

Then, you can install Pybullet-Gym with

git clone https://github.com/benelot/pybullet-gym.git
cd pybullet-gym
pip install -e .

Important: Do not use python setup.py install or other Pybullet-Gym installation methods.

Finally, you can install all the dependencies with

pip install -r requirements.txt

Important: Two PyBullet-Gym envs need a small change before use:

  1. There is currently a mistake in Hopper. This project uses HopperMuJoCoEnv-v0, but this env imports the Roboschool locomotor instead of the MuJoCo locomotor. Open the file
pybullet-gym/pybulletgym/envs/mujoco/envs/locomotion/hopper_env.py

and replace

from pybulletgym.envs.roboschool.robots.locomotors import Hopper

with

from pybulletgym.envs.mujoco.robots.locomotors.hopper import Hopper

  2. Ant has obs_dim=111, but only the first 27 observations are informative; the others are always zero. Although these zeros do not affect performance, they slow down training, especially for the Gaussian Process. Therefore, it is better to remove them. Open the file
pybullet-gym/pybulletgym/envs/mujoco/robots/locomotors/ant.py

set obs_dim=27, and comment out or delete line 25:

np.clip(cfrc_ext, -1, 1).flat

Project Description

Models

The models are defined in the folder models:

  • deterministicNN.py: it includes the deterministic NN (NN) and the deterministic ensemble (ens_NNs).

  • PNN.py: it defines the Anchored Ensembling, following this example. PNN defines a single NN of the ensemble; it is the building block of ens_PNNs, which is the full Anchored Ensembling model used in the evaluation.

  • ConcreteDropout.py: it defines the Concrete Dropout NN, mainly based on Yarin Gal's notebook, but also on this other project. First, the ConcreteDropout layer is defined. Then, the Concrete Dropout NN is built (BNN). Finally, an ensemble of Concrete Dropout NNs is also defined (ens_BNN), but I did not use it in the model comparison (ens_BNN is extremely slow and BNN already behaves like an ensemble).

  • GP.py: it defines the Gaussian Process model based on gpflow. Two different versions are available: GPR and SVGP (chosen by setting the parameter gp_model). Only the GPR performance is reported in the evaluation because the SVGP did not even solve the Pendulum environment.
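
To make the ensemble idea concrete, here is a minimal sketch of how the predictions of several deterministic models could be aggregated into a mean and a spread. It is not taken from deterministicNN.py, which may aggregate its members differently; the function ensemble_predict and the toy linear members are hypothetical stand-ins for trained networks.

```python
import numpy as np

def ensemble_predict(members, x):
    """Aggregate the predictions of several deterministic models.

    members: list of callables mapping (batch, in_dim) -> (batch, out_dim).
    Returns the per-output mean and standard deviation across members.
    """
    preds = np.stack([m(x) for m in members], axis=0)  # (n_members, batch, out_dim)
    return preds.mean(axis=0), preds.std(axis=0)

# Hypothetical stand-ins for trained NNs: linear maps with perturbed weights.
rng = np.random.default_rng(0)
weights = [np.eye(3) + 0.05 * rng.standard_normal((3, 3)) for _ in range(5)]
members = [lambda x, W=W: x @ W for W in weights]

x = rng.standard_normal((4, 3))
mean, std = ensemble_predict(members, x)
print(mean.shape, std.shape)
```

The spread across members is one simple way to expose model disagreement, which a single deterministic NN cannot provide.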

RL algorithm

The model performance is evaluated in the following files:

  1. main.py: it defines the function main, which takes all the params that are passed to MB_trainer. Five MB_trainer instances are initialized, each with a different seed, and run in parallel. It is also possible to run two models in parallel by setting the param model2 as well.

  2. MB_trainer.py: it includes the initialization of the env and the model as well as the RL training loop. The function play_one_step computes one step of the loop. The model is trained with the function training_step. At the end of the loop, a pickle file is saved, which includes all the rewards achieved by the model in all the episodes of the env.

  3. play_one_step.py: it includes all the functions to compute one step (i.e. to choose one action): the epsilon greedy policy for the exploration, the Information Gain exploration, and the exploitation of the model with MPC+RS (function get_action). The rewards as well as the RS trajectories are computed with the cost functions in cost_functions.py.

  4. training_step.py: first the relevant information is prepared by the function data_training, then the model is trained with the function training_step.

  5. cost_functions.py: it includes all the cost functions of the envs.
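
The exploitation step (MPC+RS) can be illustrated with a minimal sketch. This is not the code from play_one_step.py: the function name random_shooting_action, its parameters, the toy one-dimensional dynamics and the toy cost are all assumptions made for illustration. The idea is to sample random action sequences, roll them out through the learned model, score them with the cost function, and return the first action of the cheapest sequence.

```python
import numpy as np

def random_shooting_action(model, cost_fn, state, action_low, action_high,
                           horizon=5, n_candidates=1000, rng=None):
    """Return the first action of the lowest-cost random action sequence."""
    rng = np.random.default_rng() if rng is None else rng
    action_dim = action_low.shape[0]
    # Candidate action sequences: (n_candidates, horizon, action_dim).
    actions = rng.uniform(action_low, action_high,
                          size=(n_candidates, horizon, action_dim))
    # Every candidate starts from the same current state.
    states = np.repeat(state[None, :], n_candidates, axis=0)
    total_cost = np.zeros(n_candidates)
    for t in range(horizon):
        total_cost += cost_fn(states, actions[:, t])
        states = model(states, actions[:, t])  # one-step model prediction
    return actions[np.argmin(total_cost), 0]

# Toy stand-ins (assumptions): position moves by the action, cost penalizes
# distance from the origin plus a small action penalty.
def toy_model(states, actions):
    return states + actions

def toy_cost(states, actions):
    return states[:, 0] ** 2 + 0.01 * actions[:, 0] ** 2

state = np.array([2.0])
action = random_shooting_action(
    toy_model, toy_cost, state,
    action_low=np.array([-1.0]), action_high=np.array([1.0]),
    rng=np.random.default_rng(0),
)
print(action)
```

In MPC only this first action is executed in the environment; planning is repeated from the next observed state.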

Two other files are contained in the folder rewards:

  • plot_rewards.ipynb: it is the notebook where the model performance is plotted. First, the 5 pickles associated with the 5 seeds are combined into a single pickle. Then, the performance is evaluated with various plots.

  • distribution.ipynb: this notebook inspects the distribution of the seeds in InvertedDoublePendulum (Section 6.9 of the thesis).
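
As a rough sketch of the seed-combination step, the snippet below stacks one reward list per seed into a single array and computes per-episode statistics. The pickle layout (a plain list of episode rewards), the file names and the helper combine_seed_rewards are assumptions; the actual notebook may store and combine the data differently. Toy pickles are written to a temporary directory so the example is self-contained.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np

def combine_seed_rewards(paths):
    """Load one reward list per seed and stack them into (n_seeds, n_episodes)."""
    runs = []
    for path in paths:
        with open(path, "rb") as f:
            runs.append(np.asarray(pickle.load(f), dtype=float))
    return np.stack(runs, axis=0)

with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for seed in range(5):
        # Toy rewards (assumption): 10 episodes per seed.
        rewards = [float(seed + episode) for episode in range(10)]
        path = Path(tmp) / f"rewards_seed{seed}.pkl"
        with open(path, "wb") as f:
            pickle.dump(rewards, f)
        paths.append(path)
    combined = combine_seed_rewards(paths)

# Mean and spread across seeds for every episode.
mean_per_episode = combined.mean(axis=0)
std_per_episode = combined.std(axis=0)
print(combined.shape)
```

Plotting the per-episode mean with a band of one standard deviation is a common way to show both performance and robustness across seeds.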

Results

Our results show that significant differences in model performance do exist.

The Concrete Dropout NN is the clear winner of the model comparison. It achieved higher sample efficiency, overall performance and robustness across different seeds in Pendulum, InvertedPendulum, InvertedDoublePendulum, ReacherPyBullet, HalfCheetah, and Hopper. In Walker2D and Ant it was no worse than the other models.

Authors should be aware of these differences and, when presenting novel methods, distinguish between improvements due to better algorithms and improvements due to better models.

The figures of the evaluation are reported in the folder rewards/images.

Acknowledgment

Special thanks go to the supervisor of this project, David Woelfle.
