Deep Reinforcement Learning by using an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO)

Last update: Jun 06, 2022

Overview

V-MPO

Simple code demonstrating deep reinforcement learning with V-MPO, an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO), in PyTorch.

Getting Started

This project uses PyTorch as the deep learning framework and Gym for the reinforcement learning environments. Although it is not required, I recommend running it on a PC with a GPU and 8 GB of RAM.

Prerequisites

Make sure you have installed PyTorch and Gym.

Install Gym by following its official installation instructions.
Install PyTorch by following its official installation instructions.
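
As a quick sanity check before running the scripts, something like the following can confirm that both packages are importable. This is only a sketch and not part of the project: the helper name missing_packages is hypothetical, and it assumes the packages are imported under the names torch and gym.

```python
import importlib.util


def missing_packages(names=("torch", "gym")):
    """Return the top-level package names that cannot be found for import."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("PyTorch and Gym are both importable.")
```

The check only looks the packages up; it does not import them, so it runs quickly even when PyTorch is installed.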

Installing

Just clone this project into your work folder:

git clone https://github.com/wisnunugroho21/reinforcement_learning_v_mpo.git

Running the project

After cloning the project, run one of the following scripts in a terminal:

Discrete

python discrete.py

Continuous

python continous.py

On-Policy adaptation of Maximum a Posteriori Policy Optimization (MPO)

Some of the most successful applications of deep reinforcement learning to challenging domains in discrete and continuous control have used policy gradient methods in the on-policy setting. However, policy gradients can suffer from large variance that may limit performance, and in practice require carefully tuned entropy regularization to prevent policy collapse. As an alternative to policy gradient algorithms, we introduce V-MPO, an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO) that performs policy iteration based on a learned state-value function. We show that V-MPO surpasses previously reported scores for both the Atari-57 and DMLab-30 benchmark suites in the multi-task setting, and does so reliably without importance weighting, entropy regularization, or population-based tuning of hyperparameters. On individual DMLab and Atari levels, the proposed algorithm can achieve scores that are substantially higher than has previously been reported. V-MPO is also applicable to problems with high-dimensional, continuous action spaces, which we demonstrate in the context of learning to control simulated humanoids with 22 degrees of freedom from full state observations and 56 degrees of freedom from pixel observations, as well as example OpenAI Gym tasks where V-MPO achieves substantially higher asymptotic scores than previously reported.

You can read the full details of V-MPO in the paper.
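
To make the idea of policy iteration on a learned state-value function more concrete, here is a minimal sketch of the sample weighting used in the V-MPO policy update as I understand it from the paper: keep the top half of samples by advantage, weight them with a softmax over advantage divided by a temperature eta, and fit the policy to those weighted samples. It uses plain Python instead of PyTorch to stay dependency-free, the function names are hypothetical and not taken from this repository, and it leaves out the KL constraint on the policy that the full algorithm also applies.

```python
import math


def vmpo_weights(advantages, eta):
    """Return (indices, weights) for the top half of samples by advantage.

    Weights are a softmax of advantage / eta over the kept samples.
    """
    n_keep = max(1, len(advantages) // 2)
    order = sorted(range(len(advantages)), key=lambda i: advantages[i], reverse=True)
    top = order[:n_keep]
    # Subtract the largest advantage before exponentiating for numerical stability.
    max_adv = advantages[top[0]]
    exps = [math.exp((advantages[i] - max_adv) / eta) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]


def vmpo_policy_loss(log_probs, advantages, eta):
    """Weighted negative log-likelihood over the kept samples."""
    top, weights = vmpo_weights(advantages, eta)
    return -sum(w * log_probs[i] for i, w in zip(top, weights))


def vmpo_temperature_loss(advantages, eta, eps_eta):
    """Temperature loss: eta * eps_eta + eta * log(mean(exp(A / eta))) over kept samples."""
    top, _ = vmpo_weights(advantages, eta)
    max_adv = advantages[top[0]]
    mean_exp = sum(math.exp((advantages[i] - max_adv) / eta) for i in top) / len(top)
    # eta * log(mean(exp(A / eta))) == max_adv + eta * log(mean(exp((A - max_adv) / eta)))
    return eta * eps_eta + max_adv + eta * math.log(mean_exp)


if __name__ == "__main__":
    # Toy numbers chosen only for illustration.
    advantages = [1.0, -0.5, 2.0, 0.0]
    log_probs = [-0.7, -1.2, -0.3, -0.9]
    print(vmpo_weights(advantages, eta=1.0))
    print(vmpo_policy_loss(log_probs, advantages, eta=1.0))
    print(vmpo_temperature_loss(advantages, eta=1.0, eps_eta=0.01))
```

Because the weights depend only on advantages computed from the learned value function, no importance ratios between old and new policies appear in this update. In a PyTorch version the weights would typically be treated as constants (detached) when differentiating the policy loss.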

Owner

Nugroho Dewantoro
