Code for the paper: On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations

Related tags

Deep Learningnppac
Overview

Non-Parametric Prior Actor-Critic (N-PPAC)

This repository contains the code for

On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations, Tim G. J. Rudner*, Cong Lu*, Michael A. Osborne, Yarin Gal, Yee Whye Teh. Conference on Neural Information Processing Systems (NeurIPS), 2021.

Abstract: KL-regularized reinforcement learning from expert demonstrations has proved successful in improving the sample efficiency of deep reinforcement learning algorithms, allowing them to be applied to challenging physical real-world tasks. However, we show that KL-regularized reinforcement learning with behavioral policies derived from expert demonstrations suffers from hitherto unrecognized pathological behavior that can lead to slow, unstable, and suboptimal online training. We show empirically that the pathology occurs for commonly chosen behavioral policy classes and demonstrate its impact on sample efficiency and online policy performance. Finally, we show that the pathology can be remedied by specifying non-parametric behavioral policies and that doing so allows KL-regularized RL to significantly outperform state-of-the-art approaches on a variety of challenging locomotion and dexterous hand manipulation tasks.

View on OpenReview

In particular, the code implements:

  • Scripts for estimating behavioral reference policies for a range of model calsses, including non-parametric Gaussian processes, Bayesian neural networks trained via MC Dropout, deep ensembles, and Gaussian neural density models;
  • Scripts for KL-regularized online training that uses different bahevioral expert policies.

How to use this package

We provide a Docker setup which may be built as follows:

docker build -t torch-nppac .

To train the GP policies offline:

bash exp_scripts/paper_clone_gp.sh

To run online training (N-PPAC):

bash exp_scripts/paper_configs.sh

Pre-trained GP policies using final_clone_gp.sh are provided in the folder nppac/trained_gps/.

By default, all data will be stored in data/.

Reference

If you found this repository useful, please cite our paper as follows:

@inproceedings{
    rudner2021pathologies,
    title={On Pathologies in {KL}-Regularized Reinforcement Learning from Expert Demonstrations},
    author={Tim G. J. Rudner and Cong Lu and Michael A. Osborne and Yarin Gal and Yee Whye Teh},
    booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
    year={2021},
    url={https://openreview.net/forum?id=sS8rRmgAatA}
}

License

The repository is based on RLkit, which may contain further useful scripts. The license for this is contained under the rlkit/ folder.

Owner
Cong Lu
DPhil Student in Autonomous Intelligent Machines and Systems @ Oxford
Cong Lu
Predicting Tweet Sentiment Maching Learning and streamlit

Predicting-Tweet-Sentiment-Maching-Learning-and-streamlit (I prefere using Visual Studio Code ) Open the folder in VS Code Run the first cell in requi

1 Nov 20, 2021
Autoregressive Predictive Coding: An unsupervised autoregressive model for speech representation learning

Autoregressive Predictive Coding This repository contains the official implementation (in PyTorch) of Autoregressive Predictive Coding (APC) proposed

iamyuanchung 173 Dec 18, 2022
Official implementation for TTT++: When Does Self-supervised Test-time Training Fail or Thrive

TTT++ This is an official implementation for TTT++: When Does Self-supervised Test-time Training Fail or Thrive? TL;DR: Online Feature Alignment + Str

VITA lab at EPFL 39 Dec 25, 2022
Implementation of "Meta-rPPG: Remote Heart Rate Estimation Using a Transductive Meta-Learner"

Meta-rPPG: Remote Heart Rate Estimation Using a Transductive Meta-Learner This repository is the official implementation of Meta-rPPG: Remote Heart Ra

Eugene Lee 137 Dec 13, 2022
code for our ECCV 2020 paper "A Balanced and Uncertainty-aware Approach for Partial Domain Adaptation"

Code for our ECCV (2020) paper A Balanced and Uncertainty-aware Approach for Partial Domain Adaptation. Prerequisites: python == 3.6.8 pytorch ==1.1.0

32 Nov 27, 2022
ADGAN - The Implementation of paper Controllable Person Image Synthesis with Attribute-Decomposed GAN

ADGAN - The Implementation of paper Controllable Person Image Synthesis with Attribute-Decomposed GAN CVPR 2020 (Oral); Pose and Appearance Attributes Transfer;

Men Yifang 400 Dec 29, 2022
Collect some papers about transformer with vision. Awesome Transformer with Computer Vision (CV)

Awesome Visual-Transformer Collect some Transformer with Computer-Vision (CV) papers. If you find some overlooked papers, please open issues or pull r

dkliang 2.8k Jan 08, 2023
Constructing interpretable quadratic accuracy predictors to serve as an objective function for an IQCQP problem that represents NAS under latency constraints and solve it with efficient algorithms.

IQNAS: Interpretable Integer Quadratic programming Neural Architecture Search Realistic use of neural networks often requires adhering to multiple con

0 Oct 24, 2021
Neural machine translation between the writings of Shakespeare and modern English using TensorFlow

Shakespeare translations using TensorFlow This is an example of using the new Google's TensorFlow library on monolingual translation going from modern

Motoki Wu 245 Dec 28, 2022
Gesture Volume Control Using OpenCV and MediaPipe

This Project Uses OpenCV and MediaPipe Hand solutions to identify hands and Change system volume by taking thumb and index finger positions

Pratham Bhatnagar 6 Sep 12, 2022
Node-level Graph Regression with Deep Gaussian Process Models

Node-level Graph Regression with Deep Gaussian Process Models Prerequests our implementation is mainly based on tensorflow 1.x and gpflow 1.x: python

1 Jan 16, 2022
This library is a location of the LegacyLogger for PyTorch Lightning.

neptune-contrib Documentation See neptune-contrib documentation site Installation Get prerequisites python versions 3.5.6/3.6 are supported Install li

neptune.ai 26 Oct 07, 2021
PyTorch implementation of deep GRAph Contrastive rEpresentation learning (GRACE).

GRACE The official PyTorch implementation of deep GRAph Contrastive rEpresentation learning (GRACE). For a thorough resource collection of self-superv

Big Data and Multi-modal Computing Group, CRIPAC 186 Dec 27, 2022
The open source code of SA-UNet: Spatial Attention U-Net for Retinal Vessel Segmentation.

SA-UNet: Spatial Attention U-Net for Retinal Vessel Segmentation(ICPR 2020) Overview This code is for the paper: Spatial Attention U-Net for Retinal V

Changlu Guo 151 Dec 28, 2022
Final project code: Implementing BicycleGAN, for CIS680 FA21 at University of Pennsylvania

680 Final Project: BicycleGAN Haoran Tang Instructions 1. Training To train the network, please run train.py. Change hyper-parameters and folder paths

Haoran Tang 0 Apr 22, 2022
Phylogeny Partners

Phylogeny-Partners Two states models Instalation You may need to install the cython, networkx, numpy, scipy package: pip install cython, networkx, num

1 Sep 19, 2022
Progressive Coordinate Transforms for Monocular 3D Object Detection

Progressive Coordinate Transforms for Monocular 3D Object Detection This repository is the official implementation of PCT. Introduction In this paper,

58 Nov 06, 2022
Scalable, event-driven, deep-learning-friendly backtesting library

...Minimizing the mean square error on future experience. - Richard S. Sutton BTGym Scalable event-driven RL-friendly backtesting library. Build on

Andrew 922 Dec 27, 2022
Code for ICLR2018 paper: Improving GAN Training via Binarized Representation Entropy (BRE) Regularization - Y. Cao · W Ding · Y.C. Lui · R. Huang

code for "Improving GAN Training via Binarized Representation Entropy (BRE) Regularization" (ICLR2018 paper) paper: https://arxiv.org/abs/1805.03644 G

21 Oct 12, 2020
[CVPR'21] Multi-Modal Fusion Transformer for End-to-End Autonomous Driving

TransFuser This repository contains the code for the CVPR 2021 paper Multi-Modal Fusion Transformer for End-to-End Autonomous Driving. If you find our

695 Jan 05, 2023