Official PyTorch implementation of "Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble" (NeurIPS'21)

Overview

Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble

DOI License: MIT

This is the code for reproducing the results of the paper Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble accepted at NeurIPS'2021.

This code builds up from the offical code of Reset-Free Lifelong Learning with Skill-Space Planning, originally derived from rlkit.

If you find this repository useful for your research, please cite:

@inproceedings{
    an2021edac,
    title={Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble},
    author={Gaon An and Seungyong Moon and Jang-Hyun Kim and Hyun Oh Song},
    booktitle={Neural Information Processing Systems},
    year={2021}
}

Requirements

To install all the required dependencies:

  1. Install MuJoCo engine, which can be downloaded from here.

  2. Install Python packages listed in requirements.txt using pip. You should specify the versions of mujoco_py and dm_control in requirements.txt depending on the version of MuJoCo engine you have installed as follows:

    • MuJoCo 2.0: mujoco-py<2.1,>=2.0, dm_control==0.0.364896371
    • MuJoCo 2.1.0: mujoco-py<2.2,>=2.1, dm_control==0.0.403778684
    • MuJoCo 2.1.1: to be updated
  3. Manually download and install d4rl package from here. You should remove lines including dm_control in setup.py.

Here is an example of how to install all the dependencies on Ubuntu:

conda create -n edac python=3.7
conda activate edac
# Specify versions of mujoco-py and dm_control in requirements.txt
pip install --no-cache-dir -r requirements.txt

cd .
git clone https://github.com/rail-berkeley/d4rl.git

cd d4rl
# Remove lines including 'dm_control' in setup.py
pip install -e .

Reproducing the results

Gym

To reproduce SAC-N results for MuJoCo Gym, run:

python -m scripts.sac --env_name [ENVIRONMENT] --num_qs [N]

To reproduce EDAC results for MuJoCo Gym, run:

python -m scripts.sac --env_name [ENVIRONMENT] --num_qs [N] --eta [ETA]

Adroit

On Adroit tasks, we apply reward normalization for further training stability. For example, to reproduce the EDAC results for pen-human, run:

python -m scripts.sac --env_name pen-human-v1 --epoch 200 --num_qs 20 --plr 3e-5 --eta 1000 --reward_mean --reward_std

To reproduce the EDAC results for pen-cloned, run:

python -m scripts.sac --env_name pen-human-v1 --epoch 200 --num_qs 20 --plr 3e-5 --eta 10 --max_q_backup --reward_mean --reward_std

Acknowledgement

This work was supported in part by Samsung Advanced Institute of Technology, Samsung Electronics Co., Ltd., Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-00882, (SW STAR LAB) Development of deployable learning intelligence via self-sustainable and trustworthy machine learning and No. 2019-0-01371, Development of brain-inspired AI with human-like intelligence), and Research Resettlement Fund for the new faculty of Seoul National University. This material is based upon work supported by the Air Force Office of Scientific Research under award number FA2386-20-1-4043.

You might also like...
Official PyTorch implementation of Spatial Dependency Networks.
Official PyTorch implementation of Spatial Dependency Networks.

Spatial Dependency Networks: Neural Layers for Improved Generative Image Modeling Đorđe Miladinović   Aleksandar Stanić   Stefan Bauer   Jürgen Schmid

Official implementation of our CVPR2021 paper
Official implementation of our CVPR2021 paper "OTA: Optimal Transport Assignment for Object Detection" in Pytorch.

OTA: Optimal Transport Assignment for Object Detection This project provides an implementation for our CVPR2021 paper "OTA: Optimal Transport Assignme

This is the official PyTorch implementation of the paper
This is the official PyTorch implementation of the paper "TransFG: A Transformer Architecture for Fine-grained Recognition" (Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, Changhu Wang, Alan Yuille).

TransFG: A Transformer Architecture for Fine-grained Recognition Official PyTorch code for the paper: TransFG: A Transformer Architecture for Fine-gra

StyleGAN2-ADA - Official PyTorch implementation
StyleGAN2-ADA - Official PyTorch implementation

Need Help? If you’re new to StyleGAN2-ADA and looking to get started, please check out this video series from a course Lia Coleman and I taught in Oct

Official PyTorch implementation of
Official PyTorch implementation of "ArtFlow: Unbiased Image Style Transfer via Reversible Neural Flows"

ArtFlow Official PyTorch implementation of the paper: ArtFlow: Unbiased Image Style Transfer via Reversible Neural Flows Jie An*, Siyu Huang*, Yibing

Official PyTorch implementation of RobustNet (CVPR 2021 Oral)
Official PyTorch implementation of RobustNet (CVPR 2021 Oral)

RobustNet (CVPR 2021 Oral): Official Project Webpage Codes and pretrained models will be released soon. This repository provides the official PyTorch

Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Including examples for DETR, VQA.
Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Including examples for DETR, VQA.

PyTorch Implementation of Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers 1 Using Colab Please notic

[PyTorch] Official implementation of CVPR2021 paper
[PyTorch] Official implementation of CVPR2021 paper "PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency". https://arxiv.org/abs/2103.05465

PointDSC repository PyTorch implementation of PointDSC for CVPR'2021 paper "PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency",

Official PyTorch implementation of MX-Font (Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Experts)

Introduction Pytorch implementation of Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Expert. | paper Song Park1

Comments
  • Performance on D4RL AntMaze tasks?

    Performance on D4RL AntMaze tasks?

    Hello, I have a question about the performance of SAC-N or EDAC on AntMaze tasks. Have you ever tested it?

    In my experiments based on this official implementation, I found that average returns in evaluation are always 0, which is worse than behavior cloning. Then I try to run with a modified reward (r=4*(r-0.5)) and max Q backup. However, they didn't help.

    I'll appreciate it a lot if you give me some related advices. Thanks a lot.

    opened by yuxudong20 3
Releases(v1.0)
This repo holds code for TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation

TransUNet This repo holds code for TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation Usage

1.4k Jan 04, 2023
ISTR: End-to-End Instance Segmentation with Transformers (https://arxiv.org/abs/2105.00637)

This is the project page for the paper: ISTR: End-to-End Instance Segmentation via Transformers, Jie Hu, Liujuan Cao, Yao Lu, ShengChuan Zhang, Yan Wa

Jie Hu 182 Dec 19, 2022
Geometric Algebra package for JAX

JAXGA - JAX Geometric Algebra GitHub | Docs JAXGA is a Geometric Algebra package on top of JAX. It can handle high dimensional algebras by storing onl

Robin Kahlow 36 Dec 22, 2022
Use MATLAB to simulate the signal and extract features. Use PyTorch to build and train deep network to do spectrum sensing.

Deep-Learning-based-Spectrum-Sensing Use MATLAB to simulate the signal and extract features. Use PyTorch to build and train deep network to do spectru

10 Dec 14, 2022
Code to reproduce experiments in the paper "Explainability Requires Interactivity".

Explainability Requires Interactivity This repository contains the code to train all custom models used in the paper Explainability Requires Interacti

Digital Health & Machine Learning 5 Apr 07, 2022
PyTorch Implementation of ECCV 2020 Spotlight TuiGAN: Learning Versatile Image-to-Image Translation with Two Unpaired Images

TuiGAN-PyTorch Official PyTorch Implementation of "TuiGAN: Learning Versatile Image-to-Image Translation with Two Unpaired Images" (ECCV 2020 Spotligh

181 Dec 09, 2022
ICS 4u HD project, start before-wards. A curtain shooting game using python.

Touhou-Star-Salvation HDCH ICS 4u HD project, start before-wards. A curtain shooting game using python and pygame. By Jason Li For arts and gameplay,

15 Dec 22, 2022
The official PyTorch implementation for NCSNv2 (NeurIPS 2020)

Improved Techniques for Training Score-Based Generative Models This repo contains the official implementation for the paper Improved Techniques for Tr

174 Dec 26, 2022
An self sufficient AI that crawls the web to learn how to generate art from keywords

Roxx-IO - The Smart Artist AI! TO DO / IDEAS Implement Web-Scraping Functionality Figure out a less annoying (and an off button for it) text to speech

Tatz 5 Mar 21, 2022
An offline deep reinforcement learning library

d3rlpy: An offline deep reinforcement learning library d3rlpy is an offline deep reinforcement learning library for practitioners and researchers. imp

Takuma Seno 817 Jan 02, 2023
Time Delayed NN implemented in pytorch

Pytorch Time Delayed NN Time Delayed NN implemented in PyTorch. Usage kernels = [(1, 25), (2, 50), (3, 75), (4, 100), (5, 125), (6, 150)] tdnn = TDNN

Daniil Gavrilov 79 Aug 04, 2022
A dual benchmarking study of visual forgery and visual forensics techniques

A dual benchmarking study of facial forgery and facial forensics In recent years, visual forgery has reached a level of sophistication that humans can

8 Jul 06, 2022
Adaptive Pyramid Context Network for Semantic Segmentation (APCNet CVPR'2019)

Adaptive Pyramid Context Network for Semantic Segmentation (APCNet CVPR'2019) Introduction Official implementation of Adaptive Pyramid Context Network

21 Nov 09, 2022
Node for thenewboston digital currency network.

Project setup For project setup see INSTALL.rst Community Join the community to stay updated on the most recent developments, project roadmaps, and ra

thenewboston 27 Jul 08, 2022
Paaster is a secure by default end-to-end encrypted pastebin built with the objective of simplicity.

Follow the development of our desktop client here Paaster Paaster is a secure by default end-to-end encrypted pastebin built with the objective of sim

Ward 211 Dec 25, 2022
MVS2D: Efficient Multi-view Stereo via Attention-Driven 2D Convolutions

MVS2D: Efficient Multi-view Stereo via Attention-Driven 2D Convolutions Project Page | Paper If you find our work useful for your research, please con

96 Jan 04, 2023
Dictionary Learning with Uniform Sparse Representations for Anomaly Detection

Dictionary Learning with Uniform Sparse Representations for Anomaly Detection Implementation of the Uniform DL Representation for AD algorithm describ

Paul Irofti 1 Nov 23, 2022
AugLiChem - The augmentation library for chemical systems.

AugLiChem Welcome to AugLiChem! The augmentation library for chemical systems. This package supports augmentation for both crystaline and molecular sy

BaratiLab 17 Jan 08, 2023
Playing around with FastAPI and streamlit to create a YoloV5 object detector

FastAPI-Streamlit-based-YoloV5-detector Playing around with FastAPI and streamlit to create a YoloV5 object detector It turns out that a User Interfac

2 Jan 20, 2022
Redash reset for python

redash-reset This will use a default REDASH_SECRET_KEY key of c292a0a3aa32397cdb050e233733900f this allows you to reset the password of the user ID bu

Robert Wiggins 5 Nov 14, 2022