PyTorch implementation of ExORL: Exploratory Data for Offline Reinforcement Learning

Last update: Jan 01, 2023

Overview

ExORL: Exploratory Data for Offline Reinforcement Learning

This is an original PyTorch implementation of the ExORL framework from

Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning by

Denis Yarats*, David Brandfonbrener*, Hao Liu, Misha Laskin, Pieter Abbeel, Alessandro Lazaric, and Lerrel Pinto.

*Equal contribution.

Prerequisites

Install MuJoCo if it is not already installed:

Download MuJoCo binaries here.
Unzip the downloaded archive into ~/.mujoco/.
Append the path of the MuJoCo bin subdirectory to the LD_LIBRARY_PATH environment variable.
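The steps above can be sanity-checked with a small script. This is only a sketch: the function names are hypothetical and not part of this repository, and it assumes the unzipped archive produced a layout of the form ~/.mujoco/<subdir>/bin, whatever the subdirectory is called for your MuJoCo version.

```python
import os
from pathlib import Path


def mujoco_bin_dirs(root="~/.mujoco"):
    """Return every existing <root>/<subdir>/bin directory, sorted by path."""
    root = Path(root).expanduser()
    if not root.is_dir():
        return []
    return sorted(p for p in root.glob("*/bin") if p.is_dir())


def missing_from_ld_library_path(bin_dirs, ld_library_path):
    """Return the bin directories that are not entries of ld_library_path."""
    entries = {os.path.normpath(e) for e in ld_library_path.split(os.pathsep) if e}
    return [d for d in bin_dirs if os.path.normpath(str(d)) not in entries]


if __name__ == "__main__":
    # Hypothetical check: report bin directories that still need to be appended.
    bins = mujoco_bin_dirs()
    for d in missing_from_ld_library_path(bins, os.environ.get("LD_LIBRARY_PATH", "")):
        print(f"not in LD_LIBRARY_PATH: {d}")
```

If the script prints nothing, either no bin directory was found under ~/.mujoco or every one found is already listed in LD_LIBRARY_PATH.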

Install the following libraries:

sudo apt update
sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3 unzip

Install dependencies:

conda env create -f conda_env.yml
conda activate exorl

Datasets

We provide exploratory datasets for 6 DeepMind Control Suite domains:

Domain	Dataset name	Available task names
Cartpole	`cartpole`	`cartpole_balance`, `cartpole_balance_sparse`, `cartpole_swingup`, `cartpole_swingup_sparse`
Cheetah	`cheetah`	`cheetah_run`, `cheetah_run_backward`
Jaco Arm	`jaco`	`jaco_reach_top_left`, `jaco_reach_top_right`, `jaco_reach_bottom_left`, `jaco_reach_bottom_right`
Point Mass Maze	`point_mass_maze`	`point_mass_maze_reach_top_left`, `point_mass_maze_reach_top_right`, `point_mass_maze_reach_bottom_left`, `point_mass_maze_reach_bottom_right`
Quadruped	`quadruped`	`quadruped_walk`, `quadruped_run`
Walker	`walker`	`walker_stand`, `walker_walk`, `walker_run`
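Datasets are named by domain, while training is configured by task, so it can help to map a task name back to its dataset name. The sketch below copies the table into a dictionary; `TASKS` and `domain_of` are illustrative names and are not defined in this repository.

```python
# Dataset name (domain) -> available task names, copied from the table above.
TASKS = {
    "cartpole": [
        "cartpole_balance",
        "cartpole_balance_sparse",
        "cartpole_swingup",
        "cartpole_swingup_sparse",
    ],
    "cheetah": ["cheetah_run", "cheetah_run_backward"],
    "jaco": [
        "jaco_reach_top_left",
        "jaco_reach_top_right",
        "jaco_reach_bottom_left",
        "jaco_reach_bottom_right",
    ],
    "point_mass_maze": [
        "point_mass_maze_reach_top_left",
        "point_mass_maze_reach_top_right",
        "point_mass_maze_reach_bottom_left",
        "point_mass_maze_reach_bottom_right",
    ],
    "quadruped": ["quadruped_walk", "quadruped_run"],
    "walker": ["walker_stand", "walker_walk", "walker_run"],
}


def domain_of(task):
    """Return the dataset name whose task list contains `task`."""
    for domain, tasks in TASKS.items():
        if task in tasks:
            return domain
    raise ValueError(f"unknown task: {task!r}")
```

For example, `domain_of("walker_walk")` returns `"walker"`, which is the dataset name to download before training on that task.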

For each domain, we collected datasets by running 9 unsupervised RL algorithms from URLB for a total of 10M steps. The algorithms are listed below:

Unsupervised RL method	Name	Paper
APS	`aps`	paper
APT(ICM)	`icm_apt`	paper
DIAYN	`diayn`	paper
Disagreement	`disagreement`	paper
ICM	`icm`	paper
ProtoRL	`proto`	paper
Random	`random`	N/A
RND	`rnd`	paper
SMM	`smm`	paper

You can download a dataset by running ./download.sh. For example, to download the ProtoRL dataset for Walker, run:

./download.sh walker proto

The script will download the dataset from S3 and store it under datasets/walker/proto/, where you can find episodes (under buffer) and episode videos (under video).
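To fetch the datasets of several algorithms for one domain, the download command can be generated programmatically. The following is a minimal sketch, assuming it is run from the repository root; `ALGORITHMS`, `download_commands`, and `dataset_dirs` are hypothetical helper names, and the directory layout is taken from the description above.

```python
from pathlib import Path

# Algorithm names from the table of unsupervised RL methods.
ALGORITHMS = [
    "aps", "icm_apt", "diayn", "disagreement", "icm",
    "proto", "random", "rnd", "smm",
]


def download_commands(domain, algorithms=ALGORITHMS):
    """Build one ./download.sh argument list per algorithm."""
    return [["./download.sh", domain, algo] for algo in algorithms]


def dataset_dirs(domain, algorithm, root="datasets"):
    """Return the (buffer, video) directories where a dataset is expected."""
    base = Path(root) / domain / algorithm
    return base / "buffer", base / "video"


if __name__ == "__main__":
    # Print the commands instead of running them, so nothing is downloaded here.
    for cmd in download_commands("walker"):
        print(" ".join(cmd))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` to perform the download, and `dataset_dirs` gives the paths to check afterwards.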

Offline RL training

We also provide implementations of 5 offline RL algorithms for evaluating the datasets:

Offline RL method	Name	Paper
Behavior Cloning	`bc`	paper
CQL	`cql`	paper
CRR	`crr`	paper
TD3+BC	`td3_bc`	paper
TD3	`td3`	paper

After downloading the required datasets, you can evaluate them using an offline RL method on a specific task. For example, to evaluate a dataset collected by ProtoRL on Walker for the walking task using TD3+BC, you can run:

python train_offline.py agent=td3_bc expl_agent=proto task=walker_walk
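To compare several offline RL methods on the same dataset, the command above can be generated for each combination of task, exploration algorithm, and agent. This is a sketch only: `AGENTS` and `train_commands` are hypothetical names, and it only builds command strings using the three arguments shown above.

```python
from itertools import product

# Offline RL agent names from the table above.
AGENTS = ["bc", "cql", "crr", "td3_bc", "td3"]


def train_commands(tasks, expl_agents, agents=AGENTS):
    """Build one train_offline.py command per (task, expl_agent, agent) triple."""
    return [
        f"python train_offline.py agent={agent} expl_agent={expl} task={task}"
        for task, expl, agent in product(tasks, expl_agents, agents)
    ]


if __name__ == "__main__":
    # All 5 agents on the ProtoRL dataset for the walker_walk task.
    for cmd in train_commands(["walker_walk"], ["proto"]):
        print(cmd)
```

The printed commands can be pasted into a shell or a job script; the corresponding datasets need to be downloaded first.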

Logs are stored in the output folder. To launch tensorboard run:

tensorboard --logdir output

Citation

If you use this repo in your research, please consider citing the paper as follows:

@article{yarats2022exorl,
  title={Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning},
  author={Denis Yarats and David Brandfonbrener and Hao Liu and Michael Laskin and Pieter Abbeel and Alessandro Lazaric and Lerrel Pinto},
  journal={arXiv preprint arXiv:2201.13425},
  year={2022}
}

License

The majority of ExORL is licensed under the MIT license; however, portions of the project are available under separate license terms: DeepMind is licensed under the Apache 2.0 license.
