Conservative Q Learning for Offline Reinforcement Reinforcement Learning in JAX

Last update: Nov 07, 2022

Related tags

Overview

CQL-JAX

This repository implements Conservative Q Learning for Offline Reinforcement Reinforcement Learning in JAX (FLAX). Implementation is built on top of the SAC base of JAX-RL.

Usage

Install Dependencies-

pip install -r requirements.txt
pip install "jax[cuda111]<=0.21.1" -f https://storage.googleapis.com/jax-releases/jax_releases.html

Run CQL-

python train_offline.py --env_name=hopper-expert-v0 --min_q_weight=5

Please use the following values of min_q_weight on MuJoCo tasks to reproduce CQL results from IQL paper-

Domain	medium	medium-replay	medium-expert
walker	10	1	10
hopper	5	5	1
cheetah	90	80	100

For antmaze tasks min_q_weight=10 is found to work best.

In case of Out-Of Memory errors in JAX, try running with the following env variables-

XLA_PYTHON_CLIENT_MEM_FRACTION=0.80 python ...
XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1 python ...

Performance & Runtime

Returns are more or less same as the torch implementation and comparable to IQL-

Task	CQL(PyTorch)	CQL(JAX)	IQL
hopper-medium-v2	58.5	74.6	66.3
hopper-medium-replay-v2	95.0	92.1	94.7
hopper-medium-expert-v2	105.4	83.2	91.5
antmaze-umaze-v0	74.0	69.5	87.5
antmaze-umaze-diverse-v0	84.0	78.7	62.2
antmaze-medium-play-v0	61.2	14.2	71.2
antmaze-medium-diverse-v0	53.7	10.7	70.2
antmaze-large-play-v0	15.8	0.0	39.6
antmaze-large-diverse-v0	14.9	0.0	47.5

Wall-clock time averages to ~50 mins, improving over IQL paper's 80 min CQL and closing the gap with IQL's 20 min.

Task	CQL(JAX)	IQL
hopper-medium-v2	52	27
hopper-medium-replay-v2	54	30
hopper-medium-expert-v2	57	29

Time efficiency over the original torch implementation is more than 4 times.

For more offline RL algorithm implementations, check out the JAX-RL, IQL and rlkit repositories.

Citation

In case you use CQL-JAX for your research, please cite the following-

@misc{cqljax,
  author = {Suri, Karush},
  title = {{Conservative Q Learning in JAX.}},
  url = {https://github.com/karush17/cql-jax},
  year = {2021}
}

Conservative Q Learning for Offline Reinforcement Reinforcement Learning in JAX

Related tags

Overview

CQL-JAX

Usage

Performance & Runtime

Citation

References

Owner

Karush Suri

Deep learning models for classification of 15 common weeds in the southern U.S. cotton production systems.

PartImageNet is a large, high-quality dataset with part segmentation annotations

Python Wrapper for Embree

This porject is intented to build the most accurate model for predicting the porbability of loan default

Official implementation of "Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform", ICCV 2021

dualPC.R contains the R code for the main functions.

Dynamic Multi-scale Filters for Semantic Segmentation (DMNet ICCV'2019)

This is an official implementation for "DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation"

This is an example of object detection on Micro bacterium tuberculosis using Mask-RCNN

Learning to Predict Gradients for Semi-Supervised Continual Learning

Ascend your Jupyter Notebook usage

[NeurIPS'21] Projected GANs Converge Faster

BrainGNN - A deep learning model for data-driven discovery of functional connectivity

Solutions of Reinforcement Learning 2nd Edition

Weight initialization schemes for PyTorch nn.Modules

Datasets and pretrained Models for StyleGAN3 ...

Joint deep network for feature line detection and description

Cleaned up code for DSTC 10: SIMMC 2.0 track: subtask 2: multimodal coreference resolution

Kaggle Feedback Prize - Evaluating Student Writing 15th solution

L-Verse: Bidirectional Generation Between Image and Text