Off-policy continuous control in PyTorch, with RDPG, RTD3 & RSAC

Overview

offpcc_logo

arXiv technical report soon available.

we are updating the readme to be as comprehensive as possible

Please ask any questions in Issues, thanks.

Introduction

This PyTorch repo implements off-policy RL algorithms for continuous control, including:

  • Standard algorithms: DDPG, TD3, SAC
  • Image-based algorithm: ConvolutionalSAC
  • Recurrent algorithms: RecurrentDPG, RecurrentTD3, RecurrentSAC, RecurrentSACSharing (see report)

where recurrent algorithms are generally not available in other repos.

Structure of codebase

Here, we talk about the organization of this code. In particular, we will talk about

  • Folder: where are certain files located?
  • Classes: how are classes designed to interact with each other?
  • Training/evaluation loop: how environment interaction, learning and evaluation alternate?

A basic understanding of these will make other details easy to understand from code itself.

Folders

  • file
    • containing plots reproducing stable-baselines3; you don’t need to touch this
  • offpcc (the good stuff; you will be using this)
    • algorithms (where DDPG, TD3 and SAC are implemented)
    • algorithms_recurrent (where RDPG, RTD3 and RSAC are implemented)
    • basics (abstract classes, stuff shared by algorithms or algorithms_recurrent, code for training)
    • basics_sb3 (you don’t need to touch this)
    • configs (gin configs)
    • domains (all custom domains are stored within and registered properly)
  • pics_for_readme
    • random pics; you don’t need to touch this
  • temp
    • potentially outdated stuff; you don’t need to touch this

Relationships between classes

There are three core classes in this repo:

  • Any environment written using OpenAI’s API would have:
    • reset method outputs the current state
    • step method takes in an action, outputs (reward, next state, done, info)
  • OffPolicyRLAlgorithm and RecurrentOffPolicyRLAlgorithm are the base class for all algorithms listed in introduction. You should think about them as neural network (e.g., actors, critics, CNNs, RNNs) wrappers that are augmented with methods to help these networks interact with other stuff:
    • act method takes in state from env, outputs action back to env
    • update_networks method takes in batch from buffer
  • The replay buffers ReplayBuffer and RecurrentReplayBuffer are built to interact with the environment and the algorithm classes
    • push method takes in a transition from env
    • sample method outputs a batch for algorithm’s update_networks method

Their relationships are best illustrated by a diagram:

offpcc_steps

Structure of training/evaluation loop

In this repo, we follow the training/evaluation loop style in spinning-up (this is essentially the script: basics/run_fns and the function train). It follows this basic structure, with details added for tracking stats and etc:

state = env.reset()
for t range(total_steps):  # e.g., 1 million
    # environment interaction
    if t >= update_after:
        action = algorithm.act(state)
    else:
        action = env.action_space.sample()
    next_state, reward, done, info = env.step(action)
   	# learning
    if t >= update_after and (t + 1) % update_every == 0:
        for j in range(update_every):
            batch = buffer.sample()
            algorithm.update_networks(batch)
    # evaluation
    if (t + 1) % num_steps_per_epoch == 0:
        ep_len, ep_ret = test_for_one_episode(test_env, algorithm)

Dependencies

Dependencies are available in requirements.txt; although there might be one or two missing dependencies that you need to install yourself.

Train an agent

Setup (wandb & GPU)

Add this to your bashrc or bash_profile and source it.

You should replace “account_name” with whatever wandb account that you want to use.

export OFFPCC_WANDB_ENTITY="account_name"

From the command line:

cd offpcc
CUDA_VISIBLE_DEVICES=3 OFFPCC_WANDB_PROJECT=project123 python launch.py --env <env-name> --algo <algo-name> --config <config-path> --run_id <id>

For DDPG, TD3, SAC

On pendulum-v0:

python launch.py --env pendulum-v0 --algo sac --config configs/test/template_short.gin --run_id 1

Commands and plots for benchmarking on Pybullet domains are in a Issue called “Performance check against SB3”.

For RecurrentDDPG, RecurrentTD3, RecurrentSAC

On pendulum-p-v0:

python launch.py --env pendulum-p-v0 --algo rsac --config configs/test/template_recurrent_100k.gin --run_id 1

Reproducing paper results

To reproduce paper results, simply run the commands in the previous section with the appropriate env name (listed below) and config files (their file names are highly readable). Mapping between env names used in code and env names used in paper:

  • pendulum-v0: pendulum
  • pendulum-p-v0: pendulum-p
  • pendulum-va-v0: pendulum-v
  • dmc-cartpole-balance-v0: cartpole-balance
  • dmc-cartpole-balance-p-v0: cartpole-balance-p
  • dmc-cartpole-balance-va-v0: cartpole-balance-v
  • dmc-cartpole-swingup-v0: cartpole-swingup
  • dmc-cartpole-swingup-p-v0: cartpole-swingup-p
  • dmc-cartpole-swingup-va-v0: cartpole-swingup-v
  • reacher-pomdp-v0: reacher-pomdp
  • water-maze-simple-pomdp-v0: watermaze
  • bumps-normal-test-v0: push-r-bump

Render learned policy

Create a folder in the same directory as offpcc, called results. In there, create a folder with the name of the environment, e.g., pendulum-p-v0. Within that env folder, create a folder with the name of the algorithm, e.g., rsac. You can get an idea of the algorithms available from the algo_name2class diectionary defined in offpcc/launch.py. Within that algorithm folder, create a folder with the run_id, e.g., 1. Simply put the saved actor (also actor summarizer for recurrent algorithms) into that inner most foler - they can be downloaded from the wandb website after your run finishes. Finally, go back into offpcc, and call

python launch.py --env pendulum-v0 --algo sac --config configs/test/template_short.gin --run_id 1 --render

For bumps-normal-test-v0, you need to modify the test_for_one_episode function within offpcc/basics/run_fns.py because, for Pybullet environments, the env.step must only appear once before the env.reset() call.

Owner
Zhihan
Zhihan
A Self-Supervised Contrastive Learning Framework for Aspect Detection

AspDecSSCL A Self-Supervised Contrastive Learning Framework for Aspect Detection This repository is a pytorch implementation for the following AAAI'21

Tian Shi 30 Dec 28, 2022
Codes to calculate solar-sensor zenith and azimuth angles directly from hyperspectral images collected by UAV. Works only for UAVs that have high resolution GNSS/IMU unit.

UAV Solar-Sensor Angle Calculation Table of Contents About The Project Built With Getting Started Prerequisites Installation Datasets Contributing Lic

Sourav Bhadra 1 Jan 15, 2022
This repository contains the code for the paper 'PARM: Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval' published at ECIR'22.

Paragraph Aggregation Retrieval Model (PARM) for Dense Document-to-Document Retrieval This repository contains the code for the paper PARM: A Paragrap

Sophia Althammer 33 Aug 26, 2022
Pytorch implementation of NEGEV method. Paper: "Negative Evidence Matters in Interpretable Histology Image Classification".

Pytorch 1.10.0 code for: Negative Evidence Matters in Interpretable Histology Image Classification (https://arxiv. org/abs/xxxx.xxxxx) Citation: @arti

Soufiane Belharbi 4 Dec 01, 2022
Locationinfo - A script helps the user to show network information such as ip address

Description This script helps the user to show network information such as ip ad

Roxcoder 1 Dec 30, 2021
Tensorflow solution of NER task Using BiLSTM-CRF model with Google BERT Fine-tuning And private Server services

Tensorflow solution of NER task Using BiLSTM-CRF model with Google BERT Fine-tuning

MaCan 4.2k Dec 29, 2022
A 3D sparse LBM solver implemented using Taichi

taichi_LBM3D Background Taichi_LBM3D is a 3D lattice Boltzmann solver with Multi-Relaxation-Time collision scheme and sparse storage structure impleme

Jianhui Yang 121 Jan 06, 2023
This repository contains code and data for "On the Multimodal Person Verification Using Audio-Visual-Thermal Data"

trimodal_person_verification This repository contains the code, and preprocessed dataset featured in "A Study of Multimodal Person Verification Using

ISSAI 7 Aug 31, 2022
Python Single Object Tracking Evaluation

pysot-toolkit The purpose of this repo is to provide evaluation API of Current Single Object Tracking Dataset, including VOT2016 VOT2018 VOT2018-LT OT

348 Dec 22, 2022
[SIGGRAPH'22] StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets

[Project] [PDF] This repository contains code for our SIGGRAPH'22 paper "StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets" by Axel Sauer, Katja

742 Jan 04, 2023
Ejemplo Algoritmo Viterbi - Example of a Viterbi algorithm applied to a hidden Markov model on DNA sequence

Ejemplo Algoritmo Viterbi Ejemplo de un algoritmo Viterbi aplicado a modelo ocul

Mateo Velásquez Molina 1 Jan 10, 2022
An end-to-end image translation model with weight-map for color constancy

CCUnet An end-to-end image translation model with weight-map for color constancy 1. Download the dataset (take Colorchecker_recommended dataset as an

Jianhui Qiu 1 Dec 21, 2021
This library contains a Tensorflow implementation of the paper Stability Analysis of Unfolded WMMSE for Power Allocation

UWMMSE-stability Tensorflow implementation of Stability Analysis of UWMMSE Overview This library contains a Tensorflow implementation of the paper Sta

Arindam Chowdhury 1 Nov 16, 2022
Tooling for converting STAC metadata to ODC data model

手语识别 0、使用到的模型 (1). openpose,作者:CMU-Perceptual-Computing-Lab https://github.com/CMU-Perceptual-Computing-Lab/openpose (2). 图像分类classification,作者:Bubbl

Open Data Cube 65 Dec 20, 2022
An implementation of based on pytorch and mmcv

FisherPruning-Pytorch An implementation of Group Fisher Pruning for Practical Network Compression based on pytorch and mmcv Main Functions Pruning f

Peng Lu 15 Dec 17, 2022
Graph InfoClust: Leveraging cluster-level node information for unsupervised graph representation learning

Graph-InfoClust-GIC [PAKDD 2021] PAKDD'21 version Graph InfoClust: Maximizing Coarse-Grain Mutual Information in Graphs Preprint version Graph InfoClu

Costas Mavromatis 21 Dec 03, 2022
This repository contains the source code of our work on designing efficient CNNs for computer vision

Efficient networks for Computer Vision This repo contains source code of our work on designing efficient networks for different computer vision tasks:

Sachin Mehta 386 Nov 26, 2022
TEA: A Sequential Recommendation Framework via Temporally Evolving Aggregations

TEA: A Sequential Recommendation Framework via Temporally Evolving Aggregations Requirements python 3.6 torch 1.9 numpy 1.19 Quick Start The experimen

DMIRLAB 4 Oct 16, 2022
A PyTorch implementation of "ANEMONE: Graph Anomaly Detection with Multi-Scale Contrastive Learning", CIKM-21

ANEMONE A PyTorch implementation of "ANEMONE: Graph Anomaly Detection with Multi-Scale Contrastive Learning", CIKM-21 Dependencies python==3.6.1 dgl==

Graph Analysis & Deep Learning Laboratory, GRAND 30 Dec 14, 2022
Uncertain natural language inference

Uncertain Natural Language Inference This repository hosts the code for the following paper: Tongfei Chen*, Zhengping Jiang*, Adam Poliak, Keisuke Sak

Tongfei Chen 14 Sep 01, 2022