Bag of Tricks for Natural Policy Gradient Reinforcement Learning [ArXiv]

Setup

Python 3.8.0
pip install -r req.txt
MuJoCo 200 license

Main Files

main.py: main run file for model training
models.py: neural networks for policy and critic models
optim.py: second-order approximations for realizing the natural gradient
utils.py: helper functions
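
As a rough illustration of what a second-order approximation for the natural gradient can look like, here is a minimal sketch of a diagonal empirical Fisher preconditioner in plain Python. It is not taken from optim.py: the function name, its arguments, and the damping default are assumptions made for this example, and the repository's own approximations may differ.

```python
def diagonal_natural_gradient(grad, score_grads, damping=1e-3):
    """Precondition `grad` with a diagonal empirical Fisher estimate.

    grad        : policy gradient, one float per parameter (hypothetical input)
    score_grads : per-sample gradients of log pi(a|s), one list per sample
    damping     : small constant added to the diagonal for numerical stability
    """
    n = len(score_grads)
    # Diagonal of the empirical Fisher: mean of squared per-sample score gradients.
    fisher_diag = [
        sum(sample[j] ** 2 for sample in score_grads) / n
        for j in range(len(grad))
    ]
    # Natural-gradient direction under the diagonal approximation: g_j / (F_jj + damping).
    return [g / (f + damping) for g, f in zip(grad, fisher_diag)]


# Toy values chosen only to exercise the function.
grad = [2.0, 0.5]
score_grads = [[1.0, 0.5], [3.0, -0.5]]
step = diagonal_natural_gradient(grad, score_grads, damping=0.0)
```

The diagonal form keeps only the per-parameter curvature, so it costs about as much as a first-order update; richer approximations trade more computation for a better estimate of the Fisher matrix.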

Reproducing Experiments

scripts/: bash training scripts formatted for Compute Canada/SLURM jobs
visualize/json: training hyperparameters for each experiment
visualize/csv: training results in .csv format
visualize/performance.py: view results and export them as .csv files (run after training)
- best run cell by cell, e.g. with VS Code's IPython cells

Experiment Example

To run the baseline experiments:

Tune hparams: bash scripts/hparams/baseline.sh
- runs will be saved in runs/hparams_baseline/...
Extract best hparams from runs: python baseline_hparams.py
- the best hparams will be saved in visualize/json/baseline.json
Run training with hparams: bash scripts/baseline/diagonal.sh
- runs will be saved in runs/5e6_baseline/...
Run speed tests: bash scripts/speed/baseline.sh
- runs will be saved in runs/baseline_speed/...
View results: run the following cell interactively (IPython) in visualize/performance.py

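# %%
import pathlib

# analyze() and mean_df() are assumed to be defined earlier in visualize/performance.py
runs_path = pathlib.Path("../runs/5e6_baseline/")  # training runs
speed_runs_path = pathlib.Path("../runs/baseline_speed/")  # speed-test runs
name = "baseline"
baseline_data = analyze(runs_path, speed_runs_path)
# save=True writes the aggregated results to disk
baseline_df = mean_df(*baseline_data, name, save=True)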

Second-order Approximation References

Implementations

Other

Code formatted with Black
Experiment runs format: runs/{experiment_name}/{env_name}/{approximation}_runs/{tensorboard folder}/...
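
The run layout above can be split back into its components with a few lines of Python. This is a minimal sketch, not part of the repository: the helper name `parse_run_path`, the environment name, and the TensorBoard folder name in the example are all hypothetical.

```python
from pathlib import PurePosixPath


def parse_run_path(path):
    """Split runs/{experiment_name}/{env_name}/{approximation}_runs/{tensorboard folder}/..."""
    parts = PurePosixPath(path).parts
    start = parts.index("runs")  # raises ValueError if "runs" is not in the path
    # Unpacking raises ValueError if the path is shorter than the documented layout.
    experiment, env, approx_dir, tb_folder = parts[start + 1 : start + 5]
    suffix = "_runs"
    if not approx_dir.endswith(suffix):
        raise ValueError(f"expected '{{approximation}}_runs', got {approx_dir!r}")
    return {
        "experiment_name": experiment,
        "env_name": env,
        "approximation": approx_dir[: -len(suffix)],
        "tensorboard_folder": tb_folder,
    }


# Hypothetical example path; only the layout comes from the README.
info = parse_run_path("runs/5e6_baseline/HalfCheetah-v2/diagonal_runs/run_0/events.out")
```

Grouping runs by the `approximation` field is one way to compare methods within the same experiment and environment.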

Owner

Brennan Gebotys
