Bag of Tricks for Natural Policy Gradient Reinforcement Learning [ArXiv]

Setup

Python 3.8.0
Install dependencies: pip install -r req.txt
MuJoCo 200 with a valid license

Main Files

main.py: main run file for model training
models.py: neural networks for policy and critic models
optim.py: second-order approximations for realizing the natural gradient
utils.py: helper functions
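
As a rough illustration of what a second-order approximation for the natural gradient can look like, the sketch below preconditions a gradient with a diagonal estimate of the Fisher information matrix. It is not taken from optim.py: the function name, arguments, and damping term are assumptions made for this example, and it uses plain Python lists rather than the tensors the repository uses.

```python
from typing import List, Sequence


def diagonal_natural_gradient(
    grad: Sequence[float],
    scores: Sequence[Sequence[float]],
    damping: float = 1e-3,
) -> List[float]:
    """Hypothetical helper: approximate F^{-1} g with a diagonal Fisher matrix.

    grad    -- the ordinary policy gradient g, one entry per parameter
    scores  -- per-sample gradients of log pi(a|s), one vector per sample
    damping -- small constant added to the diagonal for numerical stability
    """
    n = len(scores)
    # Diagonal Fisher estimate: mean of the squared score for each parameter.
    fisher_diag = [
        sum(sample[i] ** 2 for sample in scores) / n for i in range(len(grad))
    ]
    # Inverting a diagonal matrix reduces to an element-wise division.
    return [g / (f + damping) for g, f in zip(grad, fisher_diag)]


if __name__ == "__main__":
    g = [1.0, 1.0]
    s = [[2.0, 0.5], [2.0, -0.5]]
    print(diagonal_natural_gradient(g, s, damping=0.0))
```

Parameters whose scores vary a lot get a smaller step and parameters with small scores get a larger one, which is the basic effect of natural-gradient preconditioning.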

Reproducing Experiments

scripts/: bash training scripts formatted for Compute Canada/SLURM jobs
visualize/json: training hyperparameters for each experiment
visualize/csv: training results in .csv format
visualize/performance.py: view results and create .csv results (run after training)
- best run with VSCode IPython cells

Experiment Example

To run the baseline experiments:

1. Tune hparams: bash scripts/hparams/baseline.sh
   - runs will be saved in runs/hparams_baseline/...
2. Extract best hparams from runs: python baseline_hparams.py
   - the best hparams will be saved in visualize/json/baseline.json
3. Run training with hparams: bash scripts/baseline/diagonal.sh
   - runs will be saved in runs/5e6_baseline/...
4. Run speed tests: bash scripts/speed/baseline.sh
   - runs will be saved in runs/baseline_speed/...
5. View results: run the interactive IPython cells in visualize/performance.py, for example:

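# %%
import pathlib

# Paths are relative to visualize/, where performance.py lives
runs_path = pathlib.Path("../runs/5e6_baseline/")
speed_runs_path = pathlib.Path("../runs/baseline_speed/")
name = "baseline"
# analyze and mean_df are available inside visualize/performance.py
baseline_data = analyze(runs_path, speed_runs_path)
baseline_df = mean_df(*baseline_data, name, save=True)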

Second-order Approximation References

Implementations

Other

Code formatted with Black
Experiment runs format: runs/{experiment_name}/{env_name}/{approximation}_runs/{tensorboard folder}/...
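
The run directory layout above can be split back into its components with a few lines of pathlib. This is only a sketch: parse_run_path is a hypothetical helper that does not exist in the repository, and the environment name in the usage example is a placeholder.

```python
from pathlib import PurePosixPath
from typing import Dict


def parse_run_path(path: str) -> Dict[str, str]:
    """Hypothetical helper: split a run path of the form
    runs/{experiment_name}/{env_name}/{approximation}_runs/... into its fields.
    """
    parts = PurePosixPath(path).parts
    # Locate the top-level "runs" directory; anything before it (e.g. "..") is ignored.
    start = parts.index("runs")
    experiment_name, env_name, approx_dir = parts[start + 1 : start + 4]
    suffix = "_runs"
    if not approx_dir.endswith(suffix):
        raise ValueError(f"expected a directory ending in {suffix!r}, got {approx_dir!r}")
    return {
        "experiment_name": experiment_name,
        "env_name": env_name,
        "approximation": approx_dir[: -len(suffix)],
    }


if __name__ == "__main__":
    # "HalfCheetah-v2" and "tb_folder" are placeholder names for illustration.
    print(parse_run_path("../runs/5e6_baseline/HalfCheetah-v2/diagonal_runs/tb_folder"))
```

A helper like this can be handy when grouping results by environment or approximation before plotting.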

Owner

Brennan Gebotys
