A Closer Look at Invalid Action Masking in Policy Gradient Algorithms

This repo contains the source code to reproduce the results in the paper A Closer Look at Invalid Action Masking in Policy Gradient Algorithms.

Steps to reproduce the experiments

Our experiments run in Docker containers and use Weights and Biases (https://www.wandb.com/) to record results, so the first step is to register a wandb account and get an API key, which we refer to as YOUR_WANDB_KEY.

# build the docker container
docker build -t invalid_action_masking:latest -f sharedmemory.Dockerfile .
# build docker run commands. replace `{YOUR_WANDB_KEY}` with your own
WANDB_KEY={YOUR_WANDB_KEY} python docker.py > docker.sh
# run the experiments (96 in total)
# if you have limited computational resources, consider not running all of them at once.
# note that the generated commands pass --cpuset-cpus="0", --cpuset-cpus="1", ... for different runs
# so that each container uses only one core. By default I assume your machine has 40 cores,
# but feel free to modify the `cores` variable in `docker.py`
bash docker.sh
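The core pinning described in the comments above can be sketched as follows. This is a minimal illustration and not the actual contents of `docker.py`: the function name `build_docker_commands`, the example experiment commands, and the use of the `WANDB_API_KEY` environment variable inside the container are all assumptions. Only the image name, the `--cpuset-cpus` flag, and the default of 40 cores come from the steps above.

```python
def build_docker_commands(experiments, cores=40,
                          image="invalid_action_masking:latest",
                          wandb_key="YOUR_WANDB_KEY"):
    """Return one `docker run` command per experiment.

    Each command is pinned to a single core, assigned round-robin, so that
    no container uses more than one core. (Hypothetical helper, for
    illustration only.)
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    commands = []
    for index, experiment in enumerate(experiments):
        core = index % cores  # wrap around once every core has one run
        commands.append(
            f'docker run -d --cpuset-cpus="{core}" '
            f"-e WANDB_API_KEY={wandb_key} {image} {experiment}"
        )
    return commands


if __name__ == "__main__":
    # Hypothetical experiment commands; the real list lives in docker.py.
    demo = [f"python experiment.py --seed {seed}" for seed in range(1, 4)]
    for command in build_docker_commands(demo, cores=2):
        print(command)
```

With fewer cores than experiments, several containers end up sharing a core, which is why running the commands in smaller batches is suggested for limited hardware.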

Steps to reproduce the figures

Record your wandb username, which we will refer to as YOUR_WANDB_ENTITY.

cd plots
WANDB_ENTITY={YOUR_WANDB_ENTITY} python episode_reward.py
WANDB_ENTITY={YOUR_WANDB_ENTITY} python approx_kl.py

These commands should reproduce the PDFs in the plots directory that are attached to the repo.

Reproduction without WANDB

Although this would be possible, properly logging metrics and redoing the plotting would take a significant amount of effort, so at this time we do not provide instructions for reproduction without WANDB. Note that it is possible to use wandb locally by following https://docs.wandb.com/self-hosted/local.
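To give a sense of what the logging half of that effort involves, here is a minimal sketch of recording metrics to a local CSV file instead of wandb. It is not part of this repo and does not replace the plotting scripts. The class `CsvMetricLogger`, the helper `read_metric`, the column layout, and the metric names are all assumptions made for illustration.

```python
import csv
from pathlib import Path

FIELDNAMES = ["global_step", "metric", "value"]


class CsvMetricLogger:
    """Append (global_step, metric, value) rows to a CSV file.

    Hypothetical stand-in for remote experiment tracking.
    """

    def __init__(self, path):
        self.path = Path(path)
        # Write the header once, when the file is first created.
        if not self.path.exists():
            with self.path.open("w", newline="") as handle:
                csv.writer(handle).writerow(FIELDNAMES)

    def log(self, global_step, metric, value):
        with self.path.open("a", newline="") as handle:
            csv.writer(handle).writerow([global_step, metric, value])


def read_metric(path, metric):
    """Return the logged (global_step, value) pairs for one metric."""
    with Path(path).open(newline="") as handle:
        return [
            (int(row["global_step"]), float(row["value"]))
            for row in csv.DictReader(handle)
            if row["metric"] == metric
        ]


if __name__ == "__main__":
    logger = CsvMetricLogger("metrics.csv")
    logger.log(0, "episode_reward", 1.5)  # hypothetical metric name
    logger.log(0, "approx_kl", 0.01)      # hypothetical metric name
    print(read_metric("metrics.csv", "episode_reward"))
```

The plotting scripts would still need to be rewritten to read from such files rather than from wandb, which is the larger part of the work.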

If you have an issue reproducing the results

We have tested these scripts, but it is possible that there is a bug or that we are assuming something specific about the environment. If you cannot reproduce our results, please file an issue and we will address it as soon as the double-blind review is over.

Owner

Costa Huang
