RL agent to play μRTS with Stable-Baselines3

Overview

Gym-μRTS with Stable-Baselines3/PyTorch

This repo contains an attempt to reproduce Gridnet PPO with invalid action masking algorithm to play μRTS using Stable-Baselines3 library. Apart from reproducibility, this might open access to a diverse set of well tested algorithms, and toolings for training, evaluations, and more.

Original paper: Gym-μRTS: Toward Affordable Deep Reinforcement Learning Research in Real-time Strategy Games.

Original code: gym-microrts-paper.

demo.gif

Install

Prerequisites:

  • Python 3.7+
  • Java 8.0+
  • FFmpeg (for video capturing)
git clone https://github.com/kachayev/gym-microrts-paper-sb3
cd gym-microrts-paper-sb3
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Note that I use newer version of gym-microrts compared to the one that was originally used for the paper.

Training

To traing an agent:

$ python ppo_gridnet_diverse_encode_decode_sb3.py

If everything is setup correctly, you'll see typicall SB3 verbose logging:

Using cpu device
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 2e+03    |
|    ep_rew_mean     | 0.0      |
| time/              |          |
|    fps             | 179      |
|    iterations      | 1        |
|    time_elapsed    | 11       |
|    total_timesteps | 2048     |
---------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 1.72e+03     |
|    ep_rew_mean          | -5.0         |
| time/                   |              |
|    fps                  | 55           |
|    iterations           | 2            |
|    time_elapsed         | 74           |
|    total_timesteps      | 4096         |
| train/                  |              |
|    approx_kl            | 0.0056759235 |
|    clip_fraction        | 0.0861       |
|    clip_range           | 0.2          |
|    entropy_loss         | -5.65        |
|    explained_variance   | 0.412        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.024       |
|    n_updates            | 10           |
|    policy_gradient_loss | -0.00451     |
|    value_loss           | 0.00413      |
------------------------------------------

As soon as correctness of the implementation is verified, I will provide details on how to use RL Baselines3 Zoo for training and evaluations.

Implementational Caveats

A few notes / pain points regarding the implementation of the alrogithms, and the process of integrating it with stable-baselines3:

  • Gym does not ship a space for "array of multidiscrete" use case (let's be honest, it's not very common). But it gives an option for defining your space when necessary. A new space, when defined, is not easy to integrate into SB3. In a few different places SB3 raises NotImplementedError facing unknown space (example 1, example 2).
  • Seems like switching to fully rolled out MutliDiscrete space definition has a significant performance penalty. Still investigating if this can be improved.
  • Invalid masking is implemented by passing masks into observations from the wrapper (the observation space is replaced with gym.spaces.Dict to hold both observations and masks). By doing it this way, masks are now available for policy, and fit rollout buffer layout. Masking is implemented by setting logits into -inf (or to a rather small number).

Look for xxx(hack) comments in the code for more details.

Owner
Oleksii Kachaiev
Principal Software Engineer @ Riot, League of Legends Data/ML/AI. Research interests: human-level intelligence for RTS games and complex open world simulations.
Oleksii Kachaiev
Code for SALT: Stackelberg Adversarial Regularization, EMNLP 2021.

SALT: Stackelberg Adversarial Regularization Code for Adversarial Regularization as Stackelberg Game: An Unrolled Optimization Approach, EMNLP 2021. R

Simiao Zuo 10 Jan 10, 2022
Code artifacts for the submission "Mind the Gap! A Study on the Transferability of Virtual vs Physical-world Testing of Autonomous Driving Systems"

Code Artifacts Code artifacts for the submission "Mind the Gap! A Study on the Transferability of Virtual vs Physical-world Testing of Autonomous Driv

Andrea Stocco 2 Aug 24, 2022
PyTorch implementation for COMPLETER: Incomplete Multi-view Clustering via Contrastive Prediction (CVPR 2021)

Completer: Incomplete Multi-view Clustering via Contrastive Prediction This repo contains the code and data of the following paper accepted by CVPR 20

XLearning Group 72 Dec 07, 2022
Colab notebook for openai/glide-text2im.

GLIDE text2im on Colab This repository provides a Colab notebook to produce images conditioned on text prompts with GLIDE [1]. Usage Run text2im.ipynb

Wok 19 Oct 19, 2022
f-BRS: Rethinking Backpropagating Refinement for Interactive Segmentation

f-BRS: Rethinking Backpropagating Refinement for Interactive Segmentation [Paper] [PyTorch] [MXNet] [Video] This repository provides code for training

Visual Understanding Lab @ Samsung AI Center Moscow 516 Dec 21, 2022
Official page of Patchwork (RA-L'21 w/ IROS'21)

Patchwork Official page of "Patchwork: Concentric Zone-based Region-wise Ground Segmentation with Ground Likelihood Estimation Using a 3D LiDAR Sensor

Hyungtae Lim 254 Jan 05, 2023
Development of IP code based on VIPs and AADM

Sparse Implicit Processes In this repository we include the two different versions of the SIP code developed for the article Sparse Implicit Processes

1 Aug 22, 2022
[ICCV 2021] Official PyTorch implementation for Deep Relational Metric Learning.

Ranking Models in Unlabeled New Environments Prerequisites This code uses the following libraries Python 3.7 NumPy PyTorch 1.7.0 + torchivision 0.8.1

Borui Zhang 39 Dec 10, 2022
training script for space time memory network

Trainig Script for Space Time Memory Network This codebase implemented training code for Space Time Memory Network with some cyclic features. Requirem

Yuxi Li 100 Dec 20, 2022
Intel® Neural Compressor is an open-source Python library running on Intel CPUs and GPUs

Intel® Neural Compressor targeting to provide unified APIs for network compression technologies, such as low precision quantization, sparsity, pruning, knowledge distillation, across different deep l

Intel Corporation 846 Jan 04, 2023
Traductor de lengua de señas al español basado en Python con Opencv y MedaiPipe

Traductor de señas Traductor de lengua de señas al español basado en Python con Opencv y MedaiPipe Requerimientos 🔧 Python 3.8 o inferior para evitar

Jahaziel Hernandez Hoyos 3 Nov 12, 2022
Black-Box-Tuning - Black-Box Tuning for Language-Model-as-a-Service

Black-Box-Tuning Source code for paper "Black-Box Tuning for Language-Model-as-a-Service". Being busy recently, the code in this repo and this tutoria

Tianxiang Sun 149 Jan 04, 2023
Distributed Asynchronous Hyperparameter Optimization in Python

Hyperopt: Distributed Hyperparameter Optimization Hyperopt is a Python library for serial and parallel optimization over awkward search spaces, which

6.5k Jan 01, 2023
Main repository for the HackBio'2021 Virtual Internship Experience for #Team-Greider ❤️

Hello 🤟 #Team-Greider The team of 20 people for HackBio'2021 Virtual Bioinformatics Internship 💝 🖨️ 👨‍💻 HackBio: https://thehackbio.com 💬 Ask us

Siddhant Sharma 7 Oct 20, 2022
Safe Model-Based Reinforcement Learning using Robust Control Barrier Functions

README Repository containing the code for the paper "Safe Model-Based Reinforcement Learning using Robust Control Barrier Functions". Specifically, an

Yousef Emam 13 Nov 24, 2022
Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning Source Code

Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning Source Code

STARS Laboratory 8 Sep 14, 2022
Real-Time-Student-Attendence-System - Real Time Student Attendence System

Real-Time-Student-Attendence-System The Student Attendance Management System Pro

Rounak Das 1 Feb 15, 2022
Restricted Boltzmann Machines in Python.

How to Use First, initialize an RBM with the desired number of visible and hidden units. rbm = RBM(num_visible = 6, num_hidden = 2) Next, train the m

Edwin Chen 928 Dec 30, 2022
Official code for paper Exemplar Based 3D Portrait Stylization.

3D-Portrait-Stylization This is the official code for the paper "Exemplar Based 3D Portrait Stylization". You can check the paper on our project websi

60 Dec 07, 2022
AniGAN: Style-Guided Generative Adversarial Networks for Unsupervised Anime Face Generation

AniGAN: Style-Guided Generative Adversarial Networks for Unsupervised Anime Face Generation AniGAN: Style-Guided Generative Adversarial Networks for U

Bing Li 81 Dec 14, 2022