This is the source code of RPG (Reward-Randomized Policy Gradient)

Last update: Nov 25, 2022

RPG (Reward-Randomized Policy Gradient)

Zhenggang Tang*, Chao Yu*, Boyuan Chen, Huazhe Xu, Xiaolong Wang, Fei Fang, Simon Shaolei Du, Yu Wang, Yi Wu (* equal contribution)

Website: https://sites.google.com/view/staghuntrpg

This is the source code for RPG (Reward-Randomized Policy Gradient), proposed in the paper "Discovering Diverse Multi-agent Strategic Behavior via Reward Randomization" (https://arxiv.org/abs/2103.04564).

1. Supported environments

1.1 Agar.io

Agar is a popular multi-player online game. Players control one or more cells in a Petri dish. The goal is to gain as much mass as possible by eating cells smaller than the player's cell while avoiding being eaten by larger ones. Larger cells move slower. Each player starts with one cell but can split a sufficiently large cell into two, allowing them to control multiple cells. The control is performed by mouse motion: all the cells of a player move towards the mouse position.

We transform the Free-For-All (FFA) mode of Agar (https://agar.io/) into a Reinforcement Learning (RL) environment, and we believe it can serve as a new multi-agent RL testbed for a wide range of problems, such as cooperation, team formation, and intention modeling. If you want to use Agar.io as your testbed, please visit the agar repository: https://github.com/staghuntrpg/agar.

1.2 Grid World

Monster-Hunt: There is a monster and two apples. The monster keeps moving towards the closest agent, while the apples are static. When a single agent meets the monster, it incurs a penalty of 2; if two agents catch the monster at the same time, they both earn a bonus of 5. Eating an apple always gives an agent a bonus of 2. Whenever an apple is eaten or the monster meets an agent, the apple or the monster respawns at a random location. The monster may move over an apple during the chase; in this case, an agent that catches the monster and the apple at the same time receives the sum of both rewards.
Escalation: Two agents appear at random positions and one grid cell lights up at initialization. If the two agents step on the lit cell simultaneously, each agent gains 1 point, the lit cell goes out, and an adjacent cell lights up. Both agents gain 1 point again if they step on the next lit cell together. But if one agent steps off the path, the other agent loses 0.9L points, where L is the current length of stepping together, and the game is over. Alternatively, the two agents can step off the path simultaneously, in which case neither agent is punished and the game continues.
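
The reward rules of the two grid-world games can be sketched in a few lines of Python. This is only an illustration of the rules as described above, not the code shipped in the GridWorld folder: the function names `monster_hunt_rewards` and `escalation_step`, the cell representation as `(row, col)` tuples, and the `streak` argument (standing in for L) are all hypothetical. Two details that the description leaves open are filled in as labeled assumptions: an agent standing on an apple cell always receives the apple bonus, and the streak is left unchanged when both agents step off the path together.

```python
from typing import List, Sequence, Tuple

Cell = Tuple[int, int]  # hypothetical (row, col) grid coordinate


def monster_hunt_rewards(
    agents: Sequence[Cell], monster: Cell, apples: Sequence[Cell]
) -> List[float]:
    """Per-step rewards for the two agents under the Monster-Hunt rules."""
    on_monster = [pos == monster for pos in agents]
    both_catch = all(on_monster)
    rewards = []
    for pos, meets in zip(agents, on_monster):
        reward = 0.0
        if meets:
            # +5 each for a joint catch, -2 for meeting the monster alone.
            reward += 5.0 if both_catch else -2.0
        if pos in apples:
            # Assumption: every agent on an apple cell gets the +2 bonus,
            # which also yields the "sum of points" case on a shared cell.
            reward += 2.0
        rewards.append(reward)
    return rewards


def escalation_step(
    on_lit: Sequence[bool], streak: int
) -> Tuple[List[float], int, bool]:
    """Return (rewards, new_streak, done) for one Escalation step.

    `streak` plays the role of L, the current length of stepping together.
    """
    first, second = on_lit
    if first and second:
        # Both agents step on the lit cell: +1 each, the path grows.
        return [1.0, 1.0], streak + 1, False
    if first != second:
        # One agent leaves the path: the agent that stayed loses 0.9 * L
        # and the game ends.
        penalty = 0.9 * streak
        rewards = [-penalty if stayed else 0.0 for stayed in on_lit]
        return rewards, streak, True
    # Both step off together: no punishment, the game continues.
    # Assumption: the streak is kept as-is in this case.
    return [0.0, 0.0], streak, False
```

For example, `monster_hunt_rewards([(1, 1), (2, 2)], (1, 1), [(2, 2), (3, 3)])` gives the first agent the solo-monster penalty and the second agent the apple bonus, while `escalation_step([True, False], 3)` ends the game and penalizes only the agent that stayed on the lit cell.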

2. Usage

git clone https://github.com/staghuntrpg/RPG.git --recursive

Tip: Don't forget the --recursive flag in the command; otherwise the Agar.io environment will be missing from your folder.

This repository is split into two folders, GridWorld and Agar, corresponding to the environments used in the paper "Discovering Diverse Multi-agent Strategic Behavior via Reward Randomization". Installation and training instructions can be found in the subfolder of each environment.

3. Publication

If you find this repository useful, please cite our paper:

@misc{tang2021discovering,
      title={Discovering Diverse Multi-Agent Strategic Behavior via Reward Randomization}, 
      author={Zhenggang Tang and Chao Yu and Boyuan Chen and Huazhe Xu and Xiaolong Wang and Fei Fang and Simon Du and Yu Wang and Yi Wu},
      year={2021},
      eprint={2103.04564},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}
