Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning

Last update: Dec 21, 2022

Related tags

Deep Learning epciclr2020

Overview

Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning

This is the code for implementing the MADDPG algorithm presented in the paper: Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning. It is configured to be run in conjunction with environments from the (https://github.com/qian18long/epciclr2020/tree/master/mpe_local). We show our gif results here (https://sites.google.com/view/epciclr2020/). Note: this codebase has been restructured since the original paper, and the results may vary from those reported in the paper.

Installation

Install tensorflow 1.13.1

pip install tensorflow==1.13.1

Install OpenAI gym

pip install gym==0.13.0

Install other dependencies

pip install joblib imageio

Case study: Multi-Agent Particle Environments

We demonstrate here how the code can be used in conjunction with the(https://github.com/qian18long/epciclr2020/tree/master/mpe_local). It is based on(https://github.com/openai/multiagent-particle-envs)

Quick start

See train_grassland_epc.sh, train_adversarial_epc.sh and train_food_collect_epc.sh for the EPC algorithm for scenario grassland, adversarial and food_collect in the example setting presented in our paper.

Command-line options

Environment options

--scenario: defines which environment in the MPE is to be used (default: "grassland")
--map-size: The size of the environment. 1 if normal and 2 otherwise. (default: "normal")
--sight: The agent's visibility radius. (default: 100)
--alpha: Reward shared weight. (default: 0.0)
--max-episode-len maximum length of each episode for the environment (default: 25)
--num-episodes total number of training episodes (default: 200000)
--num-good: number of good agents in the scenario (default: 2)
--num-adversaries: number of adversaries in the environment (default: 2)
--num-food: number of food(resources) in the scenario (default: 4)
--good-policy: algorithm used for the 'good' (non adversary) policies in the environment (default: "maddpg"; options: {"att-maddpg", "maddpg", "PC", "mean-field"})
--adv-policy: algorithm used for the adversary policies in the environment (default: "maddpg"; options: {"att-maddpg", "maddpg", "PC", "mean-field"})

Core training parameters

--lr: learning rate (default: 1e-2)
--gamma: discount factor (default: 0.95)
--batch-size: batch size (default: 1024)
--num-units: number of units in the MLP (default: 64)
--good-num-units: number of units in the MLP of good agents, if not providing it will be num-units.
--adv-num-units: number of units in the MLP of adversarial agents, if not providing it will be num-units.
--n_cpu_per_agent: cpu usage per agent (default: 1)
--good-share-weights: good agents share weights of the agents encoder within the model.
--adv-share-weights: adversarial agents share weights of the agents encoder within the model.
--use-gpu: Use GPU for training (default: False)
--n-envs: number of environments instances in parallelization

Checkpointing

--save-dir: directory where intermediate training results and model will be saved (default: "/test/")
--save-rate: model is saved every time this number of episodes has been completed (default: 1000)
--load-dir: directory where training state and model are loaded from (default: "test")

Evaluation

--restore: restores previous training state stored in load-dir (or in save-dir if no load-dir has been provided), and continues training (default: False)
--display: displays to the screen the trained policy stored in load-dir (or in save-dir if no load-dir has been provided), but does not continue training (default: False)
--save-gif-data: Save the gif examples to the save-dir (default: False)
--render-gif: Render the gif in the load-dir (default: False)

EPC options

--initial-population: initial population size in the first stage
--num-selection: size of the population selected for reproduction
--num-stages: number of stages
--stage-num-episodes: number of training episodes in each stage
--stage-n-envs: number of environments instances in parallelization in each stage
--test-num-episodes: number of episodes for the competing

Example scripts

.maddpg_o/experiments/train_normal.py: apply the train_helpers.py for MADDPG, Att-MADDPG and mean-field training

.maddpg_o/experiments/train_x2.py: apply a single step doubling training
.maddpg_o/experiments/train_mix_match.py: mix match of the good agents in --sheep-init-load-dirs and adversarial agents in '--wolf-init-load-dirs' for model agents evaluation.
.maddpg_o/experiments/train_epc.py: train the scheduled EPC algorithm.
.maddpg_o/experiments/compete.py: evaluate different models by competition

Paper citation

@inproceedings{epciclr2020,
  author = {Qian Long and Zihan Zhou and Abhinav Gupta and Fei Fang and Yi Wu and Xiaolong Wang},
  title = {Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning},
  booktitle = {International Conference on Learning Representations},
  year = {2020}
}

Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning

Related tags

Overview

Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning

Installation

Case study: Multi-Agent Particle Environments

Quick start

Command-line options

Environment options

Core training parameters

Checkpointing

Evaluation

EPC options

Example scripts

Paper citation

Owner

An implementation of "Optimal Textures: Fast and Robust Texture Synthesis and Style Transfer through Optimal Transport"

Replication of Pix2Seq with Pretrained Model

DeepLM: Large-scale Nonlinear Least Squares on Deep Learning Frameworks using Stochastic Domain Decomposition (CVPR 2021)

CAMPARI: Camera-Aware Decomposed Generative Neural Radiance Fields

Automatic Differentiation Multipole Moment Molecular Forcefield

Finding an Unsupervised Image Segmenter in each of your Deep Generative Models

Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning

Keras-retinanet - Keras implementation of RetinaNet object detection.

Code for the paper "Relation of the Relations: A New Formalization of the Relation Extraction Problem"

Homepage of paper: Paint Transformer: Feed Forward Neural Painting with Stroke Prediction, ICCV 2021.

pcnaDeep integrates cutting-edge detection techniques with tracking and cell cycle resolving models.

Trafffic prediction analysis using hybrid models - Machine Learning

Unofficial pytorch implementation of 'Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization'

This is a tensorflow-based rotation detection benchmark, also called AlphaRotate.

TransReID: Transformer-based Object Re-Identification

Pcos-prediction - Predicts the likelihood of Polycystic Ovary Syndrome based on patient attributes and symptoms

Deep Crop Rotation

Autoencoders pretraining using clustering

Fast, accurate and reliable software for algebraic CT reconstruction

To build a regression model to predict the concrete compressive strength based on the different features in the training data.