A parallel framework for population-based multi-agent reinforcement learning.

Last update: Jan 08, 2023

Overview

MALib: A parallel framework for population-based multi-agent reinforcement learning

MALib is a parallel framework for population-based learning nested with (multi-agent) reinforcement learning (RL) methods, such as Policy Space Response Oracles, Self-Play, and Neural Fictitious Self-Play. MALib provides higher-level abstractions of MARL training paradigms, which enable efficient code reuse and flexible deployment on different distributed computing paradigms. The design of MALib also aims to support research on other multi-agent learning settings, including multi-agent imitation learning and model-based MARL.
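To make "population-based learning" concrete, here is a minimal, self-contained sketch of the loop on rock-paper-scissors. It is not MALib code and does not use the MALib API. The names `meta_strategy`, `best_response`, and `run_population_loop` are made up for illustration. The sketch assumes a uniform meta-solver and an exact best-response oracle, which reduces the loop to fictitious play; PSRO as usually described replaces the oracle with an RL learner and often uses a Nash meta-solver instead.

```python
"""Toy population-based loop on rock-paper-scissors (illustrative, not MALib code)."""

# Row player's payoff for (row action, column action); the game is zero-sum.
PAYOFF = [
    [0, -1, 1],   # rock vs. rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]
ACTIONS = ["rock", "paper", "scissors"]


def meta_strategy(population):
    """Uniform meta-solver: each policy in the population gets equal weight."""
    weight = 1.0 / len(population)
    mixture = [0.0] * len(ACTIONS)
    for action in population:
        mixture[action] += weight
    return mixture


def best_response(mixture):
    """Oracle step: the pure action with the highest expected payoff against the mixture."""
    values = [
        sum(PAYOFF[a][b] * mixture[b] for b in range(len(ACTIONS)))
        for a in range(len(ACTIONS))
    ]
    # Ties are broken in favor of the lowest action index.
    return max(range(len(ACTIONS)), key=lambda a: values[a])


def run_population_loop(iterations=300):
    """Grow the population by one best response per iteration."""
    population = [0]  # start from a single policy that always plays rock
    for _ in range(iterations):
        mixture = meta_strategy(population)
        population.append(best_response(mixture))
    return meta_strategy(population)


if __name__ == "__main__":
    final = run_population_loop()
    print({name: round(p, 2) for name, p in zip(ACTIONS, final)})
```

In this toy game the resulting mixture tends toward the uniform strategy, although convergence is slow. In a framework like MALib, the oracle step is the expensive part (an RL training job), which is where parallel execution matters.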

Installation

Installation is straightforward. MALib has been tested on Python 3.6 and 3.7, and this guide assumes Ubuntu 18.04 or later. We strongly recommend using conda to manage dependencies and avoid version conflicts. The example below builds a conda environment based on Python 3.7.

conda create -n malib python==3.7 -y
conda activate malib

# install dependencies
./install_deps.sh

# install malib
pip install -e .

MALib integrates external environments such as StarCraft II and ViZDoom; you can install them via pip install -e .[envs]. Users who want to contribute to the repository should run pip install -e .[dev] to install the development dependencies.

Optional: if you want to use alpha-rank to solve the meta-game, install OpenSpiel by following its installation guide.
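After installing, a quick way to confirm that the packages are visible to the active environment is to look them up without importing them. The helper `is_installed` below is a hypothetical name, not part of MALib. The module name `malib` is taken from the Quick Start imports, and `pyspiel` is the Python module that OpenSpiel provides.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "malib" is required; "pyspiel" is only needed for the optional alpha-rank solver.
    for name in ("malib", "pyspiel"):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```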

Quick Start

"""PSRO with PPO for Leduc Holdem"""

from malib.envs.poker import poker_aec_env as leduc_holdem
from malib.runner import run
from malib.rollout import rollout_func


# Create one environment instance to read its agent IDs and observation/action spaces.
env = leduc_holdem.env(fixed_player=True)

run(
    agent_mapping_func=lambda agent_id: agent_id,
    env_description={
        "creator": leduc_holdem.env,
        "config": {"fixed_player": True},
        "id": "leduc_holdem",
        "possible_agents": env.possible_agents,
    },
    training={
        "interface": {
            "type": "independent",
            "observation_spaces": env.observation_spaces,
            "action_spaces": env.action_spaces
        },
    },
    algorithms={
        "PSRO_PPO": {
            "name": "PPO",
            "custom_config": {
                "gamma": 1.0,
                "eps_min": 0,
                "eps_max": 1.0,
                "eps_decay": 100,
            },
        }
    },
    rollout={
        "type": "async",
        "stopper": "simple_rollout",
        "callback": rollout_func.sequential
    }
)
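The `agent_mapping_func` argument above is an identity lambda. The sketch below illustrates what such a mapping could express, assuming that it maps each environment agent ID to the ID of the learner that trains on that agent's data. This is an interpretation of the Quick Start, not a description of MALib internals. The agent IDs `player_0` and `player_1`, the `shared_mapping` alternative, and the `group_agents` helper are all hypothetical.

```python
"""Sketch of agent-to-learner mappings (illustrative only, not MALib internals)."""
from collections import defaultdict

# Hypothetical environment agent IDs for a two-player game.
possible_agents = ["player_0", "player_1"]


def independent_mapping(agent_id):
    """Identity mapping, as in the Quick Start: one learner per environment agent."""
    return agent_id


def shared_mapping(agent_id):
    """Hypothetical alternative: every environment agent feeds one shared learner."""
    return "shared"


def group_agents(agents, mapping_func):
    """Group environment agents by the learner ID the mapping assigns to them."""
    groups = defaultdict(list)
    for agent_id in agents:
        groups[mapping_func(agent_id)].append(agent_id)
    return dict(groups)


if __name__ == "__main__":
    print(group_agents(possible_agents, independent_mapping))
    print(group_agents(possible_agents, shared_mapping))
```

Under this reading, the identity mapping yields two learners that each govern one player, while a constant mapping would pool both players into a single learner.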

Citing MALib

If you use MALib in your work, please cite the accompanying paper.

@misc{zhou2021malib,
      title={MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning}, 
      author={Ming Zhou and Ziyu Wan and Hanjing Wang and Muning Wen and Runzhe Wu and Ying Wen and Yaodong Yang and Weinan Zhang and Jun Wang},
      year={2021},
      eprint={2106.07551},
      archivePrefix={arXiv},
      primaryClass={cs.MA}
}

Owner

MARL @ SJTU
