A library of multi-agent reinforcement learning components and systems

Mava: a research framework for distributed multi-agent reinforcement learning


Table of Contents

  1. Overview
  2. Getting Started
  3. Supported Environments
  4. System implementations
  5. Usage
  6. Installation
  7. Debugging
  8. Roadmap
  9. Contributing
  10. Troubleshooting and FAQs

Mava is a library for building multi-agent reinforcement learning (MARL) systems. Mava provides useful components, abstractions, utilities and tools for MARL, and allows simple scaling to multi-process system training and execution, while providing a high level of flexibility and composability.

👷‍♀️ NOTICE: We are releasing Mava first and foremost to benefit the wider community and make it easier for researchers to work on MARL. However, we consider this release a Beta version of Mava. As with many frameworks, Mava is (and will probably always remain) a work in progress, and there is much more the team aims to provide and improve in future releases, from incorporating the latest research and innovations to making the framework more stable, robust and well tested. We are committed to keeping everything working and to making the experience of using Mava as pleasant as possible. During Beta development, breaking changes may occur, as well as significant design changes (if we feel they could greatly improve the usability of the framework), but these will be clearly communicated before being incorporated into the codebase. It is also inevitable that there may be bugs we are not aware of and that things may break from time to time. We will do our best to fix these bugs and address any issues as quickly as possible.

Overview

Systems and the Executor-Trainer Paradigm

At the core of the Mava framework is the concept of a system. A system refers to a full multi-agent reinforcement learning algorithm consisting of the following specific components: an Executor, a Trainer and a Dataset.

The Executor is the part of the system that interacts with the environment, takes actions for each agent and observes the next state as a collection of observations, one for each agent in the system. Essentially, executors are the multi-agent version of the Actor class in Acme and are themselves constructed by feeding the executor a dictionary of policy networks. The Trainer is responsible for sampling data from the Dataset originally collected by the executors and updating the parameters for every agent in the system. Trainers are therefore the multi-agent version of the Learner class in Acme. The Dataset stores all of the information collected by the executors in the form of a collection of dictionaries for the actions, observations and rewards, with keys corresponding to the individual agent ids. The basic system design is shown on the left in the above figure. Several examples of system implementations can be viewed here.
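
To make the paradigm concrete, here is a minimal, purely illustrative sketch of the executor interface described above. This is plain Python, not Mava's actual Executor class, and the method name select_actions is an assumption for illustration.

# Illustrative sketch only -- not Mava's actual Executor class.
class Executor:
    """Multi-agent analogue of Acme's Actor: one policy network per agent."""

    def __init__(self, policy_networks):
        # Dictionary of policy networks, keyed by agent id.
        self._policy_networks = policy_networks

    def select_actions(self, observations):
        # Choose an action for every agent from its own observation.
        return {
            agent: self._policy_networks[agent](observation)
            for agent, observation in observations.items()
        }

# Usage with two trivial stand-in policies:
executor = Executor({"agent_0": lambda obs: 0, "agent_1": lambda obs: 1})
print(executor.select_actions({"agent_0": [0.1], "agent_1": [0.2]}))
# -> {'agent_0': 0, 'agent_1': 1}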

Distributed System Training

Mava shares much of the design philosophy of Acme, for the same reason: to allow a high level of composability for novel research (i.e. building new systems) as well as making it possible to scale systems in a simple way, using the same underlying multi-agent RL system code. Mava uses Launchpad for creating distributed programs. In Mava, the system executor (which is responsible for data collection) is distributed across multiple processes, each with a copy of the environment. Each process collects and stores data which the Trainer uses to update the parameters of all the actor networks used within each executor. This approach to distributed system training is illustrated on the right in the figure above, and sketched conceptually after this paragraph. NOTE: In the near future, Mava aims to support additional training setups, e.g. distributed training using multiple trainers to support Bayesian optimisation or population-based training (PBT).
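
As a rough mental model of this data flow, consider the following plain-Python sketch (illustrative only; in Mava the dataset is backed by Reverb and the processes run in parallel via Launchpad):

import random

# Shared dataset: executors write experience, the trainer samples it.
class Dataset:
    def __init__(self):
        self._buffer = []

    def add(self, transition):
        self._buffer.append(transition)

    def sample(self):
        return random.choice(self._buffer)

dataset = Dataset()
parameters = {"agent_0": 0.0, "agent_1": 0.0}

# Each executor process steps its own environment copy and stores data...
for executor_id in range(2):
    dataset.add({"executor": executor_id, "reward": 1.0})

# ...while the trainer samples that data and updates every agent's
# parameters (a stand-in for a real gradient step).
batch = dataset.sample()
for agent in parameters:
    parameters[agent] += 0.1 * batch["reward"]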

Getting Started

We have a Quickstart notebook that can be used to quickly create and train your first Multi-Agent System. For more information on how to use Mava, please view our usage section.

Supported Environments

A given multi-agent system interacts with its environment via an EnvironmentLoop. This loop takes as input a system instance and a multi-agent environment instance which implements the DeepMind Environment API. Mava currently supports multi-agent environment loops and environment wrappers for the following environments and environment suites:
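
Conceptually, an environment loop repeatedly queries the system for actions and steps the environment with them. The sketch below assumes a dm_env-style environment and an executor with a select_actions method; it illustrates the loop's job rather than Mava's actual loop classes.

def run_episode(environment, executor):
    # environment.reset() returns a dm_env TimeStep whose observation is
    # a dictionary keyed by agent id.
    timestep = environment.reset()
    while not timestep.last():
        # One action per agent, chosen by the system's executor.
        actions = executor.select_actions(timestep.observation)
        timestep = environment.step(actions)
    return timestep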

MAD4PG on PettingZoo's Multi-Walker environment. VDN on the SMAC 3m map.

System Implementations

Mava includes several system implementations. Below we list these together with an indication of the maturity of each system, using the following keys: 🟩 -- tested and working well; 🟨 -- running and training on simple environments, but not extensively tested; and 🟥 -- implemented but untested and yet to show clear signs of stable training.

  • 🟩 - Multi-Agent Deep Q-Networks (MADQN).
  • 🟩 - Multi-Agent Deep Deterministic Policy Gradient (MADDPG).
  • 🟩 - Multi-Agent Distributed Distributional Deep Deterministic Policy Gradient (MAD4PG).
  • 🟨 - Differentiable Inter-Agent Learning (DIAL).
  • 🟨 - Multi-Agent Proximal Policy Optimisation (MAPPO).
  • 🟨 - Value Decomposition Networks (VDN).
  • 🟥 - Monotonic Value Function Factorisation (QMIX).

| Name | Recurrent | Continuous | Discrete | Centralised training | Communication | Multi Processing |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| MADQN | ✔️ | | ✔️ | ✔️ | ✔️ | ✔️ |
| DIAL | ✔️ | | ✔️ | ✔️ | ✔️ | ✔️ |
| MADDPG | ✔️ | ✔️ | ✔️ | ✔️ | | ✔️ |
| MAD4PG | ✔️ | ✔️ | ✔️ | ✔️ | | ✔️ |
| MAPPO | | ✔️ | ✔️ | ✔️ | | ✔️ |
| VDN | | | ✔️ | ✔️ | | ✔️ |
| QMIX | | | ✔️ | ✔️ | | ✔️ |

As we develop Mava further, we aim to have all systems well tested on a wide variety of environments.

Usage

To get a sense of how Mava systems are used we provide the following simplified example of launching a distributed MADQN system.

# Mava imports
from mava.systems.tf import madqn
from mava.components.tf.architectures import DecentralisedPolicyActor
from . import helpers  # local module providing the environment and network factories

# Launchpad imports
import launchpad

# Distributed program
program = madqn.MADQN(
    environment_factory=helpers.environment_factory,
    network_factory=helpers.network_factory,
    architecture=DecentralisedPolicyActor,
    num_executors=2,
).build()

# Launch
launchpad.launch(
    program,
    launchpad.LaunchType.LOCAL_MULTI_PROCESSING,
)

The first two arguments to the program are environment and network factory functions. These helper functions are responsible for creating the networks for the system, initialising their parameters on the different compute nodes and providing a copy of the environment for each executor. The next argument num_executors sets the number of executor processes to be run. After building the program we feed it to Launchpad's launch function and specify the launch type to perform local multi-processing, i.e. running the distributed program on a single machine. Scaling up or down is simply a matter of adjusting the number of executor processes.
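
For illustration, the factory functions in Mava's example scripts typically look something like the following sketch (the helper paths and names here are assumptions that may differ between Mava versions):

import functools

from mava.systems.tf import madqn
from mava.utils import lp_utils
from mava.utils.environments import debugging_utils

# Each executor process calls this to construct its own environment copy.
environment_factory = functools.partial(
    debugging_utils.make_environment,
    env_name="simple_spread",
    action_space="discrete",
)

# Builds the per-agent networks on whichever compute node needs them.
network_factory = lp_utils.partial_kwargs(madqn.make_default_networks)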

For a deeper dive, take a look at the detailed working code examples found in our examples subdirectory which show how to instantiate a few MARL systems and environments.

Components

Mava provides several components to support the design of MARL systems, such as different system architectures and modules. You can change the architecture to support a different form of information sharing between agents, or add a module to enhance system capabilities. Common examples of architectures include decentralised, centralised, networked and state-based designs.

In terms of components, you can, for example, update the above MADQN system code to use a communication module by wrapping the architecture fed to the system, as shown below.

from mava.components.tf.modules import communication

...

# Wrap the architecture in a communication module
architecture = communication.BroadcastedCommunication(
    architecture=architecture,
    shared=True,
    channel_size=1,
    channel_noise=0,
)

All modules in Mava aim to work in this way.

Installation

We have tested Mava on Python 3.6, 3.7 and 3.8.

Docker (Recommended)

  1. Build the docker image using the following make command:

    make build
  2. Run an example:

    make run EXAMPLE=dir/to/example/example.py

    For example, make run EXAMPLE=examples/petting_zoo/sisl/multiwalker/feedforward/decentralised/run_mad4pg.py. Alternatively, run bash inside a docker container with Mava installed using make bash, and from there examples can be run as follows: python dir/to/example/example.py.

    To run an example with tensorboard viewing enabled, you can run

    make run-tensorboard EXAMPLE=dir/to/example/example.py

    and navigate to http://127.0.0.1:6006/.

  3. Install multi-agent StarCraft II environment [Optional]: To install the environment, please run the provided bash script, which is a slightly modified version of the script found here.

    ./install_sc2.sh

    Or optionally install through docker (each build downloads and installs StarCraft II, ~3.8 GB):

    make build
    make build_sc2
  4. Install 2D RoboCup environment [Optional]: To install the environment, please run the robocup docker build command after running the Mava docker build command.

    make build
    make build_robocup

Python virtual environment

  1. If not using docker, we strongly recommend using a Python virtual environment to manage your dependencies and avoid version conflicts. Please note that since Launchpad only supports Linux-based operating systems, a Python virtual environment will only work on those systems:

    python3 -m venv mava
    source mava/bin/activate
    pip install --upgrade pip setuptools
  2. To install the core libraries, including Reverb (our storage dataset):

    pip install id-mava
    pip install id-mava[reverb]

    Or for nightly builds:

    pip install id-mava-nightly
    pip install id-mava-nightly[reverb]
  3. To install dependencies for tensorflow agents:

    pip install id-mava[tf]
  4. For distributed agent support:

    pip install id-mava[launchpad]
  5. To install example environments, such as PettingZoo:

    pip install id-mava[envs]
  6. NB: For Flatland, OpenSpiel and SMAC environments, installations have to be done separately. Flatland can be installed using:

    pip install id-mava[flatland]

    and for OpenSpiel, after ensuring that the right cmake and clang versions are installed as specified here:

    pip install id-mava[open_spiel]

    StarCraft II must be installed separately, according to your operating system. To install the StarCraft II ML environment and associated packages, please follow the instructions on PySC2 to install the StarCraft II game files. Please ensure you have the required game maps (for both PySC2 and SMAC) extracted in the StarCraft II maps directory. Once this is done, you can install the packages for the single-agent case (PySC2) and the multi-agent case (SMAC).

    pip install pysc2
    pip install git+https://github.com/oxwhirl/smac.git
  7. For the 2D RoboCup environment, a local install has only been tested on Ubuntu 18.04. The installation can be performed by running the RoboCup bash script while inside the Mava Python virtual environment.

    ./install_robocup.sh

We also have a list of optional installs for extra functionality, such as the use of Atari environments, environment wrappers, GPU support and agent episode recording.

Debugging

To test and debug new system implementations, we use a simplified version of the spread environment from the MPE suite. Debugging in MARL can be very difficult and time-consuming, so it is important to debug on an environment that is small, simple and fast, yet still able to clearly show whether a system is able to learn. An illustration of the debugging environment is shown on the right.

Agents start at random locations and are each assigned a specific landmark, which they attempt to reach in as few steps as possible. Rewards are given to each agent independently as a function of their distance to their landmark. The reward is normalised to be between 0 and 1, where 1 is given when the agent is directly on top of the landmark; the further an agent is from its landmark, the closer its reward is to 0. Collisions between agents result in a reward of -1 for the colliding agents.

To test both discrete and continuous control systems, we feature two versions of the environment. In the discrete version, the action space for each agent consists of five actions: left, right, up, down and stand still. In the continuous version, the action space consists of real values bounded between -1 and 1 for the agent's acceleration in the x and y directions. Several examples of running systems on the debugging environment can be found here. Below we show results from some of our systems trained on the debugging environment.
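
As a concrete illustration of the reward scheme described above, here is a minimal sketch; the exact decay function is an assumption, so Mava's actual debugging environment may compute the distance-based reward differently.

import numpy as np

def agent_reward(agent_pos, landmark_pos, collided):
    # Colliding agents receive a reward of -1.
    if collided:
        return -1.0
    # Reward is 1 on top of the landmark and decays towards 0 with distance.
    distance = np.linalg.norm(np.asarray(agent_pos) - np.asarray(landmark_pos))
    return float(np.exp(-distance))

print(agent_reward([0.0, 0.0], [0.0, 0.0], collided=False))  # 1.0
print(agent_reward([0.0, 0.0], [3.0, 4.0], collided=False))  # ~0.0067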

Roadmap

We have big ambitions for Mava! 🚀 But there is still much work to be done. We have a clear roadmap and wish list for expanding our system implementations and associated modules, improving testing and robustness, and providing support for across-machine training. Please visit them using the links below and feel free to add your own suggestions!

In the longer term, the Mava team plans to release benchmarking results for several different systems and environments, and to contribute a MARL-specific behavioural environment suite (similar to bsuite for single-agent RL) specifically engineered to study aspects of MARL such as cooperation and coordination.

Contributing

Please read our contributing docs for details on how to submit pull requests, our Contributor License Agreement and community guidelines.

Troubleshooting and FAQs

Please read our troubleshooting and FAQs guide.

Citing Mava

If you use Mava in your work, please cite the accompanying technical report:

@article{pretorius2021mava,
    title={Mava: A Research Framework for Distributed Multi-Agent Reinforcement Learning},
    author={Arnu Pretorius and Kale-ab Tessera and Andries P. Smit and Kevin Eloff
    and Claude Formanek and St John Grimbly and Siphelele Danisa and Lawrence Francis
    and Jonathan Shock and Herman Kamper and Willie Brink and Herman Engelbrecht
    and Alexandre Laterre and Karim Beguir},
    year={2021},
    journal={arXiv preprint arXiv:2107.01460},
    url={https://arxiv.org/pdf/2107.01460.pdf},
}
Comments
  • Evaluating model after training is done?

    Is there an easy way in Mava to load a trained model from the checkpoints, run it again and evaluate its performance?

    I haven't found any example of how to do this and can't find an easy way to do it just by looking at the code.

    • Is this already implemented in Mava?
    • If not, could you please point out a way for me to implement this?

    To make it more clear, the reason I need this is that I'm working with a model topology that allows a variable number of agents as input. This means I can train the model using a 3-agent environment, but after training is done I can run the trained model on the same environment with more or fewer agents, and I want to evaluate its performance with different numbers of agents.

    question 
    opened by mlanas 17
  • Training affected in development branches

    Problem

    develop and feature/mava-scaling branches seem to be taking longer to train the debugging environment example run_maddpg.py than the 0.1.0 release. The difference between 0.1.0 and develop is not that big, but the feature/mava-scaling one does seem to affect the training considerably.

    (TensorBoard screenshots: 0.1.0 runs, develop runs and feature/mava-scaling runs.)

    Execution

    The tests were executed using docker. Each branch was cloned to a different directory and the 3 docker images were built.

    For each branch, the test was executed 3 times with the command:

    make run-tensorboard
    

    Note

    After cloning the 0.1.0 tag, the simple_spread.py file of the debugging environment was updated to incorporate the changes added in #288 so all the tests are executed in the same environment.

    bug 
    opened by mlanas 11
  • Feature/jax upgrade networks upgrade acme

    What?

    • Change mlp to layernormmlp in ppo to be consistent with our tf systems.
    • Upgrade acme, reverb, tf and launchpad (we will have to benchmark this).

    Why?

    How?

    Extra

    benchmark in progress size/XS 
    opened by KaleabTessera 9
  • Quickstart notebook: Run Multi-Agent DDPG System.

    I was playing around with the quickstart notebook but get this error on "Run Multi-Agent DDPG System" (I tried locally and on Colab):

    UnparsedFlagAccessError: Trying to access flag --lp_termination_notice_secs before flags were parsed.

    bug 
    opened by jbakams 9
  • PettingZoo simple_spread example doesn't learn

    Problem

    I'm running the PettingZoo simple_spread Mava example (run_maddpg.py) from the develop branch and the MeanEpisodeReturn does not improve.

    Is this the expected behaviour? Or should I maybe let it train longer?

    Execution

    make run-tensorboard EXAMPLE=examples/petting_zoo/mpe/simple_spread/feedforward/decentralised/run_maddpg.py
    
    question 
    opened by mlanas 8
  • Questions about multiwalker

    Hey, a quick question: how many timesteps did you train multiwalker for with MAD4PG a few months ago, when you were able to learn it so effectively that the environment broke and you created an issue with us?

    question 
    opened by jkterry1 7
  • Feature/Population Based Training

    What?

    Add the first example of population based training in Mava. This example uses the recurrent MAD4PG algorithm to train a population of 5 networks, using 5 trainers and 5 executors, on the debugging environment. The hyperparameters that are getting tuned are the discount factor, target update rate and the target update period. This PR will remain in draft form for now as it still needs to be tested in a more complicated environment for longer time periods.

    Why?

    Population based training allows for the joint optimisation of hyperparameters and network parameters in one training setting.

    How?

    Various hooks have been added inside the MADDPG system. A PBT wrapper has also been added. The PBT wrapper can now wrap an MADDPG or MAD4PG system and overwrite the appropriate hooks to add PBT to the system.

    Extra

    enhancement 
    opened by DriesSmit 7
  • Feature/Multiple trainers for MA-DDPG

    What?

    Implements a scaled-up version of MADDPG where multiple trainers can now be used with multiple executors. A centralised variable server is also implemented that absorbs the responsibilities of the counter node, trainer checkpointing and trainer variable source. The trainers and executors now read and write to the centralised variable source directly. A multiple trainer example is included where 3 trainers and 2 executors are used to train 3 non-weight sharing agents on the debugging environment.

    Why?

    Multiple trainers allow for the parallelisation of the trainer's tasks, just as is already done with executors. This also opens the door to hyperparameter tuning directly in Mava in future updates.

    How?

    Added a new Scaled MA-DDPG system that allows for the use of multiple trainers.

    Extra

    This PR uses changes proposed in updated-network-keys. Therefore that PR should be merged first. After that point, this PR can be moved out of the draft status.

    enhancement 
    opened by DriesSmit 7
  • Feature/starcraft wrapper

    What?

    Implement StarCraft II wrapper #113. Add installation instructions to README #188 .

    Why?

    SCII is an important test-bed for RL/MARL agents. Specifically, SMAC is used for testing mixing agents etc.

    How?

    Implement SC2 wrapper in the style of the pettingzoo/debugging env wrappers. Pull some methods from the RLlib wrapper provided by SMAC.

    Extra

    This is untested for various reasons. Basically would like some experienced wrapper eyes on the file/progress 😄 👁️.

    enhancement 
    opened by sgrimbly 7
  • [BUG] Remove nested tf.function

    Describe the bug Nested tf.function decorators cause TF to constantly retrace, which could cause significant performance and memory issues.

    Additional context I think this bug crept in when we refactored our code to separate forward and backward passes.

    Possible Solution Remove tf.function decorator from the backward pass.

    This bug is also related to #77 and #346

    bug 
    opened by arnupretorius 6
  • feat: Checkpointer Component

    What?

    A Checkpointer Component for JAX systems.

    Why?

    Save variables to file and restore pretrained weights.

    How?

    • Created a Checkpointer Component for JAX systems that uses ACME JAX checkpointer
    • Added a checkpointer unit test
    • Moved Optimisers to an optimiser component
    • Initialised opt_states in the trainer component

    Extra

    • Close Create a Checkpointer Component for JAX systems
    • Updated parameter server tests as the checkpointer is no longer integrated into the param server
    • Modified the test systems to save the experiment data in a temp folder
    • Fixed a small bug to ensure that parameter client get and set keys are always disjoint sets
    • Renamed all optimZer to optimiSer in code :smile:
    • General refactor by removing unused imports and repeated code
    • Refactored tests to no longer say "separate_networks"
    • Added a constants.py file as per discussion with @DriesSmit
    • To follow in another PR:
      • Checkpointing JAX random states (to be discussed): https://github.com/instadeepai/Mava/issues/746
      • Checkpointing best parameters: https://github.com/instadeepai/Mava/issues/744
      • Documenting checkpointer: https://github.com/instadeepai/Mava/issues/749
    • New issues opened as a result of this investigation
      • https://github.com/instadeepai/Mava/issues/747
      • https://github.com/instadeepai/Mava/issues/748
    size/XXL 
    opened by AsadJeewa 5
  • [BUG] Quickstart example fails in Colab

    Describe the bug

    Problems when trying to run the quickstart.ipynb notebook

    To Reproduce

    Steps to reproduce the behavior:

    1. Visit https://colab.research.google.com/github/instadeepai/Mava/blob/develop/examples/quickstart.ipynb
    2. Run the "Install required packages" cell
    3. Hit error in installing box-2d (note: this install fails quietly, because the output is %%captured)
    4. As a result, id-mava isn't installed and the later cells won't run

    Expected behavior

    The install should work without hiccup for any user trying the Colab notebook.

    Context (Environment)

    • OS: Google Colab – Release 2022/12/6

    Additional context

    n/a

    Possible Solution

    Common problem with box-2d: https://stackoverflow.com/questions/54252800/python-cant-install-box2d-swig-exe-failed-with-error-code-1, need to manually install swig first.

    bug 
    opened by callumtilbury 0
  • feat: support for TPU - sets environment variables correctly to use T…

    feat: support for TPU - sets environment variables correctly to use T…

    TPU support

    What?

    Changed environment variables in the lp_utils.to_device function so that only "nodes_on_gpu" can see the TPU, while other nodes can only see the CPU. This allows the trainer to run on a TPU. Additionally, a new config parameter simply called "use_tpu" was added and threaded through the launcher.

    Why?

    This is due to launchpad processes crashing if more than one process tries to use a TPU.

    How?

    As stated in "What", the environment variables decide which platform JAX uses.

    Extra

    There is a slight problem when wanting to use a TPU. The base Python environment (the one calling the training script) needs to be set to only see a CPU, otherwise it will crash for the same reason as stated above. This is simple to do through export JAX_PLATFORMS="cpu". One thing that has not been considered in this PR is if someone wants to put certain nodes on the TPU and other nodes on the GPU, but that is quite fine-grained and can easily be added later down the line. It gets quite complicated, as a TPU can only have a single model running on it, so I'm also not sure how this will work for non-parameter-sharing situations, i.e. heterogeneous agents.

    size/M 
    opened by EdanToledo 4
  • [FEATURE] Add TPU support

    Please describe the purpose of the feature. Is it related to a problem?

    Hello, the title is pretty self-explanatory. I'd just like to add TPU support for Mava. Due to launchpad, TPUs won't work with the code as is. I'm not sure if you have tried it yet, but the fix is pretty simple.

    Describe the solution you'd like

    Essentially, all that needs to be changed is the environment variables that are set in the lp_utils.to_device function. I've already written the code - it's like 4-6 lines, but I am unable to make a PR.

    Describe alternatives you've considered

    Crying and not running on a TPU.

    How do we know when implementation of this feature is complete?

    Checklist:

    • [X] Code runs on TPU.

    Additional context

    I am currently using mava on TPU so this is all that is needed to be done.

    enhancement 
    opened by EdanToledo 2
  • feat: make best checkpoint support norm params

    What?

    Make the best checkpoint component support the case of normalization.

    Why?

    Currently the best checkpointer and the absolute metric features are working fine in the default case; however, neither supports normalization of the params, such as observation normalization and target-value normalization.

    How?

    Edit the stored network in the best_checkpoint params

    Extra

    Close #859

    To test this feature:

    1. Run examples/smac/feedforward/decentralised/calculate_absolute_metric.py to check the logged json file
    2. Run examples/smac/feedforward/decentralised/best_checkpointed_net.py to check that it restores the best network
    enhancement size/M 
    opened by OmaymaMahjoub 1
  • Recurrent IPPO critic

    What?

    Added support for recurrent critics in IPPO. The system is working, learning and leads to performance increases. The trainer works by using the initial RNN hidden state for training instead of the network hidden state used while interacting with the environment.

    Why?

    Improve the IPPO system's performance.

    Extra

    For now, the batch size has to be passed in to mava/systems/ippo/networks.py to initialise the critic hidden states. The system performs best when value clipping, orthogonal network initialisation and all normalisation except observation normalisation are turned off. Additionally, an MSE loss should be used for the critic network for optimal performance.

    enhancement size/XXL 
    opened by RuanJohn 0
  • [MAINTAIN] Making the best checkpoint and the absolute metric support norm params

    Please describe what needs to be maintained?

    Currently the best checkpointer and the absolute metric features are working fine in the default case; however, neither supports normalization of the params, such as observation normalization and target-value normalization.

    Describe the outcome you'd like

    Create the option of storing norm_params in the best checkpointer component

    maintenance 
    opened by OmaymaMahjoub 0
Releases(0.1.3)
  • 0.1.3(Jun 15, 2022)

    Highlights

    This is the last TensorFlow system release. After this, TensorFlow systems will be deprecated in favour of JAX systems and our new callback redesign (https://github.com/instadeepai/Mava/pull/457).

    Systems

    • Updates to acme, reverb and tensorflow.
    • Working centralised and state based architectures.
    • Recurrent and Multiple Trainer PPO.

    Environments

    What's Changed

    • Bugfix/ Release aren't triggering pypi push job. by @KaleabTessera in https://github.com/instadeepai/Mava/pull/466
    • Feature / Release 0.1.2 v2 by @KaleabTessera in https://github.com/instadeepai/Mava/pull/467
    • fix: Update black version. by @DriesSmit in https://github.com/instadeepai/Mava/pull/470
    • Bugfix/ Update PZ Version and new jax dockerfiles by @KaleabTessera in https://github.com/instadeepai/Mava/pull/480
    • Feature/recurrent and multiple trainer MAPPO by @DriesSmit in https://github.com/instadeepai/Mava/pull/326
    • Feat/maddpg obs optim by @AsadJeewa in https://github.com/instadeepai/Mava/pull/459
    • feat: Add fixed sampler capability + bugfixes by @DriesSmit in https://github.com/instadeepai/Mava/pull/475
    • Feature/fix sampler madqn by @EdanToledo in https://github.com/instadeepai/Mava/pull/477
    • chore: Up the patch version of mava - 0.1.3. by @KaleabTessera in https://github.com/instadeepai/Mava/pull/485
    • Bugfix/fix old tf architectures by @KaleabTessera in https://github.com/instadeepai/Mava/pull/552
    • Release 0.1.3 by @KaleabTessera in https://github.com/instadeepai/Mava/pull/486

    Full Changelog: https://github.com/instadeepai/Mava/compare/0.1.2...0.1.3

    Source code(tar.gz)
    Source code(zip)
  • 0.1.2(Mar 28, 2022)

    Highlights

    Systems

    • Fixed observation network bug in mappo + changed implementation to use two optims.
    • Fixes in maddpg/mad4pg loss calculation.
    • Began on jax system implementations.

    Environments

    What's Changed

    • Fix/add loss mask to ppo by @EdanToledo in https://github.com/instadeepai/Mava/pull/441
    • Mainetenance: Fix tf examples issues by @AsadJeewa in https://github.com/instadeepai/Mava/pull/444
    • fix: shared weights with agent type by @AsadJeewa in https://github.com/instadeepai/Mava/pull/428
    • Fix broken readme links and neaten up formatting by @AsadJeewa in https://github.com/instadeepai/Mava/pull/446
    • Feature/jax abstract builder class by @arnupretorius in https://github.com/instadeepai/Mava/pull/433
    • docs: updated docs to better represent available options by @sash-a in https://github.com/instadeepai/Mava/pull/448
    • Feature/jax general system class by @arnupretorius in https://github.com/instadeepai/Mava/pull/425
    • Bugfix/Mypy Inconsistency Issue by @KaleabTessera in https://github.com/instadeepai/Mava/pull/458
    • fix/remove flatland wrapper debug print statement by @mmorris44 in https://github.com/instadeepai/Mava/pull/456
    • Feature/MAPPO Obs Networks Fix + Multiple Optims by @KaleabTessera in https://github.com/instadeepai/Mava/pull/454
    • Feature/new issue template for investigations by @KaleabTessera in https://github.com/instadeepai/Mava/pull/461
    • Bugfix/MADD(4)PG by @DriesSmit in https://github.com/instadeepai/Mava/pull/460
    • Feat/Upped pypi version. by @KaleabTessera in https://github.com/instadeepai/Mava/pull/464
    • Feature / Release 0.1.2 by @KaleabTessera in https://github.com/instadeepai/Mava/pull/465

    Full Changelog: https://github.com/instadeepai/Mava/compare/0.1.1...0.1.2

    Source code(tar.gz)
    Source code(zip)
  • 0.1.1(Feb 25, 2022)

    Highlights

    Systems

    • Stable versions of all systems - notably stable mappo, vdn and qmix.
    • Multiple trainer implementations for maddpg and mad4pg.
    • Removed the dial system.

    Environments/ Environment Wrappers

    What's Changed

    • Feature/Enforce docstring code coverage. by @KaleabTessera in https://github.com/instadeepai/Mava/pull/271
    • Chore/Resized gifs in readme. by @KaleabTessera in https://github.com/instadeepai/Mava/pull/272
    • Feature/Improve Mava agent networks by @DriesSmit in https://github.com/instadeepai/Mava/pull/258
    • Feature/upgrade acme version and use new adders by @KaleabTessera in https://github.com/instadeepai/Mava/pull/274
    • Chore/Updated makefile and readme for Windows. by @Nashlen in https://github.com/instadeepai/Mava/pull/273
    • Fix/supersuit version by @KaleabTessera in https://github.com/instadeepai/Mava/pull/277
    • Chore/ Update quickstart by @KaleabTessera in https://github.com/instadeepai/Mava/pull/278
    • Feature/New acme adders and tests. by @KaleabTessera in https://github.com/instadeepai/Mava/pull/276
    • feature: working version of importance sampling on feedforward madqn. by @jcformanek in https://github.com/instadeepai/Mava/pull/275
    • fix/ Smac Load by @KaleabTessera in https://github.com/instadeepai/Mava/pull/283
    • update Dockerfile for SMAC installation by @mnguyen0226 in https://github.com/instadeepai/Mava/pull/286
    • Bugfix: Simple_spread observation code. by @DriesSmit in https://github.com/instadeepai/Mava/pull/288
    • Bugfix/launchpad flag issue by @KaleabTessera in https://github.com/instadeepai/Mava/pull/291
    • Feature/mava reproducibility and PZ wrapper fix by @KaleabTessera in https://github.com/instadeepai/Mava/pull/296
    • fix: Autorom manual download. by @KaleabTessera in https://github.com/instadeepai/Mava/pull/300
    • Feature: Add Readme for setting up a new environment by @DriesSmit in https://github.com/instadeepai/Mava/pull/299
    • Chore/re add autorom by @KaleabTessera in https://github.com/instadeepai/Mava/pull/302
    • Add checkpoint save interval variable. by @DriesSmit in https://github.com/instadeepai/Mava/pull/301
    • Feature/Upgraded tf and reverb versions. by @KaleabTessera in https://github.com/instadeepai/Mava/pull/303
    • Chore/flatland gif by @arnupretorius in https://github.com/instadeepai/Mava/pull/304
    • Small readme updates by @arnupretorius in https://github.com/instadeepai/Mava/pull/305
    • Feature: added rendering to flatland wrapper. by @jcformanek in https://github.com/instadeepai/Mava/pull/307
    • chore/Updates for new acme version. by @KaleabTessera in https://github.com/instadeepai/Mava/pull/308
    • Fix per agent loggers by @DriesSmit in https://github.com/instadeepai/Mava/pull/313
    • Removed deprecated shared_weights parameter by @mmorris44 in https://github.com/instadeepai/Mava/pull/319
    • docs: update README with correct link by @AsadJeewa in https://github.com/instadeepai/Mava/pull/320
    • small fix for README.md by @arnupretorius in https://github.com/instadeepai/Mava/pull/322
    • Feature/Multiple trainers for MA-DDPG by @DriesSmit in https://github.com/instadeepai/Mava/pull/253
    • Fix Flatland package error in Docker build by @DriesSmit in https://github.com/instadeepai/Mava/pull/328
    • Feature/melting pot by @ldfrancis in https://github.com/instadeepai/Mava/pull/324
    • Fix RoboCup environment wrapper by @DriesSmit in https://github.com/instadeepai/Mava/pull/334
    • Feature/eval intervals by @KaleabTessera in https://github.com/instadeepai/Mava/pull/323
    • Feature/ Smac wrapper Update, MADQN/QMIX/VDN upgrades and Dockerfile improvements by @KaleabTessera in https://github.com/instadeepai/Mava/pull/310
    • Feature/add robocup gif by @DriesSmit in https://github.com/instadeepai/Mava/pull/336
    • Feature/auto-push-docker-images and version upgrades by @KaleabTessera in https://github.com/instadeepai/Mava/pull/342
    • Added a brief explanation of Logging metrics to README by @RuanJohn in https://github.com/instadeepai/Mava/pull/341
    • Updated pip installation instructions in README by @RuanJohn in https://github.com/instadeepai/Mava/pull/343
    • Bugfix/dockerfile no module found by @KaleabTessera in https://github.com/instadeepai/Mava/pull/344
    • feat(git): Added feature and bug templates. by @KaleabTessera in https://github.com/instadeepai/Mava/pull/350
    • Doc/meltingpot gif by @ldfrancis in https://github.com/instadeepai/Mava/pull/351
    • Replace types ParallelAdder with ReverbParallelAdder by @AsadJeewa in https://github.com/instadeepai/Mava/pull/356
    • Update README to link to pypi package by @AsadJeewa in https://github.com/instadeepai/Mava/pull/360
    • Feature/auto docs by @KaleabTessera in https://github.com/instadeepai/Mava/pull/354
    • Feature/maintainace issue template by @arnupretorius in https://github.com/instadeepai/Mava/pull/368
    • Fix/broken launchpad link by @sash-a in https://github.com/instadeepai/Mava/pull/370
    • feat: filter docker image push based on label. by @KaleabTessera in https://github.com/instadeepai/Mava/pull/375
    • Maintenance/update readme by @arnupretorius in https://github.com/instadeepai/Mava/pull/378
    • chore: expand code owner list for better code review by @arnupretorius in https://github.com/instadeepai/Mava/pull/390
    • Feature/internal feature issue template by @arnupretorius in https://github.com/instadeepai/Mava/pull/379
    • Feature/internal bug issue template by @arnupretorius in https://github.com/instadeepai/Mava/pull/381
    • feat: benchmarking issue template by @arnupretorius in https://github.com/instadeepai/Mava/pull/385
    • Bugfix: Fixed conventional commit pre-commit hook not running. by @KaleabTessera in https://github.com/instadeepai/Mava/pull/395
    • Fix/checklist for issue templates by @arnupretorius in https://github.com/instadeepai/Mava/pull/388
    • feat: internal issue tempalte for tests by @arnupretorius in https://github.com/instadeepai/Mava/pull/399
    • chore: add optional benchmark questions to feature by @arnupretorius in https://github.com/instadeepai/Mava/pull/401
    • Fix/madqn by @jcformanek in https://github.com/instadeepai/Mava/pull/362
    • Fix/architecture typo fix. by @RuanJohn in https://github.com/instadeepai/Mava/pull/410
    • Fix/Smac Wrapper Relies on Flatland Installation by @KaleabTessera in https://github.com/instadeepai/Mava/pull/413
    • refactor: move examples into tf folder and update examples links by @arnupretorius in https://github.com/instadeepai/Mava/pull/416
    • fix: readd quickstart notebook by @arnupretorius in https://github.com/instadeepai/Mava/pull/417
    • fix: Fix broken tests due to new gym version. by @KaleabTessera in https://github.com/instadeepai/Mava/pull/421
    • Maintenance: Remove redundant value_network code by @AsadJeewa in https://github.com/instadeepai/Mava/pull/423
    • fix: small bug in the pettingzoo wrapper related to legal action masking by @jcformanek in https://github.com/instadeepai/Mava/pull/432
    • Fix/Flatland Docker Container by @KaleabTessera in https://github.com/instadeepai/Mava/pull/437
    • Feature/jax abstract system class by @arnupretorius in https://github.com/instadeepai/Mava/pull/405
    • Feature/ppo multiple train steps by @EdanToledo in https://github.com/instadeepai/Mava/pull/353
    • Fix/ Fix docs build. by @KaleabTessera in https://github.com/instadeepai/Mava/pull/435
    • Feature/jax mava custom config class by @arnupretorius in https://github.com/instadeepai/Mava/pull/414
    • Feat/Release new mava version. by @KaleabTessera in https://github.com/instadeepai/Mava/pull/438
    • Merge: Merge Dev into Main for Release by @KaleabTessera in https://github.com/instadeepai/Mava/pull/439

    New Contributors

    • @Nashlen made their first contribution in https://github.com/instadeepai/Mava/pull/273
    • @mnguyen0226 made their first contribution in https://github.com/instadeepai/Mava/pull/286
    • @mmorris44 made their first contribution in https://github.com/instadeepai/Mava/pull/319
    • @AsadJeewa made their first contribution in https://github.com/instadeepai/Mava/pull/320
    • @RuanJohn made their first contribution in https://github.com/instadeepai/Mava/pull/341
    • @sash-a made their first contribution in https://github.com/instadeepai/Mava/pull/370
    • @EdanToledo made their first contribution in https://github.com/instadeepai/Mava/pull/353

    Full Changelog: https://github.com/instadeepai/Mava/compare/0.1.0...0.1.1

    Source code(tar.gz)
    Source code(zip)
  • 0.1.0(Jul 6, 2021)

    Highlights

    Mava Core

    • Components

      • Architectures
        • Added Centralised, Decentralised, Networked and State Based Architectures.
      • Modules
        • Added Broadcast Communication, Epsilon Decay Scheduling, Additive and Monotonic Mixing and Fingerprint Stabilization.
      • Networks
        • Added Additive and Monotonic Mixing Networks, Hypernetworks, Communication Networks, Epsilon Greedy and DiscreteValued head.
    • Environment Loops

      • Added Parallel and Sequential Environment Loops.
    • Adders

      • Added Parallel versions of Transition, Sequential and Episode Adders.

    Systems

    • Added feedforward training for maddpg, mad4pg, madqn, mappo, vdn and qmix.
    • Added recurrent training for madqn, dial, maddpg and mad4pg.
    • Added continuous network heads for maddpg, mad4pg and mappo.
    • Added decentralised architecture training for maddpg, mad4pg, madqn, mappo, dial, vdn and qmix.
    • Added centralised architecture training for maddpg, mad4pg and mappo.
    • Added state based architecture training for maddpg and mad4pg.
    • Added networked architecture training for maddpg.

    Environments/ Environment Wrappers

    • Added PettingZoo, SMAC, RoboCup, OpenSpiel, Flatland, Debug Simple Spread, Debug Switch environment and Debug Two-Step game.

    Examples

    • Added quickstart notebook.
    • Added basic examples for sample systems and environments.

    Minor Changes and Fixes

    Source code(tar.gz)
    Source code(zip)
  • 0.0.9(Jun 9, 2021)

Owner
InstaDeep Ltd
InstaDeep offers a host of Enterprise AI products, ranging from GPU-accelerated insights to self-learning decision making systems.