Resilient projection-based consensus actor-critic (RPBCAC) algorithm

We implement the RPBCAC algorithm with nonlinear approximation from [1] and focus on the training performance of cooperative agents in the presence of adversaries. We aim to validate the analytical results presented in the paper and to prevent adversarial attacks that can arbitrarily degrade the performance of the cooperative network, including the attack studied in [2]. The repository contains the following folders:

  1. agents - contains resilient and adversarial agents
  2. environments - contains a grid world environment for the cooperative navigation task
  3. simulation_results - contains plots that show training performance
  4. training - contains functions for training agents

To train agents, execute main.py.
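For orientation, the sketch below outlines how a decentralized actor-critic training loop of this kind typically proceeds: each agent acts and updates its actor and critic locally, then exchanges critic parameter estimates with its neighbors and applies a resilient consensus step. All names (the environment interface and agent methods such as act, local_update, broadcast_parameters, consensus_update) are illustrative assumptions, not the repository's actual API.

```python
# Hypothetical skeleton of a networked actor-critic training loop in the
# spirit of RPBCAC. The environment and agent interfaces are assumptions
# made for illustration; main.py in this repository may be organized
# differently.

def train(env, agents, n_episodes=1000, max_steps=50):
    for episode in range(n_episodes):
        states = env.reset()
        for _ in range(max_steps):
            # 1) Each agent selects an action from its local policy.
            actions = [agent.act(s) for agent, s in zip(agents, states)]
            next_states, rewards, done = env.step(actions)

            # 2) Local actor/critic updates from private observations and rewards.
            for agent, s, a, r, s_next in zip(agents, states, actions,
                                              rewards, next_states):
                agent.local_update(s, a, r, s_next)

            # 3) Agents exchange critic (and team-reward) estimates with their
            #    neighbors and apply the resilient consensus update.
            messages = [agent.broadcast_parameters() for agent in agents]
            for agent in agents:
                agent.consensus_update(messages)

            states = next_states
            if done:
                break
```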

Multi-agent grid world: cooperative navigation

We train five agents in a grid-world environment. Each agent's goal is to reach its desired position without colliding with the other agents in the network. The grid world has dimension 5 x 5, and the reward function penalizes each agent for its distance from its target and for collisions with other agents.
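As a rough illustration of this reward structure, the sketch below penalizes an agent for its Manhattan distance to its target and adds a fixed penalty for each collision. The function name and constants are hypothetical and are not taken from the repository's environment.

```python
import numpy as np

def navigation_reward(agent_pos, target_pos, other_positions, collision_penalty=1.0):
    """Hypothetical reward for the cooperative navigation task.

    agent_pos, target_pos : integer grid coordinates, e.g. np.array([x, y])
    other_positions       : positions of the remaining agents
    """
    # Penalty proportional to the Manhattan distance from the desired position.
    distance = np.abs(agent_pos - target_pos).sum()
    # Additional penalty for occupying the same cell as another agent.
    collisions = sum(np.array_equal(agent_pos, p) for p in other_positions)
    return -float(distance) - collision_penalty * collisions
```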

We compare the performance of the cooperative network under the RPBCAC algorithm with trimming parameter H=0 and H=1, where H is the number of adversarial agents assumed to be present in the network. We consider four scenarios (a sketch of the trimmed consensus step is given after the list):

  1. All agents are cooperative. They maximize the team-average expected returns.
  2. One agent is greedy as it maximizes its own expected returns. It shares parameters with other agents but does not apply consensus updates.
  3. One agent is faulty and does not have a well-defined objective. It shares fixed parameter values with other agents.
  4. One agent is strategic; it maximizes its own returns and leads the cooperative agents to minimize their returns. The strategic agent has knowledge of other agents' rewards and updates two critic estimates (one critic is used to improve the adversary's policy and the other to hurt the cooperative agents' performance).
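To make the role of the trimming parameter concrete, the sketch below shows a simplified, element-wise trimmed-mean consensus step: for each parameter coordinate, the agent discards the H largest and H smallest values received from its neighbors before averaging the remainder with its own estimate. This is only a sketch of the idea behind resilient consensus; the exact projection-based update used by RPBCAC is given in [1].

```python
import numpy as np

def trimmed_mean_consensus(own, received, H):
    """Simplified resilient consensus step (element-wise trimmed mean).

    own      : (d,) array with the agent's own parameter estimate
    received : (n, d) array with estimates received from n neighbors
    H        : assumed number of adversaries; the H largest and H smallest
               neighbor values are discarded in every coordinate
    """
    received = np.atleast_2d(received)
    n, _ = received.shape
    if n <= 2 * H:
        # Too few neighbors to trim safely; keep the local estimate.
        return own.copy()
    sorted_vals = np.sort(received, axis=0)        # sort each coordinate
    kept = sorted_vals[H:n - H]                    # drop the extremes
    return np.vstack([own[None, :], kept]).mean(axis=0)

# Example: one neighbor sends an arbitrarily corrupted message.
own = np.zeros(3)
neighbors = np.array([[ 0.1,  0.2, 0.0],
                      [ 0.0,  0.1, 0.1],
                      [ 9.0, -9.0, 9.0]])  # adversarial values
print(trimmed_mean_consensus(own, neighbors, H=1))  # unaffected by the outliers
```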

The simulation results below demonstrate that the RPBCAC with H=1 (right) performs markedly better than the non-resilient case with H=0 (left). Performance is measured by the episode returns. We run simulations in which the cooperative agents receive a private reward (green and blue curves) and a team-average reward (red curve).

Simulation result plots for the four scenarios:

1) All cooperative

2) Three cooperative + one greedy

3) Three cooperative + one faulty

4) Three cooperative + one strategic (malicious)

References

[1] M. Figura, Y. Lin, J. Liu, V. Gupta, Resilient Consensus-based Multi-agent Reinforcement Learning with Function Approximation. arXiv preprint arXiv:2111.06776, 2021.

[2] M. Figura, K. C. Kosaraju and V. Gupta, Adversarial attacks in consensus-based multi-agent reinforcement learning, 2021 American Control Conference (ACC), 2021.
