Malmo Collaborative AI Challenge - Team Pig Catcher

Overview

Team Pig Catcher's entry to the Malmo Collaborative AI Challenge.

Approach

The challenge involves two agents that can either cooperate or defect. As in the stag hunt [1], the optimal policy depends on the other agent's policy; not knowing it, the best we can do is model it. The challenge can also be considered a sequential social dilemma [2], as goals could change over time.
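For intuition, here is a generic stag hunt payoff table in Python (the values are illustrative, not the actual Pig Chase rewards): cooperation pays off only if the partner also cooperates, while defection yields a smaller but guaranteed reward, so the best response depends on a model of the partner:

# Generic stag hunt payoffs as (our reward, partner reward).
# Values are illustrative, not the actual Pig Chase rewards.
PAYOFFS = {
    ("cooperate", "cooperate"): (4, 4),  # catch the stag/pig together
    ("cooperate", "defect"):    (0, 3),  # lone cooperator gets nothing
    ("defect",    "cooperate"): (3, 0),
    ("defect",    "defect"):    (3, 3),  # both take the safe option
}

def best_response(partner_action):
    # Best response when the partner's action is known
    return max(("cooperate", "defect"),
               key=lambda action: PAYOFFS[(action, partner_action)][0])

assert best_response("cooperate") == "cooperate"
assert best_response("defect") == "defect"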

By treating the other agent as part of the environment, we can use model-free RL and simply aim to maximise our agent's reward. As a baseline, we take a deep RL algorithm, ACER [3], and train it against the evaluation agent (which randomly picks either a focused or a random strategy each episode).

We chose to approach this challenge using hierarchical RL, assuming one subpolicy for each of the two types of partner agent. To implement this we use option heads [4], whereby the agent has shared features but separate heads for the different subpolicies; ACER with 2 subpolicies thus has 2 Q-value heads and 2 policy heads. To choose which subpolicy to use at any given time, the agent also has an additional classifier head that is trained (using an oracle) to predict which option to use; a sketch of this head structure follows the questions below. We therefore ask the following questions:

  • Can the agent distinguish between the two possible behaviours of the evaluation agent?
  • Does the agent learn qualitatively different subpolicies?
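For concreteness, here is a minimal PyTorch sketch of the option-head architecture described above; the layer sizes, names, and the hard argmax option switch are our own illustrative assumptions, not the exact entry code:

import torch
import torch.nn as nn

class OptionHeadACER(nn.Module):
    # Shared trunk; one policy head and one Q-value head per assumed
    # partner type; a classifier head predicts which option to use.
    def __init__(self, state_size, num_actions, hidden_size=256, num_options=2):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_size, hidden_size), nn.ReLU())
        self.policy_heads = nn.ModuleList(
            nn.Linear(hidden_size, num_actions) for _ in range(num_options))
        self.q_heads = nn.ModuleList(
            nn.Linear(hidden_size, num_actions) for _ in range(num_options))
        self.option_head = nn.Linear(hidden_size, num_options)

    def forward(self, state):
        h = self.trunk(state)
        option_logits = self.option_head(h)  # supervised with oracle labels
        option = option_logits.argmax(dim=-1)  # pick a subpolicy per sample
        policies = torch.stack(
            [head(h).softmax(dim=-1) for head in self.policy_heads], dim=1)
        qs = torch.stack([head(h) for head in self.q_heads], dim=1)
        idx = option.view(-1, 1, 1).expand(-1, 1, policies.size(-1))
        return (policies.gather(1, idx).squeeze(1),
                qs.gather(1, idx).squeeze(1), option_logits)

At training time, the classifier head can be supervised with the oracle's label for the partner's behaviour, while the selected policy and Q-value heads are trained with the usual ACER losses.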

Unfortunately, due to technical difficulties and time restrictions, we were unable to successfully train an agent. Full results and more details can be found in our video.

Design Decisions

For our baseline, we implemented ACER [3] in PyTorch, based on reference code [5, 6]. In addition, we augmented the state the agent receives with its previous action, the previous reward, and a step counter [7]. Our challenge entry augments this agent with option heads [4], with the aim of distinguishing between the different policies of the evaluation agent.
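As an illustration, a minimal sketch of this state augmentation; the action count and episode length are illustrative assumptions:

import torch

def augment_state(obs, prev_action, prev_reward, step, num_actions=4, max_steps=25):
    # Append the previous action (one-hot), the previous reward and a
    # normalised step counter to the flattened observation.
    one_hot = torch.zeros(num_actions)
    one_hot[prev_action] = 1.0
    extras = torch.tensor([float(prev_reward), step / max_steps])
    return torch.cat([obs.flatten(), one_hot, extras])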

We also introduce a novel contribution, a batch version of ACER, which increases stability. We sample a batch of off-policy trajectories and truncate them all to the length of the shortest, so that they can be stacked and processed in a single update.
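A minimal sketch of this batched sampling, assuming the replay memory is a list of trajectories (each a list of transitions):

import random

def sample_truncated_batch(replay_memory, batch_size):
    # Draw a batch of stored off-policy trajectories and truncate them
    # all to the length of the shortest, so they can be stacked and
    # processed in a single update.
    trajectories = random.sample(replay_memory, batch_size)
    min_len = min(len(trajectory) for trajectory in trajectories)
    return [trajectory[:min_len] for trajectory in trajectories]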

Instructions

Dependencies:

  • Docker
  • PyTorch

Firstly, build the Malmo Docker image. Secondly, enable running Docker as a non-root user.

Run ACER with:

OMP_NUM_THREADS=1 python pc_main.py

The code automatically opens up Minecraft (Docker) instances.

Discussion

Team Pig Catcher Discussion Video

References

[1] Game Theory of Mind
[2] Multi-agent Reinforcement Learning in Sequential Social Dilemmas
[3] Sample Efficient Actor-Critic with Experience Replay
[4] Classifying Options for Deep Reinforcement Learning
[5] ikostrikov/pytorch-a3c
[6] pfnet/ChainerRL
[7] Learning to Navigate in Complex Environments



This repository contains the task definition and example code for the Malmo Collaborative AI Challenge. This challenge is organized to encourage research in collaborative AI - to work towards AI agents that learn to collaborate to solve problems and achieve goals. You can find additional details, including terms and conditions, prizes and information on how to participate at the Challenge Homepage.

Join the chat at https://gitter.im/malmo-challenge/Lobby


Notes for challenge participants: Once you and your team decide to participate in the challenge, please make sure to register your team at our Registration Page. On the registration form, you need to provide a link to the GitHub repository that will contain your solution. We recommend that you fork this repository (learn how) and provide the address of the forked repo. You can then update your submission as you make progress on the challenge task. We will consider the version of the code on the master branch at the time of the submission deadline as your challenge submission. Your submission needs to contain code in working order, a one-page description of your approach, and a one-minute video that shows off your agent. Please see the challenge terms and conditions for further details.


Jump to: Installation | Getting started | Next steps

Installation

Prerequisites

Minimal installation

pip install -e git+https://github.com/Microsoft/malmo-challenge#egg=malmopy

or

git clone https://github.com/Microsoft/malmo-challenge
cd malmo-challenge
pip install -e .

Optional extensions

Some of the example code uses additional dependencies to provide 'extra' functionality. These can be installed using:

pip install -e '.[extra1, extra2]'

For example, to install the gym extra:

pip install -e '.[gym]'

Or to install all extras:

pip install -e '.[all]'

The following extras are available:

  • gym: OpenAI Gym is an interface to a wide range of reinforcement learning environments. Installing this extra enables the Atari example agents in samples/atari to train on the Gym environments. Note that OpenAI Gym Atari environments are currently not available on Windows.
  • tensorflow: TensorFlow is a popular deep learning framework developed by Google. In our examples it enables visualizations through TensorBoard.

Getting started

Play the challenge task

The challenge task takes the form of a mini-game called Pig Chase. Learn about the game, and try playing it yourself, on our Pig Chase Challenge page.

Run your first experiment

See how to run your first baseline experiment on the Pig Chase Challenge page.

Next steps

Run an experiment in Docker on Azure

Docker is a virtualization platform that makes it easy to deploy software with all its dependencies. We use Docker to run experiments locally or in the cloud. Details on how to run an example experiment using Docker are in the docker README.
