Malmo Collaborative AI Challenge - Team Pig Catcher

Overview

The Malmo Collaborative AI Challenge - Team Pig Catcher

Approach

The challenge involves two agents, each of which can either cooperate or defect. As in the stag hunt [1], the optimal policy depends on the policy of the other agent. Because the other agent's policy is not known in advance, acting optimally requires modelling it. The challenge can also be considered a sequential social dilemma [2], as goals can change over time.
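To make the dependence on the partner's policy concrete, here is a minimal sketch of a stag hunt best response. The payoff values and the function names are illustrative assumptions, not the rewards used in the challenge.

```python
# Illustrative stag hunt payoffs (assumed values, not the challenge's rewards).
# PAYOFF[my_action][partner_action] is the reward received by our agent.
PAYOFF = {
    "cooperate": {"cooperate": 4.0, "defect": 0.0},
    "defect": {"cooperate": 3.0, "defect": 3.0},
}


def expected_payoff(my_action, p_partner_cooperates):
    """Expected reward of my_action given a belief about the partner."""
    row = PAYOFF[my_action]
    return (p_partner_cooperates * row["cooperate"]
            + (1.0 - p_partner_cooperates) * row["defect"])


def best_response(p_partner_cooperates):
    """Action with the highest expected reward under the current belief."""
    return max(PAYOFF, key=lambda a: expected_payoff(a, p_partner_cooperates))
```

With these assumed payoffs, cooperating only pays off when the agent believes its partner cooperates with probability of at least 0.75, which is why an estimate of the partner's policy matters.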

By treating the other agent as part of the environment, we can use model-free RL, and simply aim to maximise the reward of our agent. As a baseline we take a DRL algorithm - ACER [3] - and train it against the evaluation agent (which randomly uses a focused or random strategy every episode).

We chose to approach this challenge using hierarchical RL. We assume there are 2 subpolicies, one for each type of partner agent. To do so, we use option heads [4], whereby the agent has shared features, but separate heads for different subpolicies. In this case, ACER with 2 subpolicies has 2 Q-value heads and 2 policy heads. To choose which subpolicy to use at any given time, the agent also has an additional classifier head that is trained (using an oracle) to distinguish which option to use. Therefore, we ask the following questions:

  • Can the agent distinguish between the two possible behaviours of the evaluation agent?
  • Does the agent learn qualitatively different subpolicies?
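The option-head layout described above could be sketched roughly as follows. This is a pure-Python toy with random, untrained linear heads, not the PyTorch model in this repository; the class name, the layer shapes and the greedy choice of option are all assumptions made for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, features):
    """Apply a bias-free linear layer given as a list of weight rows."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


class OptionHeadAgent:
    """Hypothetical agent: shared features feed per-option heads."""

    def __init__(self, num_features, num_actions, num_options=2, seed=0):
        rng = random.Random(seed)

        def init(rows):
            return [[rng.uniform(-0.1, 0.1) for _ in range(num_features)]
                    for _ in range(rows)]

        # One policy head and one Q-value head per subpolicy (option).
        self.policy_heads = [init(num_actions) for _ in range(num_options)]
        self.q_heads = [init(num_actions) for _ in range(num_options)]
        # Classifier head that predicts which option to use.
        self.classifier_head = init(num_options)

    def forward(self, features):
        """Pick an option, then return that option's policy and Q-values."""
        option_probs = softmax(linear(self.classifier_head, features))
        option = max(range(len(option_probs)), key=option_probs.__getitem__)
        policy = softmax(linear(self.policy_heads[option], features))
        q_values = linear(self.q_heads[option], features)
        return option, policy, q_values
```

Here `features` stands in for the output of the shared network body. In the approach described above the classifier head would be trained with oracle labels for the partner type; the sketch leaves it untrained.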

Unfortunately, due to technical difficulties and time restrictions, we were unable to successfully train an agent. Full results and more details can be found in our video.

Design Decisions

For our baseline, we implemented ACER [3] in PyTorch based on reference code [5, 6]. In addition, we augmented the state that the agent receives with the previous action, reward and a step counter [7]. Our challenge entry augments the agent with option heads [4], and we aim to distinguish the different policies of the evaluation agent.
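As a rough illustration of the state augmentation, the sketch below appends the previous action, the previous reward and a step counter to a flat observation. The one-hot action encoding, the normalisation of the step counter and the function name are assumptions; the actual code may encode these differently.

```python
def augment_state(observation, prev_action, prev_reward, step,
                  num_actions, max_steps):
    """Append previous action, previous reward and step counter.

    observation: flat sequence of floats.
    prev_action: integer action index, or None at the start of an episode.
    """
    # Assumed encoding: one-hot previous action (all zeros if there is none).
    one_hot = [0.0] * num_actions
    if prev_action is not None:
        one_hot[prev_action] = 1.0
    # Assumed encoding: step counter scaled by the episode length.
    return list(observation) + one_hot + [float(prev_reward), step / max_steps]
```

For example, `augment_state([0.5, 0.25], prev_action=1, prev_reward=-1, step=5, num_actions=3, max_steps=25)` yields a vector of length 7.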

We also introduce a novel contribution - a batch version of ACER - which increases stability. We sample a batch of off-policy trajectories, and then truncate them all to the length of the shortest trajectory in the batch.
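The truncation step can be sketched as below. Keeping the first steps of each trajectory (rather than the last) and the function name are assumptions made for this example.

```python
def truncate_to_shortest(trajectories):
    """Cut every trajectory to the length of the shortest one.

    trajectories: list of sequences of transitions (any per-step objects).
    Returns a list of equal-length lists, so the batch can be stacked.
    """
    if not trajectories:
        return []
    min_len = min(len(t) for t in trajectories)
    # Assumption: keep the first min_len steps of each trajectory.
    return [list(t[:min_len]) for t in trajectories]
```

Once the trajectories have equal length, `list(zip(*truncated))` gives a time-major view with one tuple of transitions per time step.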

Instructions

Dependencies:

Firstly, build the Malmo Docker image. Secondly, enable running Docker as a non-root user.

Run ACER with:

OMP_NUM_THREADS=1 python pc_main.py

The code automatically opens up Minecraft (Docker) instances.

Discussion

Team Pig Catcher Discussion Video

References

[1] Game Theory of Mind
[2] Multi-agent Reinforcement Learning in Sequential Social Dilemmas
[3] Sample Efficient Actor-Critic with Experience Replay
[4] Classifying Options for Deep Reinforcement Learning
[5] ikostrikov/pytorch-a3c
[6] pfnet/ChainerRL
[7] Learning to Navigate in Complex Environments



This repository contains the task definition and example code for the Malmo Collaborative AI Challenge. This challenge is organized to encourage research in collaborative AI - to work towards AI agents that learn to collaborate to solve problems and achieve goals. You can find additional details, including terms and conditions, prizes and information on how to participate at the Challenge Homepage.



Notes for challenge participants: Once you and your team decide to participate in the challenge, please make sure to register your team at our Registration Page. On the registration form, you need to provide a link to the GitHub repository that will contain your solution. We recommend that you fork this repository and provide the address of the forked repo. You can then update your submission as you make progress on the challenge task. We will consider the version of the code on branch master at the time of the submission deadline as your challenge submission. Your submission needs to contain code in working order, a 1-page description of your approach, and a 1-minute video that shows off your agent. Please see the challenge terms and conditions for further details.



Installation

Prerequisites

Minimal installation

pip install -e git+https://github.com/Microsoft/malmo-challenge#egg=malmopy

or

git clone https://github.com/Microsoft/malmo-challenge
cd malmo-challenge
pip install -e .

Optional extensions

Some of the example code uses additional dependencies to provide 'extra' functionality. These can be installed using:

pip install -e '.[extra1, extra2]'

For example, to install gym:

pip install -e '.[gym]'

Or to install all extras:

pip install -e '.[all]'

The following extras are available:

  • gym: OpenAI Gym is an interface to a wide range of reinforcement learning environments. Installing this extra enables the Atari example agents in samples/atari to train on the gym environments. Note that the OpenAI Gym Atari environments are currently not available on Windows.
  • tensorflow: TensorFlow is a popular deep learning framework developed by Google. In our examples it enables visualizations through TensorBoard.

Getting started

Play the challenge task

The challenge task takes the form of a mini game, called Pig Chase. Learn about the game, and try playing it yourself on our Pig Chase Challenge page.

Run your first experiment

See how to run your first baseline experiment on the Pig Chase Challenge page.

Next steps

Run an experiment in Docker on Azure

Docker is a virtualization platform that makes it easy to deploy software with all its dependencies. We use Docker to run experiments locally or in the cloud. Details on how to run an example experiment using Docker are in the docker README.

Resources
